Go Back   vb.org Archive > Community Discussions > Forum and Server Management
FAQ Community Calendar Today's Posts Search

Reply
 
Thread Tools Display Modes
  #1  
Old 09-29-2006, 07:05 PM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default Sphinx Search

Sphinx Implementation for vBulletin:

Version 0.1 Hooray!

Just sharing as usual, let the discussions begin (in b4 TECK "MINE IS BETTER")

Only tested with Sphinx-0.9.8-rc2 (r1234; Mar 29, 2008).

If you are upgrading from my old tutorial, backup your search.php (you know, just in case you need the old hacked up version again) and restore the original from the zip/tar, no more file modifications!

http://sphinxsearch.com/downloads.html

Tested on 3.6.10, should work on 3.7 if you modify /*insert query*/ on Line 522 (I removed 'prefixchoice' field because it doesn't exist in 3.6)

No support for tags/thread prefix yet, because I don't have access to a 3.7 installation at the moment

Similar threads is also being worked on

Alpha release for some feedback, hopefully it will be production ready soon

I assume you already have Sphinx up and running... see attached sphinx.conf.example for a minimalistic setup

Installation notes inside search_sphinx.php

Well yeah enjoy. And PM me if you need help

The old post is here: https://vborg.vbsupport.ru/showpost....&postcount=387

The Good:
  • Search this forum
  • Search this thread
  • Find all posts by User
  • Find all threads started by User
  • "Search Entire Posts"/"Search Titles Only" and "Show Results as Threads"/"Show Results as Posts" in all four combinations supported
  • "Search Entire Posts" can be sorted by rank/post.dateline (postuserid, forumid will sort by integer)
  • "Search Titles Only" can be sorted by rank, last reply date, first post date, number of replies (views if you add that value to sphinx.conf)
  • Really fast

The Bad:
  • This means you can't sort posts by title, number of replies/views, thread start date, last reply date (Sphinx doesn't have this data).*
  • You could possibly add this to sphinx.conf but it will only be as good as your last full post index update
  • "Find Threads with At Least/Most X Replies" doesn't work when "Search Entire Posts"
  • Search results are delayed (depending on how often you run indexer)
  • "New Posts" not supported... too much logic in the query?!

The Ugly:
  • Sorting is kinda messed up (especially when "Search Entire Posts" and "Show Results as Threads" are combined)
  • search_sphinx.php is messy, duplicated code from search.php

*The Infamous Post Sorting Quirk

What happens here is that when you "Search Entire Posts" and "Show Results as Threads", do you want you threads sorted by:
  • First post dateline (vBulletin option)
  • Last post dateline (vBulletin default)
  • The matching post dateline (Sphinx)

Our Sphinx setup does not have first post and last post dateline stored in its post index (and it would be pretty much useless too) so the first two options are not available. vBulletin offers a function called "sort_search_items()" (search.php:633 3.7) which could, in theory, be used to sort the threads by last post dateline.

It does not fix the problem though. Let's assume we set maxresults to 5. We are searching for threads for "funny". We have 7 threads created today:

1. Thread "Cows", Created 08:00, Last Post 17:00 | "Funny Cows", Created 09:00
2. Thread "Cats", Created 09:00, Last Post 14:00 | "Funny Cats", Created 14:00
3. Thread "Dogs", Created 10:00, Last Post 12:00 | "Funny Dogs", Created 11:00
4. Thread "Mice", Created 11:00, Last Post 15:00 | "Funny Mice", Created 13:00
5. Thread "Rats", Created 12:00, Last Post 13:00 | "Funny Rats", Created 12:00
6. Thread "Eels", Created 13:00, Last Post 19:00 | "Funny Eels", Created 18:00
7. Thread "Fish", Created 14:00, Last Post 18:00 | "Funny Fish", Created 17:00

Do we want to show threads 6, 7, 2, 4, 5 (Sphinx)? Or do we want to show threads 6, 7, 1, 4, 2 (vB)?

vBulletin finds all 7 posts, orders them by last post descending, and grabs the top 5.
Sphinx will find the newest 5 matching posts and then returns you the associated threads.

Reordering search results with "sort_search_items()" does not fix the problem because there might be older threads with very recent replies that Sphinx won't even consider. Let's consider an 8th thread:

8. Thread "Bees", Created 2002, Last Post 20:00 | "Funny Bees", Created 2002

vBulletin will list this one on top, Sphinx will not consider it. So even re-sorting the search items will not make this thread appear.
Attached Files
File Type: php search_sphinx.0.1.php (17.3 KB, 665 views)
File Type: txt sphinx.conf.example.txt (3.6 KB, 778 views)
Reply With Quote
  #2  
Old 09-29-2006, 07:30 PM
Adrian Schneider's Avatar
Adrian Schneider Adrian Schneider is offline
 
Join Date: Jul 2004
Posts: 2,528
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Nice find! I'll play around with it once I get some time.
Reply With Quote
  #3  
Old 09-29-2006, 07:37 PM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Obviously the only options you will have on the advanced search page are:

Key Words:
Search In: Thread Titles/Posts
Sort Results by: Relevancy, Date Asc, Date Desc
Search in Forums:

And I guess searching by username will still be the built in way. (As in, without a search term, just list his posts.)

Gonna try to hack that up, when I make it work I'll release it I hope

But the fact you can index 4k posts/second is absolutely insane, and that was with 800 users online...
Reply With Quote
  #4  
Old 09-29-2006, 07:39 PM
Paul M's Avatar
Paul M Paul M is offline
 
Join Date: Sep 2004
Location: Nottingham, UK
Posts: 23,748
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Hmm, yes, that looks interesting, bookmarked for later.
Reply With Quote
  #5  
Old 09-29-2006, 07:50 PM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Also means I can remove that 400mb fulltext index from post table making MySQL even faster.

The right tool for the job.

Filtering by forumid already works, so does sorting by date.

And it still says 0.000003 seconds. Incredible.
Reply With Quote
  #6  
Old 09-29-2006, 08:20 PM
forumdude's Avatar
forumdude forumdude is offline
 
Join Date: Nov 2001
Posts: 50
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Hmm good timing. I got on here today to see if there were any other resources out there for searching and vbulletin and this showed up in the results.

We've had soooo much trouble keeping our search up. We're using the fulltext search right now with the search on its own server on tables reduced in size. Huge pain and it still doesn't return some results.

Keep us updated please, this looks cool.
Reply With Quote
  #7  
Old 09-29-2006, 09:36 PM
forumdude's Avatar
forumdude forumdude is offline
 
Join Date: Nov 2001
Posts: 50
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Awsome!

If I get some time tonight (probably not!) I will download Sphinx and give it a look.

What kind of data do you have to test this with?

We're looking at about 9 million records on our live post table (millions more archived). I'm very curious how well this would hold up to that amount of data.
Reply With Quote
  #8  
Old 09-29-2006, 10:26 PM
mute mute is offline
 
Join Date: Dec 2002
Location: Phoenixville, PA
Posts: 152
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Can I get a peek at your sphinx.conf?
Reply With Quote
  #9  
Old 09-29-2006, 10:33 PM
mute mute is offline
 
Join Date: Dec 2002
Location: Phoenixville, PA
Posts: 152
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

wow, you are fast! thanks. I'm tossing it 24 million posts to see what it does
Reply With Quote
  #10  
Old 09-29-2006, 11:28 PM
mute mute is offline
 
Join Date: Dec 2002
Location: Phoenixville, PA
Posts: 152
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

*waits for post index to build*

So far so good. It ripped through 1,652,726 thread titles in about 2 minutes, on a machine replicating a very active forum, and one running a test upgrade from 3.5.5 to 3.6.1

So far, I'm happy! I think with a little work this could be amazing. The api is a little unfriendly when it comes to errors and what not, but with some polishing and figuring out the targeting of searches and by name, and we're good to go.

Orban you are a hero among men!

Just FYI:

thread table:
collected 1658976 docs, 48.1 MB
sorted 5.1 Mhits, 100.0% done
total 1658976 docs, 48070959 bytes
total 148.426 sec, 323872.56 bytes/sec, 11177.16 docs/sec

post table:
collected 8860446 docs, 1416.9 MB
sorted 140.2 Mhits, 100.0% done
total 8860446 docs, 1416892676 bytes
total 3168.862 sec, 447129.84 bytes/sec, 2796.10 docs/sec

that is word length of 4 and no stopwords.
Reply With Quote
Reply


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT. The time now is 06:51 PM.


Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2024, vBulletin Solutions Inc.
X vBulletin 3.8.12 by vBS Debug Information
  • Page Generation 0.08010 seconds
  • Memory Usage 2,293KB
  • Queries Executed 14 (?)
More Information
Template Usage:
  • (1)SHOWTHREAD
  • (1)ad_footer_end
  • (1)ad_footer_start
  • (1)ad_header_end
  • (1)ad_header_logo
  • (1)ad_navbar_below
  • (1)ad_showthread_beforeqr
  • (1)ad_showthread_firstpost
  • (1)ad_showthread_firstpost_sig
  • (1)ad_showthread_firstpost_start
  • (1)footer
  • (1)forumjump
  • (1)forumrules
  • (1)gobutton
  • (1)header
  • (1)headinclude
  • (1)navbar
  • (3)navbar_link
  • (120)option
  • (1)pagenav
  • (1)pagenav_curpage
  • (2)pagenav_pagelink
  • (2)pagenav_pagelinkrel
  • (10)post_thanks_box
  • (10)post_thanks_button
  • (1)post_thanks_javascript
  • (1)post_thanks_navbar_search
  • (10)post_thanks_postbit_info
  • (10)postbit
  • (2)postbit_attachment
  • (10)postbit_onlinestatus
  • (10)postbit_wrapper
  • (1)spacer_close
  • (1)spacer_open
  • (1)tagbit_wrapper 

Phrase Groups Available:
  • global
  • inlinemod
  • postbit
  • posting
  • reputationlevel
  • showthread
Included Files:
  • ./showthread.php
  • ./global.php
  • ./includes/init.php
  • ./includes/class_core.php
  • ./includes/config.php
  • ./includes/functions.php
  • ./includes/class_hook.php
  • ./includes/modsystem_functions.php
  • ./includes/functions_bigthree.php
  • ./includes/class_postbit.php
  • ./includes/class_bbcode.php
  • ./includes/functions_reputation.php
  • ./includes/functions_post_thanks.php 

Hooks Called:
  • init_startup
  • init_startup_session_setup_start
  • init_startup_session_setup_complete
  • cache_permissions
  • fetch_postinfo_query
  • fetch_postinfo
  • fetch_threadinfo_query
  • fetch_threadinfo
  • fetch_foruminfo
  • style_fetch
  • cache_templates
  • global_start
  • parse_templates
  • global_setup_complete
  • showthread_start
  • showthread_getinfo
  • forumjump
  • showthread_post_start
  • showthread_query_postids
  • showthread_query
  • bbcode_fetch_tags
  • bbcode_create
  • showthread_postbit_create
  • postbit_factory
  • postbit_display_start
  • post_thanks_function_post_thanks_off_start
  • post_thanks_function_post_thanks_off_end
  • post_thanks_function_fetch_thanks_start
  • post_thanks_function_fetch_thanks_end
  • post_thanks_function_thanked_already_start
  • post_thanks_function_thanked_already_end
  • fetch_musername
  • postbit_imicons
  • bbcode_parse_start
  • bbcode_parse_complete_precache
  • bbcode_parse_complete
  • postbit_attachment
  • postbit_display_complete
  • post_thanks_function_can_thank_this_post_start
  • pagenav_page
  • pagenav_complete
  • tag_fetchbit_complete
  • forumrules
  • navbits
  • navbits_complete
  • showthread_complete