vb.org Archive


orban 09-29-2006 07:05 PM

Sphinx Search
 
1 Attachment(s)
Sphinx Implementation for vBulletin:

Version 0.1 Hooray!

Just sharing as usual, let the discussions begin (in b4 TECK "MINE IS BETTER")

Only tested with Sphinx-0.9.8-rc2 (r1234; Mar 29, 2008).

If you are upgrading from my old tutorial, back up your search.php (you know, just in case you need the old hacked-up version again) and restore the original from the zip/tar. No more file modifications!

http://sphinxsearch.com/downloads.html

Tested on 3.6.10. It should work on 3.7 if you modify the /*insert query*/ on line 522 (I removed the 'prefixchoice' field because it doesn't exist in 3.6).

No support for tags/thread prefix yet, because I don't have access to a 3.7 installation at the moment

Similar threads is also being worked on

Alpha release for some feedback, hopefully it will be production ready soon :p

I assume you already have Sphinx up and running... see attached sphinx.conf.example for a minimalistic setup

Installation notes inside search_sphinx.php

Well yeah enjoy. And PM me if you need help

The old post is here: https://vborg.vbsupport.ru/showpost....&postcount=387

The Good:
  • Search this forum
  • Search this thread
  • Find all posts by User
  • Find all threads started by User
  • "Search Entire Posts"/"Search Titles Only" and "Show Results as Threads"/"Show Results as Posts" in all four combinations supported
  • "Search Entire Posts" can be sorted by rank/post.dateline (postuserid, forumid will sort by integer)
  • "Search Titles Only" can be sorted by rank, last reply date, first post date, number of replies (views if you add that value to sphinx.conf)
  • Really fast
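
To make the sort support in the list above a bit more concrete, here is a tiny Python sketch of it as a lookup table. It is not taken from search_sphinx.php: the key names ("lastpost", "replycount" and so on) and the fall-back-to-rank behaviour are assumptions made up for illustration.

```python
# Sort keys per search mode, as described in the list above.
# The key names are illustrative, not necessarily what search_sphinx.php uses.
SUPPORTED_SORTS = {
    "posts": ["rank", "dateline", "postuserid", "forumid"],
    "titles": ["rank", "lastpost", "dateline", "replycount"],
}

def pick_sort(search_mode, requested):
    """Return the requested sort key if this mode supports it, else fall back to "rank"."""
    if requested in SUPPORTED_SORTS[search_mode]:
        return requested
    # Assumption: an unsupported sort quietly falls back to relevance ranking.
    return "rank"

print(pick_sort("titles", "replycount"))
print(pick_sort("posts", "replycount"))
```

The second call falls back to "rank" because the post index has no reply count to sort on.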

The Bad:
  • You can't sort posts by title, number of replies/views, thread start date, or last reply date (Sphinx doesn't have this data).*
  • You could possibly add this data to sphinx.conf, but it would only be as good as your last full post index update
  • "Find Threads with At Least/Most X Replies" doesn't work with "Search Entire Posts"
  • Search results are delayed (depending on how often you run indexer)
  • "New Posts" not supported... too much logic in the query?!

The Ugly:
  • Sorting is kinda messed up (especially when "Search Entire Posts" and "Show Results as Threads" are combined)
  • search_sphinx.php is messy, duplicated code from search.php

*The Infamous Post Sorting Quirk

The question here is: when you "Search Entire Posts" and "Show Results as Threads", do you want your threads sorted by:
  • First post dateline (vBulletin option)
  • Last post dateline (vBulletin default)
  • The matching post dateline (Sphinx)

Our Sphinx setup does not have the first post and last post dateline stored in its post index (and it would be pretty much useless too), so the first two options are not available. vBulletin offers a function called "sort_search_items()" (search.php:633 in 3.7) which could, in theory, be used to sort the threads by last post dateline.

It does not fix the problem though. Let's assume we set maxresults to 5 and search threads for "funny". We have 7 threads created today, each shown with its matching post after the bar:

1. Thread "Cows", Created 08:00, Last Post 17:00 | "Funny Cows", Created 09:00
2. Thread "Cats", Created 09:00, Last Post 14:00 | "Funny Cats", Created 14:00
3. Thread "Dogs", Created 10:00, Last Post 12:00 | "Funny Dogs", Created 11:00
4. Thread "Mice", Created 11:00, Last Post 15:00 | "Funny Mice", Created 13:00
5. Thread "Rats", Created 12:00, Last Post 13:00 | "Funny Rats", Created 12:00
6. Thread "Eels", Created 13:00, Last Post 19:00 | "Funny Eels", Created 18:00
7. Thread "Fish", Created 14:00, Last Post 18:00 | "Funny Fish", Created 17:00

Do we want to show threads 6, 7, 2, 4, 5 (Sphinx)? Or do we want to show threads 6, 7, 1, 4, 2 (vB)?

vBulletin finds all 7 matching threads, orders them by last post descending, and grabs the top 5.
Sphinx finds the newest 5 matching posts and then returns the associated threads.

Reordering search results with "sort_search_items()" does not fix the problem because there might be older threads with very recent replies that Sphinx won't even consider. Let's consider an 8th thread:

8. Thread "Bees", Created 2002, Last Post 20:00 | "Funny Bees", Created 2002

vBulletin will list this one on top, Sphinx will not consider it. So even re-sorting the search items will not make this thread appear.
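
The example above can be replayed in a few lines of Python. This is only a sketch of the two orderings, not code from search.php or search_sphinx.php. The tuple layout and the function names are made up for illustration, times are hours of the same day, and -1 is a stand-in for "2002".

```python
# Each entry: (thread number, title, last post hour, matching post hour).
# Hours come from the example above; -1 stands in for "2002" (very old).
THREADS = [
    (1, "Cows", 17, 9),
    (2, "Cats", 14, 14),
    (3, "Dogs", 12, 11),
    (4, "Mice", 15, 13),
    (5, "Rats", 13, 12),
    (6, "Eels", 19, 18),
    (7, "Fish", 18, 17),
]
BEES = (8, "Bees", 20, -1)

def vb_order(threads, maxresults):
    """vBulletin style: sort every matching thread by last post, newest first, then cut."""
    ranked = sorted(threads, key=lambda t: t[2], reverse=True)
    return [t[0] for t in ranked[:maxresults]]

def sphinx_order(threads, maxresults):
    """Sphinx style: take the newest matching posts, then return their threads."""
    ranked = sorted(threads, key=lambda t: t[3], reverse=True)
    return [t[0] for t in ranked[:maxresults]]

def resort_by_last_post(thread_ids, threads):
    """Re-sort an already cut result list by last post (the sort_search_items() idea)."""
    last_post = {t[0]: t[2] for t in threads}
    return sorted(thread_ids, key=lambda i: last_post[i], reverse=True)

print(vb_order(THREADS, 5))       # [6, 7, 1, 4, 2]
print(sphinx_order(THREADS, 5))   # [6, 7, 2, 4, 5]

with_bees = THREADS + [BEES]
print(vb_order(with_bees, 5))     # [8, 6, 7, 1, 4]
print(resort_by_last_post(sphinx_order(with_bees, 5), with_bees))  # [6, 7, 4, 2, 5]
```

The last line shows the point: thread 8 was dropped before the re-sort ever ran, so no amount of re-sorting brings it back.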

Adrian Schneider 09-29-2006 07:30 PM

Nice find! I'll play around with it once I get some time.

orban 09-29-2006 07:37 PM

Obviously the only options you will have on the advanced search page are:

Key Words:
Search In: Thread Titles/Posts
Sort Results by: Relevancy, Date Asc, Date Desc
Search in Forums:

And I guess searching by username will still be the built in way. (As in, without a search term, just list his posts.)

Gonna try to hack that up, when I make it work I'll release it I hope :)

But the fact you can index 4k posts/second is absolutely insane, and that was with 800 users online... :D

Paul M 09-29-2006 07:39 PM

Hmm, yes, that looks interesting, bookmarked for later. :)

orban 09-29-2006 07:50 PM

This also means I can remove that 400 MB fulltext index from the post table, making MySQL even faster.

The right tool for the job. :)

Filtering by forumid already works, so does sorting by date.

And it still says 0.000003 seconds. Incredible.

forumdude 09-29-2006 08:20 PM

Hmm good timing. I got on here today to see if there were any other resources out there for searching and vbulletin and this showed up in the results.

We've had soooo much trouble keeping our search up. We're using the fulltext search right now with the search on its own server on tables reduced in size. Huge pain and it still doesn't return some results.

Keep us updated please, this looks cool.

forumdude 09-29-2006 09:36 PM

Awesome!

If I get some time tonight (probably not!) I will download Sphinx and give it a look.

What kind of data do you have to test this with?

We're looking at about 9 million records on our live post table (millions more archived). I'm very curious how well this would hold up to that amount of data.

mute 09-29-2006 10:26 PM

Can I get a peek at your sphinx.conf?

mute 09-29-2006 10:33 PM

wow, you are fast! thanks. I'm tossing it 24 million posts to see what it does :)

mute 09-29-2006 11:28 PM

*waits for post index to build*

So far so good. It ripped through 1,652,726 thread titles in about 2 minutes, on a machine replicating a very active forum, and one running a test upgrade from 3.5.5 to 3.6.1 :)

So far, I'm happy! I think with a little work this could be amazing. The API is a little unfriendly when it comes to errors and whatnot, but with some polishing, and once we figure out targeted searches and searching by name, we're good to go.

Orban you are a hero among men!

Just FYI:

thread table:
collected 1658976 docs, 48.1 MB
sorted 5.1 Mhits, 100.0% done
total 1658976 docs, 48070959 bytes
total 148.426 sec, 323872.56 bytes/sec, 11177.16 docs/sec

post table:
collected 8860446 docs, 1416.9 MB
sorted 140.2 Mhits, 100.0% done
total 8860446 docs, 1416892676 bytes
total 3168.862 sec, 447129.84 bytes/sec, 2796.10 docs/sec

That is with a word length of 4 and no stopwords.
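
For anyone who wants to check the rates or guess at a bigger table, here is a small Python sketch using the figures from the indexer output above. The 24 million post estimate ASSUMES indexing speed stays constant as the table grows, which nothing in this thread confirms, so treat it as a rough guess only.

```python
# Figures copied from the indexer output above: (documents, bytes, seconds).
RUNS = {
    "thread": (1658976, 48070959, 148.426),
    "post": (8860446, 1416892676, 3168.862),
}

def rates(docs, size_bytes, seconds):
    """Return (docs/sec, bytes/sec) for one indexer run."""
    return docs / seconds, size_bytes / seconds

def estimate_hours(doc_count, docs_per_sec):
    """Rough build time, ASSUMING indexing speed stays constant as the table grows."""
    return doc_count / docs_per_sec / 3600

for name, run in RUNS.items():
    docs_per_sec, bytes_per_sec = rates(*run)
    print(f"{name}: {docs_per_sec:.2f} docs/sec, {bytes_per_sec:.2f} bytes/sec")

# Hypothetical projection for a 24 million post table at the measured post rate.
post_rate, _ = rates(*RUNS["post"])
print(f"24M posts: about {estimate_hours(24_000_000, post_rate):.1f} hours")
```

The recomputed rates land within rounding of what indexer printed, and the projection comes out at roughly 2.4 hours under that constant-speed assumption.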

