  #361  
Old 07-05-2007, 10:53 AM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445

Implementing Sphinx full-text search engine

Based on Sphinx 0.9.7 and vB 3.6.7 PL1. All file edits and config files have only been tested with those two versions. That doesn't mean you cannot make Sphinx work with your vB 3.5 installation, but it will require manual work on your side.

Known limitations
  1. You cannot filter by number of replies
    • Possible fix: add another "sql_group_column" holding the number of thread replies. The search will use the reply counts from the last thread reindexing, though, so depending on your setup the numbers are hours to days old.
  2. Sorting by title/number of replies/views/thread start date/username/forum isn't possible
    • Basically same issue as in (1.), Sphinx doesn't have the necessary data.
  3. You can only use Sphinx to perform queries that have a full-text component, so searches by userid/forumid WITHOUT a keyword are not possible. These searches can run on MySQL indices, though, so they shouldn't be an issue.
    • Workaround by kmike. "You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345."
  4. Search Results out of order because the time stamps are too old
    • Sphinx doesn't query MySQL to get the latest time stamps. So if your thread had its last reply 3 days ago, was indexed by Sphinx 2 days ago and got a new reply today, Sphinx will still assume its last reply was 3 days ago. In the search results the thread will be put way back instead of at the top. There is no easy fix for this, and certainly no fast one, because this is exactly what makes Sphinx so fast: we're sacrificing a bit of "up-to-date-ness" to gain speed. If you are in desperate need of fixing this, kmike outlined a fix. Basically it sends the result set to MySQL and sorts it again, giving you up-to-date results at the cost of speed. It's up to you to find out if it's worth it.
What Sphinx can do for you
  1. Incredibly fast full text searches on huge amounts of posts
    • It's really fast, really really really fast. Even on intersections of multiple keywords on several hundred thousand results.
  2. Replace forum search, search in this forum and search in this thread
    • Mimicking the default forum search for all but a few details
  3. Nearly instant indexing of new posts
    • Thanks to a special config file setup called "Live Updates"
Setting up Sphinx
  1. Grab Sphinx here: http://www.sphinxsearch.com/downloads.html and compile it
  2. Read a bit of the documentation to get familiar with it, might wanna peek in the installation bit
  3. Grab the sphinx.conf.txt at the end of this document (rename it to sphinx.conf). This is my configuration file. You have to, at least, fill in your database info and adjust the paths /.../
  4. You have to create a counter table that holds information about the last indexed post/thread for the Live Updates:
    Code:
    CREATE TABLE sph_counter
    (
        counter_id INTEGER PRIMARY KEY NOT NULL,
        max_doc_id INTEGER NOT NULL
    );
    You can either place this in the same database as vB or in a different one, but don't forget to adjust sphinx.conf accordingly (prefixing sph_counter with your database name: yourdb.sph_counter).
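For orientation, this is roughly how the counter table is used by a main/delta source pair. It is only a sketch along the lines of the "live updates" example in the Sphinx documentation, not a copy of the attached sphinx.conf: the source names, the selected columns and the unprefixed "post" table are assumptions, and the connection settings are left out.

```
source post
{
    # remember the highest postid at the time of the full reindex
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(postid) FROM post
    sql_query     = SELECT postid, pagetext FROM post \
                    WHERE postid <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

source postdelta : post
{
    # override the inherited pre-query so a delta run does not move the counter
    sql_query_pre =
    sql_query     = SELECT postid, pagetext FROM post \
                    WHERE postid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}
```

The full index covers everything up to the stored max_doc_id, and the delta index only picks up rows added since then, which is why rebuilding the delta is so cheap. If your tables use a prefix, or the counter lives in another database, the table names need to be adjusted in both queries.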
Running Sphinx
  • You start Sphinx with "searchd --config /.../sphinx.conf". This will create a new process called "searchd".
  • Indexing documents is handled by "indexer". Make sure you know whether searchd is running before you start an indexing process; this is crucial.
    • searchd is running: use "indexer --rotate". It will create temporary new files and rotate them in, so searching won't be broken.
    • searchd isn't running: use "indexer" without "--rotate". It will just replace your current files.
  • For creating the full indices it is recommended to shut down Sphinx because it might take a while and your server will be quite busy (unless you run sphinx on a slave). Reindexing all posts and threads is done by "indexer --config /.../sphinx.conf --all" or "indexer --config /.../sphinx.conf --rotate --all" if searchd is running.
  • Creating the delta indices for Live Updates is issued by "indexer --config /.../sphinx.conf --rotate postdelta threaddelta"
  • You can test your indices with "search", the third executable installed by Sphinx. Call "search" without arguments and it will tell you how to use it.
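The rotate/no-rotate rule above can be wrapped in a small helper, for example if you drive indexer from a PHP cron script. This is only a sketch: build_indexer_command() and the '/path/to/sphinx.conf' placeholder are made up for illustration; only the indexer options (--config, --rotate, --all) come from the post.

```php
<?php
// Hypothetical helper, not part of Sphinx or vBulletin.
// $indexes is a list of index names; an empty list means "--all".
// $searchd_running decides whether "--rotate" is added.
function build_indexer_command($config, array $indexes, $searchd_running)
{
    $parts = array('indexer', '--config', escapeshellarg($config));

    // With searchd up, new files must be rotated in instead of overwritten.
    if ($searchd_running) {
        $parts[] = '--rotate';
    }

    if (count($indexes) == 0) {
        $parts[] = '--all';
    } else {
        foreach ($indexes as $index) {
            $parts[] = escapeshellarg($index);
        }
    }

    return implode(' ', $parts);
}

// Placeholder path; use the real location of your sphinx.conf.
echo build_indexer_command('/path/to/sphinx.conf', array('postdelta', 'threaddelta'), true), "\n";
echo build_indexer_command('/path/to/sphinx.conf', array(), false), "\n";
```

The returned string can be passed to exec() or just printed for a crontab entry. The exact quoting produced by escapeshellarg() depends on the platform.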
Live Updates
  • You have to figure out a couple values: How often to re-index the whole thing, how often to re-index all threads, how often to do Live Updates for postdelta and threaddelta.
  • "indexer --all": I do this about once per week at a very quiet time, usually manually.
  • I re-index all threads once per day, we just have 80k so this takes no time.
  • I recreate the delta indices every five minutes for both posts and threads so you have to wait between 1 and 5 minutes before your new threads/posts start showing up in search results.
  • I suggest adding cron jobs for those tasks on *n*x. I don't know about other OSes; can you even run Sphinx on Windows?
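A crontab for the schedule described above could look roughly like this. It is a sketch, not my actual crontab: the "thread" index name and the 04:00 time are assumptions, the "/.../" paths are the same placeholders as elsewhere in this post, and indexer has to be on the PATH of the cron user.

```
# every five minutes: rebuild both delta indices
*/5 * * * * indexer --config /.../sphinx.conf --rotate postdelta threaddelta

# once per day at 04:00: rebuild the full thread index ("thread" is an assumed index name)
0 4 * * * indexer --config /.../sphinx.conf --rotate thread
```

The weekly full reindex is left out on purpose, since it is done manually at a quiet time.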
Plugging Sphinx into vBulletin
  1. Sadly enough, this requires file modification. With every version I check whether they have finally added a way to plug in a different search system, like WordPress does for example, but no luck so far. There are 5 edits required. I listed them in search.php.txt at the end of this document for easier reference and so you can save it for future use. You will be editing "/.../forums/search.php". Don't forget that the file will be overwritten on every vB upgrade and you will have to apply the changes again.
  2. We also need sphinxapi.php, it's from "/.../src/sphinx-0.9.7/api" where your Sphinx source files are. Copy paste it to "/.../forums", where global.php lies.
  3. The last item is sphinx.php, which will handle the search. Grab sphinx.php.txt, rename it to sphinx.php and put it into "/.../forums/includes". Open it and adjust the values at the top. You can obviously move those files wherever you want, just don't forget to adjust the paths.
  4. Because we cannot offer all search options vB default search can, I removed a couple lines from the "search_forums" template. They are listed in search_forums.txt at the end of this document.
Bugs and Fixes
Contributions
Attached Files
File Type: txt search.php.txt (4.2 KB, 485 views)
File Type: txt search_forums.txt (2.0 KB, 381 views)
File Type: txt sphinx.php.txt (5.6 KB, 344 views)
File Type: txt sphinx.conf.txt (7.3 KB, 368 views)
  #362  
Old 07-05-2007, 11:03 AM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445

Can somebody give this a look, I tried to list some limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?
  #363  
Old 07-05-2007, 06:22 PM
ekool ekool is offline
 
Join Date: Jun 2003
Posts: 49

Orban,

Very nicely put together. I still have my older Sphinx setup working (thanks to you and many others in here), so I have no need to change anything just yet, but thanks for the wonderful write-up!
  #364  
Old 07-07-2007, 07:18 PM
PSS PSS is offline
 
Join Date: Jul 2007
Posts: 9

Quote:
Originally Posted by orban View Post
Can somebody give this a look, I tried to list some limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?
Couple of small things:

1. you did

CREATE TABLE sph_counter

but used sphinx_counter in sphinx.conf.

2. Then it would be great to have PREFIX_ where you would place your personal Vb table prefix. I added them there but it is not an easy task for those who do not know mysql syntax.

3. A step by step how to implement sort_search_items() would be nice.

Thanks for EXCELLENT work!

EDIT: Couple of things I would still like to know: when you have Sphinx search in place, do you need to have FULLTEXT index(es) in Vbulletin at all?

Also, is Sphinx used in the "new posts" search, too?
  #365  
Old 07-09-2007, 01:56 AM
TECK TECK is offline
 
Join Date: Nov 2001
Location: Canada
Posts: 4,182

orban, I don't see any reference in vBulletin or Sphinx to 'timesegments':
Code:
elseif ($vbulletin->GPC['sortby'] == 'timesegments')
        $cl->SetSortMode ( SPH_SORT_TIME_SEGMENTS);
Is there something I missed? Thanks for explaining.
Also, if anyone has worked kmike's trick (the per-user keyword, e.g. userid_12345) into their configuration files, could you be kind enough to post the actual code here?

Thanks for taking the time to write this up.
  #366  
Old 07-09-2007, 03:37 PM
PSS PSS is offline
 
Join Date: Jul 2007
Posts: 9

Another question: is there a way to check if searchd is running and if not, put text "search is offline" to the search page?
  #367  
Old 07-09-2007, 07:20 PM
TECK TECK is offline
 
Join Date: Nov 2001
Location: Canada
Posts: 4,182

Very easy.

PHP Code:
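function get_searchd_status()
{
    // searchd keeps a PID file while it is running; the path is whatever
    // pid_file is set to in your sphinx.conf
    return file_exists('/var/run/searchd.pid');
}

$searchd = get_searchd_status();
if ($searchd)
{
    // searchd enabled
}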

I wrote the check as a function because you can use it in several areas, this way.

Now, back to my question. Can anyone help me with the username setup? I can't think how you could use a variable in the Sphinx conf file... because you cannot. Obviously I'm wrong, since kmike did it, but unfortunately he is not available.
  #368  
Old 07-09-2007, 08:21 PM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat

In the query that grabs the posts, use two CONCATs:

CONCAT( post, ' ', CONCAT( 'userid_', userid ) )

untested, but I hope you get the idea. Then you need to modify search.php to transform a given userid into the string...
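The search.php side could look roughly like this. It is a minimal sketch that assumes the indexed text carries a 'userid_<id>' token as in the CONCAT above, and that the underscore counts as a word character in your charset_table. Both function names are made up for illustration and are not part of vBulletin or the Sphinx API.

```php
<?php
// Hypothetical helper: build the fake per-member keyword.
// The 'userid_' prefix has to match whatever the indexing query appends.
function build_userid_keyword($userid)
{
    return 'userid_' . intval($userid);
}

// Hypothetical helper: combine the user's keywords with the fake keyword.
// With empty keywords the query consists of the fake keyword alone,
// which should match every post by that member.
function build_sphinx_query($keywords, $userid = 0)
{
    $query = trim($keywords);
    if ($userid > 0) {
        $query = trim($query . ' ' . build_userid_keyword($userid));
    }
    return $query;
}

echo build_sphinx_query('', 12345), "\n";             // userid_12345
echo build_sphinx_query('sphinx delta', 12345), "\n"; // sphinx delta userid_12345
```

The resulting string is what you would hand to the Sphinx client as the query text. Since every term has to match, the fake keyword then acts as a user filter.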
  #369  
Old 07-10-2007, 01:51 AM
TECK TECK is offline
 
Join Date: Nov 2001
Location: Canada
Posts: 4,182

Aha, thanks orban. What I want to do is this:
If a user wants to search for all threads/posts related to a specific user, he enters a username and leaves the search field empty. The results will show all threads started by that user, ordered the way you like in Sphinx.

Does anyone want to work with me on this project? I PM'ed kmike, hoping he will join us, since he is the only one who managed to fix this, not to mention other little extras.

Quote:
Originally Posted by DaiTengu View Post
You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking
Code:
/***
 * Removes duplicate keys and orders thread id's by lastpost
 *
 * @param	array	record id's to be ordered
 *
 * @return	array
 */
function sort_thread_ids($keys)
{
	global $vbulletin;

	$itemids = array();

	$items = $vbulletin->db->query_read_slave("
		SELECT threadid FROM " . TABLE_PREFIX . "thread AS thread
		WHERE threadid IN (" . implode(',', array_keys($keys)) . ")
		ORDER BY lastpost " . $vbulletin->GPC['sortorder'] . "
	");

	while ($item = $vbulletin->db->fetch_array($items))
	{
		$itemids[] = $item['threadid'];
	}

	unset($item);
	$vbulletin->db->free_result($items);

	return $itemids;
}
Then you call it anywhere you like:
Code:
if (!$vbulletin->GPC['showposts'] AND $vbulletin->GPC['sortby'] == 'lastpost')
{
	$orderedids = sort_thread_ids($orderedids);
}
Can you post results on your busy boards and let me know how it impacts the performance?
The function above has less processing code than the original sort_search_items() function.
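Two things to watch with sort_thread_ids(): the sort direction is interpolated straight into the query, and an empty id list would produce "IN ()". A minimal sketch of two guards follows; both helper names are hypothetical and not part of vBulletin, and vBulletin may already clean the sort order before it gets this far.

```php
<?php
// Hypothetical helpers, not part of vBulletin or of the function above.

// Collapse any supplied sort direction to one of two SQL keywords.
function normalize_sortorder($sortorder)
{
    return (strtolower(trim((string) $sortorder)) === 'asc') ? 'ASC' : 'DESC';
}

// Turn the keys of the result array into a comma-separated list of integers.
// Returns an empty string when there is nothing to sort, so the caller
// can skip the query altogether.
function build_id_list(array $keys)
{
    $ids = array_filter(array_map('intval', array_keys($keys)));
    return implode(',', array_unique($ids));
}

echo normalize_sortorder('asc'), "\n";            // ASC
echo normalize_sortorder('desc; DROP'), "\n";     // DESC
echo build_id_list(array(12 => 1, 7 => 1)), "\n"; // 12,7
```

In sort_thread_ids() you would return an empty array when build_id_list() gives back an empty string, and use the two return values in place of the raw implode() and GPC value.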

The PHP BBCode at vb.org is screwed, it breaks the code lines. Switched back to Code, much better.
  #370  
Old 07-10-2007, 06:12 AM
amcd amcd is offline
 
Join Date: Oct 2004
Posts: 218

Quote:
Originally Posted by TECK View Post
Very easy.

PHP Code:
function get_searchd_status()
{
    return file_exists('/var/run/searchd.pid');
}

$searchd = get_searchd_status();
if ($searchd)
{
    // searchd enabled
}

I wrote the check as a function because you can use it in several areas, this way.

Now, back to my question. Can anyone help me with the username setup? I can't think how you could use a variable in the Sphinx conf file... because you cannot. Obviously I'm wrong, since kmike did it, but unfortunately he is not available.
This will not work in a multi-server setup.
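When searchd runs on a different machine than the web server, one option is to test whether its TCP port accepts connections instead of looking for a local PID file. A minimal sketch: is_searchd_reachable() is a made-up name, and the host and port 3312 are assumptions, so use whatever your sphinx.conf and sphinx.php are set to.

```php
<?php
// Hypothetical helper: returns true if something accepts TCP connections
// on the given host and port within $timeout seconds.
// It does not verify that the listener really is searchd.
function is_searchd_reachable($host, $port, $timeout = 1)
{
    $errno = 0;
    $errstr = '';
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Assumed host and port; adjust to your setup.
if (!is_searchd_reachable('127.0.0.1', 3312)) {
    echo "Search is offline\n";
}
```

Keep the timeout short, because the check runs on every search page load and a dead host would otherwise stall the page.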