View Full Version : Sphinx Search


orban
12-02-2006, 08:15 AM
You realize that there is a difference between sorting in 0.9.6 and 0.9.7-RC1? They are NOT compatible.

mute
12-02-2006, 10:25 PM
Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

You aren't alone. We too seem to get out of order results when searching, and not sorting by relevance.

mute
12-05-2006, 12:46 AM
Is there any hope of Sphinx handling "Find all posts by user" searches?

adalren
12-05-2006, 05:31 AM
I found a bug. When searching in title only, it doesn't honor datecut.

To fix, change:

// get posts from before $datecut
case 'before':
$cl->SetFilterRange('dateline', 0, $datecut);
break;

// get posts from after $datecut
default:
$cl->SetFilterRange('dateline', $datecut, 0xFFFFFFFF);

to:

// get posts from before $datecut
case 'before':
$cl->SetFilterRange($sphinx_sort_by_date, 0, $datecut);
break;

// get posts from after $datecut
default:
$cl->SetFilterRange($sphinx_sort_by_date, $datecut, 0xFFFFFFFF);
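A minimal sketch of the same idea, assuming the fix works because title-only searches run against the thread index, whose date attribute is `lastpost` rather than `dateline` (see the sphinx.conf posted later in this thread). The helper name `date_filter_range` and the `$beforeafter` variable are hypothetical and are not part of sphinx.php; only `SetFilterRange()` and `$sphinx_sort_by_date` come from the code above.

```php
<?php
// Hypothetical helper (not part of sphinx.php): map the "before/after"
// choice and the $datecut timestamp to a [min, max] pair for SetFilterRange().
function date_filter_range($beforeafter, $datecut)
{
    if ($beforeafter === 'before') {
        // posts from before $datecut
        return array(0, $datecut);
    }
    // default: posts from after $datecut, up to the largest 32-bit timestamp
    return array($datecut, 0xFFFFFFFF);
}

// Assumed usage, with $sphinx_sort_by_date holding the date attribute
// of whichever index is being searched:
// list($min, $max) = date_filter_range($beforeafter, $datecut);
// $cl->SetFilterRange($sphinx_sort_by_date, $min, $max);
```

Keeping the attribute name in one variable means the range filter and the sort order cannot drift apart between the post and thread indexes.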


Another thing I found is that vB caches search results and tries to reuse the closest or an exact match. Exact matches are no problem, but once it has saved results sorted in ascending date order, a subsequent search in reverse order shows only old threads. To fix this, just comment out the $highScore = 1 and $highScore = 2 lines in search.php. This disables using the stale cache for non-exact matches.

Thanks for the great hack orban!

orban
12-15-2006, 02:42 PM
http://www.sphinxsearch.com/index.html

RC2 released, I'll upgrade tomorrow and see if there are any changes for us.

-----------

Recreated the index and copied over the new sphinxapi.php; it seems to work okay. There's a new "extended" search mode, but I don't think I'll use it (too complicated for users anyway).

mute
12-19-2006, 05:56 PM
Hm. So all was going well, but out of the blue our subforum searches stopped working. If I don't specify a subforum, the searches work. If I do, I get an assertion failure in sphinxapi.php at line 290 (with 0.9.7-rc2).

I ran into the problem on -rc1, and decided to upgrade to see if it had been fixed, but it has not. I'm a tad stumped.

Edit: I seem to have fixed it by adding a "$value = intval($value);" before the assert() in sphinxapi.php. Guess this is related to the assertion failures earlier, so much for not having to cast variables :)

orban
12-19-2006, 06:07 PM
Tried to do an intval() on the forumid?

amcd
12-19-2006, 06:21 PM
@mute, good that you solved the problem, but i would not edit sphinxapi.php

since more people are facing the same problem, let me post an easy to follow solution. this is what i did to solve the problem on my forums, after reading the conversation between alanjay and orban

in includes/sphinx.php, around line 44, change

from:
if ( count ( $sphinx_forumids ) > 0 ) $cl->SetFilter ( 'forumid', $sphinx_forumids );
if ( count ( $sphinx_userids ) > 0 ) $cl->SetFilter ( 'postuserid', $sphinx_userids );

to:
if ( count ( $sphinx_forumids ) > 0 ) $cl->SetFilter ( 'forumid', intvalarray($sphinx_forumids) );
if ( count ( $sphinx_userids ) > 0 ) $cl->SetFilter ( 'postuserid', intvalarray($sphinx_userids) );


around line 69, change

from:
if ( count ( $sphinx_forumids ) > 0 ) $cl->SetFilter ( 'forumid', $sphinx_forumids );
if ( count ( $sphinx_threadids ) > 0 ) $cl->SetFilter ( 'threadid', $sphinx_threadids );
if ( count ( $sphinx_userids ) > 0 ) $cl->SetFilter ( 'userid', $sphinx_userids );
if ( count ( $sphinx_postuserids ) > 0 ) $cl->SetFilter ( 'postuserid', $sphinx_postuserids );

to:
if ( count ( $sphinx_forumids ) > 0 ) $cl->SetFilter ( 'forumid', intvalarray($sphinx_forumids) );
if ( count ( $sphinx_threadids ) > 0 ) $cl->SetFilter ( 'threadid', intvalarray($sphinx_threadids) );
if ( count ( $sphinx_userids ) > 0 ) $cl->SetFilter ( 'userid', intvalarray($sphinx_userids) );
if ( count ( $sphinx_postuserids ) > 0 ) $cl->SetFilter ( 'postuserid', intvalarray($sphinx_postuserids) );


and lastly, around line 157, right at the end of the file, change

from:
unset($cl, $res, $doc, $docinfo);
?>

to:
unset($cl, $res, $doc, $docinfo);

// these functions added by me on 16 nov 06 to avoid assert failed errors from sphinxapi.php
function intvalarray($thearray)
{
    array_walk($thearray, "intvalitem");
    return $thearray;
}

function intvalitem(&$item, $key)
{
    $item = intval($item);
}

?>
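For comparison, a shorter variant of the same cast, assuming a PHP version with `array_map()` available: passing `'intval'` as the callback converts every element and returns a new array, so no separate per-item function is needed. The name `intval_list` is hypothetical; amcd's `intvalarray()` above does the same job.

```php
<?php
// Hypothetical one-function alternative to intvalarray()/intvalitem():
// cast every ID in the list to an integer and return the converted copy.
function intval_list(array $ids)
{
    // array_map() with a single input array keeps the original keys
    return array_map('intval', $ids);
}

// Assumed usage in includes/sphinx.php:
// $cl->SetFilter('forumid', intval_list($sphinx_forumids));
```

Either version keeps the cast out of sphinxapi.php, so the API file can be replaced on upgrade without re-applying the change.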

mute
12-19-2006, 06:41 PM
Thanks amcd, that is indeed a better solution. I've made the changes and all appears to be working well :)

ubuntu-geek
01-02-2007, 02:21 PM
Just out of curiosity.. Are people using the standard vb search or full text in conjunction with their sphinx implementations?

orban
01-02-2007, 02:35 PM
full text so vB doesn't populate its search tables (at least me, I just noticed that this is actually not mentioned in the guide)

ubuntu-geek
01-02-2007, 02:40 PM
great thanks.. :)

Orban have you updated to sphinx rc2 yet?

amcd
01-03-2007, 01:43 AM
orban, i think you should clearly mention this in the tutorial. if vb is not switched to fulltext, there will be hardly any benefit from sphinx. also, the fulltext indices should be dropped, otherwise mysql will keep updating them and waste time.

mute
01-03-2007, 07:28 PM
So, is there any hope of having Sphinx handle searching for all of a user's posts? I forget what we determined earlier in the thread. Despite Sphinx being fast, I'm still seeing slowdowns related to the "find all posts by user" searches :(

orban
01-03-2007, 09:32 PM
I don't think Shodan (from sphinx) has implemented keyword-less queries yet, but he plans to afaik.

kmike
01-04-2007, 06:58 AM
You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345.
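A rough sketch of how such a marker keyword could be built, assuming the indexed text is prepared in PHP (it could just as well be concatenated in the `sql_query`). The function names are hypothetical, and whether the indexer keeps `_userid_12345` as a single token depends on the index's charset settings, which this sketch does not verify.

```php
<?php
// Hypothetical helper: build the fake per-member keyword kmike describes.
function user_keyword($userid)
{
    return '_userid_' . intval($userid);
}

// Hypothetical helper: append the marker to the text that gets indexed,
// so every post by this member matches a search for the marker.
function with_user_keyword($pagetext, $userid)
{
    return $pagetext . ' ' . user_keyword($userid);
}

// Assumed usage at search time for "find all posts by user":
// $res = $cl->Query(user_keyword($userid), $index_name);
```

The marker has to be something a normal search would never contain, otherwise ordinary keyword searches could match it by accident.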

ubuntu-geek
01-16-2007, 12:51 PM
I've noticed a few people talking about the sorting of searches being off. I am having the same issue, has anyone found a fix for it yet?

mute
01-16-2007, 01:32 PM
Not I. I can't say i've looked into it, but a magical fix would be awesome.

DigitalCrowd
01-18-2007, 06:50 PM
I think I figured out the problem, but not sure I know how to fix it at this very moment.

The sphinx.conf file distributed in this thread uses "dateline" as the date column for the post index and "lastpost" for the thread index. I noticed the output from Sphinx is in proper order by dateline, but because the two indexes are not given the same date source, threads end up out of order in the search results. I assume this happens when the posts are grouped into threads: the output displays the last post date of the thread and not the post date of the post that your search matched.

While you could fetch the lastpost date of a thread that is associated with the post and this way you use one date column unique across the indexes, my assumption is that unless you rebuild the full index (not deltas) constantly, that your searches will still be messed up.

We already know that when we say we want 1000 results and 7000 documents match, that we may only have 853 results, once all posts are grouped under specific threads.

I will have to think about this one.

Wait, hold on, things are coming too quickly...

Example:

You do a search on "Trees", and it finds 10 posts with the word "Trees" in it. For the sake of this discussion, "Trees" is only in one post in each thread. The results come back and order in DESC order, all the post's dateline.

Well, this is great, except, additional posts to those threads may have happened, and as such the search results are all out of order, because the search returns the dateline order, not the lastpost order of the associated thread.

This ALSO explains why when each of us first installed this and indexed our boards, that everything worked perfectly, because it was a brand new index. But, once you start building onto that index that is when things go astray.

I believe this is what is causing it all, but I might be missing something.

amcd
01-18-2007, 07:13 PM
DigitalCrowd, I think you have hit the nail right on the head. I rebuild my full index every day, and my results are just slightly out of order. And the threads which are out of order are the ones which have been updated today, after the last rebuild.

So, how do we fix this problem? Once the search results have been received from sphinx, we then re-sort them by lastpost if the user has requested 'show results as threads'?

DigitalCrowd
01-18-2007, 07:37 PM
Well, then you get into having to fetch the current lastpost field of all matching threads and that would be, on larger boards, significant overhead. Now you move away from just searching Sphinx, to now search the database as well, then sorting your resulting arrays and pretty soon... you have a SLOW search again.

After further testing..

I rebuilt my index, search results in order. I did a reply to a post about the third the way down, not using the word "the" (my search word) in it. Now, when I do a search for "the", the third post down is out of order.

The ONLY way for a thread to get bumped up, is if the search word has been used again at a later time in that thread.

The Best way to do search, IMHO is to give more weight to recent threads, but get away from sort by date search results. Even Google doesn't offer this, but we are so accustomed to it in the forum world that getting people to break from it is hard. The best way would be to optimize how Sphinx (or the code that makes the call to Sphinx) weighs results.

I did notice on the Sphinx Forums that the sort mode "SPH_SORT_TIME_SEGMENTS" was made to address this and that if it doesn't work to our liking, it can be modified to perform better. I will have to look into it.

Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so and for some people, that index process might last that long or longer.

kontrabass
01-25-2007, 08:35 PM
Boy, if ever a hack was ripe for a commercial opportunity... ;) A no-fuss, easy-to-install sphinx search for VBulletin with full search functionality and smart results ordering - how many big boarders would pay big money for this? (I would... it'd be a lot cheaper than buying another new server... :cool: ). I certainly HOPE this free development continues, but I look forward to the time when bugs like this aren't an issue.

stinger2
01-29-2007, 06:26 PM
bookmarking this

Nerudo
01-29-2007, 07:32 PM
It's amazing. I'm bookmarking too.

jason|xoxide
02-14-2007, 05:34 PM
I wouldn't classify this as a bug, just more of an oversight. The max_matches variable in the sphinx.conf file is ignored when using the PHP API script. It doesn't matter if it's left at the default 1000, the 1500 that orban's file has been modified to use, or 1000000 as the config file says not to do. If you want to change the number of results returned then you need to change line 15 of 'includes/sphinx.php'.

Old:
$cl->SetLimits ( 0, $vbulletin->options['maxresults'] );

New (replace '2500' with the number you want):
$cl->SetLimits ( 0, $vbulletin->options['maxresults'], 2500 );

Otherwise, great work! This has really sped up searching on the forums I have used for testing (6K posts, 820K posts, and 3.2M posts).

eoc_Jason
02-19-2007, 06:22 PM
Just installed on a forum with ~5.4 million posts... Before some searches would literally take forever (I had a query kill script that would kill the thread after a minute), now searches come back in less than a second! :)

Slow searches are a killer on a forum since it locks the post table. Also users seem to have a habit of clicking the search button multiple times if results aren't returned within a few seconds.

One thing, search results I've noticed don't always come back sorted by date properly. I skimmed over a few posts talking about this, I guess I need to go back and read it more in-depth. I'm sure the fix wouldn't be too hard.

Here's the data from the initial build, even with such a large index file it is still super fast.

indexing index 'post_index'...
collected 5562411 docs, 1881.1 MB
sorted 189.9 Mhits, 100.0% done
total 5562411 docs, 1881073642 bytes
total 523.946 sec, 3590205.25 bytes/sec, 10616.38 docs/sec
indexing index 'post_index_delta'...
collected 75 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 75 docs, 18407 bytes
total 0.294 sec, 62659.52 bytes/sec, 255.31 docs/sec
indexing index 'thread_index'...
collected 357824 docs, 10.6 MB
sorted 1.2 Mhits, 100.0% done
total 357824 docs, 10635814 bytes
total 17.875 sec, 595012.62 bytes/sec, 20018.19 docs/sec
indexing index 'thread_index_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
skipping index 'fulltext_post_index' (distributed indexes can not be directly indexed)...
skipping index 'fulltext_thread_index' (distributed indexes can not be directly indexed)...

kontrabass
02-19-2007, 08:58 PM
Thanks for the report! Mind if I ask a couple questions (gathering info to see where I would stand): Are you running a slave DB server for searches? What kind of hardware is behind your database? Thanks :)

Mickie D
02-19-2007, 09:39 PM
this is fantastic well done all involved :)

im having a small issue i need some help with

1) if i issue the search command from ssh it gives me a lot of results, but if i do the same search in the forums it gives me 1 or 2 results, and i know there are a lot more

2) i have a few 3 letter words that i like included in the vboptions. how do i enable them in sphinx without enabling all 3 letter words ?

any ideas ?

cheers

mute
02-20-2007, 04:58 PM
Digital,

Have you managed to find a fix for this short of doing a full reindex? We're still just doing incremental updates, but I am still pretty annoyed about the "out of order" results.

kmike
02-20-2007, 05:49 PM
DigitalCrowd wrote: "Well, then you get into having to fetch the current lastpost field of all matching threads and that would be, on larger boards, significant overhead. Now you move away from just searching Sphinx, to now search the database as well, then sorting your resulting arrays and pretty soon... you have a SLOW search again. [...] Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so and for some people, that index process might last that long or longer."

Actually there is a function in vB which does exactly that (sorts the results by the specified field):
sort_search_items() in includes/functions_search.php
It's just one query, and it runs on a slave server if it's set up. Also its overhead depends only on the number of returned search results.
And you don't even need to run through it on every search, only when listing the results as threads, sorted by lastpost.
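To illustrate the re-sort step in isolation, here is a minimal sketch. It is not vB's `sort_search_items()`; the function name `resort_by_lastpost` and the shape of the `$lastpost` map are assumptions. It takes the thread IDs in the order Sphinx returned them, plus a map of threadid to current lastpost timestamp (which would come from a single database query), and reorders the IDs.

```php
<?php
// Hypothetical helper: reorder thread IDs by their current lastpost value.
// $threadids: IDs in the order returned by the search.
// $lastpost:  map of threadid => lastpost timestamp fetched from the database.
function resort_by_lastpost(array $threadids, array $lastpost, $descending = true)
{
    usort($threadids, function ($a, $b) use ($lastpost, $descending) {
        // threads missing from the map sort as if never posted to
        $ta = isset($lastpost[$a]) ? $lastpost[$a] : 0;
        $tb = isset($lastpost[$b]) ? $lastpost[$b] : 0;
        if ($ta == $tb) {
            return 0;
        }
        $cmp = ($ta < $tb) ? -1 : 1;
        return $descending ? -$cmp : $cmp;
    });
    return $threadids;
}
```

The cost grows with the number of results returned, not with the size of the board, which matches kmike's point about overhead.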

Mickie D
02-21-2007, 08:58 PM
anyone know why it's not getting the full amount of results ?

if i search from the command line it gets hundreds, and if i search from the forums it finds 1 or 2 things, and i know there are more

raywjohnson
03-01-2007, 08:38 PM
if you're interested in an implementation
https://vborg.vbsupport.ru/showpost.php?p=1104866

Thank you for the great instructions. A few hours work ( I am very new to vBulletin ), plenty of Google searching, and another successful installation of Sphinx! Forum "write-lock" issues have disappeared. Thanks again!

Forgot to ask, is there any kind of setting to block indexing before a certain date? One of my admins informed me that he cannot find any post prior to 2005.

Later, RayJ

raywjohnson
03-04-2007, 01:58 AM
Little help!

Most likely I am just misunderstanding the results.
Here are the results from a command line search: (displaying matches: snipped)

[root@MYSERVER ~/]# search test

Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

index 'mypostidx': query 'test': returned 1000 matches of 191296 total in 0.029 sec

words:
1. 'test': 191296 documents, 475585 hits
index 'mypostidxdelta': query 'test': returned 56 matches of 56 total in 0.000 sec

words:
1. 'test': 56 documents, 134 hits
index 'mythreadidx': query 'test': returned 1000 matches of 2847 total in 0.154 sec

words:
1. 'test': 2847 documents, 2879 hits
index 'mythreadidxdelta': query 'test': returned 0 matches of 0 total in 0.000 sec

But, searching in the forum: (show posts/search entire post)
Search: Key Word(s): test Showing results 1 to 40 of 392

(I created a huge test forum and it has many more posts with the word "test" than 392)

And using test.php
php test.php test
Query failed: searchd error: index 'mythreadidx': incompatible schemas: non-virtual attributes count mismatch: 4 in schema '/var/sphinx/mythreadidx', 5 in schema '/var/sphinx/mypostidx'.

On the forum, the search never returns any more than 400 results (i.e. Showing: 40 of 400). I cannot find a setting that cuts off the results at 400.

I have read through this thread (twice!) and made the suggested changes to sphinx.php [$cl->SetLimits()] and sphinx.conf (max_matches) with no change to the results.
(I also searched the "Common Forum" at sphinxsearch.com, no luck!)

Any insight into this would be most appreciated!

Later, RayJ

orban
03-04-2007, 10:20 AM
Have you restarted searchd?

UK Jimbo
03-04-2007, 12:37 PM
There were some problems with the assert function around post #280.

My solution to these was to turn assert warnings off using the single line of code:

assert_options(ASSERT_ACTIVE, false);

You can do this in the api file or in sphinx.php

I've just re-indexed with the post table at a min word length of 3. As you can see from the command line, the process was niced at 20 and there were 400 active users on the site. I'm hugely impressed by this implementation.

sphinx@new [/usr/local/etc]# nice -n 20 indexer --config /usr/local/etc/sphinx.conf --rotate --all
Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
indexing index 'post'...
collected 4180518 docs, 1218.2 MB
sorted 165.8 Mhits, 100.0% done
total 4180518 docs, 1218224054 bytes
total 632.617 sec, 1925689.34 bytes/sec, 6608.29 docs/sec
indexing index 'post_delta'...
collected 22 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 22 docs, 6315 bytes
total 0.012 sec, 545974.95 bytes/sec, 1902.05 docs/sec
indexing index 'thread'...
collected 253498 docs, 7.0 MB
sorted 0.8 Mhits, 100.0% done
total 253498 docs, 6961147 bytes
total 4.818 sec, 1444710.55 bytes/sec, 52610.76 docs/sec
indexing index 'thread_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.014 sec, 0.00 bytes/sec, 0.00 docs/sec
skipping index 'fulltext_post' (distributed indexes can not be directly indexed)...
skipping index 'fulltext_thread' (distributed indexes can not be directly indexed)...
rotating indices: succesfully sent SIGHUP to searchd (pid=8972).

UK Jimbo
03-04-2007, 10:18 PM
Another post from me (might get auto-merged)...

I wanted to easily see the query.log that searchd creates, I always think it's good to try to give something back to a project that you like too :)


Place the attached file sphinx_search_log.php in your admincp directory, and edit the reference to query.log if necessary
Place the attached file cpnav_sphinx_search_log.xml in your includes/xml directory


It's really that easy.

Now look in your AdminCP menu system under Statistics & Logs for Sphinx Search Log :)

raywjohnson
03-05-2007, 08:45 PM
Thanks for the help!

After I read your post I restarted searchd and performed the command line and forum searches again, but the results were the same. I indexed --all as well and tried again, no joy!

Do you know of any settings in vBulletin that would limit the search results?

Later, RayJ

eoc_Jason
03-08-2007, 05:15 PM
kontrabass wrote: "Are you running a slave DB server for searches? What kind of hardware is behind your database?"

No, just one single server. Dual Woodcrest (4 cores total) & 4GB RAM. A slave server was an initial consideration, but even with a slave server the mysql fulltext searches would not really have run any faster. It would only have alleviated the locking on the master db.

Depending on what people searched for, queries could have taken several minutes (and we all know nobody waits that long for a web page to load). Queries like that would cause the post table to be locked, and thus anyone trying to post would also have been sitting waiting until the search completed. Usually people got impatient too and would click the search button several times, only queuing up the searches even more.

Until I installed this sphinx search mod, the only course of action was to have a custom script that would kill any search queries that took over 60 seconds (to prevent the issues above).

Yes everything on the server was extremely optimized, and I even had to set the mysql fulltext min characters to 5, and max to like 12-15 I think it was.

Basically the only two things out there are the sphinx & mnogosearch mods for vB. I chose this one because it was the most transparent. Now searches are usually done in way under 1 second, and even though the results could sometimes be out of (date) order, it still works a million times better than before.

I plan on installing this on my own forum too, as searches are starting to cause issues.

Searching is one of the last weak points of vBulletin and really needs to be addressed.

raywjohnson wrote: "Do you know of any settings in vBulletin that would limit the search results?"

Did you increase your max results in both the sphinx and the other php api file? Also I think vBulletin might have some limits in the control panel. I don't know which all are used for searching as I haven't had time to really look through all the underlying code with sphinx. I just got it up and running and have been letting it do its own thing.

vB also does a lot of weighting and will toss out low results (irritating as it can produce no results even when there are). I do not know if this is still used with sphinx though, if it is then that probably explains your issue.

raywjohnson
03-09-2007, 01:31 AM
I did make the changes to both files, still no change. But... your post prompted me to look deeper into a possible limit imposed by vBulletin.

Eureka! :D vBulletin Control Panel -> vBulletin Options -> Message Searching Options -> Maximum Search Results to Return (was set to 400 now set to 9000) Worked perfectly! :D

Thank you! And thanks to all who worked on helping get Sphinx Search working on vBulletin! Extra thanks to orban!

Later, RayJ

eoc_Jason
03-09-2007, 04:19 PM
Glad to hear you found it, I guess I should go back and check what I have it set to also.

Ah, post #301 was what I was referring to before. Anyhow, glad you figured out what it was. I guess all three of those settings need to be the same for the most optimal results.
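As a back-of-the-envelope check, assuming each of the three settings acts as an independent upper bound (jason|xoxide reported earlier that the sphinx.conf value seemed to be ignored through the PHP API, so the third bound may not always apply), the number of results a user can see is capped by the smallest of them. The function name below is hypothetical.

```php
<?php
// Hypothetical helper: the visible result count cannot exceed the
// smallest of the limits, assuming all three are enforced.
function effective_result_cap($vb_maxresults, $api_max_matches, $conf_max_matches)
{
    return min($vb_maxresults, $api_max_matches, $conf_max_matches);
}

// RayJ's case: vB "Maximum Search Results to Return" was 400, while the
// SetLimits() third argument and max_matches in sphinx.conf were higher.
// effective_result_cap(400, 2500, 1500) gives 400.
```

Raising only one or two of the limits therefore changes nothing until the lowest one is raised as well.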

This is a true must-have for any large forum. The mysql fulltext index search gets painfully slow after you exceed a certain number of posts, and the other vB search feature never worked all that well for me.

This would really be the next big thing I would like to see vB integrate into new versions. They already support things like other datastore caches, why not other search engines?

Anyhow, I'm about to tackle another install, this time on my forum. Should go smoother than the first time now that I know all the ins & outs. The biggest thing is just making sure you rename everything properly in the config files.

Mb81
03-11-2007, 02:08 PM
I have some large forums and I am trying to use Sphinx now.

Problem 1.)

using config file '/usr/local/sphinx/etc/sphinx.conf'...
WARNING: index 'vbpost': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbpostindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindexdelta': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpost' - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpostindex' - NOT SERVING
WARNING: index 'vbfulltext': no valid local/remote indexes in distributed index - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindex' - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindexdelta' - NOT SERVING
WARNING: index 'vbfulltextthread': no valid local/remote indexes in distributed index - NOT SERVING

The indexes were built, but I still get this. What should I do?

Problem 2.)
Can it be used for multiple forums on the same server ?

eoc_Jason
03-12-2007, 05:27 PM
You probably have a configuration error, I would double-check the conf file and compare to the one in the post. One little mistake and the whole thing breaks (took me forever to find the one line I missed).

I don't see why you couldn't use it for multiple forums, just create more things in the config file with different names connecting to the different databases.

Mb81
03-12-2007, 06:09 PM
Here it is. I don't see any mistake.
It would be really nice if someone could confirm it. Thanks a lot.


#
# sphinx configuration file sample
#

#############################################################################
## data source definition
#############################################################################

source src1
{
type = mysql
strip_html = 0
sql_host = localhost
sql_user = root
sql_pass = xx
sql_db = xx
sql_port = 3306


sql_query_pre = REPLACE INTO sphinx.sphinx_counter SELECT 1, MAX(postid) FROM post
sql_query_range = SELECT MIN(postid), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid <= (SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1);

sql_group_column = forumid
sql_group_column = threadid
sql_group_column = userid
sql_group_column = postuserid
sql_date_column = dateline

sql_query_post =
}

source src2 : src1
{

sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1 ), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid > ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1 );
}


source src3
{
type = mysql
strip_html = 0
sql_host = localhost
sql_user = root
sql_pass = xx
sql_db = xx
sql_port = 3306


sql_query_pre = REPLACE INTO sphinx.sphinx_counter SELECT 2, MAX(threadid) FROM thread
sql_query_range = SELECT MIN(threadid), MAX(threadid) FROM thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid <= ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 );

sql_group_column = forumid
sql_group_column = postuserid
sql_group_column = firstpostid
sql_date_column = lastpost

sql_query_post =
}

source src4 : src3
{
sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 ), MAX(threadid) FROM thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid > ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 );
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
# all indexing-time options (such as morphology and charsets) belong to the index
index vbpost
{
source = src1
path = /var/sphinx/vbpost
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbpostindex
{
source = src2
path = /var/sphinx/vbpostindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindex
{
source = src3
path = /var/sphinx/vbthreadindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindexdelta
{
source = src4
path = /var/sphinx/vbthreadindexdelta
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbfulltext
{
type = distributed
local = vbpost
local = vbpostindex
}

index vbfulltextthread
{
type = distributed
local = vbthreadindex
local = vbthreadindexdelta
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
# memory limit
# can be specified in bytes, kilobytes (mem_limit=1000K) or megabytes (mem_limit=10M)
# will grow if set unacceptably low
# will warn if set too low, hurting the performance
# optional, default is 32M
mem_limit = 64M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
# port on which search daemon will listen
port = 3312

# log file
# searchd run info is logged here
log = /var/log/searchd.log

# query log file
# all the search queries are logged here
query_log = /var/log/query.log

# client read timeout, seconds
read_timeout = 5

# maximum amount of children to fork
# useful to control server load
max_children = 30

# a file which will contain searchd process ID
# used for different external automation scripts
# MUST be present
pid_file = /var/run/searchd.pid

# maximum amount of matches this daemon would retrieve from each index
# and serve to client
#
# this parameter affects per-client memory usage slightly (16 bytes per match)
# and CPU usage in match sorting phase; so blindly raising it to 1 million
# is definitely NOT recommended
#
# default is 1000 (just like with Google)
max_matches = 1500
}

eoc_Jason
03-12-2007, 06:21 PM
Do you have sphinx as a separate DB? I just made it a table within my forum to keep everything consolidated (since the tables don't hold much data anyhow).

I would double check the sphinx table & field names. Like on mine they are 'sph_counter', not 'sphinx_counter', which is an inconsistency within the documentation and supplied example stuff. Other than that, the code looks okay to me.

Mb81
03-12-2007, 06:31 PM
Do you have sphinx as a separate DB? I just made it a table within my forum to keep everything consolidated (since the tables don't hold much data anyhow).

I would double check the sphinx table & field names. Like on mine they are 'sph_counter', not 'sphinx_counter', which is an inconsistency within the documentation and supplied example stuff. Other than that, the code looks okay to me.

I checked it; it all seems fine. I use sphinx.sphinx_counter; that shouldn't make a difference.

mute
03-13-2007, 04:00 AM
Did anyone else upgrade to php 5.2.1 and have their sphinx install break? I haven't had time to look into it yet, but mine fails to return results and I'm getting a:

Query '' retrieved -2114543231 of 1 matches in -2147483.222 sec.

Heh.

Hm. It is definitely something PHP 5.2.1 related. I went back to 5.2.0 and it is working just fine. I guess I'll have to look at the changelog in the morning to see if I can figure out what is wrong.

orban
03-13-2007, 09:04 AM
Yeah I had that, and recreated all my indices and restarted searchd and then it worked ;/ Really weird tho.

mute
03-13-2007, 03:07 PM
Yeah I had that, and recreated all my indices and restarted searchd and then it worked ;/ Really weird tho.

I'm going to try it again, but I had already done that before I downgraded. I just got Andrew to send me a CVS snapshot so I'm going to try that as well, as I've found a segfault in the commandline client that is rather irritating as well.

So I upgraded to the new sphinx CVS snapshot, stopped searchd, nuked all my indexes, rebuilt them all and tried to search with php 5.2.1, and it's still broken. php 5.2.0 works just fine, so there is something going on. I've read the changelogs and nothing really stood out so I'm stumped.

I made sure I upgraded my sphinxapi.php file when I upgraded too, and that didn't do it, so it is either that or something in your sphinx.php that is breaking, but I haven't been able to figure out what just yet.

Is anyone else running 5.2.1?

orban
03-13-2007, 03:50 PM
I'm running 5.2.1 :/

It didn't work but after recreating all indices and restarting searchd it suddenly did. I didn't have to change any other files.

mute
03-13-2007, 04:23 PM
I'm running 5.2.1 :/

It didn't work but after recreating all indices and restarting searchd it suddenly did. I didn't have to change any other files.

Hmm. Boo. I've done that twice already, I wonder what else it could be? I have multiple webservers set up, and with the exact same settings, the 5.2.0 webservers work and the 5.2.1 webserver does not.

Update: I'm working with the Sphinx author on a fix. It's a 64-bit/PHP 5.2.1 + sphinxapi bug.

Mb81
03-14-2007, 01:06 PM
Still waiting for some help here. Thanks.

Here it is. I don't see any mistake.
It would be really nice if someone could confirm it. Thanks a lot.


#
# sphinx configuration file sample
#

#############################################################################
## data source definition
#############################################################################

source src1
{
type = mysql
strip_html = 0
sql_host = localhost
sql_user = root
sql_pass = xx
sql_db = xx
sql_port = 3306


sql_query_pre = REPLACE INTO sphinx.sphinx_counter SELECT 1, MAX(postid) FROM post
sql_query_range = SELECT MIN(postid), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid <= (SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1);

sql_group_column = forumid
sql_group_column = threadid
sql_group_column = userid
sql_group_column = postuserid
sql_date_column = dateline

sql_query_post =
}

source src2 : src1
{

sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1 ), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid > ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 1 );
}


source src3
{
type = mysql
strip_html = 0
sql_host = localhost
sql_user = root
sql_pass = xx
sql_db = xx
sql_port = 3306


sql_query_pre = REPLACE INTO sphinx.sphinx_counter SELECT 2, MAX(threadid) FROM thread
sql_query_range = SELECT MIN(threadid), MAX(threadid) FROM thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid <= ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 );

sql_group_column = forumid
sql_group_column = postuserid
sql_group_column = firstpostid
sql_date_column = lastpost

sql_query_post =
}

source src4 : src3
{
sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 ), MAX(threadid) FROM thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid > ( SELECT max_doc_id FROM sphinx.sphinx_counter WHERE counter_id = 2 );
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
# all indexing-time options (such as morphology and charsets) belong to the index
index vbpost
{
source = src1
path = /var/sphinx/vbpost
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbpostindex
{
source = src2
path = /var/sphinx/vbpostindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindex
{
source = src3
path = /var/sphinx/vbthreadindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindexdelta
{
source = src4
path = /var/sphinx/vbthreadindexdelta
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbfulltext
{
type = distributed
local = vbpost
local = vbpostindex
}

index vbfulltextthread
{
type = distributed
local = vbthreadindex
local = vbthreadindexdelta
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
# memory limit
# can be specified in bytes, kilobytes (mem_limit=1000K) or megabytes (mem_limit=10M)
# will grow if set unacceptably low
# will warn if set too low, hurting the performance
# optional, default is 32M
mem_limit = 64M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
# port on which search daemon will listen
port = 3312

# log file
# searchd run info is logged here
log = /var/log/searchd.log

# query log file
# all the search queries are logged here
query_log = /var/log/query.log

# client read timeout, seconds
read_timeout = 5

# maximum amount of children to fork
# useful to control server load
max_children = 30

# a file which will contain searchd process ID
# used for different external automation scripts
# MUST be present
pid_file = /var/run/searchd.pid

# maximum amount of matches this daemon would retrieve from each index
# and serve to client
#
# this parameter affects per-client memory usage slightly (16 bytes per match)
# and CPU usage in match sorting phase; so blindly raising it to 1 million
# is definitely NOT recommended
#
# default is 1000 (just like with Google)
max_matches = 1500
}

raywjohnson
03-18-2007, 02:55 AM
Still waiting for some help here. Thanks.

Did you find a solution? I did a line by line comparison to my conf and did not find anything obvious.

You XXed out your db name (sql_db=xx); does that match the database in sphinx.sphinx_counter? What I mean is, is your sql_db=sphinx? (They should match, or you can remove the sphinx. part and it should work as well.)

Also, just as a suggestion, I would set up a mysql user other than root (i.e. instead of sql_user = root).

-RayJ

Mb81
03-18-2007, 02:50 PM
Did you find a solution? I did a line by line comparison to my conf and did not find anything obvious.

You XXed out your db name (sql_db=xx); does that match the database in sphinx.sphinx_counter? What I mean is, is your sql_db=sphinx? (They should match, or you can remove the sphinx. part and it should work as well.)

Also, just as a suggestion, I would set up a mysql user other than root (i.e. instead of sql_user = root).

-RayJ

No, the forum is in another database. sphinx is just a database for the counter.
I'll try the user change, but that shouldn't cause any trouble.

raywjohnson
03-18-2007, 09:05 PM
No, the forum is in another database. sphinx is just a database for the counter.
I'll try the user change, but that shouldn't cause any trouble.

Bummer, I was hoping it was something simple. As far as the user name goes, that just seems to be the default security advice that I run into. So I thought I would pass it on.

I put my counter in the same DB as vBulletin (I am lazy!).

My question is: Does your script have a "mysql_select_db()" command to access the "sphinx" counter DB? I am asking partially due to my own ignorance as I have not taken the time to really learn how vBulletin and Sphinx work together and I am not sure if the counter DB is automatically "selected" for you. (again with the lazy!)

-RayJ

amcd
03-18-2007, 09:42 PM
Why do you want to complicate things by keeping the counter in a separate db? What is the issue with keeping it in the same db?

Even if that does not solve the problem, at least we all will know the problem is elsewhere.

Mb81
03-18-2007, 10:24 PM
Here i go with sphinx counter in forum db:

debian:/usr/local/sphinx/bin# ./indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate --all
Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'vbpost'...
collected 6058520 docs, 1721.0 MB
sorted 173.1 Mhits, 100.0% done
total 6058520 docs, 1721024070 bytes
total 1107.264 sec, 1554303.57 bytes/sec, 5471.61 docs/sec
indexing index 'vbpostindex'...
collected 268 docs, 0.1 MB
sorted 0.0 Mhits, 100.0% done
total 268 docs, 67017 bytes
total 0.037 sec, 1802990.42 bytes/sec, 7210.13 docs/sec
indexing index 'vbthreadindex'...
WARNING: zero/NULL attribute 'lastpost' for document_id=159188, fixed up to 1
WARNING: zero/NULL attribute 'lastpost' for document_id=174376, fixed up to 1
WARNING: zero/NULL attribute 'lastpost' for document_id=188952, fixed up to 1
collected 144559 docs, 4.1 MB
sorted 0.4 Mhits, 100.0% done
total 144559 docs, 4113375 bytes
total 2.832 sec, 1452684.27 bytes/sec, 51052.62 docs/sec
indexing index 'vbthreadindexdelta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
WARNING: failed to read pid_file '/var/run/searchd.pid'.
WARNING: indices NOT rotated.
debian:/usr/local/sphinx/bin# ./searchd --config /usr/local/sphinx/etc/sphinx.conf
Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
WARNING: index 'vbpost': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbpostindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindexdelta': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpost' - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpostindex' - NOT SERVING
WARNING: index 'vbfulltext': no valid local/remote indexes in distributed index - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindex' - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindexdelta' - NOT SERVING
WARNING: index 'vbfulltextthread': no valid local/remote indexes in distributed index - NOT SERVING
FATAL: no valid indexes to serve
debian:/usr/local/sphinx/bin#

raywjohnson
03-19-2007, 12:12 AM
Here i go with sphinx counter in forum db:
debian:/usr/local/sphinx/bin# ./indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate --all
....
debian:/usr/local/sphinx/bin# ./searchd --config /usr/local/sphinx/etc/sphinx.conf
....


According to Orban
3. Indexing
Important: When searchd is running, add --rotate, if it's not running, don't add :)

So....

If searchd is already running
./indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate --all

If searchd is NOT already running
./indexer --config /usr/local/sphinx/etc/sphinx.conf --all

I would delete all the indexes that were created under /var/sphinx/ first.

-RayJ
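
The rule above (add --rotate only when searchd is running) can be sketched as a small wrapper. This is a minimal sketch and not part of the hack: the function name rotate_flag is made up for illustration, and the pid file path is the one from the sphinx.conf posted earlier, so adjust it to your install.

```shell
#!/bin/sh
# Hypothetical helper: print "--rotate" only if the pid file names a live process.
rotate_flag() {
    pid_file=$1
    [ -r "$pid_file" ] || return 0          # no pid file: searchd is not running
    pid=$(cat "$pid_file" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*) return 0 ;;            # empty or not a number: treat as not running
    esac
    # kill -0 sends no signal; it only checks that the process exists
    if kill -0 "$pid" 2>/dev/null; then
        echo "--rotate"
    fi
}

# Usage (paths as posted in this thread):
# ./indexer --config /usr/local/sphinx/etc/sphinx.conf $(rotate_flag /var/run/searchd.pid) --all
```

With a stale or missing pid file the function prints nothing, so indexer runs without --rotate and writes the index files directly.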

Mb81
03-19-2007, 01:39 AM
According to Orban
So....

If searchd is already running
./indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate --all

If searchd is NOT already running
./indexer --config /usr/local/sphinx/etc/sphinx.conf --all

I would delete all the indexes that were created under /var/sphinx/ first.

-RayJ

Thanks a lot! That was the problem. Thanks again.

Another question: what are the recommended update settings?

eoc_Jason
03-19-2007, 06:54 PM
Just curious about people's indexer updates: how often do you do them, and how do you manage the delta vs. full?

Right now I have a cron job running every 5 minutes to do the delta files since they only take a few seconds. I do a full rebuild maybe once a week or so (after disabling the delta cronjob so it doesn't try to run at the same time).

I would like to automate the full rebuild, but changing the delta one would require a really ugly cron looking line. Alternatively I could have another script check the time and figure out which job to run and if a job is already running and all that.

Second, how do you guys rotate your searchd & query log files? The only way I have gotten it to do that is to kill searchd completely and restart it. It would be nice if there was a more graceful method.

Oh boy... found a little issue that might affect some people...

*If* you are running MySQL 4.0.x you can't do the sub-selects in the conf file. Not a *huge* deal as the code can be re-written (which I'm doing now). Yes I know I should be running a much newer version of mysql, but I have an app that uses the old timestamp format and I haven't had a chance to re-write it all completely, thus I'm still using the old mysql.

I'll post the modified conf up later once I fix it completely and edit out stuff specific to my forum. ;)

raywjohnson
03-19-2007, 09:08 PM
Thanks a lot! That was the problem. Thanks again.

No problem! Having just gone through all this myself, I am happy to help by passing on what I have learned.

I also like to give credit where it is due, so thank you, Orban (https://vborg.vbsupport.ru/member.php?u=72989)! (https://vborg.vbsupport.ru/showpost.php?p=1104866)

Another question: what are the recommended update settings?

If you are asking about indexing, keep on reading!

Just curious about people's indexer updates: how often do you do them, and how do you manage the delta vs. full?

Right now I have a cron job running every 5 minutes to do the delta files since they only take a few seconds. I do a full rebuild maybe once a week or so (after disabling the delta cronjob so it doesn't try to run at the same time).

I would like to automate the full rebuild, but changing the delta one would require a really ugly cron looking line. Alternatively I could have another script check the time and figure out which job to run and if a job is already running and all that.

Second, how do you guys rotate your searchd & query log files? The only way I have gotten it to do that is to kill searchd completely and restart it. It would be nice if there was a more graceful method.

When to run the indexer seems to be a matter of preference, keeping in mind the usage/size of the database in question. I run two (almost) identical crons, one every 20 min (for the deltas) and one every day (for the full index). The LOCKFILE helps to keep them from stepping on each other.


#!/bin/sh

LOCKFILE=/var/lock/sphinx.cron.lock
INDEXER_CONF=/usr/local/etc/sphinx.conf

# the lockfile is not meant to be perfect, it's just in case the
# two sphinx cron scripts get run close to each other to keep
# them from stepping on each other's toes.

[ -f "$LOCKFILE" ] && exit 0

# remove the lockfile on exit; no "exit 255" here, so the script
# keeps its real exit status
trap 'rm -f "$LOCKFILE"' EXIT

touch "$LOCKFILE"

cd /usr/local/bin/ || exit 1 # NOTE: this should be where indexer is located

# DELTAS_ONLY_LINE (keep this line in the 20-minute script):
./indexer --config "$INDEXER_CONF" --rotate YOUR_POST_INDEX_DELTA YOUR_THREAD_INDEX_DELTA >/dev/null 2>&1

# OR

# FULL_INDEX_LINE (use this line instead in the daily script):
# ./indexer --all --rotate --config "$INDEXER_CONF" >/dev/null 2>&1

exit 0


You could also replace ">/dev/null 2>&1" with "| mail -s "Sphinx Report" YOUR_EMAIL_HERE" to get an email of the output.

I created a folder called: "/etc/cron.20minutes" and then added the line "*/20 * * * * root run-parts /etc/cron.20minutes" to "/etc/crontab"

then put the cron with the DELTAS_ONLY_LINE in that folder and put the cron with FULL_INDEX_LINE in my "/etc/cron.daily" (you could also put it in "/etc/cron.weekly")

As far as log rotation goes, I used the Power of Linux once more! I just created the file:

"/etc/logrotate.d/sphinx"

/var/log/searchd.log /var/log/query.log {
notifempty
missingok
}


NOTE: the two paths/files need to match your sphinx.conf
log = /var/log/searchd.log
query_log = /var/log/query.log

Now they rotate all by themselves!

-RayJ
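
As the comment in the cron script says, the check-then-touch lockfile is not meant to be perfect: two runs that start at almost the same moment can both pass the -f test. If that window matters, one common alternative is mkdir, which creates the lock and fails if it already exists in a single step. A minimal sketch, assuming a hypothetical lock path and made-up function names:

```shell
#!/bin/sh
# Hypothetical lock directory; any path writable by the cron user works.
LOCKDIR=${LOCKDIR:-/tmp/sphinx.cron.lock.d}

acquire_lock() {
    # mkdir either creates the directory or fails; only one caller can succeed
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

run_locked() {
    lockdir=$1
    shift
    acquire_lock "$lockdir" || return 0   # another run holds the lock; skip quietly
    status=0
    "$@" || status=$?                     # run the command, remember its exit status
    release_lock "$lockdir"
    return $status
}

# Usage in the delta cron (index names are placeholders, as in the script above):
# run_locked "$LOCKDIR" ./indexer --config /usr/local/etc/sphinx.conf --rotate YOUR_POST_INDEX_DELTA YOUR_THREAD_INDEX_DELTA
```

If a run is killed before it reaches release_lock, the directory stays behind and has to be removed by hand, the same trade-off as with a stale lockfile.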

eoc_Jason
03-19-2007, 09:26 PM
Thanks for all that info! I'm about to head home for the day but I will definitely re-read it more thoroughly later tonight. :)

eoc_Jason
03-20-2007, 02:05 AM
Well, I made some headway after reading through this entire thread...

First I had the issue with the coventry error, since I don't use it I just commented out the code as a quick fix for now.

Then I started getting the assert error if I searched a different time frame from the default (ex: a week ago and newer). So I added the assert_options as another quick fix.

What was bugging me the most was my dates were not being sorted properly, and upon digging through the code I discovered why...

First, in your sphinx.conf file's "post_src" (whatever you name it - it's the first bracket code), in the sql_query you need to add lastpost to the end of the SELECT. Then below the existing sql_date_column line, add another saying sql_date_column = lastpost. You will also need to add the lastpost to the SELECT in the delta code (next bracket code bit).

Rebuild your indexes completely...

Then in the sphinx.php file you will notice the sort_date defaults to 'dateline', and in the 'titleonly' if/else statement it changes to 'lastpost'. The "else" part of that statement refers to all queries that searches the entire posts - which can return as either posts or threads. So basically within the 'else' code (like after the set weights) just add:

if ($vbulletin->GPC['showposts'] == false)
{
$sphinx_sort_by_date = 'lastpost';
}


So now when results are returned as threads, it sorts them based on the last post date, not the date when the post with the matching word was found.

As promised, I'm attaching my sphinx.conf file that works with MySQL 4.0.x (because that version does not support sub-selects). I also have the above mentioned changes if you don't fully understand what I was trying to explain above. Included too is my stopwords file (it's just the MySQL default list).

Tomorrow I'm going to try and go through and ensure all the current functionality works with all the various search options, then I'm going to slowly try and work my way through to add the features that were removed from the vB template. :)

kmike
03-20-2007, 07:27 AM
The problem with adding the "lastpost" to the attributes is that it will only be accurate immediately after reindexing. On a sufficiently busy forum, with frequently updated threads, the sorting by lastpost may be off by large amount just a few hours after full reindexing.
Though it's something you can live with if the forum isn't very active.

eoc_Jason
03-20-2007, 01:51 PM
Ooo... you are right, I guess that's what I get for the late night coding. Boy, that is a pickle, since the delta only adds new posts & threads there is no way to go back and update the existing data (without a full reindex).

If only there was a way to query the mysql table to grab the latest post/thread attributes based on the sphinx results. I haven't fully examined the code so I don't know what all is and isn't possible yet.

Thanks for pointing out my oversight... ;)

kmike
03-20-2007, 09:34 PM
I outlined the solution to this problem in the post #306:
https://vborg.vbsupport.ru/showpost.php?p=1186878&postcount=306

kmike
03-22-2007, 08:20 AM
Did anyone else upgrade to php 5.2.1 and have their sphinx install break? I haven't had time to look into it yet, but mine fails to return results and I'm getting a:

Query '' retrieved -2114543231 of 1 matches in -2147483.222 sec.

Heh.

Hm. It is definitely something PHP 5.2.1 related. I went back to 5.2.0 and it is working just fine. I guess I'll have to look at the changelog in the morning to see if I can figure out what is wrong.

It's because of a bug in the unpack() function in PHP 5.2.1:
http://bugs.php.net/bug.php?id=40749
First seen here, workaround included:
http://www.sphinxsearch.com/forum/view.html?id=340&from=1887

mute
03-22-2007, 10:09 AM
It's because of a bug in the unpack() function in PHP 5.2.1:
http://bugs.php.net/bug.php?id=40749
First seen here, workaround included:
http://www.sphinxsearch.com/forum/view.html?id=340&from=1887

Yeah, shoban and I figured out what the problem was. He was contacting the PHP guys, who claim it "is not a bug", to yell at them. I hadn't seen the workaround though, thanks!

eoc_Jason
03-23-2007, 05:48 PM
kmike - Yep, that worked perfectly.

The more I look at the sphinx.php code, the more I think overall functionality would be better just to merge the code in with the search.php file, then I think all stock vB functionality could be brought back. I'm going to tackle this on Sunday as I have some other things I need to take care of today and tomorrow I've got plans.

kmike
04-06-2007, 07:53 AM
Sphinx 0.9.7 has been released:
http://www.sphinxsearch.com/doc.html#changelog
It fixes the crashes in the excerpts building routines and also the memory fragmentation problem. A host of new features are added, too.

jason|xoxide
04-06-2007, 04:46 PM
Sphinx 0.9.7 has been released:
http://www.sphinxsearch.com/doc.html#changelog
It fixes the crashes in the excerpts building routines and also the memory fragmentation problem. A host of new features are added, too.

Have you upgraded yet? Any issues? If I do upgrade from RC2, do I need to copy sphinxapi.php to all of my sites again or hasn't it changed?

kmike
04-11-2007, 06:19 PM
Yes, I've upgraded and it works perfectly.
I don't know if there are any protocol changes in 0.9.7 release vs RC2, but I guess it's better to update sphinxapi.php just in case.

orban
04-11-2007, 06:22 PM
Yeah there's quite some changes to sphinxapi.php so I'd advise you to replace yours with the new one.

ALanJay
04-11-2007, 06:38 PM
Well good to hear the new version works fine and there are no more changes. I really need to upgrade from the version we have been running since last year. I think we will probably wait and move with the next upgrade to Vb.

By the way has anyone had any problems compiling the latest release - I will be asking over on the sphinxsearch forums but thought I would ask here.

I did a quick configure and make on a new machine running FreeBSD 6.2 and it didn't like it at all.

ubuntu-geek
04-12-2007, 03:26 PM
Upgraded to sphinx 0.9.7 no issues yet.. Just curious has anyone found a way to fix the search ordering yet?

ALanJay
04-29-2007, 08:41 AM
As we are planning to move to vBulletin 3.6.5 (or whatever the latest version is), I was wondering which is the latest set of instructions for modifying vBulletin's search.php.

Having tried the first example I found, hoping that I could just use the diff file, I seem to get errors on my test server. The latest release of sphinx works fine and I can get results using test.php (though I did have fun installing on FreeBSD due to the 64bit error floating around, but fortunately the updated ports version worked).

Anyway whenever I try to use the modified version of search.php I don't get any results and get errors on the page.

Any ideas on what might be going wrong?

I have checked and as far as I can tell I am looking in the correct sphinxsearch database (on the local machine).

ALanJay
04-30-2007, 01:10 PM
Having researched this further, I have found that although searchd and test.php work fine for my internal database (DS), when I try to use test.php against the hybrid index I get:

Query failed: failed to read searchd response (status=0, ver=263, len=-2147483487, read=0).

My guess is that this is some kind of configuration issue with the sphinx.conf for the files and index, though it seems to index the file perfectly.

Anyone have any ideas?

orban
04-30-2007, 01:13 PM
Are you running a 64bit operating system?

fastforward
04-30-2007, 01:18 PM
Having researched this further, I have found that although searchd and test.php work fine for my internal database (DS), when I try to use test.php against the hybrid index I get:

Query failed: failed to read searchd response (status=0, ver=263, len=-2147483487, read=0).

My guess is that this is some kind of configuration issue with the sphinx.conf for the files and index, though it seems to index the file perfectly.

Anyone have any ideas?

As Orban says, that sounds like the 64bit issue. I was hoping the latest sphinx update fixed it (the changelog seemed to indicate it had), but I still get that error with php 5.2.1. I had to revert back to 5.2.0.

The latest sphinx and php 5.2.0 works fine with 64bit.

orban
04-30-2007, 02:49 PM
http://www.sphinxsearch.com/forum/view.html?id=340#1900

ALanJay
04-30-2007, 02:56 PM
Thanks - I had a different 64bit issue which I solved and didn't realise there was a second one.

bmanzzz
05-01-2007, 09:42 AM
Can someone please provide a step-by-step guide on how to install and configure Sphinx Search,
and then how to use it with vBulletin?

ALanJay
05-01-2007, 10:03 AM
Hi,

Well, I now have a system that appears to work, but it always gives no results for the vBulletin database, while my own database gives the expected results.

Have there been many changes to sphinx.conf between 0.9.6 and 0.9.7? It is the only thing that I can think of that might be causing the issue, unless anyone has any other ideas.

amcd
05-01-2007, 10:08 AM
Can someone please provide a step-by-step guide on how to install and configure Sphinx Search,
and then how to use it with vBulletin?

https://vborg.vbsupport.ru/showpost.php?p=1104866

This link is given right at the top of the first post of this thread. This is the most comprehensive guide so far. If this is not enough, then you have to read through the whole thread.

ALanJay, I am running sphinx without problems on FreeBSD amd64. If you have any specific questions about versions etc, maybe I can help.

ALanJay
05-01-2007, 10:34 AM
ALanJay, I am running sphinx without problems on FreeBSD amd64. If you have any specific questions about versions etc, maybe I can help.

Thanks amcd,

Well, most of my FreeBSD issues have been solved - compilation and running all seem fine.

I have changed sphinxapi.php to include the suggested:

function unpack31($f,$s)
{
$arr=unpack($f,$s);
foreach($arr as $k=>$v) {
$b = sprintf("%b", $v);
if(strlen($b) == 64){
$arr[$k]=bindec(substr($b, 33));
}
}
return $arr;
}
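
A small sketch of what the masking in unpack31() does, for anyone wondering why it helps. It assumes a 64-bit PHP build; mask31() is a made-up name for illustration and is not part of sphinxapi.php. It feeds in the len value from the error message quoted earlier in the thread (-2147483487):

```php
<?php
// Illustration only: the same masking step unpack31() applies to each unpacked value.
// mask31() is a hypothetical helper, not part of sphinxapi.php.
function mask31($v)
{
    $b = sprintf("%b", $v);             // binary string; 64 chars when the top bit is set
    if (strlen($b) == 64) {
        return bindec(substr($b, 33));  // keep only the lowest 31 bits
    }
    return $v;                          // values that were not sign-extended pass through
}

var_dump(mask31(161));          // unchanged: int(161)
var_dump(mask31(-2147483487));  // on 64-bit PHP: int(161)
```

If that is what happened here, the response length was really 161 bytes and only the sign extension made it look negative. That is a guess from the arithmetic, not something confirmed in the Sphinx source.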


I have made what I think are the required changes to the column settings in sphinx.conf, i.e.:

sql_group_column = forumid
sql_group_column = threadid
sql_group_column = userid
sql_group_column = postuserid
sql_date_column = dateline
sql_query_post =

I think the SQL stuff is unchanged from 0.9.6 to 0.9.7 and still have for the Post Index:



sql_query_pre = REPLACE INTO spy_forum.sph_counter SELECT 1, MAX(postid) FROM post
sql_query_range = SELECT MIN(postid), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid <= ( SELECT max_doc_id FROM spy_forum.sph_counter WHERE counter_id = 1 );


and the delta


sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM spy_forum.sph_counter WHERE counter_id = 1 ), MAX(postid) FROM post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid > ( SELECT max_doc_id FROM spy_forum.sph_counter WHERE counter_id = 1 );


index DSFullTextPostIndex
{
type = distributed
local = DSPostIndex
local = DSPostIndexDELTA
}


it all seems to work ok when I create the indexes from scratch but the test.php and vB search.php always give 0 results.

ie

php ./xx-test-search.php -i DSFullTextPostIndex "digital tv"
Query 'digital tv ' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'digital' found 0 times in 0 documents
'tv' found 0 times in 0 documents

yet when I use my internal database it all works fine:

php ./xx-test-search.php -i DSramsIndex "digital tv"
Query 'digital tv ' retrieved 1000 of 1729 matches in 0.005 sec.
Query stats:
'digital' found 20655 times in 4243 documents
'tv' found 24047 times in 5666 documents
Matches:
1. doc_id=4744, weight=208, date=2002-01-20 22:44:53
2. doc_id=4868, weight=208, date=2002-01-31 20:01:25

This would imply some sort of error in creating the vBulletin index, caused by differences between the 3.0 stream I was previously using and the 3.6 stream of the new test site?

Any thoughts?

amcd
05-01-2007, 11:05 AM
I am using sphinx 0.9.7 rc1. I have not made the edit for unpack31.

Why do you write spy_forum.sph_counter all the time? Is the counter in a different DB? I have the counter in the same DB.

One thing I noticed is that the command line search does not return any results from compound indexes.

search -c /usr/local/etc/sphinx.conf --index 'postmain' something
and
search -c /usr/local/etc/sphinx.conf --index 'postdelta' something
both work.

but
search -c /usr/local/etc/sphinx.conf --index 'post' something
returns zero results.

Maybe you are facing the same issue with the test script.

For what it is worth, I can send you my files if you want.

orban
05-01-2007, 01:44 PM
One thing I noticed is that the command line search does not return any results from compound indexes.

Confirmed...I don't know if that is a bug or a feature.

ALanJay
05-01-2007, 03:13 PM
Well that explains something - having done some more testing I can get results from the test programme now from ThreadIndex but not PostIndex which is very peculiar.

And this follows through on the actual vb search.php, in that "search titles only" seems to work but "search entire posts" does not :(

One of my colleagues was kind enough to compare the actual MySQL queries and the results from the 3.0 and 3.6 databases, and we discovered that there was a difference. It appears that after the upgrade (to 3.6) the index isn't being read correctly, so we thought an explicit inclusion of - USE INDEX (threadid) - might work:

SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, post.title, pagetext, post.dateline \
FROM post \
USE INDEX (threadid) \
INNER JOIN thread AS thread ON(thread.threadid = post.threadid) \
WHERE post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid > ( SELECT max_doc_id FROM spy_forum.sph_counter WHERE counter_id = 1 );


Except that doesn't work when creating the sphinx index but does when using mysql directly.

:(

Rayn21
05-08-2007, 08:16 PM
You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345.

Are further modifications to orban's code required to get this working?

It seems that my modified search.php is still using the VB search engine for some queries. (those without keywords)

orban
05-08-2007, 08:20 PM
Yes mine doesn't support that.

rix
05-10-2007, 07:43 PM
thanks orban for the script, I'm loving it!.

I used the search.php and applied orban's patch, but I only get results when I choose an option like "xx months ago", not "Any Date". The debug output for "Any Date" is the following:


Query '' retrieved 0 of 0 matches in 0.027 sec.
Query stats:
'kereta' found 5277 times in 3731 documents


While the same keyword from "Yesterday" returns

Query '' retrieved 4 of 4 matches in 0.005 sec.
Query stats:
'kereta' found 5277 times in 3731 documents


I'm guessing something wrong with the php but not sure which script.

Neil Lock
05-16-2007, 06:54 AM
Hey All,

A possible solution for getting posts per userid into sphinx was mentioned way back. Has anyone actually implemented this, or does anyone have any other ideas? I'm pretty sure that some of the remaining slow queries on my board relate to this functionality, and I would be curious to see what people are doing.

btw the sphinx engine is running amazingly we are averaging now just under 10000 searches on it a day and its had no real issues!

Cheers

Neil

amcd
05-18-2007, 05:15 AM
ALanJay, did you finally solve the problem? I upgraded php yesterday, and now my search doesn't work. All searches return zero results.

edit: It works now. The unpack31 trick did it.

ekool
05-29-2007, 09:50 AM
I have this working but seem to be running into a strange issue..

When I search for the same word more than once, I get the following:


Search took 0.32 seconds; generated 50 minute(s) ago.

So, it appears that once I search for a particular keyword, it never "renews" the search and instead always shows the old results that were generated a long time ago?

rix
05-29-2007, 09:53 AM
that's because u set the search to share result.

Options->Message Searching Option->Search Result Sharing

ekool
05-29-2007, 09:23 PM
that's because u set the search to share result.

Options->Message Searching Option->Search Result Sharing

Yup, you're right. Thanks.

erm yeah, it's $Coventry. We don't use it, so that is probably why. For now i just set error_reporting to 0 and it went away :)

I do not see an error_reporting in the php files? How did you disable it?

amcd
05-30-2007, 02:58 AM
I do not see an error_reporting in the php files? How did you disable it?

its in php.ini

most vbulletin scripts also set it in the first few lines like this:
// ####################### SET PHP ENVIRONMENT ###########################
error_reporting(E_ALL & ~E_NOTICE);

TECK
06-10-2007, 04:13 PM
orban, I have a question. Did you patch and compile MySQL 5.x as described in the readme file?
http://www.sphinxsearch.com/doc.html#sphinxse

I'm about to build an RPM for MySQL with SphinxSE. I think it would be good to test in this direction as well; it should increase performance since everything is built directly into MySQL.
Let me know what you think. Thanks.

orban
06-10-2007, 04:15 PM
No, I'm not using SphinxSE. Afraid. :(

mute
06-14-2007, 04:58 AM
Has anyone figured out a fix for the "out of order" results issue yet?

kmike
06-14-2007, 09:26 AM
Has anyone figured out a fix for the "out of order" results issue yet?

The fix is to use sort_search_items() function where appropriate (https://vborg.vbsupport.ru/showpost.php?p=1186878&postcount=306).

DaiTengu
06-14-2007, 10:07 AM
The fix is to use sort_search_items() function where appropriate (https://vborg.vbsupport.ru/showpost.php?p=1186878&postcount=306).

You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking :)

mute
06-14-2007, 02:39 PM
You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking :)

Hehe, I was hoping for something a bit more cut and paste as well. I was hoping to not have to familiarize myself with the search code since I managed to avoid it the first time around ;)

It's been one of those things we get a user complaining about every so often that I've intended on fixing at some point but just haven't gotten to, and still don't really have the time to, but keeps getting brought up..

Rayn21
06-17-2007, 09:27 PM
I changed the minimum search word length to 3 in sphinx.conf, but searches for 3 letter words still return no results (I rebuilt all the search indices)

Is there something else that will need to be changed to make this work?

fastforward
06-17-2007, 09:30 PM
I changed the minimum search word length to 3 in sphinx.conf, but searches for 3 letter words still return no results (I rebuilt all the search indices)

Is there something else that will need to be changed to make this work?

You also need to change the limit in the admin panel.

eoc_Jason
06-19-2007, 06:17 PM
You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking :)

I did that on my forum. I'll try to find the relevant code and post it here for ya. It was pretty easy IIRC.

mute
06-20-2007, 04:44 PM
I did that on my forum. I'll try to find the relevant code and post it here for ya. It was pretty easy IIRC.

Thanks Jason, that would be appreciated. I have had my hands full with other projects and haven't had time to go digging either.

andrewkhunn
06-25-2007, 02:33 PM
I'd really appreciate that code as well.

On another note, does anyone know if I can use Sphinx to power the similar threads search in vBulletin or will I still need to use the default engine for that. Any pointers here would be much appreciated.

TECK
06-27-2007, 08:03 AM
Orban, the template you have on your site:
http://forums.mtgsalvation.com/search.php

is this one?
http://dragy.de/public/sphinx/sphinx_search_forums.template.txt

If you made any changes, please post them. Thanks.

doopz
06-27-2007, 08:06 PM
Hello!

does anyone have a correct patch for search.php on a vbulletin 3.5?
what to add / replace etc.

orban
07-04-2007, 09:36 PM
See below. Easier.

doopz: You can probably apply the changes I've made down there easily to vB 3.5.


orban
07-05-2007, 10:53 AM
Implementing Sphinx full-text search engine

Based on Sphinx 0.9.7 and vB 3.6.7 PL1. This means all file edits and config files are only tested with those two versions. It doesn't mean you cannot make Sphinx work with your vB 3.5 installation, but it will require manual work on your side.

Known limitations

1. You cannot filter by number of replies.

Possible fix: add another "sql_group_column" holding the number of thread replies. The search will use the numbers from the last thread reindexing, though (depending on your setup, results that are hours to days old).

2. Sorting by title/number of replies/views/thread start date/username/forum isn't possible.

Basically the same issue as in (1.): Sphinx doesn't have the necessary data.

3. You can only use Sphinx to perform queries that have a full text component. So searches by userid/forumid WITHOUT a key word are not possible. These searches can run on indices though, so they shouldn't be an issue.

Workaround by kmike (https://vborg.vbsupport.ru/showthread.php?p=1150437#post1150437). "You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345."

4. Search results are out of order because the time stamps are too old.

Sphinx doesn't query MySQL to get the latest time stamps. So if your thread had its last reply 3 days ago, was indexed by Sphinx 2 days ago and got a new reply today, Sphinx will still assume its last reply was 3 days ago. In the search results, it will be put way back instead of being at the top. There is no easy fix for this, and certainly no fast one, because this is just what makes Sphinx so fast. We're sacrificing a bit of "up-to-date-ness" to gain speed. If you are in desperate need of fixing this, kmike outlined a fix (https://vborg.vbsupport.ru/showpost.php?p=1186878&postcount=306). Basically this sends the results to MySQL and sorts them again, giving you up-to-date results by sacrificing speed. It's up to you to find out if it's worth it.

What Sphinx can do for you

Incredibly fast full text searches on huge amounts of posts

It's really fast, really really really fast. Even on intersections of multiple keywords on several hundred thousand results.
Replace forum search, search in this forum and search in this thread

Mimicking the default forum search for all but a few details
Nearly instant indexing of new posts

Thanks to a special config file setup called "Live Updates (http://www.sphinxsearch.com/doc.html#live-updates)"

Setting up Sphinx
Grab Sphinx here: http://www.sphinxsearch.com/downloads.html and compile it
Read a bit of the documentation (http://www.sphinxsearch.com/doc.html) to get familiar with it; you might want to peek at the installation section
Grab the sphinx.conf.txt at the end of this document (rename it to sphinx.conf). This is my configuration file. You have to, at least, fill in your database info and adjust the paths /.../
You have to create a counter table that holds information about the last indexed post/thread for the Live Updates:

CREATE TABLE sph_counter
(
counter_id INTEGER PRIMARY KEY NOT NULL,
max_doc_id INTEGER NOT NULL
);

You can either place this in the same database as vB or in a different one, but don't forget to adjust sphinx.conf accordingly (prefixing sph_counter with your database name: yourdb.sph_counter).

Running Sphinx
You start Sphinx with "searchd --config /.../sphinx.conf" this will create a new process called "searchd".
Indexing documents is handled by "indexer". You have to make sure you know whether it's running or not before you start an indexing process, this is crucial.
searchd is running: use "indexer --rotate", it will create temporary new files and rotate them in so searching won't be broken
searchd isn't running: use "indexer" without rotating it will just replace your current files
For creating the full indices it is recommended to shut down Sphinx because it might take a while and your server will be quite busy (unless you run sphinx on a slave). Reindexing all posts and threads is done by "indexer --config /.../sphinx.conf --all" or "indexer --config /.../sphinx.conf --rotate --all" if searchd is running.
Creating the delta indices for Live Updates is issued by "indexer --config /.../sphinx.conf --rotate postdelta threaddelta"
You can test your indices with "search", the third executable installed by Sphinx. Call "search" and it will tell you how to use it ;)

Live Updates
You have to figure out a couple of values: how often to re-index the whole thing, how often to re-index all threads, and how often to do Live Updates for postdelta and threaddelta.
"indexer --all": I do this about once per week at a very quiet time, usually manually.
I re-index all threads once per day; we just have 80k so this takes no time.
I recreate the delta indices every five minutes for both posts and threads, so you have to wait between 1 and 5 minutes before your new threads/posts start showing up in search results.
I suggest adding cron jobs for those tasks on *n*x. For other OSes I don't know; can you even run Sphinx on Windows?

Plugging Sphinx into vBulletin
Sadly enough this requires file modification. I'm checking every version to see if they finally added a way to plug in a different search system like WordPress, for example, does, but no luck so far. There are 5 edits required; I listed them in search.php.txt at the end of this document for easier reference and so you can save it for future use. You will be editing "/.../forums/search.php". Don't forget that with every vB upgrade the file will be overwritten and you will have to apply the changes again.
We also need sphinxapi.php, it's from "/.../src/sphinx-0.9.7/api" where your Sphinx source files are. Copy paste it to "/.../forums", where global.php lies.
And last item is sphinx.php which will handle the search. Grab sphinx.txt.php and rename it to sphinx.php and put it into "/.../forums/includes". Open it and adjust the values on top. You can obviously move those files to where you want just don't forget to adjust paths.
Because we cannot offer all search options vB default search can, I removed a couple lines from the "search_forums" template. They are listed in search_forums.txt at the end of this document.

Bugs and Fixes
Problems with $Coventry: Try to kill " AND in_array($docinfo['attrs'][$sphinx_conventry_userid], $Coventry)" from sphinx.php if you are not using Coventry
Problem with asserts in sphinxapi.php, values sent to sphinx not being integer: https://vborg.vbsupport.ru/showpost.php?p=1141750&postcount=284 Fix by amcd, thank you
sphinxapi.php ignoring max matches limit set in sphinx.conf: https://vborg.vbsupport.ru/showpost.php?p=1182576&postcount=301 Thanks jason
Problems with 2 or 3 letter words: Make sure you change the limit in vB AdminCP search options and in sphinx.conf, turn off searchd, then run indexer --all
PHP unpack() bug on 64bit systems: http://www.sphinxsearch.com/forum/view.html?id=340&from=1887 4th post is helpful and fixed the issue for me https://vborg.vbsupport.ru/showpost.php?p=1238827&postcount=360 here's another post by AlanJay outlining the problem, and updating Sphinx
MySQL 4.0 doesn't support Sub-Selects, eoc_Jason created a fix: https://vborg.vbsupport.ru/showpost.php?p=1207786&postcount=338 Thanks
Issues with very old results showing up: Options->Message Searching Option->Search Result Sharing turn it off

Contributions
Viewing Sphinx Log files in AdminCP: https://vborg.vbsupport.ru/showpost.php?p=1195791&postcount=312 Thanks UK_Jimbo.
Bash script to help indexer run more smoothly by creating lock files: https://vborg.vbsupport.ru/showpost.php?p=1207641&postcount=336 Thanks raywjohnson

orban
07-05-2007, 11:03 AM
Can somebody give this a look, I tried to list some limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?

ekool
07-05-2007, 06:22 PM
Orban,

Very nicely put together. I still have my older working Sphinx setup working (thanks to you and many others in here) so I have no need to change anything just yet, but thanks for the wonderful write-up!

PSS
07-07-2007, 07:18 PM
Can somebody give this a look, I tried to list some limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?

Couple of small things:

1. you did

CREATE TABLE sph_counter

but used sphinx_counter in sphinx.conf.

2. Then it would be great to have PREFIX_ where you would place your personal Vb table prefix. I added them there but it is not an easy task for those who do not know mysql syntax.

3. A step by step how to implement sort_search_items() would be nice.

Thanks for EXCELLENT work!

EDIT: Couple of things I would still like to know: when you have Sphinx search in place, do you need to have FULLTEXT index(es) in Vbulletin at all?

Also, is Sphinx used in "new posts" search, too?

TECK
07-09-2007, 01:56 AM
orban, I don't see any reference in vBulletin or Sphinx to 'timesegments':
elseif ($vbulletin->GPC['sortby'] == 'timesegments')
$cl->SetSortMode ( SPH_SORT_TIME_SEGMENTS);

Is there something I miss? Thanks for explaining.
Also, if anyone got kmike's trick (https://vborg.vbsupport.ru/showpost.php?p=1150437&postcount=292) (for username is userid_12345) fixed into their configuration files, could you be kind and post here the actual code?

Thanks for taking the time to write this up.

PSS
07-09-2007, 03:37 PM
Another question: is there a way to check if searchd is running and if not, put text "search is offline" to the search page?

TECK
07-09-2007, 07:20 PM
Very easy.

function get_searchd_status()
{
return file_exists('/var/run/searchd.pid');
}

$searchd = get_searchd_status();
if ($searchd)
{
// searchd enabled;
}

I wrote the check as a function because you can use it in several areas, this way.

Now, back to my question. Can anyone help me with the username setup? I can't think how you can use a variable in the Sphinx conf file... because you cannot. Obviously I'm wrong, kmike did it, but unfortunately he is not available.

orban
07-09-2007, 08:21 PM
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat

The query that grabs the posts, use two concats:

CONCAT( post, ' ', CONCAT( 'userid_', userid ) )

untested, but I hope you get the idea. Then you need to modify search.php to transform a given userid into the string...
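
A rough sketch of that search.php side, also untested against vB itself. Both function names are made up for illustration, and the "userid_" prefix is only an assumption that has to match whatever the indexing query concatenates in:

```php
<?php
// Hypothetical helper: build the fake per-user keyword described above.
// The "userid_" prefix must match the string added by CONCAT() in the indexing query.
function sphinx_userid_keyword($userid)
{
    return 'userid_' . intval($userid);
}

// Hypothetical helper: combine the typed keywords (may be empty) with the fake keyword.
function sphinx_build_query($keywords, $userid)
{
    $parts = array();
    $keywords = trim($keywords);
    if ($keywords !== '') {
        $parts[] = $keywords;
    }
    if ($userid > 0) {
        $parts[] = sphinx_userid_keyword($userid);
    }
    return implode(' ', $parts);
}

echo sphinx_build_query('digital tv', 12345), "\n"; // digital tv userid_12345
echo sphinx_build_query('', 12345), "\n";           // userid_12345
```

With a match mode that requires all words, the extra keyword should restrict results to that user, and a search with no typed keywords still has a full text component to send to Sphinx.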

TECK
07-10-2007, 01:51 AM
Aha, thanks orban. What I want to do is this:
If a user wants to search for all threads/posts related to a specific user, he enters a username then leaves the search field empty. The results will show all threads started by that user, ordered the way you like it in Sphinx.

Anyone wants to work with me on this project? I PM'ed kmike, hoping he will join us... since he is the only one who managed to fix this, not to mention other little extras. :)

You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking :)

/***
* Removes duplicate keys and orders thread id's by lastpost
*
* @param array record id's to be ordered
*
* @return array
*/
function sort_thread_ids($keys)
{
global $vbulletin;

$itemids = array();

$items = $vbulletin->db->query_read_slave("
SELECT threadid FROM " . TABLE_PREFIX . "thread AS thread
WHERE threadid IN (" . implode(',', array_keys($keys)) . ")
ORDER BY lastpost " . $vbulletin->GPC['sortorder'] . "
");

while ($item = $vbulletin->db->fetch_array($items))
{
$itemids[] = $item['threadid'];
}

unset($item);
$vbulletin->db->free_result($items);

return $itemids;
}

Then you call it anywhere you like:
if (!$vbulletin->GPC['showposts'] AND $vbulletin->GPC['sortby'] == 'lastpost')
{
$orderedids = sort_thread_ids($orderedids);
}

Can you post results on your busy boards and let me know how it impacts the performance?
The function above has less processing code than the original sort_search_items() function.

The PHP BBCode at vb.org is screwed, it breaks the code lines. Switched back to Code, much better.

amcd
07-10-2007, 06:12 AM
Very easy.

function get_searchd_status()
{
return file_exists('/var/run/searchd.pid');
}

$searchd = get_searchd_status();
if ($searchd)
{
// searchd enabled;
}

I wrote the check as a function because you can use it in several areas, this way.

Now, back to my question. Can anyone help me with the username setup? I can't think how you can use a variable in Sphinx conf file... because you cannot. Obviously I`m wrong, kmike did it but unfortunatelly he is not available.
this will not work in a multi-server setup.

TECK
07-10-2007, 06:42 AM
this will not work in a multi-server setup.

True, I'm not there yet with multiple servers. :)
Use this instead:

$cl = new SphinxClient();
$cl->SetServer($sphinx_server, $sphinx_port);
$cl->SetLimits(0, $vbulletin->options['maxresults']);
$cl->SetMatchMode(SPH_MATCH_ALL);
...
$res = $cl->Query($vbulletin->GPC['query'] , $sphinx_index);
...
if (!is_array($res))
{
$sphinxerror = $cl->GetLastError();
if ($sphinxerror)
{
// server not running
}
}

I run a failsafe on my server... if searchd is crashing, vbulletin search will take over automatically.

Edit: Let me dig into this more... I think that searchd will still spit an error, even if it's running, something like (no error).
I will post at the sphinx site to ask Andrew how exactly the last error works.
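
In the meantime, one possible way to check from the web server whether anything is listening on the searchd port, which should also cover the multi-server case. This is only a sketch: searchd_is_reachable() is a made-up name, and the host and port 3312 are assumptions, so use whatever your sphinx.conf sets.

```php
<?php
// Hypothetical helper: returns true if something accepts TCP connections on $host:$port.
// It only proves the port is open, not that searchd is answering queries correctly.
function searchd_is_reachable($host, $port, $timeout = 1)
{
    $errno = 0;
    $errstr = '';
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}

// Host and port are assumptions; take them from your own sphinx.conf.
if (!searchd_is_reachable('127.0.0.1', 3312)) {
    // fall back to the stock vBulletin search, or show "search is offline"
}
```

The short timeout keeps the search page from hanging when the search host is down.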

TECK
07-13-2007, 06:41 AM
When to run the indexer seems to be a matter of preference, keeping in mind the usage/size of the database in question. I run two (almost) identical crons, one every 20 min (for the deltas) and one every day (for the full index). The LOCKFILE helps to keep them from stepping on each other.


#!/bin/sh

LOCKFILE=/var/lock/sphinx.cron.lock
INDEXER_CONF=/usr/local/etc/sphinx.conf

# the lockfile is not meant to be perfect, it's just in case the
# two sphinx cron scripts get run close to each other to keep
# them from stepping on each other's toes.

[ -f $LOCKFILE ] && exit 0

trap "{ rm -f $LOCKFILE ; exit 255; }" EXIT

touch $LOCKFILE

cd /usr/local/bin/ #(NOTE: this should be where indexer is located)

# Use ONE of the two indexer lines below, depending on which cron job this is.
# Deltas only:
./indexer --config $INDEXER_CONF --rotate YOUR_POST_INDEX_DELTA YOUR_THREAD_INDEX_DELTA >/dev/null 2>&1
# Full index:
# ./indexer --all --rotate --config $INDEXER_CONF >/dev/null 2>&1

exit 0


You could also replace ">/dev/null 2>&1" with "| mail -s "Sphinx Report" YOUR_EMAIL_HERE" to get an email of the output.

-RayJ

You should use lockrun instead; it is far more robust than a shell script.

TECK
07-22-2007, 07:24 PM
Never mind, I sorted it. :)

PSS
07-24-2007, 11:02 PM
I still would like to know: when you have Sphinx search in place, do you need to have FULLTEXT index(es) in Vbulletin at all?

Maybe it is a stupid question and FAQ and RTFM etc, but please take a second to answer yes or no if you know the answer, thanks! :)

mute
07-24-2007, 11:12 PM
I still would like to know: when you have Sphinx search in place, do you need to have FULLTEXT index(es) in Vbulletin at all?

Maybe it is a stupid question and FAQ and RTFM etc, but please take a second to answer yes or no if you know the answer, thanks! :)

Nope.

amcd
07-25-2007, 06:07 AM
not only do you not need the fulltext indexes, but also that having them will not give you the full benefit of an external search solution as mysql will continue to spend (waste) time keeping them up to date.

TECK
07-25-2007, 07:53 AM
Hmmm amcd, you dropped the indexes? I never thought of that.
What exactly you guys did related to this issue? Thanks for your reply.

UK Jimbo
07-25-2007, 07:55 AM
Hmmm amcd, you dropped the indexes? I never thought of that.
What exactly you guys did related to this issue? Thanks for your reply.

Dropping the indexes from the thread and post tables is one of the first things I did after installing sphinx.

Remember to close the forum while you drop the indexes but you should find that inserts to these tables are much faster.

TECK
07-25-2007, 08:12 AM
Thanks for the info. :)
Can you post the queries?

UK Jimbo
07-25-2007, 08:17 AM
If you're using FULLTEXT MySQL search then the word table won't be being used. I'd truncate the word table rather than dropping it just in case.

FULLTEXT search works (from memory) using indexes on the thread and post tables. You can drop those two after cutting over to sphinx.

edit: nice edit there TECK while I was posting :p

TECK
07-25-2007, 08:19 AM
You are too fast for me. :)
I edited the previous reply. Could you be kind and post the queries?
I never played before with indexes. Thanks.

PSS
07-26-2007, 12:57 PM
You are too fast for me. :)
I edited the previous reply. Could you be kind and post the queries?
I never played before with indexes. Thanks.

TRUNCATE TABLE `PREFIX_word`;

ALTER TABLE `PREFIX_post` DROP INDEX `title`;

KRon's improvements are worth doing, too:

ALTER TABLE `PREFIX_post` ADD INDEX `th_search` ( `threadid` , `visible` , `dateline` );

ALTER TABLE `PREFIX_pmreceipt` DROP KEY `userid`;

ALTER TABLE `PREFIX_pmreceipt` ADD KEY `userid` (`userid`, `readtime`);

ALTER TABLE `PREFIX_post` DROP INDEX `userid`;

ALTER TABLE `PREFIX_post` ADD INDEX (userid, dateline);

TECK
07-26-2007, 01:48 PM
Thanks PSS for integrating Kron's MySQL optimizations. :)
What is the `th_search`? Is not a vBulletin field. Thanks.

amcd
07-26-2007, 02:42 PM
Thanks PSS for integrating Kron's MySQL optimizations. :)
What is the `th_search`? Is not a vBulletin field. Thanks.
that is just an index name. you can write anything there.

orban
07-27-2007, 11:56 AM
I'm leaving this place. If somebody wants to take over this thread and keep the guide up to date, feel free to do so; it's on page 26 I think. Bye.

ALanJay
07-27-2007, 12:19 PM
Sorry to hear you won't be posting here any more.

TECK
07-27-2007, 11:38 PM
I'm leaving this place if somebody wants to take over this thread and keep the guide up to date feel free to do so it's on page 26 I think. Bye.
Why am I not surprised... Today, I've got a 30 points infraction warning for expressing myself freely.
Check my blog for more details.

BigSoccer Tech.
07-31-2007, 02:26 PM
I'd really appreciate that code as well.

On another note, does anyone know if I can use Sphinx to power the similar threads search in vBulletin or will I still need to use the default engine for that. Any pointers here would be much appreciated.

Any ideas on this?

BillP
08-01-2007, 10:58 PM
I am having problems with my Sphinx search.

I set it up in a basic setting and it is working fine, with a 4-character minimum search. I used the settings and hacks to search.php as described earlier in this thread.

Then I changed VBulletin to allow 3 character words and some exceptions for 2-letter words. I changed sphinx.conf to allow 2 letter words.

I reindexed sphinx, and still can search only for 4+ character words.

Some of the complicating factors: Web server is NOT the searchd server. I do my indexing and run searchd on one host, I run the web server on another host.

Any ideas? I can search using "search" from the CLI and find the 3-letter hits, so I think it has something to do with the way sphinx is shoe-horned in to search.php.

Zia
08-02-2007, 10:24 AM
I'm leaving this place if somebody wants to take over this thread and keep the guide up to date feel free to do so it's on page 26 I think. Bye.

hello...really curious...any one can say..
whats wrong with orban & Orbans Hack (Plugin base templet cache) -the hack moved to graveyard & deleted ?

RS_Jelle
08-07-2007, 06:07 PM
hello...really curious...any one can say..
whats wrong with orban & Orbans Hack (Plugin base templet cache) -the hack moved to graveyard & deleted ?


I'm also curious about this :(
I've read all his latest posts and there's no sign of anything that could be wrong. Pretty strange. All his mods were removed on his request.

Has any staff member more information about this as I also can't contact him (pm/email contact turned off) about it and the future of his old mods ...

Neil Lock
08-10-2007, 06:27 AM
Hey all,

I'm a bit confused as to what's going on. Sphinx has saved our board massively and I want to continue to use it! However, I came to look for the install stuff, as I am about to upgrade versions, and found that it's gone. Does anyone have access to that first post and also the diff files etc.? I will attempt to see what's going on so that I can use it on my upgrade!

Obviously I don't know any of the politics here. Either way, thanks for all you have given thus far, orban. This product has certainly saved my bacon and I truly hope that we can continue to use it!

Thanks

mlx
08-10-2007, 07:01 AM
You mean this post: https://vborg.vbsupport.ru/showpost.php?p=1283359&postcount=387 ?

Neil Lock
08-10-2007, 08:07 AM
You mean this post: https://vborg.vbsupport.ru/showpost.php?p=1283359&postcount=387 ?

Sorry, my mistake - I always got the info through post 1, which now says ...

Thanks!

BamaStangGuy
08-12-2007, 09:46 PM
It's a shame that vb.org staff keep driving away people who improve on vBulletin. This thread has helped so many big board admins run their boards more efficiently.

amcd
08-12-2007, 09:55 PM
Did he post anywhere that he is quitting vb.org because of staff problems?

RS_Jelle
08-13-2007, 09:34 PM
No, there was no vB.org staff related discussion with orban on the forums. So I really don't understand it (can any staff member give us some more information about it?) :(

Zia
08-14-2007, 07:12 AM
It would be appreciated if anyone can give any clue...

Erwin
08-15-2007, 09:58 PM
Very interesting thread, following this closely.

BillP
08-16-2007, 05:46 PM
Does this hack for Sphinx with VB allow for searches in titles only? We have it installed, I believe I installed it all correctly, and searches are working great. But if the search is done from the Advanced page, with "Search Titles Only" selected, it fails to return anything.

I can go in to the server itself and manually query the titles and get hits. It's purely the integration with VB that seems to be falling over.

Any ideas?

========== later ==============
Never mind, I found a typo in my sphinx.php file in <vbhome>/includes. Now that it's pointing at the right index, it works great.

Rayn21
08-28-2007, 11:19 AM
I'm finding that my sphinx search is only returning results to vbulletin from the last month or so, regardless of the advanced search settings. When I run commands directly to search on the command line, I get many, many more results ... but when interacting with vbulletin something goes wrong.

Any immediate ideas?

telescopi
08-28-2007, 11:55 AM
Works here - I just followed the instructions in orban's post on page 26.

Could your vbulletin max results be limiting what sphinx is returning?

What happens if you sort ascending?

Rayn21
08-28-2007, 01:49 PM
I set the vbulletin max results to 6000 just to see what happens and I wound up with more posts, but still I am missing some. I am wondering what I am missing. It had been working for me as far as I know, it's just that the popular search terms seem to have a hidden time constraint on them ... maybe because they return so many results.

If I search for a popular search term, I'll get back maybe 300 results all from a recent time frame (about a month). If I search for a less popular search term, I'll get results going all the way back to 2000 like I should. I will have to go through and re-examine all the scripts again, but something is funny going on.

What could limit my results between the search command line and vbulletin itself? There are many more results returned on the command line.

If I search for a popular term like 'Kurayami' a popular poster who is mentioned often ... with 6000 max results in vB and 1200 max results in sphinx.conf ... I'll max out at 1000 results all very recent. Weird but understandable. If I search 'Rayn' ... a few results but many less than 1000 ... all within a relatively recent time frame. There should be more, the command line returns thousands of posts.

I cannot locate the source of this discrepancy.

http://www.tribalwar.com/forums/search.php

mlx
08-28-2007, 01:59 PM
I think the problem with the instructions posted here is that the search result isn't grouped by threadid if you are searching for threads only.

i.e. Sphinx' API only returns 500 posts but not threads containing the search query (or whatever you are using as the vb search limit).

I think we have fixed this at our forum, I can post our improved sphinx.php once I'm back home.

Rayn21
08-28-2007, 02:32 PM
What you are describing is consistent with what I am observing right now. I'd love to see what solution you found to the issue if you remember to post it later.

mlx
08-29-2007, 06:20 PM
I'm leaving this place. If somebody wants to take over this thread and keep the guide up to date, feel free to do so; it's on page 26, I think. Bye.

So I guess it's OK to post our improved sphinx.php here. Thanks to orban and everyone else contributing to this thread for their work!

I have added some code to group the search results by threadid and to actually sort the results just like vBulletin's default search engine does.

I think the code should work fine with the config posted by orban on page 26. Let me know if you have any problems though.

Rayn21
08-30-2007, 12:48 AM
The code works wonderfully for most of my users, but for some users I get an empty searchd result back if - and only if - I am returning results as threads. (When searching for posts). When I use the search command line with the threadid as the groupby and the same index, I get results. I looked through the code, but I am not sure what would cause this particular issue. Have any ideas? I love the improvements.

Jon
09-02-2007, 09:09 AM
Sphinx works fine on our forums when displaying results as threads. Displaying results as posts however returns an incomplete set of results, but only with terms that occur often (which is not hard with > 8M posts :)). On top I get results with a match on the title only (why..?), and those are followed by older results with a match on posts. Recent posts seem to be missing in the result set.

Any hints on how to get the search to behave properly?

amcd
09-02-2007, 06:21 PM
maybe your post delta is not being updated properly.

Jon
09-02-2007, 07:45 PM
No, the indexer works fine. There could be up to hundreds of posts not included. Say I'm searching for a term, with a match on posts, displayed as posts, I first get 3 pages of matches on title, ranging from 2007 to 2005. After that I get matches on post, ranging from 2005 to 2003. Recent posts are nowhere to be found.

TECK
09-06-2007, 10:22 PM
Good to see the thread is back! The need queries are still here... Hoooray!

TheComputerGuy
09-06-2007, 11:59 PM
I think I am going to give this a try. This should be awesome if it works.

MentaL
09-15-2007, 01:00 AM
Guys, I'm really stuck. I'm trying to set this up and I just run into trouble. If anyone can give me a good guide, that's great, but I'm stuck at this point.



root@mental [~/sphinx-0.9.7]# indexer --config sphinx.conf --all
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

using config file 'sphinx.conf'...
indexing index 'post'...
ERROR: index 'post': failed to open sphinx-data/fulltext.tmp0: No such file or directory.
total 0 docs, 0 bytes
total 0.023 sec, 0.00 bytes/sec, 0.00 docs/sec
indexing index 'postdelta'...
ERROR: index 'postdelta': failed to open sphinx-data/fulltextdelta.tmp0: No such file or directory.
total 0 docs, 0 bytes
total 0.017 sec, 0.00 bytes/sec, 0.00 docs/sec
indexing index 'thread'...
ERROR: sql_connect: Unknown MySQL server host 'db.local' (1) (DSN=mysql://:***@db.local:3306/).
ERROR: index 'thread': (no error message).
total 0 docs, 0 bytes
total 5.149 sec, 0.00 bytes/sec, 0.00 docs/sec
indexing index 'threaddelta'...
ERROR: sql_connect: Unknown MySQL server host 'db.local' (1) (DSN=mysql://:***@db.local:3306/).
ERROR: index 'threaddelta': (no error message).
total 0 docs, 0 bytes
total 1.451 sec, 0.00 bytes/sec, 0.00 docs/sec
distributed index 'fulltext' can not be directly indexed; skipping.
distributed index 'threadtitles' can not be directly indexed; skipping.
root@mental [~/sphinx-0.9.7]#

amcd
09-15-2007, 05:28 AM
Post your configuration file.

From what you posted, it appears that you have 2 problems, at least.
1. sphinx-data does not exist or the sphinx user does not have proper permissions to that directory.
2. sphinx is not able to connect to mysql because the connection parameters are not correct.
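The first point can be checked before re-running the indexer. What follows is only a minimal sketch: check_index_dir is a made-up helper name, and the directory you pass should be whatever the index path in your own sphinx.conf points into (sphinx-data in the output above). Run it as the same user that runs the indexer.

```shell
#!/bin/sh
# Hypothetical helper, not part of Sphinx: reports whether an index
# directory exists and is writable by the current user.
check_index_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Example use:
#   check_index_dir sphinx-data
# For the second point, "getent hosts db.local" shows whether the
# database host name from the error message resolves on this machine.
```

If the directory is reported missing, creating it and handing ownership to the indexing user is usually enough for the "failed to open ... .tmp0" errors.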

MentaL
09-15-2007, 09:54 AM
Post your configuration file.

From what you posted, it appears that you have 2 problems, at least.
1. sphinx-data does not exist or the sphinx user does not have proper permissions to that directory.
2. sphinx is not able to connect to mysql because the connection parameters are not correct.

Managed to fix that error, but now I have this error.


root@mental [~/sphinx-0.9.7]# indexer --config sphinx.conf --all
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

using config file 'sphinx.conf'...
indexing index 'post'...
ERROR: index 'post': raw_hits: write error: 5941 of 262023 bytes written.
total 2337609 docs, 630070420 bytes
total 89.155 sec, 7067095.78 bytes/sec, 26219.46 docs/sec
indexing index 'postdelta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
indexing index 'thread'...
ERROR: index 'thread': raw_hits: write error: 156494 of 262021 bytes written.
total 215804 docs, 5689335 bytes
total 2.430 sec, 2341232.40 bytes/sec, 88806.04 docs/sec
indexing index 'threaddelta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
distributed index 'fulltext' can not be directly indexed; skipping.
distributed index 'threadtitles' can not be directly indexed; skipping.
root@mental [~/sphinx-0.9.7]#

epheph
09-26-2007, 05:44 AM
Hello, I have installed this mod and sphinx into a sizable vBulletin 3.6.5 installation (6M+ posts) and the performance so far in testing is very good. The only issue I have is that 'show as posts' doesn't work all the time. It seems if the search term has many results, I get

"Sorry - no matches. Please try some different terms."

But everything else works great (by user, etc). If I intentionally search for a term I know will be quite low, like misspellings, the "show by posts" works just fine. Any ideas?

kontrabass
10-21-2007, 08:44 PM
Well I've finally got this up and going. Install went well, thanks to all the input here. Took me a while to figure out why a search term with thousands of matches returned only 180 vbulletin threads (1000 post matches gets distilled down into the 180 threads that hold them!). So I upped the max results to 2000 and get satisfactory result sets.

I was disappointed with not being able to do "phrase matches" - until I changed this line in sphinx.php:

$cl->SetMatchMode ( SPH_MATCH_ALL );

to

$cl->SetMatchMode ( SPH_MATCH_EXTENDED );

Extended mode enables all this stuff (including "phrase matching") :

http://www.sphinxsearch.com/doc.html#extended-syntax

(sorry if this is repeated)

However it seems that unless you can teach your users to use pipes and ampersands, "AND" and "OR" text boolean operators will not work :(

Edit: Actually, even pipes and ampersands don't work I guess ("The search term you specified (|) is under the minimum word length (3) and therefore will not be found. Please make this term longer."). Lol

Is everyone using this search doing ok without any boolean operators? Every search is treated as an "AND" search I guess.

BillP
10-23-2007, 04:08 AM
No, the indexer works fine. There could be up to hundreds of posts not included. Say I'm searching for a term, with a match on posts, displayed as posts, I first get 3 pages of matches on title, ranging from 2007 to 2005. After that I get matches on post, ranging from 2005 to 2003. Recent posts are nowhere to be found.

We are getting strange results as well. Looking at results via sphinx "search" CLI shows me that there are hits in the right date range, but in our case we have lots of data since August, and spotty or missing data going back to 2003, light data back from there to 2001, and then a whole lot of data again. It's like the index is "light" for a few years.

=========

Bzzt! I forgot to update search.php after the last VB upgrade (it was on my to-do list, I just muffed it)

If you have weird data being returned, take a look at your search.php to see if you accidentally upgraded it and lost your edits!

eoc_Jason
11-01-2007, 03:05 PM
Didn't want to see this thread die as I still rely heavily on Sphinx for my vB search. It's disappointing that the vB team *still* has not come up with a built-in solution for searching that is acceptable for large forums.

One bit of advice - The code in the vB search does a lot of weighting & filtering. I have many instances where I search for a specific word that is in a post, and doing a raw search I can find it, but after vB works its magic it will give a 'no results found'. So don't think something is broken, I guess technically it is, but it's 'by design'.

Anyhow, here's some bits of code that might help get your sphinx running a little smoother.

One thing you really should do is run the search daemon & indexer under a non-root user; I use 'sphinx'. If yours is different, then simply adjust the files accordingly. You will probably need to make a few directories mentioned in the scripts below and have them owned by the user that sphinx is running as (like for the lock file & log).

I run on Redhat / CentOS, here's my script that goes in the /etc/rc.d/init.d/ directory. I call it 'searchd'. Simply use 'chkconfig' to add it and have it start up when your system boots.

#!/bin/sh
#
# searchd This script starts and stops the sphinx search engine
#
# chkconfig: - 80 15
#
# description: Stand Alone Search Engine
# processname: searchd
# config: /usr/local/etc/sphinx.conf
# pidfile: /var/run/searchd/searchd.pid


# Source function library.
. /etc/rc.d/init.d/functions

RETVAL=0

start() {
    echo -n "Starting Sphinx: "
    sudo -u sphinx /usr/local/bin/searchd --config /usr/local/etc/sphinx.conf > /dev/null 2>&1
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        success startup
        touch /var/lock/subsys/searchd
    else
        failure startup
    fi
    echo
    return $RETVAL
}

stop() {
    echo -n "Shutting Down Sphinx: "
    kill `cat /var/run/searchd/searchd.pid`
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        success shutdown
        rm -f /var/lock/subsys/searchd /var/run/searchd/searchd.pid
    else
        failure shutdown
    fi
    echo
    return $RETVAL
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status searchd
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        exit 1
esac

exit $RETVAL
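
The stop step above assumes the pid file is there and still points at a live process. A slightly more defensive variant could look like the sketch below. This is an illustration only: stop_from_pidfile and its return codes are made up here and are not part of the script above.

```shell
#!/bin/sh
# Hypothetical variant of the stop step; the function name and return
# codes are made up for illustration.
stop_from_pidfile() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "no pidfile: $pidfile"
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "stale pidfile, removing: $pidfile"
        rm -f "$pidfile"
        return 2
    fi
    # Only remove the pid file once the signal was delivered
    kill "$pid" && rm -f "$pidfile"
}

# Example use inside stop():
#   stop_from_pidfile /var/run/searchd/searchd.pid
```

This avoids the confusing "cat: No such file" and kill usage errors when searchd is already down.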

Second, here is my cron script. I made a directory called 'cron.quarterly' that runs every 15min. Similar to the cron.hourly, cron.daily, etc... You can do whatever. If you make that directory be sure to edit your /etc/crontab file accordingly too. I added a time-check so once a day sphinx will do a full-reindex, and it makes a lock file so if yours takes a long time to reindex you won't run into issues. Basically at 5am it will do a re-index. I chose that time because it's a low usage period for my forum and server. By default the cron.daily scripts run at 4am so you wouldn't want to do it then since everything else will be eating CPU cycles.


#!/bin/sh

# the lockfile is not meant to be perfect, it's just in case the
# two sphinx cron scripts get run close to each other to keep
# them from stepping on each other's toes.

LOCKFILE=/var/lock/subsys/sphinx_indexer

# If the lockfile exists then exit!

if [ -f $LOCKFILE ]; then
    echo "Lockfile already exists, not running sphinx indexer!"
    exit
fi

touch $LOCKFILE

compareh=$(date +%k)
comparem=$(date +%M)

if [ $compareh -eq "5" ] && [ $comparem -le "14" ]; then
    sudo -u sphinx /usr/local/bin/indexer --config /usr/local/etc/sphinx.conf --rotate --all > /dev/null 2>&1
else
    sudo -u sphinx /usr/local/bin/indexer --config /usr/local/etc/sphinx.conf --rotate post_index_delta thread_index_delta > /dev/null 2>&1
fi

rm -f $LOCKFILE

exit 0
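
As the comment in the script says, the touch-based lockfile is not meant to be perfect: two runs can both pass the -f test before either one creates the file. One common alternative, sketched here as an assumption rather than something the script above does, is to use mkdir, which creates the lock and tests for it in one step. The names acquire_lock and release_lock and the lock path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the cron script above.
# mkdir either creates the directory or fails if it already exists,
# so checking and taking the lock happen in a single step.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

# Example use (the lock path is an assumption; pick one that the user
# running the script can write to):
#   LOCKDIR=/var/lock/sphinx_indexer.lock
#   acquire_lock "$LOCKDIR" || { echo "indexer already running"; exit 0; }
#   trap 'release_lock "$LOCKDIR"' EXIT
#   ... run the indexer here ...
```

The trap removes the lock even if the indexer step fails, so a crashed run does not block every later one.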

I also have my logs in their own /var/log/searchd/ directory, you set this in the sphinx config (most people probably just have them in the /var/log dir). Again that directory will need to be owned by the sphinx user that you use.


/var/log/searchd/*.log {
    missingok
    compress
    postrotate
        if test -n "`ps acx|grep searchd`"; then
            /sbin/service searchd restart 2> /dev/null > /dev/null || true
        fi
    endscript
}


Oh, and for people wanting to know how to implement the sort_search_items(), here's some code. It goes in the includes/sphinx.php file.

At the end where there is the following:

            if ($vbulletin->GPC['titleonly'] == $vbulletin->GPC['showposts'])
                $orderedids[$docinfo['attrs'][$sphinx_switch_fields]] = $docinfo['attrs'][$sphinx_switch_fields];
            else
                $orderedids[] = $doc;
        }
    }
    else
        $orderedids = array();
}


Replace that with:



            if ($vbulletin->GPC['titleonly'] == $vbulletin->GPC['showposts'])
            {
                $orderedids[$docinfo['attrs'][$sphinx_switch_fields]] = $docinfo['attrs'][$sphinx_switch_fields];
                $itemids[$docinfo['attrs'][$sphinx_switch_fields]] = $docinfo['attrs'][$sphinx_switch_fields];
            }
            else
            {
                $orderedids[] = $doc;
                $itemids["$doc"] = true;
            }
        }
    }
    else
    {
        $orderedids = array();
    }

    // #############################################################################
    // now sort the results into order
    // #############################################################################
    if (!$vbulletin->GPC['titleonly'] OR $vbulletin->GPC['showposts'])
    {
        // sort by database field
        if ($vbulletin->GPC['sortby'] == 'post.dateline' || $vbulletin->GPC['sortby'] == 'lastpost')
        {
            if (empty($itemids))
            {
                $errors[] = array('searchnoresults', $displayCommon);
            }
            else
            {
                // remove dupes and make query condition
                $itemids = iif($vbulletin->GPC['showposts'], 'postid', 'threadid') . ' IN(' . implode(',', array_keys($itemids)) . ')';

                // sort the results and create the final result set
                $orderedids = sort_search_items($itemids, $vbulletin->GPC['showposts'], $vbulletin->GPC['sortby'], $vbulletin->GPC['sortorder']);
            }
        }
    }

    // END Results
}


Be sure to leave the unset line at the very very bottom.


I'm about to make the upgrade to vB 3.6.8 whatever, and I'm running php 5.2.4, so I'll let you know how the upgrade goes. I want to look over everyone's modifications and updates in this thread. Then I'll post some more files and stuff if necessary.

mute
11-01-2007, 03:34 PM
Thanks for that search results sort order fix, it seems to be working nicely.

All that really stands between this and a perfect solution is finding a way to handle all of the "Find all posts by user" type searches.

mute
11-02-2007, 04:58 PM
Has anyone else had any luck with making Sphinx search for threads started and all posts by username?

I found this thread yesterday, but it seems really broken, at least I can't get it to return the same results.

https://vborg.vbsupport.ru/showthread.php?t=138668

eoc_Jason
11-02-2007, 11:23 PM
IIRC, all posts by user should use userid / threadid / postid and those should be indexed? Or do you mean searching for a phrase by a particular user? If you elaborate I can look into it more. I need to do some tweaking on my sphinx, definitely when I do my vB upgrade, and I want to make it more transparent (or even better) than the default vB search.

mute
11-03-2007, 03:29 PM
I mean tossing the searches done via the advanced search page "Find all posts by user" or "Find all threads started by user" to Sphinx. Under the current incarnation those still hit MySQL.

weeno
11-04-2007, 11:39 PM
Thanks for everyone involved. I appreciate the info. Especially, eoc_Jason. Those scripts (https://vborg.vbsupport.ru/showpost.php?p=1373497&postcount=445) saved me a lot of time.

I had to recreate much of the final sphinx.php file from the original file and the followup posts. So I thought I'd post my final edit so people can build on it.

I took the sphinx.php file from orban
https://vborg.vbsupport.ru/showpost.php?p=1283359&postcount=387

Applied these changes:

sort_search_items (https://vborg.vbsupport.ru/showpost.php?p=1373497&postcount=445) changes from the bottom of the post by eoc_Jason

Fixed a Problem with asserts in sphinxapi.php (https://vborg.vbsupport.ru/showpost.php?p=1141750&postcount=284)

Added only the 64-bit Unpack fix detailed here (https://vborg.vbsupport.ru/showpost.php?p=1238827&postcount=360). If you aren't on a 64-bit system, you can probably remove this function. I didn't add/change the rest of ALanJay's changes as I wasn't sure what they were for.

One problem for me was the Coventry variable, which seems to be causing errors:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

that didn't seem to affect the search results. NOTE: I added error_reporting(0) as did mute (https://vborg.vbsupport.ru/showpost.php?p=1105539&postcount=208) at the beginning to simply suppress these errors, as I didn't find another solution. If anyone has a solution for this, let me know. If you are testing this script out, you should remove that line to make sure it works before you suppress errors.

I didn't use

this code (https://vborg.vbsupport.ru/showpost.php?p=1328258&postcount=432) by mlx because I wasn't sure exactly what was added.

sphinxapi.php ignoring max matches limit set in sphinx.conf fix (https://vborg.vbsupport.ru/showpost.php?p=1182576&postcount=301) - again, wasn't sure if I needed to use it at the time.

config

my config is PHP 5.2.4 for php servers. Centos 5 64-bit for mysql. vbulletin 3.6.7.

arn

Jon
11-05-2007, 06:53 AM
Thanks for everyone involved. I appreciate the info. Especially, eoc_Jason. Those scripts (https://vborg.vbsupport.ru/showpost.php?p=1373497&postcount=445) saved me a lot of time.

I had to recreate much of the final sphinx.php file from the original file and the followup posts. So I thought I'd post my final edit so people can build on it.
[...]


Thanks! This helped us get rid of our problems with missing posts.

Spinball
11-05-2007, 12:10 PM
I am in the UK and I need to get Sphinx running ASAP.
I've had a read of this thread but I'm finding it pretty baffling. I have reasonable MySQL and PHP skills but leave the server and PHP/MySQL setup well alone.
It's 14:10 here right now and my host will be around later to help with the install.
There's $100 / £50 in it for someone who can hand-hold me through the installation today onto www.avforums.com.
Any takers?

eoc_Jason
11-05-2007, 02:35 PM
Are there any details on the unpack function? I looked on several pages but I can't find it. I've got a site running sphinx that is 64-bit, but IIRC they aren't using the unpack and aren't running into issues. What problems were people having?

Oh, about the other post on searching by user... As long as there aren't any search words (just a username) it should use the numeric indexes. It's been a while since I've peeled through the search code though, I'll try to look over it one of these days and see what can be done. I'm not sure if the code works if you put in a search term AND username.

mute
11-05-2007, 03:40 PM
Are there any details on the unpack function? I looked on several pages but I can't find it. I've got a site running sphinx that is 64-bit, but IIRC they aren't using the unpack and aren't running into issues. What problems were people having?

Oh, about the other post on searching by user... As long as there aren't any search words (just a username) it should use the numeric indexes. It's been a while since I've peeled through the search code though, I'll try to look over it one of these days and see what can be done. I'm not sure if the code works if you put in a search term AND username.

If you check out the link I pasted someone attempted to get this working. Granted I am running 3.6.2 with a 3.6.8 search.php+/includes/functions_search.php, but I believe something weird is going on with it, but I can't figure out exactly what is going on.

There's a definite bug with usernames with a "=" in them regardless of if I'm broken for other reasons :)

BrandiDup
11-05-2007, 04:12 PM
Does anyone have a forum where I can see this type of search in action? I have tried reading through this thread and on the sphinx site but can't seem to grasp how exactly it's different than vbulletin search, aside from it being a better solution for big boards. We have 1.5 mil posts on our site and search doesn't seem to lag too much right now, but this may be something we should look into soon before any problems arise. How is the interface of the search different? Does it look similar to google's search?

weeno
11-05-2007, 11:35 PM
Does anyone have a forum where I can see this type of search in action? I have tried reading through this thread and on the sphinx site but can't seem to grasp how exactly it's different than vbulletin search, aside from it being a better solution for big boards. We have 1.5 mil posts on our site and search doesn't seem to lag too much right now, but this may be something we should look into soon before any problems arise. How is the interface of the search different? Does it look similar to google's search?

http://forums.macrumors.com/search.php

Sphinx search with two different search options (match documentation (http://www.sphinxsearch.com/doc.html#matching-modes))

Normal = Match All (SPH_MATCH_ALL mode)
Extended = Extended (SPH_MATCH_EXTENDED mode - not all options work)

both use sphinx.

arn

BrandiDup
11-05-2007, 11:39 PM
http://forums.macrumors.com/search.php

Sphinx search with two different search options (match documentation (http://www.sphinxsearch.com/doc.html#matching-modes))

Normal = Match All (SPH_MATCH_ALL mode)
Extended = Extended (SPH_MATCH_EXTENDED mode)

both use sphinx.

arn

OK, I'm thoroughly confused now. It looks JUST LIKE vBulletin search. So, the sphinx search is integrated into the actual forums, it's not at all like the google search that you get for your site that brings the search up in a totally different format, right? The search is kind of slow on our site but I tried the google and didn't like it b/c of the format that the results get brought back in. How exactly (in dummy terms) is it different from the search engine built into vbulletin? It looks so much like the standard search, which is very nice, but confusing to me...

weeno
11-05-2007, 11:43 PM
How exactly (in dummy terms) is it different from the search engine built into vbulletin? It looks so much like the standard search, which is very nice, but confusing to me...

When you initiate a search with normal vBulletin code, it performs a complicated MySQL query against your database. This can be so complicated that it slows down your database and makes it unresponsive.

What Sphinx does is run an indexer, which pulls the data from the database and indexes it in a file on the hard drive. So when you search, you access these files. It doesn't tie up the database and is much faster and less resource intensive. The catch is you need to keep updating the files on a regular basis (usually with an automated script running every 20 minutes or so).

So, from vbulletin's side, you need to modify vb's search.php file, so instead of asking MySQL for search results, it asks Sphinx. The results are then sent back to VB and displayed just like you are used to.

arn

BrandiDup
11-06-2007, 01:28 AM
When you initiate a search with normal vBulletin code, it performs a complicated MySQL query against your database. This can be so complicated that it slows down your database and makes it unresponsive.

What Sphinx does is run an indexer, which pulls the data from the database and indexes it in a file on the hard drive. So when you search, you access these files. It doesn't tie up the database and is much faster and less resource intensive. The catch is you need to keep updating the files on a regular basis (usually with an automated script running every 20 minutes or so).

So, from vbulletin's side, you need to modify vb's search.php file, so instead of asking MySQL for search results, it asks Sphinx. The results are then sent back to VB and displayed just like you are used to.

arn

OK, that makes sense then. How cool that it integrates so nicely, to be exactly like what the users are used to. My members aren't at all keen on the google search b/c of the way the results display but I think we'll need a better solution than the standard vb search soon. So, this may be worth looking in to.

I did try to skim through most of this very long thread but the original post is gone and some of the thread is sort of confusing to me. So, can you or anyone else tell me if this is a good, stable solution that would be fairly easy to implement?

For the updating that needs to be done every 20 minutes or so, is that done by cron through vbulletin? Or do you enable it somewhere else? And, lastly, does this updating bog the server down? Every 20 minutes seems like a lot of updating but I guess it wouldn't matter if it's not that server intensive.

I'm just trying to figure out a solution that will work for us as we grow.


While we're on this subject though, is the new posts search function as server intensive as normal searches are? Nearly everyone who uses our site uses the new posts feature, so we've got no less than 100 people clicking new posts at the same time, over and over and over again. I was just wondering if this is the same as normal search or if this is something entirely different.

Thank you so much for your help.

weeno
11-06-2007, 02:27 AM
I did try to skim through most of this very long thread but the original post is gone and some of the thread is sort of confusing to me. So, can you or anyone else tell me if this is a good, stable solution that would be fairly easy to implement?

For the updating that needs to be done every 20 minutes or so, is that done by cron through vbulletin? Or do you enable it somewhere else? And, lastly, does this updating bog the server down? Every 20 minutes seems like a lot of updating but I guess it wouldn't matter if it's not that server intensive.

I'm just trying to figure out a solution that will work for us as we grow.

While we're on this subject though, is the new posts search function as server intensive as normal searches are? Nearly everyone who uses our site uses the new posts feature, so we've got no less than 100 people clicking new posts at the same time, over and over and over again. I was just wondering if this is the same as normal search or if this is something entirely different.

Thank you so much for your help.

It's not that easy a solution to implement. You have to know your way around the linux/unix command line a bit to get the solution working.

Sphinx itself needs to be compiled and installed, the config file needs to be modified, cron jobs added, etc. All doable, but not entirely straightforward, especially if you have limited experience with it. Myself included: it took me a while to stumble through it. All the information is best summarized in these posts:

https://vborg.vbsupport.ru/showpost.php?p=1283359&postcount=387
https://vborg.vbsupport.ru/showpost.php?p=1373497&postcount=445
https://vborg.vbsupport.ru/showpost.php?p=1373497&postcount=450

It does appear to be a stable solution and works very well as you can see.

The New Posts search does not appear as intensive and mine is still running through my regular mysql search and doing fine.

arn

Spinball
11-06-2007, 05:57 AM
Does anyone have a forum where I can see this type of search in action?

Yesterday Jason (https://vborg.vbsupport.ru/member.php?u=3454) very kindly helped me to get Sphinx going on my forums, with over 5 million posts, at www.avforums.com/forums/search.php
Basically turned a struggling site into a quick one. :up:

Spinball
11-07-2007, 07:52 AM
I just deleted the fulltext indexes from post and thread and got the following error when undeleting a thread:
Database error in vBulletin 3.6.8:

Invalid SQL:

SELECT thread.threadid, MATCH(thread.title) AGAINST ('Premium Pack ?190') AS score
FROM thread AS thread

WHERE MATCH(thread.title) AGAINST ('Premium Pack ?190')
AND thread.open <> 10
AND thread.threadid <> 528589

LIMIT 5;

MySQL Error : Can't find FULLTEXT index matching the column list
Error Number : 1191
Date : Wednesday, November 7th 2007 @ 09:50:56 AM
Script : http://www.avforums.com/forums/postings.php?do=updatethread&t=528589
Referrer : http://www.avforums.com/forums/postings.php?do=editthread&t=528589
IP Address : 82.37.224.75
Username : Stuart Wright
Classname : vB_Database

HELP!

(I recreated the index for now)

Marco van Herwaarden
11-07-2007, 08:24 AM
Most likely something to do with Similar Threads. You can try disabling this in your vBulletin Options.

Spinball
11-07-2007, 08:50 AM
If it is similar threads and I want to keep the functionality, is there a work around?

Marco van Herwaarden
11-07-2007, 08:57 AM
Then you should not remove the index.

weeno
11-07-2007, 03:24 PM
I suppose this answers a question I had about similar threads. I guess similar threads uses full text search. Anyone interested in hacking it to use Sphinx?

arn

eoc_Jason
11-08-2007, 05:13 PM
You know I was thinking about how to add some of the search features back on the search page.

The current scenario is a basic thread & post index, and then doing deltas of those two. However, doing a full thread reindex doesn't take that long (20 seconds or less for me), and the thread table holds very relevant info such as number of replies, number of views, etc.

That's my first goal for this upcoming weekend project. Then I'm going to work on phrases & boolean since sphinx supports that. Google does phrases by using quotes, and so I figure doing a little regular expression matching to set a flag on the search time would work well for sphinx. Also I think I might have to do some character conversion to go between the common boolean options to the sphinx ones.

------------------

To answer someone's question above, usually vB will start to take a dive in search performance after about 2 million posts. It really depends on the amount of content in each post.

The search issue arises like this... A person does a search, so mysql is running that query. In the mean time people can view posts on the forum as usual. However, as soon as a person posts a new thread or reply, that INSERT has to wait until the search is complete. A side effect of that INSERT is that every SELECT afterwards (people just browsing the forum) also has to wait until the INSERT is complete (which is waiting on the search). So queries really start to build up in queue as people are waiting. And if your users are like most that are impatient, if the page doesn't load in a couple seconds they click refresh, which sends ANOTHER query to the DB.... All this has a downward spiral as memory is being used up and the server starts to swap to disk. By then your server is probably completely unresponsive.

----------------

As for the "new posts", that is a simple search based on a 'date' column, it has nothing to do with the post content searching. The relevant fields should already have indexes and it should run pretty good. It can never really be cached by mysql though since the date is different for each person and the results are constantly changing because of replies and such.

---------------

Don't even get me started about the vB search code that can remove valid results due to the extra vB weighting and such.

---------------
Okay, that was a little off track... hehe... Anyhow, I'll try to post whatever new code I come across. I rely on Sphinx to keep my forum search useable, so I won't be going anywhere.

---------------

Just realized the thread full re-index thing won't work as I initially hoped. Searching post contents uses only the post index which includes some thread info (like replies). Though I think I can write some secondary cleanup code to re-search and gather realtime info after the sphinx query has run.

weeno
11-12-2007, 01:37 AM
my searchd quit on its own today for unclear reasons. I restarted it manually.

How are you guys keeping it running? And why would it quit?

arn

amcd
11-12-2007, 05:13 AM
never had it quit on its own.

eoc_Jason
11-12-2007, 12:55 PM
It has never died on my forum, but I have had rare instances where it dies on another forum that I help a guy out with. He is running x86_64, I'm still 32-bit. Unfortunately I could never find any logs or a reason for it to die; it seemed totally random.

On my site I use SIM from rfxnetworks (it does service monitoring and will restart a downed service). I just coded a custom addition for searchd. But I've never been notified of it going down (on my site). You could just as easily write a little cron script doing a 'ps aux | grep searchd' type thing... Or maybe something with inittab?


Unfortunately I didn't get to work on any coding for Sphinx this weekend due to some unexpected happenings. I have all the logic worked out in my head; I just need to convert that into usable code.

weeno
11-16-2007, 07:03 AM
here's a short shell script I pieced together based on some of your code

#!/bin/sh
# Start searchd if no process with that name shows up in the process list.
if test -z "`ps acx | grep searchd`"; then
    /etc/rc.d/init.d/searchd start > /dev/null 2>&1 || true
fi


cron'd it every minute to keep searchd running. Just checks to see if searchd is running, if not, it starts it.
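For anyone who would rather not grep the process list, here is a rough variant of the same idea that checks searchd's pid file instead. It is only a sketch: the pid file path, the init script path and the function names (is_running, ensure_running) are my assumptions, so point them at whatever your sphinx.conf and distro actually use. Note that kill -0 only works for processes you are allowed to signal, so run it from root's crontab or as the user that owns searchd.

```shell
#!/bin/sh
# Assumed locations (not from this thread's config); adjust to your install.
PIDFILE="${PIDFILE:-/var/run/searchd.pid}"
START_CMD="${START_CMD:-/etc/rc.d/init.d/searchd start}"

# Return 0 if the pid file exists and names a live process.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null      # signal 0 only tests for existence
}

# Run the given start command only when the daemon is not running.
ensure_running() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        return 0
    fi
    "$@" >/dev/null 2>&1 || true
}

# Do the check only when called as "script --run" (e.g. from cron).
if [ "${1:-}" = "--run" ]; then
    # START_CMD is left unquoted on purpose so it splits into command + args.
    ensure_running "$PIDFILE" $START_CMD
fi
```

Saved as, say, searchd-watchdog.sh, it would be called from cron with the --run argument once a minute, the same way as the script above.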

arn

UK Jimbo
11-16-2007, 07:19 AM
What he said. This is similar to what I have.

I've seen it die for no reason two or three times in the last eight months on a busy box.

TheComputerGuy
11-25-2007, 11:45 PM
I wonder how this will work in a shared server environment.

UK Jimbo
11-26-2007, 07:37 AM
It depends on how you want to run it. Sphinx runs as a TCP based daemon so it's possible to have more than one client (vBulletin instance) connecting in to the same sphinx daemon.

It's also possible to build the index in one place and then copy it out to a number of servers all running the daemon. This can be handy if you don't want to place excessive load on the servers while replicating.
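A very rough sketch of the copy-out step, in case it helps. The assumptions here are mine: the index directory, the pid file path, passwordless ssh/scp between the boxes, and that every target box has the same index declared at the same path in its own sphinx.conf. It relies on the rotation behaviour as I understand it, where searchd picks up files named index.new.sp* when it receives SIGHUP, so please verify that against the docs for your Sphinx version before trusting it. The function names are made up for the example.

```shell
#!/bin/sh
# Assumed layout; adjust paths to match your sphinx.conf.
INDEX_DIR="${INDEX_DIR:-/var/data/sphinx}"
PIDFILE="${PIDFILE:-/var/run/searchd.pid}"

# Map "post.spa" to "post.new.spa", the name used for a pending rotation.
new_name() {
    base=${1##*/}
    printf '%s.new.%s\n' "${base%.*}" "${base##*.}"
}

# Copy one index to a remote host under its ".new." names, then ask the
# remote searchd to rotate by sending it SIGHUP.
push_index() {
    host=$1
    index=$2
    for f in "$INDEX_DIR/$index".sp?; do
        case "$f" in
            *.spl) continue ;;   # skip the lock file if one is present
        esac
        scp -q "$f" "$host:$INDEX_DIR/$(new_name "$f")" || return 1
    done
    ssh "$host" "kill -HUP \$(cat $PIDFILE)"
}
```

Usage would be something like push_index web2 post after the indexer run on the build box finishes, repeated for each index and each target host.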

amcd
11-26-2007, 08:15 AM
methinks he was talking about one server shared among many websites

UK Jimbo
11-26-2007, 08:23 AM
Now that I've had a coffee and woken up more I think so too :)

I think I read "distributed" or something :rolleyes:

weeno
11-28-2007, 02:35 AM
anyone interested in modifying the "similar thread" search feature to utilize sphinx?

arn

weinstoc
11-28-2007, 02:56 PM
This brings up a (novice?) pre-install question for me. If I use Sphinx for searching my big board do I need to build the search index for similar threads to work? Does this use the fulltext index or the standard vB one? And if the fulltext one why bother with Sphinx. Won't the fulltext cause the load whether it's used for searching or similar threads?

Chuck

weeno
11-28-2007, 06:25 PM
Well, i've never kept similar thread search on, but if you were to keep it on with a large db, I'd think it would cause the same problems as regular search.

I'm not sure I understand what your question is. But I guess my answer is I don't have similar threads on because it will hurt performance, but if sphinx is used, I could potentially turn it on.

arn

weinstoc
11-28-2007, 06:58 PM
Thank you. That was what I suspected.

Chuck

eoc_Jason
11-28-2007, 08:21 PM
In the functions_search.php file, locate the fetch_similar_threads() function.

You can bypass the stock search code and pass it to sphinx. You can have sphinx return relevancy scores so it should work.

I do not know how often similar threads are updated. Are they created only when the initial thread is, or checked with every reply? If you have a really large forum your similar threads might get stale rather quickly (assuming they are only searched on thread creation).

I'm working on a big re-write of the sphinx / vB search. Basically making it act more like google (or any real search engine for that matter).... I'll be sure to add in a clause for the similar threads though just to be complete.

When it's done I'll probably post the code here.

andrewkhunn
11-29-2007, 08:31 PM
Similar threads are only updated when the thread is created AFAIK. I have similar threads turned on with Sphinx, and I am pretty sure that Similar Threads is using FULLTEXT search with Sphinx ignoring everything it is doing currently.

Also, please code the aforementioned rewrite up with support for similar threads! This is hands-down the best mod for vBulletin right now and I can't imagine search being fixed before vBulletin 4.0 is released.

mute
12-01-2007, 04:53 PM
So, vB 3.7 features are out and look pretty darn cool.

However, still no sphinx support. If you go onto vB.com and look at the search page, it's quite different, and I have a bad feeling that our existing sphinx "addon" as it is referred to will not cut the mustard for those who decide to upgrade to 3.7 as opposed to waiting for 4.0.

Here's a post I made:

http://www.vbulletin.com/forum/showpost.php?p=1458896&postcount=1697

Is it me, or were they sort of insinuating that they were going to support Sphinx unofficially (like memcache) in 4.0?

I'm at the point where come 3.7 I'd be willing to shell out some cash to get 100% of the vB search functionality (tags, find threads by user, find posts by user) all hitting sphinx.

weeno
12-01-2007, 07:09 PM
I added a voice of support for sphinx in that thread.

I hope sphinx can get running in 3.7 easily.... as tags are another feature I have wanted for some time, and it looks very appealing.

Note that they separated out tags from the search functionality in the most recent revision. So there's a separate tags.php file that handles it. Depending on how tags are implemented, this may be "ok" to keep separate and not need Sphinx to take that functionality over.

arn

amcd
12-02-2007, 01:15 PM
I am not upgrading to 3.7 until sphinx search is available. It is going to be the single greatest factor for the decision.

mute
12-02-2007, 09:47 PM
In thinking about it, it's going to be ours too. We have a ton of other custom stuff but that typically doesn't take much work to upgrade. The Sphinx stuff is just something we can't live without. I wish Jelsoft would release unofficial support for it; I'm sick of the existing bugs we're running into and I don't have the time to support it myself.

eoc_Jason
12-06-2007, 01:29 PM
Guys, the bulk of the "slowdown" is/was simply the search of the post fulltext index. If your system lags doing other things (like finding all posts by username, with no query text) then you should probably look into optimizing MySQL and/or getting slightly beefier hardware.

Searching by tag or prefix + keyword shouldn't be too much of an issue. You should be able to add tag info to Sphinx and add a conditional. Alternatively you could just search the Sphinx index for all results, then run a secondary query afterwards to prune out the results that don't match a tag or prefix.

I do plan on upgrading to vB 3.7, so I am definitely going to make Sphinx work.

Spinball
12-06-2007, 01:49 PM
I think what this needs is some heroic person to go through this thread, and cherry pick the best ideas and release a full hack with complete instructions for the hopeless but enthusiastic users of Sphinx like me.

weinstoc
12-06-2007, 02:01 PM
We are using Sphinx search following the instructions elsewhere in this thread (including the config file example, etc.) However phrase search does not seem to work right. If I search for "one two" I get what I would expect for "one" AND "two" instead of "one" ADJACENT TO "two".

Is there a configuration parameter that can fix this? It's quite annoying.

Chuck

weeno
12-06-2007, 11:48 PM
In my limited testing of this, you would need to use Extended Query syntax:

see: http://www.sphinxsearch.com/doc.html#extended-syntax

which is a modification to the code (which is simple)... but then you need to be able to use quotes

"one two" for example.

And I think vB search somehow strips the quotes so they never get passed to Sphinx, since phrase searching doesn't work even if you turn on Extended Syntax. I was going to look into it, but never found the time.
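One way to narrow this down is to take vB out of the picture and run the same query through the command-line search tool that ships with Sphinx. If the phrase comes back correctly there, the quotes are being lost on the vB side. The sketch below is just that, a sketch: the config path and index name are assumptions, the helper names (mode_flag, strip_quotes, sphinx_cli_search) are made up, and the --config, --index and --phrase flags are as I remember them from the tool's usage output, so check them against your build first.

```shell
#!/bin/sh
# Assumed config location and index name; adjust to your install.
SPHINX_CONF="${SPHINX_CONF:-/usr/local/etc/sphinx.conf}"
SPHINX_INDEX="${SPHINX_INDEX:-post}"

# Print "--phrase" when the raw query is wrapped in double quotes,
# and nothing otherwise (the tool then uses its default match mode).
mode_flag() {
    case "$1" in
        \"?*\") echo "--phrase" ;;
        *)      : ;;
    esac
}

# Strip one pair of surrounding double quotes, if present.
strip_quotes() {
    q=$1
    case "$q" in
        \"?*\") q=${q#\"}; q=${q%\"} ;;
    esac
    printf '%s\n' "$q"
}

# Run a raw user query through the command-line search tool.
# Runs in a subshell with globbing off so words like *engi* stay literal.
sphinx_cli_search() (
    set -f
    flag=$(mode_flag "$1")
    words=$(strip_quotes "$1")
    # $flag and $words are unquoted on purpose so they split into arguments.
    search --config "$SPHINX_CONF" --index "$SPHINX_INDEX" $flag $words
)
```

For example, sphinx_cli_search '"one two"' would run a phrase match, while sphinx_cli_search 'one two' would match both words anywhere. The same detect-and-strip logic is what the PHP side would need before choosing a match mode.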

arn

Deriel
12-07-2007, 08:52 AM
Guys, the bulk of the "slowdown" is/was simply the search of the post fulltext index. If your system lags doing other things (like finding all posts by username, with no query text) then you should probably look into optimizing MySQL and/or getting slightly beefier hardware.

Searching by tag or prefix + keyword shouldn't be too much of an issue. You should be able to add tag info to Sphinx and add a conditional. Alternatively you could just search the Sphinx index for all results, then run a secondary query afterwards to prune out the results that don't match a tag or prefix.

I do plan on upgrading to vB 3.7, so I am definitely going to make Sphinx work.

I think there are people here, like me, that would be glad to donate to such project. And I do not need a "complete add-on" (ok, that would be cool), just a way to make vB 3.7 work with Sphinx with all new fluffy stuff (mainly tags and thread prefixes)

:)

TECK
12-07-2007, 02:33 PM
Is it me, or were they sort of insinuating that they were going to support Sphinx unofficially (like memcache) in 4.0?
Sphinx is supposed to make it into MySQL Falcon as default module, so vBulletin will support it, IMO. :)
I mean, why not use a good piece of software if it is part of MySQL already?

kerplunknet
12-18-2007, 04:14 PM
Hey guys, what happened to the instructions to set this up with vBulletin 3.6? Did the post get removed? If anyone could provide this again or point me in the right direction, that would be much appreciated. :)

Thanks!

mlx
12-18-2007, 05:05 PM
The instructions are still here: https://vborg.vbsupport.ru/showpost.php?p=1283359&postcount=387

Would be great if a moderator could add a note to the first post in the thread ;)

kerplunknet
12-18-2007, 07:56 PM
That is what I was looking for. Thank you very much :)

eoc_Jason
12-19-2007, 02:28 PM
You have to remember though that vBulletin has to maintain backwards compatibility, as most people are not running bleeding edge versions of PHP or MySQL. While search performance is a serious issue that the vB team has yet to address, I hope that they do offer some advanced choices for people that do know what they are doing. In short, it never hurts to hope, but don't hold your breath.

Probably next week since things will be slow and I'll be bored I'm going to download the 3.7 beta and take a look at the search code. But I'm not going to post anything based on 3.7 until it is actually released. I do plan on upgrading my forum to it as soon as it is stable / release, so you know I will be getting sphinx to work on it asap.

weeno
01-02-2008, 07:18 AM
Find the time to look into 3.7 yet, Jason?

arn

Xorlev
01-11-2008, 08:30 PM
From what I've been reading, the Sphinx addon doesn't have the ability to search for posts by user, right? And the only way is to make a new field for a fulltext index?

Well, I'm not sure about Sphinx 0.9.7, but on my own (not vBulletin) site running Sphinx 0.9.8 I was able to replicate every feature of the vBulletin search and then some using only the API calls in the Sphinx PHP API. If someone could summarize what's missing in the vB Sphinx implementation I could possibly help write in the features missing. It really takes some experience to be able to work with Sphinx, the documentation is rather lacking.

mute
01-11-2008, 08:55 PM
From what I gather, there is a way to "find all posts by user" and "find all threads by user" using Sphinx and appending a hidden prefix to each thread. I'm not running the latest vB so I couldn't get it working properly, but I did notice when I rebuilt my indexes with the additional information it bloated the index size up quite a bit.

I think I posted a page or so back about it.

amcd
01-11-2008, 09:08 PM
It is not needed as those searches do not use fulltext index. They use the index on the userid field which exists for both post and thread tables.

mute
01-11-2008, 09:15 PM
It most certainly IS needed. Those searches hurt on big sites, even with the indexes.

Xorlev
01-11-2008, 09:51 PM
One can easily set up the index to grab userids as a filterable attribute in sphinx.conf. I suppose what I'm trying to understand is why one would use a newly generated fulltext keyword like "userid_xxxx" when they could just add the userid to their integer attributes. I would think the performance of the non-fulltext filter would be greater, but I could be wrong.

PSS
01-15-2008, 11:07 PM
You have to remember though that vBulletin has to maintain backwards compatibility, as most people are not running bleeding edge versions of PHP or MySQL. While search performance is a serious issue that the vB team has yet to address, I hope that they do offer some advanced choices for people that do know what they are doing. In short, it never hurts to hope, but don't hold your breath.

Probably next week since things will be slow and I'll be bored I'm going to download the 3.7 beta and take a look at the search code. But I'm not going to post anything based on 3.7 until it is actually released. I do plan on upgrading my forum to it as soon as it is stable / release, so you know I will be getting sphinx to work on it asap.

Thanks it would be great if you'd share the 3.7 + Sphinx info! I'm not going to use Vb inbuilt search ever again.

Xorlev
01-19-2008, 04:29 PM
I thought you all that use (and program with) Sphinx might be interested to know that Shodan, the programmer of Sphinx, put up a wiki for the Sphinx community after we cajoled a bit. To show the fruits of that effort, I've written the documentation for the PHP API of Sphinx 0.9.8-r1065 on the wiki. It's a bit more friendly than the comments of sphinxapi.php

http://sphinxsearch.com/wiki/doku.php?id=php_api_docs

Most of the methods are valid for 0.9.7, but honestly, 0.9.7 is so outdated by 0.9.8 there's hardly a comparison. There's a lot of methods added in 0.9.8.

We haven't got around to updating the actual Sphinx documentation to 0.9.8 yet but it's planned.

jwksite
01-19-2008, 09:33 PM
Xorlev,

Have you replicated the Find all posts by user or Find all threads by user in Sphinx then?

I think why they're trying to add a fake keyword is that the search for user's posts/threads normally works by username and not userid. Therefore adding it as a sort field would be impossible as it's not an integer.

I understand that links to users' posts/threads from anywhere other than the actual search.php page would then be able to be turned into one based off the userid instead.. But how do you suggest turning the username into a userid on the actual search.php page? I have only just started looking into Sphinx but I certainly have the capability of learning this really quickly so if you or somebody else could chime in that would be great.

I'm going to be trying to get Sphinx up on my forum in the next few weeks, but I really wanted to hold off until someone comes up with a solid solution to moving "Find all posts/threads" to Sphinx.

On that note.. does this seem to be a good solution? (https://vborg.vbsupport.ru/showpost.php?p=1326804&postcount=4) Someone replied in that thread that they had to tweak the search to show actual POSTS and not THREADS, though.

Or would using the userid ultimately be a nicer solution? It'd still require running a query (only when searching right from Advanced Search) to grab the username, though, I assume.. but nothing that would be as awful on MySQL as the current username search. :)

Xorlev
01-19-2008, 10:27 PM
Well, let's see here. What I'd do is grab the userid being searched for (either directly by id, or by running the username through a query and getting the id back).

Note: This uses my knowledge of Sphinx 0.9.8. I'm not sure about the limitations of 0.9.7.

Show all threads started by user:

// Other stuff (sorting, etc.)
// Now we grab all threads started by userid.
// Note: SetFilter() takes an array of values, so $userid is an array of ids.
$cl->SetFilter('postuserid', $userid);
$cl->Query('', 'thread;threadelta');

Here we can then grab the threadids from the matches array.

Show all posts by user:

// Other stuff (sorting, etc.)
// Now we grab all posts by userid.
$cl->SetFilter('userid', $userid);
$cl->Query('', 'post;postdelta');

Grab the postids, retrieve, and show.

Show all threads posted in by user:

// Other stuff (sorting, etc.)
// Now we grab all posts
$cl->SetFilter('userid', $userid);
$cl->Query('', 'post;postdelta');


Now all we have is a bunch of posts. But, what we also have is "threadid" in the attributes of each post. We can collect them, filter duplicates, then retrieve the threads.

Whatever fuzzy matches were done on the username can be done on the user table and returned. The filter value $userid has to be an array anyway, so we can pass it multiple userids we grabbed from the user table. I'm not entirely familiar with vBulletin's classes and such, but I might have a go at writing in Sphinx 0.9.8 support.

Edit:
I just looked back again and reread this:
# You can only use Sphinx to perform queries that have a full text component. So searches by userid/forumid WITHOUT a key word are not possible. These searches can run on indices though so they shouldn't be an issue.

* Workaround by kmike. "You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345."


That's one of the changes in Sphinx 0.9.8. It supports scans, so a blank query returns all results that fall within your filters. Sphinx 0.9.8 will apparently be far easier to implement than 0.9.7.

jwksite
01-19-2008, 11:41 PM
This sounds promising.

The one problem I can see now is searching with "Exact Name" unchecked? It sounds like this is only going to rely on ONE userid. I guess while this new feature of Sphinx is being worked on that we could FORCE "Exact Name" to avoid any issues.

Anyway, I've read through this entire thread by now and this seems like the first real hint of promise since before orban left!

You said a few posts ago that you were willing to help fill in the blanks, so would you be so kind? :) Some kind of outline of changes between the old implementation and what you've done for the new 0.9.8 would be really great. I guess the biggest blank is how you've changed sphinx.php. I imagine that the rest of the old method is mostly unchanged? (Except for a new sphinxapi.php?)

Haha, sorry.. I don't mean for that to sound demanding, obviously. I'm just ready at any point over the next few days to install 0.9.8 and I really want to help get any issues worked out!

Let me know your thoughts on the "non-Exact Name" search, too. I think even in the meantime that it could just reroute to the old vBulletin search?

Edit: Actually, I just guess the exact name thing would rely on how the query to return the username is formed? I'm sure Sphinx will support searching over multiple userids, right?

Xorlev
01-20-2008, 12:28 AM
It currently supports non-exact checking. The nice thing about sphinx filters is you can pass it an array.

I'm actually working on it right now. Mostly I just ripped out vBulletin's search completely rather than the conditional "is the query blank or not?" and handling with vB's or Sphinx. Right now I'm working on adding the features sphinx.php lacks from the advanced search as well as all the sorting modes. If it's on the advanced search page, I'll add it.

After that it's most likely sanity checking to make sure it's all working.

Coventry: All we want to do is exclude posts/threads from userids in the list, right?

andrewkhunn
01-20-2008, 03:27 AM
I believe so, yes.

Also, if it's not too much trouble... If you could see about getting the "Similar Thread" search functionality pawned off to Sphinx too, that would be great. It's not that big of a deal for me because it only fires on new threads (might be for some I suppose), but being able to drop the standard FULLTEXT tables from the DB would be nice.

TECK
01-20-2008, 03:24 PM
Right now I'm working on adding the features sphinx.php lacks from the advanced search as well as all the sorting modes. If it's on the advanced search page, I'll add it.

You can have all the vBulletin search features working with 0.9.7 if you know how to code. :)
The only feature I like in 0.9.8 is the usage of wildcards; that's the only option currently missing in 0.9.7 to make it fully compatible.

I do not use Orban's hack, nor do I use his coding approach. It is very bulky.

amcd
01-21-2008, 06:56 AM
What do we have to do to convince you to share your work?

ferreo
01-22-2008, 01:18 AM
This is a silly question but after going through this thread i couldn't find a definite answer.

Is the fulltext search index still used or can I drop it? Reason I ask is that it got corrupted somehow today and I am debating to drop it and not have it repaired. Thanks!

Xorlev
01-22-2008, 01:29 AM
If you use similar threads, then yes. If not, you can delete them. I'm working on replacing similar threads as well.

ferreo
01-22-2008, 01:36 AM
Excellent, thank you very much! I will give it a shot.

Xorlev
01-22-2008, 02:06 AM
Er, by that I mean "yes you need them." similar threads will fail to work otherwise, if you don't use them you can delete them. I don't believe anything else other than search and similar threads use them though.

amcd
01-22-2008, 04:52 AM
I don't use similar threads and I dropped the fulltext indexes ages ago. No problems whatsoever.

xnetco
01-24-2008, 02:19 PM
Hello.

I'm not sure if this is a bug but maybe someone could help. I've installed Sphinx on a board, but although the administrators can search, users can't. Anyone seen this before?

Tim

jwksite
01-25-2008, 11:22 AM
Well, I seem to have gotten everything installed OK... but sorting by Relevancy returns 0 results...

Any reason why it should be doing this?

I am using Sphinx 0.9.8 and it did give me a few warnings about things in my sphinx.cron being deprecated, but the other parts of the search work just fine.

I followed the steps properly, so should I be looking in sphinx.php for the error or search.php? Thanks.

Resolved... I just used a different sphinx.php file. The one that Weeno supplied is brilliant! I was using a pretty hacked up one, I guess. :)

Actually.. I guess the only thing that does work is relevancy (aside from dateline)? I thought the sort_search_items() fix supplied a few pages ago was meant to allow searching with all the different sort options again. So I guess sorting by Thread, Username, etc is still not possible?

OK, I guess just anything numeric works, so sorting by dateline, replies, views, and relevancy works, but not the others. *edits the search template again*

---------------

Now... My next question is are we supposed to (or is it even needed) make use of "SphinxSE" if we have MySQL 5? I was reading that it plugs into MySQL for faster something-or-other but I'm not quite sure I get what it does.

Thanks everyone who has contributed! I'm really tickled to be able to search through 3M posts again!

Xorlev
01-25-2008, 03:50 PM
Well, I seem to have gotten everything installed OK... but sorting by Relevancy returns 0 results...

Any reason why it should be doing this?

I am using Sphinx 0.9.8 and it did give me a few warnings about things in my sphinx.cron being deprecated, but the other parts of the search work just fine.

I followed the steps properly, so should I be looking in sphinx.php for the error or search.php? Thanks.

Resolved... I just used a different sphinx.php file. The one that Weeno supplied is brilliant! I was using a pretty hacked up one, I guess. :)

Actually.. I guess the only thing that does work is relevancy (aside from dateline)? I thought the sort_search_items() fix supplied a few pages ago was meant to allow searching with all the different sort options again. So I guess sorting by Thread, Username, etc is still not possible?

OK, I guess just anything numeric works, so sorting by dateline, replies, views, and relevancy works, but not the others. *edits the search template again*

Now... My next question is are we supposed to (or is it even needed) make use of "SphinxSE" if we have MySQL 5? I was reading that it plugs into MySQL for faster something-or-other but I'm not quite sure I get what it does.

Thanks everyone who has contributed! I'm really tickled to be able to search through 3M posts again!

I personally don't use SphinxSE. It's just another interface for Sphinx.

It is possible to sort by thread, username, and forum name. It just requires sorting on an ordinal attribute (which converts the text to an ordinal number for sorting). It'll be in mine at any rate.
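
As a rough illustration of what an ordinal attribute does (a sketch of the idea only, not Sphinx's actual code): each distinct string is replaced by its position in the sorted list of all values, so sorting on that integer reproduces alphabetical order. The function name, the sample usernames, and the case-insensitive comparison are all assumptions made for this example.

```python
def to_ordinals(values):
    """Map each string to its 1-based rank among the sorted distinct values."""
    # Case-insensitive comparison is an assumption made for this sketch.
    ranked = sorted(set(v.lower() for v in values))
    rank_of = {name: i + 1 for i, name in enumerate(ranked)}
    return [rank_of[v.lower()] for v in values]


# Hypothetical usernames attached to four search results.
usernames = ["orban", "TECK", "amcd", "orban"]
ordinals = to_ordinals(usernames)

# Sorting the result positions by the integer attribute gives alphabetical order.
order = sorted(range(len(usernames)), key=lambda i: ordinals[i])
sorted_names = [usernames[i] for i in order]

print(ordinals)
print(sorted_names)
```

The catch is that the ranks are only valid for the set of strings that was indexed, so they have to be rebuilt whenever the index is rebuilt.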

jwksite
01-26-2008, 07:50 AM
Ah, well I found sorting by forum actually works off the ID, as it groups them correctly. Not sure how the original sorting worked, whether it was by name (alphabetically) or numerically.

Thread and username would be nice though.. :)

TECK
01-27-2008, 09:45 AM
Once you change or modify a forum and your IDs are no longer in the proper order, your forum name sorting is gone. Use forum.title sorting to get proper results; do not use the forumid. Also, you will never be able to properly sort by thread title, no matter what. This is the only feature I have disabled in vBulletin 3.7; everything else is working perfectly.

You should upgrade to 0.9.8 r1065; there are a bunch of cool new features that will make your programmer life very easy.
I never had any file edits with vBulletin and Sphinx 0.9.7... but it helps to toss a bunch of code that can slow down your cluster performance. I removed more than half of the old PHP code once I upgraded to the latest Sphinx version. :)

It is really cool how smart the engine got. Now I can search for queries like:
search engine (query) OR
search engine (query) + nginx (tag) OR
*engi* (query) + Floren (username) + nginx (tag) OR
Floren (username) + nginx (tag) OR
nginx (tag) OR
Floren (username)

There is a bug with wildcards, but you can address it with this patch:
sphinx.cpp:3418
bool bSpecial =
( iCode & FLAG_CODEPOINT_SPECIAL ) &&
- !( ( iCode & FLAG_CODEPOINT_DUAL ) && m_iAccum );
+ !( ( iCode & FLAG_CODEPOINT_DUAL ) && m_iAccum ) && m_pCur[-1] != '*';

sphinxquery.cpp:900

int iSpecial = sToken
? ( IsSpecial(sToken[0]) ? sToken[0] : 0 )
: QUERY_END;
assert ( !( iSpecial>0 && sToken[1]!=0 ) );

+ if(sToken && sToken[0] == '*') iSpecial = 0;

Searches can be sorted by:
- Relevancy
- Number of Replies
- Number of Views
- Thread Start Date
- Last Posting Date
- User Name
- Forum Name
Each of these supports ASC/DESC sorting filters.

As you can see, the only sorting option missing is Thread Title.
Since the thread title is already a field Sphinx searches, I would not recommend also handling it through an ordinal column. The stored data is already huge: the ATTRS array has 13 keys to index the search data. From my tests, with the current array you get a ratio of about 5 posts per 1KB, where each post averages 20-30 words. Doing the simple math, 5 million posts inflate to a whopping 1GB or so of search data. The problem is not the speed, Sphinx can shred 1 terabyte of data in less than 3 seconds... but still, expect to lose a few gigs on your disks if you manage a large board. :)
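
The estimate above can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the 5 posts per 1KB ratio is the figure quoted in this post, and the function name is made up for the example. Real index sizes will depend on the attributes and post lengths on a given board.

```python
POSTS_PER_KB = 5  # ratio quoted in the post, measured on one particular setup


def estimated_index_kb(post_count, posts_per_kb=POSTS_PER_KB):
    """Return the estimated size of the search data in kilobytes."""
    return post_count / posts_per_kb


for posts in (3_000_000, 5_000_000):
    kb = estimated_index_kb(posts)
    # 1 GB is taken as 1024 * 1024 KB here.
    print(f"{posts:>9,} posts -> {kb:,.0f} KB (~{kb / 1024 / 1024:.2f} GB)")
```

For 5 million posts this comes to 1,000,000 KB, which is just under 1GB when a gigabyte is counted as 1024 * 1024 KB.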

If you deal with a board that has over 5 million posts, make sure you experiment with the max_iosize and max_iops settings. So far, these are the best values I have found:
mem_limit = 64M
max_iosize = 1048576
max_iops = 40

Another nice feature I like is the seamless rotation you can now perform with the indexer.
Very elegant. I am truly impressed with Andrew's latest version.

jwksite
01-27-2008, 11:02 AM
Is the command for this new rotation any different? Or is the same command just performed differently? Just curious since I'm running the latest version and surely want the best performance out of it.

Also, I have only close to 3M posts, and I thought for sure the indexer data was already approaching 1GB if not just barely exceeding it. Care to share your .conf file? :D

And, like amcd said earlier on "What do we have to do to convince you to share your work?" :)

I would certainly like to figure out the proper optimizations for 0.9.8. I'm using a deprecated sphinx.conf file, but I was too impatient to get Sphinx working in the first place to go in and use the new attributes. Now that I know it's working very well, I think I can finally get around to the tweaking stuff.

TECK
01-27-2008, 03:08 PM
Sorry, I cannot share my work because it is part of the optimization package I offer as a paid service to my clients (like running 3 queries on the vBulletin frontpage without file edits). I spent countless hours through sleepless nights testing various scenarios to get it working perfectly. Basically, it is a combination of server optimizations and PHP code modifications. I also tested the optimization package on several large boards with over 5,000 users online. It improved the overall vBulletin performance and saved them a lot of money on cluster upgrades:

Optimizations Enabled
500 online users, 10 seconds parallel test
779 fetches, 500 max parallel, 1.564e+07 bytes, in 10.0024 seconds
20077 mean bytes/connection
77.8815 fetches/sec, 1.56363e+06 bytes/sec
msecs/connect: 1.50359 mean, 131.5 max, 0.07 min
msecs/first-response: 1674.58 mean, 8914.71 max, 97.031 min

1000 online users, 10 seconds parallel test
633 fetches, 1000 max parallel, 1.26093e+07 bytes, in 10.0019 seconds
19919.9 mean bytes/connection
63.2879 fetches/sec, 1.26069e+06 bytes/sec
msecs/connect: 124.028 mean, 3007.64 max, 0.073 min
msecs/first-response: 1971.85 mean, 8686.33 max, 1.099 min

Optimizations Disabled
500 online users, 10 seconds parallel test
269 fetches, 500 max parallel, 5.34661e+06 bytes, in 10.0094 seconds
19875.9 mean bytes/connection
26.8747 fetches/sec, 534159 bytes/sec
msecs/connect: 0.904112 mean, 85.94 max, 0.074 min
msecs/first-response: 3783.43 mean, 9272.46 max, 1.373 min

1000 online users, 10 seconds parallel test
263 fetches, 1000 max parallel, 5.32541e+06 bytes, in 10.0039 seconds
20248.7 mean bytes/connection
26.2897 fetches/sec, 532332 bytes/sec
msecs/connect: 0.824681 mean, 78.576 max, 0.071 min
msecs/first-response: 3868.94 mean, 8879.29 max, 165.277 min

Test Explained
The test emulated the given number of users, from various locations, each trying to access the forums as many times as possible.
1000 online users, 10 seconds parallel test means that 1000 users visited the forums without interruption for 10 seconds, navigating through pages as fast as they could.
63.2879 fetches/sec means that those 1000 users together managed to fetch about 63 pages per second, for a total of 633 fetches over the 10 seconds... pretty fast those guys on their browser. :)

Test Comparison
OPTIONS        Optimizations Enabled     Optimizations Disabled
-----------------------------------------------------------------
Users          1000        500           1000        500
-----------------------------------------------------------------
Fetches        633         779           263         269
Fetches/sec    63.28       77.88         26.28       26.87

In other words, with the optimizations enabled, throughput was about 2.9 times higher at 500 users (77.88 vs. 26.87 fetches/sec) and about 2.4 times higher at 1000 users (63.28 vs. 26.28 fetches/sec).
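
To check the ratio against the figures in the comparison table, here is a small sketch. The numbers are copied from the table; the variable and function names are made up for the example.

```python
# Fetches/sec copied from the comparison table, keyed by number of users.
enabled = {500: 77.88, 1000: 63.28}
disabled = {500: 26.87, 1000: 26.28}


def speedup(users):
    """Ratio of optimized to unoptimized throughput for a given user count."""
    return enabled[users] / disabled[users]


for users in (500, 1000):
    ratio = speedup(users)
    gain = (ratio - 1) * 100
    print(f"{users} users: {ratio:.2f}x throughput, {gain:.0f}% more fetches/sec")
```

Note the difference between "times as fast" and "percent faster": a ratio of roughly 2.9 corresponds to about 190% more fetches per second.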

As I said before, I do not use Orban's code approach in any way, shape or form... I don't have one single line of PHP code identical to his, and the Sphinx configuration file is exactly the opposite of what you think it should look like.

amcd
01-27-2008, 05:46 PM
Well, this is a community forum for sharing. If you are only going to brag about your code and not share it, IMHO it should not be allowed.

TECK
01-27-2008, 09:32 PM
Well, this is a community forum for sharing.
If it is a community forum for sharing, why do they allow paid services here?

If you are only going to brag about your code and not share it, IMHO it should not be allowed.
I did not brag about it. I actually posted valuable information (https://vborg.vbsupport.ru/showpost.php?p=1429953&postcount=521) for programmers. Why should I not be allowed to talk about code? It is a coding forum, after all... not a communist zone where moderators dictate what regular users can post. As you have noticed already, there are many people who post what they achieved with Sphinx without sharing any code... I'm not the only one. Any experienced programmer will be able to adapt the Sphinx API and its search engine to vBulletin in no time... because it is very simple to implement. Not to mention that you do not need any file edits.

andrewkhunn
01-27-2008, 11:45 PM
Sorry, I cannot share my work because it is part of my optimization package I offer as a paying service to my clients (like running 3 queries on the vBulletin frontpage without file edits).

Do you have a link or any more information about your optimization services?