Aha! Okay, I figured out how to change/bypass the MySQL settings for Minimum Word Length and the Stopword lists.
If anyone else is interested, here's what you have to do:
NOTE: The following instructions apply to MySQL 4.0.18. I believe they are valid for 4.0.3 and above.
1. In your MySQL option file ("C:\Windows\My.ini", for example), under the server group ("[mysqld]", for example), add the following two items:
Code:
ft_min_word_len=2
ft_stopword_file=""
The first item sets the minimum word length to be indexed. In my example, I have it set to 2 characters.
The second item bypasses the MySQL Stopword list - which covers common words like "from", "and", etc...
Alternatively, you can use your own Stopword list. As far as I know, the default list is built into the server rather than read from a file, so you would create a text file of stopwords and set ft_stopword_file to its path instead of "".
Use and/or edit either of these two options to suit your needs.
2. After you have made the above changes, restart MySQL (or, alternatively, the entire server).
3. From your AdminCP, execute the following queries to rebuild the fulltext indexes:
REPAIR TABLE post QUICK
REPAIR TABLE thread QUICK
If you use Table Prefixes, edit "post" and "thread" accordingly.
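If you want to double-check that the restart picked up the new settings, and see what the repair queries look like with a prefix, here's a rough sketch. The "vb_" prefix is just a made-up example; substitute your own.

```sql
-- Check that the server picked up the option file changes.
-- ft_min_word_len should show 2 and ft_stopword_file should show an empty value.
SHOW VARIABLES LIKE 'ft_%';

-- Rebuild the fulltext indexes so they reflect the new settings.
-- "vb_" is a hypothetical table prefix, not something from the steps above.
REPAIR TABLE vb_post QUICK;
REPAIR TABLE vb_thread QUICK;
```

If the variables still show the old values, the option file probably isn't the one MySQL is reading, or the items ended up under the wrong group.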
Here is one optimization. As I have time to work on this for 3.1, I'll try to post changes for you.
With 3.0.2 or 3.0.3 you should edit search.php and look for this around line 1303:
FROM " . TABLE_PREFIX . "post AS post
Change this to:
FROM " . TABLE_PREFIX . "post AS post " . iif($vboptions['fulltextsearch'] AND $searchuser, "USE INDEX (userid)") . "
When searching for posts by a specific user and returning results as posts, this will force MySQL to search based on userid rather than using the fulltext index. On the whole this will be faster than searching fulltext and then manually scanning for userids.
I'm just getting into imposing the proper limit options and re-evaluating the relevancy junk for the non-fulltext search. It is the extra queries we have in place to support this pseudo-relevancy that complicate limiting searches when returning results as threads.
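To see whether the hint actually changes the query plan on your server, you could run EXPLAIN on a cut-down version of the search query. This is only a sketch: the real query in search.php is more involved, and the column names (postid, title, pagetext), the userid 42 and the search word are assumptions for illustration. It also assumes the fulltext index covers exactly (title, pagetext) and that you have no table prefix.

```sql
-- Without the hint: see which index MySQL picks on its own.
EXPLAIN SELECT post.postid
FROM post AS post
WHERE post.userid = 42
  AND MATCH(post.title, post.pagetext) AGAINST ('mysql');

-- With the hint: ask MySQL to use the userid index instead.
EXPLAIN SELECT post.postid
FROM post AS post USE INDEX (userid)
WHERE post.userid = 42
  AND MATCH(post.title, post.pagetext) AGAINST ('mysql');
```

Compare the "key" column of the two results. If both show the fulltext index, the hint isn't being honored on your MySQL version.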
FWIW, I've found removing the relevance checks speeds up the search, and the quality of the search is not really affected.
Also, like you say, it's really returning the results as threads that's the site killer for large forums, even with optimization and fulltext. Returning results as posts is not an issue with the right search code.
I did this, and cleared the search index, but it's still showing:
Index Usage 50.31 MB
in the AdminCP stats. Should that be zero?
I'll give you my best guess, which is that it should NOT be zero. When you enable FullText Searching, the two queries that you run create indexes within your "post" and "thread" tables. Although these are fulltext type indexes, I would think that by virtue of being part of your database, they will still show up as part of your Index usage.
One of the Devs might be able to chime in to say whether I'm way off the mark or not...
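In the meantime, one way to check this yourself is to ask MySQL directly. A minimal sketch, assuming no table prefix (adjust "post" if you use one):

```sql
-- List the indexes on the post table. Fulltext indexes are marked FULLTEXT
-- (in the Index_type column, or in Comment on older versions).
SHOW INDEX FROM post;

-- Index_length reports the bytes used by all indexes on the table,
-- fulltext ones included.
SHOW TABLE STATUS LIKE 'post';
```

If a FULLTEXT index is listed and Index_length is non-zero, that would explain why the AdminCP figure isn't zero.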
Also, like you say, it's really returning the results as threads that's the site killer for large forums, even with optimization and fulltext. Returning results as posts is not an issue with the right search code.
hmmm... as a temporary solution, do you think it would improve performance to forbid returning results as threads and limit the search options to posts?