![]() |
sorry, but I can't test it with multibyte language...
But problem may be in too strict input filtering. Please try to replace $query = trim(preg_replace("~[\.,:;@=']~", " ", $query)); $query = preg_replace("~[^\w\xC0-\xFF\"\-+\* ]~i", "", trim($query)); with $query = preg_replace("~['<>]~i", "", trim($query)); but i still don't think that it will help... |
No, it didn't work either. It does not work even if I remove that part completely.
I think the issue lies between fulltext itself and my language character. I tried to do an search using SELECT * FROM post WHERE MATCH(title, pagetext) AGAINST('汉字') where inside the AGAINST is some chinese character, it returns nothing. Although I know that there are posts with these words. Is fulltext doing a partial match be default? Thanks for your help. |
Installed two days ago on a 1.5 million post VB. Works like a charm.
On our database server (dual AMD 2400MP system, 3 gigabytes RAM) we're seeing a load of about .25 during peak now (12.5% load). Used to be about .50. I was more concerned with overall response time, not load, and this has helped with both. Our front end system (dual AMD 2800MP system, 4 gigabytes RAM) still has a crap load, 1.50 - 2.50 and that's the one regret I have with moving from VB 2.x to VB 3.x. Our front end server used to run about a .75 to 1.5 load. :( |
I have 8+ years in system administration, and could try to help you lower that frontend system load if you wanted, with dual amd 2800, your load should be no where near that high.
|
I've been programming since 1979, sys admin since the early 90s. This is not a case of lack of experience, the system really is loaded down with VBulletin. We're quickly approaching a terabyte of bandwidth per month from the front end server. 10s of millions of page views each month, 1000-1500 users at once in the forums (except this week has been slower due to the Easter/Spring Break holiday).
Vbulletin eats up an enormous number of CPU cycles compared to the prior version. For instance, instead of having the language in the templates, language is now parsed in, adding overhead to every template built. In the last 24 hours, I've done a few things to reduce load not directly associated with VBulletin. For one, I'm running mod_gzip and it looks like image.php and attachment.php where being compressed (avatar.php was left over from the prior httpd.conf config and is now removed). Not a huge amount of CPU overhead reduced by not compressing these, but every bit helps. Additionally, at any given time we will have between 20-100 Google, Yahoo and Inktomi spiders crawling the site. That eats up a lot because they fetch more pages per minute than the average real user. I certainly don't want to turn them away, search engine spidering is crucial to success on the web these days.... |
We are confident that this hack causes some unnecessary mysql tables locking, raising server load to the roof. It was not so with vb2 version of fulltext search hack. Could it be that LEFT JOIN with thread table in the main SELECT causes that locking? VB2 version had a simple
PHP Code:
|
LEFT JOIN don't only add server load :) (besides it don't add too much - you can try to remove that lLEFT JOIN and look yourself at timings with debug=1)
If you have large forum with many different users and user groups and with many forums invisible for some of that usergroups or users, that LEFT JOIN make fulltext search run faster (and sometimes much faster) because some (or many) posts will be excluded from MATCH => AGAINST (and that part of request takes the most part or server load) Second reson is limiting mumber or results returned by search. vb2 version of fulltext search first do lookup in ALL posts with limiting number of matches and after it cut (from that already limited number) more posts looking at user permissions. So, I don't think that vb2 fulltext search works as it must work - sorry :) |
and if you start talking about server load, here I have one thing that really takes too much processor time.. My pain in that hack is relevance when seaching with modifiers (IN BOOLEAN MODE).. If you feel that you can live without relevance when seaching with modifiers, you may try to replace string starting from $fulltext_sql with something like it:
Code:
$isboolean="'"; |
Yes, forum is quite large, about 2,000,000 posts and over 1000 users online in the daytime.
I'll try to dig further, but fact stands as it is: only yesterday on the same hardware with vb2 fulltext search server load was around 2-3 in daytime and less than 1.0 in night time, whereas today it sometimes spikes to 30.0, and I'm seeing hundreds of locked mysql threads besides running search query. Not acceptable - I had to disable search completely for now (BTW, global "enablesearches" setting in admin CP is currently broken). |
Quote:
|
My database server has had no increase in load with the hack installed. 1.5 mil posts and 1000-1500 users online. My front end server is my concern, its been a dog since the day I upgraded from VB2 to VB3.
|
Quote:
|
motorhaven: you think ur is bad, i run always around 5 prolly and get up to 20 alot. I don't have the $$ to get a sep db server or http server. just running dual xeon 2.8 2gb ram. still on vb2 tho with too many hacks.
|
Database server was not issue with vb2 and is not issue with vb3 - db server load rarely reaches 2.0, though we didn't see whole 1300+ users online yet with vb3. Will see how it goes on Monday.
When I wrote about server load 30, I meant frontend server - http processes were stacking up on locked mysql queries, causing peaks of load. With search disabled, frontend server is humming nicely with average load around 1-2. Maybe a little higher than with vb2 at the same day/time, but acceptable. |
Our front end server had no increase in load since adding fulltext. Its just as crappy after fulltext as it was before! Here's a portion of my.cnf:
max_connections = 200 key_buffer_size = 496M # myisam_sort_buffer_size is used when repairing tables only. #myisam_sort_buffer_size = 48M join_buffer_size = 2M read_buffer_size = 2M read_rnd_buffer_size = 2M; sort_buffer_size = 4M table_cache = 1536 thread_cache_size = 250 wait_timeout = 3000 connect_timeout = 60 max_allowed_packet = 8M max_connect_errors = 10 query_cache_limit = 2M query_cache_size = 24M query_cache_type = 1 thread_concurrency = 8 # Full text search fine tuning ft_min_word_len=3 ft_max_word_len=25 ft_stopword_file=/home/database/mysql/stopwords flush_time=86400 3 gigs of RAM on the server.... every bit of the database is either in MySQL's buffers or the Linux disk cache. :) ft_max_word_len=25 really helped to decrease the size of the post index file. It specifies the maximum word length to index. There aren't very many useful words longer than 25! |
Quote:
|
Quote:
|
Here's an idea for those with disk space (and index RAM to spare)...
What about 2 post tables? One without the fulltext index, and the other with it. Searches would be pulled from the secondary post table with the fulltext index which would eliminate locking issues with the primary post table. Of course this would require every insert, update and delete to the post table to be duplicated to the secondary table. I'm beginning to suspect the post table lock mentioned may, as noted previously, cause lock issues causing our front end server to sit in an I/O wait for the data. This would explain why the front end server sees a large load and the database doesn't. I increased the key buffer size on the database server from 500 meg to 1 gigabyte and the query cache to 48 megabytes and saw no real difference on the database server but did see a load decrease on the front end server! Just a thought..... |
Having 2 post tables won't help if they're updated simultaneously - locking just shifts from first to second post table. Now, if new posts/edits are queued and dropped into second table in batches (via INSERT DELAYED or UPDATE LOW PRIORITY), it would make more sense and indeed will eliminate most of locking.
|
Quote:
Typical limiting condition consists of something like: thread.forumid [NOT] IN(25,197,68,159,193,191,120) sometimes with posts.userid IN (100,200,300) added. Now, if you run "EXPLAIN" on resulting queries, you'll see that first index used by mysql is FULLTEXT index on pagetext, and only then threadid index is applied using "where". It means ALL posts are being scanned first using FULLTEXT. The only search type where that LEFT JOIN really limits number of posts to search is a search within thread - EXPLAIN shows that first used index is threadid, and only then fulltext index on pagetext is used. But... I suspect mysql has a bug here, though - actual search time is exactly the same as without "AND thread.threadid=nnn" condition in WHERE clause, which suggests that fulltext index is used here first anyway. Quote:
Also, now that vb3 has a logic for caching search results, querying all posts and limiting results afterwards depending on user's permission or search preferences suddenly appears quite logical, isn't it? Chances are that another user will run search with the same query, in that case we'll just pick up saved post ids and apply query logic filter to them. |
Thinking about two post tables idea, it could go really well along with mysql replication.
Idea: set up second mysql server (even on the same physical machine), set it up to replicate post table from main forums db. All updates to that second mysql server will be handled by only one slave thread, and it could be locked freely without any effect on first mysql server perfomance. Since JohnWoo's hack uses two tables in the main query, it appears that thread table need to be replicated along with post table. Changes to current hack code would be really small: initialize another db connection and fire query to a second db instead of main one. I don't plan to implement this any time soon, though accidentally we have workng replication of our forum db to another server. I'm polishing a vb2-like hack of this (JohnWoo's) hack :) . It eliminates LEFT JOIN and always searches IN BOOLEAN MODE. Had to drop search by relevance though, I highly doubt someone used it even once though. |
i will try..and see what happen. Allway can come back .
|
i dont have to wait to much...
and work Fine whit 3.0.1 mysql> ALTER TABLE post ADD FULLTEXT (title); Query OK, 1762278 rows affected (4 min 2.67 sec) Records: 1762278 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE post ADD FULLTEXT (pagetext); Query OK, 1762278 rows affected (14 min 59.70 sec) Records: 1762278 Duplicates: 0 Warnings: 0 mysql> OPTIMIZE TABLE post; +----------+----------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------+----------+----------+----------+ | vb3.post | optimize | status | OK | +----------+----------+----------+----------+ 1 row in set (1 min 21.15 sec) |
mmm same problem whit server load...
Treads: 112,110, Post: 1,745,876, Members: 29,111 First day work fine whit 400 users online, second day server go to 8...11.11!!!!!! i have to uninstall...some fix? |
It's great, mysql load average goes as down as it can ;-) Thanks.
|
I've installed this!
It's working well... I'll look over the server loads over the next few days and report on it. Overall, took me only 3 hours... most of the time it was trying to find out why I couldn't add a fulltext index to post, to find out that my secured tmp directory was too small, so I had to modify my.cnf to change the tmpdir temporarily. :) |
Sorry for disappearing :)
Returning to discussion about SQL requests with or without LEFT JOIN :) I recently had a chance to do a lot of tests on one large programming forum with about 900.000 posts. And yes - it is true that excluding some forums from searches do not make search run faster... But including left join don't make it perceptible slower :) But after removing revelance it runs up to 10 times faster! Here are results as requests and time in seconds below. Each request was executed 10 times with clearing DB cache after each. Fastest and slowest time below. Forums were not closed at that moment and there were about 120 online users. Think that numbers too different because of it :) Code:
============ |
Well, my current server loads seem a bit lower than before I installed the hack. But it's not peak time yet. Search appears to be noticeably faster.
John, just tell me which code to use that is the fastest and puts the least amount of stress on the server, and I'll use it. :) Do I just remove the RELEVANCE part in search.php? |
yes :) to remove revelance replace
Code:
//fulltext search query Code:
//fulltext search query Code:
$postscores[$thispost["postid"]] = $thispost["relevance"] * $vboptions['multimatchscore']; |
Thanks again JohnWoo...just modify that in my file....in a few days will see :D
|
Upgraded. Thanks a lot! Searches are definitely FASTER than before.
I'd vote for this hack to be Hack of the Month, no problems. :) It's something that vBulletin should really have as default. |
I've put full text search back in. Works fine. Turns out a lot of the performance issues on the site were crawlers from China/Korea. I've completely blocked both countries from the server and load dropped in half during prime time. Because my site is about trucks, virtually no one from those countries is a legit visitor, and we don't ship to them so its not a loss if a handful can't visit. Until I can come up with a better long term solution blocking is the best route (these crawlers don't obey robots.txt).
Also, I'm in the stages of finishing out a "final" solution (for now) to performance issues related to search and the post table. Basically, every time a post is added, edited, moved or deleted, that post information is changed in a secondary table (postsearch). Searches are conducted in this table, all other activity is in the primary table. Off course, this means two tables are updated/locked every time there is a post/edit/move/delete but the number of these per day is minor compared to the number of table locks during load. Initial tests look very good, just have to make sure there are no bugs. :) |
We're running a cross-breed between this hack and fulltext search hack for vb2 (boolean queries only, no LEFT JOIN on thread table) for more than 2 weeks now. Runs smoothly and without any problems so far.
I think the total failure of this hack on our board is due to the fact that we have about 350 new threads per day, most of them happen in daytime. And mysql slow query log shows that some search queries take up to 30-40 seconds to complete. According to stats, there're about 15 of such slow queries per hour, likewise, with daytime peak. Couple that with LEFT JOIN on thread table, and you'll get quite a high chance of locked mysql process for new thread creation for every such search. What's worse, I think select queries will be locked as well as they wait in line for INSERT process to complete (captured mysql process list during lockup proves this). I'm not sure why fulltext searches do not lock post table as badly - new posts should trigger post table locks much more often than thread table gets locked with new threads. Maybe there's some kind of row-level or page-level locking kicking in here? motorhaven: I see the benefit of your solution in that only search queries will be locked, instead of hundreds of common SELECT queries waiting for INSERT/UPDATE to complete. Like I said before, you can improve your hack by deferring new posts or updates to second table and executing them in batch, probably using cron job like the one which is updating deferred thread views in official vb3. |
kmike, care to share your no LEFT JOIN version?
My slow_queries log shows that LEFT JOIN is a main culprit of queries >100 seconds. |
Erwin: well, I have to tidy up and clean the code from some custom tweaks first. Also, my hack changes search template a bit, like vb2 version of this hack did - there're 4 search fields instead of just one:
Search for items including these words: and excluding these words: and optionally including these words: search for exact phrase: That said, are you sure your search queries would run faster without LEFT JOIN? From my tests, that LEFT JOIN doesn't add substantial processing time, it only causes thread table locking which in turn causes peaks of load on frontend server. |
I have only 2 queries showing up in my mysql slow_queries log - the LEFT JOIN in search.php for this hack is one of them. The other one cannot be help. I have a lot of threads on my forum.
Please do share your code. :) Anything to remove that slow_query. Another thing, can you also share the IPs that you've banned? I suspect I have the same problem too. |
Erwin: ok, I'll post my changes after tidying up. Not before the end of next week though - I will be out of reach of computer until then.
Also, I want to point out that you won't eliminate that slow query altogether. I also have that search related slow query in mysql log, though its probably not so bad as for you - 20-40 seconds at max. It is limitation of mysql fulltext search engine, the more posts you have in your db the longer search proceeds. We can't help it either, just wait for mysql folks to improve their engine. |
It's not too bad at the moment - ranges from 30-60 seconds for the LEFT JOIN query.
Thanks for sharing your code. I want to test it out and see if it make a difference. |
Aarggh... just had another server crash because of the LEFT JOIN query in search.php - it locked up the whole mysql server, and the query was up to 400,000 rows before I nuked it.
|
Quote:
ps can you search within threads using your version of this hack ? because when i did it seemed to search outside of the thread in question aswell |
All times are GMT. The time now is 07:38 AM. |
Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2025, vBulletin Solutions Inc.
X vBulletin 3.8.12 by vBS Debug Information | |
---|---|
|
|
![]() |
|
Template Usage:
Phrase Groups Available:
|
Included Files:
Hooks Called:
|