Go Back   vb.org Archive > vBulletin Modifications > Archive > vB.org Archives > General > Big Board Discussions

Reply
 
Thread Tools
Is this a solution for large threads? Details »»
Is this a solution for large threads?
Version: , by amcd amcd is offline
Developer Last Online: Apr 2018 Show Printable Version Email this Page

Version: Unknown Rating:
Released: 03-27-2007 Last Update: Never Installs: 0
 
No support by the author.

The topic of large threads being detrimental to mysql performance is known to all big board admins. But why is it this way with vbulletin? Do all forum software have this problem? Does Gaia online suffer from similar drawbacks?

vb stores the threadid and dateline in the post table. Whenever a bunch of posts needs to be selected, vb filters the post table by threadid and orders the results by dateline, and takes a few of the posts (of course there is an index involved). In other words the query is similar to

SELECT * FROM post WHERE threadid=$threadid ORDER BY dateline LIMIT 8575,15

which means that mysql will have to throw away the first 8575 results and then return the next 15. If my understanding is correct, this is the reason why large threads are bad. We need only 15 rows, but mysql has to analyse thousands.

The post table is read in this manner very frequently, every time showthread.php is run. Other accesses to the post table are far less frequent. They are showpost, newreply, newthread, edit post and miscellaneous moderator functions like delete post, move/copy post, merge posts and merge threads. vb's method of doing things means that the frequent requirement (showthread) is slow whereas the infrequent requirements are much simpler and faster.

Now consider an alternative.

What if the position of the post in the thread is stored in the post table? Lets call this field 'postpos'. The first post in a thread will have postpos=1, and the 19th post will have postpos=19.

Now, the query for showthread will be something like

SELECT * FROM post WHERE threadid=$threadid AND postpos BETWEEN 8576 AND 8590 ORDER BY postpos

This should be extremely fast, and should be equally fast for any page in a thread of any size.

Of course, this means that the postpos column must be updated by all relevant modules of vbulletin. It will need to be given a proper initial value when a new post is made, and it will need to be updated whenever a post is moved, copied, merged or deleted. It will also need to be updated if a thread is merged or split. The additional code on newreply will not be running any heavy query, so it will be quite fast. Other functions may be slow for large threads, but since all functions except adding a new post happen quite infrequently, it should not matter so much.

Am I making a fundamental mistake somewhere? Is this feasible? Can this be done using plugins and without code edits?

Show Your Support

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
  #2  
Old 03-27-2007, 11:18 AM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Yes, this correct, MySQL has to skip the first 8574 posts, that's why those queries are "slow", but they aren't that slow, really, I think. 8000 rows isn't that much (and it's fully indexed).

Yeah, adding postpos, or "page", could be a solution, but the issue is that deleting posts/moving/merging is a bit of a pain.

Worst case is that you remove the second post of a thread with 10000 replies and then you have to update 9999 rows, which is veeeeery bad. Then again you will hardly delete/move old posts in a thread and just on the last pages, where it's not a big issue to update a few dozen posts....I guess...

Mainly making sure you don't miss any actions that modify thread positions is the issue here and the problem that in certain cases you have to update many many many rows with a new postpos value.

Probably could be done just with a plugin (maybe modcp/admincp post moving options might be a problem).
Reply With Quote
  #3  
Old 03-27-2007, 11:29 AM
amcd amcd is offline
 
Join Date: Oct 2004
Posts: 218
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

8000 was just an example. We have a thread with more than 700,000 posts. The admins did not like members doing off-topic chatting in any and all threads, so the members made a thread only for chit-chat. Now that thread takes 20 seconds or more to load, and we have quite a powerful dbserver. We keep that thread closed, and another 'current' thread is present for chatting. The current thread is merged with the main thread once it reaches 8000 posts or so. The members are not happy with this.

I read elsewhere that large forums auto-split threads larger than 1000 posts. We have hundreds (maybe thousands) of threads significantly larger than 1000 posts.
Reply With Quote
  #4  
Old 03-27-2007, 05:52 PM
kmike kmike is offline
 
Join Date: Oct 2002
Posts: 169
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

You can alleviate the need to retrieve all thread posts by using a reverse index on dateline when looking in the second half of a large thread. This solution was first introduced by the admin of the infamous www.Gaiaonline.com forum (which is currently at 979 million posts - simply incredible!). You can see it here:
http://www.phpbb.com/community/viewt...135383#p771664

Implementing it will greatly decrease the number of retrieved posts for sorting, since most of the post accesses are near the thread end (members reading last posts), and in the worst case of reading near the middle it'll require retrieving and sorting only through half of the thread posts.

In fact, I wonder if this solution should be brought to the vB devs' attention, many forums are suffering from the huge thread issue.
Reply With Quote
  #5  
Old 03-27-2007, 07:29 PM
amcd amcd is offline
 
Join Date: Oct 2004
Posts: 218
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Thank you, kmike. That is an interesting approach. That thread also looks like a good read otherwise as well. Will read the whole thing when I get some time.

Do you see anything wrong with the approach I am planning. Should I go ahead and try it?
Reply With Quote
  #6  
Old 03-28-2007, 04:57 AM
kmike kmike is offline
 
Join Date: Oct 2002
Posts: 169
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

There's nothing wrong with your approach, but as orban already pointed out, there're too many places you have to track to keep the post positions updated, and renumbering the posts on every delete/merge/split is a costly operation. Also, you'd have to find the last postpos in a thread for every new post, which will be far more frequent operation if your forum is active.
Personally, I wouldn't bother with this approach.
Reply With Quote
  #7  
Old 04-01-2007, 11:33 AM
bigbadbob0 bigbadbob0 is offline
 
Join Date: Mar 2005
Posts: 5
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

I'm pretty sure MySQL is optimized to not do stupid things when given a LIMIT. So this query probably isn't wasting quite as much as you're imagining.

http://dev.mysql.com/doc/refman/5.0/...imization.html
Reply With Quote
  #8  
Old 04-02-2007, 04:20 AM
kmike kmike is offline
 
Join Date: Oct 2002
Posts: 169
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

There's no imagination involved. A simple explain on any showthread query for a large thread will show an estimated number of scanned rows. Heck, doesn't "LIMIT 10000" for a 10000+ replies thread ring the bell?
It may look like a stupid thing, but to fetch a post at the end of a thread MySQL has to sort through all the previous posts (ordered by dateline ASC).
Reply With Quote
  #9  
Old 04-06-2007, 10:04 AM
orban orban is offline
 
Join Date: Jan 2005
Posts: 445
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by kmike View Post
You can alleviate the need to retrieve all thread posts by using a reverse index on dateline when looking in the second half of a large thread. This solution was first introduced by the admin of the infamous www.Gaiaonline.com forum (which is currently at 979 million posts - simply incredible!). You can see it here:
http://www.phpbb.com/community/viewt...135383#p771664

Implementing it will greatly decrease the number of retrieved posts for sorting, since most of the post accesses are near the thread end (members reading last posts), and in the worst case of reading near the middle it'll require retrieving and sorting only through half of the thread posts.

In fact, I wonder if this solution should be brought to the vB devs' attention, many forums are suffering from the huge thread issue.
How would you go about implementing a reverse index?

http://www.mysqlperformanceblog.com/...limit-queries/



Oh!

OH!

Does vB do JOINs on the LIMIT 10000000,10 queries?
Reply With Quote
  #10  
Old 04-10-2007, 02:39 AM
amcd amcd is offline
 
Join Date: Oct 2004
Posts: 218
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

no, apparently vB does not do that. vB first runs a simple query on a single table (post) and gets the relevant postids. Then it runs the complex query only for those postids.
Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT. The time now is 01:29 AM.


Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2025, vBulletin Solutions Inc.
X vBulletin 3.8.12 by vBS Debug Information
  • Page Generation 0.11690 seconds
  • Memory Usage 2,286KB
  • Queries Executed 23 (?)
More Information
Template Usage:
  • (1)SHOWTHREAD
  • (1)ad_footer_end
  • (1)ad_footer_start
  • (1)ad_header_end
  • (1)ad_header_logo
  • (1)ad_navbar_below
  • (1)ad_showthread_beforeqr
  • (1)bbcode_quote
  • (1)footer
  • (1)forumjump
  • (1)forumrules
  • (1)gobutton
  • (1)header
  • (1)headinclude
  • (1)modsystem_post
  • (1)navbar
  • (6)navbar_link
  • (120)option
  • (1)pagenav
  • (1)pagenav_curpage
  • (1)pagenav_pagelink
  • (10)post_thanks_box
  • (10)post_thanks_button
  • (1)post_thanks_javascript
  • (1)post_thanks_navbar_search
  • (10)post_thanks_postbit_info
  • (9)postbit
  • (10)postbit_onlinestatus
  • (10)postbit_wrapper
  • (1)spacer_close
  • (1)spacer_open
  • (1)tagbit_wrapper 

Phrase Groups Available:
  • global
  • inlinemod
  • postbit
  • posting
  • reputationlevel
  • showthread
Included Files:
  • ./showthread.php
  • ./global.php
  • ./includes/init.php
  • ./includes/class_core.php
  • ./includes/config.php
  • ./includes/functions.php
  • ./includes/class_hook.php
  • ./includes/modsystem_functions.php
  • ./includes/functions_bigthree.php
  • ./includes/class_postbit.php
  • ./includes/class_bbcode.php
  • ./includes/functions_reputation.php
  • ./includes/functions_post_thanks.php 

Hooks Called:
  • init_startup
  • init_startup_session_setup_start
  • init_startup_session_setup_complete
  • cache_permissions
  • fetch_threadinfo_query
  • fetch_threadinfo
  • fetch_foruminfo
  • style_fetch
  • cache_templates
  • global_start
  • parse_templates
  • global_setup_complete
  • showthread_start
  • showthread_getinfo
  • forumjump
  • showthread_post_start
  • showthread_query_postids
  • showthread_query
  • bbcode_fetch_tags
  • bbcode_create
  • showthread_postbit_create
  • postbit_factory
  • postbit_display_start
  • post_thanks_function_post_thanks_off_start
  • post_thanks_function_post_thanks_off_end
  • post_thanks_function_fetch_thanks_start
  • post_thanks_function_fetch_thanks_end
  • post_thanks_function_thanked_already_start
  • post_thanks_function_thanked_already_end
  • fetch_musername
  • postbit_imicons
  • bbcode_parse_start
  • bbcode_parse_complete_precache
  • bbcode_parse_complete
  • postbit_display_complete
  • post_thanks_function_can_thank_this_post_start
  • pagenav_page
  • pagenav_complete
  • tag_fetchbit_complete
  • forumrules
  • navbits
  • navbits_complete
  • showthread_complete