Go Back   vb.org Archive > vBulletin 3 Discussion > vB3 Programming Discussions
FAQ Community Calendar Today's Posts Search

Reply
 
Thread Tools Display Modes
  #1  
Old 07-26-2007, 11:06 AM
mihai11 mihai11 is offline
 
Join Date: Dec 2005
Location: Sibiu - Romania
Posts: 199
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default SQL optimization: "multiple queries" vs. "big join"

Hi,

I need data from several tables. What is best ? To make a big join between 4 tables, thus having 1 big query per page or to avoid a big join thus having multiple SQL queries (currently 36) ?

If I would use a join then I would be able to reduce the amount of queries to 6 or 7. What do you think ?

Regards,
Razvan M.
Reply With Quote
  #2  
Old 07-26-2007, 11:21 AM
Marco van Herwaarden Marco van Herwaarden is offline
 
Join Date: Jul 2004
Posts: 25,415
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

First of all, the absolute number of queries is largely overrated as a measurement of how effective the script is. I can have 10 queries that have less impact on performence then a single badly written query.

Now to answer your question: based on the information you provide this is not possible to give a good answer as it will depend on many factors.

Some of them:
- How many rows in each table.
- The datatype of the columns used to JOIN/WHERE
- The cardinality of the data in these columns (the number of unique values compaired to the total number of rows). Selecting on a column containing a unique ID is something totally different then using a column with only 2 unique values (yes/no for example).
- The order in which the JOINS are done
- the amount of memory needed versus the amount available to MySQL on the server
- The storage type of the table.
- ...and many more
Reply With Quote
  #3  
Old 07-26-2007, 11:37 AM
mihai11 mihai11 is offline
 
Join Date: Dec 2005
Location: Sibiu - Romania
Posts: 199
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by Marco van Herwaarden View Post
First of all, the absolute number of queries is largely overrated as a measurement of how effective the script is. I can have 10 queries that have less impact on performence then a single badly written query.

Now to answer your question: based on the information you provide this is not possible to give a good answer as it will depend on many factors.

Some of them:
- How many rows in each table.
- The datatype of the columns used to JOIN/WHERE
- The cardinality of the data in these columns (the number of unique values compaired to the total number of rows). Selecting on a column containing a unique ID is something totally different then using a column with only 2 unique values (yes/no for example).
- The order in which the JOINS are done
- the amount of memory needed versus the amount available to MySQL on the server
- The storage type of the table.
- ...and many more
I have 4 tables:
  1. story table - thousands of entries
  2. thread table - thousands of entries (in fact this is the thread table from VBulletin)
  3. user table - thousands of entries on a big forum (this is the user table from VBulletin)
  4. icon table - less than 50 items; this is the "icon" table from Vbulletin.

The joins will be made on primary keys of type int.

I am not sure about the storage type, but I think it is INNODB. I have to check about this on the server, if you think that it is important.


First of all, just by answering your question I got a different perspective. Tables 1, 2 and 3 can all be HUGE tables. That being the case it might be better to avoid the join completely because it can use insane amounts of RAM when the forum grows.

Currently, I have a join between "story" table and "thread" table. I don't know if I should keep this join. Hmm.... what to do ? If I will kill this join then I will easily get 60-70 queries per page.
Reply With Quote
  #4  
Old 07-26-2007, 07:01 PM
Eikinskjaldi's Avatar
Eikinskjaldi Eikinskjaldi is offline
 
Join Date: Feb 2006
Location: Hell, never looked better
Posts: 572
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by mihai11 View Post
I have 4 tables:
  1. story table - thousands of entries
  2. thread table - thousands of entries (in fact this is the thread table from VBulletin)
  3. user table - thousands of entries on a big forum (this is the user table from VBulletin)
  4. icon table - less than 50 items; this is the "icon" table from Vbulletin.

The joins will be made on primary keys of type int.

I am not sure about the storage type, but I think it is INNODB. I have to check about this on the server, if you think that it is important.


First of all, just by answering your question I got a different perspective. Tables 1, 2 and 3 can all be HUGE tables. That being the case it might be better to avoid the join completely because it can use insane amounts of RAM when the forum grows.

Currently, I have a join between "story" table and "thread" table. I don't know if I should keep this join. Hmm.... what to do ? If I will kill this join then I will easily get 60-70 queries per page.
The join is the better option because it uses less overhead. Interesting you are ussing inno, is there a reason? do you ever backup your datatbase? did you purchase inno hotcopy?

run the query through explain, see what it looks like.
Reply With Quote
  #5  
Old 07-29-2007, 11:58 PM
Antivirus's Avatar
Antivirus Antivirus is offline
 
Join Date: Sep 2004
Location: Black Lagoon
Posts: 1,090
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

<a href="http://dev.mysql.com/doc/refman/4.1/en/explain.html" target="_blank">http://dev.mysql.com/doc/refman/4.1/en/explain.html</a>

Usually explain can show you its possible to speed up the query just by creating an index or two. I just recently started using it and its been very helpful.
Reply With Quote
  #6  
Old 08-01-2007, 09:11 AM
mihai11 mihai11 is offline
 
Join Date: Dec 2005
Location: Sibiu - Romania
Posts: 199
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by Eikinskjaldi View Post
The join is the better option because it uses less overhead. Interesting you are ussing inno, is there a reason? do you ever backup your datatbase? did you purchase inno hotcopy?

run the query through explain, see what it looks like.
What if I will have 10000 stories and 10000 threads ? It will take a lot of memory to make a join of those 2 tables and most of the information will not be needed.

The user only sees 10 stories per page, so I will fetch those 10 stories then for each story I will fire a query to get the thread information corresponding for that story.

I changed my mind regarding the "user" table. I don't need this table any more.

This looks like the best trade-off for now. If you disagree, please comment.


About "hotcopy": I am sorry, but I never heard of it. Is this going to boost my DB performance ?

The reason for using innodb is because my host configure it like that.



Regards,
Razvan
Reply With Quote
  #7  
Old 08-01-2007, 12:27 PM
Eikinskjaldi's Avatar
Eikinskjaldi Eikinskjaldi is offline
 
Join Date: Feb 2006
Location: Hell, never looked better
Posts: 572
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by mihai11 View Post
What if I will have 10000 stories and 10000 threads ? It will take a lot of memory to make a join of those 2 tables and most of the information will not be needed. The user only sees 10 stories per page, so I will fetch those 10 stories then for each story I will fire a query to get the thread information corresponding for that story.
I am assuming you have some kind of where clause to limit the join, so it is not an all - all query
Assuming there is some kind of index field that relates storeis to threads, then the number in each table is irrelevant, the index and join will ensure you only reference the needed rows. Doing it in two queries costs more in query overhead. As I said the first time around. Don't wave your hands and guess, use explain and find out the actual cost!

Quote:
About "hotcopy": I am sorry, but I never heard of it. Is this going to boost my DB performance ?

The reason for using innodb is because my host configure it like that.
Without hotcopy you have to turn your server off to do a backup.
Reply With Quote
  #8  
Old 08-01-2007, 12:58 PM
mihai11 mihai11 is offline
 
Join Date: Dec 2005
Location: Sibiu - Romania
Posts: 199
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by Eikinskjaldi View Post
I am assuming you have some kind of where clause to limit the join, so it is not an all - all query
I do have a "WHERE" clause, but that is not of much help. If there are no errors in the database then each story MUST have a corresponding thread. The only thing that will actually limit the number of returned rows is a "LIMIT clause".


Quote:
Originally Posted by Eikinskjaldi View Post
Assuming there is some kind of index field that relates stories to threads, then the number in each table is irrelevant, the index and join will ensure you only reference the needed rows.
I am using an index, but I do not agree with your conclusions. If I am using the "JOIN", then for 10.000 stories there will be 10.000 threads that will be selected. In the database there will be much more than 10.000 threads because a thread can exist without a story, but a story cannot exist without a thread. (in fact, a thread will hold comments for a given story).

I agree with you that the join will not join all the threads with all the stories, but the joined number of stories and threads is still very big. With my algorithm I will select 10 stories at a time (only 10 stories are displayed on a page) then I will select the required data for each story.


Quote:
Originally Posted by Eikinskjaldi View Post
As I said the first time around. Don't wave your hands and guess, use explain and find out the actual cost!
I am going to follow your advice on this.

Quote:
Originally Posted by Eikinskjaldi View Post
Without hotcopy you have to turn your server off to do a backup.
When I want to make a backup of my site I use this command:

Code:
mysqldump --opt --host=ABC-HOST -Q -uUSER -pPASSWORD db_name > db_file.sql
If I want to put the data back, I firstly stop the board (from admincp) then issue this command:

Code:
mysql -uUSER -pPASSWORD db_name < db_file.sql

I am never stopping the mysql server.
Reply With Quote
  #9  
Old 08-01-2007, 01:48 PM
Marco van Herwaarden Marco van Herwaarden is offline
 
Join Date: Jul 2004
Posts: 25,415
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

How effective a query is and how much temporary storage (or memory) is needed will depend a lot on the table structure, it's indexes, how tables are joined, how the WHERE-clause of the query is build, ORDER BY clauses and the use of LIMIT.
Reply With Quote
  #10  
Old 08-01-2007, 09:05 PM
Eikinskjaldi's Avatar
Eikinskjaldi Eikinskjaldi is offline
 
Join Date: Feb 2006
Location: Hell, never looked better
Posts: 572
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Quote:
Originally Posted by mihai11 View Post
I do have a "WHERE" clause, but that is not of much help. If there are no errors in the database then each story MUST have a corresponding thread. The only thing that will actually limit the number of returned rows is a "LIMIT clause".
No, that is not how queries work. They use where to optimise. the explain will clearly show you this.

Quote:
I am using an index, but I do not agree with your conclusions.
They are not conclusions, they are facts. Use explain once or twice, and all will become clear.


Code:
mysqldump --opt --host=ABC-HOST -Q -uUSER -pPASSWORD db_name > db_file.sql
Ahh right, of course. I keep forgetting that works with innodb.
Reply With Quote
Reply


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT. The time now is 12:08 PM.


Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2024, vBulletin Solutions Inc.
X vBulletin 3.8.12 by vBS Debug Information
  • Page Generation 0.04422 seconds
  • Memory Usage 2,276KB
  • Queries Executed 13 (?)
More Information
Template Usage:
  • (1)SHOWTHREAD
  • (1)ad_footer_end
  • (1)ad_footer_start
  • (1)ad_header_end
  • (1)ad_header_logo
  • (1)ad_navbar_below
  • (1)ad_showthread_beforeqr
  • (1)ad_showthread_firstpost
  • (1)ad_showthread_firstpost_sig
  • (1)ad_showthread_firstpost_start
  • (3)bbcode_code
  • (11)bbcode_quote
  • (1)footer
  • (1)forumjump
  • (1)forumrules
  • (1)gobutton
  • (1)header
  • (1)headinclude
  • (1)navbar
  • (3)navbar_link
  • (120)option
  • (10)post_thanks_box
  • (10)post_thanks_button
  • (1)post_thanks_javascript
  • (1)post_thanks_navbar_search
  • (10)post_thanks_postbit_info
  • (10)postbit
  • (10)postbit_onlinestatus
  • (10)postbit_wrapper
  • (1)spacer_close
  • (1)spacer_open
  • (1)tagbit_wrapper 

Phrase Groups Available:
  • global
  • inlinemod
  • postbit
  • posting
  • reputationlevel
  • showthread
Included Files:
  • ./showthread.php
  • ./global.php
  • ./includes/init.php
  • ./includes/class_core.php
  • ./includes/config.php
  • ./includes/functions.php
  • ./includes/class_hook.php
  • ./includes/modsystem_functions.php
  • ./includes/functions_bigthree.php
  • ./includes/class_postbit.php
  • ./includes/class_bbcode.php
  • ./includes/functions_reputation.php
  • ./includes/functions_post_thanks.php 

Hooks Called:
  • init_startup
  • init_startup_session_setup_start
  • init_startup_session_setup_complete
  • cache_permissions
  • fetch_postinfo_query
  • fetch_postinfo
  • fetch_threadinfo_query
  • fetch_threadinfo
  • fetch_foruminfo
  • style_fetch
  • cache_templates
  • global_start
  • parse_templates
  • global_setup_complete
  • showthread_start
  • showthread_getinfo
  • forumjump
  • showthread_post_start
  • showthread_query_postids
  • showthread_query
  • bbcode_fetch_tags
  • bbcode_create
  • showthread_postbit_create
  • postbit_factory
  • postbit_display_start
  • post_thanks_function_post_thanks_off_start
  • post_thanks_function_post_thanks_off_end
  • post_thanks_function_fetch_thanks_start
  • post_thanks_function_fetch_thanks_end
  • post_thanks_function_thanked_already_start
  • post_thanks_function_thanked_already_end
  • fetch_musername
  • postbit_imicons
  • bbcode_parse_start
  • bbcode_parse_complete_precache
  • bbcode_parse_complete
  • postbit_display_complete
  • post_thanks_function_can_thank_this_post_start
  • tag_fetchbit_complete
  • forumrules
  • navbits
  • navbits_complete
  • showthread_complete