View Full Version : SQL optimization: "multiple queries" vs. "big join"
mihai11
07-26-2007, 11:06 AM
Hi,
I need data from several tables. What is best ? To make a big join between 4 tables, thus having 1 big query per page or to avoid a big join thus having multiple SQL queries (currently 36) ?
If I would use a join then I would be able to reduce the amount of queries to 6 or 7. What do you think ?
Regards,
Razvan M.
Marco van Herwaarden
07-26-2007, 11:21 AM
First of all, the absolute number of queries is largely overrated as a measurement of how effective the script is. I can have 10 queries that have less impact on performence then a single badly written query.
Now to answer your question: based on the information you provide this is not possible to give a good answer as it will depend on many factors.
Some of them:
- How many rows in each table.
- The datatype of the columns used to JOIN/WHERE
- The cardinality of the data in these columns (the number of unique values compaired to the total number of rows). Selecting on a column containing a unique ID is something totally different then using a column with only 2 unique values (yes/no for example).
- The order in which the JOINS are done
- the amount of memory needed versus the amount available to MySQL on the server
- The storage type of the table.
- ...and many more
mihai11
07-26-2007, 11:37 AM
First of all, the absolute number of queries is largely overrated as a measurement of how effective the script is. I can have 10 queries that have less impact on performence then a single badly written query.
Now to answer your question: based on the information you provide this is not possible to give a good answer as it will depend on many factors.
Some of them:
- How many rows in each table.
- The datatype of the columns used to JOIN/WHERE
- The cardinality of the data in these columns (the number of unique values compaired to the total number of rows). Selecting on a column containing a unique ID is something totally different then using a column with only 2 unique values (yes/no for example).
- The order in which the JOINS are done
- the amount of memory needed versus the amount available to MySQL on the server
- The storage type of the table.
- ...and many more
I have 4 tables:
story table - thousands of entries
thread table - thousands of entries (in fact this is the thread table from VBulletin)
user table - thousands of entries on a big forum (this is the user table from VBulletin)
icon table - less than 50 items; this is the "icon" table from Vbulletin.
The joins will be made on primary keys of type int.
I am not sure about the storage type, but I think it is INNODB. I have to check about this on the server, if you think that it is important.
First of all, just by answering your question I got a different perspective. Tables 1, 2 and 3 can all be HUGE tables. That being the case it might be better to avoid the join completely because it can use insane amounts of RAM when the forum grows.
Currently, I have a join between "story" table and "thread" table. I don't know if I should keep this join. Hmm.... what to do ? If I will kill this join then I will easily get 60-70 queries per page.
Eikinskjaldi
07-26-2007, 07:01 PM
I have 4 tables:
story table - thousands of entries
thread table - thousands of entries (in fact this is the thread table from VBulletin)
user table - thousands of entries on a big forum (this is the user table from VBulletin)
icon table - less than 50 items; this is the "icon" table from Vbulletin.
The joins will be made on primary keys of type int.
I am not sure about the storage type, but I think it is INNODB. I have to check about this on the server, if you think that it is important.
First of all, just by answering your question I got a different perspective. Tables 1, 2 and 3 can all be HUGE tables. That being the case it might be better to avoid the join completely because it can use insane amounts of RAM when the forum grows.
Currently, I have a join between "story" table and "thread" table. I don't know if I should keep this join. Hmm.... what to do ? If I will kill this join then I will easily get 60-70 queries per page.
The join is the better option because it uses less overhead. Interesting you are ussing inno, is there a reason? do you ever backup your datatbase? did you purchase inno hotcopy?
run the query through explain, see what it looks like.
Antivirus
07-29-2007, 11:58 PM
<a href="http://dev.mysql.com/doc/refman/4.1/en/explain.html" target="_blank">http://dev.mysql.com/doc/refman/4.1/en/explain.html</a>
Usually explain can show you its possible to speed up the query just by creating an index or two. I just recently started using it and its been very helpful.
mihai11
08-01-2007, 09:11 AM
The join is the better option because it uses less overhead. Interesting you are ussing inno, is there a reason? do you ever backup your datatbase? did you purchase inno hotcopy?
run the query through explain, see what it looks like.
What if I will have 10000 stories and 10000 threads ? It will take a lot of memory to make a join of those 2 tables and most of the information will not be needed.
The user only sees 10 stories per page, so I will fetch those 10 stories then for each story I will fire a query to get the thread information corresponding for that story.
I changed my mind regarding the "user" table. I don't need this table any more.
This looks like the best trade-off for now. If you disagree, please comment.
About "hotcopy": I am sorry, but I never heard of it. Is this going to boost my DB performance ?
The reason for using innodb is because my host configure it like that.
Regards,
Razvan
Eikinskjaldi
08-01-2007, 12:27 PM
What if I will have 10000 stories and 10000 threads ? It will take a lot of memory to make a join of those 2 tables and most of the information will not be needed. The user only sees 10 stories per page, so I will fetch those 10 stories then for each story I will fire a query to get the thread information corresponding for that story.
I am assuming you have some kind of where clause to limit the join, so it is not an all - all query
Assuming there is some kind of index field that relates storeis to threads, then the number in each table is irrelevant, the index and join will ensure you only reference the needed rows. Doing it in two queries costs more in query overhead. As I said the first time around. Don't wave your hands and guess, use explain and find out the actual cost!
About "hotcopy": I am sorry, but I never heard of it. Is this going to boost my DB performance ?
The reason for using innodb is because my host configure it like that.
Without hotcopy you have to turn your server off to do a backup.
mihai11
08-01-2007, 12:58 PM
I am assuming you have some kind of where clause to limit the join, so it is not an all - all query
I do have a "WHERE" clause, but that is not of much help. If there are no errors in the database then each story MUST have a corresponding thread. The only thing that will actually limit the number of returned rows is a "LIMIT clause".
Assuming there is some kind of index field that relates stories to threads, then the number in each table is irrelevant, the index and join will ensure you only reference the needed rows.
I am using an index, but I do not agree with your conclusions. If I am using the "JOIN", then for 10.000 stories there will be 10.000 threads that will be selected. In the database there will be much more than 10.000 threads because a thread can exist without a story, but a story cannot exist without a thread. (in fact, a thread will hold comments for a given story).
I agree with you that the join will not join all the threads with all the stories, but the joined number of stories and threads is still very big. With my algorithm I will select 10 stories at a time (only 10 stories are displayed on a page) then I will select the required data for each story.
As I said the first time around. Don't wave your hands and guess, use explain and find out the actual cost!
I am going to follow your advice on this.
Without hotcopy you have to turn your server off to do a backup.
When I want to make a backup of my site I use this command:
mysqldump --opt --host=ABC-HOST -Q -uUSER -pPASSWORD db_name > db_file.sql
If I want to put the data back, I firstly stop the board (from admincp) then issue this command:
mysql -uUSER -pPASSWORD db_name < db_file.sql
I am never stopping the mysql server.
Marco van Herwaarden
08-01-2007, 01:48 PM
How effective a query is and how much temporary storage (or memory) is needed will depend a lot on the table structure, it's indexes, how tables are joined, how the WHERE-clause of the query is build, ORDER BY clauses and the use of LIMIT.
Eikinskjaldi
08-01-2007, 09:05 PM
I do have a "WHERE" clause, but that is not of much help. If there are no errors in the database then each story MUST have a corresponding thread. The only thing that will actually limit the number of returned rows is a "LIMIT clause".
No, that is not how queries work. They use where to optimise. the explain will clearly show you this.
I am using an index, but I do not agree with your conclusions.
They are not conclusions, they are facts. Use explain once or twice, and all will become clear.
mysqldump --opt --host=ABC-HOST -Q -uUSER -pPASSWORD db_name > db_file.sql
Ahh right, of course. I keep forgetting that works with innodb.
vBulletin® v3.8.12 by vBS, Copyright ©2000-2025, vBulletin Solutions Inc.