PDA

View Full Version : Optimized attachments in vB2.X.X


s.molinari
02-23-2003, 10:00 PM
Hi vB Troopers,

*****************
* Hack Idea:
*****************
This hack is designed to improve 2 things:
1. Prevent duplicate attachments altogether.
2. Speed up the query for duplicates and avoid errors.

What does vB do now?
---------------------------------
If you have "Allow Duplicate Images" set to "No" in your attachments options and your database has a very large amount of attachments, at worst, the server will timeout or the query to check for duplicate attachments takes an extreme amount of time. The duplicates query also checks only for duplicates from a particular user so actually it is possible to have duplicate attachments in the database.

What does vB do differently after the hack?
------------------------
With the hack installed, each attachment receives a hash which is stored in the attachment table. This hash is then used to find duplicate attachments regardless of whether or not the attachment is from the same user or not. So duplicate attachments are not possible.

Why vB works faster with this hack?
------------------------------------
Since the hash field and not the actual file content is being compared, then much less server resources are needed for MySql to run the query. Also the hash field is indexed. This in turn speeds up the saving of the attachments when checking for duplicates.

Please be aware that this hack alters the database and when carried out incorrectly, could cause data loss. Installation is at your own risk.

That said, enjoy!:D

Scott

This hack was written by pogo.

Also note, a system similiar to this will be part of vB3, so no need to ask if it will be added.

s.molinari
02-24-2003, 03:17 PM
[Release vB 2.2/3.x] Optimierung des Speicherns von Anh?ngen
Sinn des Hacks:
Wenn man sehr viele Anh?nge in seiner DB hat und der Server mit einem zu kurzen Timeout versehen ist, passiert es, dass man keine Anh?nge mehr hochladen kann, weil ein 404 Fehler erscheint. Dieser Hack ver?ndert die ?berpr?fung auf doppelte Anh?nge derart, dass sie sehr viel schneller abl?uft.

Wie es im original vBulletin ist:
Wenn man in den Options keine doppelten Anh?nge erlaubt, wird beim Hochladen eines Anhangs ?berpr?ft, ob man diesen schon einmal hochgeladen hat. Es wird nicht ?berpr?ft, ob dergleiche Anhang schon in der DB, sondern nur, ob der augenblickliche Autor diesen Anhang schon gepostet hat.

Wie es mit dem Hack ist:
Es wird ?berpr?ft, ob dergleiche Anhang schon in der DB ist. Ist dies der Fall, wird der Anhang nicht erneut in der DB gespeichert, sondern der bestehende Anhang im Beitrag angezeigt.

Wer braucht den Hack?
Der Hack nimmt vermutlich allen Foren mit vielen Anh?ngen ein wenig (wenn nicht viel) Belastung vom Server.

Die Deutsche Version findest du hier (http://www.vbulletin-germany.com/forum/showthread.php?s=&postid=26937#post26937).

Xenon
02-24-2003, 03:48 PM
hmm, from what i see, it's looking good.
just one thing might cause trouble (very rarely of course but it could.)

you are using a hash function to decide if an attachment is already in the db or not.
lesson one of hashing is that hashes can be compared faster than the objects, but nevertheless the best hashing function could produce similar hash results with different objects (it's called the birthday problem or something like)

md5 hash spreads the hash result very good, but it is NO injective function, and therfore a bug can occure. (as said rarely but possible)

to solve the problem, you have to compare the full data after selecting it with the hash comparison.

Dean C
02-24-2003, 04:00 PM
Wow a hack from a vb dev - this is a rare and special occasion :)

This hack sounds impressive. Good job pal :)!

Regards

- miSt

s.molinari
02-24-2003, 04:12 PM
Originally posted by Mist
Wow a hack from a vb dev - this is a rare and special occasion :)

This hack sounds impressive. Good job pal :)!

Regards

- miSt

hehe!:D We're not vB devs. Just a future distributor of vB.

@Xenon. Good point but as you said, very rare that something like you mentioned would happen. Thanks for the input.

Scott

Xenon
02-24-2003, 04:26 PM
;) welcome scott :)
you know, me, always on the search of possible problems ;)
ok, the chance of two matching hashes is 1 to 3.4 * 10^38 but it is there ;)

@Mist: they are just developing the german in the new vb *gg*

Dean C
02-24-2003, 05:04 PM
Still developing vbulletin hehe ;)

This is such a good idea. I hope its in vb3? ;)

- miSt

ManagerJosh
02-24-2003, 05:34 PM
Well if the bugs get ironed out from what Xenon pointed out, I'll most definitely install :)

Xenon
02-24-2003, 07:56 PM
you can install it, the chance of this bug occurs is nearly 0

if you want to be absolutley sure use instead of this:
if (!$allowduplicates) {
if ($result=$DB_site->query_first("SELECT attachmentid
FROM attachment
WHERE hash = '".addslashes($attachmenthash)."'")) {
$attachmentid = $result[attachmentid];
}

this:

if (!$allowduplicates) {
if ($result=$DB_site->query_first("SELECT attachmentid,filedata
FROM attachment
WHERE hash = '".addslashes($attachmenthash)."'")) {
if($filestuff==$result[filedata]) {
$attachmentid = $result[attachmentid];
}

and add a } directly before return $attachmentid;

Erwin
02-24-2003, 08:04 PM
A hack from molinari is always a good one. :)

Boofo
02-25-2003, 08:18 AM
If you add the code Xenon pointed out, wouldn't it run the same speed as it does now?

to solve the problem, you have to compare the full data after selecting it with the hash comparison.

pogo
02-25-2003, 09:52 AM
No. It is only one additional comparison that should run fast.

Boofo
02-25-2003, 10:03 AM
So, as the hack writer, do you recommend to add it? ;)

Xenon
02-25-2003, 10:17 AM
can i answer?
it's not recommended, but it prevents a bug which can occur (normally one on every million board, but a bug, so in vb3 i think this would be in..)

Boofo
02-25-2003, 10:27 AM
Ok, then, just for curiosities sake, what would happen if the bug ever did occur? Would it only happen once in a long while and what would it do or cause?

pogo
02-25-2003, 10:48 AM
If the hashes of two different attachments are the same the new attachment wouldn't make it into the database because it is assumed that it is the same file like the one that is already in the database.

So the bug would be that the wrong attachment is shown in the post. I don't think that you will encounter this bug.

Boofo
02-25-2003, 10:51 AM
Thanks for explaining that. I will go ahead and add the original version then. Thanks! ;) And great hack, by the way. :)

Boofo
02-25-2003, 11:44 AM
This is bugging the heck out of me for some reason. What does this come out to?

1 to 3.4 * 10^38

I tried to do it on the windows Scientific Calculator and I only come up with something like 149.5.

Xenon
02-25-2003, 12:04 PM
it is 0.000000000000000000000000000000000000002941 :)

pogo
02-25-2003, 12:05 PM
The chances that you want to upload a file that has the same hash like a file that is already in the database are 1:340.000.000.000.000.000.000.000.000.000.000.000. 000

That means you have to upload this many files before the "bug" will occur. Theoretically.

Boofo
02-25-2003, 01:19 PM
Originally posted by pogo
The chances that you want to upload a file that has the same hash like a file that is already in the database are 1:340.000.000.000.000.000.000.000.000.000.000.000. 000

That means you have to upload this many files before the "bug" will occur. Theoretically.

Damn! I'm getting close. I better watch this. ;)

By the way, how would you enter that equation on a calculator?

Xenon
02-25-2003, 01:31 PM
*gg*
my calculator can handle exponental operations, so i can just type in 3.4 exp 38 :)
ok, it's not a MS calculator, but one i bought for scholl long time ago :)

Boofo
02-25-2003, 01:40 PM
Guess I'm going to have to get me one of those. Who knows when I'll run into this again? What do you think the equation would be on that one? LOL

Martin64
02-25-2003, 11:47 PM
What are the odds for it happening two times in a row for the same user? ;)

Just kidding. :)
Smart hack btw.

Smoothie
02-26-2003, 02:12 AM
Is this needed, or does it help if allow duplicate images is set to "yes"?

Brad
02-26-2003, 02:18 AM
I cant run this query via phpmyadmin, im getting a no SQL query error.

ALTER TABLE attachment ADD hash VARCHAR(32) DEFAULT '0' NOT NULL;

Brad
02-26-2003, 02:22 AM
Something is buggy in my cpanel, i got them to run.

lifesourcerec
02-26-2003, 11:26 AM
I noticed, before adding this hack, I had this line:


unlink($attachment);


But in the hack it has:


@unlink($attachment);


So, does the @ suppose to be there?

pogo
02-26-2003, 01:46 PM
Yes. It is this way in the actual vBulletin.

PHP supports one error control operator: the at sign (@). When prepended to an expression in PHP, any error messages that might be generated by that expression will be ignored.

TranceMaster
02-26-2003, 03:34 PM
how would i go about removing this hack
after installing it, my forum has been going very very slow :(
how do i remove these queries?
-------------------------------------------------------------------------------
ALTER TABLE attachment ADD hash VARCHAR(32) DEFAULT '0' NOT NULL;
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
UPDATE attachment SET hash = md5(filedata);
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
ALTER TABLE attachment ADD INDEX(hash);
-------------------------------------------------------------------------------

hope sum1 can help
thanks

Dean C
02-26-2003, 03:53 PM
lifesourcerec - The @ prevents it from outputting an error if the function fails. It was added in there in vb 2.2.7 i believe (i can remember from when i upgraded)

- miSt

pogo
02-26-2003, 04:04 PM
Originally posted by TranceMaster
how would i go about removing this hack
after installing it, my forum has been going very very slow :(
how do i remove these queries?
-------------------------------------------------------------------------------
ALTER TABLE attachment ADD hash VARCHAR(32) DEFAULT '0' NOT NULL;
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
UPDATE attachment SET hash = md5(filedata);
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
ALTER TABLE attachment ADD INDEX(hash);
-------------------------------------------------------------------------------

hope sum1 can help
thanks

If you use PHPMyAdmin you can remove the hash field and the index just by clicking the links in PHPMyAdmin.

In the table listing menu click attachment.
Then in the fields table click delete in the hash row.
And last click delete a little below in the index area.

TranceMaster
02-26-2003, 04:13 PM
Originally posted by pogo


If you use PHPMyAdmin you can remove the hash field and the index just by clicking the links in PHPMyAdmin.

In the table listing menu click attachment.
Then in the fields table click delete in the hash row.
And last click delete a little below in the index area.

tried that aswell, yet for some reason the forum is going very very slow.

dont get me wrong, great hack, i would just like 2 tottaly remove it now, :(

TranceMaster
02-26-2003, 04:21 PM
ahh yea slowly but surely, the speed is improving. :)

s.molinari
02-26-2003, 07:38 PM
Hi all. There is a small (embarrassing) mistake in the functions.php replacement code. :0 Please correct the following lines in functions.php.if (!$allowduplicates) {
if ($result=$DB_site->query_first("SELECT attachmentid
FROM attachment
WHERE hash = '".addslashes($attachmenthash)."'")) {
$attachmentid = $result[attachmentid];
}
} else {
$DB_site->query("INSERT INTO attachment (attachmentid,userid,dateline,filename,filedata,vi sible,hash) VALUES (NULL,$bbuserinfo[userid],".time().",'".addslashes($attachment_name)."','".addslashes($filestuff)."','$visible','".addslashes($attachmenthash)."')");
$attachmentid=$DB_site->insert_id();

with if (!$allowduplicates) {
if ($result=$DB_site->query_first("SELECT attachmentid
FROM attachment
WHERE hash = '".addslashes($attachmenthash)."'")) {
$attachmentid = $result[attachmentid];
return $attachmentid;
}
}
$DB_site->query("INSERT INTO attachment (attachmentid,userid,dateline,filename,filedata,vi sible,hash) VALUES (NULL,$bbuserinfo[userid],".time().",'".addslashes($attachment_name)."','".addslashes($filestuff)."','$visible','".addslashes($attachmenthash)."')");
$attachmentid=$DB_site->insert_id();

We apologize for the inconvenience. The download has also been updated.

Scott

s.molinari
02-26-2003, 07:40 PM
Originally posted by TranceMaster


tried that aswell, yet for some reason the forum is going very very slow.

dont get me wrong, great hack, i would just like 2 tottaly remove it now, :( Hi,
I highly doubt that this hack is the cause of the slow down on your site.

Scott

TranceMaster
02-26-2003, 07:52 PM
Originally posted by s.molinari
Hi,
I highly doubt that this hack is the cause of the slow down on your site.

Scott
well it was right after i installed it, that the forum came almost to a halt.
all is fine now though.
forum is running normally again.

:D

Boofo
02-26-2003, 07:53 PM
Scott, how about the English version? ;)

Brad
02-26-2003, 08:14 PM
When i apply the fix to functions.php i get this error:

Parse error: parse error in /home/kous/public_html/forums/admin/functions.php on line 1756

Fatal error: Call to undefined function: makelogincode() in /home/kous/public_html/forums/global.php on line 367

Aaron1
02-27-2003, 05:30 AM
I think the code should be like this:

if (!$allowduplicates) {
if ($result=$DB_site->query_first("SELECT attachmentid
FROM attachment
WHERE hash = '".addslashes($attachmenthash)."'")) {
$attachmentid = $result[attachmentid];
return $attachmentid;
}

$DB_site->query("INSERT INTO attachment (attachmentid,userid,dateline,filename,filedata,vi sible,hash) VALUES (NULL,$bbuserinfo[userid],".time().",'".addslashes($attachment_name)."','".addslashes($filestuff)."','$visible','".addslashes($attachmenthash)."')");
$attachmentid=$DB_site->insert_id();

I guess there was just one } to much.
Correct me if i am wrong Molinari...

Grtz

Boofo
02-27-2003, 05:46 AM
The extra bracket needs to be there because you have 2 of the other ones before it.

Aaron1
02-27-2003, 06:00 AM
Hmm, you're right that sounds logical Boofo!
But strangely enough; without it, i am not having parse errors anymore.

Although adding attachements with the post don't work anymore.
Argh :)

Boofo
02-27-2003, 06:09 AM
You do have an } after the very last return $attachmentid; right? I would say check your code again. You might have missed something in the update. ;)

Aaron1
02-27-2003, 06:23 AM
Damn, I couldn't find what i did wrong!
I changed it back to the old version hack, because that worked.
Well i guess it did :(

Boofo
02-27-2003, 06:36 AM
After you last message, I tested mine to make sure it works. It works fine here, although I did notice something with it. If you have 2 messages with the same attachment in it, once you delete one of the messages, the other message no longer has an attachment hooked to it.

pogo
02-27-2003, 08:14 AM
I hope this is the last bug ;)
Scott will update the hackfile when he is online but you can do this:

In admin/functions.php find
AND threadid<>'$postinfo[threadid]' in function deletepost().

Delete it or copy it in a comment line.

Then open editpost.php and find $DB_site->query("DELETE FROM attachment WHERE attachmentid=$postinfo[attachmentid]");
updatethreadcount($threadinfo[threadid]);replace it with $otherattachs=$DB_site->query("SELECT attachmentid FROM post WHERE attachmentid=$postinfo[attachmentid]");
if ($DB_site->num_rows($otherattachs)==1) {
$DB_site->query("DELETE FROM attachment WHERE attachmentid=$postinfo[attachmentid]");
updatethreadcount($threadinfo[threadid]);
}

If you find other inconsistencies, please tell me.

Boofo
02-27-2003, 08:25 AM
Thanks for the fix. I will test it out. Also, can you please put the English version in the first post after Scott updates it? The German version is there now. ;)

Edit: Works like a charm. Thanks!

TranceMaster
02-27-2003, 01:37 PM
seem to be having a new problem
god i suck at doing this :(
sorry guys
anyway hope u can help
what it is now is, the site is running fine, speed is ok, queries ran ok(i think) but when i try 2 attach something my message appers
but no attachment appears and no error message
any ideas?

TranceMaster
02-27-2003, 02:07 PM
ok nevermind
i finally got it all sorted
thanks again guys
appreciate ur help

Boofo
02-27-2003, 02:10 PM
What was causing your problem? Just curious. ;)

s.molinari
02-27-2003, 06:17 PM
Hi,

Newest version is up. This time in english.;)

Scott

TranceMaster
02-28-2003, 01:23 PM
Originally posted by Boofo
What was causing your problem? Just curious. ;)

there seemed to be duplicate hash indexes in the database.
so i just removed them
and everything seemed to work.

TheComputerGuy
03-04-2003, 09:07 PM
great hack.. will install tonight.

gmarik
01-31-2004, 05:36 PM
I hope it works from 2.3 -> 3.0 RC2