View Full Version : google spiders 60% of my threads


SkuZZy
11-29-2002, 05:24 AM
This hack has been released!

Click here: https://vborg.vbsupport.ru/showthread.php?s=&threadid=47087

woohoo :D

After trying the other two known spider scripts out there (overgrown's hack and fastforward's) and having no luck with them, I decided to try Xenon's. Back in mid-October I added the "beta" archive script (made by Xenon, who is awesome) to my forums, and just today Google finally added many of my threads, about 2000, to their engine! FINALLY!!!

http://www2.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=battleforums+site%3Awww.battleforums.com

I might consider releasing the hack (Xenon wants me to) if there is enough interest in it... or at least release it in beta form. However, with the release of vb3.0 so close, I'm not sure it would be worth the effort to release it as a full hack... so maybe just release it as a beta the way it is (as seen at http://www.battleforums.com/history) for those who are going to stay with vb2.

Any interest for this to be released? (edit: I've released it, visit https://vborg.vbsupport.ru/showthread.php?s=&threadid=47087)

Smoothie
11-29-2002, 07:00 AM
yep. :)

thomas
11-29-2002, 07:23 AM
I'm definitely interested! :)

GTGT
11-29-2002, 10:35 AM
Hell yes!

Is it the best google hack out there?

SkuZZy
11-29-2002, 10:43 AM
Originally posted by GTGT
Hell yes!

Is it the best google hack out there?

Well, as far as getting threads spidered, yes. It will put all your threads in plain .html and it draws directly from the database, so it's instant. There are some requirements for it though, and it is lacking a few features. For instance, it requires mod_rewrite if you want the files to be .html, and there is no feature in it right now to block out private forums selectively. I could probably add those small features to it though, just wondering if it's worth it since 2.2.9 is supposedly the last version of the 2.x.x series... and vb3 will probably be out in a few weeks. Which is why I started this thread, to see how much interest there is for this hack..... ;)

Xenon
11-29-2002, 11:19 AM
well, there will be many users who won't upgrade quickly, so it's worth releasing :)

thanx for the kind words :)

Logician
11-29-2002, 11:22 AM
If Stefan coded it, I'm sure it's great. :classic:

I'm just wondering about its algorithm: does it compile a .html page for every thread separately and save it on the server, or does it create the html page on the fly whenever it's requested (without the page physically existing on the server)?

Whichever way it works, I believe you or Stefan should release it. It will definitely prove useful for many..

Xenon
11-29-2002, 11:44 AM
well, the html file part is skuzzy's work, my script just created URLs free of & and ?....

Velocd
11-29-2002, 02:39 PM
Skuzzy forgot to mention that he is still using some of the code from either overgrown's or fastforward's hack, which is why it's producing the .html extensions, and which is why search engines are picking up on it better. The visual appearance and layout of his Archive Hack is much different from the original, not to mention that the original purpose of the first archive hack wasn't to allow easier spidering for search engines.. but to reduce file size in the database.

Anyway, Skuzzy please if you could share the method for doing this as Logician mentioned. It would be of great help :p

SkuZZy
11-30-2002, 03:10 PM
Originally posted by Velocd
Skuzzy forgot to mention that he is still using some of the code from either overgrown's or fastforward's hack, which is why it's producing the .html extensions, and which is why search engines are picking up on it better. The visual appearance and layout of his Archive Hack is much different from the original, not to mention that the original purpose of the first archive hack wasn't to allow easier spidering for search engines.. but to reduce file size in the database.

Anyway, Skuzzy please if you could share the method for doing this as Logician mentioned. It would be of great help :p

I'm not using any code from overgrown's hack. Overgrown's hack never made .html extensions, just directories... but it used 404 errors to do them, which search engines don't like. With these scripts, I simply used mod_rewrite to do the .html part. It was pretty easy and it's all done server-side, so Google can't tell the difference. The .html files don't actually exist though. About fastforward's hack, I'm not using any of his code, but it's the same basic idea. The only difference is, these archive scripts require no hacking of the board. It shouldn't even really be considered a hack, just an "addon". It's as simple as uploading the 3 scripts to a sub-directory and changing a few variables (you need to have mod_rewrite enabled though). With fastforward's hack (not that I have anything against it, it's a good hack) you needed to modify a lot of templates and it gets pretty messy. Plus, with vb3 around the corner, it is silly to hack a board up when upgrading will soon be here ;)
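To illustrate the idea of serving .html URLs that do not exist on disk, here is a minimal Python sketch of the kind of mapping a rewrite rule performs. The URL layout, the thread.php script name and the forumid/threadid parameter names are assumptions made for illustration only; they are not the actual rules shipped with the archive scripts.

```python
import re

# Hypothetical rules: a static-looking path on the left is mapped to the
# dynamic script URL on the right. Names and layout are illustrative assumptions.
REWRITE_RULES = [
    (re.compile(r"^/archive/forum/(\d+)\.html$"), r"/archive/topics.php?forumid=\1"),
    (re.compile(r"^/archive/thread/(\d+)\.html$"), r"/archive/thread.php?threadid=\1"),
]


def rewrite(path):
    """Return the internal dynamic URL for a static-looking path.

    Paths that match no rule are returned unchanged, which mirrors how a
    rewrite engine leaves unrelated requests alone.
    """
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


if __name__ == "__main__":
    print(rewrite("/archive/thread/123.html"))
    print(rewrite("/index.php"))
```

The visitor (or spider) only ever sees the .html address; the translation to the script URL happens on the server, which is why no real .html file is needed.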

But anyways, I've made up my mind... I will be releasing this hack. I'm going to add a few variables to it and clean it up a bit, and should release it over the weekend if I get some time. I want to state once again though, this is Xenon's hack and he is the mastermind behind it, I've just spiced it up a bit.

So look forward to seeing this "addon" soon.

Thanks,
SkuZZy

Logician
11-30-2002, 04:16 PM
Originally posted by SkuZZy
The .html files don't actually exist though.
I was hoping so. This is the wise algorithm. I'm not sure if my server supports the feature, but if it does, I'm definitely interested in this hack.. Thx to you both for coding and sharing..

Chruser
12-01-2002, 01:56 AM
An excellent hack, if I may add. I just hope it is as good as it sounds. :)

KelteN
12-01-2002, 03:33 AM
Release it! Release it! :D

SkuZZy
12-01-2002, 12:11 PM
w00t! ok, the script is all done... nothing special, but definitely cleaned up. The best part about it is how easy it is to set up: just upload all the files to a directory, change 3 variables and that's it. There are also header/footer files so you can control background color, etc.

Anyways, I need a couple of people to beta test it for me. If you're interested, send me a PM here on vb.org or contact me on AOL @ acidrush16. I need about 3.

Thanks,
SkuZZy

ArunanS
12-01-2002, 12:44 PM
My archive looks like vB3 :p

I have yet to modify it to use html :)
http://www.noxmedia.net/support/archive

If there are any copyrights to the archive look, please tell me now so I may remove it :)

Also click on News and Announcements, because that is the only forum I have been testing my modifications and posts in :)

SkuZZy
12-01-2002, 01:00 PM
Originally posted by ArunanS
My archive looks like vB3 :p

I have yet to modify it to use html :)
http://www.noxmedia.net/support/archive

If there are any copyrights to the archive look, please tell me now so I may remove it :)

Also click on News and Announcements, because that is the only forum I have been testing my modifications and posts in :)

Looks nice man. Has Google spidered it yet? I noticed the thread times (they say "NaN" on your site) load slowly like vBulletin 3.0 does. I sincerely hope Jelsoft fixes that. Some larger archives (with 300+ threads) can take literally 45 seconds to completely load, because the times load one at a time. Am I the only one experiencing this?

ArunanS
12-01-2002, 01:13 PM
No... Google hasn't yet, I have yet to make the archive work efficiently. I also haven't put in the modification to make the html files :)

Erwin
12-01-2002, 09:33 PM
Just using a modified extended version of fastforward's hack (that completely makes all URLs .html) I've got 35,000 threads in Google, and it's still rising. :D

Velocd
12-02-2002, 04:43 AM
I've been considering this rewrite thing to eliminate sessionhash and add .html so Google and such can index efficiently, but there are many hacks out there for this purpose.

I'm probably going to go with the one Filburt has posted at vBulletin.com as it requires little hacking, and fastforward's didn't work too well.

What is your version of this Erwin? Is it a hack that could be released, or is it integrated very deeply into your site that to prepare a hack for it would be too troublesome?

I took a quick browse through your forums Erwin, and you did a great job with the rewrite, and how it's all organized ;)

subduck
12-02-2002, 06:42 AM
I'm not upgrading to vb3!
So please release it for 2.8 users!

Thanks :)

pimpingfools
12-02-2002, 07:34 AM
I second that last post..

Erwin
12-02-2002, 09:03 PM
Originally posted by Velocd
What is your version of this Erwin? Is it a hack that could be released, or is it integrated very deeply into your site that to prepare a hack for it would be too troublesome?

I took a quick browse through your forums Erwin, and you did a great job with the rewrite, and how it's all organized ;)

Essentially, fastforward's hack, but applied to every page in every thread, also to memberlist, avatarlist, search pages and changed most PHP files so that sessionhash and dynamic URLs are eliminated except for latest posts etc. - too extensive to release. I don't really look forward to vB3 when I have to recode all the files again. :)

Thanks for the compliment too.

indiamike
12-02-2002, 09:09 PM
Though it's a little off topic...and I just want to chime in a little bit here....
I use vbportal and Google spiders my site almost every day (usually 20 robots at a time), sometimes twice a day. It doesn't list all the posts in the search engine; at one time it did but dropped a bunch. However, the big pitfall is that Google uses up around one gig of my bandwidth a month. I have a small site and that is still way too high.

I may start blocking Google just because of this.... just a little warning if you're inviting Google.

Cheers Anyway
Mike

NTLDR
12-02-2002, 09:12 PM
Perhaps you should create a non-graphical style that is used when the googlebot visits? I'm sure this wouldn't be too hard to hack in.
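A rough Python sketch of that suggestion: look at the User-Agent header and pick a lightweight style for spiders. The style names "default" and "text_only" and the token list are hypothetical, and a User-Agent string can be faked, so treat this as an illustration of the idea rather than a reliable spider check.

```python
def is_search_spider(user_agent, spider_tokens=("googlebot",)):
    """Return True if the User-Agent string contains a known spider token.

    The token list is an assumption for illustration; extend it as needed.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in spider_tokens)


def pick_style(user_agent, default_style="default", spider_style="text_only"):
    """Choose the (hypothetical) non-graphical style for spiders, else the normal one."""
    return spider_style if is_search_spider(user_agent) else default_style


if __name__ == "__main__":
    print(pick_style("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
    print(pick_style("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```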

SkuZZy
12-02-2002, 09:42 PM
Originally posted by indiamike
Though it's a little off topic...and I just want to chime in a little bit here....
I use vbportal and Google spiders my site almost every day (usually 20 robots at a time), sometimes twice a day. It doesn't list all the posts in the search engine; at one time it did but dropped a bunch. However, the big pitfall is that Google uses up around one gig of my bandwidth a month. I have a small site and that is still way too high.

I may start blocking Google just because of this.... just a little warning if you're inviting Google.

Cheers Anyway
Mike

Good point being made here. I can see how some people might be concerned with this. I should point out, the archive scripts I'm releasing are very, very small in size the way they come. Of course, if you add images and styles into them, then they get much bigger. But using them the way they are, the bandwidth used will be virtually unnoticeable. When it comes to getting Google to archive your posts, the simpler = the better ;)

Erwin
12-02-2002, 10:14 PM
You can use robots.txt to block Google or any other spiders spidering specific sections of your site or certain files like .gif etc.
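As a hedged illustration, the Python sketch below feeds an example robots.txt to the standard library's urllib.robotparser and checks which URLs a spider may fetch. The directory names and the example.com host are made up for the example. Blocking by file type such as .gif relies on wildcard patterns that not every parser understands (Python's does not), so the sketch blocks a whole image directory instead.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the paths are hypothetical and should be adapted to the site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /forums/images/
Disallow: /forums/private/

User-agent: *
Disallow:
"""


def build_parser(robots_text):
    """Parse robots.txt text and return a RobotFileParser ready for queries."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "http://www.example.com/forums/archive/index.html",
        "http://www.example.com/forums/images/logo.gif",
    ):
        print(url, parser.can_fetch("Googlebot", url))
```

Keep in mind that robots.txt is only a request: well-behaved spiders honour it, but it does not enforce anything on the server.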

Smoothie
12-03-2002, 01:21 AM
Originally posted by SkuZZy
and there is no feature in it right now to block out private forums selectively. I could probably add those small features to it though

SkuZZy- This is something I need to have before using this. I only have one private forum, but it's the mod forum. Some pretty heavy stuff goes on there. I would need to block this forum.

Also, in your instructions:
"Add a link to the archive on your main forums page, otherwise google won't know it exists and therefore won't spider it!"
What's the best way to do this? Just a plain old link on forumhome? Then other members will be able to view it?

SkuZZy
12-03-2002, 01:24 AM
Originally posted by Smoothie
SkuZZy- This is something I need to have before using this. I only have one private forum, but it's the mod forum. Some pretty heavy stuff goes on there. I would need to block this forum.

Also, in your instructions; What's the best way to do this? Just a plain old link on forumhome? Then other members will be able to view it?

I'll see about adding something to block out private forums, shouldn't be too hard. About the link, just add a text link to it at the bottom of your forums or whatever. You don't need to do this, you could just add the site to google (http://www.google.com/addurl.html) but linking it on your site is the best and fastest way to get it spidered... plus it will inherit your "PR" which will help the pages rank higher.

DrkFusion
12-03-2002, 01:38 AM
Doesn't Xenon's hack block all private forums to guests and normal members?

Smoothie, just add a link at the bottom :)

Smoothie
12-03-2002, 01:55 AM
Ok, probably a dumb question, but after I add the link to my forums, what link do I submit to google?

Erwin
12-03-2002, 01:58 AM
Add your homepage = google will then spider all the links on your site it can spider.

SkuZZy
12-03-2002, 02:01 AM
Originally posted by DrkFusion
Doesn't Xenon's hack block all private forums to guests and normal members?

Smoothie, just add a link at the bottom :)

yeah it does already block access to private forums, but not selectively... so if you want a private forum to be spidered, you're out of luck. For now I don't think it needs to be added (selectivity)... ;)

Smoothie
12-03-2002, 02:06 AM
SkuZZy-

Would it be possible to exclude all private forums from displaying?

fello9
12-03-2002, 03:45 AM
I WANT IT!!! I WANT IT!!! I WANT IT!!!

Please release it ASAP!

Thank you!

Velocd
12-03-2002, 04:32 AM
Originally posted by Erwin
You can use robots.txt to block Google or any other spiders spidering specific sections of your site or certain files like .gif etc.

Funny this was mentioned, because when looking into Filburt's tutorial for creating friendly URLs, I also came across a thread by MarkB asking how to stop webcrawlers from consuming so much bandwidth.

Take a look at the following thread for more solutions, in regard to also using robots.txt:
http://www.vbulletin.com/forum/showthread.php?threadid=44966

I installed Filburt's hack as well today, and it was incredibly easy to set up compared to the troubles I had with fastforward's hack. If anyone is interested in Filburt's, here is the thread:
http://www.vbulletin.com/forum/showthread.php?threadid=56783

SkuZZy
12-03-2002, 12:38 PM
Originally posted by Smoothie
SkuZZy-

Would it be possible to exclude all private forums from displaying?

Yes, that is built in.

Xenon
12-03-2002, 03:43 PM
I don't understand why you want private forums to be spidered but not shown to users, bit confusing to me ;)

@Smoothie: The permissions are the same as you have in your normal board, so being logged in as admin will show you all private forums in the archive as well...

Smoothie
12-03-2002, 05:49 PM
I checked, and mod_rewrite is installed. How do I see if its enabled?

Smoothie
12-03-2002, 05:53 PM
SkuZZy-

I tried your script, but I'm getting a no-permissions error.

SkuZZy
12-03-2002, 06:40 PM
Originally posted by Smoothie
SkuZZy-

I tried your script, but I'm getting a no-permissions error.

What are you trying to do? You want your private forums to be spidered? Or you don't want them to be spidered? Private forums will show up, but when you click on them, access will be denied (unless you're logged into an account that has access to view them).

Smoothie
12-03-2002, 06:53 PM
Here's what I did. I edited the config file, made the new directory, uploaded the files. When I run the url in my browser, I get a no permissions error.

DrkFusion
12-03-2002, 07:23 PM
What do you mean permission error?
If you are logged in as Admin you see all forums, just like the normal board, if you allow viewing of it to guests, I believe guests can view the archive etc. Its just like your forums, without the layout, and in a form that the spider can 'eat' it :)

SkuZZy
12-03-2002, 08:09 PM
Originally posted by Smoothie
Here's what I did. I edited the config file, made the new directory, uploaded the files. When I run the url in my browser, I get a no permissions error.

Do you have "Guests Viewing" off? If so, then this archive won't work for you, since Google won't be able to spider it.

Smoothie
12-03-2002, 08:40 PM
No, I don't have guests viewing off. here's my link:
http://www.macfora.com/forums/archive

SkuZZy
12-03-2002, 09:16 PM
Originally posted by Smoothie
No, I don't have guests viewing off. here's my link:
http://www.macfora.com/forums/archive

It's a mod_rewrite problem. You'll need to append the 3 lines in the .htaccess file to your ROOT .htaccess file. The problem is, there is a .htaccess file in a directory below /forums/archive/ and it's over-ruling the commands. I forgot to add this info in readme.

Smoothie
12-03-2002, 09:19 PM
how do i do that?

Smoothie
12-03-2002, 09:21 PM
what directory would it be in? Inside my forums directory?

Erwin
12-03-2002, 09:30 PM
He means the root directory. As high as you can go on your server.

DrkFusion
12-03-2002, 09:51 PM
This is sometimes also your httpd.conf

If you are on ensim, I believe HTTPD.Conf can be only accessed by the admin, and <virtualblah> won't work in htaccess.

Chruser
12-04-2002, 12:51 PM
www.zelaron.com/forums/threads

So much luck with that. :p
By the way, skuz, read your PM. :)

Erwin
12-05-2002, 01:54 AM
Chruser,

To download hacks or receive support you will need to go to this (http://www.vbulletin.com/members/forums.php) page and enter your email address, to show you are licensed. (you will need to use your customer number and password to access that page)

Thank you.

Chruser
12-05-2002, 12:18 PM
There are always downsides with changing your email, people tell me all the time. At least I got that proved now. :D

Erwin
12-06-2002, 02:48 AM
LOL!!!

Chruser
12-06-2002, 06:55 PM
Pfft. :P
Either way, the hack simply refuses to work, as it gives me a white page (find the link above). Of course, the hack has proved to work for others, so it's only human error, but since I'm too proud to admit that I'm a stupid fool, please help me out, so I can still have my ego.

thomas
12-10-2002, 08:01 AM
I'm a bit confused... has this hack been released or does this thread just deal with Xenon's hack?

Thanks for enlightening me. :)

Xenon
12-10-2002, 03:58 PM
The hack was never released by me, but SkuzzY plans to finish it and release it :)

boatdesign
12-11-2002, 02:16 PM
I really want this hack as well!

flup
12-11-2002, 03:12 PM
LoL WIcked

Chruser
12-11-2002, 05:23 PM
I've beta tested it, and it seems to work for everyone but me. SkuZZ even helped me out with no luck, so the problem has to be server related. It's just showing a white screen. :(

DrkFusion
12-17-2002, 08:29 PM
Hey guys
check out my archive, with just pure php I have made it work like mod_rewrite
http://www.noxmedia.net/support/archive/index.html
Note the
(http://www.noxmedia.net/support/archive/forum/3.html)
Wee :) Now it's time for me to go back and see what I modified, I modified quite a bit of stuff.
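How this was done is not described in the post, so the following Python sketch only shows one common way a script can imitate mod_rewrite on its own: read the extra path that follows the script name and work out what was requested. The "thread" path type and the function name are assumptions for illustration; only the /forum/3.html shape comes from the link above.

```python
def parse_archive_path(path_info):
    """Split a path such as '/forum/3.html' into ('forum', 3).

    Returns None when the path does not have the expected shape, so the
    caller can fall back to the archive index.
    """
    parts = [p for p in path_info.split("/") if p]
    if len(parts) != 2 or not parts[1].endswith(".html"):
        return None
    kind = parts[0]
    name = parts[1][: -len(".html")]
    # 'thread' is a hypothetical second path type added for the example.
    if kind not in ("forum", "thread"):
        return None
    if not (name.isascii() and name.isdigit()):
        return None
    return kind, int(name)


if __name__ == "__main__":
    print(parse_archive_path("/forum/3.html"))
    print(parse_archive_path("/index.html"))
```

The script then looks up the forum or thread by the parsed ID, exactly as it would have done with a query-string parameter.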

Smoothie
12-17-2002, 10:13 PM
Hey, let me know when you can release this. I want it. :)

DrkFusion
12-18-2002, 12:31 AM
I can't release it, because Xenon made it, but if he decides to give it to you, being his friend and all, I will be glad to share my modification :)

Xenon
12-18-2002, 04:03 PM
Arunan, you can talk to SkuzzY to share the modifications with him.
He has my ok to release the hack once it's finished, so i'm sure you can work on it together to optimize it and release it afterwards

JustAskJulie
12-18-2002, 05:27 PM
I would definately be interested in this hack when you release it.

cklaszlo
12-19-2002, 02:02 PM
Looking good. I have to say that it's been very frustrating that our boards are not getting indexed. For that matter, only 68 pages on our main site are indexed at Google. We have a massive database of roller coasters and amusement parks and nada. Not a one is indexed. Very upsetting. I hope this hack works.

Our board stats: Total Threads: 21,051 | Total Posts: 313,234

So you can see we are fairly active. :)

Thanks

DrkFusion
12-19-2002, 07:15 PM
I see.... cklaszlo... I would suggest getting help from Erwin... his site is listed as 1 and 2 in Google for his keywords. He and Filburt know a lot about this stuff.

corsacrazy
12-19-2002, 07:37 PM
any news on the release of this google spider?

Xenon
12-19-2002, 10:17 PM
hmm, i don't have news, waiting for Skuzzy :)

please change your sig image, you are beyond the 300 pixels wide limit

ptm
12-20-2002, 04:17 PM
Please do release this hack!! I'm looking forward to it. :D

cklaszlo
12-26-2002, 12:52 PM
Thanks DrkFusion! I'll give them a shout. Although this hack might work for us.

SpeedStreet
12-26-2002, 02:03 PM
I am also eagerly awaiting this hack :)

bharvey42
12-26-2002, 02:24 PM
really interested in the hack

DrkFusion
12-26-2002, 02:50 PM
I am interested too :-p

You should give Skuzzy a shout... maybe he forgot, he does run a pretty busy Blizzard games forum.

Regards

cklaszlo
12-26-2002, 02:59 PM
Thanks for the Hack!

http://www.thrillnetwork.com/boards/archive/

One question (I know it's a repeat, but I must be going blind, LOL): how can I hide our private forums? Or is it a cookie issue, and since I have the correct permission I see the private forums?

How do I put a black border around the list so it fits our site look?

Also, FYI.

in topics.php there is a typo:

On line 56: include "$archiveurl/footer.php";

should read:

include "$archiveurl/header.php";


Thanks

mattcary
12-26-2002, 03:58 PM
Skuzzy, please let me know when you release this new hack of yours. I'm very excited about it! My email: mattcary@mindspring.com

Regards,

Matt

partang2
12-27-2002, 07:00 AM
Hm.... the first post in this thread was made 11-29-02 and still no release...??

Xenon
12-28-2002, 09:16 PM
@cklaszlo: if you are logged in, you can see the private forums, if you log out, you can't see privates (or you set the wrong permissions in your ACP :P)

@Arunan: i have as much contact to Skuzzy as you have, so all i can do is wait ;)

SkuZZy
12-29-2002, 01:08 AM
I released it: https://vborg.vbsupport.ru/showthread.php?s=&threadid=47087

Sorry for the wait everyone. I hope it was worth it ;)

SkuZZy

SpeedStreet
12-29-2002, 03:15 AM
Thanks for Replying Skuzzy! I will be installing this first thing Monday!

SloppyGoat
12-29-2002, 04:59 AM
Ok, a few important questions.

What is mod_rewrite?

Does it work on W2K/IIS servers?

Is there a way to do this on W2K/IIS servers?