View Full Version : Limit Spidering to ONLY the archive


Limey-YMR
07-07-2004, 01:57 AM
I help run a car club forum. Membership is more or less by association of car colour, so we aren't too bothered about attracting lots of new (and potentially non-same-car-owning) members.

I was wondering if it was possible, via .htaccess, a hack, or server configuration, to make it so that spiders (mainly MSN, Google and Inktomi/Yahoo) could only "see" the Archive.

We are on 1and1 hosting (100MB MySQL limit :( ) and today our web server was blocked from the database server for exactly two hours (to the minute). I believe this was their own firewall blocking us, possibly due to "over spidering", or our forum was just plain too busy for 1and1's sensitive firewall / IDS rules. Whatever the cause of the block, I would like to see if someone knows of an efficient way of herding the spiders :)
so I can spend my time on admin and installing hacks, and not have to learn how to code them or write .htaccess files.

Any help is greatly appreciated.

you know what they say - you can lead a spider to an archive, but you can't make it index.

AN-net
07-07-2004, 02:07 AM
um, your best bet would be to use a robots.txt file and just list the forum files you don't want them to visit :)
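
A minimal sketch of what such a file might look like, checked with Python's standard `urllib.robotparser`. The `/forum/` path and the script names are assumptions for illustration (typical vBulletin file names), not details taken from this board.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /forum/ prefix and the script names
# are assumed for illustration and must match the real install.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/index.php
Disallow: /forum/showthread.php
Disallow: /forum/forumdisplay.php
Disallow: /forum/member.php
Disallow: /forum/search.php
"""

# Parse the rules from the string instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are prefix matches, so a thread URL with a query string is covered.
blocked = parser.can_fetch("Googlebot", "/forum/showthread.php?t=123")
# Nothing above mentions the archive, so it stays crawlable.
archive_ok = parser.can_fetch("Googlebot", "/forum/archive/index.php")

print("showthread.php allowed:", blocked)
print("archive allowed:", archive_ok)
```

Keep in mind that robots.txt is only a request: well-behaved spiders honour it, but nothing enforces it.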

Limey-YMR
07-07-2004, 02:37 AM
AN-net wrote: "um, your best bet would be to use a robots.txt file and just list the forum files you don't want them to visit :)"

I thought a robots.txt file was for denying the robot altogether. I don't think it's selective; that would be .htaccess.
If anyone knows the .htaccess syntax, that would be cool.

EDIT: my mistake, a bit of RTFM unearthed this handy resource for blocking spiders from specific resources
http://www.chami.com/tips/internet/010198I.html

Limey-YMR
07-13-2004, 03:55 AM
robots.txt wouldn't really limit them to seeing *only* the archive. If I deny them index.php, then since the archive is linked from there, they wouldn't spider anything. I want to make the board look like it's just an archive to a spider, automagically.
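
One possible approach, sketched here under assumptions, is to allow the archive path and disallow everything else. `Allow` was not part of the original robots.txt convention, although the major crawlers support it; the `/forum/archive/` path and the example URLs are hypothetical. Python's `urllib.robotparser` applies the first matching rule, so `Allow` is listed first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical archive-only policy. The /forum/archive/ path is assumed;
# Allow comes first because this parser uses first-match ordering.
ARCHIVE_ONLY = """\
User-agent: *
Allow: /forum/archive/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ARCHIVE_ONLY.splitlines())

# Assumed vBulletin-style archive URL: should remain crawlable.
archive_allowed = parser.can_fetch("msnbot", "/forum/archive/index.php/t-123.html")
# The main board entry point: should be off limits.
index_allowed = parser.can_fetch("msnbot", "/forum/index.php")

print("archive allowed:", archive_allowed)
print("index.php allowed:", index_allowed)
```

This does not solve the discovery problem described above: if the only link to the archive is on index.php, a spider that obeys these rules needs some other way to find an archive URL.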

Does anyone know if this is even a moot point? I've noticed that the spiders act differently these days, and I've read that vBulletin 3 no longer uses session IDs (cookie-based only?)

I've switched the archive off, since the spiders were pretty much ignoring it and it was probably causing them to hang around longer. I'm not so bothered about search engine exposure as I am about 8 spiders hammering the site at once!
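
If the real problem is load rather than exposure, a `Crawl-delay` line in robots.txt may be worth trying. It is a non-standard directive: Yahoo's Slurp and msnbot have documented support for it, while Googlebot ignores it. The 20-second value below is an arbitrary assumption, and the sketch only shows how the rules parse, using Python's `urllib.robotparser` (`crawl_delay` needs Python 3.6 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical throttling rules; the 20-second delay is an arbitrary example.
THROTTLE = """\
User-agent: Slurp
Crawl-delay: 20

User-agent: msnbot
Crawl-delay: 20

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(THROTTLE.splitlines())

# Delay (in seconds) requested from each named spider.
slurp_delay = parser.crawl_delay("Slurp")
msnbot_delay = parser.crawl_delay("msnbot")
# No delay is declared in the catch-all group, so this is None.
other_delay = parser.crawl_delay("Googlebot")

print("Slurp:", slurp_delay, "msnbot:", msnbot_delay, "Googlebot:", other_delay)
```

Whether a given spider actually slows down is up to that spider; this only asks politely.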