Spider/Bot Assault in progress


Pinpoint2RHh
07-17-2003, 11:16 AM
Hello all,

This is my first post to these forums. Usually I read this massive library and find the answers I need. However, yesterday I was looking over my forum and found that a massive spider or bot assault is going on at my website. I have traced the bots' IPs back to inktomisearch.com, which is a search engine that supposedly got bought out by Yahoo.

I have had up to 44 spider/bot visitors at once who are, I assume, "web crawling" my site. I use vbPortal v2.3.0 (Final) and vBulletin v2.3.0.

My question is: how can I dump these little spiders? They have been at my site for 2 days now. Is there a hack to lose these bots, or should I not even bother with it?

Just looking for suggestions or methods really!

Take care all, you guys Rock!
Pinpoint2RHh
http://www.2RHh.com

Oblivion Knight
07-17-2003, 03:05 PM
I've had these bots crawling Umbrella Online since yesterday afternoon..
I wouldn't know why you'd want to get rid of them though, search engine bots crawling more of your site = more "results" located by a search with Yahoo! or other search engines = more traffic to your forums.. ;)

Dan
07-17-2003, 03:36 PM
Search for robots.txt on the net and it will tell you how to stop them.

allan grossman
07-18-2003, 02:03 AM
There are several ways to keep the spiders out if you want to.

You can use a meta tag on any page to keep robots from indexing that page or following any links they find on it.

<meta name="robots" content="noindex, nofollow">

You can place a text file called "robots.txt" in the web root (and only in the web root) that says

User-agent: Slurp
Disallow: /
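
A quick way to sanity-check a robots.txt rule before uploading it is to run it through Python's standard robots.txt parser. This is only a sketch: it assumes the crawler announces itself with the user-agent token Slurp (the name Inktomi's crawler used), and the `allowed` helper and the example.com URL are made up for illustration. Note that robots.txt rules are matched against the crawler's user-agent name, not the hostname it connects from.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str,
            url: str = "http://www.example.com/forum/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `allowed` and the default URL are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rule that shuts out one crawler by its user-agent token (assumed: Slurp)
rules = """\
User-agent: Slurp
Disallow: /
"""

print("Slurp allowed:    ", allowed(rules, "Slurp"))
print("Googlebot allowed:", allowed(rules, "Googlebot"))
```

Keep in mind robots.txt is only a request. Well-behaved crawlers honor it, but nothing forces them to, which is where the server-side blocks come in.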

Or you can use an .htaccess file in any directory you don't want them in, like this:

Order allow,deny
allow from all
deny from inktomisearch.com

Notice there is no space between allow and deny - that's important. You can also use this in httpd.conf if you're running Apache.

But - if you'd prefer not to do DNS lookups for everyone that hits your site you can do it like this -

Order allow,deny
allow from all
deny from 209.131.63

but you'd substitute the subnet the spiders are coming from.
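
If you're not sure which subnet to put there, one rough approach is to tally the visiting IPs by their first three octets and see which prefix dominates. The sketch below does that in Python. The sample IP list is invented for illustration (you would pull the real ones from your access log or the forum's who's-online list), and `prefix24` and `busiest_prefixes` are hypothetical helper names.

```python
from collections import Counter


def prefix24(ip: str) -> str:
    """Return the first three octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def busiest_prefixes(ips, top=3):
    """Count hits per three-octet prefix, most frequent first."""
    return Counter(prefix24(ip) for ip in ips).most_common(top)


# Hypothetical client IPs, standing in for what an access log would show
sample_ips = [
    "209.131.63.10", "209.131.63.11", "209.131.63.45",
    "192.0.2.7", "198.51.100.23",
]

# Print candidate lines in the same form as the .htaccess example
for prefix, hits in busiest_prefixes(sample_ips):
    print(f"deny from {prefix}   # {hits} hits")
```

A three-octet prefix like this blocks every address in that range, so check that the busiest prefix really belongs to the crawler before denying it.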

Pinpoint2RHh
07-18-2003, 03:34 PM
Thanks so much Allan, that's perfect info!

I use Apache, so I think it would be best to just go with the IP block method.

Thanks again. I was also looking over the robots.txt info that Oricon pointed out. (Thanks to you too, Oricon)

Did I mention that you folks rock! :cheeky:

Pinpoint2RHh