
Prevent folders being crawled


DemOnstar
03-08-2014, 11:20 AM
I have a test site in a folder within my forum root, but I notice there are 5 Google spiders crawling certain bits of it.

How does one prevent a folder being crawled by any spiders?

Ta..

ozzy47
03-08-2014, 11:48 AM
Do you have a robots.txt file in your forum root?
If not, create one there with the following in it:

User-agent: *
Disallow: /forums/MY FOLDER NAME/

If your forum sits directly in the public_html folder, remove forums and the slash after it; otherwise, change forums to the name of the folder your forum actually lives in.

Change MY FOLDER NAME to the name of the folder you want to keep spiders out of.

If you already have a robots.txt file, just add this line to it:
Disallow: /forums/MY FOLDER NAME/

Of course following the above. :)
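
For instance, assuming the forum lives in a /forums subfolder and the test site sits in a folder named testsite (a made-up name for illustration), the finished file would read:

```
User-agent: *
Disallow: /forums/testsite/
```

Crawlers that honour robots.txt will then skip everything under /forums/testsite/ while still crawling the rest of the forum.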

DemOnstar
03-08-2014, 12:36 PM
robots.txt?
Bloody ell, something else to know more about.
Just googled it, just done it. Thanks..

I came across this robots.txt a while ago and completely forgot about it. :o

Ta..

ozzy47
03-08-2014, 12:38 PM
Not a problem, glad to help. Since you didn't have a robots.txt in place before, you may also want to add the following lines to it, to keep bots out of these pages/folders:

Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php

DemOnstar
03-08-2014, 01:20 PM
I ran an online robots.txt generating script, and it came up with this:


Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz


User-agent: Baiduspider
Disallow: /
User-agent: *

Somebody advised me to block Baiduspider, and the rest is pretty much the same as what you previously posted.

It now looks like this.


Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz


User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php

Which now has me totally baffled. :D

Cheers for that. :)

Max Taxable
03-08-2014, 02:15 PM
Keep in mind, robots.txt is a lot like gun control laws: only the law-abiding pay any attention to it. Bad spiders such as Baidu and hundreds of others completely ignore robots.txt. It's not a blocker, it's a list.

To block bad bots, get the "Ban Spiders by User Agent" mod; it's linked in my sig.
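
For anyone curious what enforced blocking looks like under the hood, here's a minimal sketch in Apache .htaccess terms. This assumes Apache with mod_setenvif enabled, and the bot names are just examples; the mod itself works inside vBulletin rather than at the server level:

```apache
# Tag requests whose User-Agent matches a known bad bot (example names only).
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot"  bad_bot

# Refuse tagged requests outright -- unlike robots.txt, this is enforced.
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
```

The difference from robots.txt is that the server refuses the request entirely, whether or not the bot chooses to behave.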

DemOnstar
03-08-2014, 04:08 PM
Cheers for that..
But, in my entire duration, I have had no spam whatsoever. None have registered ever.
Must be going on for 2 years now. Nothing.

I am slightly worried.

Max Taxable
03-08-2014, 04:10 PM
Cheers for that..
But, in my entire duration, I have had no spam whatsoever. None have registered ever.
Must be going on for 2 years now. Nothing.

I am slightly worried.

The Ban Spiders mod isn't really an anti-spam mod per se; it just blocks bad spiders, plus anything else you put on the list. That makes it useful as part of an overall anti-spam defense.

DemOnstar
03-08-2014, 04:35 PM
If I ever get spam, I will consider it. Thanks.

I think my main problem now is ranking, or more precisely, the lack of it.
This is my next adventure into the wilderness.

Lynne
03-08-2014, 06:10 PM
Test sites should be password protected. If they are, then they won't get indexed.
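
For example, on an Apache server you could drop an .htaccess file into the test folder; the paths and names below are placeholders, so adjust them for your own account:

```apache
# .htaccess in the test folder: require a login before serving anything.
AuthType Basic
AuthName "Test Site"
# Keep the password file outside the web root; create it with:
#   htpasswd -c /home/youraccount/.htpasswd yourname
AuthUserFile /home/youraccount/.htpasswd
Require valid-user
```

Spiders can't log in, so nothing behind the prompt gets indexed.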

Max Taxable
03-08-2014, 09:44 PM
I completely missed that it was a test site.