View Full Version : Prevent folders being crawled
DemOnstar
03-08-2014, 11:20 AM
I have a test site in a folder within my forum root, but I notice there are 5 Google spiders crawling certain bits of it.
How does one prevent a folder being crawled by any spiders?
Ta..
ozzy47
03-08-2014, 11:48 AM
Do you have a robots.txt file in your forum root?
If not add the following to it:
User-agent: *
Disallow: /forums/MY FOLDER NAME/
If your site is in the public_html folder, remove forums and the slash after it; otherwise change forums to the name of the folder your site is in.
Change MY FOLDER NAME to the name of the folder you want to keep spiders out of.
If you already have a robots.txt file, just add this line to it:
Disallow: /forums/MY FOLDER NAME/
After making the changes above, of course. :)
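Put together (with testsite standing in for the actual folder name), the finished file would look like this:

```
User-agent: *
Disallow: /forums/testsite/
```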
DemOnstar
03-08-2014, 12:36 PM
robots.txt?
Bloody ell, something else to know more about.
Just googled it, just done it. Thanks..
I came across this robots.txt a while ago and completely forgot about it. :o
Ta..
ozzy47
03-08-2014, 12:38 PM
Not a problem, glad to help. Also, since you did not have one in place, you may want to add the following as well, to prevent bots from crawling these pages/folders:
Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php
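If you want to sanity-check rules like these before relying on them, Python's standard library ships a robots.txt parser. A minimal sketch (the rules below are a trimmed copy of the list above, parsed directly rather than fetched from a live site):

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the Disallow list above, fed straight to the
# parser instead of being downloaded from a live site.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admincp/
Disallow: /register.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths matching a Disallow line are reported as not fetchable.
print(rp.can_fetch("*", "/admincp/index.php"))  # False
print(rp.can_fetch("*", "/showthread.php"))     # True
```

Note that compliant crawlers match these rules by path prefix, so Disallow: /admincp/ covers everything under that folder.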
DemOnstar
03-08-2014, 01:20 PM
I ran an online robots.txt generator, and it came up with this..
Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz
User-agent: Baiduspider
Disallow: /
User-agent: *
Somebody advised me to block Baiduspider, and the rest is pretty much the same as what you posted.
It now looks like this.
Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php
Which now has me totally baffled. :D
Cheers for that. :)
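For what it's worth, the file is less baffling than it looks: a crawler obeys only the User-agent group that matches it most specifically, so Baiduspider gets the blanket Disallow: / while every other crawler falls through to the * group, and the Sitemap line applies globally. A sketch of that behaviour using Python's standard urllib.robotparser (trimmed rules, no live fetch):

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the generated file: one group for Baiduspider,
# one catch-all group for every other crawler.
rules = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /admincp/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Baiduspider", "/showthread.php"))  # False: blocked site-wide
print(rp.can_fetch("Googlebot", "/showthread.php"))    # True: only /admincp/ is off-limits
```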
Max Taxable
03-08-2014, 02:15 PM
Keep in mind, robots.txt is a lot like gun control laws - only the law-abiding pay any attention to it. Bad spiders such as Baidu and hundreds of others completely ignore robots.txt. It's not a blocker, it's a list of requests.
To block bad bots, get the "Ban Spiders by User Agent" mod; it is linked in my sig.
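For context, actually blocking a spider is usually done at the web-server level rather than in robots.txt. A rough sketch of the general idea (not the mod itself - this assumes Apache with mod_rewrite enabled, and the bot names are just examples):

```apache
# Deny any request whose User-Agent header matches a listed spider.
# [NC] = case-insensitive match, [F] = respond 403 Forbidden.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|SomeBadBot) [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced by the server, so it works whether or not the bot chooses to behave.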
DemOnstar
03-08-2014, 04:08 PM
Cheers for that..
But in the entire time the site has been up, I have had no spam whatsoever. No spam bot has ever registered.
Must be going on for 2 years now. Nothing.
I am slightly worried.
Max Taxable
03-08-2014, 04:10 PM
Cheers for that..
But in the entire time the site has been up, I have had no spam whatsoever. No spam bot has ever registered.
Must be going on for 2 years now. Nothing.
I am slightly worried.
The Ban Spiders mod isn't really an anti-spam mod per se; it just blocks bad spiders, plus anything else you put on the list. That makes it useful as part of an overall anti-spam defense.
DemOnstar
03-08-2014, 04:35 PM
If I ever get spam, I will consider it. Thanks.
I think my main problem now is ranking, or more precisely, the lack of it.
This is my next adventure into the wilderness.
Lynne
03-08-2014, 06:10 PM
Test sites should be password protected. If they are, then they won't get indexed.
Max Taxable
03-08-2014, 09:44 PM
I completely missed that it was a test site.
vBulletin® v3.8.12 by vBS, Copyright ©2000-2025, vBulletin Solutions Inc.