vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vB4 General Discussions (https://vborg.vbsupport.ru/forumdisplay.php?f=251)
-   -   Prevent folders being crawled (https://vborg.vbsupport.ru/showthread.php?t=309308)

DemOnstar 03-08-2014 11:20 AM

Prevent folders being crawled
 
I have a test site in a folder within my forum root, but I notice that there are 5 Google spiders crawling certain bits of it.

How does one prevent a folder being crawled by any spiders?

Ta..

ozzy47 03-08-2014 11:48 AM

Do you have a robots.txt file in your forum root?
If not, create one and add the following to it:

Code:

User-agent: *
Disallow: /forums/MY FOLDER NAME/

Remove "forums" and the slash after it if your site is in the public_html folder, or change it to the name of the folder your site is in.

Change "MY FOLDER NAME" to the name of the folder you want to block.

If you already have a robots.txt file, just add this line to it:
Code:

Disallow: /forums/MY FOLDER NAME/
Of course following the above. :)
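
A quick way to check a rule like this before relying on it is to run it through Python's standard `urllib.robotparser`. This is only a minimal sketch: the folder name `testsite` and the helper `is_allowed` are made up for illustration, and it only shows how a well-behaved crawler would read the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "testsite" stands in for MY FOLDER NAME.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/testsite/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="Googlebot"):
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


print(is_allowed("/forums/testsite/index.php"))  # inside the blocked folder
print(is_allowed("/forums/index.php"))           # the rest of the forum
```

The first call should report the path as blocked and the second as allowed, since the rule matches by path prefix.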

DemOnstar 03-08-2014 12:36 PM

robots.txt?
Bloody ell, something else to know more about.
Just googled it, just done it. Thanks..

I came across this robots.txt a while ago and completely forgot about it. :o

Ta..

ozzy47 03-08-2014 12:38 PM

Not a problem, glad to help. Also, since you did not have one in place, you may want to add these lines as well to keep the bots away from these pages/folders:

Code:

Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php

DemOnstar 03-08-2014 01:20 PM

I used an online robots.txt generator, and it came up with this:

Code:

Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz


User-agent: Baiduspider
Disallow: /
User-agent: *

Somebody advised me to add Baiduspider and the rest is pretty much the same as you previously posted.

It now looks like this.

Code:

Sitemap: https://www.xxxxxxxxxxx.com/xxxxxxxx/vbulletin_sitemap_blog_0.xml.gz


User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /activity.php
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php

Which now has me totally baffled. :D

Cheers for that. :)
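
For anyone equally baffled by a file with more than one `User-agent` group, here is a rough sketch of how a compliant parser reads it, again using Python's standard `urllib.robotparser`. The rule list is shortened to a few of the lines above, the sitemap URL is a placeholder on example.com, and the agent name `Googlebot` is just an example of a bot that falls under the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated version of the file above; the sitemap URL is a placeholder.
ROBOTS_TXT = """\
Sitemap: https://www.example.com/forum/vbulletin_sitemap_blog_0.xml.gz

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /admincp/
Disallow: /search.php
Disallow: /usercp.php
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# Baiduspider has its own group, so it is asked to stay out of everything.
baidu_thread = rules.can_fetch("Baiduspider", "/showthread.php")

# Every other bot falls under "*": only the listed paths are off limits.
google_thread = rules.can_fetch("Googlebot", "/showthread.php")
google_search = rules.can_fetch("Googlebot", "/search.php")
google_admin = rules.can_fetch("Googlebot", "/admincp/index.php")

print(baidu_thread, google_thread, google_search, google_admin)
```

Each bot uses the group that names it and falls back to `*` otherwise, so the Baiduspider block does not affect anyone else.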

Max Taxable 03-08-2014 02:15 PM

Keep in mind, robots.txt is a lot like gun control laws: only the law-abiding pay any attention to it. Bad spiders such as Baidu and hundreds of others completely ignore robots.txt. It's not a blocker, it is a list.

To block bad bots, get the "Ban Spiders by User Agent" mod; it is linked in my sig.
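
The general idea of blocking by user agent can be sketched in a few lines of Python. This is not the code of the "Ban Spiders by User Agent" mod, just an illustration of the technique: the ban list, `is_banned` and `status_for` are hypothetical names, and a real setup would do this check in the web server or forum software before serving the page.

```python
# Hypothetical ban list: lowercase substrings to look for in the User-Agent header.
BANNED_AGENT_SUBSTRINGS = ("baiduspider",)


def is_banned(user_agent, banned=BANNED_AGENT_SUBSTRINGS):
    """Return True if the User-Agent header contains any banned substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in banned)


def status_for(user_agent):
    """Pick an HTTP status: 403 Forbidden for banned agents, 200 otherwise."""
    return 403 if is_banned(user_agent) else 200


print(status_for("Mozilla/5.0 (compatible; Baiduspider/2.0)"))
print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Unlike robots.txt, this refuses the request outright, although a bot that fakes its user agent will still get through.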

DemOnstar 03-08-2014 04:08 PM

Cheers for that..
But, in my entire duration, I have had no spam whatsoever. None have registered ever.
Must be going on for 2 years now. Nothing.

I am slightly worried.

Max Taxable 03-08-2014 04:10 PM

Quote:

Originally Posted by DemOnstar (Post 2485920)
Cheers for that..
But, in my entire duration, I have had no spam whatsoever. None have registered ever.
Must be going on for 2 years now. Nothing.

I am slightly worried.

The Ban Spiders mod isn't really an anti-spam mod per se; it just blocks bad spiders and anything else you put on the list. That makes it useful as part of an overall anti-spam battlement.

DemOnstar 03-08-2014 04:35 PM

If I ever get spam, I will consider it. Thanks.

I think my main problem now is ranking, or more precisely, the lack of it.
This is my next adventure into the wilderness.

Lynne 03-08-2014 06:10 PM

Test sites should be password protected. If they are, then they won't get indexed.

Max Taxable 03-08-2014 09:44 PM

I completely missed that it was a test site.


All times are GMT.