vb.org Archive


invitezone 07-09-2012 05:42 PM

Yet another ROBOTS.txt file question (sorry)
 
Can someone give me some advice on this please.
I have a public section and a private section of my forum.
I want to allow bots from search engines to crawl and index my public forum in order to get some traffic through it.
But I also want to absolutely deny any bots at all from access and crawling anything else.

Can I specify a robots.txt that says allow 1 forum category and disallow everywhere else?

I have searched but can't find any information on these specifics. Obviously I found lots of robots.txt stuff, but nothing that can be of specific help.

Thanks for your help.

Zachery 07-09-2012 05:55 PM

Nope, that's not what robots.txt is for anyway.

invitezone 07-09-2012 07:08 PM

ok thanks, so how can I achieve what I described?
any help on that?

Thanks

nhawk 07-09-2012 07:22 PM

ACP->Forums & Moderators->Forum Permissions

Change the 'Unregistered/Not Logged In' permissions for forums you don't want viewed to all 'No'.

With that setting, the forums won't even exist to robots or unregistered users.

invitezone 07-09-2012 08:25 PM

Oh, as simple as that. OK, I didn't realise; I'm still new to most of this. So what is the point of robots.txt other than to deny all bots? Thanks

nhawk 07-10-2012 04:34 PM

Robots.txt is good for asking robots not to access things like register.php, search.php, etc.

Note I said 'asking'. Not all robots obey robots.txt.
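
To illustrate the 'asking' part: a well-behaved robot reads robots.txt and checks each URL against it before requesting the page, but nothing forces it to. Below is a minimal sketch using Python's standard urllib.robotparser module. The bot name, the domain and the two Disallow lines are only placeholders for illustration, not a recommended file.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; www.yourdomain.com and ExampleBot are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /register.php
Disallow: /search.php
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if a robot that honours robots.txt would request this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite robot skips the first URL and fetches the second.
print(may_fetch("ExampleBot", "http://www.yourdomain.com/register.php"))
print(may_fetch("ExampleBot", "http://www.yourdomain.com/showthread.php?t=1"))
```

A robot that never runs a check like this simply requests whatever it wants, which is why robots.txt is no substitute for forum permissions.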

invitezone 07-10-2012 08:59 PM

for what reason would you do that? BTW thanks again for your patience and help

nhawk 07-11-2012 08:21 AM

Many pages serve no purpose to a robot, so crawling them is just a waste of bandwidth.

Register.php, login.php, search.php, subscription.php, profile.php are just a few of those types of pages.

Macsee 07-11-2012 11:24 AM

There are threads around with lists of the pages you might want to include in your robots.txt block.

Note that the blocking suggested above blocks not just robots but also unregistered visitors. If that is your intention, fine. If you want to allow everyone to see a particular forum and just don't want that forum's threads appearing in search engines then I don't believe you can do that with vB.

nhawk 07-11-2012 12:53 PM

Quote:

Originally Posted by Macsee (Post 2346953)
Note that the blocking suggested above blocks not just robots but also unregistered visitors. If that is your intention, fine. If you want to allow everyone to see a particular forum and just don't want that forum's threads appearing in search engines then I don't believe you can do that with vB.

I said that. ;)

Macsee 07-11-2012 01:02 PM

You said what, that it can't be done with vB? ;)

In fact, I made that post hoping someone would tell me I was wrong and point to a way one could allow visitors into a forum but prevent that forum's threads from showing up in the SEs.

ForceHSS 07-11-2012 01:08 PM

If the forum is set to private, spiders will not be able to access it anyway.

Macsee 07-11-2012 01:28 PM

Setting it to private blocks unregistered visitors.

I know the logic may sound screwed up, but there is sense in such a setup. For example, where you want to allow links in the sub-forum but want to dissuade people from posting links in there for the SEO benefit.

nhawk 07-11-2012 01:30 PM

Quote:

Originally Posted by Macsee (Post 2346974)
You said what, that it can't be done with vB? ;)

In fact, I made that post hoping someone would tell me I was wrong and point to a way one could allow visitors into a forum but prevent that forum's threads from showing up in the SEs.

I said this part of your first post...

Quote:

Originally Posted by Macsee (Post 2346953)
.....

Note that the blocking suggested above blocks not just robots but also unregistered visitors. If that is your intention, fine....


invitezone 07-11-2012 03:21 PM

OK, all good info, thanks for your time everyone.
I actually want a small section of my forum to be PUBLIC and I want search engines to index it.
I just don't want to have 50 bots killing my bandwidth.

any answer to that?

Thanks a million for your help.

Nichtofen 08-04-2012 05:47 PM

bump...

1. Create a robots.txt to keep spiders away from pages like profiles, search, etc. (This will also help keep profiles and other such pages out of search results.)

2. Create permissions per forum for unregistered visitors as 'nhawk' described, and be careful: depending on configuration, visitors will not be able to view these forums either.

3. If you really wish to control which robots/spiders are accessing your forums and using bandwidth, consider this add-on. I do, however, recommend you read all 350 posts in its thread to guarantee your success. There is a lot to read on user agents, even beyond the resources available within the thread, in order to use this program properly and ensure you are not shutting out spiders that could help your traffic and cause.


Ban Spiders by User Agent
by Simon Lloyd

zascok 08-05-2012 02:31 PM

Here is a start for you. Then cover any robots you don't like by adding a block like this for each one:
User-agent: Name of Robot
Disallow: /

or ban them ^ as described above.

robots.txt
Code:

User-agent: *
Crawl-delay: 10
Disallow: /*.js
Disallow: /clientscript/
Disallow: /customgroupicons/
Disallow: /packages/
Disallow: /signaturepics/
Disallow: /customprofilepics/
Disallow: /store_sitemap/
Disallow: /vb/
Disallow: /cpstyles/
Disallow: /cron.php
Disallow: /customavatars/
Disallow: /includes/
Disallow: /images/
Disallow: /ajax.php
Disallow: /album.php
Disallow: /announcement.php
Disallow: /api.php
Disallow: /apichain.php
Disallow: /asset.php
Disallow: /assetmanage.php
Disallow: /attachment.php
Disallow: /attachment_inlinemod.php
Disallow: /blog_attachment.php
Disallow: /calendar.php
Disallow: /ckeditor.php
Disallow: /clear.gif
Disallow: /converse.php
Disallow: /css.php
Disallow: /editor.php
Disallow: /editpost.php
Disallow: /entry.php
Disallow: /external.php
Disallow: /faq.php
Disallow: /favicon.ico
Disallow: /global.php
Disallow: /group.php
Disallow: /group_inlinemod.php
Disallow: /groupsubscription.php
Disallow: /image.php
Disallow: /infraction.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /LICENSE
Disallow: /list.php
Disallow: /login.php
Disallow: /member.php
Disallow: /member_inlinemod.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /mobile.php
Disallow: /moderation.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /payment_gateway.php
Disallow: /payments.php
Disallow: /picture.php
Disallow: /picture_inlinemod.php
Disallow: /picturecomment.php
Disallow: /poll.php
Disallow: /posthistory.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /threadtag.php
Disallow: /uploadprogress.gif
Disallow: /usercp.php
Disallow: /usernote.php
Disallow: /visitormessage.php
Disallow: /widget.php
Disallow: /xmlsitemap.php
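
If you want to see how a compliant robot would read a file like this, here is a rough sketch using Python's standard urllib.robotparser. It uses a shortened copy of the file plus one per-robot block as described at the top of this post. "BadBot" and www.yourdomain.com are made-up placeholders, and the wildcard line (/*.js) is left out because this particular parser only does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Shortened example: one named robot shut out completely, everyone else
# limited to the usual exclusions. "BadBot" is a hypothetical name.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /search.php
Disallow: /member.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """True if a robot honouring the file may fetch the given path."""
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


print(allowed("BadBot", "/showthread.php?t=1"))     # named robot: nothing allowed
print(allowed("Googlebot", "/showthread.php?t=1"))  # other robots may read threads
print(allowed("Googlebot", "/search.php"))          # but not the listed scripts
print(parser.crawl_delay("Googlebot"))              # delay requested by the * block
```

Keep in mind that Crawl-delay is only a request as well, and not every search engine pays attention to it.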


invitezone 08-05-2012 05:09 PM

thanks a lot for this zascok.
if I dont want to ban any spiders can I just copy and paste this into a txt file and leave it at that?

zascok 08-05-2012 06:12 PM

yup and nope you can't leave it at that you gotta up it into the root of your forum :)

Nichtofen 08-05-2012 06:58 PM

Indeed. The file should be located at www.yourdomain.com/robots.txt, right in your root, assuming that you have that access. That is where Google, as well as other well-behaved bots, will look for it. It gets more complicated if you are on a shared server through a provider that gives you a default address such as www.sharedservercompany.com/yourusername or something similar. In that situation, you would need your host to assist, if they are able.
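
As a rough illustration of why the location matters: a robot works out the robots.txt address from the host name alone, whatever page it started from. The helper below is only a sketch (robots_url is a made-up name), using Python's standard urllib.parse.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address a robot would request for this page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/forums/showthread.php?t=1"))
print(robots_url("http://www.sharedservercompany.com/yourusername/index.php"))
```

In the second case the file a robot asks for belongs to the hosting company, not to you, which is why the host would have to help.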

--------------- Added later (timestamp 1344202148) ---------------

Curiously,
Within your code you do not have a "forums/" prefix on your items. That would be required if your forum was located within a forums folder in the root, correct?

zascok 08-05-2012 07:41 PM

Quote:

Originally Posted by Nichtofen (Post 2354783)
Curiously,
Within your code you do not have a "forums/" prefix on your items. That would be required if your forum was located within a forums folder in the root, correct?


It's all the same, just with /forums in front of each line for the forum itself; the rest is up to what you have in the root. I just don't have anything else but forum :) so it's right in the top.

Code:

Disallow: /forums/*.js
...
...
Disallow: /forums/xmlsitemap.php
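
A quick way to convince yourself that the prefix is needed: Disallow paths are compared against the URL path starting from the site root, so a rule written for the root does not reach into a sub-folder. This is a small sketch with Python's standard urllib.robotparser; the domain and the bot name are placeholders, and blocked is just a helper made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path: str, url_path: str) -> bool:
    """True if 'Disallow: rule_path' stops a compliant robot from fetching url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule_path])
    return not parser.can_fetch("ExampleBot", "http://www.yourdomain.com" + url_path)


# Forum installed in the site root: the rule without a prefix matches.
print(blocked("/xmlsitemap.php", "/xmlsitemap.php"))
# Forum installed in /forums/: the same rule no longer matches...
print(blocked("/xmlsitemap.php", "/forums/xmlsitemap.php"))
# ...so the rule needs the /forums prefix.
print(blocked("/forums/xmlsitemap.php", "/forums/xmlsitemap.php"))
```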


Nichtofen 08-05-2012 07:50 PM

To clarify:
Code:

Disallow: /forums/*.js
...
...
Disallow: /forums/xmlsitemap.php

Does that have the same exact effect as this?
Code:

Disallow: /*.js
...
...
Disallow: /xmlsitemap.php

Just wanted to make sure I understood correctly. I will not bother with changing unless it is necessary. If it automatically seeks a sub-folder on the server anywhere with that name then I will just leave it.

Thanks in advance zascok!

--------------- Added later (timestamp 1344206047) ---------------

Quote:

Originally Posted by zascok (Post 2354795)
I just don't have anything else but forum :) so it's right in the top.

Gotcha, thanks!

invitezone 08-06-2012 12:08 PM

Quote:

Originally Posted by zascok (Post 2354771)
yup and nope you can't leave it at that you gotta up it into the root of your forum :)

hehehe, erm yeah I understand that much :p
I meant if I don't want to ban any bots or spiders from my site, I just want to limit them to the usual stuff, I can just leave your example file as it is, unedited, and upload that to forum root right?

Thanks

Nichtofen 08-06-2012 08:08 PM

That is correct. Put robots.txt into your root with the contents that zascok graciously provided. :)

