View Full Version : robots.txt for 3.8.2 - Any Ideas?
vbplusme
04-23-2009, 02:37 AM
Hello and Greetings,
I have just noticed that Google Webmaster Tools is complaining about a LOT of URLs being restricted by my robots.txt file.
Is anyone else having this problem? If not, can I get an example of a robots.txt written for 3.8.2?
I tweaked mine thinking that I was fixing a duplicate content problem, but I apparently crossed the line somewhere :D
Any ideas, suggestions greatly appreciated.
TIA
Dismounted
04-23-2009, 05:23 AM
What is it currently?
vbplusme
04-23-2009, 07:20 AM
Sorry, should have thought to post it:
User-agent: *
#Crawl-Delay: 10
Disallow: /admincp/
Disallow: /ajax.php
Disallow: /announcement.php
Disallow: /archive/
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cgi-bin/
Disallow: /chat/
Disallow: /clientscript/
Disallow: /converse.php
Disallow: /cpstyles/
Disallow: /cron.php
Disallow: /customavatars/
Disallow: /customgroupicons/
Disallow: /customprofilepics/
Disallow: /editpost.php
Disallow: /faq.php
Disallow: /forumdisplay.php?daysprune
Disallow: /forumdisplay.php?do
Disallow: /forumdisplay.php?order
Disallow: /forumdisplay.php?page
Disallow: /forumdisplay.php?pp
Disallow: /forumdisplay.php?sort
Disallow: /gallery/
Disallow: /global.php
Disallow: /group_inlinemod.php
Disallow: /groupsubscription.php
Disallow: /images/
Disallow: /includes/
Disallow: /infraction.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /member_inlinemod.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderation.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /payment_gateway.php
Disallow: /payments.php
Disallow: /personal/
Disallow: /printthread.php
Disallow: /profile.php?do
Disallow: /register.php
Disallow: /report.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showpost.php
Disallow: /showthread.php?goto
Disallow: /showthread.php?mode
Disallow: /showthread.php?p
Disallow: /showthread.php?page
Disallow: /showthread.php?post
Disallow: /showthread.php?pp
Disallow: /signaturepics/
Disallow: /subscription.php
User-Agent: msnbot
Crawl-Delay: 10
User-Agent: Slurp
Crawl-Delay: 10
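A quick way to check which URLs a robots.txt like the one above actually blocks (and therefore what Webmaster Tools will report as "restricted") is Python's standard urllib.robotparser. A minimal sketch using a couple of representative rules from the file; the example URLs are made up:

```python
from urllib.robotparser import RobotFileParser

# A few representative rules from the robots.txt above.
rules = """\
User-agent: *
Disallow: /showthread.php?p
Disallow: /forumdisplay.php?do
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Every URL matching a Disallow prefix is what GWT flags as restricted.
print(rp.can_fetch("*", "http://example.com/showthread.php?p=123"))  # False (blocked)
print(rp.can_fetch("*", "http://example.com/showthread.php?t=123"))  # True (allowed)
```

If the second call ever returns False with your real file, a Disallow prefix is broader than intended.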
veenuisthebest
04-23-2009, 08:25 AM
Two points in addition to the robots.txt above:
1. You shouldn't give out your admincp directory in robots.txt, as that publishes its location to the world. What's the use of the rename-admincp feature then?
2. Also, it's good to add a reference to your sitemap at the end of robots.txt:
Sitemap: http://site.com/sitemap_index.xml.gz
vbplusme
04-23-2009, 08:54 AM
Thanks for the comments. I hadn't thought about the sitemap reference, so thanks for that. I double password-protect my admincp folder, though I could easily take it out of the list altogether since the bots can't access it anyway, so thanks for that comment as well.
vbplusme
04-24-2009, 06:36 PM
Does anyone see any problem with the content of this robots.txt, or have any idea how to fix Google complaining about the restrictions? TIA
hambil
04-24-2009, 10:33 PM
Thanks for the comments. I hadn't thought about the sitemap reference, so thanks for that. I double password-protect my admincp folder, though I could easily take it out of the list altogether since the bots can't access it anyway, so thanks for that comment as well.
It's not an issue of whether they access it or not, it's whether they try. Everything they try and fail at is wasted bandwidth and resources. If you password-protect your admincp and modcp directories, there is no reason to leave them out of robots.txt.
I pretty much followed the advice in this article and have had no complaints from Google: http://www.theadminzone.com/forums/showthread.php?t=19872
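For reference, the usual way to password-protect those directories at the web-server level is HTTP basic auth. A minimal Apache sketch; the file paths here are illustrative, not from the thread:

```
# .htaccess inside /admincp/ (and /modcp/) -- hypothetical paths
AuthType Basic
AuthName "Admin area"
AuthUserFile /home/example/.htpasswds/admin
Require valid-user
```

With this in place, a bot hitting the directory gets a cheap 401 rather than running any forum code.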
vbplusme
04-25-2009, 01:16 AM
As a matter of fact, I did use those guidelines to construct my robots.txt (and the follow-on suggestions in that thread). I forgot about that, thanks for reminding me about it.
vbplusme
04-26-2009, 08:35 AM
I do have a follow-on question on the robots.txt file that I am currently using. I have the vBulletin blog software installed on this site, as well as WordPress. I have no disallows in this robots.txt for any blog files. I wouldn't have thought anything of it, except that I just looked at my sitemap and saw a huge number of URLs for blog content that doesn't really exist, like archives from 1983?
Does anyone have a suggestion for sensible robots.txt entries for both the vBulletin blog and WordPress?
TIA for any ideas.
hambil
04-26-2009, 11:12 AM
Sounds like a sitemap issue, not a robots.txt issue. I know that's not an answer per se, but I'd be looking instead at why your sitemap contains links that don't exist.
vbplusme
04-26-2009, 11:45 AM
I can exclude it from the sitemap for sure, but I thought it was pretty strange to be referencing archives from 1970 through current. The links look like:
hxxp://www.mysite.com/blog.php?do=list&m=12&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=12&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=11&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=11&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=10&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=10&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=9&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=9&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=8&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=8&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=7&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=7&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=6&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=6&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=5&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=5&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=4&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=4&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=3&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=3&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=2&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=2&y=1970
hxxp://www.mysite.com/blog.php?do=list&m=1&y=1970
hxxp://www.mysite.com/blog.php?u=1&m=1&y=1970
The sitemap software is finding these for every year from 1970 to the current one?
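One likely explanation for the 1970 dates (an inference, not stated in the thread): Unix timestamps count from 1 January 1970, so blog rows with a zero or missing date field will render as 1970 archive pages, and an archive-listing script that iterates from the oldest date to now then generates a URL for every intervening month. A quick check of the epoch:

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch: 1 January 1970 (UTC).
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.year)  # 1970
```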
BSMedia
04-26-2009, 02:17 PM
Those are sorting URLs that Google finds and goes crazy on. Block off blog.php and call it a day.
It will also do this for the calendar. On newer vBulletin sites, if you search site:http://sitename.com you'll often see 500 pages of calendar sorting URLs.
It's fine to have the warning messages in GWT telling you access is restricted to those URLs. They provide no value whatsoever to your site's rankings and should be blocked off, as with most sorting URLs, since they are just duplicate content from elsewhere.
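Following that advice, the robots.txt addition is a single line. Note that a blanket Disallow also keeps crawlers out of the blog entries themselves, not just the archive and sorting views; blocking only the parameterised views seen in the URLs above, in the same style as the showthread.php lines, is a softer alternative:

```
# Option 1: block blog.php entirely
Disallow: /blog.php

# Option 2 (softer): block only the archive/sorting views
Disallow: /blog.php?do=list
Disallow: /blog.php?m
Disallow: /blog.php?u
```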
vbplusme
04-26-2009, 02:47 PM
Thanks very much for the reply, I appreciate it. It fits with what I suspected about the duplicate content issues I am trying to solve. I will add blog.php to my robots.txt file and exclude it from my sitemap.
thanks again.
--------------- Added 1240797229 at 1240797229 ---------------
Those are sorting URLs that Google finds and goes crazy on. Block off blog.php and call it a day.
It will also do this for the calendar. On newer vBulletin sites, if you search site:http://sitename.com you'll often see 500 pages of calendar sorting URLs.
It's fine to have the warning messages in GWT telling you access is restricted to those URLs. They provide no value whatsoever to your site's rankings and should be blocked off, as with most sorting URLs, since they are just duplicate content from elsewhere.
Do tags do the same thing, i.e. provide duplicate content with no value for PageRank? I see a boatload of them in the sitemap too.
TIA for a reply.
vBulletin® v3.8.12 by vBS, Copyright ©2000-2025, vBulletin Solutions Inc.