What should be in robots.txt?


K4GAP
11-04-2013, 07:03 PM
What should be in robots.txt?

Elite_360_
11-04-2013, 09:01 PM
This is what I have on my site. It sits in the site root; my front end uses Joomla.

Here is a link about robots.txt (http://www.robotstxt.org/robotstxt.html)




User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /forum/admincp/
Disallow: /forum/clientscript/
Disallow: /forum/cpstyles/
Disallow: /forum/customavatars/
Disallow: /forum/customprofilepics/
Disallow: /forum/images/
Disallow: /forum/includes
Disallow: /forum/modcp/
Disallow: /forum/ajax.php
Disallow: /forum/attachment.php
Disallow: /forum/calendar.php
Disallow: /forum/cron.php
Disallow: /forum/editpost.php
Disallow: /forum/global.php
Disallow: /forum/image.php
Disallow: /forum/inlinemod.php
Disallow: /forum/joinrequests.php
Disallow: /forum/login.php
Disallow: /forum/member.php
Disallow: /forum/memberlist.php
Disallow: /forum/misc.php
Disallow: /forum/moderator.php
Disallow: /forum/newattachment.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/online.php
Disallow: /forum/poll.php
Disallow: /forum/postings.php
Disallow: /forum/printthread.php
Disallow: /forum/private.php
Disallow: /forum/profile.php
Disallow: /forum/register.php
Disallow: /forum/report.php
Disallow: /forum/reputation.php
Disallow: /forum/search.php
Disallow: /forum/sendmessage.php
Disallow: /forum/showgroups.php
Disallow: /forum/subscription.php
Disallow: /forum/threadrate.php
Disallow: /forum/usercp.php
Disallow: /forum/usernote.php

ForceHSS
11-05-2013, 02:27 AM
I would remove your admincp and modcp from there

K4GAP
11-05-2013, 04:04 AM
I would remove your admincp and modcp from there

Just trying to catch on to make sure mine is configured properly. Why would you not list those two items? Not needed, or some specific reason?

Max Taxable
11-05-2013, 04:07 AM
Just trying to catch on to make sure mine is configured properly. Why would you not list those two items? Not needed, or some specific reason?

The robots.txt file is publicly viewable - has to be for the bots to "read." You'll be letting script kiddies know where the backend is located.
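
To make the "publicly viewable" point concrete, here is a minimal Python sketch of how little effort it takes to pull the paths out of a robots.txt file. The sample text is an assumption built from a few entries of the file posted earlier in the thread, and `disallowed_paths` is a hypothetical helper name, not part of any library.

```python
# Illustrative only: a few entries copied from the robots.txt posted above.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/admincp/
Disallow: /forum/modcp/
Disallow: /forum/search.php
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Anyone who can fetch the file can list the directories it names.
    print(disallowed_paths(SAMPLE_ROBOTS_TXT))
```

Every directory named in the file is handed over as a plain list, which is the concern being raised about admincp and modcp.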

K4GAP
11-05-2013, 04:19 AM
The robots.txt file is publicly viewable - has to be for the bots to "read." You'll be letting script kiddies know where the backend is located.

Ok so for me to open my forum to bad bots or "script kiddies" I'm at risk, I got that.

Then here is my headache: restrict the bad guys and lose being found by folks looking for the type of content my site offers, or take a chance so I may gain more members?

I'm going to start Googling and hopefully there is a sweet spot to go with.

Thanks

Elite_360_
11-05-2013, 04:40 AM
I would remove your admincp and modcp from there

Those directories are password protected, so I'm not worried. Before you can log in to the back end, you have to enter a custom username and password for the admincp and modcp directories.

ForceHSS
11-05-2013, 05:10 AM
Those directories are password protected, so I'm not worried. Before you can log in to the back end, you have to enter a custom username and password for the admincp and modcp directories.

I have mine password protected as well, but a good hacker can still hack any forum no matter what you have in place.

Max Taxable
11-05-2013, 04:22 PM
Ok so for me to open my forum to bad bots or "script kiddies" I'm at risk, I got that.

Then here is my headache: restrict the bad guys and lose being found by folks looking for the type of content my site offers, or take a chance so I may gain more members?

I'm going to start Googling and hopefully there is a sweet spot to go with.

Thanks

Nah... I would just let the good bots index admincp and modcp; the bad bots don't obey robots.txt anyway.

Robots.txt is a lot like gun laws - only the law abiding obey them. The bad bots just flat out ignore it. True story.

EDIT to add: It's far more effective to just block the bad bots, using Simon Lloyd's "Ban Spiders by User Agent." (https://vborg.vbsupport.ru/showthread.php?t=264932) One of the best Mods in the history of Mods.
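
For anyone wondering what blocking by user agent amounts to, here is a rough Python sketch of the idea. It is not Simon Lloyd's mod (which is a vBulletin modification) and not an .htaccess rule; the ban list and the function names `is_banned` and `status_for` are made up for illustration.

```python
# Hypothetical ban list; real entries would come from your own access logs.
BANNED_AGENT_FRAGMENTS = ("badbot", "sitesucker")


def is_banned(user_agent: str, banned=BANNED_AGENT_FRAGMENTS) -> bool:
    """True when the User-Agent header contains any banned fragment (case-insensitive)."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in banned)


def status_for(user_agent: str) -> int:
    """HTTP status a server-side check could return: 403 for banned agents, else 200."""
    return 403 if is_banned(user_agent) else 200


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; BadBot/1.0)"))
    print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

The difference from robots.txt is that the server makes the decision, so the bot's cooperation is not needed. A bot can still send a fake User-Agent string, so this only stops the ones that identify themselves.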

Videx
11-06-2013, 02:35 PM
If we're taking a poll, I have both those files in mine. It never occurred to me that a hacker would need that information for some nefarious purpose. I mean, once they get into the server they're going to know all that stuff anyway.

More important I think is keeping the db backed up locally regularly. Go ahead and hack my site; I'll just reinstall and be right back.

Max Taxable
11-06-2013, 03:13 PM
If we're taking a poll, I have both those files in mine. It never occurred to me that a hacker would need that information for some nefarious purpose. I mean, once they get into the server they're going to know all that stuff anyway.

Nobody mentioned hackers. Script Kiddies, was the meter.

There is no reason to include admincp and modcp in robots.txt. The bad bots are going there anyway.

Digital Jedi
11-06-2013, 06:58 PM
Ok so for me to open my forum to bad bots or "script kiddies" I'm at risk, I got that.

Then here is my headache: restrict the bad guys and lose being found by folks looking for the type of content my site offers, or take a chance so I may gain more members?

I'm going to start Googling and hopefully there is a sweet spot to go with.

Thanks

In particular, you don't want to include your admin and mod directories if you've renamed them, for the very reason vBulletin lets you rename them in the first place: to make it harder for a hacker to guess where they are.

But as an aside, it's pointless to put them in robots.txt, since, as was mentioned, robots.txt is an honor system. Legitimate bots already know about and have no use for your Admin CP (presuming the name hasn't been changed), whereas bad bots aren't going to honor it in the first place. Robots.txt is something you use to control how much legitimate bots see and index. Don't even consider it for bad bots. For bad bots, consider other blocking tools such as Simon's modification or .htaccess. In my case, my host lets me ban IP addresses from all my domains via cPanel, which simply automates the editing of .htaccess across all my domains.

For robots.txt, you may want to try a little trial and error. A few years ago I was getting slammed by MSNbot. I didn't know why (probably because Bing was about to start up in a year or two), but it was hogging system resources and was exacerbating pre-existing conditions. So I set a crawl-delay for MSN.

User-agent: msnbot
Crawl-delay: 3
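
If you want to check how a rule like that is read, Python's standard `urllib.robotparser` module can parse it. This is only a sketch: `delay_for` is a made-up helper name, and it shows how one parser interprets the two lines above, not how MSNbot or any other crawler actually behaves. Crawl-delay is a non-standard extension, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post above.
CRAWL_DELAY_RULE = """\
User-agent: msnbot
Crawl-delay: 3
"""


def delay_for(agent: str, rules: str = CRAWL_DELAY_RULE):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(delay_for("msnbot"))     # the rule names this agent
    print(delay_for("Googlebot"))  # no rule for this agent
```

The delay applies only to the agent named in the `User-agent` line; every other bot is unaffected by this block.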

Basically, I'm saying you tweak robots.txt according to your needs. Look at your modifications that have their own unique pages. Do you feel the need to have bots index every one of those? Just as an example, if you had iTrader installed, you probably wouldn't need the iTrader ratings page indexed. Or the arcade. Or a page devoted to Facebook login. You probably have default pages that don't need to be indexed, like login.php or private.php. Robots.txt will help to keep bots from wasting their time there.
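
A quick way to try out that kind of trial and error before uploading anything is to run the rules through Python's standard `urllib.robotparser`. This is a sketch under assumptions: the `/forum/` prefix is borrowed from the file posted earlier in the thread, and `ExampleBot` and `allowed` are made-up names.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down rules covering two default pages that don't need indexing.
RULES = """\
User-agent: *
Disallow: /forum/login.php
Disallow: /forum/private.php
"""


def allowed(url_path: str, rules: str = RULES, agent: str = "ExampleBot") -> bool:
    """True if a bot that honors robots.txt may fetch `url_path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    for path in ("/forum/login.php", "/forum/private.php", "/forum/showthread.php"):
        print(path, allowed(path))
```

Note that the check runs on the bot's side. A well-behaved crawler makes this call before fetching a page; a bad bot simply never makes it, which is why robots.txt does nothing against them.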

K4GAP
11-07-2013, 04:29 AM
That's some good info, thanks.