View Full Version : Confused !


jimv8673
09-03-2014, 02:03 PM
Google is pretty much non-stop hitting my site, but this is what I see when it's there

Google Spider 10:54 AM Viewing 'No Permission' Message /calendar.php?do=getinfo&day=2014-9-30&c=1 Viewing Event 66.249.69.169

How can I stop this??

Simon Lloyd
09-03-2014, 02:24 PM
Organise your robots.txt to block them :)
http://www.vbulletin.com/forum/forum/vbulletin-4/vbulletin-4-questions-problems-and-troubleshooting/354525-perfect-vbulletin-4-robots-txt
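For the calendar hits in the OP's log, a minimal robots.txt sketch would be something like this (assuming the forum lives at the web root; adjust the path if yours is in a subdirectory):

```
User-agent: *
Disallow: /calendar.php
```

Disallow rules match by URL prefix, so this also covers URLs like /calendar.php?do=getinfo&day=... from the log above.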

Max Taxable
09-03-2014, 02:51 PM
Organise your robots.txt to block them

It won't block them, it just asks please don't visit these files/folders.:D

Google is usually friendly though, and usually obeys robots.txt.

To the OP: Keep in mind it may take a few days before you see the obedience.

RichieBoy67
09-03-2014, 04:25 PM
Yep, as the above said... use robots.txt and add the files and directories you do not want crawled.

It is a good thing to see your site being crawled by Google. :)

Simon Lloyd
09-03-2014, 04:44 PM
It won't block them, it just asks please don't visit these files/folders.:D

Google is usually friendly though, and usually obeys robots.txt.

To the OP: Keep in mind it may take a few days before you see the obedience.

It's just semantics :), if you go to your Google Webmaster tools it doesn't say "Kindly asked not to look at these locations", it says "blocked by robots.txt"... I like "blocked", it sounds so much meaner ;)

RichieBoy67
09-03-2014, 04:51 PM
It's just semantics :), if you go to your Google Webmaster tools it doesn't say "Kindly asked not to look at these locations", it says "blocked by robots.txt"... I like "blocked", it sounds so much meaner ;)

True but many bots just ignore the robots.txt file completely.

Blocked does sound meaner!:D

Max Taxable
09-03-2014, 07:46 PM
It's just semantics :), if you go to your Google Webmaster tools it doesn't say "Kindly asked not to look at these locations", it says "blocked by robots.txt"... I like "blocked", it sounds so much meaner ;)

Oh I understand that, you do, most all of us more experienced webbers do - but these noobs don't. They see "block" and think it really means "block", and will be back in a week complaining it didn't work!:D

ozzy47
09-03-2014, 07:55 PM
About /robots.txt

In a nutshell

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
There are two important considerations when using /robots.txt:
robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use. So don't try to use /robots.txt to hide information.
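You can check how those rules apply to a given URL programmatically; as a sketch, Python's standard-library urllib.robotparser implements the same prefix-matching behaviour (the calendar.php path here is just an illustration matching the thread above):

```python
from urllib import robotparser

# Parse a robots.txt body directly (normally you'd point at the live file
# with set_url() and read()).
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /calendar.php",
])

# Disallow rules match by URL-path prefix, so the query string is covered too.
print(rp.can_fetch("Googlebot", "http://www.example.com/calendar.php?do=getinfo"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/forum.php"))                # True
```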



Why did this robot ignore my /robots.txt?

It could be that it was written by an inexperienced software writer. Occasionally schools set their students "write a web robot" assignments.
But these days it's more likely that the robot is explicitly written to scan your site for information to abuse: it might be collecting email addresses to send email spam, looking for forms to post links to ("spamdexing (http://en.wikipedia.org/wiki/Spamdexing)"), or probing for security holes to exploit.


Can I block just bad robots?

In theory yes, in practice, no. If the bad robot obeys /robots.txt, and you know the name it scans for in the User-Agent field, then you can create a section in your /robots.txt to exclude it specifically. But almost all bad robots ignore /robots.txt, making that pointless.
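As a sketch, a robots.txt section that singles out one cooperative bot by its User-Agent name ("BadBot" is a placeholder here, not a real crawler) looks like this:

```
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
```

The empty Disallow in the second section means all other robots may crawl everything.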


If the bad robot operates from a single IP address, you can block its access to your web server through server configuration or with a network firewall.
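For example, with Apache (2.2-style syntax shown; the IP address is just a documentation-range placeholder), a per-directory block might look like:

```
Order Allow,Deny
Allow from all
Deny from 203.0.113.45
```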


If copies of the robot operate from lots of different IP addresses, such as hijacked PCs that are part of a large Botnet (http://en.wikipedia.org/wiki/Botnet), then it becomes more difficult. The best option then is to use advanced firewall rules that automatically block access to IP addresses that make many connections; but that can hit good robots as well as the bad ones.
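As one hedged sketch of such a rule on Linux, the iptables connlimit module can cap simultaneous connections per source address (the limit of 20 is an arbitrary example; tune it carefully, since aggressive values can also catch legitimate crawlers):

```shell
# Drop new HTTP connections from any single IP that already has 20 open ones.
iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 20 -j DROP
```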