The Archive of Official vBulletin Modifications Site.

ozzy47 · #8 09-03-2014, 07:55 PM

About /robots.txt

In a nutshell

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

Code:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
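The check a well-behaved robot performs can be sketched with Python's standard-library `urllib.robotparser` module. This is a minimal illustration under stated assumptions, not how any particular crawler works: `robots_url_for` and `may_fetch` are hypothetical helper names, and the robots.txt body is passed in as a string rather than downloaded, so the example runs offline.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the /robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt always sits at the root of scheme + host (+ port).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_fetch(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Check page_url against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    page = "http://www.example.com/welcome.html"
    print(robots_url_for(page))  # http://www.example.com/robots.txt
    rules = "User-agent: *\nDisallow: /\n"
    print(may_fetch(rules, "ExampleBot", page))  # False
```

With the "Disallow: /" rule from the example, every URL on the site is refused, whatever name the robot uses.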
There are two important considerations when using /robots.txt:

Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to access.

So don't try to use /robots.txt to hide information.
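To make the point about publicity concrete, here is a small sketch that pulls every Disallow path out of a robots.txt body. The helper `disallowed_paths` and the sample paths are made up for illustration, and the parser is deliberately naive rather than a full implementation of the protocol. Anyone who downloads your /robots.txt can do the same in a few lines, so listing a path there advertises it rather than hiding it.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """List every non-empty Disallow path in a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that "hides" two directories.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private-reports/  # do not index
"""

if __name__ == "__main__":
    print(disallowed_paths(SAMPLE))  # ['/admin/', '/private-reports/']
```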

Why did this robot ignore my /robots.txt?

It could be that it was written by an inexperienced software writer. Occasionally schools set their students "write a web robot" assignments.
But these days it's more likely that the robot was written explicitly to scan your site for information to abuse: it might be collecting email addresses to send spam, looking for forms to post links to ("spamdexing"), or searching for security holes to exploit.

Can I block just bad robots?

In theory, yes; in practice, no. If the bad robot obeys /robots.txt and you know the name it looks for in the User-agent field, then you can create a section in your /robots.txt to exclude it specifically. But almost all bad robots ignore /robots.txt, which makes that pointless.
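A per-robot section of that kind can be sketched and checked with Python's `urllib.robotparser`. The name `BadBot` is hypothetical; in practice you would use whatever token the robot actually looks for, and the section only has an effect on robots that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "BadBot" is an illustrative name, not a real crawler.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.example.com/forum/index.php"
    print(rp.can_fetch("BadBot", url))      # False: its own section blocks everything
    print(rp.can_fetch("BadBot/2.1", url))  # False: text after "/" in the name is ignored
    print(rp.can_fetch("GoodBot", url))     # True: falls through to the "*" section
```

An empty "Disallow:" line means nothing is disallowed, so every other robot keeps full access.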

If the bad robot operates from a single IP address, you can block its access to your web server through server configuration or with a network firewall.
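Server configuration or a firewall is the better place for this, because the request is refused before your application runs. As a rough application-level stand-in, the sketch below wraps a Python WSGI app and answers listed addresses with 403 Forbidden. The blocklist and the helper names (`is_blocked`, `block_ips_middleware`) are hypothetical, and 203.0.113.7 comes from a documentation-only address range (RFC 5737).

```python
import ipaddress

# Hypothetical blocklist. 203.0.113.0/24 is reserved for documentation (RFC 5737).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.7/32")]


def is_blocked(remote_addr: str, blocked=BLOCKED_NETWORKS) -> bool:
    """True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # missing or malformed address: not on the list
    return any(addr in net for net in blocked)


def block_ips_middleware(app, blocked=BLOCKED_NETWORKS):
    """Wrap a WSGI app so requests from blocked addresses get 403 Forbidden."""
    def wrapped(environ, start_response):
        if is_blocked(environ.get("REMOTE_ADDR", ""), blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapped
```

If the application sits behind a reverse proxy, REMOTE_ADDR is the proxy's address, which is one more reason to do this in the server or firewall instead.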

If copies of the robot operate from lots of different IP addresses, such as hijacked PCs that are part of a large botnet, blocking becomes more difficult. The best option then is to use advanced firewall rules that automatically block access from IP addresses that make many connections, but that can hit good robots as well as your bad robots.
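The "block addresses that make many connections" idea can be sketched as a sliding-window counter. This is an in-memory illustration of the logic only, with made-up thresholds (`max_hits`, `window`) and a hypothetical class name; a real firewall applies such rules at the network layer. It also shows the caveat above: a legitimate crawler that fetches pages quickly trips the same threshold.

```python
import time
from collections import defaultdict, deque


class ConnectionRateBlocker:
    """Block an IP once it makes more than max_hits connections within window seconds."""

    def __init__(self, max_hits=100, window=60.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock                # injectable so tests can fake time
        self.hits = defaultdict(deque)    # ip -> timestamps of recent connections
        self.blocked = set()

    def record(self, ip):
        """Record one connection from ip; return True if ip is (now) blocked."""
        if ip in self.blocked:
            return True
        now = self.clock()
        recent = self.hits[ip]
        recent.append(now)
        # Forget connections that have slid out of the time window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.blocked.add(ip)          # permanent block in this sketch
            del self.hits[ip]
            return True
        return False
```

Blocks here never expire; a practical version would lift them after a timeout, since an address in a botnet may later belong to an innocent user.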
