Ban Spiders by User Agent
Version: 3.1.2, by Simon Lloyd
Developer Last Online: May 2023

Category: Miscellaneous Hacks - Version: 4.x.x
Released: 08-08-2011 Last Update: 12-17-2014 Installs: 491
Uses Plugins
 
No support by the author.

What this mod does
With this mod you can enter user agents to watch or ban. You can also receive emails or have an Output.txt file created and updated with the time and date of visits. It doesn't have to be just spiders: you can watch, log or ban any user agent!
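
To illustrate the kind of matching described for this mod (a short entry such as Yandex matching a full user agent string), here is a minimal sketch in Python. It is not the mod's own PHP code: the function name find_matching_pattern and the example ban list are hypothetical, and it assumes a plain case-insensitive substring test.

```python
def find_matching_pattern(user_agent, patterns):
    """Return the first pattern contained in the user agent, or None.

    Assumption: matching is a case-insensitive substring test, so a short
    entry like "Yandex" matches a full YandexBot user agent string.
    """
    ua = (user_agent or "").upper()
    for pattern in patterns:
        needle = pattern.strip().upper()
        # Skip empty entries so a blank line in the list never matches everything.
        if needle and needle in ua:
            return pattern
    return None


# Hypothetical ban list, for illustration only.
banned = ["Yandex", "Baiduspider"]
ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
matched = find_matching_pattern(ua, banned)
```

Under substring matching, a very short entry matches every user agent that contains those letters, so short entries in a ban list deserve a second look.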

How to install
Simply import the product ban_spider. The mod is active by default, but none of the other options are turned on.

What is a UserAgent?
http://en.wikipedia.org/wiki/User_agent

Understanding a UserAgent string
http://user-agent-string.info/parse

Genuine User Getting Blocked?
https://vborg.vbsupport.ru/showpost....&postcount=105

Tools to help
http://whatsmyuseragent.com/SwitchingUserAgents.asp
http://www.botsvsbrowsers.com/SimulateUserAgent.asp

FAQ
https://vborg.vbsupport.ru/showpost....&postcount=137

How does it work?
https://vborg.vbsupport.ru/showpost....&postcount=381

What's a bot?
http://en.wikipedia.org/wiki/Spambot

How do I ban a bot?
https://vborg.vbsupport.ru/showpost....&postcount=318
https://vborg.vbsupport.ru/showpost....7&postcount=51

Where's output.txt located?
https://vborg.vbsupport.ru/showpost....&postcount=216

Bad bot lists
https://vborg.vbsupport.ru/showpost....&postcount=259
https://vborg.vbsupport.ru/showpost....&postcount=224
https://vborg.vbsupport.ru/showpost....&postcount=281

Tested on vB3.7.x, vB3.8.x and vB4.x.x, but should work on any version.

____________________________________________________________________
Special thanks to:
Lior
KH99
BoP5
for helping me sort out a few issues

...and beta testers

ForceHSS (Special thanks to Force for latest testing)
ozzy47
GreyHost

If you use this please mark as INSTALLED

History
9th June 2011 Original xml added
12th June 2011 Added both email notification and text file logging
22nd June 2011 Version 2.0.0, Added create thread on activity
  1. Added match facility: you can now use something like Yandex and it will match MOZILLA/5.0 (COMPATIBLE; YANDEXBOT/3.0; +HTTP://YANDEX.COM/BOTS)
  2. Added clickable link to visited thread
22nd September 2011 added user redirect url selection
08th October Beta testing started for thread creation.
20th October Beta testing started for emailing.
21st October Beta testing complete Ver 3.0.0 uploaded
29th October minor fix added to cope with empty userid on thread creation
30th October Beta testing automatic redirection to spiders/bots IP
31st October New xml uploaded with automatic redirect to IP
25th November Minor fix for blank forumid fixed
26th November 2011 Fixed version check & create thread Off by default
17th December 2014 Version 3.1.0 uploaded, hook changed, extra logging and statistics added by Ozzy47 (Chris)
18th December 2014 Version 3.1.1 uploaded, prevented spiders being counted when mod turned off.
17th December 2014 Version 3.1.2 uploaded, due to rogue code from another mod
The Bad Bots list is now included in the product
Please prune out all those that you wish to be able to see your site (I suggest you definitely prune out "DA" and "Custo").

Support will now only be given to those who have this mod marked as INSTALLED

Download Now

File Type: xml product-ban_spider4x.xml (30.8 KB, 469 views)

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
  #482  
02-11-2013, 05:20 PM
fly

Join Date: Oct 2003
Posts: 1,215

Quote:
Originally Posted by Max Taxable
Amazon AWS is their hosting they sell. And yes they also crawl the web: http://aws.amazon.com/search-engines/

I have it blocked as well, using this Mod.
Did you read that? Nowhere in that link does it say that Amazon themselves crawl websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?

I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. It's not required that a crawler send you a UA at all; it's just considered polite. If someone wanted to, they could send a completely random UA every time or not send one at all.

Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".

Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.
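
As a rough sketch of what blocking by IP range (rather than by user agent) involves, the Python snippet below checks a visitor address against a list of CIDR blocks using the standard ipaddress module. The ranges shown are placeholder documentation blocks, not real AWS ranges, and the names BLOCKED_NETWORKS and is_blocked are made up for illustration; a real list would have to come from the provider's published ranges.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real AWS ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]


def is_blocked(remote_addr):
    """Return True if the visitor's IP address falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    # An IPv6 address tested against an IPv4 network simply yields False.
    return any(addr in network for network in BLOCKED_NETWORKS)
```

Unlike a user agent, the source address cannot be set to an arbitrary string by the crawler, which is why this approach covers the case a UA match cannot.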
  #483  
02-11-2013, 07:58 PM
Simon Lloyd

Join Date: Aug 2008
Location: Manchester
Posts: 3,481

For reference, here's what a user agent is, with some extra info: http://en.wikipedia.org/wiki/User_agent. All this mod is designed to do is stop bots from eating up your bandwidth by redirecting them before any content loads. To be honest, you can never stop anyone who is intent on scraping your site from doing so.
  #484  
02-11-2013, 11:00 PM
Max Taxable

Join Date: Feb 2011
Posts: 3,134

Quote:
Originally Posted by fly
Did you read that? Nowhere in that link does it say that Amazon themselves crawl websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?

I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. It's not required that a crawler send you a UA at all; it's just considered polite. If someone wanted to, they could send a completely random UA every time or not send one at all.

Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".

Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.
The "amazonaws" crawlers have that designation in their UA string. Anything else coming from Amazon has it in its host description.

The rest of your missive, I am well aware of.
  #485  
02-12-2013, 12:55 AM
fly

Join Date: Oct 2003
Posts: 1,215

ok.
  #486  
03-03-2013, 01:39 AM
Inspector G

Join Date: Dec 2012
Posts: 43

I have a confusing question...
OK, I have a very small member site, like 24 members...
So when I noticed I had 35 users online most of the time, and I started seeing more and more Baidu spiders,
I decided to do something about it...

I installed this mod.
Almost instantly (well, within say 3 hours) my users online soared to well over 150 at busy times like now, tonight.
I had
Most users ever online was 247, 1 Day Ago at 12:58 AM.

With only one new account created, and maybe me or one other registered user online...

My question is this: what happened when I installed this mod to make such a drastic change in the users on my site, and why?

I do not understand this, and I read that the server load increases...
I find it hard to believe that anyone is finding my site via a search engine, since it is a brand new .cc name and it has only been online for two months now...

Is there something about pushing away Baidu that enables more sites, or spam bots, to come?
They are attempting to register and whatnot, and many are in areas where there would not be a normal user.

I see many attempts at registering and yet no more new users... so I believe those are bots locking...

Please advise...
  #487  
03-03-2013, 03:42 AM
Simon Lloyd

Join Date: Aug 2008
Location: Manchester
Posts: 3,481

What's happening (and you'll probably find this) is that because Baidu can't get in with the spiders/IPs they were using, they are now trying a rotation of other IPs and bots. I use this mod myself, although I don't ban the bots, as I monitor their visits to further enhance any mod I make against them. I currently have 236 Baidu bots (and 140 other bots/search engines) at my site.

With the mod in place and redirection working, you'll find that the bots you have banned will slowly drop off as they all get the message of the 301 permanent redirect to wherever you've decided to send them. Your server load will lessen and things will be more normal.
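
To make the idea of a 301 redirect sent before any content loads more concrete, here is a small Python sketch. It is only an illustration under assumptions, not the mod's plugin code: the function respond, the ban list and the redirect URL are hypothetical, and the user agent test is a simple case-insensitive substring match.

```python
def respond(user_agent, banned_patterns, redirect_url):
    """Return (status, headers, body) for a request.

    A banned user agent gets a 301 with an empty body, so no page content
    is generated or sent for it. Everyone else gets the normal page.
    """
    ua = (user_agent or "").upper()
    if any(p.upper() in ua for p in banned_patterns if p):
        return "301 Moved Permanently", [("Location", redirect_url)], b""
    # Stand-in for the real forum page.
    page = b"<html><body>forum page</body></html>"
    return "200 OK", [("Content-Type", "text/html; charset=utf-8")], page


# Hypothetical settings, for illustration only.
banned = ["Baiduspider"]
status, headers, body = respond(
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    banned,
    "http://example.com/",
)
```

The saving comes from the empty body: the banned client receives only a status line and a Location header instead of a full page.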
  #488  
03-03-2013, 03:44 AM
Simon Lloyd

Join Date: Aug 2008
Location: Manchester
Posts: 3,481

Also, do you have your robots.txt set up correctly to stop the search engines or bots that obey robots.txt from indexing pages on your site that they shouldn't, like register.php, members.php, etc.?
  #489  
03-03-2013, 03:59 AM
Inspector G

Join Date: Dec 2012
Posts: 43

I did not understand how to do the text part, since I am what even I would call very green in this aspect of vBulletin...
so I just installed the mod...
I can wait and see if it drops off and report back...
Thanks for the help in understanding...
  #490  
03-03-2013, 05:34 AM
Simon Lloyd

Join Date: Aug 2008
Location: Manchester
Posts: 3,481

OK, what you need to do is upload the attached file to your forum root. If your forum is at this level, www.mysite.com/, then edit the attached file to remove /forums. If your forum is at this level, www.mysite.com/forums, then you can just upload it to that folder.

You can add any page or file to robots.txt that you wish, just follow the same structure.
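
For readers who want to check a set of robots.txt rules before uploading, the Python sketch below parses some rules with the standard urllib.robotparser module and asks whether a given path may be fetched. The rules shown are a hypothetical example built from the pages mentioned in this thread (register.php, members.php, under /forums); they are not the contents of the attached file, and the agent name ExampleBot is made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; NOT the attached robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/register.php
Disallow: /forums/members.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a robots.txt-obeying bot may fetch the given path."""
    return parser.can_fetch(agent, path)
```

For example, allowed("/forums/register.php") reports whether a well-behaved crawler would skip the registration page. Bots that ignore robots.txt are unaffected by any of this.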
Attached Files
File Type: txt robots.txt (1.4 KB, 28 views)
Thanked by:
Inspector G
  #491  
03-03-2013, 05:56 AM
Inspector G

Join Date: Dec 2012
Posts: 43

Well thanks Simon...
That's really nice...
I will do so immediately.
Nice to see someone really help out the Noob...lol
Thanks again I appreciate this very much...
I will report back.
All times are GMT.