Ban Spiders by User Agent
Version: 3.1.2, by Simon Lloyd
Developer Last Online: May 2023

Category: Miscellaneous Hacks - Version: 4.x.x
Released: 08-08-2011 Last Update: 12-17-2014 Installs: 491
Uses Plugins
 
No support by the author.

What this mod does
With this mod you can enter user agents to watch or ban. You can also receive emails or have an Output.txt file created and updated with the time and date of visits. It doesn't just have to be spiders: you can watch, log or ban any user agent!
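
As a rough illustration of the watch/ban/log idea (this is not the mod's own code, which is a vBulletin plugin), the sketch below classifies a visit and builds a timestamped log line. The names `WATCH`, `BAN`, `classify` and `log_line`, the sample entries and the log format are all assumptions made for this example.

```python
from datetime import datetime, timezone

# Hypothetical lists; the real mod reads its entries from the admin settings.
WATCH = ["yandex"]
BAN = ["examplebadbot"]


def classify(user_agent, watch=WATCH, ban=BAN):
    """Return 'ban', 'watch' or 'allow' for a user agent string."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in ban):
        return "ban"
    if any(token.lower() in ua for token in watch):
        return "watch"
    return "allow"


def log_line(user_agent, ip, when=None):
    """Build one log entry carrying the date and time of the visit."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp}\t{classify(user_agent)}\t{ip}\t{user_agent}"
```

Appending the result of `log_line(...)` to a text file would give something comparable to the Output.txt described above, though the real file's layout may differ.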

How to install
Simply import the product ban_spider. The mod is active by default, but none of the other options are turned on.

What is a UserAgent?
http://en.wikipedia.org/wiki/User_agent

Understanding a UserAgent string
http://user-agent-string.info/parse

Genuine User Getting Blocked?
https://vborg.vbsupport.ru/showpost....&postcount=105

Tools to help
http://whatsmyuseragent.com/SwitchingUserAgents.asp
http://www.botsvsbrowsers.com/SimulateUserAgent.asp

FAQ
https://vborg.vbsupport.ru/showpost....&postcount=137

How does it work?
https://vborg.vbsupport.ru/showpost....&postcount=381

What's a bot?
http://en.wikipedia.org/wiki/Spambot

How do I ban a bot?
https://vborg.vbsupport.ru/showpost....&postcount=318
https://vborg.vbsupport.ru/showpost....7&postcount=51

Where's output.txt located?
https://vborg.vbsupport.ru/showpost....&postcount=216

Bad bot lists
https://vborg.vbsupport.ru/showpost....&postcount=259
https://vborg.vbsupport.ru/showpost....&postcount=224
https://vborg.vbsupport.ru/showpost....&postcount=281

Tested on vB3.7.x, vB3.8.x and vB4.x.x, but should work on any version.

__________________________________________________ __________________
Special thanks to:
Lior
KH99
BoP5
for helping me sort out a few issues

...and beta testers

ForceHSS (Special thanks to Force for latest testing)
ozzy47
GreyHost

If you use this please mark as INSTALLED

History
9th June 2011 Original xml added
12th June 2011 Added both email notification and text file logging
22nd June 2011 Version 2.0.0, Added create thread on activity
  1. Added match facility: you can now enter something like Yandex and it will match MOZILLA/5.0 (COMPATIBLE; YANDEXBOT/3.0; +HTTP://YANDEX.COM/BOTS)
  2. Added clickable link to visited thread
22nd September 2011 added user redirect url selection
08th October Beta testing started for thread creation.
20th October Beta testing started for emailing.
21st October Beta testing complete Ver 3.0.0 uploaded
29th October minor fix added to cope with empty userid on thread creation
30th October Beta testing automatic redirection to spiders/bots IP
31st October New xml uploaded with automatic redirect to IP
25th November Minor fix for blank forumid fixed
26th November 2011 Fixed version check & create thread Off by default
17th December 2014 Version 3.1.0 uploaded. Hook changed; extra logging and statistics added by Ozzy47 (Chris)
18th December 2014 Version 3.1.1 uploaded, prevented spiders being counted when mod turned off.
17th December 2014 Version 3.1.2 uploaded, due to rogue code from another mod
The Bad Bots list is now included in the product
Please prune out all those that you wish to be able to see your site (I suggest you definitely prune out "DA" and "Custo").
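
Short entries are risky if entries are matched as case-insensitive substrings, which is what the Yandex example in the history above suggests. Under that assumption, the sketch below shows how an entry as short as "DA" or "Custo" can catch an ordinary client. The helper name `matching_entries` is hypothetical, and the second and third sample user agents are made up for illustration.

```python
def matching_entries(user_agent, entries):
    """Return the entries that occur in user_agent, ignoring case."""
    ua = user_agent.upper()
    return [entry for entry in entries if entry.upper() in ua]


entries = ["Yandex", "DA", "Custo"]

# The first string comes from the history above; the others are invented.
samples = [
    "MOZILLA/5.0 (COMPATIBLE; YANDEXBOT/3.0; +HTTP://YANDEX.COM/BOTS)",
    "Dalvik/1.6.0 (Linux; U; Android 4.0.4)",
    "Mozilla/5.0 (Windows NT 6.1) CustomBrowser/1.0",
]

for ua in samples:
    print(matching_entries(ua, entries), "<-", ua)
```

Running a list of user agents you want to keep through a check like this before pruning is one way to spot entries that are too broad.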

Support will now only be given to those who have this mod marked as INSTALLED

Download Now

File Type: xml product-ban_spider4x.xml (30.8 KB, 469 views)

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
#392
11-01-2012, 11:18 PM
ForceHSS
Join Date: Apr 2008
Posts: 6,357

Aboundex/0.2
This seems to be a new one. Here is the full string: Aboundex/0.2 (http://www.aboundex.com/crawler/)
The IP is 173.193.219.168-static.reverse.softlayer.com
I have checked the IP and it came back that a spam bot is using it.

If someone wants to run checks, please see whether it needs to be added to the list. I am not 100% sure that it does, which is why it needs to be checked first.

#393
11-01-2012, 11:21 PM
CAG CheechDogg
Join Date: Feb 2012
Location: Riverside, California USA
Posts: 1,080

Quote:
Originally Posted by Simon Lloyd
Banning IPs is only for incoming traffic unless you've banned them in cPanel or htaccess. As for asking about the delay, there's no problem, I like to help where I can.

Yeah, I used htaccess to ban them completely. I need to find out exactly which IPs I can ban and still allow Facebook to work properly when I post links to articles or posts...sigh...lol

But thanks for understanding and helping out, it is very much appreciated Simon

#394
11-01-2012, 11:39 PM
Max Taxable
Join Date: Feb 2011
Posts: 3,134

Quote:
Originally Posted by CAG CheechDogg
Yeah, I used htaccess to ban them completely. I need to find out exactly which IPs I can ban and still allow Facebook to work properly when I post links to articles or posts...sigh...lol

But thanks for understanding and helping out, it is very much appreciated Simon

In my experience, that's the only time the FB external hit bot comes to your site: when you or someone else posts a link to your site on Facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.

#395
11-01-2012, 11:52 PM
CAG CheechDogg
Join Date: Feb 2012
Location: Riverside, California USA
Posts: 1,080

Quote:
Originally Posted by Max Taxable
In my experience, that's the only time the FB external hit bot comes to your site: when you or someone else posts a link to your site on Facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.

Max, it's weird because I have the Facebook like buttons off on my forums. I do have rssgraffiti, but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.

#396
11-02-2012, 12:02 AM
Max Taxable
Join Date: Feb 2011
Posts: 3,134

Quote:
Originally Posted by CAG CheechDogg
Max, it's weird because I have the Facebook like buttons off on my forums. I do have rssgraffiti, but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.

Some autospam bots do spoof their user agents as Facebook or even Googlebot.

#397
11-02-2012, 12:14 AM
CAG CheechDogg
Join Date: Feb 2012
Location: Riverside, California USA
Posts: 1,080

Quote:
Originally Posted by Max Taxable
Some autospam bots do spoof their user agents as Facebook or even Googlebot.

Great! Now you tell me! lol...Thanks again Max, I will have to take a careful look at the IPs and try to see if they match Facebook's then.
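
One way to do that comparison is to test each visiting IP against the address ranges the crawler's operator publishes. The sketch below is only an illustration under assumptions: `PUBLISHED_RANGES` holds placeholder documentation ranges, not Facebook's real ones, and the function names `ip_in_ranges` and `looks_spoofed` are made up for this example.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks, NOT real
# Facebook ranges. Replace them with ranges the crawler's operator publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip, ranges=PUBLISHED_RANGES):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


def looks_spoofed(user_agent, ip, token="facebook", ranges=PUBLISHED_RANGES):
    """True when the UA claims to be the crawler but the IP is outside its ranges."""
    claims = token.lower() in user_agent.lower()
    return claims and not ip_in_ranges(ip, ranges)
```

A visit whose user agent contains "facebook" but whose IP falls outside the published ranges would then be a candidate for banning by IP rather than by user agent.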

#398
11-02-2012, 12:22 AM
Max Taxable
Join Date: Feb 2011
Posts: 3,134

From what I've seen over the years facebook's bots have good behavior and only come to see you when something is posted there, from your site. Then they don't crawl around and they SURE don't go anywhere suspicious.

#399
11-02-2012, 02:52 AM
CAG CheechDogg
Join Date: Feb 2012
Location: Riverside, California USA
Posts: 1,080

Yeah, nothing suspicious about Facebook's crawlers, scrapers or bots, whatever they are. But it has caused my forums to throw the 500 internal server error a bunch of times. When I check around the time those errors happen and are reported to me, it's Facebook's IPs that show up around the times of the 500 errors.
Thanks from: Max Taxable

#400
11-02-2012, 03:48 PM
CAG CheechDogg
Join Date: Feb 2012
Location: Riverside, California USA
Posts: 1,080

Well, I decided to completely block Facebook crawlers, scrapers, spiders or bots from crawling my site.

I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list and I haven't gotten one single facebook critter on my site since.

Sucks because I can no longer share anything on Facebook from my forums, but I just had to do it. Facebook won't reply and doesn't seem to care about eating up bandwidth with their crawlers.

Oh well.

#401
11-02-2012, 04:03 PM
TheSupportForum
Join Date: Jan 2007
Posts: 1,158

Quote:
Originally Posted by CAG CheechDogg
Well, I decided to completely block Facebook crawlers, scrapers, spiders or bots from crawling my site.

I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list and I haven't gotten one single facebook critter on my site since.

Sucks because I can no longer share anything on Facebook from my forums, but I just had to do it. Facebook won't reply and doesn't seem to care about eating up bandwidth with their crawlers.

Oh well.

A wise choice if that's happening to you, as they will eat up your bandwidth.

All times are GMT.