View Full Version : Miscellaneous Hacks - Ban Spiders by User Agent
ofir24
12-16-2011, 02:53 PM
Does it work on vB 4.1.7?
Stefan118
12-16-2011, 02:54 PM
Does it work on vB 4.1.7?
Yes, and also on 4.1.8
Simon Lloyd
12-16-2011, 03:35 PM
If you check the version listed above, it works on 4.x.x, which means any version from 4.0.0 onwards :)
gosiah23
12-22-2011, 01:14 PM
Works like a charm!!! Thank you so much :)
Simon Lloyd
12-22-2011, 08:29 PM
Glad you like it, check your resource usage after a month and you'll see a dramatic drop in your bandwidth usage and cpu time :)
Boofo
12-27-2011, 02:12 PM
Simon, you might want to add:
Twiceler
xpymep2.exe
to the spiders list in the mod.
Simon Lloyd
12-27-2011, 02:39 PM
Hi Boofo, thanks for that. However, there are quite a few maintained lists around. Lee, who posts here, has posted his list at vbulletin.com and at vbseo.com; his pet hate is bad bots and spiders, so he has a huge list. The ones I added to the mod are just to get people started and to show how to enter their bots :)
Boofo
12-27-2011, 04:38 PM
A link to those lists would be a good addition, also.
Simon Lloyd
12-29-2011, 08:57 AM
Try these:
http://www.forumpostersunion.com/showthread.php?t=1644
http://www.vbseo.com/f34/how-create-vbulletin-bot-scraper-trap-47378/index4.html
But to be honest it's nothing that a little googling or binging won't resolve :)
Boofo
12-29-2011, 09:24 AM
I was referring to those that aren't familiar with bots and how they act. The more knowledge available when releasing any mods here is always a GOOD thing. Sorry if I interrupted anything with that request.
Simon Lloyd
12-29-2011, 01:18 PM
Well, I suppose the second of those two links has more explanation, but I didn't get that from your post; I thought you wanted links to lists posted. For my money that can be a bad thing: people would then blindly enter all the bots into the ban list when really, depending on their content, they will want some of those bots scraping or visiting their site. It's a personal preference really.
I built this because the bots were killing my bandwidth and making the forum slow, so I ban the more aggressive ones like Baidu (why on earth would they need 215 bots indexing my site?) in favour of the less aggressive :)
Anyway, for information, this link explains what they are: http://en.wikipedia.org/wiki/Spambot
Max Taxable
12-29-2011, 01:49 PM
I ban the more aggressive ones like Baidu, why on earth would they need 215 bots indexing my site?
Yeah, 215 Chinese bots leeching resources, to allegedly index your site for people who most likely can never see it. Most US sites are blocked in China. Baidu makes no sense at all, it certainly doesn't help anybody. By far the worst behaving bot out there, it totally ignores robots.txt.
ForceHSS
12-30-2011, 03:39 PM
A link to those lists would be a good addition, also.
https://vborg.vbsupport.ru/showpost.php?p=2265667&postcount=224
Simon Lloyd
12-30-2011, 03:42 PM
Lol, thanks Force ;)
spillage
01-22-2012, 08:02 PM
Since upgrading to vB4.1.10, I've noticed (occasionally) some spiders in the list still showing up online.
Anyone else having this issue, and any ideas what's going on?
TiA
Simon Lloyd
01-22-2012, 08:53 PM
You shouldn't, unless they are using a different UA. The spiders XML that you can download from Mosh's site (vb.com thread) may have them in the list with the same name, but their UA may be different from that in the list :)
spillage
01-22-2012, 09:32 PM
A couple of examples of ones in my list that (recently) show as online;
AhrefsBot
Exabot
gigabaz
Heritrix
Majestics MJ12bot
I thought this hack picked up on all things similar (name based)?... i.e. "Majestics" should also take out "Majestics MJ12bot", regardless of the software they're using to access the site.
How do we include same-name spiders in our list that are using different UAs?
Simon Lloyd
01-23-2012, 04:40 AM
You are right, entering Exabot should prevent that bot viewing your site. However, entering "Majestics MJ12bot" will not stop a bot with only "Majestics" or only "MJ12bot" in its UA; it will only stop bots with that entire phrase as part of their UA. So if you want to ban that bot, enter the separated values.
Can you give me a link to your site?
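The matching rule described above can be sketched in Python. This is only an illustration, assuming the mod does a case-insensitive substring test of each list entry against the User-Agent string; `is_banned` is a hypothetical helper, not the mod's actual code, and the sample UA is the MJ12bot string quoted later in this thread.

```python
def is_banned(user_agent, ban_list):
    """Return True if any non-empty list entry occurs anywhere in the UA.

    Assumption: matching is a plain, case-insensitive substring test.
    """
    ua = user_agent.lower()
    return any(entry.strip().lower() in ua for entry in ban_list if entry.strip())


mj12_ua = "Mozilla/5.0 (compatible; MJ12bot/v1.4.2; http://www.majestic12.co.uk/bot.php?+)"

# The whole phrase never appears in the UA, so nothing is matched.
print(is_banned(mj12_ua, ["Majestics MJ12bot"]))  # False
# A shorter entry that really is part of the UA does match.
print(is_banned(mj12_ua, ["MJ12bot"]))  # True
```

Under that assumption, the safest entry is the shortest fragment that really appears in the bot's UA and nowhere in a normal browser's UA.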
spillage
01-24-2012, 02:07 AM
Simon, I thought about that and included the individual entries when the issue first arose.
nscale.net ... however, the WGO block is only visible to members... PM me if you want temporary access.
TombstoneWarrior
01-24-2012, 10:30 AM
What does this option mean? "Create new thread for each UA detection (be aware that this can cause hundreds of threads at first until spiders get the message!)
You can use this even if you aren't using the ban option above" Also, what does this option do? Will it create a thread in my forum?
ALSO, WHY DOES A MEMBER'S DESCRIPTION HAVE THIS OPTION SET TO YES??
Simon Lloyd
01-24-2012, 01:49 PM
Firstly, that does exactly what it says: it WILL create threads, hundreds and hundreds of them :) And the member here who says they have it set to "Yes" chose to do so!
BlueCheri
02-18-2012, 08:44 AM
Downloaded; this was suggested by a friend.
I think it must be useful, let us see how it works.
Thanx.
G!
sarasotarepub
02-24-2012, 12:24 PM
Well done Simon.
We were having an infestation of Baidu Spiders but your Mod took care of them. I installed it last night and this morning they (and others) were gone.
Simon Lloyd
02-24-2012, 06:37 PM
Glad it's helping you :)
bosanci28
03-04-2012, 03:20 PM
hmmm,
why do I have all this in my "VSa - Advanced Forum Statistics"?
this:
* activity from bot no.7 (baidu) in your list*
I also see notices from 2 other bots: yandex and webster.
thanks for any help...
Simon Lloyd
03-04-2012, 03:45 PM
In forum statistics you will see the bot activity, as they do access your site but are then redirected before they can open the thread etc. that they were attempting. If you are seeing the actual phrase "activity from bot no.7 (baidu) in your list" then it's because you have the notifications turned on in the mod :)
bosanci28
03-04-2012, 03:52 PM
so just turn off the option at: "Create New Thread" ?
thanks
Simon Lloyd
03-04-2012, 04:15 PM
Yep, turn it off, it's just a way of monitoring it working or if you are only testing one bot at a time, otherwise you'll get thousands of the damn things ;)
meaters
03-14-2012, 02:54 PM
Awesome mod, thanks!
Saved our community from Baidu; hundreds of bots were persistently online, to the point of crashing our server.
Simon Lloyd
03-14-2012, 03:08 PM
Please mark it as installed :)
Max Taxable
03-14-2012, 03:08 PM
Awesome mod, thanks!
Saved our community from Baidu; hundreds of bots were persistently online, to the point of crashing our server.
And with only the addition of, one per line:
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
You end 99.9% of all spam bot registration attempts and cut garbage traffic even further.
Here's my entire ban list for this Mod:
baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
w3m
Simon Lloyd
03-14-2012, 03:25 PM
Are you dead sure on those early IE's?
Max, could you mark as installed please?
Max Taxable
03-14-2012, 03:39 PM
Are you dead sure on those early IE's?
I am dead sure the percentage of human beings still using these dinosaurs is infinitesimally small, so small they're not worth worrying about losing. (None of my 3,200+ users have these, for example)
I am also dead sure that entering these into your Mod doesn't interfere with IE 7,8,9 etc. Tested and verified.
I am also dead sure that the IsBot Mod I have is still working, but that since I put the dinosaur IE's in your Mod - it went from catching 40-50 bot registration attempts per day to catching only one or two!
The early IE's are 99.9% of the spam bot problem on the web, because these are easily infected to become botnet zombies. Human spammers are extremely rare, because think about it - if you have to pay someone to spam it kind of defeats the purpose of spamming.
I used to get 1,500 or so visits a day from these early IE computers, and spent months analyzing them and their origins. Never found one that looked like a Human. It is the 21st Century already, and I think it is high time webmasters not only stopped supporting early IE, but should also take steps to just plain block them. If the FBI and Microsoft really wanted to stop the botnet problem, MS would revoke the registration of these, or automatically upgrade them.
I used to use a script that did just that - would detect early IE and install the latest version of Firefox, making it the default browser on that computer - using the same exploits that made them botnet zombies in the first place. I virtually wiped out an entire botnet that way, back in 2006 while one of my sites was undergoing a DDoS attack from one.
Your Mod is by far the best weapon against the botnets yet, and I have been studying them and fighting them for at least 10 years.
Max, could you mark as installed please?
I did, on the 3.8.x version I run.
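One caveat on the short "MSIE n" entries, sketched under the assumption that the mod does a plain case-insensitive substring match: "MSIE 1" is also the start of "MSIE 10.0", which is how Internet Explorer 10 identifies itself. The sample UA strings are typical examples chosen for illustration, and `matched_entries` is a hypothetical helper rather than anything in the mod.

```python
ban_entries = ["MSIE 1", "MSIE 2", "MSIE 3", "MSIE 4", "MSIE 5", "MSIE 6"]

# Typical UA strings, used here purely as illustrations.
sample_uas = {
    "IE 6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "IE 9": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "IE 10": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
}


def matched_entries(user_agent, entries):
    """List every entry that occurs in the UA (case-insensitive substring)."""
    ua = user_agent.lower()
    return [entry for entry in entries if entry.lower() in ua]


results = {name: matched_entries(ua, ban_entries) for name, ua in sample_uas.items()}
for name, hits in results.items():
    print(name, hits)
```

If that assumption holds, writing the entry with a trailing dot ("MSIE 1.") would keep the block on IE 1.x without catching "MSIE 10.0".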
Simon Lloyd
03-14-2012, 05:59 PM
:), thanks and thanks! ;)
manning
03-16-2012, 02:05 AM
I banned them at the server level. Not catering to the Chinese or Asian market and never will cater to the Chinese or Asian market so don't need them to index my site.
Interesting idea - my forum really doesn't cater to Asian markets either, nor Russian, nor pretty much any place other than the USA and maybe the UK... What if I add ALLOW for those IPs and deny for everyone else? That makes htaccess huge - what effect will that have on load time? Of course, if they use a proxy in one of the other locations they'd still get in..... damn idiots!
BadgerDog
03-16-2012, 11:04 AM
Just for my clarity ... :)
I still get spiders appearing in PaulM's guest list and I understand from previous posts why. I also still see spiders active in my "Who's On-line" listing, but I understand that doesn't mean they actually are on the site, only that they have shown up and been redirected?
As a test, I turned ON for a few minutes the post in thread option, captured a few posts and then turned it OFF.
Here's a typical thread it started:
Activity from Bot No. 7 (Baiduspider) in your list
Date and Time: 03-16-2012 06:57:28
Associated Username (if any): Unregistered
Matched bots[7]: Baiduspider
With User Agent: MOZILLA/5.0 (COMPATIBLE; BAIDUSPIDER/2.0; +HTTP://WWW.BAIDU.COM/SEARCH/SPIDER.HTML)
Does this mean that in fact that the Baidu spider has been caught by this mod and redirected elsewhere? Does it mean that the mod is actually working, in spite of what appears in the "Who's On-line" listing?
Thanks .. :)
Regards,
Doug
Max Taxable
03-16-2012, 02:40 PM
BadgerDog that's strange, I never see any of the banned user agents either in who's online or in Paul's Track Guest Visits (https://vborg.vbsupport.ru/showthread.php?t=201214) Mod.
Simon Lloyd
03-16-2012, 06:51 PM
Just for my clarity ... :)
I still get spiders appearing in PaulM's guest list and I understand from previous posts why. I also still see spiders active in my "Who's On-line" listing, but I understand that doesn't mean they actually are on the site, only that they have shown up and been redirected?
As a test, I turned ON for a few minutes the post in thread option, captured a few posts and then turned it OFF.
Here's a typical thread it started:
Does this mean that in fact that the Baidu spider has been caught by this mod and redirected elsewhere? Does it mean that the mod is actually working, in spite of what appears in the "Who's On-line" listing?
Thanks .. :)
Regards,
Doug
It may be that the mod is conflicting with some other mod. If you want to PM me admin access with permissions, I'll take a look for you :)
baileyjojoms
03-16-2012, 08:32 PM
Just a hint to anyone with Baidu Spider issues. This Mod works great, but after getting 30,000 spider bans I had had enough. I contacted Baidu via the Spider Complaint section on their webpage, and they have halted crawling my site. This request was processed within 3 working days. I haven't seen a hint of Baidu since then.
Simon Lloyd
03-16-2012, 09:00 PM
That's great news, I can't believe you actually logged all those denials :). Great info anyway, as Baidu doesn't follow robots.txt (which they claim it does).
stilly
03-16-2012, 10:10 PM
Very nice mod. Thx.
BadgerDog
03-20-2012, 10:50 AM
Just a hint to anyone with Baidu Spider issues. This Mod works great, but after getting 30,000 spider bans I had had enough. I contacted Baidu via the Spider Complaint section on their webpage, and they have halted crawling my site. This request was processed within 3 working days. I haven't seen a hint of Baidu since then.
Do you have a link?
I can't seem to find the right page....
Thanks .. :)
Regards,
Doug
BadgerDog
03-20-2012, 10:53 AM
It may be that the mod is conflicting with some other mod, if you want to pm me admin access with permissions i'll take a look for you :)
I'll try to do that this weekend ... thank you for the offer .. :)
I also got a complaint from a member that he was redirected, but I can't see any reason for it as he's in the U.S. and uses MSN.
If I turn the mod ON, he gets redirected, if I turn it OFF he gets access. He says he's using IE9... :confused:
Regards,
Doug
Simon Lloyd
03-20-2012, 12:20 PM
I'll try to do that this weekend ... thank you for the offer .. :)
I also got a complaint from a member that he was redirected, but I can't see any reason for it as he's in the U.S. and uses MSN.
If I turn the mod ON, he gets redirected, if I turn it OFF he gets access. He says he's using IE9... :confused:
Regards,
Doug
In that case his UserAgent has been changed to incorporate one of your banned bots' names. Get your member to go here http://whatsmyuseragent.com/ and copy and paste the UserAgent in a PM to you; you can then see if any of it appears in your list. This mod cannot redirect without there being a match.
BadgerDog
03-20-2012, 06:30 PM
In that case his UserAgent has been changed to incorporate one of your banned bots names, get your member to go here http://whatsmyuseragent.com/ and copy and paste the UserAgent in a pm to you, you can then see if any of it appears in your list, this mod cannot redirect without there being a match.
Good idea ... :up:
The member sent me this ..
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; FunWebProducts; BOIE9;ENUS)
Does that help?
Regards,
Doug
Simon Lloyd
03-20-2012, 07:01 PM
Lol, doesn't help me, but it may help you. Check your list to see if any part of the User Agent is in your banned bots list. It could be that his computer has been infected and his UA has been changed; there's a link at the top of the page (for the link I posted) that tells you how to change your user agent.
Max Taxable
03-20-2012, 11:24 PM
Good idea ... :up:
The member sent me this ..
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; FunWebProducts; BOIE9;ENUS)
Does that help?
Regards,
Doug
I bet FunWebProducts is on your list, especially if you copied my list from earlier in this thread.
That's malware by the way, also known to contain trojans, also is a browser hijacker. It's also known as "CoolWebSearch." Your friend's computer is highly corrupted. It is capable of sending his login credentials anywhere, among other things.
http://en.wikipedia.org/wiki/Browser_hijacker
Tell him if he's gonna visit a lot of pr0n sites, do it with a lot more secure browser than IE.
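The check suggested above (look for any list entry inside the member's UA) can be done by hand or with a few lines of Python. A minimal sketch, assuming case-insensitive substring matching and using the ban list posted earlier in this thread; the variable names are illustrative only.

```python
# Ban list as posted earlier in the thread, one entry per line.
ban_list = """baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
w3m""".splitlines()

member_ua = ("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; "
             "Trident/5.0; FunWebProducts; BOIE9;ENUS)")

# Collect every entry that occurs somewhere in the member's UA.
hits = [entry for entry in ban_list
        if entry.strip() and entry.strip().lower() in member_ua.lower()]
print(hits)  # ['FunWebProducts']
```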
BadgerDog
03-21-2012, 12:20 AM
I bet FunWebProducts is on your list, especially if you copied my list from earlier in this thread.
Yes, it is ... :up:
So, just for my education, what does the data embedded in his user agent around where FunWebProducts appears mean? Is this just some optional field that shows elements that are being loaded and used by his browser?
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; FunWebProducts; BOIE9;ENUS)
Thanks for any feedback ... :)
Regards,
Doug
Max Taxable
03-21-2012, 12:45 AM
Yes, it is ... :up:
So, just for my education, what does the data embedded in his user agent around where FunWebProducts appears mean? Is this just some optional field that shows elements that are being loaded and used by his browser?
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; FunWebProducts; BOIE9;ENUS)
Thanks for any feedback ... :)
Regards,
Doug
I assume you're asking about the bolded part, and offhand I would say that's the spammy Bing toolbar. Trident is the Microsoft layout engine.
BadgerDog
03-21-2012, 01:22 AM
I assume you're asking about the bolded part, and offhand I would say that's the spammy Bing toolbar. Trident is the Microsoft layout engine.
Thanks Max ... :up:
Appreciate the feedback ...
Regards,
Doug
Max Taxable
03-21-2012, 01:26 AM
That dude has a lot of garbage on his computer.
Simon Lloyd
03-21-2012, 07:56 AM
Use this tool: http://user-agent-string.info/parse (it will break down the UA into its component parts) :)
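For anyone who wants to see roughly what such a breakdown looks like without the online tool, here is a rough Python sketch. It is not how that site works; it simply assumes the usual layout of product tokens followed by a parenthesised, semicolon-separated comment, and `split_user_agent` is a hypothetical helper.

```python
import re


def split_user_agent(user_agent):
    """Split a UA into product tokens and the items inside parentheses."""
    comments = re.findall(r"\(([^)]*)\)", user_agent)
    # Remove the parenthesised parts, then split what is left on whitespace.
    products = re.sub(r"\([^)]*\)", " ", user_agent).split()
    details = [item.strip()
               for comment in comments
               for item in comment.split(";")
               if item.strip()]
    return products, details


products, details = split_user_agent(
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; "
    "Trident/5.0; FunWebProducts; BOIE9;ENUS)"
)
print(products)
for item in details:
    print(" -", item)
```

Each printed item is a candidate fragment to compare against the ban list.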
BadgerDog
03-21-2012, 09:08 AM
Use this tool: http://user-agent-string.info/parse (it will break down the UA into its component parts) :)
Thank you Simon... :)
Very useful ... :up:
Regards,
Doug
Max Taxable
03-21-2012, 02:59 PM
Use this tool: http://user-agent-string.info/parse (it will break down the UA into its component parts) :)
U da Man Simon.
Simon Lloyd
03-21-2012, 04:18 PM
Lol, thanks!, i try ;)
baileyjojoms
03-22-2012, 01:27 AM
Do you have a link?
I can't seem to find the right page....
Thanks .. :)
Regards,
Doug
Yes, I ensured that the following was in my robots.txt file:
User-agent: Baiduspider
Disallow: /
Then I sent an email to: spiderhelp@baidu.com
Here is the message and reply I received:
Dear,
Thank you for your email.
We have updated our DNS record to make our spider behave the way requested in your robots file.
Should you need further assistance, please do not hesitate to contact us.
Best Regards,
Stephy Wu
Baidu Spider Team
________________________________________
re: Continuous Crawling of my site
To whom it may concern;
I have been trying for a month now to halt all crawling of my site by Baidu. I have added the following code to my robots.txt file:
User-agent: Baiduspider
Disallow: /
This was done 3 weeks ago. However I am being crawled daily.
Baidu is daily eating up a ton of Server Resources, and costing me slow load times. I also employed a spider ban modification, and have banned more than 28,000 Baidu spider entries in 3 weeks.
This is ridiculous. I am asking you to immediately halt all crawling of my site by Baidu.
I have not seen hide nor hair of Baidu since this was done, nearly a month ago.
To find the email address I went to their website, translated the page into English, and then searched for Baidu Spider. That took me to a search results page, which led me to this page:
http://www.baidu.com/search/spider.html
I simply translated to English, and found the info I was looking for.
Baidu was the ONLY spider that was causing major issues, now I am able to use this add-on for other spiders - but Baidu was using massive amounts of resources.
Hope this helps.
BadgerDog
03-22-2012, 10:42 AM
Hope this helps.
Yes, thank you very much ... :)
Regards,
Doug
Alan_SP
03-28-2012, 04:40 PM
I have problems with Majestics MJ12bot. I tried redirecting bad spiders to their own IP, or to the HTML address given in the mod. AFAIK all spiders other than Majestics MJ12bot are gone (new ones appear here and there, but I remove them).
Do you know why this spider is successful in avoiding this mod?
Simon Lloyd
03-28-2012, 05:08 PM
When the spider appears next click the who's online, at the bottom choose show useragent and copy their entire UA string to the list :)
Alan_SP
03-28-2012, 06:07 PM
Thanks for the info. I hadn't noticed this before. I'll wait till it shows up again (hopefully never).
I also installed vB Bad Behavior: https://vborg.vbsupport.ru/showthread.php?t=261498
EDIT: I found this info about it:
Mozilla/5.0 (compatible; MJ12bot/v1.4.2; http://www.majestic12.co.uk/bot.php?+)
I now use only this string in spider list settings: MJ12bot.
I hope that will stop it; maybe "Majestics MJ12bot" was too much.
Max Taxable
03-28-2012, 07:50 PM
Thanks for the info. I hadn't noticed this before. I'll wait till it shows up again (hopefully never).
I also installed vB Bad Behavior: https://vborg.vbsupport.ru/showthread.php?t=261498
EDIT: I found this info about it:
Mozilla/5.0 (compatible; MJ12bot/v1.4.2; http://www.majestic12.co.uk/bot.php?+)
I now use only this string in spider list settings: MJ12bot.
I hope that will stop it; maybe "Majestics MJ12bot" was too much.
On separate lines.
But I am curious - why do you want to ban this bot? It's one of the better behaved ones out there and it doesn't flood. It obeys robots.txt as well.
How can I block MJ12bot?
MJ12bot adheres to the robots.txt standard. If you want to prevent the bot from crawling your website, add the following text to your robots.txt:
User-agent: MJ12bot
Disallow: /
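To double-check that a robots.txt rule like this says what you intend, Python's standard `urllib.robotparser` can evaluate it offline. This is just a sketch: the example.com URL is a placeholder, and the check only tells you what the file asks for, not whether a given bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

rules = """User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URL standing in for any page on your forum.
url = "http://example.com/showthread.php?t=1"

mj12_allowed = parser.can_fetch("MJ12bot", url)
google_allowed = parser.can_fetch("Googlebot", url)
print("MJ12bot allowed:", mj12_allowed)
print("Googlebot allowed:", google_allowed)
```

With only the two-line rule above, MJ12bot is refused everywhere while agents that are not named stay allowed.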
Simon Lloyd
03-28-2012, 09:49 PM
I hope that will stop it; maybe "Majestics MJ12bot" was too much.
Yes it was, because it looks for the entire string you entered, and as that doesn't appear in the UA it doesn't get banned :)
BTW, Max is dead right :)
Alan_SP
03-29-2012, 06:01 PM
Because I have no use for spiders that no one uses. At least I don't know anyone who uses Majestics.
People usually use Google, Bing, Facebook... Other search engines or spiders don't interest me, as they don't interest my users. The same goes for Baidu or similar search engines.
Do people use Majestics? And for what purposes?
Baf_Jams
03-29-2012, 09:04 PM
Installed Thanks :)
Max Taxable
03-29-2012, 09:36 PM
Because I have no use for spiders that no one uses. At least I don't know anyone who uses Majestics.
People usually use Google, Bing, Facebook... Other search engines or spiders don't interest me, as they don't interest my users. The same goes for Baidu or similar search engines.
Do people use Majestics? And for what purposes?
The link you posted earlier nicely explains it.
It is a friendly, well behaved bot that helps your presence on the web. Baidu is none of these, it is an aggressive, leeching, unfriendly bot attached to a Chinese search engine. There's no comparison between the two.
Ban whatever bots you like, no one's telling you not to. But MJ12 isn't hurting you, it's helping you and it is friendly.
Alan_SP
03-30-2012, 04:43 PM
But MJ12 isn't hurting you, it's helping you and it is friendly.
Thanks for your explanation. It's helpful, not with just this post. :)
S_E_A
04-14-2012, 10:33 AM
Thank you for a great mod, Simon.
I want to block Deepnet Explorer spiders. To do this, do I enter 'Deepnet Explorer' into the spider list? Is it okay to block Deepnet Explorer?
Simon Lloyd
04-14-2012, 10:43 AM
Blocking spiders is all about personal choice. Do a little research and find out whether you want to cater for that country and whether they add value to your site! When Deepnet Explorer is visiting, go to Who's Online; at the bottom there's a dropdown box for "Show Useragent?". Select Yes, then check out their user agent. You can enter any or all of the UA string, so if they actually do have Deepnet in the UA then you just enter that on its own line in the list :)
Max Taxable
04-14-2012, 01:48 PM
My current (updated) list of banned user agents entered into this Mod:
baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
Nesotebot
DCPbot
Opera/1
Opera/2
Opera/3
Opera/4
Opera/5
Opera/6
Opera/7
Opera/8
AOL Advertising R&D
DataCha0s
aiHitBot
Apache-HttpClient
Zend_Http_Client
ReverseGet
ForceHSS
04-18-2012, 06:32 PM
xpymep.exe
start.exe
I saw these two as hosts, so I added them to the list. They look strange.
ForceHSS
05-02-2012, 03:10 AM
My full list, if anyone needs it:
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
Nesotebot
DCPbot
Opera
AOL Advertising R&D
DataCha0s
aiHitBot
Apache-HttpClient
Zend_Http_Client
ReverseGet
Baidu
BoardReader
almaden
Anarchie
ASPSeek
attach
autoemailspider
BackWeb
Bandit
BatchFTP
BlackWidow
Bot\mailto:craftbot@yahoo.com
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
GetRight
GetSmart
GetWeb!
gigabaz
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
Mag-Net
Magnet
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
Mozilla/4.0
Mozilla/4.79
MS\FrontPage*
MSFrontPage
MSIECrawler
MSProxy
myinfo.any-request-allowed.com
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
Openfind
PageGrabber
Papa\Foto
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
Python-urllib/2.4
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
Spinn3r
sproose
Stripper
start.exe
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
SeznamBot
tAkeOut
Teleport\Pro
TurnitinBot/2.1
URLSpiderPro
Vacuum
VoidEYE
vBSEO
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
xpymep.exe
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus
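A long pasted list like this is worth a quick sanity check before it goes into the mod. The sketch below assumes, as described earlier in the thread, that entries are matched as literal substrings of the UA; under that assumption entries written as patterns (for example `.*httrack.*`, `Indy*Library`, `[Ww]eb[Bb]andit`, or anything with a backslash) would be looked for character by character and probably never match. `lint_ban_list` is a hypothetical helper, and the sample is a small excerpt of the list above.

```python
import re
from collections import Counter


def lint_ban_list(entries):
    """Report duplicate entries and entries that look like patterns."""
    cleaned = [entry.strip() for entry in entries if entry.strip()]
    counts = Counter(entry.lower() for entry in cleaned)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # Backslash, asterisk and square brackets hint at regex or wildcard syntax.
    pattern_like = [entry for entry in cleaned if re.search(r"[\\*\[\]]", entry)]
    return duplicates, pattern_like


sample = ["Yandex", "HTTrack", ".*httrack.*", "Indy*Library",
          "[Ww]eb[Bb]andit", "Download\\Demon", "Wget", "Yandex"]

duplicates, pattern_like = lint_ban_list(sample)
print("Duplicates:", duplicates)
print("Pattern-like:", pattern_like)
```

Separately from the script, very broad entries deserve a second look under substring matching: "Opera" would match every Opera browser, and "Mozilla/4.0" is how IE 6, 7 and 8 begin their UA strings.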
bigtree
05-02-2012, 03:41 AM
Add BoardReader. It's a bad one.
ForceHSS
05-02-2012, 03:47 AM
added
Nirjonadda
05-02-2012, 04:33 PM
# Facebook Spider
# Google Wireless Transcoder Spider
How can I ban these permanently from my website?
Simon Lloyd
05-02-2012, 06:26 PM
# Facebook Spider
# Google Wireless Transcoder Spider
How can I ban these permanently from my website?
Check this thread https://vborg.vbsupport.ru/showpost.php?p=2319989&postcount=318
waldvb
05-16-2012, 04:51 PM
Installed, enabled Mod. Still can see Baidu in Who's Online - i.e, IP 180.76.5.165, 180.76.5.168, 180.76.5.66
What's wrong?
ForceHSS
05-16-2012, 05:34 PM
Installed, enabled Mod. Still can see Baidu in Who's Online - i.e, IP 180.76.5.165, 180.76.5.168, 180.76.5.66
What's wrong?
post screenshot of settings
waldvb
05-16-2012, 06:08 PM
post screenshot of settings
Here are the settings. With the mod I have just 4-5 Baidu IPs. Before, I had 100+.
ForceHSS
05-16-2012, 09:39 PM
The spiders' own IP option: make that a Yes.
Winter Sonata
05-17-2012, 01:10 PM
Installed, I hope it helps reduce server resource usage.
Simon Lloyd
05-17-2012, 07:38 PM
@Waldvb, if you have Paul M's who's online mod then you will see them in that, as they actually call for a thread directly. What happens is they make that request but are instantly redirected to the place of your choice :)
@Winter Sonata, thanks for installing, please click "Mark As Installed" at the top left of this thread so you can receive support if you need it :)
spillage
05-18-2012, 12:23 AM
Installed, I hope it helps reduce server resource usage.
For me, it's stopping all but Google, Yahoo, and Bing (as I've not added those to my list).
This is an excellent mod.
Winter Sonata
05-18-2012, 11:36 PM
Thanks Simon Lloyd, now done :) Sorry, I forgot the first time :)
Best Regards!
waldvb
07-16-2012, 10:47 PM
Can't ban utel.net.ua
Max Taxable
07-16-2012, 10:58 PM
Can't ban utel.net.ua
If that is in the USER AGENT string, sure you can. If it's the host name, this Mod doesn't "see" it. This Mod isn't for banning IPs, ISPs or hostnames.
Simon Lloyd
07-17-2012, 06:07 AM
Can't ban utel.net.ua
Read this post https://vborg.vbsupport.ru/showpost.php?p=2319989&postcount=318
KidHTML
07-19-2012, 12:08 PM
How do I block these bots because I'm not understanding this mod or this whole topic...
Twitterbot,
Butterfly Topsy Crawler (2),
Embedly
Simon Lloyd
07-19-2012, 12:16 PM
If you'd like to mark it installed i'll give you all the help you need :)
KidHTML
07-19-2012, 12:19 PM
Sorry, marked as installed and thanks.
Max Taxable
07-19-2012, 12:24 PM
How do I block these bots because I'm not understanding this mod or this whole topic...
Twitterbot,
Butterfly Topsy Crawler (2),
Embedly
Personally I wouldn't block any of these.
KidHTML
07-19-2012, 12:26 PM
Personally I wouldn't block any of these.
I wasn't sure if I should or not...
Max Taxable
07-19-2012, 02:16 PM
I wasn't sure if I should or not...
I posted a list of the ones I block, earlier in this thread. All are there for very good reason. They aren't helpful to the site, they leech too many resources, or they are typically botnet zombies. Bad actors all.
Simon Lloyd
07-19-2012, 03:20 PM
Thanks for the help Max :)
Max Taxable
07-19-2012, 07:24 PM
Thanks for the help Max :)
Heh. I never explained how to use your Mod....:p
Simon Lloyd
07-19-2012, 08:52 PM
Heh. I never explained how to use your Mod....:p
I was under the impression that it was self-explanatory; maybe I should add to the first post?
Max Taxable
07-19-2012, 09:23 PM
I was under the impression that it was self-explanatory; maybe I should add to the first post?
I got the same impression...
Willy T
07-19-2012, 10:35 PM
I must say..... This was the only thing I found to get rid of those damn Baidu spiders! I averaged 50 - 65 baidu spiders at one time. I often had almost as many guests online as I did users.
Now I am perfectly fine with 1-5 google & bing spiders online. 50+ spiders that simply don't help me? yea... No thanks!
Max Taxable
07-19-2012, 11:47 PM
Baidu is evil.
Nichtofen
07-29-2012, 07:08 PM
This mod is great! I cannot express how wonderful this mod is and how well it works. Tie that into how responsive and resourceful Simon is and it's a winner. Thanks for all the help and clarification within the comments to Simon, Max, Force, and all!
My forum is young and relatively slow as it were, but I wish to stay ahead of the game on this one and keep my security up as the server load increases. I don't see the spiders that were stopping by for a visit anymore. :)
Thanks again and keep up the good work
Marked Installed and Nominated
vB4.2.0
Simon Lloyd
07-29-2012, 10:55 PM
Glad you like it and it works well for you. I think you're our first vB 4.2.x user, so it's good to know it works upward :)
I know it's a pain but do familiarise yourself with the info that can be found at the links in the first post, these will help you as your forum gets busier, also take the time to research some of the bots, they're not all bad, some will even help your site grow.
Be mindful about who and which nations you are going to cater for before banning/allowing bots to crawl your site and you'll be golden.
zascok
08-15-2012, 03:18 PM
Don't really know what Yandex has done to be called a bad bot :confused: and ia_archiver?
Nice mod installed
Simon Lloyd
08-15-2012, 05:09 PM
As has been said many times, blocking spiders/bots is a personal thing. You need to understand who and which countries you are aiming your site and content at. Do you have enough resources to allow your content to be scraped by archivers etc.? Do you have enough bandwidth to allow bots that add no value or do not cater for your target audience to index your content?
Just be very honest with yourself on what your intentions are with regards to your target audience and user experience :)
zascok
08-15-2012, 06:40 PM
Here is one for your collection: Ezooms <-- a really bad one, it doesn't care at all about robots.txt.
About Yandex (http://www.yandex.ru/), just in case: it's the biggest search engine for the Russian language. Of course, if you don't want it to index your site, that is your personal choice. All I asked is: what has it done? Did anyone ever see that bot breaking the rules?
Simon Lloyd
08-15-2012, 07:10 PM
Is your site in English? Do you cater for Russians? (Yandex = Yahoo.) It's done nothing bad that i know of, and my interest isn't to keep a list of "bad bots", etc. I purely built this to save on precious bandwidth that was being decimated by bots. My site is English, the UTF is set up as English, etc., but Yandex crawls my site. I don't need them, they don't bring traffic and i don't speak Russian, so i block them. So that's what i mean: choose what you want to block with regards to who you are targeting.
In the mod description there are links to maintained lists that you can use; the ones in the product were just your starter :)
rootsxrocks
08-15-2012, 07:22 PM
OMG, I can banish Baidu! Thank you, I am sick of that worthless, misbehaving, IP-switching, overloading spider.
zascok
08-15-2012, 07:28 PM
OK, I see now. TIA. Now wondering if there is a plugin that bans countries by name :)
rootsxrocks
08-15-2012, 07:34 PM
We are a localized site too and have no need for Chinese or Russian search results.
Simon Lloyd
08-15-2012, 08:55 PM
OMG, I can banish Baidu! Thank you, I am sick of that worthless, misbehaving, IP-switching, overloading spider.
Glad i could be of service ;)
ForceHSS
10-29-2012, 03:03 AM
xpymep.exe
here is a new one to add to the list
Max Taxable
10-29-2012, 03:10 AM
xpymep.exe
here is a new one to add to the list
One you already gave us, in your fantastic list posted earlier in the thread:
https://vborg.vbsupport.ru/showpost.php?p=2325399&postcount=321
ForceHSS
10-29-2012, 03:25 AM
You're right, I did not see that. It's a bit late here, which could be the reason :)
Disco_Dave
10-31-2012, 03:05 PM
I installed this mod yesterday and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.
Max Taxable
10-31-2012, 05:17 PM
I installed this mod yesterday and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.
It blocks that spider. How did you add it?
TheSupportForum
10-31-2012, 05:27 PM
I installed this mod yesterday and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.
you need to block their IP, AhrefsBot has more than 1 IP
Max Taxable
10-31-2012, 05:29 PM
you need to block their IP, AhrefsBot has more than 1 IP
The whole purpose of this Mod is to block user agents, so you don't have to block IP addresses.
If the person put "AhrefsBot" in this Mod, it should be blocked no matter the IP.
Simon Lloyd
10-31-2012, 09:06 PM
I think you'll find that if you check WOL when AhrefsBot is online and then choose to show the user agent from the dropdown, "ahrefsbot" isn't actually in their user agent. I think i posted about this to another user a few posts or so ago.
TheSupportForum
10-31-2012, 09:20 PM
why not just put
*
/
it blocks all useragents and will only require standard 1 line box
this is what i have done
Simon Lloyd
10-31-2012, 09:37 PM
why not just put
*
/
it blocks all useragents and will only require standard 1 line box
this is what i have done
that will block everyone if using it in my mod!
Here are some UAs that Ahrefs uses:
Mozilla/5.0 (compatible; AhrefsBot/1.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(gfe)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(gfe),gzip(gfe),gzip(gfe)
So use SiteBot or Ahrefs as the banning UAs :)
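To see why SiteBot or Ahrefs is enough, here is a minimal Python sketch of the matching rule as it is described in this thread: a case-insensitive search for each listed string anywhere in the user agent. The function name is_banned and the Firefox sample are made up for illustration; the mod itself is a vBulletin plugin and its real code may differ.

```python
def is_banned(user_agent, ban_list):
    """Return True if any ban-list entry occurs anywhere in the user agent.

    Assumption: matching is a plain, case-insensitive substring test,
    as described in the thread. This is not the mod's actual code.
    """
    ua = user_agent.lower()
    # Blank entries are skipped so an empty line cannot match everything.
    return any(entry.lower() in ua for entry in ban_list if entry.strip())


ban_list = ["SiteBot", "Ahrefs"]

samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(gfe)",
    "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0",  # ordinary browser
]

for ua in samples:
    print(is_banned(ua, ban_list), ua)
```

Under the same assumption, a single / in the list would match almost every visitor, because nearly all user agents contain something like Mozilla/5.0.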
TheSupportForum
10-31-2012, 10:31 PM
For those who want to block access from the Facebook external hit bot:
facebookexternalhit/1.0
Max Taxable
10-31-2012, 11:44 PM
I think you'll find that if you check WOL when AhrefsBot is online and then choose to show the user agent from the dropdown, "ahrefsbot" isn't actually in their user agent. I think i posted about this to another user a few posts or so ago.
I have had no problems. AhrefsBot is in my collection, and I never see it hit my site anymore.
Pretty sure the OP is saying he sees "ahrefsbot" in his WOL after adding it to your mod.
Max Taxable
10-31-2012, 11:45 PM
For those who want to block access from the Facebook external hit bot:
facebookexternalhit/1.0
I don't block that because it's just Facebook getting image and text information from a thread or a post someone has posted to Facebook. It's self-defeating to block this; it's your friend. Same with twitterbot.
Simon Lloyd
11-01-2012, 07:39 AM
I have had no problems. AhrefsBot is in my collection, and I never see it hit my site anymore.
Pretty sure the OP is saying he sees "ahrefsbot" in his WOL after adding it to your mod.
Well, if that's the case and it's actually in vBulletin's standard WOL, then it will either disappear after his online timeout setting in AdminCP, or AhrefsBot is using a UA that doesn't have Ahrefs in it. If he is using Paul M's Who Has Visited or a similar mod, then it will always appear, as both mods are doing their job! I've mentioned this a few times.
EDIT: this is actually mentioned in the FAQ that's referenced in the mod's description.
Disco_Dave
11-01-2012, 08:14 AM
It blocks that spider, how did you add it?
Their user agent was coming up as this: choopa: choopa.net: I placed both of these in. They had at least 20 different IP addresses.
Simon Lloyd
11-01-2012, 08:37 AM
You only need to enter choopa if that displays in their UA, and they'll be banned immediately and will disappear from WOL after the WOL timeout. The IPs are of no consequence. I have a ban IP mod, but this one bans the string found in the UA, so they can use 100 different IPs for the same UA and still they will be banned :)
Disco_Dave
11-01-2012, 08:48 AM
That's what I did yesterday morning, but in the afternoon I had around 36 AhrefsBots under the UA Choopa.net. I added choopa to your mod in the morning but they were still there in the afternoon, that's why I asked. I haven't seen them this morning though.
Simon Lloyd
11-01-2012, 09:06 AM
Do you use any mod for who-visited statistics? If you don't, the only other explanation is that they had already accessed a thread or area prior to you banning them. The only time they can be banned is when they release that thread or area to move to another; after that they're history :)
Disco_Dave
11-01-2012, 09:15 AM
I have Boofo's mod for displaying spiders. I'm not sure if any spiders can view our site, as you need to be registered to view any content on it.
Great mod and thanks for your help..
Simon Lloyd
11-01-2012, 09:45 AM
It doesn't matter that they cannot view any content; your thread URLs are being indexed, which is why you are being crawled. Naturally they see the same as a guest. I just viewed your site: you should also turn off displaying WOL for guests, it will save you queries and bandwidth :)
Disco_Dave
11-01-2012, 10:00 AM
Cheers Simon :D
The feckers are still getting in: AhrefsBot Spider 02:15 PM / Viewing Index NIRC: 173.199.115.83.choopa.net
Simon Lloyd
11-01-2012, 03:05 PM
If that's from the logging, then yes, you will get that until every thread they have tried to index previously becomes a 301 permanent redirect. If not, PM me temp admin access with all permissions and i'll take a look and see what i can do for you.
Simon Lloyd
11-01-2012, 04:12 PM
OK, i've checked and i don't see any of these bots in your native vBulletin WOL. The other mods you have for statistics, total visitors, etc. WILL log these as visiting because the bots are directly accessing a URL. The logging is done before the URL loads completely, and my mod also bans them at this point, so both mods are working :)
Just as a note, you're using create a thread; you can quickly get thousands of threads, so it's better to use the output.txt logging :)
Note to all!:
If you have Simon in your ban list this will ban the following:
simon
SimonLloyd
Lloyd simon
thisisanincrediblylongsimonwordhere
Get the idea? You don't need to add all those to your ban list, simply because the mod looks for the string "simon" (case doesn't matter) anywhere in the entire string. So, if you'd used this in your list:
Simon*\Lloyd
It would NOT ban:
Simon
Simon Lloyd
thisissimonlloydinastring
but it WOULD ban
Simon*\Lloyd-in(this.string)
thisstringSimon*\Lloydhere
....etc
Hope you all understand this better now and can get to removing duplicates from your list.
@tricksodave, you can delete the temp account for me now thanks, also if you read the above please prune your list.
If any of you have any trouble with editing your lists let me know and i'll help with anything you're stuck with :)
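For anyone pruning a long list, the rule above can be turned into a small helper. This is only a sketch in Python, assuming the case-insensitive substring matching described in this post; prune_ban_list is a made-up name and is not part of the mod. An entry is redundant when a shorter entry in the list already occurs inside it, because anything containing the longer string also contains the shorter one.

```python
def prune_ban_list(entries):
    """Drop entries made redundant by a shorter entry they contain.

    Assumes case-insensitive substring matching, as described above.
    """
    cleaned = []
    # Shortest first, so a short entry is kept before longer ones that contain it.
    for entry in sorted((e.strip() for e in entries), key=len):
        low = entry.lower()
        if not low:
            continue  # ignore blank lines
        if any(kept.lower() in low for kept in cleaned):
            continue  # a kept entry already matches everything this one would
        cleaned.append(entry)
    return cleaned


print(prune_ban_list(["SimonLloyd", "simon", "Lloyd simon", "Simon", "Baidu"]))
# prints ['simon', 'Baidu']
```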
Disco_Dave
11-01-2012, 04:17 PM
Thanks Simon, That's helped me understand it a bit better. Thanks again...
TheSupportForum
11-01-2012, 04:20 PM
Simon Lloyd
i c wat u done there :)
haha
CAG CheechDogg
11-01-2012, 07:33 PM
Simon, does this also block Facebook's scraper? I am getting slammed by Facebook IPs and spiders:
facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
I did it through htaccess but this blocks the ability for me to post any articles to facebook with a thumbnail.
Here is a link: http://www.botopedia.org/user-agent-list/social-media-agents/facebook-external-hit
CAG CheechDogg
11-01-2012, 07:35 PM
Or is there a way to slow these guys down with crawl-delay like this:
User-Agent: *
Crawl-Delay: 10
I read you should use the agent by name instead of the above, if you know how. Or does Facebook follow the above?
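As a rough illustration of the per-agent form, here is how a crawler that honours Crawl-delay would read such a file, using Python's standard urllib.robotparser. The robots.txt content below is a made-up example, and nothing in this thread confirms that Facebook's fetcher obeys Crawl-delay; it is a non-standard directive that a crawler is free to ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a named agent gets its own delay,
# everything else falls back to the wildcard entry.
robots_txt = """\
User-agent: facebookexternalhit
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite crawler would look up the delay that applies to its own name.
print(parser.crawl_delay("facebookexternalhit/1.1"))  # 10
print(parser.crawl_delay("SomeOtherBot/2.0"))         # 5
```

The named entry wins for the agent it names, and the * entry only applies to agents with no entry of their own.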
CAG CheechDogg
11-01-2012, 07:39 PM
Here is something else on facebook's bot, spiders or what ever they really are. Facebook claims they are not spiders or bots but instead scrapers, but I have been getting 500 server side errors and I check my error logs and during or around the time they are hitting my site over 100 times sometimes within 2 minutes I see Facebook IPs in the error logs....sigh...
Help? lol....
Simon Lloyd
11-01-2012, 08:32 PM
Is there only Facebook in the error log? As for banning bots or whoever, read my post above; as you can see, it all depends on the UAs of each bot. Banning is a personal thing. Most bots don't recognise the delay command in robots.txt. Maybe look at their IP range and ban some of their IPs; you can use my other mod for that.
Simon Lloyd
11-01-2012, 08:34 PM
CAG you haven't downloaded or marked this as installed!
CAG CheechDogg
11-01-2012, 08:40 PM
Hey! I did download it but I didn't hit installed! lol... Sorry...
As for banning the ips I have done that, but that blocks the ability to post the articles with the right info on facebook, so I have to make a decision here on whether facebook will help my site or not.
I was just asking the question really about the crawl-delay which shouldn't have been asked here Simon, I apologize for that.
Simon Lloyd
11-01-2012, 09:06 PM
Banning IPs is only for incoming unless you've banned them in cPanel or htaccess. As for asking about the delay, there's no problem, i like to help where i can.
ForceHSS
11-01-2012, 11:18 PM
Aboundex/0.2
Seems to be a new one. Here is the full thing: Aboundex/0.2 (http://www.aboundex.com/crawler/)
IP is 173.193.219.168-static.reverse.softlayer.com
I have checked the IP and it has come back that a spam bot is using it.
If someone wants to run checks, see if it needs to be added to the list. I am not 100% sure it does, which is why it needs checking first.
CAG CheechDogg
11-01-2012, 11:21 PM
Banning ips are only for incoming unless you've banned them in cpanel or htaccess. As for asking about the delay there's no problem i like to help where i can.
Yeah I used htaccess to ban them completely. I need to find out exactly what IPs I can ban and still allow facebook to work properly when I post links to articles or posts...sigh...lol
But thanks for understanding and helping out, it is very much appreciated Simon
Max Taxable
11-01-2012, 11:39 PM
Yeah I used htaccess to ban them completely. I need to find out exactly what IPs I can ban and still allow facebook to work properly when I post links to articles or posts...sigh...lol
But thanks for understanding and helping out, it is very much appreciated Simon
In my experience, that's the only time the FB external hit bot comes to your site - when you or someone else posts a link to your site, on facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.
CAG CheechDogg
11-01-2012, 11:52 PM
In my experience, that's the only time the FB external hit bot comes to your site - when you or someone else posts a link to your site, on facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.
Max it's weird because I have the facebook like buttons off on my forums. I do have rssgraffiti but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.
Max Taxable
11-02-2012, 12:02 AM
Max it's weird because I have the facebook like buttons off on my forums. I do have rssgraffiti but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.
Some autospam bots do spoof their user agents as facebook or even googlebot.
CAG CheechDogg
11-02-2012, 12:14 AM
Some autospam bots do spoof their user agents as facebook or even googlebot.
Great! Now you tell me! lol... Thanks again Max, I will have to take a careful look at the IPs and see if they match Facebook's then.
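One way to do that comparison, sketched in Python with the standard ipaddress module. The CIDR ranges below are documentation placeholders (RFC 5737), NOT Facebook's real ranges; you would have to substitute ranges you have verified yourself. The function names are invented for this example.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks.
# These are NOT Facebook's addresses; replace with ranges you have verified.
VERIFIED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip, cidr_ranges):
    """True if the IP falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def looks_spoofed(user_agent, ip, token, cidr_ranges):
    """True if the UA claims to be `token` but the IP is outside the ranges."""
    claims_token = token.lower() in user_agent.lower()
    return claims_token and not ip_in_ranges(ip, cidr_ranges)


ua = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
print(looks_spoofed(ua, "192.0.2.15", "facebookexternalhit", VERIFIED_RANGES))   # False
print(looks_spoofed(ua, "203.0.113.7", "facebookexternalhit", VERIFIED_RANGES))  # True
```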
Max Taxable
11-02-2012, 12:22 AM
From what I've seen over the years facebook's bots have good behavior and only come to see you when something is posted there, from your site. Then they don't crawl around and they SURE don't go anywhere suspicious.
CAG CheechDogg
11-02-2012, 02:52 AM
Yeah, nothing suspicious about facebook's crawlers, scrapers or bots , what ever they are. But it has caused my forums to pop the 500 internal server error a bunch of times , I check around the time those errors happen and are reported to me and its facebook's ips around the times of the 500 errors.
CAG CheechDogg
11-02-2012, 03:48 PM
Well I decided to completely deny Facebook crawlers, scrapers, spiders or bots to crawl my site.
I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list and I haven't gotten one single facebook critter on my site since.
Sucks because I can no longer share anything on facebook from my forums but I just had to do it. Facebook wont reply and doesn't seem to care about eating up bandwidth with their crawlers.
Oh well.
TheSupportForum
11-02-2012, 04:03 PM
Well I decided to completely deny Facebook crawlers, scrapers, spiders or bots to crawl my site.
I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list and I haven't gotten one single facebook critter on my site since.
Sucks because I can no longer share anything on facebook from my forums but I just had to do it. Facebook wont reply and doesn't seem to care about eating up bandwidth with their crawlers.
Oh well.
a wise choice if that's happening to you, as they will eat up your bandwidth
CAG CheechDogg
11-02-2012, 04:16 PM
a wise choice if that's happening to you, as they will eat up your bandwidth
Yeah Simon, I had to do it, I didn't want to because it did bring in traffic but after carefully thinking about it, it's not going to hurt me or the site if I block it.
Oh the funs of owning a website eh? lol..:eek:
TheSupportForum
11-02-2012, 04:27 PM
Yeah Simon, I had to do it, I didn't want to because it did bring in traffic but after carefully thinking about it, it's not going to hurt me or the site if I block it.
Oh the funs of owning a website eh? lol..:eek:
for me i own 2 so i am catching bots to block across 2 domains
i have spam traps on 1 to catch them
CAG CheechDogg
11-02-2012, 04:29 PM
Just a bit of FYI, I was getting hit by Russian IPs and they were also trying to register, I tracked it down to "Deepnet" Explorer which I just blocked as well, just thought I would mention that here.
Simon Lloyd
11-02-2012, 04:57 PM
As for Facebook, if you've gone that route maybe it would be beneficial to set up an rss feed from your site in facebook :)
CAG CheechDogg
11-02-2012, 05:15 PM
I did have the rss feed using "rss graffiti" which will no longer work now that I blocked facebook
Do you know any other way to do this ?
Simon Lloyd
11-02-2012, 05:44 PM
Ah! No, it was Graffiti that i was using; however, you can get Twitter to post to Facebook :)
CAG CheechDogg
11-02-2012, 05:51 PM
Hmmm...ok I will check to see how twitter will crawl my site lol....I did set up an RSS feed a couple minutes ago using Social RSS: http://www.facebook.com/CAGclan/app_23798139265
I will do some searching for twitter to facebook though, thanks for the suggestion!
CAG CheechDogg
11-02-2012, 09:10 PM
Simon, rss graffiti still works with facebook blocked by this mod! muahaha! This is great!
Simon Lloyd
11-02-2012, 09:36 PM
:) Glad you're happy!
CAG CheechDogg
11-02-2012, 09:49 PM
:) Glad you're happy!
Yeah Simon thanks again for a great mod! Now I don't have the facebook critters and my new threads are still getting posted on facebook...muahahha!
CAG CheechDogg
11-04-2012, 03:50 PM
New Spider to add you guys
SeznamBot
Seznam Fulltext Blog
In Omnibus
11-04-2012, 04:16 PM
New Spider to add you guys
SeznamBot
Seznam Fulltext Blog
I've never seen this bot before so obviously it only hangs out at the coolest sites.
Max Taxable
11-04-2012, 05:14 PM
It was on the list posted earlier in the thread. Nasty little bugger.
TheSupportForum
11-04-2012, 05:16 PM
there is a new bot i spotted today
TurnitinBot/2.1
Simon Lloyd
11-04-2012, 05:34 PM
Hey guys, if you come across new bots...etc can you also post them here https://www.vbulletin.com/forum/showthread.php?t=352664 so Mosh can add them to his spider list for vbulletin too :)
TheSupportForum
11-04-2012, 05:39 PM
Hey guys, if you come across new bots...etc can you also post them here https://www.vbulletin.com/forum/showthread.php?t=352664 so Mosh can add them to his spider list for vbulletin too :)
Thanks, just posted mine
CAG CheechDogg
11-04-2012, 06:37 PM
there is a new bot i spotted today
TurnitinBot/2.1
I had that one show up before when I used Kunena forums, I blocked that sucker about a year ago.
Snowhog
11-04-2012, 10:04 PM
Thank you Simon for such a useful MOD. Simple, clean, and effective. Installed and nominated for MOTM.
tambo
11-05-2012, 06:50 PM
Excellent mod. Has already helped reduce our guest list.
Many thanks.
CAG CheechDogg
11-06-2012, 06:51 AM
The Artabus spider is still getting through even though it's on the list, anything else that can help ?
Simon Lloyd
11-06-2012, 07:07 AM
The Artabus spider is still getting through even though it's on the list, anything else that can help ?
I've said this before... go to WGO, click users online (online.php), choose to view user agent from the dropdown at the bottom, and check what Artabus has as its UA. It probably doesn't have artabus in the UA.
CAG CheechDogg
11-06-2012, 07:16 AM
The following is what shows there Simon:
pool-109-191-73-49.is74.ru
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)
What should I use to block it here?
Simon Lloyd
11-06-2012, 07:31 AM
To be safe, block the entire string; this way you won't accidentally block legitimate users :)
CAG CheechDogg
11-06-2012, 07:34 AM
Simon I feel like a fool asking this...but which one would be the entire string to use?
pool-109-191-73-49.is74.ru
or
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)
I don't want to block legitimate users! lol..
Simon Lloyd
11-06-2012, 07:39 AM
the second one, the first is to do with the IP address.
CAG CheechDogg
11-06-2012, 07:47 AM
Thanks Simon! lol....I feel like what my Son calls me at times a "Goober"...hahah...:D
Max Taxable
11-06-2012, 02:01 PM
VERY few legitimate human users are going to be on MSIE 7 or older.
I block ALL MSIE except 8. Amazing how much that alone cut down on bot registration attempts.
CAG CheechDogg
11-06-2012, 11:38 PM
VERY few legitimate human users are going to be on MSIE 7 or older.
I block ALL MSIE except 8. Amazing how much that alone cut down on bot registration attempts.
so what do you use to block all MSIE except for 8 Max, htaccess or some other way?
TheSupportForum
11-06-2012, 11:52 PM
so what do you use to block all MSIE except for 8 Max, htaccess or some other way?
heres an example
Chrome/10.*
Firefox/4.*
so i assume MSIE will work as
MSIE/6.*
MSIE/7.*
and so on
Max Taxable
11-07-2012, 12:33 AM
so what do you use to block all MSIE except for 8 Max, htaccess or some other way?
I just have it in like this:
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
MSIE 7
TheSupportForum
11-07-2012, 12:44 AM
I just have it in like this:
MSIE 1
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
MSIE 7
wont MSIE 1 block MSIE Beta 10 ?
which means MSIE 8, 9 only visitors
Max Taxable
11-07-2012, 01:22 AM
wont MSIE 1 block MSIE Beta 10 ?
which means MSIE 8, 9 only visitors
Nope!
CAG CheechDogg
11-07-2012, 02:28 AM
This is using your mod here right?
Max Taxable
11-07-2012, 03:42 AM
This Mod, yeah. It ain't my mod tho.
Simon Lloyd
11-07-2012, 07:47 AM
wont MSIE 1 block MSIE Beta 10 ?
which means MSIE 8, 9 only visitors
Take a quick look at this post https://vborg.vbsupport.ru/showpost.php?p=2377564&postcount=381 should help explain how the system works better :)
CAG CheechDogg
11-07-2012, 10:33 AM
Take a quick look at this post https://vborg.vbsupport.ru/showpost.php?p=2377564&postcount=381 should help explain how the system works better :)
Yep, makes way more sense now. How easy it is for "us" to overlook a single post that explains it all. Sorry, I am guilty of doing so...
Thanks for tolerating us Lloyd, we appreciate it very much!
Simon Lloyd
11-07-2012, 11:50 AM
It's not toleration, i just love helping people :)
vb50kgpoo
11-12-2012, 10:17 AM
Hi Simon
Yours is a great product. I made the mistake of uninstalling it in order to use AbyssGuard, which is plagued with problems. I have now reinstalled Ban Spiders By User Agent. One question: are there any ramifications in banning \wbot[\/\-] with your mod? I ask as putting \wbot[\/\-] directly into my htaccess banning mechanism causes issues.
Regards / RSVP
vb50kgpoo
11-12-2012, 11:26 AM
Also.......
Does anyone know what these bots are;
Robot ID - Hits - Bandwidth - Last visit - Hits on robots.txt
robot 772 8576221 20121111093454 0
crawl 699 9556953 20121108085243 0
spider 5 114750 20121106065956 0
Bad bots using generic names?
ForceHSS
11-12-2012, 12:04 PM
What is their full host name?
vb50kgpoo
11-12-2012, 01:28 PM
What is their full host name?
That is it !
They go by those generic names only, nothing else.
Simon Lloyd
11-12-2012, 04:31 PM
Hi Simon
Yours is a great product. I made the mistake of uninstalling it in order to use AbyssGuard, which is plagued with problems. I have now reinstalled Ban Spiders By User Agent. One question: are there any ramifications in banning \wbot[\/\-] with your mod? I ask as putting \wbot[\/\-] directly into my htaccess banning mechanism causes issues.
Regards / RSVP
In my mod you are banning any useragent that has any occurrence of one of the strings in your list. I very much doubt that \wbot[\/\-] is found in any useragent, as that looks like a regular expression; in my mod simply wbot will do if that's in their useragent.
Also.......
Does anyone know what these bots are;
Robot ID - Hits - Bandwidth - Last visit - Hits on robots.txt
robot 772 8576221 20121111093454 0
crawl 699 9556953 20121108085243 0
spider 5 114750 20121106065956 0
Bad bots using generic names?
That's just stats from cPanel AWStats and means nothing other than that spiders were identified with those in their identifier.
That is it !
They go by those generic names only, nothing else.
They are not their useragents. Read some of the links that i took the time and trouble to post in the mod description on how to find the useragent.
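To make the difference between a ban-list string and a regular expression concrete, here is a small Python comparison. It assumes the mod does a plain case-insensitive substring search, as described in this thread; the two helper names are made up for the example.

```python
import re

entry = r"\wbot[\/\-]"  # the expression from the question above


def matches_as_literal(entry, user_agent):
    """Treat the entry as plain text: it must appear character for character."""
    return entry.lower() in user_agent.lower()


def matches_as_regex(entry, user_agent):
    """Treat the entry as a regular expression (what it was written as)."""
    return re.search(entry, user_agent, re.IGNORECASE) is not None


user_agents = [
    "Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)",
]

for ua in user_agents:
    print(matches_as_literal(entry, ua), matches_as_regex(entry, ua), ua)
```

As plain text the entry matches none of these, because no user agent contains a backslash followed by "wbot". Note also that in a regular expression \w means "any letter, digit or underscore", not the letter w, so the nearest plain-text equivalents would be entries such as bot/ and bot- (slightly broader than the expression, and broad enough to catch Googlebot/ as well).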
bzcomputers
11-20-2012, 03:09 PM
wont MSIE 1 block MSIE Beta 10 ?
which means MSIE 8, 9 only visitors
It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems.
https://vborg.vbsupport.ru/external/2012/11/15.jpg
Edit: If you were wondering what the IP Address 108.2.106.107 is, it is Verizon's Search Engine (www.verizon.net) definitely not something you would want to block.
Max Taxable
11-20-2012, 03:35 PM
It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems
Using the time-based Mod in my signature, I often see user agents with early IE, such as 3, 4 and 5, but you're right - I haven't seen any IE 1 or 2 in so long, there's no doubt it would be good to remove MSIE 1 from the ban list.
Thanks for the information!
Simon Lloyd
11-20-2012, 05:57 PM
It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems.
https://vborg.vbsupport.ru/external/2012/11/15.jpg
Edit: If you were wondering what the IP Address 108.2.106.107 is, it is Verizon's Search Engine (www.verizon.net) definitely not something you would want to block.
It will block MSIE Beta 10 if it appears like that in the useragent; this mod will block ALL instances where the exact string in your list is found in the UA.
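The MSIE 1 versus MSIE 10 overlap is easy to check by hand. The Python sketch below assumes the plain case-insensitive substring matching described in this thread; matched_entries is a made-up helper and the user agents are sample strings in the usual Internet Explorer format.

```python
def matched_entries(user_agent, ban_list):
    """Return the ban-list entries found anywhere in the user agent."""
    ua = user_agent.lower()
    return [entry for entry in ban_list if entry.lower() in ua]


ie10 = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
ie8 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
ie5 = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)"

print(matched_entries(ie10, ["MSIE 1", "MSIE 5"]))   # ['MSIE 1']
print(matched_entries(ie10, ["MSIE 1.", "MSIE 5"]))  # []
print(matched_entries(ie8, ["MSIE 1", "MSIE 5"]))    # []
print(matched_entries(ie5, ["MSIE 1", "MSIE 5"]))    # ['MSIE 5']
```

If you did want to keep an entry for IE 1.x, adding the trailing dot (MSIE 1.) avoids catching MSIE 10.0 under this matching rule.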
My Hattiesburg
12-15-2012, 09:45 PM
Okay, I've installed this and it seems to be working fine, but I have some questions.
At first the Baidu spider was hammering us, hitting the site about every 8 seconds, but now it seems to have more or less given up on us. Yandex, on the other hand, seems to have intensified its attempted crawls. At first it was hitting the site about every 25 seconds, but over the course of this install it has ranged everywhere from every 1 second to where it's at now, about every 1.5 minutes. The 1.5 minute attempts have only occurred in the last couple of days.
A couple of days ago when Yandex was hitting us every 1 second, we had some server load issues. I don't know if this is related but it seems it might be and I'm wondering if logging the blocks to a text file might be counterproductive in that area, writing to the file every second or so.
Also, Yandex is using the same IP address every time, so I thought it might be best to just block it using the .htaccess file, but that doesn't seem to have had any effect. Is this mod redirecting Yandex before it has a chance to read the .htaccess file or is Yandex simply ignoring it?
smirkley
12-15-2012, 10:24 PM
Your server would be the one to decide if Yandex is blocked by the .htaccess file or not.
If the IP is denied in htaccess, your server will block it before any vB module loads, or any other page for that matter.
Simon Lloyd
12-16-2012, 05:29 AM
Smirkley is right, .htaccess is loaded before anything else. As for my mod, using Yandex as a blocking string will block it. If it is still showing, that's because either they don't actually have yandex in the user agent or you've entered more than just Yandex as the string. Remember, my mod bans anything that contains exactly the string you've entered (including spaces), so if your strings in my mod look like this:
Baidu
Yandex
SoSo
.....etc
then yandex will be blocked, however if it looks like this:
Baidu
Yandex123
SoSo
....etc
then any bot with just yandex in their string or any other kind of yandex like YandexWorld will not be blocked, but anything containing yandex123 will be blocked.
As for writing to a file, just turn that bit off, it's there simply for test purposes, trouble shooting or checking on individual user agents.
My Hattiesburg
12-16-2012, 08:20 PM
Your mod is blocking Yandex, but I just figured since it was using the same IP address every time it might be better to just go ahead and block that IP.
I guess I didn't do something right in the .htaccess file because Yandex was getting past it and was getting blocked by the mod.
Simon Lloyd
12-17-2012, 03:52 PM
yandex doesn't always use the same IP, somewhere earlier in the pages for this mod i think i posted how to do it in .htaccess if you wish.
smstoolbox
12-19-2012, 05:10 PM
Hi Simon,
I have recently installed your mod and, whilst testing options, managed to block my home PC's agent from accessing my site (I'm new to this game, sorry)! My question is: after removing my agent details I still appear to be blocked from my home PC. How can I resolve this? Any advice would be appreciated.
Alibass
12-19-2012, 05:19 PM
Try clearing your browser cache and also cache from admincp/Maintenance
Simon Lloyd
12-19-2012, 05:40 PM
It will be your browser cache. You can also try (on your PC) Start > Run > ipconfig /flushdns (the space after ipconfig is intentional)
Max Taxable
12-19-2012, 05:47 PM
I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.
It's one of the best Mods there is for vBulletin but, USE WITH CAUTION.
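A cheap way to act on that caution is to try a candidate string against a handful of user agents you know you want to keep before adding it. This Python sketch assumes the substring matching described earlier in the thread; collateral_matches is a hypothetical helper and the known-good list is only a small sample.

```python
def collateral_matches(candidate, known_good_agents):
    """List the known-good user agents a candidate ban string would also block."""
    needle = candidate.strip().lower()
    if not needle:
        return []
    return [ua for ua in known_good_agents if needle in ua.lower()]


# A small sample of agents you would not want to ban (extend with your own).
known_good = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

for candidate in ["Mozilla", "Windows NT", "AhrefsBot"]:
    hits = collateral_matches(candidate, known_good)
    print(f"{candidate!r}: would also block {len(hits)} of {len(known_good)} known-good agents")
```

Anything that reports more than zero hits is too generic to use as a ban string.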
Simon Lloyd
12-19-2012, 08:05 PM
I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.
It's one of the best Mods there is for vBulletin but, USE WITH CAUTION.
Thanks for that. This mod was only ever built to stop bots eating up your bandwidth, which is why i recommend using actual bot names found in the useragents :)
smstoolbox
12-20-2012, 09:44 AM
It will be your browser cache. You can also try (on your PC) Start > Run > ipconfig /flushdns (the space after ipconfig is intentional)
:) Thanks Simon I will try this once I get back from work
smstoolbox
12-20-2012, 09:44 AM
Try clearing your browser cache and also cache from admincp/Maintenance
Thanks for the advice!!
smstoolbox
12-21-2012, 12:14 PM
:) Thanks Simon I will try this once I get back from work
:up: Sorted thanks again!!
smstoolbox
12-21-2012, 12:15 PM
I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.
It's one of the best Mods there is for vBulletin but, USE WITH CAUTION.
:up: Thanks for the heads up on this!!
WorldCraft
12-29-2012, 08:59 PM
Fantastic little mod. Works great! :up:
well done, mark installed
Simon Lloyd
01-24-2013, 02:42 PM
Glad it's helped you :)
Simon Lloyd
02-02-2013, 04:04 AM
I'm looking for feedback guys!
Would it be beneficial to automatically ban bots that exceed x instances online at any one time?
So, the likes of Baiduspider send around 200 at any one time. If i entered, say, 150 (in place of x) in a settings box, they would automatically get added to the ban list. Let me know your views, as i'm not going to work on something nobody feels is needed :)
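To make the proposal easier to discuss, here is the idea as a rough Python sketch. It is not code from the mod: agents_over_threshold is a made-up name, and it assumes you already have one bot name per session currently online (how those names are extracted from the sessions is left out).

```python
from collections import Counter


def agents_over_threshold(session_agents, threshold):
    """Return the names that have more than `threshold` sessions online at once."""
    counts = Counter(session_agents)
    return sorted(name for name, n in counts.items() if n > threshold)


# Made-up snapshot of who is online: one entry per active session.
online_now = ["Baiduspider"] * 200 + ["Googlebot"] * 12 + ["Yandex"] * 30

print(agents_over_threshold(online_now, 150))  # ['Baiduspider']
```

With the threshold at 150, only the name with 200 simultaneous sessions would be put forward for the ban list.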
bzcomputers
02-02-2013, 06:16 AM
I'm looking for feedback guys!
Would it be beneficial to automatically ban bots that exceed x number of bots at any one time?
So, the likes of Baiduspider send around 200 at any one time, so if i entered say 150 (in place of x) in a settings box then they would automatically get added to the ban list, let me know your views as i'm not going to work on something nobody feels is needed :)
It's not a bad idea but is probably not needed. I think most any bot that would "exceed a certain number" would probably be a bot we are already blocking by name with this. I guess it would be nice to have a second log of the bots that are coming through if that is possible, then we could tell if it was necessary.
One thing I wouldn't mind seeing is options to choose both filename and directory for the bot output file(s). An option to be able to show the most recent bots at the top of the file (reverse of how it saves now) is something I would like too, not sure what everyone else thinks.
Alibass
02-02-2013, 12:10 PM
I'm looking for feedback guys!
Would it be beneficial to automatically ban bots that exceed x number of bots at any one time?
So, the likes of Baiduspider send around 200 at any one time, so if i entered say 150 (in place of x) in a settings box then they would automatically get added to the ban list, let me know your views as i'm not going to work on something nobody feels is needed :)
I like this idea and would most definitely like to see this feature added. :)
S_E_A
02-08-2013, 01:32 PM
Hi,
I would like to ban Amazon AWS EC2. I have tried AmazonAWS and Amazon AWS EC2. Any suggestions please?
Thank you.
Simon Lloyd
02-08-2013, 01:55 PM
Check out the links i've given in the mod description above entitled : How do i ban a bot?
it should explain how to find out their exact user agent :)
Hi,
I would like to ban Amazon AWS EC2. I have tried AmazonAWS and Amazon AWS EC2. Any suggestions please?
Thank you.
That's a hosting service. Why are they spidering your site? Are you sure that's correct?
S_E_A
02-08-2013, 03:25 PM
Based on research a number of people recommend blocking AmazonAWS. What do people on here recommend?
Simon Lloyd
02-08-2013, 03:59 PM
That's a hosting service. Why are they spidering your site? Are you sure that's correct?
Based on research a number of people recommend blocking AmazonAWS. What do people on here recommend?
I suspect accounts held on some of their servers are of no use to your forum and are scraping content or emails..etc
Banning bots, as i've always said is a personal thing :)
I suspect accounts held on some of their servers are of no use to your forum and are scraping content or emails..etc
Banning bots, as i've always said is a personal thing :)
Okay, but if I recall correctly this only bans by user agent, not IP block and therefore would be ineffective to ban 'AWS'.
Simon Lloyd
02-08-2013, 05:16 PM
Why would it be ineffective banning them? every device that accesses the internet...etc has a UserAgent, you just need to find the useragent and i show you how to do that in the links in the mod description.
Read this: http://www.webmasterworld.com/search_engine_spiders/4368965.htm
If you really want to ban ip's then https://vborg.vbsupport.ru/showthread.php?t=268146
Why would it be ineffective banning them? every device that accesses the internet...etc has a UserAgent, you just need to find the useragent and i show you how to do that in the links in the mod description.
Read this: http://www.webmasterworld.com/search_engine_spiders/4368965.htm
If you really want to ban ip's then https://vborg.vbsupport.ru/showthread.php?t=268146
Because Amazon runs a cloud hosting service, anyone can own an AWS server. Hell, I have one. There is no ONE service and user agent on AWS, so it's not possible to ban all AWS servers by user agent.
Simon Lloyd
02-08-2013, 05:25 PM
But not ALL AWS users are bad, are you? :), agreed you cannot ban a server but every bot, spider, person or device that comes your way will have a UA that you can ban.
But not ALL AWS users are bad, are you? :), agreed you cannot ban a server but every bot, spider, person or device that comes your way will have a UA that you can ban.
The request was to ban AWS servers by user agent. That's not possible.
(And technically you don't even have to send a user agent.)
Simon Lloyd
02-08-2013, 05:37 PM
The request wasn't specifically to ban the servers by UA :), if you send a malformed or blank UA then you can ban those too ;)
As a side note i noticed that you haven't downloaded the latest version of this mod or marked it installed, have you uninstalled it, if so could i ask why? just helps me develop more robust things in the future.
The request wasn't specifically to ban the servers by UA :), if you send a malformed or blank UA then you can ban those too ;)
As a side note i noticed that you haven't downloaded the latest version of this mod or marked it installed, have you uninstalled it, if so could i ask why? just helps me develop more robust things in the future.
I started with this mod. However, my server at the time was so resource starved that I needed to block the spiders before it got to PHP/MYSQL. Nothing wrong with it. It worked well. I just couldn't afford the resources.
Max Taxable
02-10-2013, 03:09 AM
The request was to ban AWS servers by user agent. That's not possible.
(And technically you don't even have to send a user agent.)
Yes it is. Enter it exactly like it appears in the user agent string.
"amazonaws"
Yes it is. Enter it exactly like it appears in the user agent string.
"amazonaws"
Yes, but no one is using that UA. Amazon has no reason to crawl *any* site.
Max Taxable
02-11-2013, 04:56 PM
Yes, but no one is using that UA. Amazon has no reason to crawl *any* site.
Amazon AWS is their hosting they sell. And yes they also crawl the web: http://aws.amazon.com/search-engines/
I have it blocked as well, using this Mod.
Here is how I decide what UAs I block:
1.) Is it beneficial for my site to have it crawling?
2.) Does it behave nicely? Does it obey robots.txt?
3.) If in any way suspicious, it goes in this Mod.
Like the developer says, it's all about personal choice.
Amazon AWS is their hosting they sell. And yes they also crawl the web: http://aws.amazon.com/search-engines/
I have it blocked as well, using this Mod.
Did you read that? There is nowhere in that link that says that Amazon themselves crawl websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?
I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. It's not required that a crawler send you a UA at all; instead, it's just considered polite. If someone wanted to, they could send a completely random UA every time or not send one at all.
Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".
Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.
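A minimal sketch of the IP-block approach, in Python and purely for illustration. The two networks below are reserved documentation ranges used as placeholders, not real AWS ranges, and the function name is hypothetical; a real list would have to come from the provider's published address ranges.

```python
import ipaddress

# Placeholder networks (reserved for documentation), NOT actual AWS blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_blocked_ip(remote_addr):
    """Return True if the visitor's address falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)


inside = is_blocked_ip("203.0.113.7")   # address within the first placeholder block
outside = is_blocked_ip("192.0.2.1")    # address in neither block
```

Unlike a user agent, the source address is not something the visitor can simply type in, which is why this kind of check works regardless of what UA string is sent.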
Simon Lloyd
02-11-2013, 07:58 PM
For reference here's what a user agent is and some extra info http://en.wikipedia.org/wiki/User_agent. All this mod is designed to do is stop bots from eating up your bandwidth by redirecting them before any content loads. To be honest you can never stop anyone who is intent on scraping your site from doing so.
Max Taxable
02-11-2013, 11:00 PM
Did you read that? There is nowhere in that link that says that Amazon themselves crawl websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?
I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. It's not required that a crawler send you a UA at all; instead, it's just considered polite. If someone wanted to, they could send a completely random UA every time or not send one at all.
Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".
Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.
The "amazonaws" crawlers have that designation in their UA string. Anything else coming from Amazon has it in its host description.
The rest of your missive, I am well aware of.
Inspector G
03-03-2013, 01:39 AM
I have a confusing question...
Ok I have a very small member site...like 24 members...
So when I noticed I had 35 users online most of the time and I started seeing more and more baidu spiders
I decided to do something about it...
I installed this mod.
almost instantly ...well within say 3 hours my users online soared to well over 150 on busy times like Now...tonight.
I had
Most users ever online was 247, 1 Day Ago at 12:58 AM.
With only one new account created, and maybe me or one other registered user online...
My question is this. what happened when I installed this mod to make such a drastic change in the users on my site and why?
I do not understand this and I read that the server load increases...
I find it hard to believe that anyone is finding my site via a search engine since it is a brand new .cc name and it has only been online for two months now...
Is there something about pushing away Baidu that enables more sites to come, or Spam bots?
attempting to register and what not, many are in areas that there would not be a normal user.
I see many attempts at registering and yet no more new users... so I believe those are bots locking...
Please advise...
Simon Lloyd
03-03-2013, 03:42 AM
What's happening (and you'll probably find this) is that because Baidu can't get in with the spiders/IPs they were using, they are now trying a rotation of other IPs and bots. I use this mod myself, although I don't ban the bots, as I monitor their visits to further enhance any mod I make against them. I currently have 236 Baidu bots (and 140 other bots/search engines) at my site.
With the mod in place and redirection working, you'll find that the bots you have banned will slowly drop off as they all get the message of the 301 permanent redirect to wherever you've decided to send them. Your server load will lessen and things will be more normal :)
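The redirect behaviour described above can be pictured with a small Python sketch. It is not the mod's code: the function name, the substring matching, and the example.com target are assumptions for illustration only.

```python
def respond_to(user_agent, banned_patterns, redirect_url):
    """Return (status, headers): banned agents get a 301, everyone else a 200.

    Matching is assumed to be a case-insensitive substring test.
    """
    ua = (user_agent or "").lower()
    if any(pattern.lower() in ua for pattern in banned_patterns):
        # 301 Moved Permanently: the bot is told to go to redirect_url instead,
        # so no forum content is rendered for it.
        return 301, {"Location": redirect_url}
    return 200, {}


banned = ["Baidu"]
target = "http://www.example.com/"  # hypothetical redirect destination

bot_status, bot_headers = respond_to(
    "Mozilla/5.0 (compatible; Baiduspider/2.0)", banned, target
)
human_status, human_headers = respond_to(
    "Mozilla/5.0 (Windows NT 6.1) Firefox/19.0", banned, target
)
```

The point of answering early with a redirect is that the banned request never reaches the expensive part of the page load.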
Simon Lloyd
03-03-2013, 03:44 AM
Also, do you have your robots.txt set up correctly to stop the search engines or bots that obey robots.txt from indexing pages on your site that they shouldn't, like register.php, members.php, etc.?
Inspector G
03-03-2013, 03:59 AM
I did not understand how to do the text part since I am what I even call very green in this aspect of vBulletin...
so I just installed the mod...
I can wait and see if it drops off and report back...
Thanks for the help in understanding...
Simon Lloyd
03-03-2013, 05:34 AM
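OK, what you need to do is upload the attached robots.txt. If your forum is at www.mysite.com/forums, you can use the file as it is. If your forum is at the top level (www.mysite.com/), edit the attached file to remove /forums from each path first. Either way, robots.txt itself needs to sit in the root of the site (www.mysite.com/robots.txt), as that is the only place crawlers look for it.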
You can add any page or file to robots.txt that you wish, just follow the same structure :)
Inspector G
03-03-2013, 05:56 AM
Well thanks Simon...
Thats really nice...
I will do so immediately.
Nice to see someone really help out the Noob...lol
Thanks again I appreciate this very much...
I will report back.
Inspector G
03-03-2013, 05:59 AM
So I think what you are telling me is this...
Since my site forum is at root level to edit as follows...
This...Disallow: /forums/albums.php
to This...Disallow: /albums.php
Simon Lloyd
03-03-2013, 07:40 AM
Yes, if your forum isn't in a folder but simply "on your server", so you don't need to access a folder to get to it, then that's correct!
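For anyone who would rather not edit each line by hand, the same change can be sketched in Python. The sample robots.txt text below is made up for the example (the attached file is not reproduced in this thread), and the helper name is hypothetical. The standard library's urllib.robotparser is used only to check the result.

```python
import urllib.robotparser


def strip_prefix(robots_text, prefix="/forums"):
    """Remove a leading folder prefix from every Disallow path.

    Only paths that start with prefix + "/" are changed; all other lines
    are passed through untouched.
    """
    out = []
    for line in robots_text.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        if sep and field.strip().lower() == "disallow" and path.startswith(prefix + "/"):
            line = f"Disallow: {path[len(prefix):]}"
        out.append(line)
    return "\n".join(out)


# Made-up sample in the same style as the lines quoted above.
original = (
    "User-agent: *\n"
    "Disallow: /forums/albums.php\n"
    "Disallow: /forums/register.php"
)
rewritten = strip_prefix(original)

# Check how a well-behaved crawler would read the rewritten rules.
parser = urllib.robotparser.RobotFileParser()
parser.parse(rewritten.splitlines())
albums_allowed = parser.can_fetch("*", "/albums.php")
index_allowed = parser.can_fetch("*", "/index.php")
```

After the rewrite, /albums.php is disallowed for every obedient crawler while pages that are not listed, such as /index.php, remain open.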
dog-tag
03-31-2013, 04:37 PM
After being only installed 10 minutes, I've seen a 20% drop in server load already. I was already blocking them with .htaccess but they were still getting in. According to AWstats bots have been hitting my server MILLIONS of times per month.
Thank you very much from the bottom of my heart, you're very talented!
Simon Lloyd
03-31-2013, 05:44 PM
You're welcome, don't forget to remove them from .htaccess now as they will be adding load just being there :)
datoneer
04-01-2013, 08:21 PM
Thank you good mod
Simon Lloyd
04-01-2013, 08:54 PM
Glad you like it :)
bzcomputers
05-01-2013, 09:29 PM
Been running this for a little over 8 months now.
This past month it blocked 6,659 bad bots, which is very close to what it blocked in the first month I had it installed.
Baidu finally stopped coming after about 4 months. They were originally hitting the site at over 10 times an hour. Yandex is still coming but they are down to once or twice a day instead of multiple times an hour.
Most Popular blocked User Agents currently:
FunWebProducts, MSIE 6, MSIE 7, Nutch, Yandex
My Full Blocked User Agent list:
almaden
Anarchie
Artabus
ASPSeek
attach
autoemailspider
BackWeb
Baidu
Bandit
BatchFTP
BlackWidow
BoardReader
Bot\mailto:craftbot@yahoo.com
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
CoolWebSearch
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
FunWebProducts
GetRight
GetSmart
GetWeb!
gigabaz
GNIP
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Ichiro
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
Mag-Net
Magnet
Magpie
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
MS\FrontPage*
MSFrontPage
MSIECrawler
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
MSIE 7
MSProxy
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
omgili
Openfind
Opera/1
Opera/2
Opera/3
Opera/4
Opera/5
Opera/6
Opera/7
Opera/8
PageGrabber
Papa\Foto
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
speedy
Spinn3r
sproose
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
tAkeOut
Teleport\Pro
URLSpiderPro
Vacuum
VoidEYE
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus
This new one just showed up and has been attempting to ping my site on average around a hundred times a day (started about 15 days ago):
05-01-2013 16:20:25 .
Matched bots[135]: . Ping .
With User Agent: . A6-INDEXER/1.0 (HTTP://WWW.A6CORP.COM/A6-WEB-SCRAPING-POLICY/) .
Seems some bots come and go, just glad this mod is here!
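The log entry above is consistent with case-insensitive substring matching: the entry "Ping" appears inside the word "SCRAPING" in the A6-INDEXER user agent. The mod's real matching logic is not shown in this thread, so the Python sketch below simply assumes that rule in order to show how short entries can catch more than intended. The function name is hypothetical.

```python
def first_match(user_agent, patterns):
    """Return the first pattern found in the user agent, or None.

    Assumed rule: plain case-insensitive substring test, no regex.
    """
    ua = user_agent.lower()
    for pattern in patterns:
        if pattern.lower() in ua:
            return pattern
    return None


# User agent copied from the log entry above.
logged_ua = "A6-INDEXER/1.0 (HTTP://WWW.A6CORP.COM/A6-WEB-SCRAPING-POLICY/)"
# A typical Internet Explorer 7 user agent, i.e. a real person's browser.
ie7_ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

hit_for_indexer = first_match(logged_ua, ["Baidu", "Ping", "Yandex"])
hit_for_ie7 = first_match(ie7_ua, ["Baidu", "MSIE 7", "Yandex"])
```

Here "Ping" catches the indexer only because "SCRAPING" happens to contain it, and "MSIE 7" catches an ordinary IE7 visitor, so short or generic entries are worth testing against real user agent strings before they go into the list.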
Simon Lloyd
05-01-2013, 10:29 PM
I'm very glad you've found this useful. Thanks for posting your updated bot list, it may help others decide which to block. However, I still have to mention that banning bots is a personal thing: you have to decide what it is you want to achieve from the banning, and whether anything you block will prevent legitimate people from viewing your site.
In the above you block MSIE 7. Whilst this may be good for you, others may want users who still only have IE7 to be able to view their site. All I'm saying to people is think before you block :)
bzcomputers
05-01-2013, 11:12 PM
What is your take on "MSIE 6"? I seem to also be getting quite a few hits from that browser as well.
Simon Lloyd
05-01-2013, 11:33 PM
Personally, unless you're catering for developing countries (computer-wise I mean, like the Eastern Bloc...etc) I'd ban MSIE 6, but again I have to stress it's a personal choice.