vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 4.x Add-ons (https://vborg.vbsupport.ru/forumdisplay.php?f=245)
-   -   Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

Toxic2 11-09-2013 07:55 PM

Simon, can you come up with a viable list of bad UAs more appropriate for 2013? MSIE1 blocked one of my members, so I deleted MSIE1 from the list.

Simon Lloyd 11-09-2013 08:25 PM

Hi Toxic2, banning UAs is a personal thing; you have to choose which to ban. Throughout this thread (and the threads for the other vB versions of this mod) there are lists, and there's a maintained list of spiders at vb.com in a thread by Mosh (a download from his site), so pretty much all of the crawlers will be listed there. You just choose which you want to ban by checking their UA when they visit your site.

Max Taxable 11-09-2013 09:55 PM

This mod's default list is a pretty doggone good list, but like Simon says, it might not meet your needs. It depends on your target audience countries, your preferences, all that.

bzcomputers 11-09-2013 10:09 PM

Quote:

Originally Posted by Toxic2 (Post 2459521)
Simon, can you come up with a viable list of bad UAs more appropriate for 2013? MSIE1 blocked one of my members, so I deleted MSIE1 from the list.

MSIE1 is about the only entry I would definitely recommend removing from the old list for this mod. MSIE1 will block users on IE10 and IE11 (which means A LOT of users, and growing daily). I haven't seen anything prior to IE5 try to access my sites in a long time, so removing MSIE1 should not negatively impact your site in any way either (but blocking IE10 & 11 will!).

As was mentioned above, what you block is a personal choice. There are thousands of sites out there that don't block any UAs and are doing just fine. You need to figure out what's important for your site: being hit hundreds of times a day by bots from regions of the world you probably don't even cater to, like China or countries of the former Soviet Union, or blocking those bots and in turn reducing server load and load times for the users you do want to cater to.

For instance, if you block IE6 (MSIE6) you will currently block over 20% of Chinese internet users (the average everywhere else in the world is less than 1% using IE6). When you consider China has 1.3 billion people, that is a lot of potential website traffic. At the same time, China also accounted for nearly 40% of the world's computer hacking attack traffic during 2012. If you have content that is viable for Chinese users and can possibly turn into profit (sales / ad clicks) for you, then blocking MSIE6 and Chinese-specific bots like baiduspider, iaskspider, sogou spider, Sosospider, YoudaoBot and others is probably not recommended.

There are plenty of good resources on useragents in this thread and elsewhere on the net. Just remember that copying someone else's useragent blocking list directly, without taking a little time to tweak it for your own needs, is almost guaranteed to negatively impact your site traffic.

Here are a couple of additional resources:
http://www.useragentstring.com/pages/Crawlerlist/
http://www.user-agents.org/

Max Taxable 11-09-2013 10:18 PM

Quote:

Originally Posted by bzcomputers (Post 2459539)
MSIE1 is about the only entry I would definitely recommend removing from the old list for this mod. MSIE1 will block users on IE10 and IE11 (which means A LOT of users, and growing daily).

I definitely agree with this.

Simon Lloyd 11-10-2013 07:29 PM

If you really want to block MSIE1, then enter it exactly like that plus a leading space. If I remember rightly, I've allowed the list to accept spaces rather than ignore them, so doing it like that will not (or shouldn't) ban MSIE10.

Alan_SP 11-10-2013 10:04 PM

I had a similar problem; it might be solved now.

But anyway, just to ask it, what other than blocking MSIE 1 could block legitimate users in this list? If anyone knows:

Quote:

AhrefsBot
www.archive.org
gooblogsearch
seoprofiler.com
WinHttp
exabot
commoncrawl
80legs
TITAN
Charlotte
http://labs.topsy.com/butterfly/
MSRBOT
Indy Library
Plukkie
Goo Blog Search
SeznamBot
T312461
PostRank
Tweetmeme.com
Twitterbot
WordPress
Windows Live SOEE
MLBot
PycURL
ScoutJet
Ezooms
Deepnet Explorer
Mail.Ru
Majestics MJ12bot
almaden
Anarchie
Artabus
ASPSeek
attach
autoemailspider
BackWeb
Baidu
Bandit
BatchFTP
BlackWidow
Bot\mailto:craftbot@yahoo.com
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
GetRight
GetSmart
GetWeb!
gigabaz
GNIP
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Ichiro
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
MSIE 2
MSIE 3
MSIE 4
MSIE 5
Mag-Net
Magnet
Magpie
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
MS\FrontPage*
MSFrontPage
MSIECrawler
MSProxy
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
omgili
Openfind
PageGrabber
Papa\Foto
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
speedy
Spinn3r
sproose
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
tAkeOut
Teleport\Pro
URLSpiderPro
Vacuum
VoidEYE
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus

Max Taxable 11-10-2013 10:27 PM

Quote:

Originally Posted by Alan_SP (Post 2459821)
what other than blocking MSIE 1 could block legitimate users in this list? If anyone knows:

Blocking IE 1-6 won't block any legitimate humans; it will block 90% of the botnet zombie computers out there.

Alan_SP 11-11-2013 11:40 PM

Actually, if you use the string "MSIE 1" it blocks users of MSIE 10 and 11, as the product looks for the string anywhere, not as a whole word. And, if I'm not mistaken, to block only MSIE 1, you should use the string "MSIE 1.0".

I had many regular users who couldn't reach my site, probably all of them use MSIE 10 or 11.
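
To illustrate the substring behaviour described above, here is a minimal Python sketch. It is not the mod's code (the mod is a vBulletin product and its source is not shown in this thread); `is_banned` is a hypothetical helper, the case-insensitive comparison is an assumption, and the user agent is a typical IE10 string used purely as sample data.

```python
# Hypothetical helper: mimics "look for the entry anywhere in the user agent".
# Case-insensitive matching is an assumption, not confirmed by the thread.
def is_banned(user_agent: str, ban_list: list[str]) -> bool:
    """Return True if any ban-list entry occurs anywhere in the user agent."""
    ua = user_agent.lower()
    return any(entry.lower() in ua for entry in ban_list)


# A typical IE10 user agent string, used here only as sample data.
IE10_UA = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"

# "MSIE 1" is a prefix of "MSIE 10.0", so a plain substring test matches it.
print(is_banned(IE10_UA, ["MSIE 1"]))

# "MSIE 1.0" requires a dot right after the 1, which "MSIE 10.0" does not have.
print(is_banned(IE10_UA, ["MSIE 1.0"]))
```

Under these assumptions the first check reports a ban and the second does not, which is why the more specific entry avoids catching IE10 visitors.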

Max Taxable 11-12-2013 12:11 AM

Quote:

Originally Posted by Alan_SP (Post 2460024)
Actually, if you use the string "MSIE 1" it blocks users of MSIE 10 and 11, as the product looks for the string anywhere, not as a whole word. And, if I'm not mistaken, to block only MSIE 1, you should use the string "MSIE 1.0".

I had many regular users who couldn't reach my site, probably all of them use MSIE 10 or 11.

The developer covered that here.

But I agree with you, there's really no reason to have IE1 on the ban list; there very likely isn't a functioning, online computer anywhere that runs it.

IE6 is the biggest botnet zombie browser though, and there are MILLIONS of them still online.

qpurser 12-10-2013 03:29 PM

I added "Majestics MJ12bot Spider" to the list some time ago, but for some reason it keeps coming back and I can see it is viewing threads.
I added entries for both "majestics" and "Majestics MJ12bot Spider".

When they are on the list, are they not banned and redirected automatically?

Avros 12-10-2013 06:31 PM

Not all Bots will adhere to or obey the text file. If anything it will provide them with more information than you want them to have.

Only a minority of spiders adhere to those rules.

Max Taxable 12-10-2013 06:55 PM

Quote:

Originally Posted by Avros (Post 2467411)
Not all Bots will adhere to or obey the text file. If anything it will provide them with more information than you want them to have.

Only a minority of spiders adhere to those rules.

They can ignore "robots.txt", but they can't ignore this. This isn't "robots.txt"; it has teeth.

Here's my current blacklist:
SemrushBot
SeznamBot
http_requester
BIXOCRAWLER
KomodiaBot
QACC
Plukkie
botje
Opera/9.80
'Mozilla
CompSpyBot
ScreenerBot
Chrome/15
python-requests
ZumBot
Ruby
Add Catalog
Genieo
socialbm_bot
Ezooms
omgilibot
Go http package
JikeSpider
Python-urllib
Iron
sputnik
Xenu
Wotbox
200PleaseBot
360Spider
Indy Library
Sogou
SEOstats
baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
Nesotebot
DCPbot
AOL Advertising R&D
DataCha0s
aiHitBot
Apache-HttpClient
Zend_Http_Client
ReverseGet
XXX bot Content
vBSEO
spbot
OffByOne
thyroidbuzz
AcoonBot
coccoc
xpymep
proxyproxy2884
AppEngine
start.exe
Semiocast HTTP client
Firefox/3.6.23
Firefox/3.6.3
TurnitinBot
curl
SwpLc/1.6
GrepNetstat.com
news bot
AskTbPTV
checks
panopta
App3le
PhantomJS
AlwaysOnline
SISTRIX
proximic
CRAWL-E/0.6.4
WebMoney
HTMLParser
oBot
UnisterBot
ERACrawler
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
crawler4j
NCSA_Mosaic
Rippers
80legs
Firefox/3.5.6
YaBrowser
majestic
EasouSpider
User-Agent
FunWebProducts

I am not seeing anything from "majestic" in online.php. Or any of these for that matter.

Simon Lloyd 12-10-2013 08:52 PM

What you need to do is check their UA when they are online. The chances are that what you've entered in the list is NOT in their UA, and that's why they still get through.
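
As a rough way to run that check offline, the Python sketch below tests which candidate entries actually occur in a visitor's UA. It is only an illustration: `matching_entries` is a hypothetical helper, the case-insensitive substring comparison is an assumption about how the mod matches, and the sample UA is a typical MJ12bot string rather than one captured from this thread.

```python
# Hypothetical helper: list the ban-list entries that occur in a user agent.
# Case-insensitive substring matching is an assumption about the mod.
def matching_entries(user_agent: str, ban_list: list[str]) -> list[str]:
    """Return the entries from ban_list that appear anywhere in user_agent."""
    ua = user_agent.lower()
    return [entry for entry in ban_list if entry.lower() in ua]


# Sample data: a typical MJ12bot user agent (the version number is illustrative).
SAMPLE_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.4; http://www.majestic12.co.uk/bot.php?+)"

# The first two are the entries reported as not working; the last two are alternatives.
candidates = ["majestics", "Majestics MJ12bot Spider", "MJ12bot", "majestic"]

hits = matching_entries(SAMPLE_UA, candidates)
for entry in candidates:
    status = "matches" if entry in hits else "does NOT match"
    print(f"{entry!r} {status}")
```

With this sample string, "majestics" and "Majestics MJ12bot Spider" never appear in the UA (that is the display name vBulletin shows, not what the bot sends), whereas "MJ12bot" or "majestic" would.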

Stevenwi 12-15-2013 07:13 AM

Installed :)

sv1cec 12-31-2013 11:10 AM

Marked as installed, but it messes up my vB 4.2.2 Suite after a while. So I removed it.

ozzy47 12-31-2013 11:12 AM

Messes it up how? What you posted is not going to help anyone debug your issue.

sv1cec 12-31-2013 11:21 AM

Correct, apologies, but I wanted to remove it quickly, since users were complaining.

Well, after I install it and start getting emails from the hack, the normal layout of the page is messed up and the site is shown in a very strange way; you could say that no formatting is applied to the page, only the characters are shown.

I tried it twice and I had two friends email me saying that the layout was messed up. I checked and sure enough it was. I'll reinstall this and see if I can grab a shot of before and after.

ozzy47 12-31-2013 11:29 AM

Yeah, that is strange; I have not seen issues like that. Perhaps if it does it again, instead of uninstalling, just disable it, in case someone needs to debug it on your site.

sv1cec 12-31-2013 11:39 AM

Reinstalled it and waiting.

One strange thing is that some of the emails take way too long to be sent. I just received an email which was sent about 70 minutes ago. Is that normal?

ozzy47 12-31-2013 11:40 AM

That would be an issue with your server, not the mod, if I am not mistaken.

sv1cec 12-31-2013 11:57 AM

One question, regarding logging to a file. Shall the file be created by me, or is it created automagically by the mod? If I have to create it, what permissions are required?

Simon Lloyd 12-31-2013 12:15 PM

The mod should not mess with your install; if there was a problem it would show from the off rather than give issues later. With regard to logging, the mod creates the file, provided that you have permission on your server to create it.

Simon Lloyd 12-31-2013 12:18 PM

With regard to emails, it will depend on forum usage and your daily allowances if you're not on a dedicated box.

Please attach a txt file of the contents of your ban list and a pic of your settings. The issues you described have not been reported before and seem unlikely to have been caused by the mod, but we'll try to help as much as possible.

spillage 01-05-2014 02:16 AM

I'm running it on 4.2.2 just fine.

tanzeelniazi 01-14-2014 07:45 PM

What is Redirect URL? Which URL can I add? Please explain.
Redirect to spider's own IP: can I tick Yes?
Spider List: I use the default. Question: Baidu (Bing.com), Yandex (yahoo.com). Does this mean all traffic from Bing and Yahoo is banned?

bzcomputers 01-14-2014 08:25 PM

Quote:

Originally Posted by tanzeelniazi (Post 2474198)
What is Redirect URL? Which URL can I add? Please explain.
Redirect to spider's own IP: can I tick Yes?
Spider List: I use the default. Question: Baidu (Bing.com), Yandex (yahoo.com). Does this mean all traffic from Bing and Yahoo is banned?

Redirect URL is the URL spiders will be directed to if they are in your "Spider List". If you select "Yes" to redirect to the spider's own IP, this will override any URL you set beneath it. You use one or the other.
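
The "one or the other" rule can be sketched in a few lines of Python. This is only an illustration of the precedence described above, not the mod's code: `redirect_target` and its parameter names are hypothetical, and the `http://<ip>/` form of the own-IP target is an assumption.

```python
# Hypothetical sketch of the precedence between the two redirect settings.
def redirect_target(remote_ip: str, redirect_url: str, redirect_to_own_ip: bool) -> str:
    """Pick where a banned spider is sent."""
    if redirect_to_own_ip:
        # Assumed URL form: send the spider back to the address it came from.
        return f"http://{remote_ip}/"
    # Otherwise fall back to the configured Redirect URL.
    return redirect_url


# 192.0.2.10 is a documentation-only address; example.com is a placeholder URL.
print(redirect_target("192.0.2.10", "http://example.com/", redirect_to_own_ip=True))
print(redirect_target("192.0.2.10", "http://example.com/", redirect_to_own_ip=False))
```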

No, spider traffic from Bing and Yahoo will not be banned; they are different services and have different spiders. Yes, there is some overlap in the technologies used by some search engines, but this will not affect you being scanned and ranked by Bing and Yahoo.

I'm not sure why you made a direct correlation between Baidu (Chinese search engine) and Bing, then Yandex (Russian search engine) and Yahoo, but they are all completely different services wholly owned by different companies ...for now.

Simon Lloyd 01-14-2014 09:36 PM

What he ^^ said :-)

Max Taxable 01-14-2014 11:24 PM

Quote:

Originally Posted by tanzeelniazi (Post 2474198)
Does this mean all traffic from Bing and Yahoo is banned?

No, it does not. It means any visitor whose USER AGENT STRING contains one of your list entries is blocked, which would include Bing and Yahoo only if you listed them. Don't confuse this with hostnames or referrers - those aren't affected by this.

Simon Lloyd 01-15-2014 12:13 AM

Just to be clear again, Bing is NOT Baidu and Yahoo is NOT Yandex. This mod will only ban a spider/bot/person if their useragent contains an entry that you have in your list.
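
A small Python sketch of that rule, for anyone who wants to see it spelled out. It is not the mod's code: `is_banned` is a hypothetical helper and case-insensitive matching is an assumption. The Baiduspider and YandexBot strings are the ones posted later in this thread; the bingbot and Yahoo! Slurp strings are typical examples added as sample data.

```python
# Hypothetical helper: ban only when a list entry occurs in the user agent.
def is_banned(user_agent: str, ban_list: list[str]) -> bool:
    ua = user_agent.lower()
    return any(entry.lower() in ua for entry in ban_list)


ban_list = ["Baidu", "Yandex"]

visitors = {
    "Baidu": "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Yandex": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    # Typical Bing and Yahoo crawler strings, included as sample data.
    "Bing": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Yahoo": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
}

for name, ua in visitors.items():
    verdict = "banned" if is_banned(ua, ban_list) else "allowed"
    print(f"{name}: {verdict}")
```

Neither "Baidu" nor "Yandex" occurs in the Bing or Yahoo strings, so listing the first two leaves the other two untouched.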

lazytown 03-19-2014 10:19 PM

Does this have an option to just block registrations?

ozzy47 03-19-2014 11:24 PM

No, blocking the bot in effect stops them from reaching your site altogether, which is what you want to do with bad bots. :)

Simon Lloyd 03-20-2014 06:29 AM

When you've blocked a useragent, the offender cannot even load a page on your site, so they cannot register or do anything.

ggrimes620 03-20-2014 08:48 PM

I used to see upwards of about 100-200 bots/guests on my board at all times of the day. I installed Simon's MOD last week and it has practically eliminated all the bad bots from seeing my website! I only see legit bots/guests viewing my website now!

Very happy with this MOD!

Simon Lloyd 03-20-2014 09:15 PM

Nice to hear it's doing its job as well as expected :-)

ozzy47 03-26-2014 02:03 AM

OK, I've got a question: every once in a while I notice bots that are in the list showing up in WOL. It is not often, but it does happen.

Code:

180.76.5.24 - Whois
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

Code:

199.21.99.109 - Whois
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Is it possible that at the exact moment those spiders are hitting that a cron is running, and allowing the bots to bypass the check?

Simon Lloyd 03-26-2014 02:18 AM

The bots aren't bypassing the check and the mod is live; it doesn't use cron, the mod runs at every page call. You'll see them now and then, as they do actually have to make a call for a page of some sort in order to be checked and then redirected. It may be at this moment of redirection that they get on the WOL list, but as you are aware, the list isn't live and is subject to the timeout period you set in admincp (mine's set for 900 seconds). One other reason you could see them is if you use a "who visited" mod like Paul M's: they will always show there, as his mod is doing its job recording them, but mine is equally doing its job by transporting them anywhere but their intended page :)
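
To make the timeout point concrete, here is a minimal Python sketch of why a bot that was redirected at once can still be listed for a while. It is an illustration only, not vBulletin's code: `visible_in_wol` is a hypothetical helper, and the idea that the list shows anything active within the timeout window is an assumption based on the description above. The 900 seconds is the value quoted in this post.

```python
WOL_TIMEOUT = 900  # seconds; the admincp timeout value quoted above


# Hypothetical check: an entry stays listed until its last activity is older
# than the timeout, even if that activity was a single redirected request.
def visible_in_wol(last_activity: float, now: float, timeout: int = WOL_TIMEOUT) -> bool:
    return now - last_activity <= timeout


hit_time = 1_000_000.0  # arbitrary timestamp of the bot's one redirected request

print(visible_in_wol(hit_time, hit_time + 60))    # one minute after the hit
print(visible_in_wol(hit_time, hit_time + 901))   # just past the timeout
```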

ozzy47 03-26-2014 02:26 AM

I know they show in his mod, but it is just strange seeing them show up in online.php page.

I remember back in 2011 when I beta tested this for you, that never happened. :confused:

Max Taxable 03-26-2014 03:14 AM

Quote:

Originally Posted by Simon Lloyd (Post 2489567)
The bots aren't bypassing the check and the mod is live; it doesn't use cron, the mod runs at every page call. You'll see them now and then, as they do actually have to make a call for a page of some sort in order to be checked and then redirected. It may be at this moment of redirection that they get on the WOL list, but as you are aware, the list isn't live and is subject to the timeout period you set in admincp (mine's set for 900 seconds). One other reason you could see them is if you use a "who visited" mod like Paul M's: they will always show there, as his mod is doing its job recording them, but mine is equally doing its job by transporting them anywhere but their intended page :)

I use that mod on two boards and have never, ever seen a spider on this list appear there. I have also never seen a spider on this list showing up in WoL.

I am on vB3.8 though..... And they do not and never have shown up in Paul's Mod.

ozzy47 03-26-2014 11:42 PM

Strange, they have always shown in Paul's mod for me. Which is fine, I just don't get the WOL deal.

Visitors (359),
Logged In (6),
Google Spiders (51),
Majestics MJ12bot Spiders (12),
Bing Spiders (12),
Whois Source Spiders (1),
Baidu Spiders (111),
A Bot Spiders (1),
Wayback Machine Spiders (15),
Squider Spiders (29),
AhrefsBot Spiders (6),
BLEXBot Spiders (1),
Exabot Spiders (1),
Galaxy Spiders (1),
Xenu Link Sleuth Spiders (2),
AboutUs:Bot Spiders (1),
MSNBot Spiders (1),
EasouSpider Spiders (3),
Alexa Archive Spiders (1),
Sucuri Spiders (5),
Ezooms Spiders (2),
Indy Library Spiders (1),
Wget Spiders (1),
Google Favicon Spiders (1),
Nutch Spiders (1),
Crawler4j Spiders (1),
Alexa Bot Spiders (3),
Yahoo! Slurp Spiders (1)


All times are GMT.