Miscellaneous Hacks - Ban Spiders by User Agent

ikorolis
05-08-2013, 10:50 AM
thanks
installed your mod

fxdigi-cash
06-07-2013, 08:10 AM
Thanks a lot for the great mod!

I will try it out and see how things go...

Cheers

Max Taxable
06-07-2013, 02:30 PM
What is your take on "MSIE 6"? I seem to also be getting quite a few hits from that browser as well.

My personal take is, there aren't very many actual humans using it. It's almost always a botnet zombie computer.

And if it is a human using that dinosaur, I really don't want his/her traffic anyway.

bzcomputers
06-07-2013, 08:50 PM
My personal take is, there aren't very many actual humans using it. It's almost always a botnet zombie computer.

And if it is a human using that dinosaur, I really don't want his/her traffic anyway.

I came across this a couple weeks back:

http://www.ie6countdown.com/


It's a worldwide countdown Microsoft is doing tracking Internet Explorer 6 usage. They are tracking the percentage of users worldwide still using ie6.


Excluding China the percentage of users worldwide still using ie6 is much less than 1% and in China it is currently 24%. To me that is just one more reason to block "MSIE 6".

Max Taxable
06-08-2013, 09:53 PM
I came across this a couple weeks back:

http://www.ie6countdown.com/


It's a worldwide countdown Microsoft is doing tracking Internet Explorer 6 usage. They are tracking the percentage of users worldwide still using ie6.


Excluding China the percentage of users worldwide still using ie6 is much less than 1% and in China it is currently 24%. To me that is just one more reason to block "MSIE 6".

Yup. Like I said, people who use that dinosaur aren't desirable to have on the site, and definitely aren't worth support for their archaic, garbage browser that should be erased from the web.

XGC Viper XI
09-13-2013, 07:23 PM
WARNING: For those who use the vBulletin Mobile Application, this plugin can and will prevent your app from being published if you have its useragent banned. I think it is the MSIE 6 entry.

Solution: When you go to publish your app, just disable this product until you have published the app, then enable the product afterwards. If you have this active when trying to publish and you have it posting in a forum, look for the post that targets the API file. Then you will know which useragent on your list is locking it down and preventing it from getting your site's information. Don't worry, when you go to publish you will know instantly.

fly
09-13-2013, 08:48 PM
Why on Earth would you ban the IE6 user agent?

Max Taxable
09-13-2013, 09:36 PM
Why on Earth would you ban the IE6 user agent?

Because just about the only use for IE6 anymore is spambot networks, botnet zombie computers.

If some real, actual human is still using IE6 I don't want them on my site.
But, there really aren't.

DemOnstar
09-14-2013, 01:35 AM
Installed and testing.

One question, the pre-filled redirect url should be left intact?

Thanks

ForceHSS
09-14-2013, 07:38 AM
Installed and testing.

One question, the pre-filled redirect url should be left intact?

Thanks

You can change that if u want

DemOnstar
09-14-2013, 01:07 PM
You can change that if u want

Thank you... I will leave as is for the present...

Simon Lloyd
09-14-2013, 08:43 PM
You have the option to redirect to a site (i.e. the one already installed) or directly back to the IP of the banned useragent; it's all about choice really :)

DemOnstar
09-17-2013, 04:50 AM
How do I know if this is working?

Haven't seen any evidence so far...What do I look for?

Simon Lloyd
09-17-2013, 05:13 AM
It will take over 30 minutes to start to see differences in the WOL as the spiders get the message, and a bit longer until they stop trying altogether.

The easiest way to see it working is to turn on writing to the log file or, if you dare, have threads made in a forum of your choice. I advise against the latter as you can get thousands of posts quickly!!!!! It's only there for test purposes.

Max Taxable
09-17-2013, 01:58 PM
How do I know if this is working?

Haven't seen any evidence so far...What do I look for?

Smoke. Smoke starts coming out of your hard drive. :D

Simon Lloyd
09-17-2013, 03:37 PM
Smoke. Smoke starts coming out of your hard drive. :D

That's only if you're a power user :D

K4GAP
11-06-2013, 06:24 AM
Should I make any changes to my robots.txt file?
Right now it is blank.

Simon Lloyd
11-06-2013, 07:25 AM
Hi Gary this thread has nothing to do with robots.txt files, the mod bans anything whose useragent contains any string you enter in to it.

And as a standard you should have something in your robots file as you've been shown here https://vborg.vbsupport.ru/showthread.php?t=304164, there are many threads here that contain details of robots.txt.

K4GAP
11-06-2013, 08:47 AM
Oh, I guess I need to learn more about user agents and robots.

Thanks

Simon Lloyd
11-06-2013, 09:13 AM
Hi Gary, all that you need to know about useragents...etc is in the thread description. Not all bots follow the robots.txt so, with this mod you can block those bots completely and many others. What you need to do is identify your target audience, so if you are not catering for China then you'd want to block Chinese traffic, to sort the bots out you can block the likes of Baidu Sogou....etc.

I'll try and help you with whatever you need along the way so that you get to keep your bandwidth for more important users :)

Toxic2
11-09-2013, 07:55 PM
Simon can you come up with a Viable List of Bad UA's more appropriate for 2013? MSIE1 blocked one of my members, so i deleted MSIE1 from the List.

Simon Lloyd
11-09-2013, 08:25 PM
Hi Toxic2, banning UAs is a personal thing; you have to choose which to ban or not. Throughout this thread (and the other vB versions of this mod) there are lists, and there's a maintained list of spiders at vb.com in a thread by Mosh, so pretty much all of the crawlers will be listed there (a download from his site). You just choose which you want to ban by unveiling their UA when they visit your site.

Max Taxable
11-09-2013, 09:55 PM
This Mod's default list is a pretty dog gone good list, but like Simon says it might not meet your needs. It depends on your target audience countries, your preferences, all that.

bzcomputers
11-09-2013, 10:09 PM
Simon can you come up with a Viable List of Bad UA's more appropriate for 2013? MSIE1 blocked one of my members, so i deleted MSIE1 from the List.

Removing MSIE1 is about the only thing that would definitely be recommended to remove from the old list for this mod. MSIE1 will block users on IE10 and IE11 (which means A LOT of users and growing daily). I haven't seen anything prior to IE5 try to access my sites in a long time, so removing MSIE1 should not negatively impact your site in any way either (but blocking IE10 & 11 will!).

As was mentioned above it is a personal choice on what you block. There are thousands of sites out there that don't block any and are doing just fine. You need to figure out what's important for your site: being hit hundreds of times a day by bots from regions of the world you probably don't even cater to, like China or countries of the former Soviet Union? Or blocking those bots and in turn reducing server load and load times for those users you do want to cater to?

For instance if you block IE6 (MSIE6) you will currently block over 20% of Chinese internet users (average everywhere else in the World is less than 1% using IE6). When you consider China has 1.3 billion people that is a lot of potential website traffic. At the same time China also accounted for nearly 40% of the world's computer hacking attack traffic during 2012. If you have content that is viable for Chinese users which can possibly turn into potential profit (sales / ad clicks) for you then blocking MSIE6 and chinese specific bots like baiduspider, iaskspider, sogou spider, Sosospider, YoudaoBot and others is probably not recommended.

There are plenty of good resources of information in this thread and other places on the net on useragents. Just remember that copying someone else's useragent blocking list directly, without taking a little time to tweak it for your own needs, is almost guaranteed to negatively impact your site traffic.

Here are a couple of additional resources:
http://www.useragentstring.com/pages/Crawlerlist/
http://www.user-agents.org/

Max Taxable
11-09-2013, 10:18 PM
Removing MSIE1 is about the only thing that would definitely be recommended to remove from the old list for this mod. MSIE1 will block users on IE10 and IE11 (which means A LOT of users and growing daily).

I definitely agree with this.

Simon Lloyd
11-10-2013, 07:29 PM
If you really want to block MSIE1 then enter it exactly as that plus a leading space, if I remember rightly I've allowed it to accept spaces rather than ignore them, so doing it like that will not (or shouldn't) ban MSIE10.

Alan_SP
11-10-2013, 10:04 PM
I have a similar problem; it might be solved now.

But anyway, just to ask it, what other than blocking MSIE 1 could block legitimate users in this list? If anyone knows:

AhrefsBot
www.archive.org
gooblogsearch
seoprofiler.com
WinHttp
exabot
commoncrawl
80legs
TITAN
Charlotte
http://labs.topsy.com/butterfly/
MSRBOT
Indy Library
Plukkie
Goo Blog Search
SeznamBot
T312461
PostRank
Tweetmeme.com
Twitterbot
WordPress
Windows Live SOEE
MLBot
PycURL
ScoutJet
Ezooms
Deepnet Explorer
Mail.Ru
Majestics MJ12bot
almaden
Anarchie
Artabus
ASPSeek
attach
autoemailspider
BackWeb
Baidu
Bandit
BatchFTP
BlackWidow
Bot\mailto:craftbot@yahoo.com
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
GetRight
GetSmart
GetWeb!
gigabaz
GNIP
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Ichiro
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
MSIE 2
MSIE 3
MSIE 4
MSIE 5
Mag-Net
Magnet
Magpie
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
MS\FrontPage*
MSFrontPage
MSIECrawler
MSProxy
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
omgili
Openfind
PageGrabber
Papa\Foto
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
speedy
Spinn3r
sproose
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
tAkeOut
Teleport\Pro
URLSpiderPro
Vacuum
VoidEYE
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus

Max Taxable
11-10-2013, 10:27 PM
what other than blocking MSIE 1 could block legitimate users in this list? If anyone knows:

Blocking IE 1-6 won't block any legitimate humans, it will block 90% of the botnet zombie computers out there.

Alan_SP
11-11-2013, 11:40 PM
Actually, if you use string "MSIE 1" it blocks users of MSIE 10 and 11, as product looks for string anywhere, not as a whole word. And, if I'm not mistaken, to block only MSIE 1, you should use string MSIE 1,0.

I had many regular users who couldn't reach my site, probably all of them use MSIE 10 or 11.
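
To illustrate the point above, here is a minimal Python sketch (not the mod's actual code, which is a vBulletin plugin) of why a plain "contains" check on "MSIE 1" also catches IE10, and how a digit-aware pattern would avoid that. The function names are made up for this example, and the old-IE string is hypothetical rather than a captured user agent.

```python
import re

# A typical IE10 user agent string.
IE10_UA = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
# Hypothetical string for illustration only; not a captured IE1 user agent.
OLD_IE_UA = "Mozilla/1.0 (compatible; MSIE 1.0; Windows 95)"


def substring_banned(user_agent, entry):
    """Plain 'contains' check, as the thread describes the mod's behaviour."""
    return entry in user_agent


def msie_major_banned(user_agent, major):
    """Match 'MSIE <major>' only when no further digit follows the number."""
    pattern = r"MSIE " + str(major) + r"(?!\d)"
    return re.search(pattern, user_agent) is not None


print(substring_banned(IE10_UA, "MSIE 1"))   # True: IE10 is caught by accident
print(msie_major_banned(IE10_UA, 1))         # False: "10" is not version 1
print(msie_major_banned(OLD_IE_UA, 1))       # True
```

Whether the mod accepts patterns like this is not stated in the thread, so treat the second function as an idea for your own checks rather than something to paste into the ban list.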

Max Taxable
11-12-2013, 12:11 AM
Actually, if you use string "MSIE 1" it blocks users of MSIE 10 and 11, as product looks for string anywhere, not as a whole word. And, if I'm not mistaken, to block only MSIE 1, you should use string MSIE 1,0.

I had many regular users who couldn't reach my site, probably all of them use MSIE 10 or 11.

The developer covered that here. (https://vborg.vbsupport.ru/showpost.php?p=2459790&postcount=526)

But I agree with you, there's really no reason to have IE1 on the ban list, there very likely isn't a functioning computer anywhere that is also online, that runs it.

IE6 is the biggest botnet zombie browser though, and there are MILLIONS of them still online.

qpurser
12-10-2013, 03:29 PM
I added "Majestics MJ12bot Spider" to the list some time ago but for some reason he keeps coming back and I can see he is viewing threads.
I added them as "majestics" and "Majestics MJ12bot Spider"

When they are on the list are they not banned and redirected automatically?

Avros
12-10-2013, 06:31 PM
Not all Bots will adhere to or obey the text file. If anything it will provide them with more information than you want them to have.

Only a minority of spiders adhere to those rules.

Max Taxable
12-10-2013, 06:55 PM
Not all Bots will adhere to or obey the text file. If anything it will provide them with more information than you want them to have.

Only a minority of spiders adhere to those rules.

They can ignore "robots.txt" but they can't ignore this. It's not "robots.txt", it has teeth.

Here's my current blacklist:

SemrushBot
SeznamBot
http_requester
BIXOCRAWLER
KomodiaBot
QACC
Plukkie
botje
Opera/9.80
'Mozilla
CompSpyBot
ScreenerBot
Chrome/15
python-requests
ZumBot
Ruby
Add Catalog
Genieo
socialbm_bot
Ezooms
omgilibot
Go http package
JikeSpider
Python-urllib
Iron
sputnik
Xenu
Wotbox
200PleaseBot
360Spider
Indy Library
Sogou
SEOstats
baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
Nesotebot
DCPbot
AOL Advertising R&D
DataCha0s
aiHitBot
Apache-HttpClient
Zend_Http_Client
ReverseGet
XXX bot Content
vBSEO
spbot
OffByOne
thyroidbuzz
AcoonBot
coccoc
xpymep
proxyproxy2884
AppEngine
start.exe
Semiocast HTTP client
Firefox/3.6.23
Firefox/3.6.3
TurnitinBot
curl
SwpLc/1.6
GrepNetstat.com
news bot
AskTbPTV
checks
panopta
App3le
PhantomJS
AlwaysOnline
SISTRIX
proximic
CRAWL-E/0.6.4
WebMoney
HTMLParser
oBot
UnisterBot
ERACrawler
MSIE 2
MSIE 3
MSIE 4
MSIE 5
MSIE 6
crawler4j
NCSA_Mosaic
Rippers
80legs
Firefox/3.5.6
YaBrowser
majestic
EasouSpider
User-Agent
FunWebProducts

I am not seeing anything from "majestic" in online.php. Or any of these for that matter.

Simon Lloyd
12-10-2013, 08:52 PM
What you need to do is check their UA when they are online, the chances are that what you've entered in the list is NOT in their UA that's why they still get through :*
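
As a rough sketch of that check in Python (outside vBulletin, and assuming a case-insensitive "contains" match, which the thread does not confirm): feed in the useragents you have actually seen and list the ban entries that never match any of them. The MJ12bot string below is an assumed example of the kind of UA that bot sends; copy the real one from your own online.php or logs.

```python
def unmatched_entries(ban_list, observed_uas):
    """Return ban entries that appear in none of the observed useragents."""
    lowered = [ua.lower() for ua in observed_uas]
    return [
        entry for entry in ban_list
        if not any(entry.lower() in ua for ua in lowered)
    ]


# Assumed example UA; check your own logs for the exact string.
observed = [
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)",
]
entries = ["Majestics MJ12bot Spider", "majestics", "MJ12bot"]

print(unmatched_entries(entries, observed))
# ['Majestics MJ12bot Spider', 'majestics']
```

The first two entries are the spider's display name in Who's Online, not text from its useragent, so they never match; an entry such as "MJ12bot" does.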

Stevenwi
12-15-2013, 07:13 AM
Installed :)

sv1cec
12-31-2013, 11:10 AM
Marked as installed, but it messes up my vB 4.2.2 Suite after a while. So I removed it.

ozzy47
12-31-2013, 11:12 AM
Messes it up how, what you put is not going to help anyone debug your issues.

sv1cec
12-31-2013, 11:21 AM
Correct, apologies, but I wanted to remove it quickly, since users were complaining.

Well, after I install it, and I start getting emails from the hack, the normal layout of the page is messed up, and the site is shown in a very strange way, you could say that no formatting is done in the page, only characters shown.

I tried it twice and I had two friends email me saying that the layout was messed up. I checked and sure enough it was. I'll reinstall this and see if I can grab a shot of before and after.

ozzy47
12-31-2013, 11:29 AM
Yeah that is strange, I have not seen issues like that. Perhaps if it does it again, instead of uninstalling, just disable it, in case someone needs to debug it on your site.

sv1cec
12-31-2013, 11:39 AM
Reinstalled it and waiting.

One strange thing is that some of the emails take way too long to be sent. I just received an email which was sent about 70 minutes ago. Is that normal?

ozzy47
12-31-2013, 11:40 AM
That would be an issue with your server, not the mod, if I am not mistaken.

sv1cec
12-31-2013, 11:57 AM
One question, regarding logging to a file. Shall the file be created by me, or is it created automagically by the mod? If I have to create it, what permissions are required?

Simon Lloyd
12-31-2013, 12:15 PM
The mod should not mess with your install, if there was a problem it would be from the off and not give issues later. With regards logging, the mod creates the file provided that you have permissions on your server to create the same.

Simon Lloyd
12-31-2013, 12:18 PM
With regards emails it will depend on forum usage and your daily allowances if you're not on a dedicated box.

Please attach a txt file of the contents of your ban list and a pic of your settings. The issues you described have not been reported before and seem unlikely to have been caused by the mod, but we'll try and help as much as possible.

spillage
01-05-2014, 02:16 AM
I'm running it on 4.2.2 just fine.

tanzeelniazi
01-14-2014, 07:45 PM
What is Redirect URL: which url can i add ? explain please
Redirect to spiders own ip? can i tick Yes ?
Spider List : Default Use Question: Baidu (Bing.com) Yandex: yahoo.com Its mean All traffic from Bing and yahoo Banned?

bzcomputers
01-14-2014, 08:25 PM
What is Redirect URL: which url can i add ? explain please
Redirect to spiders own ip? can i tick Yes ?
Spider List : Default Use Question: Baidu (Bing.com) Yandex: yahoo.com Its mean All traffic from Bing and yahoo Banned?

Redirect URL is the url spiders will be directed to if they are in your "Spider List". If you select "Yes" to redirect to the spider's own IP this will override any url you set beneath it. You use one or the other.
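
The "one or the other" rule can be written out as a tiny Python sketch. This is only an illustration of the logic described above; the function name, parameters and the "http://<ip>/" format are assumptions, not the mod's real code.

```python
def redirect_target(remote_ip, redirect_url, redirect_to_own_ip):
    """Pick where a banned useragent is sent.

    If the 'redirect to spider's own IP' option is on, it overrides the URL.
    """
    if redirect_to_own_ip:
        # Assumed URL format for "back to its own IP".
        return "http://" + remote_ip + "/"
    return redirect_url


print(redirect_target("180.76.5.24", "http://example.com/", True))
# http://180.76.5.24/
print(redirect_target("180.76.5.24", "http://example.com/", False))
# http://example.com/
```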

No, all spider traffic from Bing and Yahoo will not be banned they are different services and have different spiders. Yes, there is some overlap in technologies used by some search engines but this will not affect you being scanned and ranked by Bing and Yahoo.

I'm not sure why you made a direct correlation between Baidu (Chinese search engine) and Bing, then Yandex (Russian search engine) and Yahoo but they are all completely different services wholly owned by different companies ...for now.

Simon Lloyd
01-14-2014, 09:36 PM
What he ^^ said :-)

Max Taxable
01-14-2014, 11:24 PM
Its mean All traffic from Bing and yahoo Banned?

No it does not. It means anything with USER AGENT STRING including Bing and Yahoo are blocked. Don't confuse this with hostnames or referrers - those aren't affected by this.

Simon Lloyd
01-15-2014, 12:13 AM
Just to be clear again BING is NOT Baidu and Yahoo is NOT Yandex, this mod will only ban a spider/bot/person if their useragent contains an entry that you have in your list.

lazytown
03-19-2014, 10:19 PM
Does this have an option to just block registrations?

ozzy47
03-19-2014, 11:24 PM
No, blocking the bot in effect stops them from reaching your site altogether, which is what you want to do with bad bots. :)

Simon Lloyd
03-20-2014, 06:29 AM
When you've blocked a useragent the offender cannot even load a page on your site, so they cannot register or do anything.

ggrimes620
03-20-2014, 08:48 PM
I used to see upwards of about 100-200 bots/guests on my board at all times of the day. I installed Simon's MOD last week and it has practically eliminated all the bad bots from seeing my website! I only see legit bots/guests viewing my website now!

Very happy with this MOD!

Simon Lloyd
03-20-2014, 09:15 PM
Nice to hear it's doing its job as well as expected :-)

ozzy47
03-26-2014, 02:03 AM
Ok I got a question, every once in awhile, I notice bots that are in the list, showing up in WOL, it is not often, but it does happen.

180.76.5.24 - Whois
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

199.21.99.109 - Whois
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Is it possible that at the exact moment those spiders are hitting that a cron is running, and allowing the bots to bypass the check?

Simon Lloyd
03-26-2014, 02:18 AM
The bots aren't bypassing the check and the mod is live; it doesn't use cron, the mod runs at every page call. You'll see them now and then because they do actually have to make a call for a page of some sort in order to be checked and then redirected, and it may be at this moment of redirection that they get onto the WOL list. As you are aware, the list isn't live and is subject to the timeout period you set in admincp (mine's set for 900 seconds). One other reason you could see them is if you use a "who has visited" mod like Paul M's one: they will always show there, as his mod is doing its job recording them, but mine is equally doing its job by transporting them anywhere but their intended page :)

ozzy47
03-26-2014, 02:26 AM
I know they show in his mod, but it is just strange seeing them show up in online.php page.

I remember back in 2011 when I beta tested this for you, that never happened. :confused:

Max Taxable
03-26-2014, 03:14 AM
The bots aren't bypassing the check and the mod is live; it doesn't use cron, the mod runs at every page call. You'll see them now and then because they do actually have to make a call for a page of some sort in order to be checked and then redirected, and it may be at this moment of redirection that they get onto the WOL list. As you are aware, the list isn't live and is subject to the timeout period you set in admincp (mine's set for 900 seconds). One other reason you could see them is if you use a "who has visited" mod like Paul M's one: they will always show there, as his mod is doing its job recording them, but mine is equally doing its job by transporting them anywhere but their intended page :)

I use that Mod on two boards and have never, ever seen a spider on this list appear there. I have also never seen a spider on this list showing up in WoL.

I am on vB3.8 though..... And they do not and never have shown up in Paul's Mod.

ozzy47
03-26-2014, 11:42 PM
Strange, they have always showed in Paul's mod for me. Which is fine, I just don't get the WOL deal.

Visitors (359),
Logged In (6),
Google Spiders (51),
Majestics MJ12bot Spiders (12),
Bing Spiders (12),
Whois Source Spiders (1),
Baidu Spiders (111),
A Bot Spiders (1),
Wayback Machine Spiders (15),
Squider Spiders (29),
AhrefsBot Spiders (6),
BLEXBot Spiders (1),
Exabot Spiders (1),
Galaxy Spiders (1),
Xenu Link Sleuth Spiders (2),
AboutUs:Bot Spiders (1),
MSNBot Spiders (1),
EasouSpider Spiders (3),
Alexa Archive Spiders (1),
Sucuri Spiders (5),
Ezooms Spiders (2),
Indy Library Spiders (1),
Wget Spiders (1),
Google Favicon Spiders (1),
Nutch Spiders (1),
Crawler4j Spiders (1),
Alexa Bot Spiders (3),
Yahoo! Slurp Spiders (1)

Delphiprogrammi
04-12-2014, 07:47 PM
Hi,

I have the impression this hack is blocking pingdom bots (i'm a pingdom customer)


Ping
PingALink


Pingdom uses this useragent: "Pingdom.com_bot_version_1.4_(http://www.pingdom.com)"

Max Taxable
04-12-2014, 07:48 PM
Hi,

I have the impression this hack is blocking pingdom bots (i'm a pingdom customer)


Ping
PingALink


Pingdom uses this useragent: "Pingdom.com_bot_version_1.4_(http://www.pingdom.com)"

Then just remove those from the definitions in options.

Although, hard to see how those definitions are blocking that UA.

Simon Lloyd
04-13-2014, 04:48 AM
"Ping" would certainly block pingdom, PingAlink would not.
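
A quick way to find which entry is catching a legitimate useragent is to test the UA against each entry in turn. Below is a minimal Python sketch of that idea; it assumes a case-insensitive "contains" match, which is an assumption about the mod (the result for this Pingdom string is the same either way), and the function name is made up for the example.

```python
def blocking_entries(user_agent, ban_list):
    """Return every non-empty ban entry that occurs inside the useragent."""
    ua = user_agent.lower()
    return [entry for entry in ban_list if entry and entry.lower() in ua]


PINGDOM_UA = "Pingdom.com_bot_version_1.4_(http://www.pingdom.com)"

print(blocking_entries(PINGDOM_UA, ["Ping", "PingALink", "Wget"]))
# ['Ping']
```

"Ping" is the first four letters of "Pingdom", so it matches; "PingALink" never appears in the string, so it does not.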

ozzy47
09-06-2014, 09:50 PM
I had the Create New Thread turned off for about a year now, and I just re-enabled it, and for some unexplainable reason, it refuses to write a new thread when a bot visits the site. Also when I enable that option it stops writing to the log, I am puzzled. :confused:

Yes I have it set to a valid userid, and a valid forumid.

Simon Lloyd
09-07-2014, 03:22 AM
Are you using any kind of extra caching mod or optimising mod? DBTech vBOptimise can cause anomalies like this. An uninstall then reinstall could fix it; I suggest that before you reinstall you change the version number to, say, 3.0.4 just to prompt a datastore change.

ozzy47
09-07-2014, 10:45 AM
Yeah using the vBoptimize mod.

I changed it up a bit, instead of adding a thread, I have it writing to the DB now, so I can pull the number of blocked bots/spiders from there. :)

Simon Lloyd
09-08-2014, 07:01 AM
Just as long as you're only logging the count, as other entries etc. could bloat your database in no time at all, let alone that the number of calls to the database could impede users' experience on a large board. :)
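
For anyone wondering what "only logging the count" could look like, here is a hypothetical sketch in Python with SQLite. It is not ozzy47's code (which was never posted here) and not vBulletin's MySQL schema; the table and function names are invented. It keeps one row per ban entry and bumps a counter, so the table cannot grow with every hit.

```python
import sqlite3


def open_counter_db(path=":memory:"):
    """Open the database and create the (hypothetical) counter table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocked_counts ("
        "entry TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_block(conn, entry):
    """Add one hit for a ban entry; one row per entry, never one per request."""
    conn.execute(
        "INSERT OR IGNORE INTO blocked_counts (entry, hits) VALUES (?, 0)",
        (entry,),
    )
    conn.execute(
        "UPDATE blocked_counts SET hits = hits + 1 WHERE entry = ?",
        (entry,),
    )
    conn.commit()


def total_blocked(conn):
    """Total number of blocked requests across all entries."""
    row = conn.execute(
        "SELECT COALESCE(SUM(hits), 0) FROM blocked_counts"
    ).fetchone()
    return row[0]


conn = open_counter_db()
record_block(conn, "Baiduspider")
record_block(conn, "Baiduspider")
record_block(conn, "Synapse")
print(total_blocked(conn))  # 3
```

This still costs database writes on every blocked request, so the warning about database calls on a large board applies to it as well.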

ozzy47
09-08-2014, 08:56 AM
Yeah I am going to have to monitor it and see how it goes. :)

Black Snow
09-15-2014, 08:51 AM
Is there a new more up-to-date list of spiders to ban?

Black Snow
09-15-2014, 08:52 AM
Yeah using the vBoptimize mod.

I changed it up a bit, instead of adding a thread, I have it writing to the DB now, so I can pull the number of blocked bots/spiders from there. :)

Could you share the code to store them to the database?

Max Taxable
09-15-2014, 01:02 PM
Is there a new more up-to-date list of spiders to ban?

Lists for this and "Stop Hostnames from Registering" here. (http://ozzmodz.com/showthread.php/506-The-Era-Of-Big-Spam-Is-Over)

ozzy47
09-15-2014, 11:37 PM
Could you share the code to store them to the database?

Well it is not something that is easily done: it requires a new table in the DB, an edit to the plugin for this mod, a new plugin to render the template to display the info, and a new template.

Black Snow
09-16-2014, 10:10 AM
Well it is not something that is easily done: it requires a new table in the DB, an edit to the plugin for this mod, a new plugin to render the template to display the info, and a new template.

Thanks for the reply ozzy. I am capable of doing the necessary adds/edits if you are willing to share them?

ozzy47
09-16-2014, 10:20 AM
Give me some time to put it together, and I will post the instructions on my site. I should be able to have it together by this weekend.

Black Snow
09-16-2014, 10:31 AM
Thank you very much ozzy. I'd like your post but you know how it goes lol.

ozzy47
09-16-2014, 10:52 AM
LOL, yes I am all too familiar with that. :)

Black Snow
09-25-2014, 08:54 PM
Any luck with this ozzy47?

ozzy47
09-25-2014, 09:11 PM
Not just yet, got a bit tied up. I'll try to put instructions together this weekend.

Black Snow
10-08-2014, 08:21 AM
Any update ozzy my man?

Gadget_Guy
11-23-2014, 10:27 PM
Ever since putting this in place my users who use Opera are complaining they can't connect to the site.

Any thoughts?

D.

ozzy47
11-23-2014, 10:29 PM
What is in your list?
What is one of the users complaining useragent string?

Gadget_Guy
11-23-2014, 10:33 PM
This is my list:

(Deleted)

What do you mean "What is one of the users complaining useragent string?"

Tell me how to get this info for you

D.

ozzy47
11-23-2014, 10:36 PM
In your online.php, set User Agent: to Yes

Then copy the useragent string of one of the members that was complaining, and paste it here.

Max Taxable
11-23-2014, 10:44 PM
What do you mean "What is one of the users complaining useragent string?"

Tell me how to get this info for you

D.

Helpful little template edit makes it show all the time:


https://vborg.vbsupport.ru/showthread.php?t=306009

Remove this from your list:

Opera/9.80

ozzy47
11-23-2014, 10:45 PM
Yep, that would do it. :)

You should also edit your above post and remove the list, don't need everyone to see it. :)

Gadget_Guy
11-23-2014, 11:11 PM
Thanks Guys!

Many thanks!

My site is getting HAMMERED right now by bots.... need to get the server loads down...

Fingers crossed that this will do the trick.

(I also updated my robots.txt with some additional stuff)

- For security reasons... would either of you be willing to review it via PM?

D.

ozzy47
11-23-2014, 11:12 PM
What is it you need reviewed?

Gadget_Guy
11-23-2014, 11:27 PM
Just a sanity check of my robots.txt file I created.

D.

ozzy47
11-23-2014, 11:31 PM
Sure, you can PM it to me, but TBH it's not gonna stop bad bots, as they ignore it.

Simon Lloyd
11-24-2014, 04:55 AM
Remember that this mod will only ban useragents that have the word or entry you have entered in the ban list, you can quite freely put your ban list from the mod in a txt file and attach it to a post and we'll help you with it.

tanzeelniazi
12-04-2014, 04:53 PM
Is the Apache Synapse ESB Spider good or bad? I think it is not good, because I see it coming from so many IPs.
How do I ban it with Spider ban?
I added these entries:
almaden
Apache Synapse ESB
Apache Synapse ESB Spider
Synapse
Anarchie
AhrefsBot
Artabus
ASPSeek
The useragent is Mozilla/4.0 (compatible; Synapse)
I added the spider to my ban list but Apache spiders are still showing :( Why?

Simon Lloyd
12-04-2014, 05:22 PM
Is that all that is in their useragent?

Simon Lloyd
12-04-2014, 06:30 PM
You can block that entire string or try blocking this Synapse-HttpComponents-NIO

The Apache Synapse is an abused proxy; more often than not it's used to check for vulnerabilities to different attacks like DDoS and SQL injection.

ozzy47
12-05-2014, 10:55 AM
TBH all you need to add is Synapse. I have only that in the block list, and it is successful at blocking all of them.

Remember, if you add it to the list, it will take as long as the timeout you have set in vBoptions for them to disappear from the online list.

So if your timeout is set to 15 min, and you add a bot, it will take at least that long before they no longer appear in who's online. :)

tanzeelniazi
12-05-2014, 04:51 PM
i added like this m correct ??
if i am wrong how to add full string in spider list

Gadget_Guy
12-06-2014, 02:48 AM
Why is the darn Baidu spider so hard to stop?

I have the mod installed, have had it running for weeks... but this darn spider keeps infiltrating the site slowing things down.

It is in my list.... but if you look at who's online... there they are....

(attached txt of my spider list)

(also... based on what you are seeing... should I be adding any of the others seen in the screen shot to the list? if so... what are the entries?)

Thanks,

D.

Max Taxable
12-06-2014, 02:53 AM
Why is the darn Baidu spider so hard to stop?

I have the mod installed, have had it running for weeks... but this darn spider keeps infiltrating the site slowing things down.

It is in my list.... but if you look at who's online... there they are....

(attached txt of my spider list)


(also... based on what you are seeing... should I be adding any of the others seen in the screen shot to the list? if so... what are the entries?)

Thanks,

D.

It is a weird thing Ozzy was having on his site too, and I just assumed it was a v4 thing since I had no occurrences of Baidu at all on my boards, which were all 3.8s.

I bought an existing v4 though, and haven't seen a Baidu since I installed this Mod. So I'm really clueless as to how they get through on you and Ozz.

CAG CheechDogg
12-06-2014, 04:07 AM
The Baidu spider can take up to a couple if not few months to completely disappear and actually obey the no crawl rule when adding this mod or even blocking it through your robots.txt ... I have it blocked everywhere and it took maybe 3 days before I didn't see it again ... the best thing to do for me was also add a huge IP block to my htaccess file that completely blocks all of China and a couple other Asian countries from accessing my site ...

Simon Lloyd
12-06-2014, 09:54 AM
Why is the darn Baidu spider so hard to stop?

I have the mod installed, have had it running for weeks... but this darn spider keeps infiltrating the site slowing things down.

It is in my list.... but if you look at who's online... there they are....

(attached txt of my spider list)


(also... based on what you are seeing... should I be adding any of the others seen in the screen shot to the list? if so... what are the entries?)

Thanks,

D.

They are not slowing your site down and are not actually getting anywhere. You are using Paul M's who has visited mod; both his mod and mine are working fine: his mod registers them at the time of the call for the page load, as mine redirects them at the same time :)

Simon Lloyd
12-06-2014, 09:55 AM
I added it like this, am I correct?
If I am wrong, how do I add the full string to the spider list?
http://awesomescreenshot.com/0a93z97i2b

That appears correct.

Simon Lloyd
12-06-2014, 09:56 AM
The Baidu spider can take up to a couple if not few months to completely disappear and actually obey the no crawl rule when adding this mod or even blocking it through your robots.txt ... I have it blocked everywhere and it took maybe 3 days before I didn't see it again ... the best thing to do for me was also add a huge IP block to my htaccess file that completely blocks all of China and a couple other Asian countries from accessing my site ...
Believe it or not you can actually go to their site and ask them not to crawl your site :)

tanzeelniazi
12-06-2014, 10:38 AM
That appears correct.

You say I am correct, but I still see these spiders in Who's Online :( Why? I also see a link showing like this:
showthread?=1'
I mean one spider is shown viewing this link: showthread?=1'

Simon Lloyd
12-06-2014, 01:32 PM
Where are you seeing that? Make sure that your list of bad bots has no leading or trailing spaces on each name. If you are still having trouble, you can PM me temporary admin login details with rights to administer plugins and I'll take a look :)

Max Taxable
12-06-2014, 06:20 PM
The Baidu spider can take up to a couple if not few months to completely disappear and actually obey the no crawl rule when adding this mod or even blocking it through your robots.txt ... I have it blocked everywhere and it took maybe 3 days before I didn't see it again ... the best thing to do for me was also add a huge IP block to my htaccess file that completely blocks all of China and a couple other Asian countries from accessing my site ...

It's not a no-crawl rule; with this mod it is an outright block. There's no obeying or disobeying. It's not robots.txt. When you install this mod with Baidu on the list, it should be blocked. Gone.

I believe the incidents of it still appearing in v4s after this mod is installed must have something to do with interference from some other mod. Otherwise, how to explain my v4 getting NO appearances by Baidu after I installed this?

Max Taxable
12-06-2014, 06:55 PM
I'll add to the above - I suspect a hook conflict actually. One that only happens when some other mod calls the same hook, and it's not all the time because from what we saw at OzzModz, Baidu was greatly reduced in appearances by this mod, just not totally gone. Occasionally one or two of them slip through during the time of the hook conflict.

It doesn't happen in v3 at all, for the same reason I believe.

Gadget_Guy
12-06-2014, 07:34 PM
Anything in particular I can look for in terms of a hook that might conflict?

Would a list of my mods help?

D.

Max Taxable
12-06-2014, 08:06 PM
In v3 and v4 this mod calls "style_fetch."

CAG CheechDogg
12-06-2014, 08:20 PM
Well I haven't had Baidu appear on my site in over 2 years ... way before I even installed this mod which has helped me a lot regardless of how it does it lol ... my point is that "I" haven't had Baidu for a very long time ...

Max Taxable
12-06-2014, 08:31 PM
Well sure if you block China and other countries via .htaccess you probably won't see Baidu.

I used to have a massive .htaccess with country blocks.

Simon Lloyd
12-07-2014, 04:50 AM
Banning by .htaccess is fine if you only have a few things in it, because it is read with every single server request. So if you have 10 blocks in your .htaccess and let's say you have a web page with 30 elements (icons, css, containers, includes, etc.), then each one of those 30 requests is checked against your blocks just to load that page.

Now consider your own landing page and check how many things load to make that page up and you'll soon see why having a lot of bans in your .htaccess can be detrimental particularly if you are on shared hosting or limited vps.
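
To put rough numbers on the point above, here is a minimal sketch. It assumes, purely for illustration, that every rule in the .htaccess is evaluated once for every request a page triggers; the function name is made up, and this is back-of-envelope arithmetic, not a measurement of Apache.

```python
def htaccess_checks(rules, requests_per_page):
    """Worst-case rule evaluations for a single page view.

    Assumes every rule is tested against every request, which is a
    simplification for illustration only.
    """
    return rules * requests_per_page


# 10 deny rules, and a page made up of 30 separately requested elements
print(htaccess_checks(10, 30))      # 300

# The same page with a 2,000-line block list
print(htaccess_checks(2000, 30))    # 60000
```

The cost grows with both numbers, which is why a long block list matters more on pages that pull in many separate files.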

@Gadget_Guy & Max Taxable
The hook is style_fetch. You can try changing the hook for one of the others that loads before all the others, but you may not see the result you're looking for. Doesn't hurt to try :)

ozzy47
12-07-2014, 07:52 AM
Yeah I would try and stay away from ip blocking totally.

Ha Ha Ha, I have been testing this out for the past 10hrs or so Simon, too early to tell yet, but so far looking good.

princesspepper
12-07-2014, 10:49 AM
Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

ozzy47
12-07-2014, 10:52 AM
It really makes no difference where you send them, it is just user choice. :)

Simon Lloyd
12-07-2014, 12:10 PM
Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

For me it was giving them a taste of their own medicine: they drain our resources, so we send them back to drain theirs :)

ozzy47
12-07-2014, 08:47 PM
The hook is style_fetch. You can try changing the hook for one of the others that loads before all the others, but you may not see the result you're looking for. Doesn't hurt to try :)

Ha Ha Ha, I have been testing this out for the past 10hrs or so Simon, too early to tell yet, but so far looking good.

Well, so far it seems to be going as planned. I will wait another 24 - 48 hrs, and if it is working, I'll let you know exactly what I did: which hook I used, and what additional plugin I added. :)

Gadget_Guy
12-07-2014, 08:52 PM
Whoot!

Looking forward to your findings Ozzy!

D.

CAG CheechDogg
12-07-2014, 11:05 PM
banning by .htaccess is fine if you only have a few things in it because it is read with every single server request, so if you have 10 blocks in your .htaccess and lets say you have a web page with 30 elements (icons, css, containers, includes.....etc) then each one of those that tries to access that page has 30 checks made just to load that page.

Now consider your own landing page and check how many things load to make that page up and you'll soon see why having a lot of bans in your .htaccess can be detrimental particularly if you are on shared hosting or limited vps.

@Gadget_Guy & Max Taxable
The hook is style_fetch. You can try changing the hook for one of the others that loads before all the others, but you may not see the result you're looking for. Doesn't hurt to try :)

I have had the IP blocks in my htaccess for over 5 years my Man and I haven't run into any problems in those 5 years ..

If an IP is blocked on your server it's not allowing the page or any page to load, so I am a bit confused about "so if you have 10 blocks in your .htaccess and lets say you have a web page with 30 elements (icons, css, containers, includes.....etc) then each one of those that tries to access that page has 30 checks made just to load that page."


As a matter of fact, when I didn't have these IP blocks in my htaccess file I was constantly getting emails from my host that my site was being suspended ... by blocking these IPs I am keeping them from even accessing anything on my website or forums ...thus the usage of resources went down ...

Max Taxable
12-07-2014, 11:24 PM
Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

Just don't redirect to any of your own pages - feedback loop danger.

EDIT TO ADD: Was I right about the hook conflict with some other mod(s), Ozzy?

ozzy47
12-07-2014, 11:30 PM
I would not say a conflict, but perhaps a better hook to execute the mod. That is if the testing continues to provide the desired results.

Simon Lloyd
12-08-2014, 06:48 AM
I have had the IP blocks in my htaccess for over 5 years my Man and I haven't run into any problems in those 5 years ..

If an IP is blocked on your server it's not allowing the page or any page to load, so I am a bit confused about "so if you have 10 blocks in your .htaccess and lets say you have a web page with 30 elements (icons, css, containers, includes.....etc) then each one of those that tries to access that page has 30 checks made just to load that page."


As a matter of fact, when I didn't have these IP blocks in my htaccess file I was constantly getting emails from my host that my site was being suspended ... by blocking these IPs I am keeping them from even accessing anything on my website or forums ...thus the usage of resources went down ...

I agree in part. When you didn't have the block they were calling on every resource: PHP, MySQL, CPU and RAM. With the block they are pretty much just using RAM, as CPU and PHP time and response are minimal, and as you are not loading anything else the RAM isn't being maxed either. Whole-country blocks don't take as much checking as full-octet IPs like 192.161.0.1: if you have plenty of those, they are all checked against each request, whereas if you are blocking just 192.161 then it's just one check against each request.

I'm probably not explaining myself too well (it reads much better in my head :)).
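
The difference between many single-IP rules and one range rule can be sketched with Python's standard `ipaddress` module. This is only an illustration under stated assumptions: the two block lists are invented, and counting comparisons in a loop is a stand-in for how a web server walks its rules, not a description of Apache internals.

```python
import ipaddress
from typing import Tuple

# Hypothetical block lists, for illustration only.
single_ips = [ipaddress.ip_address(f"192.161.0.{n}") for n in range(1, 201)]
one_range = ipaddress.ip_network("192.161.0.0/16")


def blocked_by_singles(addr: str) -> Tuple[bool, int]:
    """Return (blocked, comparisons made) when scanning single-IP rules."""
    ip = ipaddress.ip_address(addr)
    for count, rule in enumerate(single_ips, start=1):
        if ip == rule:
            return True, count
    return False, len(single_ips)


def blocked_by_range(addr: str) -> Tuple[bool, int]:
    """Return (blocked, comparisons made) when using one range rule."""
    return ipaddress.ip_address(addr) in one_range, 1


print(blocked_by_singles("192.161.0.200"))  # (True, 200)
print(blocked_by_range("192.161.0.200"))    # (True, 1)
print(blocked_by_singles("8.8.8.8"))        # (False, 200)
```

Note the last line: a legitimate visitor pays for the whole single-IP list on every request, which is the cost being described above.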

Simon Lloyd
12-08-2014, 06:52 AM
I would not say a conflict, but perhaps a better hook to execute the mod. That is if the testing continues to provide the desired results.

Just bear in mind that any other hook you choose will need to be sufficient to perform the other tasks of the mod if you wish to use them, like sending the email or creating the threads. Some of the other runtime hooks will give errors or not work as expected, especially with the thread creation. Also keep in mind you need to redirect them before anything has loaded, as this is the basis of the mod: keeping resources for your members and not the bots :)

CAG CheechDogg
12-08-2014, 06:55 AM
I agree in part. When you didn't have the block they were calling on every resource: PHP, MySQL, CPU and RAM. With the block they are pretty much just using RAM, as CPU and PHP time and response are minimal, and as you are not loading anything else the RAM isn't being maxed either. Whole-country blocks don't take as much checking as full-octet IPs like 192.161.0.1: if you have plenty of those, they are all checked against each request, whereas if you are blocking just 192.161 then it's just one check against each request.

I'm probably not explaining myself too well (it reads much better in my head :)).


No my Man, you actually are explaining yourself very well lol ...

All I know is that I have not had any negative effects from doing the blocks and I also have just a list of single IPs ..... and let me tell you , that list is long as hell lol ....

princesspepper
12-08-2014, 06:57 AM
Just don't redirect to any of your own pages - feedback loop danger.

EDIT TO ADD: I was right about the hook conflict with some other mod(s) Ozzy?
Thanks, but I'm still unsure what the benefit would be to redirect back to the source. Would it make them aware you don't want them sooner?

Simon Lloyd
12-08-2014, 08:42 AM
Thanks, but I'm still unsure what the benefit would be to redirect back to the source. Would it make them aware you don't want them sooner?

It really doesn't matter. They are redirected with a 301, which is a permanent redirect, so they will always see the URL they tried to crawl as the one you send them to. Like I said, I coded that in to send them back to themselves so they have fewer resources to be crawling other people's sites - it's only fair! :)
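
For reference, a 301 is just a status line plus a Location header. Below is a minimal sketch of what such a response looks like on the wire. The helper name and the example.invalid target are placeholders; how the mod itself builds its redirect, or works out where "back to themselves" is, is not shown in this thread.

```python
from http import HTTPStatus


def permanent_redirect(target_url: str) -> str:
    """Build a bare HTTP/1.1 response that permanently redirects to target_url."""
    status = HTTPStatus.MOVED_PERMANENTLY  # 301
    return (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        f"Location: {target_url}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


print(permanent_redirect("http://example.invalid/"))
```

Because no page body is sent, the response costs almost nothing to produce, whichever target is chosen.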

Alan_SP
12-08-2014, 09:17 PM
the best thing to do for me was also add a huge IP block to my htaccess file that completely blocks all of China and a couple other Asian countries from accessing my site ...

Would you share your CIDR list? Not in this thread, but maybe make a new thread?

princesspepper
12-08-2014, 10:14 PM
I coded that in to send them back to themselves so they have fewer resources to be crawling other people's sites - it's only fair! :)
Thanks, that is all I wanted to know. :)

Simon Lloyd
12-09-2014, 03:47 AM
Princesspepper could you mark this as installed please :)

CAG CheechDogg
12-09-2014, 03:49 AM
Would you share your CIDR list? Not in this thread, but maybe make a new thread?

You can get it here my Man : https://vborg.vbsupport.ru/showthread.php?t=302628&highlight=%23+Cambodia+%28KH%29+deny+from+114.134. 184.0%2F21

Gadget_Guy
12-09-2014, 06:18 PM
Hey Ozzy,

Are we any closer to an alternative or modification to this so that we can get better blocking in place?

I am willing to test on my site as I am still getting hit hard by spiders even with the mod in place.

D.

Simon Lloyd
12-09-2014, 07:33 PM
Hey Gadget Guy, no disrespect, but the mod here is mine and isn't marked as reusable code. Ozzy may post what he's tried or done, but it won't necessarily be added to this mod. However, Ozzy has developed one like this with other measures; you can get it at his site.

If you are being hit by spiders with this mod in place, it will be because there is an anomaly in your list. This isn't an exhaustive list, but here are a few possible reasons:
Entry in list has a leading or trailing space
Entry has a typo of some sort
Entry doesn't actually represent the bot you think it does (e.g. I believe Ahrefsbot has a different name in the UA)
Mod run order may conflict with another mod using the same hook
There are other reasons but those should get you going! :)
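
The first cause in that list (stray spaces) is easy to rule out mechanically before pasting the list into the mod. A rough sketch, not part of the mod: `clean_ban_list` is a hypothetical helper that trims each line, drops blank lines and removes duplicates. It treats entries that differ only by case as duplicates, which is an assumption.

```python
def clean_ban_list(raw):
    """Strip whitespace, drop blank lines and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for line in raw.splitlines():
        entry = line.strip()          # removes leading/trailing spaces and tabs
        key = entry.lower()
        if entry and key not in seen:
            seen.add(key)
            cleaned.append(entry)
    return cleaned


raw_list = "Baidu \n Yandex\n\nbaidu\nAhrefsBot\t\n"
print(clean_ban_list(raw_list))   # ['Baidu', 'Yandex', 'AhrefsBot']
```

It does not catch typos or names that never appear in the UA; those still need checking by eye.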

CAG CheechDogg
12-09-2014, 07:56 PM
Crazy how a leading or trailing space can jack things up lol .. I remember trying to block facebook completely when I first installed your mod here Simon and it wasn't working ..why? the damn trailing space!! lol

Gadget_Guy
12-09-2014, 08:14 PM
Sorry Simon,

I meant no disrespect... I thought you guys were working together.

My apologies.

Could I send you my list via PM and a screenshot of what I am seeing in terms of the spiders appearing?

D.

ozzy47
12-09-2014, 08:44 PM
Hey Gadget Guy, no disrespect but the mod here is mine and isn't marked as reusable code, Ozzy may post what he's tried or done but wont necessarily be added to this mod, however Ozzy has developed one like this with other measures, you can get it at his site.

If you are being hit by spiders with this mod in place it will be because there is an anomaly in your list, this list isn't exhaustive but here's a few reasons why:
Entry in list has a leading or trailing space
Entry has a typo of some sort
Entry doesn't actually represent the bot you think it does (i.e Ahrefsbot I believe has a different name in the UA)
Mod run order may conflict with another mod using the same hook
There are other reasons but those should get you going! :)

No I have not developed a mod similar to this. I recommend this mod to everyone. If you remember correctly, I was one of your beta testers way back in the day.

My actual intention with what I have done, was to PM it to you first, and get your input, before I said anything to anyone else, or maybe you update the mod with it.

So let me know what you want me to do. :)

Alan_SP
12-09-2014, 09:50 PM
You can get it here my Man : https://vborg.vbsupport.ru/showthread.php?t=302628&highlight=%23+Cambodia+%28KH%29+deny+from+114.134. 184.0%2F21

Thank you, I tried to like your post, but there's a limit in this.

Anyway, I see you put this in htaccess, I think it's better added to firewall rules for denying hosts. It works much faster. Of course, if you have access to firewall in that way.

ozzy47
12-09-2014, 09:52 PM
Yeah, if you have a dedi, or a vps, you would add it to the firewall there, stop them cold.

Simon Lloyd
12-10-2014, 04:35 AM
No I have not developed a mod similar to this. I recommend this mod to everyone. If you remember correctly, I was one of your beta testers way back in the day.

My actual intention with what I have done, was to PM it to you first, and get your input, before I said anything to anyone else, or maybe you update the mod with it.

So let me know what you want me to do. :)

I thought you had developed something, which is why planned improvements for this have been shelved (no point in duplication :)). In that case, Ozzy, PM away :)

Simon Lloyd
12-10-2014, 04:37 AM
Sorry Simon,

I meant no disrespect... I thought you guys were working together.

My apologies.

Could I send you my list via PM and a screenshot of what I am seeing in terms of the spiders appearing?



D.
Yes of course :)

Max Taxable
12-10-2014, 04:45 AM
I thought you had developed something that's why planned improvements for this have been shelved (no point in duplication :)), in that case Ozzy PM away :)

No sir, we have been trying to solve the mystery of why Baidu gets through on some v4 installations, but not all and never a v3. My hook conflict idea opened a new can of worms for investigation, and Ozz found something very interesting.

ForceHSS
12-10-2014, 08:43 AM
No sir we have been trying to solve the mystery of why Baidu gets through on some v4 installations, but not all and never a v3, and my hook conflict idea opened a new can of worms for investigation, and Ozz found something very interesting.
What interesting thing was found?

CAG CheechDogg
12-10-2014, 09:21 AM
Yeah ...Yeah .. what "something very interesting" do you speak of .....

Black Snow
12-10-2014, 10:29 AM
No sir we have been trying to solve the mystery of why Baidu gets through on some v4 installations, but not all and never a v3, and my hook conflict idea opened a new can of worms for investigation, and Ozz found something very interesting.

I never looked before but it is also getting through on my site.

CAG CheechDogg
12-10-2014, 10:54 AM
Baidu has kissed my "gluteus maximus" for almost 2 years and change if not more ... So has Yandex and a handful of others as well ... I must have a "magic" forum :)

ozzy47
12-10-2014, 11:04 AM
I am reworking a couple of things, and then need to test further. I can then share my findings with Simon. :)

Gadget_Guy
12-10-2014, 01:14 PM
I would be happy to test on my site if it helps the community.

D.

Max Taxable
12-10-2014, 03:55 PM
What interesting thing was found

I don't wanna flap it, already flapped too much. But I am pretty sure the problem is solved.

Gadget_Guy
12-11-2014, 02:27 AM
If this helps anyone.... this is a list of what I am seeing in terms of spiders on my site with this installed.

Bing Spiders (6),
Google Favicon Spiders (9),
Proximic Spiders (135),
Baidu Spiders (175),
WinHTTP Spiders (12),
Facebook Spiders (20),
Google AdSense Spiders (7),
Magpie Spiders (9),
linkdexbot/2.0 Spiders (7),
AhrefsBot Spiders (14),
Coccoc Spiders (2),
Google AppEngine Spiders (6),
Google Spiders (40),
Sucuri Spiders (3),
Twitterbot Spiders (4),
Google FeedFetcher Spiders (3),
Apple RSS Spiders (1),
WordPress.com mShots Spiders (1),
Google Web Preview Spiders (3),
Grapeshot Spiders (2),
James BOT WebCrawler Spiders (5),
Netseer crawler/2.0 Spiders (2),
Google Images Spiders (3),
Galaxy Spiders (2),
Feedly Spiders (2),
DotBot Spiders (1),
Yahoo! Slurp Spiders (1),
360Spider Spiders (4),
Netcraft Web Server Survey Spiders (1),
NerdyBot Spiders (2),
Exabot Spiders (1),
Integrity Bot Spiders (1),
ContextAd Bot Spiders (2),
Twitturls.com (Python-urllib) Spiders (1)

I am happy to supply any information that you may find useful to assist in the work you are doing.

D.

Simon Lloyd
12-11-2014, 05:34 AM
I need a snapshot of your settings for the mod, as there is no way all of those would get past the mod if they were entered in it!

CAG CheechDogg
12-11-2014, 08:09 AM
This is a snapshot of the spiders that are showing up in Who's Online:

https://vborg.vbsupport.ru/external/2014/12/30.jpg

What exactly do you need a snapshot of in the settings, Simon?

This is my list of spiders I have banned with your mod:

almaden
Anarchie
Artabus
ASPSeek
attach
autoemailspider
BackWeb
Baidu
Bandit
BatchFTP
BlackWidow
Bot\mailto:craftbot@yahoo.com
Buddy
bumblebee
CherryPicker
ChinaClaw
CICC
Collector
Copier
Copyscape
Crescent
DIIbot
DISCo
DISCo\Pump
dotbot
Download\Demon
Download\Wonder
Downloader
Drip
DSurf15a
eCatch
EasyDL/2.99
EirGrabber
email
EmailCollector
EmailSiphon
EmailWolf
Express\WebPictures
ExtractorPro
EyeNetIE
FileHound
FlashGet
FrontPage
GetRight
GetSmart
GetWeb!
gigabaz
GNIP
Go\!Zilla
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
.*httrack.*
ia_archiver
Ichiro
Image\Stripper
Image\Sucker
Indy*Library
Indy\Library
InterGET
InternetLinkagent
Internet\Ninja
InternetSeer.com
Iria
JBH*agent
JetCar
JOC\Web\Spider
JustView
larbin
LeechFTP
LexiBot
lftp
Link*Sleuth
likse
//Link
LinkWalker
Mag-Net
Magnet
Magpie
magpie
Mass\Downloader
Memo
Microsoft.URL
MIDown\tool
Mirror
Mister\PiX
Mozilla.*Indy
Mozilla.*NEWT
Mozilla*MSIECrawler
MS\FrontPage*
MSFrontPage
MSIECrawler
MSProxy
Navroad
NearSite
NetAnts
NetMechanic
NetSpider
Net\Vampire
NetZIP
NICErsPRO
Ninja
Nutch
Octopus
Offline\Explorer
Offline\Navigator
omgili
Openfind
PageGrabber
Papa\Foto
PaperLiBot
pavuk
pcBrowser
Ping
PingALink
Pockey
psbot
Pump
QRVA
RealDownload
Reaper
Recorder
ReGet
Scooter
Seeker
Siphon
sitecheck.internetseer.com
SiteSnagger
SlySearch
SmartDownload
Snake
sogou
Soso
SpaceBison
speedy
Spinn3r
sproose
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
Szukacz
tAkeOut
Teleport\Pro
URLSpiderPro
Vacuum
VoidEYE
Web\Image\Collector
Web\Sucker
WebAuto
[Ww]eb[Bb]andit
webcollage
WebCopier
Web\Downloader
WebEMailExtrac.*
WebFetch
WebGo\IS
WebHook
WebLeacher
WebMiner
WebMirror
WebReaper
WebSauger
Website
Website\eXtractor
Website\Quester
Webster
WebStripper
WebWhacker
WebZIP
Wget
Whacker
Widow
WWWOFFLE
x-Tractor
Xaldon\WebSpider
Xenu
Yandex
Yeti
YOUDAOBOT
Zeus.*Webster
Zeus
baiduspider
beta.statsit.com
statsit
SiteIntel
Yandex
GomezAgent
FunWebProducts
Nesotebot
DCPbot
AOL Advertising R&D
DataCha0s
aiHitBot
Apache-HttpClient
Zend_Http_Client
ReverseGet
XXX bot Content
vBSEO
spbot
OffByOne
thyroidbuzz
AcoonBot
coccoc
xpymep
proxyproxy2884
AppEngine
start.exe
Semiocast HTTP client
Firefox/3.6.23
TurnitinBot
curl
SwpLc/1.6
GrepNetstat.com
news bot
AskTbPTV
checks
panopta
App3le
PhantomJS
AlwaysOnline
SISTRIX
proximic
CRAWL-E/0.6.4
WebMoney
Maxthon
HTMLParser
oBot
UnisterBot
ERACrawler
Butterfly
Topsy
Butterfly Topsy Crawler
Ezooms
Deepnet
Alexa
Bitlybot
Seznam
Fulltext
Facebook
Sunrise Communications AG
crawl
Crawl
MJ12bot
Bimbot
Snapbot
thunderstone
Thunderstone
grub-client
Bing
MSN
OOZBOT
Wayback Machine
Crowsnest Spider
FlipboardProxy
Feedly
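
Several entries in the list above (for example Download\Demon, .*httrack.* and [Ww]eb[Bb]andit) contain characters that look like leftovers from .htaccess or regular-expression rules. If the mod does a plain text search for each entry, which is an assumption here, such entries would probably never match a real UA. A small sketch to flag them for review; the helper name is hypothetical and the sample is a handful of entries taken from the list:

```python
# Characters that normally only appear in pattern syntax, not in a UA name.
PATTERN_CHARS = "\\*[]"


def suspicious_entries(entries):
    """Return entries containing a backslash, asterisk or square bracket."""
    return [e for e in entries if any(ch in e for ch in PATTERN_CHARS)]


sample = ["Baidu", "Download\\Demon", ".*httrack.*",
          "[Ww]eb[Bb]andit", "Indy*Library", "Wget"]

for entry in suspicious_entries(sample):
    print(entry)
```

Flagged entries are only candidates: whether to rewrite them as plain names (HTTrack, WebBandit and so on) is a judgment call for the list owner.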

Gadget_Guy
12-11-2014, 10:48 AM
Here is my stuff:

Simon Lloyd
12-11-2014, 05:06 PM
Hi Gadget Guy, remove the second picture as it has your email address in it. I see the settings are OK. Now can you just copy the list as you have it (copy straight out of the textbox in the mod), stick it in a WordPad document, zip it and attach it here so I can check it, please.

Gadget_Guy
12-11-2014, 05:12 PM
That is what the spiders.txt file I attached is.

D.

Simon Lloyd
12-11-2014, 05:17 PM
This is a snapshot of the spiders that are showing up in the whos online:

https://vborg.vbsupport.ru/external/2014/12/30.jpg

What exactly do you need a snapshot in the settings Simon?

This is my list of spiders I have banned with your mod:

..........................................

There's one or two duplicates there but that doesn't matter. However, rather than ban baiduspider just ban baidu; you'll have more luck with that, as not all Baidu spiders have the entire name in the UA. That goes for most of the bots. Let's say there's a bot called Simon Lloyd Crawl Everything Everywherespider, then any of the following will ban it:
Simon or Lloyd or Crawl or Spider.....etc
The same goes for:
Lloyd Crawl or Everything or Simon Lloyd....etc (case isn't important)

What the mod does is look for the string you entered, so if you want to ban the spider I mentioned above, just Simon will do it. However, let's say you have a friendly bot called Simon Lloyd Crawled Everything Everywherespider; then to ban the first bot and allow the other you'd need to enter a string that is unique to the first one, so in this case it could be:
Simon Lloyd Crawl Everything
This way it won't pick up the "Crawl" in the friendly bot's name, as it's looking for the exact string you entered.
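
The matching described above can be sketched as a case-insensitive "does the UA contain this text" test. This is an illustration of the behaviour as described in this post, not the mod's actual code; `is_banned` is a made-up name.

```python
def is_banned(user_agent, entries):
    """True if any list entry appears anywhere in the UA, ignoring case."""
    ua = user_agent.lower()
    return any(entry.lower() in ua for entry in entries)


bad_bot = "Simon Lloyd Crawl Everything Everywherespider"
friendly_bot = "Simon Lloyd Crawled Everything Everywherespider"

# A short entry catches both bots.
print(is_banned(bad_bot, ["Simon"]), is_banned(friendly_bot, ["Simon"]))  # True True

# A longer, unique entry catches only the first one.
print(is_banned(bad_bot, ["Simon Lloyd Crawl Everything"]))       # True
print(is_banned(friendly_bot, ["Simon Lloyd Crawl Everything"]))  # False
```

The trade-off is the usual one: short entries catch more variants of a bot's name, but also more innocent UAs.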

Hope that helps.

Simon Lloyd
12-11-2014, 05:22 PM
That is what the spiders.txt file I attached is.

D.

I need it as asked for. The reason for this is to check for machine characters and/or leading/trailing spaces, hard returns, etc.

Gadget_Guy
12-11-2014, 05:25 PM
What the mod does is look for the string you entered, so if you want to ban the spider I mentioned above, just Simon will do it. However, let's say you have a friendly bot called Simon Lloyd Crawled Everything Everywherespider; then to ban the first bot and allow the other you'd need to enter a string that is unique to the first one, so in this case it could be:
Simon Lloyd Crawl Everything
This way it won't pick up the "Crawl" in the friendly bot's name, as it's looking for the exact string you entered.

Hope that helps.


This COULD be the issue I have with mine then. When you look at my txt file you will see that I tried putting in multiple variations. That could be negating the effectiveness.

Maybe as part of the mod could be an updated list that we can copy/paste so that people like me who are clueless don't do the wrong thing.

Keeping in mind that we would want the "good" spiders to get through like google, bing, and the legit ones that are important to SEO, Adsense, and other things like that.

I hope you and Ozzy are discussing the hook thing as well... he seemed to think that may be important with my 4.2.2 site.

I will say that when I had this mod in place for my 3.8.x site it worked perfectly and I didn't get hit hard till I upgraded to 4.2.2

I saw my server loads go way up....

D.

Simon Lloyd
12-11-2014, 05:32 PM
There is already a list included and many throughout this thread, and I've also explained the above before. I wouldn't update the list of spiders to ban; as I've said probably over a dozen times, it's a personal thing what or who you ban.

If you want to PM me access, as I've said before, I'll take a look.

Gadget_Guy
12-11-2014, 05:40 PM
Here you go.

Simon Lloyd
12-11-2014, 06:11 PM
Right, I've been through your list. I won't comment on the bots you are banning as that's your preference. What I have done is ordered the list, checked for anything that shouldn't be there and removed some bots as they will be taken care of by other entries you have.

What I will say is, if you are NOT using Paul M's "Who has visited" mod and you are still seeing any of the bots on your list appear in WOL, then you need to check that spider's UserAgent to see if the name or text you have in your list actually appears in the UA.

CAG CheechDogg
12-11-2014, 06:48 PM
There's one or two duplicates there but that doesn't matter. However, rather than ban baiduspider just ban baidu; you'll have more luck with that, as not all Baidu spiders have the entire name in the UA. That goes for most of the bots. Let's say there's a bot called Simon Lloyd Crawl Everything Everywherespider, then any of the following will ban it:
Simon or Lloyd or Crawl or Spider.....etc
The same goes for:
Lloyd Crawl or Everything or Simon Lloyd....etc (case isn't important)

What the mod does is look for the string you entered, so if you want to ban the spider I mentioned above, just Simon will do it. However, let's say you have a friendly bot called Simon Lloyd Crawled Everything Everywherespider; then to ban the first bot and allow the other you'd need to enter a string that is unique to the first one, so in this case it could be:
Simon Lloyd Crawl Everything
This way it won't pick up the "Crawl" in the friendly bot's name, as it's looking for the exact string you entered.

Hope that helps.

No no ...I am fine Simon, I have "ZERO" traces of baidu ... I think when you asked for the settings and a shot you were asking Gadget Guy and not me ... but I am fine, I have no problems with the mod not blocking any of the bots at all ... Thank you !!!!

Gadget_Guy
12-11-2014, 07:35 PM
Do you mean this one:

https://vborg.vbsupport.ru/showthread.php?t=232636


Then, yes... I am using it.

D.

Gadget_Guy
12-11-2014, 07:43 PM
Right, I've been through your list. I won't comment on the bots you are banning as that's your preference. What I have done is ordered the list, checked for anything that shouldn't be there and removed some bots as they will be taken care of by other entries you have.

What I will say is, if you are NOT using Paul M's "Who has visited" mod and you are still seeing any of the bots on your list appear in WOL, then you need to check that spider's UserAgent to see if the name or text you have in your list actually appears in the UA.

I have implemented your list.

In regards to your comment about "my list".... I have no idea to be honest.

Those I put in there based on what I was seeing in my WOL and putting things in there to try and block them.

I am sure I was way off base and incorrect in doing so.

So.... in light of this... if you want to provide a "proper" list... I am happy to take your guidance.

I don't know the first thing about any of this stuff and am looking to experts like yourself to assist.

D.

CAG CheechDogg
12-11-2014, 07:51 PM
I have implemented your list.

In regards to your comment about "my list".... I have no idea to be honest.

Those I put in there based on what I was seeing in my WOL and putting things in there to try and block them.

I am sure I was way off base and incorrect in doing so.

So.... in light of this... if you want to provide a "proper" list... I am happy to take your guidance.

I don't know the first thing about any of this stuff and am looking to experts like yourself to assist.

D.

Gadge my Man... what you need to do is ask yourself a few questions ... like how much do rankings mean to you, if you want every single search engine to crawl your site ... are you on a shared or dedicated server and if you are already having problems with bots and especially bad bots hitting your site too much ..

I for one don't care for any other search engine except for google and yahoo ... I have completely blocked most search engines that are foreign and facebook as well ... as you can see from the screenshot I only have like 10 different spiders/bots that even crawl my site and that is just how I want it ...

So put together a list of the engines and spiders that you know are giving you hell then you can add those to the list that I have or Simon and Ozzy have and you can give it one last look and just remove the ones you want your site to be crawled with ....

I added my list in a txt file if you want to try mine out and see how it works for you ...

Here it is as well ...

Simon Lloyd
12-11-2014, 07:53 PM
As I've said many times in this thread, the fact that they are showing up in Paul M's mod doesn't mean they are getting through. His mod logs them as they visit, mine redirects them at the same time; both mods are working fine!

Just copy CAG CheechDogg's list and prune as needed. There is no "proper" list, it's all a personal choice!

CAG CheechDogg
12-11-2014, 07:57 PM
As I've said many times in this thread, the fact that they are showing up in Paul M's mod doesn't mean they are getting through. His mod logs them as they visit, mine redirects them at the same time; both mods are working fine!

Just copy CAG CheechDogg's list and prune as needed. There is no "proper" list, it's all a personal choice!

Right Simon ... it logs the actual visit (detection) ... I check to see what is actually getting through by going to the online.php page and selecting Display: Search Bots from the drop down and then you can see what is actually crawling the site ...

CAG CheechDogg
12-11-2014, 08:00 PM
I actually use this mod here by the Great Boobo (RIP) to also display the spiders in the Who's Online list and it is hell of accurate !!!!

https://vborg.vbsupport.ru/showthread.php?t=243460

Gadget_Guy
12-11-2014, 08:20 PM
Guys... I can't thank you enough for all the help and advice.

I feel like a huge blindfold was lifted from my eyes with the last couple posts.

I tried reading all 45 pages of this mod to really understand it... but I missed the point about the detections being just that.

I was scratching my head when WOL didn't really match up with online.php list

I want to be crawled... but only by the right spiders..... the ones that really count for a north american audience and people who use the "traditional" engines like Yahoo, Google, Bing etc

I certainly don't want to jeopardize my Google adsense ads and things like that.

I want my site to be found.... and be a source when people search for information pertaining to what we do.

D.

ozzy47
12-11-2014, 08:26 PM
I took a quick look at your list, it looks like you are only blocking bad bots, so it should be ok. :)

Max Taxable
12-11-2014, 10:19 PM
On my own vB 4, the only time I ever see Baidu in either Paul's mod or in WoL, is when I turn off this mod.

Paul's mod fires before this one, that is true. But that is not the reason some people get Baidu there. Baidu is also in WoL, if you look, on boards that are showing Baidu in Paul's mod.

ozzy47
12-11-2014, 10:21 PM
Well I think I might have taken care of the issues, as well as added a few more things to the mod, but I need to test it a bit more to be totally sure.

Gadget_Guy
12-11-2014, 10:23 PM
WOL seems to be pretty clean now....

What is this spider? I see a lot of entries for it on WOL

Proximic Spider
54.175.33.76
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)

Edit...

This one as well:

Magpie Spider
94.228.34.203
magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)


D.

Max Taxable
12-11-2014, 10:28 PM
Harmless crawlers but if you want them blocked you can put them on the list.

ozzy47
12-11-2014, 10:29 PM
Proximic Spider
http://www.proximic.com/spider.html
Magpie Spider
http://www.brandwatch.com/magpie-crawler/

CAG CheechDogg
12-12-2014, 02:25 AM
Proximic Spider
http://www.proximic.com/spider.html
Magpie Spider
http://www.brandwatch.com/magpie-crawler/

I have had those 2 blocked for a very long time ..no need for them for my forums ....

CAG CheechDogg
12-12-2014, 02:30 AM
Harmless crawlers but if you want them blocked you can put them on the list.

It's not so much about if they are harmless or not, it's the amount of resources they sometimes use up crawling and the amount of sessions they leave open and the amount of spiders crawling at the same time ....for those 2 in the past I saw 23 different ips for proximic at one time and like 10 for magpie as well .. way too many sessions and spiders to be running around at the same time....

Gadget_Guy
12-12-2014, 02:41 AM
So if I want to add them to my list, what do I enter?

D.

Max Taxable
12-12-2014, 02:43 AM
So if I want to add them to my list, what do I enter?

D.

Proximic
Magpie

CAG CheechDogg
12-12-2014, 02:46 AM
Make sure you leave no trailing spaces or space at the start
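
To illustrate why stray spaces matter, here is a minimal Python sketch of substring matching against a ban list. It is not the mod's actual code (the mod is a vBulletin plugin); the function names, the case-insensitive comparison, and the trimming step are all assumptions made for illustration.

```python
def normalize_entries(raw_list):
    """Split a newline-separated ban list, trimming stray whitespace and dropping blanks."""
    return [line.strip().lower() for line in raw_list.splitlines() if line.strip()]


def is_banned(user_agent, entries):
    """Return True if any ban entry occurs anywhere in the user agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(entry in ua for entry in entries)


# Hypothetical list as an admin might paste it, including accidental spaces.
ban_list = normalize_entries("Proximic\nMagpie \n  Baiduspider\n")

# User agents quoted earlier in the thread, plus a Googlebot string for contrast.
proximic_ua = "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
magpie_ua = "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

for ua in (proximic_ua, magpie_ua, googlebot_ua):
    print(is_banned(ua, ban_list), ua)
```

Without the trimming step, an entry saved as "Magpie " (with a trailing space) would never match "magpie-crawler/1.1", because the character after "magpie" in that user agent is a hyphen. If the list is compared as typed, a stray space silently disables that entry.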

Gadget_Guy
12-12-2014, 03:14 AM
Okay.. now I am ready to bang my head on the wall.

Baidu is back.

Just saw it in WOL

(I grabbed two to show)

D.

Simon Lloyd
12-12-2014, 04:37 AM
If you see them again please take a snapshot showing their useragents.

Gadget_Guy
12-12-2014, 10:37 AM
Here are snapshots.

I included magpie which I added yesterday.

CAG CheechDogg
12-12-2014, 10:48 AM
Okay.. now I am ready to bang my head on the wall.

Baidu is back.

Just saw it in WOL

(I grabbed two to show)

D.

Just deny those IPs access to your site with htaccess ... that is what I also did even though I use this mod ...

ozzy47
12-12-2014, 10:56 AM
Yes that is a option, but it helps in no way to help figure out what the issue is with them making it through.

But I may have that solved, but I need to do further testing. :)

CAG CheechDogg
12-12-2014, 11:11 AM
The thing is Baidu has tons of ips from where it comes from .... some aren't actually Baidu spiders or bots but simply bad bots disguising themselves as Baidu...check this list out for all the known ips and some user agents for Baidu:

USER AGENT: "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
IP: 119.63.196.10
IP: 119.63.196.102
IP: 119.63.196.103
IP: 119.63.196.104
IP: 119.63.196.105
IP: 119.63.196.106
IP: 119.63.196.107
IP: 119.63.196.108
IP: 119.63.196.109
IP: 119.63.196.11
IP: 119.63.196.110
IP: 119.63.196.111
IP: 119.63.196.112
IP: 119.63.196.113
IP: 119.63.196.114
IP: 119.63.196.115
IP: 119.63.196.116
IP: 119.63.196.117
IP: 119.63.196.119
IP: 119.63.196.12
IP: 119.63.196.120
IP: 119.63.196.13
IP: 119.63.196.15
IP: 119.63.196.16
IP: 119.63.196.17
IP: 119.63.196.18
IP: 119.63.196.19
IP: 119.63.196.20
IP: 119.63.196.21
IP: 119.63.196.22
IP: 119.63.196.23
IP: 119.63.196.24
IP: 119.63.196.25
IP: 119.63.196.26
IP: 119.63.196.27
IP: 119.63.196.39
IP: 119.63.196.40
IP: 119.63.196.41
IP: 119.63.196.42
IP: 119.63.196.43
IP: 119.63.196.45
IP: 119.63.196.47
IP: 119.63.196.48
IP: 119.63.196.50
IP: 119.63.196.51
IP: 119.63.196.52
IP: 119.63.196.53
IP: 119.63.196.55
IP: 119.63.196.56
IP: 119.63.196.57
IP: 119.63.196.73
IP: 119.63.196.74
IP: 119.63.196.75
IP: 119.63.196.76
IP: 119.63.196.77
IP: 119.63.196.78
IP: 119.63.196.79
IP: 119.63.196.80
IP: 119.63.196.81
IP: 119.63.196.82
IP: 119.63.196.83
IP: 119.63.196.84
IP: 119.63.196.85
IP: 119.63.196.86
IP: 119.63.196.88
IP: 119.63.196.89
IP: 123.125.71.100
IP: 123.125.71.101
IP: 123.125.71.102
IP: 123.125.71.103
IP: 123.125.71.104
IP: 123.125.71.105
IP: 123.125.71.106
IP: 123.125.71.107
IP: 123.125.71.108
IP: 123.125.71.109
IP: 123.125.71.110
IP: 123.125.71.111
IP: 123.125.71.112
IP: 123.125.71.113
IP: 123.125.71.114
IP: 123.125.71.115
IP: 123.125.71.116
IP: 123.125.71.117
IP: 123.125.71.70
IP: 123.125.71.71
IP: 123.125.71.81
IP: 123.125.71.83
IP: 123.125.71.94
IP: 123.125.71.95
IP: 123.125.71.96
IP: 123.125.71.97
IP: 123.125.71.98
IP: 123.125.71.99
IP: 180.76.5.100
IP: 180.76.5.103
IP: 180.76.5.111
IP: 180.76.5.137
IP: 180.76.5.141
IP: 180.76.5.142
IP: 180.76.5.145
IP: 180.76.5.146
IP: 180.76.5.147
IP: 180.76.5.148
IP: 180.76.5.150
IP: 180.76.5.151
IP: 180.76.5.155
IP: 180.76.5.159
IP: 180.76.5.161
IP: 180.76.5.162
IP: 180.76.5.166
IP: 180.76.5.167
IP: 180.76.5.170
IP: 180.76.5.171
IP: 180.76.5.172
IP: 180.76.5.176
IP: 180.76.5.177
IP: 180.76.5.178
IP: 180.76.5.179
IP: 180.76.5.180
IP: 180.76.5.181
IP: 180.76.5.182
IP: 180.76.5.183
IP: 180.76.5.185
IP: 180.76.5.187
IP: 180.76.5.189
IP: 180.76.5.190
IP: 180.76.5.192
IP: 180.76.5.193
IP: 180.76.5.194
IP: 180.76.5.195
IP: 180.76.5.196
IP: 180.76.5.197
IP: 180.76.5.49
IP: 180.76.5.51
IP: 180.76.5.54
IP: 180.76.5.55
IP: 180.76.5.57
IP: 180.76.5.58
IP: 180.76.5.59
IP: 180.76.5.62
IP: 180.76.5.63
IP: 180.76.5.64
IP: 180.76.5.65
IP: 180.76.5.67
IP: 180.76.5.87
IP: 180.76.5.88
IP: 180.76.5.90
IP: 180.76.5.92
IP: 180.76.5.93
IP: 180.76.5.94
IP: 180.76.5.95
IP: 180.76.5.96
IP: 180.76.5.97
IP: 180.76.5.98
IP: 180.76.5.99
IP: 180.76.6.21
IP: 180.76.6.212
IP: 180.76.6.213
IP: 180.76.6.222
IP: 180.76.6.223
IP: 180.76.6.224
IP: 180.76.6.231
IP: 180.76.6.28
IP: 180.76.6.29
IP: 180.76.6.36
IP: 180.76.6.37
IP: 199.36.73.116
IP: 220.181.108.100
IP: 220.181.108.107
IP: 220.181.108.110
IP: 220.181.108.81
IP: 220.181.108.84
IP: 220.181.108.93
IP: 220.181.108.95
IP: 220.181.108.97
IP: 222.186.24.59

USER AGENT: "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
IP: 67.159.56.162
IP: 72.11.144.119
IP: 119.63.196.10
IP: 119.63.196.103
IP: 119.63.196.105
IP: 119.63.196.106
IP: 119.63.196.11
IP: 119.63.196.113
IP: 119.63.196.116
IP: 119.63.196.117
IP: 119.63.196.118
IP: 119.63.196.120
IP: 119.63.196.124
IP: 119.63.196.125
IP: 119.63.196.13
IP: 119.63.196.16
IP: 119.63.196.19
IP: 119.63.196.20
IP: 119.63.196.23
IP: 119.63.196.25
IP: 119.63.196.26
IP: 119.63.196.27
IP: 119.63.196.28
IP: 119.63.196.30
IP: 119.63.196.31
IP: 119.63.196.32
IP: 119.63.196.38
IP: 119.63.196.40
IP: 119.63.196.42
IP: 119.63.196.43
IP: 119.63.196.44
IP: 119.63.196.45
IP: 119.63.196.47
IP: 119.63.196.53
IP: 119.63.196.54
IP: 119.63.196.57
IP: 119.63.196.58
IP: 119.63.196.60
IP: 119.63.196.61
IP: 119.63.196.76
IP: 119.63.196.78
IP: 119.63.196.80
IP: 119.63.196.82
IP: 119.63.196.88
IP: 119.63.196.89
IP: 119.63.196.9
IP: 119.63.196.93
IP: 119.63.196.94
IP: 119.63.196.96
IP: 123.125.67.164
IP: 123.125.71.100
IP: 123.125.71.101
IP: 123.125.71.102
IP: 123.125.71.103
IP: 123.125.71.104
IP: 123.125.71.105
IP: 123.125.71.106
IP: 123.125.71.107
IP: 123.125.71.108
IP: 123.125.71.109
IP: 123.125.71.110
IP: 123.125.71.111
IP: 123.125.71.112
IP: 123.125.71.113
IP: 123.125.71.114
IP: 123.125.71.115
IP: 123.125.71.116
IP: 123.125.71.117
IP: 123.125.71.12
IP: 123.125.71.13
IP: 123.125.71.14
IP: 123.125.71.15
IP: 123.125.71.16
IP: 123.125.71.17
IP: 123.125.71.18
IP: 123.125.71.19
IP: 123.125.71.20
IP: 123.125.71.21
IP: 123.125.71.22
IP: 123.125.71.23
IP: 123.125.71.24
IP: 123.125.71.25
IP: 123.125.71.26
IP: 123.125.71.27
IP: 123.125.71.28
IP: 123.125.71.29
IP: 123.125.71.30
IP: 123.125.71.31
IP: 123.125.71.32
IP: 123.125.71.33
IP: 123.125.71.34
IP: 123.125.71.35
IP: 123.125.71.36
IP: 123.125.71.38
IP: 123.125.71.39
IP: 123.125.71.40
IP: 123.125.71.41
IP: 123.125.71.42
IP: 123.125.71.43
IP: 123.125.71.44
IP: 123.125.71.45
IP: 123.125.71.46
IP: 123.125.71.47
IP: 123.125.71.48
IP: 123.125.71.49
IP: 123.125.71.50
IP: 123.125.71.51
IP: 123.125.71.52
IP: 123.125.71.53
IP: 123.125.71.54
IP: 123.125.71.55
IP: 123.125.71.56
IP: 123.125.71.57
IP: 123.125.71.58
IP: 123.125.71.59
IP: 123.125.71.60
IP: 123.125.71.69
IP: 123.125.71.70
IP: 123.125.71.71
IP: 123.125.71.72
IP: 123.125.71.73
IP: 123.125.71.74
IP: 123.125.71.75
IP: 123.125.71.76
IP: 123.125.71.77
IP: 123.125.71.78
IP: 123.125.71.79
IP: 123.125.71.80
IP: 123.125.71.81
IP: 123.125.71.82
IP: 123.125.71.83
IP: 123.125.71.84
IP: 123.125.71.85
IP: 123.125.71.86
IP: 123.125.71.87
IP: 123.125.71.88
IP: 123.125.71.89
IP: 123.125.71.90
IP: 123.125.71.91
IP: 123.125.71.92
IP: 123.125.71.94
IP: 123.125.71.95
IP: 123.125.71.96
IP: 123.125.71.97
IP: 123.125.71.98
IP: 123.125.71.99
IP: 125.39.78.168
IP: 125.39.78.171
IP: 125.39.78.173
IP: 125.39.78.174
IP: 125.39.78.177
IP: 125.39.78.179
IP: 125.39.78.181
IP: 125.39.78.183
IP: 125.39.78.185
IP: 125.39.78.187
IP: 125.39.78.188
IP: 125.39.78.189
IP: 125.90.93.141
IP: 173.236.136.101
IP: 180.76.5.100
IP: 180.76.5.101
IP: 180.76.5.103
IP: 180.76.5.107
IP: 180.76.5.110
IP: 180.76.5.111
IP: 180.76.5.113
IP: 180.76.5.136
IP: 180.76.5.137
IP: 180.76.5.138
IP: 180.76.5.139
IP: 180.76.5.140
IP: 180.76.5.141
IP: 180.76.5.142
IP: 180.76.5.143
IP: 180.76.5.144
IP: 180.76.5.145
IP: 180.76.5.146
IP: 180.76.5.147
IP: 180.76.5.148
IP: 180.76.5.149
IP: 180.76.5.150
IP: 180.76.5.151
IP: 180.76.5.153
IP: 180.76.5.154
IP: 180.76.5.155
IP: 180.76.5.156
IP: 180.76.5.157
IP: 180.76.5.158
IP: 180.76.5.159
IP: 180.76.5.160
IP: 180.76.5.161
IP: 180.76.5.162
IP: 180.76.5.163
IP: 180.76.5.164
IP: 180.76.5.165
IP: 180.76.5.166
IP: 180.76.5.167
IP: 180.76.5.168
IP: 180.76.5.169
IP: 180.76.5.170
IP: 180.76.5.171
IP: 180.76.5.172
IP: 180.76.5.173
IP: 180.76.5.175
IP: 180.76.5.176
IP: 180.76.5.177
IP: 180.76.5.178
IP: 180.76.5.179
IP: 180.76.5.180
IP: 180.76.5.181
IP: 180.76.5.182
IP: 180.76.5.183
IP: 180.76.5.184
IP: 180.76.5.185
IP: 180.76.5.186
IP: 180.76.5.187
IP: 180.76.5.188
IP: 180.76.5.189
IP: 180.76.5.190
IP: 180.76.5.191
IP: 180.76.5.192
IP: 180.76.5.193
IP: 180.76.5.194
IP: 180.76.5.195
IP: 180.76.5.196
IP: 180.76.5.197
IP: 180.76.5.48
IP: 180.76.5.49
IP: 180.76.5.50
IP: 180.76.5.51
IP: 180.76.5.52
IP: 180.76.5.53
IP: 180.76.5.54
IP: 180.76.5.55
IP: 180.76.5.56
IP: 180.76.5.57
IP: 180.76.5.58
IP: 180.76.5.59
IP: 180.76.5.60
IP: 180.76.5.61
IP: 180.76.5.62
IP: 180.76.5.63
IP: 180.76.5.64
IP: 180.76.5.65
IP: 180.76.5.66
IP: 180.76.5.67
IP: 180.76.5.87
IP: 180.76.5.88
IP: 180.76.5.89
IP: 180.76.5.90
IP: 180.76.5.91
IP: 180.76.5.92
IP: 180.76.5.93
IP: 180.76.5.94
IP: 180.76.5.95
IP: 180.76.5.96
IP: 180.76.5.97
IP: 180.76.5.98
IP: 180.76.5.99
IP: 180.76.6.20
IP: 180.76.6.21
IP: 180.76.6.211
IP: 180.76.6.212
IP: 180.76.6.213
IP: 180.76.6.222
IP: 180.76.6.223
IP: 180.76.6.224
IP: 180.76.6.225
IP: 180.76.6.227
IP: 180.76.6.230
IP: 180.76.6.231
IP: 180.76.6.232
IP: 180.76.6.233
IP: 180.76.6.26
IP: 180.76.6.28
IP: 180.76.6.29
IP: 180.76.6.35
IP: 180.76.6.36
IP: 180.76.6.37
IP: 204.45.133.74
IP: 220.181.108.165
IP: 220.181.108.166
IP: 220.181.108.167
IP: 220.181.108.168
IP: 220.181.108.169
IP: 220.181.108.170
IP: 220.181.108.171
IP: 220.181.108.172
IP: 220.181.108.173
IP: 220.181.108.174
IP: 220.181.108.175
IP: 220.181.108.176
IP: 220.181.108.177
IP: 220.181.108.178
IP: 220.181.108.179
IP: 220.181.108.180
IP: 220.181.108.181
IP: 220.181.108.182
IP: 220.181.108.183
IP: 220.181.108.184
IP: 220.181.108.185
IP: 220.181.108.186
IP: 220.181.108.187
IP: 220.181.108.79

USER AGENT: "Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html)"
IP: 222.76.212.176
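
Since some of these visitors may only be pretending to be Baidu, one common way to tell them apart is a reverse DNS lookup followed by a forward lookup. The Python sketch below is only an illustration: the hostname suffixes are an assumption based on Baidu's published guidance and should be checked before relying on them, and the function names are made up for this example.

```python
import socket

# Assumption: genuine Baiduspider hosts resolve to names under these domains.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def hostname_is_baidu(hostname):
    """Check that a hostname sits under one of the assumed Baidu domains."""
    host = hostname.rstrip(".").lower()
    return host.endswith(BAIDU_SUFFIXES)


def is_genuine_baiduspider(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then confirm the name maps back to the IP.

    The lookup functions can be swapped out, which keeps the logic testable offline.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname_is_baidu(hostname):
            return False
        return ip in forward_lookup(hostname)
    except OSError:
        # No PTR record or a failed lookup: treat the visitor as unverified.
        return False
```

A user agent string is trivial to fake, but a matching reverse and forward DNS pair is much harder to forge, which is why this check is usually more reliable than a static IP list.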

CAG CheechDogg
12-12-2014, 11:12 AM
Just block those ips and get it over with ...
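
A list that long is awkward to paste into .htaccess one address at a time. As a rough sketch, the Python below collapses runs of adjacent addresses into CIDR blocks and prints one deny line per block. It assumes Apache 2.2-style `Deny from` directives (Apache 2.4 uses `Require not ip` instead), and the helper names are invented for this example.

```python
import ipaddress


def collapse_to_networks(ip_strings):
    """Merge individual IPv4 addresses into the smallest set of exact CIDR blocks."""
    networks = [ipaddress.ip_network(s.strip()) for s in ip_strings if s.strip()]
    return list(ipaddress.collapse_addresses(networks))


def deny_lines(networks):
    """Render one Apache 2.2-style deny directive per network."""
    return [f"Deny from {net}" for net in networks]


# A small sample taken from the list above.
sample = ["180.76.5.48", "180.76.5.49", "180.76.5.50", "180.76.5.51", "222.76.212.176"]

for line in deny_lines(collapse_to_networks(sample)):
    print(line)
```

Note that this only merges addresses that are actually in the list; it never widens a range to cover addresses that were not reported, so gaps in the list stay unblocked.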

Gadget_Guy
12-12-2014, 02:26 PM
agreed that I can just deal with it via my firewall.

However, I am providing these reports to help the team with improving the mod.

I get so much from this site that I want to try and give back as much as I can.... so if it can help others who don't have firewalls etc then I feel good about trying to help.

I will continue to try things this way to give my experiences... once they get the mod doing what they want it to do as Ozzy alludes to, I will most likely just go the firewall route.

D.

ozzy47
12-12-2014, 10:51 PM
Well back to the drawing board. :(

ozzy47
12-13-2014, 09:41 AM
Ok here they are showing up in WOL, and they should not be.
I went back to the way the mod is now, and I disabled any other mod running on style_fetch so it is not a hook conflict.

Now I will go back to a recoded version, and try some different hook locations.

Simon Lloyd
12-14-2014, 08:08 AM
If anyone has a test vB4 install that is open to bot access, I'd appreciate having admin access to try a few things. I ask because I can only install an early version of vB4 that comes with my 3.8.7 licence, and I suspect it's the later versions that are having this anomaly.

ozzy47
12-14-2014, 12:56 PM
Simon, I can send you a PM on my site, which would contain the XML's for vB3 and vB4.

The new way has so far seemed to successfully block them, and they no longer show up in Paul's mod either.

I also have added the ability to change the name of the log file, to a different name, in case admins want to change it, as well as a counter of how many bots are blocked using the mod, that shows up in the site stats on the forum home. Now this is of course starting from the time you install the updated version, as there is no way to get a correct count from before. There is also a setting to select which usergroups can see the stats. :)

Max Taxable
12-14-2014, 04:48 PM
If anyone has a test vB4 install that is open to bot access, I'd appreciate having admin access to try a few things. I ask because I can only install an early version of vB4 that comes with my 3.8.7 licence, and I suspect it's the later versions that are having this anomaly.

It's going to be v4s that are heavily modded, having the issues. My 4.2.2 is lightly modded, and this hack works perfectly and always has.

Gadget_Guy
12-14-2014, 04:53 PM
I have 4.2.2 and I think I am heavily modded(ish)

Your call.

D.

Max Taxable
12-14-2014, 04:55 PM
I have 4.2.2 and I think I am heavily modded(ish)

Your call.

D.

I wouldn't know, I only presume since we've only seen this on heavily modded v4 installs. Never seen it at all on a v3 no matter how heavily modded.

"heavily" modded is kind of subjective though too.:D

I believe either there is a hook conflict or, too much time waiting in the "queue" so to speak, since the hook this mod uses is like 15th or something in order of fire.

Gadget_Guy
12-14-2014, 05:03 PM
If you need any details about my site or want me to test things... I am happy to do it.

Shoot me a PM and I will respond with my e-mail address.... happy to work with you all to improve the world. ;)


D.

Simon Lloyd
12-14-2014, 05:34 PM
Simon, I can send you a PM on my site, which would contain the XML's for vB3 and vB4.

The new way has so far seemed to successfully block them, and they no longer show up in Paul's mod either.

I also have added the ability to change the name of the log file, to a different name, in case admins want to change it, as well as a counter of how many bots are blocked using the mod, that shows up in the site stats on the forum home. Now this is of course starting from the time you install the updated version, as there is no way to get a correct count from before. There is also a setting to select which usergroups can see the stats. :)

Drop me a pm, I think I'm registered there :)

Max Taxable
12-14-2014, 06:03 PM
Just to make sure I clarify - there's nothing inherently wrong with this mod. I merely believe it is being interfered with, on some v4 installations.

ozzy47
12-14-2014, 06:23 PM
Drop me a pm, I think im registered there :)

Yes you are, and i sent you a PM there with the two XML's in it. :)

And yes, there is nothing wrong with the mod, the original code works just fine. I think it is a mod conflicting, or quite possibly a vBoption, not sure which but it's something. I just have it running in a much better hook location for vB4 now, as well as the added goodies. :)

Black Snow
12-15-2014, 12:06 PM
I have been trying to keep an eye on this thread to see if anyone can block baidu 100%. Has this been achieved yet? I see ozzy47 talking about a new way to block them but see no instructions. Am I missing something?

ozzy47
12-15-2014, 12:39 PM
Yes I have them blocked completely, now it's up to Simon to look at my changes and see if he wants to update the mod. :)

Simon Lloyd
12-15-2014, 03:34 PM
Will take a look tomorrow guys :)

Max Taxable
12-15-2014, 05:34 PM
I have been trying to keep an eye on this thread to see if anyone can block baidu 100%. Has this been achieved yet? I see ozzy47 talking about a new way to block them but see no instructions. Am I missing something?

I have always had it blocked completely, 100 percent on both v3.8.7 and v4. Never, ever had one get through after installing this Mod. That's the point. Some v4 installations are having trouble with Baidu and some other bots getting through, not all.

ozzy47
12-15-2014, 05:38 PM
Right, but the changes I made should stop it on all sites now. :)

Gadget_Guy
12-16-2014, 12:12 AM
Looking forward to Max's thumbs up!

Kudos to all the people collaborating on this thread!

It really is good to see people working together as a community for a common goal!

I for one could not have a successful site if it wasn't for people like yourselves!

D.

ozzy47
12-16-2014, 09:15 AM
Yeah in the past 72 hrs, I have successfully blocked Baidu over 5400 times. :)

Black Snow
12-17-2014, 08:18 AM
Can you share with us how you did this?

ozzy47
12-17-2014, 09:24 AM
Simon has the changes, when he has time he will go over it, and if he wants, release a update. :)

Simon Lloyd
12-17-2014, 04:28 PM
Ozzy47's xml has been uploaded and an update sent out, thanks Chris nice work! :)

ozzy47
12-17-2014, 05:01 PM
Not a problem sir, glad to help make this great mod just a bit better. :)

Gadget_Guy
12-17-2014, 05:12 PM
Installed and already working.

I will post a screen shot of WOL before and after in a little while... I want to have it in place for a little bit so it can start to really do its thing.

Home page stats in 2 min is reporting 43 blocked

D.

Gadget_Guy
12-17-2014, 05:21 PM
Working Brilliantly!

In less than 10min I went from 30+ Searchbots according to WOL to 14 and they are all the good ones (Google, Bing, Yahoo)

My server loads went from high 3 and 4 to 1.5 (ish)

Output file is showing Baidu being blocked among others. (most seem to be coming from Yandex)

You should all be commended on great work!

D.

ozzy47
12-17-2014, 05:25 PM
Excellent, glad to hear it has worked so well so far. :)

Simon Lloyd
12-17-2014, 05:35 PM
Version 3.1.1 uploaded, slight bug fix, stopped spiders/bots being counted when mod is off!

Max Taxable
12-17-2014, 05:42 PM
Nice to see this venerable, useful and very reliable plugin updated. Nice going guys.

Simon Lloyd
12-17-2014, 05:51 PM
Max if you downloaded the new version before I posted this message please download it again and overwrite the previous one, the first update had a bug :)

Max Taxable
12-17-2014, 05:55 PM
Max if you downloaded the new version before I posted this message please download it again and overwrite the previous one, the first update had a bug :)

I have not done so because the original version still works perfectly for me on all my boards.

Simon Lloyd
12-17-2014, 05:56 PM
That's fine, as I made a mistake and that post should have been for the VB3 version :)

Max Taxable
12-17-2014, 06:00 PM
Just very happy to see this little issue solved for folks. Drove Ozzy and I crazy for months, before I bought a existing v4 board and didn't have the same problem. That opened the door for investigation, as it were. Gave us the angle of attack.

Simon Lloyd
12-17-2014, 06:03 PM
I think the start-up sequence in earlier versions of vB4 is pretty much the same as in vB3, while in later versions the order was different. Ozzy tried the bootstrap hook and that did it! I wasn't aware of the issue until this thread resurrected itself with these posts.
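
The firing-order idea can be pictured with a toy model. The Python below is only a sketch: the hook list and its order are hypothetical (in particular "session_start" is an invented name standing in for whatever registers a visitor in Who's Online), and it does not reproduce vBulletin's real hook system.

```python
# Hypothetical hook order for one page request; real vBulletin order may differ.
HOOK_ORDER = ["bootstrap", "session_start", "style_fetch"]


def handle_request(user_agent, blocker_hook, banned=("baiduspider",)):
    """Walk the hooks in order and return the ones that ran before the request ended."""
    ran = []
    for hook in HOOK_ORDER:
        ran.append(hook)
        if hook == blocker_hook and any(b in user_agent.lower() for b in banned):
            break  # the blocker redirects the bot, so later hooks never fire
    return ran


bot = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

print(handle_request(bot, blocker_hook="style_fetch"))
print(handle_request(bot, blocker_hook="bootstrap"))
```

With the blocker attached to the late hook, everything before it still runs for the bot, including the step that would list it in Who's Online. Attached to the earliest hook, the bot is turned away before anything else sees it, which matches what was reported after the move to the bootstrap hook.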

Max Taxable
12-17-2014, 06:10 PM
I think the start-up sequence in earlier versions of vB4 is pretty much the same as in vB3, while in later versions the order was different. Ozzy tried the bootstrap hook and that did it! I wasn't aware of the issue until this thread resurrected itself with these posts.

It's definitely also got to do with how heavily modded the board is.... One of the early v4 boards I help admin had the same issue until all mods except this one, were turned off for a time. But that wasn't tried until just last week.

Originally we thought cron jobs were interfering, on Ozzy's board. Hence why only relatively few Baidu's were getting through. It wasn't until much later on we began to suspect hooks, and firing order, and etc.

Originally after working on it for a couple of weeks, we just blew it off for a time. Wrote it off as just a fluky deal. Until others started reporting the issue, then when I bought that v4 and didn't have the problem it really ramped up interest in solving this again. This goes back almost a year or maybe a little more, this mystery. For us at least.

Kudos to you Simon for having such a open mind about this - few devs would have listened especially on such a well established and widely used Mod as this.

Simon Lloyd
12-17-2014, 06:17 PM
.....Kudos to you Simon for having such a open mind about this - few devs would have listened especially on such a well established and widely used Mod as this.

I built this to aid the community so it's a no brainer! :)

Max Taxable
12-17-2014, 06:24 PM
Ozz and I never truly let go of a problem. It haunts us day and night until it is solved!

https://vborg.vbsupport.ru/external/2015/02/1.gif

Gadget_Guy
12-18-2014, 12:03 AM
Hate to be the one to have to say this.

My site just started throwing these errors:

Database error in vBulletin 4.2.2:

Invalid SQL:

UPDATE `bad_bots_blocked` SET ban_useragent = ban_useragent + 1;

MySQL Error : Table 'subaru_test.bad_bots_blocked' doesn't exist
Error Number : 1146
Request Date : Wednesday, December 17th 2014 @ 08:11:41 PM
Error Date : Wednesday, December 17th 2014 @ 08:11:41 PM
Script : http://www.toronto-subaru-club.com/forums/archive/index.php?how
Referrer :
IP Address : 5.255.253.29
Username :
Classname : vB_Database_MySQLi
MySQL Version :


D.

ozzy47
12-18-2014, 12:07 AM
Ok, was this happening in 3.1.0?

I.G.O.T.A.
12-18-2014, 12:09 AM
Hate to be the one to have to say this.

My site just started throwing these errors:

Database error in vBulletin 4.2.2:

Invalid SQL:

UPDATE `bad_bots_blocked` SET ban_useragent = ban_useragent + 1;

MySQL Error : Table 'subaru_test.bad_bots_blocked' doesn't exist
Error Number : 1146
Request Date : Wednesday, December 17th 2014 @ 08:11:41 PM
Error Date : Wednesday, December 17th 2014 @ 08:11:41 PM
Script : http://www.toronto-subaru-club.com/forums/archive/index.php?how
Referrer :
IP Address : 5.255.253.29
Username :
Classname : vB_Database_MySQLi
MySQL Version :


D.

Ditto same here. Never had this before.

ozzy47
12-18-2014, 12:15 AM
I see the issue, use this XML for now. I had one of my own updates to my own table in the XML, that should not have been in there.
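
For anyone wondering what error 1146 means here: the UPDATE ran against a table that was never created on that board. The actual fix in this thread was removing the stray query from the XML. The Python sketch below only reproduces the failure and shows one defensive pattern, using SQLite in place of MySQL so it is self-contained; the single-column schema is an assumption based on the query in the error report.

```python
import sqlite3


def increment_blocked(conn):
    """Create the counter table and its single row if missing, then add one to the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bad_bots_blocked (ban_useragent INTEGER NOT NULL)"
    )
    if conn.execute("SELECT COUNT(*) FROM bad_bots_blocked").fetchone()[0] == 0:
        conn.execute("INSERT INTO bad_bots_blocked (ban_useragent) VALUES (0)")
    conn.execute("UPDATE bad_bots_blocked SET ban_useragent = ban_useragent + 1")
    conn.commit()
    return conn.execute("SELECT ban_useragent FROM bad_bots_blocked").fetchone()[0]


conn = sqlite3.connect(":memory:")
try:
    # Same statement as in the error report, run before the table exists.
    conn.execute("UPDATE bad_bots_blocked SET ban_useragent = ban_useragent + 1")
except sqlite3.OperationalError as exc:
    print("update failed:", exc)

print(increment_blocked(conn))
```

Either the installer has to create the table, or the query has to be left out of installs that do not own it; an UPDATE on its own will never create the table for you.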

ozzy47
12-18-2014, 12:17 AM
Simon can grab that XML, and update the OP when he comes on. :)

Gadget_Guy
12-18-2014, 02:14 AM
So far so good with the new file.

D.

Simon Lloyd
12-18-2014, 05:12 AM
New xml uploaded. Ozzy can you remove the in post ones to save confusion for people looking back through the thread?

Black Snow
12-18-2014, 07:49 AM
Thanks for the update. I can't see any stats in the site statistics on the forum home.

EDIT: It works fine on my test forum. I am running vBOptomise and Total Online Time which both add stats below the "Welcome to our newest member" stats so unsure if this is causing an issue.

EDIT: I disabled vBOptomise and the stats popped up straight away.
Spiders/Bots Activity Blocked From Site: 6
Maybe worth a look into the conflict Simon?

Simon Lloyd
12-18-2014, 07:57 AM
Vboptimise is a caching mod, its possible that either you have to wait for the cache update or its using the same hook or template replacement. Ill check when i get home later.

Black Snow
12-18-2014, 08:05 AM
I cleared the cache to check if it would update but no luck.

aicel
12-18-2014, 08:59 AM
In vB 5.1 the new release doesn't work. I returned to the old release and it works fine.

ozzy47
12-18-2014, 09:24 AM
Thanks for the update. I can't see any stats in the site statistics on the forum home.

EDIT: It works fine on my test forum. I am running vBOptomise and Total Online Time which both add stats below the "Welcome to our newest member" stats so unsure if this is causing an issue.

EDIT: I disabled vBOptomise and the stats popped up straight away.

Maybe worth a look into the conflict Simon?

I am running vBoptimize and the stats show fine for me. :confused:

ozzy47
12-18-2014, 09:26 AM
In vB 5.1 the new release doesn't work. I returned to the old release and it works fine.

This mod is for vB4 not vB5, it will not work in that version at all, as the required hook system is not available.

ozzy47
12-18-2014, 09:27 AM
New xml uploaded. Ozzy can you remove the in post ones to save confusion for people looking back through the thread?

Done. :)

Black Snow
12-18-2014, 10:17 AM
I am running vBoptimize and the stats show fine for me. :confused:

If I disabled the mod then the bots blocked shows or if I choose not to show how many queries it has saved me on forumhome, then blocked shows

ozzy47
12-18-2014, 10:20 AM
That is weird, I have the queries showing on forum home, as well as the stats from this mod, and they both show.

What happens if you disable the Time Online mod?

Black Snow
12-18-2014, 10:52 AM
That is weird, I have the queries showing on forum home, as well as the stats from this mod, and they both show.

What happens if you disable the Time Online mod?

If I disable the time mod it doesn't show. It will ONLY show if I disable the vBOptomise mod OR turn off the option to display the saved queries on forumhome.

ozzy47
12-18-2014, 10:54 AM
I don't get it. :confused: If you want, PM me an admin account, and I can try and figure out what is different between your site and my site, since I have it showing, and you don't. I can look into it later today when I get home from work.

sarasotarepub
12-18-2014, 12:13 PM
Just installed the latest update, looks good!

Gadget_Guy
12-18-2014, 12:27 PM
It's working great here as well.

Question: Having a look at my "Guests" block on the home page.... I take it you updated things so that the blocked bots are no longer showing in there?

D.

Simon Lloyd
12-18-2014, 12:39 PM
Blocked bots shouldn't show anywhere except in stats...etc

Gadget_Guy
12-18-2014, 05:39 PM
Blocked bots shouldn't show anywhere except in stats...etc

Awesome!

D.

Gadget_Guy
12-18-2014, 05:41 PM
Based on this screen capture from today.... would you recommend I add anything additional to my list in the mod?

D.

ozzy47
12-18-2014, 08:12 PM
Based on this screen capture from today.... would you recommend I add anything additional to my list in the mod?
D.

Looks fine to me. :)

bzcomputers
12-18-2014, 08:23 PM
Was averaging about 400 Baidu bots a day still getting through before this last update. Now after 24 hours I'm seeing zero. Thanks!

ozzy47
12-18-2014, 08:24 PM
Excellent, glad to hear it is stopping them for you. :)

princesspepper
12-19-2014, 08:05 AM
Simon has the changes, when he has time he will go over it, and if he wants, release a update. :)

Ozzy47's xml has been uploaded and an update sent out, thanks Chris nice work! :)

Thanks guyz!

Black Snow
12-19-2014, 08:37 AM
I'm working on a way to log all the blocked bots. Not sure if it will record every instance or just a total number for each bot. Will share my findings when I'm done.

ozzy47
12-19-2014, 09:16 AM
Say what? This is already recording the blocked bots, in four different possible ways: in a log file, in a thread, in an email and in the bots blocked count.

Now writing each bot individually to the DB is not a good idea, as the table would grow too big, and be a giant PITA when you query it to get the info. Think about it, in just five days, only blocking baidu, I have a count of 9062.

Now let's take that and round it to 10,000 for seven days. Let's say I am going to block the 281 bots I have in my list. 281 x 10,000 x 52 = 146,120,000 entries a year. :eek:

Sure each bot is not going to hit me 10,000 times a week, but you can see, this can really grow fast. ;)

And my site is not that busy, now imagine on a busy site, the entries to the table in the DB could be 250,000,000 plus a year. Now run a query to retrieve the info from that table, and it would hurt the site every time it is run. :mad:

IMO, the best way to go is how it is now, just increase the count every time a bot is blocked, and get that number and display it. :)
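
The numbers above are easy to re-run. This short Python sketch just repeats the same arithmetic from the post (281 bots, 10,000 hits per bot per week, 52 weeks) and, for comparison, projects the observed Baidu-only figure of 9062 blocks in five days over a year; the function name is made up for the example.

```python
def rows_per_year(bots, hits_per_bot_per_week, weeks=52):
    """Rows a per-hit log table would gain in a year under the stated assumptions."""
    return bots * hits_per_bot_per_week * weeks


# Worst case from the post: every listed bot hits 10,000 times a week.
worst_case = rows_per_year(281, 10_000)

# Observed rate for Baidu alone: 9062 blocks in 5 days, projected over 365 days.
baidu_alone = 9062 * 365 // 5

print(f"worst case rows per year: {worst_case:,}")
print(f"Baidu alone rows per year: {baidu_alone:,}")
print("rows needed for a single running counter: 1")
```

Even the Baidu-only projection lands well into six figures of rows a year, while a single incrementing counter stays at one row however many bots are blocked.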

Simon Lloyd
12-19-2014, 09:45 AM
Yep, you'd have to list every bot in the database and then increment its count every time you block it. That would be the easier way, but you're actually talking about constantly querying the database with reads and writes. Definitely not a good idea; save your resources for your members.

ozzy47
12-19-2014, 09:48 AM
Yes, I know Simon, I thought about doing it that way also, but the payoff, is not worth the trouble. It would bloat the mod, and become a resource hog in no time. :)