Baidu has kissed my "gluteus maximus" for almost 2 years and change if not more ... So has Yandex and a handful of others as well ... I must have a "magic" forum :)
|
I am reworking a couple of things and then need to test further. I can then share my findings with Simon. :)
|
I would be happy to test on my site if it helps the community.
D. |
If this helps anyone.... this is a list of what I am seeing in terms of spiders on my site with this installed.
Bing Spiders (6), Google Favicon Spiders (9), Proximic Spiders (135), Baidu Spiders (175), WinHTTP Spiders (12), Facebook Spiders (20), Google AdSense Spiders (7), Magpie Spiders (9), linkdexbot/2.0 Spiders (7), AhrefsBot Spiders (14), Coccoc Spiders (2), Google AppEngine Spiders (6), Google Spiders (40), Sucuri Spiders (3), Twitterbot Spiders (4), Google FeedFetcher Spiders (3), Apple RSS Spiders (1), WordPress.com mShots Spiders (1), Google Web Preview Spiders (3), Grapeshot Spiders (2), James BOT WebCrawler Spiders (5), Netseer crawler/2.0 Spiders (2), Google Images Spiders (3), Galaxy Spiders (2), Feedly Spiders (2), DotBot Spiders (1), Yahoo! Slurp Spiders (1), 360Spider Spiders (4), Netcraft Web Server Survey Spiders (1), NerdyBot Spiders (2), Exabot Spiders (1), Integrity Bot Spiders (1), ContextAd Bot Spiders (2), Twitturls.com (Python-urllib) Spiders (1) I am happy to supply any information that you may find useful to assist in the work you are doing. D. |
I need a snapshot of your settings for the mod, as there is no way all of those would get past the mod if they were entered in it!
|
1 Attachment(s)
This is a snapshot of the spiders that are showing up in Who's Online:
https://vborg.vbsupport.ru/external/2014/12/30.jpg
What exactly do you need a snapshot of in the settings, Simon?
This is my list of spiders I have banned with your mod:
almaden Anarchie Artabus ASPSeek attach autoemailspider BackWeb Baidu Bandit BatchFTP BlackWidow Bot\mailto:craftbot@yahoo.com Buddy bumblebee CherryPicker ChinaClaw CICC Collector Copier Copyscape Crescent DIIbot DISCo DISCo\Pump dotbot Download\Demon Download\Wonder Downloader Drip DSurf15a eCatch EasyDL/2.99 EirGrabber EmailCollector EmailSiphon EmailWolf Express\WebPictures ExtractorPro EyeNetIE FileHound FlashGet FrontPage GetRight GetSmart GetWeb! gigabaz GNIP Go\!Zilla Go!Zilla Go-Ahead-Got-It gotit Grabber GrabNet Grafula grub-client HMView HTTrack httpdown .*httrack.* ia_archiver Ichiro Image\Stripper Image\Sucker Indy*Library Indy\Library InterGET InternetLinkagent Internet\Ninja InternetSeer.com Iria JBH*agent JetCar JOC\Web\Spider JustView larbin LeechFTP LexiBot lftp Link*Sleuth likse //Link LinkWalker Mag-Net Magnet Magpie magpie Mass\Downloader Memo Microsoft.URL MIDown\tool Mirror Mister\PiX Mozilla.*Indy Mozilla.*NEWT Mozilla*MSIECrawler MS\FrontPage* MSFrontPage MSIECrawler MSProxy Navroad NearSite NetAnts NetMechanic NetSpider Net\Vampire NetZIP NICErsPRO Ninja Nutch Octopus Offline\Explorer Offline\Navigator omgili Openfind PageGrabber Papa\Foto PaperLiBot pavuk pcBrowser Ping PingALink Pockey psbot Pump QRVA RealDownload Reaper Recorder ReGet Scooter Seeker Siphon sitecheck.internetseer.com SiteSnagger SlySearch SmartDownload Snake sogou Soso SpaceBison speedy Spinn3r sproose Stripper Sucker SuperBot SuperHTTP Surfbot Szukacz tAkeOut Teleport\Pro URLSpiderPro Vacuum VoidEYE Web\Image\Collector Web\Sucker WebAuto [Ww]eb[Bb]andit webcollage WebCopier Web\Downloader WebEMailExtrac.* WebFetch WebGo\IS WebHook WebLeacher WebMiner WebMirror WebReaper WebSauger Website Website\eXtractor Website\Quester Webster WebStripper WebWhacker WebZIP Wget Whacker Widow WWWOFFLE x-Tractor
Xaldon\WebSpider Xenu Yandex Yeti YOUDAOBOT Zeus.*Webster Zeus baiduspider beta.statsit.com statsit SiteIntel Yandex GomezAgent FunWebProducts Nesotebot DCPbot AOL Advertising R&D DataCha0s aiHitBot Apache-HttpClient Zend_Http_Client ReverseGet XXX bot Content vBSEO spbot OffByOne thyroidbuzz AcoonBot coccoc xpymep proxyproxy2884 AppEngine start.exe Semiocast HTTP client Firefox/3.6.23 TurnitinBot curl SwpLc/1.6 GrepNetstat.com news bot AskTbPTV checks panopta App3le PhantomJS AlwaysOnline SISTRIX proximic CRAWL-E/0.6.4 WebMoney Maxthon HTMLParser oBot UnisterBot ERACrawler Butterfly Topsy Butterfly Topsy Crawler Ezooms Deepnet Alexa Bitlybot Seznam Fulltext Sunrise Communications AG crawl Crawl MJ12bot Bimbot Snapbot thunderstone Thunderstone grub-client Bing MSN OOZBOT Wayback Machine Crowsnest Spider FlipboardProxy Feedly |
1 Attachment(s)
Here is my stuff:
|
Hi Gadget Guy, remove the second picture as it has your email address in it. I see the settings are OK. Now can you just copy the list as you have it (copy it straight out of the textbox in the mod), stick it in a WordPad document, zip it and attach it here so I can check it, please.
|
That is what the spiders.txt file I attached is.
D. |
Quote:
Simon or Lloyd or Crawl or Spider.....etc
The same goes for: Lloyd Crawl or Everything or Simon Lloyd....etc (case isn't important).
What the mod does is look for the string you entered, so if you want to ban the spider I mentioned above, just Simon will do it. However, let's say you have a friendly bot called Simon Lloyd Crawled Everything Everywherespider. To ban the first bot and allow the other, you'd need to enter a string that is unique to the first one, so in this case it could be: Simon Lloyd Crawl Everything
This way it won't pick up the "Crawl" in the friendly bot's name, as it's looking for the exact string you entered. Hope that helps. |
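The matching rule described in the post above can be sketched roughly as follows. This is only a minimal Python illustration, not the mod's actual code: the function name `matches_ban_list`, the two example user agent strings, and the plain case-insensitive substring test are all assumptions based on that description. Some lists in this thread contain regex-like entries, which this sketch does not try to handle.

```python
def matches_ban_list(user_agent, ban_list):
    """Return the first ban-list entry found inside the user agent, or None.

    Assumption: each entry is compared as a plain, case-insensitive
    substring, as the post above describes. Empty entries are skipped,
    because an empty string would match every user agent.
    """
    ua = user_agent.lower()
    for entry in ban_list:
        if entry and entry.lower() in ua:
            return entry
    return None


# Hypothetical user agents built from the example names in the post.
bad_bot = "Simon Lloyd Crawl Everything Spider"
friendly_bot = "Simon Lloyd Crawled Everything Everywherespider"

# "Simon" alone appears in both names, so it would catch both bots.
print(matches_ban_list(bad_bot, ["Simon"]))
print(matches_ban_list(friendly_bot, ["Simon"]))

# The longer string only fits the first bot ("Crawl " versus "Crawled").
print(matches_ban_list(bad_bot, ["simon lloyd crawl everything"]))
print(matches_ban_list(friendly_bot, ["simon lloyd crawl everything"]))
```

Under these assumptions, the shorter the entry, the more user agents it catches, which is why a string unique to the unwanted bot is the safer choice.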
Quote:
This COULD be the issue I have with mine then. When you look at my txt file you will see that I tried putting in multiple variations. That could be negating the effectiveness. Maybe the mod could include an updated list that we can copy/paste, so that people like me who are clueless don't do the wrong thing, keeping in mind that we would want the "good" spiders to get through, like Google, Bing, and the legit ones that are important to SEO, AdSense, and other things like that. I hope you and Ozzy are discussing the hook thing as well... he seemed to think that may be important with my 4.2.2 site. I will say that when I had this mod in place for my 3.8.x site it worked perfectly, and I didn't get hit hard till I upgraded to 4.2.2. I saw my server loads go way up.... D. |
There is already a list included, and many more throughout this thread. I've also explained the above before. I wouldn't update the list of spiders to ban because, as I've said probably over a dozen times, what or who you ban is a personal thing.
As I've said before, if you want to PM me access, I'll take a look. |
1 Attachment(s)
Here you go.
|
1 Attachment(s)
Right, I've been through your list. I won't comment on the bots you are banning as that's your preference. What I have done is ordered the list, checked for anything that shouldn't be there and removed some bots, as they will be taken care of by other entries you have.
What I will say is, if you are NOT using Paul M's "Who has visited" mod and you are still seeing any of the bots on your list appear in WOL, then you need to check that spider's UserAgent to see if the name or text you have in your list actually appears in the UA. |
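The kind of clean-up described above (ordering the list and dropping entries that other entries already cover) could look something like this. It is a hedged sketch only: `prune_ban_list` is a hypothetical helper, and it assumes the mod does a case-insensitive substring match, so that an entry like "baiduspider" adds nothing once "Baidu" is on the list. The sample entries are taken from the list posted earlier in the thread.

```python
def prune_ban_list(entries):
    """Sort a ban list and drop entries already covered by a shorter one.

    Assumption: matching is a case-insensitive substring test, so any
    entry that contains another entry is redundant.
    """
    # Drop blanks and case-insensitive duplicates; keep the first spelling seen.
    unique = {}
    for entry in entries:
        cleaned = entry.strip()
        if cleaned:
            unique.setdefault(cleaned.lower(), cleaned)

    kept = []
    # Shortest first, so broader entries are kept before narrower ones.
    for key in sorted(unique, key=len):
        if not any(k.lower() in key for k in kept):
            kept.append(unique[key])
    return sorted(kept, key=str.lower)


sample = ["Baidu", "baiduspider", "Magpie", "magpie", "Website",
          "Website\\eXtractor", "Yandex", "Yandex", "Wget "]
print(prune_ban_list(sample))
# prints ['Baidu', 'Magpie', 'Website', 'Wget', 'Yandex']
```

Under the substring assumption, pruning does not change which bots get caught; it only makes the list shorter and easier to check against the user agents seen in WOL.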
Do you mean this one:
https://vborg.vbsupport.ru/showthread.php?t=232636
Then, yes... I am using it. D. |
Quote:
In regards to your comment about "my list".... I have no idea, to be honest. I put those in there based on what I was seeing in my WOL, to try and block them. I am sure I was way off base and incorrect in doing so. So.... in light of this... if you want to provide a "proper" list... I am happy to take your guidance. I don't know the first thing about any of this stuff and am looking to experts like yourself to assist. D. |
1 Attachment(s)
Quote:
I for one don't care for any search engine except Google and Yahoo ... I have completely blocked most foreign search engines, and Facebook as well ... as you can see from the screenshot, I only have like 10 different spiders/bots that even crawl my site, and that is just how I want it ... So put together a list of the engines and spiders that you know are giving you hell, then you can add those to the list that I have or that Simon and Ozzy have, give it one last look and just remove the ones you want crawling your site .... I added my list in a txt file if you want to try mine out and see how it works for you ... Here it is as well ... |
As I've said many times in this thread, the fact that they are showing up in Paul M's mod doesn't mean they are getting through. His mod logs them as they visit, mine redirects them at the same time. Both mods are working fine!
Just copy CAG CheechDogg's list and prune as needed. There is no "proper" list, it's all a personal choice! |
I actually use this mod by the Great Boobo (RIP) to also display the spiders in the Who's Online list, and it is accurate as hell !!!!
https://vborg.vbsupport.ru/showthread.php?t=243460 |
Guys... I can't thank you enough for all the help and advice.
I feel like a huge blindfold was lifted from my eyes with the last couple of posts. I tried reading all 45 pages of this thread to really understand the mod... but I missed the point about the detections being just that. I was scratching my head when WOL didn't really match up with the online.php list.
I want to be crawled... but only by the right spiders..... the ones that really count for a North American audience and people who use the "traditional" engines like Yahoo, Google, Bing etc. I certainly don't want to jeopardize my Google AdSense ads and things like that. I want my site to be found.... and be a source when people search for information pertaining to what we do. D. |
I took a quick look at your list, it looks like you are only blocking bad bots, so it should be ok. :)
|
On my own vB 4, the only time I ever see Baidu in either Paul's mod or in WoL, is when I turn off this mod.
Paul's mod fires before this one, that is true. But that is not the reason some people get Baidu there. Baidu is also in WoL, if you look, on boards that are showing Baidu in Paul's mod. |
Well I think I might have taken care of the issues, as well as added a few more things to the mod, but I need to test it a bit more to be totally sure.
|
WOL seems to be pretty clean now....
What is this spider? I see a lot of entries for it on WOL:
Proximic Spider 54.175.33.76 Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)
Edit... This one as well:
Magpie Spider 94.228.34.203 magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)
D. |
Harmless crawlers but if you want them blocked you can put them on the list.
|
Proximic Spider
http://www.proximic.com/spider.html
Magpie Spider
http://www.brandwatch.com/magpie-crawler/ |
So if I want to add them to my list, what do I enter?
D. |
Quote:
Magpie |
Make sure you leave no trailing spaces and no space at the start.
|
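To see why stray spaces matter, here is a small Python check against the two user agent strings posted earlier in this thread. It is only a sketch: `appears_in_user_agent` is a hypothetical helper, and it assumes the mod compares each entry exactly as typed, as a case-insensitive substring of the user agent.

```python
def appears_in_user_agent(entry, user_agent):
    """Case-insensitive substring test; the entry is used exactly as typed."""
    return entry.lower() in user_agent.lower()


# User agent strings as posted earlier in this thread.
proximic_ua = "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
magpie_ua = "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"

checks = [
    ("Magpie", magpie_ua),       # clean entry
    ("Magpie ", magpie_ua),      # trailing space: the UA has "magpie-", not "magpie "
    ("proximic", proximic_ua),   # clean entry
    ("proximic ", proximic_ua),  # trailing space: the UA has "proximic;", not "proximic "
]
for entry, ua in checks:
    print(repr(entry), "->", appears_in_user_agent(entry, ua))
```

With these inputs the clean entries are found in the user agent and the ones with a trailing space are not, so a single stray space would be enough to let the bot through under this assumption.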
1 Attachment(s)
Okay.. now I am ready to bang my head on the wall.
Baidu is back. Just saw it in WOL (I grabbed two to show) D. |
If you see them again please take a snapshot showing their useragents.
|
1 Attachment(s)
Here are snapshots.
I included magpie which I added yesterday. |