vb.org Archive: vBulletin 4.x Add-ons
Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

TheWhite 09-12-2011 05:52 AM

Well guys, I have a decent forum with 185k members on a decent dedicated server and an acceptable Google AdSense income.

In the last few years I've tried to "ban" the vicious Baidu Spider, first by using robots.txt (useless because it doesn't obey it), then by using .htaccess, which was great and worked fine for a couple of years, but since a month ago it has somehow found a way of not obeying that either.

In the last week or so I started banning the IPs as a last resort, but these guys come out with a new one within a couple of days. So this morning I did a Google search and came to this WONDERFUL VB MOD. This coder deserves a medal and not only the MOD OF THE MONTH (so vote for him!!). After an hour, Baidu is NO MORE!! I hit 253GB of bandwidth this morning, which is over my average monthly rate by at least 75GB, not to mention the server slowdowns.

Baidu doesn't bring any good traffic (AdSense-wise); it only does harm by eating up your resources and slowing down your forum, which makes the Google crawlers (GOOD BOTS) take more time indexing your forum, which is bad.

I haven't tested the logging and email reporting yet but I will in the next few days.

Cheers!!

Simon Lloyd 09-12-2011 06:00 AM

Hi guys, thanks for your kind words :). The logging seems to work fine with the banning, but I suggest you only turn logging on at least 30 minutes after setting up the banning, as the text file can get huge quickly. I'm only a few more tests away from releasing the fix for "Thread Creation", so maybe later today when I have time, or first thing tomorrow ;)

TheWhite 09-12-2011 06:14 AM

Hi Simon, great work!!

I don't want to turn on the server logging because I don't like big files, but the thread creation (similar to the Multiple Login Detection one) would be very much appreciated.
The mod must remain lean and mean ;)

I'm using VB 3.6.12 so I hope you keep this mod compatible.

Regards

Simon Lloyd 09-12-2011 06:30 AM

As an extra thought, if you have a large .htaccess your forum will slow down, as every visitor has to be compared against it, or at least that's what I've been led to believe!

Simon Lloyd 09-12-2011 06:33 AM

Quote:

Originally Posted by TheWhite (Post 2244796)
Hi Simon, great work!!

I don't want to turn on the server logging because I don't like big files, but the thread creation (similar to the Multiple Login Detection one) would be very much appreciated.
The mod must remain lean and mean ;)

I'm using VB 3.6.12 so I hope you keep this mod compatible.

Regards

I hadn't tested this that far back but glad it works for you, it should remain compatible :).

Boofo 09-12-2011 02:02 PM

Quote:

Originally Posted by Simon Lloyd (Post 2244092)
I'll just add another attachment. I gave the list openly like that so folk could either copy it or pick out the ones they wanted to ban, plus it's no mystery as to what they're getting, but I'll definitely do that tomorrow :)

Any word on this yet?

Simon Lloyd 09-12-2011 02:37 PM

Boofo, I've been a little preoccupied, promise to do it in the next hour :)

TheWhite 09-12-2011 03:57 PM

When are you going to fix/add the thread notification?
Regards

ForceHSS 09-12-2011 04:02 PM

1 Attachment(s)
Still does not post. Here are my settings.

Simon Lloyd 09-12-2011 04:19 PM

Lol, the update you were notified of was just for the text file with bad bots being added. I'm still working on getting banning and thread creation working at the same time :)

Simon Lloyd 09-12-2011 04:24 PM

As a side note, you don't need the full useragent string anymore to ban them; you can now enter any part of the string, e.g.:
bai will result in Baidu being banned, just as will any other useragent string containing "bai".
Entering Mozilla will result in every useragent string containing that being banned.

So entering the bot name rather than the full useragent string will do: enter Baidu for that spider. Don't enter Ya as something to ban, as Yahoo will be banned just as Yandex will.
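
To illustrate the partial matching described above, here is a minimal Python sketch. It is not the mod's actual code (the mod itself is a vBulletin plugin); the function name `is_banned`, the case-insensitive comparison, and the sample Yahoo and Yandex strings are assumptions made for illustration only.

```python
def is_banned(user_agent, ban_list):
    """Return True if any non-empty ban-list entry occurs inside the User-Agent.

    Plain substring test; ignoring upper/lower case is an assumption here.
    """
    ua = user_agent.lower()
    return any(entry.lower() in ua for entry in ban_list if entry)


# The Baidu and Chrome strings are quoted in this thread; the Yahoo and
# Yandex strings are sample values used only for this illustration.
baidu_ua = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
            "+http://www.baidu.com/search/spider.html)")
yahoo_ua = ("Mozilla/5.0 (compatible; Yahoo! Slurp; "
            "http://help.yahoo.com/help/us/ysearch/slurp)")
yandex_ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
chrome_ua = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 "
             "(KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1")

print(is_banned(baidu_ua, ["bai"]))        # True: "bai" is part of "Baiduspider"
print(is_banned(yahoo_ua, ["Ya"]))         # True: "Ya" also catches Yahoo
print(is_banned(yandex_ua, ["Ya"]))        # True
print(is_banned(chrome_ua, ["Mozilla"]))   # True: an ordinary browser gets banned
print(is_banned(chrome_ua, ["Baidu"]))     # False
```

The shorter the entry, the more unrelated useragents it catches, which is why the bot name is a safer entry than two or three letters.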

ForceHSS 09-12-2011 06:54 PM

OK, I did not know, as it did not say this in your first post.

TheWhite 09-13-2011 04:23 PM

A little word of advice: don't get carried away with the bot banning, because it might affect your Google revenue in a negative way. Start with these for a while and control your traffic wisely.

Baidu
Yeti
Twiceler


Regards

Boofo 09-13-2011 04:45 PM

Quote:

Originally Posted by TheWhite (Post 2245418)
A little word of advice: don't get carried away with the bot banning, because it might affect your Google revenue in a negative way. Start with these for a while and control your traffic wisely.

Baidu
Yeti
Twiceler


Regards

Don't forget Yandex.

niteflyer32 09-13-2011 07:36 PM

Works great except for one thing with the Baidu spider: it knocked them off the forum for about 3 hours, and now I see the Baidu spider back again on the online list.

Update: Looks like the mod knocked Baidu off again; I haven't seen them for 1 hour. There were 4 Baidu spiders, all with different IPs, that showed up and then nothing. Is this the mod working, or should we not even see the bots that are on the blacklist? I don't see other spiders we've blacklisted showing back up yet.

ForceHSS 09-13-2011 10:55 PM

I have it like this in the list: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
and they never enter.

Simon Lloyd 09-14-2011 05:59 AM

Don't worry if you see them in the WOL (Who's Online). What you have to remember is that, although they are redirected the moment they arrive, vBulletin may already have registered their arrival in that split second and will show them in WOL. They only go missing from WOL once your WOL timeout has expired; I have mine set to 900 seconds (15 minutes), so I have to wait beyond that time before I see them disappear.

The fix I'm working on to make "Create thread" work with "Ban spiders in list" currently uses a hook that's compiled later, so the bots always show in WOL (until they get the message from the 301), because the minute they are redirected another of their bots tries to crawl your site and is instantly redirected too. I could release that version, but it would lead to loads of messages here saying "I can still see the spiders", so I won't release it until I can make the bots disappear after the WOL timeout.

niteflyer32 09-17-2011 01:07 PM

1 Attachment(s)
The Baidu bots do seem to disappear once they are found. We used to have scores of them on the WOL and now just one or two show up and they are gone shortly.

One issue we have is with a forum member who is getting blocked and sent to the Google redirect; he is using IE 8 and is on Cox.net.

Here is our block list, I'm not sure which one is blocking him. Any ideas? Thanks.

==================================

Simon Lloyd 09-17-2011 03:38 PM

What you'll need to do is get him to visit http://whatsmyuseragent.com/ and post you the entire contents of the box that says "Your UserAgent", then see if any part of that string matches any of your blocked bots. Post back and I'll take a look :)

P.S. Can you edit your last post, remove all those bots and add a text file attachment instead? That way folk don't have to scroll for ages to read the thread :)

niteflyer32 09-18-2011 06:28 AM

Sorry about the long post, got it changed to a text file.

Here is the forum member's response from the "Your UserAgent" link. I'm not sure which one from our list is blocking him, possibly one of the 3 Mozillas on the list?

Thanks for the help.

Your User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; Creative AutoUpdate v1.40.01)

Simon Lloyd 09-18-2011 08:14 AM

Firstly, it looks like your client's UA has been altered in the past. I have two suggestions you can try right now: first, change DA in your list to something like DA1 and ask him to try; if that doesn't do it, try altering Net Extractor to NET1 Extractor. These are the only entries I can see that may be a problem. If that doesn't work I'll look at it in more depth, but in the meantime your user can perform a UA switch by following the instructions here: http://whatsmyuseragent.com/SwitchingUserAgents.asp

Simon Lloyd 09-18-2011 03:36 PM

I've got some news for you: I tested your client's UA against my forum using your spider list and was able to gain access to my site, so it's not this mod keeping him/her out.

You can check for yourself here: http://www.botsvsbrowsers.com/SimulateUserAgent.asp. Just enter his entire useragent in the test window and your site below, hit Go, enter the Captcha that pops up in the results window, hit Go again and you'll see your site :)

So from that, this mod's fine and not causing the problem.

EDIT: BTW, do you know that every one of the bots you supplied in that list has a space before it? The mod will look for a space before each of those words if that's how it appears in your ban spider list.
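
As a rough illustration of why a stray leading space matters with substring matching, here is a small Python sketch. It assumes the ban list is pasted one entry per line; `clean_ban_list` and `ExampleBot` are hypothetical names used only here and are not part of the mod.

```python
def clean_ban_list(raw_text):
    """Split a pasted ban list into entries, dropping stray spaces and blank lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


# A list pasted with a space in front of each entry (one entry per line assumed).
raw = " Baidu\n Yeti\n\n Twiceler \n"
print(clean_ban_list(raw))  # ['Baidu', 'Yeti', 'Twiceler']

# With a literal leading space, an entry only matches where a space precedes
# the name, so a useragent that starts with the bot name slips through.
print(" ExampleBot" in "ExampleBot/1.0")  # False
print("ExampleBot" in "ExampleBot/1.0")   # True
```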

niteflyer32 09-18-2011 06:38 PM

We changed DA on the UA list and now the member can access the forum????

I'm not seeing the space before the UA in the list I posted. The list appears to be working as it has knocked many of the bots off our WOL list.

Simon Lloyd 09-18-2011 06:51 PM

Well, if it's working for you all well and good :)

At least you now have the tools to test any future queries with users having problems with access because of their UA.
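
For anyone who would rather script this kind of check than use a simulator site, a minimal Python sketch using the standard library follows. It is only an illustration: `build_request` and `fetch_status` are hypothetical helper names, and `https://forum.example/` is a placeholder URL to be replaced with your own forum address.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that presents the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(request, timeout=10):
    """Send the request and return (HTTP status, final URL after any redirects)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report them the same way.
        return err.code, err.geturl()


# Example (placeholder URL; nothing is sent until fetch_status is called):
# req = build_request("https://forum.example/", "Mozilla/5.0 (compatible; Baiduspider/2.0)")
# print(fetch_status(req))
```

Because `urlopen` follows redirects, a final URL that differs from the one requested suggests the useragent was redirected away from the forum.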

Simon Lloyd 09-18-2011 07:02 PM

Hmmm seems to be a problem with page 5 of thread....this is a test post!

Edit: thread back to normal now ;)

niteflyer32 09-18-2011 07:16 PM

Thank you for your time and help. This mod rocks.

Simon Lloyd 09-18-2011 08:33 PM

Since you like it and it works for you, can you mark it as installed, please?

niteflyer32 09-18-2011 09:18 PM

Oops, I thought I had already marked that. Done. And MOTM done too.

We have 2 other users saying they are getting blocked now. I think I may have gone overboard on blocking UAs.

Your User Agent is:
Mozilla/4.0 (compatible; MSIE 8.0; AOL 9.0; AOLBuild 4327.5204; Windows NT 5.1; Trident/4.0; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; customie8)

and

Your User Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1

Also, I selected the option to have a log created and written to output.txt in the forums root but I'm not seeing that log txt file. Any idea why that log is not appearing?

Simon Lloyd 09-19-2011 06:10 AM

Thanks for that. Have you used the tools (links) I gave you? You can test their useragent against your site to see if it is the mod blocking them. What you have to remember is that when you enter something like DA, the mod looks for anything that contains just that. Your other member's issue was sorted because you changed DA to DA1; his useragent contained "Creative AutoUpdate", which includes the letters "da". When it comes to banning UAs using just a couple of letters or suchlike, you are best off entering either a longer string or, better still, the entire useragent.

I only tested the logging and create thread options on their own, that is to say without the banning option turned on. It seems there is a small glitch that I'm working on: when banning is on, the spiders are banned while your forum style is being compiled for them, but the notifications are created after the forum has been shown to them completely. So the spiders are dealt with well before it ever gets to the notification stage; you're not seeing notifications because there's nothing to notify.
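
To find out which entry in a block list is catching a member, a quick Python sketch like the one below can help. It is a guess at the matching behaviour rather than the mod's real code: `matching_entries` is a hypothetical helper, case-insensitive substring matching is assumed, and the short ban lists only use entries mentioned in this thread.

```python
def matching_entries(user_agent, ban_list):
    """Return every ban-list entry found inside the User-Agent (case ignored)."""
    ua = user_agent.lower()
    return [entry for entry in ban_list if entry and entry.lower() in ua]


# The two member useragents posted earlier in this thread.
cox_member_ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; "
                 "Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; "
                 ".NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; "
                 "Creative AutoUpdate v1.40.01)")
aol_member_ua = ("Mozilla/4.0 (compatible; MSIE 8.0; AOL 9.0; AOLBuild 4327.5204; "
                 "Windows NT 5.1; Trident/4.0; GTB6.4; .NET CLR 1.1.4322; "
                 ".NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; "
                 "customie8)")

print(matching_entries(cox_member_ua, ["Baidu", "DA", "Net Extractor"]))   # ['DA']
print(matching_entries(cox_member_ua, ["Baidu", "DA1", "Net Extractor"]))  # []
print(matching_entries(aol_member_ua, ["Baidu", "custo"]))                 # ['custo']
```

Under these assumptions, "DA" is caught by the "da" inside "AutoUpdate", and renaming it to DA1 stops the false match.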

I am working on getting them both to work together and will post here when i manage that :)

Simon Lloyd 09-19-2011 06:27 AM

Your first UA is being blocked because you have "custo" in your block list!

Simon Lloyd 09-19-2011 07:05 AM

Your second UA can access your site (I just used the UA simulator from the link I posted above).

Simon Lloyd 09-19-2011 07:14 AM

post just to get to page 5 as this thread appears to be faulty!

niteflyer32 09-19-2011 07:44 AM

Thanks for the help on the UA list.

When I tested the 2nd UA I listed above with your UA test website http://www.botsvsbrowsers.com/SimulateUserAgent.asp I see our forum, but I also see a "Request Status: 500 : Internal Server Error" at the top. Is that caused by our UA block list, or is that a server setting issue I need to talk to our webhost about? I also get a "500 : Internal Server Error" when trying to verify we have our Comscore analytics code (like Google Analytics) inserted on the forum.

What is the status of the spiders Majestic's MJ12bot Spider, Speedy Spider and Voila Spider? I've searched for them and get info that is dated or conflicting on whether they are good or bad. They are hitting our site pretty often.

The page 5 weirdness of this thread you're seeing appears to be okay on my end.

BadgerDog 09-19-2011 11:51 AM

1 Attachment(s)
Installed with thanks .. :)

Unfortunately, I'm still getting Baidu spiders, even 4 days after installing this.

Attached is screen pic. What am I doing wrong?

Regards,
Doug

Simon Lloyd 09-19-2011 12:59 PM

For now there is an issue when banning bots with one of the notifications enabled (either create thread or the Output.txt file on the server), so if you have those enabled please disable them. To ban bots for now you must have only the mod activated and "ban bots in list" selected (and of course the bots that you want to ban). Do that and all will be good :)

You will receive notice when I solve the "working together" issue.

Simon Lloyd 09-19-2011 01:04 PM

@niteflyer32, try turning the mod off, then try the UA at the UA simulation site and see what it returns. The mod shouldn't cause a 500 error and then show you the site; if you don't have access you simply get redirected, so you wouldn't see your site.

I haven't researched the spiders. I built this mod to cut down on my server load as Baidu were hammering it; I have between 200 and 350 Baidu at my site at any one time, and because they index so vigorously their demand on the server is huge (although while I'm working on the issues with the notification I am allowing all bots at the moment).

BadgerDog 09-19-2011 01:48 PM

Quote:

Originally Posted by Simon Lloyd (Post 2247842)
For now there is an issue when banning bots with one of the notifications enabled (either create thread or the Output.txt file on the server), so if you have those enabled please disable them. To ban bots for now you must have only the mod activated and "ban bots in list" selected (and of course the bots that you want to ban). Do that and all will be good :)

You will receive notice when I solve the "working together" issue.

Thanks .. :)

I've turned OFF logging (email notifications were already off) and I'll monitor it now ...

Regards,
Doug

BadgerDog 09-22-2011 10:09 AM

Quote:

Originally Posted by BadgerDog (Post 2247868)
Thanks .. :)

I've turned OFF logging (email notifications were already off) and I'll monitor it now ...

Regards,
Doug

That doesn't work either.... :confused:

Still getting lots of Baidu and Yandex spiders ...

I'm not sure this mod is working at all, regardless of any options set, or turned ON or OFF ... ;)

Regards,
Doug

smirkley 09-22-2011 04:08 PM

Still testing but I can say so far,... NICE !!

Thank you.

I am only banning 4 useragents at the moment, but I wish to ask: is there a condensed version of 'must ban' useragents from that list, as compared to the whole list? I don't want to go crazy and ban too much, especially if it hurts my membership or AdSense revenue.

So far I ban:

Baidu
Yeti
Twiceler
Yandex

Simon Lloyd 09-22-2011 04:52 PM

Quote:

Originally Posted by BadgerDog (Post 2248891)
That doesn't work either.... :confused:

Still getting lots of Baidu and Yandex spiders ...

I'm not sure this mod is working at all, regardless of any options set, or turned ON or OFF ... ;)

Regards,
Doug

If you want to PM me admin access details and the URL, I'll take a look :)


All times are GMT.