vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 4.x Add-ons (https://vborg.vbsupport.ru/forumdisplay.php?f=245)
-   -   Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

vb50kgpoo 11-12-2012 01:28 PM

Quote:

Originally Posted by ForceHSS (Post 2380510)
What is their full host name?

That is it!
They go by those generic names only, nothing else.

Simon Lloyd 11-12-2012 04:31 PM

Quote:

Originally Posted by vb50kgpoo (Post 2380473)
Hi Simon
Yours is a great product. I made the mistake of uninstalling it in order to use AbyssGuard, which is plagued with problems. I have now reinstalled Ban Spiders By User Agent. One question: are there any ramifications in banning \wbot[\/\-] with your mod? I ask as putting \wbot[\/\-] directly into my htaccess banning mechanism causes issues.
Regards / RSVP

In my mod you are banning any useragent that contains any occurrence of one of the strings in your list. I very much doubt that \wbot[\/\-] is found in any useragent, as that looks like a regular expression; in my mod simply wbot will do, if that's in their useragent.
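
Editor's sketch of the difference described above, in Python rather than the mod's own code. It assumes the mod does a plain substring test and that the test ignores case (the thread does not confirm the latter); `literal_match` and `regex_match` are hypothetical helper names and the user agent is only an illustrative sample. Note that in a regular expression `\w` stands for any word character, so the nearest literal equivalent is something like `bot/` rather than `wbot`.

```python
import re


def literal_match(user_agent: str, needle: str) -> bool:
    """Plain substring test (assumed case-insensitive for this sketch)."""
    return needle.lower() in user_agent.lower()


def regex_match(user_agent: str, pattern: str) -> bool:
    """Treat the same text as a regular expression instead."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Illustrative sample user agent.
ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
pattern = r"\wbot[\/\-]"

print(regex_match(ua, pattern))    # True: "xBot/" satisfies the expression
print(literal_match(ua, pattern))  # False: no UA contains those literal characters
print(literal_match(ua, "bot/"))   # True: a plain string that is really in the UA
```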

Quote:

Originally Posted by vb50kgpoo (Post 2380504)
Also.......

Does anyone know what these bots are;

Robot ID   Hits   Bandwidth   Last visit       Hits on robots.txt
robot      772    8576221     20121111093454   0
crawl      699    9556953     20121108085243   0
spider     5      114750      20121106065956   0

Bad bots using generic names?

Those are just stats from cPanel AWStats and mean nothing other than that spiders were identified with those words in their identifier.

Quote:

Originally Posted by vb50kgpoo (Post 2380545)
That is it!
They go by those generic names only, nothing else.

Those are not their useragents. Read some of the links that I took the time and trouble to post in the mod description on how to find the useragent.

bzcomputers 11-20-2012 03:09 PM

Quote:

Originally Posted by TheSupportForum (Post 2379008)
Won't MSIE 1 block MSIE Beta 10?

Which would leave MSIE 8 and 9 visitors only.

It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems.

https://vborg.vbsupport.ru/external/2012/11/15.jpg


Edit: If you were wondering what the IP Address 108.2.106.107 is, it is Verizon's Search Engine (www.verizon.net), definitely not something you would want to block.

Max Taxable 11-20-2012 03:35 PM

Quote:

Originally Posted by bzcomputers (Post 2382955)
It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems

Using the time-based Mod in my signature, I often see user agents with early IE, such as 3, 4 and 5, but you're right: I haven't seen any IE 1 or 2 in so long that there's no doubt it would be good to remove MSIE 1 from the ban list.

Thanks for the information!

Simon Lloyd 11-20-2012 05:57 PM

Quote:

Originally Posted by bzcomputers (Post 2382955)
It won't block MSIE Beta 10, but it will block MSIE 10. Best to remove MSIE 1 from your list now. I haven't seen any references in the output.txt file of anything prior to MSIE 5 so removing MSIE 1 probably won't cause any problems.

https://vborg.vbsupport.ru/external/2012/11/15.jpg


Edit: If you were wondering what the IP Address 108.2.106.107 is, it is Verizon's Search Engine (www.verizon.net), definitely not something you would want to block.

It will block MSIE Beta 10 if it appears like that in the useragent. This mod will block ALL instances where the exact string in your list is found in the UA.
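
Editor's sketch of why a short entry such as MSIE 1 catches more than intended, assuming the plain "string found in the UA" rule described in this thread. The user agents are illustrative samples in the usual Internet Explorer format, and the variable names are made up for the example.

```python
# Ban-list entry under discussion.
ban_entry = "MSIE 1"

# Illustrative Internet Explorer user agents.
agents = {
    "IE 8": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
    "IE 9": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "IE 10": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
}

# "MSIE 1" is the start of "MSIE 10.0", so IE 10 visitors are caught as well.
hits = [name for name, ua in agents.items() if ban_entry in ua]
print(hits)  # ['IE 10']
```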

My Hattiesburg 12-15-2012 09:45 PM

Okay, I've installed this and it seems to be working fine, but I have some questions.

At first the Baidu spider was hammering us, hitting the site about every 8 seconds, but now it seems to have more or less given up on us. Yandex, on the other hand, seems to have intensified its attempted crawls. At first it was hitting the site about every 25 seconds, but over the course of this install it has ranged everywhere from every 1 second to where it's at now, about every 1.5 minutes. The 1.5 minute attempts have only occurred in the last couple of days.

A couple of days ago when Yandex was hitting us every 1 second, we had some server load issues. I don't know if this is related but it seems it might be and I'm wondering if logging the blocks to a text file might be counterproductive in that area, writing to the file every second or so.

Also, Yandex is using the same IP address every time, so I thought it might be best to just block it using the .htaccess file, but that doesn't seem to have had any effect. Is this mod redirecting Yandex before it has a chance to read the .htaccess file or is Yandex simply ignoring it?

smirkley 12-15-2012 10:24 PM

Your server would be the one to decide if Yandex is blocked by the .htaccess file or not.
If the IP is denied in .htaccess, your server will block it before any vB module loads, or any other page for that matter.

Simon Lloyd 12-16-2012 05:29 AM

Smirkley is right, .htaccess is loaded before anything else. As for my mod, using Yandex as a blocking string will block it. If it is still showing, then that's because either they don't actually have yandex in the user agent or you've entered more than just Yandex as the string. Remember, my mod bans anything that contains exactly the string you tell it to look for (including spaces), so if your strings in my mod look like this:
Baidu
Yandex
SoSo
.....etc
then Yandex will be blocked. However, if it looks like this:
Baidu
Yandex123
SoSo
....etc
then any bot with just yandex in their string or any other kind of yandex like YandexWorld will not be blocked, but anything containing yandex123 will be blocked.
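
Editor's sketch of the two lists above, assuming one entry per line, spaces inside an entry kept as typed, and a case-insensitive substring test (the case handling is an assumption). `parse_ban_list` and `first_match` are hypothetical names, not functions from the mod, and the Yandex user agent is an illustrative sample.

```python
from typing import List, Optional


def parse_ban_list(raw: str) -> List[str]:
    """One entry per line; spaces are kept, only empty lines are dropped."""
    return [line for line in raw.splitlines() if line != ""]


def first_match(user_agent: str, ban_list: List[str]) -> Optional[str]:
    """Return the first entry contained in the user agent, or None."""
    ua = user_agent.lower()
    for entry in ban_list:
        if entry.lower() in ua:
            return entry
    return None


yandex_ua = "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

good_list = parse_ban_list("Baidu\nYandex\nSoSo")
typo_list = parse_ban_list("Baidu\nYandex123\nSoSo")
padded_list = parse_ban_list("Baidu\nYandex \nSoSo")  # trailing space after Yandex

print(first_match(yandex_ua, good_list))    # Yandex
print(first_match(yandex_ua, typo_list))    # None
print(first_match(yandex_ua, padded_list))  # None: "yandex " never appears in the UA
```

The padded list shows why "including spaces" matters: an entry with a stray trailing space silently stops matching.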

As for writing to a file, just turn that bit off. It's there simply for test purposes, troubleshooting or checking on individual user agents.

My Hattiesburg 12-16-2012 08:20 PM

Your mod is blocking Yandex, but I just figured since it was using the same IP address every time it might be better to just go ahead and block that IP.

I guess I didn't do something right in the .htaccess file because Yandex was getting past it and was getting blocked by the mod.

Simon Lloyd 12-17-2012 03:52 PM

Yandex doesn't always use the same IP. Somewhere earlier in the pages for this mod I think I posted how to do it in .htaccess, if you wish.

smstoolbox 12-19-2012 05:10 PM

Hi Simon,

I have recently installed your mod and, whilst testing options, managed to block my home PC's agent from accessing my site (I'm new to this game, sorry)! My question is: after removing my agent details I still appear to be blocked from my home PC. How can I resolve this? Any advice would be appreciated.

Alibass 12-19-2012 05:19 PM

Try clearing your browser cache and also the cache from admincp/Maintenance.

Simon Lloyd 12-19-2012 05:40 PM

It will be your browser cache. You can also try (on your PC) Start > Run > ipconfig /flushdns (the space after ipconfig is intentional).

Max Taxable 12-19-2012 05:47 PM

I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.

It's one of the best Mods there is for vBulletin, but USE WITH CAUTION.

Simon Lloyd 12-19-2012 08:05 PM

Quote:

Originally Posted by Max Taxable (Post 2391708)
I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.

It's one of the best Mods there is for vBulletin, but USE WITH CAUTION.

Thanks for that. This mod was only ever built to stop bots eating up your bandwidth, which is why I recommend using actual bot names found in the useragents :)

smstoolbox 12-20-2012 09:44 AM

Quote:

Originally Posted by Simon Lloyd (Post 2391705)
It will be your browser cache. You can also try (on your PC) Start > Run > ipconfig /flushdns (the space after ipconfig is intentional).

:) Thanks Simon I will try this once I get back from work

smstoolbox 12-20-2012 09:44 AM

Quote:

Originally Posted by Alibass (Post 2391699)
Try clearing your browser cache and also the cache from admincp/Maintenance.

Thanks for the advice!!

smstoolbox 12-21-2012 12:14 PM

Quote:

Originally Posted by smstoolbox (Post 2391807)
:) Thanks Simon I will try this once I get back from work

:up: Sorted thanks again!!

smstoolbox 12-21-2012 12:15 PM

Quote:

Originally Posted by Max Taxable (Post 2391708)
I'll caution on the above - you really don't want to enter any parts of user agent strings that might be common stuff - you have the potential of not only blocking yourself, but literally tens of millions of computers and/or other devices.

It's one of the best Mods there is for vBulletin, but USE WITH CAUTION.

:up: Thanks for the heads up on this!!

WorldCraft 12-29-2012 08:59 PM

Fantastic little mod. Works great! :up:

etca 01-24-2013 04:20 AM

Well done, marked installed.

Simon Lloyd 01-24-2013 02:42 PM

Glad it's helped you :)

Simon Lloyd 02-02-2013 04:04 AM

I'm looking for feedback, guys!
Would it be beneficial to automatically ban bots that exceed x number of instances at any one time?

So, the likes of Baiduspider send around 200 at any one time. If I entered, say, 150 (in place of x) in a settings box, then they would automatically get added to the ban list. Let me know your views, as I'm not going to work on something nobody feels is needed :)
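
Editor's sketch of how such a threshold could work. It is not code from the mod and the feature did not exist at the time of this post. It assumes the forum can supply the user agent string of every currently active session, plus a list of bot names to look for; `bots_over_threshold` and its parameters are hypothetical names, and matching is assumed to ignore case.

```python
from collections import Counter
from typing import Iterable, List


def bots_over_threshold(active_user_agents: Iterable[str],
                        known_bots: List[str],
                        threshold: int) -> List[str]:
    """Return the bot names seen in more than `threshold` active sessions."""
    counts: Counter = Counter()
    for ua in active_user_agents:
        ua_lower = ua.lower()
        for bot in known_bots:
            if bot.lower() in ua_lower:
                counts[bot] += 1
                break  # count each session once
    return sorted(bot for bot, seen in counts.items() if seen > threshold)


# 200 simultaneous Baiduspider sessions and 3 Googlebot sessions (made-up numbers).
baidu = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
google = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
active = [baidu] * 200 + [google] * 3

print(bots_over_threshold(active, ["Baiduspider", "Googlebot"], 150))  # ['Baiduspider']
```

The names returned would be the candidates for adding to the ban list; whether to add them automatically or only report them is a separate choice.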

bzcomputers 02-02-2013 06:16 AM

Quote:

Originally Posted by Simon Lloyd (Post 2401233)
I'm looking for feedback, guys!
Would it be beneficial to automatically ban bots that exceed x number of instances at any one time?

So, the likes of Baiduspider send around 200 at any one time. If I entered, say, 150 (in place of x) in a settings box, then they would automatically get added to the ban list. Let me know your views, as I'm not going to work on something nobody feels is needed :)

It's not a bad idea but is probably not needed. I think most any bot that would "exceed a certain number" would probably be a bot we are already blocking by name with this. I guess it would be nice to have a second log of the bots that are coming through if that is possible, then we could tell if it was necessary.

One thing I wouldn't mind seeing is options to choose both filename and directory for the bot output file(s). An option to be able to show the most recent bots at the top of the file (reverse of how it saves now) is something I would like too, not sure what everyone else thinks.

Alibass 02-02-2013 12:10 PM

Quote:

Originally Posted by Simon Lloyd (Post 2401233)
I'm looking for feedback, guys!
Would it be beneficial to automatically ban bots that exceed x number of instances at any one time?

So, the likes of Baiduspider send around 200 at any one time. If I entered, say, 150 (in place of x) in a settings box, then they would automatically get added to the ban list. Let me know your views, as I'm not going to work on something nobody feels is needed :)

I like this idea and would most definitely like to see this feature added. :)

S_E_A 02-08-2013 01:32 PM

Hi,

I would like to ban Amazon AWS EC2. I have tried AmazonAWS and Amazon AWS EC2. Any suggestions please?

Thank you.

Simon Lloyd 02-08-2013 01:55 PM

Check out the links I've given in the mod description above, entitled "How do I ban a bot?"
They should explain how to find out their exact user agent :)

fly 02-08-2013 02:14 PM

Quote:

Originally Posted by S_E_A (Post 2402869)
Hi,

I would like to ban Amazon AWS EC2. I have tried AmazonAWS and Amazon AWS EC2. Any suggestions please?

Thank you.

That's a hosting service. Why are they spidering your site? Are you sure that's correct?

S_E_A 02-08-2013 03:25 PM

Based on research a number of people recommend blocking AmazonAWS. What do people on here recommend?

Simon Lloyd 02-08-2013 03:59 PM

Quote:

Originally Posted by fly (Post 2402874)
That's a hosting service. Why are they spidering your site? Are you sure that's correct?

Quote:

Originally Posted by S_E_A (Post 2402887)
Based on research a number of people recommend blocking AmazonAWS. What do people on here recommend?

I suspect accounts held on some of their servers are of no use to your forum and are scraping content or emails, etc.

Banning bots, as I've always said, is a personal thing :)

fly 02-08-2013 05:13 PM

Quote:

Originally Posted by Simon Lloyd (Post 2402890)
I suspect accounts held on some of their servers are of no use to your forum and are scraping content or emails, etc.

Banning bots, as I've always said, is a personal thing :)

Okay, but if I recall correctly this only bans by user agent, not by IP block, and therefore would be ineffective for banning 'AWS'.

Simon Lloyd 02-08-2013 05:16 PM

Why would it be ineffective banning them? Every device that accesses the internet, etc. has a UserAgent; you just need to find the useragent, and I show you how to do that in the links in the mod description.

Read this: http://www.webmasterworld.com/search...rs/4368965.htm

If you really want to ban IPs, then see https://vborg.vbsupport.ru/showthread.php?t=268146

fly 02-08-2013 05:22 PM

Quote:

Originally Posted by Simon Lloyd (Post 2402915)
Why would it be ineffective banning them? Every device that accesses the internet, etc. has a UserAgent; you just need to find the useragent, and I show you how to do that in the links in the mod description.

Read this: http://www.webmasterworld.com/search...rs/4368965.htm

If you really want to ban IPs, then see https://vborg.vbsupport.ru/showthread.php?t=268146

Because Amazon runs a cloud hosting service, anyone can own an AWS server. Hell, I have one. There is no ONE service and user agent on AWS, so it's not possible to ban all AWS servers by user agent.

Simon Lloyd 02-08-2013 05:25 PM

But not ALL AWS users are bad, are you? :) Agreed, you cannot ban a server, but every bot, spider, person or device that comes your way will have a UA that you can ban.

fly 02-08-2013 05:31 PM

Quote:

Originally Posted by Simon Lloyd (Post 2402918)
But not ALL AWS users are bad, are you? :) Agreed, you cannot ban a server, but every bot, spider, person or device that comes your way will have a UA that you can ban.

The request was to ban AWS servers by user agent. That's not possible.

(And technically you don't even have to send a user agent.)

Simon Lloyd 02-08-2013 05:37 PM

The request wasn't specifically to ban the servers by UA :) If you send a malformed or blank UA then you can ban those too ;)
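
Editor's sketch of how a missing or blank user agent can be detected so that it can be treated like any other banned entry. The thread does not show how the mod handles this case, so the function names and the policy flags are hypothetical; the only fact relied on is that HTTP header names are case-insensitive.

```python
from typing import Mapping, Optional


def user_agent_status(headers: Mapping[str, str]) -> str:
    """Classify the User-Agent header as 'missing', 'blank' or 'present'."""
    value: Optional[str] = None
    for name, header_value in headers.items():
        if name.lower() == "user-agent":  # header names are case-insensitive
            value = header_value
            break
    if value is None:
        return "missing"
    if value.strip() == "":
        return "blank"
    return "present"


def should_block(headers: Mapping[str, str],
                 block_missing: bool = True,
                 block_blank: bool = True) -> bool:
    """Apply a (hypothetical) policy for requests without a usable UA."""
    status = user_agent_status(headers)
    return (status == "missing" and block_missing) or \
           (status == "blank" and block_blank)


print(user_agent_status({"Host": "example.com"}))                  # missing
print(user_agent_status({"User-Agent": "   "}))                    # blank
print(should_block({"user-agent": "Mozilla/5.0 (compatible)"}))    # False
```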

As a side note, I noticed that you haven't downloaded the latest version of this mod or marked it installed. Have you uninstalled it? If so, could I ask why? It just helps me develop more robust things in the future.

fly 02-08-2013 08:47 PM

Quote:

Originally Posted by Simon Lloyd (Post 2402922)
The request wasn't specifically to ban the servers by UA :) If you send a malformed or blank UA then you can ban those too ;)

As a side note, I noticed that you haven't downloaded the latest version of this mod or marked it installed. Have you uninstalled it? If so, could I ask why? It just helps me develop more robust things in the future.

I started with this mod. However, my server at the time was so resource starved that I needed to block the spiders before they got to PHP/MySQL. Nothing wrong with it. It worked well. I just couldn't afford the resources.

Max Taxable 02-10-2013 03:09 AM

Quote:

Originally Posted by fly (Post 2402919)
The request was to ban AWS servers by user agent. That's not possible.

(And technically you don't even have to send a user agent.)

Yes it is. Enter it exactly like it appears in the user agent string.

"amazonaws"

fly 02-10-2013 12:28 PM

Quote:

Originally Posted by Max Taxable (Post 2403250)
Yes it is. Enter it exactly like it appears in the user agent string.

"amazonaws"

Yes, but no one is using that UA. Amazon has no reason to crawl *any* site.

Max Taxable 02-11-2013 04:56 PM

Quote:

Originally Posted by fly (Post 2403305)
Yes, but no one is using that UA. Amazon has no reason to crawl *any* site.

Amazon AWS is the hosting they sell. And yes, they also crawl the web: http://aws.amazon.com/search-engines/

I have it blocked as well, using this Mod.

Here is how I decide what UAs I block:

1.) Is it beneficial for my site to have it crawling?

2.) Does it behave nicely? Does it obey robots.txt?

3.) If in any way suspicious, it goes in this Mod.

Like the developer says, it's all about personal choice.


All times are GMT.
