vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 4.x Add-ons (https://vborg.vbsupport.ru/forumdisplay.php?f=245)
-   -   Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

ForceHSS 10-29-2012 03:25 AM

You're right, I did not see that bit. It's late here, which could be the reason :)

Disco_Dave 10-31-2012 03:05 PM

I installed this mod yesterday, and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.

Max Taxable 10-31-2012 05:17 PM

Quote:

Originally Posted by tricksodave (Post 2377295)
I installed this mod yesterday, and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.

It blocks that spider, how did you add it?

TheSupportForum 10-31-2012 05:27 PM

Quote:

Originally Posted by tricksodave (Post 2377295)
I installed this mod yesterday, and since then I have been bombarded by the AhrefsBot spider. I've tried adding it to this mod, but no joy.

You need to block their IPs; AhrefsBot has more than one IP.

Max Taxable 10-31-2012 05:29 PM

Quote:

Originally Posted by simonhind (Post 2377329)
You need to block their IPs; AhrefsBot has more than one IP.

The whole purpose of this Mod is to block user agents, so you don't have to block IP addresses.

If the person put "AhrefsBot" in this Mod, it should be blocked no matter the IP.

Simon Lloyd 10-31-2012 09:06 PM

I think you'll find that if you check WOL (Who's Online) while AhrefsBot is online and choose to show the user agent from the dropdown, "ahrefsbot" isn't actually in its user agent. I think I posted about this to another user a few posts ago.

TheSupportForum 10-31-2012 09:20 PM

why not just put

*

/

it blocks all useragents and will only require standard 1 line box
this is what i have done

Simon Lloyd 10-31-2012 09:37 PM

Quote:

Originally Posted by simonhind (Post 2377371)
why not just put

*

/

it blocks all useragents and will only require standard 1 line box
this is what i have done

That will block everyone if you use it in my mod!

Here are some UAs that Ahrefs uses:
Mozilla/5.0 (compatible; AhrefsBot/1.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(gfe)
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(...(gfe),gzip(gfe)

So use SiteBot or Ahrefs as the banning UAs :)
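
A minimal sketch of why those two tokens are enough, assuming the mod does a plain case-insensitive substring check on the user agent. The mod's own code is not shown in this thread, so `BAN_LIST` and `is_banned` are hypothetical names used only for illustration, and the truncated last user agent from the list above is left out.

```python
# Hypothetical ban list holding the two tokens suggested above.
BAN_LIST = ["SiteBot", "Ahrefs"]

# User agents quoted in the post above (truncated entry omitted).
USER_AGENTS = [
    "Mozilla/5.0 (compatible; AhrefsBot/1.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)",
    "Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/),gzip(gfe)",
]


def is_banned(user_agent, ban_list=BAN_LIST):
    """Return True if any ban-list token occurs anywhere in the user agent."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in ban_list)


# Every quoted user agent contains one of the two tokens.
blocked = [ua for ua in USER_AGENTS if is_banned(ua)]
```

Because the check ignores the requesting IP entirely, the same two entries cover every address the crawler uses.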

TheSupportForum 10-31-2012 10:31 PM

For those who want to block access through Facebook external hit:

facebookexternalhit/1.0

Max Taxable 10-31-2012 11:44 PM

Quote:

Originally Posted by Simon Lloyd (Post 2377369)
I think you'll find that if you check WOL (Who's Online) while AhrefsBot is online and choose to show the user agent from the dropdown, "ahrefsbot" isn't actually in its user agent. I think I posted about this to another user a few posts ago.

I have had no problems, Ahrefsbot is in my collection, and I never see it hit my site anymore.

Pretty sure the OP is saying he sees "ahrefsbot" in his WOL after adding it to your mod.

Max Taxable 10-31-2012 11:45 PM

Quote:

Originally Posted by simonhind (Post 2377383)
For those who want to block access through Facebook external hit:

facebookexternalhit/1.0

I don't block that because it's just facebook getting image and text information from a thread or a post someone has posted to facebook. It's self defeating to block this, it's your friend. Same with twitterbot.

Simon Lloyd 11-01-2012 07:39 AM

Quote:

Originally Posted by Max Taxable (Post 2377401)
I have had no problems, Ahrefsbot is in my collection, and I never see it hit my site anymore.

Pretty sure the OP is saying he sees "ahrefsbot" in his WOL after adding it to your mod.

Well, if that's the case and it's actually in vBulletin's standard WOL, then either it will disappear after the online timeout set in the AdminCP, or AhrefsBot is using a UA that doesn't have Ahrefs in it. If he is using Paul M's Who Has Visited or a similar mod, then it will always appear, as both mods are doing their job! I've mentioned this a few times.

EDIT: This is actually mentioned in the FAQ that's referenced in the mod's description.

Disco_Dave 11-01-2012 08:14 AM

Quote:

Originally Posted by Max Taxable (Post 2377326)
It blocks that spider, how did you add it?

Their user agent was coming up as this: choopa: choopa.net: I placed both of these in. They had at least 20 different IP addresses.

Simon Lloyd 11-01-2012 08:37 AM

You only need to enter choopa if that displays in their UA; they'll be banned immediately and will disappear from WOL after the WOL timeout. The IPs are of no consequence. I have a ban-IP mod, but this one bans the string found in the UA, so they can use 100 different IPs for the same UA and they will still be banned :)

Disco_Dave 11-01-2012 08:48 AM

That's what I did yesterday morning, but in the afternoon I had around 36 AhrefsBots under the UA Choopa.net. I added choopa to your mod in the morning, but they were still there in the afternoon; that's why I asked. I haven't seen them this morning though.

Simon Lloyd 11-01-2012 09:06 AM

Do you use any mod for who-visited statistics? If you don't, the only other explanation is that they had already accessed a thread or area before you banned them. The only time they can be banned is when they release that thread or area to move to another; after that they're history :)

Disco_Dave 11-01-2012 09:15 AM

I have boofo's mod for displaying spiders. I'm not sure any spiders can view our site: you need to be registered to view any content on it.

Great mod and thanks for your help..

Simon Lloyd 11-01-2012 09:45 AM

It doesn't matter that they cannot view any content: your thread URLs are being indexed, which is why you are being crawled. Naturally they see the same as a guest. I just viewed your site; you should also turn off displaying WOL for guests, as it will save you queries and bandwidth :)

Disco_Dave 11-01-2012 10:00 AM

Cheers Simon :D

The feckers are still getting in: AhrefsBot Spider 02:15 PM / Viewing Index NIRC: 173.199.115.83.choopa.net

Simon Lloyd 11-01-2012 03:05 PM

If that's from the logging, then yes, you will get that until every thread they have previously tried to index becomes a 301 permanent redirect. If not, and you want to PM me temp admin access with all permissions, I'll take a look and see what I can do for you.

Simon Lloyd 11-01-2012 04:12 PM

OK, I've checked and I don't see any of these bots in your native vBulletin WOL. The other mods you have for statistics, total visitors, etc. WILL log these as visiting because the bots are directly accessing a URL, and the logging is done before the URL loads completely. My mod also bans them at this point, so both mods are working :)

Just as a note, you're using the "create a thread" logging; you can quickly end up with thousands of threads, so it's better to use the output.txt logging :)

Note to all!:
If you have Simon in your ban list this will ban the following:
simon
SimonLloyd
Lloyd simon
thisisanincrediblylongsimonwordhere

Get the idea? You don't need to add all of those to your ban list, because the mod looks for the string "simon" (case doesn't matter) anywhere in the entire string. So, if you'd used this in your list:
Simon*\Lloyd
It would NOT ban:
Simon
Simon Lloyd
thisissimonlloydinastring
but it WOULD ban
Simon*\Lloyd-in(this.string)
thisstringSimon*\Lloydhere
....etc

Hope you all understand this better now and can get to removing duplicates from your list.

@tricksodave, you can delete the temp account for me now thanks, also if you read the above please prune your list.

If any of you have any trouble with editing your lists let me know and i'll help with anything you're stuck with :)
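
For anyone pruning a long list by hand, here is a rough sketch of the idea in the note above: an entry is redundant when a shorter entry already occurs inside it, ignoring case. It assumes the literal, case-insensitive substring matching described in this post; `prune_ban_list` is a hypothetical helper and not part of the mod.

```python
def prune_ban_list(entries):
    """Drop entries that a shorter entry already covers (case-insensitive)."""
    kept = []
    # Ignore blank entries, which would otherwise "cover" everything.
    candidates = [e for e in entries if e.strip()]
    # Shortest first, so a covering token is kept before the longer
    # entries that contain it.
    for entry in sorted(candidates, key=len):
        lowered = entry.lower()
        if not any(k.lower() in lowered for k in kept):
            kept.append(entry)
    return kept


ban_list = ["SimonLloyd", "Simon", "Lloyd simon", "AhrefsBot", "ahrefs", "simon"]
pruned = prune_ban_list(ban_list)
```

Here `pruned` keeps only `Simon` and `ahrefs`, since every other entry already contains one of them.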

Disco_Dave 11-01-2012 04:17 PM

Thanks Simon, That's helped me understand it a bit better. Thanks again...

TheSupportForum 11-01-2012 04:20 PM

Simon Lloyd

i c wat u done there :)

haha

CAG CheechDogg 11-01-2012 07:33 PM

Simon, does this also block Facebook's scraper? I am getting slammed by Facebook IPs and spiders:

facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

I did it through htaccess but this blocks the ability for me to post any articles to facebook with a thumbnail.

Here is a link: http://www.botopedia.org/user-agent-...k-external-hit

CAG CheechDogg 11-01-2012 07:35 PM

Or is there a way to slow these guys down with crawl-delay like this:

User-Agent: *
Crawl-Delay: 10

I read you should name the agent instead of using the above. Do you know how, or does Facebook follow the above?
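
As a hedged illustration of what a per-agent rule looks like, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` and reads back the delay for a named agent and for everyone else. It only shows how a compliant parser interprets the rule; whether Facebook's scraper honours `Crawl-delay` is not established anywhere in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule for the named agent, one catch-all.
ROBOTS_TXT = """\
User-agent: facebookexternalhit
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser matches on the product token before the "/" in the UA string.
facebook_delay = parser.crawl_delay("facebookexternalhit/1.1")
# Agents without their own rule fall back to the "*" entry.
other_delay = parser.crawl_delay("SomeOtherBot/2.0")
```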

CAG CheechDogg 11-01-2012 07:39 PM

Here is something else on Facebook's bots, spiders or whatever they really are. Facebook claims they are not spiders or bots but scrapers. But I have been getting 500 server-side errors, and when I check my error logs, I see Facebook IPs around the times they are hitting my site, sometimes over 100 times within 2 minutes... sigh...

Help? lol....

Simon Lloyd 11-01-2012 08:32 PM

Is there only Facebook in the error log? As for banning bots or whoever, read my post above; as you'll see, it all depends on the UAs of each bot. Banning is a personal thing. Most bots don't recognise the delay command in robots.txt. Maybe look at their IP range and ban some of their IPs; you can use my other mod for that.

Simon Lloyd 11-01-2012 08:34 PM

CAG you haven't downloaded or marked this as installed!

CAG CheechDogg 11-01-2012 08:40 PM

Hey! I did download it but I didn't hit installed! lol... Sorry...

As for banning the ips I have done that, but that blocks the ability to post the articles with the right info on facebook, so I have to make a decision here on whether facebook will help my site or not.

I was just asking the question really about the crawl-delay which shouldn't have been asked here Simon, I apologize for that.

Simon Lloyd 11-01-2012 09:06 PM

Banning IPs is only for incoming, unless you've banned them in cPanel or htaccess. As for asking about the delay, there's no problem; I like to help where I can.

ForceHSS 11-01-2012 11:18 PM

Aboundex/0.2
This seems to be a new one. Here is the full thing: Aboundex/0.2 (http://www.aboundex.com/crawler/)
The IP is 173.193.219.168-static.reverse.softlayer.com
I have checked the IP and it came back that a spam bot is using it.

If someone wants to run checks, see whether it needs to be added to the list. I am not 100% sure it does, which is why it needs to be checked first.

CAG CheechDogg 11-01-2012 11:21 PM

Quote:

Originally Posted by Simon Lloyd (Post 2377651)
Banning IPs is only for incoming, unless you've banned them in cPanel or htaccess. As for asking about the delay, there's no problem; I like to help where I can.


Yeah, I used htaccess to ban them completely. I need to find out exactly what IPs I can ban and still allow Facebook to work properly when I post links to articles or posts... sigh... lol

But thanks for understanding and helping out, it is very much appreciated, Simon.

Max Taxable 11-01-2012 11:39 PM

Quote:

Originally Posted by CAG CheechDogg (Post 2377686)
Yeah, I used htaccess to ban them completely. I need to find out exactly what IPs I can ban and still allow Facebook to work properly when I post links to articles or posts... sigh... lol

But thanks for understanding and helping out, it is very much appreciated, Simon.

In my experience, that's the only time the FB external hit bot comes to your site - when you or someone else posts a link to your site, on facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.

CAG CheechDogg 11-01-2012 11:52 PM

Quote:

Originally Posted by Max Taxable (Post 2377691)
In my experience, that's the only time the FB external hit bot comes to your site - when you or someone else posts a link to your site, on facebook. It's your friend. Same with twitterbot and all of its affiliates. I don't mess with those at all.

Max it's weird because I have the facebook like buttons off on my forums. I do have rssgraffiti but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.

Max Taxable 11-02-2012 12:02 AM

Quote:

Originally Posted by CAG CheechDogg (Post 2377693)
Max it's weird because I have the facebook like buttons off on my forums. I do have rssgraffiti but I don't see why that would be hitting pages like the mood and status module and other unrelated pages.

Some autospam bots do spoof their user agents as facebook or even googlebot.

CAG CheechDogg 11-02-2012 12:14 AM

Quote:

Originally Posted by Max Taxable (Post 2377697)
Some autospam bots do spoof their user agents as facebook or even googlebot.


Great! Now you tell me! lol... Thanks again, Max. I will have to take a careful look at the IPs and see whether they match Facebook's.
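
One way to do that comparison, sketched with Python's standard `ipaddress` module. The ranges below are placeholders taken from the reserved documentation blocks, not Facebook's real ranges, which this thread does not list; `TRUSTED_RANGES` and `looks_spoofed` are hypothetical names.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# Replace them with ranges the crawler's operator actually publishes.
TRUSTED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def looks_spoofed(user_agent, remote_ip, token="facebookexternalhit",
                  ranges=TRUSTED_RANGES):
    """True if the UA claims the token but the IP is outside the trusted ranges."""
    if token.lower() not in user_agent.lower():
        return False  # the UA makes no such claim, so there is nothing to verify
    address = ipaddress.ip_address(remote_ip)
    return not any(address in network for network in ranges)
```

A hit that returns `True` claims to be the Facebook scraper but comes from an address outside the listed ranges, which is the spoofing case described above.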

Max Taxable 11-02-2012 12:22 AM

From what I've seen over the years facebook's bots have good behavior and only come to see you when something is posted there, from your site. Then they don't crawl around and they SURE don't go anywhere suspicious.

CAG CheechDogg 11-02-2012 02:52 AM

Yeah, nothing suspicious about Facebook's crawlers, scrapers or bots, whatever they are. But they have caused my forums to throw the 500 internal server error a bunch of times. When I check around the time those errors happen and are reported to me, it's Facebook's IPs around the times of the 500 errors.

CAG CheechDogg 11-02-2012 03:48 PM

Well, I decided to completely block Facebook's crawlers, scrapers, spiders or bots from crawling my site.

I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list, and I haven't gotten one single Facebook critter on my site since.

Sucks because I can no longer share anything on Facebook from my forums, but I just had to do it. Facebook won't reply and doesn't seem to care about eating up bandwidth with their crawlers.

Oh well.

TheSupportForum 11-02-2012 04:03 PM

Quote:

Originally Posted by CAG CheechDogg (Post 2377828)
Well, I decided to completely block Facebook's crawlers, scrapers, spiders or bots from crawling my site.

I deleted all their active sessions from my database through phpMyAdmin and added "facebook" to the list, and I haven't gotten one single Facebook critter on my site since.

Sucks because I can no longer share anything on Facebook from my forums, but I just had to do it. Facebook won't reply and doesn't seem to care about eating up bandwidth with their crawlers.

Oh well.

a wise choice if that's happening to you, as they will eat up your bandwidth


All times are GMT.