vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 4.x Add-ons (https://vborg.vbsupport.ru/forumdisplay.php?f=245)
-   -   Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

Simon Lloyd 12-06-2014 09:56 AM

Quote:

Originally Posted by CAG CheechDogg (Post 2525962)
The Baidu spider can take a couple of months, if not a few, to completely disappear and actually obey the no-crawl rule after you add this mod or even block it through your robots.txt ... I have it blocked everywhere and it took maybe 3 days before I didn't see it again ... the best thing for me was to also add a huge IP block to my .htaccess file that completely blocks all of China and a couple of other Asian countries from accessing my site ...

Believe it or not you can actually go to their site and ask them not to crawl your site :)

tanzeelniazi 12-06-2014 10:38 AM

Quote:

Originally Posted by Simon Lloyd (Post 2525990)
That appears correct.

You say I am correct, but I still see these spiders in Who's Online :( Why? I also see a link showing like this:
showthread?=1'
I mean, one spider is shown viewing this link: showthread?=1' ???

Simon Lloyd 12-06-2014 01:32 PM

Where are you seeing that? Make sure that your list of bad bots has no leading or trailing spaces on any name. If you are still having trouble, you can PM me temporary admin login details with rights to administer plugins and I'll take a look :)
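
To illustrate why a stray space matters, here is a minimal Python sketch. It is not the mod's code (the mod is a vBulletin plugin). It assumes the list entries are compared against the user agent as plain substrings, and `is_blocked`, `clean_list`, and the sample user agent are made up for the example.

```python
def is_blocked(user_agent, bad_bots):
    """Return True if any non-empty list entry appears in the user agent."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in bad_bots if bot)


def clean_list(raw_entries):
    """Strip leading/trailing whitespace and drop entries that end up empty."""
    return [entry.strip() for entry in raw_entries if entry.strip()]


# Hypothetical list as typed into a settings box; note the trailing space.
raw_list = ["Baiduspider ", "AhrefsBot"]
sample_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0)"  # illustrative only

print(is_blocked(sample_ua, raw_list))              # False
print(is_blocked(sample_ua, clean_list(raw_list)))  # True
```

The entry with the trailing space never matches because the user agent has a "/" where the entry expects a space.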

Max Taxable 12-06-2014 06:20 PM

Quote:

Originally Posted by CAG CheechDogg (Post 2525962)
The Baidu spider can take a couple of months, if not a few, to completely disappear and actually obey the no-crawl rule after you add this mod or even block it through your robots.txt ... I have it blocked everywhere and it took maybe 3 days before I didn't see it again ... the best thing for me was to also add a huge IP block to my .htaccess file that completely blocks all of China and a couple of other Asian countries from accessing my site ...

It's not a no-crawl rule; with this mod it is an outright block. There's no obeying or disobeying. It's not robots.txt. When you install this mod with Baidu on the list, it should be blocked. Gone.

I believe the incidents of it still appearing on v4 sites after this mod is installed must have something to do with interference from some other mod. Otherwise, how do you explain my v4 getting NO appearances by Baidu after I installed this?

Max Taxable 12-06-2014 06:55 PM

I'll add to the above: I actually suspect a hook conflict, one that only happens when some other mod calls the same hook. It's not all the time, because from what we saw at OzzModz, Baidu's appearances were greatly reduced by this mod, just not totally gone. Occasionally one or two of them slip through during the time of the hook conflict.

It doesn't happen in v3 at all, for the same reason I believe.

Gadget_Guy 12-06-2014 07:34 PM

Anything in particular I can look for in terms of a hook that might conflict?

Would a list of my mods help?

D.

Max Taxable 12-06-2014 08:06 PM

In v3 and v4 this mod runs on the "style_fetch" hook.

CAG CheechDogg 12-06-2014 08:20 PM

Well I haven't had Baidu appear on my site in over 2 years ... way before I even installed this mod which has helped me a lot regardless of how it does it lol ... my point is that "I" haven't had Baidu for a very long time ...

Max Taxable 12-06-2014 08:31 PM

Well sure if you block China and other countries via .htaccess you probably won't see Baidu.

I used to have a massive .htaccess with country blocks.

Simon Lloyd 12-07-2014 04:50 AM

Banning by .htaccess is fine if you only have a few things in it, because it is read with every single server request. So if you have 10 blocks in your .htaccess and, let's say, a web page with 30 elements (icons, CSS, containers, includes, etc.), then each visitor who tries to access that page has 30 checks made just to load that page.

Now consider your own landing page and check how many things load to make that page up, and you'll soon see why having a lot of bans in your .htaccess can be detrimental, particularly if you are on shared hosting or a limited VPS.
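
The arithmetic behind that point can be sketched in a few lines of Python. This is a rough worst-case model, not a measurement. It assumes every element of the page is requested from the same site, that each request is tested against every rule, and that no rule short-circuits the rest. The function name is made up for the example.

```python
def rule_checks_per_page(requests_per_page, rules_in_htaccess):
    """Worst case: every request for the page is tested against every rule."""
    return requests_per_page * rules_in_htaccess


# The figures from the post: 30 page elements, 10 blocks.
print(rule_checks_per_page(30, 10))   # 300

# The same page with a long ban list of 500 entries.
print(rule_checks_per_page(30, 500))  # 15000
```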

@Gadget_Guy & Max Taxable
The hook is style_fetch. You can try changing the hook to one of the others that loads before all the rest, but you may not see the result you're looking for. Doesn't hurt to try :)

ozzy47 12-07-2014 07:52 AM

Yeah, I would try to stay away from IP blocking totally.

Ha Ha Ha, I have been testing this out for the past 10 hrs or so, Simon. Too early to tell yet, but so far looking good.

princesspepper 12-07-2014 10:49 AM

Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

ozzy47 12-07-2014 10:52 AM

It really makes no difference where you send them, it is just user choice. :)

Simon Lloyd 12-07-2014 12:10 PM

Quote:

Originally Posted by princesspepper (Post 2526133)
Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

For me it was giving them a taste of their own medicine, they drain our resources so we send them back to drain theirs :)

ozzy47 12-07-2014 08:47 PM

Quote:

Originally Posted by Simon Lloyd (Post 2526073)
The hook is style_fetch. You can try changing the hook to one of the others that loads before all the rest, but you may not see the result you're looking for. Doesn't hurt to try :)

Quote:

Originally Posted by ozzy47 (Post 2526087)
Ha Ha Ha, I have been testing this out for the past 10 hrs or so, Simon. Too early to tell yet, but so far looking good.

Well, so far it seems to be going as planned. I will wait another 24-48 hrs, and if it is working, I'll let you know exactly what I did: which hook I used, and what additional plugin I added. :)

Gadget_Guy 12-07-2014 08:52 PM

Whoot!

Looking forward to your findings Ozzy!

D.

CAG CheechDogg 12-07-2014 11:05 PM

Quote:

Originally Posted by Simon Lloyd (Post 2526073)
Banning by .htaccess is fine if you only have a few things in it, because it is read with every single server request. So if you have 10 blocks in your .htaccess and, let's say, a web page with 30 elements (icons, CSS, containers, includes, etc.), then each visitor who tries to access that page has 30 checks made just to load that page.

Now consider your own landing page and check how many things load to make that page up, and you'll soon see why having a lot of bans in your .htaccess can be detrimental, particularly if you are on shared hosting or a limited VPS.

@Gadget_Guy & Max Taxable
The hook is style_fetch. You can try changing the hook to one of the others that loads before all the rest, but you may not see the result you're looking for. Doesn't hurt to try :)

I have had the IP blocks in my .htaccess for over 5 years, my Man, and I haven't run into any problems in those 5 years ..

If an IP is blocked on your server it's not allowing the page or any page to load, so I am a bit confused about "so if you have 10 blocks in your .htaccess and lets say you have a web page with 30 elements (icons, css, containers, includes.....etc) then each one of those that tries to access that page has 30 checks made just to load that page."


As a matter of fact, when I didn't have these IP blocks in my htaccess file I was constantly getting emails from my host that my site was being suspended ... by blocking these IPs I am keeping them from even accessing anything on my website or forums ...thus the usage of resources went down ...

Max Taxable 12-07-2014 11:24 PM

Quote:

Originally Posted by princesspepper (Post 2526133)
Installed on VB4.2.2 PL2.

One question that I can't seem to find in the first few posts of this thread that usually explain stuff.... Why would you choose to redirect the bot back to itself? What function does this have over redirecting to a url?

Just don't redirect to any of your own pages - feedback loop danger.

EDIT TO ADD: Was I right about the hook conflict with some other mod(s), Ozzy?

ozzy47 12-07-2014 11:30 PM

I would not say a conflict, but perhaps a better hook to execute the mod. That is if the testing continues to provide the desired results.

Simon Lloyd 12-08-2014 06:48 AM

Quote:

Originally Posted by CAG CheechDogg (Post 2526228)
I have had the IP blocks in my .htaccess for over 5 years, my Man, and I haven't run into any problems in those 5 years ..

If an IP is blocked on your server it's not allowing the page or any page to load, so I am a bit confused about "so if you have 10 blocks in your .htaccess and lets say you have a web page with 30 elements (icons, css, containers, includes.....etc) then each one of those that tries to access that page has 30 checks made just to load that page."


As a matter of fact, when I didn't have these IP blocks in my htaccess file I was constantly getting emails from my host that my site was being suspended ... by blocking these IPs I am keeping them from even accessing anything on my website or forums ...thus the usage of resources went down ...

I agree in part. When you didn't have the block, they were calling on every resource: PHP, MySQL, CPU and RAM. With the block they are pretty much just using RAM, as CPU and PHP time and response are minimal, and as you are not loading anything else the RAM isn't being maxed either. Whole-country blocks don't take as much checking as full four-octet IPs like 192.161.0.1. If you have plenty of those, they are all checked against each request; if you are blocking just 192.161, then it's just one check against each request.

I'm probably not explaining myself too well (it reads much better in my head :)).
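
The difference between many single-IP entries and one range entry can be sketched with Python's standard `ipaddress` module. This models the deny list as a simple top-to-bottom scan, which is an assumption for illustration. The thread does not say how the web server evaluates its rules internally, and `is_denied` is a made-up helper.

```python
import ipaddress


def is_denied(client_ip, denied_networks):
    """Return (denied, checks): the verdict and how many entries were tested."""
    ip = ipaddress.ip_address(client_ip)
    checks = 0
    for net in denied_networks:
        checks += 1
        if ip in net:
            return True, checks
    return False, checks


# 256 single-address entries, 192.161.0.0 through 192.161.0.255 (illustrative).
single_ips = [ipaddress.ip_network(f"192.161.0.{i}/32") for i in range(256)]

# One range entry covering everything that starts with 192.161.
one_range = [ipaddress.ip_network("192.161.0.0/16")]

print(is_denied("192.161.0.255", single_ips))  # (True, 256)
print(is_denied("192.161.0.255", one_range))   # (True, 1)
print(is_denied("10.0.0.1", single_ips))       # (False, 256)
```

Note the last line: in this model a legitimate visitor who matches nothing still pays for a pass over the whole list, which is where a long list of single IPs costs the most.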

Simon Lloyd 12-08-2014 06:52 AM

Quote:

Originally Posted by ozzy47 (Post 2526232)
I would not say a conflict, but perhaps a better hook to execute the mod. That is if the testing continues to provide the desired results.

Just bear in mind that any other hook you choose will need to be sufficient to perform the other tasks of the mod, should you want them, like sending the email or creating the threads. Some of the other runtime hooks will give errors or not work as expected, especially with the thread creation. Also keep in mind that you need to redirect the bots before anything has loaded, as this is the basis of the mod: keeping resources for your members and not the bots :)

CAG CheechDogg 12-08-2014 06:55 AM

Quote:

Originally Posted by Simon Lloyd (Post 2526254)
I agree in part. When you didn't have the block, they were calling on every resource: PHP, MySQL, CPU and RAM. With the block they are pretty much just using RAM, as CPU and PHP time and response are minimal, and as you are not loading anything else the RAM isn't being maxed either. Whole-country blocks don't take as much checking as full four-octet IPs like 192.161.0.1. If you have plenty of those, they are all checked against each request; if you are blocking just 192.161, then it's just one check against each request.

I'm probably not explaining myself too well (it reads much better in my head :)).


No my Man, you actually are explaining yourself very well lol ...

All I know is that I have not had any negative effects from doing the blocks, and I also have a list of just single IPs ... and let me tell you, that list is long as hell lol ...

princesspepper 12-08-2014 06:57 AM

Quote:

Originally Posted by Max Taxable (Post 2526230)
Just don't redirect to any of your own pages - feedback loop danger.

EDIT TO ADD: Was I right about the hook conflict with some other mod(s), Ozzy?

Thanks, but I'm still unsure what the benefit would be to redirect back to the source. Would it make them aware you don't want them sooner?

Simon Lloyd 12-08-2014 08:42 AM

Quote:

Originally Posted by princesspepper (Post 2526259)
Thanks, but I'm still unsure what the benefit would be to redirect back to the source. Would it make them aware you don't want them sooner?

It really doesn't matter. They are redirected with a 301, which is a permanent redirect, so they will always see the URL they tried to crawl as the one you send them to. Like I said, I coded that in to send them back to themselves so they have fewer resources for crawling other people's sites. It's only fair! :)
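
The general flow (check the user agent first, answer with a 301 before doing any page work) can be sketched in Python. This is only an illustration of the technique, not the mod's PHP code. The function `respond`, the bot list, the sample user agents and the `http://example.com/` target are all placeholders.

```python
def respond(user_agent, bad_bots, redirect_to):
    """Return (status, headers), deciding before any page content is built."""
    ua = user_agent.lower()
    for bot in bad_bots:
        name = bot.strip().lower()
        # Skip empty entries: an empty string would match every user agent.
        if name and name in ua:
            # 301 Moved Permanently: the crawler is told the URL lives elsewhere.
            return 301, {"Location": redirect_to}
    return 200, {}


bots = ["Baiduspider"]  # hypothetical ban list
print(respond("Mozilla/5.0 (compatible; Baiduspider/2.0)", bots, "http://example.com/"))
# (301, {'Location': 'http://example.com/'})
print(respond("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0", bots, "http://example.com/"))
# (200, {})
```

As Max Taxable warns earlier in the thread, the redirect target should not be one of your own pages, or the banned bot is simply sent back into the same check.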

Alan_SP 12-08-2014 09:17 PM

Quote:

Originally Posted by CAG CheechDogg (Post 2525962)
the best thing for me was to also add a huge IP block to my .htaccess file that completely blocks all of China and a couple of other Asian countries from accessing my site ...

Would you share your CIDR list? Not in this thread, but maybe make a new thread?

princesspepper 12-08-2014 10:14 PM

Quote:

Originally Posted by Simon Lloyd (Post 2526269)
I coded that in to send them back to themselves so they have fewer resources for crawling other people's sites. It's only fair! :)

Thanks, that is all I wanted to know. :)

Simon Lloyd 12-09-2014 03:47 AM

Princesspepper could you mark this as installed please :)

CAG CheechDogg 12-09-2014 03:49 AM

Quote:

Originally Posted by Alan_SP (Post 2526371)
Would you share your CIDR list? Not in this thread, but maybe make a new thread?

You can get it here my Man : https://vborg.vbsupport.ru/showthrea...134.184.0%2F21

Gadget_Guy 12-09-2014 06:18 PM

Hey Ozzy,

Are we any closer to an alternative or modification to this so that we can get better blocking in place?

I am willing to test on my site as I am still getting hit hard by spiders even with the mod in place.

D.

Simon Lloyd 12-09-2014 07:33 PM

Hey Gadget Guy, no disrespect, but the mod here is mine and isn't marked as reusable code. Ozzy may post what he's tried or done, but it won't necessarily be added to this mod. However, Ozzy has developed one like this with other measures; you can get it at his site.

If you are being hit by spiders with this mod in place, it will be because there is an anomaly in your list. This isn't exhaustive, but here are a few possible reasons:
-   Entry in list has a leading or trailing space
-   Entry has a typo of some sort
-   Entry doesn't actually represent the bot you think it does (e.g. Ahrefsbot, I believe, has a different name in the UA)
-   Mod run order may conflict with another mod using the same hook
There are other reasons but those should get you going! :)
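
The first three checks in that list can be roughly automated. The Python sketch below audits a ban list against user agents copied from a server log. It assumes substring matching, and the helper `audit_bot_list`, the list entries and the user-agent strings are illustrative, not taken from the mod or from real logs.

```python
def audit_bot_list(entries, observed_user_agents):
    """Return (entry, problem) pairs for list entries that look suspicious."""
    problems = []
    seen = [ua.lower() for ua in observed_user_agents]
    for entry in entries:
        if entry != entry.strip():
            problems.append((entry, "leading or trailing space"))
        name = entry.strip().lower()
        # A typo or a wrong bot name shows up as an entry that matches nothing.
        if name and not any(name in ua for ua in seen):
            problems.append((entry, "matches no observed user agent"))
    return problems


# Hypothetical ban list and user agents as they might appear in a log.
entries = ["Baiduspider ", "Ahrefs-Bot"]
observed = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0)",
    "Mozilla/5.0 (compatible; AhrefsBot/5.0)",
]

for entry, problem in audit_bot_list(entries, observed):
    print(repr(entry), "->", problem)
```

An entry that matches nothing is not necessarily wrong (the bot may simply not have visited yet), so the output is a list of things to look at rather than a verdict. A hook run-order conflict cannot be detected this way.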

CAG CheechDogg 12-09-2014 07:56 PM

Crazy how a leading or trailing space can jack things up lol .. I remember trying to block facebook completely when I first installed your mod here Simon and it wasn't working ..why? the damn trailing space!! lol

Gadget_Guy 12-09-2014 08:14 PM

Sorry Simon,

I meant no disrespect... I thought you guys were working together.

My apologies.

Could I send you my list via PM and a screenshot of what I am seeing in terms of the spiders appearing?

D.

ozzy47 12-09-2014 08:44 PM

Quote:

Originally Posted by Simon Lloyd (Post 2526503)
Hey Gadget Guy, no disrespect, but the mod here is mine and isn't marked as reusable code. Ozzy may post what he's tried or done, but it won't necessarily be added to this mod. However, Ozzy has developed one like this with other measures; you can get it at his site.

If you are being hit by spiders with this mod in place, it will be because there is an anomaly in your list. This isn't exhaustive, but here are a few possible reasons:
-   Entry in list has a leading or trailing space
-   Entry has a typo of some sort
-   Entry doesn't actually represent the bot you think it does (e.g. Ahrefsbot, I believe, has a different name in the UA)
-   Mod run order may conflict with another mod using the same hook
There are other reasons but those should get you going! :)

No I have not developed a mod similar to this. I recommend this mod to everyone. If you remember correctly, I was one of your beta testers way back in the day.

My actual intention with what I have done, was to PM it to you first, and get your input, before I said anything to anyone else, or maybe you update the mod with it.

So let me know what you want me to do. :)

Alan_SP 12-09-2014 09:50 PM

Quote:

Originally Posted by CAG CheechDogg (Post 2526403)

Thank you, I tried to like your post, but there's a limit in this.

Anyway, I see you put this in .htaccess. I think it's better added to the firewall rules for denying hosts; it works much faster. That is, of course, if you have that kind of access to the firewall.

ozzy47 12-09-2014 09:52 PM

Yeah, if you have a dedi, or a vps, you would add it to the firewall there, stop them cold.

Simon Lloyd 12-10-2014 04:35 AM

Quote:

Originally Posted by ozzy47 (Post 2526511)
No I have not developed a mod similar to this. I recommend this mod to everyone. If you remember correctly, I was one of your beta testers way back in the day.

My actual intention with what I have done, was to PM it to you first, and get your input, before I said anything to anyone else, or maybe you update the mod with it.

So let me know what you want me to do. :)

I thought you had developed something; that's why planned improvements for this have been shelved (no point in duplication :)). In that case, Ozzy, PM away :)

Simon Lloyd 12-10-2014 04:37 AM

Quote:

Originally Posted by Gadget_Guy (Post 2526509)
Sorry Simon,

I meant no disrespect... I thought you guys were working together.

My apologies.

Could I send you my list via PM and a screenshot of what I am seeing in terms of the spiders appearing?

D.

Yes of course :)

Max Taxable 12-10-2014 04:45 AM

Quote:

Originally Posted by Simon Lloyd (Post 2526552)
I thought you had developed something; that's why planned improvements for this have been shelved (no point in duplication :)). In that case, Ozzy, PM away :)

No sir, we have been trying to solve the mystery of why Baidu gets through on some v4 installations, but not all, and never on a v3. My hook conflict idea opened a new can of worms for investigation, and Ozz found something very interesting.

ForceHSS 12-10-2014 08:43 AM

Quote:

Originally Posted by Max Taxable (Post 2526554)
No sir, we have been trying to solve the mystery of why Baidu gets through on some v4 installations, but not all, and never on a v3. My hook conflict idea opened a new can of worms for investigation, and Ozz found something very interesting.

What interesting thing was found?

CAG CheechDogg 12-10-2014 09:21 AM

Yeah ...Yeah .. what "something very interesting" do you speak of .....


All times are GMT.
