vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 4.x Add-ons (https://vborg.vbsupport.ru/forumdisplay.php?f=245)
-   -   Miscellaneous Hacks - Ban Spiders by User Agent (https://vborg.vbsupport.ru/showthread.php?t=268208)

fly 02-11-2013 05:20 PM

Quote:

Originally Posted by Max Taxable (Post 2403548)
Amazon AWS is their hosting they sell. And yes they also crawl the web: http://aws.amazon.com/search-engines/

I have it blocked as well, using this Mod.

Did you read that? Nowhere on that page does it say that Amazon itself crawls websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?

I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. A crawler isn't required to send a UA at all; it's just considered polite. If someone wanted to, they could send a completely random UA every time, or not send one at all.

Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".

Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.
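To make the IP-block idea concrete, here is a minimal Python sketch. It is not part of this mod, which only looks at user agents. The function name `is_blocked_ip` and the network list are assumptions for illustration; the ranges shown are reserved documentation addresses, not real AWS blocks, so a real list would have to come from the provider.

```python
import ipaddress

# Hypothetical block list for illustration only. These are reserved
# documentation ranges, NOT real AWS ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address; leave the decision to other checks.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Unlike a UA check, this does not depend on anything the visitor chooses to send, but the list has to be kept up to date by hand.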

Simon Lloyd 02-11-2013 07:58 PM

For reference, here's what a user agent is, plus some extra info: http://en.wikipedia.org/wiki/User_agent. All this mod is designed to do is stop bots from eating up your bandwidth by redirecting them before any content loads. To be honest, you can never stop anyone who is intent on scraping your site from doing so.
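Roughly, the check described above boils down to something like the following. This is only a sketch in Python, not the mod's actual code (the mod itself is a vBulletin plugin); the ban list, the `check_user_agent` name and the redirect target are made up for illustration.

```python
# Hypothetical ban list; substrings are matched case-insensitively.
BANNED_UA_SUBSTRINGS = ["baiduspider", "amazonaws"]

# Placeholder destination for banned bots.
REDIRECT_TARGET = "http://example.com/"


def check_user_agent(user_agent):
    """Return (status, headers) for a banned UA, or None to let the request through."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BANNED_UA_SUBSTRINGS):
        # Redirect before any page content is generated.
        return 301, {"Location": REDIRECT_TARGET}
    return None
```

A request with no UA at all passes this check, which is exactly the limitation mentioned above: the UA is whatever the client decides to send.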

Max Taxable 02-11-2013 11:00 PM

Quote:

Originally Posted by fly (Post 2403551)
Did you read that? Nowhere on that page does it say that Amazon itself crawls websites. Can you even explain why a hosting company would want to catalog data from every website on the internet?

I'm wondering if there is some confusion about what a user agent is and does. The UA is the remote web crawler's way of telling you that it is there cataloging your site. A crawler isn't required to send a UA at all; it's just considered polite. If someone wanted to, they could send a completely random UA every time, or not send one at all.

Since Amazon AWS is in the hosting business, they have no need to crawl websites at all. However, this doesn't PREVENT people from buying their own server from Amazon and crawling your website. If someone were to do this, the UA would be whatever they wanted it to be, not some form of "AmazonAWS".

Assuming what you're really trying to do is prevent anyone from buying a server from Amazon and accessing your website, you'll need to find all the IP blocks that AWS owns and block those. However, that is outside the scope of this mod.

The "amazonaws" crawlers have that designation in their UA string. Anything else coming from Amazon has it in its host description.

The rest of your missive, I am well aware of.
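The host-description point can be checked separately from the UA. A rough Python sketch, assuming the visitor's reverse DNS name ends in amazonaws.com; the helper names `hostname_matches` and `reverse_lookup` are hypothetical and this is not something the mod is stated to do.

```python
import socket


def hostname_matches(hostname: str, suffix: str = "amazonaws.com") -> bool:
    """Return True if hostname is the suffix itself or a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == suffix or host.endswith("." + suffix)


def reverse_lookup(remote_addr: str) -> str:
    """Return the reverse DNS name for remote_addr, or '' if the lookup fails."""
    try:
        return socket.gethostbyaddr(remote_addr)[0]
    except OSError:
        return ""
```

Matching on ".amazonaws.com" rather than a bare substring avoids catching unrelated hosts such as notamazonaws.com. Reverse lookups add a DNS query per visitor, so results would normally be cached.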

fly 02-12-2013 12:55 AM

ok.

Inspector G 03-03-2013 01:39 AM

I have a confusing question...
I have a very small member site, around 24 members.
When I noticed I had 35 users online most of the time and started seeing more and more Baidu spiders, I decided to do something about it.

I installed this mod.
Almost instantly (well, within about 3 hours) my users online soared to well over 150 at busy times, like right now, tonight.
I had:
Most users ever online was 247, 1 Day Ago at 12:58 AM.

That is with only one new account created, and maybe me or one other registered user online.

My question is this: what happened when I installed this mod to cause such a drastic change in the number of users on my site, and why?

I do not understand this, and I read that the server load increases.
I find it hard to believe that anyone is finding my site via a search engine, since it is a brand new .cc name and has only been online for two months.

Is there something about pushing Baidu away that lets more sites or spam bots come in?
They are attempting to register and so on, and many are in areas where a normal user would not be.

I see many attempts at registering and yet no new users, so I believe those are bots...

Please advise...

Simon Lloyd 03-03-2013 03:42 AM

What's happening (and you'll probably find this) is that because Baidu can't get in with the spiders/IPs they were using, they are now trying a rotation of other IPs and bots. I use this mod myself, although I don't ban the bots, as I monitor their visits to further enhance any mod I make against them. I currently have 236 Baidu bots (and 140 other bots/search engines) at my site.

With the mod in place and redirection working, you'll find that the bots you have banned will slowly drop off as they all get the message from the 301 permanent redirect to wherever you've decided to send them. Your server load will lessen and things will be more normal :)

Simon Lloyd 03-03-2013 03:44 AM

Also, do you have your robots.txt set up correctly, so that the search engines and bots that obey robots.txt don't index pages on your site that they shouldn't, like register.php, members.php, etc.?

Inspector G 03-03-2013 03:59 AM

I did not understand how to do the text part, since I am what even I would call very green in this aspect of vBulletin...
so I just installed the mod...
I can wait and see if it drops off and report back...
Thanks for the help in understanding...

Simon Lloyd 03-03-2013 05:34 AM

1 Attachment(s)
OK, what you need to do is upload the attached file to your forum root. If your forum is at www.mysite.com/forums, you can just upload it to that folder. If your forum is at www.mysite.com/, edit the attached file first to remove /forums.

You can add any page or file to robots.txt that you wish, just follow the same structure :)
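The attachment itself isn't reproduced in this archive, so the rules below are a guess at its structure based on the pages named earlier (register.php, members.php) and a forum living under /forums. The Python snippet just feeds those hypothetical rules to the standard library's robots.txt parser to show how a well-behaved bot would read them. Note that crawlers request robots.txt from the top level of the host (www.mysite.com/robots.txt), so the paths inside it need the /forums prefix when the forum is in a subfolder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; not the actual attachment.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/register.php
Disallow: /forums/members.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot that obeys robots.txt would skip the first URL and fetch the second.
print(parser.can_fetch("AnyBot", "http://www.mysite.com/forums/register.php"))
print(parser.can_fetch("AnyBot", "http://www.mysite.com/forums/showthread.php?t=1"))
```

Adding another page means adding another Disallow line in the same form. This only affects bots that choose to honor robots.txt; the rest still need the mod.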

Inspector G 03-03-2013 05:56 AM

Well thanks Simon...
That's really nice...
I will do so immediately.
Nice to see someone really help out the Noob...lol
Thanks again I appreciate this very much...
I will report back.


All times are GMT. The time now is 12:37 AM.
