vb.org Archive

vb_taggregate_temp_... table grow! (https://vborg.vbsupport.ru/showthread.php?t=311785)

Simon Lloyd 06-27-2014 08:31 PM

A lot of bots don't respect robots.txt, but you can follow some of the advice here: http://antezeta.com/news/avoid-search-engine-indexing. For .htaccess, use this:
Code:

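RewriteEngine On

# Answer 403 Forbidden to any request whose User-Agent contains one of these words ([NC] = case-insensitive).
# "anywordyoulike" and "like" are placeholders, not real bots. Note that "like" also matches
# most browser user agents (e.g. "KHTML, like Gecko"), so remove it before using this for real.
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|yandex|anywordyoulike|like|bing) [NC]
RewriteRule .* - [R=403,L]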

Just keep adding names to the string, or use my mod for banning bots.

Your .htaccess file must be in the forum root for this to work (you might have to set your control panel to show hidden files if you don't see it).
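A minimal Python sketch of what the `RewriteCond ... [NC]` test does, assuming Python's `re` behaves like Apache's PCRE for a plain alternation like this one: the pattern is an unanchored, case-insensitive regular expression, so any User-Agent that merely contains one of the words is blocked. The helper names `build_pattern` and `is_blocked` and the sample user-agent strings are illustrative assumptions, not part of the thread. It also shows why the placeholder `like` must not be deployed.

```python
import re

def build_pattern(names):
    """Join bot names into one (a|b|c) alternation, like the group in the RewriteCond.

    re.escape keeps names containing regex metacharacters (such as '.') literal.
    """
    return re.compile("(" + "|".join(re.escape(n) for n in names) + ")", re.IGNORECASE)

def is_blocked(pattern, user_agent):
    # re.search, not re.match: the RewriteCond regex is unanchored,
    # so a match anywhere in the User-Agent header triggers the rule.
    return pattern.search(user_agent) is not None

# Same word list as the .htaccess example, placeholders included.
example = build_pattern(["googlebot", "bingbot", "Baiduspider", "yandex",
                         "anywordyoulike", "like", "bing"])
# Hypothetical narrower list without the placeholders.
safer = build_pattern(["Baiduspider", "yandex"])

# A typical Baiduspider User-Agent and a shortened desktop Chrome User-Agent.
baidu_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

print(is_blocked(example, chrome_ua))  # True: "like" matches "like Gecko"
print(is_blocked(safer, chrome_ua))    # False
print(is_blocked(safer, baidu_ua))     # True
```

Adding a bot is just appending another name to the list; escaping each name keeps a dot in something like a domain name from silently widening the match.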

Ghostt 06-27-2014 08:53 PM

1. I don't want to block all user agents, especially not Google.

I only need to block them from forumdisplay for forum 1101, which I've done with this robots.txt line (I hope it's correct):
Disallow: /*forumdisplay.php?f=1101&order=desc&page=*

But this is only an emergency measure; it doesn't really fix the poor performance of very large forumdisplay pages, and as far as I can see, no one here can really help with that.
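To check what that Disallow line actually covers, here is a minimal Python sketch of the wildcard matching that Google documents and RFC 9309 specifies: `*` matches any run of characters, a trailing `$` anchors the end, and the rule is matched from the start of the path plus query string. The standard library's `urllib.robotparser` only does plain prefix matching, so the helpers `robots_rule_to_regex` and `rule_matches` are hand-rolled, hypothetical names. The sketch suggests the rule only catches URLs whose parameters appear in exactly that order, and that the trailing `*` is redundant because rules are prefix matches anyway. In a real robots.txt the line also has to sit under a `User-agent:` line.

```python
import re

def robots_rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the URL.
    Everything else is matched literally (re.escape handles '?', '.', etc.).
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(rule, path_and_query):
    # re.match anchors only at the start, mirroring robots.txt prefix matching.
    return robots_rule_to_regex(rule).match(path_and_query) is not None

rule = "/*forumdisplay.php?f=1101&order=desc&page=*"

print(rule_matches(rule, "/forumdisplay.php?f=1101&order=desc&page=2"))         # True
print(rule_matches(rule, "/forum/forumdisplay.php?f=1101&order=desc&page=7"))  # True
print(rule_matches(rule, "/forumdisplay.php?f=1101&page=2&order=desc"))        # False: other parameter order
print(rule_matches(rule, "/forumdisplay.php?f=1101&page=2"))                   # False: no order=desc
```

If the forum also links to those pages with the parameters in another order, or without `order=desc`, an obeying crawler would still fetch them; a shorter rule such as `/*forumdisplay.php?f=1101&` would cover more variants.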

I already use your ban-spider add-on, thanks for it, but we sometimes have a problem with it.
If the server crashes, the Cloudflare 502 error page is shown, and when the user refreshes the page, they get blocked/redirected by the add-on.
Do you know why?

Simon Lloyd 06-27-2014 10:14 PM

I would imagine it has to do with Cloudflare caching, as my mod doesn't store any user/visitor details. As for your search engines, try this: put it in your header template (or even in forumdisplay, though I think forumdisplay always uses the header template anyway).
HTML Code:

<if condition="in_array($forumid, array(X,Y,Z)) AND $show['search_engine']">
<meta HTTP-EQUIV="REFRESH" content="0; url=http://www.mysite.com">

</if>

Change X,Y,Z to the forum IDs you want to protect, change mysite.com to whatever URL you want to redirect the spiders to, and you should be golden :)
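The template condition boils down to: emit a refresh tag only when the current forum ID is in the protected list and the visitor was detected as a search engine. Below is a minimal Python sketch of that decision; `protected_ids`, `is_spider` and `redirect_url` are hypothetical stand-ins for `array(X,Y,Z)`, `$show['search_engine']` and the target URL, and vBulletin's own spider detection is not reproduced.

```python
from html import escape

def spider_refresh_tag(forum_id, is_spider, protected_ids, redirect_url):
    """Return the meta-refresh markup the header template would emit, or '' if none.

    Mirrors: <if condition="in_array($forumid, array(X,Y,Z)) AND $show['search_engine']">
    """
    if forum_id in protected_ids and is_spider:
        # quote=True escapes quotes so the URL cannot break out of the attribute.
        url = escape(redirect_url, quote=True)
        return '<meta http-equiv="refresh" content="0; url={}">'.format(url)
    return ""

protected = {1101}  # hypothetical: the forum ID mentioned earlier in the thread
print(spider_refresh_tag(1101, True, protected, "http://www.mysite.com"))
# <meta http-equiv="refresh" content="0; url=http://www.mysite.com">
print(repr(spider_refresh_tag(1101, False, protected, "http://www.mysite.com")))  # ''
print(repr(spider_refresh_tag(5, True, protected, "http://www.mysite.com")))      # ''
```

One design point worth keeping in mind: the refresh tag is only acted on by the client after the server has already generated and sent the page, so it does not by itself avoid the cost of the forumdisplay queries; a robots.txt rule (for bots that obey it) or a server-side 403 stops the request earlier.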

Ghostt 06-27-2014 10:31 PM

Just tested the .htaccess code on the WordPress site on the same server. Connections/sec dropped from 200 to 10
after blocking these bastards: Baiduspider|yandex|anywordyoulike|like

So you're saying my robots.txt rule doesn't always work?
It seems to. But I'll test your code tomorrow and check the queries. Thanks,

Simon Lloyd 06-27-2014 11:29 PM

I put anywordyoulike|like in to show that you can include anything in the string; they're not actual bots ;) Robots.txt is only obeyed by good, legitimate (but maybe unwanted) bots. For the likes of Baidu etc., you can actually contact them and ask them to stop indexing your site.

Ghostt 06-27-2014 11:37 PM

So Baidu ignores robots.txt?

Simon Lloyd 06-28-2014 12:10 AM

It seems that way, as do many others such as AhrefsBot, sosospider, Aboundex and even Bing, to name but a few!

--------------- Added 06-28-2014 at 01:15 AM ---------------

For a more complete .htaccess block list, look here: http://wpsecure.net/bad-bot-list/

Ghostt 06-28-2014 10:53 AM

I tested it. Is SetEnvIfNoCase or RewriteRule better?
I don't think either code from the link works, because I can still see Baidu crawling in Cloudflare, and the first rewrite gives an error.
Can I use the bots listed there in the rewrite you posted here?
Or is there a better list? Your plugin has a very big list, but I don't think it will work with that list because those aren't complete spider names.

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|yandex|bing) [NC]
RewriteRule .* - [R=403,L]
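Two things can be checked about the rule just quoted with a short Python sketch. First, it still lists `googlebot`, `bingbot` and `bing`, which Ghostt said he does not want to block. Second, because the RewriteCond match is an unanchored substring test, a shortened name such as `baidu` already catches `Baiduspider`, so incomplete spider names are not necessarily a problem for this kind of rule (whether the plugin matches the same way is not stated in the thread). The `NARROWER` list and the `matched_token` helper are hypothetical illustrations.

```python
import re

# The alternation from the quoted RewriteCond; [NC] means case-insensitive.
QUOTED = re.compile(r"(googlebot|bingbot|Baiduspider|yandex|bing)", re.IGNORECASE)
# Hypothetical narrower list that leaves Google and Bing alone; "baidu" is deliberately partial.
NARROWER = re.compile(r"(baidu|yandex)", re.IGNORECASE)

def matched_token(pattern, user_agent):
    """Return the User-Agent text that triggered the rule, or None if the request passes."""
    m = pattern.search(user_agent)
    return m.group(1) if m else None

googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
baidu_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

print(matched_token(QUOTED, googlebot_ua))    # Googlebot  (Google would get a 403)
print(matched_token(NARROWER, googlebot_ua))  # None
print(matched_token(NARROWER, baidu_ua))      # Baidu  (partial name still matches)
```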


All times are GMT.
