  #21  
06-27-2014, 08:31 PM
Simon Lloyd
Join Date: Aug 2008
Location: Manchester
Posts: 3,481

A lot of bots don't recognise robots.txt, but it is still worth following some of the advice here: http://antezeta.com/news/avoid-search-engine-indexing. For .htaccess, use this:
Code:
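# Return 403 Forbidden to any user agent that matches one of the names below.
# "anywordyoulike" and "like" are placeholders: remove them before use
# ("like" also matches the "like Gecko" token sent by most browsers).
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|yandex|anywordyoulike|like|bing) [NC]
RewriteRule .* - [R=403,L]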
Just keep adding names to the string, or use my mod for banning bots.

Your .htaccess file must reside in the forum root for this (you might have to set your control panel to show hidden files if you don't see it).
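
Before putting a pattern like this into .htaccess, it can help to check offline which user agents it would catch. The following is only a rough Python sketch, not part of Apache: it imitates the unanchored, case-insensitive match that a RewriteCond with [NC] performs. The sample user-agent strings and the names POSTED, TRIMMED, is_blocked and blocked_agents are assumptions made up for this illustration.

```python
import re

# Pattern copied from the RewriteCond above, placeholders included.
POSTED = r"(googlebot|bingbot|Baiduspider|yandex|anywordyoulike|like|bing)"
# Same pattern with the two placeholder words removed.
TRIMMED = r"(googlebot|bingbot|Baiduspider|yandex|bing)"

# Sample user agents (illustrative values, not taken from the thread).
SAMPLE_AGENTS = {
    "chrome": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
    "baidu": "Mozilla/5.0 (compatible; Baiduspider/2.0; "
             "+http://www.baidu.com/search/spider.html)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
}


def is_blocked(pattern: str, user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the user agent.

    re.IGNORECASE stands in for the [NC] flag; re.search is unanchored,
    like a RewriteCond pattern written without ^ or $.
    """
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


def blocked_agents(pattern: str) -> list[str]:
    """Names of the sample agents the pattern would block, sorted."""
    return sorted(name for name, ua in SAMPLE_AGENTS.items()
                  if is_blocked(pattern, ua))


if __name__ == "__main__":
    print("posted :", blocked_agents(POSTED))
    print("trimmed:", blocked_agents(TRIMMED))
```

With these samples, the pattern as posted also catches the ordinary Chrome user agent, because the placeholder "like" matches "like Gecko". The trimmed pattern catches only the two crawlers.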

  #22  
06-27-2014, 08:53 PM
Ghostt
Join Date: Oct 2009
Posts: 359

1. I don't want to block all user agents, and especially not Google.

I only need to block them from the forumdisplay pages of forum 1101, which I have done with this robots.txt line (I hope it is correct):
Disallow: /*forumdisplay.php?f=1101&order=desc&page=*

But this is an emergency solution, because it is not a real fix for the bad performance of very big forumdisplay pages.
And as far as I can see, no one can really help here.

I already use your ban spider addon. Thanks for it, but we sometimes have a problem with it.
If the server crashes, the cloudflare.com 502 error page is shown. After the user refreshes the page, the user gets blocked/redirected by the addon.
Do you know why?
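
One way to check what a wildcard Disallow line covers is to translate it into a regular expression and try a few URLs. The sketch below is in Python and assumes the wildcard semantics that the large search engines document: `*` matches any run of characters and a trailing `$` anchors the end of the URL. Crawlers that only do plain prefix matching will not interpret the line this way. The helper names and the test URLs are made up for this illustration.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path with wildcards into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Matching starts at the beginning of the
    URL path, because robots.txt rules are prefix matches.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    parts = (re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(".*".join(parts) + ("$" if anchored else ""))


def is_disallowed(rule_path: str, url_path: str) -> bool:
    """True if url_path (path plus query string) falls under the rule."""
    return rule_to_regex(rule_path).match(url_path) is not None


# The Disallow value from the post above.
RULE = "/*forumdisplay.php?f=1101&order=desc&page=*"

if __name__ == "__main__":
    for path in (
        "/forumdisplay.php?f=1101&order=desc&page=2",
        "/forum/forumdisplay.php?f=1101&order=desc&page=15",
        "/forumdisplay.php?f=1101&page=2",
        "/forumdisplay.php?f=1101",
    ):
        print(path, "->", is_disallowed(RULE, path))
```

Under these assumptions the rule covers only URLs whose query string contains f=1101&order=desc&page= in exactly that order. The first two paths are disallowed; the last two (no order=desc, or no page parameter at all) are not.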

  #23  
06-27-2014, 10:14 PM
Simon Lloyd
Join Date: Aug 2008
Location: Manchester
Posts: 3,481

I would imagine it has to do with Cloudflare caching, as my mod doesn't store any user/visitor details. As for your search engines, try this: put the following in your header template (or even the forumdisplay template, but I think it always uses the header template anyway).
HTML Code:
<if condition="in_array($forumid, array(X,Y,Z)) AND $show['search_engine']">
<meta http-equiv="refresh" content="0; url=http://www.mysite.com" />
</if>
Change X,Y,Z to whichever forum IDs you want to protect, change mysite.com to any URL you want to redirect the spiders to, and you should be golden.

  #24  
06-27-2014, 10:31 PM
Ghostt
Join Date: Oct 2009
Posts: 359

I just tested the .htaccess code on the WordPress site on the same server. Connections/sec dropped from 200 to 10
after blocking these bots: Baiduspider|yandex|anywordyoulike|like

So you are saying my robots.txt rule will not always work?
It seems to work, but I will test your code tomorrow and check the queries. Thanks.

  #25  
06-27-2014, 11:29 PM
Simon Lloyd
Join Date: Aug 2008
Location: Manchester
Posts: 3,481

I put anywordyoulike|like in there to show that you can include anything in the string; they're not actual bots. Robots.txt is only obeyed by good, legitimate (but maybe unwanted) bots. For the likes of Baidu etc., you can actually get in contact with them and ask them to stop indexing your site.

  #26  
06-27-2014, 11:37 PM
Ghostt
Join Date: Oct 2009
Posts: 359

So Baidu ignores robots.txt?

  #27  
06-28-2014, 12:10 AM
Simon Lloyd
Join Date: Aug 2008
Location: Manchester
Posts: 3,481

It seems that way, as do many others, like AhrefsBot, sosospider, Aboundex and even Bing, to name but a few!

--------------- Added 06-28-2014 at 01:15 AM ---------------

For a more complete .htaccess block list, look here: http://wpsecure.net/bad-bot-list/

  #28  
06-28-2014, 10:53 AM
Ghostt
Join Date: Oct 2009
Posts: 359

I tested it. Which is better, SetEnvIfNoCase or rewrite rules?
I think neither snippet from the link works, because in Cloudflare I still see Baidu crawling. The first rewrite snippet gives an error.
Can I use the bots listed there in the rewrite rule you posted here?
Or is there a better list? Your plugin has a very big list, but I think it will not work here because the entries are not complete spider names.

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|yandex|bing) [NC]
RewriteRule .* - [R=403,L]
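
Since the rule is just a list of names joined with `|`, a longer list can be turned into the same two lines mechanically. Here is a minimal Python sketch of that idea. The function name build_block_rules is made up for this illustration, the bot names in the usage example are ones mentioned in this thread, and names containing whitespace are rejected rather than guessed at, since an unquoted space would split the RewriteCond argument.

```python
import re


def build_block_rules(bot_names: list[str]) -> str:
    """Return .htaccess lines that answer 403 to the listed user agents.

    Each name is regex-escaped so characters such as '.' or '+' are
    matched literally. Duplicates (ignoring case) and empty entries are
    dropped. Names containing whitespace raise ValueError.
    """
    seen = set()
    escaped = []
    for raw in bot_names:
        name = raw.strip()
        if not name or name.lower() in seen:
            continue
        if any(ch.isspace() for ch in name):
            raise ValueError(f"bot name contains whitespace: {name!r}")
        seen.add(name.lower())
        escaped.append(re.escape(name))
    if not escaped:
        raise ValueError("at least one bot name is required")
    pattern = "|".join(escaped)
    return "\n".join([
        "RewriteEngine On",
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        "RewriteRule .* - [R=403,L]",
    ])


if __name__ == "__main__":
    print(build_block_rules(["Baiduspider", "yandex", "AhrefsBot", "sosospider"]))
```

Because the match is unanchored and case-insensitive, a partial name such as yandex is enough to catch every user agent containing it. That also means short or generic fragments can catch more than intended.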
All times are GMT.

