Remove Bot SIDs from URL Requests
Version: 1.00, by calorie
Developer Last Online: Nov 2023

vBulletin Version: 3.0.3
Released: 09-29-2004
Last Update: Never
Installs: 16
 
No support by the author.

Hack 1: vB303_remove_bot_sids_1.txt

I have noticed that some bots make requests with session IDs (SIDs) in the URL. One such bot is msnbot. I don't know how the current code behind this bot works, but it seems to treat each different SID as a new link. Here is a quick and dirty hack to prevent this. Your server must provide the $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REQUEST_URI'] array elements, or their equivalents, to use this mini hack. The first step of the hack keeps SIDs out of new requests. The second step forces a redirect in order to strip the SIDs from links the bot has already stored. There is no need to apply this hack for bots that have google, slurp@inktomi, or yahoo! slurp as part of their user agent. Like I said, it is a quick and dirty hack, but it does what I need it to do. If you use this mod, a click of the install button is appreciated.
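To make the two steps concrete, here is a minimal sketch of the SID-stripping idea. It is not the code from the attached hack file: the function name zzzz_strip_sid() is hypothetical, and it assumes the session ID is passed as an s= or sessionhash= parameter holding a 32-character hash.

```php
<?php
// Hypothetical helper, for illustration only (not part of the attached hack files).
// Removes an "s=" or "sessionhash=" parameter carrying a 32-character hash from a URI.
function zzzz_strip_sid($uri)
{
    // Match the parameter only when it directly follows "?" or "&",
    // and take the "&" that follows it (if any) along with it.
    $clean = preg_replace('#(?<=[?&])(?:s|sessionhash)=[a-z0-9]{32}(?:&|$)#i', '', $uri);

    // Remove a dangling "?" or "&" left at the end of the URI.
    return rtrim($clean, '?&');
}
```

A redirect step could then compare the result with $_SERVER['REQUEST_URI'] and, only if the two differ, send a 301 to the cleaned address so that the bot replaces the link it has stored.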

Hack 2: vB303_remove_bot_sids_2.txt

To see the list of bots that may appear on the Who's Online list, go to: AdminCP >> vBulletin Options >> Who's Online Options >> Spider Identification Strings & Enable Spider Display & Spider Identification Description

However, according to http://www.vbulletin.com/forum/showthread.php?t=112022, the user agents that don't receive session IDs are hard coded in the sessions.php file. The hard-coded bots are: google, slurp@inktomi, yahoo! slurp

Thus the bots in the "who's online list" and the bots in the "remove SID list" are currently not the same. This hack removes session IDs for the bots listed in the vBulletin Options rather than only for those hard coded in the script.

Pages may already have been crawled by a bot that is not hard coded in the "remove SID list", so such a bot may keep spidering with session IDs in its requests. This hack includes an optional step that removes session IDs from such requests via a redirect.
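As a rough sketch of the matching idea behind Hack 2, the function below checks a user agent against a list of spider identification strings. It is not the code from the attached file: zzzz_is_listed_bot() is a hypothetical name, the list is passed in as plain text rather than read from vBulletin, and it assumes the strings are stored one per line.

```php
<?php
// Hypothetical helper, for illustration only (not part of the attached hack files).
// $spider_strings is assumed to hold one identification string per line.
function zzzz_is_listed_bot($user_agent, $spider_strings)
{
    foreach (preg_split('/\r\n|\r|\n/', $spider_strings) as $needle) {
        $needle = trim($needle);
        // Skip blank lines, then look for the string anywhere in the user agent,
        // ignoring case.
        if ($needle !== '' && stripos($user_agent, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

With a check like this, any bot added to the list is treated the same way as the hard-coded ones, with no need to edit the script for each new bot.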

This modification may not be copied, reproduced or published elsewhere without the author's permission.

Comments
#12
10-02-2004, 11:25 PM
Erwin
Join Date: Jan 2002
Posts: 7,604

Someone should integrate the two. It would not be too hard.

#13
10-03-2004, 03:42 AM
calorie
Join Date: May 2003
Posts: 2,804

Okay, the second hack in the first post of this thread uses the bots listed in the vBulletin Options.

#14
10-09-2004, 04:25 PM
AlexanderT
Join Date: Mar 2003
Posts: 294

calorie, thank you! Just by accident, I noticed that the two IPs using most of my bandwidth these days were msnbot and jetbot. The logs revealed that they were constantly browsing my forum with new session strings. A nightmare!

Note, though, that for vB303_remove_bot_sids_2 you could probably use the datastore cache, thus saving one costly query.

#15
11-13-2004, 12:26 AM
BamaStangGuy
Join Date: Mar 2004
Location: Alabama
Posts: 521

I'm really confused... what is msnbot's unique identifier, and what do I need to do so that it doesn't get SIDs?

#16
11-19-2004, 10:54 PM
T2DMan
Join Date: Apr 2004
Location: Auckland, New Zealand
Posts: 81

I am always apprehensive about adding hacks when they add load to the server (more lines of code). But this one looks like it will reduce the number of pages the spiders download, so it should mean less bandwidth and less server load from the bots.

Good hack.

#17
11-22-2004, 07:05 AM
ChuanSE
Join Date: Feb 2003
Posts: 311

So, what is the final conclusion about all this?

Is there a hack or update available?

thx

#18
02-03-2005, 09:05 AM
agiacosa
Join Date: Dec 2004
Posts: 208

Has this been resolved?

#19
02-03-2005, 09:20 AM
agiacosa
Join Date: Dec 2004
Posts: 208

Instructions say "$zzzz_domain_tld = "http://www.yourdomain.tld"; //////////// CONFIGURE THIS VARIABLE - NO ENDING SLASH *************"

Is it www.mydomain.tld or www.mydomain.com?

#20
02-10-2005, 06:24 AM
calorie
Join Date: May 2003
Posts: 2,804

The vB 3.0.6 code in includes/sessions.php still *cough* hard codes which bots have SIDs removed from their requests.

- as of vB 3.0.3: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.4: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.5: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.6: (google|msnbot|yahoo! slurp)

This means that setting WOL bots via vBoptions does not automatically imply removal of SIDs from every bot request.
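The gap can be illustrated with a small sketch that tests a user agent against the hard-coded strings listed above. How includes/sessions.php actually applies its list is not shown in this thread, so the case-insensitive substring match and the name zzzz_in_hardcoded_list() are assumptions made for illustration.

```php
<?php
// Hard-coded strings as listed above, keyed by vBulletin version.
$zzzz_hardcoded = array(
    '3.0.3' => 'google|slurp@inktomi|yahoo! slurp',
    '3.0.4' => 'google|slurp@inktomi|yahoo! slurp',
    '3.0.5' => 'google|slurp@inktomi|yahoo! slurp',
    '3.0.6' => 'google|msnbot|yahoo! slurp',
);

// Hypothetical helper, for illustration only: true if the user agent contains
// one of the hard-coded strings for the given version (assumed to be a
// case-insensitive substring match).
function zzzz_in_hardcoded_list($patterns, $version, $user_agent)
{
    if (!isset($patterns[$version])) {
        return false;
    }
    return preg_match('#(' . $patterns[$version] . ')#i', $user_agent) === 1;
}
```

Under these assumptions, msnbot is only covered from vB 3.0.6 onward, and a bot such as gigabot is not covered in any of the listed versions, even if it has been added to the WOL list in vBoptions.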

Note that WOL settings versus SID removal are two different things, as of the last time I checked (see this thread).

For as much as Zachery is a sweetie, as of vB 3.0.6, WOL bots via vBoptions do not automatically remove SIDs from every bot request.

Both hack1 and hack2 as posted should still work for vB 3.0.3 through vB 3.0.6. I briefly looked at the datastore, but hack2 still uses a query.

Also note that, although MSNbot was added in includes/sessions.php as of vB 3.0.6, it will not prevent MSNbot (or any other bot) from making requests with SIDs if said bot has already requested pages using SIDs.

That is where the optional portion of the hacks comes into play! I have modified my optional portion, to be placed at the start of includes/init.php, as shown below. Of course, you could PHP include the code just the same.

Now, you need to realize that the code below is rather 'buttoned down': listed bots can only crawl forumdisplay, showthread, printthread, and index, and only with certain query string pieces related to those pages.

I worked my optional portion this way because I have no need for bots to consider, for example, showthread.php?t=xyz&page=a&pp=A different from showthread.php?t=xyz&page=a&pp=B, index.php? different from index.php, etcetera.

In my mind, robots.txt and meta tags options, etcetera, are not quite flexible enough, and do not have a fast enough response. Rather, I choose to 'button down that hatch' so to speak with forced 301s as shown below.

Of course, the code below does not preclude the use of a .htaccess file (your OS willing), so the way you decide to handle bots is ultimately up to you.

Code:
/*************************************************************************************************************************************************************************/

// are $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REQUEST_URI'] defined on your server?
// if the answer is no, do not apply this hack, as this hack needs those $_SERVER elements

// is your vB forum located at http://www.your-domain.com/index.php on your server?
// if the answer is yes, do not apply this hack, as this hack only works for forums located 
// at http://www.your-domain.com/your-forum-dir/index.php

// what is your domain uri - no ending slash
$zzzz_domain_tld = "http://www.YOUR-DOMAIN.COM";

// what are your forum directories - separate with | character - begin slash - no ending slash
$zzzz_forum_dirs = "/forum|/forum/archive";

// what forum pages to allow - separate with | character - no extension as .php is assumed
// note: at max you can allow forumdisplay, showthread, printthread, index - no showpost, etcetera
$zzzz_forum_pages = "forumdisplay|showthread|printthread|index";

// what bots to redirect - separate with | character - bot name must be part of the bot user agent
$zzzz_redirect_bots = "msnbot|gigabot|yahoo|google|jeeves|bot|crawl|seek|wisenut|teoma";

/*************************************************************************************************************************************************************************/

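// pattern for the requests that listed bots may crawl: an allowed page in an allowed
// directory, optionally followed by the query string pieces that are kept
$zzzz_pages_allowed = "(($zzzz_forum_dirs)/($zzzz_forum_pages)\.php((/|[?])?([a-z]+[=][a-z]+[&])?([tf][=-][0-9]+([&](page)[=][0-9]+)?([-][p][-][0-9]+)?)?(\.html)?)?)";

// session id parameter - the same case-insensitive pattern is used for the test and
// for the replace, so an upper-case hash cannot pass the test yet survive the replace
$zzzz_sid_pattern = "#(s|sessionhash)=[a-z0-9]{32}&?#i";

if (preg_match("#($zzzz_redirect_bots)#i", $_SERVER['HTTP_USER_AGENT'])) {
    // strip session ids from the request
    if (preg_match($zzzz_sid_pattern, $_SERVER['REQUEST_URI'])) {
        $zzzz_destination = preg_replace($zzzz_sid_pattern, "", $_SERVER['REQUEST_URI']);
        zzzz_doRedirect($zzzz_domain_tld, $zzzz_destination);
    }
    // strip query string pieces that are not kept from requests for allowed pages
    if (preg_match("#$zzzz_pages_allowed(.*)#i", $_SERVER['REQUEST_URI'], $zzzz_regs)) {
        if (!empty($zzzz_regs[6])) {
            // $zzzz_regs[6] is a "name=value&" piece in front of the t= or f= id
            $zzzz_destination = str_replace($zzzz_regs[6], "", $zzzz_regs[1]);
        }
        elseif (!empty($zzzz_regs[12])) {
            // $zzzz_regs[12] is whatever trails the allowed part of the request
            $zzzz_destination = $zzzz_regs[1];
        }
        if (!empty($zzzz_regs[6]) || !empty($zzzz_regs[12])) {
            $zzzz_destination = preg_replace("#($zzzz_forum_pages)\.php[?]?$#i", "", $zzzz_destination);
            zzzz_doRedirect($zzzz_domain_tld, $zzzz_destination);
        }
    }
    // send requests for anything else to the domain root
    if (!preg_match("#(($zzzz_forum_dirs)/?$|$zzzz_pages_allowed)#i", $_SERVER['REQUEST_URI'])) {
        zzzz_doRedirect($zzzz_domain_tld, "");
    }
    // strip a trailing question mark
    if (preg_match("#(.*)[?]$#", $_SERVER['REQUEST_URI'], $zzzz_regs)) {
        zzzz_doRedirect($zzzz_domain_tld, $zzzz_regs[1]);
    }
}

// send a permanent redirect to $zzzz_domain_tld . $zzzz_destination and stop
function zzzz_doRedirect($zzzz_domain_tld, $zzzz_destination) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: $zzzz_domain_tld$zzzz_destination");
    exit();
}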
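
To get a feel for what the allowed-pages pattern above keeps, the following sketch runs it against a few sample requests. It reuses the example directory and page settings from the configuration section, and zzzz_kept_part() is a hypothetical helper written only for this illustration. It is not part of the hack.

```php
<?php
// Example settings, copied from the configuration section of the code above.
$zzzz_forum_dirs  = "/forum|/forum/archive";
$zzzz_forum_pages = "forumdisplay|showthread|printthread|index";
$zzzz_pages_allowed = "(($zzzz_forum_dirs)/($zzzz_forum_pages)\.php((/|[?])?([a-z]+[=][a-z]+[&])?([tf][=-][0-9]+([&](page)[=][0-9]+)?([-][p][-][0-9]+)?)?(\.html)?)?)";

// Hypothetical helper, for illustration only: returns the part of the request
// matched by the allowed-pages pattern, or null if the page is not allowed.
function zzzz_kept_part($pattern, $uri)
{
    if (preg_match("#$pattern(.*)#i", $uri, $regs)) {
        return $regs[1];
    }
    return null;
}

$zzzz_samples = array(
    "/forum/showthread.php?t=123&page=2&pp=15",
    "/forum/archive/index.php/t-123.html",
    "/forum/member.php?u=1",
);
foreach ($zzzz_samples as $zzzz_uri) {
    // var_export() prints the kept part as a quoted string, or NULL.
    echo $zzzz_uri . " => " . var_export(zzzz_kept_part($zzzz_pages_allowed, $zzzz_uri), true) . "\n";
}
```

In the first sample the trailing &pp=15 falls outside the kept part, which is why a listed bot asking for that address is redirected to the same thread page without it. The third sample is not an allowed page at all, so the hack would send the bot to the domain root.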

#21
02-28-2005, 12:09 AM
cellardoor
Join Date: Oct 2004
Posts: 2

I'm confused :ermm: