Remove Bot SIDs from URL Requests
Version: 1.00, by calorie
Developer Last Online: Nov 2023

vB Version: 3.0.3
Released: 09-29-2004
Last Update: Never
Installs: 16
 
No support by the author.

Hack 1: vB303_remove_bot_sids_1.txt

I noticed that some bots send requests with session IDs (SIDs) in the URL. One such bot is msnbot. I don't know how its current code works, but it appears to treat each different SID as a new link. Here is a quick and dirty hack to prevent this. To use this mini hack you need the $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REQUEST_URI'] array elements or their equivalents. The first step of the hack keeps SIDs out of new requests. The second step forces a redirect in order to strip the SIDs from links the bot already remembers. There is no need to apply this hack for bots that have google, slurp@inktomi or yahoo! slurp as part of their user agent. As I said, it is a quick and dirty hack, but it does what I need it to do. If you use this mod, a click of the install button is appreciated.
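
The attached hack file is not reproduced in this thread, so the following is only a rough sketch of what the second step (the redirect) could look like. The helper names `strip_sid` and `bot_redirect_target`, the msnbot-only pattern, the assumption that a SID is an `s=` parameter holding 32 hex characters, and the use of a 301 status are all illustrative choices, not the author's code. It is written for PHP 7.1 or later.

```php
<?php
// Hypothetical helpers; not part of vBulletin or of the attached hack files.

// Assumed list of bots to handle; extend it to match your own traffic.
const BOT_PATTERN = '#(msnbot)#i';

// Remove an "s=<32 hex chars>" parameter (assumed SID format) from a request URI.
function strip_sid(string $uri): string
{
    // Drop the parameter together with the separator that follows it.
    $clean = preg_replace('#([?&])s=[0-9a-f]{32}(&amp;|&|$)#i', '$1', $uri, -1, $count);
    if ($clean === null || $count === 0) {
        return $uri; // nothing matched (or a regex error): leave the URI alone
    }
    // Tidy a dangling "?" or "&" left at the end of the URI.
    return rtrim($clean, '?&');
}

// Return the SID-free URI a bot should be redirected to, or null if no redirect is needed.
function bot_redirect_target(string $userAgent, string $requestUri): ?string
{
    if (!preg_match(BOT_PATTERN, $userAgent)) {
        return null; // ordinary visitors keep their session links
    }
    $clean = strip_sid($requestUri);
    return $clean === $requestUri ? null : $clean;
}
```

Near the top of a page script this might be used as follows, again only as a sketch:

```php
$target = bot_redirect_target($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['REQUEST_URI'] ?? '');
if ($target !== null) {
    header('Location: ' . $target, true, 301);
    exit;
}
```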

Hack 2: vB303_remove_bot_sids_2.txt

To see the list of bots that may appear on the Who's Online list, go to: AdminCP >> vBulletin Options >> Who's Online Options >> Spider Identification Strings & Enable Spider Display & Spider Identification Description

However, according to http://www.vbulletin.com/forum/showthread.php?t=112022, the user agents that don't receive session IDs are hard coded in the sessions.php file. The bots that are hard coded are as follows: google, slurp@inktomi, yahoo! slurp

Thus the bots on the "who's online list" and the bots on the "remove SID list" are currently not the same. This hack removes session IDs for the bots listed in the vBulletin Options rather than for those that were hard coded in the script.

Pages may already have been crawled by a bot that is not hard coded in the "remove SID list", so such a bot may keep spidering with session IDs in its requests. This hack includes an optional step to remove session IDs from those requests via a redirect.
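
As an illustration of the idea behind Hack 2, a one-per-line spider list (the format the vBulletin Options field uses, according to the option text quoted later in this thread) can be turned into a single case-insensitive pattern. This is a sketch under assumptions, not the contents of the attached file: the function names are hypothetical, and how the option value is read from vBulletin is left out. It is written for PHP 7.1 or later.

```php
<?php
// Hypothetical helpers; not vBulletin functions.

// Turn a one-per-line spider list into a case-insensitive regex, or null if the list is empty.
function build_spider_pattern(string $spiderList): ?string
{
    $parts = [];
    foreach (preg_split('/\R/', $spiderList) as $line) {
        $line = trim($line);
        if ($line !== '') {
            // Escape regex metacharacters such as the "!" in "yahoo! slurp".
            $parts[] = preg_quote($line, '#');
        }
    }
    return $parts ? '#(' . implode('|', $parts) . ')#i' : null;
}

// True when the user agent contains any identifier from the list.
function is_listed_spider(string $userAgent, string $spiderList): bool
{
    $pattern = build_spider_pattern($spiderList);
    return $pattern !== null && preg_match($pattern, $userAgent) === 1;
}
```

The resulting check could then stand in for the hard-coded user agent test when deciding whether to put a session ID into URLs.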

Show Your Support

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
  #2  
09-30-2004, 08:25 PM
Dan
Join Date: Dec 2002
Location: Titusville, Florida
Posts: 1,787

Nice hack. Thank you for sharing it with us!
  #3  
10-01-2004, 05:45 PM
zajako
Join Date: Jan 2002
Location: a place not to far away
Posts: 633

So this helps in getting results for MSN Search and some others?

Sorry, I'm just kinda confused.
  #4  
10-01-2004, 08:14 PM
Zachery
Join Date: Jul 2002
Location: Ontario, Canada
Posts: 11,440

All you need to do is add the user agent and its display name in the vBoptions, and it will remove the session.
  #5  
10-02-2004, 06:06 AM
calorie
Join Date: May 2003
Posts: 2,804

If a bot isn't on a remove SID list when it first crawls, it gets a SID, and when it comes back to respider, the SID is in the respider request because the bot was assigned one initially. What is suggested may strip SIDs from new bot requests, but for respider requests the bot remembers the SID it was given, so the SID is still in the request. In my situation, I didn't have msnbot on a remove SID list, so after some time the bot was making quite a lot of respider requests for the same pages but with different SIDs. By the time I realized what was happening, I was out a good chunk of bandwidth, so this is the type of situation where I think this hack is useful.
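
One way to spot this pattern in your own logs is to group requested URIs by their SID-free form and count the distinct SIDs seen for each page. The sketch below assumes the request URIs have already been pulled out of the access log into an array, and that a SID is an `s=` parameter with 32 hex characters. The function name is hypothetical.

```php
<?php
// Hypothetical helper: count how many distinct SIDs were requested per page.
function count_sids_per_page(array $requestUris): array
{
    $sidsByPage = [];
    foreach ($requestUris as $uri) {
        // Skip requests that carry no SID at all.
        if (!preg_match('#[?&]s=([0-9a-f]{32})(?=&|$)#i', $uri, $m)) {
            continue;
        }
        // Reduce the URI to its SID-free form so repeats of one page group together.
        $page = rtrim(preg_replace('#([?&])s=[0-9a-f]{32}(&|$)#i', '$1', $uri), '?&');
        $sidsByPage[$page][strtolower($m[1])] = true;
    }
    // Map each page to the number of different SIDs it was requested with.
    return array_map('count', $sidsByPage);
}
```

A page that shows up with many different SIDs from the same bot is being fetched over and over as if each SID were a separate link.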
  #6  
10-02-2004, 12:36 PM
Zachery
Join Date: Jul 2002
Location: Ontario, Canada
Posts: 11,440

I don't think your logic is correct. I never had msnbot on the spiders list (until after I noticed it on my forums). I checked its full location originally and it did display a session ID. When I checked it again yesterday (now displaying as the MSNBot), it did not have a session ID.
  #7  
10-02-2004, 03:49 PM
calorie
Join Date: May 2003
Posts: 2,804

I'm not talking about how bots show up in the who's online list. The only place in the vB code where I see sessions removed for bots is in the sessions.php file:

PHP Code:
// automatically determine whether to put the sessionhash into the URL
if (sizeof($_COOKIE) > 0 OR preg_match("#(google|slurp@inktomi|yahoo! slurp)#si", $_SERVER['HTTP_USER_AGENT']))
{
    // they have at least 1 cookie, so they should be accepting them
    $nosessionhash = 1;
    $shash = $session['sessionhash'] = '';
    $surl = $session['sessionurl'] = '';
    $surlJS = $session['sessionurl_js'] = '';
}
else
{
    $nosessionhash = 0;
    $shash = $session['sessionhash'];
    $surl = $session['sessionurl'] = 's=' . $session['sessionhash'] . '&';
    $surlJS = $session['sessionurl_js'] = 's=' . $session['sessionhash'] . '&';
}

If I had included msnbot in the preg_match statement initially, msnbot would not have had SIDs in its requests.

Because msnbot was not in the preg_match statement, msnbot had SIDs in all requests until I applied this hack.
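
To make that concrete, here is the same decision wrapped in a small standalone function with msnbot added to the pattern. The function name `session_url_prefix` is hypothetical and the wrapper is only an illustration of the logic, not vBulletin's actual code. It is written for PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the decision quoted above; not a vBulletin function.
function session_url_prefix(array $cookies, string $userAgent, string $sessionHash): string
{
    // Same hard-coded list as in sessions.php, with msnbot added as an illustration.
    $noSidBots = '#(google|slurp@inktomi|yahoo! slurp|msnbot)#si';

    // Visitors with cookies and listed bots get no session hash in their URLs.
    if (count($cookies) > 0 || preg_match($noSidBots, $userAgent)) {
        return '';
    }
    return 's=' . $sessionHash . '&';
}
```

With msnbot in the pattern, a cookie-less msnbot request gets an empty prefix, so the links on the pages it fetches carry no SID.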

Please check this thread. Maybe it explains it better than I can.

BTW, are you checking your raw server access logs? I don't see how putting msnbot in vBoptions removes the SID from requests.
  #8  
10-02-2004, 04:35 PM
Zachery
Join Date: Jul 2002
Location: Ontario, Canada
Posts: 11,440

There is a section in the vBulletin 3 options area that lets you specify which user agents are spiders. Once they are defined as spiders, they no longer ever get a session ID.

AdminCP > vBulletin Options > Who's Online Options > Spider Identification Strings & Spider Identification Description


Enter an unique identifier for each Search Engine spider that you wish to recognize. This should be something unique to the spider's HTTP USER AGENT. Please place one per line. Case is not important and the previous option needs to be enabled for identification to occur

Enter the text that you wish to display for each of the above spiders on Who's Online. You need to place the spiders description on the same line as the spider's identifier above. For example, if you place 'google' as the third spider above, place 'Google' on the third line to the right.
  #9  
10-02-2004, 09:04 PM
calorie
Join Date: May 2003
Posts: 2,804

Please read this vB.com thread. It indicates that the "who's online list" and the "remove SID list" are not the same. That thread is dated from August. Has something in the vB code changed since August? Where in the code are bot SIDs removed from the who's online list?
  #10  
10-02-2004, 09:43 PM
nexialys
Guest
Posts: n/a

Yes, it would be better if that filter for the WOL were applied globally to the SID list, so we can really filter what kind of spider can browse the site... I have built that feature in IPB, so I suppose it's easy to do for vB!