The vB 3.0.6 code in includes/sessions.php still *cough* hard-codes which bots get SIDs removed from their requests:
- as of vB 3.0.3: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.4: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.5: (google|slurp@inktomi|yahoo! slurp)
- as of vB 3.0.6: (google|msnbot|yahoo! slurp)
This means that setting WOL bots via vBoptions does not automatically imply removal of SIDs from every bot request.
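To make the difference between the versions concrete, here is a minimal sketch that checks a few sample user agents against the 3.0.5 and 3.0.6 patterns listed above. The patterns are copied from that list; how vB applies them internally is an assumption on my part, and the helper name zzzz_matchesPattern and the sample user agent strings are only for illustration.

```php
<?php
// Patterns copied from the version list above.
// ASSUMPTION: vB matches them case-insensitively against the user agent.
$patterns = array(
    '3.0.5' => 'google|slurp@inktomi|yahoo! slurp',
    '3.0.6' => 'google|msnbot|yahoo! slurp',
);

// Sample user agent strings, for illustration only.
$agents = array(
    'msnbot/1.0 (+http://search.msn.com/msnbot.htm)',
    'Gigabot/2.0',
    'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)',
);

// Hypothetical helper: true if the agent contains one of the listed bot names.
function zzzz_matchesPattern($pattern, $agent) {
    return preg_match('#(' . $pattern . ')#i', $agent) === 1;
}

foreach ($patterns as $version => $pattern) {
    foreach ($agents as $agent) {
        $result = zzzz_matchesPattern($pattern, $agent) ? 'match   ' : 'no match';
        echo $version, '  ', $result, '  ', $agent, "\n";
    }
}
```

With these samples, the msnbot agent matches only the 3.0.6 pattern, the Yahoo! Slurp agent matches both, and Gigabot matches neither, whatever you have set for WOL bots.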
Note that WOL settings and SID removal are two different things, as of the last time I checked (see this thread).
As much as Zachery is a sweetie, as of vB 3.0.6, setting WOL bots via vBoptions does not automatically remove SIDs from every bot request.
Both hack1 and hack2 as posted should still work for vB 3.0.3 through vB 3.0.6. I briefly looked at the datastore, but hack2 still uses a query.
Also note that, although MSNbot was added in includes/sessions.php as of vB 3.0.6, that will not prevent MSNbot (or any other bot) from making requests with SIDs if said bot has already requested pages using SIDs, since it will keep coming back to the SID URLs it already knows.
That is where the optional portion of the hacks comes into play! I have modified my optional portion, which goes at the start of includes/init.php, as shown below. Of course, you could just as well pull the code in with a PHP include.
Now, you need to realize that the below code is rather 'buttoned down' in that listed bots can only crawl forumdisplay, showthread, printthread, and index, and only certain query string type pieces related to those pages.
I worked my optional portion this way because I have no need for bots to consider, for example, showthread.php?t=xyz&page=a&pp=A different from showthread.php?t=xyz&page=a&pp=B, index.php? different from index.php, etcetera.
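As a rough illustration of that idea (this is not the hack itself, which follows further down), here is a small sketch that boils a request URI down to an allowed page plus a short whitelist of query keys. The function name zzzz_canonicalUri and the key whitelist (t, f, page) are my assumptions for this sketch; the real code does the job with a single regular expression and also handles the archive style URLs, which this sketch does not.

```php
<?php
// Hypothetical helper, not part of the hack: reduce a request URI to an
// allowed page plus the few query keys a bot is meant to see.
function zzzz_canonicalUri($uri) {
    $allowedPages = array('forumdisplay', 'showthread', 'printthread', 'index');
    $allowedKeys  = array('t', 'f', 'page'); // ASSUMPTION: whitelist for this sketch

    $parts = parse_url($uri);
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    $page  = basename($path, '.php');
    if (!in_array($page, $allowedPages, true)) {
        return null; // page is not crawlable for bots in this sketch
    }

    // keep only whitelisted keys with purely numeric values
    $kept = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        foreach ($allowedKeys as $key) {
            if (isset($query[$key]) && is_string($query[$key]) && ctype_digit($query[$key])) {
                $kept[$key] = $query[$key];
            }
        }
    }
    return $kept ? $path . '?' . http_build_query($kept) : $path;
}

// pp=25 is dropped, so both pp variants collapse to the same URI
echo zzzz_canonicalUri('/forum/showthread.php?t=12&page=2&pp=25'), "\n";
// a bare trailing "?" is dropped as well
echo zzzz_canonicalUri('/forum/index.php?'), "\n";
```

A bot requesting anything that does not equal its canonical form would then get a 301 to the canonical form, so it only ever indexes one URL per page.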
To my mind, robots.txt, meta tag options, etcetera, are not quite flexible enough, and bots do not respond to them fast enough. Rather, I choose to 'button down that hatch', so to speak, with forced 301s as shown below.
Of course, the below code does not preclude the use of a .htaccess file (your OS willing), so the way you decide to handle bots is ultimately up to you.
Code:
/*************************************************************************************************************************************************************************/
// are $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REQUEST_URI'] defined on your server?
// if the answer is no, do not apply this hack, as this hack needs those $_SERVER elements
// is your vB forum located at http://www.your-domain.com/index.php on your server?
// if the answer is yes, do not apply this hack, as this hack only works for forums located
// at http://www.your-domain.com/your-forum-dir/index.php
// what is your domain uri - no ending slash
$zzzz_domain_tld = "http://www.YOUR-DOMAIN.COM";
// what are your forum directories - separate with | character - begin slash - no ending slash
$zzzz_forum_dirs = "/forum|/forum/archive";
// what forum pages to allow - separate with | character - no extension as .php is assumed
// note: at max you can allow forumdisplay, showthread, printthread, index - no showpost, etcetera
$zzzz_forum_pages = "forumdisplay|showthread|printthread|index";
// what bots to redirect - separate with | character - bot name must be part of the bot user agent
$zzzz_redirect_bots = "msnbot|gigabot|yahoo|google|jeeves|bot|crawl|seek|wisenut|teoma";
/*************************************************************************************************************************************************************************/
$zzzz_pages_allowed = "(($zzzz_forum_dirs)/($zzzz_forum_pages)\.php((/|[?])?([a-z]+[=][a-z]+[&])?([tf][=-][0-9]+([&](page)[=][0-9]+)?([-][p][-][0-9]+)?)?(\.html)?)?)";
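// only act on requests whose user agent contains one of the listed bot names
if (preg_match("#($zzzz_redirect_bots)#si",$_SERVER['HTTP_USER_AGENT'])) {
    // strip any session hash (s= or sessionhash=) and 301 to the clean uri
    // note: the replace needs the same i flag as the match, else an upper case hash redirects to itself
    if (preg_match("#(s|sessionhash)=[a-z0-9]{32}?&?#si",$_SERVER['REQUEST_URI'])) {
        $zzzz_destination = preg_replace("/(s|sessionhash)=[a-z0-9]{32}?&?/si","",$_SERVER['REQUEST_URI']);
        zzzz_doRedirect($zzzz_domain_tld,$zzzz_destination);
    }
    // allowed page: $zzzz_regs[6] is a leading word=word& piece, $zzzz_regs[12] is whatever trails the allowed part
    if (eregi("$zzzz_pages_allowed(.*)",$_SERVER['REQUEST_URI'],$zzzz_regs)) {
        if (!empty($zzzz_regs[6])) {
            $zzzz_destination = eregi_replace($zzzz_regs[6],"",$zzzz_regs[1]);
        }
        elseif (!empty($zzzz_regs[12])) {
            $zzzz_destination = $zzzz_regs[1];
        }
        if (!empty($zzzz_regs[6]) || !empty($zzzz_regs[12])) {
            // drop a bare page name such as index.php so the directory uri is used instead
            $zzzz_destination = eregi_replace("($zzzz_forum_pages)\.php[?]?$","",$zzzz_destination);
            zzzz_doRedirect($zzzz_domain_tld,$zzzz_destination);
        }
    }
    // anything outside the forum dirs and allowed pages goes to the domain root
    if (!eregi("(($zzzz_forum_dirs)/?$|$zzzz_pages_allowed)",$_SERVER['REQUEST_URI'])) {
        zzzz_doRedirect($zzzz_domain_tld,"");
    }
    // drop a dangling ? at the end of the uri
    if (eregi("(.*)[?]$",$_SERVER['REQUEST_URI'],$zzzz_regs)) {
        zzzz_doRedirect($zzzz_domain_tld,$zzzz_regs[1]);
    }
}
// send a permanent redirect to $zzzz_domain_tld$zzzz_destination and stop the script
function zzzz_doRedirect($zzzz_domain_tld,$zzzz_destination) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: $zzzz_domain_tld$zzzz_destination");
    exit();
}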
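Before dropping the above into includes/init.php, you may want to try the SID pattern against a few sample URIs from the command line. The following is only a sketch: zzzz_stripSid is a hypothetical helper that is not part of the hack, the sample URIs are made up, and the pattern carries the i flag so that mixed case hashes are stripped too.

```php
<?php
// Hypothetical test helper: same SID pattern as the hack, applied to any URI.
function zzzz_stripSid($uri) {
    return preg_replace('/(s|sessionhash)=[a-z0-9]{32}&?/i', '', $uri);
}

$sid = str_repeat('a1b2c3d4', 4); // any 32 character hash will do here

// made-up sample URIs with the SID in different positions
$samples = array(
    "/forum/showthread.php?s=$sid&t=12",
    "/forum/showthread.php?t=12&s=$sid",
    "/forum/index.php?s=$sid",
);

foreach ($samples as $uri) {
    echo $uri, "\n  -> ", zzzz_stripSid($uri), "\n";
}
```

Note that when the SID is the last or the only parameter, a trailing & or ? is left behind. The hack cleans up the dangling ? with its final check on the follow-up request.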