vb.org Archive

vBulletin 2.x Full Releases > Spider friendly URLs (https://vborg.vbsupport.ru/showthread.php?t=18035)

buro9 07-17-2003 08:13 PM

I'm getting a bit carried away by this now, and someone needs to stop me :)

I've applied the same idea to assist with reducing bandwidth :)

In my .htaccess file I've added this:

Code:

  #
  # avatar.php rewriting
  #
  # av1-1053412959.gif = userid + dateline
RewriteRule ^av([0-9]+)-([0-9]+)\.gif$ avatar.php?userid=$1&dateline=$2 [L]

  #
  # attachment.php rewriting
  #
  # atp157156.gif = postid + extension
RewriteRule ^atp([0-9]+)\.([a-z]+)$ attachment.php?postid=$1 [L]
  # att157156.gif = attachmentid + extension
RewriteRule ^att([0-9]+)\.([a-z]+)$ attachment.php?attachmentid=$1 [L]

And you should be able to figure the rest out ;)

Simply replace every reference to avatar.php?userid=$post[userid]&dateline=$post[dateline] (and its variations) with av$post[userid]-$post[dateline].gif

Don't worry about the extension: the PHP script returns the correct MIME type, and that's what's important.

Then change the postbit_attachment template so that the attachment URL is: atp$post[postid].$post[attachmentextension]
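
For anyone who wants to sanity-check the patterns before touching .htaccess, here is a rough JavaScript sketch of the mapping. The function names are made up for illustration and are not part of vBulletin or the hack; the regexes mirror the three rules above, with the dot before the extension escaped.

```javascript
// Hypothetical helpers: build the friendly URLs the templates would emit.
function avatarUrl(userid, dateline) {
  return 'av' + userid + '-' + dateline + '.gif';
}

function attachmentUrl(postid, extension) {
  return 'atp' + postid + '.' + extension;
}

// Rough simulation of what mod_rewrite would hand to PHP for a given path.
// Returns null when none of the three rules match.
function resolveFriendlyUrl(path) {
  var m;
  if ((m = /^av([0-9]+)-([0-9]+)\.gif$/.exec(path))) {
    return 'avatar.php?userid=' + m[1] + '&dateline=' + m[2];
  }
  if ((m = /^atp([0-9]+)\.([a-z]+)$/.exec(path))) {
    return 'attachment.php?postid=' + m[1];
  }
  if ((m = /^att([0-9]+)\.([a-z]+)$/.exec(path))) {
    return 'attachment.php?attachmentid=' + m[1];
  }
  return null;
}
```

Note that the extension in the attachment URLs is matched but thrown away, which is why it does not matter what you put there.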

Note that this also sidesteps a bug in Mozilla whereby downloading a zip file served from a PHP page prompts you to save it with a .php extension rather than .zip.

I've been hacking for sure, and again I don't recall clearly every change I made. But if you got the gist of everything else in this thread then I've no doubt you can do this.

Essentially the point is that a lack of querystring allows the browser and proxies/caches to cache the avatars and attachments.

This obviously reduces bandwidth... and also reduces database load.

I have high hopes for this little addition to this very fine hack ;)

buro9 07-20-2003 08:33 PM

One last bug fix.

If you use lots of standard avatars... then the page navigation over pages of avatars from member.php will be broken (you'll only ever get the first page)... so you will also need to insert this:

Code:

RewriteCond %{QUERY_STRING} ^(.*)-([0-9]+)\.html$
RewriteRule ^member\.php$ member.php?%1&pagenumber=%2 [L]

That will be after lierduh's other corrective rewrites.
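
As a rough illustration of what the condition and rule do to the query string, here is a JavaScript sketch. The function name and the sample query string in the usage note are invented for the example; the real query strings depend on how your earlier rewrites are set up.

```javascript
// Hypothetical simulation of the RewriteCond/RewriteRule pair for member.php.
// %1 is everything before the final "-<number>.html", %2 is the page number.
function rewriteMemberQuery(queryString) {
  var m = /^(.*)-([0-9]+)\.html$/.exec(queryString);
  if (!m) {
    return null; // condition not met, the rule does not fire
  }
  return 'member.php?' + m[1] + '&pagenumber=' + m[2];
}
```

For a made-up query string such as foo=bar-2.html, this hands member.php the original parameters plus pagenumber=2.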

Erwin 07-21-2003 02:14 AM

Btw, I am doing something similar in vB3 - it's a lot easier, since sessionhash is coded by itself now. :)

Ogmuk 07-22-2003 12:58 AM

Thanks for the terrific job on the great hack, guys. I'm one of the few who doesn't need to have their forums added to search engines, but it's great for everyone who uses Google AdSense to gain some revenue. I have a question: does anyone know how to get the sessionhashes removed from the navbits? i.e.:
My_vbulletin_board > Some_Forum > This_is_my_post
These are listed near the top of each forum- or post-level page. The same problem affects the Forum Jump menu at the bottom left and no doubt many other pages, but in my experience these are the most commonly used ones.

This is not interesting for those who use this hack purely to get their pages Googled, but for those who are using it for AdSense, sessionhashes nearly always mean that you'll get charity placeholders instead of banners (since the crawler thinks it hasn't cached the page and so can't generate an appropriate banner).

It also looks as though any link that sends you back to the index by a URL other than http://www.mywebsite.com/forums/index.php (e.g. http://www.mywebsite.com/forums/index.php?s= or just http://www.mywebsite.com/forums/) generates placeholder banners too. Perhaps this can be avoided as well?

buro9 07-23-2003 04:56 AM

My reason for all of this is AdSense... I'm not bothered about spidering at all.

I manually removed all mentions of 'sessionhash' as appropriate throughout the whole codebase (php & templates).

There are a few subtle ones that linger, for example in the replacement variables for the styles: modify the header to remove the sessionhash from the main image and core navigation.

The page nav bit is buried in admin/functions.php and you can remove the sessionhash from there.

I also then adjusted all of my user options and registration forms to remove the option to not use cookies. And modified the FAQ to say that cookies are compulsory.

buro9 07-23-2003 04:59 AM

One thing to point out is that even when you successfully remove all sessionhashes... Google spiders still visit with one!

I think their software has learnt vb and just compensates and discards. But this didn't bother me because the lack of sessionhashes and querystrings does help with being cached by proxies (the particularly dumb ones that AOL seem to use). So there is a benefit to it... but not as much as you think there will be.

Erwin 07-23-2003 10:04 AM

Quote:

Today at 03:59 PM buro9 said this in Post #276
One thing to point out is that even when you successfully remove all sessionhashes... Google spiders still visit with one!
What do you mean?

buro9 07-23-2003 11:21 AM

Oh no, my mistake :)

In my online.php there was still a place where a sessionhash was being echoed and I incorrectly thought that the spiders were using a hash... but they're not... it's just the display to me of where the spider is that inserted the hash.

Ignore that last bit :)

Which is good... as now it clearly is working better than I thought.

Ogmuk 07-23-2003 11:38 AM

Thanks for the reply Buro9. I wonder if it's possible to get a step-by-step guide how to remove all the sessionhashes on every page where it is needed. If you or anyone has that amount of spare time of course ;)

filburt1's beta script looks like something that could work with AdSense too, this might be interesting to look into. Did anyone try anything like this out for AdSense?

EDIT: VB3 works like a charm with AdSense. I can't wait for RC1 (just like nearly everyone else here).

buro9 07-23-2003 03:35 PM

Found another bug:

The admin function to merge threads did not work, because your thread URLs are now of a different format.

postings.php and the action 'domergethread' expected a URL with 'threadid=' in it. But if you've followed all instructions (!) your formats are more similar to:

http://www.bowlie.com/forum/t5249.html
and
http://www.bowlie.com/forum/t5249-15-3.html

So, to fix this, do this:

FIND (in postings.php):
Code:

$getthreadid=intval(substr($mergethreadurl,strpos($mergethreadurl,"threadid=")+9));
And replace with:
Code:

  // HACK : START : SPIDER FRIENDLY URLS
  //$getthreadid=intval(substr($mergethreadurl,strpos($mergethreadurl,"threadid=")+9));
  $getthreadid = intval(preg_replace("/(^.*\/t)|(-[\d]+-[\d]+)|(\.html)/", "", $mergethreadurl));
  // HACK : END : SPIDER FRIENDLY URLS

If your format is slightly different, modify the regexp pattern slightly ;)

All it's doing is stripping out the threadid from the new format URL and putting that in the variable in the same way the old code did.
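
If you want to try the pattern against your own URL format before editing postings.php, here is a rough JavaScript equivalent. The function name is made up for this sketch; the regex is the same one as in the PHP above, and the g flag stands in for preg_replace replacing every match.

```javascript
// Hypothetical JavaScript equivalent of the preg_replace + intval line.
// Strips everything up to the last "/t", any "-<n>-<n>" suffix and ".html",
// then converts what is left to an integer (0 if it is not numeric).
function extractThreadId(mergeThreadUrl) {
  var stripped = mergeThreadUrl.replace(/(^.*\/t)|(-\d+-\d+)|(\.html)/g, '');
  return parseInt(stripped, 10) || 0;
}
```

Both http://www.bowlie.com/forum/t5249.html and http://www.bowlie.com/forum/t5249-15-3.html should come out as thread id 5249.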

As you'll note, I always leave old code lying around commented out in case I ever want to roll back... it's just my style... but if you trust my work you can delete that line.

I also always leave those START and END blocks in, so I can see what the hell I changed and why :)



Ogmuk, simply put... I started by using the template search to find all templates with 'sessionhash' in them. Then I edited each and every template (nigh on all of them) and removed the applicable code... which usually boils down to:

Code:

s=$session[sessionhash]
And that is buried in nearly all URLs and also in some hidden form fields.

Once removed from all templates, I then searched through all .php files in the root of the forum directories, and similarly replaced all sessionhashes. EXCEPT where I found $dbsession[sessionhash] as this was usually being written TO the cookie and wasn't being echoed.

You will have to read through each instance, but it's obvious that if it's appearing in a URL you can strip it out... but if it's in code then you'll probably want to keep it there.
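
To show what stripping it out of a URL amounts to, here is a small JavaScript sketch. It is only an illustration of the edit, not a tool from this thread: the function name is invented, it handles URLs only (not the hidden form fields), and it is no substitute for reading each instance by hand as described above.

```javascript
// Hypothetical helper: remove "s=$session[sessionhash]" from a template URL.
function stripSessionHash(template) {
  var token = 's=\\$session\\[sessionhash\\]';
  return template
    // "?s=...&" at the start of a query string: keep the "?"
    .replace(new RegExp('\\?' + token + '&(amp;)?', 'g'), '?')
    // "&s=..." in the middle or at the end of a query string
    .replace(new RegExp('&(amp;)?' + token, 'g'), '')
    // "?s=..." as the only parameter: drop the "?" as well
    .replace(new RegExp('\\?' + token, 'g'), '');
}
```

Because the pattern starts with a literal s=$session[, anything using $dbsession[sessionhash] is left alone.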


And lastly... I have AdSense running on my site, and thought I'd share this last tip for you:

AdSense advises you not to place adverts on pages that you have to be logged on to view, or on search results pages. The former is because they'll never correctly spider it and serve relevant adverts (I bet you see ones for password cracking and security!), and the latter is because the pages change too frequently and by the time they're spidered they're useless. Both in effect will show inappropriate or public service adverts which do nothing for your revenue... and lower your click-throughs by increasing impressions... and also generate server load by sending too many spiders your way.

So... I've written some JavaScript to only put adverts on pages that I know I WANT to show AdSense adverts on... here it is for you :)

Code:

<script type="text/javascript">
var adPages = new Array(
  'forumdisplay.php',
  'index.php',
  'announcement.php',
  'showthread.php',
  'calendar.php',
  'donate.php',
  'misc.php',
  'memberlist.php',
  'vbstats.php',
  'member.php',
  'forum/f',
  'forum/t'
);
var returnAdvert = false;
var pageString = new String();
pageString = document.location.href;
for (var ii = 0; ii < adPages.length; ii++) {
  if (pageString.indexOf(adPages[ii]) >= 0) {
    returnAdvert = true;
    break;
  }
}
if (returnAdvert == false && document.location.href == "http://www.bowlie.com/forum/") {
  returnAdvert = true;
}
if (returnAdvert == true) {
  var google_ad_client='pub-9576666925012421';
  var google_ad_width=468;
  var google_ad_height=60;
  var google_ad_format='468x60_as';
  document.write('<scr'+'ipt type="text/javascr'+'ipt" language="JavaScr'+'ipt" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></scr'+'ipt>');
}
</script>

All you need to do is change my forum path to your full forum path and put in your ad_client code (otherwise I get your money!).
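
If you want to check your own page list before putting it live, the decision in the script above can be pulled out into a function along these lines. This is a sketch only: the function name and its parameters are my own naming for illustration, not part of the original script.

```javascript
// Hypothetical refactoring of the page check: returns true when adverts
// should be shown for the given page address.
//   href      - the full address of the current page
//   adPages   - list of substrings that mark pages you want adverts on
//   forumRoot - the bare forum address, which contains no page name
function shouldShowAds(href, adPages, forumRoot) {
  if (href === forumRoot) {
    return true;
  }
  for (var i = 0; i < adPages.length; i++) {
    if (href.indexOf(adPages[i]) >= 0) {
      return true;
    }
  }
  return false;
}
```

Bear in mind the match is a plain substring test, so a short entry like 'forum/f' will also match any other page under the forum path whose name starts with f.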

Hope all of that info helps everyone.

Cheers

David K

