Google sitemap for the vB Archives. Redirect human and robots.
Version: 1.2, by lierduh

Version: 3.5.1
Released: 08-09-2005 Last Update: 11-08-2005 Installs: 130
Uses Plugins, Code Changes, Additional Files
No support by the author.

Release V1.2 (9 Nov 2005)
* Threads with new posts are given a higher sitemap priority, so Google can index fresh threads first.

* The original optional STEP 3 hack is no longer recommended. To avoid a potential Google penalty, my advice is to remove the STEP 3 hack.
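
The priority change in V1.2 could look roughly like the sketch below. The actual formula in the script is not given in this post, so the function name, the 30-day window and the 0.2 floor are assumptions chosen only for illustration. Sitemap priority values range from 0.0 to 1.0.

```python
from datetime import datetime, timedelta

def sitemap_priority(last_post, now, window_days=30, floor=0.2):
    """Map a thread's last-post time to a sitemap <priority> value.

    Hypothetical formula: a thread posted in right now gets 1.0, and the
    value falls linearly to `floor` once the last post is `window_days` old.
    """
    age_days = max((now - last_post).total_seconds() / 86400.0, 0.0)
    freshness = max(1.0 - age_days / window_days, 0.0)
    return round(floor + (1.0 - floor) * freshness, 1)

now = datetime(2005, 11, 9)
print(sitemap_priority(now, now))                        # fresh thread
print(sitemap_priority(now - timedelta(days=15), now))   # about two weeks old
print(sitemap_priority(now - timedelta(days=90), now))   # stale thread
```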

Release V1.1a (12 Oct 2005)

* Bug fix only

Release V1.1 (9 Oct 2005)

* Can handle very large forums with more than 50,000 URLs per forum.
URLs are spread across multiple sitemap files for each large forum.

* Created a function to detect search engine crawlers. The vB built-in
search engine detector can only identify about 3 or 4 search engines.
My function detects over 20 search engine crawlers.

* Supports forums hosted on web servers that do not support 'fix_pathinfo',
i.e. forums whose archive links look like 'archive/index.php?f-10.html'
instead of the usual 'archive/index.php/f-10.html'.

* Alerts about wrong directory permissions to help newbies.

* Automatically writes the index file to the archive directory if the PHP
script cannot write into the base vB directory.

* Bug fixes.
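
As an illustration of the 50,000-URL handling mentioned above, here is a minimal sketch in Python (the hack itself is a PHP script). The file naming scheme and the thread URL pattern are assumptions, not taken from the script.

```python
def split_urls(urls, limit=50000):
    """Split a list of URLs into chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

def sitemap_filenames(forum_id, chunk_count):
    """Hypothetical naming scheme: one numbered sitemap file per chunk."""
    return ["sitemap_f%d_%d.xml.gz" % (forum_id, n)
            for n in range(1, chunk_count + 1)]

# A large forum with 120,000 thread URLs (the URL pattern is assumed).
urls = ["http://forums.mysite/archive/index.php/t-%d.html" % t
        for t in range(1, 120001)]
chunks = split_urls(urls)
print([len(c) for c in chunks])
print(sitemap_filenames(10, len(chunks)))
```

Each chunk would become its own sitemap file, and all of the files would then be listed in the sitemap index.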


Objectives
==============
  • Create Google sitemap files and a sitemap index file for the vB archives, and submit them to Google through the Scheduled Tasks.
  • Use the vB Archive as a mirror of the actual threads.
  • Google loves the nature of the archive pages, as they are static and do not contain repeated content.
  • Google ranks pages heavily based on external links, so we need to redirect crawlers that follow external thread links to the archive pages.
  • We often see vBulletin archive pages in the Google search results, and users who click them land on the archive page instead of the actual thread. We need to redirect these visitors automatically to the actual threads. Otherwise the visitor either needs to click again for the Full Version or read the dull archive content.

Q and A
==============
Q. Would the sitemap contain the links for hidden forums?
A. No, the forum permissions are consulted while generating the sitemap files.

Q. How often are the sitemap files generated?
A. You decide, and set it in the Scheduled Tasks. By default the script cannot be called by external users, to prevent bored people from killing your server.

Q. Is the sitemap file compressed?
A. Yes, the individual sitemap files are gzipped according to the Google sitemap standard to save bandwidth. The sitemap index file is not compressed; it is submitted as a normal XML file.

Q. Would the sitemaps include links for the normal threads? e.g. showthread.php?t=1234...
A. No. It is unlikely that Google will index your entire site if you feed it every combination of showthread links. It is better to let Google go through the more static archives. This way you have a better chance of getting more thread content indexed by Google.

Q. Why don't you go crazy with rewrite rules and do things like including the thread title in the URL?
A. I won't deny that having keywords in the URL is a good SEO strategy, but Google also does not like "Over Search Engine Optimized" web sites. Google has recently penalized a huge number of such sites, sending them from a page rank of 5 or 6 to 0.

Q. Does a sitemap really help?
A. Definitely. Google has crawled over 60,000 pages since I submitted my sitemaps a few days ago. Yahoo bots were visiting more pages than Google before the sitemap. I expect the total Google visits for this month to exceed Yahoo's in the next one or two days.
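
To make the compression answer above concrete, here is a minimal Python sketch of one gzipped sitemap file plus an uncompressed sitemap index. It is not the hack's PHP code. The function names and the file name are hypothetical, and the namespace is the current sitemaps.org 0.9 one, which the 2005 script predates.

```python
import gzip
from xml.sax.saxutils import escape

# Assumption: current sitemaps.org namespace, not the one used in 2005.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return the XML of one sitemap file as bytes; entries are (url, priority)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % NS]
    for loc, priority in entries:
        lines.append("  <url><loc>%s</loc><priority>%.1f</priority></url>"
                     % (escape(loc), priority))
    lines.append("</urlset>")
    return "\n".join(lines).encode("utf-8")

def build_index(sitemap_urls):
    """Return the XML of the sitemap index as bytes (left uncompressed)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="%s">' % NS]
    for loc in sitemap_urls:
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(loc))
    lines.append("</sitemapindex>")
    return "\n".join(lines).encode("utf-8")

entries = [("http://forums.mysite/archive/index.php/f-10.html", 0.5)]
plain = build_sitemap(entries)
compressed = gzip.compress(plain)   # this is what a .xml.gz file would hold
index_xml = build_index(["http://forums.mysite/sitemap_f10_1.xml.gz"])
print(index_xml.decode("utf-8"))
```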

What is involved?
==================
I have divided this hack into two steps. The first step involves uploading a PHP file. This enables the sitemap to be generated and submitted to Google.

The second step involves installing a Plugin using AdminCP. This sends all robots to the archive pages, preventing them from viewing the actual threads.

For example, Google or another crawler follows an external link to visit:
http://forums.mysite/showthread.php?t=1234&page=2

It will be told that this page has permanently moved to:
http://forums.mysite/archive/index.php/t-1234-p-2

This way you don't lose the page rank gained from external links.
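
The redirect described above can be sketched as follows. This is an illustration in Python, not the plugin's PHP code. The crawler list is a deliberately short, hypothetical one (the hack's own detector is said to recognise over 20 crawlers), and all function names are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical, incomplete list of user-agent substrings.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot", "teoma")

def is_crawler(user_agent):
    """Return True if the user agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def archive_url(thread_url, use_path_info=True):
    """Translate a showthread.php URL into its archive URL, or return None."""
    parts = urlsplit(thread_url)
    query = parse_qs(parts.query)
    thread_id = query.get("t", [""])[0]
    if not parts.path.endswith("/showthread.php") or not thread_id.isdigit():
        return None
    slug = "t-" + thread_id
    page = query.get("page", ["1"])[0]
    if page.isdigit() and page != "1":
        slug += "-p-" + page
    # Servers without path-info support use '?' instead of '/'.
    sep = "/" if use_path_info else "?"
    base = parts.path[:-len("showthread.php")]
    return "%s://%s%sarchive/index.php%s%s" % (
        parts.scheme, parts.netloc, base, sep, slug)

def respond(thread_url, user_agent):
    """Return (status, location): 301 for crawlers, 200 for everyone else."""
    target = archive_url(thread_url)
    if target and is_crawler(user_agent):
        return 301, target
    return 200, None

url = "http://forums.mysite/showthread.php?t=1234&page=2"
print(respond(url, "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(respond(url, "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.0"))
```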

Install
=========
To install, follow the readme file.
To let me know you have installed this, and so that I can send you update information, please click INSTALL.

Strategy
=========

It is unlikely that Google or any other search engine will index your entire site, especially given the dynamic nature of vBulletin forums. An archive sitemap lets Google concentrate on the real content of your forums: the threads. If Google has to go through the endless member profile pages, it will get sick of it and just become tired (sorry, perhaps robots cannot become tired). What we can do is disallow the crawling of unnecessary pages. My robots.txt contains:

#ALL BOTS
User-agent: *
Disallow: /admincp/
Disallow: /ajax.php
Disallow: /attachments/
Disallow: /clientscript/
Disallow: /cpstyles/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /modcp/
Disallow: /subscriptions/
Disallow: /customavatars/
Disallow: /customprofilepics/
Disallow: /announcement.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /external.php
Disallow: /faq.php
Disallow: /frm_attach
Disallow: /image.php
#Disallow: /index.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php?
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /payment_gateway.php
Disallow: /payments.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
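
One way to sanity-check rules like these before deploying them is Python's standard urllib.robotparser module. The sketch below loads only a small subset of the list above, and the host name is the placeholder used elsewhere in this post.

```python
from urllib.robotparser import RobotFileParser

# A small subset of the rules above, enough to demonstrate the check.
RULES = """\
User-agent: *
Disallow: /admincp/
Disallow: /search.php
Disallow: /showpost.php
Disallow: /printthread.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def allowed(path):
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "http://forums.mysite" + path)

for path in ("/archive/index.php/f-10.html",
             "/showthread.php?t=1234",
             "/search.php",
             "/admincp/index.php"):
    print(path, allowed(path))
```

The archive and showthread URLs stay crawlable, while the disallowed scripts and directories do not.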

You may have noticed index.php in the list (it is commented out in the copy above). Apparently Google regards http://forums.mysite/index.html as the same as http://forums.mysite/
...but http://forums.mysite/index.php as a different file. The default vB templates use index.php for internal links. That will split your home page's page rank between the two URLs! So it is better not to let Google see this file.

If you have mod_rewrite installed, you could perhaps add to the .htaccess file:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index\.php$ / [R=301,L]

(If your forums are under http://site/forums/, try: RewriteRule ^forums/index\.php$ /forums/ [R=301,L])

That will redirect /index.php to /, but only if no query string is present, i.e. /index.php?do=mymod will not be redirected.
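
The effect of the condition and rule above can be modelled with a few lines of Python. This is only a model of the matching logic under the assumption that the forums live at the site root; the function name is hypothetical and no Apache behaviour beyond the two lines above is simulated.

```python
import re

def index_redirect(path, query_string):
    """Return (301, "/") for a bare /index.php request, otherwise None."""
    if query_string == "" and re.fullmatch(r"/index\.php", path):
        return 301, "/"
    return None

print(index_redirect("/index.php", ""))          # redirected
print(index_redirect("/index.php", "do=mymod"))  # left alone
print(index_redirect("/showthread.php", ""))     # left alone
```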

This modification may not be copied, reproduced or published elsewhere without the author's permission.

Comments
#42
09-03-2005, 08:36 PM
David_R
Join Date: Mar 2005
Location: Los Angeles
Posts: 212

Quote:
Originally Posted by Yorixz
It doesn't support vB 3.x, that's why it's in the vB 3.5 category

Hi,
something similar exists among the vB 3.5 extensions, but it supports 3.x as well. Can you compare the features of these two hacks?

#43
09-04-2005, 08:46 AM
Yorixz
Join Date: Jun 2005
Location: Netherlands
Posts: 284

Quote:
Originally Posted by David_R
Hi,
something similar exists among the vB 3.5 extensions, but it supports 3.x as well. Can you compare the features of these two hacks?

I'm afraid I can't, since I can't get this hack working on my forum right now. Once it's running here I'll post my experience for you.

#44
09-07-2005, 10:12 AM
vauge
Join Date: Oct 2004
Posts: 114

I really, really like this idea, but is the below a concern? I do not want to do anything negative to my forums.

Quote:
Originally Posted by Google
Don't employ cloaking or sneaky redirects.
http://www.google.com/webmasters/guidelines.html

#45
09-07-2005, 11:11 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Quote:
Originally Posted by vauge
I really, really like this idea, but is the below a concern? I do not want to do anything negative to my forums.

http://www.google.com/webmasters/guidelines.html

A 301 redirect is Google's preferred way. Google penalizes web sites with duplicated content. So if you have two URLs showing the same content, Google prefers that you redirect from one URL to the other. Having both will attract a penalty.

What we do here is not sneaky. We have the actual content, we just want Google to show one version of it. We do not want Google to give us a higher page rank than the pages are actually worth, we just want Google to index the actual content instead of looping through the endless internal links.

I moved my forums to a new domain a few weeks ago (just before I released this hack). So far 150,000 pages have already been indexed by Google. Without a sitemap, Yahoo has indexed just over 20,000 pages.

#46
09-07-2005, 11:14 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Quote:
Originally Posted by Yorixz
This mod looked very interesting, install went fine but now when I try to run it I get this errors:
Code:
Warning: fopen(/home/ftpusers/otfans/html/forums/g_sitemap.xml) [function.fopen]: failed to open stream: Permission denied in /archive/forums_sitemap.php on line 245
What am I doing wrong? I'll be very glad to hear it

As the error suggests, permission is denied in the '/home/ftpusers/otfans/html/forums/' directory. That is the base vB directory, which I described very clearly as "the same directory where you find showthread.php etc.".

People keep asking which directory name to change. I do not know, because it can be anything. In your case it is 'forums', for others it may be 'public_html'...

#47
09-12-2005, 04:49 AM
KarateKid
Join Date: Oct 2001
Location: Sydney
Posts: 158

hm, with vb 3.5 rc3, I get the following errors when accessing the forums_sitemap.php:

PHP Code:
Warning: array_keys(): The first argument should be an array in /home/htdocs/web0/html/forum/includes/class_core.php on line 1438

Warning: Invalid argument supplied for foreach() in /home/htdocs/web0/html/forum/includes/class_core.php on line 1438

Warning: array_keys(): The first argument should be an array in /home/htdocs/web0/html/forum/includes/class_core.php on line 1453

Warning: Invalid argument supplied for foreach() in /home/htdocs/web0/html/forum/includes/class_core.php on line 1453

Unable to add cookies, header already sent.
File: /home/htdocs/web0/html/forum/includes/class_core.php
Line: 1438

Any ideas?

#48
09-13-2005, 01:32 AM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

No idea. I have upgraded to RC3, and the hack works without any further modification.

If you have changed the PHP files, make sure your upgrade does not copy over them. I did not copy new files to the 'archive' directory.

#49
09-14-2005, 02:50 PM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

Hey lierduh,

Thanks for your hacks over the years, always fine jobs

I'm another one wanting instructions for the archive/index.php and archive/global.php

I've applied the rest of the instructions, and all appears to be working fine. And I do get the concept of removing the PDA crud, and redirecting humans out of the archive... but I'm a little lost with the diff output you've supplied... a more primitive ><+/- lines would've confused me less

If you do ever find time to update the instructions, it will be very appreciated by quite a few of us. And I realise how much of a pain that is, as I'm supposed to be porting my own hacks and really don't like the idea much.

Cheers

DavidK

#50
09-14-2005, 10:26 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Quote:
Originally Posted by buro9
Hey lierduh,

Thanks for your hacks over the years, always fine jobs

I'm another one wanting instructions for the archive/index.php and archive/global.php

I've applied the rest of the instructions, and all appears to be working fine. And I do get the concept of removing the PDA crud, and redirecting humans out of the archive... but I'm a little lost with the diff output you've supplied... a more primitive ><+/- lines would've confused me less

If you do ever find time to update the instructions, it will be very appreciated by quite a few of us. And I realise how much of a pain that is, as I'm supposed to be porting my own hacks and really don't like the idea much.

Cheers

DavidK

Hello DavidK,

The reason it confused you is that my diffs were based on the RC2 and modified RC2 files. I presume you now have the RC3 files.

I have attached the diff between the RC3 and modified RC2 files (this will confuse everyone else, but not you). Otherwise, if you still have the RC2 files, the coloured diff will make a lot of sense.

#51
09-15-2005, 06:32 AM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

Quote:
Originally Posted by lierduh
Hello DavidK,

The reason it has confused you is my diff were based on the RC2 and modified RC2 files. I presume now you have got RC3 files.

I have attached the diff between RC3 and modified RC2 files (this will confuse everyone else, but not you). Otherwise, if you still have the RC2 files, then the coloured diff will make a lot of sense.

Much better. Thanks.

And yes... it will confuse everyone now