Google sitemap for the vB Archives. Redirect human and robots.
Version: 1.2, by lierduh
Developer Last Online: Nov 2023

vBulletin Version: 3.5.1
Released: 08-09-2005 Last Update: 11-08-2005 Installs: 130
Uses Plugins
Code Changes Additional Files  
No support by the author.

Release V1.2 (9 Nov 2005)
* Threads with new posts are given a higher sitemap priority, so Google can index fresh threads first.

* The original optional STEP 3 hack is no longer recommended. To avoid a potential Google penalty, my advice is to remove the STEP 3 hack.

Release V1.1a (12 Oct 2005)

* Bug fix only

Release V1.1 (9 Oct 2005)

* Can handle very large forums with more than 50,000 URLs per forum.
URLs are spread across multiple files for each large forum.

* Created a function to detect search engine crawlers. The vB built-in
search engine detector can only identify about 3 or 4 search engines.
My function detects over 20 search engine crawlers.

* Supports forums hosted on web servers that do not support 'fix_pathinfo',
i.e. forums whose archive links look like 'archive/index.php?f-10.html'
instead of the usual 'archive/index.php/f-10.html'.

* Alerts about wrong directory permissions to help newbies.

* Automatically writes the index file to the archive directory if the PHP
script cannot write into the base vB directory.

* Bug fixes.


Objectives
==============
  • Create Google sitemap files and a sitemap index file for the vB archives, and submit them to Google through the Scheduled Tasks.
  • Use the vB Archive as a mirror of the actual threads.
  • Google loves the nature of the archive pages, as they are static and do not contain repeated content.
  • Google gauges pages heavily based on external links, so we need to redirect robots that follow external thread links to the archive pages.
  • We often see vBulletin archive pages in the Google search results, so users are taken to the archive page instead of the actual thread. We need to automatically redirect visitors to the actual threads. Otherwise the visitor either needs to click again for the Full Version or read the dull archive content.

Q and A
==============
Q. Would the sitemap contain the links for hidden forums?
A. No, the forum permissions are consulted while generating the sitemap files.

Q. How often are the sitemap files generated?
A. You decide, and set it in the Scheduled Tasks. By default the script cannot be called by external users, to prevent bored people from killing your server.

Q. Is the sitemap file compressed?
A. Yes, the multiple sitemap files are gzipped according to the Google sitemap standard to save bandwidth. The sitemap index file is not compressed; it is submitted as a normal XML file.

Q. Would the sitemaps include links for the normal threads? e.g. showthread.php?t=1234...
A. No. It is unlikely Google will index your entire site if you feed it every combination of showthread links. It is better to let Google go through the more static archives. You will have a better chance of getting more thread content indexed by Google this way.

Q. Why don't you go crazy with rewrite rules and do things like including the thread title in the URL?
A. I won't deny that having keywords in the URL is a good SEO strategy, but Google also does not like "Over Search Engine Optimized" web sites. Google has recently penalized a huge number of such sites, sending them from a page rank of 5 or 6 to 0.

Q. Does the sitemap really help?
A. Definitely. Google has crawled over 60,000 pages since I submitted my sitemaps a few days ago. Yahoo bots were visiting more pages than Google before the sitemap. I expect the total Google visits for this month to exceed Yahoo's in the next one or two days.
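
The file layout described in the answers above (gzipped sitemap files of at most 50,000 URLs each, plus an uncompressed index) can be sketched roughly as follows. This is only an illustration in Python, not the PHP script from this release. The sitemaps.org 0.9 namespace is an assumption (the 2005 script may well have used an older Google namespace), and the function names and the sitemap_<forum>_<n>.gz naming are hypothetical.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50000  # per-file URL limit mentioned in the release notes
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive pieces of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls):
    """Render the (uncompressed) XML text of one sitemap file."""
    rows = "".join("  <url><loc>{}</loc></url>\n".format(escape(u)) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="{}">\n'.format(NS) + rows + '</urlset>\n')


def index_xml(sitemap_urls):
    """Render the sitemap index, which stays uncompressed."""
    rows = "".join("  <sitemap><loc>{}</loc></sitemap>\n".format(escape(u))
                   for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="{}">\n'.format(NS) + rows + '</sitemapindex>\n')


def build_forum_sitemaps(forum_id, urls, base_url, size=MAX_URLS_PER_FILE):
    """Return ({file name: gzip bytes}, index XML text) for one forum."""
    files = {}
    for n, part in enumerate(chunk(urls, size), start=1):
        name = "sitemap_{}_{}.gz".format(forum_id, n)  # hypothetical naming
        files[name] = gzip.compress(urlset_xml(part).encode("utf-8"))
    index = index_xml(base_url + "archive/" + name for name in files)
    return files, index
```

Calling build_forum_sitemaps(11, urls, "http://forums.mysite/") with 120,000 archive URLs would give three gzipped files and one index that lists them.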

What is involved?
==================
I have divided this hack into two steps. The first step involves uploading a PHP file. This enables the sitemap to be generated and submitted to Google.

The second step involves installing a Plugin using AdminCP. This sends all robots to the archive pages, preventing them from viewing the actual threads.

For example, Google or another crawler follows an external link to visit:
http://forums.mysite/showthread.php?t=1234&page=2

It will be told this page has permanently moved to:
http://forums.mysite/archive/index.php/t-1234-p-2

This way you don't lose the page rank gained from external links.
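
The two-way redirect can be sketched like this. It is a minimal illustration in Python, not the plugin code itself. The URL shapes follow the examples in this thread, while the crawler token list and all function names are made up for the example (the real detector is said to know over 20 crawlers).

```python
import re
from urllib.parse import urlsplit, parse_qs

# Illustrative user-agent substrings only; not the list used by the plugin.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot", "ask jeeves")


def is_crawler(user_agent):
    """Very rough crawler check based on user-agent substrings."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def thread_to_archive(url):
    """Map a showthread URL to its archive URL, or return None if it is not one."""
    parts = urlsplit(url)
    if not parts.path.endswith("/showthread.php"):
        return None
    query = parse_qs(parts.query)
    thread_id = query.get("t", [""])[0]
    if not thread_id.isdigit():
        return None
    page = query.get("page", ["1"])[0]
    base = "{}://{}{}".format(parts.scheme, parts.netloc,
                              parts.path[:-len("showthread.php")])
    if page.isdigit() and int(page) > 1:
        # Shape taken from the example above: t-1234-p-2
        return "{}archive/index.php/t-{}-p-{}".format(base, thread_id, page)
    return "{}archive/index.php/t-{}.html".format(base, thread_id)


def archive_to_thread(url):
    """Map an archive thread URL back to showthread.php, or return None."""
    match = re.match(
        r"^(.*/)archive/index\.php[/?]t-(\d+)(?:-p-(\d+))?(?:\.html)?$", url)
    if not match:
        return None
    base, thread_id, page = match.groups()
    target = "{}showthread.php?t={}".format(base, thread_id)
    return target + "&page=" + page if page else target


def redirect_target(url, user_agent):
    """Return the URL to redirect to, or None to serve the page as requested."""
    if is_crawler(user_agent):
        return thread_to_archive(url)   # robots go to the archive
    return archive_to_thread(url)       # humans go to the real thread
```

A crawler asking for a showthread page gets the archive URL back, a human landing on an archive page gets the showthread URL, and every other combination returns None.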

Install
=========
To install, follow the readme file.
To let me know you have installed this, and so I can send you update information, please click INSTALL.

Strategy
=========

It is unlikely Google or any other search engine will index your entire site, especially given the dynamic nature of vBulletin forums. An archive sitemap lets Google concentrate on the real content of your forums: the threads. If Google has to go through the endless member profile pages, it will get sick of it and just become tired (sorry, perhaps robots cannot become tired). What we can do is disallow the crawling of unnecessary pages. My robots.txt contains:

#ALL BOTS
User-agent: *
Disallow: /admincp/
Disallow: /ajax.php
Disallow: /attachments/
Disallow: /clientscript/
Disallow: /cpstyles/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /modcp/
Disallow: /subscriptions/
Disallow: /customavatars/
Disallow: /customprofilepics/
Disallow: /announcement.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /external.php
Disallow: /faq.php
Disallow: /frm_attach
Disallow: /image.php
#Disallow: /index.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php?
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /payment_gateway.php
Disallow: /payments.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /usercp.php
Disallow: /threadrate.php
Disallow: /usernote.php
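
To sanity-check rules like these before deploying them, Python's standard urllib.robotparser can be fed the text directly. A minimal sketch, using a shortened excerpt of the list above; the helper name `allowed` and the default agent string are only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt above, for illustration only.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admincp/
Disallow: /memberlist.php
Disallow: /search.php
Disallow: /showpost.php
"""


def allowed(url, robots_text=ROBOTS_EXCERPT, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)
```

With this excerpt, archive and showthread URLs stay crawlable while search.php and anything under /admincp/ are refused.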

You may have noticed that index.php appears in that list (as a commented-out line). Apparently Google regards http://forums.mysite/index.html as the same as http://forums.mysite/
...but http://forums.mysite/index.php as a different file. The default vB templates use index.php for internal links. That splits the page rank of your home page between two URLs! So it is better not to let Google see this file.

If you have mod_rewrite installed, you could add this to the .htaccess file:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index.php$ / [R=301,L]

(If your forums are under http://site/forums/, try: RewriteRule ^forums/index.php$ forums/ [R=301,L])

That will redirect /index.php to /, but only if no query string is present, i.e. /index.php?do=mymod will not be redirected.
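
For readers less used to rewrite syntax, the decision this rule pair makes can be modelled in a few lines of Python. This is only a sketch of the logic, not a replacement for the Apache rules; the function name is made up, and the dot in the pattern is escaped here so that it matches a literal dot.

```python
import re

# Per-directory (.htaccess) rules are matched against the path without its
# leading slash, so the pattern sees "index.php", not "/index.php".
INDEX_PATTERN = re.compile(r"^index\.php$")


def rewrite_index(path, query_string):
    """Return the 301 redirect target for a request, or None to leave it alone."""
    relative = path.lstrip("/")
    if query_string != "":               # RewriteCond %{QUERY_STRING} ^$
        return None
    if INDEX_PATTERN.match(relative):    # RewriteRule ^index.php$ ...
        return "/"                       # ... / [R=301,L]
    return None
```

So rewrite_index("/index.php", "") redirects to "/", while rewrite_index("/index.php", "do=mymod") is left untouched.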

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
#52
09-15-2005, 08:37 AM
Brandon Sheley
Join Date: Mar 2005
Location: Google Kansas
Posts: 4,678

I can't seem to get the files to be created.

From the instructions, which were just about too much info for me,
I upload forums_sitemap.php to the archive directory, then chmod the archive folder to 775,
then make the scheduled task, run it, and the files should be made.
What part am I missing? Thank you.

At RC3 now.

#53
09-18-2005, 11:01 AM
Yorixz
Join Date: Jun 2005
Location: Netherlands
Posts: 284

Code:
Warning: array_keys() [function.array-keys]: The first argument should be an array in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1438

Warning: Invalid argument supplied for foreach() in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1438

Warning: array_keys() [function.array-keys]: The first argument should be an array in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1453

Warning: Invalid argument supplied for foreach() in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1453
I'm also still getting those errors, weird =/ lierduh, could it be that you've tested it with PHP4 rather than PHP5?

#54
09-18-2005, 10:46 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Quote:
Originally Posted by Yorixz
Code:
Warning: array_keys() [function.array-keys]: The first argument should be an array in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1438

Warning: Invalid argument supplied for foreach() in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1438

Warning: array_keys() [function.array-keys]: The first argument should be an array in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1453

Warning: Invalid argument supplied for foreach() in /home/ftpusers/otfans/html/forums/includes/class_core.php on line 1453
I'm also still getting those errors, weird =/ lierduh, could it be that you've tested it with PHP4 rather than PHP5?
I do not have PHP5 to test with.

Instead of calling the script directly, have you tried using the Scheduled Task's "Run Now" button?

#55
09-18-2005, 11:32 PM
thenetbox
Join Date: Mar 2002
Posts: 184

Thank you very much! I had just started trying to do this myself but then found this, yay!

#56
09-20-2005, 04:53 PM
Yorixz
Join Date: Jun 2005
Location: Netherlands
Posts: 284

Quote:
Originally Posted by lierduh
I do not have PHP5 to test with.

Instead of calling the script directly, have you tried using the Scheduled Task's "Run Now" button?
Yes, that results in
Code:
Warning: gzopen(/home/ftpusers/otfans/html/forums/archive/sitemap_11.gz) [function.gzopen]: failed to open stream: Permission denied in /archive/forums_sitemap.php on line 132

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 72

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87

Warning: gzwrite(): supplied argument is not a valid stream resource in /archive/forums_sitemap.php on line 87
for like a thousand times.

The weird thing is that I'm 100% sure that I chmodded everything correctly. (It's on a Debian host, if that is relevant.)

#57
09-20-2005, 11:10 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Quote:
Originally Posted by Yorixz
Yes, that results in
Code:
Warning: gzopen(/home/ftpusers/otfans/html/forums/archive/sitemap_11.gz) [function.gzopen]: failed to open stream: Permission denied in /archive/forums_sitemap.php on line 132
for like a thousand times.

The weird thing is that I'm 100% sure that I chmodded everything correctly. (It's on a Debian host, if that is relevant.)
That means the script can't write to the archive directory. What are the permissions on this directory?

#58
09-21-2005, 05:04 AM
Yorixz
Join Date: Jun 2005
Location: Netherlands
Posts: 284

Quote:
Originally Posted by lierduh
That means the script can't write to the archive directory. What are the permissions on this directory?
Right now it's 0777 and it's working. I thought I had already changed it some days ago; it was 0775 (which should be enough as far as I know).

Thanks for your support
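
One plausible reason why 0775 fails while 0777 works is that the web server process is neither the owner of the archive directory nor a member of its group, so only the "other" bits apply to it. The following Python sketch shows that reasoning; the function name and the flags are hypothetical, and it ignores ACLs and similar extras.

```python
import stat


def can_create_files(mode, is_owner, in_group):
    """Check whether a process may create files in a directory with `mode`.

    Creating a file needs both the write and the execute (search) bit of
    the permission class that applies to the process.
    """
    if is_owner:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif in_group:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return mode & needed == needed
```

Under that assumption can_create_files(0o775, False, False) is False and can_create_files(0o777, False, False) is True, whereas a web server user in the directory's group would already be fine with 0775.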

#59
09-24-2005, 08:34 PM
jribz
Join Date: Oct 2003
Posts: 66

OK, I have this installed and it seems to be working as described. When viewing Who's Online I can see the search engines looking at threads with URLs similar to the following.

/archive/index.php/t-6044.html

When clicked by a human user they are redirected to

/showthread.php?t=6044

So I can only assume this works, since all the spiders on the board are seeing the archived version, and when users click they are taken to the full version.

I do have a couple of questions however, since I am not too familiar with Google Sitemaps. Does the script automatically submit the sitemap to Google without any further action aside from creating the Scheduled Task? I have created the task in the manager and run it (every day at 1AM), and it has created the files ([xml] in the forum root and per-forum [gz] files in the archive folder).

[Upon further thinking, would I be correct in saying I need to let Google know about the XML file in the root of the site?]

What is the effect on other search engines? I see Yahoo, MSN, Ask, and others viewing similar archives, so I assume the effect is similar to what is happening with Google, but they are not getting a map.

Last question: does this basically mean that other SEO hacks are not required, since the spiders will never see the rewritten URLs anyhow?

A lot of assumptions up there. :ermm:

Oh, and one last thing: I do use mod_rewrite on my server for many sites, and have had no issues, but the rule you give to resolve the index.php issue seems to bog down the server, so that any URLs that point directly to it, as in /index.php, do not load. I suppose this could be a conflict within my .htaccess file, but I'm not too certain where to start looking. (However, I did try it with only the code you provided (plus RewriteEngine on) and have the same problem.)

Thanks for your time and the hack.

#60
09-25-2005, 12:16 AM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Each time the script is run:

1) It regenerates all the sitemaps. This makes sense because you have more threads/posts now.

2) It notifies Google that new sitemaps are available. You will notice Google fetches these files soon afterwards.

If you have the scheduled task logged, the end of the log is the response sent by Google. It should say:

======================
Sitemap Notification Received

Your Sitemap has been successfully added to our list of Sitemaps to crawl. If this is the first time you are notifying Google about this Sitemap, please add it via http://www.google.com/webmasters/sitemaps so you can track its status. Please note that we do not add all submitted URLs to our index, and we cannot make any predictions or guarantees about when or if they will appear.
======================

One thing to remember: in your Google sitemap account, 'last submitted' does not reflect the automatic ping/submit. It only logs the manual submits you do by pushing the button in your Google sitemap account.

Other search engines do not accept sitemaps as far as I know, at least not in Google's sitemap format. The redirects, however, work for all the major search engines, which I believe benefits the indexing.

I do not recommend using SEO hacks, at least for existing sites. Chances are Google has already indexed part of your forums using links like /showthread.php?t=12345. Now if you rewrite all the URLs, Google will have two copies of the same content for that thread (one with the traditional URL, one with your new rewritten URL). This will lead to Google penalizing your site ranking. Some smarter SEO schemes that redirect your old URLs to the new ones do not suffer from this, but then it becomes a very complicated add-on that may break every time a major vB version is released. I elect not to use such a scheme. For the record, I used URL rewrite SEO back in the vB2 era. In my .htaccess, I still need to redirect my old rewritten vB2 URLs for fear of Google penalizing my site. Basically the vB archive is very static; it was designed for SEO in the first place anyway. Think about how many clickable links a normal showthread page brings you: it becomes a mess for search engines no matter how smart your SEO is.

For index.php redirect, my working version is:

RewriteEngine on

#...

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index.php$ / [R=301,L]

If it does not work for you, I would check the HTTP logs. Failing that, log your rewrites! (You need to do this in your httpd.conf; consult the Apache manual for the log level etc.)

#61
09-25-2005, 12:40 AM
jribz
Join Date: Oct 2003
Posts: 66

Thanks for the reply, that clears up a lot... I had to verify site ownership via Google; the logs for the cron showed exactly that.

One thing I notice however, while looking now, is that the Google spider is viewing a few regular threads, while the Google AdSense spider is viewing the archive. Also viewing the archive are msnbot, Yahoo Slurp and Ask Jeeves.

I wonder why Google is seeing a regular thread now.

Going to look into the .htaccess in a bit.