Google sitemap for the vB Archives. Redirect human and robots.
Version: 1.2, by lierduh
Developer Last Online: Nov 2023

vB Version: 3.5.1
Released: 08-09-2005 Last Update: 11-08-2005 Installs: 130
Uses Plugins, Code Changes, Additional Files
No support by the author.

Release V1.2 (9 Nov 2005)
* A higher sitemap priority is given to threads with new posts, so Google can index fresh threads first.

* The original optional STEP 3 hack is no longer recommended. To avoid a potential Google penalty, my advice is to remove the STEP 3 hack.

Release V1.1a (12 Oct 2005)

* Bug fix only

Release V1.1 (9 Oct 2005)

* Can handle very large forums with more than 50,000 URLs per forum.
URLs will be spread across multiple sitemap files for each large forum.

* Created a function to detect search engine crawlers. The vB built-in
search engine detector can only identify about 3 or 4 search engines;
my function detects over 20 search engine crawlers (a rough sketch of
the idea appears below, after these release notes).

* Supports forums hosted on web servers that do not support 'fix_pathinfo',
i.e. instead of the usual 'archive/index.php/f-10.html' links, these
forums use links such as 'archive/index.php?f-10.html'.

* Alerts about wrong directory permissions to help newbies.

* Automatically writes the index file to the archive directory if the PHP
script cannot write into the base vB directory.

* Bug fixes.
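
For illustration only, here is a rough sketch of the kind of user-agent matching such a crawler detector can do. The signature list below is a short sample chosen for the example, not the actual list shipped with the mod:

<?php
// Rough sketch of user-agent based crawler detection; the signature list
// here is a short illustrative subset, not the mod's actual 20+ entry list.
function is_search_crawler($useragent)
{
    $signatures = array(
        'googlebot',   // Google
        'slurp',       // Yahoo
        'msnbot',      // MSN
        'teoma',       // Ask Jeeves
        'ia_archiver', // Alexa
        'gigabot'      // Gigablast
    );

    $useragent = strtolower($useragent);
    foreach ($signatures as $signature)
    {
        if (strpos($useragent, $signature) !== false)
        {
            return true;
        }
    }
    return false;
}
?>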


Objectives
==============
  • Create Google sitemap files and a sitemap index file for the vB archives, and submit them to Google via the Scheduled Tasks.
  • Have the vB Archive act as a mirror of the actual threads.
  • Google likes the nature of the archive pages, as they are static and do not contain repeated content.
  • Google ranks pages heavily based on external links, so we need to redirect external thread links to the archive pages.
  • We often see vBulletin archive pages in the Google search results, but users are then left on the archive page instead of the actual thread. We need to automatically redirect human visitors to the actual threads instead of the archive; otherwise the visitor either needs to click through to the Full Version or read the dull archive content.

Q and A
==============
Q. Would the sitemap contain the links for hidden forums?
A. No, forum permissions are consulted while generating the sitemap files.

Q. How often are the sitemap files generated?
A. You decide, and you set it in the Scheduled Tasks. By default the script cannot be called by external users, to prevent bored people from killing your server.
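
For reference, a guard along the lines of the one below (the convention vBulletin's own scheduled-task scripts follow, and assumed here rather than taken from the mod) is what makes a direct browser request exit before doing any work:

<?php
// Assumed guard only: vBulletin scheduled-task scripts conventionally start
// with a check like this. $vbulletin->db is only set up when the script is
// run by the task runner, so a direct browser request exits immediately.
if (!is_object($vbulletin->db))
{
    exit;
}
?>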

Q. Are the sitemap files compressed?
A. Yes, the multiple sitemap files are gzipped according to the Google sitemap standard to save bandwidth. The sitemap index file is not compressed; it is submitted as a normal XML file.
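
As a rough illustration of the layout only (the file names and output path are made up for the example, not the mod's actual naming scheme), the uncompressed index simply points at the gzipped sitemap files:

<?php
// Illustrative sketch only: write an uncompressed sitemap index that points
// at gzipped sitemap files. File names and the output path are placeholders.
$sitemap_files = array('sitemap_archive_1.xml.gz', 'sitemap_archive_2.xml.gz');

$xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml .= "<sitemapindex xmlns=\"http://www.google.com/schemas/sitemap/0.84\">\n";
foreach ($sitemap_files as $file)
{
    $xml .= "  <sitemap>\n";
    $xml .= "    <loc>http://forums.mysite/" . $file . "</loc>\n";
    $xml .= "    <lastmod>" . date('Y-m-d') . "</lastmod>\n";
    $xml .= "  </sitemap>\n";
}
$xml .= "</sitemapindex>\n";

file_put_contents('./sitemap_index.xml', $xml);
?>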

Q. Would the sitemaps include links for the normal threads? e.g. showthread.php?t=1234...
A. No. It is unlikely Google will index your entire site if you feed it every combination of showthread links. It is better to let Google go through the more static archives; you will have a much better chance of getting more thread content indexed by Google this way.

Q. Why don't you go crazy with rewrite rules and do things like including the thread title in the URL?
A. I won't deny that having keywords in the URL is a good SEO strategy, but Google also does not like "over search engine optimized" web sites. Google has recently penalized a huge number of such sites, sending them from a PageRank of 5 or 6 down to 0.

Q. Does a sitemap really help?
A. Definitely. Google has crawled over 60,000 pages since I submitted my sitemaps a few days ago. Yahoo bots were visiting more pages than Google before the sitemap; I expect the total Google visits for this month to exceed Yahoo's within the next day or two.

What is involved?
==================
I have divided this hack into two steps. The first step involves uploading a PHP file. This enables the sitemap to be generated and submitted to Google.

The second step involves installing a Plugin using the AdminCP. This sends all robots to the archive pages, preventing them from viewing the actual threads.

For example, when Google or another crawler follows an external link to visit:
http://forums.mysite/showthread.php?t=1234&page=2

it will be told this page has permanently moved to:
http://forums.mysite/archive/index.php/t-1234-p-2

This way you don't lose the PageRank gained from external links.
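
As a minimal sketch of that idea (not the plugin's actual code, and using the hypothetical is_search_crawler() check sketched under the release notes):

<?php
// Minimal sketch of the Step 2 redirect: send crawlers a 301 from the
// showthread URL to the matching archive URL. is_search_crawler() is the
// hypothetical detector sketched earlier; the archive URL format follows
// the example above.
if (isset($_GET['t']) AND is_search_crawler($_SERVER['HTTP_USER_AGENT']))
{
    $url = 'http://forums.mysite/archive/index.php/t-' . intval($_GET['t']);
    if (!empty($_GET['page']) AND intval($_GET['page']) > 1)
    {
        $url .= '-p-' . intval($_GET['page']);
    }
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $url);
    exit;
}
?>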

Install
=========
To install, follow the readme file.
To let me know you have installed this, and so I can send update information to you, please click INSTALL.

Strategy
=========

It is unlikely that Google or another search engine will index your entire site, especially given the dynamic nature of vBulletin forums. An archive sitemap lets Google concentrate on the real content of your forums -- the threads. If Google has to go through the endless member profile pages, it will get sick of it and just become tired (sorry, perhaps robots cannot become tired). What we can do is disallow the crawling of unnecessary pages. My robots.txt contains:

#ALL BOTS
User-agent: *
Disallow: /admincp/
Disallow: /ajax.php
Disallow: /attachments/
Disallow: /clientscript/
Disallow: /cpstyles/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /modcp/
Disallow: /subscriptions/
Disallow: /customavatars/
Disallow: /customprofilepics/
Disallow: /announcement.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /external.php
Disallow: /faq.php
Disallow: /frm_attach
Disallow: /image.php
#Disallow: /index.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php?
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /payment_gateway.php
Disallow: /payments.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /usercp.php
Disallow: /threadrate.php
Disallow: /usernote.php

You may have noticed I included index.php in there. Apparently Google regards http://forums.mysite/index.html as the same as http://forums.mysite/
...but treats http://forums.mysite/index.php as a different file. The default vB templates use index.php as the internal link, which will dilute the PageRank of your home page! So you are better off not letting Google see this file.

If you have mod_rewrite installed, perhaps you could add this to the .htaccess file:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index\.php$ / [R=301,L]

(If your forums are under http://site/forums/, try: RewriteRule ^forums/index\.php$ forums/ [R=301,L])

That will redirect /index.php to /, but only if no query string is present; i.e. /index.php?do=mymod will not be redirected.

Show Your Support

  • This modification may not be copied, reproduced or published elsewhere without the author's permission.

Comments
  #192
Old 10-27-2005, 10:14 PM
D|ver
Join Date: Feb 2003
Posts: 177

I have a small question:
is it possible to add additional pages to the sitemap?
  #193
Old 10-28-2005, 01:07 PM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

I have a feature request... to dump a single text file, gzipped, of ALL of the URLs that go into the various sitemaps.

Basically... when we're in the loop to create the various sitemaps, additionally write a text file with just one full URL per line.

This is because this would also be good for Yahoo and other spiders. Yahoo specifically asks for such a thing on their submit page:
http://submit.search.yahoo.com/free/request
Quote:
You can also provide the location of a text file containing a list of URLs, one URL per line, say urllist.txt. We also recognize compressed versions of the file, say urllist.gz.
And to me... it seems that the loop to create the Google Sitemap is the perfect low-overhead place to also dump the archive URLs into a text file for Yahoo and other spiders to feed from.
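
For illustration, a sketch of that idea (the variable and file names are placeholders, not part of the mod):

<?php
// Sketch of the requested urllist idea: while looping over the archive URLs
// for the sitemaps, also append each one to a gzipped plain-text list, one
// full URL per line, suitable for Yahoo-style submission.
$archive_urls = array(
    'http://forums.mysite/archive/index.php/t-1234.html',
    'http://forums.mysite/archive/index.php/t-1235.html'
);

$fp = gzopen('./urllist.gz', 'w9');
foreach ($archive_urls as $url)
{
    gzwrite($fp, $url . "\n");
}
gzclose($fp);
?>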
  #194
Old 10-28-2005, 07:58 PM
:Judge:
Join Date: Jan 2003
Location: USA ~ MD
Posts: 230

I don't know, but maybe this is over my head here.

I went through and made changes 1 and 2, and that is all, and I have no idea how to check and see if it is working.

I have no errors so that is a plus.

Do I have to sign up for Google Sitemap? - Forget I said that.
  #195
Old 10-29-2005, 09:52 AM
dutchbb
Join Date: Nov 2003
Posts: 899

Just a warning here... I was reading about SEO, and step 3 might not be such a good idea. This is called cloaking, and it is a black-hat technique. It is clearly stated in the Google quality guidelines as a forbidden way to do SEO.

Actually, I found that after installing this my homepage went down 4 pages in the results for a very important keyword. I didn't do anything else that could be suspicious, so I removed this!

The sitemap is good, but optimizing pages just for search engines and making them look different from what your human visitors see is NOT recommended, and you take a high risk of being penalized by Google or another search engine.
  #196
Old 10-29-2005, 11:13 PM
falter
Join Date: Oct 2004
Posts: 24

Quote:
Originally Posted by Triple_T
Just a warning here... I was reading about SEO, and step 3 might not be such a good idea. This is called cloaking, and it is a black-hat technique. It is clearly stated in the Google quality guidelines as a forbidden way to do SEO.

Actually, I found that after installing this my homepage went down 4 pages in the results for a very important keyword. I didn't do anything else that could be suspicious, so I removed this!

The sitemap is good, but optimizing pages just for search engines and making them look different from what your human visitors see is NOT recommended, and you take a high risk of being penalized by Google or another search engine.
If you implemented the robots.txt that is suggested, that is most likely the cause of your PR drop, not the cloaking.

There are many, many things that you can do that are considered cloaking. I don't think Google would flip out over this, seeing as the content that is provided to the search engine spider is the same content that is provided to the human user. An example of abuse of cloaking is where, say, a completely different set of content is given to the spider than is given to the human.

I'm going to PubCon (http://www.pubcon.com) in a couple of weeks; I'll ask around to see what some SEOs think about what we're doing here. I can even ask the guys at Yahoo and Google. Personally, I think it's fine, since the same core content is being given to the search engines and humans.
  #197
Old 11-01-2005, 04:23 PM
eoc_Jason
Join Date: Dec 2001
Location: Houston, TX
Posts: 493

The problem is, what you think is the "same content" is different from what a spider thinks is the same. Yes, cloaking is a serious issue, and search engines do penalize sites for doing it. Some search engines (like Google) have spiders that look like a regular web browser so that they can compare those results with the actual spider results. If they don't match, well, you get the idea.

I expanded my robots.txt file to exclude a lot of the links that are listed in the notes, and I use the generator script to make the XML files for Google, but that's it. I do not believe in trying to redirect bots or users to various pages; that will only end up with bad things happening.
  #198
Old 11-04-2005, 10:20 PM
lierduh
Join Date: Jan 2003
Location: Sydney, Australia
Posts: 459

Whether this is cloaking or not is a long-debated topic. The general advice from the experts is not to cloak, due to the risk it involves. I would say do not install step 3 if you are concerned about this.

However, nowadays many major sites use cloaking, including Amazon and Google itself. Believe it or not, vBulletin also uses cloaking!
  #199
Old 11-07-2005, 04:01 PM
eoc_Jason
Join Date: Dec 2001
Location: Houston, TX
Posts: 493

Yes, but Amazon is a much more reputable site than, say, Joe Bob's bait shack... Plus, companies like that work directly with Google to enhance features for both sites.
  #200
Old 11-08-2005, 09:27 PM
falter
Join Date: Oct 2004
Posts: 24

Well! I've backed out the cloaking after the number of my indexed pages on Google went from >40,000 to just over 800. I'm assuming that we got penalized in some form. My PR is still a 5, but that doesn't mean much of anything at all.

I can honestly say that my opinion is reversed on the cloaking side of things. I do not recommend implementing step 3.
  #201
Old 11-08-2005, 11:22 PM
Citizen
Join Date: Sep 2005
Posts: 129

What exactly was step 3 of the hack? I looked over the installation and didn't see a "step 3".