After working with this mod for three weeks, I have dropped the Google Sitemap method suggested by the vBSEO team. That method, though well intended, was a quick "kludge" that was not optimal for this type of application.
What I have done instead is easy, requires only a little manual labor, and goes something like this:
Copy the *xml.gz files from ./vbseo_sitemap/data to another directory, for example FORUMROOT/es for Spanish, FORUMROOT/ja for Japanese, FORUMROOT/zh-CN for Chinese etc.
Unzip the files and use sed to add ?hl=ja (or whatever flag you want to use) to each URL in the Sitemap. This takes about 10 seconds.
Update sitemap_index.xml.gz the same way, or edit it by hand in vi or a similar editor.
Submit this Sitemap to Google.
Copy the first one you did and repeat for as many languages as you wish.
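For anyone who would rather script the steps above than run sed by hand, here is a rough Python sketch of the same idea. The function names and the example paths are my own, not part of vBSEO, and it assumes each URL sits in a plain <loc>...</loc> element. It skips sitemap_index.xml.gz on purpose, since the entries in the index need to point at the new directory rather than get an ?hl= flag.

```python
import gzip
import re
from pathlib import Path

# Matches the URL inside each <loc> element of a sitemap file.
LOC_RE = re.compile(r"(<loc>\s*)([^<\s]+)(\s*</loc>)")


def add_lang_flag(xml_text: str, lang: str) -> str:
    """Append hl=<lang> to every <loc> URL in the sitemap text."""
    def tag(match):
        url = match.group(2)
        # '&' has to be written as '&amp;' inside XML.
        sep = "&amp;" if "?" in url else "?"
        return f"{match.group(1)}{url}{sep}hl={lang}{match.group(3)}"
    return LOC_RE.sub(tag, xml_text)


def copy_sitemaps_for_language(src_dir: str, dst_dir: str, lang: str) -> int:
    """Copy every *xml.gz sitemap into dst_dir with the language flag added.

    Returns the number of files written. The index file is left out;
    it still has to be updated separately.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = 0
    for path in sorted(src.glob("*xml.gz")):
        if path.name.startswith("sitemap_index"):
            continue  # handled by hand, see note above
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            text = fh.read()
        with gzip.open(dst / path.name, "wt", encoding="utf-8") as fh:
            fh.write(add_lang_flag(text, lang))
        written += 1
    return written


# Example call (paths are placeholders for your own forum root):
# copy_sitemaps_for_language("./vbseo_sitemap/data", "./ja", "ja")
```

Running it once per language code gives one directory of flagged Sitemaps per language, which is the same result as the copy-and-sed routine.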
This method has many advantages.
First of all, you have a completely different sitemap of your entire site for each language, so it is easy to submit to language-specific search engines. Also, you can easily track the indexing progress for each Sitemap. This is much easier to manage and much cleaner, IMHO.
Of course, this method takes a bit of work when you need to update your language Sitemaps, but if you have a large board, this will get you indexed nicely in a well organized way. You can add the newer links after a high percentage of the legacy links are archived (in a few months).
We added the top 10 languages to Google Webmaster Tools, each with its own Sitemap, so where we originally had one big sitemap with nearly 396K URLs, we now have around 4,750K URLs in total across 11 Sitemaps. So far, Google is happy :-)
With this simple method, you can see the index progress on each language. You can submit your Sitemaps to language specific search engines. You can manage the update frequency on the translated URLs differently than your main site. You can also avoid any potential problems with your main sitemap.
We are definitely seeing a sort of "Google penalty" for using ?hl=flag (duplicate content, it seems -- at least in the Sitemaps).
As Google indexes the various language sitemaps, it is subtracting indexed links from the main sitemap ... so I will need to rewrite the URLs sooner rather than later!
I think it was a temporary Google problem. I changed nothing, and right now my sitemap is clear of duplicate content errors. I just checked.
I tried the first one, '^(.+?)\.html\?hl=(.+?)$' => '$2/$1.html', and it crashed my forum, so I commented it out.
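To see what that rule does to a URL on its own, here is a rough Python equivalent. This is only a sketch: I am assuming the pattern is applied to the relative URL, the example URLs are made up, and vBSEO's $1/$2 become \1/\2 in Python. It does not explain the crash, it only shows the string rewrite.

```python
import re

# Same pattern as the custom rewrite rule quoted above.
RULE = re.compile(r"^(.+?)\.html\?hl=(.+?)$")


def apply_rule(url: str) -> str:
    """Move the ?hl= flag to the front of the URL as a path prefix.

    URLs that do not match the pattern are returned unchanged.
    """
    return RULE.sub(r"\2/\1.html", url)


print(apply_rule("f12/some-thread-345.html?hl=ja"))
# ja/f12/some-thread-345.html

print(apply_rule("f12/some-thread-345.html"))
# f12/some-thread-345.html

# Anything after the language code is swallowed into the prefix:
print(apply_rule("f12/some-thread-345.html?hl=ja&page=2"))
# ja&page=2/f12/some-thread-345.html
```

The last case shows that the second group is not limited to the language code, so a URL with extra parameters after hl= ends up with a strange prefix. Restricting that group to something like ([^&]+) might be safer, but I have not tested that inside vBSEO.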
The best way would be to rewrite URLs like country/rest internally to rest?hl=country.
By internally I mean without changing the URL in the browser (without sending a redirect header).
And redirect URLs like rest?hl=country to country/rest with a 301 header.
That would be best because the old, already indexed addresses would keep working. The redirect would make reindexing faster (I think), and you would avoid a possible duplicate content penalty from the same content being available at both URLs. At the same time it would be good for this mod, because no changes to it would be required at all. Unfortunately I'm not an expert in .htaccess files or vBSEO custom rewrite rules, so I don't know whether it is possible. It is certainly possible to redirect one address to another, but can it be done internally?...
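To make the two-way idea concrete, here is a small Python sketch of the decision logic only. It is not an .htaccess or vBSEO rule, and the function name, the action labels and the language list (taken from the examples earlier in the thread) are just my assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of enabled language codes.
LANGS = {"es", "ja", "zh-CN"}


def route(request_uri: str):
    """Decide what to do with an incoming request.

    Returns (action, target), where action is one of:
      "redirect_301"     - tell the browser to go to target
      "internal_rewrite" - serve target without changing the visible URL
      "pass"             - leave the request alone
    """
    parts = urlsplit(request_uri)
    path = parts.path.lstrip("/")
    query = parse_qsl(parts.query, keep_blank_values=True)
    lang = dict(query).get("hl")

    if lang in LANGS:
        # Old style rest?hl=country: 301 to country/rest,
        # keeping any other query parameters.
        rest = [(k, v) for k, v in query if k != "hl"]
        target = f"/{lang}/{path}"
        if rest:
            target += "?" + urlencode(rest)
        return ("redirect_301", target)

    first, _, remainder = path.partition("/")
    if first in LANGS:
        # New style country/rest: serve rest?hl=country internally.
        extra = f"{parts.query}&hl={first}" if parts.query else f"hl={first}"
        return ("internal_rewrite", f"/{remainder}?{extra}")

    return ("pass", request_uri)


print(route("/f12/thread-1.html?hl=ja"))
# ('redirect_301', '/ja/f12/thread-1.html')
print(route("/ja/f12/thread-1.html"))
# ('internal_rewrite', '/f12/thread-1.html?hl=ja')
```

One thing to watch: the internally rewritten URL looks exactly like the old style, so the 301 rule must only be checked against what the browser originally asked for, otherwise the two rules chase each other in a loop. In Apache mod_rewrite that is usually handled by testing the original request line (THE_REQUEST), but I have not tried it together with vBSEO.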