View Full Version : Miscellaneous Hacks - TFSEO Google Sitemap Generator (URL rewriting support)


paketeto
12-07-2008, 10:00 PM
A Google sitemap generator that follows TFSEO's URL rewriting.

If you don't know TFSEO, look at: https://vborg.vbsupport.ru/showthread.php?t=173738

All credit goes to Superjeff, TFSEO's creator, and I hope he can include and improve this code to merge it seamlessly into TFSEO.

READ CAREFULLY:
ONLY WORKS WITH METHOD 2 OF TFSEO (MEDIUM), WITH URL STRUCTURE SET TO DIRECTORY (THE FIRST ONE)

---------------------------------------


1. Add this line to your .htaccess rules:

RewriteRule (.*)\.xml(.*) $1.php$2 [nocase]


2. Open sitemap.php and fill in the required data.


3. Upload sitemap.php to the root of your forum.


4. Open www.yourforum.com/sitemap.php in your browser to generate the XML Google sitemap.


About the rewrite rule:
With mod_rewrite switched on (RewriteEngine On), this rule maps any request for file.xml to file.php, so the output of a PHP script can be served under an .xml name.

This means that if you open sitemap.xml in your browser you'll be viewing the output of sitemap.php. That is crucial, because when Google fetches sitemap.xml it sees live data from your PHP script, so your sitemap.xml can never be out of date.

So now you only need to go to

http://www.google.com/webmasters/tools/ping?sitemap=http://www.yourforum.com/sitemap.xml

each time you want to tell Google to index the changes in your sitemap. The XML is generated by the PHP code on the fly, so it is always correct :)
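
If you would rather build that ping address in code than type it by hand, something like the sketch below should do. It assumes the ping endpoint quoted above is still available; the function name build_sitemap_ping_url is made up for this example and is not part of sitemap.php.

```php
<?php
// Hypothetical helper (not part of sitemap.php): builds the ping URL
// quoted in the post for a given sitemap address.
function build_sitemap_ping_url(string $sitemapUrl): string
{
    // URL-encode the sitemap address so it is safe as a query-string value.
    return 'http://www.google.com/webmasters/tools/ping?sitemap=' . urlencode($sitemapUrl);
}

// Example call with the placeholder address from the post.
echo build_sitemap_ping_url('http://www.yourforum.com/sitemap.xml'), "\n";
```

The only difference from the hand-written address is that the sitemap URL is percent-encoded, which keeps the query string unambiguous.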


Limitations: writes only home, forum and thread URLs (I think that is enough for most uses).

-----------------------------------
v1.1: Added parameters for Home, improved defaults value for frequency and priority.

Bombowiec89
12-09-2008, 05:13 AM
thanks installed :)

Q-v-n-s-Q
12-09-2008, 06:33 AM
Very nice, so I need to install TFSEO as well.

LoRdGd
12-09-2008, 01:46 PM
Can you explain to me why the database info is necessary in this modification? Maybe I'm oversensitive, but this is the first mod where I have found something like this.

yahsuah
12-09-2008, 02:40 PM
Thanks!!!

I was searching for this...

paketeto
12-09-2008, 04:42 PM
Of course. I'm not a programmer and I don't know how to integrate this with vBulletin, so I had to make this "dirty" solution, but it works. I hope Superjeff, the TFSEO creator, will include and integrate this without the need to fill in duplicated data. Sorry.

Anyway, although the code is dirty, it is safe to use and works fine.

NAZIA
12-09-2008, 07:51 PM
What about this error?

XML Parsing Error: junk after document element
Location: http://bzupages.com/sitemap.php
Line Number 2, Column 1:<b>Warning</b>: mysql_num_rows(): supplied argument is not a valid MySQL result resource in <b>/mounted-storage/home108c/sub001/sc62900-HIDD/bzupages/sitemap.php</b> on line <b>47</b><br />
^

paketeto
12-09-2008, 10:18 PM
Did you fill all required data in sitemap.php correctly and the database connection was successful?

xuanhuy238
12-11-2008, 12:33 AM
XML Parsing Error: syntax error
Location: http://www.pes.vn/sitemap.php
Line Number 1, Column 1:Query failed
^
how to fix it???

paketeto
12-11-2008, 07:19 AM
The PHP script is not connecting to your database; make sure you have filled in all the required data inside sitemap.php correctly.

paketeto
12-13-2008, 07:42 PM
Finally, SuperJeff has agreed to integrate this mod into TFSEO. So soon we will have a very powerful and free alternative for the SEO of our vBulletin forums, without the need to edit files manually, all from the control panel. :)

Regards!

yahsuah
12-14-2008, 11:55 AM
We are waiting for this :)

Kem
12-15-2008, 04:08 AM
XML Parsing Error: no element found
Location: http://geekside.net/foros/sitemap.php
Line Number 7970, Column 1:
(The error was originally in Spanish.)

I get that error. It connects to MySQL successfully, but I'm still getting it. :/

paketeto
12-15-2008, 06:09 PM
I think this plugin is not ready for URLs other than www.xxxxxxx.xxx

I suggest you wait for the next release of TFSEO, because its creator agreed to include and improve my code so that it works seamlessly with TFSEO.

He also agreed to include some code I wrote that adds stopwords for URLs, with smart options.
With this, we won't need much more to improve the SEO of our forums. ;)


Kem
12-16-2008, 04:02 AM
Another error ^^

XML Parsing Error: no element found
Location: http://www.geekside.net/foros/sitemap.php
Line Number 7998, Column 1:

Thanks for reply :)

milon_tsf
01-06-2009, 05:20 PM
Hi - I just installed, and get the following error when trying to run the script:

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.


--------------------------------------------------------------------------------

The following tags were not closed: urlset. Error processing resource

As far as I can see, it has mapped most of the pages and just stops half way through another. Any ideas?

Thanks

joe1989
01-20-2009, 07:21 PM
Perhaps this mod is not compatible with 3.8?

Just a thought... does anyone know if this mod works in 3.8??

kfiasche81
01-21-2009, 03:36 PM
I receive this error...

XML Parsing Error: junk after document element
Location: http://www.iltermitano.it/forum/sitemap.php
Line Number 2, Column 1:<b>Warning</b>: mysql_num_rows(): supplied argument is not a valid MySQL result resource in <b>/home/iltermitano/iltermitano.it/forum/sitemap.php</b> on line <b>47</b><br />
^

JamesGunner
01-23-2009, 08:51 PM
not working anymore

desconexion
01-24-2009, 12:22 AM
Hi there.

I get this error

This page contains the following errors:

error on line 7163 at column 1: Extra content at the end of the document
Below is a rendering of the page up to the first error.

Any solutions?

SoftDux
02-22-2009, 10:01 AM
Does anyone know if this mod will work on VB 3.6?

imported_silkroad
04-10-2009, 08:46 AM
Can't get this to work with 3.7.x .....

Error:

[10-Apr-2009 05:30:44] PHP Fatal error: Call to undefined function convert_int_to_utf8() in /home/public_html/forum/sitemap.php(36) : regexp code on line 1

Crimm
04-14-2009, 05:11 PM
I ran across this and I see people are having problems.

I really want this working so I have started working on it.

Update 1: I have noticed that there is no place for a prefix, which a lot of us use. So I have added that to the variables.

Update 2: I noticed a few of the queries combined multiple queries into one. mysql_num_rows for one. I have split these up for debugging purposes.

For example:

$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1",$link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc" ;
$result = mysql_query($query) or die("Query failed");

$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc" ;
$result2 = mysql_query($query2) or die("Query failed");

Update 3: It is failing validation (http://validator.w3.org/), but adding the necessary information to it causes the file to fail.

Update 4: It would be really nice if this file pulled from the config automatically.

Giving me the following file:

<?php

///////////////////////Configure this/////////////////////////////////////////////////////

$username="username"; //your db user name
$password="password"; // the pass for the db
$database="dbname"; // name of db
$server="localhost"; // server
$prefix = "vb_";
$sitename="http://something.com/"; // sitename including www
$maxkey = "8"; // max key words as you specified in tfseo control panel
$home_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$home_priority = "1"; // google priority for home (www.yoursite.com)
$forum_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$forum_priority = "0.8"; // google priority for forums (www.yoursite.com/f4)
$thread_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$thread_priority = "0.4"; // google priority for threads (www.yoursite.com/f4/your-thread)

function remove_accents($string){ ////this strings must be the same you are using as character replacements, these are the "wide range"

return strtr($string,
"?????????????????????????????????????????????????? ?????????????@?",
"YuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiion oooooouuuuyyEan");
}

////////////////////////////DONT NEED TO TOUCH ANYTHING BELOW//////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////



function unhtmlspecialchars($text)
{

$text = preg_replace('/&#([0-9]+);/esiU', "convert_int_to_utf8('\\1')", $text);


return str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $text);
}



//connect to the database
$link = mysql_connect($server,$username,$password);
mysql_select_db($database,$link) or die( "Unable to select database");

$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1",$link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc" ;
$result = mysql_query($query) or die("Query failed");

$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc" ;
$result2 = mysql_query($query2) or die("Query failed");

//this is the normal header applied to any Google sitemap.xml file
echo '<?xml version="1.0" encoding="ISO-8859-1"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';

//HOME RESULTS

$url_product ="http://" . $sitename;

$realdate = date("Y-m-d");
$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day
$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$home_freq.'</changefreq>
<priority>'.$home_priority.'</priority>
</url>
';


//FORUM RESULTS
$i=0;
for($i=0;$i<$num_rows2; $i++)
{
$url_product ="http://" . $sitename . "/f" .mysql_result($result2,$i,"forumid");

/*you need to assign a date to the entity. if you don't
store a timestamp in the Database then you need slapping*/

$realdate = date("Y-m-d");
$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day
$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$forum_freq.'</changefreq>
<priority>'.$forum_priority.'</priority>
</url>
';
}

//THREAD RESULTS
for($i=0;$i<$num_rows; $i++)
{
//cleanurl
$title = mysql_result($result,$i,"title");
$a = strtolower($title);
$a = remove_accents($a);
$a = unhtmlspecialchars($a);
$a = str_replace("'", '', $a);
$a = preg_split("#[^a-z0-9]#", $a, -1, PREG_SPLIT_NO_EMPTY);
$a = array_slice($a, 0, $maxkey);
$a = implode("-",$a);
if (empty($a))
{
$a = 'thread';
}

//your url-product as we worked out in #4
$url_product ="http://" . $sitename . "/f" .mysql_result($result,$i,"forumid") . "/" . $a . '-t' .mysql_result($result,$i,"threadid");

/*you need to assign a date to the entity. if you don't
store a timestamp in the Database then you need slapping*/


$date = mysql_result($result,$i,"dateline"); //the date stored
$realdate = date('Y-m-d H:i:s', $date);

$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day

/*display the date in the format Google expects:
2006-01-29 for example*/

$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

//you can assign whatever changefreq and priority you like
echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$thread_freq.'</changefreq>
<priority>'.$thread_priority.'</priority>
</url>
';
}

mysql_close(); //close connection

//close the XML attribute that we opened in #3
echo
'</urlset>';



?>

Errors I'm still receiving:
1) IE 7 says invalid XML
2) Awaiting Google Webmaster Tools to validate Sitemap.

Wants:
1) To pass validation
2) Pull info from config file

If I have time (which I don't have much of) - I'll come back here and update this and see if I can't help some people out :) by adding wants.

Note: I'm running 3.8.1 and I get output, but I'm not 100% sure that Google will accept it.

Also note: I'm not officially supporting this, like I said I don't have much time. I will try to get it working for myself and put it here and put notes, but that's about all I'm going to have time to do :)

Crimm
04-14-2009, 05:25 PM
I just got returned a whole bunch of errors from Google Webmaster tools.

Let me keep working on this and see what I can present.

EDIT:

The errors were because of the $sitename variable.

Do not include http://
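
The reason is that the script itself prepends "http://" when it builds each URL, so a $sitename that already carries the scheme (or a trailing slash) produces broken addresses. A small guard along these lines would avoid the mistake; this is only a sketch, and normalize_sitename is a hypothetical helper name, not something the posted file contains.

```php
<?php
// Hypothetical helper: reduce a configured site address to the bare
// host/path form the script expects, since the script adds "http://" itself.
function normalize_sitename(string $sitename): string
{
    // Drop a leading http:// or https:// (case-insensitive).
    $sitename = preg_replace('#^https?://#i', '', trim($sitename));
    // Drop trailing slashes so "/f4" can be appended without doubling them.
    return rtrim($sitename, '/');
}

// Example: the value used in the first listing becomes "something.com".
$sitename = normalize_sitename('http://something.com/');
```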

Crimm
04-14-2009, 05:45 PM
Okay this update utilizes the config.php to pull database information, so you don't have to enter it.

The only thing you have to enter is the little bit of stuff at the top.

<?php
require_once('./includes/config.php');

///////////////////////Configure this/////////////////////////////////////////////////////

$sitename="SITENAME WITHOUT THE HTTP!!"; // sitename including www
$maxkey = "8"; // max key words as you specified in tfseo control panel
$home_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$home_priority = "1"; // google priority for home (www.yoursite.com)
$forum_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$forum_priority = "0.8"; // google priority for forums (www.yoursite.com/f4)
$thread_freq = "daily"; //always, hourly, daily, weekly, monthly, yearly, never
$thread_priority = "0.4"; // google priority for threads (www.yoursite.com/f4/your-thread)

function remove_accents($string){ ////this strings must be the same you are using as character replacements, these are the "wide range"

return strtr($string,
"?????????????????????????????????????????????????? ?????????????@?",
"YuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiion oooooouuuuyyEan");
}

////////////////////////////DONT NEED TO TOUCH ANYTHING BELOW//////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////

$username=$config['MasterServer']['username']; //your db user name
$password=$config['MasterServer']['password']; // the pass for the db
$database=$config['Database']['dbname']; // name of db
$server=$config['MasterServer']['servername']; // server
$prefix = $config['Database']['tableprefix'];


function unhtmlspecialchars($text)
{

$text = preg_replace('/&#([0-9]+);/esiU', "convert_int_to_utf8('\\1')", $text);


return str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $text);
}



//connect to the database
$link = mysql_connect($server,$username,$password);
mysql_select_db($database,$link) or die( "Unable to select database");

$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1",$link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc" ;
$result = mysql_query($query) or die("Query failed");

$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc" ;
$result2 = mysql_query($query2) or die("Query failed");

//this is the normal header applied to any Google sitemap.xml file
echo '<?xml version="1.0" encoding="ISO-8859-1"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';

//HOME RESULTS

$url_product ="http://" . $sitename;

$realdate = date("Y-m-d");
$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day
$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$home_freq.'</changefreq>
<priority>'.$home_priority.'</priority>
</url>
';


//FORUM RESULTS
$i=0;
for($i=0;$i<$num_rows2; $i++)
{
$url_product ="http://" . $sitename . "/f" .mysql_result($result2,$i,"forumid");

/*you need to assign a date to the entity. if you don't
store a timestamp in the Database then you need slapping*/

$realdate = date("Y-m-d");
$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day
$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$forum_freq.'</changefreq>
<priority>'.$forum_priority.'</priority>
</url>
';
}

//THREAD RESULTS
for($i=0;$i<$num_rows; $i++)
{
//cleanurl
$title = mysql_result($result,$i,"title");
$a = strtolower($title);
$a = remove_accents($a);
$a = unhtmlspecialchars($a);
$a = str_replace("'", '', $a);
$a = preg_split("#[^a-z0-9]#", $a, -1, PREG_SPLIT_NO_EMPTY);
$a = array_slice($a, 0, $maxkey);
$a = implode("-",$a);
if (empty($a))
{
$a = 'thread';
}

//your url-product as we worked out in #4
$url_product ="http://" . $sitename . "/f" .mysql_result($result,$i,"forumid") . "/" . $a . '-t' .mysql_result($result,$i,"threadid");

/*you need to assign a date to the entity. if you don't
store a timestamp in the Database then you need slapping*/


$date = mysql_result($result,$i,"dateline"); //the date stored
$realdate = date('Y-m-d H:i:s', $date);

$year = substr($realdate,0,4); //work out the year
$mon = substr($realdate,5,2); //work out the month
$day = substr($realdate,8,2); //work out the day

/*display the date in the format Google expects:
2006-01-29 for example*/

$displaydate = ''.$year.'-'.$mon.'-'.$day.'';

//you can assign whatever changefreq and priority you like
echo
'
<url>
<loc>'.$url_product.'</loc>
<lastmod>'.$displaydate.'</lastmod>
<changefreq>'.$thread_freq.'</changefreq>
<priority>'.$thread_priority.'</priority>
</url>
';
}

mysql_close(); //close connection

//close the XML attribute that we opened in #3
echo
'</urlset>';



?>

It translates to XML using IE7 - Just tried and I'm awaiting another re-submission with Google Webmaster Tools.

CILGINKRAL_
04-14-2009, 06:17 PM
Thank you really, really good :)

Crimm
04-14-2009, 07:00 PM
Looks like Google Webmaster Tools is taking it this time.

It'll take a long time though because I have a few LARGE RSS feeds coming in.

Tonight once I see it complete ... Maybe I'll work on further integrating it.

I wonder if paketeto would let me release it under 3.8 so that we have a clean place to discuss it.

Crimm
04-15-2009, 01:19 AM
google webmaster tools said:

Property Status
Sitemap type Web
Format Sitemap
Submitted 7 hours ago
Last downloaded by Google 5 hours ago
Status OK
Total URLs in Sitemap 6196


So this works.

I guess I'll wait to do anymore modifications until I hear from paketeto.

I took the first version of vbtfseo and modified it heavily for my site, but this still works perfectly with it. I tested it on another site with vbtfseo version 2.X and it works there as well.

If anyone is interested in me working on this, please let me know.

zeus_r6
02-19-2010, 04:43 PM
These are the errors I'm coming up with using your modified sitemap.php code:

78460
Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Problem detected on: Feb 19, 2010
Warnings 78458
Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: br
Problem detected on: Feb 19, 2010
Warnings 78459
Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010
Warnings 78459
Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010
Warnings 78459
Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010
Warnings 78459
Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: br
Problem detected on: Feb 19, 2010

Carpesimia
10-12-2011, 12:18 AM
I know this response comes pretty late, but I just inherited a forum running TFSEO on 3.8.7, and my boss wanted a sitemap immediately. I downloaded this package and tested it out. Here is what I found, and I hope it helps you...

1) The sitemap ran for a while and then just stopped dead. It was missing a function called convert_int_to_utf8(). This could be the issue many of you reported. I added this function, and the sitemap ran to completion each time.

2) The sitemap generator is stupid and grabs all forums and threads, including private forums. So I added an "excluded_forums" variable and excluded those forums in the 2 SQL statements.

3) The way this is written is S-L-O-W on a large forum. The forum I'm using has 184,000 threads and it took 38 minutes to create the sitemap. The box is a bad-ass machine too, with 32 GB of RAM and 4 quad processors. It's not a 386. It's because of the "select *" and "mysql_result()" grabbing a lot of data and then picking each piece from the results one at a time. So yeah, SLOW.

4) Having 184k URLs, my sitemap is too big for Google to eat (the sitemap protocol allows at most 50,000 URLs per file).
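
For finding 1, here is a rough idea of what such a function can look like. This is a sketch written for this thread, not the exact code added to the live file and not vBulletin's own implementation: it assumes convert_int_to_utf8() only has to turn a numeric code point into UTF-8 bytes. It also swaps the /e modifier from the posted listing for preg_replace_callback, because /e was removed in PHP 7.

```php
<?php
// Hypothetical stand-in for the missing convert_int_to_utf8():
// encodes a Unicode code point as a UTF-8 byte string.
function convert_int_to_utf8($codepoint)
{
    $cp = (int) $codepoint;
    if ($cp < 0x80) {
        return chr($cp);                                   // 1 byte (ASCII)
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                      // 2 bytes
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                     // 3 bytes
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x110000) {
        return chr(0xF0 | ($cp >> 18))                     // 4 bytes
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return '';                                             // outside Unicode range
}

// Same job as unhtmlspecialchars() in the listing, without the /e modifier.
function unhtmlspecialchars($text)
{
    // Replace numeric entities such as &#233; through a callback.
    $text = preg_replace_callback('/&#([0-9]+);/', function ($m) {
        return convert_int_to_utf8($m[1]);
    }, $text);

    return str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $text);
}
```

Since the thread slug keeps only a-z and 0-9 afterwards, the decoded bytes mostly end up as word separators; the point is simply that the call no longer dies halfway through the sitemap.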

So here's what I had to do to make it work:

1) I replaced the I/O with a database library I use all the time. Just this one change took the processing time of this program from 38 minutes down to 11 seconds. This is because instead of grabbing the data piecemeal from the database, I selected only the fields I was using, grabbed the data in bulk from the DB and dropped it into an array for processing. So yeah, 11 seconds.

2) The file was still big, so I changed the program to write the file to disk instead of echoing the output. This way I can generate it via cron whenever I want.

3) I also added a new variable called $urlsperfile and created a sitemap_#.xml with that many URLs, cycling to a new file whenever I got more than $urlsperfile.

4) Added a call to gzip to compress all of the files I made.

5) Created an uncompressed sitemap.xml file, which is a sitemap index file pointing to all of the sitemap_#.xml.gz files I just processed.

6) Finally, made some very MINOR changes to the XML to meet the sitemaps.org requirements, found at: http://www.sitemaps.org/protocol.php
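
Steps 2 to 4 can be sketched roughly as below. This is an illustration under assumptions, not the code that runs on the live forum: write_sitemap_chunks and the shape of the $entries array are made up for the example, and it compresses with PHP's gzencode() (zlib extension) in place of a call to the gzip program. The namespace is the one from the sitemaps.org protocol.

```php
<?php
// Hypothetical sketch: split a list of URL entries into gzip-compressed
// sitemap files holding at most $urlsperfile entries each.
// $entries is assumed to be an array of array('loc' => ..., 'lastmod' => ...).
function write_sitemap_chunks(array $entries, $urlsperfile, $dir)
{
    $files = array();
    foreach (array_chunk($entries, $urlsperfile) as $n => $chunk) {
        $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
             . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
        foreach ($chunk as $e) {
            // Escape the URL so characters such as & stay valid XML.
            $xml .= '<url><loc>' . htmlspecialchars($e['loc'], ENT_QUOTES) . '</loc>'
                  . '<lastmod>' . $e['lastmod'] . '</lastmod></url>' . "\n";
        }
        $xml .= '</urlset>' . "\n";

        // Files are numbered from 1: sitemap_1.xml.gz, sitemap_2.xml.gz, ...
        $name = 'sitemap_' . ($n + 1) . '.xml.gz';
        file_put_contents($dir . '/' . $name, gzencode($xml, 9));
        $files[] = $name;
    }
    return $files; // file names, ready to be listed in a sitemap index
}
```

Writing to disk instead of echoing is what makes the cron approach possible: the web server then serves static files and never waits for the database.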

The final outcome? a cronjob that runs in less than 15 seconds, generates 10 sitemap files with 20k urls per, and one sitemap file linking them all.
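
The "one sitemap file linking them all" is a sitemap index as described by the sitemaps.org protocol. A minimal sketch of building one follows; the function name, the base URL parameter and the single shared lastmod date are assumptions made for the example.

```php
<?php
// Hypothetical sketch: build a sitemap index that points at the
// compressed sitemap_#.xml.gz files.
function build_sitemap_index(array $files, $baseUrl, $lastmod)
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as $file) {
        // Each <sitemap> entry carries the full URL of one chunk file.
        $loc = rtrim($baseUrl, '/') . '/' . $file;
        $xml .= '<sitemap><loc>' . htmlspecialchars($loc, ENT_QUOTES) . '</loc>'
              . '<lastmod>' . $lastmod . '</lastmod></sitemap>' . "\n";
    }
    return $xml . '</sitemapindex>' . "\n";
}

// Example: an index for two chunk files on a placeholder domain.
$index = build_sitemap_index(
    array('sitemap_1.xml.gz', 'sitemap_2.xml.gz'),
    'http://www.yourforum.com/',
    date('Y-m-d')
);
```

The index itself stays uncompressed as sitemap.xml, so it is the only address that has to be submitted.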

Oh, and this is on a 3.8.7 forum.
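
For anyone who wants to try the "excluded forums" and "only select the fields you use" ideas from the findings above, the thread query could be built along these lines. It is a sketch only: build_thread_query is a hypothetical helper, the column list is taken from the fields the posted listing actually reads, and the forum IDs are examples.

```php
<?php
// Hypothetical helper: build the thread query with only the columns the
// script reads and an optional list of forum IDs to leave out.
function build_thread_query($prefix, array $excluded_forums)
{
    $sql = 'SELECT threadid, forumid, title, dateline FROM ' . $prefix . 'thread WHERE visible = 1';

    // Cast every ID to int so nothing but numbers reaches the SQL string.
    $ids = array_map('intval', $excluded_forums);
    if ($ids) {
        $sql .= ' AND forumid NOT IN (' . implode(',', $ids) . ')';
    }
    return $sql . ' ORDER BY dateline DESC';
}

// Example: skip forums 3 and 7 on a board using the "vb_" table prefix.
$query = build_thread_query('vb_', array(3, 7));
```

Fetching the whole result set in one pass and looping over the rows, instead of calling mysql_result() per field, is where the big speed-up described above comes from.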