The Archive of Official vBulletin Modifications Site. It is not a VB3 engine, just a parsed copy!
TFSEO Google Sitemap Generator (URL rewriting support)
Developer Last Online: Feb 2009
Google's sitemap with the URL rewriting of TFSEO.
If you don't know TFSEO, look at: https://vborg.vbsupport.ru/showthread.php?t=173738

All credit goes to Superjeff, TFSEO's creator, and I hope he can include and improve this code to merge it seamlessly into TFSEO.

READ CAREFULLY: ONLY WORKS WITH METHOD 2 OF TFSEO (MEDIUM), WITH URL STRUCTURE SET TO DIRECTORY (THE FIRST ONE)
---------------------------------------
1. Add this line to your htaccess rules:
RewriteRule (.*)\.xml(.*) $1.php$2 [nocase]
2. Open sitemap.php and fill in the required data.
3. Upload sitemap.php to the root of your forum.
4. Enter www.yourforum.com/sitemap.php in your browser to generate the XML Google sitemap.

About the rewrite rule:
This rule uses URL rewriting (which essentially lets you modify how URLs are handled) so that a request for file.xml is served by file.php. This means that if you enter sitemap.xml in your browser you will be viewing the output of sitemap.php. That is crucial, because when Google requests sitemap.xml it sees live data from your PHP script, so your sitemap.xml will never be out of date.

So now you only need to go to http://www.google.com/webmasters/too...um/sitemap.xml each time you want to tell Google to index the changes in your sitemap. The XML is generated by the PHP code on the fly, so it is always current.

Limitations:
Writes only home, forum and thread URLs (I think that is enough for most uses).
-----------------------------------
v1.1: Added parameters for Home, improved default values for frequency and priority.
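To see what the rewrite rule does to a requested path, the same pattern can be tried out in PHP. This is only an illustration: `rewrite_xml_to_php()` is a hypothetical helper that mimics the rule with `preg_replace()`; it is not part of TFSEO or Apache, and Apache applies the rule itself before PHP ever runs.

```php
<?php
// Illustration only: mimic the .htaccess rule
//   RewriteRule (.*)\.xml(.*) $1.php$2 [nocase]
// rewrite_xml_to_php() is a hypothetical name, not a TFSEO or Apache function.
function rewrite_xml_to_php($path)
{
    // Same pattern as the rule; the "i" flag plays the role of [nocase].
    $rewritten = preg_replace('/^(.*)\.xml(.*)$/i', '$1.php$2', $path, 1, $count);

    // When the pattern does not match, the path is left unchanged.
    return $count > 0 ? $rewritten : $path;
}

echo rewrite_xml_to_php('sitemap.xml'), "\n"; // sitemap.php
echo rewrite_xml_to_php('index.php'), "\n";   // index.php
```

Note that the pattern matches any path containing ".xml", not just sitemap.xml, so other XML files on the same site would be rewritten to a .php name as well.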

Comments

#22
Does anyone know if this mod will work on VB 3.6?

#23
Can't get this to work with 3.7.x.
Error:
[10-Apr-2009 05:30:44] PHP Fatal error: Call to undefined function convert_int_to_utf8() in /home/public_html/forum/sitemap.php(36) : regexp code on line 1
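The fatal error above says that convert_int_to_utf8() is not defined when sitemap.php runs on its own. A minimal stand-in is sketched below, assuming the function only needs to turn a numeric code point (from an entity such as &#233;) into its UTF-8 byte sequence. This is an assumption-based sketch, not vBulletin's own implementation.

```php
<?php
// Stand-in for the missing function. Sketch only: NOT vBulletin's own code.
// It is defined only when nothing else has already provided the function.
if (!function_exists('convert_int_to_utf8')) {
    function convert_int_to_utf8($intval)
    {
        $cp = (int) $intval;

        if ($cp < 0 || $cp > 0x10FFFF) {
            return '';                       // outside the Unicode range
        }
        if ($cp < 0x80) {                    // 1 byte (ASCII)
            return chr($cp);
        }
        if ($cp < 0x800) {                   // 2 bytes
            return chr(0xC0 | ($cp >> 6))
                 . chr(0x80 | ($cp & 0x3F));
        }
        if ($cp < 0x10000) {                 // 3 bytes
            return chr(0xE0 | ($cp >> 12))
                 . chr(0x80 | (($cp >> 6) & 0x3F))
                 . chr(0x80 | ($cp & 0x3F));
        }
        // 4 bytes
        return chr(0xF0 | ($cp >> 18))
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
}
```

Placing this near the top of sitemap.php, before unhtmlspecialchars() is called, should be enough for the lookup to succeed.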

#24
I ran across this and I see people are having problems.
I really want this working, so I have started working on it.

Update 1: I noticed that there is no place for a table prefix, which a lot of us use, so I have added that to the variables.

Update 2: I noticed that a few statements nested one call inside another (mysql_num_rows wrapped around mysql_query, for one). I have split these up for debugging purposes. For example:
Code:
$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1", $link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc";
$result = mysql_query($query) or die("Query failed");
$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc";
$result2 = mysql_query($query2) or die("Query failed");

Update 4: It would be really nice if this file pulled its settings from the config automatically.

Giving me the following file:
Code:
<?php
///////////////////////Configure this/////////////////////////////////////////////////////
$username = "username"; // your db user name
$password = "password"; // the pass for the db
$database = "dbname"; // name of db
$server = "localhost"; // server
$prefix = "vb_"; // table prefix
$sitename = "http://something.com/"; // sitename including www
$maxkey = "8"; // max key words as you specified in tfseo control panel
$home_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$home_priority = "1"; // google priority for home (www.yoursite.com)
$forum_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$forum_priority = "0.8"; // google priority for forums (www.yoursite.com/f4)
$thread_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$thread_priority = "0.4"; // google priority for threads (www.yoursite.com/f4/your-thread)

function remove_accents($string)
{
    // these strings must be the same ones you are using as character replacements; these are the "wide range"
    return strtr($string, "???????????????????????????????????????????????????????????????@?", "YuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyyEan");
}
////////////////////////////DONT NEED TO TOUCH ANYTHING BELOW//////////////////////////////////////////////

function unhtmlspecialchars($text)
{
    // decode numeric entities, then the four basic named entities
    $text = preg_replace('/&#([0-9]+);/esiU', "convert_int_to_utf8('\\1')", $text);
    return str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $text);
}

//connect to the database
$link = mysql_connect($server, $username, $password);
mysql_select_db($database, $link) or die("Unable to select database");

$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1", $link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc";
$result = mysql_query($query) or die("Query failed");
$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc";
$result2 = mysql_query($query2) or die("Query failed");

//this is the normal header applied to any Google sitemap.xml file
echo '<?xml version="1.0" encoding="ISO-8859-1"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';

//HOME RESULTS
$url_product = "http://" . $sitename;
$realdate = date("Y-m-d");
$year = substr($realdate, 0, 4); //work out the year
$mon = substr($realdate, 5, 2); //work out the month
$day = substr($realdate, 8, 2); //work out the day
$displaydate = '' . $year . '-' . $mon . '-' . $day . '';
echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $home_freq . '</changefreq>
<priority>' . $home_priority . '</priority>
</url>
';

//FORUM RESULTS
$i = 0;
for ($i = 0; $i < $num_rows2; $i++) {
    $url_product = "http://" . $sitename . "/f" . mysql_result($result2, $i, "forumid");
    /*you need to assign a date to the entity.
    if you don't store a timestamp in the Database then you need slapping*/
    $realdate = date("Y-m-d");
    $year = substr($realdate, 0, 4); //work out the year
    $mon = substr($realdate, 5, 2); //work out the month
    $day = substr($realdate, 8, 2); //work out the day
    $displaydate = '' . $year . '-' . $mon . '-' . $day . '';
    echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $forum_freq . '</changefreq>
<priority>' . $forum_priority . '</priority>
</url>
';
}

//THREAD RESULTS
for ($i = 0; $i < $num_rows; $i++) {
    //cleanurl
    $title = mysql_result($result, $i, "title");
    $a = strtolower($title);
    $a = remove_accents($a);
    $a = unhtmlspecialchars($a);
    $a = str_replace("'", '', $a);
    $a = preg_split("#[^a-z0-9]#", $a, -1, PREG_SPLIT_NO_EMPTY);
    $a = array_slice($a, 0, $maxkey);
    $a = implode("-", $a);
    if (empty($a)) {
        $a = 'thread';
    }
    //your url-product as we worked out in #4
    $url_product = "http://" . $sitename . "/f" . mysql_result($result, $i, "forumid") . "/" . $a . '-t' . mysql_result($result, $i, "threadid");
    /*you need to assign a date to the entity.
    if you don't store a timestamp in the Database then you need slapping*/
    $date = mysql_result($result, $i, "dateline"); //the date stored
    $realdate = date('Y-m-d H:i:s', $date);
    $year = substr($realdate, 0, 4); //work out the year
    $mon = substr($realdate, 5, 2); //work out the month
    $day = substr($realdate, 8, 2); //work out the day
    /*display the date in the format Google expects: 2006-01-29 for example*/
    $displaydate = '' . $year . '-' . $mon . '-' . $day . '';
    //you can assign whatever changefreq and priority you like
    echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $thread_freq . '</changefreq>
<priority>' . $thread_priority . '</priority>
</url>
';
}
mysql_close(); //close connection
//close the XML attribute that we opened in #3
echo '</urlset>';
?>

1) IE 7 says invalid XML
2) Awaiting Google Webmaster Tools to validate Sitemap.

Wants:
1) To pass validation
2) Pull info from config file

If I have time (which I don't have much of), I'll come back here and update this and see if I can help some people out by adding the wants.
Note: I'm running 3.8.1 and I get output, but I'm not 100% sure that Google will accept it.
Also note: I'm not officially supporting this; like I said, I don't have much time. I will try to get it working for myself and post it here with notes, but that's about all I'm going to have time to do.

#25
I just got a whole bunch of errors back from Google Webmaster Tools.
Let me keep working on this and see what I can present.
EDIT: The errors were caused by the $sitename variable. Do not include http:// in it.
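Because the script prepends "http://" to $sitename itself, a value that already contains the scheme yields a doubled prefix. One way to guard against that is to normalize the value before it is used. The helper below is a hypothetical sketch; `normalize_sitename()` is not part of the posted script.

```php
<?php
// Hypothetical helper (not in the original script): strip a leading scheme
// and any trailing slashes so that "http://" . $sitename . "/f4" is well formed.
function normalize_sitename($sitename)
{
    $sitename = trim($sitename);
    $sitename = preg_replace('#^https?://#i', '', $sitename);
    return rtrim($sitename, '/');
}

// Example: both spellings end up as the same host name.
$sitename = normalize_sitename('http://www.something.com/');
echo "http://" . $sitename . "/f4\n"; // http://www.something.com/f4
```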

#26
Okay this update utilizes the config.php to pull database information, so you don't have to enter it.
The only thing you have to enter is the little bit of stuff at the top. Code:
<?php
require_once('./includes/config.php');
///////////////////////Configure this/////////////////////////////////////////////////////
$sitename = "SITENAME WITHOUT THE HTTP!!"; // sitename including www
$maxkey = "8"; // max key words as you specified in tfseo control panel
$home_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$home_priority = "1"; // google priority for home (www.yoursite.com)
$forum_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$forum_priority = "0.8"; // google priority for forums (www.yoursite.com/f4)
$thread_freq = "daily"; // always, hourly, daily, weekly, monthly, yearly, never
$thread_priority = "0.4"; // google priority for threads (www.yoursite.com/f4/your-thread)

function remove_accents($string)
{
    // these strings must be the same ones you are using as character replacements; these are the "wide range"
    return strtr($string, "???????????????????????????????????????????????????????????????@?", "YuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyyEan");
}
////////////////////////////DONT NEED TO TOUCH ANYTHING BELOW//////////////////////////////////////////////

// database settings are read from vBulletin's includes/config.php
$username = $config['MasterServer']['username']; // your db user name
$password = $config['MasterServer']['password']; // the pass for the db
$database = $config['Database']['dbname']; // name of db
$server = $config['MasterServer']['servername']; // server
$prefix = $config['Database']['tableprefix']; // table prefix

function unhtmlspecialchars($text)
{
    // decode numeric entities, then the four basic named entities
    $text = preg_replace('/&#([0-9]+);/esiU', "convert_int_to_utf8('\\1')", $text);
    return str_replace(array('&lt;', '&gt;', '&quot;', '&amp;'), array('<', '>', '"', '&'), $text);
}

//connect to the database
$link = mysql_connect($server, $username, $password);
mysql_select_db($database, $link) or die("Unable to select database");

$result = mysql_query("select * from " . $prefix . "thread WHERE visible = 1", $link);
$num_rows = mysql_num_rows($result);
$query = "select * from " . $prefix . "thread WHERE visible = 1 ORDER BY dateline desc";
$result = mysql_query($query) or die("Query failed");
$num_rows2 = mysql_num_rows(mysql_query("select * from " . $prefix . "forum"));
$query2 = "select * from " . $prefix . "forum ORDER BY forumid desc";
$result2 = mysql_query($query2) or die("Query failed");

//this is the normal header applied to any Google sitemap.xml file
echo '<?xml version="1.0" encoding="ISO-8859-1"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';

//HOME RESULTS
$url_product = "http://" . $sitename;
$realdate = date("Y-m-d");
$year = substr($realdate, 0, 4); //work out the year
$mon = substr($realdate, 5, 2); //work out the month
$day = substr($realdate, 8, 2); //work out the day
$displaydate = '' . $year . '-' . $mon . '-' . $day . '';
echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $home_freq . '</changefreq>
<priority>' . $home_priority . '</priority>
</url>
';

//FORUM RESULTS
$i = 0;
for ($i = 0; $i < $num_rows2; $i++) {
    $url_product = "http://" . $sitename . "/f" . mysql_result($result2, $i, "forumid");
    /*you need to assign a date to the entity.
    if you don't store a timestamp in the Database then you need slapping*/
    $realdate = date("Y-m-d");
    $year = substr($realdate, 0, 4); //work out the year
    $mon = substr($realdate, 5, 2); //work out the month
    $day = substr($realdate, 8, 2); //work out the day
    $displaydate = '' . $year . '-' . $mon . '-' . $day . '';
    echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $forum_freq . '</changefreq>
<priority>' . $forum_priority . '</priority>
</url>
';
}

//THREAD RESULTS
for ($i = 0; $i < $num_rows; $i++) {
    //cleanurl
    $title = mysql_result($result, $i, "title");
    $a = strtolower($title);
    $a = remove_accents($a);
    $a = unhtmlspecialchars($a);
    $a = str_replace("'", '', $a);
    $a = preg_split("#[^a-z0-9]#", $a, -1, PREG_SPLIT_NO_EMPTY);
    $a = array_slice($a, 0, $maxkey);
    $a = implode("-", $a);
    if (empty($a)) {
        $a = 'thread';
    }
    //your url-product as we worked out in #4
    $url_product = "http://" . $sitename . "/f" . mysql_result($result, $i, "forumid") . "/" . $a . '-t' . mysql_result($result, $i, "threadid");
    /*you need to assign a date to the entity.
    if you don't store a timestamp in the Database then you need slapping*/
    $date = mysql_result($result, $i, "dateline"); //the date stored
    $realdate = date('Y-m-d H:i:s', $date);
    $year = substr($realdate, 0, 4); //work out the year
    $mon = substr($realdate, 5, 2); //work out the month
    $day = substr($realdate, 8, 2); //work out the day
    /*display the date in the format Google expects: 2006-01-29 for example*/
    $displaydate = '' . $year . '-' . $mon . '-' . $day . '';
    //you can assign whatever changefreq and priority you like
    echo '
<url>
<loc>' . $url_product . '</loc>
<lastmod>' . $displaydate . '</lastmod>
<changefreq>' . $thread_freq . '</changefreq>
<priority>' . $thread_priority . '</priority>
</url>
';
}
mysql_close(); //close connection
//close the XML attribute that we opened in #3
echo '</urlset>';
?>

#27
Thank you, really, really good.

#28
Looks like Google Webmaster Tools is taking it this time.
It'll take a long time though, because I have a few LARGE RSS feeds coming in. Tonight, once I see it complete, maybe I'll work on further integrating it. I wonder if paketeto would let me release it under 3.8 so that we have a clean place to discuss it.

#29
Google Webmaster Tools said:
Quote:
So this works. I guess I'll wait to do any more modifications until I hear from paketeto. I took the first version of vbtfseo and modified it heavily for my site, but this still works perfectly with it. I tested it on another site with vbtfseo version 2.X and it works as well. If anyone is interested in me working on this, please let me know.

#30
These are the errors I'm coming up with using your modified sitemap.php code:

78460 Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.
Problem detected on: Feb 19, 2010

Warnings 78458 Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: br
Problem detected on: Feb 19, 2010

Warnings 78459 Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010

Warnings 78459 Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010

Warnings 78459 Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: b
Problem detected on: Feb 19, 2010

Warnings 78459 Invalid XML tag
This tag was not recognized. Please fix it and resubmit.
Parent tag: urlset
Tag: br
Problem detected on: Feb 19, 2010
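The script never emits br or b tags on purpose, so one possible explanation (an assumption; the thread does not confirm it) is that PHP printed an HTML-formatted warning or error into the middle of the XML. The sketch below shows two precautions under that assumption: keeping PHP messages out of the output, and escaping every value written into the sitemap. `sitemap_url_entry()` is a hypothetical helper, not part of the posted script.

```php
<?php
// Assumption: HTML-formatted PHP warnings are leaking into the XML output.
ini_set('display_errors', '0'); // keep warnings out of the generated XML
ini_set('log_errors', '1');     // still record them in the server error log

if (!headers_sent()) {
    // Tell the client that what follows is XML in the declared charset.
    header('Content-Type: application/xml; charset=ISO-8859-1');
}

// Hypothetical helper: build one <url> entry with every value escaped,
// so characters such as & or < cannot break the document.
function sitemap_url_entry($loc, $lastmod, $changefreq, $priority)
{
    $esc = function ($value) {
        return htmlspecialchars((string) $value, ENT_QUOTES, 'ISO-8859-1');
    };

    return "<url>\n"
         . '  <loc>' . $esc($loc) . "</loc>\n"
         . '  <lastmod>' . $esc($lastmod) . "</lastmod>\n"
         . '  <changefreq>' . $esc($changefreq) . "</changefreq>\n"
         . '  <priority>' . $esc($priority) . "</priority>\n"
         . "</url>\n";
}
```

With a helper like this, each echo block in the script could be reduced to a single call, for example `echo sitemap_url_entry($url_product, $displaydate, $thread_freq, $thread_priority);`.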

#31
I know this response comes pretty late, but I just inherited a forum running TFSEO on 3.8.7, and my boss wanted a sitemap immediately. I downloaded this package and tested it out. Here is what I found, and I hope it helps you...

1) The sitemap ran for a while and then just stopped dead. It was missing a function called convert_int_to_utf8(). This could be the issue many of you reported. I added this function, and the sitemap ran to completion each time.
2) The sitemap generator is stupid and grabs all forums and threads, including private forums. So I added an "excuded_forums" variable and excluded those forums in the 2 SQL statements.
3) The way this is written is S-L-O-W on a large forum. The forum I'm using has 184,000 threads, and it took 38 minutes to create the sitemap. The box is a bad-ass machine too, with 32 GB of RAM and 4 quad-core processors. It's not a 386. It's because the "select *" and "mysql_result()" calls grab a lot of data and then pick each piece out of the results one at a time. So yeah, SLOW.
4) With 184k URLs, my sitemap is too big for Google to eat.

So here's what I had to do to make it work:

1) I replaced the I/O with a database library I use all the time. Just this one change took the processing time of this program from 38 minutes down to 11 seconds. This is because, instead of grabbing the data piecemeal from the database, I selected only the fields I was using, grabbed the data in bulk from the DB and dropped it into an array for processing. So yeah, 11 seconds.
2) The file was still big, so I changed the program to write the file to disk instead of echoing the output. This way I can generate it via cron whenever I want.
3) I also added a new variable called $urlsperfile and created a sitemap_#.xml with that many URLs, cycling to a new file whenever I got more than $urlsperfile.
4) Added a call to gzip to compress all of the files I made.
5) Created an uncompressed sitemap.xml file, which is a sitemap index file pointing to all of the sitemap_#.xml.gz files I just processed.
6) Finally, made some very MINOR changes to the XML to meet the sitemaps.org requirements, found at: http://www.sitemaps.org/protocol.php

The final outcome? A cron job that runs in less than 15 seconds, generates 10 sitemap files with 20k URLs each, and one sitemap index file linking them all. Oh, and this is on a 3.8.7 forum.
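The modified program from this post was not shared, so the following is only a rough sketch of the file-splitting steps it describes: write the URLs to disk in chunks of $urlsperfile, gzip each chunk, and write an uncompressed sitemap index that points at the compressed files. The function name `write_sitemaps()`, its parameters and the use of PHP's gzencode() (instead of calling the gzip program) are all assumptions. The namespace is the one defined by the sitemaps.org protocol.

```php
<?php
// Hypothetical sketch; NOT the poster's code. Requires PHP's zlib extension.
// $urls        list of absolute URLs (how they are fetched is out of scope here)
// $dir         directory to write into
// $baseurl     public URL of that directory, without a trailing slash
// $urlsperfile maximum number of URLs per sitemap file (protocol limit: 50,000)
function write_sitemaps(array $urls, $dir, $baseurl, $urlsperfile)
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $head = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $files = array();

    // One gzip-compressed sitemap_N.xml.gz per chunk of URLs.
    foreach (array_chunk($urls, $urlsperfile) as $n => $chunk) {
        $xml = $head . '<urlset xmlns="' . $ns . '">' . "\n";
        foreach ($chunk as $url) {
            $xml .= '  <url><loc>'
                  . htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
                  . "</loc></url>\n";
        }
        $xml .= "</urlset>\n";

        $name = 'sitemap_' . ($n + 1) . '.xml.gz';
        file_put_contents($dir . '/' . $name, gzencode($xml, 9));
        $files[] = $name;
    }

    // Uncompressed index file that links to every compressed sitemap.
    $index = $head . '<sitemapindex xmlns="' . $ns . '">' . "\n";
    foreach ($files as $name) {
        $index .= '  <sitemap><loc>'
                . htmlspecialchars($baseurl . '/' . $name, ENT_QUOTES, 'UTF-8')
                . "</loc></sitemap>\n";
    }
    $index .= "</sitemapindex>\n";
    file_put_contents($dir . '/sitemap.xml', $index);

    return $files;
}
```

Run from cron, a call such as `write_sitemaps($urls, '/path/to/forum', 'http://www.yoursite.com', 20000);` (path and host are placeholders) would produce ten files for roughly 184,000 URLs, matching the outcome described above.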