The Archive of the Official vBulletin Modifications Site. It is not a vB3 engine, just a parsed copy!
I am tired of my 200,000 posts not being listed in Google. I was inspired by phpbuilder.com this morning and I wrote:
vbSpiderFriend - the search engine indexer for all of your posts

Purpose: Allow search engine spiders to crawl a linked list of all of your posts.

Project Requirements:
-Friendly URLs (no query strings)
-Good dynamic meta tags
-Never have to touch the script again. It is Y3K compliant; simply re-submit to the engines to update your listings.

Install Requirements:
-vBulletin 1.x or 2.x
-About 10 minutes

1) Download the attached Zip.
2) Open class.mysql.php and put your database login info at the top.
3) Create a new directory called archive under your forum, like /forum/archive
4) Open the included .htaccess and change the Error 404 to your new archive path.
5) Open index.php and change the self-explanatory variables at the top of the file.
6) Upload all 3 files to your archive directory.
7) Submit /forum/archive/index.php to search engines and watch 'em crawl.

DISCLAIMER: I don't use 2.x, but I checked the schema and this should work fine.

NOTES: This uses ErrorDocument and query string parsing to get the variables needed. I do not have the time or energy to troubleshoot this if it does not work on your server. Sorry!
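A note on the ErrorDocument approach mentioned in the NOTES: requests for archive pages that do not exist as real files fall through to the 404 handler, so a single script can serve every "friendly" URL by parsing the path it was asked for. vbSpiderFriend's actual URL scheme is not shown in this post, so the `f<forumid>/t<threadid>.html` layout below is a hypothetical stand-in, and the sketch is Python rather than the PHP the hack itself uses.

```python
import re
from typing import Optional

# Hypothetical friendly-URL scheme (NOT taken from vbSpiderFriend):
#   /forum/archive/               -> forum index
#   /forum/archive/f12/           -> thread list for forum 12
#   /forum/archive/f12/t345.html  -> posts in thread 345
ARCHIVE_PREFIX = "/forum/archive/"
_PATTERN = re.compile(r"^(?:f(?P<forum>\d+)/(?:t(?P<thread>\d+)\.html)?)?$")


def parse_archive_path(request_path: str) -> Optional[dict]:
    """Recover forum/thread IDs from the path the 404 handler was invoked for.

    Returns None if the path is outside the archive or does not fit the scheme.
    """
    # Ignore any query string a spider may have appended.
    path = request_path.split("?", 1)[0]
    if not path.startswith(ARCHIVE_PREFIX):
        return None
    match = _PATTERN.match(path[len(ARCHIVE_PREFIX):])
    if match is None:
        return None
    forum = match.group("forum")
    thread = match.group("thread")
    return {
        "forum_id": int(forum) if forum else None,
        "thread_id": int(thread) if thread else None,
    }


if __name__ == "__main__":
    print(parse_archive_path("/forum/archive/f12/t345.html"))
    print(parse_archive_path("/forum/archive/bogus"))
```

Anything that returns None would get a genuine 404 page, so spiders do not index junk URLs.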
Comments
#192
Hello,
I am writing to see if someone would be interested in taking this hack one step further to generate static HTML pages for search engines to index. This would prevent some search engines from overloading a server by indexing the dynamic content too fast. What I propose is a script that makes pages identical to what we have with this script, except that it actually writes each dynamic page out as an HTML file and puts the files into a similar folder structure. The script would need to cycle through all of the posts initially in manageable batches so it doesn't cause problems by doing them all at once, and it should then be run via a cron job (or manually) every so often to archive new posts, or re-archive posts edited since the last run. I feel this would meet a lot of the needs of sites on shared servers, and if need be I would be willing to pay for this to be done. Let me know if anyone is interested.
-Brian
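The workflow Brian describes (one initial pass over every post in small batches, then periodic cron runs that pick up only new or edited posts) can be sketched as below. This is a minimal Python sketch, not code from the thread: posts are plain dicts with hypothetical `postid`, `dateline` and `editdate` fields holding Unix timestamps, and where `last_run` is stored between cron runs is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def posts_needing_archive(posts: Iterable[dict], last_run: int) -> List[dict]:
    """Posts created or edited after the previous run.

    Hypothetical fields: 'dateline' (creation time) and 'editdate'
    (last edit time, 0 if never edited), both Unix timestamps.
    A last_run of 0 selects everything, i.e. the initial full pass.
    """
    changed = [
        p for p in posts
        if p["dateline"] > last_run or p.get("editdate", 0) > last_run
    ]
    # Oldest first, so files are generated in posting order.
    return sorted(changed, key=lambda p: p["dateline"])


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items so each step stays small."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    posts = [
        {"postid": 1, "dateline": 100, "editdate": 0},
        {"postid": 2, "dateline": 200, "editdate": 0},
        {"postid": 3, "dateline": 50, "editdate": 400},  # old post, edited later
        {"postid": 4, "dateline": 500, "editdate": 0},
    ]
    for chunk in batches(posts_needing_archive(posts, last_run=300), size=1):
        print([p["postid"] for p in chunk])
```

On a shared server, a real script might also pause between batches; the batch size is the knob that trades run time against load.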
#193
|
||||
|
||||
Quote:
#194
Quote:
All it does is whip through the database and, for all the results of a particular query, cannon hundreds of small files onto the docroot. We use this to pre-generate content on our site, which massively reduces the database hits for the dynamic content (very important for us; we get over 100,000 unique users per hour on our top content sites). Once a day it is fired and the site is made fresh. News is fired hourly, or manually when needed. In our application it's good for security too, since the database resides on a different machine and the webserver needs no access to it (the Cache Cannon resides on an interim machine and simply copies the files to the web server).

...anyhow, yesterday when I saw this thread I realised that the main flaw is that it is too slow in generating the content for the spiders, that the spiders would prefer static HTML so they can trawl faster, and that the pages were not optimised enough for a high ranking on the search engines. Also, you probably get hit by several spiders a day (take a peek at your logs for requests for /robots.txt for an indication), and the work to pre-generate is probably less than the work to serve it all each time.

Thus I will probably be writing a completely new implementation of this hack, based upon our existing Cache Cannon theory. It will create a single HTML file for each post, and you could fire it for given date ranges (reducing server load) or forums, at given time intervals (manually, say weekly) or via a cron job. I shall also include client-side JavaScript in these files to redirect a user onload to the proper version of the post in the appropriate forum. This should be Googlebot-safe, as I believe it ignores client-side script, but it will ensure that when a user comes from a search engine, they are simply bounced to the correct entry in the real forum.

eva2000, I shall endeavour to make sure that this does not generate files for private forums. This should suit you, since the files are static and there would be no way to reach a private forum by entering its ID. It should be noted, though, that as this will generate static files, should you later turn a public forum private you would have to delete those files manually; hence the $forumId in the proposed folder structure.

Proposed storage:
The folder structure... $forumpath/archive/$forumId/$year/$month/$day
For the file names... $postId.htm

I shall start this on Tuesday next week, and hope to have it finished by Saturday next week (I'd do it sooner, but it's my birthday and this isn't that important!). The files will be standalone and I shall develop them with vB 2.2.2, though as I shall only be accessing the user, post, thread and forum tables (I guess... I'll have to look at the schema) this should be backwards compatible to at least 2.x boards. I will only be supporting the latest version at any time, though. If I run into trouble or need assistance with the schema I shall let you know.

Cheers
David K
http://www.buro9.com/forum/
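The proposed storage layout maps each post to `$forumpath/archive/$forumId/$year/$month/$day/$postId.htm`. Below is a minimal Python sketch of that mapping. It assumes the post date comes from a Unix-timestamp field (called `dateline` here), read as UTC, and that month and day are zero-padded; the proposal specifies neither detail. Keeping the forum ID at the top of the path is what makes "delete the files when a forum goes private" a single directory removal.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def forum_archive_dir(forumpath: str, forum_id: int) -> PurePosixPath:
    """Top-level folder for one forum; remove it if the forum later goes private."""
    return PurePosixPath(forumpath, "archive", str(forum_id))


def post_archive_path(forumpath: str, forum_id: int, post_id: int,
                      dateline: int) -> PurePosixPath:
    """$forumpath/archive/$forumId/$year/$month/$day/$postId.htm

    Assumes `dateline` is a Unix timestamp interpreted as UTC and that month
    and day are zero-padded; both are assumptions, not part of the proposal.
    """
    posted = datetime.fromtimestamp(dateline, tz=timezone.utc)
    return forum_archive_dir(forumpath, forum_id).joinpath(
        f"{posted.year:04d}", f"{posted.month:02d}", f"{posted.day:02d}",
        f"{post_id}.htm",
    )


if __name__ == "__main__":
    # 1,000,000,000 seconds after the epoch is 2001-09-09 (UTC).
    print(post_archive_path("/var/www/forum", 5, 123, 1_000_000_000))
```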
#195
I just wanted to touch base to see whether you had started work on this yet.
-Brian
#196
|
|||
|
|||
![]()
Started to put the basics in place last night whilst building another PC.
Got reasonably far with the function that will dump files on the docroot; it is just subject to load testing now. I also built the query that will extract all posts for dumping, and have been working on it to make sure it excludes private forums. I need to install FoxServ at home to test this on a test forum. I think I shall have the back end finished over the evenings this week. The front end will be the thing that actually takes time, but I'm hoping that's just going to be Saturday and not need more work. Problems stem from wanting to decrease server load by breaking the generation into manageable amounts (monthly or by forum). I'll let you know when I have something substantial, and then we can start a new thread here for discussion whilst it's improved. I think it's realistic to say it should be ready as a beta over this weekend, and that a release version should follow next week once everyone is sure that it does what they want it to (though I'll not be including code to toast bread).
Cheers
David K
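The two back-end concerns above (an extraction query that skips private forums, and splitting generation into monthly chunks) can be sketched together. This is a minimal Python sketch against an in-memory SQLite stand-in; the three-table schema and its `is_private` column are hypothetical simplifications, since real vBulletin decides forum visibility through its permission tables rather than a single flag.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterator, List, Tuple


def month_ranges(start: datetime, end: datetime) -> Iterator[Tuple[int, int]]:
    """Yield [from, to) Unix-timestamp pairs, one per calendar month (UTC).

    `end` must be timezone-aware (UTC).
    """
    cur = datetime(start.year, start.month, 1, tzinfo=timezone.utc)
    while cur < end:
        if cur.month == 12:
            nxt = datetime(cur.year + 1, 1, 1, tzinfo=timezone.utc)
        else:
            nxt = datetime(cur.year, cur.month + 1, 1, tzinfo=timezone.utc)
        yield int(cur.timestamp()), int(nxt.timestamp())
        cur = nxt


def create_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the forum database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forum  (forumid INTEGER PRIMARY KEY, is_private INTEGER);
        CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER);
        CREATE TABLE post   (postid INTEGER PRIMARY KEY, threadid INTEGER,
                             dateline INTEGER);
    """)
    return conn


# Public posts only, restricted to one [from, to) time window.
QUERY = """
SELECT p.postid, t.forumid, p.dateline
FROM post p
JOIN thread t ON t.threadid = p.threadid
JOIN forum f ON f.forumid = t.forumid
WHERE f.is_private = 0
  AND p.dateline >= ? AND p.dateline < ?
ORDER BY p.dateline
"""


def public_posts_in_range(conn: sqlite3.Connection, start_ts: int,
                          end_ts: int) -> List[tuple]:
    return conn.execute(QUERY, (start_ts, end_ts)).fetchall()


if __name__ == "__main__":
    conn = create_demo_db()
    conn.executemany("INSERT INTO forum VALUES (?, ?)", [(1, 0), (2, 1)])
    conn.executemany("INSERT INTO thread VALUES (?, ?)", [(10, 1), (20, 2)])
    jan = int(datetime(2002, 1, 15, tzinfo=timezone.utc).timestamp())
    conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                     [(100, 10, jan), (200, 20, jan)])
    for lo, hi in month_ranges(datetime(2002, 1, 1, tzinfo=timezone.utc),
                               datetime(2002, 2, 1, tzinfo=timezone.utc)):
        print(public_posts_in_range(conn, lo, hi))  # only post 100: forum 2 is private
```

Running one month per invocation keeps each query and each burst of file writes small, which is the load concern raised above.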
#197
|
|||
|
|||
![]()
Wow, can't wait!
#198
|
||||
|
||||
![]()
How did this go, Buro? I'm looking for a solution like this that also works on Win2k (without .htaccess, etc.). I did a search for Cache Cannon as well but couldn't find it.
#199
|
|||
|
|||
![]()
I've e-mailed Brian offline about this, but in essence it's built.
What I have thus far is an adequate interface offering caching for:
-All forums
-Specific forums
-Specific forums + sub-forums
-Posts within the past x days

The Cache Cannon will then fire for all applicable posts, using a template to render the display in the HTML files. The only missing piece is the final pass through all resultant folders and files, constructing the index.htm files that will tie it all together for the spiders... and I already have plans for the best way to do this. I'm awaiting feedback from Brian, but if you wish I can send you an example of the current code tonight and you can offer your comments on how to progress it. I do not want to release it until it works fully on the back end. I'm not bothered by cosmetic things at the moment (since that will be template driven and user adjustable), just that it all works a dream... if you wish to be a private beta tester and help me push it forward, then get in contact.
Cheers
David K
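The two remaining pieces described here, rendering each post through a template and building the index.htm files that tie the folders together, might look roughly like the sketch below. It uses Python's standard `string.Template`; the template text, the `redirect_url` parameter (the live forum URL for the post) and the use of `window.location.replace` for the onload redirect are illustrative assumptions, not the Cache Cannon's actual template. Post text is simply HTML-escaped here; converting vB code to HTML is out of scope.

```python
import html
import json
from string import Template
from typing import Iterable, Tuple

# Illustrative template; the real one would be user-adjustable.
POST_TEMPLATE = Template("""<html>
<head>
<title>$title</title>
<meta name="description" content="$description">
<script type="text/javascript">
  // Bounce human visitors to the live post; spiders read the static copy.
  window.onload = function () { window.location.replace($redirect_js); };
</script>
</head>
<body>
<h1>$title</h1>
<div>$body</div>
<p><a href="$redirect_url">View this post in the forum</a></p>
</body>
</html>
""")


def render_post(title: str, body: str, redirect_url: str) -> str:
    """Fill the template, escaping text for HTML and the URL for JavaScript."""
    return POST_TEMPLATE.substitute(
        title=html.escape(title),
        description=html.escape(body[:150]),
        body=html.escape(body),
        redirect_url=html.escape(redirect_url),
        redirect_js=json.dumps(redirect_url),
    )


def render_index(heading: str, entries: Iterable[Tuple[str, str]]) -> str:
    """index.htm listing (href, label) pairs, e.g. sub-folders or post files."""
    items = "\n".join(
        f'<li><a href="{html.escape(href)}">{html.escape(label)}</a></li>'
        for href, label in entries
    )
    return (f"<html><head><title>{html.escape(heading)}</title></head>\n"
            f"<body><h1>{html.escape(heading)}</h1>\n<ul>\n{items}\n</ul>\n"
            "</body></html>\n")
```

The plain `<a href>` link in each post page gives spiders (and visitors with scripting disabled) a crawlable path to the live thread, while the index pages give them a crawlable path down through the year/month/day folders.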
#200
|
|||
|
|||
![]()
It all sounds great! Can't wait to test it out.
If it's ready to test, you can email me at Brian@FutureQuest.net
-Brian
#201
|
|||
|
|||
![]()
I just wanted to follow up to see if you have a version available for us to download yet.
Thanks,
Brian