By Overgrow (developer, last online: Jun 2004)

Version: 2.0.x
Released: 04-29-2001 Last Update: Never Installs: 53
 
No support by the author.

I am tired of my 200,000 posts not being listed in Google. I was inspired by phpbuilder.com this morning and I wrote:

vbSpiderFriend - the search engine indexer for all of your posts

Purpose: Allow search engine spiders to crawl a linked list of all of your posts.

Project Requirements:

-Friendly URLs (no query strings)
-Good dynamic meta tags
-Never have to touch the script again. It is Y3K compliant; simply re-submit to the engines to update your listings

Install Requirements:

-vBulletin 1.x or 2.x
-about 10 minutes


1) Download the attached Zip.

2) Open class.mysql.php and put your database login info at the top.

3) Create a new directory called archive under your forum, like /forum/archive

4) Open the included .htaccess and change the Error 404 to your new archive path.

5) Open index.php and change the self-explanatory variables at the top of the file.

6) Upload all 3 files to your archive directory.

7) Submit /forum/archive/index.php to search engines and watch em crawl


DISCLAIMER: I don't use 2.x but I checked the schema and this should work fine.

NOTES: This uses ErrorDocument and query string parsing to get the variables needed. I do not have the time or energy to troubleshoot this if it does not work on your server. Sorry!
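
As a rough illustration of the ErrorDocument approach: Apache hands every request for a missing path under the archive directory to the script (through a directive along the lines of `ErrorDocument 404 /forum/archive/index.php`), and the script recovers its variables from the requested path instead of from a query string. The sketch below is in Python rather than the hack's PHP, and the URL layout (`/forum/archive/f-<forumid>/t-<threadid>.html`) and the name `parse_friendly_url` are assumptions made up for the example, not the format vbSpiderFriend actually uses.

```python
import re

# Hypothetical URL layout: /forum/archive/f-<forumid>/t-<threadid>.html
# (the real vbSpiderFriend layout is not shown in this thread).
FRIENDLY_URL = re.compile(r"^/forum/archive(?:/f-(\d+))?(?:/t-(\d+)\.html)?/?$")


def parse_friendly_url(path):
    """Return the IDs encoded in a requested path, or None if it does not match."""
    match = FRIENDLY_URL.match(path)
    if match is None:
        return None
    forum_id, thread_id = match.groups()
    return {
        "forumid": int(forum_id) if forum_id else None,
        "threadid": int(thread_id) if thread_id else None,
    }


print(parse_friendly_url("/forum/archive/f-3/t-120.html"))
print(parse_friendly_url("/forum/archive/"))
print(parse_friendly_url("/forum/other/page.html"))
```

A path that does not match returns `None`, which is where a real handler would send a genuine 404 page.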

  • This modification may not be copied, reproduced or published elsewhere without author's permission.

Comments
  #192
02-21-2002, 12:05 AM
Brian
Join Date: Nov 2001
Posts: 35

Hello,

I am writing to see if someone would be interested in taking this hack one step further to allow for the creation of HTML pages for search engines to index. This would prevent the problem of search engines overloading a server by trying to index the dynamic content too fast.

What I propose is a script that makes pages identical to what we have with this script, except that it actually writes out each dynamic page as a static HTML file and puts the files into a similar folder structure.

The script would need to be able to cycle through all of the posts in stages initially, so it doesn't cause problems by doing them all at once, and it should then be run via a cron job or manually every so often to archive new posts, or re-archive posts edited since the last run.
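
The incremental run described here could work roughly as follows: remember when the archiver last ran, then pick only the posts created or edited since then. This is a minimal Python sketch; the field names `dateline` and `edited_at`, the function name `posts_to_archive`, and the in-memory post list are assumptions for illustration, standing in for whatever the forum database actually stores.

```python
from datetime import datetime


def posts_to_archive(posts, last_run):
    """Select posts that are new, or were edited, since the previous run."""
    selected = []
    for post in posts:
        edited_at = post.get("edited_at")  # None if the post was never edited
        is_new = post["dateline"] > last_run
        is_edited = edited_at is not None and edited_at > last_run
        if is_new or is_edited:
            selected.append(post)
    return selected


last_run = datetime(2002, 2, 20)
posts = [
    {"postid": 1, "dateline": datetime(2002, 2, 1), "edited_at": None},
    {"postid": 2, "dateline": datetime(2002, 2, 1), "edited_at": datetime(2002, 2, 21)},
    {"postid": 3, "dateline": datetime(2002, 2, 21), "edited_at": None},
]
print([p["postid"] for p in posts_to_archive(posts, last_run)])  # [2, 3]
```

A cron job would call this once per run and store the new run time only after all selected posts were written successfully.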

I feel this would meet a lot of the needs of sites on shared servers, and if need be would be willing to pay for this to be done.

Let me know if anyone is interested.

-Brian

  #193
02-21-2002, 01:15 AM
JJR512
Join Date: Oct 2001
Location: Glen Burnie, MD, USA
Posts: 710

Quote:
Originally posted by eva2000
had to remove this hack as it allowed people to snoop into private forums by entering forum ID numbers which were not displayed (invisible) on the page listings

This doesn't happen for me. I just logged out of my board and tried putting in some private forum ID numbers. In all cases, even though the URL in the address bar showed the private forumid, the page actually went to forumid 1. Regardless of where I started, if I put in a private forumid number, it went to the first forum.
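
The behaviour described here is what a simple whitelist check with a fallback would produce. The Python sketch below is only a guess at that kind of logic, not the hack's actual code; `resolve_forum_id`, the set of public forum IDs, and the default of forum 1 are all assumptions taken from this description.

```python
def resolve_forum_id(requested_id, public_forum_ids, default_id=1):
    """Return the requested forum ID only if it is publicly viewable."""
    if requested_id in public_forum_ids:
        return requested_id
    # Private or unknown IDs never reach the database query.
    return default_id


public_forum_ids = {1, 2, 5}
print(resolve_forum_id(2, public_forum_ids))  # 2
print(resolve_forum_id(7, public_forum_ids))  # 1
```

If a board behaves differently, as eva2000 reported, the check may be missing or the list of public forums may be built differently on that installation.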

  #194
02-22-2002, 07:06 AM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

Quote:
Originally posted by Brian
Hello,

I am writing to see if someone would be interested in taking this hack one step further to allow for the creation of HTML pages for search engines to index. This would prevent the problem of search engines overloading a server by trying to index the dynamic content too fast.

What I propose is a script that makes pages identical to what we have with this script, except that it actually writes out each dynamic page as a static HTML file and puts the files into a similar folder structure.

The script would need to be able to cycle through all of the posts in stages initially, so it doesn't cause problems by doing them all at once, and it should then be run via a cron job or manually every so often to archive new posts, or re-archive posts edited since the last run.

I feel this would meet a lot of the needs of sites on shared servers, and if need be would be willing to pay for this to be done.

Let me know if anyone is interested.

-Brian

I already have what we've called a Cache Cannon on one of our sites.
All it does is whip through the database, and for all search query results for a particular query it will cannon hundreds of small files onto the docroot.

Now, we use this to pre-generate content on our site, thus massively reducing the database hits for the dynamic content (very important for us, we get over 100,000 unique users per hour on our top content sites).

Once a day it is fired and the site is made fresh. News is fired hourly or manually when needed.

In our application it's good for security too, since the database resides on a different machine and no access is needed by the webserver (the Cache Cannon resides on an interim machine and simply copies the files to the web server).

...anyhow, yesterday when I saw this thread I realised that its main flaws are that it is too slow in generating the content for the spiders, that the spiders would prefer static HTML so they can trawl faster, and that the pages are not optimised enough for a high ranking on the search engines. Also, you probably get hit by several spiders a day (take a peek at your logs for requests for /robots.txt for an indication), and the work to pre-generate is probably less than the work to serve it all each time.

Thus I will probably be making another implementation of this hack, 100% new but based upon our existing Cache Cannon theory.

It will create a single html file for each post, and you could fire it for given date ranges (reducing server load) or forums, at given time intervals (manually, say weekly) or via a cron job.

I shall also include client side javascript in these files to redirect a user to the proper version of the post in the appropriate forum onload. This should be googlebot safe as I believe it ignores client side script, but will ensure that when a user comes from a search engine, they are simply bounced to the correct entry in the real forum.
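
A static page of this kind might be rendered along the following lines. This is a hedged Python sketch rather than the planned implementation: the template, the function name `render_post_page`, the post fields, and the `showthread.php?postid=...` link pattern are assumptions for illustration. Whether a given crawler ignores the script is the poster's belief and is not verified here.

```python
from html import escape

PAGE_TEMPLATE = """<html>
<head>
<title>{title}</title>
<script type="text/javascript">
// Browsers follow this redirect; a crawler that ignores scripts reads the text below.
window.onload = function () {{ window.location.replace("{target}"); }};
</script>
</head>
<body>
<h1>{title}</h1>
<p>{body}</p>
</body>
</html>
"""


def render_post_page(post, forum_url):
    """Render one static page for a post, with a client-side redirect to the live post."""
    # Assumed URL pattern for the live post; adjust to the board's real links.
    target = "{0}/showthread.php?postid={1}#post{1}".format(forum_url, post["postid"])
    return PAGE_TEMPLATE.format(
        title=escape(post["title"]),
        body=escape(post["text"]),
        target=target,
    )


page = render_post_page(
    {"postid": 42, "title": "A & B", "text": "<b>hi</b>"},
    "http://example.com/forum",
)
print(page)
```

The post title and text are HTML-escaped so that archived content cannot inject markup into the static page.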

eva2000, I shall endeavour to make sure that this does not generate files for private forums. This will be perfect for you, since entering private forum IDs would not be possible: the files are static. It should be noted, though, that because this generates static files, should you later turn a public forum private you would have to delete those files manually, hence the $forumId in the proposed folder structure.

Proposed storage:

The folder structure...

$forumpath/archive/$forumId/$year/$month/$day

For the file names...

$postId.htm
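
The proposed layout can be sketched as a small path-building helper. This Python version is an illustration only: the function name `archive_path`, the zero-padded month and day, and the assumption that a post's date is available as a Unix timestamp interpreted in GMT are not confirmed anywhere in this thread.

```python
import posixpath
from datetime import datetime, timezone


def archive_path(forum_path, forum_id, post_id, dateline):
    """Build $forumpath/archive/$forumId/$year/$month/$day/$postId.htm."""
    # dateline is assumed to be a Unix timestamp, read as GMT.
    posted = datetime.fromtimestamp(dateline, tz=timezone.utc)
    return posixpath.join(
        forum_path,
        "archive",
        str(forum_id),
        "{:04d}".format(posted.year),
        "{:02d}".format(posted.month),
        "{:02d}".format(posted.day),
        "{}.htm".format(post_id),
    )


# 1014361560 is 02-22-2002, 07:06 AM GMT.
print(archive_path("/var/www/forum", 7, 12345, 1014361560))
# /var/www/forum/archive/7/2002/02/22/12345.htm
```

Keeping the forum ID as the first folder level is what makes it possible to delete one forum's files in a single step if that forum later becomes private.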

I shall start this on Tuesday next week, and hope to have it finished by Saturday next week (I'd do it sooner, but it's my birthday and this isn't that important!).

The files will be standalone and I shall develop them with vB v2.2.2, though as I shall only be accessing user, post, thread and forum (I guess... I'll have to look at the schema) this should be backwards compatible with at least 2.x boards. I will only be supporting the latest version at any time, though.

If I run into trouble or need assistance with the schema I shall let you know.

Cheers

David K

http://www.buro9.com/forum/

  #195
02-27-2002, 01:03 AM
Brian
Join Date: Nov 2001
Posts: 35

I just wanted to touch base to see if you had yet worked on this.

-Brian

  #196
02-27-2002, 09:18 AM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

Started to put the basics in place last night whilst building another PC.

Got reasonably far with the function that will dump files on the docroot. Just subject to load testing for that.

Also built the query that will extract all posts for dumping... have been working on this to make sure it excludes private forums... need to install foxserv at home to test this on a test forum.

I think I shall have the back end fully done over the evenings this week. The front end will be the thing that actually takes time, but I'm hoping that's just gonna be Saturday and not need more work. Problems stem from wanting to decrease server load by breaking the generation into manageable amounts (monthly or by forum).

I'll let you know when I have something substantial, and then we can start a new thread here for discussion whilst it's improved.

I think it's realistic to say it should be ready as a Beta over this weekend, and that a Release version should follow next week once everyone is sure that it does what they want it to (though I'll not be including code to toast bread).

Cheers

David K

  #197
02-27-2002, 12:43 PM
Brian
Join Date: Nov 2001
Posts: 35

Wow, can't wait!

  #198
03-05-2002, 12:30 PM
rawnet
Join Date: Oct 2001
Location: London, UK
Posts: 69

How did this go, Buro? I'm looking for a solution like this which also works on Win2k (without .htaccess, etc). I also searched for Cache Cannon but couldn't find it.

  #199
03-05-2002, 02:02 PM
buro9
Join Date: Feb 2002
Location: London, UK
Posts: 585

I've e-mailed Brian offline about this, but in essence it's built.

What I have thus far is an adequate interface offering caching for:

All Forums
Specific Forums
Specific Forums + Sub Forums
Within the past x days.
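
The "Specific Forums + Sub Forums" option implies expanding a selection to every descendant forum. One way to do that is sketched below in Python, assuming each forum records the ID of its parent; the `parent_of` mapping and the function name `forums_with_subforums` are hypothetical and do not describe how the Cache Cannon does it.

```python
def forums_with_subforums(selected_ids, parent_of):
    """Expand selected forum IDs to include all of their descendants."""
    result = set(selected_ids)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for forum_id, parent_id in parent_of.items():
            if parent_id in result and forum_id not in result:
                result.add(forum_id)
                changed = True
    return result


# forum ID -> parent forum ID (None for top-level forums)
parent_of = {1: None, 2: 1, 3: 2, 4: None, 5: 4}
print(sorted(forums_with_subforums([1], parent_of)))  # [1, 2, 3]
```

Private forums would still need to be removed from the expanded set before any files are generated.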

The Cache Cannon then will fire for all applicable posts, and uses a template to render the display in the html files.

The only missing thing is the final parsing through all resultant folders and files, constructing the index.htm files that will tie it all together for the spiders... and I have plans on the best way to do this already.
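
That final pass could look something like the sketch below: walk the generated folders and write an index.htm into each one that links to its subfolders and post files, so a spider starting at the top can reach every post. This is a Python illustration under assumed names (`write_indexes`, a bare-bones HTML body), not the approach the author had planned.

```python
import os


def write_indexes(root):
    """Write an index.htm into every folder under root; return the number written."""
    count = 0
    for folder, subfolders, files in os.walk(root):
        # Link each subfolder's own index, then each post file in this folder.
        links = ['<a href="{0}/index.htm">{0}/</a>'.format(name) for name in sorted(subfolders)]
        links += [
            '<a href="{0}">{0}</a>'.format(name)
            for name in sorted(files)
            if name.endswith(".htm") and name != "index.htm"
        ]
        body = "<br>\n".join(links)
        with open(os.path.join(folder, "index.htm"), "w", encoding="utf-8") as handle:
            handle.write("<html><body>\n{}\n</body></html>\n".format(body))
        count += 1
    return count
```

Calling `write_indexes` on the archive folder after each generation run rebuilds every index from what is on disk, so removed files drop out of the listings automatically.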

Awaiting feedback from Brian, but if you wish I can send you an example of the current code tonight and you can offer your comments on how to progress it.

I do not want to release it until it works fully on the back end. I'm not bothered by cosmetic things at the moment (since that will be template driven and user adjustable), just that it all works like a dream... if you wish to be a private beta tester and help me push it forward, then get in contact.

Cheers

David K

  #200
03-05-2002, 02:22 PM
Brian
Join Date: Nov 2001
Posts: 35

It all sounds great! Can't wait to test it out.

If it's ready to test, you can email me at Brian@FutureQuest.net

-Brian

  #201
03-09-2002, 12:53 AM
Brian
Join Date: Nov 2001
Posts: 35

I just wanted to follow up to see if you have a version available for us to download yet.

Thanks,
Brian
All times are GMT.

