Google sitemap for the vB Archives. Redirect human and robots. (https://vborg.vbsupport.ru/showthread.php?t=93980)

lierduh 10-07-2005 11:58 PM

This is something I had in mind to implement. :) So the next version will certainly contain this feature.

I think I should be able to push a new version out this weekend, including better documentation for step 3. I have been waiting for vB Gold.

Quote:

Originally Posted by buro9
I have a problem though... you're making a sitemap gz for each forum, well, some of my forums are big:


Could you add spanning?

So we'd start with:
http://www.bowlie.com/forum/archive/sitemap_4_1.gz

And when we passed an arbitrary value (make it a setting in the file in case Google changes it later) we would move onto:
http://www.bowlie.com/forum/archive/sitemap_4_2.gz
http://www.bowlie.com/forum/archive/sitemap_4_3.gz
through
http://www.bowlie.com/forum/archive/...p_4_9999999.gz
etc

As it stands, Google is now refusing to pay attention to mine, as the one file that exceeds the limit basically causes the whole thing to error.
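
The spanning idea in the quote above could be sketched roughly like this. It is only an illustration, not the mod's actual code: the function name `span_sitemap_files` and the constant `SITEMAP_MAX_URLS` are made up for the example, the URLs use a placeholder domain, and the default of 50,000 is the per-file URL cap documented for sitemaps, which may or may not be the limit that was hit here.

```php
<?php
// Hypothetical setting: maximum number of URLs per sitemap file.
// Kept as a constant so it can be changed if the limit changes later.
const SITEMAP_MAX_URLS = 50000;

/**
 * Split one forum's URL list into numbered sitemap files.
 * Returns an array mapping file name => list of URLs for that file.
 */
function span_sitemap_files(int $forumId, array $urls, int $maxUrls = SITEMAP_MAX_URLS): array
{
    $files = [];
    // array_chunk() yields consecutive slices of at most $maxUrls entries.
    foreach (array_chunk($urls, max(1, $maxUrls)) as $i => $chunk) {
        // File numbering starts at 1: sitemap_4_1.gz, sitemap_4_2.gz, ...
        $files[sprintf('sitemap_%d_%d.gz', $forumId, $i + 1)] = $chunk;
    }
    return $files;
}

// Usage: 5 URLs with a limit of 2 are spread over three files.
$urls = [];
for ($t = 1; $t <= 5; $t++) {
    $urls[] = "http://www.example.com/forum/archive/index.php/t-{$t}.html";
}
foreach (span_sitemap_files(4, $urls, 2) as $name => $chunk) {
    echo $name, ': ', count($chunk), " URLs\n";
}
```

Each chunk would still have to be written out and gzipped, and the sitemap index would need to list every generated file; that part is left out here.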


Unreal Player 10-08-2005 01:23 AM

Is it normal for my site to still be PENDING after 6 hours at Google? And how does my site know what account I'm using to resubmit it automatically?

dutchbb 10-08-2005 06:06 PM

Quote:

Originally Posted by lierduh
This is something I had in mind to implement. :) So the next version will certainly contain this feature.

I think I should be able to push a new version out this weekend, including better documentation for step 3. I have been waiting for vB Gold.

Hi,

Does the Google spider still only look at the normal threads in Who's Online?

Is the Yahoo! Slurp spider the only one that looks at the archives?

falter 10-08-2005 11:06 PM

Hi there,
I'm very happy with the archive redirection. That's pretty slick stuff, and it seems to be working great. The sitemap submission to Google hasn't really taken effect quite yet, but it's only been 36 hours since submission (I imagine that these things can take some time). Yahoo is going bonkers on us, though!

Anyway, I've submitted a bug/feature request to vbulletin as a result of installing this mod. You can see it here:
http://www.vbulletin.com/forum/bugs3...iew&bugid=1576

Specifically, it has to do with the way in which $show['search_engine'] is defined, which seems important as it plays quite an important role in this particular mod.

Looking at the definition of $show['search_engine'] seemed important as I, like others, have noticed that sometimes googlebot doesn't want to get redirected from showthread to the archives.

(as seen in /includes/init.php)
Code:

$show['search_engine'] = ($vbulletin->superglobal_size['_COOKIE'] == 0 AND preg_match("#(google|msnbot|yahoo! slurp)#si", $_SERVER['HTTP_USER_AGENT']));
As you can see, vBulletin assumes that no search engine spider will ever use a cookie. I found the redirection to be more effective after removing the check for the absence of a cookie, which resulted in this:
Code:

$show['search_engine'] = (true AND preg_match("#(google|msnbot|yahoo! slurp)#si", $_SERVER['HTTP_USER_AGENT']));
Now, as you can see in my bug report, I'm not terribly satisfied with the way $show['search_engine'] is defined in the first place, but making the mod as seen above helped me out, some.
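
To make the difference between the two definitions easier to see, here is a small standalone sketch. It is not vBulletin code: the function `is_search_engine` and its parameters are hypothetical, and the cookie count stands in for `$vbulletin->superglobal_size['_COOKIE']`. Only the regular expression is taken from the lines above.

```php
<?php
/**
 * Return true when the User-Agent looks like one of the known spiders.
 * With $requireNoCookies = true this mirrors the stock check (a spider is
 * assumed to send no cookies); with false it mirrors the relaxed version.
 */
function is_search_engine(string $userAgent, int $cookieCount, bool $requireNoCookies = true): bool
{
    // Same pattern as in init.php: case-insensitive match on the UA string.
    $isSpider = preg_match('#(google|msnbot|yahoo! slurp)#si', $userAgent) === 1;

    if ($requireNoCookies && $cookieCount > 0) {
        // Stock behaviour: any cookie at all means "not a spider".
        return false;
    }
    return $isSpider;
}

// Usage: a Googlebot request that happens to carry one cookie.
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(is_search_engine($ua, 1));        // stock check
var_dump(is_search_engine($ua, 1, false)); // relaxed check
```

With the stock check, the request with a cookie is not treated as a spider, so it would not be redirected; the relaxed check decides on the User-Agent alone.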

Hope this helps some of you guys...

~mike.

falter 10-08-2005 11:12 PM

Quote:

Originally Posted by Triple_T
Hi,

Does the Google spider still only look at the normal threads in Who's Online?

Is the Yahoo! Slurp spider the only one that looks at the archives?

Triple_T,
Just for clarity's sake, I was having the same problem you are having. Try my mod (in the post above this one), and see if that helps.

~mike

dutchbb 10-09-2005 02:00 AM

Quote:

Originally Posted by falter
Triple_T,
Just for clarity's sake, I was having the same problem you are having. Try my mod (in the post above this one), and see if that helps.

~mike

I looked right after, and one Googlebot was in the archives. After that it was still in the threads as well.

I noticed Google is much less effective in comparison:
- only 1 spider most of the time (Yahoo has 10 or more)
- Yahoo is now always in the archives, Googlebot almost never
- Googlebot still goes to pages like printthread and member.php, even with a robots.txt disallowing that.

The MSN bot has not gone further than index.php, so it looks like Yahoo is just a better bot?

Now I have 2 questions regarding robots.txt:

- I have one in both the site root and the vBulletin root. Is this needed? If not, what is the correct place? (From what I have read, it should be the site root.)

- Is the .php extension needed for disallowing files? Some say it's best not to include it; I have not seen a difference so far.

jdingman 10-09-2005 03:34 AM

Looks great so far. One question about mod_rewrite

using
Quote:

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index\.php$ / [R=301,L]
That redirects if you're using forums.domain.com. What about if you're using domain.com/forums/? What mod_rewrite rule would you use for that redirect?

(Not exactly for me, because I can probably get it working, but for anyone else who might need this as well.)

falter 10-09-2005 03:51 AM

Quote:

Originally Posted by Triple_T
Now I have 2 questions regarding robots.txt:

- I have one in both the site root and the vBulletin root. Is this needed? If not, what is the correct place? (From what I have read, it should be the site root.)

- Is the .php extension needed for disallowing files? Some say it's best not to include it; I have not seen a difference so far.

Your robots.txt should be accessible at the root of your domain (http://www.mydomain.com/robots.txt). This is the only place that spiders know to check.

If you're trying to explicitly define specific files (e.g. /forums/showthread.php), then you should define that entry in your robots.txt file. There's no point in leaving off the ".php" at the end (e.g. /forums/showthread); it doesn't buy you anything. It can actually have a negative impact if your entries aren't defined well. Say you're trying to tell search engines to ignore "/forum/s.php" (this is just hypothetical). If you were to just put "/forum/s" in your robots.txt, then, in addition to blocking "/forum/s.php", you'd be blocking "/forum/showthread.php", "/forum/search.php", "/forum/showgroups.php", and anything else where the URL starts with "/forum/s". As you can see, it's important to be as specific as possible; otherwise you risk shutting spiders out of huge chunks of your site.
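
The prefix behaviour described above can be tried out with a few lines of PHP. This is a simplified sketch: the function name `is_disallowed` is made up for the example, and it only does plain prefix matching on Disallow values, ignoring wildcards, Allow lines, and per-agent sections.

```php
<?php
/**
 * Return true when $path is blocked by any of the Disallow prefixes.
 * Simplified model: a Disallow value blocks every path that starts with it.
 */
function is_disallowed(string $path, array $disallowPrefixes): bool
{
    foreach ($disallowPrefixes as $prefix) {
        // An empty Disallow value blocks nothing.
        if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Usage: compare the overly short entry with the specific one.
$paths = ['/forum/s.php', '/forum/showthread.php', '/forum/search.php', '/forum/index.php'];
foreach (['/forum/s', '/forum/s.php'] as $rule) {
    echo "Disallow: $rule\n";
    foreach ($paths as $path) {
        echo '  ', $path, ' => ', is_disallowed($path, [$rule]) ? 'blocked' : 'allowed', "\n";
    }
}
```

With "/forum/s", everything except /forum/index.php comes back blocked; with "/forum/s.php", only that one file does.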

falter 10-09-2005 03:59 AM

Quote:

Originally Posted by Triple_T
I looked right after, and one Googlebot was in the archives. After that it was still in the threads as well.

I noticed Google is much less effective in comparison:
- only 1 spider most of the time (Yahoo has 10 or more)
- Yahoo is now always in the archives, Googlebot almost never
- Googlebot still goes to pages like printthread and member.php, even with a robots.txt disallowing that.

I've thought about it some more.
A 301 code just tells the bot that the link has permanently moved. It would take a second request from the spider to actually jump to the archives. If the spider is slow (as Googlebot and msnbot typically are), I can see how it would appear as though Googlebot was sitting in showthread instead of being directed to the archive...
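
As a rough illustration of why two requests are involved: the response to the first request carries nothing but a status line and a Location header. The sketch below is an assumption-laden example, not the mod's code. The function name `archive_redirect_headers` is hypothetical, the domain is a placeholder, and the archive URL pattern (`archive/index.php/t-<id>.html`) is assumed rather than taken from the mod.

```php
<?php
/**
 * Build the response headers for a permanent redirect from a thread page
 * to its archive page. The archive URL pattern is an assumption.
 */
function archive_redirect_headers(string $forumBase, int $threadId): array
{
    $target = rtrim($forumBase, '/') . "/archive/index.php/t-{$threadId}.html";
    return ['HTTP/1.1 301 Moved Permanently', "Location: {$target}"];
}

// First request: the spider asks for showthread.php and receives only these
// headers. The archive page itself is fetched in a separate, later request.
foreach (archive_redirect_headers('http://www.example.com/forum/', 93980) as $line) {
    echo $line, "\n";
}
```

Until the spider makes that second request, Who's Online would presumably still show it at showthread.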

lierduh 10-09-2005 05:42 AM

I have a new version ready to be released. If anyone wants, you can download this and try it out before I put together the package.

I still need to write the documentation for the modifications to the index.php and global.php files.

