vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 2.x Full Releases (https://vborg.vbsupport.ru/forumdisplay.php?f=4)
-   -   vbArchive - Search Engine Indexer for vBulletin (https://vborg.vbsupport.ru/showthread.php?t=47667)

Domenico 02-17-2003 11:52 AM

The forums, subforums, and subforum headlines get spidered, but the actual posts do not.
I can now search Google for the headlines (subjects) and my page comes out on top, but when I search for sentences found in the posts Google returns nothing, because the posts aren't indexed.

Is it just me or is that normal?

TECK 02-19-2003 02:28 AM

It's "normal"... it will take some time but Google will get them all.

GT2002 02-20-2003 04:57 PM

Googlebot went nuts on my site for almost a week..... I wonder when they will update their index?

PennylessZ28 02-21-2003 01:51 PM

I installed, I submitted, and a week later, my site stats are even higher than before on the search engines. AWESOME

Overgrow 02-21-2003 03:37 PM

Sorry, I've got the Googlebot busy with a few million appointments at my site :D

Boofo 02-21-2003 05:49 PM

Overgrow, how do you get the web robot to show up for the Current active users (like on your site)?

TECK 02-21-2003 10:21 PM

Inphinity released an online hack; it is linked in the first post.

Boofo 02-21-2003 10:34 PM

Will it work with vBHome Lite?

TECK 02-21-2003 10:36 PM

Ya, it has nothing to do with vbHL, only with the forums.

Boofo 02-21-2003 11:45 PM

In the following instructions for the addon by inphinity, how would we call this function (and where do we call it from) in online.php?

Quote:

## function for user ip address checking
## matches full/part of an ip address
## might be useful for people who dont have a .htaccess file
## or those who want to identify bots who dont supply a valid or a cloaked
## useragent. probably should be called on return of 0 from useragentcheck
## in online.php
##
## i think it's unnecessary. also the ip address matching isn't great since php
## can't handle CIDR addresses, so either you break the ip address up and match
## values or you use ranges (as below), which will also identify ips outside
## the allocated range
## ie crawler918.com
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-12-148-209-192-1
## 12.148.209.192/26
## /26 is 64 ip addresses (62 usable hosts); identifying 12.148.209. means that
## you're blocking 254 ip addresses, which will exclude non-rogue ips.
## ip addresses have a tendency to change and would result in a fairly big list.

function useripaddresscheck( $match_addr, $addr_code )
{
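
On the CIDR point in the quoted comments: as far as I know core PHP has no ready-made CIDR test, but ip2long() plus a bitmask covers the common case. A minimal sketch, assuming IPv4 only and a 64-bit PHP build; cidr_match() is a hypothetical name and is not part of inphinity's hack:

```php
<?php
// Sketch only: cidr_match() is a hypothetical helper, not from the hack.
// Returns true when the IPv4 address $ip falls inside the CIDR block $cidr,
// e.g. '12.148.209.192/26'. Assumes a 64-bit PHP build.
function cidr_match($ip, $cidr)
{
    // Split "network/bits"; reject anything that is not in that shape.
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2 || !preg_match('/^\d+$/', $parts[1])) {
        return false;
    }

    $bits     = (int) $parts[1];
    $ip_long  = ip2long($ip);
    $net_long = ip2long($parts[0]);
    if ($ip_long === false || $net_long === false || $bits > 32) {
        return false;
    }

    // Netmask: the top $bits bits set, the remaining bits clear.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);

    // Same network part on both sides means the address is in the block.
    return ($ip_long & $mask) === ($net_long & $mask);
}
```

With a check like this, 12.148.209.192/26 matches only .192 through .255 rather than the whole 12.148.209. prefix.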

Schorsch 02-25-2003 10:49 AM

it works now ( click ) :) but I get an IE6 syntax error when I click on a link.

how can I fix it ?

thanks!

Schorsch

Stu 02-25-2003 04:13 PM

Thanks Teck,
works a treat, within 12 hours had lots of crawlers scanning the forums :D

All I've got to do now is work out the best way to alter {imagesfolder}; currently it's set to /forums/images/
and the vBbutton path comes out wrong: http://www.mysite.com/forums//forums.../vb_bullet.gif

The DIBB Archive

All Sorted..

julius 02-26-2003 01:51 PM

Are html pages zipped as vb pages are?

TECK 02-26-2003 02:40 PM

If you're asking whether they are compressed with zlib, the answer is YES, provided you have it enabled in your vBulletin options.
There is no difference between the actual thread and the .html page.

EvilLS1 02-27-2003 10:43 PM

Very nice & useful hack. Thank you TECK.

I installed it yesterday and now I'm just waiting on the spiders to visit. :)

Schorsch 02-27-2003 11:19 PM

Quote:

Originally posted by EvilLS1
I installed it yesterday and now I'm just waiting on the spiders to visit. :)

how long does it take?

EvilLS1 02-27-2003 11:34 PM

Quote:

Originally posted by Schorsch


how long does it take ?

To install the hack or for the spiders to visit?

To install the hack: 10 minutes or so.

For the spiders to visit: Depends on the search engine I think.

:)

NexDog 03-01-2003 12:25 PM

Teck,

Would seem you have outdone yourself yet again. :)

We have been working on some heavy SEO in the last 8 weeks and have started to get some great results out of Google. My SEO ace has always said that link popularity and CONTENT are the key points in getting good rankings, and as Google is due to dance today, I suddenly thought about this hack of yours. I think it will serve us proud. Our archive:

http://www.hostnexus.com/forum/archive/

I think I will link to it from our SiteMap (http://hostnexus.com/home/sitemap.htm). Do you think that would be the best way for the bots to get there? I mean, is it pointless linking to it from the forum home page?

TECK 03-01-2003 02:52 PM

As long as you do the first 3 steps related to forum optimizations, you are ok...
A text link will always help.

Schorsch 03-01-2003 03:05 PM

Quote:

Originally posted by EvilLS1
For the spiders to visit: Depends on the search engine I think.

what about google ?

Kars10 03-01-2003 03:20 PM

Quote:

Originally posted by Schorsch


what about google ?

...once a month! But you also have to enter that in the meta tags. ;)

NexDog 03-01-2003 11:34 PM

Quote:

Originally posted by TECK
As long as you do the first 3 steps related to forum optimizations, you are ok...
A text link will always help.

Teck,

I don't get it.

1) Sessionhash - None of the archive's files have session IDs anyway?

Example: http://www.hostnexus.com/forum/showthread/t-3208.html

2) Why do I need to block the crawler from certain pages? It can't get in anyway, as it will be "sessioned".

3) Why do we need to link the forum thread to the archive? Aren't we doing this because the bots can't spider our forums? I see the archive thread is linked to the forums though.

Lastly, how do I block the archive from displaying our "Admin and Mod" forums? Sorry for all the questions but could do with some clarification here as Google is due any minute now so everyone get ready for the deep crawl. :)

mheinemann 03-01-2003 11:46 PM

Quote:

Originally posted by NexDog

Lastly, how do I block the archive from displaying our "Admin and Mod" forums?

The archive uses forum permissions, so mods/admins will be able to see it, but normal users shouldn't. Try logging out and going to the archive to see if it still shows up for you.

NexDog 03-01-2003 11:58 PM

Heh, right you are too. What a nonse I am. :D

Okay, feel free to have a stab at my other questions. ;)

NexDog 03-02-2003 12:21 AM

Okay, added all these mods - makes sense actually. One small refinement was the addition of <smallfont> tags around the "Friendly URL" link.

Still worried about the sessionhash optimisation. Is it really needed as I don't see any sessionhash IDs in the archive....

Quote:

Originally posted by TECK
Here it is another interesting mod I made to my forums, to link back to archives.
This will help because the crawlers will see the friendly URL's easier.
Basically the mod links every forum/thread to the archive.
To test it, mouse over each forum or thread icon, while viewing my forums.
Also, check the link I added under the thread title, while viewing the actual (not archived) thread.

TEMPLATE: forumhome_forumbit_level1_post
FIND:
Code:

    <td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
Code:

    <td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>
TEMPLATE: forumhome_forumbit_level2_post
FIND:
Code:

    <td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
Code:

    <td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>
TEMPLATE: forumdisplaybit
FIND:
Code:

  <td bgcolor="{firstaltcolor}"><img src="{imagesfolder}/$thread[newoldhot].gif" border="0" alt=""></td>
REPLACE WITH:
Code:

  <td bgcolor="{firstaltcolor}"><a href="showthread/t-$thread[threadid].html"><img src="{imagesfolder}/$thread[newoldhot].gif" border="0" alt="Archive: $thread[title]"></a></td>
TEMPLATE: forumdisplay_forumbit_level1_post
FIND:
Code:

    <td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
Code:

    <td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>
TEMPLATE: forumdisplay_forumbit_level2_post
FIND:
Code:

    <td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
Code:

    <td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>
TEMPLATE: showthread
FIND:
Code:

$navbar
REPLACE WITH:
Code:

$navbar<br>
<img border="0" src="{imagesfolder}/firstnew.gif" width="14" height="14" align="middle" alt=""> <a href="showthread/t-$thread[threadid].html">Friendly URL Link</a>
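
The template edits above all use the same two URL patterns, forumdisplay/f-<forumid>.html and showthread/t-<threadid>.html. A minimal sketch of building them in PHP; archive_forum_url() and archive_thread_url() are hypothetical helper names and are not part of the hack:

```php
<?php
// Sketch only: hypothetical helpers that mirror the link patterns used in
// the template edits. They are not functions from vbArchive itself.

// Relative archive link for a forum, e.g. forumdisplay/f-12.html
function archive_forum_url($forumid)
{
    return 'forumdisplay/f-' . (int) $forumid . '.html';
}

// Relative archive link for a thread, e.g. showthread/t-3208.html
function archive_thread_url($threadid)
{
    return 'showthread/t-' . (int) $threadid . '.html';
}
```

Casting the id to an integer keeps anything other than a number out of the generated link.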



TECK 03-02-2003 12:32 AM

Quote:

Originally posted by NexDog
Still worried about the sessionhash optimisation. Is it really needed as I don't see any sessionhash IDs in the archive....

This is not for the archive files but for the actual forums; that's why it is part of the "forum" optimizations, not the "archive" ones.
The crawlers will also visit your forums, and they will not like the session hash.

NexDog 03-02-2003 02:41 AM

So there's no real need to remove the sessionhash? I mean, the bots are spidering the archive instead right?
BTW, inktomi is deep inside my archive right now. :)

Where is the online.php fix?

NexDog 03-02-2003 02:48 AM

Hmmm, something seems amiss. I have yet to find and implement the online.php hack, so Who's Online is showing me the full URL, and it is sessioned, even in the archive. Definitely concerned here, as I thought the archive was supposed to produce unsessioned links.

NexDog 03-02-2003 02:50 AM

When I browse the archive, I get no sessioned URLs - any ideas on why inktomi is picking them up?

NexDog 03-02-2003 03:01 AM

Okay, implemented the sessionhash fix. Still unsure why inktomi has sessionhash IDs though.....

Going to look for the online.php fix now. :p

NexDog 03-02-2003 03:09 AM

Sweet, online.php fixed. Talking to myself. :D

Going out for a bit. Teck, if you see this, still would like to know about the URL seen in screenie...

NexDog 03-02-2003 01:14 PM

OMG! Thirty googlebots just can't get enough of Nexology. :)

Teck, you are the man, without doubt. :D

Few things though. Have only seen them hit the Archive once or twice. They seem to be casing the forum due to the sessionhash being removed.

I tried the useragent hack (by inphinity) but that just returned:
Code:

Fatal error: Call to undefined function: no_sessionhash() in /home/httpd/vhosts/hostnexus.com/httpdocs/forum/global.php on line 296
Not sure if I was doing things wrong, but I removed your extra mod to functions.php:
Code:

function no_sessionhash()
{
  global $session;

  $agent = array(
    'crawl',
    'googlebot',
    'gulliver',
    'ia_archiver',
    'internetseer',
    'linkalarm',
    'mercator',
    'openbot',
    'pingalink',
    'psbot',
    'scooter',
    'slurp',
    'slysearch',
    'zeus',
    'zyborg',
    'otheruseragentcrawleryouwant'
  );

  foreach( $agent as $useragent )
  {
    if ( stristr( getenv( 'HTTP_USER_AGENT' ) , $useragent ) )
    {
      $session['sessionhash'] = '';
    }
  }
}

and replaced that with:
Code:

function useragentcheck( $match_agent, $agent_code )
{
  // user agent substring => 'info URL|||display name'
  // (the generic entries at the end carry no URL)
  $agent = array(
    'googlebot'    => 'www.google.com/|||Google',
    'gulliver'     => 'www.northernlight.com/|||Northern Light',
    'ia_archiver'  => 'www.archive.org/|||The Internet Archive',
    'internetseer' => 'www.internetseer.com/|||Internet Seer',
    'linkalarm'    => 'linkalarm.com/|||Link Alarm',
    'mercator'     => 'www.research.compaq.com/SRC/mercator/|||Mercator',
    'openbot'      => 'www.openfind.com.tw/|||Openbot',
    'pingalink'    => 'www.pingalink.com/|||PingALink Monitor',
    'psbot'        => 'www.picsearch.com/bot.html|||PicSearch',
    'scooter'      => 'www.altavista.com/|||AltaVista',
    'slurp'        => 'www.inktomi.com/slurp.html|||Inktomi',
    'turnitinbot'  => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
    'slysearch'    => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
    'zeus'         => 'www.waltbren.com/products/zeus_internet_robot.htm|||Zeus Internet Marketing',
    'zyborg'       => 'www.wisenutbot.com/|||WiseNut',
    'teoma'        => 'www.teoma.com/|||Teoma/Ask Jeeves',
    'spider'       => 'Web Spider',
    'spyder'       => 'Web Spyder',
    'crawl'        => 'Web Crawler',
    'robot'        => 'Web Robot'
  );

  foreach( $agent as $useragent => $agenturl )
  {
    // numeric key: the value itself is the user agent string to match
    if ( preg_match ("/^\d+$/", $useragent) )
    {
      $useragent = $agenturl;
      $agenturl  = "Search Engine";
    }

    if ( preg_match ("/". preg_quote ($useragent, "/") ."/i", $match_agent) )
    {
      $agentinfo = preg_split ("/\|\|\|/", $agenturl);
      // entries without a '|||' separator get a generic link and name;
      // no 'http://' here because case 2 below adds it
      if ( empty($agentinfo[1]) )
      {
        $agentinfo[0] = "www.robotstxt.org/wc/active.html";
        $agentinfo[1] = "Web Robot $useragent";
      }

      switch ($agent_code)
      {
        case 0:
          return 1;
        case 1:
          return $agentinfo[1];
        case 2:
          return '</a><a href="http://'. $agentinfo[0] .'" alt="'. $agentinfo[1] .'"><i>'. $agentinfo[1] .'</i>';
      }
    }
  }

  // no known crawler matched
  return 0;
}

Or can I just tack inphinity's hack onto the end of functions.php after your no_sessionhash() function?
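
The fatal error above is consistent with global.php still calling no_sessionhash() after that function was removed from functions.php. A minimal sketch of keeping both side by side, assuming PHP 5 or later; is_crawler_agent() is a hypothetical stand-in for a bot check such as useragentcheck( $agent, 0 ), and the optional argument is only there so the sketch can be tried outside the forum:

```php
<?php
// Sketch only: is_crawler_agent() is a hypothetical stand-in for a bot check
// such as useragentcheck( $agent, 0 ). The list here is shortened.
function is_crawler_agent($user_agent)
{
    $needles = array('googlebot', 'slurp', 'ia_archiver', 'crawl', 'spider');
    foreach ($needles as $needle) {
        // Case-insensitive substring match against the user agent string.
        if (stripos($user_agent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Keeps the name that global.php calls, so the existing call still resolves.
// The optional argument is an assumption added to make the sketch testable.
function no_sessionhash($user_agent = null)
{
    global $session;

    if ($user_agent === null) {
        $user_agent = (string) getenv('HTTP_USER_AGENT');
    }

    // Crawlers get an empty session hash, so generated links carry no hash.
    if (is_crawler_agent($user_agent)) {
        $session['sessionhash'] = '';
    }
}
```

The design point is simply that the function name called from global.php has to keep existing; what it uses internally to recognise a crawler can be swapped.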

erdem 03-02-2003 02:32 PM

hi ...
first of all, great hack, thanks for it TECK ...

I have a visual problem in online.php. I applied the fix that you released and applied the first 3 optimizations ...

but when someone is browsing my archive, online.php displays something like:
-> Unknown Location: /showthread/images/catbg.gif?
-> Unknown Location: /forumdisplay/images/catbg.gif?

Any idea? Can I fix this?

thanks

NexDog 03-02-2003 02:35 PM

Teck posted that in his first post:

https://vborg.vbsupport.ru/showthrea...218#post342218

TECK 03-02-2003 02:42 PM

I personally blocked the forums for crawlers and renamed archive.txt to archive.html, because I don't want the crawlers to go to the forums at all.
http://www.teckwizards.com/archive.html

NexDog 03-02-2003 02:49 PM

Why would you want to keep the bots out of your forum? Am I missing something?

NexDog 03-02-2003 03:00 PM

Google is having too much fun on our forum. Have 43 bots in. 16 Inktomi and 27 Google. :D

I also see a pattern. Inktomi was the first to arrive and they hit the Archive immediately. But those first 3 bots just sat on the Archive index and then went to the forums only and disappeared - they didn't crawl the threads.

Then Google came in and did the same thing on the forums. One bot came in and sat on index for an hour. It came back later with about 10 buddies and just sat on all the forums and disappeared. Now they are back in force and going crazy on all threads and one or two made it into the Archive and are just sitting on the forums but not going deep.

Pretty sure Google will be back on the Archive, just like Inktomi is right now.

TECK 03-02-2003 03:16 PM

Quote:

Originally posted by NexDog
Why would you want to keep the bots out your forum? Am I missing something?

Why would I want them to go to the forums? Those URLs are dropped anyway, because they are not friendly.
I want them to go to the archive all the time, because it has the same content as the forums.

NexDog 03-02-2003 03:18 PM

Ahhhhhhhhh, okay. But you allow them to index.php right?

TECK 03-02-2003 03:19 PM

No, I blocked the /forum folder completely, because my archive is in the main root:
[root] < archive.html
---[forum]

So in your case you want to rename the file to archive.html and block all the other files in the /forum folder...
If you block the whole forum folder, the crawlers will not have access to the archives anymore.

The nicest way to have your threads indexed properly is to install the vbHome (lite) script and its archive add-on; that way the indexing is done directly from your root. That's better, since it's proven that the crawlers will index your threads faster if they are closer to the root...


All times are GMT. The time now is 06:47 PM.
