vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 2.x Full Releases (https://vborg.vbsupport.ru/forumdisplay.php?f=4)
-   -   vbArchive - Search Engine Indexer for vBulletin (https://vborg.vbsupport.ru/showthread.php?t=47667)

NexDog 03-02-2003 03:09 AM

Sweet, online.php fixed. Talking to myself. :D

Going out for a bit. Teck, if you see this, still would like to know about the URL seen in screenie...

NexDog 03-02-2003 01:14 PM

OMG! Thirty googlebots just can't get enough of Nexology. :)

Teck, you are the man, without doubt. :D

Few things though. Have only seen them hit the Archive once or twice. They seem to be casing the forum due to the sessionhash being removed.

I tried the useragent hack (by inphinity) but that just returned:
Code:

Fatal error: Call to undefined function: no_sessionhash() in /home/httpd/vhosts/hostnexus.com/httpdocs/forum/global.php on line 296
Not sure if I was doing things wrong, but I removed your extra mod to functions.php:
Code:

function no_sessionhash()
{
  global $session;

  $agent = array(
    'crawl',
    'googlebot',
    'gulliver',
    'ia_archiver',
    'internetseer',
    'linkalarm',
    'mercator',
    'openbot',
    'pingalink',
    'psbot',
    'scooter',
    'slurp',
    'slysearch',
    'zeus',
    'zyborg',
    'otheruseragentcrawleryouwant'
  );

  foreach( $agent as $useragent )
  {
    if ( stristr( getenv( 'HTTP_USER_AGENT' ) , $useragent ) )
    {
      $session['sessionhash'] = '';
    }
  }
}

and replaced that with:
Code:

function useragentcheck( $match_agent, $agent_code )
{
  // Map of user-agent token => 'info URL|||display name'.
  // The generic tokens at the end carry a display name only.
  $agent = array(
    'googlebot'    => 'www.google.com/|||Google',
    'gulliver'     => 'www.northernlight.com/|||Northern Light',
    'ia_archiver'  => 'www.archive.org/|||The Internet Archive',
    'internetseer' => 'www.internetseer.com/|||Internet Seer',
    'linkalarm'    => 'linkalarm.com/|||Link Alarm',
    'mercator'     => 'www.research.compaq.com/SRC/mercator/|||Mercator',
    'openbot'      => 'www.openfind.com.tw/|||Openbot',
    'pingalink'    => 'www.pingalink.com/|||PingALink Monitor',
    'psbot'        => 'www.picsearch.com/bot.html|||PicSearch',
    'scooter'      => 'www.altavista.com/|||AltaVista',
    'slurp'        => 'www.inktomi.com/slurp.html|||Inktomi',
    'turnitinbot'  => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
    'slysearch'    => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
    'zeus'         => 'www.waltbren.com/products/zeus_internet_robot.htm|||Zeus Internet Marketing',
    'zyborg'       => 'www.wisenutbot.com/|||WiseNut',
    'teoma'        => 'www.teoma.com/|||Teoma/Ask Jeeves',
    'spider'       => 'Web Spider',
    'spyder'       => 'Web Spyder',
    'crawl'        => 'Web Crawler',
    'robot'        => 'Web Robot'
  );

  foreach( $agent as $useragent => $agenturl )
  {
    // An entry added without a key gets a numeric index: treat its value as the token.
    if ( preg_match( "/^\d+$/", $useragent ) )
    {
      $useragent = $agenturl;
      $agenturl  = "Search Engine";
    }

    // Pass the delimiter to preg_quote() so a '/' in a token cannot break the pattern.
    if ( preg_match( "/" . preg_quote( $useragent, "/" ) . "/i", $match_agent ) )
    {
      $agentinfo = preg_split( "/\|\|\|/", $agenturl );

      // No 'URL|||name' pair: fall back to the robotstxt.org list.
      // The "http://" prefix is left off here because case 2 adds it.
      if ( empty( $agentinfo[1] ) )
      {
        $agentinfo[0] = "www.robotstxt.org/wc/active.html";
        $agentinfo[1] = "Web Robot $useragent";
      }

      switch ( $agent_code )
      {
        case 0: // match flag only
          return 1;
        case 1: // display name
          return $agentinfo[1];
        case 2: // HTML link
          return '</a><a href="http://'. $agentinfo[0] .'" alt="'. $agentinfo[1] .'"><i>'. $agentinfo[1] .'</i>';
      }
    }
  }
}

Or can I just tack inphinity's hack onto the end of functions.php after your no_sessionhash() function?
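
On the fatal error quoted above: it says global.php (line 296) still calls no_sessionhash(), so the page breaks as soon as that function is removed from functions.php, whatever is put in its place. The two functions have different names, so both can sit in the same file. A minimal sketch of that arrangement, assuming global.php keeps its no_sessionhash() call; is_crawler_agent() and the shortened token list are hypothetical names for illustration and are not part of either hack:

```php
<?php
// Hypothetical helper (not in either hack): true when the user-agent
// string contains one of the known crawler tokens.
function is_crawler_agent($user_agent)
{
    // Shortened list for illustration; the full lists are in the code above.
    $tokens = array('googlebot', 'slurp', 'ia_archiver', 'crawl');

    foreach ($tokens as $token) {
        // stristr() is a case-insensitive substring search.
        if (stristr($user_agent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Same name and effect as the original mod, so the existing call in
// global.php still resolves.
function no_sessionhash()
{
    global $session;

    // getenv() returns false when the variable is unset; the cast turns that into ''.
    if (is_crawler_agent((string) getenv('HTTP_USER_AGENT'))) {
        $session['sessionhash'] = '';
    }
}
```

With this in place, useragentcheck() can be added anywhere else in functions.php without touching the call in global.php.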

erdem 03-02-2003 02:32 PM

Hi,
First of all, great hack. Thanks for it, TECK.

I have a display problem in online.php. I applied the fix that you released and the first 3 optimizations.

But when someone is browsing my archive, online.php displays something like:
-> Unknown Location: /showthread/images/catbg.gif?
-> Unknown Location: /forumdisplay/images/catbg.gif?

Any idea? Can I fix this?

thanks

NexDog 03-02-2003 02:35 PM

Teck posted that in his first post:

https://vborg.vbsupport.ru/showthrea...218#post342218

TECK 03-02-2003 02:42 PM

I personally blocked the forums for crawlers and renamed the archive.txt to archive.html because I don't want the crawlers to go at all to the forums.
http://www.teckwizards.com/archive.html

NexDog 03-02-2003 02:49 PM

Why would you want to keep the bots out of your forum? Am I missing something?

NexDog 03-02-2003 03:00 PM

Google is having too much fun on our forum. Have 43 bots in. 16 Inktomi and 27 Google. :D

I also see a pattern. Inktomi was the first to arrive and they hit the Archive immediately. But those first 3 bots just sat on the Archive index and then went to the forums only and disappeared - they didn't crawl the threads.

Then Google came in and did the same thing on the forums. One bot came in and sat on index for an hour. It came back later with about 10 buddies and just sat on all the forums and disappeared. Now they are back in force and going crazy on all threads and one or two made it into the Archive and are just sitting on the forums but not going deep.

Pretty sure Google will be back on the Archive just like Inktomi is right now.

TECK 03-02-2003 03:16 PM

Quote:

Originally posted by NexDog
Why would you want to keep the bots out of your forum? Am I missing something?
Why would I want them to go to the forums? Those URLs are dropped anyway, because they are not search-engine friendly.
I want them to stay on the archive all the time because it has the same content as the forums.

NexDog 03-02-2003 03:18 PM

Ahhhhhhhhh, okay. But you allow them on index.php, right?

TECK 03-02-2003 03:19 PM

No, I blocked the /forum folder completely because my archive is in the site root:
[root] < archive.html
---[forum]

So in your case you want to rename the file to archive.html and block all the other files in the /forum folder...
If you block the entire /forum folder, the crawlers will no longer have access to the archive.
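
The root-level layout described above could be expressed in robots.txt roughly like this. It is only a sketch, assuming the forum lives in /forum/ and archive.html sits in the site root; adjust the path to match your install:

```
User-agent: *
Disallow: /forum/
```

Anything outside /forum/, including /archive.html, stays crawlable. If the archive lives inside /forum/, this rule blocks it too, which is the problem noted above.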

The nicest way to have your threads indexed properly is to install the vbHome (lite) script and its archive add-on. That way, the indexing is done directly from your root, which is better, since crawlers tend to index your threads faster when they are closer to the root...

