
View Full Version : vbArchive - Search Engine Indexer for vBulletin


Pages : 1 [2] 3 4

TECK
01-31-2003, 06:11 AM
LOL, 27 crawlers from Inktomi are chewing the archives right now. :)

Zombie
01-31-2003, 06:19 AM
Yeah, Inktomi has been going nuts on my site

Google doesn't seem to find my archive though

TECK
01-31-2003, 06:23 AM
Patience... Google is a little lazy, but once they hit you they won't stop...
I explained in the first post why Google is slower and why they drop the links on a regular basis.

Zombie
01-31-2003, 06:28 AM
Yeah, I figured as much. I'll post a screen once google goes nuts

Thanks for the hack :)

Destee
01-31-2003, 08:14 AM
Hi Teck ... Thank you for this hack. It was easy to install.

I had problems with #3 of the vital stuff, I couldn't find some of that code in my templates so I didn't do that part. Also the hack provided by Logician to parse images didn't work for me either. These issues are probably due to lack of rest, so I'll go back over them later.

I think it's installed well enough to *click installed*

I had Overgrow's or Fastforward's hack installed (I know I've tried them both) but never got any results, so I put yours in their spot. :)

I installed Skuzzy's hack too, though I've not seen any of my threads get added yet. I don't suppose there's a reason why I can't have them both?

site:destee.com archive (http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=site%3Adestee.com+archive)

This is where your hack has been installed:
http://www.destee.com/forums/archive

Thanks again for sharing and I've got my fingers crossed.

Destee

TECK
01-31-2003, 08:30 AM
Well, Overgrow and fastforward both liked my hack and installed it on their forums (not sure about fastforward; he said he would).
About the indexing, you have to be patient. They cannot index everything overnight; there are zillions of sites out there.

The good part about this hack is that it gives a search engine exactly what it needs to index a site:
- unique meta tags for each page
- friendly links back to the forums, so you don't have to cloak URLs
- a static page look
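For example, the per-page meta tags boil down to nothing more than building a title, description and keywords string from the thread row and printing them into the page head. This is only a rough sketch to illustrate the idea (the variable names are made up, it is not the hack's actual code):

// Hypothetical illustration: unique meta tags built from the thread row.
// $thread is assumed to hold the result of a query like
// SELECT title FROM thread WHERE threadid=$threadid
$title       = htmlspecialchars($thread['title']);
$description = "Archived discussion of $title.";
$keywords    = str_replace(' ', ',', strtolower($title));

echo "<title>$title</title>\n";
echo "<meta name=\"description\" content=\"$description\">\n";
echo "<meta name=\"keywords\" content=\"$keywords\">\n";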

Destee
01-31-2003, 08:36 AM
Thanks Teck for the encouragement :)

I'll be posting again, as soon as I have results to share!

Destee

TECK
01-31-2003, 08:49 AM
What user owns the steroidology forums? I need his name, Floris.

Floris
01-31-2003, 08:57 AM
Originally posted by TECK
What user owns the steroidology forums? I need his name, Floris.

His username is BiggieSwolls

BiggieSwolls Steroidology Forums (http://www.steroidology.com/forum/archive/?styleid=2) Archive (v1.3 installed by xiphoid)

TECK
01-31-2003, 08:57 AM
Originally posted by Destee
Thanks Teck for the encouragement :)

I'll be posting again, as soon as I have results to share!

Destee
I don't like the idea that you installed the new one over the old /archive.
If I were you, I would install it as "/vbarchive" and submit it to all the engines listed in the first post.

Floris
01-31-2003, 09:36 AM
Here I am again, installed it on another forum: fitnessgeared.com

GearedUp FitnessGeared Forums (http://www.fitnessgeared.com/archive/) Archive (v 1.3 installed by xiphoid)

Feel free to add it to the first post.

TECK
01-31-2003, 10:04 AM
Damn, xiphoid is on fire... Link added. :)

Floris
01-31-2003, 10:05 AM
Installed it on 5 sites now. 3 public, 2 private.

ladyfyre
01-31-2003, 12:26 PM
i have a problem with this hack. In some forums, (the lesser trafficked ones) this works 100% normally & correctly. In the busier ones though, it is listing dozens of threads with dates for the FUTURE (several now say they "were" posted in July of 03) In each case, older threads are the ones that are coming up. Any ideas on why?

TECK
01-31-2003, 12:32 PM
Did you upgrade to vBulletin from another bulletin board?
As far as I can tell, it's a database problem, not a script one; if it were something related to the script, it would show the wrong date for all threads.
Old threads from a different bulletin board, imported into the vBulletin database, will do this. You will have to contact a technician about this issue; there is nothing I can do.

ladyfyre
01-31-2003, 12:37 PM
ugh :(
Yes, i upgraded LONG ago from UBB....but the dates show up correctly in the normal forum view....just not in the archive view.

TECK
01-31-2003, 12:39 PM
As I said, there is nothing I can do about it. The script is working fine.
Check your database, because the script pulls what is in there.
If the database tells the script that's the date, the script will display it.

The forums show the correct date because there you are grabbing the date from the last post, not the original date the threads were posted.
Since there is no importer that does a good job, the dates are messed up, due to the UBB format.

A workaround for your database problem would be this:

FILE: forumdisplay.txt

FIND:
$threads = $DB_site->query("
SELECT threadid,title,dateline,replycount

REPLACE WITH:
$threads = $DB_site->query("
SELECT threadid,title,lastpost,dateline,replycount

FIND:
$thread['date'] = vbdate( $dateformat , $thread['dateline'] );
$thread['time'] = vbdate( $timeformat , $thread['dateline'] );

REPLACE WITH:
$thread['date'] = vbdate( $dateformat , $thread['lastpost'] );
$thread['time'] = vbdate( $timeformat , $thread['lastpost'] );

That doesn't mean your database is fixed, though.
You should REALLY have it checked by a professional, or at least delete all the old threads that came from UBB.

ladyfyre
01-31-2003, 01:56 PM
Ok...after doing some checking around, and a hint from PPN that a fix for the UBB date problem existed, i found the fix here: https://vborg.vbsupport.ru/showthread.php?postid=130415#post130415

Just wanted to add it in case others had the same problem.

ladyfyre
01-31-2003, 06:07 PM
ACK!!!!!!!!!!!!

Ok...i have NO idea what i just did....but i blew up Apache, and had to re-install. Now the force-type is no longer working :(
I just get a 404 error when clicking the archive file....
Does anyone know how I can turn the force-type back on?

Overgrow
01-31-2003, 06:56 PM
Xi>>Installed it on 5 sites now. 3 public, 2 private.

Just curious, but why would you want spiders rooting through private forum archives?

Floris
01-31-2003, 07:00 PM
Originally posted by Overgrow
Xi>>Installed it on 5 sites now. 3 public, 2 private.

Just curious, but why would you want spiders rooting through private forum archives?

haha

I should have mentioned that it will be used not to get spidered but to do it like this:

1 group of users have full access to the forum/ directory
where the other group of users only have access to the archive/ generate files.

We use it so it is easier to read and simple overview without the forum features present.

(for the private server) - 500 workstations reading threads with 7 administrators working on threads :)

TECK
02-01-2003, 12:55 PM
Originally posted by ladyfyre
ACK!!!!!!!!!!!!

Ok...i have NO idea what i just did....but i blew up Apache, and had to re-install. Now the force-type is no longer working :(
I just get a 404 error when clicking the archive file....
Does anyone know how I can turn the force-type back on?
Did you check with your tech whether the mod_mime.c module (http://httpd.apache.org/docs/mod/mod_mime.html) is installed? This module includes the ForceType directive. Contact your host.
Heh, troubleshooting something I should not. :p
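For anyone hitting the same 404 after an Apache reinstall: the extensionless archive URLs depend on mod_mime's ForceType directive being honoured in .htaccess, which also needs AllowOverride FileInfo in the server config. The stanza in the hack's htaccess.txt is along these lines (the file names here are assumptions based on the install steps, so check your own htaccess.txt):

<Files archive>
ForceType application/x-httpd-php
</Files>
<Files forumdisplay>
ForceType application/x-httpd-php
</Files>
<Files showthread>
ForceType application/x-httpd-php
</Files>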

limey
02-01-2003, 08:00 PM
I'm having probs losing the cookie when I goto the archive page...

I assume its not taking the style because of that also...

However, its working fine on my other site (http://www.hometalkers.com)

I checked the templates and the site above has the templates installed under the regular templates heading. However, the site below has the templates installed under its own heading: Archive Templates. Is that the difference between the version 1.2 and 1.3 installers?


thanks in advance...

check it here (http://www.icalledit.com/forums)

TECK
02-01-2003, 09:22 PM
The difference between 1.2 and 1.3 is listed in the first post.
I would not worry about cookies, since the archive is viewed only by guests.
However, it's working perfectly on every site tested.

The best test is here at https://vborg.vbsupport.ru/archive/

limey
02-02-2003, 12:06 AM
1) Can you explain why when I installed the hack on one site the templates are under "Default" and on another site the same vbarchive.php installed the templates under "Archive Templates"?

2) I think this might be the difference between the two sites. One is working and one is not. The one that is NOT working always says "Hello, Visitor". The actual archive works, just doesn't recognize the cookie.

note: problem #2 solved... make sure the cookie domain is set correctly in the admin CP (smack) and delete any cookies from your site before logging back in.

TECK
02-02-2003, 01:13 AM
1. Because you didn't follow the readme file properly, when you installed the script onto the 2nd server.

limey
02-02-2003, 02:30 AM
So you are saying the templates normally do not install under its own category: "Archive Templates" or no?

saint_seiya
02-02-2003, 04:01 PM
I was wondering if you can add me to the list. :) This archive is the best, it rocks. ;)

http://www.vgcity.com/forum/archive/

(the others are similar, yeah, but this one's template is cool and it works best with Inktomi *the one i pay yearly for 2-day spidering :) )

TECK
02-02-2003, 04:09 PM
saint, can you do me a favor and make the copyright visible?
Thanks.

saint_seiya
02-02-2003, 04:15 PM
Sure, sorry thats the default font lol.

If you tell me how, there is no index file to edit, and if i change the color in my main vb settings my forum layout is gonna get screwed up :(

kuska
02-02-2003, 04:48 PM
TECK sorry to bother but i got more questions :)

Does it matter that i have a different Search Engine Indexer (Skuzzys) installed with your Archive? I figured why not double my chances of getting indexed since Skuzzy Archive uses different addresses for the forums, thread, posts than yours. Plus Skuzzys does not include your pimp dynamic Meta tags:)
Im just wondering if it matters to search engines... I installed Skuzzys about 3 weeks ago, within the 1st day intkomi started spidering it.... I installed your hack about a 1-2 weeks ago and no robots visit it and im just wondering is it because i have the skuzzys indexer installed also.

Here is what i have:
Skuzzys hack at
http://www.nakazdytemat.com/forum
Yours
http://www.NaKazdyTemat.com/archive


I dont see why there would be a problem between those two but i want your opinion on it TECK... Because if there is a problem im going to pick your hack and remowe SkuZzys...

And is anyone else doing what i am doing?

Thanks.

TECK
02-02-2003, 05:35 PM
Originally posted by saint_seiya
Sure, sorry thats the default font lol.

If you tell me how, there is no index file to edit, and if i change the color in my main vb settings my forum layout is gonna get screwed up :(

Edit the "archive" template; the copyright is marked at the bottom.

TECK
02-02-2003, 05:39 PM
Originally posted by kuska
TECK sorry to bother but i got more questions :)

Does it matter that i have a different Search Engine Indexer (Skuzzys) installed with your Archive? I figured why not double my chances of getting indexed since Skuzzy Archive uses different addresses for the forums, thread, posts than yours. Plus Skuzzys does not include your pimp dynamic Meta tags:)
Im just wondering if it matters to search engines... I installed Skuzzys about 3 weeks ago, within the 1st day intkomi started spidering it.... I installed your hack about a 1-2 weeks ago and no robots visit it and im just wondering is it because i have the skuzzys indexer installed also.

Here is what i have:
Skuzzys hack at
http://www.nakazdytemat.com/forum
Yours
http://www.NaKazdyTemat.com/archive


I dont see why there would be a problem between those two but i want your opinion on it TECK... Because if there is a problem im going to pick your hack and remowe SkuZzys...

And is anyone else doing what i am doing?

Thanks.

There should be no problems using both archives. However, if you don't have them submitted, it is useless to have both, since the contents would be similar...

wooolF[RM]
02-02-2003, 06:17 PM
]* wooolF[RM] pokes TECK in the eye

just went by to say nice hack once again :) keep it up

Mike Gaidin
02-02-2003, 06:19 PM
Installed and working great. Thanks TECK. :)

saint_seiya
02-02-2003, 06:48 PM
Ok, i changed it:

http://www.vgcity.com/forum/archive/

Now can you add the link :)

Anyway, i was wondering if you could help me set up mod rewrite to rewrite all the urls of www.vgcity.com so they could be easily spidered. I am willing to pay :-/ I just need to get it done :(

kuska
02-02-2003, 08:26 PM
I did submit both of them :)
Ill just wait and pull my hair out till spiders come.
Thanks for this great hack.

jjj0923
02-03-2003, 08:41 PM
I installed this hack and a fine one it is.

I have mod_rewrite turned on in my apache server because I have a rewrite rule that keeps people from stealing my bandwidth and serving images from my site on their sites...

I installed the hack as instructed and embedded a link to /archive in my home page in the hopes that spiders would pick it up and follow it. I also visited the major search engines and submitted a new URL ending with /archive... but JUST TO BE SAFE... I wrote a rule and some conditions that would ensure the spiders (the ones I know visit my site from my logs) would DEFINITELY spider the new html generated pages, and here's how I did it:


RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^(.*)googlebot(.*)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)slurp(.*)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)scooter(.*)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)webcrawler(.*)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)surveybot(.*)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)mercator(.*)$ [NC]
RewriteRule ^(.*)\.php(.*)$ http://www.yourdomain.com/upload/archive [R=302]


Of course you would substitute www.yourdomain.com with your domain name and substitute 'upload' with the directory name that contains 'archive' on your system.

I installed the hack yesterday and visits from Slurp today are getting redirected to the archive generated pages.

Thanks for a great hack.

:)

kuska
02-04-2003, 02:05 AM
Useful tool:
Link Popularity Check
Popularity script checks 7 major Search Engines for inbound links to your site.
Checks Google, AllTheWeb, AltaVista, AOL, MSN, HotBot, and Lycos.
Comparison to your competitors.
Option to send the report to your Email address.
Fast and accurate.

THIS IS NOT A SPAM POST!!!!! Just thought it might be useful for this hack :) Im in the 0 to 250 references spot : \ ...vb.org has 2,501 to 10,000 references :)
Here is the link:

http://www.free-webmaster-tools.com/link_popularity.htm

TECK
02-04-2003, 02:48 AM
Thanks kuska. :)
Just checked now and it returned 1568 links for our site...

Cool tool. :)

jjj0923
02-04-2003, 11:42 AM
Got a question for you Teck...

now that "we" have all these 'archive' links showing up on search engines, how do I redirect the non-search engines to my real forums once they click on a link to one of the archive pages displayed in the search engine results? Should I just write a rule and add it to my .htaccess file?

How do you do it?

- jeff

jjj0923
02-04-2003, 11:52 AM
How about this as an enhancement?

Add some php code to the archive that checks whether the user_agent is one of the popular browsers and, if so, generates a link in the displayed HTML that includes the threadid of the original post. This way, when a user clicks through from one of the search engines, a link is displayed to that exact thread back in your real forums.

- jeff

TECK
02-04-2003, 02:13 PM
Ok, I corrected a minor bug that occurs when you have a new forum published with no threads on it.
Example of the fix (for as long as the forum stays empty):
http://www.teckwizards.com/forumdisplay/f-38.html

FILE: forumdisplay.txt
FIND:
if ( $limitlower <= 0 )
{
$limitlower = 1;
}

REPLACE WITH:
if ( $limitlower <= 0 )
{
$limitlower = 1;
}
if ( empty( $totalthreads ) )
{
$limitlower = 0;
$showthreadbits = 'No threads available. Please try a different location.';
}
That will take care of the problem. I updated the .zip file.

jjj0923
02-04-2003, 02:23 PM
inktomi crawlers...

there have been 35 on my site since 8:00 am this morning - that's over 3 hours....

:)

Overgrow
02-04-2003, 02:31 PM
>>Should I just write a rule and add it to my .htaccess file

If you write a rule to auto-forward you may be banned by Google for link-cloaking. The best way is to try and make a prominent link, as you suggested. That's what my old hack did and hopefully Teck will update this to do the same.

jjj0923
02-04-2003, 02:42 PM
Originally posted by Overgrow
>>Should I just write a rule and add it to my .htaccess file

If you write a rule to auto-forward you may be banned by Google for link-cloaking. The best way is to try and make a prominent link, as you suggested. That's what my old hack did and hopefully Teck will update this to do the same.

banned by Google - I think not.

Google couldn't care less if you redirect.... for heaven's sake, this entire hack is one big redirect.

:)

jjj0923
02-04-2003, 02:44 PM
while this is a great hack and will get your pages listed, I think a lot of people are going to be very disappointed when they land on a page only to realize that there's no way to:

a) click through to your forum
b) find the original thread in your forum so that they can reply or join.

I showed this to a bunch of very seasoned web developers I work with and they ALL expressed the same opinion. It needs to be enhanced, THEY THINK IT'S GREAT MIND YOU, but it will leave the clicking-through end user somewhat disappointed. I'm looking at the code right now to insert the link via the php (for browsers) instead of writing an external rule.


:)

TLucent
02-04-2003, 05:03 PM
How do you lock your board down for registered or logged in users only yet maintain your archives for spidering?

TECK
02-04-2003, 05:12 PM
What's the use of doing this? This is really bad and unethical.
Scenario:
I search the web and find exactly what I need.
I go to the archive and click on VIEW THREAD... BOOM, no permission.
I leave the site right away, swearing at the guy who had the idea of using twisted methods to gain more users on his board.

As long as you have my script installed, with my copyright on it, you are not allowed to use any forcing methods that block guests from the forums but not from the archive. The script is designed to work the same way as the forums, and it will stay like that.
Feel free to uninstall it and use another script; I really want you to do that.

jjj0923
02-04-2003, 05:18 PM
Originally posted by TECK
What's the use of doing this? This is really bad and unethical.
Scenario:
I search the web and find exactly what I need.
I go to the archive and click on VIEW THREAD... BOOM, no permission.
I leave the site right away, swearing at the guy who had the idea of using twisted methods to gain more users on his board.

As long as you have my script installed, with my copyright on it, you are not allowed to use any forcing methods that block guests from the forums but not from the archive. The script is designed to work the same way as the forums, and it will stay like that.
Feel free to uninstall it and use another script; I really want you to do that.

RIGHT ON!!! - I COULD NOT HAVE SAID IT ANY BETTER MYSELF.

EXPOSURE IS EVERYTHING. I take a look at 80% of the sites listed here by people and they only have a few members. I want tens of thousands of members, not a few members. This is a great hack and gets you exposure!!!

TLucent
02-04-2003, 05:25 PM
Well Nakkid,

The site I run is a free service for anyone who REGISTERS, with no ads and nothing sold, just a community forum that offers potentially useful information for FREE. There is nothing to buy and I pay for the hosting and bandwidth out of my pocket. I pay for the software that is run on the website and I devote endless hours to provide a FREE service to whoever would like to use it. All I ask is for them to register. I will respect your wishes with the use of your vbArchive code, however I do not respect your impractical suggestion of "twisted" and "unethical" methods "to gain more users" at the cost of the poor soul who simply clicked on a link and ended up on my site.

.:TRansLucent.:

jjj0923
02-04-2003, 05:34 PM
Hey ".:TRansLucent.:"

You're an idiot....and now you're the first person on my "ignore list" :)

Respectfully submitted....

TLucent
02-04-2003, 05:39 PM
yeah, and you're funny with such useful, constructive contributions..

I'm off this subject; my point stands clear.

TECK
02-04-2003, 06:18 PM
Originally posted by TLucent
Well Nakkid,

The site I run is a free service for anyone who REGISTERS, with no ads and nothing sold, just a community forum that offers potentially useful information for FREE. There is nothing to buy and I pay for the hosting and bandwidth out of my pocket. I pay for the software that is run on the website and I devote endless hours to provide a FREE service to whoever would like to use it. All I ask is for them to register. I will respect your wishes with the use of your vbArchive code, however I do not respect your impractical suggestion of "twisted" and "unethical" methods "to gain more users" at the cost of the poor soul who simply clicked on a link and ended up on my site.

.:TRansLucent.:
Don't take it in a negative way, but most visitors will think like that, the way I posted earlier.
I know what you mean by a free service; look at me and my hacks... hell, vbHome ate my life for a week and I did it for free.

I was not speaking directly to you, but to anyone who would do this. I guarantee you most people will close the browser and never go back to your site if you block them... I know it from my own experience with my protected forums.

Since I opened the gate to everyone, I have had over 200 new members in 2 weeks. Before, it used to go like this:
a user comes and visits the site, he registers, and he gets accepted to the restricted areas if he displays the vbHome (lite) copyright on his website.
Look at these forums:
http://www.teckwizards.com/forum/forumdisplay.php?s=&forumid=6

If you click on any thread, you will get a no_permission until you register... but you can still view the thread titles. The rest of the forums are open to guests.
That is a hack I made, not released at vBulletin...

Anyway, the idea is this: it's better to have your site open; trust me, people will register.
Good luck with your projects.

TECK
02-04-2003, 06:23 PM
Originally posted by Overgrow
>>Should I just write a rule and add it to my .htaccess file

If you write a rule to auto-forward you may be banned by Google for link-cloaking. The best way is to try and make a prominent link, as you suggested. That's what my old hack did and hopefully Teck will update this to do the same.
Overgrow, please post your mod attached as a .txt file so I can link it in the first post, with credit, like I did with Logician's mod.

jjj0923
02-04-2003, 06:26 PM
good points, Teck.

I welcome everyone and promote the living daylights out of my site. I paid for it all; bandwidth, servers, software, everything, to the tune of over $10,000 in the past year.

In less than a year I have over 1,600 members and ten more join every day on average. I used to call potential advertisers, now they contact me. Open forums are the best...

:)

Banana
02-04-2003, 10:19 PM
Teck, if I have this hack installed and vblite - (how) does Googlebot retrieve 'archive'? Just that I've been crawled twice by Google and it only goes to the vanilla forums - and, yes, I have the Friendly URL link there too :(

TECK
02-04-2003, 11:28 PM
There are millions of websites out there with billions of pages. Be patient please.
It can take up to 2 months until you get "really" indexed, because it's done step by step...

Did you read the email from Google I posted in the first post?

Banana
02-05-2003, 02:29 PM
Yes thanks TECK I have. Yes we've been listed for months by Google, but none of the last 3 searches has found archives.

It's always nice to read your condesending posts though. Makes my heart warm to you so much. I must remember to uninstall all your hacks because you really are intolerable most of the time.

jjj0923
02-05-2003, 02:38 PM
Hey Banana (spelled wrong ...oh and so is condescending). Please remove all his hacks.... it will be one less bozo he has to respond to in the future, so that he can graciously support (at his largesse, I remind you) those of us who are truly appreciative of all of his excellent work and contributions.

:)

TLucent
02-05-2003, 07:25 PM
"googlebot.com (64.68.82.xxx) - Spider/Robot
04 Feb -- 18:33:47 -- -- Code 301 Moved Permanently = /forum"

I get this nearly everyday, and have never really been indexed. Is this normal?

Thx

saint_seiya
02-05-2003, 07:28 PM
This is weird, i pay for lycos insite select ( http://insite.lycos.com ) and it still has not been indexed :( Any ideas why? I added a link to the archive today, from my forum page in case that was it.

As you see it should be 48 hour spider refreshes and i installed this a while ago :) Am I doing something wron Teck? I did your archive this weekend, i will wait this week and then email lycos for support ;)

BTW, my site is www.vgcity.com , archive: http://www.vgcity.com/forum/archive :chinese:

PS.- Teck when you finish your other project can you tell me, i think it was vBHL . Thanks :) :smoke:

jjj0923
02-05-2003, 07:32 PM
google - don't think that's normal:

I get this:


2/5/2003 4:22:53 PM
Search String: googlebot
Replace String:
Path: D:\logs
File Mask: *.*
Search Subdirectories
crawler10.googlebot.com - - [23/Dec/2002:06:22:22 -0500] "GET / HTTP/1.0" 302 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [23/Dec/2002:06:22:24 -0500] "GET /robots.txt HTTP/1.0" 404 279 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawler10.googlebot.com - - [23/Dec/2002:06:22:28 -0500] "GET /upload/index.php HTTP/1.0" 200 122266 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

TECK
02-05-2003, 07:49 PM
302 is not an error; check W3C-related sites for what the status code means.
You get the 404 because you don't have a robots.txt file in the directory where the main site files reside, not the forum ones.

wooolF[RM]
02-06-2003, 09:03 AM
]Just got an idea...

Imagine Forum home page :
Users Currently Online: 200 [ 100 users + 100 guests ] <-- just an EXAMPLE

idea is to trace IPs of all users and if they match any of the IPs owned by any of search crawlers like googlebot, altavista etc, show this :

Users Currently Online: 200 [ 100 users + 80 guests + Google + Altavista ] <-- just an EXAMPLE


Maybe looks ugly... maybe add extra queries... instead of dnsing/tracing all IPs u can just look after its ident (like Mozilla for IE).


PS: maybe it's not clever to add it on the forum home, but I would REALLY like to see this feature implemented on Who's Online page :)

I know u can do it, TECK ;)

Overgrow
02-06-2003, 05:57 PM
From the Google Webmasters FAQ:

What is cloaking?

The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they'll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.

http://www.google.com/webmasters/faq.html


I'm assuming what you mean to do is give the spider a different page than a user gets if they click through from the search results.

That is Link Cloaking and that is grounds for banning, no matter how similar the pages are. Do at your own risk.

Overgrow
02-06-2003, 05:59 PM
>>Yes we've been listed for months by Google, but none of the last 3 searches has found archives

Hahahah let's blame Teck for Google's spidering. People using his hack are in Google. If you can't get your archive in there, that is your fault. Even people using my old vBSpiderFriend are doing very well in Google... DevShed was serving me answers with it just last week from the top Google 1-5 result spots.

Floris
02-06-2003, 07:21 PM
Tonight I received a google attack :)

49 guests online, 5 members :)

saint_seiya
02-06-2003, 07:23 PM
How did you all get that part where it says from where the guest came. I am going to read the readme again :p

Floris
02-06-2003, 07:23 PM
damn
IT DOESN"T STOP

4 Members and 53 Guests

The hostnames are a vBulletin option > Resolve hosts on Who's Online: yes.

saint_seiya
02-06-2003, 07:41 PM
Resolve IPs, cool thanks man! :)

BTW, congrats on the visitors, i hope that everysingle one of them joins (same for my site :bandit: ) :):):)

kuska
02-06-2003, 10:28 PM
Damn Google just wont visit me :(

saint_seiya
02-06-2003, 10:32 PM
I got like 6 bots on right now. googlebots. =|

TECK
02-07-2003, 01:28 AM
Originally posted by xiphoid
damn
IT DOESN"T STOP

4 Members and 53 Guests
Most users ever online was 58 on 06-02-2003 at 21:21.
There are currently 0 members and 49 guests on the boards.

That helped you break your visitor record too, heh. And you still have Google on your board now...

Overgrow
02-07-2003, 02:55 AM
AUTO-FORWARDING

I'll put this out there for those of you who want to send users to the real forum page and spiders to the archive. I hope you don't get banned for link cloaking... see my last post.


$homeurl="yourdomain.com";
if ((!stristr(getenv(HTTP_REFERER),$homeURL)) or (strlen(getenv(HTTP_REFERER)) < 1)) {
header("location:http://www.$homeurl/forum/showthread.php?threadid=$thread[threadid]");
}


It checks if the referrer page is on your domain. If it's not, they obviously came in from somewhere else (ie, search results), so get out of the archive and on to the real thread.

wooolF[RM]
02-07-2003, 02:02 PM
]Originally posted by wooolF[RM]
Just got an idea...

Imagine Forum home page :
Users Currently Online: 200 [ 100 users + 100 guests ] <-- just an EXAMPLE

idea is to trace IPs of all users and if they match any of the IPs owned by any of search crawlers like googlebot, altavista etc, show this :

Users Currently Online: 200 [ 100 users + 80 guests + Google + Altavista ] <-- just an EXAMPLE


Maybe looks ugly... maybe add extra queries... instead of dnsing/tracing all IPs u can just look after its ident (like Mozilla for IE).


PS: maybe it's not clever to add it on the forum home, but I would REALLY like to see this feature implemented on Who's Online page :)

I know u can do it, TECK ;)

no comments at all? :paranoid:

TECK
02-07-2003, 02:31 PM
You can't do it, due to the way the vBulletin database is currently set up, at least not to my knowledge.

$loggedins = $DB_site->query_first("
SELECT COUNT(*) AS sessions
FROM session
WHERE userid=0 AND lastactivity>$datecut
");

Floris
02-07-2003, 02:32 PM
KuraFire has released a hack to modify the whois online :) after my suggestion and request.

Online 15 users
Staff: 3 (user1,user2,user3)
Members: 7 (user4,user5,user6,user7,user8,user9,user10)
Guests: 5 (user11,user12,user13,user14,user15)

Maybe guests can now be split into:
Guests: 2 (user11,user12)
Search Engines: 3 (user13,user14,user15)

TECK
02-07-2003, 02:33 PM
You cannot Floris, that's why I posted the query.

Floris
02-07-2003, 02:54 PM
When I typed my post, your post wasn't there yet :)
teck: Today 05:31 PM
xip: Today 05:32 PM

Idea: can't we rewrite it to have a separate usergroup id for bots from search engines? Like the ban script function, but instead of banning, showing it's a search engine bot: Google. :)

wooolF[RM]
02-07-2003, 02:56 PM
]:(

TECK
02-07-2003, 03:25 PM
The way vBulletin works now is this:
every time a user (guest or member) enters the site, a unique session is created, which is automatically deleted after 900 seconds if it is no longer in use.

The highlighted part, userid=0, reflects only the guests, since they have no userids. So the query counts the sessions opened by those users, not their user agent or any other identification method.

Unfortunately, there is no way around this... it is not as simple as it is with members.

inphinity
02-07-2003, 03:45 PM
something slightly similar (see attachment)
search engines are listed in italics

wooolF[RM] yes it is possible although it would add 1 extra query to index.php - covering every search engine isnt realistic (new ones everyday etc) but doing the major ones is fairly easy.

i'll get floris to test some bits tonight if he's around on irc

inphinity
02-07-2003, 03:47 PM
Originally posted by TECK
Hmmmmmm... I'm really pissed... you see the crawler918.com in the pic above?
That's a spy. Read more here (http://www.advogato.org/article/610.html) about it.

To block the scums, add onto htaccess.txt file, at the top, this information:

<limit GET>
order allow,deny
deny from 12.148.196.
deny from 12.148.209.
allow from all
</limit>

then upload the file and rename it to .htaccess
That will block them for good, damn crooks.

deny from 12.148.196.
deny from 12.148.209.

those denys block more than just the monkeys at crawler918.com
my limit section in htaccess looks like:

<limit GET POST>
order allow,deny

## -----------------------------------------------------------------------------
## block crawler918.com - http://www.nameprotect.com
## http://www.advogato.org/article/610.html
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=NAMEPROTECT.COM
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-12-148-196-128-1 /25
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-12-148-209-192-1 /26
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-12-175-0-32-1 /28
deny from 12.148.209.192/26
deny from 12.148.196.128/25
deny from 12.175.0.32/28
## -----------------------------------------------------------------------------
## block cyveillance - http://www.cyveillance.com
## http://www.webmasterworld.com/forum11/1587.htm
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=CYVEILLANCE
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-63-148-99-224-1 /27
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-65-118-41-192-1 /27
deny from 63.148.99.224/27
deny from 65.118.41.192/27
## -----------------------------------------------------------------------------

## ================================================== ===========================
## -----------------------------------------------------------------------------
allow from all
</limit>


cyveillance do roughly the same thing as nameprotect
also in my robots.txt i've got:

# allow everyone else
User-agent: *
Disallow:

# block turnitin.com
User-agent: TurnitinBot
Disallow: /

www.turnitin.com might be a good cause for teachers - but they charge for accessing the data they've collected - so i'd rather not have them using my bandwidth/server load for free. So they can stay off my site until they decide to give a little back.

TECK
02-07-2003, 03:56 PM
Hmm, I did a WHOIS on their company (crawler918) and it came up with those 2 IPs...
Just curious, what results you got (other names)?

wooolF[RM]
02-07-2003, 04:28 PM
][ 19:31:44 ] _? ? /dns [ www.crawler918.com ] ...
[ 19:31:45 ] _? ? Failed to resolve : [ no such user ]

wooolF[RM]
02-07-2003, 04:29 PM
]Originally posted by inphinity
something slightly similar (see attachment)
search engines are listed in italics

wooolF[RM] yes it is possible although it would add 1 extra query to index.php - covering every search engine isnt realistic (new ones everyday etc) but doing the major ones is fairly easy.

i'll get floris to test some bits tonight if he's around on irc

Looks SEXY!!! Just what I needed :D

Also 1 extra query is not that much... maybe just adding it to who's online...

Thanx for the effort! :D

Floris
02-07-2003, 05:05 PM
Wow, that was easy, only 5 lines of code or something :)

wooolF[RM]
02-07-2003, 08:51 PM
]as said earlier, looks sexy :) if you could also release it instead of teasing me :p ;)

Floris
02-07-2003, 08:58 PM
If inp allows me to make a release, sure :)

wooolF[RM]
02-07-2003, 09:00 PM
release? it's just an addon to an existing hack... uhm...

Floris
02-07-2003, 09:03 PM
"I will release it"
"I will addon it"

I will go with 'release'.

wooolF[RM]
02-07-2003, 09:10 PM
sorry... /me hides in the nearest bush and cries silently...

kuska
02-07-2003, 09:58 PM
w00t, this is to l33t teck :)
FINALLY !!!!!!!!!!!!!!!!
43 Google BOTS crawling since yesterday and STILL AT IT !!!!!!
Thanks TECK :)
HOTM for this HACK !!!!!!!

limey
02-07-2003, 10:10 PM
im so lucky...im mostly experiencing the turnitin.com crawl.

added them to robots.txt, but have to wait till their cached version expires.

inphinity
02-07-2003, 10:58 PM
you can add a broad deny for them for 48 hrs (time it takes for their cache of robots.txt to expire)

## turnitin.com
deny from 64.140.49

remember to remove it though, it blocks a few more than turnitin.com but they dont own their own ip block...

>as said earlier, looks sexy if you could also release it instead of teasing me

just cleaning stuff up atm need to write instructions as well :/

wooolF[RM]
02-07-2003, 11:07 PM
]oki, I'll just hang on, thanx for the job u guys do :)

TECK
02-07-2003, 11:50 PM
Originally posted by inphinity
something slightly similar (see attachment)
search engines are listed in italics

wooolF[RM] yes it is possible although it would add 1 extra query to index.php - covering every search engine isnt realistic (new ones everyday etc) but doing the major ones is fairly easy.

i'll get floris to test some bits tonight if he's around on irc
My mistake, I misunderstood. I thought wooolF[RM] was referring to the online users on the forumhome page, not the actual online.php file.

About the index.php file, you said it is possible to do it there as well; can you post the code? I would like to see it please, so I can learn a tip from you.
It is not possible in my eyes...

TECK
02-07-2003, 11:57 PM
Originally posted by kuska
w00t, this is to l33t teck :)
FINALLY !!!!!!!!!!!!!!!!
43 Google BOTS crawling since yesterday and STILL AT IT !!!!!!
Thanks TECK :)
HOTM for this HACK !!!!!!!
As I said, it takes time.
Please don't panic if the links are dropped in a week or 2, it's normal... they are moved from the "fast" crawl to the deep one.

TECK
02-08-2003, 11:10 AM
If you want to display nice names for your crawlers instead of "Guest", see the attached file (20-second install).
All you have to do is add your crawler name and IP prefix.

NOTE: Pay attention to the commas when you add each crawler.
Notice that the last one doesn't have a comma at the end.
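The attached file itself isn't reproduced in this post, but the idea is just an array mapping an IP prefix to a crawler name, checked against each guest session's IP. A rough, hypothetical sketch (variable names are examples only; the two prefixes are the Google/Inktomi ones mentioned later in the thread):

// Hypothetical sketch: IP prefix => crawler name.
// Keep the trailing dot on each prefix and no comma after the last entry.
$crawlers = array(
    'Google'  => '216.239.46.',
    'Inktomi' => '66.196.72.'
);

$guestname = 'Guest';
foreach ($crawlers as $name => $prefix)
{
    // show the crawler name when the guest's IP starts with a known prefix
    if (substr($guestip, 0, strlen($prefix)) == $prefix)
    {
        $guestname = $name;
        break;
    }
}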

wooolF[RM]
02-08-2003, 11:34 AM
]Originally posted by TECK
If you want to display nice names for your crawlers, instead of "Guest", see attached file (20 seconds install).

Thanx for the snippet :cool: :banana: :D

Floris
02-08-2003, 11:36 AM
You guys should have just waited.

wooolF[RM]
02-08-2003, 11:38 AM
]Originally posted by xiphoid
You guys should have just waited.
for what? :) TECK released a nice snippet, it works here (just tried), no extra load noticed... Just a great addition to the board :)

TECK
02-08-2003, 02:09 PM
Example of the script in action, showing the crawler name instead of Guest...
Hmmm, 27 Google crawlers now chewing the web site...

TECK
02-08-2003, 02:13 PM
Guys, if you get new crawler IP's, please post them here so everyone can add them...
Thanks.

Floris
02-08-2003, 02:37 PM
This is why our script is better, it doesn't care about the IP

Here are some screenshots for inph to link to.

Floris
02-08-2003, 02:37 PM
He will soon release his addon, which adjusts the nosessionhash part, turns a guest into the bot name on online.php, and shows how many bots are online in the Who's Online section on index.php (following me?) hehe

TECK
02-08-2003, 02:39 PM
Originally posted by xiphoid
This is why our script is better, it doesn't care about the IP

Here are some screenshots for inph to link to.

Who's saying my script is better?
You should release it so everyone can use it.

Floris
02-08-2003, 02:40 PM
Originally posted by TECK
Who's saying my script is better?
You should release it so everyone can use it.

Because we are working on it, geez, told you 500x on irc already :) Everybody can use, *when it's done*
Just sit down and wait :banana:

TECK
02-08-2003, 02:41 PM
Well, you posted screenshots, so I presumed it was done.
Then you should wait before you post anything... :p

And I don't like to sit down. :banana:

Floris
02-08-2003, 02:42 PM
Originally posted by TECK
Well, you posted screenshots, so I presumed is done.
Then you should wait before you post anything... :p

And I don't like to sit down. :banana:

Then we just let you wait another day maybe :D

TECK
02-08-2003, 02:51 PM
* TECK starts the revolution.... :)

Floris
02-08-2003, 11:47 PM
inph - you got the code done! Stop playing wc3 and start posting :) now EYE even get impatient

wooolF[RM]
02-09-2003, 12:38 AM
]Originally posted by xiphoid
Stop playing wc3 and start posting :)

LOL ;) :cheeky: *sorry for spam* :classic:

inphinity
02-09-2003, 01:05 AM
oi wc3 is important :p

useragent checking

Works both standalone and as a very nice complement to TECK's vbarchive hack.

What does it do?
Allows you to match the useragent for Guests in Who's Online and display custom names/urls for recognised useragents such as Google, Teoma, Inktomi etc

You can also use it for matching the useragent anywhere on vb, ie for Currently Active Users on forumhome. Expect a jazzed-up online.php sometime, with icons next to names showing which browser people are using.

Why?
Got bored of looking up IPs then digging around the session table trying to find out which guests were really web robots also nosey to see who was reading the archives.

Install
Instructions in the file, should work with 2.2.x
Install time, 3-5mins. level, medium.

List of Detected Web Robots (thanks to TECK for listing the main ones)
Last updated: 08/02/03 10pm GMT
googlebot www.google.com Google
gulliver www.northernlight.com Northern Light
ia_archiver www.archive.org The Internet Archive
internetseer www.internetseer.com Internet Seer
linkalarm linkalarm.com Link Alarm
mercator www.research.compaq.com/SRC/mercator Mercator
openbot www.openfind.com.tw Openbot
pingalink www.pingalink.com PingALink Monitor
psbot www.picsearch.com/bot.html PicSearch
scooter www.altavista.com AltaVista
slurp www.inktomi.com/slurp.html Inktomi
turnitinbot www.turnitin.com/robot/crawlerinfo.html Turnitin
slysearch www.turnitin.com/robot/crawlerinfo.html Turnitin
zeus www.waltbren.com/products/zeus_internet_robot.htm Zeus Internet Marketing
zyborg www.wisenutbot.com WiseNut
teoma www.teoma.com Teoma/Ask Jeeves

-- these last ones are generic and will display the useragent on who's online with a link to robotstxt.org, where you can look up the useragent for obscure and new bots.

spider Web Spider
spyder Web Spyder
crawl Web Crawler
robot Web Robot

Screenshots?
Who's online:
https://vborg.vbsupport.ru/attachment.php?s=&postid=351495
https://vborg.vbsupport.ru/attachment.php?s=&postid=351832
https://vborg.vbsupport.ru/attachment.php?s=&postid=351533

Currently Active Users
https://vborg.vbsupport.ru/attachment.php?s=&postid=351831

enjoy,
inph

thanks to floris for screenshots and testing
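The attachment isn't shown here, but the heart of a check like this is only a case-insensitive substring match of the session's useragent against a list of known robot signatures, returning 0 when nothing matches. A hypothetical sketch, not the actual file:

// Hypothetical sketch of useragent matching for Who's Online.
// Returns a display name for a recognised robot, or 0 for a normal guest.
function useragentcheck($useragent)
{
    $robots = array(
        'googlebot' => 'Google',
        'slurp'     => 'Inktomi',
        'scooter'   => 'AltaVista',
        'teoma'     => 'Teoma/Ask Jeeves',
        'crawl'     => 'Web Crawler',
        'spider'    => 'Web Spider',
        'robot'     => 'Web Robot'
    );
    $useragent = strtolower($useragent);
    foreach ($robots as $fragment => $name)
    {
        if (strstr($useragent, $fragment))
        {
            return $name;
        }
    }
    return 0;
}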

inphinity
02-09-2003, 01:15 AM
just a quick note if you're using TECK's guest_crawler

you should add the trailing dot to the ip addresses

'Google' => '216.239.46.',
'Inktomi' => '66.196.72.'

so that you dont match (ie an octet at the beginning):
*216.239.46*
*66.196.72*

with the trailing dot you will only match:
216.239.46.*
66.196.72.*

:)

also a minor point for the vbarchive installer

the templates added are set to templatesetid=-1
which is fine but in vB's upgrade scripts, lines like:

$DB_site->query("DELETE FROM template WHERE templatesetid=-1 AND title<>'options'");tend to obliterate peoples templates :)

i would recommend adding the templates twice once with -1 and then again with the style id's so they appear as custom templates (with default content)
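A hypothetical sketch of what that could look like in the installer, assuming vB2's templateset table and that $title and $template already hold one archive template (this is an illustration, not the shipped installer code):

// Hypothetical illustration: add the template to every custom template set
// as well as the master set, so an upgrade that clears templatesetid=-1
// rows does not wipe it out.
$title    = addslashes($title);
$template = addslashes($template);
$DB_site->query("INSERT INTO template (templatesetid, title, template)
                 VALUES (-1, '$title', '$template')");
$sets = $DB_site->query("SELECT templatesetid FROM templateset");
while ($set = $DB_site->fetch_array($sets))
{
    $DB_site->query("INSERT INTO template (templatesetid, title, template)
                     VALUES ($set[templatesetid], '$title', '$template')");
}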

limey
02-09-2003, 05:48 AM
hey is 1000 hits by googlebot in 2 days good?

edited the number from 609 to 1000

TECK
02-09-2003, 07:52 AM
Originally posted by inphinity
just a quick note if you're using TECK's guest_crawler

you should add the trailing dot to the ip addresses

'Google' => '216.239.46.',
'Inktomi' => '66.196.72.'
Very good point, I edited the file. :)
Also, I'm going to add your mod in the first post, with credit of course. Great job. :)
About the templates, it's really easy: simply run the installer again. There is no need to recustomize the templates because edited (non-original) templates are automatically saved, so your work is not lost...

EDIT: Link added. Check no. 5 in Forum Optimizations section. ;)

Floris
02-09-2003, 10:52 AM
Glad you liked the hack teck

wooolF[RM]
02-09-2003, 01:42 PM
]@ inphinity > big thanx for adding that feature to show web robots on home page! and also thanx for releasing this addon! Very nice :)

PS: I think you should also release it as a hack so people will know it exist and it will be possible to add this hack to the fine collection of vb.org hacks :)

Cheers!

wooolF[RM]
02-09-2003, 08:39 PM
]:: 51 members, 46 guests and 32 web robots (Google) on the boards

Nice to see it on the main forum page! Thanx again for the great addon! :D

Mike Gaidin
02-09-2003, 08:45 PM
Originally posted by inphinity
oi wc3 is important :p

useragent checking

Works both standalone and as a very nice complement to TECK's vbarchive hack.



In the instructions for the modification of functions.php it just has a piece of code, but no instructions as to where to put it. Where does it go?

wooolF[RM]
02-09-2003, 08:57 PM
Find ?> and add the code mentioned in the attached file ABOVE it

limey
02-09-2003, 09:13 PM
Looks like those first googlebots were scouts and they sent the deepcrawlers over the past few days. Here they are in action.

wooolF[RM]
02-09-2003, 09:16 PM
]They are crawling my forum right now :D
I have about 120 users online + 32 Google bots :D

codewebs
02-09-2003, 10:39 PM
isnt this a bot? inktomi2-wat.server.ntl.com

if so how can i add it to inphinity 's hack?

inphinity
02-10-2003, 08:19 AM
Originally posted by codewebs
isnt this a bot? inktomi2-wat.server.ntl.com

if so how can i add it to inphinity 's hack?

that isnt a web robot, it's:

Inktomi's Traffic Server network cache (transparent cache aka web proxy)

it basically sits between ntl's end users and the net, saves them bandwidth by caching files, and requires no end user configuration

an old news article: http://www.internetnews.com/xSP/article.php/44591

--
on a side note
updated instructions for location of code in functions.php (thanks J-OST and wooolF)

## At the bottom of functions.php
## Just BEFORE ?>

Floris
02-10-2003, 12:38 PM
Most users ever online was 81 on 10-02-2003 at 03:41.

whoo hoo :)
Only a few members online last night, and just a few normal guests. 75+ search engine crawlers

scary :alien:

hypedave
02-10-2003, 01:31 PM
Hey Teck,
I finally got this puppy installed, the archive is working great, but now when I go to my vbhome page, I get the following error

Fatal error: Cannot redeclare archive_nopermission() (previously declared in /home/ochroma/public_html/alpha1/global.php:228) in /home/ochroma/public_html/alpha1/forum/admin/functions.php on line 2497

What have I done wrong at 3am in the morning?

line 2497 looks like this

function archive_nopermission()

TECK
02-10-2003, 02:04 PM
You need to install the vbHome addon instead (search engine indexer), that does this.
http://www.teckwizards.com/forum/showthread.php?s=&threadid=617

mini2
02-10-2003, 04:05 PM
Just installed. Works well, just got to wait for the little spiders to come and crawl all over my site.....

www.mini2.com/forum/archive/

Thanks very much.

Floris
02-10-2003, 04:27 PM
By the way, is the latest archive hack xhtml compliant, or can we expect an update for 1.4 (maybe with the other tweaks too)? So it feels more natural together with vbhome4

TECK
02-10-2003, 05:57 PM
I just found a site to submit for FREE to several engines. It doesn't hurt if we try it...
http://www.ineedhits.com/add-it/free/

TECK
02-10-2003, 06:53 PM
Originally posted by xiphoid
By the way, is the latest archive hack xhtml compliant, or can we expect an update for 1.4 (maybe with the other tweaks too) ? So it feels more natrual together with vbhome4
No, it's not, Floris. It uses the old vBulletin code.
For those who want a truly XHTML-compliant solution, you will have to install vbHome (lite) and use its indexer add-on, or edit your current templates to make them XHTML compliant as well.
I will not release a new version.

saint_seiya
02-10-2003, 08:48 PM
Triple post! *applause*

Hwulex
02-10-2003, 10:45 PM
w00t w00t! :D

Google came nosing round today. Up to 28 on at one point; unfortunately I missed it, but I was reliably informed of the event by members :D

TECK
02-12-2003, 06:57 PM
Originally posted by saint_seiya
Triple post! *applause*
Deleted all extra posts... for some reason VB was not responding... :)
Hwulex: So far the highest number of crawlers I saw on my site was 38. Some of other users got hit up to 47...

KeithMcL
02-13-2003, 05:39 PM
I just installed this hack and it's working great (www.webdevforums.com/archive/). About to go submit my link to all SE's mentioned in your first post.

I was going to install the other SE friendly hack, but after reading the problems other members had I decided on trying this one and it was dead easy to install. Good Job :)

BTW, I also changed the metatags keywords and description to my own ones.

Now lets hope that I get indexed by all the SE's I submit to.

WoodiE
02-14-2003, 04:52 PM
TECK,

Yey ANOTHER great hack from you! I had it installed within only a few minutes and it works great ( http://www.RCNitroTalk.com/forum/archive )

Thanks!


-Michael

TECK
02-15-2003, 10:47 PM
Your archive is supposed to be in http://www.mydomain.com/forum/.
You are not posting about the vbHome add-on, right? Because that would be the wrong thread.

TECK
02-15-2003, 11:31 PM
What is the URL of your forums? Is it the root?

TECK
02-15-2003, 11:44 PM
It will not work, unless you hack all files.
Your archive files must be in the /forum folder, as instructed in the readme file (I quote from):
[ROOT FOLDER] (no files to upload here)
---[FORUM] - upload here the following files:
- archive.txt
- forumdisplay.txt
- htaccess.txt
- showthread.txt

Schorsch
02-15-2003, 11:57 PM
what's wrong I only get text ??

click (http://www.fetter-esel.com/vB/archive)

TECK
02-16-2003, 12:41 AM
Do you have the ForceType directive enabled? Don't think so. ;)
Check with your host if mod_mime is installed and that the directive is enabled.

Schorsch
02-16-2003, 10:54 AM
Originally posted by TECK
Check with your host if mod_mime is installed and that the directive is enabled.

Hi TECK,

mod_mime is installed. how can I enable the "directive" ??

mini2
02-16-2003, 11:49 AM
Teck, just to restate, this has been working a treat. Not sure when Google will update its index (is it the 1st of each month?) but Inktomi and Google in particular have been crawling ALL OVER the archived threads (also seen a few others drop by, but Google's gone nuts).

www.mini2.com/forum or www.mini2.com/forum/archive/index.html

:)

Top marks.

tkeil69575
02-16-2003, 11:17 PM
this is a great script teck - thanks.

one question though. would it be possible to only let admins see the search engines on "who is online", while still showing them as normal guests to users?

tina

glenvw
02-17-2003, 12:22 AM
oops....

Ignorant me. I just upgraded to VB 2.30
Guess what? I screwed up your hack that was working so good:
http://www.yes-its-free.com/vbbs/archive/

Is there an easy fix ( I hope!)

TECK
02-17-2003, 06:47 AM
Ya, run the installer again, it will re-add the templates. :)

glenvw
02-17-2003, 10:56 AM
thank you sir....

Domenico
02-17-2003, 11:52 AM
The forums, subforums and thread headlines get spidered, but the actual posts are not.
Now I can search for the headlines (subjects) in Google and my page comes up on top, but when searching for sentences found in the posts Google gives nothing back because they aren't archived.

Is it just me or is that normal?

TECK
02-19-2003, 02:28 AM
It's "normal"... it will take some time but Google will get them all.

GT2002
02-20-2003, 04:57 PM
googlebot went nuts on my site for almost a week..... wonder when they will update their index???

PennylessZ28
02-21-2003, 01:51 PM
I installed, I submitted, and a week later, my site stats are even higher than before on the search engines. AWESOME

Overgrow
02-21-2003, 03:37 PM
Sorry, I've got the Googlebot busy with a few million appointments at my site :D

Boofo
02-21-2003, 05:49 PM
Overgrow, how do you get the web robot to show up for the Current active users (like on your site)?

TECK
02-21-2003, 10:21 PM
Inphinity released an online hack; it's linked in the first post.

Boofo
02-21-2003, 10:34 PM
Will it work with vBHome Lite?

TECK
02-21-2003, 10:36 PM
Ya, it has nothing to do with vbHL, only the forums.

Boofo
02-21-2003, 11:45 PM
In the following instructions for the addon by inphinity, how would we call this function (and where do we call it from) in the online.php?

## function for user ip address checking
## matches full/part of an ip address
## might be useful for people who dont have a .htaccess file
## or those who want to identify bots who dont supply a valid or a cloaked
## useragent. probably should be called on return of 0 from useragentcheck
## in online.php
##
## i think its unnecessary. also the ip address matching isnt great since php
## cant handle CIDR addresses so either you break the ip address up and match
## values or you use ranges (as below) which will also identify ip outside
## the allocated range
## ie crawler918.com
## http://ws.arin.net/cgi-bin/whois.pl?queryinput=!%20NET-12-148-209-192-1
## 12.148.209.192/26
## /26 is 62 ip addresses identifying 12.148.209. means that you're blocking 254 ip
## address which will exclude non rogue ips.
## ip addresses have a tendency to change and would result in a fairly big list.

function useripaddresscheck( $match_addr, $addr_code )
{
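The body of that function isn't quoted above, but going by the comments it just compares an IP against the full or partial address you pass in and hands back the supplied code on a match; per those comments it would be called from online.php only when useragentcheck() has returned 0 for a guest. A purely hypothetical reconstruction, not the actual attachment:

// Hypothetical reconstruction based on the comments quoted above.
function useripaddresscheck($match_addr, $addr_code)
{
    // assuming the session IP being checked is available in $useripaddress
    // (a made-up variable name; the real addon may pass it differently)
    global $useripaddress;
    // match the full or leading part of the ip address
    if (substr($useripaddress, 0, strlen($match_addr)) == $match_addr)
    {
        return $addr_code;
    }
    return 0;
}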

Schorsch
02-25-2003, 10:49 AM
it works now ( click (http://www.fetter-esel.de/vB/archive/) ) :) but I get an IE6 syntax error when I click on a link.

how can I fix it ?

thanks!

Schorsch

Stu
02-25-2003, 04:13 PM
Thanks Teck,
works a treat, within 12 hours had lots of crawlers scanning the forums :D

All I've got to do now is work out the best way to alter {imagefolder} currently it's set to /forums/images/
and the vBbutton is incorrect. http://www.mysite.com/forums//forums/images//vb_bullet.gif

The DIBB Archive (http://www.thedibb.co.uk/forums/archive/)

All Sorted..

julius
02-26-2003, 01:51 PM
Are html pages zipped as vb pages are?

TECK
02-26-2003, 02:40 PM
If you are asking whether they are compressed with zlib, the answer is YES, if you have it enabled in your vBulletin options.
There is no difference between the actual thread and the .html page.

EvilLS1
02-27-2003, 10:43 PM
Very nice & useful hack. Thank you TECK.

I installed it yesterday and now I'm just waiting on the spiders to visit. :)

Schorsch
02-27-2003, 11:19 PM
Originally posted by EvilLS1
I installed it yesterday and now I'm just waiting on the spiders to visit. :)

how long does it take ?

EvilLS1
02-27-2003, 11:34 PM
Originally posted by Schorsch


how long does it take ?

To install the hack or for the spiders to visit?

To install the hack: 10 minutes or so.

For the spiders to visit: Depends on the search engine I think.

:)

NexDog
03-01-2003, 12:25 PM
Teck,

Would seem you have outdone yourself yet again. :)

We have been working on some heavy SEO in the last 8 weeks and have started to get some great results out of Google. My SEO ace has always said that link popularity and CONTENT were key points in getting good rankings, and as Google is due to dance today, I suddenly thought about this hack of yours and I think it will serve us proud. Our archive:

http://www.hostnexus.com/forum/archive/

I think I will link to it from our [url=http://hostnexus.com/home/sitemap.htm]SiteMap[/url]. Do you think that would be best so the bots get there? I mean, is it pointless linking to it from the forum home page?

TECK
03-01-2003, 02:52 PM
As long as you do the first 3 steps related to forum optimizations, you are ok...
A text link will always help.

Schorsch
03-01-2003, 03:05 PM
Originally posted by EvilLS1
For the spiders to visit: Depends on the search engine I think.


what about google ?

Kars10
03-01-2003, 03:20 PM
Originally posted by Schorsch


what about google ?

...once a month! But you also have to set that in the meta tags. ;)

NexDog
03-01-2003, 11:34 PM
Originally posted by TECK
As long as you do the first 3 steps related to forum optimizations, you are ok...
A text link will always help.
Teck,

I don't get it.

1) Sessionhash - None of the archive's files have session IDs anyway?

Example: http://www.hostnexus.com/forum/showthread/t-3208.html

2) Why do I need to block the crawlers from certain pages, when they can't get in anyway because they will be "sessioned"?

3) Why do we need to link the forum thread to the archive? Aren't we doing this because the bots can't spider our forums? I see the archive thread is linked to the forums though.

Lastly, how do I block the archive from displaying our "Admin and Mod" forums? Sorry for all the questions but could do with some clarification here as Google is due any minute now so everyone get ready for the deep crawl. :)

mheinemann
03-01-2003, 11:46 PM
Originally posted by NexDog

Lastly, how do I block the archive from displaying our "Admin and Mod" forums?

The archive uses forum permissions, so mods/admins will be able to see it, but normal users shouldn't. Try logging out and going to the archive to see if it still shows up for you.

NexDog
03-01-2003, 11:58 PM
Heh, right you are too. What a nonse I am. :D

Okay, feel free to have a stab at my other questions. ;)

NexDog
03-02-2003, 12:21 AM
Okay, added all these mods - makes sense actually. One small refinement was the addition of <smallfont> tags around the "Friendly URL" link.

Still worried about the sessionhash optimisation. Is it really needed as I don't see any sessionhash IDs in the archive....

Originally posted by TECK
Here is another interesting mod I made to my forums, to link back to the archives.
This will help because the crawlers will see the friendly URLs more easily.
Basically the mod links every forum/thread to the archive.
To test it, mouse over each forum or thread icon while viewing my forums (http://www.teckwizards.com/forum/).
Also, check the link I added under the thread title while viewing the actual (not archived) thread.

TEMPLATE: forumhome_forumbit_level1_post
FIND:
<td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
<td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>

TEMPLATE: forumhome_forumbit_level2_post
FIND:
<td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
<td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>

TEMPLATE: forumdisplaybit
FIND:
<td bgcolor="{firstaltcolor}"><img src="{imagesfolder}/$thread[newoldhot].gif" border="0" alt=""></td>
REPLACE WITH:
<td bgcolor="{firstaltcolor}"><a href="showthread/t-$thread[threadid].html"><img src="{imagesfolder}/$thread[newoldhot].gif" border="0" alt="Archive: $thread[title]"></a></td>

TEMPLATE: forumdisplay_forumbit_level1_post
FIND:
<td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
<td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>

TEMPLATE: forumdisplay_forumbit_level2_post
FIND:
<td valign="top"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt=""></td>
REPLACE WITH:
<td valign="top"><a href="forumdisplay/f-$forum[forumid].html"><img src="{imagesfolder}/$forum[onoff].gif" border="0" alt="$forum[title] Archive"></a></td>

TEMPLATE: showthread
FIND:
$navbar
REPLACE WITH:
$navbar<br>
<img border="0" src="{imagesfolder}/firstnew.gif" width="14" height="14" align="middle" alt=""> <a href="showthread/t-$thread[threadid].html">Friendly URL Link</a>

TECK
03-02-2003, 12:32 AM
Originally posted by NexDog
Still worried about the sessionhash optimisation. Is it really needed as I don't see any sessionhash IDs in the archive....
This is not for the archive files but for the actual forums; that's why they are part of the "forum", not "archive", optimizations.
The crawlers will also visit your forums, and they will not like the session hash.

NexDog
03-02-2003, 02:41 AM
So there's no real need to remove the sessionhash? I mean, the bots are spidering the archive instead right?
BTW, inktomi is deep inside my archive right now. :)

Where is the online.php fix?

NexDog
03-02-2003, 02:48 AM
Hmmm, something seems amiss. I have yet to find and implement the online.php hack, so Who's Online is showing me the full URL and it is sessioned, even in the archive. Definitely concerned here, as I thought the archive was supposed to produce unsessioned links.

NexDog
03-02-2003, 02:50 AM
When I browse the archive, I get no sessioned URLs - any ideas on why inktomi is picking them up?

NexDog
03-02-2003, 03:01 AM
Okay, implemented the sessionhash fix. Still unsure why inktomi has sessionhash IDs though.....

Going to look for the online.php fix now. :p

NexDog
03-02-2003, 03:09 AM
Sweet, online.php fixed. Talking to myself. :D

Going out for a bit. Teck, if you see this, still would like to know about the URL seen in screenie...

NexDog
03-02-2003, 01:14 PM
OMG! Thirty googlebots just can't get enough of Nexology. :)

Teck, you are the man, without doubt. :D

Few things though. Have only seen them hit the Archive once or twice. They seem to be casing the forum due to the sessionhash being removed.

I tried the useragent hack (by inphinity) but that just returned:

Fatal error: Call to undefined function: no_sessionhash() in /home/httpd/vhosts/hostnexus.com/httpdocs/forum/global.php on line 296

Not sure if I was doing things wrong, but I removed your extra mod to functions.php:

function no_sessionhash()
{
	global $session;

	$agent = array(
		'crawl',
		'googlebot',
		'gulliver',
		'ia_archiver',
		'internetseer',
		'linkalarm',
		'mercator',
		'openbot',
		'pingalink',
		'psbot',
		'scooter',
		'slurp',
		'slysearch',
		'zeus',
		'zyborg',
		'otheruseragentcrawleryouwant'
	);

	foreach( $agent as $useragent )
	{
		if ( stristr( getenv( 'HTTP_USER_AGENT' ) , $useragent ) )
		{
			$session['sessionhash'] = '';
		}
	}
}


and replaced that with:

function useragentcheck( $match_agent, $agent_code )
{

$agent = array(
'googlebot' => 'www.google.com/|||Google',
'gulliver' => 'www.northernlight.com/|||Northern Light',
'ia_archiver' => 'www.archive.org/|||The Internet Archive',
'internetseer' => 'www.internetseer.com/|||Internet Seer',
'linkalarm' => 'linkalarm.com/|||Link Alarm',
'mercator' => 'www.research.compaq.com/SRC/mercator/|||Mercator',
'openbot' => 'www.openfind.com.tw/|||Openbot',
'pingalink' => 'www.pingalink.com/|||PingALink Monitor',
'psbot' => 'www.picsearch.com/bot.html|||PicSearch',
'scooter' => 'www.altavista.com/|||AltaVista',
'slurp' => 'www.inktomi.com/slurp.html|||Inktomi',
'turnitinbot' => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
'slysearch' => 'www.turnitin.com/robot/crawlerinfo.html|||Turnitin',
'zeus' => 'www.waltbren.com/products/zeus_internet_robot.htm|||Zeus Internet Marketing',
'zyborg' => 'www.wisenutbot.com/|||WiseNut',
'teoma' => 'www.teoma.com/|||Teoma/Ask Jeeves',
'spider' => 'Web Spider',
'spyder' => 'Web Spyder',
'crawl' => 'Web Crawler',
'robot' => 'Web Robot'
);

foreach( $agent as $useragent => $agenturl )
{
if ( preg_match ("/^\d+$/", $useragent) )
{
$useragent = $agenturl;
$agenturl = "Search Engine";
}

if ( preg_match ("/". preg_quote ($useragent) ."/i", $match_agent) )
{
$agentinfo = preg_split ("/\|\|\|/", $agenturl);
if (!($agentinfo[1])) {
$agentinfo[0] = "http://www.robotstxt.org/wc/active.html";
$agentinfo[1] = "Web Robot $useragent";
}

switch ($agent_code) {
case 0:
return 1;
break;
case 1:
return $agentinfo[1];
break;
case 2:
return '</a><a href="http://'. $agentinfo[0] .'" alt="'. $agentinfo[1] .'"><i>'. $agentinfo[1] .'</i>';
break;
}
}
}

}

Or can I just tack inphinity's hack onto the end of functions.php after your no_sessionhash() function?

erdem
03-02-2003, 02:32 PM
hi ...
first of all, great hack, thanks for it TECK ...

i have a visual problem with online.php. i applied the fix that you released and applied the first 3 optimizations ...

but when someone is browsing my archive, online.php displays something like:
-> Unknown Location: /showthread/images/catbg.gif?
-> Unknown Location: /forumdisplay/images/catbg.gif?

any idea? can i fix this?

thanks

NexDog
03-02-2003, 02:35 PM
Teck posted that in his first post:

https://vborg.vbsupport.ru/showthread.php?postid=342218#post342218

TECK
03-02-2003, 02:42 PM
I personally blocked the forums for crawlers and renamed archive.txt to archive.html, because I don't want the crawlers to go to the forums at all.
http://www.teckwizards.com/archive.html

NexDog
03-02-2003, 02:49 PM
Why would you want to keep the bots out your forum? Am I missing something?

NexDog
03-02-2003, 03:00 PM
Google is having too much fun on our forum. Have 43 bots in. 16 Inktomi and 27 Google. :D

I also see a pattern. Inktomi was the first to arrive and they hit the Archive immediately. But those first 3 bots just sat on the Archive index and then went to the forums only and disappeared - they didn't crawl the threads.

Then Google came in and did the same thing on the forums. One bot came in and sat on index for an hour. It came back later with about 10 buddies and just sat on all the forums and disappeared. Now they are back in force and going crazy on all threads and one or two made it into the Archive and are just sitting on the forums but not going deep.

Pretty sure Google will be back on the Archive, just like Inktomi is right now.

TECK
03-02-2003, 03:16 PM
Originally posted by NexDog
Why would you want to keep the bots out your forum? Am I missing something?
Why would I want them to go to the forums? Those URLs are dropped anyway, because they are not friendly.
I want them to go to the archive all the time, because it has the same content as the forums.

NexDog
03-02-2003, 03:18 PM
Ahhhhhhhhh, okay. But you allow them to index.php right?

TECK
03-02-2003, 03:19 PM
No, I blocked the /forum folder completely because my archive is on the main root:
[root] < archive.html
---[forum]

So in your case you want to rename the file to archive.html and block the rest of the files in the /forum folder individually...
If you block the full forum folder, the crawlers will not have access to the archives anymore.

The nicest way to have your threads indexed properly is to install the vbHome (lite) script and its archive add-on; that way, the indexing is done directly on your root. Better, since it's proven that crawlers will index your threads faster when they are closer to the root...
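To illustrate that layout, here is a rough sketch of the robots.txt that would sit at the document root; the folder name and the assumption that archive.html and its friendly URLs live outside /forum are taken from TECK's description, so adjust it to your own install:

User-agent: *
Disallow: /forum/

Everything under /forum/ is then off-limits to the crawlers, while archive.html at the root stays crawlable as their entry point.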

NexDog
03-02-2003, 03:33 PM
Teck - why the need to rename to archive.html? I have 20 Inktomi spiders in there now so don't want to interrupt anything.

Might do on vbHome. Mainly doing this for added content and links and not so worried that everything gets listed in Google...

TECK
03-02-2003, 04:32 PM
I'm telling you what I did, only.
That doesn't mean you have to do the same like me...

NexDog
03-02-2003, 09:21 PM
Hey man, it's your hack so I just want to benefit from your skills. ;)

Woke up this morning to find all bots in the Archive - 30 googlebots doing their thing so me is a happy chappy. :D

Would just like to know the theory behind naming the archive with a .html extension.....

phenom
03-03-2003, 09:09 PM
alright, I'm getting a parse error after I add the add_to_functions.php to admin/functions.php

Any ideas?

phenom
03-03-2003, 10:25 PM
nevermind, a small little tiny typo

erdem
03-04-2003, 04:29 AM
Originally posted by NexDog
Hey man, it's your hack so I just want to benefit from your skills. ;)

Woke up this morning to find all bots in the Archive - 30 googlebots doing their thing so me is a happy chappy. :D

Would just like to know the theory behind naming the archive with a .html extension.....

it seems bots arent interested in visiting my site ;)
still waiting for bot rush but only sometimes googlebot10-11-12 visits my board then leave ;) they cant find archive ... they try to "retrieve password" , "register" and similiar things and they go out after visiting main page of archive finally ...

do i have to put every php file as it listed by TECK to robots.txt ... or "removing all those and allowing bots to visit all those" is better ?

thanks

NexDog
03-04-2003, 08:53 AM
What's your board's URL? I would be happy to take a look for you. And yea, put all php pages in the robots.txt except index.php.
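As a hedged sketch of what that robots.txt could look like when the archive sits inside the forum folder (the script names below are typical vBulletin 2 files, not a definitive list, and the /forum/ prefix is an assumption):

User-agent: *
Disallow: /forum/showthread.php
Disallow: /forum/forumdisplay.php
Disallow: /forum/member.php
Disallow: /forum/memberlist.php
Disallow: /forum/register.php
Disallow: /forum/login.php
Disallow: /forum/search.php
Disallow: /forum/online.php
Disallow: /forum/printthread.php
Disallow: /forum/misc.php
# /forum/index.php and /forum/archive/ remain crawlable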

erdem
03-04-2003, 09:48 AM
Originally posted by NexDog
What's your board's URL? I would be happy to take a look for you. And yea, put all php pages in the robots.txt except index.php.

http://www.trojanforge.net
vbhome is installed as home.php,
via .htaccess the top of the domain is home.php
board index is at: http://www.trojanforge.net/index.php
archive is at: http://www.trojanforge.net/archive/

and about robots.txt: since the bots will just get "no permission" notices, i think they can still browse the other pages via the top and bottom links ... ermm ;)

NexDog
03-04-2003, 12:01 PM
Hmmm, don't know why they aren't finding the archive. I would implement the mod that links the on/off gif to the corresponding forum in the archive.

Dean C
03-04-2003, 06:15 PM
I've installed this 3 times for clients with absolute ease ...... until today ;)

On the server, when I go to rename archive.txt to just 'archive', it disappears from the file list, and upon a refresh it reappears under the old filename. So I thought I'd try uploading it as just 'archive' instead of archive.txt, and then the FTP log says:

550 archive: Not a regular file

I don't know what to do or how to fix it :(

Regards

- miSt

erdem
03-04-2003, 06:46 PM
Originally posted by NexDog
Hmmm, don't know why they aren't finding the archive. I would implement the mod that links the on/off gif to the corresponding forum in the archive.

alright, thanks, i added that also ...
i am thinking of not editing robots.txt ... i think that won't cause anything except "no permission" pages ;) maybe the bots get pissed off and go away ;)

anyway, thanks for the tips ... i will wait for the next check/rush of bots ... i hope they will come ;)

thanks for this great hack ...
greetz

Overgrow
03-06-2003, 06:36 PM
Got a lot of new guests from the archive, but not many are registering? Read this:

https://vborg.vbsupport.ru/showthread.php?s=&threadid=49721

klunderj
03-08-2003, 06:16 PM
I attempted this install. When I go to my archive, it points my forums to .../forumdisplay/f-1.html, etc., and I get the "forum not specified" error.

My question... what process creates these .html files? Should my install have created them, or are they created dynamically?

iceman11111
03-09-2003, 04:22 PM
Great Hack!

Just installed it!

I was banging my head for about 2 hours trying to get it to work. Then I noticed that I had missed the period in front of "htaccess", and bingo, everything is working great!

Thanks
:)

klunderj
03-09-2003, 09:49 PM
in step 3 of the installation, when you are told to rename the files... are we supposed to rename them without any file extension? i.e. forumdisplay rather than forumdisplay.txt, or do they need the .php extension?

I am having problems getting this install to work.. my forums aren't linked to anything... i.e. f-1.html is not found

-James

iceman11111
03-10-2003, 01:54 AM
How long does it usually take before the bots start invading?

JulianD
03-10-2003, 02:40 AM
I think I need to install this hack with some modifications ... Thanks teck.

iceman11111
03-10-2003, 01:18 PM
Originally posted by iceman11111
How long does it usually take before the bots start invading?

No one is crawling on my page :(

How long does it usually take?

limey
03-10-2003, 03:17 PM
You need to submit your archive to Google et al.

iceman11111
03-11-2003, 04:04 PM
Originally posted by iceman11111


No one is crawling on my page :(

How long does it usually take?

Bump!!!

:)

mheinemann
03-11-2003, 04:15 PM
There is no set time. It took google about a month to crawl mine and another month for them to actually show up on google. Others have had theirs crawled days after they installed this hack. Just be patient.

Overgrow
03-11-2003, 05:26 PM
>>You need to submit your archive to google et all.

NO! Don't ever submit anything to Google. It will rank you higher if it finds you on its own. Make sure you get other sites to link to yours and let the spider come naturally. To have your archive listed, make sure it is linked FROM YOUR HOME PAGE. Even a small link is fine.

Thanks, Teck.. Dance, Google, Dance! I'm getting listed..

Schorsch
03-11-2003, 06:17 PM
Originally posted by Overgrow
To have your archive is listed, make sure it is linked FROM YOUR HOME PAGE. Even a small link is fine.

that's good to know. thanks for the tip!

NexDog
03-11-2003, 09:13 PM
I have the GoogleDeepCrawler on our forum right now. I'm wondering if I should add forumdisplay.php and showthread.php to the robots.txt as I don't know what Google will think if it spiders the forum and archive, finding the same content. Could drop one or the other. What has everyone else done with successful listings in Google?

NexDog
03-11-2003, 09:15 PM
Just to let you know that Google's Freshie and DeepCrawler come in on different IPs:

Overgrow
03-11-2003, 09:18 PM
So you're saying DeepCrawlers identify with "crawl#" and FreshCrawlers identify with "crawler#"? Thanks, good info.

>>I'm wondering if I should add forumdisplay.php and showthread.php to the robots.txt as I don't know what Google will think if it spiders the forum and archive, finding the same content.

I don't think so.. while they contain some of the same content, they are in different formats.. which should equal different content to the spider.

NexDog
03-11-2003, 09:25 PM
I see you had success and have your archive set up like mine. Did google's bots hit your forum and archive? Do you have showthread.php listed in your robots.txt?

NexDog
03-11-2003, 09:26 PM
WOW, PR7! What was your page rank before you messed with the archive? Due to our server move and new site, ours dropped to 5 from 6 - not happy. :(

NexDog
03-11-2003, 09:31 PM
Looks like you're getting slammed:

Currently Active Users (977):450 members, 489 guests, and 38 web robots

I take it you have your own server. :D

Overgrow
03-11-2003, 09:35 PM
/me blushes at mention of his 7" PR

Google has just started digesting the archive. I started getting hits in from it a day or two ago so I'm not sure it has affected the PageRank yet. I'm not even sure the archive has been listed fully in Google, we may be in for the short dance right now, to be re-added later.

>>Did google's bots hit your forum and archive? Do you have showthread.php listed in your robots.txt?

Google has always gotten some of the posts since they are linked from the front page, then it finds the forums. It never got into the deepcrawl of the archive though. Funny, as I think I was the first person to release a vB archive hack that lots of people still use successfully... but I never linked mine from my front page so it was never picked up. Only when I saw Teck's was I inspired to get it working and linked.

But oh yea, no, I don't have anything special listed in robots.txt

>>38 web robots, I take it you have your own server.

hehe those robots have been here for weeks. Only in the past few days have I had some really high guest numbers. Usually members / guests is even. Now I have a hundred or more guests on than members thanks to the archive. Hopefully they are getting into the rest of the site as well.

Schorsch
03-11-2003, 10:05 PM
wow Overgrow, this must produce a huuuuge portion of traffic :bunny:

iceman11111
03-11-2003, 10:18 PM
How do you get your link on othere websites?

Overgrow
03-11-2003, 10:19 PM
Well I suppose you could hack into the other websites and put your link up.. or you can email the administrators and ask if they do link exchanges?

NexDog
03-12-2003, 03:23 AM
Had this bot on my forum for days:

argon.oxeo.com

Have no idea what it is. Does anyone know and should I deny it access?

NexDog
03-12-2003, 10:16 AM
Originally posted by Overgrow
So you're saying DeepCrawlers identify with "crawl#" and FreshCrawlers identify with "crawler#"? Thanks, good info.

Sorry, just caught this.

Google's Freshbot comes in on IP range 64.68.82 and the deep crawler that goes out in Google dance time comes in on 216.239.46.

Overgrow
03-13-2003, 01:39 AM
OK there is a point where the archive can work too well. I was getting hundreds of off-topic hits from Google searchers thanks to the spidering of my off-topic chat areas. The last thing I want is to expand the off-topic areas and get more general chatters in, so I added disallows to my robots.txt to the chatting archives and I made the actual archive threads offlimits in showthread.txt.. One question:

Will Google DE-list my pages once they are disallowed there?


I'm getting great results for things like:

Christina Agulera Naked (http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=christina+agulera+naked)
and
Internet Security Procedures (http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=internet+security+procedures)

glenvw
03-13-2003, 01:48 PM
My hat is off to you, TECK!

5600 Pages (threads) indexed by Yahoo and Google.

You da man!

Question...I have to delete threads once in a while that are bad content. When someone lands on a deleted thread from the search hack, they get the page with no images, etc..

Is there a re-direct we could use that sends them somewhere else to fix that?

Thanks again!

Logician
03-18-2003, 07:42 PM
I don't know if this add-on was already provided in the thread or not; my apologies if it was:

This code keeps updating the original thread's view count even when its archived copy is visited:

Edit "showthread" (hack file! not showthread.php!), find:
eval( 'dooutput( "' . gettemplate( 'archive' ) . '" );' );

Before that add:


if ($noshutdownfunc) {
	// shutdown queries are disabled, so update the view count right away
	$DB_site->query("UPDATE thread SET views=views+1 WHERE threadid='$thread[threadid]'");
} else {
	// otherwise queue it as a low-priority query to run at shutdown
	$shutdownqueries[] = "UPDATE LOW_PRIORITY thread SET views=views+1 WHERE threadid='$thread[threadid]'";
}


Enjoy..

TECK
03-21-2003, 10:08 PM
03-11-03 at 11:25 PM NexDog said this in Post #467 (https://vborg.vbsupport.ru/showthread.php?postid=364429#post364429)
I see you had success and have your archive set up like mine. Did google's bots hit your forum and archive? Do you have showthread.php listed in your robots.txt?
Hmmm... Google results (http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=site%3Avbulletin.org+archive) for vBulletin.org Archive:
Results 1 - 10 of about 10,600. Search took 0.04 seconds

Google results (http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=site%3Ateckwizards.com+archive) for TeckWizards.com Archive:
Results 1 - 10 of about 727. Search took 0.06 seconds
(I don't allow the robots to visit any .php forum file and all archive files are still indexed. I have my archive set outside the forum folder.)

TECK
03-21-2003, 10:11 PM
03-13-03 at 03:48 PM glenvw said this in Post #477 (https://vborg.vbsupport.ru/showthread.php?postid=365485#post365485)
My hat is off to you, TECK!

5600 Pages (threads) indexed by Yahoo and Google.

You da man!

Question...I have to delete threads once in a while that are bad content. When someone lands on a deleted thread from the search hack, they get the page with no images, etc..

Is there a re-direct we could use that sends them somewhere else to fix that?

Thanks again!
In your .htaccess file, add this line:
ErrorDocument 404 http://www.yoursite.com/
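One caveat worth noting: when ErrorDocument points at a full URL, Apache answers with a redirect rather than a true 404, so crawlers may keep the deleted thread indexed. A local-path variant (the target page here is just a placeholder) keeps the 404 status while still showing visitors a useful page:

ErrorDocument 404 /index.php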

TECK
03-21-2003, 10:13 PM
03-13-03 at 03:39 AM Overgrow said this in Post #476 (https://vborg.vbsupport.ru/showthread.php?postid=365243#post365243)
OK there is a point where the archive can work too well. I was getting hundreds of off-topic hits from Google searchers thanks to the spidering of my off-topic chat areas. The last thing I want is to expand the off-topic areas and get more general chatters in, so I added disallows to my robots.txt to the chatting archives and I made the actual archive threads offlimits in showthread.txt.. One question:

Will Google DE-list my pages once they are disallowed there?


I'm getting great results for things like:

Christina Agulera Naked (http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=christina+agulera+naked)
and
Internet Security Procedures (http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=internet+security+procedures)
Yes, Google will drop them after a while (2 months) if you block certain archive folders in the robots.txt file. It did for my archive.

VampireMan
03-21-2003, 10:15 PM
I have this hack installed & i must say .. impressed no end.

My target was getting into the top ten of a specific search.

On yahoo.co.uk, a search for "wedding forum" in the UK now ranks us no. 2.

Well cool :-)

Floris
03-21-2003, 10:26 PM
Searched pages from creations.nl for archive. Results 1 - 10 of about 3,540. Search took 0.09 seconds.


and i am not even using your hack anymore, but the one that comes with vB3 :)

ladyfyre
03-22-2003, 01:27 AM
ok...i have a question.....how many pages do Google & inktomi crawl at once? They have been crawling us every day almost all day anywhere from 2-10 instances each for a while.......how long will it take for it to finish?

NexDog
03-22-2003, 08:26 AM
I had Google on our forum for 2 weeks straight.

NexDog
03-22-2003, 08:28 AM
Even though we've been crawled twice on 2 separate dances, our archive still doesn't appear in Google. :(

mheinemann
03-22-2003, 10:07 AM
It takes a while for it to show up in google.

TECK
03-23-2003, 06:02 PM
It took 3 months for the vBulletin.org site to display the 10,000 links.

ladyfyre
03-26-2003, 12:40 AM
hmmmm....well we show up with about 7,500 matches, but they are STILL going....do they do EVERY page? i am not sure they are crawling as fast as the threads get posted :(

keress
03-27-2003, 02:06 AM
I wanted to use this hack, but I can't get mod_mime working on my server. My provider said it seems to be conflicting with mod_php. Anybody know what can be done about this?

TECK
03-27-2003, 12:17 PM
Yesterday at 02:40 AM ladyfyre said this in Post #489 (https://vborg.vbsupport.ru/showthread.php?postid=373392#post373392)
hmmmm....well we show up with about 7,500 matches, but they are STILL going....do they do EVERY page? i am not sure they are crawling as fast as the threads get posted :(
You will have them all crawled... patience. :)

drumsy
03-28-2003, 01:25 PM
I'm experiencing problems with max SQL connections, etc. and Google has been crawling for not days but weeks. Is this normal and is there a way to keep the archives up but "prevent" access (temporarily) for the Googlebots?

TECK
03-28-2003, 06:24 PM
Not really. Google bots will not kill your site; it's like having 30-40 guests visiting your site, which is not a lot...

Kevorkian
03-29-2003, 04:32 AM
htaccess don't work for me :( http://www.animesaga.com/forum/archive.php

NexDog
04-03-2003, 08:08 AM
Works for me and look great it does. :)

Does anyone have a list of search engine spider IPs? Had this bot on our site for a while:

trek30.sv.av.com

Ip range: 216.39.48

You think this is Alta Vista?

Courage
04-04-2003, 02:33 PM
03-29-03 at 08:32 AM Kevorkian said this in Post #494 (https://vborg.vbsupport.ru/showthread.php?postid=374852#post374852)
htaccess don't work for me :( http://www.animesaga.com/forum/archive.php


I had the same problem; try renaming htaccess to .htaccess

Logikos
04-05-2003, 03:21 AM
After installing this hack, i get this error on every page.

Warning: Cannot modify header information - headers already sent by (output started at /home/hazelboy/public_html/Forums/admin/functions.php:4916) in /home/hazelboy/public_html/Forums/admin/functions.php on line 3227


and my members see this

Fatal error: Cannot redeclare archive_nopermission() (previously declared in /home/hazelboy/public_html/global.php:237) in /home/hazelboy/public_html/Forums/admin/functions.php on line 4803


Help :(

Other than that, it works perfectly.

eva2000
04-06-2003, 03:16 AM
well after a while Fastweb/Alltheweb has indexed 46,636 pages from my vB forums here (http://www.alltheweb.com/search?q=%2Burl.all%3Ahttp%3A%2F%2Fanimeboards.com +%2Bsite%3Aanimeboards.com&c=web&cs=utf-8&co=1&no=off&l=any) :D

NexDog
04-06-2003, 06:23 AM
2.3.0 comes with its own archive, right? Wondering about upgrading to 2.3 or 2.2.9... decisions, decisions...

Boofo
04-06-2003, 06:25 AM
That's news to me. Are you sure it comes with it?