
View Full Version : robots.txt Manager


MUG
02-07-2003, 10:00 PM
This script allows you to easily create a dynamically generated robots.txt file, based on specified rules.

If you use this hack, please click 'Install' :)

Screenshots will be attached...

MUG
02-08-2003, 07:00 PM
Control Panel

MUG
02-08-2003, 07:01 PM
Editor (All Robots)

MUG
02-08-2003, 07:01 PM
Editor (Specific Robot)

MUG
02-08-2003, 07:01 PM
Generated File (:banana:)

Neo
02-08-2003, 07:08 PM
Excellent G.

Dean C
02-08-2003, 07:31 PM
Wow very nice!

wooolF[RM]
02-08-2003, 08:04 PM
Very good job, tho' I can edit one txt file by hand :) No offence, again, very good job :)

May I ask you if you have the list with IPs of some major/all search engines? I kinda need it :) Thanx a lot and again, a nice hack!

MUG
02-08-2003, 08:07 PM
The point of the hack is to make administration easier. It keeps track of robots requesting the robots.txt file, allowing you to ban or restrict a bot without having to dig through the server logs. I wrote the hack today, so the only bots included in the database already are the ones that spidered my site during that time. The list is in the .sql file.
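To illustrate the idea (this is a hypothetical sketch, not the hack's actual code — the real table and column names are defined in robots.sql, and vBulletin's database object may be named differently):

```php
<?php
// Sketch of the logging described above: every request for robots.txt
// is recorded, so new spiders appear in the control panel without
// digging through server logs. Names here are illustrative only.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$ip    = $_SERVER['REMOTE_ADDR'];

$DB_site->query("
    INSERT INTO robots_log (useragent, ipaddress, lastvisit)
    VALUES ('" . addslashes($agent) . "', '" . addslashes($ip) . "', " . time() . ")
");
?>
```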

wooolF[RM]
02-08-2003, 08:37 PM
Thanx for the answer :)

djr
02-08-2003, 08:50 PM
Is it supposed to write a new robots.txt file every time, or do the bots see the robots.php file as robots.txt?

If your answer is it's supposed to write a new robots.txt file, it isn't working for me :-(

And: do I still need a robots.txt file?

MUG
02-08-2003, 09:06 PM
Originally posted by djr
Is it supposed to write a new robots.txt file every time, or do the bots see the robots.php file as robots.txt?

If your answer is it's supposed to write a new robots.txt file, it isn't working for me :-(

And: do I still need a robots.txt file?

It uses mod_rewrite to send requests to robots.php. You have to create an .htaccess file with the following:

RewriteEngine on
RewriteRule robots.txt /robots.php

(Note: this is for the old version; read the new install file :))

Upload robots.php to the root web directory (usually public_html). Make sure you run robots.sql using phpMyAdmin. :)
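For reference, a slightly more defensive .htaccess along the same lines — the anchored pattern and [L] flag are my additions, so treat this as a sketch and defer to the install file for your version:

```apache
# Serve robots.txt requests from the PHP script instead of a static file.
RewriteEngine on
# Anchor the pattern so only "robots.txt" itself matches, and stop
# processing further rules once it does.
RewriteRule ^robots\.txt$ /robots.php [L]
```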

djr
02-08-2003, 09:33 PM
I did that already ;)
So the robots are redirected to robots.php which is feeding them a perfectly rendered robots.txt file? Sorry 'bout asking, but I don't want to break my (high) ranking(s).

- djr

MUG
02-08-2003, 10:10 PM
Originally posted by djr
I did that already ;)
So the robots are redirected to robots.php which is feeding them a perfectly rendered robots.txt file? Sorry 'bout asking, but I don't want to break my (high) ranking(s).

- djr

Yup. The only difference is the X-Powered-By header generated automatically by PHP.
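In other words, the script just has to announce itself as plain text, the way a static robots.txt would. A minimal sketch (the rules shown are placeholders; the X-Powered-By header comes from PHP's expose_php setting, which can be disabled in php.ini):

```php
<?php
// robots.php should identify its output as plain text, exactly like
// a static robots.txt file would be served.
header('Content-Type: text/plain');

// Placeholder rules -- the real script builds these from the database.
echo "User-agent: *\n";
echo "Disallow: /admin/\n";
?>
```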

BigCheeze
02-08-2003, 10:16 PM
Thanks! I just installed it. Let's see if I can control those bots a little more!!

SphereX
02-09-2003, 02:09 AM
very nice!

***installs

djr
02-09-2003, 09:51 AM
Hi MUG,

Can you add two more columns, 'Owner' and 'Origin' (or whatever you might want to call them), where we can add the owner and origin of the spider?

For example:

googlebot | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 216.239.46.19 | Google | http://www.google.com | 4 Edit - Delete |


Not every spider describes itself fully. e.g. Mercator-2.0 is one of Altavista's robots, but there's no link to Altavista whatsoever.

Thanks,
- djr

djr
02-09-2003, 09:54 AM
I found some good overviews of spiders here (http://www.robotstxt.org/wc/active/html/index.html) and here (http://www.devmag.net/suchmaschinen/robots_namen.htm). If anyone has more of these lists, please add them to this thread.

Thanks,
- djr

MUG
02-09-2003, 10:24 AM
Originally posted by djr
Hi MUG,

Can you add another column 'Owner' and 'Origin' (or whatever you might want to call it) where we can add the owner and origin of the spider?

For example:

googlebot | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 216.239.46.19 | Google | http://www.google.com | 4 Edit - Delete |


Not every spider describes itself fully. e.g. Mercator-2.0 is one of Altavista's robots, but there's no link to Altavista whatsoever.

Thanks,
- djr

Ooh, thanks. I was wondering what Mercator-2.0 was. :paranoid:

I'll add a description field, but there's not enough room for it to show on the main page so you'll have to click edit to view it.

MUG
02-09-2003, 10:40 AM
Version 1.0 final released. :pirate:

MUG
02-09-2003, 11:26 AM
Can this thread be moved to the Full Releases forum?

Velocd
02-09-2003, 04:31 PM
I have a slight problem with googlebots, and that is they storm my forum by huge numbers. Currently, for example, I have 7 googlebots crawling my forum. That seems purely excessive to me, and I would like to somehow limit the amount of googlebots to maybe 2.

What is the command line for robots.txt to do this? Or maybe there is some other alternate method.

Thanks ;)

MUG
02-09-2003, 04:42 PM
Originally posted by Velocd
I have a slight problem with googlebots, and that is they storm my forum by huge numbers. Currently, for example, I have 7 googlebots crawling my forum. That seems purely excessive to me, and I would like to somehow limit the amount of googlebots to maybe 2.

What is the command line for robots.txt to do this? Or maybe there is some other alternate method.

Thanks ;)

Honestly, I don't think that is possible with robots.txt. If you created something that would dynamically insert text into a robots.txt file based on the number of Googlebots spidering your site, Google might "take the hint" and never come back. :ermm:
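For what it's worth, some crawlers honor a non-standard Crawl-delay directive that spaces out their requests. It doesn't cap the number of simultaneous bots, and Googlebot is not known to obey it, so this is a sketch of a partial workaround rather than a fix:

```
# Non-standard extension: ask a crawler to wait N seconds between requests.
# Honored by some bots (e.g. Yahoo's Slurp); ignored by others.
User-agent: Slurp
Crawl-delay: 10
```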

Velocd
02-10-2003, 01:43 AM
Drat.. :ermm:

Wish it were possible somehow, oh well. My current bandwidth is being consumed quickly by these googlebots, so I guess I'll simply have to restrict them from the threads.

Automated
02-10-2003, 11:40 AM
Originally posted by Velocd
Drat.. :ermm:

Wish it were possible somehow, oh well. My current bandwidth is being consumed quickly by these googlebots, so I guess I'll simply have to restrict them from the threads.

Restricting them from the threads? :confused: What's the point of getting spidered then?

djr
02-11-2003, 08:04 PM
We have two different domains, but only one MySQL-database. Is it possible to place the robots.php on both the domains (and thus using the same tables)?

- djr

djr
02-13-2003, 08:46 AM
Already found it. Just rename the robots_log table to robots_log_domain1, create another one with _domain2, and update the table names in robots.php.
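If it helps anyone doing the same, the idea amounts to something like this (table name assumed to match robots.sql; CREATE TABLE ... LIKE needs a reasonably recent MySQL — on older versions, re-run the CREATE statement from robots.sql with the new name instead):

```sql
-- One log table per domain; robots.php on each site then
-- points at its own table.
RENAME TABLE robots_log TO robots_log_domain1;
CREATE TABLE robots_log_domain2 LIKE robots_log_domain1;
```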

- djr

mheinemann
02-16-2003, 02:48 PM
Installed, works great!

MUG
02-16-2003, 10:54 PM
Glad that you like it. :cool:

Any suggestions? :)

mheinemann
02-17-2003, 12:19 PM
The only suggestion I can think of is being able to import your current robots.txt

I had disallowed "turnitin" and would like to be able to still block them.

MUG
02-17-2003, 04:56 PM
Originally posted by mheinemann
I had disallowed "turnitin" and would like to be able to still block them.

I thought that I already included TurnitinBot in the .sql file?

Originally posted by mheinemann
The only suggestion I can think of is being able to import your current robots.txt

Good idea... it shouldn't be too hard to implement. :)

mheinemann
03-01-2003, 11:47 PM
And maybe being able to manually edit it as well.

stryka
03-09-2003, 01:18 AM
My current robots.txt file is not being overwritten when I click submit.

I made the changes to the .htaccess file... is there anything else I should look at?

Thanx

MUG
04-12-2003, 02:05 PM
1.1 Beta released. It includes the following bug fixes / additions:

1. Strips comments from the generated file (although comments are part of the robots.txt specification, some bots choke on them)
2. Repairs newlines in the generated file (the old version sometimes produced \r\r\n)
3. Cleaner interface for the control panel
4. Several other things I can't remember :confused:
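The newline repair mentioned above (the stray \r\r\n sequences in the generated file) can be done with a single normalization pass. A sketch, not the hack's actual code:

```php
<?php
// Collapse any mix of \r\r\n, \r\n, or a bare \r into a plain \n,
// so the generated robots.txt has consistent line endings.
$output = preg_replace("/\r+\n?/", "\n", $output);
?>
```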

Mickie D
05-29-2003, 12:36 PM
Thanks for this hack, it's very useful and I think it should be a full release :)

I have one problem, and it has nothing to do with your script :)

Where can I find info on which bots I should ban? I never had TurnitinBot banned before... why is that bot bad???

PixelFx
05-29-2003, 04:34 PM
very cool, now I don't need to do this manually all the time ;) thank you for sharing :)

stryka
07-30-2003, 09:00 PM
I get an error after i updated to 1.1

Fatal error: db_connect(): Failed opening required '' (include_path='') in /home/name/public_html/robots.php on line 63

MUG
07-30-2003, 09:38 PM
Did you change $vB_Config_Path to the correct path?

daFish
10-07-2003, 08:06 AM
Great addition.
Are there any plans for a new version?

-Fish

sabret00the
11-04-2003, 07:19 PM
nice little hack this :)

gmarik
11-12-2003, 06:00 PM
.htaccess error

I wanted to have "club/rules":

RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteRule ^club/rules\/?$ /announcement.php?s=&forumid=10

But this does not work; other rewrites work. I could not make it work with forum names either (club/f12, club/f45, taking the IDs).

peterska2
12-01-2003, 01:00 PM
I like this idea but how do I create a .htaccess file?

Is it just an extension like .txt or .php ?

Or is it something completely different?

*blushes* Still a n00b to this *blushes*