
crawl-0c.cuil.com


alaska_av8r
01-19-2010, 03:02 AM
Is anyone else having problems with this search engine crawler? I am assuming that's what it is; I did a Google search on the name and came up with cuil.com.

This thing is constantly on my site, and I keep getting MySQL database errors from it. The IP address listed below is registered to crawl-0c.cuil.com.


Database error in vBulletin :

mysql_connect() [function.mysql-connect]: Lost connection to MySQL server at 'reading initial communication packet', system error: 111
/home/marin49/public_html/includes/class_core.php on line 312

MySQL Error :
Error Number :
Request Date : Monday, January 18th 2010 @ 11:28:16 AM
Error Date : Monday, January 18th 2010 @ 11:28:16 AM
Script : http://www.boatinghowto.com/external.php?type=RSS2&forumids=38
Referrer :
IP Address : 216.129.119.10
Username :
Classname : vB_Database
MySQL Version :

BSMedia
01-19-2010, 04:39 AM
Create a robots.txt file and place the following lines in it. If you already have a robots.txt, simply append them to the end.

User-Agent: twiceler
Crawl-delay: 30

You can also completely block the crawler, since their search sucks anyway, but that's up to you.
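
For anyone who wants to sanity-check the rules before uploading them, here is a rough sketch using Python's standard urllib.robotparser. It shows how a rule-following crawler would read the delay rule, next to a full block. The "Disallow: /" variant is the usual way to shut a crawler out completely and is my assumption here, not something taken from Cuil's documentation. The helper name load and the test URL (copied from the error report) are only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Option 1 (the rules posted above): ask the crawler to slow down.
DELAY_RULES = """\
User-agent: twiceler
Crawl-delay: 30
"""

# Option 2 (assumed form): block the crawler from the whole site.
BLOCK_RULES = """\
User-agent: twiceler
Disallow: /
"""


def load(rules: str) -> RobotFileParser:
    """Parse robots.txt text the way a rule-following crawler would."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


delay = load(DELAY_RULES)
block = load(BLOCK_RULES)

# URL taken from the error report earlier in the thread.
url = "http://www.boatinghowto.com/external.php?type=RSS2&forumids=38"

print("delay rule, crawl delay:", delay.crawl_delay("twiceler"))
print("delay rule, may fetch:", delay.can_fetch("twiceler", url))
print("block rule, may fetch:", block.can_fetch("twiceler", url))
print("block rule, other bots may fetch:", block.can_fetch("Googlebot", url))
```

Keep in mind this only shows what the file asks for. Crawl-delay is a non-standard directive, so whether a given crawler honors it is up to the crawler.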

alaska_av8r
01-19-2010, 09:09 PM
Thanks BSMedia, I may just go ahead and block them. Just curious, though: what does the robots.txt file actually do? I am new to this, so excuse me if it is a dumb question...lol

tim

BSMedia
01-20-2010, 12:17 AM
http://robotstxt.org

Robots.txt is a file that reputable search engine spiders read as a set of instructions: which directories they may or may not crawl, how long to wait between requests to pages, etc. It is only a request, so badly behaved bots can simply ignore it.

I forgot to mention it needs to be placed in your top level directory, so you'll access it from domain.com/robots.txt or subdomain.domain.com/robots.txt
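
To illustrate the placement rule, here is a small sketch of how a crawler works out where to look: it keeps the scheme and host of whatever page it wants and swaps the path for /robots.txt. The function name robots_url is made up for this example, and real crawlers may differ in the details.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the original path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.boatinghowto.com/external.php?type=RSS2&forumids=38"))
print(robots_url("http://subdomain.domain.com/forum/index.php"))
```

This is why a robots.txt sitting in a subfolder does nothing, and why each subdomain needs its own copy.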

alaska_av8r
01-20-2010, 03:28 AM
Thank you, I will do a Google search on that, see what I need to put in there, and learn all the ins and outs.