The Archive of Official vBulletin Modifications Site. It is not a VB3 engine, just a parsed copy!
Spider Watcher
Author: Mikel Beck (mikel.beck@elite-computing.net)

This hack keeps track of the spiders (search engine robots) that visit your forum. Every time a guest visits a page, the guest's IP address, user agent and the page they visited are logged to the database. When the data is "rolled up", the raw data is collated, the spider's name is determined by comparing the user agent to the data contained in the spiders_vbulletin.xml file, and the number of pages and visits is summarized and written back to the database. In addition, any data from non-bots is removed. As of Beta 10 the roll-up runs once per hour instead of every time somebody views the spider statistics page.

The data is then displayed in an easy-to-read format for your viewing pleasure. If the user viewing the report has permission to view IP addresses, these are displayed as well.

A live version of the report from one of my sites can be seen here: http://www.happyhourpub.com/spiders.php
Also see the attached screenshot for an example.

Revision History:
1.0.0 Beta 1 - 01/05/2006
- Initial Release
1.0.0 Beta 2 - 01/06/2006
- Included templates for spiders.php
- Removed text from templates, added them as phrases
1.0.0 Beta 3 - 01/07/2006
- Split up the display of "known" and "unknown" spiders
1.0.0 Beta 4 - 01/25/2006
- Corrected potential SQL injection issue in plug-in
- Reduced the number of SQL queries required to display statistics
- Corrected date/time display issue
1.0.0 Beta 5 - 02/01/2006
- Reduced the number of SQL queries required to display statistics
1.0.0 Beta 6 - 02/08/2006
- No release
1.0.0 Beta 7 - 02/11/2006
- Corrected issue with "unknown" spiders not being displayed properly
- Added tracking of the type of spider (searchspider, link checker, etc.)
1.0.0 Beta 8 - 02/19/2006
- Changed the display of IP addresses to be a pop-up so they're not all displayed on the main page
- Combined the spiders that have the same name but different user agents
1.0.0 Beta 9 - 03/10/2006
- Changed the display to group similar spiders together (search spiders, http check spiders, etc.)
1.0.0 Beta 10 - 08/08/2006
- Changed how the rollup functions. Instead of rolling up every time somebody views the spider page, it rolls up once per hour.
- Corrected a few bugs here and there, mostly related to removing entries from the database

Installation Instructions
1. Upload spiders.php to the root of your forum.
2. Upload spiders_rollup.php to the includes/cron directory.
3. Import the file product-spiderwatcher.xml using the Manage Products module.
4. Add a link to spiders.php on your navbar or footer.
5. Add a cron job with the following information:
Title: Spider Watcher Rollup
Day of the Week: *
Day of the Month: *
Hour: *
Minute: 0 - - -
Log entries: Yes
Filename: ./includes/cron/spiders_rollup.php

Upgrade Instructions
1. Upload (and overwrite) spiders.php to the root of your forum.
2. Upload spiders_rollup.php to the includes/cron directory.
3. Import the file product-spiderwatcher.xml using the Manage Products module. Make sure the "Allow Overwrite" option is set to "Yes".
4. Add a link to spiders.php on your navbar or footer.
5. Add a cron job with the following information:
Title: Spider Watcher Rollup
Day of the Week: *
Day of the Month: *
Hour: *
Minute: 0 - - -
Log entries: Yes
Filename: ./includes/cron/spiders_rollup.php

***UPGRADE NOTE***
When you upgrade from version 1.0.0 Beta 7 to 1.0.0 Beta 8, your existing spider data will be lost!

To make sure that you can decode the maximum number of spiders, you should grab the latest spiderlist.xml and replace the spiders_vbulletin.xml file in your forumhome/includes/xml/ directory with the one from this thread: http://www.vbulletin.com/forum/showthread.php?t=76662
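The roll-up described in the post above can be sketched roughly as follows. This is an illustration in Python, not the hack's actual PHP code: the `KNOWN_SPIDERS` list is a hypothetical stand-in for the entries in the spiders XML file, the function names and row layout are made up for the example, and it simply drops anything it cannot match (how the hack itself separates "unknown" spiders from ordinary guests is not stated in the post).

```python
# Hypothetical stand-in for the entries in the spiders XML file:
# (substring to look for in the user agent, display name, spider type).
KNOWN_SPIDERS = [
    ("googlebot", "Google", "searchspider"),
    ("slurp", "Yahoo! Slurp", "searchspider"),
    ("w3c_validator", "W3C Validator", "link checker"),
]


def identify(user_agent):
    """Return (name, type) for a known spider, or None for anything else."""
    ua = user_agent.lower()
    for ident, name, kind in KNOWN_SPIDERS:
        if ident in ua:
            return name, kind
    return None


def roll_up(raw_hits):
    """Collapse raw guest hits into one summary row per spider name.

    raw_hits is an iterable of (ip, user_agent, page) tuples. Hits whose
    user agent matches no known spider are dropped, which is a
    simplification of the "non-bot data is removed" step.
    """
    summary = {}
    for ip, user_agent, page in raw_hits:
        match = identify(user_agent)
        if match is None:
            continue
        name, kind = match
        row = summary.setdefault(name, {"type": kind, "pages": 0, "ips": set()})
        row["pages"] += 1      # one logged page view
        row["ips"].add(ip)     # distinct addresses seen for this spider
    return summary
```

Because rows are keyed by spider name rather than by user agent, two user agents that map to the same name end up in one row, which is the behaviour Beta 8 describes.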
Comments
#292
Good hack, installed.
#293
What does everyone recommend I do with this nasty beast:
Unknown Spider
Mozilla/5.0 (000000000; 0; 000 000 00 0 000000; 00000; 0000000000) 00000000000000 000000000000000
82.48.249.204
82.56.186.64
Is there an ID for it? If it won't identify itself, should I just start blocking the IPs? Can I restrict them in my robots.txt? Thanks!
#294
Personally I'm about to remove this hack... it has high queries, and it sucks that they don't update...
#295
But won't the high number of queries only affect the server when you load the spider page? I only allow admins to load it, so people shouldn't be hitting it all the time.
#296
Yeah, they affect the server load... I'm removing it since I don't think it will be updated anytime soon... if it is, then I may reinstall, but as of right now it sucks editing from SQL or having 500 unknown bots.
#297
But it only affects the server when you actually load the page, right? It's not an ongoing thing with cron or anything, is it?
#298
It's not "stupid" that it doesn't update; the installation instructions say to update your spiders xml file BEFORE you install this. If you do that, then you won't have anything to update, as it'd be using the latest data.
The only time there should be a lot of queries is when it hasn't "rolled up" the data recently. After that first hit the number of queries should drop.
Look, people, I wrote this thing for myself. I posted it here thinking that other people may benefit from it. If you want to use it, use it. If you don't, then don't. But don't go bashing it or my code. If you think you can do better, then write your own. I will update it when I can; right now my top priority is finding a new job so I can support my family. Updating the freebie hacks that I write when the mood strikes me isn't even in the top 10 of my list of priorities right now.
{edit} One other thing... The first version that I put out deciphered the spider data when the user viewed the spiders page. If the spiders xml file had been updated after the data was collected, it would be interpreted properly. But that version had way too many queries and everybody complained about it. Now the spider data is decoded when the spider visits and it's written to the database. The spiders page just collates all of that data and doesn't do any deciphering.
#299
This hack is great, bro. The only real issue I see with it is that sometimes the date and time don't get updated on a fresh spider visit to the forum. Sometimes the date reflects the true last visit and sometimes it doesn't. That is about the only thing I wish could be fixed. Other than that, this hack is excellent, and don't worry about harsh opinions of it. I can't wait to see the next update.
#300
Mike,
I appreciate this hack; it gives me good visibility into what the spiders are actually doing. I'm not a very good coder, so I can't improve on your work. My previous message was just stating that 58 queries is an extraordinarily large number for such a simple page. It was my impression that the data would be regenerated only when the spider page was loaded; now I realize that it's doing a lot of work each time a spider hits the site. Thanks again.
#301
Great hack, but there is a very critical problem, for me at least, when viewing the page:
Page generated in 0.59078 seconds with 50 queries [Server Loads: 0.41 0.22 : 0.10]
That's quite a lot of queries. The more spiders it fetches, the more queries this adds.
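Nothing in the thread says where the 50 queries come from. If the page runs one query per spider (an assumption, not something the author confirms), the usual remedy is a single grouped query, so the query count stays flat however many spiders are listed. A minimal sketch in Python with SQLite; the `spider_hits` table, its columns and the sample rows are all hypothetical and are not the hack's real schema:

```python
import sqlite3


def spider_summary(conn):
    """Fetch per-spider page counts and last visit time in one query."""
    return conn.execute(
        "SELECT spider_name, COUNT(*) AS pages, MAX(visited_at) AS last_visit "
        "FROM spider_hits GROUP BY spider_name ORDER BY spider_name"
    ).fetchall()


# Demo with an in-memory database; table, columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spider_hits (spider_name TEXT, page TEXT, visited_at INTEGER)"
)
conn.executemany(
    "INSERT INTO spider_hits VALUES (?, ?, ?)",
    [
        ("Google", "/index.php", 100),
        ("Google", "/showthread.php", 200),
        ("Yahoo! Slurp", "/index.php", 150),
    ],
)

for name, pages, last_visit in spider_summary(conn):
    print(name, pages, last_visit)
```

Adding more spiders to the table adds rows to the result but not queries to the page.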