Image Status Checker / Dead Image Finder


bairy
08-15-2006, 10:00 PM
Note: this hack works with vB 3.6.


What does this do?
It scans all your posts, extracts all the img tags, and scans each of the images to see if they're still valid.


Why?
I had a look at all the images on my site and was alarmed at how many were now gone - deleted from photobucket accounts etc. Since the only way you can check the images on your board is to manually read every post, I decided to come up with a better way... and this is it.


How does it work?
The first part: In the AdminCP, under Maintenance > Update Counters, right at the bottom, is this hack. It works by looking up every img tag, requesting the image, and reading the HTTP status code. So code 200 means 'image ok', 404/410 means 'image gone', etc. That then gets stored in a database table. A server has 15 seconds to reply to the request or the status is labelled as "Unknown".
The second part: The browsing element, imagestatuscheck.php (original filename huh!). This allows you to browse all the images found in the last scan using some powerful filtering (statuses to display, search, order by).
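A rough sketch of the status-reading step, written in Python rather than the hack's own PHP. The function names and the exact mapping are assumptions for illustration: only 200 (ok) and 404/410 (gone) come from the description above, and everything else is treated here as Unknown, which may not match the hack's real lists.

```python
import re

# Assumed mapping, based only on the examples in the post:
# 200 is "image ok", 404/410 are "image gone", anything else counts as Unknown.
WORKING = {200}
DEAD = {404, 410}

# The first line of a reply is expected to look like "HTTP/1.x NNN ...".
STATUS_LINE = re.compile(r"^HTTP/1\.\d (\d{3})")


def parse_status(first_line):
    """Return the numeric status code, or 0 if the line is not 'HTTP/1.x NNN'."""
    match = STATUS_LINE.match(first_line)
    return int(match.group(1)) if match else 0


def describe(code):
    """Collapse a status code into Working, Dead or Unknown."""
    if code in WORKING:
        return "Working"
    if code in DEAD:
        return "Dead"
    return "Unknown"
```

A reply that times out or does not start with a status line would end up as code 0 here, and so as Unknown.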


Hack features

General
Fully phrased.
Templates are grouped. Who's online handled.
Part 1 - Admin
Reads the post table, scans all the [img] tags on demand and records the actual http status code returned.
If it gets stuck during the scan, you can restart the section it's currently doing.
If an image appears in more than one post, it's only checked once.
Start from, per page and timeout options for scanning.
Part 2 - Browser
Status codes are put into one of three descriptions for simplicity: Working, Dead, Unknown. Unknown is if the server didn't respond or similar - on the basis that a temporary timeout doesn't necessarily mean the image has gone.
In the browser, image urls are force wrapped. Unless people post using all caps, you have a low screen resolution, or the font size is big, the table should never stretch.
Filtering allows you to show just the working/dead/unknown images, and there's a search facility for a variety of fields.
Convenient link to edit the post (if a dead link is found). This works by can_moderate - edit links only appear for people who own the post, or can moderate the forum it's in.
Works by canview - if someone can't view a particular forum (e.g. staff forum) normally, they can't view the images within it.
Uses css for common stuff to reduce the size of the outputted pages.
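The "read the post table, pull out the [img] tags, check each image only once" idea could look roughly like this in Python. This is an illustrative sketch, not the hack's PHP code; the regular expression, the function name and the (postid, pagetext) input shape are assumptions.

```python
import re

# Matches [img]...[/img] case-insensitively; the URL is whatever sits between the tags.
IMG_TAG = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL)


def unique_image_urls(posts):
    """Yield (postid, url) once per distinct URL, in first-seen order.

    `posts` is an iterable of (postid, pagetext) pairs.
    """
    seen = set()
    for postid, pagetext in posts:
        for url in IMG_TAG.findall(pagetext):
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                yield postid, url
```

An image linked from two posts is yielded only for the first post it appears in, which mirrors the "only checked once" behaviour described above.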



Bad Things
It's far from a perfect hack and there are many things left to do. Please be aware that I won't be doing them, but if anyone else wants a crack, feel free!

Only supports http://, not https://
Can only handle replies whose first line looks like: HTTP/1.x 200
Only supports [img] tags. If you have HTML turned on in any forums it won't see <img src=> images.
Biggie: There's no way to update a single post or image without a full re-scan. That means if someone edits their post to update or remove a dead link, it will not change on the browser until a full re-scan is done. I did play with various update methods but most are flawed in one way or another. A planned feature will be to update the table dynamically whenever a post is made, edited or deleted, and on demand using a link.
No cron job.
No session variables. (People without cookies will be logged out a lot).



Footnotes
Originally I planned to throw something together quickly just for me to use but it turned into a "I may as well make a nice interface... oh and I may as well put some filtering controls in and I ..."


A [url] link checker can be found here


Installation
Upload imagestatuscheck.php to your vB directory. Install the product, set overwrite to yes.


Customizing

By default it's set to only allow moderators, super-moderators and administrators to view the browser. This can be changed with the setting in AdminCP > vB Options.
The phrases all start with ics_ if you want to change them.
You can add a link to imagestatuscheck.php on the navbar (or anywhere) if you want your members to be able to view it.



Screenies
Shot 1 is AdminCP during scan
Shot 2 is a typical Browser section output
Shot 3 is no results output


Changelog
See attached file for specific changes.
1.00 - 16th August 06
1.01 - 17th August 06
1.02 - 27th December 06

ChrisSy
08-16-2006, 02:27 PM
Looks like a very well made hack, and I don't mean to offend you, but I'm a bit unsure of its use. Once you've found the posts missing images, then what?

Is it possible to include a feature that scans threads for off-site linked images and then backs the images up into a folder on your server.

That way you can restore them when the img uploader sites decide to delete them.

bairy
08-16-2006, 03:05 PM
Looks like a very well made hack, and I don't mean to offend you, but I'm a bit unsure of its use. Once you've found the posts missing images, then what?
Whatever you like. All this script does is tell you whether images linked in posts are working or not. If not, you (or the post owner) can edit the post to either update the link or delete it.

Is it possible to include a feature that scans threads for off-site linked images and then backs the images up into a folder on your server.
I should think so but it's not something I'll be developing.

Jay...
08-16-2006, 04:35 PM
Is there any way this can be done for all links? That's what I am looking for.

bairy
08-16-2006, 05:04 PM
I'll probably knock one out for [url=] at some point, the code won't be too different.

Jay...
08-16-2006, 05:13 PM
I'll probably knock one out for [url=] at some point, the code won't be too different.

Nice one. If I press install, will you be keeping us updated?

ntock
08-16-2006, 06:11 PM
Looks cool, I'd install if it'd replace all dead images with an image stored on your server which looks like "3rd party image not hosted anymore." etc. Great work though :)

Gryphon
08-16-2006, 06:49 PM
Get an error on scan. Found the offending post, but you might want to account for the odd duck who tries to post weird urls.

Also got an error when someone said [img] in their post and then later put an existing [img]http://img.jp[*/img], it tried to insert the following into the database: in their post and then later put an existing [img]http://img.jp[*/img]

Database error in vBulletin 3.6.0:

Invalid SQL:
INSERT INTO vb3_imagestatus VALUES (NULL, 87423, 1510, 'javascript:ShowLarge('/path/to/image.jpg');', '');

MySQL Error : You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near '');', '')' at line 1


and

Invalid SQL:
INSERT INTO vb3_imagestatus VALUES (NULL, 99805, 63, 'http://fakemeit'sprobablyaredXyoudope.jpg', '');

MySQL Error : You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'sprobablyaredXyoudope.jpg', '')' at line 1

bairy
08-16-2006, 06:56 PM
Jay... : yes
ntock : good suggestion.. though I'd rather leave the original url in so it can be corrected by the post owner if it's just been moved.
Blackjack : Looks like I forgot to escape the string to account for those dodgy urls. A job for the next release.

Gryphon
08-16-2006, 07:03 PM
There was also another issue, I edited my post.

Mr Chad
08-16-2006, 07:03 PM
Wouldn't this use a lot of bandwidth?

rmxs
08-16-2006, 07:03 PM
Thanks installed :)

rmxs
08-16-2006, 07:07 PM
OK, I tried it and it works, but I get many links with Unknown status.

Why does this happen?

Can you tell me how I can also add it to the navbar for the moderator, admin and super moderator groups only?
I mean, if the user isn't in group 5, 6 or 7, don't show the link.

EDIT:

OK, I did it. It's easy, LOL.

<if condition="$bbuserinfo[usergroupid] == 6">

<td class="vbmenu_control"><a href="imagestatuscheck.php">DIF</a></td>

</if>

bairy
08-16-2006, 07:51 PM
Chad : Each image is requested one by one and only the first 12 characters of the response are read, as they are the ones with the status code in them. After that the connection is closed. Theoretically it will send about 200 bytes and receive 12 bytes per request. In practice I don't know exactly how web servers behave, but I suspect once PHP has closed the connection to the other server the transfer will stop. So no, not much bandwidth.
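To illustrate the "read 12 characters, then hang up" approach, here is a minimal Python sketch (the hack itself is PHP, and its actual request code is not shown in this thread). The function names are hypothetical, and the request is a bare HTTP/1.0 GET over plain http:// with a 15 second timeout.

```python
import socket
from urllib.parse import urlsplit


def read_prefix(sock, nbytes=12):
    """Read up to nbytes from a connected socket and return them as text."""
    data = b""
    while len(data) < nbytes:
        chunk = sock.recv(nbytes - len(data))
        if not chunk:  # the peer closed the connection early
            break
        data += chunk
    return data.decode("ascii", errors="replace")


def fetch_status_prefix(url, timeout=15.0):
    """Request url over plain http:// and return the first 12 characters of the reply."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"
    address = (parts.hostname, parts.port or 80)
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(request.encode("ascii", errors="replace"))
        return read_prefix(sock)  # the socket is closed on leaving the with-block
```

Twelve characters is exactly enough for a status line prefix such as `HTTP/1.1 404`. How much of the image the remote server has already pushed into the network before the close arrives is outside the client's control.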

ForYou
08-16-2006, 08:23 PM
Hello ,

there is error ,

Database error in vBulletin 3.6.0:

Invalid SQL:
INSERT INTO imagestatus VALUES (NULL, 172959, 3498, 'http://www.dohaeye.com/lyrics/3'air%20elnass.jpg', '');

MySQL Error : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'air%20elnass.jpg', '')' at line 1
Error Number : 1064
Date : Wednesday, August 16th 2006 @ 09:21:55 PM
Script : http://www.dha.net/moda/admincp/misc.php?do=check_image_status&cis=findem
Referrer : http://www.dha.net/moda/admincp/misc.php?do=check_image_status
IP Address : 213.6.1.100
Username : Admin
Classname : vb_database

bairy
08-16-2006, 10:33 PM
The sql errors are because the image url isn't escaped. Silly oversight. I'll probably get an updated product out tomorrow along with a couple of other changes.
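For readers wondering why an apostrophe breaks the INSERT: the URL is pasted straight into the SQL string, so the quote inside it ends the string early. The sketch below shows the general remedy using parameter binding with Python's sqlite3 module. It is a stand-in technique, not the hack's PHP/MySQL code (which, per the post above, was to be fixed by escaping the string); the column names are taken from the SELECT query quoted elsewhere in this thread, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imagestatus ("
    "id INTEGER PRIMARY KEY, postid INTEGER, userid INTEGER, "
    "imageurl TEXT, status TEXT)"
)


def record_image(conn, postid, userid, imageurl, status=""):
    """Insert one row; values are bound as parameters, so quotes in the URL are harmless."""
    conn.execute(
        "INSERT INTO imagestatus (postid, userid, imageurl, status) "
        "VALUES (?, ?, ?, ?)",
        (postid, userid, imageurl, status),
    )


# One of the URLs from the error reports above, apostrophe included.
record_image(conn, 99805, 63, "http://fakemeit'sprobablyaredXyoudope.jpg")
```

The same row that produced error 1064 above is stored without complaint, because the URL never becomes part of the SQL text.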

EasyTarget
08-16-2006, 10:58 PM
On a related note, what about scanning all posts for img tags, rehosting all remote images at a local location and editing the posts with the new URL so that you don't have to worry about the images going away?

bairy
08-17-2006, 06:42 AM
OK, I tried it and it works, but I get many links with Unknown status.

Why does this happen?
Sorry. Missed this question.
Unknown generally means one of two things:
1. Server didn't reply within 15 seconds
2. Server didn't send a nice http header back


On a related note, what about scanning all posts for img tags, rehosting all remote images at a local location and editing the posts with the new URL so that you don't have to worry about the images going away?
Copyright issues. The idea is quite good though. I might include a way to allow you to manually do that.

rmxs
08-17-2006, 02:11 PM
Thanks bairy

Mr Chad
08-17-2006, 05:21 PM
Chad : Each image is requested one by one and only the first 12 characters of the response are read, as they are the ones with the status code in them. After that the connection is closed. Theoretically it will send about 200 bytes and receive 12 bytes per request. In practice I don't know exactly how web servers behave, but I suspect once PHP has closed the connection to the other server the transfer will stop. So no, not much bandwidth.
Ahh thanks, that clears it up. Good job coding it.

bairy
08-17-2006, 06:32 PM
Updated to 1.01 to clear up the early bugs and improve a few things:

- Misc: Install code creates empty db table
- Misc: Corrected silly oversight to reduce db errors (escaping image urls)
- Scanner: Added options to maintenance section
- Scanner: Rewrote quite a bit of the code to work with the new options
- Browser: Added "you haven't scanned yet" warning if the table is missing (unlikely but best to be handled)
- Browser: isc_no_results template wasn't included in the 1.00 product for some reason. It is now and is used when there are no results
- Browser: Added a perpage, lower limit 5, upper limit 100. Outside these and it defaults to 30

Reupload imagestatuscheck.php. Reimport the product xml with overwrite set to yes.
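The per-page rule from the changelog above (5 to 100, otherwise 30) amounts to something like the following Python sketch. The function name is hypothetical, and treating the limits as inclusive is an assumption.

```python
def normalise_perpage(value, lower=5, upper=100, default=30):
    """Return value if it lies within [lower, upper]; otherwise fall back to default."""
    try:
        value = int(value)
    except (TypeError, ValueError):
        return default
    return value if lower <= value <= upper else default
```

Note that an out-of-range value is not clamped to the nearest limit; it falls back to the default, as the changelog describes.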

Snatch
08-18-2006, 05:17 AM
If I click on "Search/Filter" it blinks and then shows me the start screen of imagestatuscheck.php, but no results.

What is wrong?

GreeTz
Snatch

bairy
08-18-2006, 07:21 AM
Have you run a scan first?
If so, how many images did it scan?

Snatch
08-18-2006, 07:43 AM
LoL sorry, my fault.

GreeTz
Snatch

Snatch
08-18-2006, 07:50 AM
2 more questions.

1:
I have now run the process to find dead images.
But when I go to the .php file I get this error:

Database error in vBulletin 3.6.0:

Invalid SQL:

SELECT
i.id AS iid, i.postid, i.userid, i.imageurl, i.status,
u.username,
t.forumid, t.title AS threadtitle,
f.title AS forumtitle
FROM imagestatus i
LEFT JOIN user u ON (i.userid = u.userid)
LEFT JOIN post p ON (i.postid = p.postid)
LEFT JOIN thread t ON (p.threadid = t.threadid)
LEFT JOIN forum f ON (t.forumid = f.forumid)
WHERE i.`status` IN (401,402,403,404,405,406,407,409,410,411,412,413,414,415,416,417,000,0,100,101,201,202,203,204,205,300,301,303,305,400,408,500,501,502,503,504,505)
AND t.forumid NOT IN (0)

ORDER BY u.username asc
LIMIT 0, 30;

MySQL Error : Got error 28 from storage engine
Error Number : 1030
Date : Friday, August 18th 2006 @ 10:48:58 AM
Script : http://www.celebritymarkt.de/imagestatuscheck.php
Referrer :
IP Address :
Username :
Classname : vb_database


Or is it that I can only use the PHP file once the scan has finished?
562,783 images remaining Muhahaha

2:
What does the text "duplicate / dealt with" after the image URL mean?
See attachment!
The first 2 pages are OK, but after that there is only "duplicate / dealt with".

GreeTz
Snatch

bairy
08-18-2006, 09:54 AM
Error code 28 means no more space left. Either the hard drive ran out of space or you reached your allowed disk space.
If you really have 562k images, and I believe you do, then that's not really a surprise, as the script creates a new table with all the images in it. I have 1300 images and it takes up about 170 KB. So multiplying it up, there's probably a table size of 70 MB or so.
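The "multiplying it up" step, spelled out as a small Python calculation. It assumes the table grows linearly with the number of images, which is only a rough approximation since URL lengths vary; the function name is made up for the example.

```python
def estimate_table_kb(images, sample_images=1300, sample_kb=170):
    """Scale the observed table size linearly with the number of images."""
    return images * sample_kb / sample_images


size_kb = estimate_table_kb(562_783)
print(f"{size_kb / 1024:.0f} MB")
```

With the figures from the two posts this comes out at roughly 72 MB, in line with the "70 MB or so" estimate.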

However it obviously managed to get some images in at least.

The duplicate/dealt with message comes up because:
Let's say you have one image and it's been linked in 2 posts. There's no point scanning the same image twice since one scan will tell us if it's valid. Therefore it's scanned once, and if the image comes up again it's counted as 'duplicate' or 'dealt with' (they mean the same thing in this case).
Another reason is if you resume a scan (not restart it), as it will already have scanned some of the images and they'll be classed as "dealt with".
If you have a lot of images saying that then it could be because you're doing another scan but not from the start, or it could be related to the error 28, depending what got inserted and what didn't.

osso12
10-26-2006, 04:04 AM
Does this work with VB 3.6.2?
If so, every time I run the scanner and then run statuscheck.php,
I get:
You haven't run the scanner yet. You will find it in the Admin Control Panel under Maintainance -> Update Counters, at the bottom.
Non-admins don't see this message.
Tried a hundred times, but keeps doing the same thing.:down:

image status checker in vb options: 5,6,7
I need to get this to work.
Please someone help.

bchertov
12-18-2006, 03:32 PM
{I first posted my query in the URL checker thread}

Hi,

I have a custom HTML Daily Digest that includes images that are inserted using {IMG} tags. I want to prevent images from forcing the Digest to be too wide because they are over 750 pixels wide. I can resize it in the digest if I know the image is too wide. So I'm looking for some code that will tell me how wide an {IMG} is. Can this hack help me? Can you help me?

Thanks!
Barry

bairy
12-18-2006, 04:47 PM
Ahhh now I see.

I've just realised that basically, no.
I think that in order to get the dimensions of an image, the server would have to fully download it and then analyse it as the information isn't included in the http headers. That would drain the destination server's bandwidth and take a lot longer.

My only real suggestion is to load up the images you want to include in a web browser, right click them and click properties, and see the dimensions there.
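One hedge on the point about dimensions: they are indeed not in the HTTP headers, but some image formats store them near the start of the file, so a partial read can be enough. As an illustration for PNG only (other formats differ, and JPEG in particular can need more of the file), here is a Python sketch that reads width and height from the first 24 bytes. The function name is hypothetical and none of this is part of the hack.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(head):
    """Return (width, height) from the first 24 bytes of a PNG, or None if not a PNG."""
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return None
    # Width and height are big-endian 32-bit integers right after the IHDR tag.
    return struct.unpack(">II", head[16:24])
```

Fetching just those first bytes from a remote server is a separate step not shown here.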

bchertov
12-19-2006, 03:18 PM
My only real suggestion is to load up the images you want to include in a web browser, right click them and click properties, and see the dimensions there.
Thanks, but I was trying to find some automated way of doing this. I guess I'll check the image resizing hacks to see how they do it. Thanks anyway.

mauro1947
12-19-2006, 03:19 PM
Hi!
Does this mod work on vBulletin 3.6.4?
Thanks
Bye!

bairy
12-19-2006, 03:45 PM
It works in 3.6.0 and doesn't rely on much vb code, so I would say yes it'll be fine in 3.6.4

Hornstar
12-19-2006, 11:49 PM
Nice work, this looks like something I would need as I have lots of images.

Bounce
12-20-2006, 10:37 PM
It works in 3.6.0 and doesn't rely on much vb code, so I would say yes it'll be fine in 3.6.4

All I get is no images found :(

3.6.4

Run the scan in maintenance: There are a total of 8,057 images.

my link (http://www.hibeesbounce.com/forum/imagestatuscheck.php) above has been run

I've removed 2 images to try it :(

bairy
12-21-2006, 07:41 AM
Did you actually run the scan (looks like screenshot 1 in the first post), or just get as far as the image count screen?

Bounce
12-21-2006, 12:56 PM
Did you actually run the scan (looks like screenshot 1 in the first post), or just get as far as the image count screen?

I ran the scan in AdminCP / Maintenance: there were a total of 8,057 images.

But when I went to the link, no images were found.

Thanks

bairy
12-23-2006, 08:31 AM
Hmm,
Could you go into "Execute SQL Query", just underneath "Update Counters" and run:
select * from imagestatus
On the next page it'll say Results: x

I'm interested in what number x is.

Bounce
12-23-2006, 10:30 AM
On the next page it'll say Results: x

I'm interested in what number x is.

Results: 13,314 (0.0064s), Page 1 of 666

All at Status 000

If this is any use to you :)

bairy
12-23-2006, 12:55 PM
Ah I believe it's because I missed something that throws the error message even when the table exists. Do you have a table prefix?

I'm not sure why they're all status 000, we'll deal with that after.

Bounce
12-24-2006, 04:59 PM
Ah I believe it's because I missed something that throws the error message even when the table exists. Do you have a table prefix?

Not that I know of

// ****** TABLE PREFIX ******
// Prefix that your vBulletin tables have in the database.
$config['Database']['tableprefix'] = '';

Should I have one?

bairy
12-26-2006, 04:16 PM
Nah no table prefix should be fine.
I just thought I'd missed something, but it wouldn't make a difference, plus you'd get a database error message.

To be honest, I don't know. If you want to give me admin access and the ability to execute sql queries, I could have a look. Otherwise, I don't know what it might be.

Bounce
12-26-2006, 10:10 PM
If you want to give me admin access and the ability to execute sql queries, I could have a look. Otherwise, I don't know what it might be.


You have a PM.

bairy
12-27-2006, 09:05 AM
Updated to 1.02

- Browser: Corrected bug that said "you haven't scanned" even if you have. This only affects people with table prefixes.

Re-upload the /imagestatuscheck.php file. The product file hasn't been changed so there's no need to re-import.

HarryBO
08-24-2007, 01:12 PM
I use vB 3.6.8 and it doesn't work. Where is the link in the AdminCP where I can scan? When I open imagestatuscheck.php I see the page, but I get no matches.
Please help.

HarryBO
08-24-2007, 09:55 PM
Big problem!

I have also noticed that all the images in my forum are gone and all the text in threads is small. How can I fix that? A normal uninstall won't work ;(

mystic10
11-20-2007, 01:50 AM
This might be silly, but I am confused about what to do. I have installed everything and clicked on Scan Links... after it did the scan it took me back to Scan Links... but where did the data go, and how can I correct broken or duplicated links? Please guide me.

compuminus
02-13-2009, 04:54 PM
This image link checker is a wonderful idea and framework, but it does not work at all right now in vB 3.8. I've tried to tweak the settings to get it to work, but had no success, and ran into the following errors:

(1) All status fields beyond the first group defined by GPC['percycle'] are listed as "duplicate / dealt with" and a status code of "000" is entered into the imagestatus database

(2) If GPC['percycle'] is set larger than the number of imageurl entries, then all status codes are entered into the imagestatus table correctly. However, at the very end of the scan nearly all status entries are somehow reset to "000"

(3) If the update counters maintenance script is terminated just prior to completion (and GPC['percycle'] is set larger than the number of imageurl entries), then the imagestatus database has correct status entries in all fields (except those which were not yet checked prior to script termination). However, upon visiting the imagestatuscheck.php page and searching for dead links, all status codes in the imagestatus database are again reset to "000".

Overall, something is happening at the end of the code that inadvertently resets all status fields to "000" in the database. It seems like a very simple code change would fix all of this. If anyone can help diagnose this it would be great.

Alfa1
02-14-2009, 01:19 AM
This would be great to have working on vb 3.8

ForYou
02-17-2009, 10:02 AM
Hello ,

Is it possible to delete the image directly, without going to the post?

Regards

ForYou
02-17-2009, 10:20 AM
Hello ,

There are a lot of image URLs that start with www rather than http, and the checker reports them as not existing. Is it possible to fix this?
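One possible approach to the www problem, sketched in Python: give scheme-less URLs a default http:// prefix before requesting them. This is a hypothetical pre-processing step, not something the hack is known to do, and the function name is made up for the example.

```python
def with_scheme(url, default_scheme="http"):
    """Prepend default_scheme:// when the URL has no scheme of its own."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"
```

For example, `with_scheme("www.example.com/pic.jpg")` gives `http://www.example.com/pic.jpg`, while a URL that already starts with http:// is returned unchanged. It does no validation, so junk input stays junk.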

RedHacker
03-13-2009, 08:04 AM
Does this work in 3.8.1?

Kolbi
02-07-2011, 09:43 AM
Is there any chance someone could update it to work with vB4?

TundraSoul
09-20-2013, 02:33 AM
Is there any chance someone could update it to work with vB4?

Agreed, this so needs to be updated for vB4!