tommythejoat
08-18-2013, 10:00 PM
When a post includes a reference to an external object that is fetched each time the post is viewed, it introduces a possible hazard for the site. The codes at risk are the IMG tag and the newer EMBED and OBJECT tags that are showing up in custom BB codes.
The fundamental problem with these tags is that whoever controls the target location can change what the reference points to. At a minimum, an unexpected object might be displayed on the site hosting the reference. Worse, someone could exploit a vulnerability in the processing of these resources and damage more than the appearance of the site.
Y2ksw has produced a mod that "harvests" the foreign IMG references and replaces them with local references to the copied image files. This has hazards of its own: members who reference external material may not have obtained the required permission to use the file, and the site owner/admin then becomes complicit in violating the copyright of the image owner. The process itself is relatively benign, since it batches the harvesting into relatively small jobs that run in the background under the vB cron job.
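To make the harvesting step concrete, here is a minimal Python sketch of how a harvester might find foreign [IMG] references in a post's pagetext and split them into small jobs. It is not Y2ksw's code; the regular expression, the hypothetical LOCAL_HOST name and the batch size are assumptions for illustration.

```python
import re
from typing import Iterator, List
from urllib.parse import urlparse

# Matches [IMG]...[/IMG] BB code in either case ([img] or [IMG]).
IMG_TAG = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL)

# Hypothetical host name of the forum itself; any other host counts as foreign.
LOCAL_HOST = "forum.example.com"


def external_image_urls(pagetext: str) -> List[str]:
    """Return the foreign image URLs referenced by [IMG] tags in one post."""
    urls = []
    for match in IMG_TAG.finditer(pagetext):
        url = match.group(1).strip()
        host = urlparse(url).hostname  # None for relative (local) paths
        if host is not None and host != LOCAL_HOST:
            urls.append(url)
    return urls


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Split the work into small jobs, as a cron-driven harvester might."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, a post containing [IMG]http://images.example.org/cat.jpg[/IMG] yields that URL, while relative paths and links to the forum's own host are skipped.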
I have been thinking that there may be a better way to address these issues. The Import External Images modification creates an imported image table in which it stores the original URL as found in the reference and the mapped URL pointing to the local copy in the imported images directory. If a status column were added to the imported image table, it could record whether each image is intact, missing, or modified. To support the modified status, you would also need to store an MD5 or other hash of the original image. The table would then have the columns imageindex, hash, status, originalURL and mapURL.
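A minimal sketch of the proposed table, using Python's sqlite3 module as a stand-in for the forum's MySQL database. The table name importedimage and the helper record_harvest are hypothetical; the five columns and the three status values are the ones described above, with MD5 as the hash.

```python
import hashlib
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table name; columns follow the proposal in the post.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS importedimage (
            imageindex  INTEGER PRIMARY KEY,
            hash        CHAR(32) NOT NULL,
            status      TEXT NOT NULL DEFAULT 'intact'
                        CHECK (status IN ('intact', 'missing', 'modified')),
            originalURL TEXT NOT NULL UNIQUE,
            mapURL      TEXT NOT NULL
        )
        """
    )


def record_harvest(conn: sqlite3.Connection, original_url: str,
                   map_url: str, image_bytes: bytes) -> int:
    """Store a freshly harvested image with its MD5 and return its imageindex."""
    digest = hashlib.md5(image_bytes).hexdigest()
    cur = conn.execute(
        "INSERT INTO importedimage (hash, status, originalURL, mapURL) "
        "VALUES (?, 'intact', ?, ?)",
        (digest, original_url, map_url),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the hash at harvest time is what later makes a "modified" status detectable without keeping a second copy of the image.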
Making this work effectively might impose too much overhead, but there are two basic approaches. To save bandwidth, external references should remain intact, with the harvested file available as a local backup. If an external reference turns out to be dead, it would be easy to trap the error, update the local table to mark the image as missing, and substitute the local backup reference for the original one.
One could also checksum each successfully retrieved image and update the table and the post if the checksum did not match, but that might be too much load to be practical. A less intensive alternative would be to log all external references and have a batch job running under cron check them, then update the table status and the pagetext of the referencing post.
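The batch alternative could look roughly like the following Python sketch: fetch each logged URL, compare its MD5 with the stored hash, and report which statuses changed. fetch_bytes, classify and check_batch are hypothetical names, and the rows are plain dicts rather than real database rows.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_bytes(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Download the external image, or return None if it cannot be retrieved."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        return None


def classify(stored_hash: str, body: Optional[bytes]) -> str:
    """Compare the current external image with the hash recorded at harvest time."""
    if body is None:
        return "missing"
    if hashlib.md5(body).hexdigest() != stored_hash:
        return "modified"
    return "intact"


def check_batch(records: Iterable[Dict[str, str]],
                fetch: Callable[[str], Optional[bytes]] = fetch_bytes) -> Dict[str, str]:
    """Return {originalURL: new_status} for each record whose status has changed."""
    changes: Dict[str, str] = {}
    for record in records:
        new_status = classify(record["hash"], fetch(record["originalURL"]))
        if new_status != record["status"]:
            changes[record["originalURL"]] = new_status
    return changes
```

A cron-driven script might pass a small slice of rows to check_batch on each run, write the returned statuses back to the table, and rewrite the pagetext of posts whose images are now missing or modified to use mapURL. Downloading every image is what makes this the heavier option; small slices spread that load out, much as the harvesting mod does.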
I suspect most readers will think this is much ado about nothing very important. However, this vulnerability has been a concern for our site for some time and we may undertake to do this just to evaluate the impact.