How do I hunt down and store all URLs
Michael Morris
02-20-2005, 10:54 AM
I'd like to write a function that looks over a message's contents and extracts the site from every URL in the message, for later storage in the database.
Any ideas?
Dean C
02-20-2005, 11:15 AM
This would be a pretty mammoth task, probably best done with MySQL regexps:
http://dev.mysql.com/doc/mysql/en/regexp.html
Michael Morris
02-20-2005, 07:58 PM
Let me explain what I'm doing a little more clearly. I'm working on a hack designed to make spamming my boards a bit harder (actually, a LOT harder). I've noticed that comment spammers may forge IPs and email addresses, but they can't forge the destination URL, or else nobody could get to their damn site.
So I'm writing a program that gives moderators the option to "spam ban" a user. When they do, the program pulls up all of that user's messages, scans them, and presents a list of every URL it finds. The moderator can then uncheck any he doesn't want to consider spam - the rest are added to a table in the database either as a domain (banning all its pages) or as a page (banning just that page).
When a user with fewer than 20 posts makes a post, the system compares the newly submitted message against the database of known bad URLs. If a match is found, the poster is immediately put into a temporary ban pool and the post is set to invisible so that it has to be moderated. The next time a moderator logs in, they get an alert that a spam attempt has been made. They can then go to a review screen, decide whether the system's temp ban was appropriate, and confirm it if it was. They're also shown which entry in the database caused the message to be spam banned.
Currently I have the message comparison working - that's simply a while loop over the rows of the bad-URL table with a stristr call on each one. On the first match the function returns true and jumps out of the while loop. If nothing matches, the function returns false.
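Roughly, the comparison looks like the sketch below. This is not the actual hack code: the function name message_has_banned_url is made up for illustration, and a plain array ($badUrls) stands in for the rows pulled from the table, so a foreach takes the place of the while loop over the result set.

```php
<?php
// Hypothetical helper: $badUrls stands in for the rows fetched from the
// banned-URL table (the real table and column names are not shown here).
function message_has_banned_url(string $message, array $badUrls): bool
{
    foreach ($badUrls as $badUrl) {
        // Skip blank entries; an empty needle is not a meaningful match.
        if ($badUrl === '') {
            continue;
        }
        // stristr() is case-insensitive and returns false when nothing matches.
        if (stristr($message, $badUrl) !== false) {
            return true; // first match wins, stop scanning
        }
    }
    return false;
}
```

Since this is a plain substring test, a domain entry also catches every page under that domain, while a page entry only catches that one address.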
I think the answer to the question I posted is going to be based around preg_match_all. But I could use help with a good pattern for extracting the site domain out of each URL. There's an example of this on www.php.net that I might play with.
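As a starting point, something along these lines might work. It's only a sketch under a few assumptions: the function name extract_url_hosts is made up, the pattern only looks for http:// and https:// links, and it leans on PHP's parse_url to pick the host out of each match rather than doing that part with a regex.

```php
<?php
// Hypothetical helper: returns each distinct URL found in $message mapped to
// its host. The pattern is deliberately simple and only catches http/https links.
function extract_url_hosts(string $message): array
{
    $hosts = [];
    // Match "http://" or "https://" followed by characters that are not
    // whitespace, quotes, angle brackets or square brackets (BBCode tags).
    preg_match_all('~https?://[^\s"\'<>\[\]]+~i', $message, $matches);
    foreach (array_unique($matches[0]) as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hosts[$url] = strtolower($host); // hosts are case-insensitive
        }
    }
    return $hosts;
}
```

The keys are the full URLs (for a page ban) and the values are the hosts (for a domain ban). Links typed without a scheme, like a bare www address, are missed by this pattern, and punctuation right after a link may get swept into the match, so the pattern would need tightening before real use.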