
Need help with Regex


h2ojunkie
11-23-2008, 02:17 AM
I swear regex is the bane of my existence. The more I read about it, the more confused I get and the more my brain hurts.

If anyone is willing to help with this, I'd much appreciate it.

I'm trying to make a modification to the check4spam product to add some functionality.

Here's the original:

if (($vbulletin->userinfo['posts'] < $vbulletin->options['check4spam_urlstop']) && ($vbulletin->options['check4spam_urlstoponoff'] == 1)) {

    // Lowercase the post and collapse line breaks into spaces.
    $message = strtolower($post['message']) . ' ';
    $message = preg_replace('/[\r\n]+/', ' ', $message);

    // Allowed domains, one per line in the plugin options.
    $allowedurls = preg_split('/[\r\n]+/', $vbulletin->options['check4spam_allowedurls'], -1, PREG_SPLIT_NO_EMPTY);

    // Strip [url] tags that point at an allowed domain.
    foreach ($allowedurls as $allowedurl) {
        $message = preg_replace('/(\[url=?"?\]?)(https?\:\/\/)?(www\.)?' . $allowedurl . '\/?"?\]?(.*?)?\[\/url\]/i', '', $message);
    }

    // Strip any remaining bare mentions of an allowed domain.
    $message = str_replace($allowedurls, '', $message);

    // Hide the post if any URL identifier is still present.
    $detecturl = explode('|', $vbulletin->options['check4spam_urlidentifiers']);

    foreach ($detecturl as $urlspam) {
        if (stripos($message, $urlspam) !== false) {
            $dataman->set('visible', 0);
            $check4spam_error = 1;
        }
    }
}


$allowedurls is a list of URLs, stored one per line as domain.com (no www):

$allowedurls = preg_split('/[\r\n]+/', $vbulletin->options['check4spam_allowedurls'], -1, PREG_SPLIT_NO_EMPTY);
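
For reference, here is a minimal sketch of what that split produces. The option value is a made-up sample, not my real setting:

```php
<?php
// Hypothetical sample of the check4spam_allowedurls option: one domain per line,
// with mixed line endings and a blank line thrown in.
$optionValue = "domain.com\r\nimageshack.us\n\nexample.org\n";

// Split on any run of CR/LF characters; PREG_SPLIT_NO_EMPTY drops empty entries.
$allowedurls = preg_split('/[\r\n]+/', $optionValue, -1, PREG_SPLIT_NO_EMPTY);

print_r($allowedurls);
```

So each entry ends up as a bare domain string with no surrounding whitespace or blank items.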

Then, the part I'm trying to modify.

Right now, if you want to allow a subdomain, you must add it to the allowed list. So if you want to allow imageshack URLs, simply adding imageshack.us won't do the trick.

Instead, you would need to list each subdomain separately, like so:
i433.imageshack.us
i422.imageshack.us
i415.imageshack.us
etc....

As you can see, with the literally hundreds of subdomains used by a site like imageshack, it's just not possible to add them all to the allowedurls list.

The plugin performs the following on allowedurls:
foreach ($allowedurls as $allowedurl) {
    $message = preg_replace('/(\[url=?"?\]?)(https?\:\/\/)?(www\.)?' . $allowedurl . '\/?"?\]?(.*?)?\[\/url\]/i', '', $message);
}

This is the part where my brain starts hurting. I just can't seem to understand what is going on there.
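
As far as I can tell, the pattern breaks down roughly as in the comments below. This is only a sketch: domain.com is pasted in as a sample entry, and the test strings are my own examples rather than anything from the plugin.

```php
<?php
// Hypothetical sample entry; the real list comes from the check4spam options.
$allowedurl = 'domain.com';

// The same pattern the plugin builds, piece by piece:
//   (\[url=?"?\]?)   literal "[url", then optional "=", optional quote, optional "]"
//   (https?\:\/\/)?  optional "http://" or "https://"
//   (www\.)?         optional "www."
//   domain.com       the allowed entry, pasted in as-is (its "." matches any character)
//   \/?"?\]?         optional "/", optional quote, optional "]"
//   (.*?)?           link text or path, as little as possible
//   \[\/url\]        literal "[/url]"
$pattern = '/(\[url=?"?\]?)(https?\:\/\/)?(www\.)?' . $allowedurl . '\/?"?\]?(.*?)?\[\/url\]/i';

$samples = array(
    '[url]www.domain.com[/url]',
    '[url=http://www.domain.com]my site[/url]',
    '[url]http://subdomain.domain.com[/url]',
);

foreach ($samples as $sample) {
    echo $sample, ' => ', preg_match($pattern, $sample) ? 'match' : 'no match', "\n";
}
```

The first two samples match and the third does not, because the only thing allowed between the scheme and the domain is an optional "www.".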

Let's say I have domain.com in my allowedurl list

If I post any of these

http://www.domain.com
[url]www.domain.com
domain.com


It allows the URL, as it should.

But if I post:
subdomain.domain.com

It does not allow the URL, and the only way to allow it would be to add subdomain.domain.com to the allowedurls list as described above.

Hopefully, someone can point me in the right direction on what I need to change in the preg_replace statement to allow subdomains by just adding the base domain (domain.com) to the allowed list.
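
One direction I have been considering, as a sketch only and not verified inside check4spam: swap the optional (www\.)? group for a group that accepts any number of subdomain labels, and run each list entry through preg_quote() so its dots are taken literally. The list entries and the message below are made-up samples.

```php
<?php
// Hypothetical sample data standing in for the plugin's options and the post text.
$allowedurls = array('domain.com', 'imageshack.us');
$message = 'see [url]http://i433.imageshack.us/pic.jpg[/url] and '
         . '[url]http://www.domain.com[/url] or [url]http://spam.example.net[/url] ';

foreach ($allowedurls as $allowedurl) {
    // preg_quote() escapes the dots (and the "/" delimiter) in the list entry.
    $domain = preg_quote($allowedurl, '/');

    // ((?:[a-z0-9-]+\.)*) accepts zero or more subdomain labels, "www." included.
    // The rest of the pattern is unchanged from the plugin's original.
    $pattern = '/(\[url=?"?\]?)(https?\:\/\/)?((?:[a-z0-9-]+\.)*)' . $domain
             . '\/?"?\]?(.*?)?\[\/url\]/i';

    $message = preg_replace($pattern, '', $message);
}

// Only the link to the unlisted domain should be left in the message.
echo $message, "\n";
```

This only changes the [url] tag matching. The later str_replace on the allowed list would still run on whatever is left, so bare subdomain links outside [url] tags may need separate handling.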

Thanks