Help with simple regex filter


mme42
12-08-2009, 11:06 PM
I'm trying to write a simple plug to filter out links from certain domains. I'd like to replace the entire link with a message whether it's hyperlinked, enclosed in CODE bb tags, or simply typed. But I'm a newb to PHP, let alone regex. I've tried a number of different variations and always seem to get parse errors. The only time I had it working at all was with one domain and basically no regex, so it might as well have been a str_replace. But that doesn't do what I want, because I'd like to filter out the whole link rather than just the domain name.

Here's something similar to what I'm looking for (though this doesn't work):

$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED';
$regex = array('/^http+$filterthese+(rar|zip|html)$/i','/^<a+$filterthese+<\/a>$/i');
$this->post['message'] = preg_replace($regex, $replacement, $this->post['message']);

Can anybody clue me in on whatever mistake I'm making? (It's probably obvious to somebody with more experience.)
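
One likely culprit in the snippet above, offered as a sketch rather than a definitive diagnosis: PHP does not interpolate variables inside single-quoted strings, and an array can't simply be dropped into a pattern anyway, so `$filterthese` never becomes part of the regex. The list has to be turned into an alternation such as `(?:domain1|domain2|domain3)` first. The helper name `build_domain_alternation` below is hypothetical, not something from vBulletin or PHP.

```php
<?php
// Hypothetical helper: turn a list of domain names into a regex alternation.
function build_domain_alternation(array $domains, $delimiter = '/')
{
    $quoted = array();
    foreach ($domains as $domain) {
        // preg_quote() escapes regex metacharacters, e.g. the dot in "example.com".
        $quoted[] = preg_quote($domain, $delimiter);
    }
    // Join with "|" so the group matches any one of the domains.
    return '(?:' . implode('|', $quoted) . ')';
}

$filterthese = array('domain1', 'domain2', 'domain3');
$alternation = build_domain_alternation($filterthese);

// Single quotes keep "$filterthese" as literal text, so this pattern
// never contains the domain names at all.
$literal = '/$filterthese/';

// Concatenation puts the alternation into the pattern for real.
$pattern = '/' . $alternation . '/i';
```

With `$pattern` built this way, `preg_match($pattern, $text)` checks whether any of the listed domains appears in `$text`; matching the whole link around the domain needs more pattern on both sides.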

I'd also add in some conditionals for which forums it filters etc. But, for now, I'd just like to get the filter part working.

Thanks :D

EDIT: Ok, I got rid of the parse errors for now. But, the replacement isn't working. I suspect that there is something that I'm not understanding about regexes, but I'm not sure where I'm messing up. Basically, I want a regex that would find this string:

1. Starts with http
2. Could contain any number of any characters before the domain/word
3. Has the domain somewhere in the middle
4. Could have any number of any characters after
5. Ends with one of a number of extensions such as (html|htm|rar|zip|001)

I have a feeling that it's numbers 2 and 4 that are tripping me up. I now have this which I thought might work, but it doesn't:

$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED';
$regex = array('!^(http)+(.*)?($filterthese)+(.*)?(html|htm|rar|zip|001)$!i');
$this->post['message'] = preg_replace($regex, $replacement, $this->post['message']);


Or maybe I'm not understanding regexes whatsoever :p I don't know.
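
For what it's worth, here is a minimal sketch of the five-point description above, under a few assumptions. `^` and `$` anchor to the start and end of the whole message, so they are dropped in favour of `\b` word boundaries. `.*` happily runs across spaces and into the rest of the post, so it is swapped for a character class that stops at whitespace, brackets, and quotes. `$message` stands in for `$this->post['message']`. The sketch only replaces the URL text itself, so any surrounding link or CODE tags are left untouched.

```php
<?php
$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED';

// Escape each domain for use inside a "!"-delimited pattern.
$quoted = array();
foreach ($filterthese as $domain) {
    $quoted[] = preg_quote($domain, '!');
}
$domains = implode('|', $quoted);

// Characters allowed inside one URL: anything except whitespace,
// square brackets, angle brackets, and double quotes.
$urlchars = '[^\s\[\]<>"]*';

$regex = '!\bhttps?://' . $urlchars      // 1 + 2: starts with http, then anything
       . '(?:' . $domains . ')'          // 3: one of the filtered domains
       . $urlchars                       // 4: anything after the domain
       . '\.(?:html?|rar|zip|001)\b!i';  // 5: ends with a listed extension

// Stand-in for $this->post['message'].
$message  = 'See http://www.domain2.com/files/a.zip and http://example.org/b.zip ok';
$filtered = preg_replace($regex, $replacement, $message);
```

In this example only the first link is replaced, because the second one doesn't contain a filtered domain and the character class keeps the match from running across the space between the two links.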

EDIT 2:
The more I've tried and asked around, the less "simple" this seems. And I'm not sure that I'd even be doing this the best way if I got the regex right. So, for now, I've decided to just start with a str_replace plug (https://vborg.vbsupport.ru/showthread.php?p=1928196#post1928196) (filter just the domain/host name) to get the job done even if it's a bit dirty. I do plan to look into replacing the entire URL eventually, though, so any ideas are certainly still welcome. :D
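
For comparison, the domain-only approach can be sketched roughly like this. This is not the code from the linked plug, just an illustration, with `$message` standing in for `$this->post['message']` and `str_ireplace` used so that capitalisation doesn't matter.

```php
<?php
$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED';

// Stand-in for $this->post['message'].
$message = 'Get it at http://www.Domain1.com/a.zip';

// Case-insensitive replace of each listed name; the rest of the URL stays.
$filtered = str_ireplace($filterthese, $replacement, $message);
```

The "dirty" part is visible in the result: only the domain name is swapped out, so the `http://www.` prefix and the path are still there around the message.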