Trouble with a function I am using


sv1cec
11-12-2007, 02:06 PM
Folks, I am having some problems with a function I am using and I would appreciate your help.

Let me explain myself. I have a function that puts a pair of square brackets around a censored word whenever such a word is found in a post. The function is shown below:


function bracket_censored_text($text)
{
    global $vboptions;
    static $censorwords;

    if ($vboptions['enablecensor'] AND !empty($vboptions['censorwords']))
    {
        if (empty($censorwords))
        {
            $vboptions['censorwords'] = preg_quote($vboptions['censorwords'], '#');
            $censorwords = preg_split('#\s+#', $vboptions['censorwords'], -1, PREG_SPLIT_NO_EMPTY);
        }

        foreach ($censorwords AS $censorword)
        {
            $censorword = str_replace('\+', ' ', $censorword);
            $censorword = str_replace('\\\{','',$censorword);
            $censorword = str_replace('\\\}','',$censorword);
            $censorword = str_replace('\\','',$censorword);
            $text = preg_replace('#(?<=[^a-z]|^)(' . $censorword . ')(?=[^a-z]|$)#si', "[$censorword]", $text);
            $text = str_replace('[[','[',$text);
            $text = str_replace(']]',']',$text);
            $text = str_replace('[[','[',$text);
            $text = str_replace(']]',']',$text);
        }
    }
    // strip any admin-specified blank ascii chars
    $text = strip_blank_ascii($text, $vboptions['censorchar']);

    return $text;
}


The problem I discovered today is this:

I have the following words (abbreviations) in my censored words list:

bs, b.s., BS, B.S.

Now if I have a post with the word "best" in it, everything works fine. But if the same post also contains one of the above abbreviations, the abbreviation is put in square brackets, and so is the word "best", which is replaced by "[b.s.]".

So this:


This is the best world.


is returned unchanged as "This is the best world.", but the expression:


This is the best world in the B.S. universe.


is returned as "This is the [b.s.] world in the [b.s.] universe."

Even though I can work with PHP, I have absolutely no idea how the preg_ functions work, so I am at a loss on how to correct this problem. So if one of you experts can help me out, I would greatly appreciate it.

Analogpoint
11-12-2007, 04:38 PM
The dots in your censor word b.s. are being interpreted as the regular-expression dot (.), which matches any single character. What you need to do is escape any characters in $censorword that have a special meaning in regexes before calling preg_replace. In PHP, you can use preg_quote to do this:

$text = preg_replace('#(?<=[^a-z]|^)(' . preg_quote($censorword, '#') . ')(?=[^a-z]|$)#si', "[$censorword]", $text);
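
For anyone who wants to see the difference in isolation, here is a minimal sketch. It assumes, from reading the posted function, that the early preg_quote call is undone by the later str_replace('\\','',$censorword), which strips the escaping backslashes again. The helper bracket_word and its $quote flag are hypothetical names made up for this illustration; they are not part of the original function or of vBulletin.

```php
<?php
// Hypothetical helper for illustration only: brackets one censored word in $text.
// $quote decides whether the word is escaped with preg_quote before it is
// placed inside the pattern.
function bracket_word($text, $censorword, $quote)
{
    // Unquoted, "b.s." means: b, any character, s, any character.
    // Quoted, it becomes "b\.s\." and matches only the literal dots.
    $pattern_word = $quote ? preg_quote($censorword, '#') : $censorword;

    return preg_replace(
        '#(?<=[^a-z]|^)(' . $pattern_word . ')(?=[^a-z]|$)#si',
        "[$censorword]",
        $text
    );
}

$text = 'This is the best world in the B.S. universe.';

// Without quoting, "best" also matches the pattern b.s.
echo bracket_word($text, 'b.s.', false), "\n";
// prints: This is the [b.s.] world in the [b.s.] universe.

// With preg_quote, only the real abbreviation is bracketed.
echo bracket_word($text, 'b.s.', true), "\n";
// prints: This is the best world in the [b.s.] universe.
```

The second argument to preg_quote is the pattern delimiter (# here), so a # inside a censored word is escaped as well.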

sv1cec
11-12-2007, 07:17 PM
Analogpoint, you are a life-saver. Many thanks!