Yeah, I had a feeling you'd eventually find a problem with that. It's hard to get stuff like that to be bulletproof (or at least it's hard for me).
Anyway, try this:
Code:
$word = array(
    'google',
    'yahoo'
);
$link = array(
    '<a href="http://google.com">google</a>',
    '<a href="http://yahoo.com">yahoo</a>'
);
// Match any HTML tag, this will be the delimiter in preg_split
$regexp = "/(<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>)/i";
// Capture delimiters and offsets. This will also capture things multiple times because of the
// multiple parens used in the pattern, so we'll have to skip them in the loop below
$parts = preg_split($regexp, $this->post['message'], -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
$newmsg = '';
$inlink = false; // true while we are between <a ...> and </a>
$istag = false;  // the parts we keep alternate: text, tag, text, tag, ...
$offset = 0;
foreach ($parts as $part) // $part is an array: 0 = string, 1 = offset (because of PREG_SPLIT_OFFSET_CAPTURE)
{
    if ($part[1] < $offset) // ignore parts from the other parens in the regexp
        continue;
    $offset = $part[1] + strlen($part[0]);
    if ($istag)
    {
        // Tags are copied through untouched, so attribute values never get rewritten.
        // Only <a> opens a region to skip; <img> has no closing tag and no content.
        if (preg_match('/^<a[\s>]/i', $part[0]))
            $inlink = (substr($part[0], -2) != '/>'); // a self-closed tag opens nothing
        else if (preg_match('/^<\/a[\s>]/i', $part[0]))
            $inlink = false;
    }
    else if (!$inlink)
        $part[0] = str_replace($word, $link, $part[0]);
    $newmsg .= $part[0];
    $istag = !$istag;
}
$this->post['message'] = $newmsg;
It's getting ugly now...
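In case the offset check in the loop looks like magic, here's a cut-down sketch of what preg_split hands back when both flags are set. The pattern below is a simplified stand-in I made up for illustration (not the one above), and split_outer_parts is a hypothetical helper name. Nested parens produce extra entries that start inside a piece already seen, so comparing each entry's offset with the end of the last kept piece is enough to drop them.

```php
<?php
// Hypothetical helper, for illustration only: split $text on $regexp and keep
// just the text pieces and the outermost delimiter captures.
function split_outer_parts($regexp, $text)
{
    $parts = preg_split($regexp, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
    $kept = array();
    $offset = 0;
    foreach ($parts as $part) // $part[0] = string, $part[1] = byte offset
    {
        if ($part[1] < $offset) // starts inside a piece we already kept: inner capture
            continue;
        $offset = $part[1] + strlen($part[0]);
        $kept[] = $part[0];
    }
    return $kept;
}

// Simplified tag pattern (an assumption, not the one from the post):
// outer parens = whole tag, inner parens = tag name.
$simple = '/(<\/?(\w+)[^>]*>)/';
echo implode('|', split_outer_parts($simple, 'Hi <b>there</b>')), "\n";
// prints: Hi |<b>|there|</b>|
```

The raw preg_split result for that input also contains the two inner 'b' captures (offsets 4 and 13). Those are the entries the offset check throws away, which leaves a clean text, tag, text, tag sequence.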