I suck at regexp! (read: HELP!)


Cloudrunner
02-18-2004, 02:20 AM
So I need a little guidance.

This is for my latest idea for a hack and will be released soon.

BUT

I have set up the SQL to pull all the posts from the DB that contain a URL link via SELECT pagetext FROM posts WHERE pagetext LIKE '%[ URL ]%[ /URL ]%';

That being the case then $row['pagetext'] will be the entire post.

From that string I need to extract each and EVERY instance of '[ URL ]blah[ /URL ]' from within that string.

Now I understand I could use the explode() function etc., but that only gives me the first instance of '[ URL ]blah[ /URL ]'. I need each and every instance because my users have a habit of adding multiple URLs to their posts.

That being said, anyone who can lend a hand on this will get full credit when the hack is released.

Any takers, or even suggestions?

I've been fighting this for a few days trying to find the correct way to do it, and have failed, for I suck at regexp!

Thank you in advance for any help that you may give.

)O( Cloudrunner )O(

AndrewD
02-18-2004, 08:45 AM
I had the same need, and this is what I came up with. Rather than looking for the URLs directly, I pass the post through parse_bbcode first, because there may be HTML links in there as well. It then dumps the link text and the link targets into $titles and $links (which are arrays; check the documentation on preg_match).


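// Fetch every post together with the title of its thread, oldest first
$selectpost = $DB_site->query("
    SELECT " .
        TABLE_PREFIX . "post.postid AS postid, " .
        TABLE_PREFIX . "post.username AS username, " .
        TABLE_PREFIX . "post.userid AS userid, " .
        TABLE_PREFIX . "post.threadid AS threadid, " .
        TABLE_PREFIX . "post.title AS title, " .
        TABLE_PREFIX . "post.pagetext AS pagetext, " .
        TABLE_PREFIX . "post.dateline AS dateline, " .
        TABLE_PREFIX . "thread.title AS threadtitle
    FROM " . TABLE_PREFIX . "post LEFT JOIN " . TABLE_PREFIX . "thread
        ON " . TABLE_PREFIX . "post.threadid = " . TABLE_PREFIX . "thread.threadid
    ORDER BY " . TABLE_PREFIX . "post.dateline
");

$urllist = array();

while ($postrec = $DB_site->fetch_array($selectpost)) {
    // Render the BBCode to HTML so BBCode links and raw HTML links look the same
    $p = parse_bbcode2($postrec['pagetext'], 0, 0, 0, 1);
    // Work through the rendered post one line at a time
    $lines = preg_split('/(\Z|<br \/>)/', $p, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($lines as $line) {
        // $i is the number of <a ...>...</a> elements found on this line
        $i = preg_match_all('/<a.*?>.*?<\/a.*?>/', $line, $url, PREG_SET_ORDER);
        for ($k = 0; $k < $i; $k++) {
            // $titles[1] is the link text, $links[1] is the href value
            preg_match('/>(.*?)</', $url[$k][0], $titles);
            preg_match('/<a *href *= *"*(.*?)"*( |>)/', $url[$k][0], $links);
            // Assumption: the posted snippet stopped here, so $titles and $links
            // only ever held the last link. Collecting each pair in $urllist
            // keeps all of them.
            if (isset($links[1])) {
                $urllist[] = array(
                    'postid' => $postrec['postid'],
                    'link'   => $links[1],
                    'title'  => isset($titles[1]) ? $titles[1] : '',
                );
            }
        }
    }
}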
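
If the posts only ever contain BBCode links and no raw HTML, it may be simpler to skip the parser and run preg_match_all straight over the raw pagetext, since preg_match_all returns every match rather than just the first. What follows is only a minimal sketch under a few assumptions: the real tags are written without the spaces shown in the question, both the [url]target[/url] and [url=target]text[/url] forms are in use, and extract_bbcode_urls is a made-up helper name, not a vBulletin function.

```php
<?php
// Hypothetical helper (not part of vBulletin): returns every link target found
// in [url]...[/url] or [url=...]...[/url] tags inside a raw post body.
function extract_bbcode_urls($pagetext)
{
    $urls = array();
    // i = case-insensitive, s = let "." match newlines too.
    // ".*?" is non-greedy, so each tag pair becomes its own match instead of
    // one match running from the first [url] to the last [/url].
    $pattern = '#\[url(?:=(["\']?)(.*?)\1)?\](.*?)\[/url\]#is';
    if (preg_match_all($pattern, $pagetext, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[2] is the target from [url=target]; when that form is not
            // used it is an empty string and the tag body ($m[3]) is the URL.
            $urls[] = ($m[2] !== '') ? $m[2] : trim($m[3]);
        }
    }
    return $urls;
}

// Sample post body, invented for illustration
$post = "See [url]http://example.com/a[/url] and "
      . "[URL=http://example.com/b]this page[/URL].\n"
      . "Also [url]http://example.com/c[/url]";

$found = extract_bbcode_urls($post);
print_r($found);
```

Inside the fetch loop this would be called as extract_bbcode_urls($row['pagetext']), giving one array of URLs per post. The trade-off is that it sees only BBCode tags, so bare URLs that the board auto-links, or links written as HTML, would still need the parse-then-scan approach above.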