PHP Help, Stripping HTML tags and updating MySQL.


speedway
02-02-2008, 04:19 AM
Hi all

I am looking for a bit of help in PHP. For my forum's DB, I want to programmatically update every post's content with the same content minus all the HTML tags. Basically a small PHP app to

connect to Mysql,
open the post table,
extract the pagetext value,
strip all HTML tags from it
write it back to that record
loop to the next record.

Due to my distinct lack of PHP knowledge I am struggling with the looping bit and the retrieve/modify/update bit, so I would appreciate some kind soul pointing me in the right direction.

Thanks in advance

Cheers
Bruce
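
For anyone following along, the steps listed above could be sketched roughly like this. It is only a sketch under assumptions: it uses PDO, it assumes the post table has a primary key column called postid (not stated in the thread), and the connection details in the comments are placeholders. It also reads every row into memory first, which may be too much for a very large board.

```php
<?php
// Sketch: strip HTML tags from every post and write the result back.
// Assumes a `post` table with `postid` (assumed key) and `pagetext` columns.
function stripHtmlFromPosts(PDO $db): int
{
    // Fetch all rows up front so the SELECT is finished before we UPDATE.
    $rows = $db->query('SELECT postid, pagetext FROM post')
               ->fetchAll(PDO::FETCH_ASSOC);

    $update = $db->prepare('UPDATE post SET pagetext = :text WHERE postid = :id');

    $changed = 0;
    foreach ($rows as $row) {
        $clean = strip_tags($row['pagetext']);
        // Only touch records whose text actually changed.
        if ($clean !== $row['pagetext']) {
            $update->execute([':text' => $clean, ':id' => $row['postid']]);
            $changed++;
        }
    }
    return $changed;
}

// Hypothetical usage; DSN, user name, and password are placeholders:
// $db = new PDO('mysql:host=localhost;dbname=forum', 'user', 'secret');
// $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// echo stripHtmlFromPosts($db), " posts updated\n";
```

Running something like this against a backup copy of the database first would be the safer route.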

MoT3rror
02-02-2008, 04:41 AM
When the posts are brought out of the database they are stripped of all the HTML tags, etc., if you have HTML turned off, and the BBCode formatting is added.

speedway
02-02-2008, 05:16 AM
When the posts are brought out of the database they are stripped of all the HTML tags, etc., if you have HTML turned off, and the BBCode formatting is added.
Thanks, but I am not concerned with how they look on the page. I want to convert my DB to UTF-8 and there are problematic posts that have been pasted from Microsoft Word. These contain font tags, color tags, font size tags and weird character formatting that causes grief. I basically want to strip everything HTML-related from every post so that I can then safely convert the contents to UTF-8.

Cheers
Bruce

Opserty
02-02-2008, 08:31 AM
The 'post' table only stores the post with BBCode, unless you had HTML enabled on your board or something?

speedway
02-02-2008, 08:48 AM
The 'post' table only stores the post with BBCode, unless you had HTML enabled on your board or something?
Damn, see? Shows how much experience I have with this! :)

So, I need to remove all the BBCode tags and restore the text to the default Verdana, size 2 setting. Any pointers on how I would do that - sites, texts, books, anything?

Cheers
Bruce

Dismounted
02-02-2008, 09:50 AM
You'll need some regex replacements. (I hate regex, as it's really confusing, but I'm giving it a shot :p.)
$newtext = preg_replace('/\[(.+?)]/', '', $text);

speedway
02-02-2008, 11:51 AM
Thank you Sir.

Armed with that I went searching on Google and found this:


function stripBBCode($text_to_search) {
    $pattern = '|[[\/\!]*?[^\[\]]*?]|si';
    $replace = '';
    return preg_replace($pattern, $replace, $text_to_search);
}
but it zaps *all* BBCode. Would anyone have any idea on how to make it *not* remove things like:
[ QUOTE ]
[ /QUOTE ]
[ QUOTE=
[ B ]
[ I ]
[ U ]
or any of the other basic ones? (spaces included on purpose so the forum doesn't try and use them)

I did find this function as well:


function stripBBCode($stringInput) {
    if (strpos($stringInput, '[') !== false) {
        $validBBCodeArray = array(
            'b',
            'i',
            'u',
            'url',
            'quote',
        );

        $validBBCode = join('|', $validBBCodeArray);

        $stringOutput = preg_replace(
            '@\[(?:\/{0,1}?)(?:' . $validBBCode . ')(?:\s{0,1}?)(?:\/{0,1}?)\]@',
            '',
            $stringInput
        );
    } else {
        $stringOutput = $stringInput;
    }

    return $stringOutput;
}
but it strips everything *except* font and size tags (for some reason).

All the help so far is being appreciated I assure you. Now for that last little step :)

Cheers
Bruce
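
One possible way to do what is asked here (strip every BBCode tag except a short list of basic ones) is a negative lookahead in the regex. This is a rough sketch, not a tested vBulletin routine: the function name stripBBCodeExcept and the keep list are made up for illustration, and it assumes tag names are plain letters.

```php
<?php
// Sketch: remove every [tag], [/tag] and [tag=value] except the names in $keep.
// Assumes the names in $keep are plain letters such as 'b' or 'quote'.
function stripBBCodeExcept(string $text, array $keep): string
{
    // The negative lookahead refuses to match when the tag name is on the keep list.
    $lookahead = $keep === []
        ? ''
        : '(?!(?:' . implode('|', array_map('preg_quote', $keep)) . ')\b)';

    // "[", optional "/", a name starting with a letter, anything up to "]".
    $pattern = '/\[\/?' . $lookahead . '[a-z][^\[\]]*\]/i';

    return preg_replace($pattern, '', $text);
}

// Example call: keeps the quote, b, i and u tags and drops font, size, color, etc.
// echo stripBBCodeExcept($pagetext, ['quote', 'b', 'i', 'u']);
```

Be aware that any ordinary text in square brackets that starts with a letter, such as "[see note]", would also be removed by this pattern.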

Dismounted
02-03-2008, 04:57 AM
Try this (it will strip all font and size tags, but nothing else):
$newtext = preg_replace('/\[(?:\/{0,1}?)(?:font|size)(?:\s{0,1}?)(?:\/{0,1}?)\]/', '', $text);
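
A note on the pattern above: it only allows an optional space and slash after the tag name, so a tag that carries a value, such as [font=Verdana] or [size=2], is not matched by it. A hedged variant that also accepts an optional "=value" part might look like the sketch below (stripFontAndSize is just an illustrative name).

```php
<?php
// Sketch: remove [font], [font=...], [size], [size=...] and their closing tags,
// leaving every other BBCode tag in place.
function stripFontAndSize(string $text): string
{
    // "[", optional "/", font or size, optional "=value", "]".
    return preg_replace('/\[\/?(?:font|size)(?:=[^\]]*)?\]/i', '', $text);
}

// Example call:
// echo stripFontAndSize('[font=Verdana][size=2][b]Hi[/b][/size][/font]');
```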

speedway
02-03-2008, 08:18 AM
Thanks Hanson

I tried that but it strips everything *but* the font & size tags :) I have tried nutting this one out myself but am still lost.

Cheers
Bruce

Reecey
02-03-2008, 08:23 AM
Where do I put this code? I would like to get rid of the HTML used on my forum too, as when I first opened it many posts were made using HTML and I would prefer them to be BBCode. Where do I put the code?

Dismounted
02-03-2008, 10:54 AM
Thanks Hanson

I tried that but it strips everything *but* the font & size tags :) I have tried nutting this one out myself but am still lost.

Cheers
Bruce
In the second function you posted, you said it strips everything but the font/size tags. But my function does it as well? That's not really possible as I've reversed the conditions...
Where do I put this code? I would like to get rid of the HTML used on my forum too, as when I first opened it many posts were made using HTML and I would prefer them to be BBCode. Where do I put the code?
This strips BB Code, not HTML.

kansei
02-27-2008, 07:13 PM
Where do I put this code? I would like to get rid of the HTML used on my forum too, as when I first opened it many posts were made using HTML and I would prefer them to be BBCode. Where do I put the code?

+1

A new member on the forum I admin copied a post from a different forum, which is vBulletin BUT ALLOWS HTML in posts... bah!

It's seriously hundreds and hundreds of ugly Photobucket image tags with target="_blank" and oh so many other things.

I spent 20 minutes manually changing them all to vbcode img tags but.. I'm only 1/3 of the way through?
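
For a one-off job like this, a small script could do the conversion instead of editing by hand. The sketch below is only an illustration: htmlImagesToBBCode is a made-up name, it assumes each image is a normal img tag with a quoted src attribute, and a regex is not a real HTML parser, so unusual markup may slip through.

```php
<?php
// Sketch: turn <img src="URL"> tags into [img]URL[/img] and drop other HTML.
function htmlImagesToBBCode(string $html): string
{
    // Capture the quoted src value of each <img> tag.
    $withImages = preg_replace(
        '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1[^>]*>/i',
        '[img]$2[/img]',
        $html
    );

    // Remove whatever HTML is left, e.g. the wrapping <a target="_blank"> links.
    return strip_tags($withImages);
}

// Example call on pasted HTML held in a hypothetical $pasted variable:
// echo htmlImagesToBBCode($pasted);
```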

Poet PHP
02-28-2008, 05:17 AM
If you want to convert BBCode to HTML, use this:

// Load the vBulletin environment and the BBCode parser class.
require_once('./global.php');
require_once(DIR . '/includes/class_bbcode.php');
// Parse the BBCode in $message into HTML.
$bbcode_parser = new vB_BbCodeParser($vbulletin, fetch_tag_list());
$previewmessage = $bbcode_parser->parse($message);

And to remove the tags, use:

// Strip the BBCode first, then any remaining HTML tags.
$previewmessage = strip_tags(strip_bbcode($message, true, true));

See the result: www.akafi.net/tvv.php