The Archive of vBulletin Modifications Site.

av8or1 · #1 05-17-2011, 10:08 AM

Hi-

I am trying to include foreign language snippets into post content during post creation via script execution. I have read the information about the foreign language downloads that you can install in your forum and have read a few posts regarding unicode processing in vB, but I haven't found a way of doing this yet. Thus this post requesting guidance.

I haven't altered the unicode settings for my forum, mostly because my testing has shown that this didn't have an effect on the post content, though I could have missed something along the way. This is a summary of what I tested:

To begin I simply added a new post interactively via the standard editor on my test forum. I cut and pasted the text, which is cyrillic BTW, directly into the message and pressed the submit button. The text rendered correctly:

катушки

and upon viewing the source I saw that the standard HTML entities for the unicode characters that correspond to the entered text were displayed:

&#x43A&#x430&#x442&#x443&#x448&#x43A&#x438

So my question is how to replicate this via a script. The data that my script receives has these characters represented in their unicode \u0XXX form (at least I think that's the standard form, though I seem to recall the %uXXX form too), which is easily convertible to the HTML equivalent. However, when my script does that and then submits the post (via a post DM object), all I see are the above HTML entities in the textual content of that post. And of course the same result occurred when leaving them as \u0XXX.

So I dug around in the code. I tried applying html_entity_decode to the body of the post prior to submitting it, but that didn't have any effect. I dug further. I found a couple of interesting items that I was going to attempt next, specifically:

unhtmlspecialchars (vB function)
htmlspecialchars_decode (php function)

The unhtmlspecialchars function would need its second parameter set to true or else it won't decode unicode entities, and I only saw this done in a couple of places within all of the vB code.

Anyway it was about midnight when I found these and so I haven't tried them yet, mostly because I'm not sure they will work or if it is even the correct approach to the problem.
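
For reference, PHP's html_entity_decode() only turns numeric entities into real characters when the target charset can represent them. A minimal sketch, assuming the script works in UTF-8 (an assumption; the forum's charset is not stated here) and that the entities carry their trailing semicolons:

```php
<?php
// Hypothetical input: the example word written as hex numeric entities.
$entities = '&#x43A;&#x430;&#x442;&#x443;&#x448;&#x43A;&#x438;';

// Passing 'UTF-8' explicitly matters on older PHP versions, whose default
// charset for this function (ISO-8859-1) cannot hold Cyrillic characters.
$utf8 = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo $utf8, "\n";
```

Whether raw UTF-8 text then displays correctly depends on the forum's own charset, so this may not be a substitute for keeping the text as entities.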

So in the end I'd just like to ask: has anyone else dealt with this issue already and if so, can you describe how you solved it? I conducted a few searches on vb.org/forum but didn't find anything.

Thanks!

Jerry

kh99 · #2 05-17-2011, 02:33 PM

I haven't dealt with this before, but doing a little searching, I found this page: http://stackoverflow.com/questions/2...8-encoded-char

and adapting it a little, I found that if for example your message is saved in $message and looks like this: "\u043a\u0430\u0442\u0443\u0448\u043a\u0438", and you do something like this:

Code:

// Convert one matched \uXXXX escape into its HTML numeric entity.
function replace_unicode_escape_sequence($match)
{
    return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE');
}

$message = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $message);

before sending $message to the dm object, then you will get a post with cyrillic chars.
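
The same idea can be sketched without mbstring, which may be useful on PHP builds where the HTML-ENTITIES encoding is not available. The function name below is hypothetical, and the sketch only covers four-digit \uXXXX escapes (no surrogate pairs):

```php
<?php
// Hypothetical helper (not from the thread): turns one matched \uXXXX escape
// into a decimal numeric entity such as &#1082;.
function unicode_escape_to_decimal_entity($match)
{
    // $match[1] holds the four hex digits captured by the regex below.
    return '&#' . hexdec($match[1]) . ';';
}

// Single quotes keep the backslashes literal in the sample input.
$message = '\u043a\u0430\u0442\u0443\u0448\u043a\u0438';
$message = preg_replace_callback(
    '/\\\\u([0-9a-f]{4})/i',
    'unicode_escape_to_decimal_entity',
    $message
);

echo $message, "\n";
```

This emits decimal entities (&#1082; and so on) rather than the hex form, which is an illustrative choice here and not something the thread prescribes.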

av8or1 · #3 05-17-2011, 09:15 PM

Hi again kh99-

Thank you for the input. Websites like stackoverflow and devshed can be helpful sometimes yes?

I found this late last night too, but unfortunately it doesn't work. With this the output (page rendering) is then:

u043Au0430u0442u0443u0448u043Au0438

So I'm back at the drawing board. I'll post if I find a solution. If you have any additional ideas, feel free to share, I'd appreciate it.

Thanks!

Jerry

ps-I could have taken a left where I should have taken a right, so I am verifying at the moment...

pps-Back again: I found the place where I zigged where I should have zagged.

[scratching the top of my head]

Still working...

kh99 · #4 05-17-2011, 11:04 PM

I tested it by making a plugin using newpost_process and this code:

Code:

function replace_unicode_escape_sequence($match)
{
    return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE');
}

// Hard-coded test string; single quotes keep the backslashes literal.
$post['message'] = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', '\u043a\u0430\u0442\u0443\u0448\u043a\u0438');

And I get this: катушки

BTW, it's a test system. Don't put it on a live forum or every post will be катушки.

av8or1 · #5 05-18-2011, 08:45 PM

Hmmmm...ok. Well I only have this code in my script that is a module in the overall migration utility from Lefora to vB, so it won't be accessible in the forum, no.

I'm still puzzled because I have been seeing this error:

HTML Code:

<b>Fatal error</b>:  Cannot redeclare replace_unicode_escape_sequence() (previously declared in /home/russia/public_html/testvb/AddNewThreadPost.php:250) in <b>/home/russia/public_html/testvb/AddNewThreadPost.php</b> on line <b>250</b><br />

Where line 250 (and 251) is (are):

PHP Code:


			
function replace_unicode_escape_sequence($match){ return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE'); }

$postData[ 'body' ] = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $postData[ 'body' ] );

I also attempted placing the function in-line with the preg_replace_callback() invocation, but that yielded an error message stating that it didn't know what the HTML-ENTITIES encoding was. So I thought I had something wrong with the quotes in that variant and I played around with it for a while. Couldn't get it to work. I must be doing something amiss...

Ack. Back at it today.

Thanks

kh99 · #6 05-18-2011, 10:23 PM

Hmm...yeah, you can't redefine the function over and over, but defining it once at the beginning (outside any loop) should work, or doing something like:

Code:

if (!function_exists("replace_unicode_escape_sequence")) {
    function replace_unicode_escape_sequence($match) { return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE'); }
}
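
Put together, the define-once advice might look like the sketch below. The $posts array and the function name are placeholders, not part of the actual migration script, and the helper body uses hexdec() in place of mb_convert_encoding() only so that the example does not depend on mbstring:

```php
<?php
// Declared once, before the loop, and guarded in case this file is included twice.
if (!function_exists('example_unicode_escape_to_entity')) {
    function example_unicode_escape_to_entity($match)
    {
        return '&#' . hexdec($match[1]) . ';';
    }
}

// Hypothetical batch of post bodies.
$posts = array(
    array('body' => 'first: \u043a\u0430'),
    array('body' => 'second: \u0442\u0443'),
);

foreach ($posts as $i => $postData) {
    // Only the call sits inside the loop; the function definition does not.
    $posts[$i]['body'] = preg_replace_callback(
        '/\\\\u([0-9a-f]{4})/i',
        'example_unicode_escape_to_entity',
        $postData['body']
    );
}
```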

Maybe HTML-ENTITIES is new to PHP5? Doesn't look like it - or at least it's not listed in the changes.

av8or1 · #7 05-18-2011, 11:17 PM

kh99-

Much thanks again. I'm not sure if HTML-ENTITIES is new to PHP5 or not, but I saw it in the list of supported encodings of the PHP version (5-something) that my webhosting service provides, so I knew I was good-to-go in that regard.

Anyway. I found the error of my ways and now this little trick worked. Hopefully this can help someone in the future.

Jerry
