  #1  
05-17-2011, 11:08 AM
av8or1
 
Join Date: Mar 2011
Posts: 58
Understanding post content unicode

Hi-

I am trying to include foreign language snippets into post content during post creation via script execution. I have read the information about the foreign language downloads that you can install in your forum and have read a few posts regarding unicode processing in vB, but I haven't found a way of doing this yet. Thus this post requesting guidance.

I haven't altered the unicode settings for my forum, mostly because my testing has shown that this didn't have an effect on the post content, though I could have missed something along the way. This is a summary of what I tested:

To begin, I simply added a new post interactively via the standard editor on my test forum. I cut and pasted the text, which is Cyrillic BTW, directly into the message and pressed the submit button. The text rendered correctly:

катушки

and upon viewing the source I saw that the standard HTML entities for the unicode characters that correspond to the entered text were displayed:

&#x43A;&#x430;&#x442;&#x443;&#x448;&#x43A;&#x438;

So my question is: how do I replicate this from a script? The data my script receives represents these characters in \uXXXX escape form (at least I think that's the standard form, though I seem to recall a %uXXXX form too), which is easily convertible to the HTML equivalent. However, when my script does that and then submits the post (via a post DM object), all I see are the above HTML entities in the text of that post. The same thing happened when I left them as \uXXXX.

So I dug around in the code. I tried applying html_entity_decode to the body of the post prior to submitting it, but that didn't have any effect. I dug further. I found a couple of interesting items that I was going to attempt next, specifically:

unhtmlspecialchars (vB function)
htmlspecialchars_decode (php function)

unhtmlspecialchars would need its second parameter set to true, or else it won't decode unicode entities, and I only saw this done in a couple of places in the vB code.

Anyway it was about midnight when I found these and so I haven't tried them yet, mostly because I'm not sure they will work or if it is even the correct approach to the problem.
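For reference, here is a minimal sketch of decoding numeric entities with html_entity_decode when the target charset is passed explicitly. One possible reason the call appeared to do nothing is that older PHP 5 releases defaulted to ISO-8859-1, which cannot represent Cyrillic, so the entities were left untouched. That is an assumption about the setup, not something confirmed in this thread. The decoded result is raw UTF-8, so it would only display correctly on a forum whose charset is UTF-8.

```php
<?php
// The Cyrillic word from the post above, as hex numeric entities (with semicolons).
$entities = '&#x43A;&#x430;&#x442;&#x443;&#x448;&#x43A;&#x438;';

// Pass the target charset explicitly instead of relying on the PHP default.
$decoded = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";
```

Whether the post should hold decoded UTF-8 or the entities themselves depends on the forum's charset setting.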

So in the end I'd just like to ask: has anyone else dealt with this issue already and if so, can you describe how you solved it? I conducted a few searches on vb.org/forum but didn't find anything.

Thanks!

Jerry
  #2  
05-17-2011, 03:33 PM
kh99
 
Join Date: Aug 2009
Location: Maine
Posts: 13,185

I haven't dealt with this before, but doing a little searching, I found this page: http://stackoverflow.com/questions/2...8-encoded-char

and adapting it a little, I found that if for example your message is saved in $message and looks like this: "\u043a\u0430\u0442\u0443\u0448\u043a\u0438", and you do something like this:

Code:
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE');
}
$message = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $message);

before sending $message to the dm object, then you will get a post with Cyrillic chars.
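A variant of the same idea that avoids the mbstring extension is sketched below. It builds decimal numeric entities directly from the hex digits. The helper name unicode_escapes_to_entities is hypothetical, the anonymous function assumes PHP 5.3 or later, and surrogate pairs (characters outside the Basic Multilingual Plane) are not combined. As far as I know, the 'HTML-ENTITIES' pseudo-encoding is deprecated in recent PHP releases, which is one reason to consider this form.

```php
<?php
// Hypothetical helper name; not part of vBulletin or PHP.
function unicode_escapes_to_entities($text)
{
    // Match a literal backslash, the letter u, and exactly four hex digits.
    return preg_replace_callback(
        '/\\\\u([0-9a-f]{4})/i',
        function ($match) {
            // hexdec() turns "043a" into 1082, which gives the entity &#1082;
            return '&#' . hexdec($match[1]) . ';';
        },
        $text
    );
}

// Single quotes keep the backslashes literal.
$message = '\u043a\u0430\u0442\u0443\u0448\u043a\u0438';
echo unicode_escapes_to_entities($message), "\n";
```

Text that contains no \uXXXX sequences passes through unchanged.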
  #3  
05-17-2011, 10:15 PM
av8or1
 
Join Date: Mar 2011
Posts: 58

Hi again kh99-

Thank you for the input. Websites like Stack Overflow and DevShed can be helpful sometimes, yes? I found this late last night too, but unfortunately it doesn't work. With this, the rendered page output is:

u043Au0430u0442u0443u0448u043Au0438

So I'm back at the drawing board. I'll post if I find a solution. If you have any additional ideas, feel free to share, I'd appreciate it.

Thanks!

Jerry

ps-I could have taken a left where I should have taken a right, so I am verifying at the moment...

pps-Back again: I found the place where I zigged where I should have zagged.

[scratching the top of my head]

Still working...
  #4  
05-18-2011, 12:04 AM
kh99
 
Join Date: Aug 2009
Location: Maine
Posts: 13,185

I tested it by making a plugin using newpost_process and this code:

Code:
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE');
}

$post['message'] = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', "\u043a\u0430\u0442\u0443\u0448\u043a\u0438");

And I get this: катушки

BTW, it's a test system. Don't put it on a live forum or every post will be катушки.
  #5  
05-18-2011, 09:45 PM
av8or1
 
Join Date: Mar 2011
Posts: 58

Hmmmm...ok. Well I only have this code in my script that is a module in the overall migration utility from Lefora to vB, so it won't be accessible in the forum, no.

I'm still puzzled because I have been seeing this error:

HTML Code:
<b>Fatal error</b>:  Cannot redeclare replace_unicode_escape_sequence() (previously declared in /home/russia/public_html/testvb/AddNewThreadPost.php:250) in <b>/home/russia/public_html/testvb/AddNewThreadPost.php</b> on line <b>250</b><br />
Where line 250 (and 251) is (are):

PHP Code:
function replace_unicode_escape_sequence($match){ return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE'); }
$postData[ 'body' ] = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $postData[ 'body' ] );
I also tried placing the function inline in the preg_replace_callback() invocation, but that produced an error saying it didn't recognize the HTML-ENTITIES encoding. I thought I had something wrong with the quotes in that variant, so I played around with it for a while, but couldn't get it to work. I must be doing something amiss...

Ack. Back at it today.

Thanks
  #6  
05-18-2011, 11:23 PM
kh99
 
Join Date: Aug 2009
Location: Maine
Posts: 13,185

Hmm...yeah, you can't redefine the function over and over, but defining it once at the beginning (outside any loop) should work, or doing something like:

Code:
if (!function_exists("replace_unicode_escape_sequence")) {
    function replace_unicode_escape_sequence($match) {
        return mb_convert_encoding(pack('H*', $match[1]), 'HTML-ENTITIES', 'UCS-2BE');
    }
}

Maybe HTML-ENTITIES is new to PHP5? Doesn't look like it - or at least it's not listed in the changes.
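On the redeclaration error, a rough sketch of the "define it once, outside any loop" layout might look like the following. The $posts array is a hypothetical stand-in for whatever rows a migration script processes. The callback here builds the entity with hexdec() rather than mb_convert_encoding, which is a substitution made only to keep the sketch free of the mbstring extension.

```php
<?php
// Declared once at file scope, before the loop, so PHP never sees it twice.
function replace_unicode_escape_sequence($match)
{
    // Substituted technique: build the decimal entity directly from the hex digits.
    return '&#' . hexdec($match[1]) . ';';
}

// Hypothetical stand-in for the rows a migration script would walk through.
$posts = array(
    array('body' => 'first: \u043a\u0430'),
    array('body' => 'second: \u0442\u0443'),
);

foreach ($posts as $i => $postData) {
    // Only the call sits inside the loop; the declaration is not repeated.
    $posts[$i]['body'] = preg_replace_callback(
        '/\\\\u([0-9a-f]{4})/i',
        'replace_unicode_escape_sequence',
        $postData['body']
    );
}

print_r($posts);
```

If the file holding the function is pulled in more than once, include_once or the function_exists() guard above would be the usual ways to avoid the same error.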
  #7  
05-19-2011, 12:17 AM
av8or1
 
Join Date: Mar 2011
Posts: 58

kh99-

Much thanks again. I'm not sure if HTML-ENTITIES is new to PHP5 or not, but I saw it in the list of supported encodings of the PHP version (5-something) that my webhosting service provides, so I knew I was good-to-go in that regard.

Anyway. I found the error of my ways and now this little trick worked. Hopefully this can help someone in the future.

Jerry