evannn
01-22-2008, 11:14 AM
Hi everyone,
We've set up a test server to try converting our MySQL database from latin1 to Unicode (UTF-8).
The main purpose is to support the non-English forums.
We modified MySQL's configuration and added the following:
[CLIENT]
default-character-set=utf8
[MYSQLD]
default-character-set=utf8
collation_server=utf8_unicode_ci
character_set_server=utf8
init-connect='SET NAMES utf8'
Next, under Languages & Phrases, we changed the language settings to UTF-8.
Lastly, we ran a conversion on all vBulletin tables, changing every instance of latin1 to utf8_unicode_ci. We changed the database collation to utf8_general_ci too.
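For reference, a minimal sketch of how the per-table conversion statements could be generated. This is an illustration, not necessarily the method used on the test server: the function name and the table names `post` and `thread` are placeholders, and a real install may use a table prefix.

```python
def build_convert_statements(tables, charset="utf8", collation="utf8_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Backtick-quote the identifier, doubling any backtick inside it.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table names, used only for illustration.
    for sql in build_convert_statements(["post", "thread"]):
        print(sql)
```

Note that CONVERT TO re-encodes the stored text on the assumption that it really is in the column's declared character set. If UTF-8 bytes had already been saved into latin1 columns, this kind of conversion would double-encode them, so it is worth running against a backup first.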
Things seem to work fine. All non-English data appears as-is (no HTML entities were detected!)
But we noticed 2 peculiar issues:
1. Line break issues with non-English data when posting new threads and replies
E.g., assuming the following is Unicode text:
Para 1. XXXXXXX
Para 2. YYYYYYYY
Any text AFTER the LINE BREAK doesn't get inserted into the database. I've checked the tables: only the text before the line break goes in. The rest simply disappears (Para 2 onwards is not saved to the db).
Is this a bug? Should I submit a ticket?
2. Whenever we post a new thread, random data gets inserted on its own. The data seems to be pulled at random from other posts.
Any advice?
Thanks
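One thing that may be worth ruling out for issue 1 (an assumption, not a confirmed diagnosis): when MySQL is not in strict mode, a value written to a utf8 column is cut off at the first byte sequence it cannot store, which includes invalid UTF-8 and characters that need 4 bytes. The sketch below reports the byte offset where that would happen for a given post body. The function name is made up for illustration.

```python
from typing import Optional


def first_unstorable_offset(data: bytes) -> Optional[int]:
    """Byte offset of the first sequence a 3-byte utf8 column would reject.

    Returns None when every byte sequence is valid UTF-8 and every
    character fits in 3 bytes (code points up to U+FFFF).
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return exc.start

    offset = 0
    for ch in text:
        if ord(ch) > 0xFFFF:
            # Valid UTF-8, but needs 4 bytes.
            return offset
        offset += len(ch.encode("utf-8"))
    return None


if __name__ == "__main__":
    # A latin1-encoded byte (0xE9) right after the line break.
    print(first_unstorable_offset(b"Para 1\n\xe9x"))
```

If the offset reported for a truncated post matches the point where the saved text ends, the bytes reaching the server are probably not the UTF-8 that the connection settings promise.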
--------------- Added 1201007886 at 1201007886 ---------------
I'm thinking of using iconv to convert the encoding manually. I've also thought of re-creating the database from scratch and loading the iconv-ed data into it.
But it seems pretty irrelevant
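A rough Python equivalent of running iconv over a dump file, as a sketch only. The function name and file names are hypothetical, and the source encoding is an assumption: MySQL's latin1 is closer to Windows cp1252 than to strict ISO-8859-1, so `src_encoding="cp1252"` may be the better choice for a real dump.

```python
import os
import tempfile


def reencode_dump(src_path, dst_path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy a text dump from src_path to dst_path, re-encoding it on the way."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Read in 1 MiB chunks so large dumps do not have to fit in memory.
        for chunk in iter(lambda: src.read(1 << 20), ""):
            dst.write(chunk)


if __name__ == "__main__":
    # Demo with a throwaway file; the file names are placeholders.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "dump_latin1.sql")
    dst = os.path.join(workdir, "dump_utf8.sql")
    with open(src, "wb") as handle:
        handle.write(b"INSERT INTO post VALUES ('caf\xe9');\n")
    reencode_dump(src, dst)
    with open(dst, "rb") as handle:
        print(handle.read())
```

A dump produced by mysqldump usually also carries its own character set declarations (for example a SET NAMES line and DEFAULT CHARSET clauses), and those would need to agree with the new encoding before the file is loaded.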