10-03-2005, 03:13 PM
gr8dude
 
Join Date: Sep 2005
Location: Moldova
Posts: 12
Importing from a non-Unicode database into VB using a Unicode db

I'm new to this forum; I was directed here by a member of VB's support team. I'll just paste the history so that you can understand the issue I am dealing with.

I hope someone can provide a solution.
Quote:
We plan to run forums in several languages on one board, so the character set to go for is UTF-8. The problem is that our old forum, http://www.dekart.com/forum/, uses a different codepage, ISO-8859-5, and when we import the threads they don't show up correctly on the new forum if it is set to use UTF-8. If we switch it to ISO-8859-5, the characters are rendered correctly.

The language that causes the trouble is Russian. But since there will be French and Spanish too, we must go for UTF-8.

I tried the following: I took a dump of the database, opened it in a text editor and saved it as a Unicode file. Uploading it back did not work: the server itself did not accept the file, as it uses a different codepage.
answer:
1. Not sure I understand this one. However with vB3, you can set the HTML Character Set for each specific language on your forums.

follow-up
Quote:
1. I am aware of the fact that VB can use different languages for different forum sections, but we chose the Unicode-way for the following reason: The main page will consist of several forums [for each language], each of them including several child-forums [a child-forum per program title].

So the main page will show multiple languages on the same screen. Displayed in Unicode, that works: I can read umlauts, Cyrillic characters, and East European diacritics all at once.

On the other hand, if we use a different codepage for each forum, then visitors on the main screen will only be able to read correctly the text written in the currently active codepage. This might be acceptable in the case of umlauts [only the special characters will be unreadable], but Russian is a totally different thing.

This is not convenient because:
- One can read only one's own language, and sees gibberish in all the other languages
- Those who speak several languages miss the chance to browse the 'foreign' stuff in a convenient way
- As a moderator, I speak multiple languages, and I manage more than one section of the forum. I will have to switch codepages each time I need to do something

This is why we decided to switch to UTF-8. It works perfectly, but it raises the problem of the old threads in the old forum, which are stored in an ISO codepage, not Unicode.

I hope the situation is clearer now.

The software on the server is this:
php 4.3.9
apache 1.3.33
mysql 3.23.52

It runs on an older version of SuSE Linux, 8.something, if I recall correctly...

This issue really bothers me. We purchased VB about 2 months ago, and we still haven't deployed it because of this legacy problem.


There has to be a better way than manually posting all the 'problematic' messages. Perhaps someone wrote a tool that interacts directly with the database?
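
Short of a tool that talks to the database directly, one direction that might be worth trying is to re-encode the dump with a script instead of a text editor, so the conversion from ISO-8859-5 to UTF-8 is explicit. The following is only a minimal sketch in Python, under several assumptions: the dump really is ISO-8859-5 throughout, the file names, table name and column name are made up for illustration, and the dump contains no binary columns or length-prefixed values (for example PHP-serialized strings), which a blind re-encode would corrupt. It does not address whether the server will accept the result.

```python
import os
import tempfile


def reencode_dump(src_path, dst_path, src_encoding="iso8859_5", chunk_chars=65536):
    """Copy a text dump, decoding it from src_encoding and writing UTF-8."""
    # newline="" keeps the dump's original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


# Tiny self-check with a made-up one-line dump
# (file, table and column names are hypothetical).
workdir = tempfile.mkdtemp()
old_dump = os.path.join(workdir, "forum_iso.sql")
new_dump = os.path.join(workdir, "forum_utf8.sql")
statement = "INSERT INTO post (pagetext) VALUES ('Привет');\n"
with open(old_dump, "wb") as f:
    f.write(statement.encode("iso8859_5"))

reencode_dump(old_dump, new_dump)

with open(new_dump, "rb") as f:
    converted = f.read()
```

Reading in chunks keeps memory use flat for a large dump. The Cyrillic text takes more bytes after conversion, because UTF-8 uses two bytes per Cyrillic letter where ISO-8859-5 uses one.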

Well, this is it. I hope someone can help me figure out which direction to go in order to solve this; or at least 'scientifically prove' that there is no solution.

As I understand it, the 'bottleneck' is MySQL: it cannot work with Unicode files [I'm not an expert at that, so this might be wrong]. But in that case, how does VB use UTF-8 for the text?
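
To make the question concrete, here is a small Python sketch of one possible explanation. It assumes, and this is an assumption rather than something confirmed for this setup, that the database simply stores and returns whatever bytes the application sends, without knowing their encoding. Under that assumption UTF-8 text survives storage unchanged, and the imported threads look wrong only because their bytes are ISO-8859-5 while the page declares UTF-8. The function names are made up for illustration.

```python
def roundtrip_through_byte_store(text, encoding):
    """Simulate a charset-unaware text column: encode, keep the bytes, decode."""
    stored = text.encode(encoding)   # bytes the application would send
    return stored.decode(encoding)   # bytes read back with the same encoding


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "Привет"
legacy_bytes = sample.encode("iso8859_5")  # how the old forum stored it
utf8_bytes = sample.encode("utf-8")        # how the new forum expects it

# The same text, two different byte sequences.
same_text_after_storage = roundtrip_through_byte_store(sample, "utf-8") == sample
old_bytes_readable_as_utf8 = is_valid_utf8(legacy_bytes)
new_bytes_readable_as_utf8 = is_valid_utf8(utf8_bytes)
```

If this picture is right, the old posts need their bytes converted once, before or during the import, rather than any change to how the database stores text.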