Thursday, February 03, 2005

UTF-8 Encoded Data

Today I ran into the problem of storing Japanese characters in ResourceBundles (Struts framework/Java). In UTF-8, which is of course a variable-length encoding, each of these Japanese characters takes 3 bytes. Java, however, reads the .properties files behind a ResourceBundle as ISO-8859-1 (Latin-1), a single-byte encoding that cannot represent Japanese, so the characters can't simply be saved in the file as they are. The only way to store them is as Unicode escapes of the form \uXXXX. To convert the text to that escaped form I used the native2ascii tool that comes with the JDK.

Before Conversion
今すぐ予約

After conversion using native2ascii
\u4eca\u3059\u3050\u4e88\u7d04
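
For anyone curious what the conversion amounts to, here is a rough Java sketch of it, not the actual native2ascii source. It assumes the tool leaves ASCII untouched and writes every other character as a four-digit hex escape, which matches the output above. The class name EscapeSketch and the key button.reserve are made up for illustration. The second method loads the escaped line through java.util.Properties to check that Java decodes it back to the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class EscapeSketch {

    // Rough equivalent of the conversion: keep ASCII as-is and write every
    // other UTF-16 code unit as a backslash, the letter u, and four hex digits.
    static String toEscaped(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Parse properties text and return the decoded value stored under the key.
    static String loadValue(String propertiesText, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The Japanese text from above, written with escapes so the source
        // file itself stays plain ASCII.
        String japanese = "\u4eca\u3059\u3050\u4e88\u7d04";

        String escaped = toEscaped(japanese);
        System.out.println(escaped);

        // Hypothetical key name; a real bundle would use its own keys.
        String roundTrip = loadValue("button.reserve=" + escaped, "button.reserve");
        System.out.println(roundTrip.equals(japanese));
    }
}
```

Running main prints the same escaped line shown above, followed by true, so the escaped form round-trips through the properties loader.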

