Monday, February 14, 2005

Handling Asian languages (Japanese, Chinese, Hindi, Tamil, etc.) input from the browser

If you are working on an internationalization project, need to accept input from the browser in multiple languages, and are running into problems, read on.

UTF-8 is the most widely used encoding that covers all of these languages. When a page is rendered on the server side, it is easy to set the page encoding to UTF-8 and send it using
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>

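For input coming from the browser, however, the server unfortunately cannot dictate the encoding. Browsers normally submit a form in the encoding of the page that contains it, so if the page was served as UTF-8 we can generally assume the input is UTF-8 too. Before calling request.getParameter("field"), or reading any other parameter, we must set the request encoding to UTF-8 using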
request.setCharacterEncoding("UTF-8")
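
To see why this call matters, here is a minimal standalone sketch (no servlet container involved) of what happens when UTF-8 bytes are decoded as ISO-8859-1, which is the default the Servlet specification falls back to when no request encoding is set. The class name DecodeDemo and the helper decode are hypothetical names used only for illustration; the sample text is written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decode raw request bytes with the given charset,
    // roughly what a container does when it builds parameter strings.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, as a user might type them into a form field.
        String typed = "\u65E5\u672C\u8A9E";

        // The bytes a browser would send if the page was served as UTF-8.
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);

        String right = decode(wire, StandardCharsets.UTF_8);
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(typed)); // true: round trip is lossless
        System.out.println(wrong.equals(typed)); // false: text is garbled
        System.out.println(wrong.length());      // 9: each UTF-8 byte became one char
    }
}
```

The garbled result is what request.getParameter would hand back if the bytes were decoded with the wrong charset, which is why the encoding has to be set before the first parameter is read.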

Here the assumption is that the browser is using UTF-8, and so are we. Plain ASCII input still works, because US-ASCII is a subset of UTF-8. Accented Western European characters sent as ISO-8859-1, however, are not valid UTF-8 and will not decode correctly.
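
How these encodings interact with UTF-8 can be checked directly. The following is a small sketch, not tied to any servlet API; the class name AsciiSubsetDemo and the helper asUtf8 are hypothetical. It decodes US-ASCII bytes and ISO-8859-1 bytes as if they were UTF-8.

```java
import java.nio.charset.StandardCharsets;

class AsciiSubsetDemo {
    // Hypothetical helper: interpret raw bytes as UTF-8, as the server
    // would after request.setCharacterEncoding("UTF-8").
    static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII text has identical bytes in US-ASCII and UTF-8.
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(asUtf8(ascii).equals("hello")); // true

        // "cafe" with an accented e (U+00E9), encoded as ISO-8859-1.
        String cafe = "caf\u00E9";
        byte[] latin1 = cafe.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = asUtf8(latin1);

        System.out.println(decoded.equals(cafe));           // false
        // The lone 0xE9 byte is malformed UTF-8, so Java substitutes
        // the replacement character U+FFFD.
        System.out.println(decoded.indexOf('\uFFFD') >= 0); // true
    }
}
```

In other words, assuming UTF-8 is safe for ASCII-only input, but any non-ASCII character sent in ISO-8859-1 is lost.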

You can also explicitly switch your browser's encoding to UTF-8; in IE, use View -> Encoding -> UTF-8.
