Thursday, 11 October 2007

Greek and Java

Under this subject I collect the issues that a Java programmer may encounter when handling Greek letters in Java programs. This information may also be useful to programmers working with other non-English languages.

Issue 1: Apache Tomcat doesn't display Greek

Question marks (????) are displayed instead of the Greek characters.

Solution
You have to perform the following steps:
  1. Edit the Connector element in conf/server.xml and add the URIEncoding attribute:

    <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="ISO-8859-7" />

    Use URIEncoding="UTF-8" instead if that is what your application expects.

  2. If you are using a servlet, add the following lines at the very start of your doGet() method, before you read any request parameter or obtain the response writer:

    request.setCharacterEncoding("ISO-8859-7"); // or UTF-8
    response.setCharacterEncoding("ISO-8859-7"); // or UTF-8


  3. If you are using a JSP, include the following directive at the top of the JSP page (replace UTF-8 with ISO-8859-7 if needed):

    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
  4. You may need to add this line too
    // or ISO-8859-7
  5. If the GET request is encoded differently from URIEncoding (e.g. the request uses ISO-8859-7 but URIEncoding="UTF-8"), call request.getQueryString() to get the original query string, which Tomcat has not decoded with URIEncoding. Parse it manually to extract the raw parameter value into a variable, e.g. myVar, and then decode it yourself, e.g. with URLDecoder.decode(myVar, "ISO-8859-7").
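
The manual parsing described in step 5 could look roughly like the sketch below. The class name QueryStringDecoder and its parse method are made up for illustration and are not part of Tomcat or the servlet API; only URLDecoder.decode comes from the step above. The sketch assumes that parameters are separated by & and that a repeated parameter name may simply overwrite the earlier value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tomcat or the servlet API.
public class QueryStringDecoder {

    // Splits a raw query string such as "a=1&b=2" and percent-decodes
    // every name and value with the given encoding.
    // If a name occurs more than once, the last value wins.
    public static Map<String, String> parse(String rawQuery, String encoding) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        try {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                params.put(URLDecoder.decode(name, encoding),
                           URLDecoder.decode(value, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return params;
    }

    public static void main(String[] args) {
        // %E1%E2%E3 are the ISO-8859-7 bytes of the letters alpha, beta, gamma.
        Map<String, String> params = parse("myVar=%E1%E2%E3&lang=el", "ISO-8859-7");
        System.out.println(params.get("myVar"));
    }
}
```

Inside a servlet you would pass request.getQueryString() as the first argument, for example QueryStringDecoder.parse(request.getQueryString(), "ISO-8859-7").get("myVar").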

Note: This information applies to JBoss too, as it uses Tomcat as its web container.

References:
  1. JHUG forum link
  2. Apache Tomcat wiki
  3. The absolute minimum about Unicode and Character Sets

Issue 2: Greek support in Eclipse IDE

Right-click an Eclipse project and select Properties. On the Info page, in the Text file encoding box, select one of the following encodings:
  • ISO-8859-7
  • Cp1253
  • UTF-8

Issue 3: Greek support in NetBeans IDE

Right-click a project and select Properties. In the Encoding property, select one of the following encodings:
  • ISO-8859-7
  • Windows-1253
  • UTF-8

Issue 4: Greek support in MySQL

Use one of the following collations: utf8_unicode_ci or utf8_general_ci. The character set should be utf8 (MySQL's name for UTF-8).
If you are using phpMyAdmin, you can set this while creating the database. If you have already created the database, you can change its collation on the Operations tab.
If you still cannot see Greek, edit my.cnf and add the following line under the [mysqld] section:
character-set-server = utf8

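To use MySQL together with Tomcat, specify the encoding in the JDBC URL of the data source. Connection parameters are separated by &, which must be written as &amp; inside an XML file:

url="jdbc:mysql://localhost/timesheet?useUnicode=true&amp;characterEncoding=utf8"

Use characterEncoding=ISO-8859-7 instead if your data is stored in that encoding.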

Issue 5: Define encoding

Whenever you open/save a text file, specify the character encoding and don't rely on the OS default encoding:

// Reading: name the charset explicitly, either as a String or as a Charset
Reader r1 = new InputStreamReader(new FileInputStream(file), "UTF-8"); // or ISO-8859-7
Reader r2 = new InputStreamReader(new FileInputStream(file), Charset.forName("ISO-8859-7"));
// Writing
Writer w1 = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-7"); // or UTF-8
Writer w2 = new OutputStreamWriter(new FileOutputStream(file), Charset.forName("UTF-8"));
// Converting between bytes and strings
String s = new String(byteArray, "UTF-8");
byte[] a = string.getBytes("UTF-8"); // or Cp1253
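
As a self-contained illustration of the calls above, the following sketch writes Greek text to a temporary file and reads it back with the same explicitly named charset. The class ExplicitEncodingDemo and its methods are invented for this example, and the Greek letters are written as \u escapes so that the result does not depend on the encoding of the source file itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class; the names are illustrative only.
public class ExplicitEncodingDemo {

    // Writes text to a temporary file and reads it back, using the same
    // explicitly named charset in both directions.
    public static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        try {
            File file = File.createTempFile("greek", ".txt");
            try {
                try (Writer w = new OutputStreamWriter(new FileOutputStream(file), charset)) {
                    w.write(text);
                }
                StringBuilder sb = new StringBuilder();
                try (Reader r = new InputStreamReader(new FileInputStream(file), charset)) {
                    char[] buf = new char[1024];
                    int n;
                    while ((n = r.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                }
                return sb.toString();
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of bytes the text occupies in the given charset.
    public static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3\u03b4"; // alpha, beta, gamma, delta
        System.out.println(roundTrip(greek, "UTF-8").equals(greek));
        System.out.println(encodedLength(greek, "ISO-8859-7")); // one byte per letter
        System.out.println(encodedLength(greek, "UTF-8"));      // two bytes per letter
    }
}
```

The different byte counts show why the reader and the writer have to agree on the charset: the same four letters are stored as different byte sequences in ISO-8859-7 and in UTF-8.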

Issue 6: Greek locale
private static final Locale GREEK_LOCALE = new Locale("el", "GR");
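
A constant like this is typically passed to locale-sensitive methods. The sketch below shows two possible uses. The class GreekLocaleDemo and its methods are assumptions made for this example, and the exact number formatting depends on the locale data shipped with your JDK.

```java
import java.util.Locale;

// Hypothetical demo class; only the GREEK_LOCALE constant comes from the text above.
public class GreekLocaleDemo {

    private static final Locale GREEK_LOCALE = new Locale("el", "GR");

    // Upper-cases text with the Greek locale instead of the OS default locale.
    public static String toUpper(String s) {
        return s.toUpperCase(GREEK_LOCALE);
    }

    // Returns the language and country codes of the locale, e.g. for logging.
    public static String describe() {
        return GREEK_LOCALE.getLanguage() + "_" + GREEK_LOCALE.getCountry();
    }

    public static void main(String[] args) {
        System.out.println(toUpper("\u03b1\u03b2\u03b3\u03b4")); // alpha..delta in upper case
        System.out.println(describe());
        // Locale-aware number formatting with Greek grouping and decimal separators.
        System.out.println(String.format(GREEK_LOCALE, "%,.2f", 1234.5));
    }
}
```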

Issue 7: Greek and JEditorPane
Unfortunately, JEditorPane supports only HTML 3.2 (and arguably only to some extent), while the named Greek entities were added in HTML 4.0. This means that

jEditorPane.setContentType("text/html");
jEditorPane.setText("αβγδ");
System.out.println(jEditorPane.getText());

will not return the Greek characters but something like

&#945;&#946;&#947;&#948;

You can try

jEditorPane.setContentType("text/html;charset=utf-8");
jEditorPane.setText(new String("αβγδ".getBytes("utf-8"), "utf-8"));

but it still doesn't work.

A solution is simply to use

jEditorPane.setContentType("text/plain");

In any case, it is better to work with the text in the variable where it is stored than to read it back with jEditorPane.getText().
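
If you nevertheless have to work with the output of getText() from an HTML pane, one possible workaround, which is not part of the approach described above, is to turn the decimal character references back into characters yourself. The sketch below does this with a regular expression. The class NumericEntityDecoder is hypothetical, and it handles only decimal references of the form &#NNN;, not named or hexadecimal ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Swing.
public class NumericEntityDecoder {

    // Matches decimal character references such as &#945; (at most 7 digits).
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces every decimal character reference with the character it denotes.
    // References that are not valid Unicode code points are left unchanged.
    public static String decode(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // 945..948 are the code points of alpha, beta, gamma, delta.
        System.out.println(decode("&#945;&#946;&#947;&#948;"));
    }
}
```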

References:
  1. Java Anti-Patterns