
Character encoding is one of those things that needs careful attention all the time. My usual take is: "UTF-8 is the answer. What was the question?" UTF-8 has some handy properties, such as being compatible with ASCII, while being able to store any Unicode character. It's described nicely in "What Is UTF-8 And Why Is It Important?" and more generally in the "Java Internationalization" book.
But other encodings are available, and I recently had to deal with UTF-7. This was a new one on me, but if you have to deal with low-level email encoding, you'll find it's surprisingly popular. The big thing about UTF-7 is that content can be included in SMTP without having to wrap it in base64 or some other transfer encoding.
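To make that concrete, here is a minimal sketch of the encoding direction, assuming the rules in RFC 2152: letters, digits and a small set of punctuation pass through unchanged, a literal '+' becomes "+-", and everything else is written as UTF-16BE in unpadded base64 between '+' and '-'. The class name Utf7Sketch and its methods are made up for illustration. This is not the code from either library mentioned below, and it only encodes (no decoding, and it ignores the optional direct characters).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: a conservative UTF-7 encoder following RFC 2152.
class Utf7Sketch {
    // Punctuation and whitespace that RFC 2152 allows to appear unencoded.
    private static final String DIRECT_PUNCTUATION = "'(),-./:? \t\r\n";

    static boolean isDirect(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || DIRECT_PUNCTUATION.indexOf(c) >= 0;
    }

    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isDirect(c)) {
                out.append(c);
                i++;
            } else if (c == '+') {
                // '+' opens a base64 run, so a literal plus is escaped.
                out.append("+-");
                i++;
            } else {
                // Collect a run of characters that need encoding.
                int start = i;
                while (i < text.length()
                        && !isDirect(text.charAt(i))
                        && text.charAt(i) != '+') {
                    i++;
                }
                byte[] utf16 = text.substring(start, i)
                        .getBytes(StandardCharsets.UTF_16BE);
                out.append('+')
                   .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
                   .append('-');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Hi Mom \u263A")); // prints "Hi Mom +Jjo-"
        System.out.println(encode("1 + 1"));         // prints "1 +- 1"
    }
}
```

The output uses only 7-bit ASCII characters, which is why it can travel through SMTP as-is.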
UTF-7 is supposed to be dead because, despite its plus points, we now have things like 8BITMIME. In practice, though, it's not at all dead, and for the Java programmer this is a problem because the platform does not support UTF-7.
No panic: the language has an extension mechanism in the CharsetProvider class. Go find an open source library and drop it in. There are a couple: Zimbra have a Mozilla-licensed implementation (thanks to Andy for spotting this), and there's a GPL version called JCharSet that can also be purchased under a commercial license. I went with the Zimbra one:
$ cvs -z6 -d :pserver:anonymous:@cvs.zimbra.com:/usr/local/cvsroot co main
$ cd main/ZimbraCharset
$ ant
... which produces build/zimbra-charset.jar.
Drop the jar into your project, and you're away with code like:
Charset utf7 = Charset.forName("UTF-7");
However, drop that library and code into a web application, and it won't work. Didn't for me, with Tomcat. You might be able to drop the library into the JRE/lib/ext folder or similar places, but I don't like that option. Instead I thought I'd try to understand what's going on. The short answer is: I don't know. The documentation says:
"Charset providers may be installed in an instance of the Java platform as extensions, that is, jar files placed into any of the usual extension directories. Providers may also be made available by adding them to the applet or application class path or by some other platform-specific means. Charset providers are looked up via the current thread's context class loader."
I'd have thought that WEB-INF/lib is a good extension directory, but I suspect the phrase "extension directory" is being used in a specific technical sense, so perhaps WEB-INF/lib doesn't apply. Maybe it's a security issue, or perhaps it's a bug. So for now, I've used an explicit request along the lines of:
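// Use UTF-7 if the platform can already find it; otherwise ask the
// Zimbra provider directly instead of relying on the provider lookup.
final Charset charset;
if (!Charset.isSupported("UTF-7"))
{
    ZimbraCharsetProvider zcpi = new ZimbraCharsetProvider();
    charset = zcpi.charsetForName("UTF-7");
}
else
{
    charset = Charset.forName("UTF-7");
}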
It's a pain, but at least that works.
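A more general variant of the same workaround, as a sketch only: rather than naming one provider class, ask java.util.ServiceLoader for every CharsetProvider visible to a class loader you choose, such as the web application's own. This assumes Java 6 or later and assumes the charset jar registers its provider in META-INF/services/java.nio.charset.spi.CharsetProvider; whether the Zimbra jar does so is not confirmed here. CharsetLookup, find and MyServlet are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import java.nio.charset.spi.CharsetProvider;
import java.util.ServiceLoader;

// Illustrative helper, not part of any library mentioned in this post.
class CharsetLookup {
    /**
     * Returns the named charset, trying the normal platform lookup first and
     * then any CharsetProvider registered in jars visible to the given loader.
     */
    static Charset find(String name, ClassLoader loader) {
        if (Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        // Explicitly scan providers that the chosen class loader can see.
        for (CharsetProvider provider : ServiceLoader.load(CharsetProvider.class, loader)) {
            Charset candidate = provider.charsetForName(name);
            if (candidate != null) {
                return candidate;
            }
        }
        throw new UnsupportedCharsetException(name);
    }

    // Possible use inside a web app (MyServlet is a placeholder class):
    //   Charset utf7 = CharsetLookup.find("UTF-7", MyServlet.class.getClassLoader());
}
```

Passing the loader explicitly keeps the fix inside the WAR, so the container's classpath doesn't need to change.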
Given that Java is going GPL, I presume I can take the JCharSet UTF-7 implementation and submit it as a bug fix and get this sorted once and for all in the platform...


5 Comments:
Tomcat's class loader order requires that you place your extra charset providers in a particular place so that they're loaded in time for your app to see them. If you stick charset.jar in tomcat/common/endorsed, you'll be golden.
Dan - many thanks for that comment. I suspect it'll be of great use to anyone stumbling on this entry. I guess I should just clarify that my concern is that if I distribute an application (WAR file) that requires and supplies a UTF-7 implementation it seems "smelly" that I should need to mess with the container's classpath (if you see what I mean). Especially so in a shared Tomcat environment. That might just be me having too high an expectation of what can be reasonably achieved.
But yes... thank you for clarifying the details for Tomcat: that is appreciated.
After a quick search it seems you cannot do what you want.
The following explains several scenarios and how you might package them.
http://java.sun.com/j2ee/verified/packaging.html
Just wanted to point out another implementation:
http://sourceforge.net/projects/jutf7/
It has an MIT license, so it's easily used in all kinds of projects, and is pre-packaged as opposed to the Zimbra one.
Thank you very much for this post. It was very helpful for me.
Regards