Richard's technical notes

Wednesday, December 13, 2006

Generics in Comments

For a project I'm involved in that is stuck using JDK 1.4, I've found doing this soothing:

List/*<Stuff>*/ getStuff();

It's documentation, and gives me the hope that one day I can remove the comments for this particular project.
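A slightly fuller sketch of the same trick, in case it helps. The class name StuffHolder and the use of String as the element type are made up for illustration; the point is only that the commented type parameters are invisible to a 1.4 compiler, so callers still cast by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not from the project mentioned above.
class StuffHolder {
    // The compiler sees a raw List; the comment records the intended element type.
    static List/*<String>*/ getStuff() {
        List/*<String>*/ stuff = new ArrayList/*<String>*/();
        stuff.add("first");
        return stuff;
    }

    public static void main(String[] args) {
        // The explicit cast stays until the comment markers can be deleted.
        String first = (String) getStuff().get(0);
        System.out.println(first);
    }
}
```

On a newer compiler this builds with raw-type warnings, which is roughly the reminder you want.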

Tuesday, December 05, 2006

Next Net

Thanks to Ewan Silver and others for organizing the Next Net meet up. It's sort of a London Jini Group, but more general: a gathering of people interested in distributed (utility computing, grid computing, massively scalable, resilient) systems.

I unfortunately missed a large chunk of Dan Creswell's presentation on Utility Computing with Amazon EC2. What I could figure out rekindled my interest in these technologies.

For example: on my way over to the meeting I was wondering how best to distribute database load. When I walked in, the presentation was discussing doing away with things like relational databases to obtain big scalability, and shifting mindsets away from Java EE. Ah. So think different.

Why would you bother? I think I'm finally starting to get utility computing. If you've worked in an environment where budgets for hardware involve a lot of hoop-jumping, it suddenly makes sense to be able to call on a dynamic network of resources at the moment you need them. What's more, it's looking like it's cheap. There's effort in building systems that take advantage of that kind of environment, but if the cost savings are there, the pressure will come to build them that way.

It's interesting stuff, and now I have a larger-than-usual reading list as a result: MapReduce, Bigtable, Chubby, Google File System, Hadoop, the 10th Jini Community Meeting papers, and the Amazon Web Services Developer Connection docs.

The next Next Net will be Thur 18 Jan 2007 at Brunel.

Friday, December 01, 2006

UTF-7

Picture of a character encoding problem from a TV set-top box display

Character encoding is one of those things that needs careful attention all the time. My usual take is: "UTF-8 is the answer. What was the question?" UTF-8 has some handy properties, such as being compatible with ASCII, while being able to store any Unicode character. It's described nicely in What Is UTF-8 And Why Is It Important? and more generally in the Java Internationalization book.
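A small sketch of the ASCII-compatibility point. The class and method names are made up for illustration, and it uses StandardCharsets, which needs a newer JDK than 1.4. It checks that pure ASCII text produces identical bytes in both encodings, and shows that characters outside ASCII simply take more bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class Utf8AsciiDemo {
    // True when the text encodes to the same bytes in UTF-8 and US-ASCII.
    static boolean sameBytesAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(sameBytesAsAscii("plain text")); // pure ASCII input
        System.out.println(utf8Length("\u00e9"));           // e-acute
        System.out.println(utf8Length("\u20ac"));           // euro sign
    }
}
```

The e-acute comes out as two bytes and the euro sign as three, while the ASCII string is byte-for-byte unchanged.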

But other encodings are available, and I recently had to deal with UTF-7. This was a new one on me, but if you have to deal with low-level email encoding, you'll find it's surprisingly popular. The big thing about UTF-7 is that content can be included in SMTP without having to wrap it in base64 or some other transfer encoding.
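To give a feel for what UTF-7 text looks like on the wire, here is a minimal encoder sketch. It is an illustration under assumptions, not a complete implementation: it only passes through the RFC 2152 "Set D" characters plus whitespace, always closes a shifted run with an explicit '-', and has no decoder. The class name is made up, and java.util.Base64 needs Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical, encode-only sketch of UTF-7; not the Zimbra or JCharSet code.
class Utf7Sketch {
    // Characters emitted as-is: RFC 2152 Set D plus space, tab, CR and LF.
    private static final String DIRECT =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:? \t\r\n";

    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // chars waiting for base64
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (DIRECT.indexOf(c) >= 0) {
                flush(pending, out);
                out.append(c);
            } else if (c == '+') {
                flush(pending, out);
                out.append("+-"); // a literal plus sign is written as "+-"
            } else {
                pending.append(c);
            }
        }
        flush(pending, out);
        return out.toString();
    }

    // Writes the pending run as '+', unpadded base64 of its UTF-16BE bytes, then '-'.
    private static void flush(StringBuilder pending, StringBuilder out) {
        if (pending.length() == 0) {
            return;
        }
        byte[] utf16 = pending.toString().getBytes(StandardCharsets.UTF_16BE);
        out.append('+')
           .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
           .append('-');
        pending.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(encode("\u65e5\u672c\u8a9e")); // three Japanese characters
    }
}
```

Everything that comes out is 7-bit ASCII, which is why it can travel through SMTP without a further transfer encoding.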

UTF-7 is supposed to be dead because, despite its plus points, we now have things like 8BITMIME. In practice, though, it's not at all dead, and for the Java programmer this is a problem because the platform does not support UTF-7.

No panic: the platform has an extension mechanism in the CharsetProvider class. Go find an open source library and drop it in. There are a couple: Zimbra have a Mozilla-licensed implementation (thanks to Andy for spotting this), and there's a GPL version called JCharSet that is also available under a commercial license. I went with the Zimbra one:


$ cvs -z6 -d :pserver:anonymous:@cvs.zimbra.com:/usr/local/cvsroot co main
$ cd main/ZimbraCharset
$ ant

... which produces build/zimbra-charset.jar, which looks like this:

Image showing the contents of the Zimbra Charset JAR

Drop the jar into your project, and you're away with code like:

Charset utf7 = Charset.forName("UTF-7");
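For context, this is roughly the shape of what a charset jar has to provide. It is a sketch and not the Zimbra code: the charset name "X-DEMO" and both class names are made up, and the charset just borrows UTF-8's encoder and decoder. For Charset.forName to discover a provider, the jar also lists the provider's class name in a file called META-INF/services/java.nio.charset.spi.CharsetProvider.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical charset that simply reuses UTF-8's coders under another name.
class DemoCharset extends Charset {
    DemoCharset() {
        super("X-DEMO", null); // canonical name, no aliases
    }

    public boolean contains(Charset cs) {
        return StandardCharsets.UTF_8.contains(cs);
    }

    public CharsetDecoder newDecoder() {
        return StandardCharsets.UTF_8.newDecoder();
    }

    public CharsetEncoder newEncoder() {
        return StandardCharsets.UTF_8.newEncoder();
    }
}

// Hypothetical provider. A provider meant for real service lookup should be
// a public class with a public no-argument constructor, in its own file.
class DemoCharsetProvider extends CharsetProvider {
    private final Charset demo = new DemoCharset();

    // Every charset this provider knows about.
    public Iterator<Charset> charsets() {
        return Collections.singletonList(demo).iterator();
    }

    // Returns the charset for a matching name, or null if the name is not ours.
    public Charset charsetForName(String charsetName) {
        return demo.name().equalsIgnoreCase(charsetName) ? demo : null;
    }
}
```

Calling charsetForName directly on a provider instance works regardless of whether the platform has discovered the provider.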

However, drop that library and code into a web application, and it won't work. Didn't for me, with Tomcat. You might be able to drop the library into the JRE/lib/ext folder or similar places, but I don't like that option. Instead I thought I'd try to understand what's going on. The short answer is: I don't know.

The documentation says: "Charset providers may be installed in an instance of the Java platform as extensions, that is, jar files placed into any of the usual extension directories. Providers may also be made available by adding them to the applet or application class path or by some other platform-specific means. Charset providers are looked up via the current thread's context class loader."

I'd have thought that WEB-INF/lib is a good extension directory, but I suspect the phrase "extension directory" is being used in a specific technical sense, so perhaps WEB-INF/lib doesn't apply. Maybe it's a security issue, or perhaps it's a bug.
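One way to narrow this down is to ask which providers each class loader can actually see. The probe below is a sketch: the class name is made up, and it uses java.util.ServiceLoader, which needs Java 6 or later. A guess worth testing, and only a guess, is that the platform performs its lookup with the system class loader rather than the web application's loader; if so, the Zimbra provider would show up in the first list but not the second when run inside Tomcat.

```java
import java.nio.charset.spi.CharsetProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical diagnostic; run it from inside the web application.
class CharsetProviderProbe {
    // Class names of the CharsetProvider services visible through the given loader.
    static List<String> providersVisibleTo(ClassLoader loader) {
        List<String> names = new ArrayList<String>();
        for (CharsetProvider provider : ServiceLoader.load(CharsetProvider.class, loader)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("context: " + providersVisibleTo(context));
        System.out.println("system:  " + providersVisibleTo(ClassLoader.getSystemClassLoader()));
    }
}
```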

So for now, I've used an explicit request along the lines of:

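// Needs: import java.nio.charset.Charset;
// ZimbraCharsetProvider comes from zimbra-charset.jar.
final Charset charset;
if (!Charset.isSupported("UTF-7"))
{
    // The platform hasn't found the provider, so ask it directly.
    ZimbraCharsetProvider zcpi = new ZimbraCharsetProvider();
    charset = zcpi.charsetForName("UTF-7");
}
else
{
    charset = Charset.forName("UTF-7");
}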

It's a pain, but at least that works.

Given that Java is going GPL, I presume I can take the JCharSet UTF-7 implementation and submit it as a bug fix and get this sorted once and for all in the platform...