Monday, November 17, 2008

Java, StAX, Protobuf and Thrift

Another option for serializing objects is XML. The format has a lot of advantages, but it does not perform well. In some cases serialization performance is only a very small part of the transaction, but since there are so many SOAP-based protocols floating around, in some cases the choice should be reconsidered. The fastest Java XML library that I know of is StAX. I've created an XML format that matches the Thrift and Protobuf schemas, limiting the tag names to only two characters. That makes the XML less readable, but keeps its total size down.
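To make the idea concrete, here is a minimal sketch of writing and reading a single image record with the StAX cursor API (`javax.xml.stream`). It is only an illustration under assumptions: the class name `StaxImageSketch` and the two-character tag names (`im`, `ul`, `tl`, `wd`, `ht`) are made up for this example, and the tag names actually used in the benchmark are not shown in this post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StaxImageSketch {

    // Hypothetical two-character tag names (assumption, not the benchmark's real names).
    static final String IMAGE = "im";
    static final String URI = "ul";
    static final String TITLE = "tl";
    static final String WIDTH = "wd";
    static final String HEIGHT = "ht";

    /** Serializes one image record to a compact XML string. */
    static String write(String uri, String title, int width, int height)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement(IMAGE);
        writeField(w, URI, uri);
        writeField(w, TITLE, title);
        writeField(w, WIDTH, Integer.toString(width));
        writeField(w, HEIGHT, Integer.toString(height));
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    /** Writes a single child element containing only text. */
    static void writeField(XMLStreamWriter w, String tag, String value)
            throws XMLStreamException {
        w.writeStartElement(tag);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    /** Parses the XML back into a tag-to-text map, skipping the root element. */
    static Map<String, String> read(String xml) throws XMLStreamException {
        Map<String, String> fields = new LinkedHashMap<>();
        XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && !IMAGE.equals(r.getLocalName())) {
                String tag = r.getLocalName();
                // getElementText() consumes the text and stops at the end tag.
                fields.put(tag, r.getElementText());
            }
        }
        r.close();
        return fields;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = write("http://example.org/a.png", "Sample", 640, 480);
        System.out.println(xml);
        System.out.println(read(xml));
    }
}
```

The sketch handles only the flat fields of one record. Nested or repeated elements, such as a list of images inside a media content wrapper, would need the reader to track which parent element it is in.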
Here are the performance charts comparing StAX with plain Java serialization, Thrift and Protobuf.




Here is the Thrift object description:

namespace java serializers.thrift

typedef i32 int
typedef i64 long

enum Size {
  SMALL = 1,
  LARGE = 2,
}

enum Player {
  JAVA = 0,
  FLASH = 1,
}

/**
 * Some comment...
 */
struct Image {
  1: string uri, //url to the images
  2: optional string title,
  3: optional int width,
  4: optional int height,
  5: optional Size size,
}

struct Media {
  1: string uri, //url to the thumbnail
  2: optional string title,
  3: optional int width,
  4: optional int height,
  5: optional string format,
  6: optional long duration,
  7: optional long size,
  8: optional int bitrate,
  9: optional list<string> person,
  10: optional Player player,
  11: optional string copyright,
}

struct MediaContent {
  1: optional list<Image> image,
  2: optional Media media,
}
Protobuf:
// See README.txt for information and build instructions.

package serializers.protobuf;

option java_package = "serializers.protobuf";
option java_outer_classname = "MediaContentHolder";

message Image {
  required string uri = 1; //url to the thumbnail
  optional string title = 2; //used in the html ALT
  optional int32 width = 3; // of the image
  optional int32 height = 4; // of the image
  enum Size {
    SMALL = 0;
    LARGE = 1;
  }
  optional Size size = 5; // of the image (in relative terms, provided by cnbc for example)
}

message Media {
  required string uri = 1; //uri to the video, may not be an actual URL
  optional string title = 2; //used in the html ALT
  optional int32 width = 3; // of the video
  optional int32 height = 4; // of the video
  optional string format = 5; //avi, jpg, youtube, cnbc, audio/mpeg formats ...
  optional int64 duration = 6; //time in milliseconds
  optional int64 size = 7; //file size
  optional int32 bitrate = 8; //video
  repeated string person = 9; //name of a person featured in the video
  enum Player {
    JAVA = 0;
    FLASH = 1;
  }
  optional Player player = 10; //in case of a player specific media
  optional string copyright = 11; //media copyright
}

message MediaContent {
  repeated Image image = 1;
  optional Media media = 2;
}
The generated XML looks like this:

2 comments:

dontcare November 28, 2009 at 4:43 PM  

You may want to look at vtd-xml, another XPath engine that offers a lot of cool features

http://vtd-xml.sf.net

Eishay Smith November 29, 2009 at 9:37 PM  

Very interesting. It's great that it can parse a 256 GB XML document, but who in the world creates such insane docs, and why is he still alive!?
Would you like to add a VTD plugin to the benchmark?

Creative Commons License This work by Eishay Smith is licensed under a Creative Commons Attribution 3.0 Unported License.