Sunday, November 30, 2008

Scala is faster than Java until you hit Object Serialization

I'm adding Scala to the serialization benchmarking parade along with Java, Stax, Thrift and Protobuf.
Scala is actually close to Java; as far as the serialization engine is concerned it IS Java, since it compiles to Java classes. Since you can use Scala in a Java environment as just another JAR file, it is worth checking the serialization cost, especially if you're using RMI, Spring remoting or another protocol based on Java serialization.
The surprising part of the Scala comparison is that Scala is actually faster at creating objects. To be fair, I created the exact same objects in Java and Scala, and created all the Scala objects from Java code! Creating Scala objects from Scala code might be even faster.
In the chart below, size is the size of the object's serialized byte array and time is measured in nanoseconds.
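For readers who want to see what the two numbers mean, here is a minimal sketch of measuring them with standard Java serialization. The Media class, its fields and the SerializationCost name are made up for illustration; they are not the benchmark's real data model, and a single timed call like this is far too noisy to stand in for the benchmark itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationCost {
    // Hypothetical example class, not the object used in the benchmark.
    static class Media implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final int durationSeconds;

        Media(String title, int durationSeconds) {
            this.title = title;
            this.durationSeconds = durationSeconds;
        }
    }

    // Serializes obj with java.io.ObjectOutputStream and returns the raw bytes.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Media media = new Media("Some title", 1234);
        long start = System.nanoTime();
        byte[] data = serialize(media);
        long elapsedNanos = System.nanoTime() - start;
        // "size" is the length of the byte array, "time" is in nanoseconds.
        System.out.println("size = " + data.length + " bytes, time = " + elapsedNanos + " ns");
    }
}
```

A real measurement would warm up the JIT and average over many iterations before reading the clock.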



Does anyone have an explanation for this?
Here are my assumptions:

  • Since the Scala compiler creates two Java classes for each Scala class, the serialized form needs to contain twice the metadata.
  • Scala's Enumeration does not translate to a Java enum. I assume that the serialized representation of an enum in Java is more compact than that of a regular object. Scala loses that.
  • As a new language, Scala could do a better job on performance and still leverage the JIT, but its designers didn't really care much about serialization. Actually, there is a good reason: anyone who cares about serialization performance should pick Protobuf or Thrift.
By the way, it's really fun writing Scala code. It's much smaller and nicer.
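The enum assumption in the list above is easy to check rather than guess. The sketch below serializes a Java enum constant and a plain object that carries the same name, and prints both sizes. Player and PlayerObject are hypothetical names, and PlayerObject is only a stand-in for "a regular object", not the class the Scala compiler actually emits for an Enumeration value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class EnumVersusObject {
    // Hypothetical Java enum (every enum is Serializable).
    enum Player { JAVA, FLASH }

    // Hypothetical plain class holding the same information as a String field.
    static class PlayerObject implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        PlayerObject(String name) {
            this.name = name;
        }
    }

    // Returns the number of bytes standard Java serialization writes for obj.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("enum constant: " + serializedSize(Player.JAVA) + " bytes");
        System.out.println("plain object:  " + serializedSize(new PlayerObject("JAVA")) + " bytes");
    }
}
```

Both forms write a class descriptor in addition to the value, so the difference may be smaller than expected; the printed numbers settle the question for a given JVM.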

9 comments:

Itay Maman December 1, 2008 at 5:36 AM  

First, here is another contender for you: XStream.

Second, I agree with your reasoning: Scala generates more classes, hence the increased size.

I guess the interesting point is Scala being better at creating objects. I am not familiar with their implementation strategies, so everything I say is just a hunch. Maybe they have lazy initialization of fields?

Eishay Smith December 1, 2008 at 10:21 AM  

What do you mean by lazy initialization of fields? I explicitly initialized all the fields. Maybe fields that the constructor initializes to default values (null or zero) are ignored.

Itay Maman December 1, 2008 at 12:47 PM  

What does the "object creation" benchmark do? What does it look like?

Eishay Smith December 1, 2008 at 1:00 PM  

Here is the benchmark itself and here is the code that creates the Java objects.
Actually, I updated the benchmark and added a Scala serializer. I'll commit it to the repository later today.

Ricky Clarkson December 1, 2008 at 1:18 PM  

Something must differ in your benchmark, because Scala generates similar bytecode to Java. Specifically, Java-like Scala generates roughly the same bytecode as Java. Idiomatic Scala generates somewhat slower bytecode (more use of anonymous functions).

Eishay Smith December 1, 2008 at 1:25 PM  

I agree it's strange. I tried to do exactly the same things in both the Scala and Java benchmarks. I'll post the code for review in the evening, when I get access to the machine.

Itay Maman December 1, 2008 at 10:48 PM  

(still in speculation land)

It is possible for Scala to implement lists more efficiently. Prepending to an immutable list may be faster. The same goes for creating an empty immutable list.

Also, I don't know how Scala implements genericity. Do they rely on Java's? Seems unlikely. If they rolled their own, they might have gotten rid of the dynamic cast entailed by Java's generics.
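To make the list speculation concrete: Scala's immutable List is a singly linked list with one shared empty value (Nil). The sketch below is a simplified Java model of that idea, with made-up names (ConsList, prepend, empty); it is not the Scala library's code and it says nothing about what the benchmark actually measured.

```java
// Simplified model of an immutable singly linked list.
class ConsList<T> {
    // Single shared instance that represents the empty list for every element type.
    private static final ConsList<Object> EMPTY = new ConsList<Object>(null, null);

    final T head;
    final ConsList<T> tail;

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // "Creating" an empty list allocates nothing; it returns the shared instance.
    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    // Prepending allocates exactly one node and reuses the existing list as its tail.
    ConsList<T> prepend(T value) {
        return new ConsList<T>(value, this);
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    int size() {
        int count = 0;
        for (ConsList<T> node = this; !node.isEmpty(); node = node.tail) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ConsList<String> list = ConsList.<String>empty().prepend("b").prepend("a");
        System.out.println("size = " + list.size() + ", head = " + list.head);
    }
}
```

By contrast, a java.util.ArrayList allocates a list object plus a backing array that it may later have to grow, which is one place a difference in object creation time could come from.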

Eishay Smith December 1, 2008 at 11:17 PM  

Details about the code are in a new post: "Is object creation is Scala really faster then Java?".

Itay, I think you're right about the lists. I use two lists in the code and I assume the efficient Scala list has a large effect.
Scala had generics from the beginning, so I assume they are not like Java's generics. Anyway, Java generics do not exist at run time (only at compile time), so how could they matter?

Itay Maman December 1, 2008 at 11:31 PM  

Here's how Java generics exist at run time. Assume we have this simple generic class:

class Cell<T> { public T value; }

Client code:

Cell<String> c = new Cell<String>();
c.value = "abc";
String s = c.value;

True, there is a process of erasure, where class Cell is compiled with T replaced by (that is, erased to) its upper bound: java.lang.Object.

Thus, on the client side, fetching c.value means fetching a value of type Object. When the JVM verifies the client code it will reject it as ill-typed (assignment from Object to String) *unless* there is a downcast.

Thus, the code emitted by the compiler looks like this:

String s = (String) c.value;
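The inserted cast can be observed without reading bytecode. The sketch below reuses the Cell class from above and stores an Integer through a raw reference; the read into a String then fails at run time, even though no cast appears in the source. ErasureCast and castIsChecked are made-up names for the demonstration.

```java
class ErasureCast {
    static class Cell<T> { public T value; }

    // Returns true if reading c.value as a String triggers a run-time cast check.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castIsChecked() {
        Cell<String> c = new Cell<String>();
        Cell raw = c;                      // raw view of the same object
        raw.value = Integer.valueOf(42);   // compiles: the erased field has type Object
        try {
            String s = c.value;            // the compiler emits (String) c.value here
            return s == null;              // not reached: the cast above throws
        } catch (ClassCastException e) {
            return true;                   // the hidden downcast rejected the Integer
        }
    }

    public static void main(String[] args) {
        System.out.println("cast is checked at run time: " + castIsChecked());
    }
}
```

The ClassCastException is raised on the line that reads c.value, which is the dynamic cast being discussed.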

Creative Commons License This work by Eishay Smith is licensed under a Creative Commons Attribution 3.0 Unported License.