Thursday, April 09, 2009

Resolving WstxUnexpectedCharException

Just ran into this exception when parsing news articles from the web:

Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: 
Illegal character ((CTRL-CHAR, code 19))
at [row,col {unknown-source}]: [1186,417]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary
at com.ctc.wstx.sr.BasicStreamReader.finishToken
at com.ctc.wstx.sr.BasicStreamReader.next
at org.codehaus.stax2.ri.Stax2EventReaderImpl.peek
The problem turned out to be a control character in one of the non-English articles. To solve it, simply remove the control characters from the text before parsing:
str.replaceAll("\\p{Cntrl}", "")
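
Note that \p{Cntrl} also matches tab, line feed and carriage return, which XML 1.0 allows, so the one-liner above flattens line breaks too. Here is a minimal sketch of a narrower filter, assuming you only want to drop the C0 control characters that XML 1.0 forbids. The class and method names are made up for illustration:

```java
public final class XmlSanitizer {
    private XmlSanitizer() {}

    // Removes the C0 control characters that XML 1.0 forbids, but keeps
    // tab (0x09), line feed (0x0A) and carriage return (0x0D).
    public static String stripIllegalControlChars(String text) {
        return text.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
    }

    public static void main(String[] args) {
        // 0x13 is the "code 19" control character from the stack trace above.
        String dirty = "line one\n\tbad\u0013char";
        System.out.println(stripIllegalControlChars(dirty));
    }
}
```

The newline and tab survive; only the 0x13 character is dropped.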

Wednesday, April 08, 2009

Protocol Buffers forward + backward compatibility demo using Scala and Voldemort

After the long benchmarking session, which is still not over, I came to understand that engineers cling too much to numbers and make them the first and only impression when evaluating a library.
So here is a small demo of one of the nicer protobuf features: forward and backward compatibility.
[flame warning] Show me how you do it with JSON [end flame warning].
Ok, first check out the protobuf-object-competability-example project:

git clone git://github.com/eishay/protobuf-object-competability-example.git

Open the file protobuf/user.proto and make it look like this:
package test;

option java_package = "test";
option java_outer_classname = "UserPBO";

option optimize_for = SPEED;

message User {
required uint32 id = 1;
optional string name = 2;
repeated string email = 3;
}
Now, let's compile it:
protoc --java_out=src protobuf/user.proto
ant compile
Good! We're ready. Run the Voldemort server and a Scala interactive client:
bin/voldemort-server.sh . &
bin/voldemort-scala-shell.sh
Now we're starting the actual demo. Let us call this session the sign in service:
Welcome to Scala version 2.7.3.final (Java HotSpot(TM) Client VM, Java 1.5.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import voldemort._
import voldemort._

scala> import test.UserPBO._
import test.UserPBO._

//create a new protobuf object from scratch
scala> val user = User.newBuilder.setId(1).setName("Joe Smith").addEmail("joe@gmail.com").addEmail("joe2@yahoo.com").build
user: test.UserPBO.User =
id: 1
name: "Joe Smith"
email: "joe@gmail.com"
email: "joe2@yahoo.com"

//create a new voldemort client
scala> val vclient = new VClient[String, User] ("proto-store", "tcp://localhost:6666")
[2009-04-08 23:49:14,179] INFO Client /127.0.0.1:59028 connected. (voldemort.server.socket.SocketServer)
Established connection to proto-store via tcp://localhost:6666
vclient: voldemort.VClient[String,test.UserPBO.User] =
store name : proto-store
bootstrap url : tcp://localhost:6666
key serializer: StringSerializer
val serializer: ProtoBufSerializer

//push the user object to voldemort
scala> vclient put (user.getId.toString, user)
Now let's open another session in a new terminal. Don't close the sign in service session yet!
But before opening the new session, we just got a notice that the User object cannot store the list of emails any longer, and from now on it stores a new membership object!
So our new protobuf object looks like this:
package test;

option java_package = "test";
option java_outer_classname = "UserPBO";

option optimize_for = SPEED;

//new membership class!
message Membership {
enum Type {
REGULAR = 0;
PRO = 1;
}
required bool active = 1;
optional Type type = 2 [default = REGULAR];
}

//wow, where is the email list??
message User {
required uint32 id = 1;
optional string name = 2;
optional Membership membership = 4;
}
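
Why field number 4 and not a recycled 3? On the wire, every field is prefixed with a key computed from its number and wire type, so an old reader would try to interpret a reused number 3 as an email string. Here is a small sketch of that key calculation. The class and method names are hypothetical and not part of the protobuf API:

```java
public final class ProtoTag {
    private ProtoTag() {}

    // Wire type 2 marks length-delimited values (strings, embedded messages).
    public static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A protobuf field key is (fieldNumber << 3) | wireType.
    public static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // Key for the old "email" field (number 3) and the new "membership" field (number 4).
        System.out.println(Integer.toHexString(makeTag(3, WIRETYPE_LENGTH_DELIMITED)));
        System.out.println(Integer.toHexString(makeTag(4, WIRETYPE_LENGTH_DELIMITED)));
    }
}
```

Because the two keys differ, each side can tell the fields apart and skip the one it does not know.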
Don't forget to compile using protoc/ant. OK, now it's time to open the new session, which we call the membership service. Remember, the sign in service has the old definition of the user object and the membership service has the new one.
Welcome to Scala version 2.7.3.final (Java HotSpot(TM) Client VM, Java 1.5.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import voldemort._
import voldemort._

scala> import test.UserPBO._
import test.UserPBO._

//create the client
scala> val vclient = new VClient[String, User] ("proto-store", "tcp://localhost:6666")
Established connection to proto-store via tcp://localhost:6666
vclient: voldemort.VClient[String,test.UserPBO.User] =
store name : proto-store
bootstrap url : tcp://localhost:6666
key serializer: StringSerializer
val serializer: ProtoBufSerializer

//get the user we created in the sign in service
//note that it doesn't recognize the email list, but it still keeps it around
scala> val user = vclient get "1"
version(0:1)
user: test.UserPBO.User =
id: 1
name: "Joe Smith"
3: "joe@gmail.com"
3: "joe2@yahoo.com"

//append to the user object the membership details
scala> val newUser = User.newBuilder(user).setMembership(Membership.newBuilder.setActive(true).setType(Membership.Type.PRO).build).build
newUser: test.UserPBO.User =
id: 1
name: "Joe Smith"
membership {
active: true
type: PRO
}
3: "joe@gmail.com"
3: "joe2@yahoo.com"

//push the result back into voldemort
scala> vclient put (newUser.getId.toString, newUser)
Now let's go back to our sign in service and do this:

scala> //gets the user back from voldemort.
//It can still recognize all the good members it is used to
//as for the new ones, it can't recognize them, but it does not care
val user2 = vclient get "1"
version(0:2)
user2: test.UserPBO.User =
id: 1
name: "Joe Smith"
email: "joe@gmail.com"
email: "joe2@yahoo.com"
4: "\b\001\020\001"
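
The odd-looking 4: "\b\001\020\001" line is the serialized Membership message, carried along as an unknown field. Here is a rough sketch of decoding those four bytes by hand, assuming the payload holds only varint fields (true here, since bool and enum values are both encoded as varints). The class below is illustrative and not part of protobuf:

```java
import java.util.ArrayList;
import java.util.List;

public final class UnknownFieldPeek {
    private UnknownFieldPeek() {}

    // Decodes a buffer that contains only varint fields (wire type 0)
    // into "fieldNumber=value" strings.
    public static List<String> decodeVarintFields(byte[] data) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            long[] key = readVarint(data, pos);
            pos = (int) key[1];
            int fieldNumber = (int) (key[0] >>> 3);
            int wireType = (int) (key[0] & 0x7);
            if (wireType != 0) {
                throw new IllegalArgumentException("this sketch handles varint fields only");
            }
            long[] value = readVarint(data, pos);
            pos = (int) value[1];
            out.add(fieldNumber + "=" + value[0]);
        }
        return out;
    }

    // Reads one base-128 varint; returns {decoded value, next position}.
    private static long[] readVarint(byte[] data, int pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            if (pos >= data.length || shift >= 64) {
                throw new IllegalArgumentException("malformed varint");
            }
            byte b = data[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new long[] { result, pos };
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // "\b\001\020\001" written out as bytes (\b is 0x08, \020 is octal for 0x10).
        byte[] payload = { 0x08, 0x01, 0x10, 0x01 };
        System.out.println(decodeVarintFields(payload));
    }
}
```

Running it prints [1=1, 2=1]: field 1 (active) is true and field 2 (type) is 1, which is PRO.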


Conclusion: with protobuf, when you change an object you only need to update the services that use the changed members. All other services, even if they use that object, do not need to care about the change.
Disclaimer: This post is not intended to be a full protobuf tutorial; it focuses on a single protobuf feature and omits the rest.

This work by Eishay Smith is licensed under a Creative Commons Attribution 3.0 Unported License.