Thursday, May 31, 2012

Hector's ColumnSliceIterator

Hector is a popular Java client library for Cassandra. Hector offers several classes and APIs for performing range and slice queries. ColumnSliceIterator is one such class. It implements java.util.Iterator and encapsulates some basic paging functionality. By default it fetches 100 columns per page/batch. Below is an example of how I was using it in one place in my code that landed me in some trouble.


That code resulted in the following NPE.

java.lang.NullPointerException
 at me.prettyprint.cassandra.service.ColumnSliceIterator.next(ColumnSliceIterator.java:105)

Being new to both Cassandra and Hector, I immediately assumed that there was a problem with my query. I spent a good bit of time debugging before I realized what was happening. The internal iterator in ColumnSliceIterator is not initialized until hasNext is called, so calling next first hits a null reference. This seems like a bug to me, and sure enough I found an existing issue reported for it. It does not look like there are any plans to fix this bug, but fortunately it can be worked around easily enough. Changing my code to call hasNext before the first call to next did the trick.

Going forward, I will probably wrap ColumnSliceIterator in an Iterable so that I can use it with Java's for-each loop. This will encapsulate the workaround for the bug and allow me to better control the paging. This bug kept me up late the other night, so I thought it worth a short post. Overall, my experience with Hector thus far has been really good. It does a nice job of encapsulating Cassandra's Thrift APIs.
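The Iterable wrapper mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LazySliceIterator` is a hypothetical stand-in that mimics the lazy initialization described earlier so the example runs without Cassandra (it is not Hector's code), and `SliceIterable` and `SliceIterableDemo` are made-up names. In real use, the delegate would be a ColumnSliceIterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: pages through a list, but only sets up its page inside hasNext(). */
class LazySliceIterator implements Iterator<String> {
    private final List<String> columns;
    private final int pageSize;
    private int offset = 0;
    private Iterator<String> page; // stays null until hasNext() has been called

    LazySliceIterator(List<String> columns, int pageSize) {
        this.columns = columns;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Load the first page, or the next one once the current page is exhausted.
        if (page == null || (!page.hasNext() && offset < columns.size())) {
            int end = Math.min(offset + pageSize, columns.size());
            page = columns.subList(offset, end).iterator();
            offset = end;
        }
        return page.hasNext();
    }

    @Override
    public String next() {
        return page.next(); // throws NullPointerException if hasNext() was never called
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}

/** Single-pass Iterable that always calls hasNext() on the delegate before next(). */
class SliceIterable<T> implements Iterable<T> {
    private final Iterator<T> delegate;

    SliceIterable(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public T next() {
                // Forces the delegate to initialize even if the caller skipped hasNext().
                if (!delegate.hasNext()) {
                    throw new NoSuchElementException();
                }
                return delegate.next();
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}

class SliceIterableDemo {
    public static void main(String[] args) {
        List<String> columns = Arrays.asList("a", "b", "c", "d", "e");
        // A page size of 2 forces several page loads during the loop.
        for (String column : new SliceIterable<String>(new LazySliceIterator(columns, 2))) {
            System.out.println(column);
        }
    }
}
```

Because the wrapper hands out the same underlying iterator each time, it can only be traversed once; a wrapper that needs to be re-iterable would have to build a fresh query and iterator on each call to iterator().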

2 comments:

  1. Hi John, thanks a lot for this useful explanation!

  2. So did you write the wrapping code? I was looking at doing the same thing...
