Introduction

We're designing the decoder piece using the stateful decoder interfaces defined for the commons-codec API in its stateful package. A stateful decoder maintains state while processing arriving chunks of TLV data.

The decoder confronts several issues while decoding a stream of TLV trees in which TLV tuples are nested within one another. In general the decoder can be viewed as a simple linear parser for TLV tuples, notifying handlers of their arrival via callbacks.

The BER encoder and decoder (codec) will be designed to operate much like the Simple API for XML (SAX): as the decoder encounters low-level encoded structures called Tag Length Value (TLV) tuples, it generates events in the form of calls on a handler.

Rather than returning a value, which could be extremely large, in one piece, the decoder returns pieces of the value until it has processed the entire V of the TLV. This makes the decoder highly attractive for servers using non-blocking IO and SocketChannels. The design gives the decoder a small decoding footprint regardless of the size of the Protocol Data Unit (PDU) being processed. It also makes decoding much faster, since the decoder deals with a small, simple task without much conditional logic for processing a PDU. We hope the combined benefits of non-blocking IO and this sleek codec make any BER based protocol server extremely responsive under heavy loads with massive concurrency.
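
As a rough illustration of this event model, here is a minimal sketch of what such a callback interface might look like; the interface and method names are hypothetical, not the actual commons-codec or Snickers types:

    import java.nio.ByteBuffer;

    /**
     * Hypothetical callback illustrating the SAX-like event model: the decoder
     * announces a TLV's tag and length, then streams its value in fragments
     * instead of buffering the whole V in memory.
     */
    public interface TlvCallback
    {
        /** Fired once the tag and length octets of a TLV have been decoded. */
        void tagAndLengthDecoded( int tag, int length );

        /** Fired zero or more times as fragments of the value arrive. */
        void partialValueDecoded( ByteBuffer fragment );

        /** Fired when the last octet of the value has been consumed. */
        void valueDecoded();
    }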

Requirements

The decoder must be fast, have a fixed memory footprint, and be simple. It should perform only one task: notifying content handlers via callbacks of the arrival of TLV tuples. While doing so it must maintain state in between calls to decode a chunk of arriving BER encoded data.
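
For example, a server read loop might feed the decoder whatever a non-blocking channel read produced; the decode signature shown here is an assumption for illustration only:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    // Hypothetical driver loop: the decoder holds all parse state internally,
    // so the caller simply hands it each chunk as it arrives off the wire and
    // TLV callbacks fire whenever enough bytes have accumulated.
    public class DecoderPump
    {
        public void pump( SocketChannel channel, BERDecoder decoder ) throws IOException
        {
            ByteBuffer chunk = ByteBuffer.allocate( 4096 );

            while ( channel.read( chunk ) > 0 )
            {
                chunk.flip();
                decoder.decode( chunk );  // decode( ByteBuffer ) is assumed here;
                chunk.clear();            // it may emit zero or more TLV events
            }
        }
    }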

It should not try to interpret the content of the TLV tuples. That is left to higher level, content based facilities built on top of the BERDecoder. These higher level facilities provide their own callbacks that build on the TLV events. The SnickersDecoder, which transforms ASN.1 BER TLV tuples into messages, uses the BERDecoder in this way to give meaning to the arriving content.
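
To illustrate the layering, a content aware decoder might simply register itself as the TLV callback and translate tuples into messages. This sketch reuses the hypothetical TlvCallback above and does not reflect the actual Snickers classes:

    import java.nio.ByteBuffer;

    // Hypothetical content layer: it consumes raw TLV events from the low level
    // decoder and exposes its own message level callbacks to the application.
    public class MessageLayer implements TlvCallback
    {
        public void tagAndLengthDecoded( int tag, int length )
        {
            // map the tag to a field of the message being assembled
        }

        public void partialValueDecoded( ByteBuffer fragment )
        {
            // accumulate or stream the field's value
        }

        public void valueDecoded()
        {
            // when the outermost TLV completes, hand the finished message
            // to the application's own handler
        }
    }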

Object Reuse and Using Primitive Types

The density of TLV tuples encountered during decoder operation will vary with the characteristics of the messages. One of the more involved aspects of the decoder is how it produces TLV tuples when emitting events.

We could instantiate a new TLV tuple object for every tuple, but this would slow the decoder down and increase its memory footprint. For this reason we decided to reuse the same TLV tuple to deliver TLV data via notification events on the callback. The callback implementation must copy the tuple's data if it intends to save the TLV for later use; otherwise the decoder will overwrite the TLV's members on the next event. We leave the decision to copy a TLV up to the higher level facility, so that only those tuples of interest, known only to the content specific handler, are copied. Why waste space and time on events that are not of interest?
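
As a rough sketch of that contract, a handler that wants to keep a tuple beyond the scope of the callback takes its own copy; the Tuple type and its public clone() are hypothetical here:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical handler: the decoder reuses a single Tuple instance for every
    // event, so anything worth keeping must be copied before the callback returns.
    public class CopyingHandler
    {
        private final List<Tuple> saved = new ArrayList<Tuple>();

        public void tupleDecoded( Tuple reused )
        {
            if ( isOfInterest( reused ) )
            {
                // copy now: the decoder overwrites this instance on the next event
                saved.add( (Tuple) reused.clone() );
            }
        }

        private boolean isOfInterest( Tuple tuple )
        {
            return true;  // content specific test goes here
        }
    }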

The most complex part of the decoder deals with maintaining state while decoding. Data can arrive at any time and may contain any part of a TLV, or several whole TLVs along with fragments of others. Often neither how the data is fragmented nor its size will be known in advance. Furthermore the nesting of TLVs must be tracked as part of that state. A stack is used to track the nesting of TLV tuples within a TLV tree.
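
A minimal sketch of that bookkeeping, with a hypothetical Tuple type and event names, pushes a constructed TLV when it opens and pops it once all of its value octets have been consumed:

    import java.util.Stack;

    // Hypothetical nesting tracker: constructed TLVs are pushed as they open and
    // popped once every octet of their declared value length has been seen.
    public class NestingTracker
    {
        private final Stack<Tuple> stack = new Stack<Tuple>();

        public void constructedTlvStarted( Tuple tuple )
        {
            stack.push( tuple );
        }

        public void constructedTlvCompleted()
        {
            stack.pop();
        }

        /** The constructed TLV enclosing whatever is currently being decoded. */
        public Tuple getEnclosing()
        {
            return stack.isEmpty() ? null : stack.peek();
        }
    }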

Because we do not instantiate TLV tuple objects, pushing the one TLV instance we reuse is pointless. Two approaches can handle this issue. First, we could create a new instance only for those TLV tuples that nest others and hence need to be pushed onto the stack. Alternatively, we could use multiple int based primitive stacks to store the set of values contained in the tuple. The second approach leads to greater complexity, while the first incurs some extra instantiation time and memory, which is really negligible. Which approach is best depends on the number of members in the tuple, or in other words the number of primitive int stacks used.

We wrote a little stress test, ObjectVersePrimitiveTest, to figure out when one approach outperforms the other. From tinkering with the test case's parameters we found that primitives outperform tuple object instantiation when the number of member stacks is less than or equal to 2. If 3 or more stacks are used, then instantiating a constructed TLV object and pushing it onto one stack is the better strategy. In our case we have 3 pieces of information that need to be pushed and popped together, so from this test data the choice is clear: we clone the TLV tuple, or instantiate a new one, for constructed TLVs and push them onto a single stack. This is faster and removes the need to manage multiple stacks, making the code less complex.
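
The resulting strategy can be sketched roughly as follows, again with hypothetical names: constructed TLVs, which must outlive the current event in order to track nesting, get their own copy and go onto the single stack, while primitive TLVs keep being delivered through the reused instance:

    import java.util.Stack;

    // Sketch of the chosen strategy: only constructed TLVs are cloned, since they
    // must survive past the current event; primitive TLVs are delivered through
    // the single reused instance and generate no garbage at all.
    public class TupleEmitter
    {
        private final Stack<Tuple> nesting = new Stack<Tuple>();
        private final TlvCallback callback;

        public TupleEmitter( TlvCallback callback )
        {
            this.callback = callback;
        }

        public void emit( Tuple reused )
        {
            if ( reused.isConstructed() )
            {
                nesting.push( (Tuple) reused.clone() );  // one copy, one stack
            }

            // the reused instance backs the event either way
            callback.tagAndLengthDecoded( reused.getTag(), reused.getLength() );
        }
    }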

To be continued ...

More to come soon ...