Trying to figure out how to keep the encoder design simple, symmetric and extensible. At this point this section is really just a brain dump.
It might be a good idea to separate encoder functionality into separate encoder layers. This way we can isolate operations and divide the responsibilities, keeping each encoder simple. We are talking about stacking multiple encoders on top of each other.
The most primitive tier could be an encoder concerned with writing chunks out to a channel. On top of that may reside another encoder that converts TLV (Tag, Length and Value) tuple events into a stream of chunked buffers. High level encoders can be designed to take stub instances as input and produce a stream of TLV tuple events. These events are then piped into the low level tuple event encoder, then written out as chunks to a channel. We really are not concerned with where the chunks go; the event producer-consumer pipeline is the primary concern for us.
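The stacking described above can be sketched roughly as follows. Names like `ChunkSink` and `TupleChunkEncoder` are illustrative assumptions, not the actual API, and only the trivial single-octet tag/short-form length case is handled:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Lowest tier: something that accepts chunks destined for a channel.
interface ChunkSink {
    void write(ByteBuffer chunk);
}

// Middle tier sketch: turns a primitive Tag-Length-Value event into a
// chunked buffer and hands it down to the chunk tier.
class TupleChunkEncoder {
    private final ChunkSink sink;
    TupleChunkEncoder(ChunkSink sink) { this.sink = sink; }

    void tuple(int tag, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(2 + value.length);
        buf.put((byte) tag);           // single-octet tag assumed
        buf.put((byte) value.length);  // short-form length (< 128) assumed
        buf.put(value);
        buf.flip();
        sink.write(buf);               // where the chunk goes is not our concern
    }
}

public class StackDemo {
    public static void main(String[] args) {
        List<ByteBuffer> out = new ArrayList<>();
        TupleChunkEncoder enc = new TupleChunkEncoder(out::add);
        enc.tuple(0x04, new byte[]{1, 2, 3}); // OCTET STRING with 3 octets
        System.out.println(out.get(0).remaining()); // tag + length + 3 value octets
    }
}
```

The point of the sketch is the shape of the chain: a higher tier only ever sees the interface of the tier below it.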
There are several benefits to this approach. Here are a few of them: each encoder stays simple with a single, well defined responsibility; the layers can be developed and tested in isolation; and the event pipeline is decoupled from where the chunks finally go.
This low level TLV tuple event stream encoder will need to receive TLV events somehow, so the encoder must implement some kind of callback. Because it is a callback, the encoder receives the substrate this way rather than through the conventional encode() pathway. What then happens to the idea of an encode() method that maintains symmetry with the decoder's decode() method?
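A rough sketch of what such a callback interface might look like, and of how the event order alone carries the nesting pattern. The interface and method names here are assumptions for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical event interface through which the low level encoder
// receives its substrate instead of via encode().
interface TupleEventSink {
    void startTuple(int tag);  // T of a (possibly constructed) tuple
    void value(byte[] part);   // V, possibly delivered in several parts
    void endTuple();           // closes the most recently started tuple
}

// A sink that records the maximum nesting depth, showing that the
// start/end event pairing is what conveys tuple nesting.
class DepthTracker implements TupleEventSink {
    private final Deque<Integer> stack = new ArrayDeque<>();
    int maxDepth = 0;
    public void startTuple(int tag) {
        stack.push(tag);
        maxDepth = Math.max(maxDepth, stack.size());
    }
    public void value(byte[] part) { }
    public void endTuple() { stack.pop(); }
}

public class EventDemo {
    public static void main(String[] args) {
        DepthTracker sink = new DepthTracker();
        sink.startTuple(0x30);          // SEQUENCE opens
        sink.startTuple(0x04);          //   nested OCTET STRING
        sink.value(new byte[]{1, 2});
        sink.endTuple();
        sink.endTuple();                // SEQUENCE closes
        System.out.println(sink.maxDepth);
    }
}
```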
The encode() method may still be used to turn the substrate (TLV tuples) into bytes in a buffer. It may be called externally as well as by the callback methods implemented by the encoder; the callback methods might use encode() before or after performing some housekeeping operations. There are, however, holes left wide open with this approach which could interfere with tuple nesting. Basically we will have two pathways for TLV delivery: first through the general encode() method, second through the TLV event delivery methods. Which one is authoritative? Plus the TLV event methods are geared to handle TLV nesting; receiving a tuple via the encode() method could interfere with nesting and proper tuple serialization into a buffer.
Perhaps to deal with these pitfalls we need to devise some standards around usage. The encode() method could be implemented to only allow the encoding of primitive tuples. Even then there is the possibility of interfering with nesting and encoding order. And do we need all TLV events for the low level encoder?
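The primitive-only convention could be enforced mechanically, so a constructed tuple can never slip in through encode() and corrupt the nesting tracked by the event pathway. A minimal sketch, where `Tuple` and `isPrimitive()` are stand-ins for the real types:

```java
// Stand-in for the real TLV tuple type: a constructed tuple is modeled
// here as one with no in-memory value.
class Tuple {
    final int tag;
    final byte[] value;   // null for constructed tuples
    Tuple(int tag, byte[] value) { this.tag = tag; this.value = value; }
    boolean isPrimitive() { return value != null; }
}

// encode() guards against constructed tuples: they must arrive through
// the TLV event delivery methods, which manage nesting.
class PrimitiveOnlyEncoder {
    byte[] encode(Tuple t) {
        if (!t.isPrimitive()) {
            throw new IllegalArgumentException(
                "constructed tuples must arrive via TLV events");
        }
        byte[] out = new byte[2 + t.value.length];
        out[0] = (byte) t.tag;
        out[1] = (byte) t.value.length;  // short-form length assumed
        System.arraycopy(t.value, 0, out, 2, t.value.length);
        return out;
    }
}

public class PrimitiveDemo {
    public static void main(String[] args) {
        PrimitiveOnlyEncoder enc = new PrimitiveOnlyEncoder();
        byte[] out = enc.encode(new Tuple(0x04, new byte[]{1, 2, 3}));
        System.out.println(out.length); // tag + length + 3 value octets
    }
}
```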
For an exercise let's look at usage patterns from the top down. At the topmost level an encoder will be devised to multiplex many specific encoders. This top level encoder will map stub interfaces to stub specific encoders with knowledge of the ASN.1 data type and the stub. When calls are made to encode(Object), the topmost encoder will use the stub interface of the Object to look up an appropriate stub specific encoder. This topmost encoder hence multiplexes other encoders based on the argument to encode.

The stub specific encoder receives the substrate as the argument to the encode(Object) method. It generates a series of TLV tuple events using a special callback interface to convey the tuple nesting pattern; the standard EncoderCallback is not sufficient for this. The topmost multiplexing encoder must receive the callback events of the stub specific encoders and call its own callback as if it generated each event, so the presence of the stub specific encoders is transparent. Below this system the low level encoder receives the TLV tuple events generated and serializes them within buffers, emitting buffers as encoded objects to its own callback. Here's how this all looks:
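A sketch of the multiplexing arrangement just described. `EncodeSink`, `StubEncoder` and the lookup details are assumptions; a real implementation would key on the stub interface and emit TLV events rather than strings:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback through which encoded products are emitted.
interface EncodeSink { void encoded(Object product); }

// Hypothetical contract for a stub specific encoder.
interface StubEncoder { void encode(Object stub, EncodeSink sink); }

class MultiplexingEncoder {
    private final Map<Class<?>, StubEncoder> byStub = new HashMap<>();
    private EncodeSink sink = product -> { };

    void register(Class<?> stubInterface, StubEncoder enc) { byStub.put(stubInterface, enc); }
    void setSink(EncodeSink cb) { this.sink = cb; }

    // Looks up the stub specific encoder by stub interface and relays
    // its events through this encoder's own sink, so the delegation
    // stays transparent to whoever registered the sink.
    void encode(Object stub) {
        for (Class<?> iface : stub.getClass().getInterfaces()) {
            StubEncoder enc = byStub.get(iface);
            if (enc != null) { enc.encode(stub, sink::encoded); return; }
        }
        throw new IllegalArgumentException("no encoder for " + stub.getClass());
    }
}

public class MuxDemo {
    interface BindRequest { String name(); }          // hypothetical stub interface
    static class Bind implements BindRequest {
        public String name() { return "cn=admin"; }
    }
    public static void main(String[] args) {
        MultiplexingEncoder top = new MultiplexingEncoder();
        top.register(BindRequest.class, (stub, cb) ->
            cb.encoded("events for " + ((BindRequest) stub).name()));
        StringBuilder seen = new StringBuilder();
        top.setSink(product -> seen.append(product));
        top.encode(new Bind());
        System.out.println(seen);
    }
}
```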
Creating a determinate length encoder without sacrificing efficiency is not easy. Making the code easy to manage and read is yet another, more difficult matter; this is already very difficult with the state machine approach we have taken. Furthermore we have found many clients and servers which reject the use of the indeterminate form, even though BER allows for such variance in encodings according to the spec.
Efficiency is difficult to achieve because we need to know the lengths of nested TLV nodes to build constructed nodes with a determinate length encoding. Since the topmost constructed TLV is the first out the door, we cannot transmit it until all nested TLV nodes have been generated with their lengths already calculated. A brute force approach might produce a TLV tree first before serializing the output to a stream. This way all nested TLVs on up can have their length fields computed depth first. This means keeping the entire transfer image in memory as well as the structures needed to manage a tree. Although DoS attacks are not as much of a concern for the encoding phase as they are for decoding in a server, the approach would still result in very inefficient encode operations, especially when large PDUs are being transmitted either by a server or a client. Plus there is the fact that a PDU stub already exists with the same copy of the information, making it highly likely that more than 2 times the transfer image will be required.
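The depth first length computation the brute force approach requires can be sketched concretely. `Node` is a stand-in for a real TLV tree type, and a single-octet tag is assumed; the length-octet rule follows the BER short/long length forms:

```java
import java.util.List;

// Brute force sketch: build the TLV tree first, then size it depth
// first so every constructed node knows its definite length before a
// single byte is emitted.
class Node {
    final int tag;
    final byte[] value;      // primitive payload, or null for constructed
    final List<Node> kids;   // children of a constructed node
    Node(int tag, byte[] v) { this.tag = tag; this.value = v; this.kids = List.of(); }
    Node(int tag, List<Node> kids) { this.tag = tag; this.value = null; this.kids = kids; }

    // Octets the L field itself occupies: 1 in the short form (< 128),
    // otherwise an initial octet plus one octet per length byte.
    static int lengthOctets(int n) {
        if (n < 128) return 1;
        int count = 1;
        while (n > 0) { count++; n >>= 8; }
        return count;
    }

    // Depth first: children are fully sized before the parent, which is
    // exactly the order a determinate length encoder must work in.
    int contentLength() {
        if (value != null) return value.length;
        int sum = 0;
        for (Node k : kids) sum += k.encodedLength();
        return sum;
    }
    int encodedLength() {
        int c = contentLength();
        return 1 + lengthOctets(c) + c;  // single-octet tag assumed
    }
}

public class LengthDemo {
    public static void main(String[] args) {
        // SEQUENCE of two OCTET STRINGs: 5 + 4 content octets, so the
        // outer TLV is 1 (tag) + 1 (length) + 9 octets.
        Node seq = new Node(0x30, List.of(
            new Node(0x04, new byte[]{1, 2, 3}),
            new Node(0x04, new byte[]{9, 9})));
        System.out.println(seq.encodedLength());
    }
}
```

This makes the cost of the approach visible: the whole tree, and hence the whole transfer image, must exist before the first octet of the root can go out.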
We must ask ourselves if there is any way to avoid keeping the entire transfer image in memory. Alan and Alex have discussed the use of referrals to large binary data rather than keeping the data in memory during codec operation. A referral would correspond to a channel, or a stream to recall or store the data in question. This way large binary values can be streamed from or to disk. Eventually stubs will support these references, although we do not have the mechanism completely defined yet. If the same reference can be held in the place of the V field of a TLV then we can avoid having more than 2X the transfer image in memory. This however will not be the case when PDUs are tiny, with small fields well below the threshold used to gauge when disk streaming is to occur. This is still however one means to keep the in-memory footprint down when PDUs with large fields are encoded.
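Since the referral mechanism is not yet defined, this is only a speculative sketch of what a V-field reference might look like: small values stay in memory, large ones are recalled through a stream on demand. The class name, factory methods and threshold are all assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical reference held in place of a TLV's V field. Only the
// length must be known up front; the bytes themselves may live on disk.
class ValueRef {
    static final int THRESHOLD = 4096;         // assumed streaming cutoff
    private final byte[] inMemory;             // null when the value is streamed
    private final Supplier<InputStream> source;
    final long length;

    private ValueRef(byte[] m, Supplier<InputStream> s, long len) {
        this.inMemory = m; this.source = s; this.length = len;
    }
    // Small V fields below the threshold stay in memory.
    static ValueRef of(byte[] small) {
        return new ValueRef(small, null, small.length);
    }
    // Large V fields are recalled from a channel or stream on demand.
    static ValueRef streamed(Supplier<InputStream> src, long len) {
        return new ValueRef(null, src, len);
    }
    boolean isStreamed() { return inMemory == null; }
    InputStream open() {
        return isStreamed() ? source.get() : new ByteArrayInputStream(inMemory);
    }
}

public class RefDemo {
    public static void main(String[] args) {
        ValueRef small = ValueRef.of(new byte[]{1, 2, 3});
        System.out.println(small.isStreamed() + " " + small.length);
    }
}
```

Because the length is carried by the reference, a determinate length encoder could still compute L fields without materializing the large value in memory.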