diff --git a/docs/source/serialization-enum-evolution.rst b/docs/source/serialization-enum-evolution.rst
new file mode 100644
index 0000000000..1b48594fe9
--- /dev/null
+++ b/docs/source/serialization-enum-evolution.rst
@@ -0,0 +1,335 @@
+Enum Evolution
+==============
+
+.. contents::
+
+In the continued development of a CorDapp, an enumerated type that was fit for purpose at one time may
+require changing. Normally, this would be problematic: anything serialized (and kept in a vault) would
+run the risk of being impossible to deserialize in the future, and older versions of the app still alive
+within a compatibility zone may fail to deserialize a message.
+
+To facilitate backward and forward support for alterations to enumerated types, Corda's serialization
+framework supports the evolution of such types through a well-defined mechanism that allows different
+versions of a class to interoperate with serialized instances of other versions of the enumeration.
+
+This is achieved through the use of certain annotations. Whenever a change is made, an annotation
+capturing the change must be added (it can be omitted, but interoperability will then be lost). Corda
+supports two modifications to enumerated types: adding new constants and renaming existing constants.
+
+.. warning:: Once added, evolution annotations MUST NEVER be removed from a class. Doing so will break
+   both forward and backward compatibility for this version of the class and any version moving
+   forward.
+
+The Purpose of Annotating Changes
+---------------------------------
+
+The biggest hurdle to allowing enum constants to be changed is that there will exist instances of those
+classes, either serialized in a vault or on nodes with the old, unmodified version of the class, that we
+must be able to interoperate with. Thus, if a received data structure references an enum constant
+that doesn't exist on the running JVM, a solution is needed.
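+
+As a minimal illustration of the problem, the following sketch uses plain Kotlin rather than Corda's
+serialization engine. The ``lookup`` helper is hypothetical and exists only for this example; it shows what
+happens when a node whose copy of the class predates constant D receives that name.
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        // The enum as an older node knows it: constant D has not been added yet.
+        enum class Example { A, B, C }
+
+        // Hypothetical helper (not part of Corda): resolves a received constant name against the
+        // local class, returning null when the local version of the enum has no such constant.
+        fun lookup(name: String): Example? =
+                try {
+                    enumValueOf<Example>(name)
+                } catch (e: IllegalArgumentException) {
+                    null
+                }
+
+        fun main() {
+            println(lookup("C")) // prints "C": the constant is known locally
+            println(lookup("D")) // prints "null": D was added by a newer version of the class
+        }
+
+Without further information, the older node has no sensible value to substitute for D.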
+
+For this, we use the annotations to allow developers to express their backward compatible intentions.
+
+In the case of renaming constants this is fairly straightforward: the deserializing node will simply treat any
+constants it doesn't understand as their "old" values, i.e. those values that it currently knows about.
+
+In the case of adding new constants, the developer must choose which existing constant (one that existed *before*
+the new one was added) a deserializing system should substitute for instances of the new one.
+
+.. note:: Ultimately, this may mean some design compromises are required. If an enumeration is
+   expected to be extended often and no sensible defaults will exist, then including a constant
+   in the original version of the class that all new additions can default to may make sense.
+
+Evolution Transmission
+----------------------
+
+An object serializer, on creation, will inspect the class it represents for any evolution annotations.
+If a class is thus decorated, those rules will be encoded as part of any serialized representation of a
+data structure containing that class. This ensures that, on deserialization, the deserializing object will
+have access to any transformative rules it needs to build a local instance of the serialized object.
+
+Evolution Precedence
+--------------------
+
+On deserialization (technically on construction of a serialization object that facilitates serialization
+and deserialization) a class's fingerprint is compared to the fingerprint received as part of the AMQP
+header of the corresponding class. If they match, then we are sure that the two class versions are functionally
+the same and no further steps are required save the deserialization of the serialized information into an instance
+of the class.
+
+If, however, the fingerprints differ, then we know that the class we are attempting to deserialize differs
+from the version we will be deserializing it into.
What we cannot know is which version is newer, at least
+not by examining the fingerprint.
+
+.. note:: Corda's AMQP fingerprinting for enumerated types includes the type name and the enum constants.
+
+Newer vs older is important as the deserializer needs to use the more recent set of transforms to ensure it
+can transform the serialized object into the form as it exists in the deserializer. Newness is determined simply
+by the length of the list of all transforms. This is sufficient as transform annotations should only ever be added.
+
+.. warning:: Technically there is nothing to prevent annotations being removed in newer versions. However,
+   this will break backward compatibility and should thus be avoided unless a rigorous upgrade procedure
+   is in place to cope with all deployed instances of the class and all serialized versions existing
+   within vaults.
+
+Thus, on deserialization, there will be two sets of transformation rules to choose from:
+
+    #. Determined from the local class and the annotations applied to it (the local copy)
+    #. Parsed from the AMQP header (the remote copy)
+
+The larger of the two sets is the one used.
+
+Renaming Constants
+------------------
+
+Renamed constants are marked as such with the ``@CordaSerializationTransformRenames`` meta annotation that
+wraps a list of ``@CordaSerializationTransformRename`` annotations. Each rename requires an instance in the
+list.
+
+Each instance must provide the new name of the constant as well as the old. For example, consider the following enumeration:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        enum class Example {
+            A, B, C
+        }
+
+If we were to rename constant C to D this would be done as follows:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformRenames (
+                CordaSerializationTransformRename("D", "C")
+        )
+        enum class Example {
+            A, B, D
+        }
+
+.. 
note:: The parameters to the ``CordaSerializationTransformRename`` annotation are defined as 'to' and 'from',
+   so the above example can be read as: constant D (given that is how the class now exists) was renamed
+   from C.
+
+In the case where a single rename has been applied the meta annotation may be omitted. Thus, the following is
+functionally identical to the above:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformRename("D", "C")
+        enum class Example {
+            A, B, D
+        }
+
+However, as soon as a second rename is made the meta annotation must be used. For example, if at some time later
+B is renamed to E:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformRenames (
+                CordaSerializationTransformRename(from = "B", to = "E"),
+                CordaSerializationTransformRename(from = "C", to = "D")
+        )
+        enum class Example {
+            A, E, D
+        }
+
+Rules
+~~~~~
+
+    #. A constant cannot be renamed to match an existing constant; this is enforced through language constraints
+    #. A constant cannot be renamed to a value that matches any previous name of any other constant
+
+If either of these rules is inadvertently broken, a ``NotSerializableException`` will be thrown by the
+serialization engine as soon as the violation is detected. Normally this will be the first time an object of the
+offending class is serialized. However, in some circumstances, it could be at the point of deserialization.
+
+Adding Constants
+----------------
+
+Enumeration constants can be added with the ``@CordaSerializationTransformEnumDefaults`` meta annotation that
+wraps a list of ``CordaSerializationTransformEnumDefault`` annotations. For each constant added, an annotation
+must be included that signifies, on deserialization, which constant value should be used in place of the
+serialized property if that value doesn't exist on the version of the class as it exists on the deserializing
+node.
+
+.. container:: codeset
+
+   .. 
sourcecode:: kotlin
+
+        enum class Example {
+            A, B, C
+        }
+
+If we were to add the constant D:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("D", "C")
+        )
+        enum class Example {
+            A, B, C, D
+        }
+
+.. note:: The parameters to the ``CordaSerializationTransformEnumDefault`` annotation are defined as 'new' and 'old',
+   so the above example can be read as: constant D should be treated as constant C if you, the deserializing
+   node, don't know anything about constant D.
+
+.. note:: Just as with the ``CordaSerializationTransformRename`` transformation, if a single transform is being applied
+   then the meta annotation may be omitted.
+
+   .. container:: codeset
+
+      .. sourcecode:: kotlin
+
+           @CordaSerializationTransformEnumDefault("D", "C")
+           enum class Example {
+               A, B, C, D
+           }
+
+New constants may default to any other constant older than them, including constants that have also been added
+since inception. In this example, having added D (above), we add the constant E and choose to default it to D:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("E", "D"),
+                CordaSerializationTransformEnumDefault("D", "C")
+        )
+        enum class Example {
+            A, B, C, D, E
+        }
+
+.. note:: Alternatively, we could have decided that both new constants should default to the first
+   element:
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("E", "A"),
+                CordaSerializationTransformEnumDefault("D", "A")
+        )
+        enum class Example {
+            A, B, C, D, E
+        }
+
+When deserializing, the most applicable transform will be applied. Continuing the above example, deserializing
+nodes could have three distinct views on what the enum Example looks like (annotations omitted for brevity):
+
+.. container:: codeset
+
+   .. 
sourcecode:: kotlin
+
+        // The original version of the class. Will deserialize:
+        //   A -> A
+        //   B -> B
+        //   C -> C
+        //   D -> C
+        //   E -> C
+        enum class Example {
+            A, B, C
+        }
+
+   .. sourcecode:: kotlin
+
+        // The class as it existed after the first addition. Will deserialize:
+        //   A -> A
+        //   B -> B
+        //   C -> C
+        //   D -> D
+        //   E -> D
+        enum class Example {
+            A, B, C, D
+        }
+
+   .. sourcecode:: kotlin
+
+        // The current state of the class. All values will deserialize as themselves
+        enum class Example {
+            A, B, C, D, E
+        }
+
+Thus, a value that has been encoded as E could be deserialized as one of three constants (E, D, or C),
+depending on how the deserializing node understands the class.
+
+Rules
+~~~~~
+
+    #. New constants must be added to the end of the existing list of constants
+    #. Defaults can only be set to "older" constants, i.e. those to the left of the new constant in the list
+    #. Constants must never be removed once added
+    #. New constants can be renamed at a later date using the appropriate annotation
+    #. When a constant is renamed, any defaulting annotation that refers to its old name should be left as is
+
+Combining Evolutions
+--------------------
+
+Renaming constants and adding constants can be combined freely over time as a class changes. Added constants can
+in turn be renamed and everything will continue to be deserializable. For example, consider the following enum:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        enum class OngoingExample { A, B, C }
+
+For the first evolution, two constants are added, D and E, both of which are set to default to C when not present:
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("E", "C"),
+                CordaSerializationTransformEnumDefault("D", "C")
+        )
+        enum class OngoingExample { A, B, C, D, E }
+
+Then let's assume constant C is renamed to CAT:
+
+.. container:: codeset
+
+   .. 
sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("E", "C"),
+                CordaSerializationTransformEnumDefault("D", "C")
+        )
+        @CordaSerializationTransformRename(to = "CAT", from = "C")
+        enum class OngoingExample { A, B, CAT, D, E }
+
+Note how the first set of modifications still references C, not CAT. This is as it should be and will
+continue to work as expected.
+
+Subsequently, it is fine to add a further new constant that references the renamed value.
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        @CordaSerializationTransformEnumDefaults (
+                CordaSerializationTransformEnumDefault("F", "CAT"),
+                CordaSerializationTransformEnumDefault("E", "C"),
+                CordaSerializationTransformEnumDefault("D", "C")
+        )
+        @CordaSerializationTransformRename(to = "CAT", from = "C")
+        enum class OngoingExample { A, B, CAT, D, E, F }
+
+Unsupported Evolutions
+----------------------
+
+The following evolutions are not currently supported:
+
+    #. Removing constants
+    #. Reordering constants
diff --git a/docs/source/serialization.rst b/docs/source/serialization.rst
index 1b51f95110..ed5a278feb 100644
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -47,13 +47,12 @@ It's reproduced here as an example of both ways you can do this for a couple of
 AMQP
 ====
 
-.. note:: AMQP serialization is not currently live and will be turned on in a future release.
-
-The long term goal is to migrate the current serialization format for everything except checkpoints away from the current
-``Kryo``-based format to a more sustainable, self-describing and controllable format based on AMQP 1.0. The primary drivers for that move are:
+Originally Corda used a ``Kryo``-based serialization scheme for all serialization contexts. However, it was realised there
+was a compelling use case for the definition and development of a custom format based upon AMQP 1.0. The primary drivers for this were:
 
     #. 
A desire to have a schema describing what has been serialized along-side the actual data:
-        #. To assist with versioning, both in terms of being able to interpret long ago archived data (e.g. trades from
+
+        #. To assist with versioning, both in terms of being able to interpret long ago archived data (e.g. trades from
            a decade ago, long after the code has changed) and between differing code versions.
         #. To make it easier to write user interfaces that can navigate the serialized form of data.
         #. To support cross platform (non-JVM) interaction, where the format of a class file is not so easily interpreted.
@@ -65,7 +64,24 @@ The long term goal is to migrate the current serialization format for everything
        data poked directly into their fields without an opportunity to validate consistency or intercept attempts to manipulate
        supposed invariants.
 
-Documentation on that format, and how JVM classes are translated to AMQP, will be linked here when it is available.
+Delivering this is an ongoing effort by the Corda development team. At present, the ``Kryo``-based format is still used by the RPC framework on
+both the client and server side. However, it is planned that this will move to the AMQP framework when ready.
+
+The AMQP framework is currently used for:
+
+    #. The peer to peer context, representing inter-node communication.
+    #. The persistence layer, representing contract states persisted into the vault.
+
+Finally, for the checkpointing of flows Corda will continue to use the existing ``Kryo`` scheme.
+
+This separation of serialization schemes into different contexts allows us to use the most suitable framework for each context rather than
+attempting to force a one-size-fits-all approach. ``Kryo`` is more suited to the serialization of a program's stack frames, being more flexible
+than our AMQP framework in what it can construct and serialize, but that flexibility makes it exceptionally difficult to make secure.
Conversely,
+our AMQP framework allows us to concentrate on a robust and secure framework that can be reasoned about and thus made safer, with far fewer unforeseen
+security holes.
+
+.. note:: Selection of serialization context should, for the most part, be opaque to CorDapp developers, with the Corda framework selecting
+    the correct context as configured.
 
 .. For information on our choice of AMQP 1.0, see :doc:`amqp-choice`. For detail on how we utilise AMQP 1.0 and represent objects in AMQP types, see
     :doc:`amqp-format`.
@@ -319,14 +335,6 @@ Enums
 
     #. All enums are supported, provided they are annotated with ``@CordaSerializable``.
 
-.. warning:: Use of enums in CorDapps requires potentially deeper consideration than in other application environments
-    due to the challenges of simultaneously upgrading the code on all nodes. It is therefore important to consider the code
-    evolution perspective, since an older version of the enum code cannot
-    accommodate a newly added element of the enum in a new version of the enum code. See `Type Evolution`_. Hence, enums are
-    a good fit for genuinely static data that will *never* change. e.g. Days of the week is not going to be extended any time
-    soon and is indeed an enum in the Java library. A Buy or Sell indicator is another. However, something like
-    Trade Type or Currency Code is likely not, since who's to say a new trade type or currency will not come along soon. For
-    those it is better to choose another representation: perhaps just a string.
 
 Exceptions
 ``````````
@@ -363,10 +371,6 @@ Future Enhancements
        static method responsible for returning the singleton instance.
     #. Instance internalizing support. We will add support for identifying classes that should be resolved against an instances map to avoid
        creating many duplicate instances that are equal. Similar to ``String.intern()``.
-    #. Enum evolution support. 
We *may* introduce an annotation that can be applied to an enum element to indicate that
-       if an unrecognised enum entry is deserialized from a newer version of the code, it should be converted to that
-       element in the older version of the code. This is dependent on identifying a suitable use case, since it does
-       mutate the data when transported to another node, which could be considered hazardous.
 
 .. Type Evolution:
 
@@ -379,3 +383,10 @@ and a version of the current state of the class instantiated.
 
 More detail can be found in :doc:`serialization-default-evolution`
 
+Enum Evolution
+``````````````
+Corda supports interoperability between versions of an enumerated type. This allows such types to be changed over time without breaking
+backward (or forward) compatibility. The rules and mechanisms for doing this are discussed in :doc:`serialization-enum-evolution`.
+
+
+
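+As a rough sketch of the defaulting behaviour described in that document (an illustration only, not Corda's
+implementation), the function below follows a chain of defaults until it reaches a constant that the local class
+knows about. The ``resolve`` function, and the use of plain strings and maps in place of annotations, are
+assumptions made for this example.
+
+.. container:: codeset
+
+   .. sourcecode:: kotlin
+
+        // Defaults expressed as "new constant" to "older constant it falls back to":
+        // E falls back to D, and D falls back to C.
+        val defaults = mapOf("E" to "D", "D" to "C")
+
+        // Hypothetical helper: walks the defaults until a locally known constant is found.
+        // Returns null if the chain runs out before reaching a known constant.
+        fun resolve(name: String, known: Set<String>, defaults: Map<String, String>): String? {
+            var current: String? = name
+            while (current != null && current !in known) {
+                current = defaults[current]
+            }
+            return current
+        }
+
+        fun main() {
+            println(resolve("E", setOf("A", "B", "C"), defaults))           // prints "C"
+            println(resolve("E", setOf("A", "B", "C", "D"), defaults))      // prints "D"
+            println(resolve("E", setOf("A", "B", "C", "D", "E"), defaults)) // prints "E"
+        }
+
+Because a default may only point at an older constant, such a chain cannot loop back on itself.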