Enum evolution documentation (#2189)

* CORDA-553 - Documentation

* CORDA-553 - Documentation

* Review comments

* review comments

* DOCUMENTATION: Serilization docs review updates
Katelyn Baker 2018-01-05 19:30:17 +00:00 committed by GitHub
parent 6a07576c96
commit 83a0a2fa3c
2 changed files with 364 additions and 18 deletions


@ -0,0 +1,335 @@
Enum Evolution
==============
.. contents::
As a CorDapp continues to be developed, an enumerated type that was fit for purpose at one time may
require changing. Normally this would be problematic: anything serialized (and kept in a vault) would
run the risk of being impossible to deserialize in the future, and older versions of the app still alive
within a compatibility zone might fail to deserialize a message.
To facilitate backward and forward support for alterations to enumerated types, Corda's serialization
framework supports the evolution of such types through a well-defined mechanism that allows different
versions of a class to interoperate with serialized forms of differing versions of an enumeration.
This is achieved through the use of certain annotations. Whenever a change is made, an annotation
capturing the change must be added (it can be omitted, but interoperability will then be lost). Corda
supports two modifications to enumerated types: adding new constants and renaming existing constants.
.. warning:: Once added, evolution annotations MUST NEVER be removed from a class. Doing so will break
both forward and backward compatibility for this version of the class and any version moving
forward.
The Purpose of Annotating Changes
---------------------------------
The biggest hurdle to allowing enum constants to be changed is that there will exist instances of those
classes, either serialized in a vault or on nodes with the old, unmodified, version of the class that we
must be able to interoperate with. Thus if a received data structure references an enum assigned a constant
value that doesn't exist on the running JVM, a solution is needed.
For this, we use the annotations to allow developers to express their backward compatible intentions.
In the case of renaming constants this is somewhat obvious: the deserializing node will simply treat any
constants it doesn't understand as their "old" values, i.e. those values that it currently knows about.
In the case of adding new constants the developer must choose which constant (that existed *before* adding
the new one) a deserializing system should treat any instances of the new one as.
.. note:: Ultimately, this may mean some design compromises are required. If an enumeration is
planned as being often extended and no sensible defaults will exist, then including a constant
in the original version of the class that all new additions can default to may make sense.
Evolution Transmission
----------------------
An object serializer, on creation, will inspect the class it represents for any evolution annotations.
If a class is thus decorated those rules will be encoded as part of any serialized representation of a
data structure containing that class. This ensures that on deserialization the deserializing object will
have access to any transformative rules it needs to build a local instance of the serialized object.
Evolution Precedence
--------------------
On deserialization (technically on construction of a serialization object that facilitates serialization
and deserialization) a class's fingerprint is compared to the fingerprint received as part of the AMQP
header of the corresponding class. If they match then we are sure that the two class versions are functionally
the same and no further steps are required save the deserialization of the serialized information into an instance
of the class.
If, however, the fingerprints differ then we know that the class we are attempting to deserialize is different
from the version we will be deserializing it into. What we cannot know is which version is newer, at least
not by examining the fingerprint.
.. note:: Corda's AMQP fingerprinting for enumerated types includes the type name and the enum constants.
Newer vs older is important as the deserializer needs to use the more recent set of transforms to ensure it
can transform the serialized object into the form as it exists in the deserializer. Newness is determined simply
by the length of the list of all transforms. This is sufficient as transform annotations should only ever be added.
.. warning:: Technically there is nothing to prevent annotations being removed in newer versions. However,
this will break backward compatibility and should thus be avoided unless a rigorous upgrade procedure
is in place to cope with all deployed instances of the class and all serialized versions existing
within vaults.
Thus, on deserialization, there will be two sets of transformation rules to choose from:
#. Determined from the local class and the annotations applied to it (the local copy)
#. Parsed from the AMQP header (the remote copy)
Whichever set is larger will be used.
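The selection rule above can be sketched as follows. This is an illustrative model only: ``Transform``, ``Rename``, ``EnumDefault`` and ``chooseTransforms`` are hypothetical names, not Corda's internal types, and preferring the local set on a tie is an assumption.

```kotlin
// Hypothetical, simplified model of a transform; not Corda's internal types.
sealed class Transform
data class Rename(val to: String, val from: String) : Transform()
data class EnumDefault(val new: String, val old: String) : Transform()

// Pick whichever set holds more transforms, relying on the convention that
// annotations are only ever added, never removed.
// Assumption: on a tie the local set is kept.
fun chooseTransforms(local: List<Transform>, remote: List<Transform>): List<Transform> =
    if (remote.size > local.size) remote else local

fun main() {
    // Local class only knows about the addition of D.
    val local = listOf<Transform>(EnumDefault(new = "D", old = "C"))
    // Remote (AMQP header) also describes the later addition of E.
    val remote = listOf<Transform>(
        EnumDefault(new = "E", old = "D"),
        EnumDefault(new = "D", old = "C")
    )
    println(chooseTransforms(local, remote).size) // prints 2
}
```

Comparing sizes is only meaningful because annotations accumulate over time; if annotations were removed, the longer list would no longer be guaranteed to be the newer one.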
Renaming Constants
------------------
Renamed constants are marked as such with the ``@CordaSerializationTransformRenames`` meta annotation that
wraps a list of ``@CordaSerializationTransformRename`` annotations. Each rename requires its own instance in the
list.
Each instance must provide the new name of the constant as well as the old. For example, consider the following enumeration:
.. container:: codeset
.. sourcecode:: kotlin
enum class Example {
A, B, C
}
If we were to rename constant C to D this would be done as follows:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformRenames (
CordaSerializationTransformRename("D", "C")
)
enum class Example {
A, B, D
}
.. note:: The parameters to the ``CordaSerializationTransformRename`` annotation are defined as 'to' and 'from',
so the above example can be read as: constant D (given that is how the class now exists) was renamed
from C.
In the case where a single rename has been applied the meta annotation may be omitted. Thus, the following is
functionally identical to the above:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformRename("D", "C")
enum class Example {
A, B, D
}
However, as soon as a second rename is made the meta annotation must be used. For example, if at some time later
B is renamed to E:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformRenames (
CordaSerializationTransformRename(from = "B", to = "E"),
CordaSerializationTransformRename(from = "C", to = "D")
)
enum class Example {
A, E, D
}
Rules
~~~~~
#. A constant cannot be renamed to match an existing constant; this is enforced through language constraints
#. A constant cannot be renamed to a value that matches any previous name of any other constant
If either of these rules is inadvertently broken, a ``NotSerializableException`` will be thrown by the
serialization engine as soon as the violation is detected. Normally this will be the first time an object of the
offending type is serialized. However, in some circumstances, it could be at the point of deserialization.
Adding Constants
----------------
Enumeration constants can be added with the ``@CordaSerializationTransformEnumDefaults`` meta annotation that
wraps a list of ``CordaSerializationTransformEnumDefault`` annotations. For each constant added an annotation
must be included that signifies, on deserialization, which constant value should be used in place of the
serialised property if that value doesn't exist on the version of the class as it exists on the deserializing
node.
.. container:: codeset
.. sourcecode:: kotlin
enum class Example {
A, B, C
}
If we were to add the constant D, this would be done as follows:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("D", "C")
)
enum class Example {
A, B, C, D
}
.. note:: The parameters to the ``CordaSerializationTransformEnumDefault`` annotation are defined as 'new' and 'old',
so the above example can be read as: constant D should be treated as constant C if you, the deserializing
node, don't know anything about constant D.
.. note:: Just as with the ``CordaSerializationTransformRename`` transformation, if a single transform is being applied
then the meta annotation may be omitted.
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefault("D", "C")
enum class Example {
A, B, C, D
}
New constants may default to any other constant older than them, including constants that have also been added
since inception. In this example, having added D (above), we add the constant E and choose to default it to D:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("E", "D"),
CordaSerializationTransformEnumDefault("D", "C")
)
enum class Example {
A, B, C, D, E
}
.. note:: Alternatively, we could have decided that both new constants should default to the first
element:
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("E", "A"),
CordaSerializationTransformEnumDefault("D", "A")
)
enum class Example {
A, B, C, D, E
}
When deserializing, the most applicable transform will be applied. Continuing the above example, deserializing
nodes could have three distinct views on what the enum Example looks like (annotations omitted for brevity):
.. container:: codeset
.. sourcecode:: kotlin
// The original version of the class. Will deserialize:
// A -> A
// B -> B
// C -> C
// D -> C
// E -> C
enum class Example {
A, B, C
}
.. sourcecode:: kotlin
// The class as it existed after the first addition. Will deserialize:
// A -> A
// B -> B
// C -> C
// D -> D
// E -> D
enum class Example {
A, B, C, D
}
.. sourcecode:: kotlin
// The current state of the class. All values will deserialize as themselves
enum class Example {
A, B, C, D, E
}
Thus, a value that was encoded as E could be deserialized as one of three constants (E, D, or C),
depending on how the deserializing node understands the class.
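The three outcomes above can be reproduced with a small sketch that follows the declared defaults until it reaches a constant the local class knows about. ``resolveConstant`` and its parameters are hypothetical names introduced for illustration; this is a simplified model, not Corda's actual deserialization code.

```kotlin
// Hypothetical helper: follow "new -> old" defaults until a constant known to
// the local version of the enum is reached. Not Corda's implementation.
fun resolveConstant(
    received: String,
    knownConstants: Set<String>,
    defaults: Map<String, String> // new constant -> constant it defaults to
): String {
    var current = received
    val visited = mutableSetOf<String>()
    while (current !in knownConstants) {
        // Guard against a malformed, cyclic set of defaults.
        check(visited.add(current)) { "Cyclic defaults involving $current" }
        current = defaults[current]
            ?: throw IllegalArgumentException("No default declared for $current")
    }
    return current
}

fun main() {
    // Mirrors the annotations in the example: E defaults to D, D defaults to C.
    val defaults = mapOf("E" to "D", "D" to "C")
    println(resolveConstant("E", setOf("A", "B", "C"), defaults))           // C
    println(resolveConstant("E", setOf("A", "B", "C", "D"), defaults))      // D
    println(resolveConstant("E", setOf("A", "B", "C", "D", "E"), defaults)) // E
}
```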
Rules
~~~~~
#. New constants must be added to the end of the existing list of constants
#. Defaults can only be set to "older" constants, i.e. those to the left of the new constant in the list
#. Constants must never be removed once added
#. New constants can be renamed at a later date using the appropriate annotation
#. When renamed, if a defaulting annotation refers to the old name, it should be left as is
Combining Evolutions
---------------------
Renaming constants and adding constants can be freely combined over time as a class changes. Added constants can
in turn be renamed and everything will continue to be deserializable. For example, consider the following enum:
.. container:: codeset
.. sourcecode:: kotlin
enum class OngoingExample { A, B, C }
For the first evolution, two constants are added, D and E, both of which are set to default to C when not present:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("E", "C"),
CordaSerializationTransformEnumDefault("D", "C")
)
enum class OngoingExample { A, B, C, D, E }
Then let's assume constant C is renamed to CAT:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("E", "C"),
CordaSerializationTransformEnumDefault("D", "C")
)
@CordaSerializationTransformRename(to = "CAT", from = "C")
enum class OngoingExample { A, B, CAT, D, E }
Note how the first set of modifications still references C, not CAT. This is as it should be and will
continue to work as expected.
Subsequently it is fine to add an additional new constant that references the renamed value:
.. container:: codeset
.. sourcecode:: kotlin
@CordaSerializationTransformEnumDefaults (
CordaSerializationTransformEnumDefault("F", "CAT"),
CordaSerializationTransformEnumDefault("E", "C"),
CordaSerializationTransformEnumDefault("D", "C")
)
@CordaSerializationTransformRename(to = "CAT", from = "C")
enum class OngoingExample { A, B, CAT, D, E, F }
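To see why leaving the older defaults pointing at C still works, the following sketch resolves a received constant name against both renames and defaults. It is a deliberately simplified model under stated assumptions: ``Rename`` and ``resolveWithRenames`` are hypothetical names, and the breadth-first search over names is an illustration rather than the algorithm Corda uses.

```kotlin
// Hypothetical model of a rename, mirroring the annotation's 'to' and 'from'.
data class Rename(val to: String, val from: String)

// Search outwards from the received name, following renames in either
// direction (same constant, different name) and then defaults (a different,
// older constant), until a name the local enum knows is found.
// Returns null if nothing known can be reached. Illustrative only.
fun resolveWithRenames(
    received: String,
    known: Set<String>,
    defaults: Map<String, String>, // new constant -> constant it defaults to
    renames: List<Rename>
): String? {
    val queue = ArrayDeque(listOf(received))
    val seen = mutableSetOf(received)
    while (queue.isNotEmpty()) {
        val name = queue.removeFirst()
        if (name in known) return name
        val next = renames.filter { it.from == name }.map { it.to } +
            renames.filter { it.to == name }.map { it.from } +
            listOfNotNull(defaults[name])
        for (n in next) if (seen.add(n)) queue.addLast(n)
    }
    return null
}

fun main() {
    val defaults = mapOf("F" to "CAT", "E" to "C", "D" to "C")
    val renames = listOf(Rename(to = "CAT", from = "C"))
    // A node still on the original class sees F as C (via CAT, renamed from C).
    println(resolveWithRenames("F", setOf("A", "B", "C"), defaults, renames))
    // A node on the latest class reading old data sees C as CAT.
    println(resolveWithRenames("C", setOf("A", "B", "CAT", "D", "E", "F"), defaults, renames))
}
```

In this model the defaults that name C and the default that names CAT both resolve correctly because the rename links the two names as the same constant.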
Unsupported Evolutions
----------------------
The following evolutions are not currently supported:
#. Removing constants
#. Reordering constants


@ -47,13 +47,12 @@ It's reproduced here as an example of both ways you can do this for a couple of
AMQP
====
.. note:: AMQP serialization is not currently live and will be turned on in a future release.
The long term goal is to migrate the current serialization format for everything except checkpoints away from the current
``Kryo``-based format to a more sustainable, self-describing and controllable format based on AMQP 1.0. The primary drivers for that move are:
Originally Corda used a ``Kryo``-based serialization scheme throughout for all serialization contexts. However, it was realised there
was a compelling use case for the definition and development of a custom format based upon AMQP 1.0. The primary drivers for this were:
#. A desire to have a schema describing what has been serialized along-side the actual data:
#. To assist with versioning, both in terms of being able to interpret long ago archived data (e.g. trades from
#. To assist with versioning, both in terms of being able to interpret long ago archived data (e.g. trades from
a decade ago, long after the code has changed) and between differing code versions.
#. To make it easier to write user interfaces that can navigate the serialized form of data.
#. To support cross platform (non-JVM) interaction, where the format of a class file is not so easily interpreted.
@ -65,7 +64,24 @@ The long term goal is to migrate the current serialization format for everything
data poked directly into their fields without an opportunity to validate consistency or intercept attempts to manipulate
supposed invariants.
Documentation on that format, and how JVM classes are translated to AMQP, will be linked here when it is available.
Delivering this is an ongoing effort by the Corda development team. At present, the ``Kryo``-based format is still used by the RPC framework on
both the client and server side. However, it is planned that this will move to the AMQP framework when ready.
The AMQP framework is currently used for:
#. The peer to peer context, representing inter-node communication.
#. The persistence layer, representing contract states persisted into the vault.
Finally, for the checkpointing of flows Corda will continue to use the existing ``Kryo`` scheme.
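The split described above can be summarised as a simple mapping. The names ``Scheme``, ``Context`` and ``schemeFor`` are hypothetical and exist only for this sketch; they are not Corda API types, and the mapping reflects the state of affairs described in this section rather than any configurable behaviour.

```kotlin
// Hypothetical names for illustration; these are not Corda API types.
enum class Scheme { AMQP, KRYO }
enum class Context { PEER_TO_PEER, PERSISTENCE, RPC_CLIENT, RPC_SERVER, CHECKPOINT }

// Which scheme each context uses, as described in the text above.
fun schemeFor(context: Context): Scheme = when (context) {
    // Inter-node communication and vault persistence use AMQP.
    Context.PEER_TO_PEER, Context.PERSISTENCE -> Scheme.AMQP
    // RPC still uses Kryo on both sides for now.
    Context.RPC_CLIENT, Context.RPC_SERVER -> Scheme.KRYO
    // Flow checkpoints will continue to use Kryo.
    Context.CHECKPOINT -> Scheme.KRYO
}

fun main() {
    Context.values().forEach { println("$it -> ${schemeFor(it)}") }
}
```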
This separation of serialization schemes into different contexts allows us to use the most suitable framework for that context rather than
attempting to force a one-size-fits-all approach. ``Kryo`` is better suited to the serialization of a program's stack frames, being more flexible
than our AMQP framework in what it can construct and serialize, but that flexibility makes it exceptionally difficult to make secure. Conversely,
our AMQP framework allows us to concentrate on a robust and secure framework that can be reasoned about and thus made safer, with far fewer unforeseen
security holes.
.. note:: Selection of serialization context should, for the most part, be opaque to CorDapp developers, with the Corda framework selecting
the correct context as configured.
.. For information on our choice of AMQP 1.0, see :doc:`amqp-choice`. For detail on how we utilise AMQP 1.0 and represent
objects in AMQP types, see :doc:`amqp-format`.
@ -319,14 +335,6 @@ Enums
#. All enums are supported, provided they are annotated with ``@CordaSerializable``.
.. warning:: Use of enums in CorDapps requires potentially deeper consideration than in other application environments
due to the challenges of simultaneously upgrading the code on all nodes. It is therefore important to consider the code
evolution perspective, since an older version of the enum code cannot
accommodate a newly added element of the enum in a new version of the enum code. See `Type Evolution`_. Hence, enums are
a good fit for genuinely static data that will *never* change. e.g. Days of the week is not going to be extended any time
soon and is indeed an enum in the Java library. A Buy or Sell indicator is another. However, something like
Trade Type or Currency Code is likely not, since who's to say a new trade type or currency will not come along soon. For
those it is better to choose another representation: perhaps just a string.
Exceptions
``````````
@ -363,10 +371,6 @@ Future Enhancements
static method responsible for returning the singleton instance.
#. Instance internalizing support. We will add support for identifying classes that should be resolved against an instances map to avoid
creating many duplicate instances that are equal. Similar to ``String.intern()``.
#. Enum evolution support. We *may* introduce an annotation that can be applied to an enum element to indicate that
if an unrecognised enum entry is deserialized from a newer version of the code, it should be converted to that
element in the older version of the code. This is dependent on identifying a suitable use case, since it does
mutate the data when transported to another node, which could be considered hazardous.
.. _Type Evolution:
@ -379,3 +383,10 @@ and a version of the current state of the class instantiated.
More detail can be found in :doc:`serialization-default-evolution`
Enum Evolution
``````````````
Corda supports interoperability of enumerated type versions. This allows such types to be changed over time without breaking
backward (or forward) compatibility. The rules and mechanisms for doing this are discussed in :doc:`serialization-enum-evolution`