mirror of
https://github.com/corda/corda.git
synced 2025-01-03 19:54:13 +00:00
83a0a2fa3c
* CORDA-553 - Documentation * CORDA-553 - Documentation * Review comments * review comments * DOCUMENTATION: Serilization docs review updates
393 lines
17 KiB
ReStructuredText
393 lines
17 KiB
ReStructuredText
Object serialization
|
|
====================
|
|
|
|
What is serialization (and deserialization)?
|
|
--------------------------------------------
|
|
|
|
Object serialization is the process of converting objects into a stream of bytes and, deserialization, the reverse
|
|
process of creating objects from a stream of bytes. It takes place every time nodes pass objects to each other as
|
|
messages, when objects are sent to or from RPC clients from the node, and when we store transactions in the database.
|
|
|
|
Whitelisting
|
|
------------
|
|
|
|
In classic Java serialization, any class on the JVM classpath can be deserialized. This has shown to be a source of exploits
|
|
and vulnerabilities by exploiting the large set of 3rd party libraries on the classpath as part of the dependencies of
|
|
a JVM application and a carefully crafted stream of bytes to be deserialized. In Corda, we prevent just any class from
|
|
being deserialized (and pro-actively during serialization) by insisting that each object's class belongs on a whitelist
|
|
of allowed classes.
|
|
|
|
Classes get onto the whitelist via one of three mechanisms:
|
|
|
|
#. Via the ``@CordaSerializable`` annotation. In order to whitelist a class, this annotation can be present on the
|
|
class itself, on any of the super classes or on any interface implemented by the class or super classes or any
|
|
interface extended by an interface implemented by the class or superclasses.
|
|
#. By implementing the ``SerializationWhitelist`` interface and specifying a list of `whitelist` classes.
|
|
#. Via the built in Corda whitelist (see the class ``DefaultWhitelist``). Whilst this is not user editable, it does list
|
|
common JDK classes that have been whitelisted for your convenience.
|
|
|
|
The annotation is the preferred method for whitelisting. An example is shown in :doc:`tutorial-clientrpc-api`.
|
|
It's reproduced here as an example of both ways you can do this for a couple of example classes.
|
|
|
|
.. literalinclude:: example-code/src/main/kotlin/net/corda/docs/ClientRpcTutorial.kt
|
|
:language: kotlin
|
|
:start-after: START 7
|
|
:end-before: END 7
|
|
|
|
.. note:: Several of the core interfaces at the heart of Corda are already annotated and so any classes that implement
|
|
them will automatically be whitelisted. This includes `Contract`, `ContractState` and `CommandData`.
|
|
|
|
.. warning:: Java 8 Lambda expressions are not serializable except in flow checkpoints, and then not by default. The syntax to declare a serializable Lambda
|
|
expression that will work with Corda is ``Runnable r = (Runnable & Serializable) () -> System.out.println("Hello World");``, or
|
|
``Callable<String> c = (Callable<String> & Serializable) () -> "Hello World";``.
|
|
|
|
.. warning:: We will be replacing the use of Kryo in the serialization framework and so additional changes here are
|
|
likely.
|
|
|
|
AMQP
|
|
====
|
|
|
|
Originally Corda used a ``Kryo``-based serialization scheme throughout for all serialization contexts. However, it was realised there
|
|
was a compelling use case for the definition and development of a custom format based upon AMQP 1.0. The primary drivers for this were
|
|
|
|
#. A desire to have a schema describing what has been serialized along-side the actual data:
|
|
|
|
#. To assist with versioning, both in terms of being able to interpret long ago archivEd data (e.g. trades from
|
|
a decade ago, long after the code has changed) and between differing code versions.
|
|
#. To make it easier to write user interfaces that can navigate the serialized form of data.
|
|
#. To support cross platform (non-JVM) interaction, where the format of a class file is not so easily interpreted.
|
|
#. A desire to use a documented and static wire format that is platform independent, and is not subject to change with
|
|
3rd party library upgrades etc.
|
|
#. A desire to support open-ended polymorphism, where the number of subclasses of a superclass can expand over time
|
|
and do not need to be defined in the schema *upfront*, which is key to many Corda concepts, such as contract states.
|
|
#. Increased security from deserialized objects being constructed through supported constructors rather than having
|
|
data poked directly into their fields without an opportunity to validate consistency or intercept attempts to manipulate
|
|
supposed invariants.
|
|
|
|
Delivering this is an ongoing effort by the Corda development team. At present, the ``Kryo``-based format is still used by the RPC framework on
|
|
both the client and server side. However, it is planned that this will move to the AMQP framework when ready.
|
|
|
|
The AMQP framework is currently used for:
|
|
|
|
#. The peer to peer context, representing inter-node communication.
|
|
#. The persistence layer, representing contract states persisted into the vault.
|
|
|
|
Finally, for the checkpointing of flows Corda will continue to use the existing ``Kryo`` scheme.
|
|
|
|
This separation of serialization schemes into different contexts allows us to use the most suitable framework for that context rather than
|
|
attempting to force a one size fits all approach. Where ``Kryo`` is more suited to the serialization of a programs stack frames, being more flexible
|
|
than our AMQP framework in what it can construct and serialize, that flexibility makes it exceptionally difficult to make secure. Conversly
|
|
our AMQP framework allows us to concentrate on a robust a secure framework that can be reasoned about thus made safer with far fewer unforeseen
|
|
security holes.
|
|
|
|
.. note:: Selection of serialization context should, for the most part, be opaque to CorDapp developers, the Corda framework selecting
|
|
the correct context as confugred.
|
|
|
|
.. For information on our choice of AMQP 1.0, see :doc:`amqp-choice`. For detail on how we utilise AMQP 1.0 and represent
|
|
objects in AMQP types, see :doc:`amqp-format`.
|
|
|
|
We describe here what is and will be supported in the Corda AMQP format from the perspective
|
|
of CorDapp developers, to allow for CorDapps to take into consideration the future state. The AMQP serialization format will of
|
|
course continue to apply the whitelisting functionality that is already in place and described in :doc:`serialization`.
|
|
|
|
Core Types
|
|
----------
|
|
|
|
Here we describe the classes and interfaces that the AMQP serialization format will support.
|
|
|
|
Collection Types
|
|
````````````````
|
|
|
|
The following collection types are supported. Any implementation of the following will be mapped to *an* implementation of the interface or class on the other end.
|
|
e.g. If you, for example, use a Guava implementation of a collection it will deserialize as a different implementation,
|
|
but will continue to adhere to the most specific of any of the following interfaces. You should use only these types
|
|
as the declared types of fields and properties, and not the concrete implementation types. Collections must be used
|
|
in their generic form, the generic type parameters will be included in the schema, and the elements type checked against the
|
|
generic parameters when deserialized.
|
|
|
|
::
|
|
|
|
java.util.Collection
|
|
java.util.List
|
|
java.util.Set
|
|
java.util.SortedSet
|
|
java.util.NavigableSet
|
|
java.util.NonEmptySet
|
|
java.util.Map
|
|
java.util.SortedMap
|
|
java.util.NavigableMap
|
|
|
|
However, we will support the concrete implementation types below explicitly and also as the declared type of a field, as
|
|
a convenience.
|
|
|
|
::
|
|
|
|
java.util.LinkedHashMap
|
|
java.util.TreeMap
|
|
java.util.EnumSet
|
|
java.util.EnumMap (but only if there is at least one entry)
|
|
|
|
|
|
JVM primitives
|
|
``````````````
|
|
|
|
All the primitive types are supported.
|
|
|
|
::
|
|
|
|
boolean
|
|
byte
|
|
char
|
|
double
|
|
float
|
|
int
|
|
long
|
|
short
|
|
|
|
Arrays
|
|
``````
|
|
|
|
We also support arrays of any supported type, primitive or otherwise.
|
|
|
|
JDK Types
|
|
`````````
|
|
|
|
The following types are supported from the JDK libraries.
|
|
|
|
::
|
|
|
|
java.io.InputStream
|
|
|
|
java.lang.Boolean
|
|
java.lang.Byte
|
|
java.lang.Character
|
|
java.lang.Class
|
|
java.lang.Double
|
|
java.lang.Float
|
|
java.lang.Integer
|
|
java.lang.Long
|
|
java.lang.Short
|
|
java.lang.StackTraceElement
|
|
java.lang.String
|
|
java.lang.StringBuffer
|
|
|
|
java.math.BigDecimal
|
|
|
|
java.security.PublicKey
|
|
|
|
java.time.DayOfWeek
|
|
java.time.Duration
|
|
java.time.Instant
|
|
java.time.LocalDate
|
|
java.time.LocalDateTime
|
|
java.time.LocalTime
|
|
java.time.Month
|
|
java.time.MonthDay
|
|
java.time.OffsetDateTime
|
|
java.time.OffsetTime
|
|
java.time.Period
|
|
java.time.YearMonth
|
|
java.time.Year
|
|
java.time.ZonedDateTime
|
|
java.time.ZonedId
|
|
java.time.ZoneOffset
|
|
|
|
java.util.BitSet
|
|
java.util.Currency
|
|
java.util.UUID
|
|
|
|
Third Party Types
|
|
`````````````````
|
|
|
|
The following 3rd party types are supported.
|
|
|
|
::
|
|
|
|
kotlin.Unit
|
|
kotlin.Pair
|
|
|
|
org.apache.activemq.artemis.api.core.SimpleString
|
|
|
|
Corda Types
|
|
```````````
|
|
|
|
Classes and interfaces in the Corda codebase annotated with ``@CordaSerializable`` are of course supported.
|
|
|
|
All Corda exceptions that are expected to be serialized inherit from ``CordaThrowable`` via either ``CordaException``, for
|
|
checked exceptions, or ``CordaRuntimeException``, for unchecked exceptions. Any ``Throwable`` that is serialized but does
|
|
not conform to ``CordaThrowable`` will be converted to a ``CordaRuntimeException`` with the original exception type
|
|
and other properties retained within it.
|
|
|
|
Custom Types
|
|
------------
|
|
|
|
Here are the rules to adhere to for support of your own types:
|
|
|
|
Classes
|
|
```````
|
|
|
|
General Rules
|
|
'''''''''''''
|
|
|
|
#. The class must be compiled with parameter names included in the ``.class`` file. This is the default in Kotlin
|
|
but must be turned on in Java (``-parameters`` command line option to ``javac``).
|
|
#. The class is annotated with ``@CordaSerializable``.
|
|
#. The declared types of constructor arguments, getters, and setters must be supported, and where generics are used the
|
|
generic parameter must be a supported type, an open wildcard (``*``), or a bounded wildcard which is currently
|
|
widened to an open wildcard.
|
|
#. Any superclass must adhere to the same rules, but can be abstract.
|
|
#. Object graph cycles are not supported, so an object cannot refer to itself, directly or indirectly.
|
|
|
|
Constructor Instantiation
|
|
'''''''''''''''''''''''''
|
|
|
|
The primary way the AMQP serialization framework for Corda instantiates objects is via a defined constructor. This is
|
|
used to first determine which properties of an object are to be serialised then, on deserialization, it is used to
|
|
instantiate the object with the serialized values.
|
|
|
|
This is the recommended design idiom for serializable objects in Corda as it allows for immutable state objects to
|
|
be created
|
|
|
|
#. A Java Bean getter for each of the properties in the constructor, with the names matching up. For example, for a constructor
|
|
parameter ``foo``, there must be a getter called ``getFoo()``. If the type of ``foo`` is boolean, the getter may
|
|
optionally be called ``isFoo()``. This is why the class must be compiled with parameter names turned on.
|
|
#. A constructor which takes all of the properties that you wish to record in the serialized form. This is required in
|
|
order for the serialization framework to reconstruct an instance of your class.
|
|
#. If more than one constructor is provided, the serialization framework needs to know which one to use. The ``@ConstructorForDeserialization``
|
|
annotation can be used to indicate which one. For a Kotlin class, without the ``@ConstructorForDeserialization`` annotation, the
|
|
*primary constructor* will be selected.
|
|
|
|
In Kotlin, this maps cleanly to a data class where there getters are synthesized automatically. For example,
|
|
|
|
.. container:: codeset
|
|
|
|
.. sourcecode:: kotlin
|
|
|
|
data class Example (val a: Int, val b: String)
|
|
|
|
Both properties a and b will be included in the serialised form. However, as stated above, properties not mentioned in
|
|
the constructor will not be serialised. For example, in the following code property c will not be considered part of the
|
|
serialised form
|
|
|
|
.. container:: codeset
|
|
|
|
.. sourcecode:: kotlin
|
|
|
|
data class Example (val a: Int, val b: String) {
|
|
var c: Int = 20
|
|
}
|
|
|
|
var e = Example (10, "hello")
|
|
e.c = 100;
|
|
|
|
val e2 = e.serialize().deserialize() // e2.c will be 20, not 100!!!
|
|
|
|
.. warning:: Private properties in Kotlin classes render the class unserializable *unless* a public
|
|
getter is manually defined. For example:
|
|
|
|
.. container:: codeset
|
|
|
|
.. sourcecode:: kotlin
|
|
|
|
data class C(val a: Int, private val b: Int) {
|
|
// Without this C cannot be serialized
|
|
public fun getB() = b
|
|
}
|
|
|
|
.. note:: This is particularly relevant as IDE's can often point out where they believe a
|
|
property can be made private without knowing this can break Corda serialization. Should
|
|
this happen then a run time warning will be generated when the class fails to serialize
|
|
|
|
Setter Instantiation
|
|
''''''''''''''''''''
|
|
|
|
As an alternative to constructor based initialisation Corda can also determine the important elements of an
|
|
object by inspecting the getter and setter methods present on a class. If a class has **only** a default
|
|
constructor **and** properties then the serializable properties will be determined by the presence of
|
|
both a getter and setter for that property that are both publicly visible. I.e. the class adheres to
|
|
the classic *idiom* of mutable JavaBeans.
|
|
|
|
On deserialization, a default instance will first be created and then, in turn, the setters invoked
|
|
on that object to populate the correct values.
|
|
|
|
For example:
|
|
|
|
.. container:: codeset
|
|
|
|
.. sourcecode:: Java
|
|
|
|
class Example {
|
|
private int a;
|
|
private int b;
|
|
private int c;
|
|
|
|
public int getA() { return a; }
|
|
public int getB() { return b; }
|
|
public int getC() { return c; }
|
|
|
|
public void setA(int a) { this.a = a; }
|
|
public void setB(int b) { this.b = b; }
|
|
public void setC(int c) { this.c = c; }
|
|
}
|
|
|
|
Enums
|
|
`````
|
|
|
|
#. All enums are supported, provided they are annotated with ``@CordaSerializable``.
|
|
|
|
|
|
Exceptions
|
|
``````````
|
|
|
|
The following rules apply to supported ``Throwable`` implementations.
|
|
|
|
#. If you wish for your exception to be serializable and transported type safely it should inherit from either
|
|
``CordaException`` or ``CordaRuntimeException``.
|
|
#. If not, the ``Throwable`` will deserialize to a ``CordaRuntimeException`` with the details of the original
|
|
``Throwable`` contained within it, including the class name of the original ``Throwable``.
|
|
|
|
Kotlin Objects
|
|
``````````````
|
|
|
|
#. Kotlin ``object`` s are singletons and treated differently. They are recorded into the stream with no properties
|
|
and deserialize back to the singleton instance. Currently, the same is not true of Java singletons,
|
|
and they will deserialize to new instances of the class.
|
|
#. Kotlin's anonymous ``object`` s are not currently supported. I.e. constructs like:
|
|
``object : Contract {...}`` will not serialize correctly and need to be re-written as an explicit class declaration.
|
|
|
|
The Carpenter
|
|
`````````````
|
|
|
|
We will support a class carpenter that can dynamically manufacture classes from the supplied schema when deserializing
|
|
in the JVM without the supporting classes on the classpath. This can be useful where other components might expect to
|
|
be able to use reflection over the deserialized data, and also for ensuring classes not on the classpath can be
|
|
deserialized without loading potentially malicious code dynamically without security review outside of a fully sandboxed
|
|
environment. A more detailed discussion of the carpenter will be provided in a future update to the documentation.
|
|
|
|
Future Enhancements
|
|
```````````````````
|
|
|
|
#. Java singleton support. We will add support for identifying classes which are singletons and identifying the
|
|
static method responsible for returning the singleton instance.
|
|
#. Instance internalizing support. We will add support for identifying classes that should be resolved against an instances map to avoid
|
|
creating many duplicate instances that are equal. Similar to ``String.intern()``.
|
|
|
|
.. Type Evolution:
|
|
|
|
Type Evolution
|
|
--------------
|
|
|
|
Type evolution is the mechanisms by which classes can be altered over time yet still remain serializable and deserializable across
|
|
all versions of the class. This ensures an object serialized with an older idea of what the class "looked like" can be deserialized
|
|
and a version of the current state of the class instantiated.
|
|
|
|
More detail can be found in :doc:`serialization-default-evolution`
|
|
|
|
Enum Evolution
|
|
``````````````
|
|
Corda supports interoperability of enumerated type versions. This allows such types to be changed over time without breaking
|
|
backward (or forward) compatibility. The rules and mechanisms for doing this are discussed in :doc:`serialization-enum-evolution``
|
|
|
|
|
|
|