mirror of
https://github.com/corda/corda.git
synced 2024-12-18 20:47:57 +00:00
CORDA-3071 - Checkpoint agent tool (#5295)
* Initial commit based on experimental kryo hook agent.
* WIP
* Added documentation.
* Additional improvements and documentation following more testing.
* Added field level instrumentation + basic type handlers for String, byteArray, charArray, primitive types.
* Working version (without array type handling)
* Missing build.gradle file.
* Handle display of Arrays and String. Pruning output to avoid repetition (by loop depth, object count).
* Added configurable StackDepth (for display purposes) and filter out ProgressTracker stacks.
* Further array handling (Object arrays, 2D, 3D), improved display and general code cleanup.
* Various fixes and improvements following demo to RP.
* Clean-up
* Further clean-up
* Set checkpoint id before deserialization.
* Update documentation
* Final clean-up.
* Minor documentation fixes.
* Updates following PR review feedback.
* Add changelog entry.
This commit is contained in:
parent
eff762a5c3
commit
99e802c5c5
2
.idea/compiler.xml
generated
@ -29,6 +29,8 @@
<module name="buildSrc_test" target="1.8" />
<module name="canonicalizer_main" target="1.8" />
<module name="canonicalizer_test" target="1.8" />
<module name="checkpoint-agent_main" target="1.8" />
<module name="checkpoint-agent_test" target="1.8" />
<module name="cli_main" target="1.8" />
<module name="cli_test" target="1.8" />
<module name="client_main" target="1.8" />
@ -146,6 +146,30 @@
</RollingRandomAccessFile>

<RollingFile name="Checkpoint-Agent-RollingFile-Appender"
             fileName="${log-path}/checkpoints_agent-${date:yyyyMMdd-HHmmss}.log"
             filePattern="${archive}/checkpoints_agent.%date{yyyy-MM-dd}-%i.log.gz">

    <PatternLayout pattern="[%-5level] %date{ISO8601}{UTC}Z [%t] %c{2}.%method - %msg%n"/>

    <Policies>
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="100MB"/>
    </Policies>

    <DefaultRolloverStrategy min="1" max="100">
        <Delete basePath="${archive}" maxDepth="1">
            <IfFileName glob="${log-name}*.log.gz"/>
            <IfLastModified age="60d">
                <IfAny>
                    <IfAccumulatedFileSize exceeds="10 GB"/>
                </IfAny>
            </IfLastModified>
        </Delete>
    </DefaultRolloverStrategy>

</RollingFile>

<Rewrite name="Console-ErrorCode-Selector">
    <AppenderRef ref="Console-Selector"/>
    <ErrorCodeRewritePolicy/>
@ -190,5 +214,8 @@
        <AppenderRef ref="Console-ErrorCode-Appender-Println"/>
        <AppenderRef ref="RollingFile-ErrorCode-Appender"/>
    </Logger>
    <Logger name="CheckpointAgent" level="debug" additivity="false">
        <AppenderRef ref="Checkpoint-Agent-RollingFile-Appender"/>
    </Logger>
</Loggers>
</Configuration>
@ -9,6 +9,9 @@ release, see :doc:`app-upgrade-notes`.
Version 5.0
-----------

* Introduced a new low level flow diagnostics tool: checkpoint agent (that can be used standalone or in conjunction with the ``dumpCheckpoints`` shell command).
  See :doc:`tools-checkpoint-agent` for more information.

* The MockNet now supports setting a custom Notary class name, as was already supported by normal node config. See :doc:`tutorial-custom-notary`.

* Introduced a new ``Destination`` abstraction for communicating with non-Party destinations using the new ``FlowLogic.initiateFlow(Destination)``
@ -127,6 +127,8 @@ The main try-catch that handles the yielding may be found `here <https://github.
primitive values). The arrays always have the same length, and they both contain values for each stack frame. The primitive stack
additionally has a "metadata" slot for each stack frame; this is where the "method entry" value is put, as well as frame size data.

.. _flow_internals_checkpoints_ref:

Checkpoints
-----------
384
docs/source/tools-checkpoint-agent.rst
Normal file
@ -0,0 +1,384 @@
Checkpoint Agent
================

Firstly, please ensure you understand the mechanics and principles of Corda flows by reading :doc:`key-concepts-flows` and :doc:`flow-state-machines`.
We also recommend you understand the purpose and behaviour of the :doc:`node-flow-hospital` in relation to *checkpoints* and flow recovery.
An advanced explanation of :ref:`checkpoints <flow_internals_checkpoints_ref>` within the flow state machine can be found here: :doc:`contributing-flow-internals`.

As a recap,

"A flow *checkpoint* is a serialised snapshot of the flow's stack frames and any objects reachable from the stack. Checkpoints are saved to
the database automatically when a flow suspends or resumes, which typically happens when sending or receiving messages. A flow may be replayed
from the last checkpoint if the node restarts. Automatic checkpointing is an unusual feature of Corda and significantly helps developers write
reliable code that can survive node restarts and crashes. It also assists with scaling up, as flows that are waiting for a response can be flushed
from memory."
The Checkpoint Agent is a very low level diagnostics tool that can be used to output the type, size and content of flow *checkpoints* at node runtime.
It is primarily targeted at users developing and testing code that may exhibit flow mis-behaviour (preferably before going into production).

For a given flow *checkpoint*, the agent outputs:

1. Information about the checkpoint, such as its ``id`` (also called the ``flow id``), which can be used to correlate with that flow's lifecycle details in the main Corda logs.
2. A nested hierarchical view of its reachable objects (indented and tagged with depth and size) and their associated sizes, including the state
   of any flows held within the checkpoint.

Diagnostics information is written to standard log files (e.g. a log4j2-configured logger).

This tool is particularly useful when used in conjunction with the ``dumpCheckpoints`` CRaSH shell command to troubleshoot and identify potential
problems associated with checkpoints for flows that appear not to be completing.

The checkpoint agent can be downloaded from `here <https://software.r3.com/artifactory/corda-releases/net/corda/corda-tools-checkpoint-agent/>`_.

To run it, simply pass the following jar to the JVM used to start a Corda node: ``-javaagent:<PATH>/checkpoint-agent.jar[=arg=value,...]``

.. warning:: This tool adds to the node's memory footprint; we recommend a minimum heap size of at least 1GB.

The agent can be customised with a number of optional parameters described below.

.. note:: When using the gradle plugin utilities for deploying and running nodes, ensure the checkpoint agent jar is correctly passed to capsule as follows:
   ``-Dcapsule.jvm.args=-javaagent:checkpoint-agent.jar[=arg=value,...]``
Configuration
~~~~~~~~~~~~~

The checkpoint agent can be started with the following optional parameters:

.. code-block:: shell

    checkpoint-agent.jar=[instrumentType=<read|write>],[instrumentClassname=<CLASSNAME>],[minimumSize=<MIN_SIZE>],[maximumSize=<MAX_SIZE>],[graphDepth=<DEPTH>],[printOnce=<true|false>]

* ``instrumentType``: whether to output checkpoints on read or write. Possible values: [read, write]. Default: read.
* ``instrumentClassname``: specifies the base type of objects to log. The default setting is to process all *Flow* object types. Default: ``net.corda.node.services.statemachine.FlowStateMachineImpl``.
* ``minimumSize``: specifies the minimum size (in bytes) of objects to log. Default: 8192 bytes (8KB).
* ``maximumSize``: specifies the maximum size (in bytes) of objects to log. Default: 20000000 bytes (20MB).
* ``graphDepth``: specifies how many levels deep to display in the graph output. Default: unlimited.
* ``printOnce``: if true, displays a full object reference (and its sub-graph) only once; otherwise an object is displayed repeatedly as referenced. Default: true.
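The ``arg=value,...`` string after the ``=`` follows the standard ``java.lang.instrument`` convention, where the whole string is handed to the agent's ``premain`` as a single argument. A minimal sketch of how such a string might be split into an options map with the documented defaults (class and method names here are hypothetical, not the agent's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentArgs {
    // Parses "k1=v1,k2=v2" into a map, falling back to the documented defaults.
    static Map<String, String> parse(String agentArgs) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Defaults mirroring the parameter list above.
        opts.put("instrumentType", "read");
        opts.put("minimumSize", "8192");
        opts.put("maximumSize", "20000000");
        opts.put("printOnce", "true");
        if (agentArgs == null || agentArgs.isEmpty()) return opts;
        for (String pair : agentArgs.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) opts.put(kv[0].trim(), kv[1].trim());
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> o = parse("instrumentType=write,minimumSize=10240");
        // Overridden keys win; untouched keys keep their defaults.
        System.out.println(o.get("instrumentType") + " " + o.get("minimumSize") + " " + o.get("printOnce"));
    }
}
```

This also explains the "no spaces" note below: the JVM treats everything after ``-javaagent:`` as one token, so a space would end the argument string early.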
These arguments are passed to the JVM along with the agent specification. For example:

.. code-block:: shell

    -javaagent:<PATH>/checkpoint-agent.jar=instrumentClassname=net.corda.vega.flows.SimmFlow,instrumentType=read,minimumSize=10240,maximumSize=512000,graphDepth=6,printOnce=false

.. note:: Arguments may be passed into the agent in any order and must **not** contain spaces between them.
Checkpoint Dump support
~~~~~~~~~~~~~~~~~~~~~~~

When used in combination with the ``dumpCheckpoints`` shell command (see :ref:`Upgrading CorDapps <upgrading-cordapps-flow-drains>`),
the checkpoint agent will automatically output additional diagnostic information for all checkpoints dumped by the aforementioned tool.

You should therefore see two different output files upon invoking the checkpoint dumper command:

* ``<NODE_BASE>\logs\checkpoints_dump-<date>.zip`` contains a zipped JSON representation of checkpoints (from the ``dumpCheckpoints`` shell command)
* ``<NODE_BASE>\logs\checkpoints_agent-<date>.log`` contains output from this agent tool (types and sizes of a checkpoint stack)

.. note:: You will only see a separate ``checkpoints_agent-<date>.log`` file if you configure a separate log4j logger as described below.
   Otherwise all diagnostics logging will be routed to the standard Corda node log file: ``node-<hostname>.log``.

If you **only** wish to log checkpoint data for failing flows, start the checkpoint agent with the following arguments:

.. code-block:: shell

    checkpoint-agent.jar=instrumentType=read,instrumentClassname=NONE

and use the ``dumpCheckpoints`` shell command to trigger diagnostics collection.

.. warning:: The checkpoint agent JAR file must be called "checkpoint-agent.jar", as the checkpoint dump support code uses Java reflection to
   determine whether the VM has been instrumented or not at runtime.
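The fixed jar name matters because the node looks for it at runtime. A hedged sketch of one way such a check could be written, scanning the JVM's input arguments for a matching ``-javaagent`` entry (this is an illustration, not Corda's actual implementation, which inspects the VM's agent properties):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class AgentCheck {
    // Returns true if any -javaagent argument references the expected jar name.
    static boolean agentLoaded(List<String> jvmArgs, String jarName) {
        return jvmArgs.stream()
                .anyMatch(a -> a.startsWith("-javaagent:") && a.contains(jarName));
    }

    public static void main(String[] args) {
        // In a real node this would be the live JVM arguments:
        List<String> live = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(agentLoaded(live, "checkpoint-agent.jar"));
    }
}
```

If the jar were renamed, a check keyed on the file name would report the agent as absent even though instrumentation is active, which is exactly why the warning above applies.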
Logging configuration
~~~~~~~~~~~~~~~~~~~~~

The agent will log output to a log4j2-configured logger.

It is recommended to configure a separate log file to capture this information by configuring an appender as follows:

.. sourcecode:: none

    <Logger name="CheckpointAgent" level="info" additivity="false">
        <AppenderRef ref="Checkpoint-Agent-RollingFile-Appender"/>
    </Logger>

.. warning:: You must specify "CheckpointAgent" as the logger name.

In this instance we are specifying a rolling file appender with archival rotation, as follows:

.. sourcecode:: none

    <RollingFile name="Checkpoint-Agent-RollingFile-Appender"
                 fileName="${log-path}/checkpoints_agent-${date:yyyyMMdd-HHmmss}.log"
                 filePattern="${archive}/checkpoints_agent.%date{yyyy-MM-dd}-%i.log.gz">

        <PatternLayout pattern="[%-5level] %date{ISO8601}{UTC}Z [%t] %c{2}.%method - %msg%n"/>

        <Policies>
            <TimeBasedTriggeringPolicy/>
            <SizeBasedTriggeringPolicy size="100MB"/>
        </Policies>

        <DefaultRolloverStrategy min="1" max="100">
            <Delete basePath="${archive}" maxDepth="1">
                <IfFileName glob="${log-name}*.log.gz"/>
                <IfLastModified age="60d">
                    <IfAny>
                        <IfAccumulatedFileSize exceeds="10 GB"/>
                    </IfAny>
                </IfLastModified>
            </Delete>
        </DefaultRolloverStrategy>

    </RollingFile>

The *log4j2.xml* containing the above configuration must then be passed to the Corda node JVM along with the agent specification:

.. code-block:: shell

    -Dlog4j.configurationFile=<PATH>/log4j2.xml
Sample output
~~~~~~~~~~~~~

Using the *log4j2* configuration described above, the following output is generated to a file called ``checkpoints_agent-<DATE>.log`` under
the Corda node ``logs`` directory for a single flow execution (in this case):

.. sourcecode:: none

    [INFO ] 2019-07-11T18:25:15,723Z [Node thread-1] CheckpointAgent. - [WRITE] Fiber@10000004:[43c7d5c8-aa66-4a98-beed-dc91354d0353][task: co.paralleluniverse.fibers.RunnableFiberTask@4dc8eaf(Fiber@10000004), target: null, scheduler: co.paralleluniverse.fibers.FiberExecutorScheduler@4e468018]
    000:net.corda.node.services.statemachine.FlowStateMachineImpl 21,149

    [INFO ] 2019-07-11T18:19:51,115Z [FiberDeserializationChecker] CheckpointAgent. - [READ] class net.corda.node.services.statemachine.FlowStateMachineImpl
    000:net.corda.node.services.statemachine.FlowStateMachineImpl 21,151
    001: net.corda.node.services.statemachine.FlowStateMachineImpl 21,149
    002: java.lang.String 107
    003: [C 77
    002: co.paralleluniverse.fibers.Stack 20,932
    003: [J 278
    003: [Ljava.lang.Object; 20,054
    004: net.corda.finance.flows.CashIssueAndPaymentFlow 7,229
    005: net.corda.core.utilities.ProgressTracker 5,664
    etc ...

    [INFO ] 2019-07-11T18:35:03,198Z [rpc-server-handler-pool-2] CheckpointAgent. - [READ] class net.corda.node.services.statemachine.ErrorState$Clean
    Checkpoint id: 15f16740-4ea2-4e48-bcb3-fd9051d5ba59
    000:net.corda.node.services.statemachine.FlowStateMachineImpl 21,151
    001: [C 77
    001: [J 278
    001: [Ljava.lang.Object; 20,054
    002: java.util.ArrayList 1,658
    003: net.corda.core.utilities.ProgressTracker$STARTING 0
    etc ...

Note:

* on WRITE (e.g. a checkpoint is being serialized to disk), we have complete information about the checkpoint object, including the fiber it is
  running on and its checkpoint id (43c7d5c8-aa66-4a98-beed-dc91354d0353)

* on READ (e.g. a checkpoint is being deserialized from disk), we only have information about the stack class hierarchy.
  Additionally, if we are using the CRaSH shell ``dumpCheckpoints`` command, we also see the flow's checkpoint id.
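Each line above is a zero-padded depth tag, a class name indented by depth, and a locale-grouped byte size. A minimal sketch of the line format (the exact spacing used by the real agent may differ; the helper name is hypothetical):

```java
import java.util.Locale;

public class DepthLine {
    // Formats one output line in the style shown above: zero-padded depth,
    // two spaces of indent per level, class name and a comma-grouped byte size.
    static String line(int depth, String className, long sizeBytes) {
        String indent = "  ".repeat(depth);
        return String.format(Locale.US, "%03d: %s%s %,d", depth, indent, className, sizeBytes);
    }

    public static void main(String[] args) {
        System.out.println(line(2, "co.paralleluniverse.fibers.Stack", 20932));
    }
}
```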
Flow diagnostic process
~~~~~~~~~~~~~~~~~~~~~~~

Let's assume a scenario where we have triggered a flow in a node (e.g. a node acting as a flow initiator) but the flow does not appear to complete.

For example, you may see the following using the CRaSH shell ``flow watch`` command:

.. sourcecode:: none

    Id                                Flow name                  Initiator   Status
    -------------------------------------------------------------------------------
    15f16740-4ea2-4e48-bcb3-fd9051d5b Cash Issue And Payment     bankUser    In progress
    1c6c3e59-26aa-4b93-8435-4e34e265e Cash Issue And Payment     bankUser    In progress
    90613d6f-be78-41bd-98e1-33a756c28 Cash Issue And Payment     bankUser    In progress
    43c7d5c8-aa66-4a98-beed-dc91354d0 Cash Issue And Payment     bankUser    In progress
    Waiting for completion or Ctrl-C ...

Note that "In progress" indicates the flows above have not completed (and will have been checkpointed).
1. Check the main Corda node log file for *hospitalisation* and/or *flow retry* messages: ``<NODE_BASE>\logs\node-<hostname>.log``

   .. sourcecode:: none

       [INFO ] 2019-07-11T17:56:43,227Z [pool-12-thread-1] statemachine.FlowMonitor. - Flow with id 90613d6f-be78-41bd-98e1-33a756c28808 has been waiting for 97904 seconds to receive messages from parties [O=BigCorporation, L=New York, C=US].

   .. note:: Always search for the flow id, in this case **90613d6f-be78-41bd-98e1-33a756c28808**

2. From the CRaSH shell, run the ``dumpCheckpoints`` command to trigger diagnostics information.

   .. sourcecode:: none

       Welcome to the Corda interactive shell.
       Useful commands include 'help' to see what is available, and 'bye' to shut down the node.

       Thu Jul 11 18:56:48 BST 2019>>> run dumpCheckpoints

   You will now see an additional line in the main Corda node log file as follows:

   .. sourcecode:: none

       [INFO ] 2019-07-11T18:02:47,610Z [rpc-server-handler-pool-0] rpc.CheckpointDumper. - Checkpoint agent processing checkpointId: [90613d6f-be78-41bd-98e1-33a756c28808]

   And two additional files will appear in the node's logs directory:

   * ``<NODE_BASE>\logs\checkpoints_dump-20190711-180247.zip``
   * ``<NODE_BASE>\logs\checkpoints_agent-20190711-185424.log``
3. Unzip the ``<NODE_BASE>\logs\checkpoints_dump-<date>.zip`` file, and you should see a file with a matching flow id as above:
   **CashIssueAndPaymentFlow-90613d6f-be78-41bd-98e1-33a756c28808.json**

   Its contents contain the following diagnostics information:

   .. sourcecode:: none

       {
         "id" : "90613d6f-be78-41bd-98e1-33a756c28808",
         "flowLogicClass" : "net.corda.finance.flows.CashIssueAndPaymentFlow",
         "flowLogic" : {
           "amount" : "200.00 USD",
           "issueRef" : "AQ==",
           "recipient" : "O=BigCorporation, L=New York, C=US",
           "anonymous" : true,
           "notary" : "O=Notary Service, L=Zurich, C=CH"
         },
         "flowCallStack" : [
           {
             "flowClass" : "net.corda.confidential.SwapIdentitiesFlow",
             "progressStep" : "Awaiting counterparty's anonymous identity"
           },
           {
             "flowClass" : "net.corda.finance.flows.CashPaymentFlow",
             "progressStep" : "Generating anonymous identities"
           },
           {
             "flowClass" : "net.corda.finance.flows.CashIssueAndPaymentFlow",
             "progressStep" : "Paying recipient"
           }
         ],
         "suspendedOn" : {
           "sendAndReceive" : [
             {
               "session" : {
                 "peer" : "O=BigCorporation, L=New York, C=US",
                 "ourSessionId" : 1443438003030966253
               },
               "sentPayloadType" : "net.corda.confidential.SwapIdentitiesFlow$IdentityWithSignature",
               "sentPayload" : {
                 "identity" : {
                   "class" : "net.corda.core.identity.PartyAndCertificate",
                   "deserialized" : "O=BankOfCorda, L=London, C=GB"
                 },
                 "signature" : "t+7hyUnQE08n3ST4krA/7fi1R8ItdrGvpeEbMFgTBDCHibMWiKo/NaTSVUdfwPmsEtl1PFx0MHz5rtRQ+XuEBg=="
               }
             }
           ],
           "suspendedTimestamp" : "2019-07-10T14:44:58",
           "secondsSpentWaiting" : 98268
         },
         "origin" : {
           "rpc" : "bankUser"
         },
         "ourIdentity" : "O=BankOfCorda, L=London, C=GB",
         "activeSessions" : [ ],
         "errored" : null
       }
4. View the contents of the node agent diagnostics file:

   .. sourcecode:: none

       [INFO ] 2019-07-11T18:02:47,615Z [rpc-server-handler-pool-0] CheckpointAgent. - [READ] class net.corda.node.services.statemachine.Checkpoint
       Checkpoint id: 90613d6f-be78-41bd-98e1-33a756c28808
       000:net.corda.node.services.statemachine.Checkpoint 29,200
       001: net.corda.node.services.statemachine.ErrorState$Clean 0
       001: net.corda.node.services.statemachine.FlowState$Started 26,061
       002: net.corda.core.internal.FlowIORequest$SendAndReceive 4,666
       003: java.util.Collections$SingletonMap 4,536
       004: net.corda.node.services.statemachine.FlowSessionImpl 500
       005: net.corda.core.identity.Party 360
       005: net.corda.node.services.statemachine.SessionId 28
       004: net.corda.core.serialization.SerializedBytes 3,979
       002: net.corda.core.serialization.SerializedBytes 21,222
       001: net.corda.core.context.InvocationContext 905
       002: net.corda.core.context.Actor 259
       002: net.corda.core.context.InvocationOrigin$RPC 13
       002: net.corda.core.context.Trace 398
       001: net.corda.core.identity.Party 156
       002: net.i2p.crypto.eddsa.EdDSAPublicKey 45
       002: net.corda.core.identity.CordaX500Name 92
       001: java.util.LinkedHashMap 327
       002: net.corda.node.services.statemachine.SessionState$Initiating 214
       001: java.util.ArrayList 1,214
       002: net.corda.node.services.statemachine.SubFlow$Inlined 525
       003: java.lang.Class 47
       003: net.corda.node.services.statemachine.SubFlowVersion$CorDappFlow 328
       004: net.corda.core.crypto.SecureHash$SHA256 118
       005: [B 33
       002: net.corda.node.services.statemachine.SubFlow$Initiating 322
       003: java.lang.Class 39
       003: net.corda.core.flows.FlowInfo 124
       003: net.corda.node.services.statemachine.SubFlowVersion$CorDappFlow 11
       002: net.corda.node.services.statemachine.SubFlow$Initiating 250
       003: java.lang.Class 41
       003: net.corda.core.flows.FlowInfo 99
       004: java.lang.String 91
       005: [C 85
       003: net.corda.node.services.statemachine.SubFlowVersion$CoreFlow 28
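When triaging output like the above, a common first question is which object below the root dominates the checkpoint's size. A small sketch of how the depth-tagged lines could be scanned mechanically (names are hypothetical; a real tool would be more defensive about malformed lines):

```java
import java.util.List;

public class LargestEntry {
    // Given agent output lines of the form "DDD: class.Name 12,345",
    // returns the line describing the largest object below the root.
    static String largest(List<String> lines) {
        String best = null;
        long bestSize = -1;
        for (String line : lines) {
            int space = line.lastIndexOf(' ');
            if (space < 0 || line.startsWith("000:")) continue; // skip root entry
            long size = Long.parseLong(line.substring(space + 1).replace(",", ""));
            if (size > bestSize) { bestSize = size; best = line; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "000:net.corda.node.services.statemachine.Checkpoint 29,200",
                "001: net.corda.node.services.statemachine.FlowState$Started 26,061",
                "002: net.corda.core.serialization.SerializedBytes 21,222");
        System.out.println(largest(sample));
    }
}
```

In the dump above, such a scan would point at the ``FlowState$Started`` subtree, i.e. the serialised fiber itself, which is usually where oversized checkpoints come from.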
5. Take relevant recovery action, which may include:

   * killing and retrying the flow:

     .. sourcecode:: none

         Welcome to the Corda interactive shell.
         Useful commands include 'help' to see what is available, and 'bye' to shut down the node.

         Thu Jul 11 20:24:11 BST 2019>>> flow kill 90613d6f-be78-41bd-98e1-33a756c28808
         [ERROR] 20:24:18+0100 [Node thread-1] corda.flow. - Flow interrupted while waiting for events, aborting immediately {actor_id=bankUser, actor_owning_identity=O=BankOfCorda, L=London, C=GB, actor_store_id=NODE_CONFIG, fiber-id=10000003, flow-id=15f16740-4ea2-4e48-bcb3-fd9051d5ba59, invocation_id=45622dc7-c4cf-4d11-85ad-1c45e0943455, invocation_timestamp=2019-07-11T18:19:40.519Z, origin=bankUser, session_id=02010e15-8e7a-46f7-976b-5e0626451c54, session_timestamp=2019-07-11T18:19:32.285Z, thread-id=176}
         Killed flow [90613d6f-be78-41bd-98e1-33a756c28808]

         Thu Jul 11 20:26:45 BST 2019>>> flow start CashIssueAndPaymentFlow amount: $1000, issueRef: 0x01, recipient: "Bank B", anonymous: false, notary: "Notary Service"

   * attempting to perform a graceful shutdown (draining all outstanding flows and preventing others from starting) and re-start of the node:

     .. sourcecode:: none

         Welcome to the Corda interactive shell.
         Useful commands include 'help' to see what is available, and 'bye' to shut down the node.

         Thu Jul 11 19:52:56 BST 2019>>> gracefulShutdown

     Upon restart, ensure you disable flow draining mode to allow the node to continue to receive requests:

     .. sourcecode:: none

         Welcome to the Corda interactive shell.
         Useful commands include 'help' to see what is available, and 'bye' to shut down the node.

         Thu Jul 11 19:52:56 BST 2019>>> run setFlowsDrainingModeEnabled enabled: false

     See also :ref:`Flow draining mode <draining-mode>`.

   * contacting other participants in the network where their nodes are not responding to an initiated flow.
     The checkpoint dump gives good diagnostics on the reason a flow may be suspended (including the destination peer participant node that is not responding):

     .. sourcecode:: none

         "suspendedOn" : {
           "sendAndReceive" : [
             {
               "session" : {
                 "peer" : "O=BigCorporation, L=New York, C=US",
                 "ourSessionId" : 1443438003030966253
               },
               "sentPayloadType" : "net.corda.confidential.SwapIdentitiesFlow$IdentityWithSignature",
               "sentPayload" : {
                 "identity" : {
                   "class" : "net.corda.core.identity.PartyAndCertificate",
                   "deserialized" : "O=BankOfCorda, L=London, C=GB"
                 },
                 "signature" : "t+7hyUnQE08n3ST4krA/7fi1R8ItdrGvpeEbMFgTBDCHibMWiKo/NaTSVUdfwPmsEtl1PFx0MHz5rtRQ+XuEBg=="
               }
             }
           ]
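When many such dumps need triaging, the non-responding peer can be pulled out of each ``suspendedOn`` fragment mechanically. A minimal sketch using a regex (the class name is hypothetical; a real tool would use a proper JSON parser such as Jackson):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PeerExtractor {
    // Pulls the first "peer" value out of a checkpoint-dump JSON fragment.
    static String peerOf(String json) {
        Matcher m = Pattern.compile("\"peer\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fragment = "{ \"session\" : { \"peer\" : \"O=BigCorporation, L=New York, C=US\" } }";
        System.out.println(peerOf(fragment));
    }
}
```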
@ -1,7 +1,7 @@
Tools
=====

Corda provides various command line and GUI tools to help you as you work. Along with the three below, you may also
Corda provides various command line and GUI tools to help you as you work. Along with those below, you may also
wish to try the :doc:`blob-inspector`.

.. toctree::
@ -11,3 +11,4 @@ wish to try the :doc:`blob-inspector`.
network-bootstrapper
demobench
node-explorer
tools-checkpoint-agent
@ -341,7 +341,7 @@ abstract class AbstractNode<S>(val configuration: NodeConfiguration,
installCoreFlows()
registerCordappFlows()
services.rpcFlows += cordappLoader.cordapps.flatMap { it.rpcFlows }
val checkpointDumper = CheckpointDumper(checkpointStorage, database, services)
val checkpointDumper = CheckpointDumper(checkpointStorage, database, services, services.configuration.baseDirectory)
val rpcOps = makeRPCOps(cordappLoader, checkpointDumper)
startShell()
networkMapClient?.start(trustRoot)
@ -1,6 +1,7 @@
package net.corda.node.services.rpc

import co.paralleluniverse.fibers.Stack
import co.paralleluniverse.strands.Strand
import com.fasterxml.jackson.annotation.JsonAutoDetect
import com.fasterxml.jackson.annotation.JsonAutoDetect.Visibility
import com.fasterxml.jackson.annotation.JsonFormat
@ -24,9 +25,11 @@ import net.corda.core.crypto.SecureHash
import net.corda.core.flows.FlowInfo
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.FlowSession
import net.corda.core.flows.StateMachineRunId
import net.corda.core.identity.CordaX500Name
import net.corda.core.identity.Party
import net.corda.core.internal.*
import net.corda.core.node.ServiceHub
import net.corda.core.serialization.SerializeAsToken
import net.corda.core.serialization.SerializedBytes
import net.corda.core.serialization.deserialize
@ -38,11 +41,12 @@ import net.corda.core.utilities.ProgressTracker
import net.corda.core.utilities.contextLogger
import net.corda.node.internal.NodeStartup
import net.corda.node.services.api.CheckpointStorage
import net.corda.node.services.api.ServiceHubInternal
import net.corda.node.services.statemachine.*
import net.corda.nodeapi.internal.persistence.CordaPersistence
import net.corda.serialization.internal.CheckpointSerializeAsTokenContextImpl
import net.corda.serialization.internal.withTokenContext
import sun.misc.VMSupport
import java.nio.file.Path
import java.time.Duration
import java.time.Instant
import java.time.ZoneOffset.UTC
@ -53,7 +57,7 @@ import java.util.concurrent.atomic.AtomicInteger
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

class CheckpointDumper(private val checkpointStorage: CheckpointStorage, private val database: CordaPersistence, private val serviceHub: ServiceHubInternal) {
class CheckpointDumper(private val checkpointStorage: CheckpointStorage, private val database: CordaPersistence, private val serviceHub: ServiceHub, val baseDirectory: Path) {
    companion object {
        private val TIME_FORMATTER = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(UTC)
        private val log = contextLogger()
@ -64,6 +68,10 @@ class CheckpointDumper(private val checkpointStorage: CheckpointStorage, private
    private lateinit var checkpointSerializationContext: CheckpointSerializationContext
    private lateinit var writer: ObjectWriter

    private val isCheckpointAgentRunning by lazy {
        checkpointAgentRunning()
    }

    fun start(tokenizableServices: List<Any>) {
        checkpointSerializationContext = CheckpointSerializationDefaults.CHECKPOINT_CONTEXT.withTokenContext(
                CheckpointSerializeAsTokenContextImpl(
@ -91,13 +99,15 @@ class CheckpointDumper(private val checkpointStorage: CheckpointStorage, private
    fun dump() {
        val now = serviceHub.clock.instant()
        val file = serviceHub.configuration.baseDirectory / NodeStartup.LOGS_DIRECTORY_NAME / "checkpoints_dump-${TIME_FORMATTER.format(now)}.zip"
        val file = baseDirectory / NodeStartup.LOGS_DIRECTORY_NAME / "checkpoints_dump-${TIME_FORMATTER.format(now)}.zip"
        try {
            if (lock.getAndIncrement() == 0 && !file.exists()) {
                database.transaction {
                    checkpointStorage.getAllCheckpoints().use { stream ->
                        ZipOutputStream(file.outputStream()).use { zip ->
                            stream.forEach { (runId, serialisedCheckpoint) ->
                                if (isCheckpointAgentRunning)
                                    instrumentCheckpointAgent(runId)
                                val checkpoint = serialisedCheckpoint.checkpointDeserialize(context = checkpointSerializationContext)
                                val json = checkpoint.toJson(runId.uuid, now)
                                val jsonBytes = writer.writeValueAsBytes(json)
@ -116,6 +126,28 @@ class CheckpointDumper(private val checkpointStorage: CheckpointStorage, private
            }
        }
    }

    private fun instrumentCheckpointAgent(checkpointId: StateMachineRunId) {
        log.info("Checkpoint agent diagnostics for checkpointId: $checkpointId")
        try {
            val checkpointHook = Class.forName("net.corda.tools.CheckpointHook").kotlin
            if (checkpointHook.objectInstance == null)
                log.info("Instantiating checkpoint agent object instance")
            val instance = checkpointHook.objectOrNewInstance()
            val checkpointIdField = instance.declaredField<UUID>(instance.javaClass, "checkpointId")
            checkpointIdField.value = checkpointId.uuid
        }
        catch (e: Exception) {
            log.error("Checkpoint agent instrumentation failed for checkpointId: $checkpointId\n. ${e.message}")
        }
    }

    private fun checkpointAgentRunning(): Boolean {
        val agentProperties = VMSupport.getAgentProperties()
        return agentProperties.values.any { value ->
            (value is String && value.contains("checkpoint-agent.jar"))
        }
    }

    private fun Checkpoint.toJson(id: UUID, now: Instant): CheckpointJson {
        val (fiber, flowLogic) = when (flowState) {
            is FlowState.Unstarted -> {
@@ -0,0 +1,102 @@
package net.corda.node.services.rpc

import net.corda.core.context.InvocationContext
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.StateMachineRunId
import net.corda.core.identity.CordaX500Name
import net.corda.core.serialization.SerializedBytes
import net.corda.core.serialization.internal.CheckpointSerializationDefaults
import net.corda.core.serialization.internal.checkpointSerialize
import net.corda.node.services.persistence.DBCheckpointStorage
import net.corda.node.services.statemachine.Checkpoint
import net.corda.node.services.statemachine.FlowStart
import net.corda.node.services.statemachine.SubFlowVersion
import net.corda.nodeapi.internal.persistence.CordaPersistence
import net.corda.testing.core.SerializationEnvironmentRule
import net.corda.testing.core.TestIdentity
import net.corda.testing.node.MockServices
import org.junit.After
import org.junit.Before
import org.junit.Rule
import org.junit.Test
import java.nio.file.Paths

class CheckpointDumperTest {

    @Rule
    @JvmField
    val testSerialization = SerializationEnvironmentRule()

    private val myself = TestIdentity(CordaX500Name("Me", "London", "GB"))
    private lateinit var database: CordaPersistence
    private lateinit var services: MockServices
    private lateinit var checkpointStorage: DBCheckpointStorage

    @Before
    fun setUp() {
        val (db, mockServices) = MockServices.makeTestDatabaseAndPersistentServices(
                cordappPackages = emptyList(),
                initialIdentity = myself,
                moreIdentities = emptySet(),
                moreKeys = emptySet()
        )
        database = db
        services = mockServices
        newCheckpointStorage()
    }

    @After
    fun cleanUp() {
        database.close()
    }

    @Test
    fun testDumpCheckpoints() {
        val dumper = CheckpointDumper(checkpointStorage, database, services, Paths.get("."))
        dumper.start(emptyList())

        // add a checkpoint
        val (id, checkpoint) = newCheckpoint()
        database.transaction {
            checkpointStorage.addCheckpoint(id, checkpoint)
        }

        dumper.dump()
        // check existence of output zip file: checkpoints_dump-<date>.zip
    }

    // This test will only succeed when the VM startup includes the "checkpoint-agent":
    // -javaagent:tools/checkpoint-agent/build/libs/checkpoint-agent.jar
    @Test
    fun testDumpCheckpointsAndAgentDiagnostics() {
        val dumper = CheckpointDumper(checkpointStorage, database, services, Paths.get("."))
        dumper.start(emptyList())

        // add a checkpoint
        val (id, checkpoint) = newCheckpoint()
        database.transaction {
            checkpointStorage.addCheckpoint(id, checkpoint)
        }

        dumper.dump()
        // check existence of output zip file: checkpoints_dump-<date>.zip
        // check existence of output agent log: checkpoints_agent-<date>.log
    }

    private fun newCheckpointStorage() {
        database.transaction {
            checkpointStorage = DBCheckpointStorage()
        }
    }

    private fun newCheckpoint(version: Int = 1): Pair<StateMachineRunId, SerializedBytes<Checkpoint>> {
        val id = StateMachineRunId.createRandom()
        val logic: FlowLogic<*> = object : FlowLogic<Unit>() {
            override fun call() {}
        }
        val frozenLogic = logic.checkpointSerialize(context = CheckpointSerializationDefaults.CHECKPOINT_CONTEXT)
        val checkpoint = Checkpoint.create(InvocationContext.shell(), FlowStart.Explicit, logic.javaClass, frozenLogic, myself.identity.party, SubFlowVersion.CoreFlow(version), false)
                .getOrThrow()
        return id to checkpoint.checkpointSerialize(context = CheckpointSerializationDefaults.CHECKPOINT_CONTEXT)
    }
}
@@ -56,6 +56,7 @@ include 'tools:shell-cli'
include 'tools:network-builder'
include 'tools:cliutils'
include 'tools:worldmap'
include 'tools:checkpoint-agent'
include 'example-code'
project(':example-code').projectDir = file("$settingsDir/docs/source/example-code")
include 'samples:attachment-demo:contracts'
@@ -97,4 +98,6 @@ if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
    include 'core-deterministic:testing:verifier'
    include 'serialization-deterministic'
}
include 'tools:checkpoint-agent'
findProject(':tools:checkpoint-agent')?.name = 'checkpoint-agent'
64
tools/checkpoint-agent/build.gradle
Normal file
@@ -0,0 +1,64 @@
buildscript {
    // For sharing constants between builds
    Properties constants = new Properties()
    file("$projectDir/../../constants.properties").withInputStream { constants.load(it) }

    ext.kotlin_version = constants.getProperty("kotlinVersion")
    ext.javaassist_version = "3.12.1.GA"

    repositories {
        mavenLocal()
        mavenCentral()
        jcenter()
    }

    dependencies {
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
    }
}

repositories {
    mavenLocal()
    mavenCentral()
    jcenter()
}

apply plugin: 'kotlin'
apply plugin: 'idea'

description 'A javaagent to allow hooking into Kryo checkpoints'

dependencies {
    compile "org.jetbrains.kotlin:kotlin-stdlib-jre8:$kotlin_version"
    compile "org.jetbrains.kotlin:kotlin-reflect:$kotlin_version"
    compile "javassist:javassist:$javaassist_version"
    compile "com.esotericsoftware:kryo:4.0.0"
    compile "co.paralleluniverse:quasar-core:$quasar_version:jdk8"

    compile (project(':core')) {
        transitive = false
    }

    // Unit testing helpers.
    compile "org.junit.jupiter:junit-jupiter-api:${junit_jupiter_version}"
    compile "junit:junit:$junit_version"

    // SLF4J: commons-logging bindings for a SLF4J back end
    compile "org.slf4j:jcl-over-slf4j:$slf4j_version"
    compile "org.slf4j:slf4j-api:$slf4j_version"
}

jar {
    archiveName = "${project.name}.jar"
    manifest {
        attributes(
                'Premain-Class': 'net.corda.tools.CheckpointAgent',
                'Can-Redefine-Classes': 'true',
                'Can-Retransform-Classes': 'true',
                'Can-Set-Native-Method-Prefix': 'true',
                'Implementation-Title': "CheckpointAgent",
                'Implementation-Version': rootProject.version
        )
    }
    from { configurations.compile.collect { it.isDirectory() ? it : zipTree(it) } }
}
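The `Premain-Class` manifest entry above is what lets the JVM attach the agent at startup. A sketch of how a node process might be launched with the agent attached — the jar path follows the build above, but the node jar name and the exact argument values are illustrative assumptions:

```shell
# Attach the checkpoint agent at JVM startup (invokes its premain before main),
# passing the comma-separated name=value pairs parsed by CheckpointAgent.
# Paths and values are illustrative -- adjust to your build output and node jar.
java -javaagent:tools/checkpoint-agent/build/libs/checkpoint-agent.jar=instrumentType=read,minimumSize=8192,graphDepth=6,printOnce=true \
     -jar corda.jar
```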
@@ -0,0 +1,492 @@
package net.corda.tools

import co.paralleluniverse.strands.Strand
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output
import com.esotericsoftware.kryo.serializers.FieldSerializer
import javassist.ClassPool
import javassist.CtClass
import net.corda.core.internal.ThreadBox
import net.corda.core.utilities.debug
import net.corda.tools.CheckpointAgent.Companion.instrumentClassname
import net.corda.tools.CheckpointAgent.Companion.instrumentType
import net.corda.tools.CheckpointAgent.Companion.log
import net.corda.tools.CheckpointAgent.Companion.maximumSize
import net.corda.tools.CheckpointAgent.Companion.minimumSize
import net.corda.tools.CheckpointAgent.Companion.printOnce
import net.corda.tools.CheckpointAgent.Companion.graphDepth
import org.slf4j.LoggerFactory
import java.io.ByteArrayInputStream
import java.lang.instrument.ClassFileTransformer
import java.lang.instrument.Instrumentation
import java.security.ProtectionDomain
import java.util.*
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

class CheckpointAgent {

    // whether to instrument serialized objects on read or on write
    enum class InstrumentationType {
        WRITE, READ
    }

    companion object {
        // custom argument defaults
        val DEFAULT_INSTRUMENT_CLASSNAME = "net.corda.node.services.statemachine.FlowStateMachineImpl"
        val DEFAULT_INSTRUMENT_TYPE = InstrumentationType.READ
        val DEFAULT_MINIMUM_SIZE = 8 * 1024
        val DEFAULT_MAXIMUM_SIZE = 20_000_000
        val DEFAULT_GRAPH_DEPTH = Int.MAX_VALUE

        // startup arguments
        var instrumentClassname = DEFAULT_INSTRUMENT_CLASSNAME
        var minimumSize = DEFAULT_MINIMUM_SIZE
        var maximumSize = DEFAULT_MAXIMUM_SIZE
        var instrumentType = DEFAULT_INSTRUMENT_TYPE
        var graphDepth = DEFAULT_GRAPH_DEPTH
        var printOnce = true

        val log by lazy {
            LoggerFactory.getLogger("CheckpointAgent")
        }

        @JvmStatic
        fun premain(argumentsString: String?, instrumentation: Instrumentation) {
            parseArguments(argumentsString)
            instrumentation.addTransformer(CheckpointHook)
        }

        fun parseArguments(argumentsString: String?) {
            argumentsString?.let {
                val nvpList = it.split(",")
                nvpList.forEach {
                    val nvpItem = it.split("=")
                    if (nvpItem.size == 2) {
                        when (nvpItem[0].trim()) {
                            "instrumentClassname" -> instrumentClassname = nvpItem[1]
                            "instrumentType" -> try { instrumentType = InstrumentationType.valueOf(nvpItem[1].toUpperCase()) } catch (e: Exception) {
                                display("Invalid value: ${nvpItem[1]}. Please specify read or write.")
                            }
                            "minimumSize" -> try { minimumSize = nvpItem[1].toInt() } catch (e: NumberFormatException) {
                                display("Invalid value: ${nvpItem[1]}. Please specify an integer value.")
                            }
                            "maximumSize" -> try { maximumSize = nvpItem[1].toInt() } catch (e: NumberFormatException) {
                                display("Invalid value: ${nvpItem[1]}. Please specify an integer value.")
                            }
                            "graphDepth" -> try { graphDepth = nvpItem[1].toInt() } catch (e: NumberFormatException) {
                                display("Invalid value: ${nvpItem[1]}. Please specify an integer value.")
                            }
                            "printOnce" -> try { printOnce = nvpItem[1].toBoolean() } catch (e: Exception) {
                                display("Invalid value: ${nvpItem[1]}. Please specify true or false.")
                            }
                            else -> display("Invalid argument: $nvpItem")
                        }
                    }
                    else display("Missing value for argument: $nvpItem")
                }
            }
            println("Running Checkpoint agent with following arguments: instrumentClassname=$instrumentClassname, instrumentType=$instrumentType, minimumSize=$minimumSize, maximumSize=$maximumSize, graphDepth=$graphDepth, printOnce=$printOnce\n")
        }

        private fun display(output: String) {
            System.err.println("CheckpointAgent: $output")
        }
    }
}
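The agent's startup options are a comma-separated list of `name=value` pairs. A minimal standalone sketch of that parsing convention (a simplification for illustration — `parseAgentArgs` is a hypothetical helper, not the agent's actual code, and it skips the per-key validation the agent performs):

```kotlin
// Simplified sketch of the agent's comma-separated name=value argument format.
// Malformed pairs are reported to stderr rather than failing hard, mirroring
// the agent's lenient handling of bad arguments.
fun parseAgentArgs(argumentsString: String?): Map<String, String> {
    val result = mutableMapOf<String, String>()
    argumentsString?.split(",")?.forEach { pair ->
        val nvp = pair.split("=")
        if (nvp.size == 2) result[nvp[0].trim()] = nvp[1].trim()
        else System.err.println("Missing value for argument: $pair")
    }
    return result
}

fun main() {
    val args = parseAgentArgs("instrumentType=read,minimumSize=8192,printOnce=true")
    check(args["instrumentType"] == "read")
    check(args["minimumSize"]!!.toInt() == 8192)
    check(args["printOnce"]!!.toBoolean())
    println("parsed: $args")
}
```

Each recognised key would then be converted to its typed setting (enum, int, boolean), with a diagnostic printed for any value that fails conversion.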
/**
 * The hook simply records the read() and write() entries and exits together with the output offset at the time of the call.
 * This is recorded in a StrandID -> List<StatsEvent> map.
 *
 * Later we "parse" these lists into a tree.
 */
object CheckpointHook : ClassFileTransformer {
    val classPool = ClassPool.getDefault()
    val hookClassName = javaClass.name

    override fun transform(
            loader: ClassLoader?,
            className: String,
            classBeingRedefined: Class<*>?,
            protectionDomain: ProtectionDomain?,
            classfileBuffer: ByteArray
    ): ByteArray? {
        if (className.startsWith("java") || className.startsWith("javassist") || className.startsWith("kotlin")) {
            return null
        }
        return try {
            val clazz = classPool.makeClass(ByteArrayInputStream(classfileBuffer))
            instrumentClass(clazz)?.toBytecode()
        } catch (throwable: Throwable) {
            throwable.printStackTrace(System.out)
            null
        }
    }

    private fun instrumentClass(clazz: CtClass): CtClass? {
        for (method in clazz.declaredBehaviors) {
            if (instrumentType == CheckpointAgent.InstrumentationType.WRITE) {
                if (method.name == "write") {
                    val parameterTypeNames = method.parameterTypes.map { it.name }
                    if (parameterTypeNames == listOf("com.esotericsoftware.kryo.Kryo", "com.esotericsoftware.kryo.io.Output", "java.lang.Object")) {
                        if (method.isEmpty) continue
                        log.debug { "Instrumenting on write: ${clazz.name}" }
                        method.insertBefore("$hookClassName.${this::writeEnter.name}($2, $3);")
                        method.insertAfter("$hookClassName.${this::writeExit.name}($2, $3);")
                        return clazz
                    }
                }
            }
            if (instrumentType == CheckpointAgent.InstrumentationType.READ) {
                if (method.name == "read") {
                    val parameterTypeNames = method.parameterTypes.map { it.name }
                    if (parameterTypeNames == listOf("com.esotericsoftware.kryo.Kryo", "com.esotericsoftware.kryo.io.Input", "java.lang.Class")) {
                        if (method.isEmpty) continue
                        log.debug { "Instrumenting on read: ${clazz.name}" }
                        method.insertBefore("$hookClassName.${this::readEnter.name}($2, $3);")
                        method.insertAfter("$hookClassName.${this::readExit.name}($2, $3, (java.lang.Object)\$_);")
                        return clazz
                    } else if (parameterTypeNames == listOf("com.esotericsoftware.kryo.io.Input", "java.lang.Object")) {
                        if (method.isEmpty) continue
                        log.debug { "Instrumenting on field read: ${clazz.name}" }
                        method.insertBefore("$hookClassName.${this::readFieldEnter.name}((java.lang.Object)this);")
                        method.insertAfter("$hookClassName.${this::readFieldExit.name}($2, (java.lang.Object)this);")
                        return clazz
                    }
                }
            }
        }
        return null
    }
    // StrandID -> StatsEvent map
    val events = ConcurrentHashMap<Long, Pair<ArrayList<StatsEvent>, AtomicInteger>>()

    // Global static variable that can be set programmatically by 3rd party tools/code (eg. CheckpointDumper)
    // Only relevant to READ operations (as the checkpoint id must have previously been generated)
    var checkpointId: UUID? = null
        set(value) {
            if (value != null)
                log.debug { "Diagnosing checkpoint id: $value" }
            mutex.locked {
                field = value
            }
        }
    private val mutex = ThreadBox(checkpointId)

    @JvmStatic
    fun readFieldEnter(that: Any) {
        if (that is FieldSerializer.CachedField<*>) {
            log.debug { "readFieldEnter object: ${that.field.name}:${that.field.type}" }
            val (list, _) = events.getOrPut(Strand.currentStrand().id) { Pair(ArrayList(), AtomicInteger(0)) }
            list.add(StatsEvent.EnterField(that.field.name, that.field.type))
        }
    }

    @JvmStatic
    fun readFieldExit(obj: Any?, that: Any) {
        if (that is FieldSerializer.CachedField<*>) {
            val (list, _) = events.getOrPut(Strand.currentStrand().id) { Pair(ArrayList(), AtomicInteger(0)) }
            val value = that.field.get(obj)
            val arrayValue = getArrayValue(that.field.type, value)
            if (!that.javaClass.name.endsWith("ObjectField")
                    || arrayValue != null || that.field.type == java.lang.String::class.java || value == null) {
                log.debug { "readFieldExit basic type value: ${that.field.name}:${that.field.type} = ${arrayValue ?: value}" }
                list.add(StatsEvent.BasicTypeField(that.field.name, that.field.type, arrayValue ?: value))
            } else {
                log.debug { "readFieldExit object value: ${that.field.name}:${that.field.type} = $value" }
                list.add(StatsEvent.ObjectField(that.field.name, that.field.type, value))
            }
        }
    }

    private fun <T> getArrayValue(clazz: Class<T>, value: Any?): String? {
        if (clazz.isArray) {
            log.debug { "readFieldExit array type: $clazz, value: $value" }
            if (Array<Number>::class.java.isAssignableFrom(clazz)) {
                val numberValue = value as Array<Number>
                log.debug { "readFieldExit array of number: $clazz = ${numberValue.joinToString(",")}" }
                return numberValue.joinToString(",")
            }
            else if (clazz == Array<Boolean>::class.java) {
                val arrayValue = value as Array<Boolean>
                log.debug { "readFieldExit array of boolean: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            // N dimensional arrays
            else if (arrayOf<Array<*>>()::class.java.isAssignableFrom(clazz)) {
                val arrayValue = value as Array<Array<*>>
                return arrayValue.map { arrayEntry ->
                    log.debug { "N Dimensional: $clazz, $arrayEntry, ${arrayEntry::class.java}" }
                    "[" + getArrayValue(arrayEntry::class.java, arrayEntry) + "]"
                }.toString()
            }
            // Kotlin array types
            else if (clazz == CharArray::class.java) {
                val arrayValue = value as CharArray
                log.debug { "readFieldExit char array: $clazz = ${arrayValue.joinToString("")}" }
                return arrayValue.joinToString("")
            }
            else if (clazz == ByteArray::class.java) {
                val arrayValue = value as ByteArray
                log.debug { "readFieldExit byte array: $clazz = ${byteArrayToHex(arrayValue)}" }
                return byteArrayToHex(arrayValue)
            }
            else if (clazz == ShortArray::class.java) {
                val arrayValue = value as ShortArray
                log.debug { "readFieldExit short array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            else if (clazz == IntArray::class.java) {
                val arrayValue = value as IntArray
                log.debug { "readFieldExit int array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            else if (clazz == LongArray::class.java) {
                val arrayValue = value as LongArray
                log.debug { "readFieldExit long array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            else if (clazz == FloatArray::class.java) {
                val arrayValue = value as FloatArray
                log.debug { "readFieldExit float array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            else if (clazz == DoubleArray::class.java) {
                val arrayValue = value as DoubleArray
                log.debug { "readFieldExit double array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            else if (clazz == BooleanArray::class.java) {
                val arrayValue = value as BooleanArray
                log.debug { "readFieldExit boolean array: $clazz = ${arrayValue.joinToString(",")}" }
                return arrayValue.joinToString(",")
            }
            log.debug { "ARRAY OF TYPE: $clazz (size: ${(value as Array<Any?>).size})" }
        }
        return null
    }

    private fun byteArrayToHex(a: ByteArray): String {
        val sb = StringBuilder(a.size * 2)
        for (b in a)
            sb.append(String.format("%02x", b))
        return sb.toString()
    }
    @JvmStatic
    fun readEnter(input: Input, clazz: Class<*>) {
        val (list, count) = events.getOrPut(Strand.currentStrand().id) { Pair(ArrayList(), AtomicInteger(0)) }
        log.debug { "readEnter: adding event for clazz: ${clazz.name} (strandId: ${Strand.currentStrand().id})" }
        list.add(StatsEvent.Enter(clazz.name, input.total()))
        count.incrementAndGet()
    }

    @JvmStatic
    fun readExit(input: Input, clazz: Class<*>, value: Any?) {
        val (list, count) = events[Strand.currentStrand().id]!!
        list.add(StatsEvent.Exit(clazz.name, input.total(), value))
        log.debug { "readExit: clazz[$clazz], strandId[${Strand.currentStrand().id}], eventCount[$count]" }
        if (count.decrementAndGet() == 0) {
            if ((checkpointId != null) ||
                    ((clazz.name == instrumentClassname) &&
                            (input.total() >= minimumSize) &&
                            (input.total() <= maximumSize))) {
                val sb = StringBuilder()
                if (checkpointId != null)
                    sb.append("Checkpoint id: $checkpointId\n")
                prettyStatsTree(0, StatsInfo("", Any::class.java), readTree(list, 0).second, sb)

                log.info("[READ] $clazz\n$sb")
                checkpointId = null
            }
            log.debug { "readExit: clearing event for clazz: $clazz (strandId: ${Strand.currentStrand().id})" }
            events.remove(Strand.currentStrand().id)
        }
    }

    @JvmStatic
    fun writeEnter(output: Output, obj: Any) {
        val (list, count) = events.getOrPut(-Strand.currentStrand().id) { Pair(ArrayList(), AtomicInteger(0)) }
        log.debug { "writeEnter: adding event for clazz: ${obj.javaClass.name} (strandId: ${Strand.currentStrand().id})" }
        list.add(StatsEvent.Enter(obj.javaClass.name, output.total()))
        count.incrementAndGet()
    }

    @JvmStatic
    fun writeExit(output: Output, obj: Any) {
        val (list, count) = events[-Strand.currentStrand().id]!!
        list.add(StatsEvent.Exit(obj.javaClass.name, output.total(), null))
        log.debug { "writeExit: clazz[${obj.javaClass.name}], strandId[${Strand.currentStrand().id}], eventCount[$count]" }
        if (count.decrementAndGet() == 0) {
            // always log diagnostics for explicit checkpoint ids (eg. set dumpCheckpoints)
            if ((checkpointId != null) ||
                    ((obj.javaClass.name == instrumentClassname) &&
                            (output.total() >= minimumSize) &&
                            (output.total() <= maximumSize))) {
                val sb = StringBuilder()
                prettyStatsTree(0, StatsInfo("", Any::class.java), readTree(list, 0).second, sb)
                log.info("[WRITE] $obj\n$sb")
                checkpointId = null
            }
            log.debug { "writeExit: clearing event for clazz: ${obj.javaClass.name} (strandId: ${Strand.currentStrand().id})" }
            events.remove(-Strand.currentStrand().id)
        }
    }
    private fun prettyStatsTree(indent: Int, statsInfo: StatsInfo, identityInfo: IdentityInfo, builder: StringBuilder) {
        val statsTree = identityInfo.tree
        when (statsTree) {
            is StatsTree.Object -> {
                if (printOnce && identityInfo.refCount > 1) {
                    log.debug { "Skipping $statsInfo, $statsTree (count:${identityInfo.refCount})" }
                }
                else if (indent / 2 < graphDepth) {
                    builder.append(String.format("%03d:", indent / 2))
                    builder.append(CharArray(indent) { ' ' })
                    builder.append(" ${statsInfo.fieldName} ")
                    if (statsInfo.fieldType != null && statsInfo.fieldType.isArray) {
                        val arrayValue = (statsTree.value as Array<Any?>)
                        builder.append("${statsInfo.fieldType} (array length:${arrayValue.size})")
                    }
                    else if (statsInfo.fieldType != null && statsTree.value is Collection<*>) {
                        builder.append("${statsInfo.fieldType} (collection size:${statsTree.value.size})")
                    }
                    else if (statsInfo.fieldType != null && statsTree.value is Map<*, *>) {
                        builder.append("${statsInfo.fieldType} (map size:${statsTree.value.size})")
                    }
                    else {
                        builder.append("${statsTree.className} (hash:${statsTree.value?.hashCode()}) (count:${identityInfo.refCount})")
                    }
                    builder.append(" ")
                    builder.append(String.format("%,d", statsTree.size))
                    builder.append("\n")
                }
                for (child in statsTree.children) {
                    prettyStatsTree(indent + 2, child.first, child.second, builder)
                }
            }
            is StatsTree.BasicType -> {
                if (indent / 2 < graphDepth) {
                    builder.append(String.format("%03d:", indent / 2))
                    builder.append(CharArray(indent) { ' ' })
                    builder.append(" ${statsInfo.fieldName} ")
                    builder.append("${statsTree.value}")
                    builder.append("\n")
                }
            }
        }
    }
}
sealed class StatsEvent {
    data class Enter(val className: String, val offset: Long) : StatsEvent()
    data class Exit(val className: String, val offset: Long, val value: Any?) : StatsEvent()
    data class BasicTypeField(val fieldName: String, val fieldType: Class<*>?, val fieldValue: Any?) : StatsEvent()
    data class EnterField(val fieldName: String, val fieldType: Class<*>?) : StatsEvent()
    data class ObjectField(val fieldName: String, val fieldType: Class<*>?, val value: Any) : StatsEvent()
}

sealed class StatsTree {
    data class Object(
            val className: String,
            val size: Long,
            val children: List<Pair<StatsInfo, IdentityInfo>>,
            val value: Any?
    ) : StatsTree() {
        override fun toString(): String {
            return "Object [$className, $size, $value]"
        }
    }

    data class BasicType(
            val value: Any?
    ) : StatsTree()

    data class Loop(var depth: Int) : StatsTree() {
        override fun toString(): String {
            return "Loop()"
        }
    }
}

data class IdentityInfo(val tree: StatsTree, val refCount: Int)

fun readTree(events: List<StatsEvent>, index: Int, idMap: IdentityHashMap<Any, IdentityInfo> = IdentityHashMap()): Pair<Int, IdentityInfo> {
    val event = events[index]
    when (event) {
        is StatsEvent.Enter -> {
            val (nextIndex, children) = readTrees(events, index + 1, idMap)
            val exit = events[nextIndex] as StatsEvent.Exit
            require(event.className == exit.className)
            val tree = StatsTree.Object(event.className, exit.offset - event.offset, children, exit.value)
            if (idMap.containsKey(exit.value)) {
                val identityInfo = idMap[exit.value]!!
                idMap[exit.value] = IdentityInfo(identityInfo.tree, identityInfo.refCount + 1)
                log.debug { "Skipping repeated StatsEvent.Enter: ${exit.value} (hashcode:${exit.value!!.hashCode()}) (count:${idMap[exit.value]?.refCount})" }
            } else idMap[exit.value] = IdentityInfo(tree, 1)
            return Pair(nextIndex + 1, idMap[exit.value]!!)
        }
        else -> {
            throw IllegalStateException("Wasn't expecting event: $event")
        }
    }
}

data class StatsInfo(val fieldName: String, val fieldType: Class<*>?)

fun readTrees(events: List<StatsEvent>, index: Int, idMap: IdentityHashMap<Any, IdentityInfo>): Pair<Int, List<Pair<StatsInfo, IdentityInfo>>> {
    val trees = ArrayList<Pair<StatsInfo, IdentityInfo>>()
    var i = index
    var arrayIdx = 0
    var inField = false
    while (true) {
        val event = events.getOrNull(i)
        when (event) {
            is StatsEvent.Enter -> {
                val (nextIndex, tree) = readTree(events, i, idMap)
                if (!inField) {
                    arrayIdx++
                    trees += StatsInfo("[$arrayIdx]", Any::class.java) to tree
                }
                i = nextIndex
            }
            is StatsEvent.EnterField -> {
                i++
                inField = true
            }
            is StatsEvent.Exit -> {
                arrayIdx = 0
                if (idMap.containsKey(event.value)) {
                    val identityInfo = idMap[event.value]!!
                    idMap[event.value] = IdentityInfo(identityInfo.tree, identityInfo.refCount + 1)
                    log.debug { "Skipping repeated StatsEvent.Exit: ${event.value} (hashcode:${event.value!!.hashCode()}) (count:${idMap[event.value]?.refCount})" }
                }
                return Pair(i, trees)
            }
            is StatsEvent.BasicTypeField -> {
                trees += StatsInfo(event.fieldName, event.fieldType) to IdentityInfo(StatsTree.BasicType(event.fieldValue), 1)
                i++
                inField = false
            }
            is StatsEvent.ObjectField -> {
                val identityInfo =
                        if (idMap.containsKey(event.value)) {
                            val identityInfo = idMap[event.value]!!
                            idMap[event.value] = IdentityInfo(identityInfo.tree, identityInfo.refCount + 1)
                            log.debug { "Skipping repeated StatsEvent.ObjectField: ${event.value} (hashcode:${event.value!!.hashCode()}) (count:${idMap[event.value]?.refCount})" }
                            identityInfo
                        }
                        else {
                            IdentityInfo(StatsTree.Loop(0), 1)
                        }
                trees += StatsInfo(event.fieldName, event.fieldType) to identityInfo
                i++
                inField = false
            }
            null -> {
                return Pair(i, trees)
            }
        }
    }
}
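The Enter/Exit pairing that drives the tree construction can be illustrated with a simplified standalone sketch (minimal event types, no identity tracking or field events — these mirror, but are not, the agent's StatsEvent types): an object's serialized size is the difference between the stream offsets recorded at its exit and its enter.

```kotlin
// Simplified sketch: pair well-nested Enter/Exit stream offsets to recover
// per-object serialized sizes, as the agent's readTree does for StatsEvent.
sealed class Ev {
    data class Enter(val className: String, val offset: Long) : Ev()
    data class Exit(val className: String, val offset: Long) : Ev()
}

// Returns className -> (exit offset - enter offset) for a well-nested stream.
fun sizes(events: List<Ev>): Map<String, Long> {
    val stack = ArrayDeque<Ev.Enter>()
    val result = mutableMapOf<String, Long>()
    for (e in events) {
        when (e) {
            is Ev.Enter -> stack.addLast(e)
            is Ev.Exit -> {
                val enter = stack.removeLast()
                require(enter.className == e.className) { "mismatched enter/exit" }
                result[e.className] = e.offset - enter.offset
            }
        }
    }
    return result
}

fun main() {
    // Outer object spans offsets 0..100; a nested object spans 10..40,
    // so the nested object accounts for 30 of the outer object's 100 bytes.
    val events = listOf(
            Ev.Enter("Outer", 0L), Ev.Enter("Inner", 10L),
            Ev.Exit("Inner", 40L), Ev.Exit("Outer", 100L))
    val s = sizes(events)
    check(s["Inner"] == 30L)
    check(s["Outer"] == 100L)
    println(s)
}
```

The agent's real readTree additionally records children per parent and reference counts repeated objects via an IdentityHashMap, which is what lets printOnce suppress duplicates in the rendered tree.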