From ac7f0fa321de5dd56ca937ce8c4aabcec39cb327 Mon Sep 17 00:00:00 2001
From: "John M. Penn"
Date: Fri, 11 Aug 2023 15:28:42 -0500
Subject: [PATCH] Add Checkpointing-Best-Practices.md to How-to guides.

---
 .../Checkpointing-Best-Practices.md           | 146 ++++++++++++++++++
 docs/howto_guides/How-To-Guides.md            |   1 +
 2 files changed, 147 insertions(+)
 create mode 100644 docs/howto_guides/Checkpointing-Best-Practices.md

diff --git a/docs/howto_guides/Checkpointing-Best-Practices.md b/docs/howto_guides/Checkpointing-Best-Practices.md
new file mode 100644
index 00000000..8752db23
--- /dev/null
+++ b/docs/howto_guides/Checkpointing-Best-Practices.md
@@ -0,0 +1,146 @@
+# Trick CheckPointing Best Practices
+
+**Contents**
+
+* [Prerequisite Knowledge](#prerequisite-knowledge)
+* [Do's and Don'ts](#guidelines)
+* [Other Resources You Might Find Useful](#other-resources)
+
+***
+
+Part of the process of designing a Trick simulation model is ensuring that it can be reliably checkpointed. Trick provides a lot of support for checkpointing, but there are things to know and pitfalls to avoid. The purpose of this article is to provide knowledge and guidelines that will make checkpointing easier.
+
+
+## Prerequisite Knowledge
+
+The following is a high-level overview of the Trick Memory Manager and checkpointing. Understanding these concepts is important, and will help you design your sim models to be reliably checkpointable.
+
+
+### How Memory Allocations are Recorded
+
+The Memory Manager is the component that **"knows"** about the memory objects (allocations) in your Trick simulation. For each of these objects the Memory Manager stores the following **"knowledge"**:
+
+1. **Name** - optional, but **STRONGLY** recommended.
+2. **Address** - the address of the memory allocation.
+3. **Storage-class** - this is either
+    * **TRICK\_LOCAL** for memory that is allocated by the Memory Manager, or
+    * **TRICK\_EXTERN** for memory that is allocated outside the Memory Manager that the Memory Manager is "told" about.
+4. **Data-type** - this is either a
+    * Primitive type specifier (```TRICK_DOUBLE```, ```TRICK_INT```, etc.), or
+    * Composite type specifier (```TRICK_STRUCTURED```). In this case the details of the type are specified by an ```ATTRIBUTES``` structure that is generated by Trick's Interface Code Generator (ICG).
+
+
+### Trick Object Serialization
+
+The Memory Manager can convert (i.e., serialize) any of the objects that it **"knows"** about to a portable, human-readable text representation, to the extent that it knows about them (ICG can only gather data-type knowledge from header files that it has scanned). The object can later be re-created from this representation. The representation consists of:
+
+1. A **definition** of the allocation, and
+2. **value assignments** to each of the members of the allocation's data type.
+
+#### Example:
+
+Suppose one were to perform the following allocation:
+
+```double *dbl_p = (double*)TMM_declare_var_s("double dbl_array[3]");```
+
+The Memory Manager would represent its **definition** as follows in a checkpoint:
+
+```
+double dbl_array[3];
+```
+
+If one were then to assign values to the object, i.e.:
+
+```
+   dbl_p[0] = 1.1;
+   dbl_p[1] = 2.2;
+   dbl_p[2] = 3.3;
+```
+
+then the Memory Manager would represent its **variable assignment** as follows in a checkpoint:
+
+```
+dbl_array =
+    {1.1, 2.2, 3.3};
+```
+
+#### Serialization of Composite Objects
+For composite type objects (i.e., class and struct objects), the **variable assignment** can consist of many assignment statements. Trick checkpointing code recursively descends into the composite type-tree, writing an assignment statement for each of the primitive data-typed members (leaves).
+
+
+#### Serialization of Pointers
+A pointer contains the address of another object. What's important is that it **refers** to the other object. We can't store the address of the object, because it will probably be different when the object is re-created at checkpoint reload. But a **name** is also a reference, so we store pointers as names. Since an object has a name and, once it is re-created, an address, we can restore pointers by converting the name reference back to an address reference.
+
+
+#### Importance of Naming Allocations
+If an object is named, then that name will be used in checkpointing 1) to identify the object and 2) to refer (point) to it. If the object is anonymous, then a temporary name must be created for checkpointing. These temporary names are of the form ```trick_anon_local_``` or ```trick_anon_extern_```
+
+### Simulation Checkpointing
+
+A **checkpoint** is a persistent representation of a simulation state. It is much like saving a computer game when it's time for dinner.
+
+If the Trick Memory Manager **"knows"** about all of the allocations that comprise the state of a simulation, then it can checkpoint that simulation. The Trick Memory Manager checkpoints a simulation by:
+
+1. Opening a checkpoint file.
+2. Writing the **definitions** of all of the objects that it knows about to the file.
+3. Writing ```clear_all_vars();``` to the file. This is interpreted when the checkpoint is re-loaded, to initialize the re-created objects.
+4. Writing all the **variable assignments** to the file. These will populate the values of the objects when the checkpoint is re-loaded.
+5. Closing the file.
+
+Some things simply cannot be checkpointed, such as file pointers and network connections, and there may be others as well. For these situations, Trick provides four special job classes: ```"checkpoint"```, ```"post_checkpoint"```, ```"preload_checkpoint"```, and ```"restart"``` (described below).
+
+
+### What Happens When You Dump a Checkpoint
+
+A checkpoint of a simulation is usually initiated from the Input Processor. That is, via:
+
+1. The input file, or
+2. The variable server.
+
+```trick.checkpoint(