<h1>Load testing</h1>
<p>This section explains how to apply random load to nodes in order to stress test them. The load-testing framework also lets you specify disruptions that strain different resources, so that the nodes’ behaviour can be inspected under extreme conditions.</p>
<p>The load-testing framework is incomplete and is not currently part of CI, but the basic pieces are in place.</p>
<h2>Configuration of the load testing cluster</h2>
<p>The load-testing framework currently assumes the following about the node cluster:</p>
<ul class="simple">
<li>The nodes are managed as a systemd service</li>
<li>The node directories are the same across the cluster</li>
<li>The messaging ports are the same across the cluster</li>
<li>The executing identity of the load-test has SSH access to all machines</li>
<li>There is a single network map service node</li>
<li>There is a single notary node</li>
<li>Some disruptions also assume other tools (like openssl) to be present</li>
</ul>
<p>Note that these points could and should be relaxed as needed.</p>
<p>The load test Main expects a single command line argument that points to a configuration file specifying the cluster hosts and optional overrides for the default configuration:</p>
<h2>Running the load tests</h2>
<p>To run the load tests you need an active SSH agent with a single identity added that has SSH access to the load test cluster.</p>
<p>You can use either IntelliJ or the Gradle command line to start the tests.</p>
<p>To use Gradle: <code class="docutils literal"><span class="pre">./gradlew</span> <span class="pre">tools:loadtest:run</span> <span class="pre">-Ploadtest-config=PATH_TO_LOADTEST_CONF</span></code></p>
<p>To use IntelliJ, simply run Main.kt with the config path supplied as an argument.</p>
<h2>Configuration of individual load tests</h2>
<p>The load testing configurations are not set in stone and are meant to be played with to see how the nodes react.</p>
<p>There are a couple of top-level knobs to tweak test behaviour:</p>
<p>The one thing of note is <code class="docutils literal"><span class="pre">disruptionPatterns</span></code>, which may be used to specify ways of disrupting the normal running of the load tests.</p>
<p>Disruptions run concurrently in loops on randomly chosen nodes filtered by <code class="docutils literal"><span class="pre">nodeFilter</span></code> at somewhat random intervals.</p>
<p>As an example, take <code class="docutils literal"><span class="pre">strainCpu</span></code>, which overutilises the processor:</p>
<div class="highlight-kotlin"><div class="highlight"><pre><span></span><span class="k">fun</span> <span class="nf">strainCpu</span><span class="p">(</span><span class="n">parallelism</span><span class="p">:</span> <span class="n">Int</span><span class="p">,</span> <span class="n">durationSeconds</span><span class="p">:</span> <span class="n">Int</span><span class="p">)</span> <span class="p">=</span> <span class="n">Disruption</span><span class="p">(</span><span class="s">"Put strain on cpu"</span><span class="p">)</span> <span class="p">{</span> <span class="n">node</span><span class="p">,</span> <span class="n">random</span> <span class="p">-&gt;</span>
    <span class="c1">// Start one openssl encryption job per core, then kill them all after durationSeconds.</span>
    <span class="k">val</span> <span class="py">shell</span> <span class="p">=</span> <span class="s">"for c in {1..$parallelism} ; do openssl enc -aes-128-cbc -in /dev/urandom -pass pass: -e &gt; /dev/null &amp; done &amp;&amp; JOBS=\$(jobs -p) &amp;&amp; (sleep $durationSeconds &amp;&amp; kill \$JOBS) &amp; wait"</span>
    <span class="c1">// ... run the shell command on the node ...</span>
<span class="p">}</span>
</pre></div>
</div>
<p>We can use this by specifying a <code class="docutils literal"><span class="pre">DisruptionSpec</span></code> in the load test’s <code class="docutils literal"><span class="pre">RunParameters</span></code>:</p>
<p>This means that every 5-10 seconds the cores of at least one randomly chosen node will spin at 100% for 10 seconds.</p>
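<p>To make the behaviour described above concrete, here is a minimal, self-contained sketch of one round of a disruption loop. It is an illustration under assumptions, not the framework’s code: <code class="docutils literal"><span class="pre">Node</span></code>, the field names of <code class="docutils literal"><span class="pre">DisruptionSpec</span></code>, <code class="docutils literal"><span class="pre">recordingStrainCpu</span></code> and <code class="docutils literal"><span class="pre">runOneRound</span></code> are all hypothetical, and the disruption only records what it would do instead of running a shell command.</p>

```kotlin
import kotlin.random.Random

// Hypothetical stand-ins for the framework's types; names and fields are assumptions.
data class Node(val hostName: String)

class Disruption(val name: String, val disrupt: (Node, Random) -> Unit)

data class DisruptionSpec(
    val nodeFilter: (Node) -> Boolean,    // which nodes may be disrupted
    val disruption: Disruption,           // what to do to the chosen node
    val noDisruptionWindowMs: LongRange   // pause between two rounds, in milliseconds
)

// Hypothetical variant of strainCpu that records its intent instead of running a command.
fun recordingStrainCpu(parallelism: Int, durationSeconds: Int, log: MutableList<String>) =
    Disruption("Put strain on cpu") { node, _ ->
        log.add("${node.hostName}: strain $parallelism cores for ${durationSeconds}s")
    }

// One round of a disruption loop: pick a random node that passes the filter,
// disrupt it, and return how long to wait before the next round.
fun runOneRound(spec: DisruptionSpec, nodes: List<Node>, random: Random): Long {
    val candidates = nodes.filter(spec.nodeFilter)
    if (candidates.isNotEmpty()) {
        spec.disruption.disrupt(candidates[random.nextInt(candidates.size)], random)
    }
    val window = spec.noDisruptionWindowMs
    return random.nextLong(window.first, window.last + 1)
}
```

<p>With a window of <code class="docutils literal"><span class="pre">5000L..10000L</span></code> and <code class="docutils literal"><span class="pre">durationSeconds</span> <span class="pre">=</span> <span class="pre">10</span></code>, repeated rounds of this sketch would give the 5-10 second cadence and 10 second strain described above.</p>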
</div>
<div class="section" id="how-to-write-a-load-test">
<h2>How to write a load test</h2>
<p>A load test is essentially defined by three things: a random data structure generator that specifies a unit of work a node should perform, a function that performs this work, and a function that predicts what state the node should end up in by doing so:</p>
<p><code class="docutils literal"><span class="pre">LoadTest</span></code> is parameterised over <code class="docutils literal"><span class="pre">T</span></code>, the unit of work, and <code class="docutils literal"><span class="pre">S</span></code>, the state type that aims to track remote node states. As an example let’s look at the Self Issue test. This test simply creates Cash Issues from nodes to themselves, and then checks the vault to see if the numbers add up:</p>
<p>The unit of work <code class="docutils literal"><span class="pre">SelfIssueCommand</span></code> simply holds an Issue and a handle to a node where the issue should be submitted. The <code class="docutils literal"><span class="pre">generate</span></code> method should provide a generator for these.</p>
<p>The state <code class="docutils literal"><span class="pre">SelfIssueState</span></code> then holds a map from node identities to a Long that describes the sum quantity of the generated issues (we fixed the currency to be USD).</p>
<p>The invariant we want to hold is then simply that the sum of the submitted Issues equals the sum of the quantities in the vaults.</p>
<p>The <code class="docutils literal"><span class="pre">interpret</span></code> function should take a <code class="docutils literal"><span class="pre">SelfIssueCommand</span></code> and update <code class="docutils literal"><span class="pre">SelfIssueState</span></code> to reflect the change we’re expecting in the remote nodes. In our case this will simply be adding the issued amount to the corresponding node’s Long.</p>
<p>The <code class="docutils literal"><span class="pre">execute</span></code> function should perform the action on the cluster. In our case it will simply take the node handle and submit an RPC request for the Issue.</p>
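<p>The three pieces described so far can be sketched in a self-contained form. This is a minimal illustration rather than the framework’s actual code: <code class="docutils literal"><span class="pre">FakeNode</span></code> is a hypothetical in-memory stand-in for a node reached over RPC, node identities are plain strings, and all function signatures and field names are assumptions.</p>

```kotlin
import kotlin.random.Random

// Hypothetical in-memory stand-in for a node that would normally be reached over RPC.
class FakeNode(val identity: String) {
    var vaultUsd: Long = 0L   // sum of the USD quantities in this node's vault
        private set

    fun issue(quantity: Long) { vaultUsd += quantity }
}

// Unit of work T: an issue quantity plus the node that should issue it to itself.
data class SelfIssueCommand(val quantity: Long, val node: FakeNode)

// State S: the predicted sum of issued quantities per node identity.
data class SelfIssueState(val vaultsSelfIssued: Map<String, Long> = emptyMap())

// generate: produce `count` random units of work spread over the given nodes.
fun generate(nodes: List<FakeNode>, count: Int, random: Random): List<SelfIssueCommand> =
    List(count) { SelfIssueCommand(random.nextLong(1L, 1000L), nodes[random.nextInt(nodes.size)]) }

// interpret: update the local prediction only; no node is touched.
fun interpret(state: SelfIssueState, command: SelfIssueCommand): SelfIssueState {
    val id = command.node.identity
    val newTotal = (state.vaultsSelfIssued[id] ?: 0L) + command.quantity
    return SelfIssueState(state.vaultsSelfIssued + (id to newTotal))
}

// execute: perform the work on the (fake) cluster.
fun execute(command: SelfIssueCommand) {
    command.node.issue(command.quantity)
}
```

<p>Folding <code class="docutils literal"><span class="pre">interpret</span></code> over the generated commands and running <code class="docutils literal"><span class="pre">execute</span></code> on each of them should leave the predicted totals equal to the fake vault contents, which is the invariant stated above.</p>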
<p>The <code class="docutils literal"><span class="pre">gatherRemoteState</span></code> function should check the actual remote nodes’ states and see whether they conflict with our local predictions (and should throw if they do). This function deserves its own paragraph.</p>
<p><code class="docutils literal"><span class="pre">gatherRemoteState</span></code> receives as input handles to all the nodes and the current predicted state, or null if this is the initial gathering.</p>
<p>The reason it gets the previous state boils down to allowing non-deterministic predictions about the nodes’ remote states. Say some piece of work triggers an asynchronous notification of a node. We need to account both for the case when the node hasn’t received the notification and for the case when it has. In these cases <code class="docutils literal"><span class="pre">S</span></code> should somehow represent a collection of possible states, and <code class="docutils literal"><span class="pre">gatherRemoteState</span></code> should “collapse” the collection based on the observations it makes. Of course we don’t need this for the simple case of the Self Issue test.</p>
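<p>One way such a collection of possible states might be represented is sketched below. This is a hypothetical example, not part of the framework: it assumes a single node balance, positive notification amounts, and in-order delivery, so that the possible balances are exactly the running sums over the pending notifications. The names <code class="docutils literal"><span class="pre">PredictedBalance</span></code>, <code class="docutils literal"><span class="pre">interpretNotification</span></code> and <code class="docutils literal"><span class="pre">collapse</span></code> are made up for illustration.</p>

```kotlin
// Hypothetical state S: a confirmed balance plus notifications that were sent
// but may not have arrived yet. Assumes in-order delivery and positive amounts.
data class PredictedBalance(val confirmed: Long, val pending: List<Long> = emptyList()) {
    // Every balance the node may report right now: one per delivered prefix of `pending`.
    fun possibleBalances(): List<Long> =
        pending.runningFold(confirmed) { sum, amount -> sum + amount }
}

// interpret: sending a notification adds one more possible outcome.
fun interpretNotification(state: PredictedBalance, amount: Long): PredictedBalance {
    require(amount > 0) { "amounts must be positive" }
    return state.copy(pending = state.pending + amount)
}

// The "collapse" step of gatherRemoteState: keep only what is consistent with the
// observed balance, or throw if the observation matches none of the predictions.
fun collapse(state: PredictedBalance, observed: Long): PredictedBalance {
    val delivered = state.possibleBalances().indexOf(observed)
    check(delivered >= 0) { "Observed $observed, expected one of ${state.possibleBalances()}" }
    // The first `delivered` notifications have arrived; the rest are still pending.
    return PredictedBalance(observed, state.pending.drop(delivered))
}
```

<p>Notifications that have not yet arrived stay in <code class="docutils literal"><span class="pre">pending</span></code> after a collapse, so a later observation of a higher balance is still accepted.</p>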
<p>The last parameter <code class="docutils literal"><span class="pre">isConsistent</span></code> is used to poll for eventual consistency at the end of a load test. This is not needed for Self Issue.</p>
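<p>Polling for eventual consistency could look roughly like the sketch below. The helper <code class="docutils literal"><span class="pre">pollUntilConsistent</span></code>, its parameters and its defaults are assumptions made for illustration; the framework’s own polling logic may differ.</p>

```kotlin
// Hypothetical helper: gather the state repeatedly until isConsistent accepts it,
// and fail once the attempt budget is used up.
fun <S> pollUntilConsistent(
    gather: () -> S,
    isConsistent: (S) -> Boolean,
    maxAttempts: Int = 10,
    delayMs: Long = 100
): S {
    repeat(maxAttempts) { attempt ->
        val state = gather()
        if (isConsistent(state)) return state
        // Sleep between attempts, but not after the last one.
        if (attempt < maxAttempts - 1) Thread.sleep(delayMs)
    }
    error("State did not become consistent after $maxAttempts attempts")
}
```

<p>In this sketch <code class="docutils literal"><span class="pre">gather</span></code> plays the role of <code class="docutils literal"><span class="pre">gatherRemoteState</span></code> and the predicate plays the role of <code class="docutils literal"><span class="pre">isConsistent</span></code>.</p>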