fix sync script, update remote sync documentation

2025-06-15 03:18:07 +00:00 · 2020-08-31 12:36:30 +02:00
parent 567042d146
commit e7db4d4fe0
2 changed files with 68 additions and 48 deletions
--- a/docs/parallel_fuzzing.md
+++ b/docs/parallel_fuzzing.md
@@ -10,8 +10,8 @@ n-core system, you can almost always run around n concurrent fuzzing jobs with
 virtually no performance hit (you can use the afl-gotcpu tool to make sure).
 In fact, if you rely on just a single job on a multi-core system, you will
-be underutilizing the hardware. So, parallelization is usually the right
+be underutilizing the hardware. So, parallelization is always the right way to
-way to go.
+go.
 When targeting multiple unrelated binaries or using the tool in
 "non-instrumented" (-n) mode, it is perfectly fine to just start up several
@@ -65,22 +65,7 @@ still perform deterministic checks; while the secondary instances will
 proceed straight to random tweaks.
 Note that you must always have one -M main instance!
-
+Running multiple -M instances is wasteful!
 Note that running multiple -M instances is wasteful, although there is an
 experimental support for parallelizing the deterministic checks. To leverage
 that, you need to create -M instances like so:
 ```
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainA:1/3 [...]
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainB:2/3 [...]
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainC:3/3 [...]
 ```
 ...where the first value after ':' is the sequential ID of a particular main
 instance (starting at 1), and the second value is the total number of fuzzers to
 distribute the deterministic fuzzing across. Note that if you boot up fewer
 fuzzers than indicated by the second number passed to -M, you may end up with
 poor coverage.
 You can also monitor the progress of your jobs from the command line with the
 provided afl-whatsup tool. When the instances are no longer finding new paths,
@@ -99,61 +84,88 @@ example may be:
 This is not a concern if you use @@ without -f and let afl-fuzz come up with the
 file name.
-## 3) Syncing with non-afl fuzzers or independant instances
+## 3) Multiple -M mains
 There is support for parallelizing the deterministic checks.
 This is only needed where
 1. many new paths are found fast over a long time and it looks unlikely that
    the main node will ever catch up, and
 2. deterministic fuzzing is actively helping path discovery (you can see this
    in the main node for the first four lines in the "fuzzing strategy yields"
    section. If the ratio `found/attempts` is high, then it is effective. It
    most commonly isn't.)
 Only if both are true is it beneficial to have more than one main.
 You can leverage this by creating -M instances like so:
 ```
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainA:1/3 [...]
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainB:2/3 [...]
 ./afl-fuzz -i testcase_dir -o sync_dir -M mainC:3/3 [...]
 ```
 ... where the first value after ':' is the sequential ID of a particular main
 instance (starting at 1), and the second value is the total number of fuzzers to
 distribute the deterministic fuzzing across. Note that if you boot up fewer
 fuzzers than indicated by the second number passed to -M, you may end up with
 poor coverage.
 ## 4) Syncing with non-afl fuzzers or independent instances
 A -M main node can be told with the `-F other_fuzzer_queue_directory` option
 to sync results from other fuzzers, e.g. libfuzzer or honggfuzz.
 Only the specified directory will be synced into afl, not subdirectories.
-The specified directories do not need to exist yet at the start of afl.
+The specified directory does not need to exist yet at the start of afl.
-## 4) Multi-system parallelization
+The `-F` option can be passed to the main node several times.
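As a rough sketch of the repeated `-F` usage described above, the main node's command line can be assembled as follows. `main_cmd` is a hypothetical helper, not part of afl-fuzz, and the two queue paths are placeholders invented for illustration; the remaining afl-fuzz arguments (the `[...]` used elsewhere in this document) still have to be appended.

```sh
# main_cmd is a hypothetical helper, not part of afl-fuzz: it prints a
# main-node command line with one -F option per foreign queue directory.
main_cmd() {
  cmd="./afl-fuzz -i testcase_dir -o sync_dir -M main"
  for dir in "$@"; do
    cmd="$cmd -F $dir"
  done
  echo "$cmd"
}

# Placeholder directories standing in for a libfuzzer corpus and a
# honggfuzz output directory (assumed paths, adjust to your setup).
main_cmd /path/to/libfuzzer_corpus /path/to/honggfuzz_out
```

Printing the command instead of running it makes it easy to review the resulting option list before starting the main node.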
 ## 5) Multi-system parallelization
 The basic operating principle for multi-system parallelization is similar to
 the mechanism explained in section 2. The key difference is that you need to
 write a simple script that performs two actions:
  - Uses SSH with authorized_keys to connect to every machine and retrieve
-    a tar archive of the /path/to/sync_dir/<fuzzer_id>/queue/ directories for
+    a tar archive of the /path/to/sync_dir/<main_node(s)> directory local to
-    every <fuzzer_id> local to the machine. It's best to use a naming scheme
+    the machine.
-    that includes host name in the fuzzer ID, so that you can do something
+    It is best to use a naming scheme that includes the host name and the instance's role as
-    like:
+    a main node (e.g. main1, main2) in the fuzzer ID, so that you can do
    something like:
    ```sh
-    for s in {1..10}; do
+    for host in `cat HOSTLIST`; do
-      ssh user@host${s} "tar -czf - sync/host${s}_fuzzid*/[qf]*" >host${s}.tgz
+      ssh user@$host "tar -czf - sync/${host}_main*/" > $host.tgz
    done
    ```
  - Distributes and unpacks these files on all the remaining machines, e.g.:
    ```sh
-    for s in {1..10}; do
+    for srchost in `cat HOSTLIST`; do
-      for d in {1..10}; do
+      for dsthost in `cat HOSTLIST`; do
        test "$srchost" = "$dsthost" && continue
-        ssh user@host${d} 'tar -kxzf -' <host${s}.tgz
+        ssh user@$dsthost 'tar -kxzf -' < $srchost.tgz
      done
    done
    ```
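The pairing logic of the distribution loop can be checked without touching any machine. The following dry run is only a sketch: `sync_pairs` is a hypothetical helper and the host names are placeholders. It prints every archive/destination combination and skips the case where a host would receive its own archive.

```sh
# sync_pairs is a hypothetical dry-run helper: it prints which archive would
# be unpacked on which host, without running ssh or tar.
sync_pairs() {
  for srchost in "$@"; do
    for dsthost in "$@"; do
      # A host never needs its own archive sent back to it.
      test "$srchost" = "$dsthost" && continue
      echo "$srchost.tgz -> $dsthost"
    done
  done
}

# Placeholder host names; in the loops above they come from HOSTLIST.
sync_pairs hostA hostB hostC
```

For n hosts this yields n * (n - 1) transfers, which is one reason to keep the sync interval long on larger fleets.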
-There is an example of such a script in examples/distributed_fuzzing/;
+There is an example of such a script in examples/distributed_fuzzing/.
 you can also find a more featured, experimental tool developed by
 Martijn Bogaard at:
-  https://github.com/MartijnB/disfuzz-afl
+There are other (older) more featured, experimental tools:
  * https://github.com/richo/roving
  * https://github.com/MartijnB/disfuzz-afl
-Another client-server implementation from Richo Healey is:
+However, these do not support syncing just main nodes (yet).
  https://github.com/richo/roving
 Note that these third-party tools are unsafe to run on systems exposed to the
 Internet or to untrusted users.
 When developing custom test case sync code, there are several optimizations
 to keep in mind:
  - The synchronization does not have to happen very often; running the
-    task every 30 minutes or so may be perfectly fine.
+    task every 60 minutes or even less often at later fuzzing stages is
    fine.
  - There is no need to synchronize crashes/ or hangs/; you only need to
    copy over queue/* (and ideally, also fuzzer_stats).
@@ -179,12 +191,17 @@ to keep in mind:
  - You do not want a "main" instance of afl-fuzz on every system; you should
    run them all with -S, and just designate a single process somewhere within
    the fleet to run with -M.
  - Syncing is only necessary for the main nodes on a system. It is possible
    to run main-less with only secondaries. However, you then need to find out
    which secondary has temporarily taken over the role of main node. Look for
    the `is_main` file in the fuzzer directories, e.g. `sync-dir/hostname-*/is_main`.
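A minimal sketch of that lookup, assuming the marker file is named `is_main` and sits directly inside each fuzzer directory as stated above. `find_main` is a hypothetical helper name, and the demonstration runs against a throwaway directory rather than a real sync directory.

```sh
# find_main is a hypothetical helper: it prints the name of every fuzzer
# directory under the given sync directory that contains an is_main file.
find_main() {
  for marker in "$1"/*/is_main; do
    # Skip the unexpanded pattern when no directory has the marker.
    [ -e "$marker" ] || continue
    basename "$(dirname "$marker")"
  done
}

# Demonstration on a temporary layout with two secondaries (assumed names).
demo=$(mktemp -d)
mkdir -p "$demo/hostA-sec1" "$demo/hostA-sec2"
touch "$demo/hostA-sec2/is_main"
find_main "$demo"
rm -rf "$demo"
```

A sync script could call such a helper on each host to decide which directory to archive when no instance was started with -M.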
 It is *not* advisable to skip the synchronization script and run the fuzzers
 directly on a network filesystem; unexpected latency and unkillable processes
 in I/O wait state can mess things up.
-## 5) Remote monitoring and data collection
+## 6) Remote monitoring and data collection
 You can use screen, nohup, tmux, or something equivalent to run remote
 instances of afl-fuzz. If you redirect the program's output to a file, it will
@@ -208,7 +225,7 @@ Keep in mind that crashing inputs are *not* automatically propagated to the
 main instance, so you may still want to monitor for crashes fleet-wide
 from within your synchronization or health checking scripts (see afl-whatsup).
-## 6) Asymmetric setups
+## 7) Asymmetric setups
 It is perhaps worth noting that all of the following is permitted:
@@ -224,7 +241,7 @@ It is perhaps worth noting that all of the following is permitted:
    the discovered test cases can have synergistic effects and improve the
    overall coverage.
-    (In this case, running one -M instance per each binary is a good plan.)
+    (In this case, running one -M instance per target is necessary.)
  - Having some of the fuzzers invoke the binary in different ways.
    For example, 'djpeg' supports several DCT modes, configurable with
--- a/examples/distributed_fuzzing/sync_script.sh
+++ b/examples/distributed_fuzzing/sync_script.sh
@@ -39,8 +39,11 @@ FUZZ_USER=bob
 # Directory to synchronize
 SYNC_DIR='/home/bob/sync_dir'
-# Interval (seconds) between sync attempts
+# We only capture -M main nodes; set the name to match your chosen naming scheme
-SYNC_INTERVAL=$((30 * 60))
+MAIN_NAME='main'
 # Interval (seconds) between sync attempts (e.g. one hour)
 SYNC_INTERVAL=$((60 * 60))
 if [ "$AFL_ALLOW_TMP" = "" ]; then
@@ -63,7 +66,7 @@ while :; do
    echo "[*] Retrieving data from ${host}.${FUZZ_DOMAIN}..."
    ssh -o 'passwordauthentication no' ${FUZZ_USER}@${host}.$FUZZ_DOMAIN \
-      "cd '$SYNC_DIR' && tar -czf - ${host}_*/[qf]*" >".sync_tmp/${host}.tgz"
+      "cd '$SYNC_DIR' && tar -czf - ${host}_${MAIN_NAME}*/" > ".sync_tmp/${host}.tgz"
  done
@@ -80,7 +83,7 @@ while :; do
      echo "    Sending fuzzer data from ${src_host}.${FUZZ_DOMAIN}..."
      ssh -o 'passwordauthentication no' ${FUZZ_USER}@$dst_host \
-        "cd '$SYNC_DIR' && tar -xkzf -" <".sync_tmp/${src_host}.tgz"
+        "cd '$SYNC_DIR' && tar -xkzf - " < ".sync_tmp/${src_host}.tgz"
    done