Merge "binaryonly_fuzzing.md" into "fuzzing_binary-only_targets.md"

2025-06-12 01:58:17 +00:00 · 2021-11-21 15:42:46 +01:00
parent 5ec859cece
commit 36514a2e4f
2 changed files with 245 additions and 271 deletions
--- a/docs/binaryonly_fuzzing.md
+++ b/docs/binaryonly_fuzzing.md
@@ -1,225 +0,0 @@
 # Fuzzing binary-only programs with AFL++
  AFL++, libfuzzer and others are great if you have the source code, and
  it allows for very fast and coverage guided fuzzing.
  However, if there is only the binary program and no source code available,
  then standard `afl-fuzz -n` (non-instrumented mode) is not effective.
  The following is a description of how these binaries can be fuzzed with AFL++.
 ## TL;DR:
  qemu_mode in persistent mode is the fastest - if the stability is
  high enough. Otherwise try retrowrite, afl-dyninst and if these
  fail too then try standard qemu_mode with AFL_ENTRYPOINT to where you need it.
  If your target is a library use utils/afl_frida/.
  If your target is non-linux then use unicorn_mode/.
 ## QEMU
  Qemu is the "native" solution to the program.
  It is available in the ./qemu_mode/ directory and once compiled it can
  be accessed by the afl-fuzz -Q command line option.
  It is the easiest to use alternative and even works for cross-platform binaries.
  The speed decrease is at about 50%.
  However various options exist to increase the speed:
   - using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
     the binary (+5-10% speed)
   - using persistent mode [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md)
     this will result in 150-300% overall speed increase - so 3-8x the original
     qemu_mode speed!
   - using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
  Note that there is also honggfuzz: [https://github.com/google/honggfuzz](https://github.com/google/honggfuzz)
  which now has a qemu_mode, but its performance is just 1.5% ...
  As it is included in AFL++ this needs no URL.
  If you like to code a customized fuzzer without much work, we highly
  recommend to check out our sister project libafl which will support QEMU
  too:
  [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
 ## AFL FRIDA
  In frida_mode you can fuzz binary-only targets easily like with QEMU,
  with the advantage that frida_mode also works on MacOS (both intel and M1).
  If you want to fuzz a binary-only library then you can fuzz it with
  frida-gum via utils/afl_frida/, you will have to write a harness to
  call the target function in the library, use afl-frida.c as a template.
  Both come with AFL++ so this needs no URL.
  You can also perform remote fuzzing with frida, e.g. if you want to fuzz
  on iPhone or Android devices, for this you can use
  [https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/)
  as an intermediate that uses AFL++ for fuzzing.
  If you like to code a customized fuzzer without much work, we highly
  recommend to check out our sister project libafl which supports Frida too:
  [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
  Working examples already exist :-)
 ## WINE+QEMU
  Wine mode can run Win32 PE binaries with the QEMU instrumentation.
  It needs Wine, python3 and the pefile python package installed.
  As it is included in AFL++ this needs no URL.
 ## UNICORN
  Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar.
  In contrast to QEMU, Unicorn does not offer a full system or even userland
  emulation. Runtime environment and/or loaders have to be written from scratch,
  if needed. On top, block chaining has been removed. This means the speed boost
  introduced in  the patched QEMU Mode of AFL++ cannot simply be ported over to
  Unicorn. For further information, check out [unicorn_mode/README.md](../unicorn_mode/README.md).
  As it is included in AFL++ this needs no URL.
 ## AFL UNTRACER
   If you want to fuzz a binary-only shared library then you can fuzz it with
   utils/afl_untracer/, use afl-untracer.c as a template.
   It is slower than AFL FRIDA (see above).
 ## ZAFL
  ZAFL is a static rewriting platform supporting x86-64 C/C++, stripped/unstripped, 
  and PIE/non-PIE binaries. Beyond conventional instrumentation, ZAFL's API enables 
  transformation passes (e.g., laf-Intel, context sensitivity, InsTrim, etc.).
  Its baseline instrumentation speed typically averages 90-95% of afl-clang-fast's.
  [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
 ## DYNINST
  Dyninst is a binary instrumentation framework similar to Pintool and
  Dynamorio (see far below). However whereas Pintool and Dynamorio work at
  runtime, dyninst instruments the target at load time, and then let it run -
  or save the binary with the changes.
  This is great for some things, e.g. fuzzing, and not so effective for others,
  e.g. malware analysis.
  So what we can do with dyninst is taking every basic block, and put afl's
  instrumentation code in there - and then save the binary.
  Afterwards we can just fuzz the newly saved target binary with afl-fuzz.
  Sounds great? It is. The issue though - it is a non-trivial problem to
  insert instructions, which change addresses in the process space, so that
  everything is still working afterwards. Hence more often than not binaries
  crash when they are run.
  The speed decrease is about 15-35%, depending on the optimization options
  used with afl-dyninst.
  [https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)
 ## RETROWRITE
  If you have an x86/x86_64 binary that still has its symbols, is compiled
  with position independent code (PIC/PIE) and does not use most of the C++
  features then the retrowrite solution might be for you.
  It decompiles to ASM files which can then be instrumented with afl-gcc.
  It is at about 80-85% performance.
  [https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)
 ## MCSEMA
  Theoretically you can also decompile to llvm IR with mcsema, and then
  use llvm_mode to instrument the binary.
  Good luck with that.
  [https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)
 ## INTEL-PT
  If you have a newer Intel CPU, you can make use of Intel's processor trace.
  The big issue with Intel's PT is the small buffer size and the complex
  encoding of the debug information collected through PT.
  This makes the decoding very CPU intensive and hence slow.
  As a result, the overall speed decrease is about 70-90% (depending on
  the implementation and other factors).
  There are two AFL intel-pt implementations:
  1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
     => this needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.
  2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
     => this needs a 4.14 or 4.15 kernel. the "nopti" kernel boot option must
        be used. This one is faster than the other.
  Note that there is also honggfuzz: https://github.com/google/honggfuzz
  But its IPT performance is just 6%!
 ## CORESIGHT
  Coresight is ARM's answer to Intel's PT.
  With afl++ v3.15 there is a coresight tracer implementation available in
  `coresight_mode/` which is faster than QEMU, however can not run in parallel.
  Currently only one process can be traced, it is WIP.
 ## PIN & DYNAMORIO
  Pintool and Dynamorio are dynamic instrumentation engines, and they can be
  used for getting basic block information at runtime.
  Pintool is only available for Intel x32/x64 on Linux, Mac OS and Windows,
  whereas Dynamorio is additionally available for ARM and AARCH64.
  Dynamorio is also 10x faster than Pintool.
  The big issue with Dynamorio (and therefore Pintool too) is speed.
  Dynamorio has a speed decrease of 98-99%
  Pintool has a speed decrease of 99.5%
  Hence Dynamorio is the option to go for if everything else fails, and Pintool
  only if Dynamorio fails too.
  Dynamorio solutions:
  * [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
  * [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
  * [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/) <= very good but windows only
  Pintool solutions:
  * [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
  * [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
  * [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode) <= only old Pintool version supported
 ## Non-AFL solutions
  There are many binary-only fuzzing frameworks.
  Some are great for CTFs but don't work with large binaries, others are very
  slow but have good path discovery, some are very hard to set-up ...
  * QSYM: [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
  * Manticore: [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
  * S2E: [https://github.com/S2E](https://github.com/S2E)
  * Tinyinst: [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst) (Mac/Windows only)
  * Jackalope: [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
  *  ... please send me any missing that are good
 ## Closing words
  That's it! News, corrections, updates? Send an email to vh@thc.org
--- a/docs/fuzzing_binary-only_targets.md
+++ b/docs/fuzzing_binary-only_targets.md
@@ -1,83 +1,282 @@
 # Fuzzing binary-only targets
-When source code is *NOT* available, AFL++ offers various support for fast,
+AFL++, libfuzzer, and other fuzzers are great if you have the source code of the
-on-the-fly instrumentation of black-box binaries. 
+target. This allows for very fast and coverage guided fuzzing.
-If you do not have to use Unicorn the following setup is recommended to use
+However, if there is only the binary program and no source code available, then
-qemu_mode:
+standard `afl-fuzz -n` (non-instrumented mode) is not effective.
  * run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
  * run 1 afl-fuzz -Q instance with QASAN  (`AFL_USE_QASAN=1`)
  * run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` + `AFL_COMPCOV_LEVEL=2`)
 Alternatively you can use frida_mode, just switch `-Q` with `-O` and remove the
 LAF instance.
-Then run as many instances as you have cores left with either -Q mode or - better -
+For fast, on-the-fly instrumentation of black-box binaries, AFL++ still offers
-use a binary rewriter like afl-dyninst, retrowrite, zafl, etc.
+various support. The following is a description of how these binaries can be
 fuzzed with AFL++.
-For Qemu and Frida mode, check out the persistent mode, it gives a huge speed
+## TL;DR:
 improvement if it is possible to use.
-### QEMU
+Qemu_mode in persistent mode is the fastest - if the stability is high enough.
 Otherwise, try RetroWrite, Dyninst, and if these fail, too, then try standard
 qemu_mode with AFL_ENTRYPOINT to where you need it.
-For linux programs and its libraries this is accomplished with a version of
+If your target is a library, then use frida_mode.
-QEMU running in the lesser-known "user space emulation" mode.
+
-QEMU is a project separate from AFL, but you can conveniently build the
+If your target is non-linux, then use unicorn_mode.
-feature by doing:
+
 ## Fuzzing binary-only targets with AFL++
 ### Qemu_mode
 Qemu_mode is the "native" solution to the program. It is available in the
 ./qemu_mode/ directory and, once compiled, it can be accessed by the afl-fuzz -Q
 command line option. It is the easiest to use alternative and even works for
 cross-platform binaries.
 For Linux programs and their libraries, this is accomplished with a version of
 QEMU running in the lesser-known "user space emulation" mode. QEMU is a project
 separate from AFL++, but you can conveniently build the feature by doing:
 ```shell
 cd qemu_mode
 ./build_qemu_support.sh
 ```
-For additional instructions and caveats, see [qemu_mode/README.md](../qemu_mode/README.md).
+The following setup to use qemu_mode is recommended:
-If possible you should use the persistent mode, see [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md).
+* run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
-The mode is approximately 2-5x slower than compile-time instrumentation, and is
+* run 1 afl-fuzz -Q instance with QASAN (`AFL_USE_QASAN=1`)
-less conducive to parallelization.
+* run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` +
  `AFL_COMPCOV_LEVEL=2`), alternatively you can use frida_mode, just switch `-Q`
  with `-O` and remove the LAF instance
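The recommended setup above can be sketched as a dry run that only prints the three command lines instead of starting the fuzzers. The target path, the `in`/`out` directories, and the instance names after `-M`/`-S` are placeholders chosen for illustration; depending on the installation, `libcmpcov.so` may need a full path.

```shell
#!/bin/sh
# Dry-run sketch: prints one command line per recommended qemu_mode instance.
# TARGET, IN_DIR and OUT_DIR are placeholders, not AFL++ defaults.
TARGET=${TARGET:-./target}
IN_DIR=${IN_DIR:-in}
OUT_DIR=${OUT_DIR:-out}

qemu_campaign() {
  # Instance 1: CMPLOG, using the options named in the list above.
  echo "AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -c 0 -M cmplog -i $IN_DIR -o $OUT_DIR -- $TARGET @@"
  # Instance 2: QASAN.
  echo "AFL_USE_QASAN=1 afl-fuzz -Q -S qasan -i $IN_DIR -o $OUT_DIR -- $TARGET @@"
  # Instance 3: LAF via the preloaded libcmpcov.so.
  echo "AFL_PRELOAD=libcmpcov.so AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -S laf -i $IN_DIR -o $OUT_DIR -- $TARGET @@"
}

qemu_campaign
```

Each printed line is meant to be run in its own terminal or tmux pane once the paths have been checked.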
-If [afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst) works for
+Then run as many instances as you have cores left with either -Q mode or - even
-your binary, then you can use afl-fuzz normally and it will have twice
+better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc.
 the speed compared to qemu_mode (but slower than qemu persistent mode).
 Note that several other binary rewriters exist, all with their advantages and
 caveats.
-### Frida
+If [afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst) works for your
 binary, then you can use afl-fuzz normally and it will have twice the speed
 compared to qemu_mode (but slower than qemu persistent mode). Note that several
 other binary rewriters exist, all with their advantages and caveats.
-Frida mode is sometimes faster and sometimes slower than Qemu mode.
+The speed decrease of qemu_mode is at about 50%. However, various options exist
-It is also newer, lacks COMPCOV, but supports MacOS.
+to increase the speed:
 - using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
  the binary (+5-10% speed)
 - using persistent mode
  [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md) this will
  result in a 150-300% overall speed increase - so 3-8x the original qemu_mode
  speed!
 - using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
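As a rough illustration of the three options above, the following dry run prints one command line per option. All addresses are made-up placeholders that must be replaced with values from the actual target. `AFL_QEMU_PERSISTENT_ADDR` and `AFL_QEMU_PERSISTENT_GPR` are not named in this text; they are what the persistent mode README is understood to document, so verify them there before use.

```shell
#!/bin/sh
# Dry-run sketch: prints command lines for the qemu_mode speed-up options.
# Every address below is a hypothetical placeholder.
TARGET=${TARGET:-./target}
ENTRY_ADDR=${ENTRY_ADDR:-0x4000001000}   # later basic block for the forkserver
FUNC_ADDR=${FUNC_ADDR:-0x4000001200}     # function to loop over in persistent mode
CODE_START=${CODE_START:-0x4000001000}   # start of the range to instrument
CODE_END=${CODE_END:-0x4000002000}       # end of the range to instrument

speedup_variants() {
  # Option 1: move the forkserver entry point.
  echo "AFL_ENTRYPOINT=$ENTRY_ADDR afl-fuzz -Q -i in -o out -- $TARGET @@"
  # Option 2: persistent mode (variable names assumed, see lead-in).
  echo "AFL_QEMU_PERSISTENT_ADDR=$FUNC_ADDR AFL_QEMU_PERSISTENT_GPR=1 afl-fuzz -Q -i in -o out -- $TARGET @@"
  # Option 3: instrument only a specific address range.
  echo "AFL_CODE_START=$CODE_START AFL_CODE_END=$CODE_END afl-fuzz -Q -i in -o out -- $TARGET @@"
}

speedup_variants
```

The options are shown separately so that each speed effect can be measured on its own before combining them.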
 For additional instructions and caveats, see
 [qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use the
 persistent mode, see
 [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). The mode is
 approximately 2-5x slower than compile-time instrumentation, and is less
 conducive to parallelization.
 Note that there is also honggfuzz:
 [https://github.com/google/honggfuzz](https://github.com/google/honggfuzz) which
 now has a qemu_mode, but its performance is just 1.5% ...
 If you like to code a customized fuzzer without much work, we highly recommend
 to check out our sister project libafl which will support QEMU, too:
 [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
 ### WINE+QEMU
 Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs
 Wine, python3, and the pefile python package installed.
 It is included in AFL++.
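A small check for the prerequisites listed above might look like the sketch below. It assumes the Wine binary is called `wine` (some distributions ship `wine64` instead) and that pefile is installed for the default `python3`.

```shell
#!/bin/sh
# Prints the Wine mode prerequisites that appear to be missing on this machine.
missing_wine_deps() {
  for cmd in wine python3; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
  # pefile is a Python package, so it is probed through the interpreter.
  # If python3 itself is missing, pefile is not reported separately.
  if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import pefile' >/dev/null 2>&1 || echo "pefile"
  fi
}

missing=$(missing_wine_deps)
if [ -z "$missing" ]; then
  echo "all Wine mode prerequisites found"
else
  echo "missing:" $missing
fi
```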
 ### Frida_mode
 In frida_mode, you can fuzz binary-only targets as easily as with QEMU.
 Frida_mode is sometimes faster and sometimes slower than Qemu_mode. It is also
 newer, lacks COMPCOV, and has the advantage that it works on MacOS (both intel
 and M1).
 To build frida_mode:
 ```shell
 cd frida_mode
 make
 ```
-For additional instructions and caveats, see [frida_mode/README.md](../frida_mode/README.md).
+For additional instructions and caveats, see
-If possible you should use the persistent mode, see [qemu_frida/README.md](../qemu_frida/README.md).
+[frida_mode/README.md](../frida_mode/README.md). If possible, you should use the
-The mode is approximately 2-5x slower than compile-time instrumentation, and is
+persistent mode, see [qemu_frida/README.md](../qemu_frida/README.md). The mode
-less conducive to parallelization.
+is approximately 2-5x slower than compile-time instrumentation, and is less
 conducive to parallelization. But for binary-only fuzzing, it gives a huge speed
 improvement if it is possible to use.
 If you want to fuzz a binary-only library, then you can fuzz it with frida-gum
 via frida_mode/. You will have to write a harness to call the target function in
 the library, use afl-frida.c as a template.
 You can also perform remote fuzzing with frida, e.g. if you want to fuzz on
 iPhone or Android devices, for this you can use
 [https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/) as
 an intermediate that uses AFL++ for fuzzing.
 If you like to code a customized fuzzer without much work, we highly recommend
 to check out our sister project libafl which supports Frida, too:
 [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL).
 Working examples already exist :-)
 ### Unicorn
-For non-Linux binaries you can use AFL++'s unicorn mode which can emulate
+Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In
-anything you want - for the price of speed and user written scripts.
+contrast to QEMU, Unicorn does not offer a full system or even userland
-See [unicorn_mode/README.md](../unicorn_mode/README.md).
+emulation. Runtime environment and/or loaders have to be written from scratch,
 if needed. On top, block chaining has been removed. This means the speed boost
 introduced in the patched QEMU Mode of AFL++ cannot simply be ported over to
 Unicorn.
 For non-Linux binaries, you can use AFL++'s unicorn_mode which can emulate
 anything you want - for the price of speed and user written scripts.
 To build unicorn_mode:
 ```shell
 cd unicorn_mode
 ./build_unicorn_support.sh
 ```
 For further information, check out
 [unicorn_mode/README.md](../unicorn_mode/README.md).
 ### Shared libraries
-If the goal is to fuzz a dynamic library then there are two options available.
+If the goal is to fuzz a dynamic library, then there are two options available.
-For both you need to write a small harness that loads and calls the library.
+For both, you need to write a small harness that loads and calls the library.
-Then you fuzz this with either frida_mode or qemu_mode, and either use
+Then you fuzz this with either frida_mode or qemu_mode and either use
 `AFL_INST_LIBS=1` or `AFL_QEMU/FRIDA_INST_RANGES`.
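The variants mentioned above can be sketched as a dry run that prints the command lines. `./harness` and `libtarget.so` are hypothetical names. Passing a module name to the `*_INST_RANGES` variables is an assumption here; check the qemu_mode and frida_mode READMEs for the exact range syntax.

```shell
#!/bin/sh
# Dry-run sketch: prints command lines for fuzzing a library through a harness.
# HARNESS and LIB_NAME are hypothetical placeholders.
HARNESS=${HARNESS:-./harness}
LIB_NAME=${LIB_NAME:-libtarget.so}

library_variants() {
  # Instrument the harness and every library it loads.
  echo "AFL_INST_LIBS=1 afl-fuzz -Q -i in -o out -- $HARNESS @@"
  # Restrict instrumentation to the library under test (qemu_mode).
  echo "AFL_QEMU_INST_RANGES=$LIB_NAME afl-fuzz -Q -i in -o out -- $HARNESS @@"
  # Same restriction in frida_mode, selected with -O instead of -Q.
  echo "AFL_FRIDA_INST_RANGES=$LIB_NAME afl-fuzz -O -i in -o out -- $HARNESS @@"
}

library_variants
```

Restricting the ranges keeps coverage feedback focused on the library instead of the harness and libc.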
-Another, less precise and slower option is using ptrace with debugger interrupt
+Another, less precise and slower option is to fuzz it with utils/afl_untracer/
-instrumentation: [utils/afl_untracer/README.md](../utils/afl_untracer/README.md).
+and use afl-untracer.c as a template. It is slower than frida_mode.
-### More
+For more information, see
 [utils/afl_untracer/README.md](../utils/afl_untracer/README.md).
-A more comprehensive description of these and other options can be found in
+## Binary rewriters
-[binaryonly_fuzzing.md](binaryonly_fuzzing.md).
+
 ### Coresight
 Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a coresight
 tracer implementation available in `coresight_mode/` which is faster than QEMU
 but cannot run in parallel. Currently, only one process can be traced; it is
 WIP.
 ### Dyninst
 Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO.
 However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the
 target at load time and then lets it run - or saves the binary with the changes.
 This is great for some things, e.g. fuzzing, and not so effective for others,
 e.g. malware analysis.
 So, what we can do with Dyninst is take every basic block and put AFL++'s
 instrumentation code in there - and then save the binary. Afterwards, we can
 just fuzz the newly saved target binary with afl-fuzz. Sounds great? It is. The
 issue though - it is a non-trivial problem to insert instructions, which change
 addresses in the process space, so that everything is still working afterwards.
 Hence, more often than not binaries crash when they are run.
 The speed decrease is about 15-35%, depending on the optimization options used
 with afl-dyninst.
 [https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)
 ### Intel PT
 If you have a newer Intel CPU, you can make use of Intel's processor trace. The
 big issue with Intel's PT is the small buffer size and the complex encoding of
 the debug information collected through PT. This makes the decoding very CPU
 intensive and hence slow. As a result, the overall speed decrease is about
 70-90% (depending on the implementation and other factors).
 There are two AFL intel-pt implementations:
 1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
    => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.
 2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
    => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be
    used. This one is faster than the other.
 Note that there is also honggfuzz:
 [https://github.com/google/honggfuzz](https://github.com/google/honggfuzz). But
 its IPT performance is just 6%!
 ### Mcsema
 Theoretically, you can also decompile to llvm IR with mcsema, and then use
 llvm_mode to instrument the binary. Good luck with that.
 [https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)
 ### Pintool & DynamoRIO
 Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for
 getting basic block information at runtime. Pintool is only available for Intel
 x32/x64 on Linux, Mac OS, and Windows, whereas DynamoRIO is additionally
 available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.
 The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO
 has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%.
 Hence, DynamoRIO is the option to go for if everything else fails and Pintool
 only if DynamoRIO fails, too.
 DynamoRIO solutions:
 * [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
 * [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
 * [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/)
  <= very good but windows only
 Pintool solutions:
 * [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
 * [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
 * [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode)
  <= only old Pintool version supported
 ### RetroWrite
 If you have an x86/x86_64 binary that still has its symbols, is compiled with
 position independent code (PIC/PIE), and does not use most of the C++ features,
 then the RetroWrite solution might be for you. It decompiles to ASM files which
 can then be instrumented with afl-gcc.
 It is at about 80-85% performance.
 [https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)
 ### ZAFL
 ZAFL is a static rewriting platform supporting x86-64 C/C++,
 stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional
 instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel,
 context sensitivity, InsTrim, etc.).
 Its baseline instrumentation speed typically averages 90-95% of
 afl-clang-fast's.
 [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
 ## Non-AFL++ solutions
 There are many binary-only fuzzing frameworks. Some are great for CTFs but don't
 work with large binaries, others are very slow but have good path discovery,
 some are very hard to set-up...
 * Jackalope:
  [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
 * Manticore:
  [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
 * QSYM:
  [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
 * S2E: [https://github.com/S2E](https://github.com/S2E)
 * TinyInst:
  [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst)
  (Mac/Windows only)
 *  ... please send me any missing that are good
 ## Closing words
 That's it! News, corrections, updates? Send an email to vh@thc.org.