Fuzzing binary-only targets
AFL++, libfuzzer, and other fuzzers are great if you have the source code of the target. This allows for very fast and coverage guided fuzzing.
However, if there is only the binary program and no source code available, then standard afl-fuzz -n (non-instrumented mode) is not effective.
For fast, on-the-fly instrumentation of black-box binaries, AFL++ offers several options. The following describes how such binaries can be fuzzed with AFL++.
TL;DR:
FRIDA mode and QEMU mode in persistent mode are the fastest - if persistent mode is possible and the stability is high enough.
Otherwise, try ZAFL, RetroWrite, Dyninst, and if these fail, too, then try standard FRIDA/QEMU mode with AFL_ENTRYPOINT set to where you need it.
If your target is not a Linux binary, then use unicorn_mode.
Fuzzing binary-only targets with AFL++
QEMU mode
QEMU mode is the "native" solution to the problem. It is available in the ./qemu_mode/ directory and, once compiled, it can be accessed by the afl-fuzz -Q command line option. It is the easiest-to-use alternative and even works for cross-platform binaries.
For Linux programs and their libraries, this is accomplished with a version of QEMU running in the lesser-known "user space emulation" mode. QEMU is a project separate from AFL++, but you can conveniently build the feature by doing:
cd qemu_mode
./build_qemu_support.sh
The following setup to use QEMU mode is recommended:
- run 1 afl-fuzz -Q instance with CMPLOG (-c 0 + AFL_COMPCOV_LEVEL=2)
- run 1 afl-fuzz -Q instance with QASAN (AFL_USE_QASAN=1)
- run 1 afl-fuzz -Q instance with LAF (AFL_PRELOAD=libcmpcov.so + AFL_COMPCOV_LEVEL=2); alternatively, you can use FRIDA mode: just switch -Q with -O and remove the LAF instance
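The recommended setup can be sketched as a small launcher script. This is only a minimal sketch, not an official script: the target path ./target, the directories in and out, the instance names, and the @@ file-input placeholder are assumptions, and -M/-S are afl-fuzz's usual main/secondary options for parallel fuzzing. DRY_RUN defaults to 1, so the commands are only printed until you set DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical names: adjust target, seed directory, and output directory.
TARGET=${TARGET:-./target}
IN_DIR=${IN_DIR:-in}
OUT_DIR=${OUT_DIR:-out}

# Print the command when DRY_RUN=1 (the default); otherwise start it in the background.
launch() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@" >/dev/null 2>&1 &
  fi
}

# Instance 1: CMPLOG (-c 0) plus AFL_COMPCOV_LEVEL=2, used here as the main node.
launch env AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -c 0 -M cmplog -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
# Instance 2: QASAN.
launch env AFL_USE_QASAN=1 afl-fuzz -Q -S qasan -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
# Instance 3: LAF via libcmpcov.so (drop this one when switching to FRIDA mode with -O).
launch env AFL_PRELOAD=libcmpcov.so AFL_COMPCOV_LEVEL=2 afl-fuzz -Q -S laf -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
```

Running the script unchanged prints the three command lines so they can be reviewed before anything is started.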
Then run as many instances as you have cores left with either -Q mode or - even better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc. The binary rewriters all have their own advantages and caveats. ZAFL is the best but cannot be used in a business/commercial context.
If a binary rewriter works for your target then you can use afl-fuzz normally and it will have twice the speed compared to QEMU mode (but slower than QEMU persistent mode).
The speed decrease of QEMU mode is about 50%. However, various options exist to increase the speed:
- using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in the binary (+5-10% speed)
- using persistent mode (see qemu_mode/README.persistent.md); this will result in a 150-300% overall speed increase - so 3-8x the original QEMU mode speed!
- using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
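As a rough illustration of the AFL_ENTRYPOINT option, the sketch below looks up a function address in nm output so it can be passed to afl-fuzz. It assumes a non-PIE binary that still has its symbols, so that the addresses printed by nm match the runtime addresses. The function name parse_input, the target path, and the directories are hypothetical.

```shell
# Extract the address of symbol $1 from `nm`-style output on stdin
# (columns: address, type, name) and print it with a 0x prefix.
sym_addr() {
  awk -v sym="$1" '$3 == sym { print "0x" $1; exit }'
}

# Hypothetical usage (not executed here):
#   AFL_ENTRYPOINT=$(nm ./target | sym_addr parse_input) \
#     afl-fuzz -Q -i in -o out -- ./target @@
#
# AFL_CODE_START/AFL_CODE_END are set the same way, e.g. with made-up values:
#   AFL_CODE_START=0x401000 AFL_CODE_END=0x402000
```

For PIE binaries the load address has to be taken into account as well; qemu_mode/README.md covers the details.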
For additional instructions and caveats, see qemu_mode/README.md. If possible, you should use the persistent mode, see qemu_mode/README.persistent.md. The mode is approximately 2-5x slower than compile-time instrumentation, and is less conducive to parallelization.
Note that there is also honggfuzz: https://github.com/google/honggfuzz which now has a QEMU mode, but its performance is just 1.5% ...
If you would like to code a customized fuzzer without much work, we highly recommend checking out our sister project LibAFL, which supports QEMU, too: https://github.com/AFLplusplus/LibAFL
WINE+QEMU
Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs Wine, python3, and the pefile python package installed.
It is included in AFL++.
For more information, see qemu_mode/README.wine.md.
FRIDA mode
In FRIDA mode, you can fuzz binary-only targets as easily as with QEMU mode. FRIDA mode is most of the time slightly faster than QEMU mode. It is also newer, and has the advantage that it works on macOS (both Intel and M1).
To build FRIDA mode:
cd frida_mode
gmake
For additional instructions and caveats, see frida_mode/README.md.
If possible, you should use the persistent mode, see instrumentation/README.persistent_mode.md. The mode is approximately 2-5x slower than compile-time instrumentation, and is less conducive to parallelization. But for binary-only fuzzing, it gives a huge speed improvement if it is possible to use.
You can also perform remote fuzzing with FRIDA, e.g., if you want to fuzz on iPhone or Android devices. For this, you can use https://github.com/ttdennis/fpicker/ as an intermediate that uses AFL++ for fuzzing.
If you would like to code a customized fuzzer without much work, we highly recommend checking out our sister project LibAFL, which supports FRIDA, too: https://github.com/AFLplusplus/LibAFL. Working examples already exist :-)
Nyx mode
Nyx is a full system emulation fuzzing environment with snapshot support that is built upon KVM and QEMU. It is only available on Linux and currently restricted to x86_64.
For binary-only fuzzing a special 5.10 kernel is required.
See nyx_mode/README.md.
Unicorn
Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In contrast to QEMU, Unicorn does not offer a full system or even userland emulation. Runtime environment and/or loaders have to be written from scratch, if needed. On top, block chaining has been removed. This means the speed boost introduced in the patched QEMU Mode of AFL++ cannot be ported over to Unicorn.
For non-Linux binaries, you can use AFL++'s unicorn_mode, which can emulate anything you want - at the price of speed and user-written scripts.
To build unicorn_mode:
cd unicorn_mode
./build_unicorn_support.sh
For further information, check out unicorn_mode/README.md.
Shared libraries
If the goal is to fuzz a dynamic library, then there are two options available.
For both, you need to write a small harness that loads and calls the library.
Then you fuzz this with either FRIDA mode or QEMU mode and either use AFL_INST_LIBS=1 or AFL_QEMU/FRIDA_INST_RANGES.
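Once the harness exists, the choice between the two variables can be sketched as follows. The harness path ./harness, the library name libtarget.so, and the directories are hypothetical. Passing a module name as a range value is an assumption here; check the QEMU and FRIDA mode READMEs for the exact syntax the *_INST_RANGES variables accept.

```shell
# Build the environment assignment for instrumenting library code.
# $1 = mode flag (-Q for QEMU, -O for FRIDA), $2 = optional range/module list.
inst_env() {
  case "$1" in
    -Q) prefix=AFL_QEMU ;;
    -O) prefix=AFL_FRIDA ;;
    *)  echo "unknown mode: $1" >&2; return 1 ;;
  esac
  if [ -n "${2:-}" ]; then
    echo "${prefix}_INST_RANGES=$2"   # instrument only the listed ranges/modules
  else
    echo "AFL_INST_LIBS=1"            # instrument all loaded libraries
  fi
}

# Hypothetical usage (not executed here):
#   env "$(inst_env -Q libtarget.so)" afl-fuzz -Q -i in -o out -- ./harness @@
#   env "$(inst_env -O)"              afl-fuzz -O -i in -o out -- ./harness @@
```

Restricting instrumentation to the library under test keeps coverage feedback focused on the code you care about, while AFL_INST_LIBS=1 is the simpler catch-all.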
Another, less precise and slower option is to fuzz it with utils/afl_untracer/ and use afl-untracer.c as a template. It is slower than FRIDA mode.
For more information, see utils/afl_untracer/README.md.
Coresight
Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a Coresight tracer implementation available in coresight_mode/ which is faster than QEMU but cannot run in parallel. Currently, only one process can be traced; it is WIP.
For more information, see coresight_mode/README.md.
Binary rewriters
Binary rewriters are an alternative solution. They are faster than the solutions native to AFL++ but don't always work.
ZAFL
ZAFL is a static rewriting platform supporting x86-64 C/C++, stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel, context sensitivity, InsTrim, etc.).
Its baseline instrumentation speed typically averages 90-95% of afl-clang-fast's.
https://git.zephyr-software.com/opensrc/zafl
RetroWrite
RetroWrite is a static binary rewriter that can be combined with AFL++. If you have an x86_64 or arm64 binary that does not contain C++ exceptions and - if x86_64 - still has its symbols and was compiled with position independent code (PIC/PIE), then the RetroWrite solution might be for you. It decompiles to ASM files which can then be instrumented with afl-gcc.
Binaries that are statically instrumented for fuzzing using RetroWrite are close in performance to compiler-instrumented binaries and outperform the QEMU-based instrumentation.
https://github.com/HexHive/retrowrite
Dyninst
Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO. However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the target at load time and then lets it run - or saves the binary with the changes. This is great for some things, e.g., fuzzing, and not so effective for others, e.g., malware analysis.
So, what you can do with Dyninst is take every basic block and put AFL++'s instrumentation code in there - and then save the binary. Afterwards, just fuzz the newly saved target binary with afl-fuzz. Sounds great? It is. The issue, though, is that inserting instructions changes addresses in the process space, and it is a non-trivial problem to do this so that everything still works afterwards. Hence, more often than not, binaries crash when they are run.
The speed decrease is about 15-35%, depending on the optimization options used with afl-dyninst.
https://github.com/vanhauser-thc/afl-dyninst
Mcsema
Theoretically, you can also decompile to LLVM IR with McSema, and then use llvm_mode to instrument the binary. Good luck with that.
https://github.com/lifting-bits/mcsema
Binary tracers
Pintool & DynamoRIO
Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for getting basic block information at runtime. Pintool is only available for Intel x32/x64 on Linux, Mac OS, and Windows, whereas DynamoRIO is additionally available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.
The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%.
Hence, DynamoRIO is the option to go for if everything else fails and Pintool only if DynamoRIO fails, too.
DynamoRIO solutions:
- https://github.com/vanhauser-thc/afl-dynamorio
- https://github.com/mxmssh/drAFL
- https://github.com/googleprojectzero/winafl/ <= very good but windows only
Pintool solutions:
- https://github.com/vanhauser-thc/afl-pin
- https://github.com/mothran/aflpin
- https://github.com/spinpx/afl_pin_mode <= only old Pintool version supported
Intel PT
If you have a newer Intel CPU, you can make use of Intel's processor trace. The big issue with Intel's PT is the small buffer size and the complex encoding of the debug information collected through PT. This makes the decoding very CPU intensive and hence slow. As a result, the overall speed decrease is about 70-90% (depending on the implementation and other factors).
There are two AFL intel-pt implementations:
- https://github.com/junxzm1990/afl-pt => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.
- https://github.com/hunter-ht-2018/ptfuzzer => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be used. This one is faster than the other.
Note that there is also honggfuzz: https://github.com/google/honggfuzz. But its IPT performance is just 6%!
Non-AFL++ solutions
There are many binary-only fuzzing frameworks. Some are great for CTFs but don't work with large binaries, others are very slow but have good path discovery, some are very hard to set up...
- Jackalope: https://github.com/googleprojectzero/Jackalope
- Manticore: https://github.com/trailofbits/manticore
- QSYM: https://github.com/sslab-gatech/qsym
- S2E: https://github.com/S2E
- TinyInst: https://github.com/googleprojectzero/TinyInst
- ... please send me any good ones that are missing
Closing words
That's it! News, corrections, updates? Send an email to vh@thc.org.