Edit instrumentation READMEs

llzmb
2021-11-23 21:03:56 +01:00
parent d9ff3745d0
commit 6cce577b90
6 changed files with 343 additions and 314 deletions

@ -1,11 +1,12 @@
# CmpLog instrumentation # CmpLog instrumentation
The CmpLog instrumentation enables logging of comparison operands in a The CmpLog instrumentation enables logging of comparison operands in a shared
shared memory. memory.
These values can be used by various mutators built on top of it. These values can be used by various mutators built on top of it. At the moment,
At the moment we support the RedQueen mutator (input-2-state instructions only), we support the RedQueen mutator (input-2-state instructions only), for details
for details see [the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf). see
[the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf).
## Build ## Build
@ -14,7 +15,8 @@ program.
The first version is built using the regular AFL++ instrumentation. The first version is built using the regular AFL++ instrumentation.
The second one, the CmpLog binary, is built with setting AFL_LLVM_CMPLOG during the compilation. The second one, the CmpLog binary, is built with setting AFL_LLVM_CMPLOG during
the compilation.
For example: For example:
@ -32,8 +34,8 @@ unset AFL_LLVM_CMPLOG
## Use ## Use
AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary (the second AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary
build). (the second build).
For example: For example:
@ -41,4 +43,4 @@ For example:
afl-fuzz -i input -o output -c ./program.cmplog -m none -- ./program.afl @@ afl-fuzz -i input -o output -c ./program.cmplog -m none -- ./program.afl @@
``` ```
Be sure to use `-m none` because CmpLog can map a lot of pages. Be sure to use `-m none` because CmpLog can map a lot of pages.
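
As a rough illustration of where logged comparison operands help, the sketch below shows a target function whose compared value is copied unchanged from the input (the "input-2-state" case mentioned above). The function name `check_magic` and the constant `EXAMPLE_MAGIC` are hypothetical and not part of AFL++.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, chosen only for this illustration. */
#define EXAMPLE_MAGIC 0x21464c41u

/* Returns 1 if the first four input bytes equal EXAMPLE_MAGIC, else 0. */
int check_magic(const unsigned char *buf, size_t len) {
  uint32_t value;

  if (len < sizeof(value)) return 0;

  /* The operand is taken from the input without any transformation. */
  memcpy(&value, buf, sizeof(value));

  /* A 32-bit comparison like this is hard to pass by random mutation;
     with both operands logged, a mutator can place the constant directly. */
  return value == EXAMPLE_MAGIC;
}
```

Built twice as described under Build, a target like this would be passed to `afl-fuzz` once as the regular binary and once via `-c` as the CmpLog binary.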

@ -1,64 +1,68 @@
# GCC-based instrumentation for afl-fuzz # GCC-based instrumentation for afl-fuzz
See [../README.md](../README.md) for the general instruction manual. For the general instruction manual, see [../README.md](../README.md).
See [README.llvm.md](README.llvm.md) for the LLVM-based instrumentation. For the LLVM-based instrumentation, see [README.llvm.md](README.llvm.md).
This document describes how to build and use `afl-gcc-fast` and `afl-g++-fast`, This document describes how to build and use `afl-gcc-fast` and `afl-g++-fast`,
which instrument the target with the help of gcc plugins. which instrument the target with the help of gcc plugins.
TLDR: TL;DR:
* check the version of your gcc compiler: `gcc --version` * Check the version of your gcc compiler: `gcc --version`
* `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc plugins * `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc
* `gcc` and `g++` must match the gcc-VERSION you installed headers for. You can set `AFL_CC`/`AFL_CXX` plugins.
to point to these! * `gcc` and `g++` must match the gcc-VERSION you installed headers for. You can
* `make` set `AFL_CC`/`AFL_CXX` to point to these!
* just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with `afl-clang-fast` * `make`
* Just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with
`afl-clang-fast`.
## 1) Introduction ## 1) Introduction
The code in this directory allows to instrument programs for AFL using The code in this directory allows to instrument programs for AFL++ using true
true compiler-level instrumentation, instead of the more crude compiler-level instrumentation, instead of the more crude assembly-level
assembly-level rewriting approach taken by afl-gcc and afl-clang. This has rewriting approach taken by afl-gcc and afl-clang. This has several interesting
several interesting properties: properties:
- The compiler can make many optimizations that are hard to pull off when - The compiler can make many optimizations that are hard to pull off when
manually inserting assembly. As a result, some slow, CPU-bound programs will manually inserting assembly. As a result, some slow, CPU-bound programs will
run up to around faster. run up to around faster.
The gains are less pronounced for fast binaries, where the speed is limited The gains are less pronounced for fast binaries, where the speed is limited
chiefly by the cost of creating new processes. In such cases, the gain will chiefly by the cost of creating new processes. In such cases, the gain will
probably stay within 10%. probably stay within 10%.
- The instrumentation is CPU-independent. At least in principle, you should - The instrumentation is CPU-independent. At least in principle, you should be
be able to rely on it to fuzz programs on non-x86 architectures (after able to rely on it to fuzz programs on non-x86 architectures (after building
building `afl-fuzz` with `AFL_NOX86=1`). `afl-fuzz` with `AFL_NOX86=1`).
- Because the feature relies on the internals of GCC, it is gcc-specific - Because the feature relies on the internals of GCC, it is gcc-specific and
and will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an alternative). will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an
alternative).
Once this implementation is shown to be sufficiently robust and portable, it Once this implementation is shown to be sufficiently robust and portable, it
will probably replace afl-gcc. For now, it can be built separately and will probably replace afl-gcc. For now, it can be built separately and co-exists
co-exists with the original code. with the original code.
The idea and much of the implementation comes from Laszlo Szekeres. The idea and much of the implementation comes from Laszlo Szekeres.
## 2) How to use ## 2) How to use
In order to leverage this mechanism, you need to have modern enough GCC In order to leverage this mechanism, you need to have modern enough GCC (>=
(>= version 4.5.0) and the plugin development headers installed on your system. That version 4.5.0) and the plugin development headers installed on your system. That
should be all you need. On Debian machines, these headers can be acquired by should be all you need. On Debian machines, these headers can be acquired by
installing the `gcc-VERSION-plugin-dev` packages. installing the `gcc-VERSION-plugin-dev` packages.
To build the instrumentation itself, type `make`. This will generate binaries To build the instrumentation itself, type `make`. This will generate binaries
called `afl-gcc-fast` and `afl-g++-fast` in the parent directory. called `afl-gcc-fast` and `afl-g++-fast` in the parent directory.
The gcc and g++ compiler links have to point to gcc-VERSION - or set these The gcc and g++ compiler links have to point to gcc-VERSION - or set these by
by pointing the environment variables `AFL_CC`/`AFL_CXX` to them. pointing the environment variables `AFL_CC`/`AFL_CXX` to them. If the `CC`/`CXX`
If the `CC`/`CXX` environment variables have been set, those compilers will be environment variables have been set, those compilers will be preferred over
preferred over those from the `AFL_CC`/`AFL_CXX` settings. those from the `AFL_CC`/`AFL_CXX` settings.
Once this is done, you can instrument third-party code in a way similar to the Once this is done, you can instrument third-party code in a way similar to the
standard operating mode of AFL, e.g.: standard operating mode of AFL++, e.g.:
``` ```
CC=/path/to/afl/afl-gcc-fast CC=/path/to/afl/afl-gcc-fast
CXX=/path/to/afl/afl-g++-fast CXX=/path/to/afl/afl-g++-fast
@ -66,15 +70,15 @@ standard operating mode of AFL, e.g.:
./configure [...options...] ./configure [...options...]
make make
``` ```
Note: We also used `CXX` to set the C++ compiler to `afl-g++-fast` for C++ code. Note: We also used `CXX` to set the C++ compiler to `afl-g++-fast` for C++ code.
The tool honors roughly the same environmental variables as `afl-gcc` (see The tool honors roughly the same environmental variables as `afl-gcc` (see
[env_variables.md](../docs/env_variables.md). This includes `AFL_INST_RATIO`, [docs/env_variables.md](../docs/env_variables.md). This includes
`AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. `AFL_INST_RATIO`, `AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`.
Note: if you want the GCC plugin to be installed on your system for all Note: if you want the GCC plugin to be installed on your system for all users,
users, you need to build it before issuing 'make install' in the parent you need to build it before issuing 'make install' in the parent directory.
directory.
## 3) Gotchas, feedback, bugs ## 3) Gotchas, feedback, bugs
@ -83,41 +87,40 @@ reports to afl@aflplus.plus.
## 4) Bonus feature #1: deferred initialization ## 4) Bonus feature #1: deferred initialization
AFL tries to optimize performance by executing the targeted binary just once, AFL++ tries to optimize performance by executing the targeted binary just once,
stopping it just before main(), and then cloning this "main" process to get stopping it just before `main()`, and then cloning this "main" process to get a
a steady supply of targets to fuzz. steady supply of targets to fuzz.
Although this approach eliminates much of the OS-, linker- and libc-level Although this approach eliminates much of the OS-, linker- and libc-level costs
costs of executing the program, it does not always help with binaries that of executing the program, it does not always help with binaries that perform
perform other time-consuming initialization steps - say, parsing a large config other time-consuming initialization steps - say, parsing a large config file
file before getting to the fuzzed data. before getting to the fuzzed data.
In such cases, it's beneficial to initialize the forkserver a bit later, once In such cases, it's beneficial to initialize the forkserver a bit later, once
most of the initialization work is already done, but before the binary attempts most of the initialization work is already done, but before the binary attempts
to read the fuzzed input and parse it; in some cases, this can offer a 10x+ to read the fuzzed input and parse it; in some cases, this can offer a 10x+
performance gain. You can implement delayed initialization in GCC mode in a performance gain. You can implement delayed initialization in GCC mode in a
fairly simple way. fairly simple way:
First, locate a suitable location in the code where the delayed cloning can First, locate a suitable location in the code where the delayed cloning can take
take place. This needs to be done with *extreme* care to avoid breaking the place. This needs to be done with *extreme* care to avoid breaking the binary.
binary. In particular, the program will probably malfunction if you select In particular, the program will probably malfunction if you select a location
a location after: after:
- The creation of any vital threads or child processes - since the forkserver - The creation of any vital threads or child processes - since the forkserver
can't clone them easily. can't clone them easily.
- The initialization of timers via setitimer() or equivalent calls. - The initialization of timers via `setitimer()` or equivalent calls.
- The creation of temporary files, network sockets, offset-sensitive file - The creation of temporary files, network sockets, offset-sensitive file
descriptors, and similar shared-state resources - but only provided that descriptors, and similar shared-state resources - but only provided that their
their state meaningfully influences the behavior of the program later on. state meaningfully influences the behavior of the program later on.
- Any access to the fuzzed input, including reading the metadata about its - Any access to the fuzzed input, including reading the metadata about its size.
size.
With the location selected, add this code in the appropriate spot: With the location selected, add this code in the appropriate spot:
``` ```c
#ifdef __AFL_HAVE_MANUAL_CONTROL #ifdef __AFL_HAVE_MANUAL_CONTROL
__AFL_INIT(); __AFL_INIT();
#endif #endif
@ -131,14 +134,14 @@ Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will
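
A minimal sketch of where the guarded `__AFL_INIT()` call could sit relative to slow start-up work and the first read of the fuzzed input. `expensive_setup`, `run_target`, and the lookup table are hypothetical names used only for illustration; with a compiler that does not define `__AFL_HAVE_MANUAL_CONTROL`, the guarded call simply disappears.

```c
#include <stddef.h>
#include <stdio.h>

static unsigned char lookup[256];

/* Hypothetical stand-in for slow start-up work, e.g. parsing a large config. */
static void expensive_setup(void) {
  for (int i = 0; i < 256; i++) lookup[i] = (unsigned char)(i == 'A');
}

/* Counts the 'A' bytes in the input stream. */
size_t run_target(FILE *in) {
  unsigned char buf[4096];
  size_t len, i, hits = 0;

  expensive_setup(); /* runs before the forkserver starts cloning */

#ifdef __AFL_HAVE_MANUAL_CONTROL
  __AFL_INIT(); /* deferred forkserver start: after setup, before input */
#endif

  /* First access to the fuzzed input happens only after this point. */
  len = fread(buf, 1, sizeof(buf), in);
  for (i = 0; i < len; i++) hits += lookup[buf[i]];
  return hits;
}
```

In a real target the same ordering would apply inside `main()`: no threads, timers, or input reads before the guarded call.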
## 5) Bonus feature #2: persistent mode ## 5) Bonus feature #2: persistent mode
Some libraries provide APIs that are stateless, or whose state can be reset in Some libraries provide APIs that are stateless or whose state can be reset in
between processing different input files. When such a reset is performed, a between processing different input files. When such a reset is performed, a
single long-lived process can be reused to try out multiple test cases, single long-lived process can be reused to try out multiple test cases,
eliminating the need for repeated `fork()` calls and the associated OS overhead. eliminating the need for repeated `fork()` calls and the associated OS overhead.
The basic structure of the program that does this would be: The basic structure of the program that does this would be:
``` ```c
while (__AFL_LOOP(1000)) { while (__AFL_LOOP(1000)) {
/* Read input data. */ /* Read input data. */
@ -147,22 +150,21 @@ The basic structure of the program that does this would be:
} }
/* Exit normally */ /* Exit normally. */
``` ```
The numerical value specified within the loop controls the maximum number The numerical value specified within the loop controls the maximum number of
of iterations before AFL will restart the process from scratch. This minimizes iterations before AFL++ will restart the process from scratch. This minimizes
the impact of memory leaks and similar glitches; 1000 is a good starting point. the impact of memory leaks and similar glitches; 1000 is a good starting point.
A more detailed template is shown in ../utils/persistent_mode/. A more detailed template is shown in ../utils/persistent_mode/. Similarly to the
Similarly to the previous mode, the feature works only with afl-gcc-fast or previous mode, the feature works only with afl-gcc-fast or afl-clang-fast;
afl-clang-fast; #ifdef guards can be used to suppress it when using other #ifdef guards can be used to suppress it when using other compilers.
compilers.
Note that as with the previous mode, the feature is easy to misuse; if you Note that as with the previous mode, the feature is easy to misuse; if you do
do not reset the critical state fully, you may end up with false positives or not reset the critical state fully, you may end up with false positives or waste
waste a whole lot of CPU power doing nothing useful at all. Be particularly a whole lot of CPU power doing nothing useful at all. Be particularly wary of
wary of memory leaks and the state of file descriptors. memory leaks and the state of file descriptors.
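
The loop structure and the state reset described above might look like the following sketch. `FUZZ_LOOP`, `parser_state`, `reset_state`, `parse`, and `fuzz_from_fd` are hypothetical names; the fallback branch is an assumption that lets the same file build with other compilers by running the loop body once.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#ifdef __AFL_HAVE_MANUAL_CONTROL
#define FUZZ_LOOP(n) __AFL_LOOP(n)
#else
/* Fallback for other compilers: run the loop body exactly once. */
static int fuzz_loop_once(void) {
  static int done = 0;
  return done ? 0 : (done = 1);
}
#define FUZZ_LOOP(n) fuzz_loop_once()
#endif

/* Hypothetical state kept by the code under test. */
struct parser_state {
  size_t bytes_seen;
  int    saw_newline;
};

/* Resets everything that one test case may have changed. */
void reset_state(struct parser_state *st) {
  memset(st, 0, sizeof(*st));
}

/* Hypothetical stand-in for the library call being fuzzed. */
void parse(struct parser_state *st, const unsigned char *buf, size_t len) {
  for (size_t i = 0; i < len; i++)
    if (buf[i] == '\n') st->saw_newline = 1;
  st->bytes_seen += len;
}

/* Runs the persistent loop on `fd`; returns the number of iterations. */
int fuzz_from_fd(int fd) {
  static struct parser_state st;
  unsigned char buf[1024];
  int iterations = 0;

  while (FUZZ_LOOP(1000)) {
    reset_state(&st); /* critical state is cleared on every iteration */
    ssize_t len = read(fd, buf, sizeof(buf));
    if (len > 0) parse(&st, buf, (size_t)len);
    iterations++;
  }
  return iterations;
}
```

Resetting at the top of the loop rather than the bottom is a design choice of this sketch: it also covers the case where an earlier iteration left the state half-updated.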
When running in this mode, the execution paths will inherently vary a bit When running in this mode, the execution paths will inherently vary a bit
depending on whether the input loop is being entered for the first time or depending on whether the input loop is being entered for the first time or
@ -171,5 +173,5 @@ executed again. To avoid spurious warnings, the feature implies
## 6) Bonus feature #3: selective instrumentation ## 6) Bonus feature #3: selective instrumentation
It can be more effective to fuzzing to only instrument parts of the code. It can be more effective to fuzzing to only instrument parts of the code. For
For details see [README.instrument_list.md](README.instrument_list.md). details, see [README.instrument_list.md](README.instrument_list.md).

@ -1,80 +1,84 @@
# Using AFL++ with partial instrumentation # Using AFL++ with partial instrumentation
This file describes two different mechanisms to selectively instrument This file describes two different mechanisms to selectively instrument only
only specific parts in the target. specific parts in the target.
Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc. Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc.
## 1) Description and purpose ## 1) Description and purpose
When building and testing complex programs where only a part of the program is When building and testing complex programs where only a part of the program is
the fuzzing target, it often helps to only instrument the necessary parts of the fuzzing target, it often helps to only instrument the necessary parts of the
the program, leaving the rest uninstrumented. This helps to focus the fuzzer program, leaving the rest uninstrumented. This helps to focus the fuzzer on the
on the important parts of the program, avoiding undesired noise and important parts of the program, avoiding undesired noise and disturbance by
disturbance by uninteresting code being exercised. uninteresting code being exercised.
For this purpose, "partial instrumentation" support is provided by AFL++ that For this purpose, "partial instrumentation" support is provided by AFL++ that
allows to specify what should be instrumented and what not. allows to specify what should be instrumented and what not.
Both mechanisms can be used together. Both mechanisms for partial instrumentation can be used together.
## 2) Selective instrumentation with __AFL_COVERAGE_... directives ## 2) Selective instrumentation with __AFL_COVERAGE_... directives
In this mechanism the selective instrumentation is done in the source code. In this mechanism, the selective instrumentation is done in the source code.
After the includes a special define has to be made, eg.: After the includes, a special define has to be made, e.g.:
``` ```
#include <stdio.h> #include <stdio.h>
#include <stdint.h> #include <stdint.h>
// ... // ...
__AFL_COVERAGE(); // <- required for this feature to work __AFL_COVERAGE(); // <- required for this feature to work
``` ```
If you want to disable the coverage at startup until you specify coverage If you want to disable the coverage at startup until you specify coverage should
should be started, then add `__AFL_COVERAGE_START_OFF();` at that position. be started, then add `__AFL_COVERAGE_START_OFF();` at that position.
From here on out you have the following macros available that you can use From here on out, you have the following macros available that you can use in
in any function where you want: any function where you want:
* `__AFL_COVERAGE_ON();` - enable coverage from this point onwards * `__AFL_COVERAGE_ON();` - Enable coverage from this point onwards.
* `__AFL_COVERAGE_OFF();` - disable coverage from this point onwards * `__AFL_COVERAGE_OFF();` - Disable coverage from this point onwards.
* `__AFL_COVERAGE_DISCARD();` - reset all coverage gathered until this point * `__AFL_COVERAGE_DISCARD();` - Reset all coverage gathered until this point.
* `__AFL_COVERAGE_SKIP();` - mark this test case as unimportant. Whatever happens, afl-fuzz will ignore it. * `__AFL_COVERAGE_SKIP();` - Mark this test case as unimportant. Whatever
happens, afl-fuzz will ignore it.
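
A sketch of how the macros listed above could be combined so that only one function contributes coverage. The wrapper macros `COV_ON`/`COV_OFF` and the functions `checksum` and `parse_record` are hypothetical. Detecting the feature with `#ifdef __AFL_COVERAGE_ON` is an assumption of this sketch (it presumes the AFL++ compiler wrappers provide these names as preprocessor macros); it only serves to keep the file buildable with other compilers.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AFL_COVERAGE_ON
__AFL_COVERAGE();           /* required once, after the includes */
__AFL_COVERAGE_START_OFF(); /* coverage stays off until switched on */
#define COV_ON()  __AFL_COVERAGE_ON()
#define COV_OFF() __AFL_COVERAGE_OFF()
#else
/* Assumption: no-op fallbacks for compilers without these macros. */
#define COV_ON()  ((void)0)
#define COV_OFF() ((void)0)
#endif

/* Hypothetical uninteresting helper: runs with coverage switched off. */
uint32_t checksum(const unsigned char *buf, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++) sum += buf[i];
  return sum;
}

/* Hypothetical interesting code: coverage is enabled only around it.
   A record is 'R', a one-byte payload length, then the payload. */
int parse_record(const unsigned char *buf, size_t len) {
  int ok;

  COV_ON();
  ok = len >= 2 && buf[0] == 'R' && buf[1] == (unsigned char)(len - 2);
  COV_OFF();

  return ok;
}
```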
A special function is `__afl_coverage_interesting`. A special function is `__afl_coverage_interesting`. To use this, you must define
To use this, you must define `void __afl_coverage_interesting(u8 val, u32 id);`. `void __afl_coverage_interesting(u8 val, u32 id);`. Then you can use this
Then you can use this function globally, where the `val` parameter can be set function globally, where the `val` parameter can be set by you, the `id`
by you, the `id` parameter is for afl-fuzz and will be overwritten. parameter is for afl-fuzz and will be overwritten. Note that useful parameters
Note that useful parameters for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. A value of, e.g., 33 will be seen
A value of e.g. 33 will be seen as 32 for coverage purposes. as 32 for coverage purposes.
## 3) Selective instrumentation with AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST ## 3) Selective instrumentation with AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST
This feature is equivalent to llvm 12 sancov feature and allows to specify This feature is equivalent to llvm 12 sancov feature and allows to specify on a
on a filename and/or function name level to instrument these or skip them. filename and/or function name level to instrument these or skip them.
### 3a) How to use the partial instrumentation mode ### 3a) How to use the partial instrumentation mode
In order to build with partial instrumentation, you need to build with In order to build with partial instrumentation, you need to build with
afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. The only
The only required change is that you need to set either the environment variable required change is that you need to set either the environment variable
AFL_LLVM_ALLOWLIST or AFL_LLVM_DENYLIST set with a filename. `AFL_LLVM_ALLOWLIST` or `AFL_LLVM_DENYLIST` set with a filename.
That file should contain the file names or functions that are to be instrumented That file should contain the file names or functions that are to be instrumented
(AFL_LLVM_ALLOWLIST) or are specifically NOT to be instrumented (AFL_LLVM_DENYLIST). (`AFL_LLVM_ALLOWLIST`) or are specifically NOT to be instrumented
(`AFL_LLVM_DENYLIST`).
GCC_PLUGIN: you can use either AFL_LLVM_ALLOWLIST or AFL_GCC_ALLOWLIST (or the GCC_PLUGIN: you can use either `AFL_LLVM_ALLOWLIST` or `AFL_GCC_ALLOWLIST` (or
same for _DENYLIST), both work. the same for `_DENYLIST`), both work.
For matching to succeed, the function/file name that is being compiled must end in the For matching to succeed, the function/file name that is being compiled must end
function/file name entry contained in this instrument file list. That is to avoid in the function/file name entry contained in this instrument file list. That is
breaking the match when absolute paths are used during compilation. to avoid breaking the match when absolute paths are used during compilation.
**NOTE:** In builds with optimization enabled, functions might be inlined and would not match! **NOTE:** In builds with optimization enabled, functions might be inlined and
would not match!
For example, if your source tree looks like this:
For example if your source tree looks like this:
``` ```
project/ project/
project/feature_a/a1.cpp project/feature_a/a1.cpp
@ -83,36 +87,45 @@ project/feature_b/b1.cpp
project/feature_b/b2.cpp project/feature_b/b2.cpp
``` ```
and you only want to test feature_a, then create an "instrument file list" file containing: And you only want to test feature_a, then create an "instrument file list" file
containing:
``` ```
feature_a/a1.cpp feature_a/a1.cpp
feature_a/a2.cpp feature_a/a2.cpp
``` ```
However if the "instrument file list" file contains only this, it works as well: However, if the "instrument file list" file contains only this, it works as
well:
``` ```
a1.cpp a1.cpp
a2.cpp a2.cpp
``` ```
but it might lead to files being unwantedly instrumented if the same filename
But it might lead to files being unwantedly instrumented if the same filename
exists somewhere else in the project directories. exists somewhere else in the project directories.
You can also specify function names. Note that for C++ the function names You can also specify function names. Note that for C++ the function names must
must be mangled to match! `nm` can print these names. be mangled to match! `nm` can print these names.
AFL++ is able to identify whether an entry is a filename or a function. However,
if you want to be sure (and compliant to the sancov allow/blocklist format), you
can specify source file entries like this:
AFL++ is able to identify whether an entry is a filename or a function.
However if you want to be sure (and compliant to the sancov allow/blocklist
format), you can specify source file entries like this:
``` ```
src: *malloc.c src: *malloc.c
``` ```
and function entries like this:
And function entries like this:
``` ```
fun: MallocFoo fun: MallocFoo
``` ```
Note that whitespace is ignored and comments (`# foo`) are supported. Note that whitespace is ignored and comments (`# foo`) are supported.
### 3b) UNIX-style pattern matching ### 3b) UNIX-style pattern matching
You can add UNIX-style pattern matching in the "instrument file list" entries. You can add UNIX-style pattern matching in the "instrument file list" entries.
See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags. See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags.
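
The matching rule described in 3a) and 3b) (the compiled name must end in the list entry, entries may contain `fnmatch` patterns, no flags set) can be approximated as below. `entry_matches` is a hypothetical helper, and emulating the end-of-name rule by prefixing `*` to the entry is an assumption of this sketch, not necessarily how AFL++ implements it.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Returns 1 if `path` ends in `entry`, where `entry` may itself contain
   fnmatch wildcards; returns 0 otherwise or if the pattern is too long. */
int entry_matches(const char *entry, const char *path) {
  char pattern[4096];
  int  n = snprintf(pattern, sizeof(pattern), "*%s", entry);

  if (n < 0 || (size_t)n >= sizeof(pattern)) return 0;

  /* Flags are 0, so '*' may also match '/' characters. */
  return fnmatch(pattern, path, 0) == 0;
}
```

With this rule, the entry `feature_a/a1.cpp` matches an absolute path such as `/src/project/feature_a/a1.cpp`, while the bare entry `a1.cpp` would also match a hypothetical `/src/project/feature_b/a1.cpp`, which is the unwanted-instrumentation case mentioned in 3a).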

@ -2,19 +2,17 @@
## Introduction ## Introduction
This originally is the work of an individual nicknamed laf-intel. This originally is the work of an individual nicknamed laf-intel. His blog
His blog [Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/) [Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/)
and gitlab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/) and GitLab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/)
describe some code transformations that describe some code transformations that help AFL++ to enter conditional blocks,
help AFL++ to enter conditional blocks, where conditions consist of where conditions consist of comparisons of large values.
comparisons of large values.
## Usage ## Usage
By default these passes will not run when you compile programs using By default, these passes will not run when you compile programs using
afl-clang-fast. Hence, you can use AFL as usual. afl-clang-fast. Hence, you can use AFL++ as usual. To enable the passes, you
To enable the passes you must set environment variables before you must set environment variables before you compile the target project.
compile the target project.
The following options exist: The following options exist:
@ -24,32 +22,30 @@ Enables the split-switches pass.
`export AFL_LLVM_LAF_TRANSFORM_COMPARES=1` `export AFL_LLVM_LAF_TRANSFORM_COMPARES=1`
Enables the transform-compares pass (strcmp, memcmp, strncmp, Enables the transform-compares pass (strcmp, memcmp, strncmp, strcasecmp,
strcasecmp, strncasecmp). strncasecmp).
`export AFL_LLVM_LAF_SPLIT_COMPARES=1` `export AFL_LLVM_LAF_SPLIT_COMPARES=1`
Enables the split-compares pass. Enables the split-compares pass. By default, it will
By default it will
1. simplify operators >= (and <=) into chains of > (<) and == comparisons 1. simplify operators >= (and <=) into chains of > (<) and == comparisons
2. change signed integer comparisons to a chain of sign-only comparison 2. change signed integer comparisons to a chain of sign-only comparison and
and unsigned integer comparisons unsigned integer comparisons
3. split all unsigned integer comparisons with bit widths of 3. split all unsigned integer comparisons with bit widths of 64, 32, or 16 bits
64, 32 or 16 bits to chains of 8 bits comparisons. to chains of 8 bits comparisons.
You can change the behaviour of the last step by setting You can change the behavior of the last step by setting `export
`export AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where bit_width may be 64, 32, or
bit_width may be 64, 32 or 16. For example, a bit_width of 16 16. For example, a bit_width of 16 would split larger comparisons down to 16 bit
would split larger comparisons down to 16 bit comparisons. comparisons.
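
To make the effect of the split-compares steps more concrete, the following sketch writes out by hand what a 32-bit comparison looks like once it is expressed as a chain of 8-bit comparisons. The real pass operates on LLVM IR; the function names and the most-significant-byte-first order are assumptions made for this illustration.

```c
#include <stdint.h>

/* a == b as four 8-bit comparisons, one branch per byte. */
int eq32_split(uint32_t a, uint32_t b) {
  if ((uint8_t)(a >> 24) != (uint8_t)(b >> 24)) return 0;
  if ((uint8_t)(a >> 16) != (uint8_t)(b >> 16)) return 0;
  if ((uint8_t)(a >> 8) != (uint8_t)(b >> 8)) return 0;
  return (uint8_t)a == (uint8_t)b;
}

/* a > b (unsigned) as a chain of 8-bit comparisons, high byte first. */
int gt32_split(uint32_t a, uint32_t b) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    uint8_t x = (uint8_t)(a >> shift);
    uint8_t y = (uint8_t)(b >> shift);
    if (x > y) return 1;  /* decided by this byte */
    if (x != y) return 0; /* this byte is smaller, so a < b */
  }
  return 0; /* all bytes equal */
}

/* Step 1 from the list above: a >= b becomes (a > b) or (a == b). */
int ge32_split(uint32_t a, uint32_t b) {
  return gt32_split(a, b) || eq32_split(a, b);
}
```

Each byte-wise branch gives the fuzzer its own coverage feedback, so a matching value can be found one byte at a time instead of all 32 bits at once.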
A new experimental feature is splitting floating point comparisons into a A new experimental feature is splitting floating point comparisons into a series
series of sign, exponent and mantissa comparisons followed by splitting each of sign, exponent and mantissa comparisons followed by splitting each of them
of them into 8 bit comparisons when necessary. into 8 bit comparisons when necessary. It is activated with the
It is activated with the `AFL_LLVM_LAF_SPLIT_FLOATS` setting. `AFL_LLVM_LAF_SPLIT_FLOATS` setting. Please note that full IEEE 754
Please note that full IEEE 754 functionality is not preserved, that is functionality is not preserved, that is values of nan and infinity will probably
values of nan and infinity will probably behave differently. behave differently.
Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES` Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES`.
You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled :-)
You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled. :-)

@ -1,55 +1,56 @@
# afl-clang-lto - collision free instrumentation at link time # afl-clang-lto - collision free instrumentation at link time
## TLDR; ## TL;DR:
This version requires a current llvm 11+ compiled from the github master. This version requires a current llvm 11+ compiled from the GitHub master.
1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better 1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better
coverage than anything else that is out there in the AFL world coverage than anything else that is out there in the AFL world.
2. You can use it together with llvm_mode: laf-intel and the instrument file listing 2. You can use it together with llvm_mode: laf-intel and the instrument file
features and can be combined with cmplog/Redqueen listing features and can be combined with cmplog/Redqueen.
3. It only works with llvm 11+ 3. It only works with llvm 11+.
4. AUTODICTIONARY feature! see below 4. AUTODICTIONARY feature (see below)!
5. If any problems arise be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. 5. If any problems arise, be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. Some
Some targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`. targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`.
## Introduction and problem description ## Introduction and problem description
A big issue with how AFL/AFL++ works is that the basic block IDs that are A big issue with how AFL++ works is that the basic block IDs that are set during
set during compilation are random - and hence naturally the larger the number compilation are random - and hence naturally the larger the number of
of instrumented locations, the higher the number of edge collisions are in the instrumented locations, the higher the number of edge collisions are in the map.
map. This can result in not discovering new paths and therefore degrade the This can result in not discovering new paths and therefore degrade the
efficiency of the fuzzing process. efficiency of the fuzzing process.
*This issue is underestimated in the fuzzing community!* *This issue is underestimated in the fuzzing community!* With a 2^16 = 64kb
With a 2^16 = 64kb standard map at already 256 instrumented blocks there is standard map at already 256 instrumented blocks, there is on average one
on average one collision. On average a target has 10.000 to 50.000 collision. On average, a target has 10.000 to 50.000 instrumented blocks, hence
instrumented blocks hence the real collisions are between 750-18.000! the real collisions are between 750-18.000!
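
The collision estimate can be reproduced approximately with a simple occupancy model. The sketch assumes every instrumented location is assigned an independent, uniformly random slot in the map, which is a simplification; `expected_collisions` is a hypothetical helper, and its results are only of the same order of magnitude as the figures quoted above, not identical to them.

```c
/* Expected number of items that land in an already occupied slot when
   `items` random IDs are placed into `slots` map entries.
   Occupied slots on average: slots * (1 - (1 - 1/slots)^items). */
double expected_collisions(unsigned long items, unsigned long slots) {
  double keep = 1.0 - 1.0 / (double)slots; /* chance one item misses a slot */
  double empty_prob = 1.0;                 /* chance a slot stays empty */

  for (unsigned long i = 0; i < items; i++) empty_prob *= keep;

  return (double)items - (double)slots * (1.0 - empty_prob);
}
```

Calling it with `slots = 65536` and `items` set to 10000 or 50000 shows how quickly collisions grow as the number of instrumented locations approaches the map size.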
To reach a solution that prevents any collisions took several approaches To reach a solution that prevents any collisions took several approaches and
and many dead ends until we got to this: many dead ends until we got to this:
* We instrument at link time when we have all files pre-compiled * We instrument at link time when we have all files pre-compiled.
* To instrument at link time we compile in LTO (link time optimization) mode * To instrument at link time, we compile in LTO (link time optimization) mode.
* Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the * Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the correct
correct LTO options and runs our own afl-ld linker instead of the system LTO options and runs our own afl-ld linker instead of the system linker.
linker * The LLVM linker collects all LTO files to link and instruments them so that we
* The LLVM linker collects all LTO files to link and instruments them so that have non-colliding edge coverage.
we have non-colliding edge coverage * We use a new (for afl) edge coverage - which is the same as in llvm
* We use a new (for afl) edge coverage - which is the same as in llvm -fsanitize=coverage edge coverage mode. :)
-fsanitize=coverage edge coverage mode :)
The result: The result:
* 10-25% speed gain compared to llvm_mode
* guaranteed non-colliding edge coverage :-) * 10-25% speed gain compared to llvm_mode
* The compile time especially for binaries to an instrumented library can be * guaranteed non-colliding edge coverage :-)
much longer * The compile time, especially for binaries to an instrumented library, can be
much longer.
Example build output from a libtiff build: Example build output from a libtiff build:
``` ```
libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm
afl-clang-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> in mode LTO afl-clang-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> in mode LTO
@ -62,21 +63,24 @@ AUTODICTIONARY: 11 strings found
### Installing llvm version 11 or 12 ### Installing llvm version 11 or 12
llvm 11 or even 12 should be available in all current Linux repositories. llvm 11 or even 12 should be available in all current Linux repositories. If you
If you use an outdated Linux distribution read the next section. use an outdated Linux distribution, read the next section.
### Installing llvm from the llvm repository (version 12+) ### Installing llvm from the llvm repository (version 12+)
Installing the llvm snapshot builds is easy and mostly painless: Installing the llvm snapshot builds is easy and mostly painless:
In the follow line change `NAME` for your Debian or Ubuntu release name In the following line, change `NAME` for your Debian or Ubuntu release name
(e.g. buster, focal, eon, etc.): (e.g. buster, focal, eon, etc.):
``` ```
echo deb http://apt.llvm.org/NAME/ llvm-toolchain-NAME NAME >> /etc/apt/sources.list echo deb http://apt.llvm.org/NAME/ llvm-toolchain-NAME NAME >> /etc/apt/sources.list
``` ```
then add the pgp key of llvm and install the packages:
Then add the pgp key of llvm and install the packages:
``` ```
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
apt-get update && apt-get upgrade -y apt-get update && apt-get upgrade -y
apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \ apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \
libc++abi1-12 libc++abi-12-dev libclang1-12 libclang-12-dev \ libc++abi1-12 libc++abi-12-dev libclang1-12 libclang-12-dev \
@ -87,7 +91,8 @@ apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \
### Building llvm yourself (version 12+) ### Building llvm yourself (version 12+)
Building llvm from github takes quite some long time and is not painless: Building llvm from GitHub takes quite some time and is not painless:
```sh ```sh
sudo apt install binutils-dev # this is *essential*! sudo apt install binutils-dev # this is *essential*!
git clone --depth=1 https://github.com/llvm/llvm-project git clone --depth=1 https://github.com/llvm/llvm-project
@ -126,10 +131,12 @@ sudo make install
Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc. Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc.
Also the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -> [README.instrument_list.md](README.instrument_list.md)) and Also, the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST ->
laf-intel/compcov (AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. [README.instrument_list.md](README.instrument_list.md)) and laf-intel/compcov
(AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work.
Example: Example:
``` ```
CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure
make make
@ -143,51 +150,48 @@ NOTE: some targets also need to set the linker, try both `afl-clang-lto` and
Note: this is highly discouraged! Try to compile to static libraries with Note: this is highly discouraged! Try to compile to static libraries with
afl-clang-lto instead of shared libraries! afl-clang-lto instead of shared libraries!
To make instrumented shared libraries work with afl-clang-lto you have to do To make instrumented shared libraries work with afl-clang-lto, you have to do
quite some extra steps. quite some extra steps.
Every shared library you want to instrument has to be individually compiled. Every shared library you want to instrument has to be individually compiled. The
The environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during
compilation. compilation. Additionally, the environment variable `AFL_LLVM_LTO_STARTID` has
Additionally the environment variable `AFL_LLVM_LTO_STARTID` has to be set to to be set to the added edge count values of all previous compiled instrumented
the added edge count values of all previous compiled instrumented shared shared libraries for that target. E.g., for the first shared library this would
libraries for that target. be `AFL_LLVM_LTO_STARTID=0` and afl-clang-lto will then report how many edges
E.g. for the first shared library this would be `AFL_LLVM_LTO_STARTID=0` and have been instrumented (let's say it reported 1000 instrumented edges). The
afl-clang-lto will then report how many edges have been instrumented (let's say second shared library then has to be set to that value
it reported 1000 instrumented edges).
The second shared library then has to be set to that value
(`AFL_LLVM_LTO_STARTID=1000` in our example), for the third to all previous (`AFL_LLVM_LTO_STARTID=1000` in our example), for the third to all previous
counts added, etc. counts added, etc.
The final program compilation step then may *not* have `AFL_LLVM_LTO_DONTWRITEID` The final program compilation step then may *not* have
set, and `AFL_LLVM_LTO_STARTID` must be set to all edge counts added of all shared `AFL_LLVM_LTO_DONTWRITEID` set, and `AFL_LLVM_LTO_STARTID` must be set to all
libraries it will be linked to. edge counts added of all shared libraries it will be linked to.
This is quite some hands-on work, so better stay away from instrumenting This is quite some hands-on work, so better stay away from instrumenting shared
shared libraries :-) libraries. :-)
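
The start ID bookkeeping described above is a running sum, which can be sketched as follows. The library names and the second edge count are hypothetical; only the first count of 1000 is taken from the example in the text. The loop prints the environment settings to use for each step instead of invoking the compiler, since the edge counts are only known after afl-clang-lto has reported them.

```sh
# Hypothetical "name:edge_count" pairs, in the order the libraries are built.
# The counts are the values afl-clang-lto reported for each library.
libraries="libfoo:1000 libbar:2500"

start_id=0
for entry in $libraries; do
  name=${entry%%:*}
  edges=${entry##*:}
  # Each shared library is compiled with DONTWRITEID set and the sum of all
  # previously reported edge counts as its start ID.
  echo "$name: AFL_LLVM_LTO_DONTWRITEID=1 AFL_LLVM_LTO_STARTID=$start_id"
  start_id=$((start_id + edges))
done

# The final program is compiled without AFL_LLVM_LTO_DONTWRITEID and with the
# total of all library edge counts as its start ID.
echo "final program: AFL_LLVM_LTO_STARTID=$start_id"
```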
## AUTODICTIONARY feature ## AUTODICTIONARY feature
While compiling, a dictionary based on string comparisons is automatically While compiling, a dictionary based on string comparisons is automatically
generated and put into the target binary. This dictionary is transfered to afl-fuzz generated and put into the target binary. This dictionary is transferred to
on start. This improves coverage statistically by 5-10% :) afl-fuzz on start. This improves coverage statistically by 5-10%. :)
Note that if for any reason you do not want to use the autodictionary feature Note that if for any reason you do not want to use the autodictionary feature,
then just set the environment variable `AFL_NO_AUTODICT` when starting afl-fuzz. then just set the environment variable `AFL_NO_AUTODICT` when starting afl-fuzz.
## Fixed memory map ## Fixed memory map
To speed up fuzzing a little bit more, it is possible to set a fixed shared To speed up fuzzing a little bit more, it is possible to set a fixed shared
memory map. memory map. Recommended is the value 0x10000.
Recommended is the value 0x10000.
In most cases this will work without any problems. However if a target uses In most cases, this will work without any problems. However, if a target uses
early constructors, ifuncs or a deferred forkserver this can crash the target. early constructors, ifuncs, or a deferred forkserver, this can crash the target.
Also on unusual operating systems/processors/kernels or weird libraries the Also, on unusual operating systems/processors/kernels or weird libraries the
recommended 0x10000 address might not work, so then change the fixed address. recommended 0x10000 address might not work, so then change the fixed address.
To enable this feature set AFL_LLVM_MAP_ADDR with the address. To enable this feature, set `AFL_LLVM_MAP_ADDR` with the address.
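
A minimal sketch of such a build, assuming the variable is read by afl-clang-lto when the target is compiled. The file names `target.c` and `target` are placeholders, and the guard only exists so that the snippet does nothing harmful when the compiler or the source file is missing.

```sh
# 0x10000 is the address recommended above; change it if the target crashes.
export AFL_LLVM_MAP_ADDR=0x10000

# Placeholder file names, not part of AFL++.
if command -v afl-clang-lto >/dev/null 2>&1 && [ -f target.c ]; then
  afl-clang-lto -o target target.c
else
  echo "afl-clang-lto or target.c not found, nothing was built"
fi
```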
## Document edge IDs ## Document edge IDs
@ -206,143 +210,155 @@ these.
An example of a hard to solve target is ffmpeg. Here is how to successfully An example of a hard to solve target is ffmpeg. Here is how to successfully
instrument it: instrument it:
1. Get and extract the current ffmpeg and change to its directory 1. Get and extract the current ffmpeg and change to its directory.
2. Running configure with --cc=clang fails and various other items will fail 2. Running configure with --cc=clang fails and various other items will fail
when compiling, so we have to trick configure: when compiling, so we have to trick configure:
``` ```
./configure --enable-lto --disable-shared --disable-inline-asm ./configure --enable-lto --disable-shared --disable-inline-asm
``` ```
3. Now the configuration is done - and we edit the settings in `./ffbuild/config.mak` 3. Now the configuration is done - and we edit the settings in
(-: the original line, +: what to change it into): `./ffbuild/config.mak` (-: the original line, +: what to change it into):
```
-CC=gcc
+CC=afl-clang-lto
-CXX=g++
+CXX=afl-clang-lto++
-AS=gcc
+AS=llvm-as
-LD=gcc
+LD=afl-clang-lto++
-DEPCC=gcc
+DEPCC=afl-clang-lto
-DEPAS=gcc
+DEPAS=afl-clang-lto++
-AR=ar
+AR=llvm-ar
-AR_CMD=ar
+AR_CMD=llvm-ar
-NM_CMD=nm -g
+NM_CMD=llvm-nm -g
-RANLIB=ranlib -D
+RANLIB=llvm-ranlib -D
```
4. Then type make, wait for a long time and you are done :) ```
-CC=gcc
+CC=afl-clang-lto
-CXX=g++
+CXX=afl-clang-lto++
-AS=gcc
+AS=llvm-as
-LD=gcc
+LD=afl-clang-lto++
-DEPCC=gcc
+DEPCC=afl-clang-lto
-DEPAS=gcc
+DEPAS=afl-clang-lto++
-AR=ar
+AR=llvm-ar
-AR_CMD=ar
+AR_CMD=llvm-ar
-NM_CMD=nm -g
+NM_CMD=llvm-nm -g
-RANLIB=ranlib -D
+RANLIB=llvm-ranlib -D
```
4. Then type make, wait for a long time, and you are done. :)
### Example: WebKit jsc ### Example: WebKit jsc
Building jsc is difficult as the build script has bugs. Building jsc is difficult as the build script has bugs.
1. checkout Webkit: 1. Checkout Webkit:
```
svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit ```
cd WebKit svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit
``` cd WebKit
```
2. Fix the build environment: 2. Fix the build environment:
```
mkdir -p WebKitBuild/Release
cd WebKitBuild/Release
ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12
ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12
cd ../..
```
3. Build :) ```
mkdir -p WebKitBuild/Release
cd WebKitBuild/Release
ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12
ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12
cd ../..
```
``` 3. Build. :)
Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON"
``` ```
Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON"
```
## Potential issues ## Potential issues
### compiling libraries fails ### Compiling libraries fails
If you see this message: If you see this message:
``` ```
/bin/ld: libfoo.a: error adding symbols: archive has no index; run ranlib to add one /bin/ld: libfoo.a: error adding symbols: archive has no index; run ranlib to add one
``` ```
This is because usually gnu gcc ranlib is being called which cannot deal with clang LTO files.
The solution is simple: when you ./configure you also have to set RANLIB=llvm-ranlib and AR=llvm-ar This is because usually gnu gcc ranlib is being called which cannot deal with
clang LTO files. The solution is simple: when you `./configure`, you also have
to set `RANLIB=llvm-ranlib` and `AR=llvm-ar`.
Solution: Solution:
``` ```
AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared
``` ```
and on some targets you have to set AR=/RANLIB= even for make as the configure script does not save it.
Other targets ignore environment variables and need the parameters set via
`./configure --cc=... --cxx= --ranlib= ...` etc. (I am looking at you ffmpeg!).
And on some targets you have to set `AR=/RANLIB=` even for `make` as the
configure script does not save it. Other targets ignore environment variables
and need the parameters set via `./configure --cc=... --cxx= --ranlib= ...` etc.
(I am looking at you ffmpeg!)
If you see this message:
If you see this message
``` ```
assembler command failed ... assembler command failed ...
``` ```
then try setting `llvm-as` for configure:
Then try setting `llvm-as` for configure:
``` ```
AS=llvm-as ... AS=llvm-as ...
``` ```
### compiling programs still fail ### Compiling programs still fail
afl-clang-lto is still a work in progress. afl-clang-lto is still a work in progress.
Known issues: Known issues:
* Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either - obviously * Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either -
* Anything that does not compile with LTO, afl-clang-lto cannot compile either - obviously obviously.
* Anything that does not compile with LTO, afl-clang-lto cannot compile either -
obviously.
Hence if building a target with afl-clang-lto fails try to build it with llvm12 Hence, if building a target with afl-clang-lto fails, try to build it with
and LTO enabled (`CC=clang-12` `CXX=clang++-12` `CFLAGS=-flto=full` and llvm12 and LTO enabled (`CC=clang-12`, `CXX=clang++-12`, `CFLAGS=-flto=full`,
`CXXFLAGS=-flto=full`). and `CXXFLAGS=-flto=full`).
If this succeeeds then there is an issue with afl-clang-lto. Please report at If this succeeds, then there is an issue with afl-clang-lto. Please report at
[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226) [https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226).
Even some targets where clang-12 fails can be built if the failure is just in Even some targets where clang-12 fails can be built if the failure is just in
`./configure`, see `Solving difficult targets` above. `./configure`, see `Solving difficult targets` above.
## History ## History
This was originally envisioned by hexcoder- in Summer 2019, however we saw no This was originally envisioned by hexcoder- in Summer 2019. However, we saw no
way to create a pass that is run at link time - although there is an option way to create a pass that is run at link time - although there is an option for
for this in the PassManager: EP_FullLinkTimeOptimizationLast this in the PassManager: EP_FullLinkTimeOptimizationLast. ("Fun" info - nobody
("Fun" info - nobody knows what this is doing. And the developer who knows what this is doing. And the developer who implemented this didn't respond
implemented this didn't respond to emails.) to emails.)
In December then came the idea to implement this as a pass that is run via In December then came the idea to implement this as a pass that is run via the
the llvm "opt" program, which is performed via an own linker that afterwards llvm "opt" program, which is performed via an own linker that afterwards calls
calls the real linker. the real linker. This was first implemented in January and worked ... kinda. The
This was first implemented in January and worked ... kinda. LTO time instrumentation worked, however, "how" the basic blocks were
The LTO time instrumentation worked, however "how" the basic blocks were instrumented was a problem, as reducing duplicates turned out to be very, very
instrumented was a problem, as reducing duplicates turned out to be very, difficult with a program that has so many paths and therefore so many
very difficult with a program that has so many paths and therefore so many dependencies. A lot of strategies were implemented - and failed. And then sat
dependencies. A lot of strategies were implemented - and failed. solvers were tried, but with over 10.000 variables that turned out to be a
And then sat solvers were tried, but with over 10.000 variables that turned dead-end too.
out to be a dead-end too.
The final idea to solve this came from domenukk who proposed to insert a block The final idea to solve this came from domenukk who proposed to insert a block
into an edge and then just use incremental counters ... and this worked! into an edge and then just use incremental counters ... and this worked! After
After some trials and errors to implement this vanhauser-thc found out that some trials and errors to implement this vanhauser-thc found out that there is
there is actually an llvm function for this: SplitEdge() :-) actually an llvm function for this: SplitEdge() :-)
Still more problems came up though as this only works without bugs from Still more problems came up though as this only works without bugs from llvm 9
llvm 9 onwards, and with high optimization the link optimization ruins onwards, and with high optimization the link optimization ruins the instrumented
the instrumented control flow graph. control flow graph.
This is all now fixed with llvm 11+. The llvm's own linker is now able to This is all now fixed with llvm 11+. The llvm's own linker is now able to load
load passes and this bypasses all problems we had. passes and this bypasses all problems we had.
Happy end :) Happy end :)

@ -132,7 +132,7 @@ and you should be all set!
Some libraries provide APIs that are stateless, or whose state can be reset in Some libraries provide APIs that are stateless, or whose state can be reset in
between processing different input files. When such a reset is performed, a between processing different input files. When such a reset is performed, a
single long-lived process can be reused to try out multiple test cases, single long-lived process can be reused to try out multiple test cases,
eliminating the need for repeated fork() calls and the associated OS overhead. eliminating the need for repeated `fork()` calls and the associated OS overhead.
The basic structure of the program that does this would be: The basic structure of the program that does this would be: