Fix various missed issues - 1st run

This commit is contained in:
llzmb
2021-12-05 19:04:45 +01:00
parent 139db40cfc
commit bb506de0b8
8 changed files with 159 additions and 114 deletions

View File

@ -1,17 +1,17 @@
# GramaTron
GramaTron is a coverage-guided fuzzer that uses grammar automatons to perform
grammar-aware fuzzing. Technical details about our framework are available in
the [ISSTA'21 paper](https://nebelwelt.net/files/21ISSTA.pdf). The artifact to
reproduce the experiments presented in the paper is available in `artifact/`.
Instructions to run a sample campaign and incorporate new grammars are
presented below:
## Compiling
Execute `./build_gramatron_mutator.sh`.
## Running
You have to set the grammar file to use with `GRAMATRON_AUTOMATION`:
@ -23,26 +23,27 @@ export GRAMATRON_AUTOMATION=grammars/ruby/source_automata.json
afl-fuzz -i in -o out -- ./target
```
## Adding and testing a new grammar
- Specify the grammar in a JSON format for the CFG. Examples are the
  corresponding `source.json` files.
- Run the automaton generation script (in `src/gramfuzz-mutator/preprocess`)
which will place the generated automaton in the same folder.
  ```
  ./preprocess/prep_automaton.sh <grammar_file> <start_symbol> [stack_limit]
  E.g., ./preprocess/prep_automaton.sh ~/grammars/ruby/source.json PROGRAM
  ```
- If the grammar has no self-embedding rules, then you do not need to pass the
  stack limit parameter. However, if it does have self-embedding rules, then you
  need to pass the stack limit parameter. We recommend starting with `5` and
  then increasing it if you need more complexity.
- To sanity-check that the automaton is generating inputs as expected, you can
  use the `test` binary housed in `src/gramfuzz-mutator`.
  ```
  ./test SanityCheck <automaton_file>
  E.g., ./test SanityCheck ~/grammars/ruby/source_automata.json
  ```
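As a rough illustration of the steps above, a context-free grammar is written as a JSON object mapping nonterminals to lists of productions. The toy grammar below is a hypothetical sketch only — the exact key and terminal conventions are defined by the bundled `source.json` examples, which you should mirror for real use:

```shell
# Hypothetical toy grammar; mirror the shipped source.json files for the
# real GramaTron conventions.
cat > toy_grammar.json <<'EOF'
{
    "PROGRAM": ["EXPR"],
    "EXPR": ["TERM + EXPR", "TERM"],
    "TERM": ["1", "x"]
}
EOF
# Sanity-check that the file is well-formed JSON before feeding it to the
# preprocessing script.
python3 -c "import json; json.load(open('toy_grammar.json'))" && echo "valid JSON"
```

With a real grammar in this place, `./preprocess/prep_automaton.sh toy_grammar.json PROGRAM` would then emit the automaton next to the grammar file.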

View File

@ -2,17 +2,17 @@
(See [../README.md](../README.md) for the general instruction manual.)
This subdirectory contains a set of dictionaries that can be used in conjunction
with the -x option to allow the fuzzer to effortlessly explore the grammar of
some of the more verbose data formats or languages.
These sets were done by Michal Zalewski, various contributors, and imported from
oss-fuzz, go-fuzz and libfuzzer.
Custom dictionaries can be added at will. They should consist of a
reasonably-sized set of rudimentary syntax units that the fuzzer will then try
to clobber together in various ways. Snippets between 2 and 16 bytes are usually
the sweet spot.
Custom dictionaries can be created in two ways:
@ -34,9 +34,9 @@ In the file mode, every name field can be optionally followed by @<num>, e.g.:
`keyword_foo@1 = "foo"`
Such entries will be loaded only if the requested dictionary level is equal to
or higher than this number. The default level is zero; a higher value can be set
by appending @<num> to the dictionary file name, like so:
`-x path/to/dictionary.dct@2`
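To make the level mechanism concrete, here is a small hypothetical dictionary whose entries are split across levels; the tokens themselves are made up, but the `name@<num>="value"` syntax is the one described above:

```shell
# Create a hypothetical dictionary; entries without @<num> are level 0 and
# always loaded, higher-level entries only load at that level or above.
cat > sample.dict <<'EOF'
header_magic="\x89PNG"
keyword_if="if"
keyword_while@1="while"
rare_token@2="__builtin_expect"
EOF
```

Passing `-x sample.dict` would load only the two level-0 entries, while `-x sample.dict@2` would load all four.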
Good examples of dictionaries can be found in xml.dict and png.dict.

View File

@ -468,7 +468,8 @@ cd ../../
sudo make install
```
To learn more about remote monitoring and metrics visualization with StatsD, see
[rpc_statsd.md](rpc_statsd.md).
### Addendum: status and plot files
@ -524,9 +525,9 @@ into each of them or deploy scripts to read the fuzzer statistics. Using
`AFL_STATSD` (and the other related environment variables `AFL_STATSD_HOST`,
`AFL_STATSD_PORT`, `AFL_STATSD_TAGS_FLAVOR`) you can automatically send metrics
to your favorite StatsD server. Depending on your StatsD server, you will be
able to monitor, trigger alerts, or perform actions based on these metrics
(e.g.: alert on slow exec/s for a new build, threshold of crashes, time since
last crash > X, etc.).
The selected metrics are a subset of all the metrics found in the status and in
the plot file. The list is the following: `cycle_done`, `cycles_wo_finds`,
@ -537,6 +538,6 @@ the plot file. The list is the following: `cycle_done`, `cycles_wo_finds`,
definitions can be found in the addendum above.
When using multiple fuzzer instances with StatsD, it is *strongly* recommended
to set up the flavor (`AFL_STATSD_TAGS_FLAVOR`) to match your StatsD server.
This
will allow you to see individual fuzzer performance, detect bad ones, see the
progress of each strategy...
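For intuition about what actually travels over the network: StatsD metrics are plain-text UDP datagrams of the form `<name>:<value>|<type>`. A gauge update for the metrics listed above might look roughly like the lines below; the `fuzzing.` prefix and tag syntax are illustrative assumptions — check what your server actually receives:

```
fuzzing.cycle_done:1|g
fuzzing.cycles_wo_finds:0|g
```

Tagged flavors such as DogStatsD append tags to each datagram (e.g. `...|g|#banner:main`), which is why the configured flavor has to match what the receiving server expects to parse.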

View File

@ -276,7 +276,7 @@ gcc -shared -Wall -O3 example.c -o example.so
```
Note that if you specify multiple custom mutators, the corresponding functions
will be called in the order in which they are specified. E.g., the first
`post_process` function of `example_first.so` will be called and then that of
`example_second.so`.
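As a sketch of how the ordering comes into play: AFL++ reads the mutator list from the `AFL_CUSTOM_MUTATOR_LIBRARY` environment variable and invokes the libraries in list order. The filenames below are placeholders, and the semicolon separator is taken from the AFL++ environment variable documentation — verify against `docs/env_variables.md` for your version:

```shell
# Hypothetical paths; multiple libraries are assumed to be separated by a
# semicolon (check docs/env_variables.md).
export AFL_CUSTOM_MUTATOR_LIBRARY="./example_first.so;./example_second.so"
# example_first.so's hooks (e.g. post_process) run before example_second.so's;
# print the resolved order for inspection.
echo "$AFL_CUSTOM_MUTATOR_LIBRARY" | tr ';' '\n'
```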

View File

@ -585,7 +585,7 @@ The FRIDA wrapper used to instrument binary-only code supports many of the same
options as `afl-qemu-trace`, but also has a number of additional advanced
options. These are listed in brief below (see
[frida_mode/README.md](../frida_mode/README.md) for more details). These
settings are provided for compatibility with QEMU mode; the preferred way to
configure FRIDA mode is through its [scripting](../frida_mode/Scripting.md)
support.

View File

@ -1,7 +1,7 @@
# Important features of AFL++
AFL++ supports llvm from 3.8 up to version 12, very fast binary fuzzing with
QEMU 5.1 with laf-intel and redqueen, FRIDA mode, unicorn mode, gcc plugin, full
*BSD, Mac OS, Solaris and Android support and much, much, much more.
| Feature/Instrumentation | afl-gcc | llvm | gcc_plugin | FRIDA mode(9) | QEMU mode(10) |unicorn_mode(10) |coresight_mode(11)|
@ -30,7 +30,7 @@ QEMU 5.1 with laf-intel and redqueen, frida mode, unicorn mode, gcc plugin, full
versions that write to a file to use with afl-fuzz' `-x`
8. the snapshot LKM is currently unmaintained due to too many kernel changes
coming too fast :-(
9. FRIDA mode is supported on Linux and MacOS for Intel and ARM
10. QEMU/Unicorn is only supported on Linux
11. Coresight mode is only available on AARCH64 Linux with a CPU with Coresight
extension

View File

@ -2,21 +2,35 @@
## What is this?
`Autodict-QL` is a plugin system that enables fast generation of
tokens/dictionaries in a handy way that can be manipulated by the user (unlike
the LLVM passes that are hard to modify). This means that Autodict-QL is a
scriptable feature which basically uses CodeQL (a powerful semantic code
analysis engine) to fetch information from a code base.
Tokens are useful when you perform fuzzing on different parsers. The AFL++ `-x`
switch enables the usage of dictionaries through your fuzzing campaign. If you
are not familiar with dictionaries in fuzzing, take a look
[here](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries).
## Why CodeQL?
We basically developed this plugin on top of the CodeQL engine because it gives
the user scripting features, it's easier and it's independent of the LLVM
system. This means that a user can write their own CodeQL scripts or modify the
current scripts to improve or change the token generation algorithms based on
different program analysis concepts.
## CodeQL scripts
Currently, we pushed some scripts as defaults for token generation. In addition,
we provide every CodeQL script as a standalone script because it's easier to
modify or test.
Currently, we provide the following CodeQL scripts:
`strcmp-str.ql` is used to extract strings that are related to the `strcmp`
function.
`strncmp-str.ql` is used to extract the strings from the `strncmp` function.
@ -24,13 +38,18 @@ Currently we provided the following CodeQL scripts :
`litool.ql` extracts Magic numbers as Hexadecimal format.
`strtool.ql` extracts strings, using regex and dataflow concepts to capture the
string comparison functions. If `strcmp` is rewritten in a project as Mystrcmp
or something like strmycmp, then this script can catch the arguments and these
are valuable tokens.
You can write other CodeQL scripts to extract possible effective tokens if you
think they can be useful.
## Usage
Before you proceed to installation, make sure that you have the following
packages by installing them:
```shell
sudo apt install build-essential libtool-bin python3-dev python3 automake git vim wget -y
@ -38,66 +57,91 @@ sudo apt install build-essential libtool-bin python3-dev python3 automake git vi
The usage of Autodict-QL is pretty easy, but let's describe it step by step:
1. First of all, you need to have CodeQL installed on the system. We make this
   possible with the `build-codeql.sh` bash script. This script will install
   CodeQL completely and will set the required environment variables for your
   system. Do the following:
   ```shell
   # chmod +x codeql-build.sh
   # ./codeql-build.sh
   # source ~/.bashrc
   # codeql
   ```
   Then you should get:
   ```shell
   Usage: codeql <command> <argument>...
   Create and query CodeQL databases, or work with the QL language.

   GitHub makes this program freely available for the analysis of open-source software and certain other uses, but it is
   not itself free software. Type codeql --license to see the license terms.

       --license              Show the license terms for the CodeQL toolchain.

   Common options:
     -h, --help               Show this help text.
     -v, --verbose            Incrementally increase the number of progress messages printed.
     -q, --quiet              Incrementally decrease the number of progress messages printed.

   Some advanced options have been hidden; try --help -v for a fuller view.

   Commands:
     query       Compile and execute QL code.
     bqrs        Get information from .bqrs files.
     database    Create, analyze and process CodeQL databases.
     dataset     [Plumbing] Work with raw QL datasets.
     test        Execute QL unit tests.
     resolve     [Deep plumbing] Helper commands to resolve disk locations etc.
     execute     [Deep plumbing] Low-level commands that need special JVM options.
     version     Show the version of the CodeQL toolchain.
     generate    Generate formatted QL documentation.
     github      Commands useful for interacting with the GitHub API through CodeQL.
   ```
2. Compile your project with CodeQL: For using the Autodict-QL plugin, you need
   to compile the source of the target you want to fuzz with CodeQL. This is
   not hard.
   - First you need to create a CodeQL database of the project codebase.
     Suppose we want to compile `libxml` with CodeQL. Go to libxml and issue
     the following commands:
     - `./configure --disable-shared`
     - `codeql create database libxml-db --language=cpp --command=make`
   - Now you have the CodeQL database of the project :-)
3. The final step is to update the CodeQL database you created in step 2
   (suppose we are in the `aflplusplus/utils/autodict_ql/` directory):
   - `codeql database upgrade /home/user/libxml/libxml-db`
4. Everything is set! Now you should issue the following to get the tokens:
   - `python3 autodict-ql.py [CURRECT_DIR] [CODEQL_DATABASE_PATH] [TOKEN_PATH]`
   - Example: `python3 /home/user/AFLplusplus/utils/autodict_ql/autodict-ql.py
     $PWD /home/user/libxml/libxml-db tokens`
   - This will create the final `tokens` dir for you and you are done; then
     pass the tokens path to AFL++'s `-x` flag.
5. Done!
## More on dictionaries and tokens
Marc Heuse, a core developer of the AFL++ project, also developed a similar tool
named `dict2file`, an LLVM pass which can automatically extract useful tokens;
in addition, with the LTO instrumentation mode, dict2file token extraction is
performed automatically. The `Autodict-QL` plugin gives you scripting capability
so you can extract whatever you want from the codebase; it's up to you. In
addition, it's independent from the LLVM system. On the other hand, you can also
use the Google dictionaries which were made public in May 2020, but the problem
with the Google dictionaries is that they are limited to specific file formats
and specifications. For example, for testing binutils and the ELF file format,
or AVI in FFMPEG, there are no pre-built dictionaries, so it is highly
recommended to use the `Autodict-QL` or `Dict2File` features to automatically
generate dictionaries based on the target.
I've personally preferred to use `Autodict-QL` or `dict2file` rather than Google
dictionaries or any other manually generated dictionaries, as `Autodict-QL` and
`dict2file` work based on the target. Overall, fuzzing with dictionaries and
well-generated tokens will give better results.
There are 2 important points to remember:
- If you combine `Autodict-QL` with AFL++ cmplog, you will get much better code
  coverage and hence better chances to discover new bugs.
- Do not forget to set `AFL_MAX_DET_EXTRAS` at least to the number of generated
  dictionaries. If you forget to set this environment variable, then AFL++ uses
  just 200 tokens and uses the rest of them only probabilistically. Setting it
  guarantees that your tokens will be used by AFL++.
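A small sketch of the second point, assuming the token generation step left one token per file in a `tokens/` directory (that layout is an assumption — adjust the count to however your tokens are actually stored):

```shell
# Simulate a generated tokens directory (stand-in for autodict-ql output).
mkdir -p tokens
printf 'GET'  > tokens/token1
printf 'POST' > tokens/token2
printf 'HTTP' > tokens/token3
# Size AFL_MAX_DET_EXTRAS to cover every generated token so that none are
# demoted to probabilistic use.
export AFL_MAX_DET_EXTRAS="$(ls tokens | wc -l | tr -d ' ')"
echo "$AFL_MAX_DET_EXTRAS"
```

The same export would then precede the usual `afl-fuzz ... -x tokens` invocation.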

View File

@ -10,8 +10,8 @@ heap-related security bugs in several ways:
subsequent PROT_NONE page, causing most off-by-one reads and writes to
immediately segfault,
- It adds a canary immediately below the allocated buffer, to catch writes to
  negative offsets (won't catch reads, though),
- It sets the memory returned by malloc() to garbage values, improving the
odds of crashing when the target accesses uninitialized data,
@ -19,35 +19,34 @@ heap-related security bugs in several ways:
- It sets freed memory to PROT_NONE and does not actually reuse it, causing
most use-after-free bugs to segfault right away,
- It forces all realloc() calls to return a new address - and sets PROT_NONE
  on the original block. This catches use-after-realloc bugs,
- It checks for calloc() overflows and can cause soft or hard failures of
  alloc requests past a configurable memory limit (AFL_LD_LIMIT_MB,
AFL_LD_HARD_FAIL).
- Optionally, in platforms supporting it, huge pages can be used by passing
USEHUGEPAGE=1 to make.
- Size alignment to `max_align_t` can be enforced with AFL_ALIGNED_ALLOC=1. In
  this case, a tail canary is inserted in the padding bytes at the end of the
  allocated zone. This reduces the ability of libdislocator to detect
  off-by-one bugs, but also makes libdislocator compliant with the C standard.
Basically, it is inspired by some of the non-default options available for the
OpenBSD allocator - see malloc.conf(5) on that platform for reference. It is
also somewhat similar to several other debugging libraries, such as gmalloc and
DUMA - but is simple, plug-and-play, and designed specifically for fuzzing jobs.
Note that it does nothing for stack-based memory handling errors. The
-fstack-protector-all setting for GCC / clang, enabled when using AFL_HARDEN,
can catch some subset of that.
The allocator is slow and memory-intensive (even the tiniest allocation uses up
4 kB of physical memory and 8 kB of virtual mem), making it completely
unsuitable for "production" uses; but it can be faster and more hassle-free than
ASAN / MSAN when fuzzing small, self-contained binaries.
To use this library, run AFL++ like so: