# Map density

## How coverage works

The coverage in AFL++ works by assigning each basic block of code a unique ID
and, during execution, when transitioning between blocks (e.g., by calls or
jumps), assigning each of these edges an ID based upon the source and
destination block IDs.

For each individual execution of the target, a one-dimensional byte array,
indexed by edge ID, is used to count how many times each edge is traversed.

A one-dimensional cumulative byte array is also maintained, where each byte
again represents an individual edge ID, but this time the value of the byte
represents a range (a bucket) of how many times that edge has been traversed:

`1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`

The theory is that a new path isn't particularly interesting if an edge has
been traversed `23` times instead of `24`, for example, but it is interesting
if an edge has been traversed for the very first time or if its hit count
falls within a different bucket.
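
To make the bucketing concrete, here is a minimal sketch of how a raw hit
count could be classified (the function name and the one-bit-per-bucket
encoding are illustrative, not AFL++'s exact internals):

```c
#include <stdint.h>

/* Map a raw edge hit count to its bucket. Each bucket is a distinct bit, so
   a change of bucket always changes the stored byte. */
static uint8_t classify_hits(uint32_t hits) {

  if (hits == 0) return 0;
  if (hits == 1) return 1;    /* bucket: 1       */
  if (hits == 2) return 2;    /* bucket: 2       */
  if (hits == 3) return 4;    /* bucket: 3       */
  if (hits <= 7) return 8;    /* bucket: 4-7     */
  if (hits <= 15) return 16;  /* bucket: 8-15    */
  if (hits <= 31) return 32;  /* bucket: 16-31   */
  if (hits <= 127) return 64; /* bucket: 32-127  */
  return 128;                 /* bucket: 128+    */

}
```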
After each run, the number of times each edge was hit is compared to the
values in the cumulative map. If it differs, then the input is kept as a new
seed and the cumulative map is updated.
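
The comparison itself can be sketched as follows (a simplified take on the
`has_new_bits` logic from the whitepaper; the names and the byte-at-a-time
loop are illustrative, and real implementations work a word at a time and use
an inverted "virgin" map for speed):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if this run produced a bucket value not yet recorded in the
   cumulative map, updating the cumulative map as a side effect. The trace
   array is assumed to hold already-bucketed values. */
static int has_new_bits(uint8_t *cumulative, const uint8_t *trace,
                        size_t map_size) {

  int ret = 0;

  for (size_t i = 0; i < map_size; i++) {

    if (trace[i] & ~cumulative[i]) ret = 1; /* previously unseen bucket bit */
    cumulative[i] |= trace[i];

  }

  return ret;

}
```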
This mechanism is described in greater detail in the seminal
[paper](https://lcamtuf.coredump.cx/afl/technical_details.txt) on AFL by
[lcamtuf](https://github.com/lcamtuf).

## Collisions

In black-box fuzzing, we must assume that control may flow from any block to
any other block, since we don't know any better. Thus, for a target with `n`
basic blocks of code, there are `n * n` potential edges. As we can see, even
with a small number of blocks, a very large map is required so that we have
space to fit all of the potential edges. Even if our target only had `1024`
blocks, this would require a map containing `1048576` entries (or 1 MB in
size).

Whilst this may not seem like a lot of memory, it causes problems for two
reasons. Firstly, the processing step after each execution must now process
much more data. Secondly, a map this size is unlikely to fit within the L2
cache of the processor. Since this is a very hot code path, we are likely to
pay a very heavy performance cost.

Therefore, we must accept that not all edges can have a unique ID and that
there will be collisions. This means that if the fuzzer finds a new path by
uncovering an edge which was not previously found, but that edge shares its ID
with another, previously seen edge, then the discovery may go completely
unnoticed. This is obviously undesirable, but equally, if our map is too
large, then we will not be able to process as many potential inputs in the
same time and hence will not uncover edges for that reason. Thus, a careful
trade-off of map size must be made.

## Block & edge numbering

Since the original AFL, blocks and edges have always been numbered in the same
way, as we can see from the following C snippet from the whitepaper:

```c
/* Block ID: derived from the block's address by a shift and XOR. */
cur_location = (block_address >> 4) ^ (block_address << 8);

/* Edge ID: XOR of the current and (shifted) previous block IDs. */
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;
```
Each block ID is generated by performing a shift and XOR on its address, and
the edge ID is then calculated as `E = B ^ (B' >> 1)`, where `B` is the
current block ID and `B'` the previous one. In practice, the edge ID is also
masked to ensure it is less than the size of the map being used.
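
Putting those pieces together, the full computation looks something like this
(the names are illustrative, and `MAP_SIZE` must be a power of two for the
mask to work):

```c
#include <stdint.h>

#define MAP_SIZE (1U << 16) /* must be a power of two */

/* Combine the current and previous block IDs into a map index. */
static uint32_t edge_id(uint32_t cur_location, uint32_t prev_location) {

  return (cur_location ^ prev_location) & (MAP_SIZE - 1);

}
```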
Here, we can make two observations.

### Block IDs

Firstly, the block ID doesn't have very good entropy. Whilst each block has a
unique address, and hence a unique ID, those IDs aren't necessarily very
evenly distributed.

We start with a large address and need to discard a large number of its bits
to generate a block ID which is within range. But how do we choose the unique
bits of the address versus those which are the same for every block? The high
bits of the address may be all `0`s or all `1`s to make the address canonical,
the middle portion of the address may be the same for all blocks (since if
they are all within the same binary, then they will all be adjacent in
memory), and on some systems, even the low bits may have poor entropy as some
use fixed-length aligned instructions. Then we need to consider that a portion
of each binary may contain the `.data` or `.bss` sections and so may not
contain any blocks of code at all.
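
As a quick illustration, consider how the classic scheme numbers a run of
adjacent, 16-byte-aligned blocks (the addresses below are made up for the
example):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {

  /* Hypothetical adjacent block addresses within one loaded binary. */
  for (uint64_t addr = 0x555555554000; addr < 0x555555554100; addr += 0x10) {

    uint64_t id = (addr >> 4) ^ (addr << 8);

    /* The high and middle bits of every ID are identical; only a handful of
       low bits vary - exactly the clustering described above. */
    printf("block %" PRIx64 " -> id %" PRIx64 "\n", addr, id);

  }

  return 0;

}
```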

### Edge IDs

Secondly, we can observe that when we generate an edge ID from the source and
destination block IDs, we perform a right shift on the source block ID. Whilst
there are good reasons, as set out in the whitepaper, why such a transform is
applied, in doing so, we dispose of `1` bit of precious entropy in our source
block ID.
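
The effect is easy to demonstrate: two source block IDs differing only in
their lowest bit yield the same edge ID for any given destination block:

```c
#include <assert.h>
#include <stdint.h>

int main(void) {

  uint32_t dst = 0x1234;   /* any destination block ID */
  uint32_t src_a = 0x5678; /* differs from src_b only in bit 0 */
  uint32_t src_b = 0x5679;

  /* E = B ^ (B' >> 1): the right shift discards the low bit of the source
     ID, so these two distinct edges collide. */
  assert((dst ^ (src_a >> 1)) == (dst ^ (src_b >> 1)));

  return 0;

}
```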
Altogether, this means that some edge IDs may be more popular than others, and
hence some portions of the map may be very densely populated with large
numbers of edges, whilst others may be very sparsely populated, or not
populated at all.

## Improvements

One of the main reasons why this algorithm was selected is performance. All of
the operations are very quick to perform, and given that we may be carrying
them out for every block of code we execute, performance is critical.

However, the design of the binary instrumentation modes of AFL++ has moved on.
Both QEMU and FRIDA modes use a two-stage process when executing a target
application. Each block is first compiled or instrumented, and then it is
executed. The compiled blocks can be re-used each time the target executes
them.

Since a block's ID is based on its address, and this is known at compile time,
we only need to generate this ID once per block, and so this ID generation no
longer needs to be as performant. We can therefore use a hash algorithm to
generate this ID and thereby ensure that the block IDs are more evenly
distributed.

Edge IDs, however, can only be determined at run-time, since we don't know
which blocks a given input will traverse until we run it. However, given that
our block IDs are now evenly distributed, generating an evenly distributed
edge ID becomes simple. Here, the only change we make is to use a rotate
operation rather than a shift operation so that we don't lose a bit of entropy
from the source ID.

So our new algorithm becomes:

```c
/* Block ID: a hash of the block's address, computed once at
   instrumentation time. */
cur_location = hash(block_address);

/* Edge ID: rotate instead of shift, so no source entropy is lost. */
shared_mem[cur_location ^ prev_location]++;
prev_location = rotate(cur_location, 1);
```
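
As a concrete (but illustrative) sketch, `hash` could be any well-mixing
integer hash and `rotate` a plain bit rotation; the mixer below is a stand-in,
not the hash AFL++ actually uses:

```c
#include <stdint.h>

/* A stand-in 32-bit finalizer-style mixer (illustrative only). */
static uint32_t hash(uint32_t x) {

  x ^= x >> 16;
  x *= 0x7feb352d;
  x ^= x >> 15;
  x *= 0x846ca68b;
  x ^= x >> 16;
  return x;

}

/* Rotate left by n bits, preserving all of the entropy of x. */
static uint32_t rotate(uint32_t x, unsigned n) {

  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));

}
```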
Lastly, in the original design, the `prev_location` was always set to `0` at
the beginning of a run. We instead set its initial value to `hash(0)`.

## Parallel fuzzing

Another sub-optimal aspect of the original design is that no matter how many
instances of the fuzzer you ran in parallel, each instance numbered each
block, and so each edge, with the same ID. Each instance would therefore find
the same subset of edges collide with each other. In the event of a collision,
all instances would hit the same roadblock.

However, if we instead use a different seed for our hashing function in each
instance, then each will ascribe each block a different ID and hence each edge
will be given a different edge ID. This means that whilst one instance of the
fuzzer may find that a given pair of edges collide, it is very unlikely that
another instance will find that the same pair also collide.
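
In code, this amounts to no more than mixing a per-instance seed into the
block ID computation (the names and the seed-XOR construction below are
hypothetical, standing in for whatever seeded hash is actually used):

```c
#include <stdint.h>

/* Per-instance seed, chosen randomly when the fuzzer instance starts. */
static uint32_t instance_seed;

/* The same stand-in mixer as sketched above. */
static uint32_t mix32(uint32_t x) {

  x ^= x >> 16;
  x *= 0x7feb352d;
  x ^= x >> 15;
  x *= 0x846ca68b;
  x ^= x >> 16;
  return x;

}

/* Seeded block ID: two instances with different seeds number the same block,
   and hence the same edges, differently. */
static uint32_t block_id(uint32_t block_address) {

  return mix32(block_address ^ instance_seed);

}
```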
Due to the collaborative nature of parallel fuzzing, this means that whilst
one instance may struggle to find a particular new path because the new edge
collides, another instance will likely not encounter the same collision and
will thus be able to differentiate this new path and share it with the other
instances.

If only a single new edge is found, and the new path is shared with an
instance for which that edge collides, that instance may disregard it as
irrelevant. In practice, however, the discovery of a single new edge likely
leads to several more edges beneath it also being found, and therefore the
likelihood of all of these being collisions is very slim.