mirror of
https://github.com/AFLplusplus/AFLplusplus.git
synced 2025-06-09 08:41:32 +00:00
Edit FRIDA mode MD files
This commit is contained in:
parent
133fba724a
commit
f1d2332657
@ -1,8 +1,9 @@
|
||||
# Map Density
|
||||
# Map density
|
||||
|
||||
## How coverage works
|
||||
|
||||
# How Coverage Works
|
||||
The coverage in AFL++ works by assigning each basic block of code a unique ID
|
||||
and during execution when transitioning between blocks (e.g. by calls or jumps)
|
||||
and during execution when transitioning between blocks (e.g., by calls or jumps)
|
||||
assigning each of these edges an ID based upon the source and destination block
|
||||
ID.
|
||||
|
||||
@ -13,11 +14,12 @@ A single dimensional cumulative byte array is also constructed where each byte
|
||||
again represents an individual edge ID, but this time, the value of the byte
|
||||
represents a range of how many times that edge has been traversed.
|
||||
|
||||
```1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+```
|
||||
`1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`
|
||||
|
||||
The theory is that a new path isn't particularly interesting if an edge has been
|
||||
traversed `23` instead of `24` times for example, but is interesting if an edge
|
||||
has been traversed for the very first time, or the number of times fits within a different bucket.
|
||||
has been traversed for the very first time or the number of times fits within a
|
||||
different bucket.
|
||||
|
||||
After each run, the count of times each edge is hit is compared to the values in
|
||||
the cumulative map and if it is different, then the input is kept as a new seed
|
||||
@ -27,19 +29,22 @@ This mechanism is described in greater detail in the seminal
|
||||
[paper](https://lcamtuf.coredump.cx/afl/technical_details.txt) on AFL by
|
||||
[lcamtuf](https://github.com/lcamtuf).
|
||||
|
||||
# Collisions
|
||||
## Collisions
|
||||
|
||||
In black-box fuzzing, we must assume that control may flow from any block to any
|
||||
other block, since we don't know any better. Thus for a target with `n` basic
|
||||
other block, since we don't know any better. Thus, for a target with `n` basic
|
||||
blocks of code, there are `n * n` potential edges. As we can see, even with a
|
||||
small number of edges, a very large map will be required so that we have space
|
||||
to fit them all. Even if our target only had `1024` blocks, this would require a
|
||||
map containing `1048576` entries (or 1Mb in size).
|
||||
|
||||
Whilst this may not seem like a lot of memory, it causes problems for two reasons. Firstly, the processing step after each execution must now process much more
|
||||
data, and secondly a map this size is unlikely to fit within the L2 cache of the processor. Since this is a very hot code path, we are likely to pay a very heavy
|
||||
performance cost.
|
||||
Whilst this may not seem like a lot of memory, it causes problems for two
|
||||
reasons. Firstly, the processing step after each execution must now process much
|
||||
more data, and secondly, a map this size is unlikely to fit within the L2 cache
|
||||
of the processor. Since this is a very hot code path, we are likely to pay a
|
||||
very heavy performance cost.
|
||||
|
||||
Therefore, we must accept that not all edges can have a unique and that
|
||||
Therefore, we must accept that not all edges can have a unique ID and that
|
||||
therefore there will be collisions. This means that if the fuzzer finds a new
|
||||
path by uncovering an edge which was not previously found, but that the same
|
||||
edge ID is used by another edge, then it may go completely unnoticed. This is
|
||||
@ -47,15 +52,15 @@ obviously undesirable, but equally if our map is too large, then we will not be
|
||||
able to process as many potential inputs in the same time and hence not uncover
|
||||
edges for that reason. Thus a careful trade-off of map size must be made.
|
||||
|
||||
# Block & Edge Numbering
|
||||
## Block & edge numbering
|
||||
|
||||
Since the original AFL, blocks and edges have always been numbered in the same
|
||||
way as we can see from the following C snippet from the whitepaper.
|
||||
way as we can see from the following C snippet from the whitepaper:
|
||||
|
||||
```c
|
||||
cur_location = (block_address >> 4) ^ (block_address << 8);
|
||||
shared_mem[cur_location ^ prev_location]++;
|
||||
prev_location = cur_location >> 1;
|
||||
|
||||
cur_location = (block_address >> 4) ^ (block_address << 8);
|
||||
shared_mem[cur_location ^ prev_location]++;
|
||||
prev_location = cur_location >> 1;
|
||||
```
|
||||
|
||||
Each block ID is generated by performing a shift and XOR on its address. Then
|
||||
@ -63,15 +68,16 @@ the edge ID is calculated as `E = B ^ (B' >> 1)`. Here, we can make two
|
||||
observations. In fact, the edge ID is also masked to ensure it is less than the
|
||||
size of the map being used.
|
||||
|
||||
## Block IDs
|
||||
### Block IDs
|
||||
|
||||
Firstly, the block ID doesn't have very good entropy. If we consider the address
|
||||
of the block, then whilst each block has a unique ID, it isn't necessarily very
|
||||
evenly distributed.
|
||||
|
||||
We start with a large address, and need to discard a large number of the bits to
|
||||
We start with a large address and need to discard a large number of the bits to
|
||||
generate a block ID which is within range. But how do we choose the unique bits
|
||||
of the address verus those which are the same for every block? The high bits of
|
||||
the address may simply be all `0s` or all `1s` to make the address cannonical,
|
||||
of the address versus those which are the same for every block? The high bits of
|
||||
the address may simply be all `0s` or all `1s` to make the address canonical,
|
||||
the middle portion of the address may be the same for all blocks (since if they
|
||||
are all within the same binary, then they will all be adjacent in memory), and
|
||||
on some systems, even the low bits may have poor entropy as some use fixed
|
||||
@ -79,20 +85,22 @@ length aligned instructions. Then we need to consider that a portion of each
|
||||
binary may contain the `.data` or `.bss` sections and so may not contain any
|
||||
blocks of code at all.
|
||||
|
||||
## Edge IDs
|
||||
### Edge IDs
|
||||
|
||||
Secondly, we can observe that when we generate an edge ID from the source and
|
||||
destination block IDs, we perform a right shift on the source block ID. Whilst
|
||||
there are good reasons as set out in the whitepaper why such a transform is
|
||||
applied, in so doing, we dispose of `1` bit of precious entropy in our source
|
||||
applied, in doing so, we dispose of `1` bit of precious entropy in our source
|
||||
block ID.
|
||||
|
||||
All together, this means that some edge IDs may be more popular than others.
|
||||
This means that some portions of the map may be very densly populated with large
|
||||
numbers of edges, whilst others may be very sparsely populated, or not populated
|
||||
at all.
|
||||
This means that some portions of the map may be very densely populated with
|
||||
large numbers of edges, whilst others may be very sparsely populated, or not
|
||||
populated at all.
|
||||
|
||||
# Improvements
|
||||
One of the main reaons why this algorithm selected, is performance. All of the
|
||||
## Improvements
|
||||
|
||||
One of the main reasons why this algorithm selected, is performance. All of the
|
||||
operations are very quick to perform and given we may be carrying this out for
|
||||
every block of code we execute, performance is critical.
|
||||
|
||||
@ -106,23 +114,25 @@ only need to generate this ID once per block and so this ID generation no longer
|
||||
needs to be as performant. We can therefore use a hash algorithm to generate
|
||||
this ID and therefore ensure that the block IDs are more evenly distributed.
|
||||
|
||||
Edge IDs however, can only be determined at run-time. Since we don't know which
|
||||
Edge IDs, however, can only be determined at run-time. Since we don't know which
|
||||
blocks a given input will traverse until we run it. However, given our block IDs
|
||||
are now evenly distributed, generating an evenly distributed edge ID becomes
|
||||
simple. Here, the only change we make is to use a rotate operation rather than
|
||||
a shift operation so we don't lose a bit of entropy from the source ID.
|
||||
simple. Here, the only change we make is to use a rotate operation rather than a
|
||||
shift operation so we don't lose a bit of entropy from the source ID.
|
||||
|
||||
So our new algorithm becomes:
|
||||
|
||||
```c
|
||||
cur_location = hash(block_address)
|
||||
shared_mem[cur_location ^ prev_location]++;
|
||||
prev_location = rotate(cur_location, 1);
|
||||
cur_location = hash(block_address)
|
||||
shared_mem[cur_location ^ prev_location]++;
|
||||
prev_location = rotate(cur_location, 1);
|
||||
```
|
||||
|
||||
Lastly, in the original design, the `cur_location` was always set to `0`, at the
|
||||
beginning of a run, we instead set the value of `cur_location` to `hash(0)`.
|
||||
|
||||
# Parallel Fuzzing
|
||||
## Parallel fuzzing
|
||||
|
||||
Another sub-optimal aspect of the original design is that no matter how many
|
||||
instances of the fuzzer you ran in parallel, each instance numbered each block
|
||||
and so each edge with the same ID. Each instance would therefore find the same
|
||||
@ -144,4 +154,4 @@ If only a single new edge is found, and the new path is shared with an instance
|
||||
for which that edge collides, that instance may disregard it as irrelevant. In
|
||||
practice, however, the discovery of a single new edge, likely leads to several
|
||||
more edges beneath it also being found and therefore the likelihood of all of
|
||||
these being collisions is very slim.
|
||||
these being collisions is very slim.
|
@ -1,25 +1,32 @@
|
||||
# Scripting
|
||||
|
||||
FRIDA now supports the ability to configure itself using JavaScript. This allows
|
||||
the user to make use of the convenience of FRIDA's scripting engine (along with
|
||||
it's support for debug symbols and exports) to configure all of the things which
|
||||
were traditionally configured using environment variables.
|
||||
|
||||
By default FRIDA mode will look for the file `afl.js` in the current working
|
||||
By default, FRIDA mode will look for the file `afl.js` in the current working
|
||||
directory of the target. Alternatively, a script file can be configured using
|
||||
the environment variable `AFL_FRIDA_JS_SCRIPT`.
|
||||
|
||||
This script can make use of all of the standard [frida api functions](https://frida.re/docs/javascript-api/), but FRIDA mode adds some additional functions to allow
|
||||
you to interact with FRIDA mode itself. These can all be accessed via the global
|
||||
`Afl` parameter. e.g. `Afl.print("HELLO WORLD");`,
|
||||
This script can make use of all of the standard [frida api
|
||||
functions](https://frida.re/docs/javascript-api/), but FRIDA mode adds some
|
||||
additional functions to allow you to interact with FRIDA mode itself. These can
|
||||
all be accessed via the global `Afl` parameter, e.g., `Afl.print("HELLO
|
||||
WORLD");`.
|
||||
|
||||
If you encounter a problem with your script, then you should set the environment
|
||||
variable `AFL_DEBUG_CHILD=1` to view any diagnostic information.
|
||||
|
||||
## Example
|
||||
|
||||
# Example
|
||||
Most of the time, users will likely be wanting to call the functions which configure an address (e.g. for the entry point, or the persistent address).
|
||||
Most of the time, users will likely be wanting to call the functions which
|
||||
configure an address (e.g., for the entry point or the persistent address).
|
||||
|
||||
The example below uses the API [`DebugSymbol.fromName()`](https://frida.re/docs/javascript-api/#debugsymbol). Another use API is [`Module.getExportByName()`](https://frida.re/docs/javascript-api/#module).
|
||||
The example below uses the API
|
||||
[`DebugSymbol.fromName()`](https://frida.re/docs/javascript-api/#debugsymbol).
|
||||
Another use API is
|
||||
[`Module.getExportByName()`](https://frida.re/docs/javascript-api/#module).
|
||||
|
||||
```js
|
||||
/* Use Afl.print instead of console.log */
|
||||
@ -86,9 +93,9 @@ Afl.done();
|
||||
Afl.print("done");
|
||||
```
|
||||
|
||||
# Stripped Binaries
|
||||
## Stripped binaries
|
||||
|
||||
Lastly, if the binary you attempting to fuzz has no symbol information, and no
|
||||
Lastly, if the binary you attempting to fuzz has no symbol information and no
|
||||
exports, then the following approach can be used.
|
||||
|
||||
```js
|
||||
@ -98,11 +105,12 @@ const address = module.base.add(0xdeadface);
|
||||
Afl.setPersistentAddress(address);
|
||||
```
|
||||
|
||||
# Persisent Hook
|
||||
## Persistent hook
|
||||
|
||||
A persistent hook can be implemented using a conventional shared object, sample
|
||||
source code for a hook suitable for the prototype of `LLVMFuzzerTestOneInput`
|
||||
can be found [here](hook/hook.c). This can be configured using code similar to
|
||||
the following.
|
||||
can be found in [hook/hook.c](hook/hook.c). This can be configured using code
|
||||
similar to the following.
|
||||
|
||||
```js
|
||||
const path = Afl.module.path;
|
||||
@ -112,7 +120,8 @@ const hook = mod.getExportByName('afl_persistent_hook');
|
||||
Afl.setPersistentHook(hook);
|
||||
```
|
||||
|
||||
Alternatively, the hook can be provided by using FRIDAs built in support for `CModule`, powered by TinyCC.
|
||||
Alternatively, the hook can be provided by using FRIDA's built-in support for
|
||||
`CModule`, powered by TinyCC.
|
||||
|
||||
```js
|
||||
const cm = new CModule(`
|
||||
@ -134,8 +143,10 @@ const cm = new CModule(`
|
||||
Afl.setPersistentHook(cm.afl_persistent_hook);
|
||||
```
|
||||
|
||||
# Advanced Persistence
|
||||
## Advanced persistence
|
||||
|
||||
Consider the following target code...
|
||||
|
||||
```c
|
||||
|
||||
#include <fcntl.h>
|
||||
@ -281,14 +292,15 @@ Afl.done();
|
||||
Here, we replace the function `slow` with our own code. This code is then
|
||||
selected as the entry point as well as the persistent loop address.
|
||||
|
||||
## Replacing LLVMFuzzerTestOneInput
|
||||
The function `LLVMFuzzerTestOneInput` can be replaced just like any other. Also
|
||||
### Replacing LLVMFuzzerTestOneInput
|
||||
|
||||
The function `LLVMFuzzerTestOneInput` can be replaced just like any other. Also,
|
||||
any replaced function can also call itself. In the example below, we replace
|
||||
`LLVMFuzzerTestOneInput` with `My_LLVMFuzzerTestOneInput` which ignores the
|
||||
parameters `buf` and `len` and then calls the original `LLVMFuzzerTestOneInput`
|
||||
with the paramaters `__afl_fuzz_ptr` and `__afl_fuzz_len`. This allows us to
|
||||
with the parameters `__afl_fuzz_ptr` and `__afl_fuzz_len`. This allows us to
|
||||
carry out in-memory fuzzing without the need for any hook function. It should be
|
||||
noted that the replacement function and the original can *NOT* share the same
|
||||
noted that the replacement function and the original *CANNOT* share the same
|
||||
name, since otherwise the `C` code in the `CModule` will not compile due to a
|
||||
symbol name collision.
|
||||
|
||||
@ -320,7 +332,8 @@ Afl.setInMemoryFuzzing();
|
||||
Interceptor.replace(LLVMFuzzerTestOneInput, cm.My_LLVMFuzzerTestOneInput);
|
||||
```
|
||||
|
||||
## Hooking `main`
|
||||
### Hooking `main`
|
||||
|
||||
Lastly, it should be noted that using FRIDA mode's scripting support to hook
|
||||
the `main` function is a special case. This is because the `main` function is
|
||||
already hooked by the FRIDA mode engine itself and hence the function `main` (or
|
||||
@ -359,14 +372,16 @@ Afl.setPersistentAddress(cm.main);
|
||||
Afl.setInMemoryFuzzing();
|
||||
Afl.setJsMainHook(cm.main);
|
||||
```
|
||||
## Library Fuzzing
|
||||
|
||||
### Library Fuzzing
|
||||
|
||||
It doesn't take too much imagination to see that the above example can be
|
||||
extended to use FRIDA's `Module.load` API so that the replaced `main` function
|
||||
can then call an arbitrary function. In this way, if we have a library which we
|
||||
wish to fuzz rather than an execuatble, then a surrogate executable can be used.
|
||||
wish to fuzz rather than an executable, then a surrogate executable can be used.
|
||||
|
||||
## Patching
|
||||
|
||||
# Patching
|
||||
Consider the [following](test/js/test2.c) test code...
|
||||
|
||||
```c
|
||||
@ -498,7 +513,7 @@ int main(int argc, char **argv) {
|
||||
There are a couple of obstacles with our target application. Unlike when fuzzing
|
||||
source code, though, we can't simply edit it and recompile it. The following
|
||||
script shows how we can use the normal functionality of FRIDA to modify any
|
||||
troublesome behaviour.
|
||||
troublesome behavior.
|
||||
|
||||
```js
|
||||
Afl.print('******************');
|
||||
@ -537,8 +552,10 @@ Afl.done();
|
||||
Afl.print("done");
|
||||
```
|
||||
|
||||
# Advanced Patching
|
||||
## Advanced patching
|
||||
|
||||
Consider the following code fragment...
|
||||
|
||||
```c
|
||||
extern void some_boring_bug2(char c);
|
||||
|
||||
@ -565,7 +582,7 @@ void LLVMFuzzerTestOneInput(char *buf, int len) {
|
||||
}
|
||||
```
|
||||
|
||||
Rather than using FRIDAs `Interceptor.replace` or `Interceptor.attach` APIs, it
|
||||
Rather than using FRIDA's `Interceptor.replace` or `Interceptor.attach` APIs, it
|
||||
is possible to apply much more fine grained modification to the target
|
||||
application by means of using the Stalker APIs.
|
||||
|
||||
@ -649,39 +666,43 @@ Afl.setStalkerCallback(cm.js_stalker_callback)
|
||||
Afl.setStdErr("/tmp/stderr.txt");
|
||||
```
|
||||
|
||||
Note that you will more likely want to find the
|
||||
patch address by using:
|
||||
Note that you will more likely want to find the patch address by using:
|
||||
|
||||
```js
|
||||
const module = Process.getModuleByName('target.exe');
|
||||
/* Hardcoded offset within the target image */
|
||||
const address = module.base.add(0xdeadface);
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
const address = DebugSymbol.fromName("my_function").address.add(0xdeadface);
|
||||
```
|
||||
|
||||
OR
|
||||
|
||||
```
|
||||
const address = Module.getExportByName(null, "my_function").add(0xdeadface);
|
||||
```
|
||||
|
||||
The function `js_stalker_callback` should return `TRUE` if the original
|
||||
instruction should be emitted in the instrumented code, or `FALSE` otherwise.
|
||||
In the example above, we can see it is replaced with a `NOP`.
|
||||
instruction should be emitted in the instrumented code or `FALSE` otherwise. In
|
||||
the example above, we can see it is replaced with a `NOP`.
|
||||
|
||||
Lastly, note that the same callback will be called when compiling instrumented
|
||||
code both in the child of the forkserver (as it is executed) and also in the
|
||||
parent of the forserver (when prefetching is enabled) so that it can be
|
||||
parent of the forkserver (when prefetching is enabled) so that it can be
|
||||
inherited by the next forked child. It is **VERY** important that the same
|
||||
instructions be generated in both the parent and the child, or if prefetching is
|
||||
instructions be generated in both the parent and the child or if prefetching is
|
||||
disabled that the same instructions are generated every time the block is
|
||||
compiled. Failure to do so will likely lead to bugs which are incredibly
|
||||
difficult to diagnose. The code above only prints the instructions when running
|
||||
in the parent process (the one provided by `Process.id` when the JS script is
|
||||
executed).
|
||||
|
||||
# OSX
|
||||
## OSX
|
||||
|
||||
Note that the JavaScript debug symbol api for OSX makes use of the
|
||||
`CoreSymbolication` APIs and as such the `CoreFoundation` module must be loaded
|
||||
into the target to make use of it. This can be done by setting:
|
||||
@ -691,10 +712,11 @@ AFL_PRELOAD=/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
|
||||
```
|
||||
|
||||
It should be noted that `CoreSymbolication` API may take a while to initialize
|
||||
and build its caches. For this reason, it may be nescessary to also increase the
|
||||
and build its caches. For this reason, it may be necessary to also increase the
|
||||
value of the `-t` flag passed to `afl-fuzz`.
|
||||
|
||||
# API
|
||||
## API
|
||||
|
||||
```js
|
||||
class Afl {
|
||||
/**
|
||||
@ -973,4 +995,4 @@ class Afl {
|
||||
return Afl.module.getExportByName(name);
|
||||
}
|
||||
}
|
||||
```
|
||||
```
|
Loading…
x
Reference in New Issue
Block a user