bug fixes to afl-ld and intensive README.lto.md update on errors and how to do the steps by hand, plus global code format

van Hauser 2020-03-09 08:27:23 +01:00
parent a3161b902e
commit 0581f6ec00
5 changed files with 230 additions and 110 deletions


@ -136,7 +136,8 @@ override CFLAGS = -Wall \
-D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -I ../include/ \
-DAFL_PATH=\"$(HELPER_PATH)\" -DBIN_PATH=\"$(BIN_PATH)\" \
-DLLVM_BINDIR=\"$(LLVM_BINDIR)\" -DVERSION=\"$(VERSION)\" \
-DLLVM_VERSION=\"$(LLVMVER)\" -DAFL_CLANG_FLTO=\"$(AFL_CLANG_FLTO)\"
-DLLVM_VERSION=\"$(LLVMVER)\" -DAFL_CLANG_FLTO=\"$(AFL_CLANG_FLTO)\" \
-DAFL_REAL_LD=\"$(AFL_REAL_LD)\"
ifdef AFL_TRACE_PC
CFLAGS += -DUSE_TRACE_PC=1
endif
@ -253,7 +254,7 @@ endif
../afl-ld: afl-ld.c
ifneq "$(AFL_CLANG_FLTO)" ""
ifeq "$(LLVM_LTO)" "1"
$(CC) $(CFLAGS) $< -o $@ $(LDFLAGS) -DAFL_REAL_LD=\"$(AFL_REAL_LD)\"
$(CC) $(CFLAGS) $< -o $@ $(LDFLAGS)
ln -sf afl-ld ../ld
@rm -f .test-instr
@-export AFL_QUIET=1 AFL_PATH=.. PATH="..:$(PATH)" ; ../afl-clang-lto -Wl,--afl -o .test-instr ../test-instr.c && echo "[+] afl-clang-lto and afl-ld seem to work fine :)" || echo "[!] WARNING: clang seems to have a hardcoded "'/bin/ld'" - check README.lto"


@ -100,6 +100,32 @@ need to link all llvm IR LTO files does not support this - yet (hopefully).
Hence if you see this error either you have to remove the duplicate global
variable (think `#ifdef` ...) or you are out of luck. :-(
### "expected top-level entity" + binary output error
This happens if multiple .a archives are to be linked and they contain the
same object filenames, the first in LTO form, the other in ELF form.
This cannot be fixed programmatically, but it can be fixed by hand.
You can try to delete the file from either archive
(`llvm-ar d <archive>.a <file>.o`) or perform the llvm-linking, optimizing
and instrumentation by hand (see below).
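Before deleting a member with `llvm-ar d`, it helps to know which object names actually collide. The following is a minimal sketch, not part of afl-ld: `common_names` and `dup_members` are hypothetical helper names, and the sketch assumes `llvm-ar` is in `PATH` (`llvm-ar t` lists the members of an archive).

```shell
#!/bin/sh
# common_names and dup_members are hypothetical helper names.

# Print the names that appear in both newline-separated member lists.
common_names() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -12 "$a" "$b"        # comm needs sorted input, hence the sort above
  rm -f "$a" "$b"
}

# Print the object names contained in both archives.
dup_members() {
  l1=$(mktemp) && l2=$(mktemp) || return 1
  llvm-ar t "$1" > "$l1" && llvm-ar t "$2" > "$l2" && common_names "$l1" "$l2"
  rc=$?
  rm -f "$l1" "$l2"
  return $rc
}

# Usage: ./dup-members.sh <archive1>.a <archive2>.a
if [ "$#" -eq 2 ]; then dup_members "$1" "$2"; fi
```

Each name it prints is a candidate for `llvm-ar d <archive>.a <file>.o`; which archive to delete it from still has to be decided by hand.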
### "undefined reference to ..."
This *can* be the opposite situation of the "expected top-level entity" error -
the library with the ELF file is before the LTO library.
However it can also be a bug in the program - try to compile it normally. If
that fails then it is a bug in the program.
Solutions: You can try to delete the file from either archive, e.g.
(`llvm-ar d <archive>.a <file>.o`) or perform the llvm-linking, optimizing
and instrumentation by hand (see below).
### "File format not recognized"
This happens if the build system has fixed LDFLAGS, CPPFLAGS, CXXFLAGS and/or
CFLAGS. Ensure that they all contain the `-flto` flag that afl-clang-lto was
compiled with (you can see that by typing `afl-clang-lto -h` and inspecting
the last line of the help output) and add it where it is missing.
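A quick way to see which of these variables lack the flag is sketched below. `vars_missing_flag` is a hypothetical helper name, and the flag value you pass in is whatever `afl-clang-lto -h` reports on your system; the sketch only matches the flag as a space-separated word.

```shell
#!/bin/sh
# vars_missing_flag is a hypothetical helper name.

# Print the name of every listed variable whose value does not contain
# the given flag as a space-separated word.
vars_missing_flag() {
  flag=$1
  shift
  for var in "$@"; do
    eval "value=\${$var:-}"      # indirect lookup of the variable's value
    case " $value " in
      *" $flag "*) ;;            # flag present, nothing to report
      *) echo "$var" ;;
    esac
  done
}

# Usage (LTO_FLAG is an assumed variable holding the flag from the help output):
#   vars_missing_flag "$LTO_FLAG" CFLAGS CXXFLAGS CPPFLAGS LDFLAGS
```

Any variable name it prints needs the flag appended before the build is rerun.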
### clang is hardcoded to /bin/ld
Some clang packages have 'ld' hardcoded to /bin/ld. This is an issue as this
@ -129,6 +155,53 @@ This can result in two problems though:
When you install an updated gcc/clang/... package, your OS might restore
the ld link.
### Performing the steps by hand
It is possible to perform all the steps afl-ld does by hand to work around
issues in the target.
1. Recompile with AFL_DEBUG=1 and collect the afl-clang-lto command that fails
e.g.: `AFL_DEBUG=1 make 2>&1 | grep afl-clang-lto | tail -n 1`
2. run this command prepended with AFL_DEBUG=1 and collect the afl-ld command
parameters, e.g. `AFL_DEBUG=1 afl-clang-lto[++] .... | grep /afl/ld`
3. for every .a archive you want to instrument unpack it into a separate
directory, e.g.
`mkdir archive1.dir ; cd archive1.dir ; llvm-ar x ../<archive>.a`
4. run `file archive*.dir/*.o` and make two lists, one containing all ELF files
and one containing all LLVM IR bitcode files.
You do the same for all .o files of the ../afl/ld command options
5. Create a single bitcode file by using llvm-link, e.g.
`llvm-link -o all-bitcode.bc <list of all LLVM IR .o files>`
If this fails it is game over - or you modify the source code
6. Run the optimizer on the new bitcode file:
`opt -O3 --polly -o all-optimized.bc all-bitcode.bc`
7. Instrument the optimized bitcode file:
`opt --load=$AFL_PATH/afl-llvm-lto-instrumentation.so --disable-opt --afl-lto all-optimized.bc -o all-instrumented.bc`
8. If the parameter `--allow-multiple-definition` is not in the list, add it
as first command line option.
9. Link everything together.
a) You use the afl-ld command and instead of e.g. `/usr/local/lib/afl/ld`
you replace that with `ld`, the real linker.
b) For every .a archive you instrumented files from, remove the <archive>.a
or -l<archive> from the command.
c) If you have entries in your ELF files list (see step 4), add them to
the command line - but put them in the same order!
d) Put the all-instrumented.bc before the first library or .o file.
e) Run the command and hope it compiles. If it doesn't, you have to analyze
what the issue is and fix it in the appropriate step above.
Yes, this is long and complicated. That is why afl-ld exists to do this, and
that is why it can easily fail: not all the different ways in which it *can*
fail can be handled ...
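Step 4 (making the two lists) is the part most easily scripted. The sketch below is an illustration, not what afl-ld itself runs: `classify_obj` and `sort_objects` are hypothetical helper names, and instead of calling `file` it compares the first four bytes of each object against the LLVM bitcode magic (`BC` 0xC0 0xDE), the start of textual LLVM IR (`; Mo`) and the ELF magic.

```shell
#!/bin/sh
# classify_obj and sort_objects are hypothetical helper names.

# Print "llvm", "elf" or "other" based on the first four bytes of a file.
classify_obj() {
  magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
  case "$magic" in
    4243c0de) echo llvm ;;    # 'B' 'C' 0xC0 0xDE: LLVM bitcode
    3b204d6f) echo llvm ;;    # "; Mo": textual LLVM IR ("; ModuleID ...")
    7f454c46) echo elf ;;     # 0x7F 'E' 'L' 'F'
    *)        echo other ;;
  esac
}

# Write every given file name to bitcode.list or elf.list
# in the current directory; anything else is skipped.
sort_objects() {
  : > bitcode.list
  : > elf.list
  for obj in "$@"; do
    case "$(classify_obj "$obj")" in
      llvm) echo "$obj" >> bitcode.list ;;
      elf)  echo "$obj" >> elf.list ;;
    esac
  done
}

# Usage: ./sort-objects.sh archive*.dir/*.o
if [ "$#" -gt 0 ]; then sort_objects "$@"; fi
```

Assuming no path contains whitespace, `bitcode.list` can then feed step 5, e.g. `llvm-link -o all-bitcode.bc $(cat bitcode.list)`, while `elf.list` holds the files for step 9 c).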
### compiling programs still fail
afl-clang-lto is still work in progress.


@ -291,11 +291,11 @@ static void edit_params(u32 argc, char** argv) {
cc_params[cc_par_cnt++] = AFL_PATH;
cc_params[cc_par_cnt++] = lto_flag;
} else
if (getenv("USE_TRACE_PC") || getenv("AFL_USE_TRACE_PC") ||
getenv("AFL_LLVM_USE_TRACE_PC") || getenv("AFL_TRACE_PC")) {
if (getenv("USE_TRACE_PC") || getenv("AFL_USE_TRACE_PC") ||
getenv("AFL_LLVM_USE_TRACE_PC") || getenv("AFL_TRACE_PC")) {
cc_params[cc_par_cnt++] =
"-fsanitize-coverage=trace-pc-guard"; // edge coverage by default
@ -533,7 +533,7 @@ int main(int argc, char** argv, char** envp) {
}
if (strstr(argv[0], "afl-clang-lto") == NULL) callname = "afl-clang-lto";
if (strstr(argv[0], "afl-clang-lto") != NULL) callname = "afl-clang-lto";
if (argc < 2 || strcmp(argv[1], "-h") == 0) {
@ -554,55 +554,66 @@ int main(int argc, char** argv, char** envp) {
#endif /* ^USE_TRACE_PC */
SAYF(
"\n"
"%s[++] [options]\n"
"\n"
"This is a helper application for afl-fuzz. It serves as a drop-in "
"replacement\n"
"for clang, letting you recompile third-party code with the required "
"runtime\n"
"instrumentation. A common use pattern would be one of the "
"following:\n\n"
SAYF(
"\n"
"%s[++] [options]\n"
"\n"
"This is a helper application for afl-fuzz. It serves as a drop-in "
"replacement\n"
"for clang, letting you recompile third-party code with the "
"required "
"runtime\n"
"instrumentation. A common use pattern would be one of the "
"following:\n\n"
" CC=%s/afl-clang-fast ./configure\n"
" CXX=%s/afl-clang-fast++ ./configure\n\n"
" CC=%s/afl-clang-fast ./configure\n"
" CXX=%s/afl-clang-fast++ ./configure\n\n"
"In contrast to the traditional afl-clang tool, this version is "
"implemented as\n"
"an LLVM pass and tends to offer improved performance with slow "
"programs.\n\n"
"In contrast to the traditional afl-clang tool, this version is "
"implemented as\n"
"an LLVM pass and tends to offer improved performance with slow "
"programs.\n\n"
"Environment variables used:\n"
"AFL_CC: path to the C compiler to use\n"
"AFL_CXX: path to the C++ compiler to use\n"
"AFL_PATH: path to instrumenting pass and runtime (afl-llvm-rt.*o)\n"
"AFL_DONT_OPTIMIZE: disable optimization instead of -O3\n"
"AFL_NO_BUILTIN: compile for use with libtokencap.so\n"
"AFL_INST_RATIO: percentage of branches to instrument\n"
"AFL_QUIET: suppress verbose output\n"
"AFL_DEBUG: enable developer debugging output\n"
"AFL_HARDEN: adds code hardening to catch memory bugs\n"
"AFL_USE_ASAN: activate address sanitizer\n"
"AFL_USE_MSAN: activate memory sanitizer\n"
"AFL_USE_UBSAN: activate undefined behaviour sanitizer\n"
"AFL_LLVM_WHITELIST: enable whitelisting (selective instrumentation)\n"
"AFL_LLVM_NOT_ZERO: use cycling trace counters that skip zero\n"
"AFL_LLVM_USE_TRACE_PC: use LLVM trace-pc-guard instrumentation\n"
"AFL_LLVM_LAF_SPLIT_COMPARES: enable cascaded comparisons\n"
"AFL_LLVM_LAF_SPLIT_SWITCHES: casc. comp. in 'switch'\n"
"AFL_LLVM_LAF_TRANSFORM_COMPARES: transform library comparison "
"function calls\n"
" to cascaded comparisons\n"
"AFL_LLVM_LAF_SPLIT_FLOATS: transform floating point comp. to cascaded "
"comp.\n"
"AFL_LLVM_LAF_SPLIT_COMPARES_BITW: size limit (default 8)\n"
"AFL_LLVM_INSTRIM: use light weight instrumentation InsTrim\n"
"AFL_LLVM_INSTRIM_LOOPHEAD: optimize loop tracing for speed\n"
"AFL_LLVM_CMPLOG: log operands of comparisons (RedQueen mutator)\n"
"\nafl-clang-fast was built for llvm %s with the llvm binary path of "
"\"%s\".\n\n",
callname, BIN_PATH, BIN_PATH, LLVM_VERSION, LLVM_BINDIR);
"Environment variables used:\n"
"AFL_CC: path to the C compiler to use\n"
"AFL_CXX: path to the C++ compiler to use\n"
"AFL_PATH: path to instrumenting pass and runtime "
"(afl-llvm-rt.*o)\n"
"AFL_DONT_OPTIMIZE: disable optimization instead of -O3\n"
"AFL_NO_BUILTIN: compile for use with libtokencap.so\n"
"AFL_INST_RATIO: percentage of branches to instrument\n"
"AFL_QUIET: suppress verbose output\n"
"AFL_DEBUG: enable developer debugging output\n"
"AFL_HARDEN: adds code hardening to catch memory bugs\n"
"AFL_USE_ASAN: activate address sanitizer\n"
"AFL_USE_MSAN: activate memory sanitizer\n"
"AFL_USE_UBSAN: activate undefined behaviour sanitizer\n"
"AFL_LLVM_WHITELIST: enable whitelisting (selective "
"instrumentation)\n"
"AFL_LLVM_NOT_ZERO: use cycling trace counters that skip zero\n"
"AFL_LLVM_USE_TRACE_PC: use LLVM trace-pc-guard instrumentation\n"
"AFL_LLVM_LAF_SPLIT_COMPARES: enable cascaded comparisons\n"
"AFL_LLVM_LAF_SPLIT_SWITCHES: casc. comp. in 'switch'\n"
"AFL_LLVM_LAF_TRANSFORM_COMPARES: transform library comparison "
"function calls\n"
" to cascaded comparisons\n"
"AFL_LLVM_LAF_SPLIT_FLOATS: transform floating point comp. to "
"cascaded "
"comp.\n"
"AFL_LLVM_LAF_SPLIT_COMPARES_BITW: size limit (default 8)\n"
"AFL_LLVM_INSTRIM: use light weight instrumentation InsTrim\n"
"AFL_LLVM_INSTRIM_LOOPHEAD: optimize loop tracing for speed\n"
"AFL_LLVM_CMPLOG: log operands of comparisons (RedQueen mutator)\n"
"\nafl-clang-fast was built for llvm %s with the llvm binary path "
"of "
"\"%s\".\n",
callname, BIN_PATH, BIN_PATH, LLVM_VERSION, LLVM_BINDIR);
if (strcmp(callname, "afl-clang-lto") == 0)
SAYF("Compiled with linker target \"%s\" and LTO flags \"%s\"\n",
AFL_REAL_LD, AFL_CLANG_FLTO);
SAYF("\n");
exit(1);
@ -665,3 +676,4 @@ int main(int argc, char** argv, char** envp) {
return 0;
}


@ -44,6 +44,8 @@
#include <dirent.h>
#define MAX_PARAM_COUNT 4096
static u8 **ld_params, /* Parameters passed to the real 'ld' */
**link_params, /* Parameters passed to 'llvm-link' */
**opt_params, /* Parameters passed to 'opt' */
@ -145,15 +147,21 @@ int is_llvm_file(const char* file) {
int fd;
u8 buf[5];
if ((fd = open(file, O_RDONLY)) < 0) return 0;
if ((fd = open(file, O_RDONLY)) < 0) {
if (read(fd, buf, sizeof(buf)) != sizeof(buf)) return 0;
if (debug) SAYF(cMGN "[D] " cRST "File %s not found", file);
return 0;
}
if (read(fd, buf, 4) != 4) return 0;
buf[sizeof(buf) - 1] = 0;
close(fd);
if (strncmp(buf, "; Mo", 4) == 0) return 1;
if (buf[0] == 'B' && buf[1] == 'C' && buf[2] == 0xC0 && buf[3] == 0xDE)
if (buf[0] == 'B' && buf[1] == 'C' && buf[2] == 0xc0 && buf[3] == 0xde)
return 1;
return 0;
@ -186,7 +194,7 @@ int is_duplicate(u8** params, u32 ld_param_cnt, u8* ar_file) {
static void edit_params(int argc, char** argv) {
u32 i, have_lto = 0, libdir_index;
u8 libdir_file[4096];
u8 libdir_file[4096];
if (tmp_dir == NULL) {
@ -204,13 +212,13 @@ static void edit_params(int argc, char** argv) {
final_file =
alloc_printf("%s/.afl-%u-%u-3.bc", tmp_dir, getpid(), (u32)time(NULL));
ld_params = ck_alloc((argc + 4096) * sizeof(u8*));
link_params = ck_alloc((argc + 4096) * sizeof(u8*));
ld_params = ck_alloc(4096 * sizeof(u8*));
link_params = ck_alloc(4096 * sizeof(u8*));
inst_params = ck_alloc(12 * sizeof(u8*));
opt_params = ck_alloc(12 * sizeof(u8*));
ld_params[0] = (u8*)real_ld;
ld_params[argc] = 0;
ld_params[ld_param_cnt++] = "--allow-multiple-definition";
link_params[0] = alloc_printf("%s/%s", LLVM_BINDIR, "llvm-link");
link_params[link_param_cnt++] = "-S"; // we create the linked file as .ll
@ -224,6 +232,7 @@ static void edit_params(int argc, char** argv) {
opt_params[opt_param_cnt++] = "--polly";
} else
opt_params[opt_param_cnt++] = "-O0";
// opt_params[opt_param_cnt++] = "-S"; // only when debugging
opt_params[opt_param_cnt++] = linked_file; // input: .ll file
@ -243,11 +252,16 @@ static void edit_params(int argc, char** argv) {
// first we must collect all library search paths
for (i = 1; i < argc; i++)
if (strlen(argv[i]) > 2 && argv[i][0] == '-' && argv[i][1] == 'L')
libdirs[libdir_cnt++] = argv[i] + 2;
libdirs[libdir_cnt++] = argv[i] + 2;
// then we inspect all options to the target linker
for (i = 1; i < argc; i++) {
if (ld_param_cnt >= MAX_PARAM_COUNT || link_param_cnt >= MAX_PARAM_COUNT)
FATAL(
"Too many command line parameters because of unpacking .a archives, "
"this would need to be done by hand ... sorry! :-(");
if (strncmp(argv[i], "-flto", 5) == 0) have_lto = 1;
if (!strcmp(argv[i], "-version")) {
@ -266,23 +280,26 @@ static void edit_params(int argc, char** argv) {
exit(0);
}
// if a -l library is linked and no .so is found but an .a archive is there
// then the archive will be used. So we have to emulate this and check
// if an archive will be used and if yes we will instrument it too
libdir_file[0] = 0;
libdir_index = libdir_cnt;
if (strncmp(argv[i], "-l", 2) == 0 && libdir_cnt > 0 && strncmp(argv[i], "-lgcc", 5) != 0) {
if (strncmp(argv[i], "-l", 2) == 0 && libdir_cnt > 0 &&
strncmp(argv[i], "-lgcc", 5) != 0) {
u8 found = 0;
for (uint32_t j = 0; j < libdir_cnt && !found; j++) {
snprintf(libdir_file, sizeof(libdir_file), "%s/lib%s%s", libdirs[j], argv[i] + 2, ".so");
if (access(libdir_file, R_OK) != 0) { // no .so found?
snprintf(libdir_file, sizeof(libdir_file), "%s/lib%s%s", libdirs[j],
argv[i] + 2, ".so");
if (access(libdir_file, R_OK) != 0) { // no .so found?
snprintf(libdir_file, sizeof(libdir_file), "%s/lib%s%s", libdirs[j], argv[i] + 2, ".a");
if (access(libdir_file, R_OK) == 0) { // but .a found?
snprintf(libdir_file, sizeof(libdir_file), "%s/lib%s%s", libdirs[j],
argv[i] + 2, ".a");
if (access(libdir_file, R_OK) == 0) { // but .a found?
libdir_index = j;
found = 1;
@ -294,16 +311,18 @@ static void edit_params(int argc, char** argv) {
found = 1;
if (debug) SAYF(cMGN "[D] " cRST "Found %s\n", libdir_file);
}
}
}
// is the parameter an .a AR archive? If so, unpack and check its files
if (libdir_index < libdir_cnt || (argv[i][0] != '-' && strlen(argv[i]) > 2 &&
argv[i][strlen(argv[i]) - 1] == 'a' && argv[i][strlen(argv[i]) - 2] == '.')) {
if (libdir_index < libdir_cnt ||
(argv[i][0] != '-' && strlen(argv[i]) > 2 &&
argv[i][strlen(argv[i]) - 1] == 'a' &&
argv[i][strlen(argv[i]) - 2] == '.')) {
// This gets a bit odd. I encountered several .a files being linked and
// where the same "foo.o" was in both .a archives. llvm-link does not
@ -317,8 +336,7 @@ static void edit_params(int argc, char** argv) {
DIR* arx;
struct dirent* dir_ent;
if (libdir_index < libdir_cnt)
file = libdir_file;
if (libdir_index < libdir_cnt) file = libdir_file;
if (ar_dir_cnt == 0) { // first archive, we setup up the basics
@ -376,7 +394,7 @@ static void edit_params(int argc, char** argv) {
if (dir_ent->d_name[strlen(dir_ent->d_name) - 1] == 'o' &&
dir_ent->d_name[strlen(dir_ent->d_name) - 2] == '.') {
if (passthrough || argv[i][0] == '-' || is_llvm_file(ar_file) == 0) {
if (passthrough || is_llvm_file(ar_file) == 0) {
if (is_duplicate(ld_params, ld_param_cnt, ar_file) == 0) {
@ -428,7 +446,7 @@ static void edit_params(int argc, char** argv) {
ld_params[ld_param_cnt++] = "-plugin-opt=O2";
else
ld_params[ld_param_cnt++] = argv[i];
} else {
if (we_link == 0) { // we have to honor order ...
@ -618,7 +636,11 @@ int main(int argc, char** argv) {
edit_params(argc, argv); // here most of the magic happens :-)
if (debug) SAYF(cMGN "[D] " cRST "param counts: ar:%u lib:%u ld:%u link:%u opt:%u instr:%u\n", ar_dir_cnt, libdir_cnt, ld_param_cnt, link_param_cnt, opt_param_cnt, inst_param_cnt);
if (debug)
SAYF(cMGN "[D] " cRST
"param counts: ar:%u lib:%u ld:%u link:%u opt:%u instr:%u\n",
ar_dir_cnt, libdir_cnt, ld_param_cnt, link_param_cnt, opt_param_cnt,
inst_param_cnt);
if (!just_version) {
@ -650,15 +672,25 @@ int main(int argc, char** argv) {
if (pid < 0) PFATAL("fork() failed");
if (waitpid(pid, &status, 0) <= 0) PFATAL("waitpid() failed");
if (WEXITSTATUS(status) != 0) {
SAYF(bSTOP RESET_G1 CURSOR_SHOW cRST cLRD \
"\n[-] PROGRAM ABORT : " cRST);
SAYF( "llvm-link failed, if this is because of a \"linking globals\n"
" named '...': symbol multiply defined\" error then there is nothing we can do -\n"
"llvm-link is missing an important feature :-(\n\n");
if (WEXITSTATUS(status) != 0) {
SAYF(bSTOP RESET_G1 CURSOR_SHOW cRST cLRD
"\n[-] PROGRAM ABORT : " cRST);
SAYF(
"llvm-link failed! Probable causes:\n\n"
" #1 If the error is \"linking globals named '...': symbol "
"multiply defined\"\n"
" then there is nothing we can do - llvm-link is missing an "
"important feature\n\n"
" #2 If the error is \"expected top-level entity\" and then "
"binary output, this\n"
" is because the same file is present in different .a archives "
"in different\n"
" formats. This can be fixed by manually doing the steps afl-ld "
"is doing but\n"
" not programmatically - sorry!\n\n");
exit(WEXITSTATUS(status));
}
/* then we perform an optimization on the collected objects files */


@ -99,6 +99,7 @@ class AFLLTOPass : public ModulePass {
AU.addRequired<LoopInfoWrapperPass>();
}
#endif
// Calculate the number of average collisions that would occur if all
@ -120,29 +121,30 @@ class AFLLTOPass : public ModulePass {
// Get the internal llvm name of a basic block
// This is an ugly debug support so it is commented out :-)
/*
static char *getBBName(const BasicBlock *BB) {
/*
static char *getBBName(const BasicBlock *BB) {
static char *name;
static char *name;
if (!BB->getName().empty()) {
name = strdup(BB->getName().str().c_str());
return name;
}
std::string Str;
raw_string_ostream OS(Str);
BB->printAsOperand(OS, false);
name = strdup(OS.str().c_str());
if (!BB->getName().empty()) {
name = strdup(BB->getName().str().c_str());
return name;
}
std::string Str;
raw_string_ostream OS(Str);
BB->printAsOperand(OS, false);
name = strdup(OS.str().c_str());
return name;
}
*/
*/
static bool isBlacklisted(const Function *F) {
@ -236,8 +238,7 @@ bool AFLLTOPass::runOnModule(Module &M) {
for (succ_iterator SI = succ_begin(&BB), SE = succ_end(&BB); SI != SE;
++SI)
if ((*SI)->size() > 0)
succ++;
if ((*SI)->size() > 0) succ++;
if (succ < 2) // no need to instrument
continue;
@ -249,13 +250,13 @@ bool AFLLTOPass::runOnModule(Module &M) {
if (InsBlocks.size() > 0) {
uint32_t i = InsBlocks.size();
do {
--i;
BasicBlock *origBB = &(*InsBlocks[i]);
BasicBlock * origBB = &(*InsBlocks[i]);
std::vector<BasicBlock *> Successors;
Instruction *TI = origBB->getTerminator();
Instruction * TI = origBB->getTerminator();
for (succ_iterator SI = succ_begin(origBB), SE = succ_end(origBB);
SI != SE; ++SI) {
@ -267,7 +268,7 @@ bool AFLLTOPass::runOnModule(Module &M) {
if (TI == NULL || TI->getNumSuccessors() < 2) continue;
//if (Successors.size() != TI->getNumSuccessors())
// if (Successors.size() != TI->getNumSuccessors())
// FATAL("Different successor numbers %lu <-> %u\n", Successors.size(),
// TI->getNumSuccessors());
@ -396,7 +397,8 @@ bool AFLLTOPass::runOnModule(Module &M) {
getenv("AFL_USE_ASAN") ? ", ASAN" : "",
getenv("AFL_USE_MSAN") ? ", MSAN" : "",
getenv("AFL_USE_UBSAN") ? ", UBSAN" : "");
OKF("Instrumented %u locations with no collisions (on average %llu collisions would be in afl-gcc/afl-clang-fast) (%s mode).",
OKF("Instrumented %u locations with no collisions (on average %llu "
"collisions would be in afl-gcc/afl-clang-fast) (%s mode).",
inst_blocks, calculateCollisions(inst_blocks), modeline);
}