
571 Commits

Author SHA1 Message Date
37f06bb324 handle libfuzzer fuzzing non-zero exits better (#381)
When running libfuzzer in 'fuzzing' mode, we expect the following on exit.

If the exit code is zero, crashing input isn't required.  This happens if the user specifies '-runs=N'

If the exit code is non-zero, then crashes are expected.  In practice, there are two causes of non-zero exits without a recorded crash.
1. If the binary can't execute for some reason, like a missing prerequisite
2. If the binary _can_ execute, sometimes the sanitizers are left in such a bad state that they are unable to record the input that caused the crash.

This PR enables handling these two non-zero exit cases.

1. Optionally verify the libfuzzer target loads appropriately using `target_exe -help=1`.  This allows failing faster on common issues, such as a missing prerequisite library.
2. Optionally allow non-zero exits without crashes to be a warning, rather than a task failure.
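
The exit handling described above can be sketched as a small decision function. This is a minimal illustration only, not the code from the PR: the names `ExitOutcome`, `classify_fuzzer_exit`, and `allow_nonzero_without_crash` are hypothetical, and the sketch assumes the caller has already counted the crashing inputs the fuzzer recorded.

```python
from enum import Enum


class ExitOutcome(Enum):
    """Hypothetical outcomes for a libfuzzer run in 'fuzzing' mode."""

    OK = "ok"
    CRASH_FOUND = "crash_found"
    WARNING = "warning"
    FAILURE = "failure"


def classify_fuzzer_exit(
    exit_code: int, crash_count: int, allow_nonzero_without_crash: bool
) -> ExitOutcome:
    """Map an exit code and the number of recorded crashes to an outcome."""
    if exit_code == 0:
        # A zero exit (for example after '-runs=N') needs no crashing input.
        return ExitOutcome.OK
    if crash_count > 0:
        # Non-zero exit with a recorded crashing input is the expected case.
        return ExitOutcome.CRASH_FOUND
    if allow_nonzero_without_crash:
        # Opt-in behavior: report a warning instead of failing the task.
        return ExitOutcome.WARNING
    return ExitOutcome.FAILURE
```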
2021-01-05 14:40:15 +00:00
75d2ffd7f4 lint test utils (#395) 2021-01-05 08:50:52 -05:00
014cb5bcfd Re-adds POST for node endpoint (#412)
Re-adds the POST method for the `node` endpoint, which got accidentally dropped.
2021-01-05 10:49:20 +00:00
4d9abe936b increase function timeout to 15 minutes (#384) 2021-01-04 20:55:15 -05:00
365722c5fa upgrade AFL++ to 3.00b (#393)
Update the version of AFL++ provided in OneFuzz to 3.00b, which was released yesterday.
2021-01-05 00:42:52 +00:00
e51d7affb7 Fixes race condition of a libfuzzer coverage without inputs (#403)
This fixes an issue where a libfuzzer coverage task fails if it doesn't have any initial seeds (or no seeds have been found by the fuzzer by the time the task starts).
2021-01-05 00:05:13 +00:00
ce32981b1b address clippy issues in proxy-manager (#410) 2021-01-04 22:33:42 +00:00
1b1af1f84f log stdout & stderr lines for supervisor & generator (#400)
This fixes #371 and #372.
2021-01-04 21:53:49 +00:00
f8f7e28aa2 add 'onefuzz debug log tail' (#401)
Adds `onefuzz debug log tail <keyword>`, which runs the same query as `onefuzz debug log keyword <keyword>` in a loop.

Optimizations:
* Returns only N records at a time (default 1000)
* Each query returns only records that occur after the latest record received.
* If no results are returned, waits 10s before retrying
* Increases the wait time by 1.5x until the wait time is larger than 60s

Using `--filter` provides the ability to filter each record that comes back via jmespath.
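
The retry delays listed under Optimizations can be sketched as a generator. This is one reading of the list, not the CLI's actual code: the name `tail_wait_times` and its parameters are hypothetical, and the sketch assumes the delay stops growing once it exceeds 60s. Whether the delay resets after results arrive is not stated above, so it is not modeled here.

```python
from itertools import islice
from typing import Iterator


def tail_wait_times(
    initial: float = 10.0, factor: float = 1.5, ceiling: float = 60.0
) -> Iterator[float]:
    """Yield the wait (in seconds) before each retry while no results arrive."""
    wait = initial
    while True:
        yield wait
        # Keep growing the wait until it is larger than the ceiling.
        if wait <= ceiling:
            wait *= factor


# First few waits under these assumptions.
print(list(islice(tail_wait_times(), 7)))
```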

Example uses:

Monitor any log messages (ignoring metrics) for a given job_id GUID:
```
onefuzz debug logs tail bf4efdfd-685c-444a-81c5-d911477433ae --filter message
```

Log the job_id and task_id for each new unique report:
```
onefuzz debug logs tail new_unique_report --filter '[customDimensions.job_id, customDimensions.task_id]'
```

Log the job_id and task_id for each new unique report only for the specific job_id:
```
onefuzz debug logs tail "new_unique_report d5bcd4d2-4dab-49d5-a215-66db94fb0309" --filter '[customDimensions.job_id, customDimensions.task_id]'
```
2021-01-04 21:08:27 +00:00
29c7cfbd5d filter out deleted nodes to prevent them from being saved later (#391)
In `Scaleset.cleanup_nodes`, nodes that are no longer part of the scaleset should get deleted.  Without filtering the list, the nodes could get re-saved to the Node table later on.
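
A minimal sketch of the filtering idea, assuming nodes and scaleset VMs can be compared by ID. The function `split_nodes` and its arguments are hypothetical and are not the actual `Scaleset.cleanup_nodes` implementation.

```python
from typing import Iterable, List, Set, Tuple


def split_nodes(
    known_node_ids: Iterable[str], scaleset_vm_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (remaining, to_delete) so later steps only touch remaining nodes."""
    remaining: List[str] = []
    to_delete: List[str] = []
    for node_id in known_node_ids:
        if node_id in scaleset_vm_ids:
            remaining.append(node_id)
        else:
            # No longer part of the scaleset: delete it and drop it from the
            # working list so it cannot be re-saved afterwards.
            to_delete.append(node_id)
    return remaining, to_delete
```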
2021-01-04 20:28:57 +00:00
4c2679d61e Re-add windows ssh key (#390)
Adds a scaleset specific setup script, which allows us to save the scaleset based SSH keys into the VM on setup.
2021-01-04 19:52:27 +00:00
3441790322 add delayed start to heartbeats (#387)
Adds a random initial jitter of up to one heartbeat period to prevent heartbeats from storming the service when we launch 3000 nodes at roughly the same time.
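
The delayed start can be illustrated as follows. This is a sketch in Python, not the agent's own code: `initial_heartbeat_delay` is a hypothetical name, and a uniform distribution over the heartbeat period is an assumption.

```python
import random
from typing import Optional


def initial_heartbeat_delay(
    period_seconds: float, rng: Optional[random.Random] = None
) -> float:
    """Pick a random delay in [0, period_seconds] before the first heartbeat."""
    rng = rng or random.Random()
    # Spreading the first heartbeat over one full period avoids every node
    # reporting at the same instant after a mass launch.
    return rng.uniform(0.0, period_seconds)
```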

Fixes #386
2021-01-04 18:50:02 +00:00
d038cca1e1 Verify a workset only exists along with a reboot context (#378)
Adds the following:

1. Serializes a workset to disk during setup.
2. Upon deserializing a RebootContext, delete the file from disk (We support rebooting once and only once)
3. Check if a workset exists with a RebootContext
    1. If True, continue processing
    2. If False, mark the tasks & node as "Done" with appropriate errors via:
        1. send WorkerEvent::Done events for each of the tasks in the work set
        2. send StateUpdateEvent::Done for the node
2021-01-04 17:51:20 +00:00
36b3e2a5aa disable py-cache prior to mypy on cli (#408) 2021-01-04 11:49:28 -05:00
e222b01003 update rust prereqs (#396) 2020-12-16 07:38:37 -05:00
6dc7b78447 support ASAN odr-violation outputs (#380) 2020-12-10 15:48:15 -05:00
7f5673eb21 handle non-utf8 from libfuzzer stderr (#379) 2020-12-10 15:13:14 -05:00
56090cb01d Demonstrate a more complex template management (#366)
Add a job_template example that demonstrates customization of the arguments to the job. 

This example demonstrates setting the Area and Iteration paths for Azure DevOps work items.
2020-12-05 12:30:37 +00:00
69fc9f508b fix clippy issue (#367) 2020-12-04 15:04:29 -05:00
f1b4efc5ff Add troubleshooting guide for the registration issue at deployment (#362) 2020-12-02 18:54:29 -05:00
1d49f27961 Release 1.10.0 (#365) 1.10.0 2020-12-02 17:48:27 -05:00
203bc22756 Allow unmaintained memmap (#364) 2020-12-02 15:34:22 -05:00
fd131c63bf Document managing declarative templates (#361) 2020-12-02 14:18:45 -05:00
b81c6fa89e fix job_templates deletion (#360) 2020-12-02 14:02:16 -05:00
054989f232 Add support for ASAN print_scariness (#359) 2020-12-02 11:33:22 -05:00
e6b55ab95a Simplify job template management workflow (#354)
1. Merge 'create' and 'update' to a single 'save' operation.
2. Allow fetching a single template.

This enables the following workflow:

```
$ onefuzz job_templates manage get libfuzzer_linux > template.json
$ <... update template as desired ...>
$ onefuzz job_templates manage save libfuzzer_linux @./template.json
$
```
2020-12-02 14:27:42 +00:00
9b3ccf37ea use the correct instrumentation key (#355) 2020-12-01 18:44:10 -05:00
0182dc597d handle asan check failures (#358) 2020-12-01 18:23:26 -05:00
fc34725428 update rust prereqs (#357) 2020-12-01 17:22:32 -05:00
aef511efe8 Fail the task if parsing asan_log files fails (#351)
This differentiates ASAN log parse failures from ASAN logs not existing, fixing the first part of #343.
2020-12-01 21:10:59 +00:00
7f97c142ed add the instrumentation key to Info (#353) 2020-12-01 11:13:06 -05:00
3f3193beeb Use disable_check_debugger on asan integration tests (#352) 2020-12-01 10:36:53 -05:00
a1af90cb83 Update deployment prerequisites to remove pyopenssl errors (#348)
Over the weekend, pyOpenSSL 20.0 was released.  This causes a library incompatibility during deployment.

Prior to this change, deployment would generate the following error:
```
ERROR: pyopenssl 20.0.0 has requirement cryptography>=3.2, but you'll have cryptography 2.9.2 which is incompatible.
```
2020-12-01 14:43:53 +00:00
5092f96af4 Fix deployment of backdated versions of OneFuzz (#347)
When running automated deployments, 'tools' were not being properly replaced with the updated versions if the deployment was created _prior_ to the original instance deployment.
2020-12-01 10:59:43 +00:00
37e3251966 render the event model as json to not include error (#350) 2020-11-30 23:19:27 -05:00
30cc5d4778 ignore nodes already scheduled for re-imaging in outdated check (#341)
If a node is already scheduled to be reimaged/deleted, we should not bother checking if it's outdated.
2020-11-30 17:36:15 +00:00
2391d927f7 Updating yml file to run config endpoint command with tenant/authority ID. (#339)
## Summary of the Pull Request

Originally, the yml file printed out a semi-generalized _onefuzz config --endpoint_ command. This command did not specify _--authority_, so it used the Microsoft tenant ID by default. To enable users to work with OneFuzz on tenants other than the standard Microsoft tenant, we have added an _--authority_ parameter to the command that is printed out at the end of the deployment.

## PR Checklist
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Info on Pull Request

Changes to the yml file. 

## Validation Steps Performed

We have made this change to our local automation repository and tested an automated deployment pipeline with this change.
2020-11-30 14:54:42 +00:00
079f387b88 clarify prefix-expansion errors (#342) 2020-11-24 11:51:03 -05:00
33b7608aaf Adding option to merge all inputs at once (#282) 2020-11-24 08:43:08 -05:00
79cc82098a Move integration test artifacts into primary source tree (#336) 2020-11-24 08:03:01 -05:00
905dc7c0d6 Re-enable the retry logic for App Password creation (#338) 2020-11-24 08:00:31 -05:00
d47124fe8c Fix state management in the scheduler (#337) 2020-11-24 12:43:51 +00:00
32ba86be9d Update current_thread_id when setting current thread (#340) 2020-11-23 13:39:03 -08:00
2e276de0f5 Release 1.9.0 (#335) 1.9.0 2020-11-20 16:01:28 -05:00
3ddb756504 Add linting to deployment tools (#332) 2020-11-20 13:00:19 -05:00
9e2a61fe66 Add user_info to Jobs & Repro (#327)
This adds information about the user that created a job or repro VM to the respective resources.

This expands on the addition made to tasks in #303.
2020-11-20 15:46:52 +00:00
d96209c659 Include the body when receiving a registration error (#321)
* Include the body when receiving a registration error to help debug issues like #215
* Increase the agent registration timeout to 20 min
2020-11-20 14:43:40 +00:00
7e3b807479 Support pre-release instance specific setup script paths (#331)
Support `instance-specific-setup/<OS>/setup` and `instance-specific-setup/setup` scripts.

Fixes #328
2020-11-20 12:42:58 +00:00
3974d680ef Support retry during function deploy (#330)
Starting earlier today, I saw roughly 1 in 3 deployments fail with the error `Azure.Functions.Cli.Common.CliException: Timed out waiting for SCM to update the Environment Settings`.  Redeploying the application resolves the issue.  New builds and past releases alike hit this exception.

According to https://github.com/Azure/azure-functions-core-tools/issues/1863, function app deployments may fail due to timeouts related to cold-start.

This PR executes the deploy in a loop with a delay in the case of failure.
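
A minimal sketch of such a retry loop, assuming the deploy step is a callable that raises on failure. The name `deploy_with_retry`, the attempt count, and the delay are hypothetical values chosen for illustration, not the ones used in the PR.

```python
import time
from typing import Callable


def deploy_with_retry(
    deploy: Callable[[], None],
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run deploy() until it succeeds; return the attempt number that worked."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            deploy()
            return attempt
        except Exception:
            if attempt == attempts:
                # Out of retries: surface the last error to the caller.
                raise
            # Give the service time to recover before trying again.
            sleep(delay_seconds)
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.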
2020-11-19 20:04:18 +00:00
31a661f071 Expose coverage/exec_sec for libfuzzer targets via CLI (#325)
Adds debug subcommands to the SDK/CLI that simplify querying Application Insights for libfuzzer telemetry.  

Querying for the latest execs_sec for a job, by job_id fragment.
```
$ onefuzz debug job libfuzzer_execs_sec 88 --limit 1
[
    {
        "execs_sec": "191035",
        "machine_id": "b2dbe720-4fd8-4342-957a-6cb0979d2187",
        "timestamp": "2020-11-18T00:08:53.98Z",
        "worker_id": "0"
    }
]
```

Querying for the latest coverage for a job, by job_id fragment.
```
$ onefuzz debug job libfuzzer_coverage 88 --limit 1
[
    {
        "covered": "10",
        "features": "21",
        "rate": "0.47619047619047616",
        "timestamp": "2020-11-18T00:09:40.793Z"
    }
]
```
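
In the sample output above, `rate` matches `covered / features` (10 / 21). Assuming that relationship holds in general, which the commit message does not state, the value can be recomputed as in this sketch. The helper `coverage_rate` is hypothetical, and the fields are parsed from strings because the CLI output quotes them.

```python
def coverage_rate(covered: str, features: str) -> float:
    """Recompute the rate from the string fields shown in the CLI output."""
    covered_count = int(covered)
    feature_count = int(features)
    if feature_count == 0:
        # Avoid dividing by zero when no features have been reported yet.
        return 0.0
    return covered_count / feature_count


print(coverage_rate("10", "21"))
```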
2020-11-19 15:14:37 +00:00