262 Commits

Author SHA1 Message Date
29c7cfbd5d filter out deleted nodes as to prevent them from being saved later (#391)
In `Scaleset.cleanup_nodes`, nodes that are no longer part of the scaleset should get deleted.  Without filtering the list, the nodes could get re-saved to the Node table later on.
2021-01-04 20:28:57 +00:00
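A minimal sketch of the filtering idea in #391, assuming a simplified `Node` record and a hypothetical `split_nodes` helper (neither is taken from the OneFuzz source): nodes whose machine is no longer in the scaleset are separated out for deletion, so that only the remaining nodes can reach any later save step.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a Node table entry.
    machine_id: str


def split_nodes(
    nodes: List[Node], scaleset_machine_ids: Set[str]
) -> Tuple[List[Node], List[Node]]:
    """Return (kept, to_delete) for the given scaleset membership."""
    kept = [n for n in nodes if n.machine_id in scaleset_machine_ids]
    to_delete = [n for n in nodes if n.machine_id not in scaleset_machine_ids]
    return kept, to_delete
```

Only the `kept` list would be passed on to code that may save nodes; the `to_delete` list is handled separately and then dropped.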
4c2679d61e Re-add windows ssh key (#390)
Adds a scaleset specific setup script, which allows us to save the scaleset based SSH keys into the VM on setup.
2021-01-04 19:52:27 +00:00
3441790322 add delayed start to heartbeats (#387)
Adds a random initial delay of up to one heartbeat period, which prevents heartbeats from storming the service when we launch 3000 nodes at roughly the same time.

Fixes #386
2021-01-04 18:50:02 +00:00
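The delayed start from #387 can be sketched as follows. This is an illustration in Python rather than the agent's actual code; `initial_heartbeat_delay` and its parameters are hypothetical names, and a uniform distribution is an assumption.

```python
import random
from typing import Optional


def initial_heartbeat_delay(
    period_seconds: float, rng: Optional[random.Random] = None
) -> float:
    """Pick a random delay in [0, period_seconds] before the first heartbeat.

    Spreading first heartbeats over one full period keeps nodes that start
    together from reporting in at the same instant.
    """
    rng = rng or random.Random()
    return rng.uniform(0.0, period_seconds)
```

Each node would sleep for this delay once, then send heartbeats at the regular period.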
d038cca1e1 Verify a workset only exists along with a reboot context (#378)
Adds the following:

1. Serializes a workset to disk during setup.
2. Upon deserializing a RebootContext, delete the file from disk (We support rebooting once and only once)
3. Check if a workset exists with a RebootContext
    1. If True, continue processing
    2. If False, mark the tasks & node as "Done" with appropriate errors via:
        1. send WorkerEvent::Done events for each of the tasks in the work set
        2. send StateUpdateEvent::Done for the node
2021-01-04 17:51:20 +00:00
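The decision flow listed in #378 might look roughly like the sketch below. It is written in Python for illustration; the file names, the JSON encoding, and the returned labels are all assumptions, and the actual sending of `WorkerEvent::Done` and `StateUpdateEvent::Done` is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def take_reboot_context(path: Path) -> Optional[dict]:
    """Load the reboot context and delete it, so it is honored only once."""
    if not path.exists():
        return None
    context = json.loads(path.read_text())
    path.unlink()
    return context


def decide_after_reboot(context_path: Path, workset_path: Path) -> str:
    """Return a hypothetical label for what the agent should do next."""
    if take_reboot_context(context_path) is None:
        return "fresh-start"
    if workset_path.exists():
        return "continue"
    # A reboot context without a workset: tasks and node get marked Done.
    return "mark-done-with-errors"


def demo() -> List[str]:
    """Walk through the three cases in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        context = Path(tmp) / "reboot_context.json"
        workset = Path(tmp) / "workset.json"
        results = []
        context.write_text(json.dumps({"reason": "setup"}))
        workset.write_text(json.dumps({"tasks": []}))
        results.append(decide_after_reboot(context, workset))
        # The context file was consumed above, so a second call starts fresh.
        results.append(decide_after_reboot(context, workset))
        workset.unlink()
        context.write_text(json.dumps({"reason": "setup"}))
        results.append(decide_after_reboot(context, workset))
        return results
```

Deleting the context file as part of reading it is what enforces the "once and only once" reboot rule in this sketch.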
36b3e2a5aa disable py-cache prior to mypy on cli (#408) 2021-01-04 11:49:28 -05:00
e222b01003 update rust prereqs (#396) 2020-12-16 07:38:37 -05:00
6dc7b78447 support ASAN odr-violation outputs (#380) 2020-12-10 15:48:15 -05:00
7f5673eb21 handle non-utf8 from libfuzzer stderr (#379) 2020-12-10 15:13:14 -05:00
56090cb01d Demonstrate a more complex template management (#366)
Add a job_template example that demonstrates customization of the arguments to the job. 

This example demonstrates setting the Area and Iteration paths for Azure Devops work items.
2020-12-05 12:30:37 +00:00
69fc9f508b fix clippy issue (#367) 2020-12-04 15:04:29 -05:00
f1b4efc5ff Add troubleshooting guide for the registration issue at deployment (#362) 2020-12-02 18:54:29 -05:00
1d49f27961 Release 1.10.0 (#365) 1.10.0 2020-12-02 17:48:27 -05:00
203bc22756 Allow unmaintained memmap (#364) 2020-12-02 15:34:22 -05:00
fd131c63bf Document managing declarative templates (#361) 2020-12-02 14:18:45 -05:00
b81c6fa89e fix job_templates deletion (#360) 2020-12-02 14:02:16 -05:00
054989f232 Add support for ASAN print_scariness (#359) 2020-12-02 11:33:22 -05:00
e6b55ab95a Simplify job template management workflow (#354)
1. Merge 'create' and 'update' to a single 'save' operation.
2. Allow fetching a single template.

This enables the following workflow:

```
$ onefuzz job_templates manage get libfuzzer_linux > template.json
$ <... update template as desired ...>
$ onefuzz job_templates manage save libfuzzer_linux @./template.json
$
```
2020-12-02 14:27:42 +00:00
9b3ccf37ea use the correct instrumentation key (#355) 2020-12-01 18:44:10 -05:00
0182dc597d handle asan check failures (#358) 2020-12-01 18:23:26 -05:00
fc34725428 update rust prereqs (#357) 2020-12-01 17:22:32 -05:00
aef511efe8 Fail the task if parsing asan_log files fails (#351)
This differentiates ASAN log parse failures from ASAN logs not existing, fixing the first part of #343.
2020-12-01 21:10:59 +00:00
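The distinction drawn in #351 can be illustrated with a small Python sketch. The function and exception names are hypothetical, and matching on the `SUMMARY: AddressSanitizer:` line is a simplification chosen for this example, not the project's parser.

```python
from typing import Optional


class AsanLogParseError(Exception):
    """Raised when a log exists but cannot be understood."""


def summarize_asan_log(contents: Optional[str]) -> Optional[str]:
    """Return the ASAN summary line, or None when there is no log at all.

    A missing log (contents is None) is a normal outcome. A log that is
    present but has no summary line is treated as a failure.
    """
    if contents is None:
        return None
    for line in contents.splitlines():
        if line.startswith("SUMMARY: AddressSanitizer:"):
            return line
    raise AsanLogParseError("ASAN log present but no summary line found")
```

A caller would treat `None` as "nothing to report" and let `AsanLogParseError` fail the task.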
7f97c142ed add the instrumentation key to Info (#353) 2020-12-01 11:13:06 -05:00
3f3193beeb Use disable_check_debugger on asan integration tests (#352) 2020-12-01 10:36:53 -05:00
a1af90cb83 Update deployment prerequisites to remove pyopenssl errors (#348)
Over the weekend, pyOpenssl 20.0 was released.  This causes an incompatible library issue during deployment.

Prior to this change, deployment would generate the following error:
```
ERROR: pyopenssl 20.0.0 has requirement cryptography>=3.2, but you'll have cryptography 2.9.2 which is incompatible.
```
2020-12-01 14:43:53 +00:00
5092f96af4 Fix deployment of backdated versions of OneFuzz (#347)
When running automated deployments, 'tools' were not being properly replaced with the updated versions if the deployment was created _prior_ to the original instance deployment.
2020-12-01 10:59:43 +00:00
37e3251966 render the event model as json to not include error (#350) 2020-11-30 23:19:27 -05:00
30cc5d4778 ignore nodes already scheduled for re-imaging in outdated check (#341)
If a node is already scheduled to be reimaged/deleted, we should not bother checking if it's outdated.
2020-11-30 17:36:15 +00:00
2391d927f7 Updating yml file to run config endpoint command with tenant/authority ID. (#339)
## Summary of the Pull Request

Originally, the yml file printed out a semi-generalized _onefuzz config --endpoint_ command. This command did not have a specified _--authority_ and so it used the Microsoft id by default. To enable users to work with OneFuzz on tenants other than the standard Microsoft tenant, we have added a _--authority_ parameter that is printed out at the end of the deployment.

## PR Checklist
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Info on Pull Request

Changes to the yml file. 

## Validation Steps Performed

We have made this change to our local automation repository and tested an automated deployment pipeline with this change.
2020-11-30 14:54:42 +00:00
079f387b88 clarify prefix-expansion errors (#342) 2020-11-24 11:51:03 -05:00
33b7608aaf Adding option to merge all inputs at once (#282) 2020-11-24 08:43:08 -05:00
79cc82098a Move integration test artifacts into primary source tree (#336) 2020-11-24 08:03:01 -05:00
905dc7c0d6 Re-enable the retry logic for App Password creation (#338) 2020-11-24 08:00:31 -05:00
d47124fe8c Fix state management in the scheduler (#337) 2020-11-24 12:43:51 +00:00
32ba86be9d Update current_thread_id when setting current thread (#340) 2020-11-23 13:39:03 -08:00
2e276de0f5 Release 1.9.0 (#335) 1.9.0 2020-11-20 16:01:28 -05:00
3ddb756504 Add linting to deployment tools (#332) 2020-11-20 13:00:19 -05:00
9e2a61fe66 Add user_info to Jobs & Repro (#327)
This adds information about the user that created a job or repro VM to the respective resources.

This expands on the addition made to tasks in #303.
2020-11-20 15:46:52 +00:00
d96209c659 Include the body when receiving a registration error (#321)
* Include the body when receiving a registration error to help debug issues like #215
* Increase the agent registration timeout to 20 min
2020-11-20 14:43:40 +00:00
7e3b807479 Support pre-release instance specific setup script paths (#331)
Support `instance-specific-setup/<OS>/setup` and `instance-specific-setup/setup` scripts.

Fixes #328
2020-11-20 12:42:58 +00:00
3974d680ef Support retry during function deploy (#330)
Starting earlier today, I saw roughly 1 in 3 deployments fail with the error `Azure.Functions.Cli.Common.CliException: Timed out waiting for SCM to update the Environment Settings`.  Redeploying the application resolves the issue.  New builds and past releases alike hit this exception.

According to https://github.com/Azure/azure-functions-core-tools/issues/1863, function app deployments may fail due to timeouts related to cold-start.

This PR executes the deploy in a loop with a delay in the case of failure.
2020-11-19 20:04:18 +00:00
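A retry loop of the kind described in #330 could be sketched as below. The attempt count, the delay, and the `run_with_retry` name are assumptions for illustration; `action` stands in for whatever callable performs the function app deploy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, waiting between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Exception = RuntimeError("action was never run")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:  # broad on purpose: any failure is retried
            last_error = err
            if attempt < attempts:
                sleep(delay_seconds)
    # Every attempt failed, so surface the most recent error.
    raise last_error
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.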
31a661f071 Expose coverage/exec_sec for libfuzzer targets via CLI (#325)
Adds debug subcommands to the SDK/CLI that simplify querying Application Insights for libfuzzer telemetry.  

Querying for the latest execs_sec for a job, by job_id fragment.
```
$ onefuzz debug job libfuzzer_execs_sec 88 --limit 1
[
    {
        "execs_sec": "191035",
        "machine_id": "b2dbe720-4fd8-4342-957a-6cb0979d2187",
        "timestamp": "2020-11-18T00:08:53.98Z",
        "worker_id": "0"
    }
]
```

Querying for the latest coverage for a job, by job_id fragment.
```
$ onefuzz debug job libfuzzer_coverage 88 --limit 1
[
    {
        "covered": "10",
        "features": "21",
        "rate": "0.47619047619047616",
        "timestamp": "2020-11-18T00:09:40.793Z"
    }
]
```
2020-11-19 15:14:37 +00:00
bb2b18a2b9 Fix MSVC Libfuzzer coverage reporting (#324)
This PR fixes two issues:
- First, in MSVC compiled binaries both the LLVM _and_ MSVC symbols are
present, but only the MSVC symbols have correct values. For example:

```
0:000> cdb: Reading initial command '.scriptload DumpCountersOld.js ; !dumpcounters "cov" ; q'
JavaScript script successfully loaded from 'DumpCountersOld.js'
[+] not disabling sympath
INFO: Seed: 58715679
INFO: Loaded 1 modules   (3968 inline 8-bit counters): 3968 [00007FF70DB4B000, 00007FF70DB4BF80), # XXX Note
xxx.exe: Running 1 inputs 1 time(s) each.
Running: inp
[+] processing xxx.exe
[+] using LLVM 10 symbols - 0x7ff70db72b00:0x7ff70db72b08 # XXX These are wrong
```

This means the order we search for the coverage symbols is important.

- Secondly, this enables support for MSVC 8bit counter coverage.

## Validation Steps Performed

Running any recent MSVC compiled libfuzzer target should fail to actually collect coverage, instead just returning the 8 null bytes described in the linked issue.
2020-11-19 02:47:33 +00:00
b2b4a06afa Address typing issues hidden by memoization.caching (#322) 2020-11-18 15:08:40 -05:00
bb6d083768 Enable unmanaged registrations and configuration by environment variables (#318) 2020-11-18 12:19:09 -05:00
e47e89609a Use Storage Account types, rather than account_id (#320)
We need to move to supporting data sharding.

One of the steps towards that is to stop passing around `account_id`; instead, we need to specify the type of storage we need.
2020-11-18 14:06:14 +00:00
52eca33237 Move more run-time actions to setup-time (#317)
This script moves more of the run-time actions to setup-time.  This is important for running fuzzing within docker containers, such that installing llvm & gdb is done as part of the container, rather than on each launch.
2020-11-18 10:13:01 +00:00
64bd389eb7 Declarative templates (#266) 2020-11-17 16:00:09 -05:00
ce3356d597 Add SDK Feature Flags (#313)
## Summary of the Pull Request

This adds feature flags to the SDK, which gate access to preview features so that only those who have specifically asked for them can use them.  This is intended to be used within #266.

Note, this change also moves to using a `pydantic` model for the config, rather than hand-crafted JSON dicts.
2020-11-17 15:40:16 +00:00
c4f266ee00 fix webhook events doc link (#316) 2020-11-16 18:45:54 -05:00
41271c62e0 Release 1.8.0 (#315) 1.8.0 2020-11-16 17:56:50 -05:00