Commit Graph

143 Commits

Author SHA1 Message Date
c5bb0f0588 Update Proxy heartbeat & logging (#502) 2021-02-04 15:38:17 -05:00
5e2e9448df add security auditing of python code using Bandit during CICD (#491) 2021-02-01 16:51:03 -05:00
0f70ffa3e2 try pushing updates to scaleset configs frequently until the push succeeds (#489) 2021-02-01 10:09:40 -05:00
a46f7b4193 expose supervisor tasks that are fully self-contained fuzzing tasks in the service (#474)
Exposes the functionality added in #454 to the service & CLI.

Fixes #439
2021-01-29 00:01:59 +00:00
14fc1ca51f remove unused Event generation from the pre-2.0.0 SignalR integration (#477)
Remove a vestige of the ad hoc events used by the previous SignalR integration for container updates.
2021-01-28 21:56:31 +00:00
f155ad625f reimage long-lived nodes (#476)
This helps keep nodes on scalesets that use `latest` OS image SKUs reasonably up-to-date with OS patches without disrupting running fuzzing tasks with patch reboot cycles.

In combination with the already-merged #416, this PR closes #414.
2021-01-28 20:36:40 +00:00
24685ca8df Updating Windows Default Image from RS5-Pro to 20H2-Pro (#469)
RS5-Pro is no longer updated in the Azure Marketplace. In order to ensure the Windows 10 VMs are regularly updated, we need to switch the default image to 20H2-Pro, which is regularly maintained.
2021-01-27 13:46:46 +00:00
5027745ee2 simplify get/delete for scalesets (#468) 2021-01-26 14:43:14 -05:00
165257e989 update python prereqs (#427)
Updates the following libraries in the service:
* azure-core
* azure-functions
* azure-identity
* azure-keyvault-keys
* azure-keyvault-secrets
* azure-mgmt-compute
* azure-mgmt-core
* azure-mgmt-loganalytics
* azure-mgmt-network
* azure-mgmt-resource
* azure-mgmt-storage
* azure-mgmt-subscription
* azure-storage-blob
* azure-storage-queue
* pydantic
* requests
* jsonpatch

Removes the following libraries in the service:
* azure-cli-core
* azure-cli-nspkg
* azure-mgmt-cosmosdb
* azure-servicebus

Updates the following libraries in the CLI:
* requests
* semver
* asciimatics
* pydantic
* tenacity

Updates the following libraries in onefuzztypes:
* pydantic

The primary "legacy" libraries are [azure-graphrbac](https://pypi.org/project/azure-graphrbac/) and azure-cosmosdb-table. The former has not been updated to use azure-identity yet. The latter is being rewritten as [azure-data-tables](https://pypi.org/project/azure-data-tables/), but is still in early beta.
2021-01-25 20:53:40 +00:00
31ea71e8b6 use the unique-string based keyvault names (#462) 2021-01-25 15:02:12 -05:00
4bc90a7564 set max stdout/stderr size (#460) 2021-01-25 13:07:35 -05:00
3f2883d38e Storing secrets in azure keyvault (#326) 2021-01-25 11:12:07 -05:00
e4ecf7e230 remove early-exit from cleanup_nodes that broke dead node cleanup (#458) 2021-01-22 18:04:50 -05:00
2f3139cda1 unify node resetting & deleting into delete/recreate (#450) 2021-01-22 22:04:44 +00:00
e6dec041b2 move to using machine_id rather than node_id (#451)
Handle unifying onto machine_id for NodeMessage.
2021-01-21 16:22:16 +00:00
fd956380d4 experimental "local fuzzing" support (#405)
This PR adds an experimental "local" mode for the agent, starting with `libfuzzer`. In local mode, tasks that would normally poll a queue instead monitor a directory for new files.

Supported commands: 
* libfuzzer-fuzz (models the `libfuzzer-fuzz` task)
* libfuzzer-coverage (models the `libfuzzer-coverage` task)
* libfuzzer-crash-report (models the `libfuzzer-crash-report` task)
* libfuzzer (models the `libfuzzer basic` job template, running libfuzzer-fuzz and libfuzzer-crash-report tasks concurrently, where any files that show up in `crashes_dir` are automatically turned into reports, and optionally runs the coverage task which runs the coverage data exporter for each file that shows up in `inputs_dir`).

Under the hood, there are a handful of changes required to the rest of the system to enable this feature.
1. `SyncedDir` URLs are now optional.  In local mode, these no longer make sense.   (We've discussed moving management of `SyncedDirs` to the Supervisor.  This is tangential to that effort.)
2. `InputPoller` uses a `tempdir` rather than abusing `task_id` for temporary directory naming.
3. Moved the `agent` to only use a single tokio runtime, rather than one for each of the subcommands.
4. Sets the default log level to `info`.  (RUST_LOG can still be used as is).

Note, this removes the `onefuzz-agent debug` commands for the tasks that are now exposed via `onefuzz-agent local`, as these provide a more featureful version of the debug tasks.
2021-01-20 03:33:25 +00:00
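The directory monitoring that #405 describes for local mode can be illustrated with a minimal Python sketch. This is not the agent's code; the function name `new_files` and the polling approach are assumptions made purely for illustration.

```python
import os
from typing import List, Set


def new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return files in `directory` that have not been reported before.

    `seen` is updated in place, so repeated calls only report new arrivals.
    (Hypothetical helper; it stands in for a task watching `crashes_dir`
    or `inputs_dir` instead of polling a queue.)
    """
    found = sorted(
        name
        for name in os.listdir(directory)
        if name not in seen and os.path.isfile(os.path.join(directory, name))
    )
    seen.update(found)
    return found
```

A caller would invoke `new_files` in a loop with a short sleep and hand each returned file to the crash-report or coverage step.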
513d1f52c9 Unify Dashboard & Webhook events (#394)
This change unifies the previously ad hoc SignalR events and Webhooks into a single event format.
2021-01-11 21:43:09 +00:00
d573100a97 Clear node messages on deletion (#419)
2021-01-11 20:14:43 +00:00
6aa7d5f6cf remove unused back_channel_address entry (#420) 2021-01-08 14:23:30 -05:00
46e8454569 compare containers rather than SAS urls when building worksets (#418)
By comparing container names rather than SAS urls, this removes a race condition that prevented co-locatable tasks from being co-located.
2021-01-08 09:45:05 +00:00
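The comparison described in #418 can be sketched as follows. The sketch assumes that two SAS URLs for the same container can differ in their query string (for example, the signature), so comparing full URLs would treat them as different. The helper names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def container_name(sas_url: str) -> str:
    """Extract the container name from a blob container SAS URL.

    The URL path is "/<container>"; the SAS token lives in the query string
    and is ignored here.
    """
    path = urlparse(sas_url).path
    return path.lstrip("/").split("/", 1)[0]


def same_container(url_a: str, url_b: str) -> bool:
    """Compare two SAS URLs by container name rather than by full URL."""
    return container_name(url_a) == container_name(url_b)


# Hypothetical URLs: same container, different SAS signatures.
url_a = "https://example.blob.core.windows.net/setup?sv=2020-02-10&sig=AAAA"
url_b = "https://example.blob.core.windows.net/setup?sv=2020-02-10&sig=BBBB"
```

With these inputs, `url_a == url_b` is false while `same_container(url_a, url_b)` is true, which is the distinction the commit relies on.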
e799eb03cd Shorten the expiry window for the work queue SAS URLs assigned at node registration (#416)
The underlying impact is that nodes must re-register on a more frequent basis.

Nodes find out they are out-of-date during registration and immediately prior to starting a new set of work.  Requiring nodes to re-register on a shortened cycle provides more opportunities for nodes to get re-imaged.

Additionally, this makes the supervisor handle SAS URL expiry more cleanly.
2021-01-07 12:34:26 +00:00
3b26ffef65 support multiple corpus accounts (#334)
Add support for sharding across multiple storage accounts for blob containers used for corpus management.

Things to note:

1. Additional storage accounts must be in the same resource group, support the "blob" endpoint, and have the tag `storage_type` with the value `corpus`.  A utility is provided (`src/utils/add-corpus-storage-accounts`), which adds storage accounts. 
2. If any secondary storage accounts exist, they are used by default for containers.
3. Storage account names are cached in memory in the Azure Function instance indefinitely.  After adding new storage accounts, the app needs to be restarted to pick them up.
2021-01-06 23:11:39 +00:00
f345bd239d Add ssh keys to nodes on demand (#411)
Our existing model has a per-scaleset SSH key.  This update moves towards using user-provided SSH keys when users need to connect to a given node.
2021-01-06 19:29:38 +00:00
c1a50f6f6c Colocate tasks (#402)
Enables co-locating multiple tasks in a given work-set.

Tasks are bucketed by the following:
* OS
* job id
* setup container
* VM SKU & image (used in pre-1.0 style tasks)
* pool name (used in 1.0+ style tasks)
* whether the task needs rebooting after the task setup script executes

Additionally, a task will end up in a unique bucket if any of the following are true:
* The task is set to run on more than one VM
* The task is missing the `task.config.colocate` flag (all tasks created prior to this functionality) or the value is False

This updates the libfuzzer template to make use of colocation.  Users can specify co-locating all of the tasks *or* co-locating the secondary tasks.
2021-01-06 13:49:15 +00:00
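The bucketing rules listed in #402 can be sketched in Python as below. The `Task` dataclass and its field names are hypothetical simplifications, not the service's actual models; only the grouping criteria come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple
from uuid import UUID, uuid4


@dataclass
class Task:
    """Hypothetical, simplified view of the fields that affect bucketing."""
    job_id: UUID
    os: str
    setup_container: str
    pool_name: Optional[str] = None   # 1.0+ style tasks
    vm_sku: Optional[str] = None      # pre-1.0 style tasks
    vm_image: Optional[str] = None    # pre-1.0 style tasks
    reboot_after_setup: bool = False
    vm_count: int = 1
    colocate: Optional[bool] = None   # missing on tasks created before this feature
    task_id: UUID = field(default_factory=uuid4)


def bucket_key(task: Task) -> Tuple[Hashable, ...]:
    """Build the key that decides which tasks may share a work-set."""
    unique: Optional[UUID] = None
    # Multi-VM tasks, and tasks without colocate=True, get a bucket of their own.
    if task.vm_count > 1 or not task.colocate:
        unique = task.task_id
    return (
        task.os,
        task.job_id,
        task.setup_container,
        task.vm_sku,
        task.vm_image,
        task.pool_name,
        task.reboot_after_setup,
        unique,
    )


def bucket_tasks(tasks: List[Task]) -> Dict[Tuple[Hashable, ...], List[Task]]:
    """Group tasks by bucket key."""
    buckets: Dict[Tuple[Hashable, ...], List[Task]] = {}
    for task in tasks:
        buckets.setdefault(bucket_key(task), []).append(task)
    return buckets
```

Folding the task id into the key is one simple way to force a task into a unique bucket without special-casing it in the grouping loop.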
986df8fcc6 limit updating outdated nodes to 500 at a time (#397) 2021-01-05 17:40:36 -05:00
633e5b5f02 restrict api endpoints (#404)
Restrict API endpoints from agents
2021-01-05 19:40:58 +00:00
37f06bb324 handle libfuzzer fuzzing non-zero exits better (#381)
When running libfuzzer in 'fuzzing' mode, we expect the following on exit.

If the exit code is zero, crashing input isn't required.  This happens if the user specifies '-runs=N'

If the exit code is non-zero, then crashes are expected.  In practice, there are two causes of non-zero exits:
1. The binary can't execute for some reason, such as a missing prerequisite.
2. The binary _can_ execute, but the sanitizers are sometimes left in such a bad state that they are unable to record the input that caused the crash.

This PR enables handling these two non-zero exit cases.

1. Optionally verify the libfuzzer target loads appropriately using `target_exe -help=1`.  This allows failing faster on common issues, such as a missing prerequisite library.
2. Optionally allow non-zero exits without crashes to be a warning, rather than a task failure.
2021-01-05 14:40:15 +00:00
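The exit handling described in #381 can be sketched in Python. This is a simplified illustration, not the agent's implementation: the names `target_loads`, `classify_exit`, and `Outcome` are hypothetical, and treating a zero exit status from `-help=1` as "the target loads" is an assumption.

```python
import subprocess
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    WARNING = "warning"
    FAILURE = "failure"


def target_loads(target_exe: str, timeout: float = 30.0) -> bool:
    """Check that the target starts at all by running `target_exe -help=1`."""
    try:
        result = subprocess.run(
            [target_exe, "-help=1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, missing loader prerequisite, or a hang.
        return False
    return result.returncode == 0


def classify_exit(exit_code: int, crash_count: int, allow_no_crash: bool) -> Outcome:
    """Decide how to treat a finished libfuzzer run in fuzzing mode."""
    if exit_code == 0:
        # For example, the user passed -runs=N; no crashing input is required.
        return Outcome.OK
    if crash_count > 0:
        # Non-zero exit with a recorded crash is the expected case.
        return Outcome.OK
    # Non-zero exit without a crash: a warning if opted in, otherwise a failure.
    return Outcome.WARNING if allow_no_crash else Outcome.FAILURE
```

Running the load check before fuzzing separates "the binary cannot start" from "the binary crashed without leaving an input", which the exit code alone cannot distinguish.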
29c7cfbd5d filter out deleted nodes to prevent them from being saved later (#391)
In `Scaleset.cleanup_nodes`, nodes that are no longer part of the scaleset should get deleted.  Without filtering the list, the nodes could get re-saved to the Node table later on.
2021-01-04 20:28:57 +00:00
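The filtering in #391 amounts to splitting the known nodes into those still in the scaleset and those to delete, then continuing with only the first group. A minimal Python sketch, with a hypothetical function name and plain string ids in place of the service's node objects:

```python
from typing import Iterable, List, Set, Tuple


def partition_nodes(
    node_ids: Iterable[str], scaleset_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split known node ids into (still in the scaleset, to delete)."""
    keep: List[str] = []
    to_delete: List[str] = []
    for node_id in node_ids:
        if node_id in scaleset_ids:
            keep.append(node_id)
        else:
            to_delete.append(node_id)
    return keep, to_delete
```

Later steps should iterate over `keep` only, so that a node that was just deleted cannot be written back to the table.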
4c2679d61e Re-add windows ssh key (#390)
Adds a scaleset specific setup script, which allows us to save the scaleset based SSH keys into the VM on setup.
2021-01-04 19:52:27 +00:00
e6b55ab95a Simplify job template management workflow (#354)
1. Merge 'create' and 'update' to a single 'save' operation.
2. Allow fetching a single template.

This enables the following workflow:

```
$ onefuzz job_templates manage get libfuzzer_linux > template.json
$ <... update template as desired ...>
$ onefuzz job_templates manage save libfuzzer_linux @./template.json
$
```
2020-12-02 14:27:42 +00:00
9b3ccf37ea use the correct instrumentation key (#355) 2020-12-01 18:44:10 -05:00
7f97c142ed add the instrumentation key to Info (#353) 2020-12-01 11:13:06 -05:00
30cc5d4778 ignore nodes already scheduled for re-imaging in outdated check (#341)
If a node is already scheduled to be reimaged/deleted, we should not bother checking if it's outdated.
2020-11-30 17:36:15 +00:00
33b7608aaf Adding option to merge all inputs at once (#282) 2020-11-24 08:43:08 -05:00
d47124fe8c Fix state management in the scheduler (#337) 2020-11-24 12:43:51 +00:00
9e2a61fe66 Add user_info to Jobs & Repro (#327)
This adds information about the user that created a job or repro VM to the respective resources.

This expands on the addition made to tasks in #303.
2020-11-20 15:46:52 +00:00
b2b4a06afa Address typing issues hidden by memoization.caching (#322) 2020-11-18 15:08:40 -05:00
e47e89609a Use Storage Account types, rather than account_id (#320)
We need to move to supporting data sharding.

One of the steps towards that is to stop passing around `account_id`; instead, we need to specify the type of storage we need.
2020-11-18 14:06:14 +00:00
64bd389eb7 Declarative templates (#266) 2020-11-17 16:00:09 -05:00
beea318968 Add User Info to created tasks (#303)
This PR makes user information from JWT tokens available as part of a Task.

Included changes:
* Renames `verify_token` to `call_if_agent`, since this function is specific to agent token verification
* Renames `is_authorized` to `is_agent`, since this function checks if the token belongs to an agent
* Adds support for unmanaged nodes in `is_agent` (see #133 for information) 
* Saves the user information from the JWT token on task create as part of `TaskConfig`

Note, `TaskConfig` is what is provided to notification templates.  This enables Github issues and ADO work items to tie back to the user that created the task.

Note that `upn` _usually_, but not always, means email for AAD user tokens.  If we were going to make use of the email address, we should perform a graph lookup based on the `oid`, but we're not.
2020-11-13 11:50:52 +00:00
31f099d3d4 Event based webhooks (#296) 2020-11-12 17:44:42 -05:00
a0b5d10c81 Add target_workers to TaskUnitConfig (#305) 2020-11-12 13:22:53 -05:00
ca209eb543 refactor agent_events handler (#261) 2020-11-11 18:28:16 -05:00
dec1a2d7b0 removing nodes whose ground truth is not avail (#275) 2020-11-11 12:20:05 -05:00
ba59230187 fix create_vmss log message (#293) 2020-11-11 12:17:42 -05:00
e638908aac Add application-insights debug cli (#281) 2020-11-11 06:17:43 -05:00
82806b1cf2 Keeps task/node association until the nodes are reimaged (#273) 2020-11-10 17:41:51 -05:00
bbee84ab1f Storing the user assigned managed identity in the scaleset table (#255) 2020-11-05 18:36:59 -05:00
b5578381ce default TTL for queued messages to infinite (#259) 2020-11-04 15:41:05 -05:00
04643a9eed fixing libfuzzer_merge (#240) 2020-11-03 15:46:18 -05:00