221 Commits

Author SHA1 Message Date
f345bd239d Add ssh keys to nodes on demand (#411)
Our existing model has a per-scaleset SSH key.  This update moves towards using user-provided SSH keys, added when a user needs to connect to a given node.
2021-01-06 19:29:38 +00:00
c1a50f6f6c Colocate tasks (#402)
Enables co-locating multiple tasks in a given work-set.

Tasks are bucketed by the following:
* OS
* job id
* setup container
* VM SKU & image (used in pre-1.0 style tasks)
* pool name (used in 1.0+ style tasks)
* if the task needs rebooting after the task setup script executes.

Additionally, a task will end up in a unique bucket if any of the following are true:
* The task is set to run on more than one VM
* The task is missing the `task.config.colocate` flag (true of all tasks created prior to this functionality) or the flag is set to False
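
The bucketing rules above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `TaskSketch`, `bucket_key`, and `bucket_tasks` are hypothetical names, and the fields are simplified stand-ins for the real task configuration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple
from uuid import uuid4


@dataclass
class TaskSketch:
    """Hypothetical, simplified stand-in for a task; not the real model."""

    task_id: str
    os: str
    job_id: str
    setup_container: str
    vm_sku_image: Optional[Tuple[str, str]]  # used by pre-1.0 style tasks
    pool_name: Optional[str]  # used by 1.0+ style tasks
    reboot_after_setup: bool
    vm_count: int = 1
    colocate: Optional[bool] = None  # None models a missing flag


def bucket_key(task: TaskSketch) -> Hashable:
    # A task that runs on more than one VM, or that lacks colocate=True,
    # gets a key nothing else can share, so it lands in its own bucket.
    if task.vm_count > 1 or not task.colocate:
        return ("unique", uuid4().hex)
    return (
        task.os,
        task.job_id,
        task.setup_container,
        task.vm_sku_image,
        task.pool_name,
        task.reboot_after_setup,
    )


def bucket_tasks(tasks: List[TaskSketch]) -> List[List[TaskSketch]]:
    # Group tasks whose keys match; each group is one candidate work-set.
    buckets: Dict[Hashable, List[TaskSketch]] = defaultdict(list)
    for task in tasks:
        buckets[bucket_key(task)].append(task)
    return list(buckets.values())
```

In this sketch, two tasks share a work-set only when every bucketing attribute matches and both opt in through the colocate flag.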

This updates the libfuzzer template to make use of colocation.  Users can specify co-locating all of the tasks *or* co-locating the secondary tasks.
2021-01-06 13:49:15 +00:00
986df8fcc6 limit updating outdated nodes to 500 at a time (#397) 2021-01-05 17:40:36 -05:00
633e5b5f02 restrict api endpoints (#404)
Restrict API endpoints from agents
2021-01-05 19:40:58 +00:00
37f06bb324 handle libfuzzer fuzzing non-zero exits better (#381)
When running libfuzzer in 'fuzzing' mode, we expect the following on exit.

If the exit code is zero, crashing input isn't required.  This happens if the user specifies '-runs=N'.

If the exit code is non-zero, then crashes are expected.  In practice, there are two common causes of a non-zero exit without a recorded crash.
1. The binary can't execute for some reason, such as a missing prerequisite.
2. The binary _can_ execute, but the sanitizers end up in such a bad state that they are unable to record the input that caused the crash.

This PR enables handling these two non-zero exit cases.

1. Optionally verify the libfuzzer target loads appropriately using `target_exe -help=1`.  This allows failing faster on common issues, such as a missing prerequisite library.
2. Optionally allow non-zero exits without crashes to be a warning, rather than a task failure.
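
The two options above might be sketched as follows. This is an illustration only: `target_loads`, `classify_fuzzing_exit`, `Outcome`, and the `expect_crash_on_failure` parameter are names assumed for this example, and treating a zero exit from `-help=1` as "the target loads" is likewise an assumption.

```python
import subprocess
from enum import Enum
from typing import List


class Outcome(Enum):
    CLEAN_EXIT = "clean_exit"
    CRASH_RECORDED = "crash_recorded"
    WARNING = "warning"
    FAILURE = "failure"


def target_loads(target_exe: str, timeout: float = 30.0) -> bool:
    """Run `target_exe -help=1` and report whether it started and exited 0.

    Assumption: a zero exit code means the binary and its prerequisite
    libraries loaded successfully.
    """
    try:
        result = subprocess.run(
            [target_exe, "-help=1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def classify_fuzzing_exit(
    exit_code: int,
    crashing_inputs: List[str],
    expect_crash_on_failure: bool = True,
) -> Outcome:
    # Zero exit (for example with -runs=N): no crashing input is required.
    if exit_code == 0:
        return Outcome.CLEAN_EXIT
    # Non-zero exit with a recorded crashing input is the expected case.
    if crashing_inputs:
        return Outcome.CRASH_RECORDED
    # Non-zero exit without a crash: fail the task, or only warn if allowed.
    return Outcome.FAILURE if expect_crash_on_failure else Outcome.WARNING
```

Running the load check before fuzzing starts surfaces a missing library as an immediate, clear failure instead of a non-zero exit with no crash to explain it.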
2021-01-05 14:40:15 +00:00
29c7cfbd5d filter out deleted nodes so as to prevent them from being saved later (#391)
In `Scaleset.cleanup_nodes`, nodes that are no longer part of the scaleset should get deleted.  Without filtering the list, the nodes could get re-saved to the Node table later on.
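
A minimal sketch of the filtering idea, assuming a hypothetical `NodeSketch` record and a set of machine IDs currently reported by the scaleset (neither is the real `Scaleset.cleanup_nodes` code):

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class NodeSketch(NamedTuple):
    """Hypothetical stand-in for a row in the Node table."""

    machine_id: str


def split_nodes(
    nodes: Iterable[NodeSketch], live_machine_ids: Set[str]
) -> Tuple[List[NodeSketch], List[NodeSketch]]:
    # Nodes still present in the scaleset are kept for further processing;
    # the rest are returned separately so the caller can delete them and
    # never hand them to later code that might save them again.
    keep: List[NodeSketch] = []
    stale: List[NodeSketch] = []
    for node in nodes:
        (keep if node.machine_id in live_machine_ids else stale).append(node)
    return keep, stale
```

Only the `keep` list should flow into the rest of the cleanup pass; iterating over the unfiltered list is what would allow a deleted node to be re-saved.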
2021-01-04 20:28:57 +00:00
4c2679d61e Re-add windows ssh key (#390)
Adds a scaleset-specific setup script, which allows us to save the scaleset-based SSH keys into the VM on setup.
2021-01-04 19:52:27 +00:00
e6b55ab95a Simplify job template management workflow (#354)
1. Merge 'create' and 'update' to a single 'save' operation.
2. Allow fetching a single template.

This enables the following workflow:

```
$ onefuzz job_templates manage get libfuzzer_linux > template.json
$ <... update template as desired ...>
$ onefuzz job_templates manage save libfuzzer_linux @./template.json
$
```
2020-12-02 14:27:42 +00:00
9b3ccf37ea use the correct instrumentation key (#355) 2020-12-01 18:44:10 -05:00
7f97c142ed add the instrumentation key to Info (#353) 2020-12-01 11:13:06 -05:00
30cc5d4778 ignore nodes already scheduled for re-imaging in outdated check (#341)
If a node is already scheduled to be reimaged/deleted, we should not bother checking if it's outdated.
2020-11-30 17:36:15 +00:00
33b7608aaf Adding option to merge all inputs at once (#282) 2020-11-24 08:43:08 -05:00
d47124fe8c Fix state management in the scheduler (#337) 2020-11-24 12:43:51 +00:00
9e2a61fe66 Add user_info to Jobs & Repro (#327)
This adds information about the user that created a job or repro VM to the respective resources.

This expands on the addition made to tasks in #303.
2020-11-20 15:46:52 +00:00
b2b4a06afa Address typing issues hidden by memoization.caching (#322) 2020-11-18 15:08:40 -05:00
e47e89609a Use Storage Account types, rather than account_id (#320)
We need to move to supporting data sharding.

One of the steps towards that is to stop passing around `account_id`; instead, callers need to specify the type of storage they need.
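
As a rough illustration of asking for storage by type rather than by `account_id`, consider the sketch below. The enum members, account names, and the hash-based choice of account are all assumptions made for this example; the commit does not describe how sharding is implemented.

```python
import hashlib
from enum import Enum
from typing import Dict, List


class StorageType(Enum):
    # Hypothetical storage categories for illustration.
    corpus = "corpus"
    config = "config"


# Hypothetical mapping from storage type to the accounts that back it.
_ACCOUNTS: Dict[StorageType, List[str]] = {
    StorageType.corpus: ["fuzzcorpus0", "fuzzcorpus1"],
    StorageType.config: ["fuzzconfig"],
}


def get_accounts(storage_type: StorageType) -> List[str]:
    # Callers name the kind of storage they need, never a specific account.
    return list(_ACCOUNTS[storage_type])


def choose_account(storage_type: StorageType, container_name: str) -> str:
    # One possible sharding rule (an assumption): hash the container name
    # so the same container always maps to the same account.
    accounts = get_accounts(storage_type)
    digest = hashlib.sha256(container_name.encode("utf-8")).digest()
    return accounts[int.from_bytes(digest[:8], "big") % len(accounts)]
```

Because callers only pass a `StorageType`, accounts can be added to a type later without changing any call sites.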
2020-11-18 14:06:14 +00:00
64bd389eb7 Declarative templates (#266) 2020-11-17 16:00:09 -05:00
beea318968 Add User Info to created tasks (#303)
This PR makes user information from JWT tokens available as part of a Task.

Included changes:
* Renames `verify_token` to `call_if_agent`, since this function is specific to agent token verification
* Renames `is_authorized` to `is_agent`, since this function checks whether the token belongs to an agent
* Adds support for unmanaged nodes in `is_agent` (see #133 for information)
* Saves the user information from the JWT token on task create as part of `TaskConfig`

Note, `TaskConfig` is what is provided to notification templates.  This enables GitHub issues and ADO work items to tie back to the user that created the task.

Note, `upn` _usually_ means email for AAD user tokens, but not always.  If we were going to make use of the email address, we should perform a graph lookup based on the `oid`, but we're not.
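
For readers unfamiliar with the claims involved, the sketch below shows one way to read `oid` and `upn` from a JWT payload. It is an illustration under assumptions: `UserInfoSketch` and both function names are hypothetical, and the code deliberately does not verify the token signature, which a real service must do before trusting any claim.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class UserInfoSketch:
    """Hypothetical container for the user details stored with a task."""

    object_id: Optional[str]
    upn: Optional[str]


def decode_claims(token: str) -> Dict[str, Any]:
    # A JWT is three base64url segments: header.payload.signature.
    # This reads the payload only and performs NO signature verification.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))


def user_info_from_token(token: str) -> UserInfoSketch:
    claims = decode_claims(token)
    # `upn` often looks like an email address but is not guaranteed to be one.
    return UserInfoSketch(object_id=claims.get("oid"), upn=claims.get("upn"))
```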
2020-11-13 11:50:52 +00:00
31f099d3d4 Event based webhooks (#296) 2020-11-12 17:44:42 -05:00
a0b5d10c81 Add target_workers to TaskUnitConfig (#305) 2020-11-12 13:22:53 -05:00
ca209eb543 refactor agent_events handler (#261) 2020-11-11 18:28:16 -05:00
dec1a2d7b0 removing nodes whose ground truth is not avail (#275) 2020-11-11 12:20:05 -05:00
ba59230187 fix create_vmss log message (#293) 2020-11-11 12:17:42 -05:00
e638908aac Add application-insights debug cli (#281) 2020-11-11 06:17:43 -05:00
82806b1cf2 Keeps task/node association until the nodes are reimaged (#273) 2020-11-10 17:41:51 -05:00
bbee84ab1f Storing the user assigned managed identity in the scaleset table (#255) 2020-11-05 18:36:59 -05:00
b5578381ce default TTL for queued messages to infinite (#259) 2020-11-04 15:41:05 -05:00
04643a9eed fixing libfuzzer_merge (#240) 2020-11-03 15:46:18 -05:00
4ef489b397 adding node shutdown (#252) 2020-11-03 11:39:51 -05:00
6c598773dd add instance_id generated at install time (#245) 2020-11-02 14:27:51 -05:00
ced8200d74 enable setting ensemble sync duration timer (#229) 2020-10-29 14:48:12 -04:00
154be220ae Enable User assigned managed identity for scalesets (#219) 2020-10-29 13:53:11 -04:00
f4b874e19e Always use the get_*_account helper methods (#226) 2020-10-28 21:40:21 -04:00
1d2fb99dd4 expose the ability to manually override node reset (#201) 2020-10-27 17:29:53 -04:00
d4c584342a address multiple issues found by pylint (#206) 2020-10-26 12:24:50 -04:00
8a62830f2b reduce how often list_containers is called (#196) 2020-10-23 09:43:45 -04:00
bfbc9f8c9e Bug fixes for Pool Resize (#158) 2020-10-23 08:58:15 -04:00
d769072343 cache tokens in memory forever (#195) 2020-10-22 19:13:59 -04:00
0a560cefba fix format strings (#189) 2020-10-22 12:15:39 -04:00
041c6ae130 Reimage dead nodes (#154) 2020-10-20 16:58:02 -04:00
178537df05 Store the heartbeat data in the task and node tables (#164) 2020-10-20 14:24:00 -04:00
75f29b9f2e Remove update_event as a single event loop for the system (#160) 2020-10-16 21:42:35 -04:00
fa25823342 split node and task heartbeats in two nodes (#163) 2020-10-15 21:30:03 -04:00
645a5e5702 stop automatically queueing objects for work (#159) 2020-10-15 14:39:37 -04:00
3189daeeb7 implementing heartbeat for the supervisor (#30) 2020-10-14 15:13:16 -04:00
d73616b366 Save the managed identity as soon as it's available (#144) 2020-10-14 11:38:05 -04:00
7f0c25e2da Managing Pool Resizing at service side (#107) 2020-10-13 14:04:26 -04:00
1398285aea Fixes typing error identified by a new mypy release (#129) 2020-10-09 16:44:59 -04:00
ed370a0d8c using multiple pgs (#121) 2020-10-08 16:08:38 -04:00
46325ea490 add '--endpoint' to 'repro_cmd' for integrations (#113) 2020-10-07 12:11:34 -04:00