Commit Graph

213 Commits

Author SHA1 Message Date
fb13fb45c1 Adding a class to store and retrieve rules associated with an api (#1420)
* Adding a class to store and retrieve rules associated with an api
2021-11-01 16:29:27 -07:00
98cd7c9c56 migrate to msgraph (#966)
* migrate to msgraph

* add subscription id to query_microsoft_graph

* migrating remaining references

* formatting

* adding missing dependencies

* flake fix

* fix get_tenant_id

* cleanup

* formatting

* migrate application creation in deploy.py

* format

* mypy fix

* isort

* isort

* format

* bug fixes

* specify the correct signInAudience

* fix backing service principal creation
fix preauthorized application

* remove remaining references to graphrbac

* fix ms graph authentication

* formatting

* fix typo

* format

* deployment fix

* set implicitGrantSettings in the deployment

* format

* fix deployment

* fix graph authentication on the server

* use the current cli logged in account to retrieve the backend token cache

* assign the msgraph app role permissions to the web app during the deployment

* formatting

* fix build

* build fix

* fix bandit issue

* mypy fix

* isort

* deploy fixes

* formatting

* remove assign_app_permissions

* mypy fix

* build fix

* mypy fix

* format

* formatting

* flake fix

* remove webapp identity permission assignment

* remove unused reference to assign_app_role

* remove manual registration message

* fixing name and logging

* address PR comments

* address PR comments

* build fix

* lint

* lint

* mypy fix

* mypy fix

* formatting

* address PR comments

* linting

* lint

* remove ONEFUZZ_AAD_GROUP_ID check

* regenerate webhook_events.md

* change return type of query_microsoft_graph_list

* fix tenant_id

Co-authored-by: Marc Greisen <marc@greisen.org>
Co-authored-by: Stas <stishkin@live.com>
2021-10-22 11:59:05 -07:00
b238bfea03 Revert "NSG Updated After CLI Update to Instance_Config (#1375)" (#1384)
This reverts commit 357bc4fcad.
2021-10-21 12:51:20 -07:00
357bc4fcad NSG Updated After CLI Update to Instance_Config (#1375)
* Creating InstanceConfig Attributes for NSG Refactor (#1331)

* Updating instance_config

* Updating attribute names.

* Updating list factory.

* Updating config attributes.

Co-authored-by: nharper285 <nharper285@gmail.com>

* NSG deployment on a creation of new debug/repro proxy. (#1340)

Co-authored-by: stas <statis@microsoft.com>

* Code for updating NSGs when instance_config updated.

* Updating argument to set_allowed_rules

* Temporarily ignore non-actionable `cargo audit` errors (#1365)

* Updating model to no longer be optional.

* Fixing args for set_allowed_rules

* trying to fix calls to get_nsg

* Updating calls to nsg lib

* Fixing imports.

* Updating calls to set_allowed and creating constructor for NSGConfig type.

* Removing constructor and manually setting default ip

* Fixing models.

* Hopefully fixing docs.

* Fix set_allowed call

* Adding error handling for update config.

* Changing to error check.

* Fixing error call.

* Fixing imports.

* Updating instanceconfig retrieval.

* Fixing imports.

* Adding empty() function on request.

* Fixing name of function.

* Removing empty function.

Co-authored-by: nharper285 <nharper285@gmail.com>
Co-authored-by: Stas <stishkin@live.com>
Co-authored-by: stas <statis@microsoft.com>
Co-authored-by: Joe Ranweiler <joranwei@microsoft.com>
2021-10-21 11:54:09 -07:00
97a3a67b3e Fix validation of target_exe blob name (#1371) 2021-10-19 19:20:30 -07:00
720c8dc466 Azure DevOps notifications not appearing (#1370)
Co-authored-by: stas <statis@microsoft.com>
2021-10-19 08:50:00 -07:00
22b2d62e29 enable configurable virtual network ranges (#1268) 2021-09-27 18:01:32 +00:00
599c400fa0 Custom Extension Instance Configuration (#1184) 2021-09-24 12:27:39 -04:00
3d1766271e backdate SAS URLs to avoid time sync issues (#1195) 2021-08-27 17:00:15 +00:00
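A minimal sketch of the backdating idea, for illustration only: the start time of a SAS validity window is set slightly in the past so that a client whose clock lags the issuer still sees a valid token. The helper name `sas_time_window` and the 15-minute allowance are assumptions, not values taken from the commit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical allowance; the commit does not state how far URLs are backdated.
CLOCK_SKEW_ALLOWANCE = timedelta(minutes=15)


def sas_time_window(duration, now=None):
    """Return (start, expiry) for a SAS token, with the start backdated."""
    now = now or datetime.now(timezone.utc)
    # Backdate the start so small clock differences do not reject the token.
    start = now - CLOCK_SKEW_ALLOWANCE
    expiry = now + duration
    return start, expiry
```

The resulting pair would then be passed as the start and expiry times when the SAS URL is generated.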
2a2844ae7a enable configuring proxy VM sku (#1128) 2021-08-23 16:04:59 +00:00
d2faf7c66d Fix case of logger format string specifier (#1160)
Fix a log statement with an invalid format string specifier. At runtime, the invalid specifier causes the service to throw a `ValueError`. This is typically invoked in the `agent_can_schedule` function [here](https://github.com/microsoft/onefuzz/blob/main/src/api-service/__app__/agent_can_schedule/__init__.py#L33).
2021-08-23 14:37:01 +00:00
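As an illustration of the failure mode, the sketch below shows how Python's `%`-style formatting raises `ValueError` on an unsupported specifier. It assumes the invalid specifier differed from a valid one only by case (for example `%S` instead of `%s`), which the commit title suggests but does not spell out; `try_format` is a hypothetical helper.

```python
def try_format(template, value):
    """Apply %-style formatting and report a ValueError instead of raising."""
    try:
        return template % value
    except ValueError as err:
        # Raised for unsupported conversion characters such as an uppercase S.
        return f"ValueError: {err}"


print(try_format("node: %s", "abc"))  # valid lowercase specifier
print(try_format("node: %S", "abc"))  # invalid specifier, reported as ValueError
```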
2fcb499888 Merge pull request from GHSA-q5vh-6whw-x745
* verify aad tenants, primarily needed in multi-tenant deployments

* add logging and fix trailing slash for issuer

* handle call_if* not supporting additional argument callbacks

* add logging

* include new datatype in webhook docs

* fix pytypes unit tests

Co-authored-by: Brian Caswell <bmc@shmoo.com>
2021-08-13 14:50:54 -04:00
5a8a1c998e Enable ado render testing (#1144) 2021-08-12 16:38:49 +00:00
338b541a94 expose coverage as an optional directory that gets synced to supervisor tasks (#1123)
Addresses #1122
2021-08-06 19:13:23 +00:00
39bd0d2ca7 don't overload list builtin (#1120) 2021-08-02 13:51:35 -04:00
cfe0ec8d5f address lint issues (#1117) 2021-08-02 12:05:10 -04:00
95e2ecff3d fix format in notification (#1115) 2021-08-02 12:04:46 -04:00
9ec7e7a20a process all expired nodes rather than those not already marked for deletion (#1103)
This makes sure debug_keep_node is reset and the rest of the reimage processing occurs regardless of reimage_requested and delete_requested being set.

Without this, nodes that are marked `debug_keep_node` do not get reimaged/deleted.
2021-07-27 00:53:04 +00:00
0e27256faf Remove signalr from endpoints (#1102)
This is a follow-on PR from #1100
2021-07-23 15:47:08 +00:00
7e6a42cdd6 require {input} in target_env or target_options for generator and coverage tasks (#1106)
Fixes #925
2021-07-23 14:58:42 +00:00
b90ee03fd9 tasks must use pools not VMs (#1105)
Using config.vm was deprecated prior to 1.0.0.
2021-07-23 14:10:51 +00:00
55366e751a allow pools & scalesets set to shutdown to halt (#1104)
Currently, if a pool or scaleset is set to `shutdown`, it cannot be set to `halt`.

While moving from `halt` to `shutdown` would cause issues, moving from `shutdown` to `halt` is fine.
2021-07-23 13:14:47 +00:00
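The transition rule described above could be sketched as follows. This is a simplified model under stated assumptions: the `State` enum and `can_transition` are hypothetical names, and the real service tracks more states than the three shown here.

```python
from enum import Enum


class State(Enum):
    # Hypothetical subset of pool/scaleset states, for illustration only.
    running = "running"
    shutdown = "shutdown"
    halt = "halt"


# Allowed (current, requested) pairs. shutdown -> halt is permitted,
# while halt -> shutdown is deliberately absent.
ALLOWED = {
    (State.running, State.shutdown),
    (State.running, State.halt),
    (State.shutdown, State.halt),
}


def can_transition(current, requested):
    """Return True if moving from `current` to `requested` is allowed."""
    return current == requested or (current, requested) in ALLOWED
```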
5be9c4dcee relay SignalR integrations through a storage queue (#1100)
The SignalR integration from Azure Functions does not have automatic retry.  When the SignalR instance has issues, all other APIs fail.

To make the service resilient to SignalR outages, this bounces SignalR events through an Azure Storage queue.

NOTE: This PR does not remove the integration from all of the functions.  That is intended to be done as a follow-on PR.
2021-07-22 18:10:20 +00:00
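A rough sketch of the relay pattern, using an in-memory `queue.Queue` as a stand-in for the Azure Storage queue. All names here (`FlakySignalR`, `publish`, `drain`) are hypothetical; the point is only that API handlers enqueue events and never call SignalR directly, so a SignalR outage delays delivery instead of failing the API call.

```python
import queue


class FlakySignalR:
    """Stand-in for a SignalR client that fails its first `failures` sends."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, event):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("SignalR unavailable")
        self.delivered.append(event)


def publish(events, event):
    """Called from API handlers: enqueue only, never touch SignalR."""
    events.put(event)


def drain(events, client):
    """Called by a separate consumer: failed events go back on the queue."""
    for _ in range(events.qsize()):
        event = events.get()
        try:
            client.send(event)
        except ConnectionError:
            events.put(event)  # retry on the next drain
```

With a real storage queue, the retry would typically come from the message becoming visible again rather than from an explicit re-enqueue.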
ee3d0871f2 handle azure-mgmt expired auth tokens by clearing the client cache and retrying (#1099)
In order to reduce how frequently the IMS is hit from the service, the service caches the azure-mgmt clients between API calls.  While the management APIs should have some amount of authentication expiration redundancy built in, not all of them do.

This is seen with `ClientAuthenticationError`, most often with the nested exception record of `ExpiredAuthenticationToken`.

This wraps all of the compute layer functionality with a wrapper that checks if there has been an exception, and retries the request.
2021-07-22 18:01:02 +00:00
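The wrapper described above might look roughly like the sketch below. It is not the service's actual code: `ExpiredAuthError` stands in for the `ClientAuthenticationError` named in the commit, and `_client_cache`, `get_client`, and `retry_on_auth_failure` are hypothetical names.

```python
import functools


class ExpiredAuthError(Exception):
    """Stand-in for an authentication error raised by a cached client."""


# Hypothetical cache of management clients reused between API calls.
_client_cache = {}


def get_client(name, factory):
    """Return the cached client for `name`, creating it on first use."""
    if name not in _client_cache:
        _client_cache[name] = factory()
    return _client_cache[name]


def retry_on_auth_failure(func):
    """On an auth failure, clear the client cache and retry exactly once."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ExpiredAuthError:
            _client_cache.clear()  # force fresh clients with new tokens
            return func(*args, **kwargs)

    return wrapper
```

Retrying only once keeps a genuinely broken credential from looping forever; the second failure propagates to the caller.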
3269dbb1aa delete secret on object delete (#1085) 2021-07-21 16:04:27 -04:00
065272191e Replace notifications by default (#1084) 2021-07-20 18:39:31 -04:00
152dd190b7 Add more information to the logs of transient error (#1082) 2021-07-16 17:52:06 -04:00
39beb1591c use managed identity reader access for scaleset configs (#1060) 2021-07-13 13:20:50 -04:00
7a7ded6b7e force upgrade custom script extensions (#1059) 2021-07-13 12:08:07 -04:00
89b7d13125 Fix get_dead_nodes query (#1054) 2021-07-09 13:33:42 -04:00
826ef8dd22 Pool shrink queue (#1050) 2021-07-08 10:23:54 -04:00
45d468f2ce set pool_id on node creation (#1049) 2021-07-07 17:58:24 -04:00
52f83b5b26 add EventScalesetResizeScheduled (#1047) 2021-07-07 14:15:26 -04:00
7b2679a1ce make ShrinkQueue not scaleset specific (#1046) 2021-07-07 13:27:49 -04:00
15063908b0 update azure-cli to 2.26.0 (#1045) 2021-07-07 12:07:34 -04:00
29dda54b83 instance wide configuration (#1010)
TODO:
* [x] add setting initial set of admins during deployment
2021-06-30 21:13:58 +00:00
1e90ed6092 Allow notifications to be retried when an error occurs (#1026) 2021-06-30 14:05:25 -04:00
883c93aaf4 ensure VM IDs are unique before calling Azure reimage/delete APIs (#1023) 2021-06-25 11:54:52 -04:00
10d2e3e366 update azure-keyvault-secrets to 4.3.0 (#1012) 2021-06-23 18:27:32 -04:00
5f8e423265 remove nodes from db upon reimage (#1005)
The flag `Node.reimage_queued` is intended to stop nodes from reimaging repeatedly.  

In #970, in order to work around Azure API failures, this flag was cycled if the node was already set to cleanup.  Unfortunately, reimaging can sometimes take a significant amount of time, causing this change to reimage nodes multiple times.

Instead of using `reimage_queued` as a flag, this PR deletes the node from the storage table upon reimage.  When the node registers OR the next time through `Scaleset.cleanup_nodes`, the Node will be recreated automatically, whichever comes first.
2021-06-23 22:25:15 +00:00
50652c2e48 mark tasks as failed when the node is being reimaged due to heartbeat issues (#1015) 2021-06-23 16:39:47 -04:00
b9950c5526 update log messages to ease debugging (#988) 2021-06-14 15:18:03 -04:00
bcdae2d5cb Check scaleset size for missing nodes (#984) 2021-06-11 18:47:21 -04:00
2be1edd9dc handle reimaging failures by resetting reimage_queued (#970)
In a previous commit, reimage_queued was added to prevent reimaging a node while it is reimaging.  However, this means reimaging failures due to Azure issues don't finish reimaging.

This resets the flag, allowing the node to reimage in the following cleanup cycle.
2021-06-09 18:58:56 +00:00
da931b3a5c address issues raised from latest mypy (#972) 2021-06-09 12:04:24 -04:00
af39d25a7d reimage/delete expired nodes even with the debug_keep_node flag (#968)
Fixes #965
2021-06-08 17:37:10 +00:00
ed289c9a3c handle scaleset resize exceptions (#967) 2021-06-08 09:30:36 -04:00
2c72bd590f Add generic coverage task (#763)
**Todo:**
- [x] Finalize format for coverage file(s)
- [x] Add service support
- [x] Integration test
- [x] Merge #926 
- [x] Merge #929
2021-06-03 23:36:00 +00:00
a92c84d42a work around issue with discriminated typed unions (#939)
We're experiencing a bug where Unions of sub-models are getting downcast, which causes a loss of information.  

As an example, EventScalesetCreated was getting downcast to EventScalesetDeleted.  I have not figured out why, nor can I replicate it locally to minimize the bug to send upstream, but I was able to reliably replicate it on the service.

While working through this issue, I noticed that deserialization of SignalR events was frequently wrong, leaving things like tasks as "init" in `status top`.

Both of these issues are Unions of models with a type field, so it's likely these are related.
2021-06-02 16:40:58 +00:00
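For illustration, the sketch below shows one general way a union of similar models can lose information, and how selecting the model from an explicit type field avoids it. This is not a diagnosis of the bug above, whose cause the commit leaves open, and it uses plain dataclasses rather than the service's models; all names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ScalesetCreated:
    scaleset_id: str
    vm_sku: str


@dataclass
class ScalesetDeleted:
    scaleset_id: str


def parse_first_match(data):
    """Try each model in order, ignoring unknown keys (lossy)."""
    for model in (ScalesetDeleted, ScalesetCreated):
        known = {f.name for f in fields(model)}
        try:
            return model(**{k: v for k, v in data.items() if k in known})
        except TypeError:
            continue
    raise ValueError("no model matched")


# Explicit mapping from a type field to the model it names.
EVENT_TYPES = {
    "scaleset_created": ScalesetCreated,
    "scaleset_deleted": ScalesetDeleted,
}


def parse_by_type(event_type, data):
    """Pick the model from the type field instead of guessing."""
    return EVENT_TYPES[event_type](**data)
```

With `{"scaleset_id": "s1", "vm_sku": "Standard_D2"}`, `parse_first_match` returns a `ScalesetDeleted` and silently drops `vm_sku`, while `parse_by_type("scaleset_created", ...)` keeps every field.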
60ae07c34f handle azure-storage deleting nonexistent containers (#948) 2021-06-02 15:11:33 +00:00