Commit Graph

27 Commits

Author SHA1 Message Date
b9950c5526 update log messages to ease debugging (#988) 2021-06-14 15:18:03 -04:00
d557fc16c6 mark tasks that are stopped that never started with an error (#935) 2021-05-26 18:42:21 -04:00
2f81c44f01 Refactoring proxy lifetime to only shutdown when proxy is out-of-date. (#839)
## Summary of the Pull Request

_What is this about?_
We'd like to refactor the proxy lifecycle so that a proxy is only deleted when it is out-of-date, i.e. when it is older than 7 days or has a mismatched version. I've changed two files, proxy.py and timer_daily/init.py, to check the version and timestamp before stopping a live proxy.

## PR Checklist
* [ ] Applies to work item: #xxx
* [ ] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/onefuzz) and sign the CLA.
* [ ] Tests added/passed
* [ ] Requires documentation to be updated
* [x] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Info on Pull Request

_What does this include?_
Changes to two files:
proxy.py:
- get_or_create() edited to check if timestamp is >7 days.
- Created is_outdated() to check version and timestamp for out-of-date proxy.
timer_daily/init.py:
- Proxy check now includes is_outdated() before determining if a proxy should be shut down.
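
A minimal sketch of the kind of check described above, assuming the proxy record carries a version string and a creation timestamp. The signature, the `PROXY_MAX_AGE` constant, and the parameter names are hypothetical and are not taken from proxy.py; only the 7-day limit and the version comparison come from this description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 7-day limit comes from the PR description; the constant name is hypothetical.
PROXY_MAX_AGE = timedelta(days=7)


def is_outdated(
    proxy_version: str,
    created: datetime,
    current_version: str,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the proxy's version differs or it is older than PROXY_MAX_AGE."""
    if now is None:
        now = datetime.now(timezone.utc)
    if proxy_version != current_version:
        return True
    return now - created > PROXY_MAX_AGE
```

Under this sketch, the daily timer would only stop a live proxy when `is_outdated(...)` returns True, rather than on every run.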

## Validation Steps Performed
Deploying test instance to determine if proxy lives past a single day.
2021-05-20 14:33:29 +00:00
ff140a6b1b Stop tasks on nodes before deleting task queues (#801) 2021-05-17 18:59:13 +00:00
627463d94b only record the first failure if a task has multiple failures (#797) 2021-04-13 17:36:56 -04:00
ca12904684 add log checking to refactored integration check (#700)
In practice, Application Insights can take up to 3 minutes before something sent to it is available via KQL.

This PR logs a start and stop marker so that the integration tests only search for logs generated during the test run. This reduces complexity when using the integration tests during development.
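
The marker idea can be sketched as follows. This is an illustration only: it filters an in-memory list of messages, whereas the real check queries Application Insights via KQL, and the helper names `new_marker` and `logs_between` are hypothetical.

```python
import uuid
from typing import List


def new_marker(kind: str) -> str:
    """Build a unique marker string to log at the start or end of a test run."""
    return f"integration-test-{kind}:{uuid.uuid4()}"


def logs_between(messages: List[str], start: str, stop: str) -> List[str]:
    """Return the messages logged strictly between the two markers.

    Raises ValueError if a marker is missing or the stop marker
    does not appear after the start marker.
    """
    begin = messages.index(start)
    end = messages.index(stop, begin + 1)
    return messages[begin + 1 : end]
```

Errors logged before the start marker or after the stop marker are ignored, so an earlier failed run does not affect the current check.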

Note: this migrated the new functionality from #356 into the latest integration test tools.
2021-04-02 21:49:19 +00:00
1706a91291 Removing UserInfo from 'created task' logging (#725) 2021-03-23 18:45:18 -04:00
6888fc8fb8 send EventTaskFailed and EventTaskStopped once the task is stopped (#651)
As is, these events are sent once the task enters the state `stopping`.
However, the tasks can still be running on the VMs which can be
confusing.
2021-03-12 01:48:28 +00:00
14c7d5e4d9 mark dependent tasks failed upon failure (#650)
Fix #644
2021-03-11 22:24:43 +00:00
b4ceb263e0 stop jobs once all tasks are stopped (#649)
Fixed #643
2021-03-09 20:09:18 +00:00
4992b494f1 add task config to all task events (#580) 2021-02-19 14:10:48 -05:00
1d74379a70 use the primitive types in more places (#514) 2021-02-05 13:10:37 -05:00
a02e084522 split out node, scaleset, and pool code (#507) 2021-02-04 19:07:49 -05:00
513d1f52c9 Unify Dashboard & Webhook events (#394)
This change unifies the previously ad hoc SignalR events and Webhooks into a single event format.
2021-01-11 21:43:09 +00:00
3b26ffef65 support multiple corpus accounts (#334)
Add support for sharding across multiple storage accounts for blob containers used for corpus management.

Things to note:

1. Additional storage accounts must be in the same resource group, support the "blob" endpoint, and have the tag `storage_type` with the value `corpus`.  A utility is provided (`src/utils/add-corpus-storage-accounts`), which adds storage accounts. 
2. If any secondary storage accounts exist, they are used by default for containers.
3. Storage account names are cached in memory by the Azure Function instance for its lifetime. After adding new storage accounts, the app needs to be restarted to pick up the new accounts.
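
The tag filter and the caching behavior in the notes above can be illustrated with a small sketch. It uses `functools.lru_cache` and a plain dictionary as stand-ins; the `ACCOUNTS` data and the `corpus_accounts` function are hypothetical and do not call any Azure API.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Hypothetical stand-in for the storage accounts in the resource group: name -> tags.
ACCOUNTS: Dict[str, Dict[str, str]] = {
    "primary": {},
    "corpus1": {"storage_type": "corpus"},
}


@lru_cache(maxsize=None)
def corpus_accounts() -> Tuple[str, ...]:
    """Return the accounts tagged storage_type=corpus; the result is cached after the first call."""
    return tuple(
        sorted(
            name
            for name, tags in ACCOUNTS.items()
            if tags.get("storage_type") == "corpus"
        )
    )
```

Because the result is cached for the life of the process, an account added later is not seen until the cache is cleared, which in the deployed service corresponds to restarting the app.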
2021-01-06 23:11:39 +00:00
c1a50f6f6c Colocate tasks (#402)
Enables co-locating multiple tasks in a given work-set.

Tasks are bucketed by the following:
* OS
* job id
* setup container
* VM SKU & image (used in pre-1.0 style tasks)
* pool name (used in 1.0+ style tasks)
* if the task needs rebooting after the task setup script executes.

Additionally, a task will end up in a unique bucket if any of the following are true:
* The task is set to run on more than one VM
* The task is missing the `task.config.colocate` flag (all tasks created prior to this functionality) or the value is False
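
The bucketing rules above can be sketched as a grouping key. The `Task` dataclass and its field names are hypothetical simplifications; only the list of criteria comes from this description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    # Hypothetical, simplified task record.
    task_id: str
    os: str
    job_id: str
    setup_container: str
    pool_name: Optional[str] = None        # 1.0+ style tasks
    vm: Optional[Tuple[str, str]] = None   # (SKU, image), pre-1.0 style tasks
    reboot_after_setup: bool = False
    vm_count: int = 1
    colocate: Optional[bool] = None        # None models a task missing the flag


def bucket_key(task: Task) -> Tuple:
    """Build the grouping key; tasks that must run alone get their task_id in the key."""
    alone = task.vm_count > 1 or not task.colocate
    unique = task.task_id if alone else None
    return (
        task.os,
        task.job_id,
        task.setup_container,
        task.vm,
        task.pool_name,
        task.reboot_after_setup,
        unique,
    )


def bucket_tasks(tasks: List[Task]) -> Dict[Tuple, List[Task]]:
    """Group tasks that share a key into the same bucket."""
    buckets: Dict[Tuple, List[Task]] = defaultdict(list)
    for task in tasks:
        buckets[bucket_key(task)].append(task)
    return dict(buckets)
```

Two tasks in the same job, pool, and setup container with `colocate=True` share a bucket, while a task that requests more than one VM or lacks the flag always lands in a bucket of its own.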

This updates the libfuzzer template to make use of colocation.  Users can specify co-locating all of the tasks *or* co-locating the secondary tasks.
2021-01-06 13:49:15 +00:00
b2b4a06afa Address typing issues hidden by memoization.caching (#322) 2020-11-18 15:08:40 -05:00
e47e89609a Use Storage Account types, rather than account_id (#320)
We need to move to supporting data sharding.

One of the steps towards that is to stop passing around `account_id`; instead, we need to specify the type of storage we need.
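
As a rough illustration of the direction, callers would name the kind of storage they need and let a single lookup resolve the account. The enum members, the account names, and `account_for` below are hypothetical and are not taken from the change itself.

```python
from enum import Enum
from typing import Dict


class StorageType(Enum):
    # Hypothetical storage categories.
    corpus = "corpus"
    config = "config"


# Hypothetical mapping; with sharding, a type could later resolve to one of several accounts.
ACCOUNT_BY_TYPE: Dict[StorageType, str] = {
    StorageType.corpus: "examplecorpusaccount",
    StorageType.config: "exampleconfigaccount",
}


def account_for(storage_type: StorageType) -> str:
    """Resolve a storage type to an account name, so callers never handle account IDs directly."""
    return ACCOUNT_BY_TYPE[storage_type]
```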
2020-11-18 14:06:14 +00:00
64bd389eb7 Declarative templates (#266) 2020-11-17 16:00:09 -05:00
beea318968 Add User Info to created tasks (#303)
This PR makes user information from JWT tokens available as part of a Task.

Included changes:
* Renames `verify_token` to `call_if_agent`, since this function is specific to agent token verification
* Renames `is_authorized` to `is_agent`, since this function checks if the token is an agent token
* Adds support for unmanaged nodes in `is_agent` (see #133 for information) 
* Saves the user information from the JWT token on task create as part of `TaskConfig`

Note: `TaskConfig` is what is provided to notification templates. This enables GitHub issues and ADO work items to tie back to the user that created the task.

Note: `upn` _usually_ means email for AAD user tokens, but that is not guaranteed. If we were going to make use of the email address, we should perform a graph lookup based on the `oid`, but we're not.
2020-11-13 11:50:52 +00:00
31f099d3d4 Event based webhooks (#296) 2020-11-12 17:44:42 -05:00
75f29b9f2e Remove update_event as a single event loop for the system (#160) 2020-10-16 21:42:35 -04:00
7f0c25e2da Managing Pool Resizing at service side (#107) 2020-10-13 14:04:26 -04:00
e308a4ae1e refactor node state to fully put the agent in charge (#90) 2020-10-03 02:43:04 -04:00
a196716e12 only record failures generated prior to stopping (#83) 2020-10-02 01:31:51 -04:00
27a798febe move to warning (#66) 2020-10-01 15:37:01 -04:00
d3a0b292e6 initial public release 2020-09-18 12:21:04 -04:00