Caches are getting too big and we are exceeding the 10GB limit, leading to cache churning.
1. Try to make the caches smaller by using `Swatinem/rust-cache`, which is smarter about what gets cached.
- After doing this it turns out we don't really need `sccache` any more: it has very little impact on compile times because the cache hit ratio is low. So remove it, to reduce the complexity of the build and the size of the build caches.
2. Also fix artifact caching which had been broken by a version format change (4956cf5406).
## Summary of the Pull Request
- **Breaking** (but as far as I know this feature is not yet in use): rename `extra_container` to `extra_setup_container`.
- **Added**: the `extra_output_container`, which continually pushes its outputs.
- We may also want a container type that both pushes and pulls; see the discussion below.
- **Improved**: if `onefuzz-task` fails upon launch, we will log its output for diagnosis (might close #3113).
---
Some thoughts for the future:
We might want to redesign the containers so that something like the following is passed to the agent; then the agent wouldn't need to know the specifics of the containers supplied:
```jsonc
{
  // ...
  "containers": {
    "extra_setup_dir": {
      "mode": "pull",
      "container_name": "yyy"
    },
    "extra_output_dir": {
      "mode": "push",
      "continuous": true, // keep pushing while job is running
      "container_name": "xxx"
    }
  }
}
```
At the moment the agent needs to know, for each task type, what each container is for. A more generic and flexible scheme might be simpler overall.
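If we did go this way, the agent could deserialize the generic description without knowing anything about individual container roles. The sketch below is only an illustration of that idea, not existing code: the type names are hypothetical, and it assumes the `serde` (with the `derive` feature) and `serde_json` crates.

```rust
use std::collections::HashMap;

use serde::Deserialize;

// Hypothetical generic container description, matching the JSON shape above.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum ContainerMode {
    Pull,
    Push,
}

#[derive(Debug, Deserialize)]
struct ContainerConfig {
    mode: ContainerMode,
    // Keep pushing while the job is running; only meaningful for `push`.
    #[serde(default)]
    continuous: bool,
    container_name: String,
}

#[derive(Debug, Deserialize)]
struct TaskContainers {
    containers: HashMap<String, ContainerConfig>,
}

fn main() -> Result<(), serde_json::Error> {
    // serde_json does not accept comments, so the `// ...` from the example is omitted.
    let json = r#"{
        "containers": {
            "extra_setup_dir": { "mode": "pull", "container_name": "yyy" },
            "extra_output_dir": { "mode": "push", "continuous": true, "container_name": "xxx" }
        }
    }"#;

    let parsed: TaskContainers = serde_json::from_str(json)?;
    for (dir, container) in &parsed.containers {
        // The agent only needs to act on mode/continuous, not on what the directory is for.
        println!("{dir}: {container:?}");
    }
    Ok(())
}
```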
* Try to kill debuggee if Linux recording times out
* Add extra cleanup
* Fix Windows warnings
* Fix import
* Minimize mutex lock scope
* Remove redundant wait
* Remove unused import
When we record coverage, if any of the input files fails (e.g. due to a timeout during coverage recording), we fail the whole process.
Instead, make coverage recording best-effort: if some files fail, we continue to record coverage using the remaining files, and print a warning that we can monitor for any ongoing problems.
Closes #3041
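A minimal sketch of the best-effort idea described above (not the actual coverage task code; `Coverage` and `record_coverage` are hypothetical stand-ins for the real types):

```rust
use std::path::{Path, PathBuf};

// Stand-in for whatever the real per-input coverage result is.
#[derive(Default)]
struct Coverage {
    blocks_hit: usize,
}

impl Coverage {
    fn merge(&mut self, other: Coverage) {
        self.blocks_hit += other.blocks_hit;
    }
}

// Stand-in for recording coverage of one input; the real version can fail,
// e.g. when the target times out under the recorder.
fn record_coverage(input: &Path) -> Result<Coverage, String> {
    if !input.exists() {
        return Err("input not found".into());
    }
    Ok(Coverage { blocks_hit: 1 })
}

// Best-effort: a failure on one input is logged as a warning and skipped,
// instead of failing the whole coverage run.
fn record_all(inputs: &[PathBuf]) -> Coverage {
    let mut total = Coverage::default();
    for input in inputs {
        match record_coverage(input) {
            Ok(c) => total.merge(c),
            Err(e) => eprintln!("warning: coverage failed for {}: {e}", input.display()),
        }
    }
    total
}

fn main() {
    let inputs = vec![PathBuf::from("input-1"), PathBuf::from("input-2")];
    let total = record_all(&inputs);
    println!("total blocks hit: {}", total.blocks_hit);
}
```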
Closes #2098.
This cleans up the authentication a bit; after this change we have two stages in the middleware pipeline:
- `AuthenticationMiddleware` reads the JWT token (it does not validate it; validation is done by the Azure Functions service) and stores the resulting user info in `FunctionContext.Items["ONEFUZZ_USER_INFO"]`
- `AuthorizationMiddleware` checks the user info against the `[Authorize]` attribute to see if the user has the required permissions
- Functions can read the user info from the `FunctionContext` if needed
The authorize attribute can be `[Authorize(Allow.User)]` or `Allow.Agent` or `Allow.Admin`. The `Admin` case is new and allows this to be declaratively specified rather than being checked in code. We have several functions which could be changed to use this (e.g. Pool POST/DELETE/PATCH, Scaleset POST/DELETE/PATCH), but I have only changed one so far (JinjaToScriban).
One of the benefits here is that this simplifies the test code a lot: we can set the desired user info directly onto our `(Test)FunctionContext` rather than having to supply a fake that pretends to parse the token from the HTTP request. This will also have benefits when running the service locally for testing purposes (refer to internal issue).
The other benefit is the ability to programmatically read the required authentication for each function, which may help with Swagger generation.
* Store authentication info in keyvault
* fix tests
* fix tests
* fix test
* fix build
* test fix
* more fix
* format
* fix test
* fix test
* build
* cleanup
* build fix
* test fix
* catch exception when secret does not exist
* more cleanup
* fix tests
* cleanup
* address comments
* more null check
Scaleset names are now permitted to be any (valid) strings, instead of only GUIDs. When we generate a scaleset name it is now based upon the pool name; for example the pool `pool` might get a scaleset named `pool-3b24ba211cad4b078655914754485838`.
This should be backwards-compatible since GUIDs are [already serialized to table storage as strings](dddcfa4949/src/ApiService/ApiService/onefuzzlib/orm/EntityConverter.cs (L190-L191)), so this simply loosens the restrictions placed upon them.
Scaleset IDs now have a strong type in the same way as other IDs; this helps to avoid mixing them up with other strings. Because of this I found one bug in the scaleset search query logic due to Pool ID/VMSS ID confusion. As part of fixing this I've changed the scaleset search query to only return nodes from the table rather than querying Azure to find a list; this seems to be sufficient for the CLI.
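The actual change is in the C# service, but the underlying technique is just a strongly-typed ID (a newtype) instead of a bare string. A Rust-flavoured illustration of the idea, with all names hypothetical:

```rust
// Distinct wrapper types mean a scaleset ID can no longer be passed where a
// pool name (or any other string) is expected.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PoolName(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ScalesetId(String);

impl ScalesetId {
    // Generate a scaleset name derived from the pool name, e.g.
    // `pool-3b24ba211cad4b078655914754485838`; a fixed suffix stands in for
    // the random part used in practice.
    fn generate(pool: &PoolName) -> Self {
        ScalesetId(format!("{}-3b24ba211cad4b078655914754485838", pool.0))
    }
}

fn delete_scaleset(id: &ScalesetId) {
    println!("deleting scaleset {}", id.0);
}

fn main() {
    let pool = PoolName("pool".to_string());
    let scaleset = ScalesetId::generate(&pool);
    delete_scaleset(&scaleset);
    // delete_scaleset(&pool); // does not compile: the types prevent mix-ups
}
```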
This is a refactoring of our log uploading process (a sketch of the periodic sync loop follows the list):
- The process that uploads the logs now lives with the agent instead of the task.
- The task now logs to a file and to the console.
- The task log file is periodically synchronized to the log container.
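A minimal sketch of that periodic synchronization, assuming a hypothetical `upload_blob` helper in place of the real blob client (this is an illustration, not the agent's actual code):

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::Duration;

// Stand-in for uploading the current contents of the log file to the
// task's log container.
fn upload_blob(container_url: &str, path: &Path) -> Result<(), String> {
    println!("uploading {} to {container_url}", path.display());
    Ok(())
}

// Periodically push the task's log file to the log container until the
// task reports that it has finished.
fn sync_task_log(
    container_url: &str,
    log_path: &Path,
    period: Duration,
    mut task_done: impl FnMut() -> bool,
) {
    loop {
        if let Err(e) = upload_blob(container_url, log_path) {
            // An upload failure should not kill the task; log and retry later.
            eprintln!("warning: log upload failed: {e}");
        }
        if task_done() {
            break;
        }
        sleep(period);
    }
}

fn main() {
    let mut remaining = 3;
    sync_task_log(
        "https://example.blob.core.windows.net/task-logs",
        Path::new("onefuzz-task.log"),
        Duration::from_millis(200),
        // Pretend the task finishes after a few sync periods.
        move || {
            remaining -= 1;
            remaining == 0
        },
    );
}
```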