I noticed our build outputs were getting very large, which slowed down CI because of the huge artifacts being copied around. Presumably this also slows down copying in the live environment.
Two changes have been made:
- Use `debug=1` instead of `debug=true` (which is equivalent to `debug=2`, i.e. full debug info). `debug=1` omits type- and variable-level debug information, which should be sufficient for our needs
- On Linux, compress debug information after building
| Binary | Before | With `debug=1` | After compression |
|--|--:|--:|--:|
| onefuzz-agent (Linux) | 170 MB | 83 MB | 30 MB |
| onefuzz-task (Linux) | 284 MB | 134 MB | 46 MB |
| onefuzz_agent.pdb (Windows) | 89 MB | 42 MB | — |
| onefuzz_task.pdb (Windows) | 150 MB | 63 MB | — |
| onefuzz-deployment.zip | 364 MB | 286 MB | 285 MB |
Overall, the compressed `release-artifacts` shrank from 374 MB to 297 MB.
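For a relative view of the size table, the following Python sketch computes the percentage reduction for each row. The sizes are copied from the table; as an assumption about how to read it, each row compares the "Before" size with the last size reported (after compression on Linux, with `debug=1` for the Windows PDBs).

```python
# Sizes in MB, copied from the table: (before, with debug=1, after compression).
# None marks the Windows PDBs, which have no "after compression" figure.
SIZES_MB = {
    "onefuzz-agent (Linux)": (170, 83, 30),
    "onefuzz-task (Linux)": (284, 134, 46),
    "onefuzz_agent.pdb (Windows)": (89, 42, None),
    "onefuzz_task.pdb (Windows)": (150, 63, None),
    "onefuzz-deployment.zip": (364, 286, 285),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def summarize(sizes: dict) -> dict:
    """Map each row to its reduction, using the last size the table reports."""
    result = {}
    for name, (before, with_debug1, compressed) in sizes.items():
        final = compressed if compressed is not None else with_debug1
        result[name] = round(reduction_pct(before, final), 1)
    return result


for name, pct in summarize(SIZES_MB).items():
    print(f"{name}: {pct}% smaller")
print(f"release-artifacts: {reduction_pct(374, 297):.1f}% smaller")
```

With these figures the Linux binaries shrink by roughly 82 to 84%, the PDBs by roughly 53 to 58%, the deployment zip by about 22%, and the compressed `release-artifacts` by about 21%.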
Build speed improvements (latest build on `main` vs this PR):
| Step | Before | After |
|--|--:|--:|
| agent upload-artifact | 26s/32s/45s | 16s/15s/20s |
| package download-artifact | 1m 57s | 26s |
| package upload-artifact | 2m 8s | 1m 35s |
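The timing table can be read the same way. This Python sketch parses durations written as `1m 57s` or `26s` (tolerating a space before the unit) and reports the share of time saved. It only covers the single-valued rows, since the table does not say what the three values in the `agent upload-artifact` row correspond to.

```python
import re

# Durations copied from the single-valued rows of the timing table: (before, after).
STEPS = {
    "package download-artifact": ("1m 57s", "26s"),
    "package upload-artifact": ("2m 8s", "1m 35s"),
}


def parse_duration(text: str) -> int:
    """Convert a duration such as '1m 57s' or '26s' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)\s*s)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)


def time_saved_pct(before: str, after: str) -> float:
    """Share of the original duration that the change saves, in percent."""
    b, a = parse_duration(before), parse_duration(after)
    return 100 * (b - a) / b


for step, (before, after) in STEPS.items():
    print(f"{step}: {parse_duration(before)}s -> {parse_duration(after)}s "
          f"({time_saved_pct(before, after):.0f}% less time)")
```

This prints `package download-artifact: 117s -> 26s (78% less time)` and `package upload-artifact: 128s -> 95s (26% less time)`.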
* Bump tokio from 1.28.0 to 1.29.0 in /src/agent
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.0 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.0...tokio-1.29.0)
---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Update dependencies list for ARM64
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: George Pollard <gpollard@microsoft.com>
* Start event retention policy
* .
* Correlate telemetry from cli to service and out
* Traces end to end
* Linting
* .
* Fix build failures
* Trying to fix python dependency error
* .
* Let's let pip figure it out
* .
* Modified the wrong file
* .
* .
* .
* .
* .
* .
* This is the one
* fix lints?
* I _love_ python
* ...
* Undo some unnecessary changes
* Works again
* PR comments
* log exception as "error" since we are retrying anyways
* log exception when sending of webhook runs out of attempts
---------
Co-authored-by: stas <statis@microsoft.com>
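One reading of the last two commits above is a retry policy for webhook delivery: while retries remain, a failure is logged as a plain error, and once the attempts run out the exception itself is logged. The following Python sketch illustrates that reading only; `send_with_retries`, the `webhooks` logger name and the default of three attempts are hypothetical, and this is not the service's actual code.

```python
import logging

logger = logging.getLogger("webhooks")  # hypothetical logger name


def send_with_retries(send, attempts: int = 3) -> bool:
    """Call `send()` up to `attempts` times and return True on the first success.

    Hypothetical helper: failures that will be retried are logged as plain
    errors, and only the final failure is logged with its exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            send()
            return True
        except Exception:
            if attempt < attempts:
                # Another attempt follows, so record an error without the traceback.
                logger.error("webhook send failed (attempt %d of %d), retrying",
                             attempt, attempts)
            else:
                # Out of attempts: log the exception, including its traceback.
                logger.exception("webhook send failed after %d attempts", attempts)
    return False
```

Backoff between attempts is deliberately left out to keep the sketch focused on which log level each failure gets.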
This allows us to return a 404 when someone attempts to download from a non-existent container. At the moment we return a 500, which is neither useful nor good-looking.
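A minimal sketch of the intended behaviour, assuming a plain dictionary stands in for blob storage: check that the container exists before reading from it, and return 404 Not Found if it does not, rather than letting the failed lookup surface as a 500. `handle_download` and the storage shape are hypothetical, and treating a missing blob as a 404 as well is an extra assumption.

```python
from http import HTTPStatus

# Hypothetical stand-in for blob storage: container name -> {blob name -> contents}.
Storage = dict[str, dict[str, bytes]]


def handle_download(storage: Storage, container: str, blob: str) -> tuple[HTTPStatus, bytes]:
    """Return (status, body) for a request to download `blob` from `container`."""
    blobs = storage.get(container)
    if blobs is None:
        # Report the missing container explicitly instead of failing with a 500.
        return HTTPStatus.NOT_FOUND, f"container not found: {container}".encode()
    if blob not in blobs:
        # Assumption: a missing blob is reported the same way.
        return HTTPStatus.NOT_FOUND, f"blob not found: {blob}".encode()
    return HTTPStatus.OK, blobs[blob]
```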
* Adding initial metric.
* Syntax.
* syntax.
* Trying something else.
* Playing around with new metric function:
* Trying new format
* Fixing arguments.
* Importing metrics
* Reverting to events
* Removing
* Adding.
* Changing to int.
* Changing back to float.
* Adding metric lines for all events.
* trying to set.
* Fixing.
* Adding copy.
* Was this a problem..
* Adding different.
* Solution for all.
* Another.
* removing
* Resolving.
Our caches are getting too big: together they exceed the 10 GB limit, which leads to cache churn.
1. Try to make the caches smaller by using `Swatinem/rust-cache`, which is smarter about what gets cached.
   - After doing this, it turns out we don't really need `sccache` any more: its cache hit ratio is low, so it has very little impact on compile times. So remove it, to reduce the complexity of the build and the size of the build caches.
2. Also fix artifact caching, which had been broken by a version format change (4956cf5406fc6817c41928a4713b7da3e4bd130d).
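To illustrate why oversized caches churn, here is a toy Python model of a size-limited cache store. It assumes the least recently used entries are evicted first once the total exceeds the 10 GB limit; the class and the job names are hypothetical. When a few large caches together exceed the limit, each save evicts a cache that another job is about to need, so that job misses and saves again, evicting the next one.

```python
from collections import OrderedDict


class CacheStore:
    """Toy model of a cache store with a total size limit and LRU eviction (assumed)."""

    def __init__(self, limit_gb: float = 10.0):
        self.limit_gb = limit_gb
        # Entry name -> size in GB, ordered from least to most recently used.
        self.entries = OrderedDict()

    def restore(self, key: str) -> bool:
        """Return True on a cache hit, marking the entry as recently used."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def save(self, key: str, size_gb: float) -> list:
        """Store an entry, then evict the oldest ones while over the limit."""
        self.entries[key] = size_gb
        self.entries.move_to_end(key)
        evicted = []
        while sum(self.entries.values()) > self.limit_gb and len(self.entries) > 1:
            oldest, _ = self.entries.popitem(last=False)
            evicted.append(oldest)
        return evicted


# Three hypothetical jobs, each saving a 4 GB cache: 12 GB in total.
store = CacheStore()
store.save("linux", 4)
store.save("windows", 4)
print(store.save("arm64", 4))  # ['linux']
```

With three 4 GB caches, every save evicts another job's cache; with 2 GB caches all three fit and nothing is evicted, which is the effect smaller caches are meant to have.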
## Summary of the Pull Request
- **Breaking** (though as far as I know this feature is not yet in use): rename `extra_container` to `extra_setup_container`.
- **Add**: the `extra_output_container`, which pushes its outputs continually.
  - We may also want a type of container that both pushes and pulls; see the discussion below.
- **Improved**: if `onefuzz-task` fails on launch, we now log its output for diagnosis (might close #3113).
---
Some thoughts for the future:
We might want to redesign the containers so that something like the following is passed to the agent, and the agent doesn't need to know the specifics of the containers supplied:
```jsonc
{
  // ...
  "containers": {
    "extra_setup_dir": {
      "mode": "pull",
      "container_name": "yyy"
    },
    "extra_output_dir": {
      "mode": "push",
      "continuous": true, // keep pushing while job is running
      "container_name": "xxx"
    }
  }
}
```
At the moment, the agent needs to know what each container is for in each task type. A more generic, flexible approach might be simpler overall.
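A generic agent along these lines might look something like the following Python sketch. It reads the `containers` map from the example configuration (written as a Python dict, since the standard `json` module does not accept comments) and decides what to do from each entry's `mode` and `continuous` fields alone. `ContainerSpec`, `parse_containers` and `plan` are hypothetical, as is the choice to pull before a task starts and push after it finishes unless `continuous` is set; none of this is the agent's current behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PULL = "pull"
    PUSH = "push"


@dataclass(frozen=True)
class ContainerSpec:
    directory: str  # key in the "containers" map, e.g. "extra_setup_dir"
    mode: Mode
    container_name: str
    continuous: bool = False  # ignored for pulls in this sketch


# The example configuration, written as a Python dict.
CONFIG = {
    "containers": {
        "extra_setup_dir": {"mode": "pull", "container_name": "yyy"},
        "extra_output_dir": {"mode": "push", "continuous": True, "container_name": "xxx"},
    }
}


def parse_containers(config: dict) -> list:
    """Turn the "containers" map into specs without interpreting what each one is for."""
    return [
        ContainerSpec(
            directory=directory,
            mode=Mode(entry["mode"]),
            container_name=entry["container_name"],
            continuous=entry.get("continuous", False),
        )
        for directory, entry in config.get("containers", {}).items()
    ]


def plan(specs: list) -> list:
    """Describe the generic actions an agent would take for each container."""
    steps = []
    for spec in specs:
        if spec.mode is Mode.PULL:
            steps.append(f"before task: pull {spec.container_name} -> {spec.directory}")
        elif spec.continuous:
            steps.append(f"while running: keep pushing {spec.directory} -> {spec.container_name}")
        else:
            steps.append(f"after task: push {spec.directory} -> {spec.container_name}")
    return steps


for step in plan(parse_containers(CONFIG)):
    print(step)
```

In this shape, a container that both pushes and pulls would be just another `Mode` variant rather than a new task-specific field.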