* Add support for etag and timestamp
Introducing EntityBase
Starting migration of QueueNodeHearbeat
* rename namespaces
* upgrade Microsoft.Azure.Functions.Worker to 1.6.0
Added support to the case converter for names that contain underscores
* Support for not renaming enum fields
* bug fixes
* Arm client created in the constructor
added null check
* It does some things
* Download logs from job config
* Lint
* Make mypy happy
* Update to handle the new logs path
* progress
* A job might not have logs set in config
* Mypy wanted a type annotation
* Setting the service side of the log management
- a log is created or reused when we create a job
- when scheduling the task we send the log location to the agent
The expected log structure looks like
{fuzzContainer}/logs/{job_id}/{task_id}/{machine_id}/1.log
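As an illustration of the layout above, here is a minimal Python sketch that builds the blob path for one machine's log. The function name `log_blob_path` and the `index` parameter are hypothetical; only the path template comes from the description above.

```python
from uuid import UUID


def log_blob_path(
    fuzz_container: str,
    job_id: UUID,
    task_id: UUID,
    machine_id: UUID,
    index: int = 1,  # hypothetical: the template only shows "1.log"
) -> str:
    """Return the log path following the expected structure:
    {fuzzContainer}/logs/{job_id}/{task_id}/{machine_id}/1.log
    """
    return f"{fuzz_container}/logs/{job_id}/{task_id}/{machine_id}/{index}.log"
```

Keeping the job, task, and machine IDs as separate path segments means all logs for a job or a task share a common prefix.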
* regenerate docs
* including job_id in the container name
* regenerating docs
removing bad doc file
Add a CLI tool and library code to debug missing dynamic library errors on Windows.
The implementation manually edits the registry global flags for an image file to temporarily enable loader snaps, runs the target under our custom debugger to collect the debug output strings, then parses them for informative loading errors. It does not depend on the presence of `gflags.exe`.
This detects both dynamic linking errors (which prevent process startup) and dynamic loading (`LoadLibrary`) errors. It can report multiple missing dynamically-linked libraries.
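The registry edit described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the helper names are hypothetical, and the sketch only computes the key path and flag values instead of touching the registry. It assumes the documented `FLG_SHOW_LDR_SNAPS` global flag bit (`0x2`), stored in the `GlobalFlag` value under the image's `Image File Execution Options` key.

```python
# Documented NT global flag bit that enables loader snaps.
FLG_SHOW_LDR_SNAPS = 0x2

# Per-image settings live under this key in HKEY_LOCAL_MACHINE.
IFEO_ROOT = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def ifeo_key_path(image_name: str) -> str:
    """Return the registry key path (relative to HKLM) for an image file."""
    return IFEO_ROOT + "\\" + image_name


def enable_loader_snaps(global_flag: int) -> int:
    """Return the flag value with loader snaps enabled, preserving other bits."""
    return global_flag | FLG_SHOW_LDR_SNAPS


def restore_loader_snaps(current: int, original: int) -> int:
    """Reset only the loader-snaps bit to its original state.

    Other bits in `current` are left alone, so the temporary edit does not
    clobber flags that were changed elsewhere in the meantime.
    """
    return (current & ~FLG_SHOW_LDR_SNAPS) | (original & FLG_SHOW_LDR_SNAPS)
```

A caller would read the existing `GlobalFlag` value, write `enable_loader_snaps(value)`, run the target under the debugger, then write back `restore_loader_snaps(current, value)`.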
* move the event grid subscription to the template
* change the name of the new subscription to prevent deleting the wrong subscription
* refactoring
* mypy fix
* format
* format
* remove old event grid before arm deployment
* fix deploy
* attempt to fix check-pr issue
* fix interactive login in check-pr
* using resource Id
* fix type
* fix location
* revert changes in registration.py
* build fix attempt
* build fix
* revert ci changes
* remove file
* address comment
* address PR comments
* naming
* fix deployment
* Initial changes needed for backoff 0.4 to work
* Update backoff versions, fix BackoffError:Transient fields, other uses
* Format
* Removed redundant field name
* Improved backoff update changes
* Update backoff update
* Revert
* Changed to using Error::transient function
* Abstract node disposal strategy
* Cleanup + lint
* Handle scalesets possibly being in resize state
* Setting the size is still exposed via CLI, we don't want to break that functionality
* PR comments
* Release 5.1.0
* Update CHANGELOG.md
Co-authored-by: Joe Ranweiler <joe@lemma.co>
* Prevent deletion of the repro VM on failure for debugging.
Co-authored-by: Joe Ranweiler <joe@lemma.co>
Add a setup script-specific timeout of 59 minutes. This is just shorter than the service-side `NODE_EXPIRATION_TIME` which otherwise garbage collects nodes whose setup scripts are stuck or taking too long.
With this change, the high-level cause of the timeout is clear, instead of the closest error being something indirect, like "node reimaged during task execution".
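The idea can be sketched in Python as below. This is an illustration under stated assumptions, not the agent's code: `run_setup_script` and `SetupScriptTimeout` are hypothetical names, and only the 59-minute value comes from the description above.

```python
import subprocess
from typing import Sequence

# 59 minutes, just under the service-side node expiration described above.
SETUP_SCRIPT_TIMEOUT_SECONDS = 59 * 60


class SetupScriptTimeout(Exception):
    """Raised when the setup script exceeds its dedicated timeout."""


def run_setup_script(
    command: Sequence[str],
    timeout: float = SETUP_SCRIPT_TIMEOUT_SECONDS,
) -> int:
    """Run the setup script and return its exit code.

    On timeout, subprocess.run kills the child process; the error is then
    re-raised with a message that names the setup script as the cause.
    """
    try:
        completed = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired as err:
        raise SetupScriptTimeout(
            f"setup script timed out after {timeout} seconds"
        ) from err
    return completed.returncode
```

Raising a dedicated error type lets the caller report the setup-script timeout directly, before the node is reclaimed for an unrelated-looking reason.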
- Add `onefuzz::memory::available_bytes()` to enable checking system-wide memory usage
- In managed task worker runs, heuristically check for imminent OOM conditions and try to exit early
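A rough Python sketch of such a heuristic follows. It is Linux-only and purely illustrative: the function names and the 100 MiB threshold are assumptions, not values taken from `onefuzz::memory`. It reads the `MemAvailable` field of `/proc/meminfo`, which the kernel reports in kB.

```python
def parse_available_bytes(meminfo_text: str) -> int:
    """Extract MemAvailable from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1234567 kB"
            kib = int(line.split()[1])
            return kib * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def available_bytes() -> int:
    """Read system-wide available memory (Linux only)."""
    with open("/proc/meminfo", encoding="ascii") as f:
        return parse_available_bytes(f.read())


# Hypothetical threshold: treat less than 100 MiB available as imminent OOM.
MIN_AVAILABLE_BYTES = 100 * 1024 * 1024


def oom_imminent(available: int, minimum: int = MIN_AVAILABLE_BYTES) -> bool:
    """Heuristic check: is available memory below the minimum?"""
    return available < minimum
```

A worker loop could poll `oom_imminent(available_bytes())` periodically and exit with a clear error when it returns true, instead of waiting to be killed by the OS.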
* Initial progress on adding an autoscale resource
* auto scale API is ready
* When creating a scaleset, add an autoscale resource to it as well
* Auto scale is correctly linked with scaleset
* 🧹
* Lint
* Cleaned up
Refactor check-pr.py to extract the logic for downloading the binaries
Refactor integration-test.py to split the logic into setup, launch, check_result and cleanup
* draft attempt at adding scaling protection
* Service can now control scaling protection policy on VM instances
* Improve logging a bit
* Error message was missing info
* Linter
* Don't schedule work if we can't protect the node
* Last of the linter changes
* Update yanked block-buffer from 0.10.0 to 0.10.1
* update yanked crossbeam-utils from 0.8.5 to 0.8.7
Co-authored-by: stas <statis@microsoft.com>