The flag `Node.reimage_queued` is intended to stop nodes from reimaging repeatedly.
In #970, to work around Azure API failures, this flag was cycled when the node was already set to cleanup. Unfortunately, reimaging can sometimes take a significant amount of time, so this change caused nodes to be reimaged multiple times.
Instead of using `reimage_queued` as a flag, this PR deletes the node from the storage table upon reimage. The Node is recreated automatically when the node registers or on the next pass through `Scaleset.cleanup_nodes`, whichever comes first.
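A rough sketch of the new flow, with an in-memory dict standing in for the Azure storage table; the helper names below are illustrative, not the actual onefuzz API:

```python
from typing import Dict, Optional

NODE_TABLE: Dict[str, dict] = {}  # machine_id -> node record

def send_reimage_request(machine_id: str) -> None:
    # hypothetical wrapper around the Azure reimage call
    pass

def reimage_node(machine_id: str) -> None:
    send_reimage_request(machine_id)
    # Delete the record instead of toggling a reimage_queued flag; it will
    # be recreated on registration or by the next cleanup_nodes pass.
    NODE_TABLE.pop(machine_id, None)

def on_node_register(machine_id: str, scaleset_id: str) -> dict:
    node = NODE_TABLE.get(machine_id)
    if node is None:
        node = {"machine_id": machine_id, "scaleset_id": scaleset_id}
        NODE_TABLE[machine_id] = node
    return node
```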
- Define an enum to track the debugger's current understanding of debuggee thread state
- Update our private suspend/resume methods to update and return the current state
- Detect thread exit/debug event races in suspend/resume calls (sketched below)
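A conceptual sketch of the idea only: the real debugger code is not Python, and the enum values, `DebuggeeThread`, and the `suspend_thread`/`resume_thread` helpers are illustrative stand-ins.

```python
from enum import Enum, auto

class ThreadState(Enum):
    RUNNING = auto()
    SUSPENDED = auto()
    EXITED = auto()    # the thread exited before we could act on it

class DebuggeeThread:
    def __init__(self, handle: int) -> None:
        self.handle = handle
        self.state = ThreadState.RUNNING

    def suspend(self) -> ThreadState:
        if self.state == ThreadState.EXITED:
            return self.state
        if suspend_thread(self.handle):        # hypothetical OS wrapper
            self.state = ThreadState.SUSPENDED
        else:
            # Suspend failed: the thread raced to exit between the debug
            # event and our call, so record that instead of erroring out.
            self.state = ThreadState.EXITED
        return self.state

    def resume(self) -> ThreadState:
        if self.state == ThreadState.SUSPENDED and resume_thread(self.handle):
            self.state = ThreadState.RUNNING
        return self.state
```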
This normalizes the SecretData serialization from the client to address #981.
When serializing objects with secrets to send to the service, we wrap each secret in a SecretData.
We use SecretData to convert this:
`{"auth": {"user": "A", "personal_access_token": "B"}}`
to this:
`"auth": { "secret": { "url": "https://KEYVAULT-URL" }}`
Currently, when we have a SecretData that has not yet been saved, the serialized form looks like this:
`{"auth": { "secret": {"user": "A", "personal_access_token": "B"}}}`
This PR simplifies the client side serialization to this:
`{"auth": {"user": "A", "personal_access_token": "B"}}`
Until Pydantic supports discriminated or "smart" unions, we need to work around the coercion issue impacting unions in our models.
This reuses the "smart union" implementation from https://github.com/samuelcolvin/pydantic/pull/2092
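A minimal illustration of the idea (the helper below is illustrative, not code taken from that PR): prefer the union member that matches the input exactly, rather than the first member that merely coerces without error.

```python
from typing import Sequence, Type
from pydantic import BaseModel, ValidationError

def parse_union(value: dict, members: Sequence[Type[BaseModel]]) -> BaseModel:
    # First pass: prefer an exact field match, so extra keys are never
    # silently dropped by an earlier, smaller member.
    for model in members:
        if set(value) == set(model.__fields__):
            return model.parse_obj(value)
    # Fallback: pydantic's default left-to-right coercion.
    for model in members:
        try:
            return model.parse_obj(value)
        except ValidationError:
            continue
    raise ValueError("no union member matched")
```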
In a previous commit, reimage_queued was added to prevent reimaging a node while it is already reimaging. However, this means that when reimaging fails due to Azure issues, the node never finishes reimaging.
This resets the flag, allowing the node to reimage in the following cleanup cycle.
- reuse the regex to parse the output of libfuzzer
- add a cancellation notification to `report_fuzzer_sys_info`
~~The code seems to be actively waiting on this function and consuming some CPU time~~
The notification allows us to reduce the time waiting for the fuzzing loop to terminate.
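A sketch of the idea (the agent itself is not Python; `collect_sys_info` is a hypothetical stand-in): wait on a cancellation event instead of sleeping for a fixed interval, so shutdown is signalled immediately.

```python
import threading

cancel = threading.Event()

def report_fuzzer_sys_info(interval: float = 10.0) -> None:
    while not cancel.is_set():
        collect_sys_info()  # hypothetical reporting step
        # Returns as soon as cancel is set, so we no longer spend up to a
        # full interval (or CPU time busy-waiting) after fuzzing finishes.
        cancel.wait(timeout=interval)

# when the fuzzing loop terminates:
# cancel.set()
```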
We're experiencing a bug where Unions of sub-models are getting downcast, which causes a loss of information.
As an example, EventScalesetCreated was getting downcast to EventScalesetDeleted. I have not figured out why, nor can I replicate it locally to minimize the bug to send upstream, but I was able to reliably replicate it on the service.
While working through this issue, I noticed that deserialization of SignalR events was frequently wrong, leaving things like tasks as "init" in `status top`.
Both of these issues are Unions of models with a type field, so it's likely these are related.
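A small reproduction of the downcast with simplified models; the field names are representative, not the full onefuzz event schema:

```python
from typing import Union
from pydantic import BaseModel

class EventScalesetDeleted(BaseModel):
    scaleset_id: str
    pool_name: str

class EventScalesetCreated(BaseModel):
    scaleset_id: str
    pool_name: str
    vm_sku: str
    size: int

class Message(BaseModel):
    # Without "smart" unions, pydantic tries members left to right and takes
    # the first that validates; extra keys are ignored by default, so a
    # "created" payload coerces into EventScalesetDeleted and loses fields.
    event: Union[EventScalesetDeleted, EventScalesetCreated]

payload = {"scaleset_id": "abc", "pool_name": "pool", "vm_sku": "Standard_D2s_v3", "size": 10}
print(type(Message(event=payload).event).__name__)  # EventScalesetDeleted
```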
One of the difficulties in crash repro as a task is a race condition where the client tries to connect before cdb is running.
This lets us use the heartbeat to check whether the task has started before connecting.
NOTE: In this PR, it's always set to None. See #830 for its actual usage. However, I split out the PR for easier review.
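A sketch of the client-side check (the names and polling shape are illustrative; in this PR the heartbeat field is still always None):

```python
import time

def wait_for_task_heartbeat(get_task, timeout: float = 300.0, interval: float = 5.0) -> bool:
    # Poll until the task reports a heartbeat, i.e. cdb is actually running,
    # before attempting to connect the repro client.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = get_task()
        if task.get("heartbeat") is not None:
            return True
        time.sleep(interval)
    return False
```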
This fixes an issue where object id reuse can make the object
identification cache fail. Instead, this simplifies hide_secrets to
always recurse and use setattr to set each value based on the result
of the recursion.
Note: the object id reuse issue was seen during development of
`events.filter_event_recurse`, and this change was the fix for the id
reuse there.
Python documentation states:
id(object):
Return the “identity” of an object. This is an integer (or long integer)
which is guaranteed to be unique and constant for this object during its
lifetime. Two objects with non-overlapping lifetimes may have the same
id() value.
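A minimal sketch of the "always recurse" approach; the signature and type checks here are simplified relative to the real `hide_secrets`:

```python
from typing import Any, Callable
from pydantic import BaseModel

class SecretData:
    def __init__(self, secret: Any) -> None:
        self.secret = secret

def hide_secrets(data: Any, hider: Callable[[SecretData], SecretData]) -> Any:
    # No id()-based "already visited" cache: ids can be reused once an
    # object is garbage collected, so instead recurse unconditionally and
    # write every result back with setattr / item assignment.
    if isinstance(data, SecretData):
        return hider(data)
    if isinstance(data, BaseModel):
        for name in data.__fields__:
            setattr(data, name, hide_secrets(getattr(data, name), hider))
        return data
    if isinstance(data, dict):
        for key in data:
            data[key] = hide_secrets(data[key], hider)
        return data
    if isinstance(data, list):
        for i, item in enumerate(data):
            data[i] = hide_secrets(item, hider)
        return data
    return data
```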