Provide Clear Telemetry Signal when Missing or Duplicated or Misordered Ops are Detected #2011

Closed
anthony-murphy opened this issue May 4, 2020 · 3 comments
Assignees
Labels
area: loader Loader related issues telemetry
Milestone

Comments

@anthony-murphy
Contributor

It is possible in some of our service architectures for ops to be lost due to bugs. If this happens, both clients and the server should be able to detect it and provide a clear signal that it has happened, so the issues can be discovered and resolved quickly.
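
To make the request concrete, here is a minimal sketch of the kind of check a client or server could run over the sequence numbers it receives. It assumes ops carry contiguous integer sequence numbers; `OpSequenceMonitor`, `OpAnomaly`, and the `report` callback are hypothetical names for illustration, not existing APIs in this repo.

```typescript
// Hypothetical sketch: classifies anomalies in a stream of op sequence numbers.
// Assumes sequence numbers are contiguous integers (each op is previous + 1).
type OpAnomalyKind = "missing" | "duplicate" | "misordered";

interface OpAnomaly {
    kind: OpAnomalyKind;
    expected: number; // sequence number we expected next
    received: number; // sequence number that actually arrived
}

class OpSequenceMonitor {
    private lastSeq: number;
    // Sequence numbers that were skipped and have not arrived yet.
    private readonly missing = new Set<number>();
    private readonly report: (anomaly: OpAnomaly) => void;

    constructor(initialSeq: number, report: (anomaly: OpAnomaly) => void) {
        this.lastSeq = initialSeq;
        this.report = report;
    }

    observe(seq: number): void {
        const expected = this.lastSeq + 1;
        if (seq === expected) {
            // Normal case: the next op in order.
            this.lastSeq = seq;
            return;
        }
        if (seq > expected) {
            // Gap: remember every skipped number and signal it once.
            for (let s = expected; s < seq; s++) {
                this.missing.add(s);
            }
            this.report({ kind: "missing", expected, received: seq });
            this.lastSeq = seq;
            return;
        }
        // seq is at or behind the high-water mark.
        if (this.missing.delete(seq)) {
            // A previously skipped op arrived late.
            this.report({ kind: "misordered", expected, received: seq });
        } else {
            // Already processed this sequence number.
            this.report({ kind: "duplicate", expected, received: seq });
        }
    }
}
```

In practice the `report` callback would forward each anomaly to whatever telemetry logger is in use, so that a distinct event name per `kind` makes these cases easy to query.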

@anthony-murphy anthony-murphy added msft: oce Pertaining to Msft-internal on call tickets area: loader Loader related issues telemetry labels May 4, 2020
@ghost ghost added the triage label May 4, 2020
@curtisman curtisman added this to the June 2020 milestone May 24, 2020
@curtisman curtisman removed the triage label May 24, 2020
@curtisman curtisman modified the milestones: June 2020, July 2020 Jul 6, 2020
@danielroney danielroney modified the milestones: July 2020, October 2020 Sep 3, 2020
@markfields markfields self-assigned this Sep 16, 2020
@ChumpChief ChumpChief removed the msft: oce Pertaining to Msft-internal on call tickets label Oct 7, 2020
@markfields
Member

@agarwal-navin - Your current investigation into #3840 might yield some insight into what would be effective here

@markfields markfields changed the title from "Provide Clear Telemetry Signal when Missing Ops are Detected" to "Provide Clear Telemetry Signal when Missing or Duplicated or Misordered Ops are Detected" Oct 9, 2020
@markfields
Member

@anthony-murphy - I'm thinking about closing this. We're chipping away at this as our OCEs investigate different issues (e.g. #3840 and #3627). We also have some basic asserts in place already, not sure how much has changed since this issue was opened.

But I could also see the value in taking a holistic approach: instrumenting the whole area and building related optics. Is that what you're getting at here? From a planning perspective that feels harder to prioritize, but I'm open to talking more about it.

And on that note, if you have specific tactical improvements in mind, it would be great to list them here, or else create individual issues we can pick off one by one.

Let me know what you think on this one. Thanks!

@markfields
Member

GitHub ate @anthony-murphy's response, but we followed up offline and agreed to close this one.


7 participants