Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

werft.ts should throw errors if API is not used properly #8707

Closed
meysholdt opened this issue Mar 10, 2022 · 13 comments
Closed

werft.ts should throw errors if API is not used properly #8707

meysholdt opened this issue Mar 10, 2022 · 13 comments
Assignees
Labels
meta: stale This issue/PR is stale and will be closed soon

Comments

@meysholdt
Copy link
Member

Arthur and I just spend the morning debugging broken traces. We think we could have been more efficient if the wrong usage of the Werft TS API would throw errors.

Proposed validation:

  • a span must be open to log output
  • a span must be explicitly closed before its phase is closed
  • the phase must be open when interacting with one of its spans
  • a phase must be closed explicitly (?)
@mads-hartmann
Copy link
Contributor

Moving this to Reduce Technical Debt and adding it as a candidate to the next iteration, but I suspect we might not be able to get to it next week - let's see on Monday what the iteration looks like :)

@fullmetalrooster
Copy link
Contributor

a phase must be closed explicitly (?)

I exclude this problem for now as a phase is implicitly closed by opening new phase leaving only the last phase open. If it is decided that this should change I will return to it.

@fullmetalrooster
Copy link
Contributor

fullmetalrooster commented Mar 30, 2022

a span must be open to log output

This is already implemented [1].

If no span is open for the slice a new one will be created.

public log(slice, msg) {
        if (!this.sliceSpans[slice]) {
            const parentSpanCtx = trace.setSpan(context.active(), this.currentPhaseSpan);
            const sliceSpan = this.tracer.startSpan(`slice: ${slice}`, undefined, parentSpanCtx)
            sliceSpan.setAttributes(this.globalSpanAttributes)
            this.sliceSpans[slice] = sliceSpan
        }
        console.log(`[${slice}] ${msg}`)
    }
 public log(slice, msg) {
        if (!this.sliceSpans[slice]) {
            try {
                const parentSpanCtx = trace.setSpan(context.active(), this.currentPhaseSpan);
                const sliceSpan = this.tracer.startSpan(`slice: ${slice}`, undefined, parentSpanCtx)
                sliceSpan.setAttributes(this.globalSpanAttributes)
                this.sliceSpans[slice] = sliceSpan
            } catch (err) {
                console.log(`[${slice}] ${err}`)
            }
        }
        console.log(`[${slice}] ${msg}`)
    }

@meysholdt
Copy link
Member Author

This is already implemented [1].
If no span is open for the slice a new one will be created.

true! However, it would be helpful to have an error if a span has been closed via done() (see below) and later receives log output.

public done(slice: string) {
const span = this.sliceSpans[slice]
if (span) {
span.end()
delete this.sliceSpans[slice]
}
console.log(`[${slice}|DONE]`)
}

@meysholdt
Copy link
Member Author

If you see that more/other validation rules make sense, feel free to propose!

From a higher-level perspective, everything that could lead to
a) broken traces (see below)
b) bogus rendering in Werft (I don't have an example)
should be found by validation rules.

image

Full trace: https://ui.honeycomb.io/gitpod/datasets/werft/trace/rt5vDLpDBZJ?span=601a2355d4de6061

It's not clear that wrong API usage really is the cause for the broken trace, but I suspect it is - and if it is not, then at least we can rule it out as a cause and prevent future wrong API usage.

@fullmetalrooster
Copy link
Contributor

fullmetalrooster commented Mar 31, 2022

As far as I understood how opentelemetry and werft are interconnected this problem is not easily solved. Opentelemetry is highly structured where werft is not. Opentelemetry starts with a root span and new instances of spans are derived from that root-span. Those child-spans can have their own children and so on. Werft on the other side has only one instance, phases and slices are mere conventions of log-output. The structure we perceive over the Web-UI is an interpretation of that log-output.
To solve this problem we need to bind the structure of werft more closely to the structure of opentelemetry. This way we bind every job, phase and slice to a span and can validate the current state.
The overall concept is similar. The werft instance is bound to the root-span as it is already implemented. Phases and slices will get their own classes and a werft instance can have multiple phase instances which will than have multiple slice instances. This way we can reference spans with phases and slices which is currently only possible for the root-span.

phases_and_slices drawio
.

@meysholdt
Copy link
Member Author

Thank you for your comment. I believe your picture illustrates correctly how we currently map Werft API calls to open telemetry spans.

It's true that structure on the Werft API side currently is not enforced. Therefore, invalid structures can occur and can subsequently not be mapped to valid open telemetry spans. This is exactly the motivation behind this GitHub issue: detect the invalid structures on the Werft side and warn the developer.

if you think refactoring is necessary, feel free to make a proposal. it's easier to discuss it based on example than in an abstract/generic way. Please keep the refactoring as minimal as required.

@fullmetalrooster
Copy link
Contributor

fullmetalrooster commented Apr 1, 2022

I created a first draft of my datamodel. As I never wrote any TS besides tiny changes I need some feedback if this is a reasonable approach or if there are flaws that need to be taken care of. I left the opentelemetry part out.

class Werft {
    phases: { [phase: string]: Phase } = {}
    constructor() {}

    startPhase(name: string, desc?: string) {
        try {
            if(this.getActivePhase())
                throw new Error(`The phase ${this.getActivePhase()} is still active!`)
            if (this.phases[name]) {
                throw new Error(`The phase ${name} already exists!`);
            } else {
                this.phases[name] = new Phase(name, desc);
            }
        } catch (err) {
            console.error(err);
        }
    }

    endPhase(name: string) {
        try {
            if (this.phases[name]) {
                this.phases[name].end();
            } else {
                throw new Error(`The phase ${name} does not exist!`)
            }
        } catch (err) {
            console.error(err);
        }
    }

    getActivePhase() {
        for (const [name, phase] of Object.entries(this.phases)) {
            if (phase.active)
                return name;
        }
    }

    status() {
        console.log(JSON.stringify(this));
    }

    phase(name: string, desc?: string) {
        if(this.getActivePhase())
            this.endPhase(this.getActivePhase());
        this.startPhase(name, desc);
    }

    log(slice: string, msg: string) {
        try {
            if(this.getActivePhase()) {
                this.phases[this.getActivePhase()].log(slice, msg);
            } else {
                throw new Error(`log: There is no active phase which can receive the message!`)
            }    
        } catch (err) {
            console.error(err);
        }
    }

    fail(slice: string, err: string) {
        try {
            if(this.getActivePhase()) {
                this.phases[this.getActivePhase()].fail(slice, err);
            } else {
                throw new Error(`fail: There is no active phase which can receive the error message!`)
            }
        } catch (err) {
            console.error(err);
        }
    }

    done(slice: string) {
        try {
            if(this.getActivePhase()) {
                this.phases[this.getActivePhase()].done(slice);
            } else {
                throw new Error(`done: There is no active phase!`)
            }
        } catch (err) {
            console.error(err);
        }
    }

}


class Phase {
    name: string;
    desc: string = "";
    active: boolean;
    slices: { [slice: string]: Slice } = {};
    constructor(name: string, desc?: string) {
        this.name = name;
        if(desc)
            this.desc = desc;
        this.active = true;
        console.log(`[${name}|PHASE] ${desc || name}`)
    }

    startSlice(name: string) {
        try {
            if (this.slices[name]) {
                throw new Error(`The slice ${name} already exists within the phase ${this.name}!`);
            } else {
                this.slices[name] = new Slice(name);
            }
        } catch (err) {
            console.error(err);
        }
    }

    endSlice(name: string) {
        try {
            if (this.slices[name]) {
                this.slices[name].done();
            } else {
                throw new Error(`endSlice: The slice ${name} does not exist in the phase ${this.name}!`);
            }
        } catch(err) {
            console.error(err);
        }
    }

    end() {
        for (const [name, slice] of Object.entries(this.slices)) {
            slice.done();
        }
        this.active = false;
        console.log(`The phase ${this.name} has ended!`);
    }

    log(slice: string, msg: string) {
        if(!this.slices[slice])
            this.startSlice(slice);
        this.slices[slice].log(msg)
    }

    fail(slice: string, err: string) {
        if(!this.slices[slice])
            this.startSlice(slice);
        this.slices[slice].fail(err)
    }

    done(slice: string) {
        if(this.slices[slice]) {
            this.slices[slice].done();
        } else {
            throw new Error(`done: There is no slice ${slice} in the phase ${this.name}!`);
        }
    }

}


class Slice {
    name: string;
    active: boolean;
    constructor(name: string) {
        this.name = name;
        this.active = true;
    }

    log(msg: string) {
        console.log(`[${this.name}] ${msg}`)
    }

    fail(err: string) {
        console.log(`[${this.name}|FAIL] ${err}`);
        throw err;
    }

    done() {
        // this process should close all open resources attached to the slice
        this.active = false;
        console.log(`[${this.name}|DONE]`)
    }
}


// Testing
const werft = new Werft();

werft.phase("p1", "1st phase");
werft.phase("p2");
werft.log("test", "test message");
werft.done("test");
werft.status();

@meysholdt
Copy link
Member Author

hi @wulfthimm!

Are you sure introducing a new data model is the smallest viable change to introduce the needed validation rules? It does not seem to be in the spirit of shipping skateboards because I'm worried it changes the API and implies refactorings of all usages of the API and it would require you to re-implement the open telemetry integration (as you said).

It may be simpler to look at the validation rules one by one, determine if they can be implemented based on the current data model and if not think about what is the smallest amount of data that needs to be added to the current data model.

@ArthurSens and I did talk about this Issue and it sounded like when you and he discussed a solution, a validation rule you had in mind was to check that a phase that had been closed will not be opened again. That's a nice validation rule! It does not require creating a new data model, though. It would be sufficient to store a list of phase names that are already closed and when a new phase gets started, check if the name is already in that list.

@ArthurSens
Copy link
Contributor

I just remembered another problem @meysholdt, we use exec all across our jobs and exec can also create new slices.

I believe we'll need to add exec to the Werft class to validate if a slice/phase was already closed

@fullmetalrooster
Copy link
Contributor

@meysholdt Many thanks for your comment. I am not sure if that is the right approach therefore I posted it to gather some feedback. I made this as a first draft to follow my previous thoughts about structuring the werft job. I broke the problem down to get a better understanding about the relations between the entities and thought about useful ways to manage them. My approach reproduces the original methods and they work in the same way but are now using standardized methods. There is no refactoring necessary.
In my opinion this appoach splits the problem into smaller pieces and changes can be made easier. But I understand that it is not the minimal viable change we are aiming for.

@fullmetalrooster
Copy link
Contributor

I opened a new issue to discuss if we want to enforce closing phases [1].

@stale
Copy link

stale bot commented Jul 7, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Jul 7, 2022
@stale stale bot closed this as completed Aug 13, 2022
Repository owner moved this from In Progress to Done in ☁️ DevX by 🚚 Delivery and Operations Experience Team Aug 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: stale This issue/PR is stale and will be closed soon
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants