elections-2023-10: What's the best way to reduce the huge PR / issues queue we have? #367

fidencio · 2023-10-17T09:55:11Z

fidencio
Oct 17, 2023
Maintainer

Kata Containers is a very nice project, and since I joined I've been blessed to be around people who have always been teaching me something new, every single day.

However, we know that life for folks reporting issues on GitHub, and even for those raising PRs, is not as flower-ish, as we can see a huge list of 1.1k+ open issues and almost 200 PRs at the moment I'm writing this message.

This clearly shows we must be "more dynamic" in what we're doing, and that we're failing somewhere, as a community, to address those, which may end up reducing confidence in the project and/or the number of newcomers (who could really help us).

So, two questions from this:

Do you consider this an issue? (And if not, why not?)
In case you also consider this an issue, what would be your take to make this more manageable?

fidencio · 2023-10-17T09:55:44Z

fidencio
Oct 17, 2023
Maintainer Author

cc @jepio @lifupan @zvonkok @gkurz @jiangliu

zvonkok · 2023-10-17T11:09:38Z

zvonkok
Oct 17, 2023
Maintainer

A high volume of PRs and Issues can also indicate several positive aspects and benefits for Kata, namely an active community, diverse contributions from different backgrounds and use-cases, an enhanced feature set through new ideas or use cases, community growth as we attract more contributors, and continuous improvement, since we now need to deal with this.

On the one hand, we have several positive aspects, but this also creates new challenges.

Maintaining a large backlog of PRs and Issues can strain the resources of the project maintainers; the capacity to review and address them all promptly is finite. A "huge" backlog may discourage contributors from engaging with the community because PRs and Issues stay unresolved for too long, and newcomers may find it challenging to navigate the project and make meaningful contributions.
All in all, it can hinder the project's progress and the engagement of both existing contributors and potential newcomers.

No other project has this diversity of components and a wide range of support for hypervisors and architectures. The code already has a modular design with a well-defined purpose, interfaces with other components, and a clear hierarchy.
What worked in previous projects I worked on was to have subject-matter experts (SMEs) for one or several components, who would then be responsible for the Issues, PRs, documentation, and tests for that module. No one can handle the complete code-base.

Now, we have a flat hierarchy of Issues and PRs. We need to create some structure in it. The first step would be to assign specific Issues/PRs to the SMEs and flag them according to the module. The SME can then prioritize and implement the PRs or reassign them.

We also need active backlog management; we could introduce a weekly meeting where SMEs/AC go over the Issues and PRs and prioritize them based on their impact, importance, and complexity. Additionally, we should dedicate time to issue and PR triage, regularly review and close outdated or irrelevant issues, and provide feedback to contributors to keep the backlog manageable.

Auto labeling of Issues and PRs (based on "buzzwords") could help to organize them into buckets assigned to specific SMEs.
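As a rough illustration of the "buzzword" idea, here is a minimal Python sketch that suggests labels from the words in a title and body. The keyword table and label names are assumptions made up for the example, and nothing here talks to GitHub; applying the labels would still need an action or bot.

```python
import re

# Hypothetical keyword-to-label table; the labels and keywords are
# illustrative assumptions, not the project's actual label set.
KEYWORD_LABELS = {
    "area/gpu": ("gpu", "vfio", "pcie"),
    "area/agent": ("agent",),
    "area/docs": ("documentation", "readme", "typo"),
}


def suggest_labels(title: str, body: str = "") -> list[str]:
    """Return sorted labels whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", f"{title} {body}".lower()))
    return sorted(
        label
        for label, keywords in KEYWORD_LABELS.items()
        if any(keyword in words for keyword in keywords)
    )


print(suggest_labels("GPU passthrough fails with VFIO", "see the README"))
```

Matching whole words rather than substrings keeps a keyword like "agent" from firing on unrelated longer words, at the cost of missing plurals and variants.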

SMEs can bring up Issues/PRs that affect the whole project to a bigger group that prioritizes and manages backlogs.

It may also be worthwhile to assign a specific person who will do backlog triage on a rotating basis for a set period of time.

I do not favor any metrics or points per story; this creates a false view of progress.

An overview of Issues/PRs per module could give us insight into where help is needed, and we could shift resources accordingly.
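A per-module overview like this could be as simple as counting open items by area label. The sketch below assumes labels of the form `area/<module>` and uses made-up sample data; in practice the items would come from an export of the issue tracker.

```python
from collections import Counter

# Hypothetical open items; titles and labels are made up for the example.
OPEN_ITEMS = [
    {"title": "GPU hotplug fails", "labels": ["area/gpu", "bug"]},
    {"title": "Update agent docs", "labels": ["area/agent", "area/docs"]},
    {"title": "Question about setup", "labels": []},
]


def open_items_per_area(items):
    """Count open items per 'area/*' label; items without one count as 'unlabeled'."""
    counts = Counter()
    for item in items:
        areas = [label for label in item["labels"] if label.startswith("area/")]
        counts.update(areas or ["unlabeled"])
    return counts


for area, count in sorted(open_items_per_area(OPEN_ITEMS).items()):
    print(f"{area}: {count}")
```

The "unlabeled" bucket is deliberate: it shows how much of the backlog has not been assigned to any area yet.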

You can talk the talk, but you also need to walk the walk, and that's why I would volunteer to start implementing some structure of managing Issues/PRs and create a framework on how to handle such a diverse code-base and pool of people coming from all over the world and from different companies.

Mr. Z

fidencio · 2023-10-17T11:27:54Z

fidencio
Oct 17, 2023
Maintainer Author

@zvonkok, something very similar to what you proposed has been done in the past, without much success.

It may have been how it was structured: we had a self-proclaimed project manager who'd go through the issues and ask "who should this be assigned to?", without taking into consideration that this is a multi-company project, where each one of us has our own priorities that pay our salaries, which may or may not include Kata Containers. The result was things piling up on the backs of the very same usual suspects, overloading those folks even more. On top of that, the call would waste one hour of each person attending, adding up to at least half a day, if not a full day, once those hours were summed up.

With that said, I don't fully disagree on trying something like that again, as long as "offline triage" is put in place, and then folks could meet up to discuss what's the priority of the issues / PRs opened during the week, but again, this adds an overhead.

Auto-labeling PRs depending on a specific area is something that would be good, but that requires at least two things:

An action to do so
Folks actively looking at the specific areas

We also tried, at least internally when I was working at a different company, rotating the triage, and that was partially a disaster: if the "previous owner" of the task decided not to do the job, it'd just overload the next one, and that happened quite a lot, to be honest. We also had, as a community effort, a "review-rota" in place, where folks would volunteer to review PRs for a week, but that was not enforced, and we noticed that not everyone who subscribed would actually engage, leading to the same outcome mentioned above: someone getting overloaded when doing it properly on their turn.

All in all, I'm not against trying those again, but I must say, based on what I've seen in the past few years, that it works better in theory than it does when applied.

gkurz · 2023-10-17T14:13:28Z

gkurz
Oct 17, 2023
Maintainer

Kata Containers is a very nice project, and since I joined I've been blessed to be around people who have always been teaching me something new, every single day.

However, we know that life for folks reporting issues on GitHub, and even for those raising PRs, is not as flower-ish, as we can see a huge list of 1.1k+ open issues and almost 200 PRs at the moment I'm writing this message.

This clearly shows we must be "more dynamic" in what we're doing, and that we're failing somewhere, as a community, to address those, which may end up reducing confidence in the project and/or the number of newcomers (who could really help us).

So, two questions from this:

Do you consider this an issue? (And if not, why not?)

Having unattended issues and/or PRs isn't necessarily a problem per se. It just means that these issues are not critical enough to catch the attention of maintainers (who have limited time) and the original reporters didn't insist. The super long list of such unattended items isn't a problem either, but rather the mechanical result of asking people to create issues/PRs in GH without knowing if there is capacity to have them fixed. GH exposing the full history of these gives the impression that the situation is getting worse and worse... but it is actually the opposite. It means that people are participating. A concern would be if the number of PRs dropped drastically.

What I see as an issue is the negative effect that this infinitely growing backlog seems to have on maintainers though.

In case you also consider this an issue, what would be your take to make this more manageable?

I'd advocate to implement reality here :-) Not everything will be implemented or fixed because of limited resources as usual.
Some issues/PRs won't get attention because they aren't worth it for a variety of reasons. Period. No need to keep them along with others that might just be waiting for a developer with enough cycles to pick them up.

I'd suggest simply closing everything that doesn't get proper attention after a period of time, with some notifications to warn reporters that their issue is about to be closed if they don't do something. Important stuff will survive, "nice to have" can make it if the people advocating it do enough lobbying, and the rest will go back to oblivion. This is a common practice in email-based workflows like the Linux kernel or QEMU: not getting a timely response to your patches acts as a "no, not interested" answer.

For the cases where an issue does actually make sense but maintainers didn't see or realize it, the reporter or contributor can still reopen and provide appropriate justification.
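The warn-then-close idea can be sketched as a small decision function. The thresholds below (warn after 180 idle days, close 30 days later) are assumptions picked for the example; no specific periods were proposed in this thread.

```python
from datetime import date, timedelta

# Assumed thresholds, purely illustrative.
WARN_AFTER = timedelta(days=180)
CLOSE_AFTER = WARN_AFTER + timedelta(days=30)  # grace period after the warning


def stale_action(last_activity: date, today: date) -> str:
    """Return 'keep', 'warn', or 'close' based on how long the item has been idle."""
    idle = today - last_activity
    if idle >= CLOSE_AFTER:
        return "close"
    if idle >= WARN_AFTER:
        return "warn"
    return "keep"


today = date(2023, 10, 17)
for last in (date(2023, 10, 1), date(2023, 4, 10), date(2023, 1, 1)):
    print(last, stale_action(last, today))
```

Any new comment moves `last_activity` forward, so a reporter who responds to the warning resets the clock, which is the "do something" escape hatch described above.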

zvonkok · 2023-10-17T14:48:45Z

zvonkok
Oct 17, 2023
Maintainer

@fidencio All fair points and indeed I have seen the very same behaviour where one gets overloaded and others are simply doing nothing.

I am imagining a self-driven team, aka collaboration without oversight. Folks need the freedom to decide what's best and drive architecture without a "guru" telling each individual exactly what to do and how to do it.

Speaking from my own experience, here is what I have been doing in the past:

Assign each PR/Issue that has anything to do with GPUs the label area/gpu so I can filter the Issues/PRs more easily. This could also be a way to make someone aware of a PR/Issue that was labeled by some other community member.

I am going through the backlog and looking for Issues/PRs that are related to GPUs and assigning myself as a reviewer. This could also be done "automatically". If Issues/PRs are too old and not relevant anymore, close them with a comment.
I as the "owner" of the area/submodule know what features are coming up and what makes sense to implement now or later.

And you're completely right: folks need to be actively looking into it. Folks need to take responsibility, and this can be hard if it is just a "side" project.

The leads in the areas need to step up, live it, and show how we're doing it; we can only lead by example. This is what we kinda already have in some sense, but it got drained by the flood of new features.

Doing my first PRs in the Kata community was a revealing experience for me, because decisions were not made on a political basis but rather on the technical know-how and dedication to the project of several individuals, and this way of working is what kept me here.

zvonkok · 2023-10-18T06:16:15Z

zvonkok
Oct 18, 2023
Maintainer

Another thought that I had was to auto-label issues/PRs with the current Kata release. Reviewing the backlog, we might immediately see that the features some issues target are already implemented in a current release, e.g., all Issues/PRs targeting PCIe/GPU with the label Kata 2.x can be closed because in 3.x we revamped the PCIe topology and added proper support for GPUs.
I think we can come up with more categorization of Issues/PRs that can help us do our job better.

jepio · 2023-10-18T09:11:23Z

jepio
Oct 18, 2023
Collaborator

First of all: having a huge number of PRs and issues shows that there is a lot of interest in the project. That in itself is a success. The issues are also useful as a compass: they show us things that users care about.

But as with other systems, the Kata project is subject to some basic laws that apply to all systems:

if you have a queue in a system and the rate of adding items to the queue is higher than the rate of removing them - the queue will grow unbounded
"backpressure" is used to align consumers (maintainers) with producers (contributors)
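The first law is just arithmetic, and a tiny simulation makes it concrete. The weekly rates below are made-up numbers for illustration, not measurements from the project.

```python
def backlog_after(weeks: int, opened_per_week: int, closed_per_week: int, start: int = 0) -> int:
    """Simulate a queue with constant arrival and service rates; it cannot go below zero."""
    backlog = start
    for _ in range(weeks):
        backlog = max(0, backlog + opened_per_week - closed_per_week)
    return backlog


# Arrivals outpace closures by 5 per week: the queue grows without bound.
print(backlog_after(52, opened_per_week=30, closed_per_week=25))

# Closures outpace arrivals by 5 per week: an existing backlog of 100 drains.
print(backlog_after(52, opened_per_week=25, closed_per_week=30, start=100))
```

In this model only the difference between the two rates matters, which is why raising the closing rate (more reviewers) and slowing the intake (backpressure) are interchangeable levers.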

So, two questions from this:
Do you consider this an issue? (And if not, why not?)

I consider it an issue for long-term project health but not one that requires an urgent (=stop the world) response. We need to make sure that issues/PRs don't go unanswered indefinitely, and that we don't leave user needs unheard, or we will eventually lose contributors/users to something that does listen. So we need a long-term strategy for dealing with issues and then need to let it do its thing.

Off the top of my head I can think of different types of issues with different needs:

questions, bugs: these should be answered promptly
tracking/reminders for the future: these will linger until they become obsolete or the reporter gets back to them (or someone else hits them)
discussions on solutions: these can and should be allowed to go on for a long time. Eventually they either come to a consensus or need to be closed.
obsolete issues: the project has gone through many mergers and big changes, so lots of issues are no longer relevant.

In case you also consider this an issue, what would be your take to make this more manageable?

As mentioned in my candidacy submission, I've been doing a lot of reviewing in recent times. The way I personally approach it is to regularly scan most recently reported issues and issues in areas that I'm interested in. This can be applied to the broader project in the following way:

I suggest we identify people or groups of people and associate them with the areas (subsystems) that they are most interested in. Issues can be tagged (automation can help here) and we make it the responsibility of those people to look after issues in their areas.

We will need more people in maintainer/reviewer roles, especially for areas that currently don't have enough staffing. Fabiano has been doing a good job bringing more people into the project to take on these duties. The AC can help project members step up if this is required.

I do not like automation that closes issues automatically after a certain amount of time (like Kubernetes projects). That being said, issues older than a year will probably tend towards being closed with no resolution. Still: it should be the responsibility of humans to make that decision, to give a human response at the closing of an issue. By reviewing closed issues we also recognize patterns or problem areas that would go unnoticed if done only by automation.

This approach coupled with a couple of regular sessions to trim down the existing backlog in a decisive manner (no long debates, quick decisions) will keep things manageable.

We will never get issues down to zero and that shouldn't be the goal. The goal is to stay aligned with and responsive to user needs.

lifupan · 2023-10-19T08:30:28Z

lifupan
Oct 19, 2023
Maintainer

Of course, I think this is indeed a problem. First of all, we should treat different items differently. For the increasing number of issues piling up, we should first see whether we can classify them and determine which type they belong to: issues raised because the user experience of our project was poor and users ran into many problems, or issues raised after discovering bugs in the product itself. For the former, we should enrich our documentation so that users, especially new users, can try our project smoothly; for the latter, we should classify the issues according to their severity, such as serious, general, no impact, etc. This way project maintainers can prioritize fixing the more serious problems. Secondly, for the growing number of pull requests, it is best to assign each PR to a maintainer who is familiar with the module as soon as possible. After all, for PRs that have no owner, no one may be willing to take the initiative and take on the responsibility of reviewing. Of course, this may place a certain burden on maintainers.

bergwolf · 2023-10-23T10:23:05Z

bergwolf
Oct 23, 2023
Maintainer

I would like to jump into the topic too (Yeah, it is so important!). I think we should start to:

use the stale issue/pr bot to close old ones, and
if someone thinks an issue/PR is really important, she can comment on it and get it re-opened, and
we should walk through the remaining issues/PRs (the really active ones!) on a weekly or monthly basis, and get the relevant developers/maintainers to review them

We should really have a discussion about it in one of the AC or vPTG meetings.
