Replies: 9 comments
-
A high volume of PRs and Issues can also indicate several positive aspects and benefits for Kata: an active community, diverse contributions from different backgrounds and use cases, an enhanced feature set through new ideas, community growth as we attract more contributors, and continuous improvement, since we now need to deal with all of this. So on the one hand we have several positive aspects, but on the other this creates new challenges. Maintaining a large backlog of PRs and Issues can strain the resources of the project maintainers; the capacity to review and address them all promptly is finite. A "huge" backlog may discourage contributors from engaging with the community because PRs and Issues stay unresolved for too long, and newcomers may find it challenging to navigate the project and make meaningful contributions.
No other project has this diversity of components and such a wide range of support for hypervisors and architectures. The code already has a modular design in which each module has a well-defined purpose, interfaces with other components, and a clear hierarchy. Our Issues and PRs, by contrast, currently form a flat hierarchy, and we need to create some structure in it. The first step would be to assign specific Issues/PRs to the SMEs and flag them according to the module. The SME can then prioritize and implement the PRs or reassign them.
We also need active backlog management. We could introduce a weekly meeting where the SMEs/AC go over the Issues and PRs and prioritize them based on their impact, importance, and complexity. Additionally, we should dedicate time to issue and PR triage: regularly review and close outdated or irrelevant issues, and provide feedback to contributors, to keep the backlog manageable. Auto-labeling of Issues and PRs (based on "buzzwords") could help to organize them into buckets assigned to specific SMEs. SMEs can bring Issues/PRs that affect the whole project to a bigger group that prioritizes and manages backlogs.
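To make the buzzword idea concrete, here is a minimal sketch of how such auto-labeling could work. The label names and keyword lists are purely illustrative assumptions, not labels that exist in the Kata repository, and the function only suggests labels; applying them would still need a bot or a human.

```python
import re

# Hypothetical mapping from an area label to its "buzzwords". Both the label
# names and the keywords are assumptions made up for this illustration.
AREA_KEYWORDS = {
    "area/gpu": ["gpu", "vfio", "pcie"],
    "area/hypervisor": ["qemu", "cloud-hypervisor", "firecracker"],
    "area/agent": ["kata-agent"],
}


def suggest_labels(title: str, body: str = "") -> list[str]:
    """Return the area labels whose buzzwords appear in the title or body."""
    text = f"{title}\n{body}".lower()
    labels = []
    for label, keywords in AREA_KEYWORDS.items():
        for kw in keywords:
            # Match the keyword as a whole token, so "gpu" does not match
            # inside an unrelated longer word or hyphenated name.
            if re.search(rf"(?<![\w-]){re.escape(kw)}(?![\w-])", text):
                labels.append(label)
                break
    return sorted(labels)


if __name__ == "__main__":
    print(suggest_labels("GPU passthrough fails with QEMU"))
```

Each suggested label would then map to the SME (or group) responsible for that bucket, which is where the assignment step described above comes in.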
It may also be worthwhile to assign a specific person to do backlog triage on a rotating basis, each for a fixed period. I do not favor any metrics or points per story; this creates a false view of progress. An overview of Issues/PRs per module could give us insight into where help is needed, and we could shift resources accordingly. You can talk the talk, but you also need to walk the walk, and that's why I would volunteer to start implementing some structure for managing Issues/PRs and to create a framework for handling such a diverse code base and a pool of people coming from all over the world and from different companies. Mr. Z
-
@zvonkok, something very similar to what you proposed has been done in the past, without much success. It may have been down to how it was structured: we had a self-proclaimed project manager who'd go through the issues and ask "who should this be assigned to?", without taking into consideration that this is a multi-company project where each one of us has our own priorities that pay our salaries, which may or may not include Kata Containers. The result was things piling up on the backs of the very same usual suspects, overloading those folks even more. It also took one hour from each person on the call, which adds up to a waste of at least half a day, if not a full day, once those hours are summed. With that said, I don't fully disagree with trying something like that again, as long as "offline triage" is put in place, and then folks could meet up to discuss the priority of the issues / PRs opened during the week, but again, this adds overhead. Auto-labeling PRs depending on a specific area is something that would be good, but that requires at least two things:
We also tried, at least internally when I was working at a different company, rotating the triage, and that was partially a disaster: if the "previous owner" of the task decided not to do the job, it'd just overload the next one, and that happened quite a lot, to be honest. We also had, as a community effort, a "review-rota" in place, where folks would volunteer to review PRs for a week, but that was not enforced, and we noticed that not everyone who subscribed would actually do the reviews, leading to the same outcome mentioned above: someone getting overloaded when doing it properly on their turn. All in all, I'm not against trying those again, but I must say, based on what I've seen in the past few years, that it works better in theory than it does when applied.
-
Having unattended issues and/or PRs isn't necessarily a problem per se. It just means that these issues are not critical enough to catch the attention of maintainers (who have limited time) and that the original reporters didn't insist. The super long list of such unattended items isn't a problem either, but rather the mechanical result of asking people to create issues/PRs in GH without knowing if there is capacity to have them fixed. GH exposing the full history of these gives the impression that the situation is getting worse and worse... but it is actually the opposite. It means that people are participating. A concern would be if the number of PRs dropped drastically. What I do see as an issue, though, is the negative effect that this infinitely growing backlog seems to have on maintainers.
I'd advocate implementing reality here :-) Not everything will be implemented or fixed, because resources are limited as usual. I'd suggest simply closing everything that doesn't get proper attention after a period of time, with some notifications to warn reporters that their issue is about to be closed if they don't do something. Important stuff will survive, "nice to have" can make it if the people advocating for it do enough lobbying, and the rest will go back to oblivion. This is common practice in email-based workflows like those of the Linux kernel or QEMU: not getting a timely response to your patches acts as a "no, not interested" answer. For the cases where an issue does actually make sense but maintainers didn't see or realize it, the reporter or contributor can still reopen it and provide appropriate justification.
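The warn-then-close policy could be sketched roughly as follows. The two thresholds are assumptions picked only for illustration (no period is proposed above), and `stale_action` is a hypothetical helper, not an existing tool.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds, for illustration only.
WARN_AFTER = timedelta(days=180)   # idle time before the reporter is warned
CLOSE_AFTER = timedelta(days=210)  # idle time before a warned item is closed


def stale_action(last_activity: datetime, now: datetime, warned: bool) -> str:
    """Decide what to do with an issue/PR: "keep", "warn" or "close".

    An item is never closed without a prior warning. Any new activity moves
    last_activity forward, which takes the item back to "keep"; the caller is
    expected to clear the warned flag at that point.
    """
    idle = now - last_activity
    if idle < WARN_AFTER:
        return "keep"
    if not warned:
        return "warn"
    return "close" if idle >= CLOSE_AFTER else "keep"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(stale_action(now - timedelta(days=200), now, warned=False))
```

The gap between the two thresholds is the window in which a reporter can speak up, and a closed item can still be reopened with a justification, as described above.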
-
@fidencio All fair points, and indeed I have seen the very same behaviour, where one person gets overloaded and others simply do nothing. I am imagining a self-driven team, aka collaboration without oversight. Folks need the freedom to decide what's best and drive architecture without a "guru" telling each individual exactly what to do and how to do it. Speaking from my own experience, here is what I have been doing: I go through the backlog looking for Issues/PRs that have anything to do with GPUs, label them accordingly, and assign myself as a reviewer. This could also be done "automatically". If Issues/PRs are too old and not relevant anymore, close them with a comment. And you're completely right, folks need to be actively looking into it. Folks need to take responsibility, and this can be hard if it is just a "side" project. The leads in each area need to step up, live it, and show how we're doing it; we can only lead by example. This is what we kinda already have in some sense, but it got drained by the flood of new features. Doing my first PRs in the Kata community was a revealing experience for me, because decisions were not made on a political basis but rather on the technical know-how and dedication to the project of several individuals, and this way of working is what kept me here.
-
Another thought that I had was to auto-label issues/PRs with the current Kata release. Reviewing the backlog, we might immediately see that issues targeting specific features are already implemented in a current release. E.g. all Issues/PRs targeting PCIe/GPU with the label Kata 2.x can be closed, because in 3.x we revamped the PCIe topology and added proper support for GPUs.
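As a rough sketch of how release labels could surface such closing candidates: the `Item` type, the "2.x"-style release format, and the area names below are all assumptions for illustration, and the only rework entry is the PCIe/GPU example from this comment.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical view of an issue/PR: its number and its labels."""
    number: int
    release: str                      # e.g. "2.x"; the label format is assumed
    areas: set[str] = field(default_factory=set)


# Major release in which an area was reworked. The single PCIe/GPU entry
# mirrors the example above; anything else would have to be filled in by SMEs.
REWORKED_IN = {"pcie": 3, "gpu": 3}


def major(release: str) -> int:
    """Extract the major version from a label such as "2.x"."""
    return int(release.split(".")[0])


def closing_candidates(items: list[Item]) -> list[int]:
    """Numbers of items filed against a release older than the rework of one of their areas."""
    return [
        item.number
        for item in items
        if any(
            area in REWORKED_IN and major(item.release) < REWORKED_IN[area]
            for area in item.areas
        )
    ]


if __name__ == "__main__":
    backlog = [Item(1, "2.x", {"gpu"}), Item(2, "3.x", {"gpu"}), Item(3, "2.x", {"agent"})]
    print(closing_candidates(backlog))
```

The result is only a list of candidates for review; whether an item can really be closed still needs someone who knows the area.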
-
First of all: having a huge number of PRs and issues shows that there is a lot of interest in the project. That in itself is a success. The issues are also useful as a compass: they show us things that users care about. But as with other systems, the Kata project is subject to some basic laws that apply to all systems:
I consider it an issue for long-term project health, but not one that requires an urgent (i.e. stop-the-world) response. We need to make sure that issues/PRs don't go unanswered indefinitely and that we don't leave user needs unheard, or we will eventually lose contributors/users to something that does listen. So we need a long-term strategy for dealing with issues and then need to let it do its thing. Off the top of my head I can think of different types of issues with different needs:
As mentioned in my candidacy submission, I've been doing a lot of reviewing in recent times. The way I personally approach it is to regularly scan the most recently reported issues and the issues in areas that I'm interested in. This can be applied to the broader project in the following way: I suggest we identify people or groups of people and associate them with the areas (subsystems) that they are most interested in. Issues can be tagged (automation can help here), and we make it the responsibility of those people to look after issues in their areas. We will need more people in maintainer/reviewer roles, especially for areas that currently don't have enough staffing. Fabiano has been doing a good job bringing more people into the project to take on these duties. The AC can help project members step up if this is required. I do not like automation that closes issues automatically after a certain amount of time (like Kubernetes projects). That being said, issues older than a year will probably tend towards being closed with no resolution. Still, it should be the responsibility of humans to make that decision and to give a human response when closing an issue. By reviewing closed issues we also recognize patterns or problem areas that would go unnoticed if closing were done only by automation. This approach, coupled with a couple of regular sessions to trim down the existing backlog in a decisive manner (no long debates, quick decisions), will keep things manageable. We will never get issues down to zero, and that shouldn't be the goal. The goal is to stay aligned with and responsive to user needs.
-
Of course, I think these are indeed a problem.
First of all, we should treat different items differently. For the increasing number of issues piling up, we should first see whether we can classify them and determine what type they belong to: either the user experience of our project was poor, so users ran into many problems and raised issues, or the issues were raised after discovering bugs in the product itself. For the former, we should enrich our documentation so that users, especially new users, can try our project smoothly; for the latter, we should classify the issues according to their severity, such as serious, general, no impact, etc. This way project maintainers can prioritize fixing the more serious problems.
Secondly, for the growing number of pull requests, it is best to assign each PR as soon as possible to a maintainer who is familiar with the module. After all, for PRs that have no owner, no one may be willing to take the initiative and take on the responsibility of review. Of course, this may place a certain burden on maintainers.
-
I would like to jump into the topic too (Yeah, it is so important!). I think we should start to:
We should really have a discussion about it in one of the AC or vPTG meetings.
-
Kata Containers is a very nice project, and since I joined I've been blessed to be around people who have always been teaching me something new, every single day.
However, we know that life for the folks reporting issues on GitHub, and even for the ones raising PRs, is not all that rosy, as we can see a huge list of 1.1k+ open issues and almost 200 PRs at the moment I'm writing this message.
It clearly shows that we must be "more dynamic" in what we're doing, and that we're failing somewhere, as a community, in how we approach those, which may end up reducing confidence in the project and / or the number of newcomers (who could really help us).
So, two questions from this: