Add a section with possible duplicates to the webcompat.com reporting form #248

Open
ksy36 opened this issue Jan 5, 2022 · 4 comments

Comments

@ksy36

ksy36 commented Jan 5, 2022

We regularly receive duplicate reports, and some issues have been reported many times (for example, imgur.com). To discourage users from reporting duplicates, and therefore save time on triage, we could add a "possible duplicates" section to the form.

We could place it after the "Web address", "Issue", "Details", and "Testing" sections and before the "Description" section.

There is a similar section in Bugzilla:

[Screenshot: the similar section in Bugzilla]

Do you think that could be useful? @softvision-oana-arbuzov @softvision-raul-bucata @karlcow

@softvision-oana-arbuzov

I think that would be nice to have.

I would place it right after the "Web address" field. Once the URL is filled in, a suggestion list of duplicates/related issues could be shown, similar to Bugzilla.
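A minimal sketch of what such a lookup could look like, assuming existing reports are available as plain records with a `url` field. The record shape and the helper names `hostname_of`, `index_by_host`, and `related_issues` are hypothetical and not part of webcompat.com; only `urllib.parse.urlsplit` is a real standard-library call.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def hostname_of(url: str) -> str:
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    # Assumption: reporters may omit the scheme, so add one before parsing.
    if "://" not in url:
        url = "https://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def index_by_host(issues):
    """Group issue records (hypothetical shape: number, url, title) by hostname."""
    index = defaultdict(list)
    for issue in issues:
        index[hostname_of(issue["url"])].append(issue)
    return index


def related_issues(entered_url, index):
    """Issues filed against the same hostname as the URL the reporter typed."""
    return index.get(hostname_of(entered_url), [])


# Placeholder records for illustration only, not real webcompat issues.
sample = [
    {"number": 1, "url": "https://www.imgur.com/gallery/a", "title": "Example A"},
    {"number": 2, "url": "https://nha.chotot.com/p", "title": "Example B"},
]
index = index_by_host(sample)
for issue in related_issues("imgur.com/some/page", index):
    print(issue["number"], issue["title"])
```

Matching on the hostname alone is deliberately coarse: it only narrows the list to reports about the same site, so it would work best for sites with few existing reports.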

@karlcow

karlcow commented Jan 6, 2022

Thanks to @denschub's work on keeping a backup of all data, it is probably easier to do than in the past.

What would be the criteria for deciding that a report is a duplicate? ML, or something else?

URIs are an issue, because they do not work for things like facebook.com, google.com, etc. Discussed in

ML and backup DB are game changers in all these discussions.
And also your experience in doing ML stuff. That's super cool.

@softvision-raul-bucata

I agree with Oana, this would be cool and nice to have.

@ksy36

ksy36 commented Jan 12, 2022

Thanks for your input everyone!

> I would place it right after the "Web address" field. Once the URL is filled in, a suggestion list of duplicates/related issues could be shown, similar to Bugzilla.

This is a good idea! Also, we need to think about what to do with domains with a lot of issues. Perhaps we could take a 2-phase approach:

  1. Once a reporter presses "Confirm" for the URL, we search for issues with that domain. There could be two cases:

a) A few issues with that domain (for example nha.chotot.com):

[Screenshot: issues reported for nha.chotot.com]

This subdomain has 5 issues: 1 is fixed, 1 is open, and 3 are duplicates. Of those duplicates, only one had its title changed on triage, so we can assume that this is the original issue for our purpose (and it's closed as a duplicate of a Bugzilla issue). So we'll show only 2 issues to the user as "possible duplicates": the one that is still open and the one closed as a duplicate. This assumption is probably not going to be valid in all cases :) But the changed title is pretty important I think, as it gives most of the context and means that the issue was in diagnosis at some point.

b) A domain with a lot of issues (for example imgur.com):

[Screenshot: issues reported for imgur.com]

This domain has 41 reports. Showing them all is not useful, and it's impossible to determine potential duplicates from the domain name alone. In this case we need more context from the user: the type of the issue, steps to reproduce, etc. So if the search finds more issues than a certain threshold (7-10 maybe?), we show nothing to the user after they confirm their URL. Instead, we could build a model with bugbug and try to predict a duplicate after all content is entered. If a duplicate is found, maybe we can display it before or after the "screenshot" step.
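The two cases above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Issue` fields, the state names, and the threshold value of 7 are hypothetical (the thread only floats "7-10 maybe?"), and the bugbug model itself is not shown, since returning `None` simply means "defer to the later model-based check".

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the thread suggests "7-10 maybe?"; 7 is picked for illustration.
MAX_ISSUES_FOR_DOMAIN_LOOKUP = 7


@dataclass
class Issue:
    number: int
    title: str
    state: str          # hypothetical values: "open", "fixed", "duplicate"
    title_edited: bool  # True if the title was changed during triage


def pick_candidates(issues: List[Issue]) -> List[Issue]:
    """Case (a): keep open issues and duplicates whose title was edited on triage."""
    return [
        issue for issue in issues
        if issue.state == "open"
        or (issue.state == "duplicate" and issue.title_edited)
    ]


def suggest_after_url(issues: List[Issue]) -> Optional[List[Issue]]:
    """Return issues to show after the URL step, or None to defer to a model."""
    if len(issues) > MAX_ISSUES_FOR_DOMAIN_LOOKUP:
        # Case (b): too many reports for the domain to be a useful signal.
        return None
    return pick_candidates(issues)


# Placeholder data mirroring the counts described for case (a):
# 1 fixed, 1 open, 3 duplicates, only one of which has an edited title.
few = [
    Issue(1, "Fixed example", "fixed", True),
    Issue(2, "Open example", "open", True),
    Issue(3, "Triaged duplicate", "duplicate", True),
    Issue(4, "Untriaged duplicate", "duplicate", False),
    Issue(5, "Untriaged duplicate", "duplicate", False),
]
shown = suggest_after_url(few)
print([issue.number for issue in shown])
```

With the placeholder data, only the open issue and the triaged duplicate are kept, which matches the "show only 2 issues" outcome described for case (a).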

This is a rough idea and it's quite likely I'm missing something :) I will look through the issues Karl posted to get more context as it appears significant research and work have been put into it.


4 participants