Using auto-validations to help with user quality inference #29
Can you provide a more in-depth summary of what you found in this analysis
and the implications for us?
On Thu, Aug 1, 2019 at 2:58 PM Neil Chowdhury wrote:
The CV confidence ranges from 0-100; the higher the confidence, the more the CV model thinks its prediction is correct. We tried to use the CV confidence for each label to predict whether the label is correct.
[image: Screenshot from 2019-08-01 14-54-51]
<https://user-images.githubusercontent.com/17211794/62330097-80b00e00-b46c-11e9-9829-4bbb973cea03.png>
Each plot has two histograms: one for the CV confidence of correct labels and one for the CV confidence of incorrect labels. We can see, for example, that if a CurbRamp label has a CV confidence < 40, then it is probably incorrect.
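For concreteness, here is a minimal sketch of the kind of analysis being described, assuming a hypothetical pandas DataFrame `labels` with columns `label_type`, `cv_confidence` (0-100), and a boolean `is_correct`; these column names are illustrative and may not match the actual Project Sidewalk data:

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_confidence_histograms(labels: pd.DataFrame) -> None:
    """Plot correct vs. incorrect CV-confidence histograms, one panel per label type."""
    label_types = sorted(labels['label_type'].unique())
    fig, axes = plt.subplots(1, len(label_types),
                             figsize=(4 * len(label_types), 3), squeeze=False)
    for ax, lt in zip(axes[0], label_types):
        subset = labels[labels['label_type'] == lt]
        # Overlay the confidence distributions of correct and incorrect labels.
        ax.hist(subset.loc[subset['is_correct'], 'cv_confidence'],
                bins=20, range=(0, 100), alpha=0.5, label='correct')
        ax.hist(subset.loc[~subset['is_correct'], 'cv_confidence'],
                bins=20, range=(0, 100), alpha=0.5, label='incorrect')
        ax.set_title(lt)
        ax.set_xlabel('CV confidence')
        ax.legend()
    fig.tight_layout()
    plt.show()
```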
|
We also predict that if the CV label type matches the user label type, then the label is probably correct. The rows represent CV labels, the columns represent user labels, and the values represent the probability that a label with that specific CV label and user label is correct:

        CR          NCR         O           SP
CR      0.93353028  0.95294118  0.9118541   0.89325843
NCR     0.87033748  0.91358025  0.9109589   0.89855072
O       0.63453815  0.5483871   0.59813084  0.66071429
SP      0.69811321  0.6875      0.69662921  0.74647887
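As an illustration only (not the actual analysis code), a table like this could be computed by grouping on the CV and user label types, assuming the same hypothetical `labels` DataFrame plus illustrative columns `cv_label_type` and `user_label_type`:

```python
import pandas as pd

def correctness_by_agreement(labels: pd.DataFrame) -> pd.DataFrame:
    """Fraction of labels validated as correct for each (CV label, user label) pair."""
    return pd.pivot_table(labels,
                          values='is_correct',
                          index='cv_label_type',      # rows: the CV model's predicted label type
                          columns='user_label_type',  # columns: the user's label type
                          aggfunc='mean')
```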
|
I'm still not getting a sense of how this is useful. Can you write up a ~1-2 paragraph summary of your findings to complement the numbers? Can you articulate what you found and how it is useful?
|
I found that CV predictions are not reliable for predicting whether a label is correct. The histograms show little correlation between CV confidence and label accuracy. We also expected that if the CV label agrees with the human's label, then the label is more likely to be correct, but as the table shows, this is not the case. The CV model as it stands needs to be refined before it is useful for auto-validations. |
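One way to put a number on "not much correlation" would be the ROC AUC of CV confidence as a predictor of correctness, computed per label type (0.5 means no signal, 1.0 means perfect separation). A sketch under the same hypothetical DataFrame assumptions as above:

```python
from sklearn.metrics import roc_auc_score

def confidence_auc_by_type(labels):
    """ROC AUC of CV confidence as a predictor of label correctness, per label type."""
    aucs = {}
    for label_type, group in labels.groupby('label_type'):
        # AUC is undefined when a group contains only correct or only incorrect labels.
        if group['is_correct'].nunique() == 2:
            aucs[label_type] = roc_auc_score(group['is_correct'], group['cv_confidence'])
    return aucs
```

Values near 0.5 across label types would back up the conclusion that confidence alone carries little signal; a value well above 0.5 for a specific label type would suggest it is still usable there.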
What CV model are you using? Your investigation depends significantly on which ML model was used and how it was trained. Also, isn't this finding far more nuanced than your description implies, in that the CV model performs differently depending on label type (e.g., it's far more accurate for curb ramp labels)?
|
Yes, it could be useful for predicting the accuracy of CurbRamp labels. But keep in mind that 92.5% of CurbRamp labels are correct anyway. We used the DC model. |
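The base-rate point matters for any thresholding rule: flagging low-confidence CurbRamp labels is only useful if the flagged labels are wrong noticeably more often than the ~7.5% background error rate. A hedged sketch of that comparison, using the same hypothetical columns as above:

```python
def threshold_vs_base_rate(labels, label_type='CurbRamp', threshold=40):
    """Compare the error rate among low-confidence labels to the overall error rate."""
    subset = labels[labels['label_type'] == label_type]
    base_error_rate = 1 - subset['is_correct'].mean()
    flagged = subset[subset['cv_confidence'] < threshold]
    flagged_error_rate = 1 - flagged['is_correct'].mean()
    return base_error_rate, flagged_error_rate
```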
According to the plots, I don't think CV is very useful for predicting user accuracy yet. |
That surprises me. I think you should meet with Galen and discuss your
results.
|
FYI, if we want to use CV to predict user accuracy, we will also need to run it on all the labels. I only have predictions for ~4,000 labels out of the 65,700 total labels. |
I'd like us to investigate how we might incorporate the auto-validator CV algorithm into helping us predict user performance.
(Also, how many labels per user are necessary before the auto-validator becomes useful? Somewhat related to #27.)
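One possible shape for that investigation, sketched here with illustrative column names (`user_id`, `auto_validation`) rather than the actual schema: estimate each user's accuracy as the fraction of their labels the auto-validator agrees with, and only trust the estimate once the user has some minimum number of auto-validated labels:

```python
def estimate_user_accuracy(labels, min_labels=20):
    """Estimate per-user accuracy from auto-validated labels, given enough of them."""
    estimates = {}
    for user_id, group in labels.groupby('user_id'):
        # 'auto_validation' is assumed to be 1 (CV agrees) / 0 (CV disagrees), NaN if not run.
        validated = group.dropna(subset=['auto_validation'])
        if len(validated) >= min_labels:
            estimates[user_id] = validated['auto_validation'].mean()
    return estimates
```

Sweeping `min_labels` against held-out manual validations would also help answer the question above of how many labels per user the auto-validator needs before its estimates are useful.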