Feature Request: Method for obtaining the highest/lowest confidence observations from each target class #19

Open
jmmalo03 opened this issue Aug 2, 2013 · 1 comment

Comments

@jmmalo03

jmmalo03 commented Aug 2, 2013

After training and running a classifier on some observations for a binary decision problem, I often like to quickly extract the "easiest" and "most difficult" observations from each target class. In other words, I would like a method (or methods) that will quickly provide me with:
(1) The 'n' observations with the largest decision statistic from the positive class
(2) The 'n' observations with the lowest decision statistic from the positive class
(3) The 'n' observations with the largest decision statistic from the negative class
(4) The 'n' observations with the lowest decision statistic from the negative class

Alternatively, it would be nice to have a single method that independently sorts the observations under each target class according to their decision statistics.

@peterTorrione
Collaborator

As spec'd out (something that returns (1)-(4)), I don't think this should be a method of prtDataSetClass, and it probably shouldn't be a method of prtClass or prtAction either.

It shouldn't be a method of prtDataSetClass because it makes some assumptions, e.g., that you have only one feature, that you have a "positive" and a "negative" class, etc.

If those conditions hold, there's at least one quick way to do this:

% Example: sort by yOut confidence, then split into H0 and H1
yOut = classifier.run(ds);
[sorted, inds] = sort(yOut.X);           % ascending decision statistic
dsSort = ds.retainObservations(inds);    % reorder the data set to match
dsSort0 = dsSort.retainClasses(0);       % H0 observations, still sorted
dsSort1 = dsSort.retainClasses(1);       % H1 observations, still sorted

Now, the first N of dsSort0 are the easy H0, the last N are hard, and vice-versa for dsSort1.
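
To make the easy/hard bookkeeping concrete, here is a minimal sketch in base MATLAB only (no PRT calls). The `scores` and `labels` vectors are made-up toy data standing in for `yOut.X` and the class labels, and it assumes each class has at least `n` observations:

```matlab
% Toy data (hypothetical): decision statistics and 0/1 class labels.
scores = [0.9 0.1 0.4 0.8 0.3 0.7 0.2 0.6];
labels = [1   0   0   1   1   0   0   1  ];
n = 2;   % how many observations to pull from each end

[~, inds] = sort(scores);          % ascending by decision statistic
inds0 = inds(labels(inds) == 0);   % H0 observations, lowest score first
inds1 = inds(labels(inds) == 1);   % H1 observations, lowest score first

easyH0 = inds0(1:n);               % lowest-scoring H0: confidently H0
hardH0 = inds0(end-n+1:end);       % highest-scoring H0: look like H1
hardH1 = inds1(1:n);               % lowest-scoring H1: look like H0
easyH1 = inds1(end-n+1:end);       % highest-scoring H1: confidently H1
```

Each result is a vector of indices into the original observations, so it could be passed to something like `ds.retainObservations(...)` as in the example above.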

One way to put some of this together might be a "sortBy" method, e.g.:
ds = ds.sortBy(sortVector,'withinClass',true);

So, you could do:
yOut = classifier.run(ds);
ds = ds.sortBy(yOut.X,'withinClass',true);

But that doesn't actually save a whole ton of code...?

For now I don't see a super good reason to make a method that does the code in the example above...
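
For reference, a rough sketch of the ordering a within-class sort would need to produce, written in base MATLAB. Note that `sortBy` is only a proposal in this thread, not an existing method; the sketch just computes the permutation with `sortrows`, and the `scores` and `labels` vectors are made-up toy data:

```matlab
% Toy data (hypothetical): decision statistics and 0/1 class labels.
scores = [0.9 0.1 0.4 0.8 0.3 0.7 0.2 0.6];
labels = [1   0   0   1   1   0   0   1  ];

% Sort rows by label first, then by score within each label (both ascending).
% The second output is the permutation of the original observation indices.
[~, order] = sortrows([labels(:) scores(:)]);

% order could then be used to reorder the data set, e.g.
% ds = ds.retainObservations(order);   % as in the example earlier in the thread
```

With this ordering, all H0 observations come first (lowest score to highest), followed by all H1 observations in the same order.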
