Problem with ExpectedErrorReduction for instance selection #26

Open
JuliaMasche opened this issue Jun 21, 2020 · 1 comment

@JuliaMasche
JuliaMasche commented Jun 21, 2020

Hi,

I am running experiments for multi-class classification and get the following error from ExpectedErrorReduction:
File "/home/julia/master_thesis/env/lib/python3.6/site-packages/alipy/query_strategy/query_labels.py", line 829, in select
score.append(pv[i, yi] * self.log_loss(prob))
IndexError: index 4 is out of bounds for axis 1 with size 4

I think the cause is that my initial seed set (label_index) does not contain every label that occurs in y.
In the following code, shouldn't it be

classes = np.unique(label_y)
instead of
classes = np.unique(self.y)?

```python
    if self.X is None or self.y is None:
        raise Exception('Data matrix is not provided.')
    if model is None:
        model = LogisticRegression(solver='liblinear')
    model.fit(self.X[label_index if isinstance(label_index, (list, np.ndarray)) else label_index.index],
              self.y[label_index if isinstance(label_index, (list, np.ndarray)) else label_index.index])

    unlabel_x = self.X[unlabel_index]
    label_y = self.y[label_index]
    ##################################

    classes = np.unique(self.y)
    pv, spv = _get_proba_pred(unlabel_x, model)
    scores = []
    for i in range(spv[0]):
        new_train_inds = np.append(label_index, unlabel_index[i])
        new_train_X = self.X[new_train_inds, :]
        unlabel_ind = list(unlabel_index)
        unlabel_ind.pop(i)
        new_unlabel_X = self.X[unlabel_ind, :]
        score = []
        for yi in classes:
            new_model = copy.deepcopy(model)
            new_model.fit(new_train_X, np.append(label_y, yi))
            prob = new_model.predict_proba(new_unlabel_X)
            score.append(pv[i, yi] * self.log_loss(prob))
        scores.append(np.sum(score))

    return unlabel_index[nsmallestarg(scores, batch_size)]
```
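
The index mismatch can be reproduced outside alipy. The sketch below is not the library's code. It assumes toy data with five classes, a seed set that happens to miss class 2, and scikit-learn's default `LogisticRegression` solver rather than `liblinear`. `predict_proba` returns one column per class seen during `fit` (listed in `model.classes_`), so indexing a column by the raw label value can go out of bounds. Note that `np.unique(label_y)` alone would only avoid the error when the missing class is the largest label. Pairing each column position with its label via `enumerate(model.classes_)` is one way to stay in bounds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data (assumption): labels 0..4 overall, class 2 absent from the seed set.
X = rng.normal(size=(60, 3))
y = np.repeat(np.arange(5), 12)
label_index = np.flatnonzero(y != 2)[::4]
unlabel_index = np.setdiff1d(np.arange(len(y)), label_index)

model = LogisticRegression().fit(X[label_index], y[label_index])
pv = model.predict_proba(X[unlabel_index])

# One probability column per class seen in fit, not per class in y.
print(pv.shape[1], model.classes_)

# Indexing a column by the raw label value fails for label 4.
try:
    pv[0, 4]
    raw_label_indexing_ok = True
except IndexError:
    raw_label_indexing_ok = False

# Pair each column position with the label it stands for.
weights = {int(yi): float(pv[0, col]) for col, yi in enumerate(model.classes_)}
print(raw_label_indexing_ok, weights)
```

This only keeps the indexing consistent. The strategy still never considers the class missing from the seed set.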
@tangypnuaa
Collaborator

Hi, the problem is exactly what you think. Note that most existing AL methods do not expect examples from unseen classes, unless they are designed for open-set problems.

So please re-split your dataset and make sure the initial labeled set contains at least one example of each class. (The split function in alipy can do this :) )
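
For illustration, here is a minimal NumPy-only sketch of such a split. It is not alipy's split function; the helper name `seed_with_all_classes` and its parameters are hypothetical. It picks one random example per class first and then fills the rest of the seed set at random.

```python
import numpy as np

def seed_with_all_classes(y, n_seed, rng):
    """Hypothetical helper: choose n_seed indices covering every class in y."""
    y = np.asarray(y)
    classes = np.unique(y)
    if n_seed < len(classes):
        raise ValueError("n_seed must be at least the number of classes")
    # One randomly chosen example per class guarantees full class coverage.
    seed = [rng.choice(np.flatnonzero(y == c)) for c in classes]
    # Fill the remaining slots at random from the indices not yet chosen.
    rest = np.setdiff1d(np.arange(len(y)), seed)
    extra = rng.choice(rest, size=n_seed - len(seed), replace=False)
    label_index = np.concatenate([seed, extra]).astype(int)
    unlabel_index = np.setdiff1d(np.arange(len(y)), label_index)
    return label_index, unlabel_index

rng = np.random.default_rng(0)
# Imbalanced toy labels (assumption): class 4 has a single example.
y = np.array([0] * 50 + [1] * 30 + [2] * 15 + [3] * 4 + [4] * 1)
label_index, unlabel_index = seed_with_all_classes(y, n_seed=8, rng=rng)
print(sorted(np.unique(y[label_index]).tolist()))
```

With a seed set built this way, the model is fitted on every class from the start, so its probability columns line up with `np.unique(self.y)` when the labels are 0..K-1.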
