I am trying to solve a binary classification problem using tsai.
I have a fairly large dataset, and I cannot use `apply_sliding_window` directly on it because I run into OOM errors.
That is why I am now trying the `TSUnwindowedDataset[s]` routines, and I have doubts about several points of whether I'm doing the right thing.
For the following example I took just a part of the full dataset, so the shape of this slice does not really matter; it is just FYI.
After I've created the splits, I create instances of the `TSUnwindowedDataset` and `TSUnwindowedDatasets` classes:
```python
WINDOW_SIZE = 50

def my_y_func(y_):
    return y_[:, -1]  # I need only the last item from the window of targets

ds = TSUnwindowedDataset(X=X, y=y, y_func=my_y_func, window_size=WINDOW_SIZE, seq_first=True)
dsets = TSUnwindowedDatasets(ds, splits=splits)
dls = TSDataLoaders.from_dsets(
    dsets.train, dsets.valid, dsets[2],  # including test part of dataset
    bs=256,
    shuffle_train=False,
    batch_tfms=TSStandardize(by_sample=True),
)
```
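As a sanity check, the reduction that `my_y_func` performs can be exercised on its own with synthetic data; everything below is a sketch of mine, independent of tsai:

```python
import numpy as np

WINDOW_SIZE = 50

def my_y_func(y_):
    return y_[:, -1]  # keep only the last target of each window

# Synthetic windowed targets: 4 windows of WINDOW_SIZE steps each
y_windows = np.tile(np.arange(WINDOW_SIZE), (4, 1))   # shape (4, 50)

last = my_y_func(y_windows)
print(last.shape)  # (4,)
print(last)        # [49 49 49 49]
```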
And here is the first point:

```python
dls.vars, dls.c
# (5, 1)
#     ^
# expected 2 for binary classification
```

The class count is 1 instead of the expected 2 for binary classification. If I try to create a model and train it:
```python
model = TST(dls.vars, dls.c, dls.len, dropout=0.3, fc_dropout=0.3)
cbs = [
    # does not matter
]
learn = Learner(dls, model, metrics=[RocAucBinary(), accuracy], cbs=cbs)
learn.lr_find()
```
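A side note of mine (not from the original report): if `dls.c` really is 1, the model head produces a single output column per sample, and an argmax across that dimension can only ever return 0, so two-class probabilities cannot come out of such a model. A quick NumPy illustration:

```python
import numpy as np

# Pretend model output for a batch of 4 samples when the head has c_out == 1
out = np.random.randn(4, 1)
pred = out.argmax(axis=1)   # argmax over a single column is always index 0
print(pred)                 # [0 0 0 0]
```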
This code trains the model, and I even get pretty good-looking charts at the end.
But here is the second point: I don't know how to properly interpret the predictions.
```python
probas, *_, labels = learn.get_preds(dl=dls.valid, with_decoded=True)
labels_ = probas.argmax(dim=1)
test_eq(labels_, labels)  # OK
```
As for my target, `y[i] == 1` means *good* and `0` means *bad*. But what does `labels[i] == 1` mean? It could mean the same as my target, but since the predictions are probabilities of shape `(N, 2)`, I suspect it might mean the opposite.
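In fastai-style pipelines, as far as I know, column `j` of `probas` corresponds to `dls.vocab[j]` and the decoded label is `vocab[argmax]`, so inspecting `dls.vocab` should settle which class index 1 stands for. A tsai-free sketch of that mapping (the vocab and probabilities here are made up):

```python
import numpy as np

vocab = [0, 1]                      # assumed class order: column j <-> vocab[j]
probas = np.array([[0.9, 0.1],
                   [0.2, 0.8]])     # made-up class probabilities, rows sum to 1

pred_idx = probas.argmax(axis=1)            # index of the most likely column
pred_labels = [vocab[i] for i in pred_idx]  # map indices back to class labels
print(pred_labels)  # [0, 1]
```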
And here is the third point: I cannot reproduce a validation ROC AUC score anywhere near the one displayed on the chart.
Whichever way I compare the predicted labels to my target on the validation subset, I get ROC AUC ~0.5, but the chart shows 0.75.
Why does that happen? What am I missing?
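One possible cause, offered as a guess rather than a diagnosis: ROC AUC is defined on the predicted scores/probabilities of the positive class, while comparing hard argmax labels to the target throws away the ranking information AUC relies on, so the two numbers need not match. A self-contained sketch of the difference, using a hand-rolled Mann-Whitney AUC instead of sklearn:

```python
import numpy as np

def auc(y_true, scores):
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count half."""
    pos = scores[y_true == 1].astype(float)
    neg = scores[y_true == 0].astype(float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 0, 1])
p1 = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])      # made-up P(class 1)

auc_probs = auc(y_true, p1)                          # uses the full ranking
auc_labels = auc(y_true, (p1 > 0.5).astype(float))   # hard labels lose information
auc_swapped = auc(y_true, 1 - p1)                    # reading the wrong column gives 1 - AUC
print(auc_probs, auc_labels, auc_swapped)
```

If the chart shows 0.75 but a label-based comparison gives ~0.5, recomputing the score from `probas[:, 1]` (or `probas[:, 0]`) rather than from the decoded labels would be the first thing I'd try.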
Hi @oguiza
To add some context: from the slice I extract the features and the target, and I check that the target is binary. Then the tsai library kicks in: after creating the splits, I create the `TSUnwindowedDataset` and `TSUnwindowedDatasets` instances shown above.

Regarding the first point: when I try to create the model and train it with the class count of 1, I actually get an error. This approach differs from the sample notebook, where a transformation is used for the target; however, `TSUnwindowedDataset` does not have such functionality. How can I properly introduce the target to the data loader in this case? The training call above is only a temporary workaround.
To check which interpretation of the labels is right, I created a small comparison method and tried both ways; that is how I got the validation ROC AUC of ~0.5 (versus 0.75 on the chart) mentioned above.