Prediction on new dataset #8

georgkempf · 2023-02-18T11:18:15Z

Hello,

I trained a model on experimental data (split into train, valid, and test sets) and want to use it for prediction on an independent library. For prediction, I used the example script and replaced the test data with the new library. What is the best practice regarding qsar_vocab in this case? In the example, qsar_vocab appears to be built from the train and valid data:

    qsar_vocab = TextLMDataBunch.from_df(path, train_aug, valid_aug, bs=bs, tokenizer=tok,
                                         chunksize=50000, text_cols=0, label_cols=1,
                                         max_vocab=60000, include_bos=False)

    test_data_clas = TextClasDataBunch.from_df(path, train, test, bs=bs, tokenizer=tok,
                                               chunksize=50000, text_cols='smiles', label_cols='label',
                                               vocab=qsar_vocab.vocab, max_vocab=60000,
                                               include_bos=False)

When I use the new library as test data, does qsar_vocab, which comes from the experimental library used for training and validation, influence the results? And why does test_data_clas need a reference to the train data?
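To make the concern concrete, here is a minimal sketch in plain Python of how I imagine a fixed vocabulary interacts with unseen data. This is not fastai code: `build_vocab`, `numericalize`, the `UNK` marker, and the toy token lists are all hypothetical, and I am assuming that tokens missing from the vocabulary collapse to a single "unknown" id, which may or may not match what the library does.

```python
from collections import Counter

UNK = "xxunk"  # hypothetical unknown-token marker, stored at index 0


def build_vocab(token_lists, max_vocab=60000):
    """Build an index-to-token list from training tokens only (assumed behaviour)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Reserve index 0 for unknown tokens, then add the most frequent tokens.
    return [UNK] + [tok for tok, _ in counts.most_common(max_vocab - 1)]


def numericalize(tokens, itos):
    """Map tokens to ids; anything not in the vocabulary becomes id 0."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [stoi.get(tok, 0) for tok in tokens]


# Toy tokenized SMILES standing in for the train/valid data.
train_tokens = [["C", "C", "O"], ["C", "N"]]
# A molecule from the new library containing a token never seen in training.
new_library_tokens = ["C", "Br", "O"]

itos = build_vocab(train_tokens)
ids = numericalize(new_library_tokens, itos)
print(itos)
print(ids)
```

If this picture is right, tokens such as `Br` that occur only in the new library would all be mapped to the same id, so the model could not tell them apart. That is why I would like to know how much the vocabulary matters here.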
