
TypeError: '<' not supported between instances of 'Example' and 'Example' #474

Closed
kidman99 opened this issue Nov 12, 2018 · 9 comments

@kidman99

I got this error when running the following code. Is some kind of operator overloading for "<" needed here, or is there a workaround?

from torchtext import data
from torchtext.data import TabularDataset
from torchtext.vocab import GloVe

# TEXT and LABEL were not defined in the original post; these Field
# definitions are assumptions added so the snippet is self-contained.
TEXT = data.Field(sequential=True, lower=True)
LABEL = data.Field(sequential=False, use_vocab=False)

tv_datafields = [("id", None),  # we won't be needing the id, so we pass in None as the field
                 ("question_text", TEXT),
                 ("target", LABEL)]

# .splits() returns a tuple of datasets (here of length 1), which is what triggers the error
trn = TabularDataset.splits(
    path="data/quora",  # the root directory where the data lies
    train='train.csv',
    format='csv',
    skip_header=True,  # if your CSV has a header row, pass this so it isn't processed as data!
    fields=tv_datafields)

TEXT.build_vocab(trn, vectors=GloVe(name='6B', dim=300))

@tu-artem

.splits() returns a tuple of datasets; in your case it has length 1. So

trn = TabularDataset.splits(
...
...
...
fields=tv_datafields)[0]

should work here, or you can use the regular TabularDataset constructor instead.
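To see why a length-1 tuple produces this particular error, here is a simplified, pure-Python re-creation of the counting step in `Field.build_vocab`. The `Example`, `Dataset` and `build_vocab` names below are hypothetical stand-ins, not torchtext's actual classes. The sketch assumes, as legacy torchtext appears to do, that an argument which is not a Dataset is iterated as raw token data, and that the vocabulary is sorted alphabetically before it is sorted by frequency. Passing the tuple makes the counter count whole `Example` objects instead of tokens, and the alphabetical sort then has to compare two `Example`s with `<`.

```python
from collections import Counter


class Example:
    """Hypothetical stand-in for torchtext's Example: one row, no ordering defined."""

    def __init__(self, question_text):
        self.question_text = question_text  # already-tokenized text


class Dataset:
    """Hypothetical stand-in for torchtext's Dataset: an iterable of Examples."""

    def __init__(self, examples):
        self.examples = examples

    def __iter__(self):
        return iter(self.examples)


def build_vocab(*args):
    """Simplified sketch of Field.build_vocab's counting and sorting (assumed behavior)."""
    counter = Counter()
    sources = []
    for arg in args:
        if isinstance(arg, Dataset):
            # a real Dataset: pull out the token list of each row
            sources.append([ex.question_text for ex in arg])
        else:
            # anything else (e.g. the tuple from .splits()) is treated as raw data
            sources.append(arg)
    for data in sources:
        for x in data:
            # for a tuple, x is the Dataset itself, so this counts Example objects
            counter.update(x)
    # sort alphabetically by token; fails if the "tokens" are Example objects
    return sorted(counter.items(), key=lambda tup: tup[0])


if __name__ == "__main__":
    ds = Dataset([Example(["how", "do", "i"]), Example(["what", "is"])])
    splits_result = (ds,)  # what .splits(train=...) hands back
    try:
        build_vocab(splits_result)
    except TypeError as err:
        print("TypeError:", err)
    print(build_vocab(splits_result[0]))  # indexing [0] passes the Dataset itself
```

Indexing with `[0]` (or building the dataset with the plain constructor) means `build_vocab` receives a Dataset and counts the field's tokens, which are strings and sort without trouble.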

@cheryllwl

I had the same problem with TabularDataset too. This tutorial was helpful:
http://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext/
I added these two lines and it worked like a charm.

@mttk
Contributor

mttk commented Jan 31, 2019

Thanks @cheryllwl, this should be documented properly.

@mttk mttk added the docs label Jan 31, 2019
@kunjmehta

kunjmehta commented Oct 4, 2019

@tu-artem Can you please elaborate on what adding the index [0] does?
From what I gather, the splits() method returns a Dataset object as a tuple containing Example objects (instances/rows).
So, if I write:
train, val = torchtext.data.TabularDataset.splits(path='./', train="train.csv", test="test.csv", format='csv', fields=data_fields, skip_header=True)
I will get a Dataset object (a tuple containing all training instances) in the train variable, and another Dataset object containing all test instances in the val variable. Am I right?
If so, please help me understand what the indexing [0] does. Thanks.

@tu-artem

tu-artem commented Oct 4, 2019

@kunjmehta In your case you are already doing tuple unpacking via multiple assignment (train, val = ...), so you don't need any further indexing.
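The difference between indexing and unpacking can be shown without torchtext at all. In the sketch below, `splits` is a hypothetical stand-in that returns plain strings in place of datasets; it assumes, as legacy `TabularDataset.splits` appears to do, that one dataset is returned per split you name, in train/validation/test order, and always as a tuple.

```python
def splits(train=None, validation=None, test=None):
    """Hypothetical stand-in for TabularDataset.splits.

    Returns one "dataset" (here just the file name) per split that was
    given, in train/validation/test order, always wrapped in a tuple.
    """
    return tuple(name for name in (train, validation, test) if name is not None)


# One split requested: the result is still a tuple, of length 1
result = splits(train="train.csv")
trn = result[0]        # index the single element ...
(trn_alt,) = result    # ... or unpack it; both give the same object

# Two splits requested: multiple assignment unpacks the tuple directly
train, val = splits(train="train.csv", test="test.csv")
```

Both `[0]` and `(trn,) = ...` just take the single element out of a one-element tuple, while `train, val = ...` already does that for two elements. Under the same assumption, note that with `train=` and `test=` the second variable (named `val` in the question) holds the test split.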

@aaronbriel

What worked for me was to simply add sort=False, as sorting was not needed in my case.

@Sandesh10

Sandesh10 commented Feb 19, 2020

What worked for me was to simply add sort=False, as sorting was not needed in my case.

This worked for me too. I added sort=False as a parameter in the BucketIterator.
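The `sort=False` workaround makes sense once you see what sorting does to the examples. If an iterator is asked to sort and has no `sort_key`, `sorted()` falls back on comparing `Example` objects with `<`, which they don't support. The sketch below is a simplified pure-Python model of that ordering step; `Example` and `ordered` are hypothetical names, and it assumes that legacy torchtext's `Iterator` behaves roughly this way (it appears to sort non-training splits by default when you use `BucketIterator.splits`).

```python
class Example:
    """Hypothetical stand-in for torchtext's Example: it defines no ordering (__lt__)."""

    def __init__(self, question_text):
        self.question_text = question_text


def ordered(examples, sort=True, sort_key=None):
    """Simplified sketch of an iterator's ordering step (assumed, not torchtext's code).

    sort=True with sort_key=None makes sorted() compare Example objects
    directly, which raises TypeError because Example has no __lt__.
    """
    if sort:
        return sorted(examples, key=sort_key)
    return list(examples)  # sort=False: keep dataset order, no comparisons needed


if __name__ == "__main__":
    rows = [Example(["how", "do", "i"]), Example(["what", "is"])]
    try:
        ordered(rows)  # default sort=True and no sort_key
    except TypeError as err:
        print("TypeError:", err)
    print(len(ordered(rows, sort=False)))  # works: nothing is compared
```

With `sort=False` the examples are never compared, so the error disappears; the trade-off is that batches are no longer grouped by length.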

@Oscarjia

Oscarjia commented Nov 2, 2020

I solved this by adding the sort=False parameter:

device = "cuda"
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train, valid, test), sort=False, batch_size=32, device=device)
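If you do want length-sorted batches (the usual reason to use a BucketIterator), an alternative to `sort=False` is to give the iterator a `sort_key`, so that it compares numbers instead of `Example` objects. The legacy iterators do accept a `sort_key` argument, e.g. something like `sort_key=lambda x: len(x.question_text)`, where `question_text` assumes the field names from the original question. The pure-Python sketch below only models the idea; `Example` and `bucket_batches` are hypothetical, and the real BucketIterator does more than this.

```python
class Example:
    """Hypothetical stand-in for torchtext's Example (one tokenized row)."""

    def __init__(self, question_text):
        self.question_text = question_text


def bucket_batches(examples, batch_size, sort_key):
    """Hypothetical sketch: order examples by sort_key, then cut into batches.

    sort_key maps each Example to a number (here, its length), so sorted()
    only ever compares ints and never needs Example < Example.
    """
    ordered = sorted(examples, key=sort_key)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    rows = [Example(["a"] * n) for n in (3, 1, 2, 4)]
    batches = bucket_batches(rows, batch_size=2,
                             sort_key=lambda ex: len(ex.question_text))
    print([[len(ex.question_text) for ex in b] for b in batches])
```

Sorting by a key keeps similar-length examples together (less padding per batch) without requiring the examples themselves to be orderable.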

@zhangguanheng66
Contributor

@Oscarjia Thanks for helping debug.

Just FYI, Example and BucketIterator will be moved to the legacy folder and will no longer be maintained. This is part of the plan for revamping the torchtext library (#985).

dpapathanasiou added a commit to dpapathanasiou/export-import that referenced this issue Nov 24, 2020
…device must be imported first) and added 'sort=False' to the BucketIterator.splits() command to prevent: pytorch/text#474