QA Documentation: I get an error just copying and pasting the documentation #10210

Closed
andreabac3 opened this issue Feb 16, 2021 · 6 comments

Comments

@andreabac3
Contributor

andreabac3 commented Feb 16, 2021

Environment info

  • transformers version: 4.3.1
  • Platform: Manjaro Linux
  • Python version: 1.5.1 (presumably the PyTorch version)
  • PyTorch version (GPU?): Yes
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

@sgugger

Information

I am trying to train a QA model following the Hugging Face documentation. I just copied and pasted the code on my machine (and in Colab), but I was not able to proceed with the training phase because I got a None value.

To reproduce

Steps to reproduce the behavior:

  1. Go to the documentation: https://huggingface.co/transformers/custom_datasets.html, SQuAD training section.
  2. Copy and paste the code, as you can see in my pastebin: https://pastebin.com/hZvq7Zs7
  3. You get the following error:
     File "/home/andrea/PycharmProjects/qa-srl/test.py", line 78, in __getitem__
       return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
     RuntimeError: Could not infer dtype of NoneType
  4. My naive workaround was to modify the __getitem__ method of the SquadDataset class so that it does not serve val[idx] == None (a sketch of the relevant class follows this list).
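
For context, the dataset class from that tutorial section looks roughly like the sketch below (reconstructed here, so treat the details as an approximation). The error occurs because add_token_positions can leave None entries in start_positions/end_positions, and torch.tensor(None) cannot infer a dtype:

import torch

class SquadDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # Raises "RuntimeError: Could not infer dtype of NoneType" whenever any
        # value for this example (e.g. an end_positions entry) is None.
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

    def __len__(self):
        return len(self.encodings.input_ids)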
@sgugger
Collaborator

sgugger commented Feb 16, 2021

Pinging @joeddav on this one, since he wrote this tutorial :-)

@andreabac3
Contributor Author

Thank you @sgugger for the reply.
OK, I can wait for the answer from @joeddav.

Have a nice day.

@joeddav
Contributor

joeddav commented Feb 16, 2021

Figured it out. answer_end is the character position immediately after the answer, so end_position should be derived from answer_end - 1. I'm not sure why I was able to run it without this error previously (perhaps a resolved tokenizer bug?), but this should be correct.

def add_token_positions(encodings, answers):
    start_positions = []
    end_positions = []
    for i in range(len(answers)):
        start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
        end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
 
        # if start position is None, the answer passage has been truncated
        if start_positions[-1] is None:
            start_positions[-1] = tokenizer.model_max_length
            end_positions[-1] = tokenizer.model_max_length

    encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
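
For reference, in the tutorial this helper is applied to the tokenized training set roughly as follows (the variable names train_contexts, train_questions and train_answers follow the tutorial and are assumptions here):

train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
add_token_positions(train_encodings, train_answers)
train_dataset = SquadDataset(train_encodings)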

@LysandreJik
Member

Closed by #10217

@andreabac3
Contributor Author

Thank you @joeddav, the posted code works perfectly.

@andreabac3
Contributor Author

Sorry for bothering you again, @joeddav, but I have a question related to the code you posted here.
I am still getting None with a dataset I built myself using this code. My dataset works perfectly with the original run_squad script.
In the snippet you posted I encounter None in the end_positions vector and I don't know how to fix it. I see the condition handling a None in start_positions, but what should I do when the None appears only in the end_positions vector?

Kind regards,
Andrea

joeddav pushed a commit that referenced this issue Feb 25, 2021
* Fix None in add_token_positions - issue #10210

Fix None in add_token_positions related to the issue #10210

* add_token_positions fix None values in end_positions vector

add_token_positions fix None in end_positions vector as proposed by @joeddav
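
For anyone hitting the remaining end_positions issue, below is a minimal sketch of one way to handle it, assuming the None comes from answer_end - 1 landing on a character (such as whitespace) that maps to no token; this is an illustration in the spirit of the fix referenced above, not necessarily the exact merged change:

def add_token_positions(encodings, answers):
    start_positions = []
    end_positions = []
    for i in range(len(answers)):
        start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
        end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))

        # if start position is None, the answer passage has been truncated
        if start_positions[-1] is None:
            start_positions[-1] = tokenizer.model_max_length

        # if end position is None, the character at answer_end - 1 maps to no token
        # (e.g. whitespace); try the following character as a fallback (assumption)
        if end_positions[-1] is None:
            end_positions[-1] = encodings.char_to_token(i, answers[i]['answer_end'] + 1)

        # if it is still None, treat the answer as truncated, like start_positions
        if end_positions[-1] is None:
            end_positions[-1] = tokenizer.model_max_length

    encodings.update({'start_positions': start_positions, 'end_positions': end_positions})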