chapter 06 - summarization - processing the entire dataset #135

Open
1 of 11 tasks
amscosta opened this issue Mar 30, 2024 · 2 comments

Comments

@amscosta

Information

The question or comment is about chapter:

  • Introduction
  • Text Classification
  • Transformer Anatomy
  • Multilingual Named Entity Recognition
  • Text Generation
  • Summarization
  • Question Answering
  • Making Transformers Efficient in Production
  • Dealing with Few to No Labels
  • Training Transformers from Scratch
  • Future Directions

Question or comment

Great book.
My question is very simple:
How can I extend the summarization process to the entire dataset?
That is, how do I go from a single row:
sample_text = dataset["train"][1]["article"][:2000]
to all rows?
Apologies if this sounds very silly.

@Ice-Citron

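# Option 1: build a plain Python list of truncated articles
sample_texts = [article[:2000] for article in dataset["train"]["article"]]

# Option 2: truncate the "article" column of the dataset with Dataset.map
def shorten_article(example):
    example["article"] = example["article"][:2000]
    return example

dataset["train"] = dataset["train"].map(shorten_article)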

@Ice-Citron

I'm not sure whether these work, so try them out.
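
The snippets above only truncate the articles. Here is a minimal sketch of the remaining step, running a summarizer over every truncated row in batches. The names `summarize_all`, `batched`, and `first_three_sentences` are made up for illustration, and the toy summarizer is only a stand-in so the example runs without downloading a model; it is not the book's method.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_all(
    articles: Iterable[str],
    summarize_batch: Callable[[List[str]], List[str]],
    max_chars: int = 2000,
    batch_size: int = 8,
) -> List[str]:
    """Truncate every article to `max_chars`, then summarize batch by batch."""
    truncated = [article[:max_chars] for article in articles]
    summaries: List[str] = []
    for batch in batched(truncated, batch_size):
        # One summary per input text, in the same order.
        summaries.extend(summarize_batch(batch))
    return summaries


def first_three_sentences(batch: List[str]) -> List[str]:
    """Hypothetical stand-in summarizer: keep the first three sentences."""
    return [". ".join(text.split(". ")[:3]) for text in batch]


# Toy data standing in for dataset["train"]["article"].
articles = [
    "One. Two. Three. Four. Five.",
    "Only one sentence here.",
]
summaries = summarize_all(articles, first_three_sentences, batch_size=1)
```

With the chapter's setup, you would presumably pass `dataset["train"]["article"]` as `articles` and swap the stand-in for a function that calls your summarization model on a list of texts and returns one summary string per text. Processing in batches keeps memory use bounded on a large split, and it is worth trying a small slice first, since summarizing every row with a large model can take a long time.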
