Chapter 8: IndexError: list index out of range in transcript_enrich_bucket.py #587

bmerkle · 2024-09-14T14:31:08Z

Describe the bug
I ran Chapter 8 with my own data set (a different set of YouTube videos), and it triggers an IndexError (list index out of range) in transcript_enrich_bucket.py. Please see the stack trace below.

I also have a fix for this bug; please see the MR.

To Reproduce
Steps to reproduce the behavior:

1. Go to Chapter 8.
2. Use your own data set (e.g. I used https://www.youtube.com/@SICKSensors).
3. I created an import for all playlists.
4. Run transcript_enrich_bucket.py; it fails on -7ckbQAqhe4.json.vtt.

Stacktrace:
(.venv) PS C:\work\microsoft\generative-ai-for-beginners\08-building-search-applications\scripts> python transcript_enrich_bucket.py --verbose -f $TRANSCRIPT_FOLDER -m $TRANSCRIPT_BUCKET_MINUTES
DEBUG:main:Transcription folder: transcripts_sick
DEBUG:main:Segment length 3 minutes
DEBUG:main:Processing file: transcripts_sick-7ckbQAqhe4.json.vtt
Enriching Buckets... ---------------------------------------- 0% -:--:--
Traceback (most recent call last):
  File "C:\work\microsoft\generative-ai-for-beginners\08-building-search-applications\scripts\transcript_enrich_bucket.py", line 218, in <module>
    get_transcript(meta)
  File "C:\work\microsoft\generative-ai-for-beginners\08-building-search-applications\scripts\transcript_enrich_bucket.py", line 203, in get_transcript
    parse_json_vtt_transcript(vtt, metadata)
  File "C:\work\microsoft\generative-ai-for-beginners\08-building-search-applications\scripts\transcript_enrich_bucket.py", line 175, in parse_json_vtt_transcript
    previous_segment_tokens = len(tokenizer.encode(segments[-1]["text"]))
                                                   ~~~~~~~~^^^^
IndexError: list index out of range

Expected behavior
transcript_enrich_bucket.py should process every transcript file without raising an IndexError.
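
The traceback shows segments[-1] being evaluated while segments is still empty. Below is a minimal sketch of one way to guard that lookup. It is not necessarily the change made in the linked fix, and the helper name count_previous_segment_tokens and the whitespace_encode stand-in tokenizer are hypothetical, introduced only so the example runs on its own.

```python
def count_previous_segment_tokens(segments, encode):
    """Return the token count of the last segment, or 0 if there is none.

    `segments` is a list of dicts with a "text" key, as in the traceback.
    `encode` is any callable that turns a string into a list of tokens.
    """
    if not segments:
        # No segment has been collected yet, so there is nothing to measure.
        return 0
    return len(encode(segments[-1]["text"]))


def whitespace_encode(text):
    # Stand-in tokenizer for illustration only (assumption): splits on whitespace.
    return text.split()


# An empty list no longer raises IndexError.
print(count_previous_segment_tokens([], whitespace_encode))  # 0
# A populated list behaves as before.
print(count_previous_segment_tokens([{"text": "hello sensor world"}], whitespace_encode))  # 3
```

Returning 0 for an empty list treats "no previous segment" the same as "an empty previous segment"; whether that is the right default depends on how the bucketing logic uses the count.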


github-actions · 2024-09-14T14:31:23Z

👋 Thanks for contributing @bmerkle! We will review the issue and get back to you soon.

github-actions bot added the needs-review label Sep 14, 2024

github-actions bot assigned koreyspace Sep 14, 2024

bmerkle mentioned this issue Sep 14, 2024

Fix#587: index list index out of range in transcript_enrich_bucket.py #588

Merged

koreyspace closed this as completed Sep 17, 2024

github-actions bot locked and limited conversation to collaborators Sep 17, 2024
