Unable to process idx file without palette definition #24

waterkip opened this issue Jan 4, 2024 · 0 comments

I have an idx file that I extracted with mkvextract from an MKV I ripped from a DVD. The process is roughly this:

mplayer -dvd-device $dvd dvd://$nav -dumpstream -dumpfile $vob
ffmpeg \
    -fflags +genpts \
    -analyzeduration 1000000k \
    -probesize 1000000k \
    -i $vob \
    -c copy \
    $mapping \
    $metadata_audio \
    $metadata_subs \
    -y \
    $mkv

mkvextract $mkv tracks id:idxfile

This generates .idx files which look like this:

# VobSub index file, v7 (do not modify this line!)
langidx: 0

id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc

For vobsub2srt I added the line custom colors: ON, tridx: 1000, colors: 000000, ffffff, 000000, 000000 to improve the OCR, but the results have been hit and miss.

So my idx files look like this:

# VobSub index file, v7 (do not modify this line!)
langidx: 0

custom colors: ON, tridx: 1000, colors: 000000, ffffff, 000000, 000000

id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc

However, vobsubocr croaks on these idx files with the following error:

An error occured: Could not parse VOB subtitles from 13-dut.idx: Could not parse 13-dut.idx

It seems the tool needs a palette to be present in order to work (I copied this piece from an idx I saw in one of the bug reports here):

# The palette of the generated file
palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, 33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb

This is what you seem to be able to process:

# VobSub index file, v7 (do not modify this line!)
langidx: 0

palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, 33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb

id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc
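
Until this is handled in the tool, the workaround above can be scripted. The following is only a sketch in Python: the function name ensure_palette is made up for illustration, the palette is simply the one quoted in this issue (whether it suits a given DVD is an assumption), and placing the line before the first id: entry just mirrors the layout shown above.

```python
import sys

# Palette line quoted in this issue; treating it as a usable default is an assumption.
DEFAULT_PALETTE = (
    "palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, "
    "33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb"
)


def ensure_palette(idx_text: str, palette_line: str = DEFAULT_PALETTE) -> str:
    """Return idx_text with a palette line added if none is present."""
    lines = idx_text.splitlines()
    if any(line.startswith("palette:") for line in lines):
        return idx_text  # already has a palette; leave the file untouched
    for i, line in enumerate(lines):
        if line.startswith("id:"):
            # Insert the palette (plus a blank line) before the first language entry.
            lines[i:i] = [palette_line, ""]
            break
    else:
        lines.append(palette_line)  # no "id:" line found; append at the end
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python add_palette.py 13-dut.idx
    path = sys.argv[1]
    with open(path, encoding="utf-8") as fh:
        patched = ensure_palette(fh.read())
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(patched)
```

Running it twice on the same file is harmless, because an existing palette line is left alone.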

I don't think the palette (or custom colors, for that matter) needs to be present. It would be best to generate an image that tesseract likes best.
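
To make that suggestion concrete, here is a rough Python sketch of a palette-independent rendering. It is not how vobsubocr works internally. It assumes the decoded subtitle bitmap is available as a grid of small color indices with index 0 as the background, and that black text on a white background is a good target for Tesseract, which generally does better with dark text on a light background. The names to_bilevel and to_pgm are made up for this example.

```python
def to_bilevel(indices, background=0):
    """Map a grid of color indices to grayscale rows.

    Background pixels become 255 (white); every other index becomes 0 (black),
    so the actual palette colors never matter.
    """
    return [[255 if px == background else 0 for px in row] for row in indices]


def to_pgm(rows):
    """Serialize grayscale rows as a plain-text PGM (P2) image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P2\n{width} {height}\n255\n"
    body = "\n".join(" ".join(str(value) for value in row) for row in rows)
    return header + body + "\n"


# Tiny made-up bitmap: 0 is background, 1 to 3 are the subtitle's own colors.
bitmap = [
    [0, 1, 1, 0],
    [0, 2, 3, 0],
]
print(to_pgm(to_bilevel(bitmap)), end="")
```

Treating every non-background index as text is the crudest possible choice; outline or anti-aliasing colors may need to map to white instead, depending on the disc.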
