This repository has been archived by the owner on Oct 7, 2021. It is now read-only.

FFmprobe read large comments from mp4 file is truncated #331

Closed
RowlandOti opened this issue Jan 29, 2020 · 22 comments
Labels
bug Something isn't working

Comments

@RowlandOti

RowlandOti commented Jan 29, 2020

Description
FFprobe.getMediaInformation() returns truncated comments from an mp4 file when the comment text is long. I am not sure of the exact threshold, but beyond a certain number of characters the text is cut off at that point whenever we read it using FFprobe.

Expected behavior
Full text encoded in the comments section needs to be returned - even if large

Current behavior
Text from the comments section is truncated

Encoded Text in comment section
At a predominantly white Ivy League college, a group of black students navigate various forms of racial and other types of discrimination. | Buoyed by his front-page story on the blackface party, shy reporter Lionel begins to come out of his shell and embrace his true identity. | A college radio host named Samantha leads the outcry over an offensive party on campus; a revelation about Samantha's love life puts her in an awkward spot.

Actual Result
At a predominantly white Ivy League college, a group of black students navigate various forms of racial and other types of discrimination. | In the second season of this comedy series, Sam (Logan Browning) finds herself at the epicentre of social media ba

Environment

  • Platform: [e.g. Android/IOS]

  • Architecture: [arm-v7a, arm-v7a-neon, arm64-v8a, x86, x86_64, armv7, armv7s, arm64, i386, x86_64]

  • Version 4.3

  • Android Studio version 3.5

@RowlandOti RowlandOti changed the title from "FFpobe read large comments from mp4 file is truncated" to "FFmprobe read large comments from mp4 file is truncated" Jan 29, 2020
@tanersener
Owner

tanersener commented Jan 29, 2020

Can you please provide some data about this issue, like a sample file that has large comments, full command you're using on FFprobe, also maybe console output?

@tanersener tanersener self-assigned this Jan 29, 2020
@tanersener tanersener added the question Further information is requested label Jan 29, 2020
@RowlandOti
Author

Hi @tanersener, I may not be able to upload the content due to copyright issues. However, what I do know is that all the truncated comment texts have the same size, e.g.:

[Screenshot: character counter output]

This already gives quite a bit of useful information. The videos were successfully encoded with the actual text size, i.e.:

[Screenshot: dess]

So then, does getMediaInformation() use command arguments that could alter the max text size? Is this an FFmpeg limitation? Is the wanted text size limit configurable?

@tanersener
Owner

tanersener commented Jan 30, 2020

> Hi @tanersener, I may not be able to upload the content due to copyright issues. However, what I do know is that all the truncated comment texts have the same size, e.g.:

Can you share FFProbe output for that file? I don't know how a comment is stored in an mp4 file. Seeing the FFprobe output may help me to understand how it is.

FFmpeg has some internal limits on logging, and Android also limits what can be printed to logcat. I need to test whether either of these two is causing the issue. There might be another cause as well.

I don't define any limits in MobileFFmpeg; it just replicates what FFmpeg provides.

@tanersener
Owner

Made some tests on desktop ffprobe and noticed that ffprobe is truncating metadata values. Can you test your file with desktop ffprobe?

@RowlandOti
Author

RowlandOti commented Jan 31, 2020

Hi @tanersener, using desktop ffprobe turned out to be useful. I ran ffprobe -show_format 7worlds.mp4 and saw that there was no cut-off.

[Screenshot: not-cut]

However, when I just use ffprobe -i 7worlds.mp4, the text is cut off.

[Screenshot: cut-off]

The command used in the app, ffprobe -v info -hide_banner -i 7worlds.mp4, has this problem as well: it cuts off the comment text.

What is used in the library is the source of this cut-off. It uses:

ffprobe -v error -show_format -show_streams 7worlds.mp4

Could FFprobe.getMediaInformation() and FFmpeg.getMediaInformation() be tweaked to take care of this? Or could another function be added that uses arguments which do not result in a cut-off? What do you think?

@RowlandOti
Author

RowlandOti commented Jan 31, 2020

It is worth noting that the FORMAT section (the TAG: lines) has the full text of the comment. However, the Input #0 section is still cut off in both commands. How can we get the former, since it has the full text as expected?
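
For anyone who wants to read the value from the FORMAT section rather than from the Input #0 log lines, here is a minimal Java sketch. It assumes the output of ffprobe -show_format with the default writer, where format tags appear as TAG:key=value lines between [FORMAT] and [/FORMAT], and it assumes each value fits on a single line. The names FormatTagReader and readFormatTags are made up for illustration and are not part of MobileFFmpeg.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: collect TAG:key=value pairs from ffprobe's default -show_format output. */
final class FormatTagReader {

    /** Returns the tags found between [FORMAT] and [/FORMAT], in order of appearance. */
    static Map<String, String> readFormatTags(String ffprobeOutput) {
        Map<String, String> tags = new LinkedHashMap<>();
        boolean inFormat = false;
        for (String line : ffprobeOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.equals("[FORMAT]")) {
                inFormat = true;
            } else if (trimmed.equals("[/FORMAT]")) {
                inFormat = false;
            } else if (inFormat && trimmed.startsWith("TAG:")) {
                // Split on the first '=' only, so values may contain '=' themselves.
                int eq = trimmed.indexOf('=');
                if (eq > 4) {
                    tags.put(trimmed.substring(4, eq), trimmed.substring(eq + 1));
                }
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not real output from the file in this issue.
        String sample = String.join("\n",
                "[FORMAT]",
                "filename=7worlds.mp4",
                "TAG:comment=first part | second part",
                "[/FORMAT]");
        System.out.println(readFormatTags(sample).get("comment"));
    }
}
```

This only helps if the string handed to it already contains the full value; it does not work around any limit applied before the output is captured.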

@tanersener
Owner

> Could FFprobe.getMediaInformation() and FFmpeg.getMediaInformation() be tweaked to take care of this? Or another function, with the arguments that do not result in a cut-off. What do you think?

Why don't you call FFprobe.execute() with the options/flags you need, take the command output, and use the MediaInformationParser.from() method to extract MediaInformation?

@RowlandOti
Author

RowlandOti commented Jan 31, 2020

Awesome! I resorted to using FFprobe.execute("-v quiet -print_format json -show_format -hide_banner -i 7worlds.mp4"), which gives me JSON I can parse to neatly get the data that I needed. This is more than I needed and works like a charm. Thank you for the insight.

@jaredbracken

The above does give more text, but it is still cut off after 1024 characters. Would be great to have a way to guarantee we get the full text. If you know of a flag to pass that will do that please let me know.

@RowlandOti
Author

RowlandOti commented May 2, 2020

If you do not get a proper solution, you will be forced to fetch the comments alone using a second execution, which effectively slows down your program.

val command = arrayOf(
    "-v", "error",
    "-hide_banner",
    "-pretty",
    "-show_error",
    "-show_entries", "format_tags=comment",
    "-of", "default=noprint_wrappers=1:nokey=1",
    "-i", realPath
)

@RowlandOti RowlandOti reopened this May 2, 2020
@jaredbracken

I was hopeful that the above would give the full comment, but it cuts it off at exactly 1000 characters, not quite as good as the 1024 I was getting. Is there anything else I could try?

@RowlandOti
Author

My best bet is that there is a limit on the data size that the metadata can carry, and this is in FFmpeg. Best place to get help on that would be to join the mailing list and ask the question there. I am sure you'll get a response in a day.

@jaredbracken

I found a ticket in ffmpeg which describes this issue, where the limit is defined, and how to increase it. There is also a zip file attached to the ticket with a sample audio file that demonstrates the issue (I didn't look at it myself). What do you think @tanersener? Would you be willing to increase the character limit per line for the output? It is quite annoying to have them often cut off.

It seems like it would be a small change: it looks like it would just mean increasing the allowed length when printing each line of output in /src/ffmpeg/libavformat/ffmetadec.c. I believe it is the tmp[1024] buffer in read_line_to_bprint_escaped (line 40) that is enforcing this limit. Could you bump that up in your releases to something less likely to truncate the tags output? Maybe 10x that amount? I think others would also benefit from this change.

https://trac.ffmpeg.org/ticket/4833

@tanersener
Owner

@jaredbracken I prefer to solve these kinds of issues without modifying the FFmpeg source code. Also, in this case, I'm not convinced that the solution that works for @RowlandOti does not work for you. It would be great if you could provide some logs about your case and the file that you use.

@jaredbracken

I am using his solution, which does work to increase the limit from 255 to 1024. That worked for @RowlandOti because, while his output was longer than 255 characters, it was shorter than 1024. Any tag longer than 1024 characters will be truncated. Would logs and a sample file still be helpful to you?
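
Until the limit is sorted out, one rough way to flag a value that was probably cut off is to check whether its length lands exactly on one of the cut-off points seen in this thread. The sketch below assumes those lengths (255, 1000 and 1024) are counted in Java chars, which may not match how the limit is actually applied for non-ASCII text. The class and method names are hypothetical, and a match is only a hint, not proof of truncation.

```java
/** Sketch: flag tag values whose length equals a cut-off point observed in this thread. */
final class TruncationHeuristic {

    // Lengths reported in this thread; they are observations, not documented limits.
    private static final int[] SUSPECT_LENGTHS = {255, 1000, 1024};

    /** Returns true when the value's length matches one of the observed cut-off points. */
    static boolean looksTruncated(String tagValue) {
        if (tagValue == null) {
            return false;
        }
        for (int limit : SUSPECT_LENGTHS) {
            if (tagValue.length() == limit) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could use looksTruncated(comment) to decide whether a second, comment-only FFprobe run is worth the extra time.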

@tanersener
Owner

Yes, I need to see it locally before making any changes.

@jaredbracken

Ok, I am attaching a small audio sample file where I created a lorem ipsum comment tag that is longer than 1k characters.

I am executing FFprobe with the following arguments.
String[] args = new String[]{"-v", "quiet", "-print_format", "json", "-show_chapters", "-show_format", file.getAbsolutePath()};

The comment tag output looks like this:
"comment": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. This sentence will be cut off at 1024 characters which is about here. Th, (<== this should continue with "This trailing text is cut off.")

This is also a problem that I have to fix manually before parsing the JSON, because the missing closing quote causes the parser to throw an exception.

SAMPLE_MP3_700KB.zip
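
As a stopgap, the captured output can be checked before it is handed to a JSON parser. The sketch below is not a JSON validator: it only tracks string state, escapes and nesting depth, which is enough to notice a missing closing quote like the one above. It does not check that each closing brace or bracket matches the type that was opened. JsonCompleteness and isStructurallyComplete are illustrative names, not library API.

```java
/** Sketch: cheap structural check for JSON text that may have been cut off. */
final class JsonCompleteness {

    /** Returns true when all strings are closed and braces/brackets balance out. */
    static boolean isStructurallyComplete(String json) {
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        boolean sawContainer = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;          // character after a backslash is taken literally
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    break;
                case '{':
                case '[':
                    depth++;
                    sawContainer = true;
                    break;
                case '}':
                case ']':
                    depth--;
                    if (depth < 0) {
                        return false;         // more closers than openers
                    }
                    break;
                default:
                    break;
            }
        }
        return sawContainer && depth == 0 && !inString;
    }
}
```

When one closing quote is missing, the quote count becomes odd, so the scan ends inside a string and the method returns false. The caller can then skip parsing or fall back to another command instead of catching the parser exception.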

@tanersener
Owner

Thanks. I ran some tests with your file. I think there is a problem with parsing, and I'm still trying to analyse it. But using the following code, I can see that the full comment is there.

NSString* ffprobeCommand = [[NSString alloc] initWithFormat:@"-print_format json -show_format -hide_banner %@", audioFile];
[MobileFFprobe execute: ffprobeCommand];
NSString *output = [MobileFFmpegConfig getLastCommandOutput];
MediaInformation* information = [MediaInformationParser from: output];
NSLog(@"Raw information: %@\n", [information getRawInformation]);
Raw information: {
Input #0, mp3, from '/Users/taner/Library/Developer/CoreSimulator/Devices/.../MobileFFmpegTest.app/SAMPLE_MP3_700KB.mp3':
  Metadata:
    album           : YouTube Audio Library
    artist          : Kevin MacLeod
    comment         : Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor i
    genre           : Cinematic
    title           : Impact Moderato
  Duration: 00:00:27.25, start: 0.034531, bitrate: 225 kb/s
    Stream #0:0: Audio: mp3, 32000 Hz, stereo, fltp, 224 kb/s
    Metadata:
      encoder         : LAME3.99r
    "format": {
        "filename": "/Users/taner/Library/Developer/CoreSimulator/Devices/.../MobileFFmpegTest.app/SAMPLE_MP3_700KB.mp3",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "mp3",
        "start_time": "0.034531",
        "duration": "27.252000",
        "size": "768754",
        "bit_rate": "225672",
        "probe_score": 51,
        "tags": {
            "album": "YouTube Audio Library",
            "artist": "Kevin MacLeod",
            "comment": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. This sentence will be cut off at 1024 characters which is about here. This trailing text is cut off.",
            "genre": "Cinematic",
            "title": "Impact Moderato"
        }
    }
}

@jaredbracken

Is that code for iOS? You are saying it works on iOS, but not Android? I tried with the same parameters you listed, on Android, and am still getting it cut off. I hope it's something that can be fixed without much difficulty. Thank you very much for taking time to look into this.

@tanersener tanersener added bug Something isn't working and removed needs-analysis labels May 9, 2020
@tanersener
Owner

tanersener commented May 9, 2020

@jaredbracken I ran some tests on Android and saw that my solution does not work, as you said. Further analysis showed that the actual problem is related to an internal limit defined in MobileFFmpeg. I created an issue about it, #418. The current issue will be resolved after that one is fixed. I'll let you know about the progress.

@tanersener
Owner

tanersener commented May 10, 2020

Fixed both #417 and #418 on development. These two issues resolve this one. Tested and verified it.

@tanersener tanersener removed the question Further information is requested label May 10, 2020
@RowlandOti
Author

Closing issue, waiting on a release.
