[RFC] keep track of minver and maxver per element #207

robUx4 · 2024-01-06T14:04:28Z

They correspond to the attributes in the EBML specs.

For they are just informational. We may use this information when reading data to find which element are allowed given the read/version in the EBML header, that can help with error recovery, skipping elements that are not legit given the context.

mbunkus

Like I wrote elsewhere, I'm not a big fan of the libraries constraining the elements that can be read or written depending on the version. The application should deal with it. Or there should be a switch to turn such a feature off & let the application deal with it. My legitimate use case is writing the EBML header with rather low version numbers & updating it later on when elements requiring newer versions to be written.

I would definitely profit from having that information stored & accessible in libEBML/libMatroska, though. At the moment I store the same information directly in MKVToolNix. It makes total sense to let the spectools embed the information in the libraries as it is already available in the specs.

mbunkus · 2024-01-06T15:14:18Z

ebml/EbmlElement.h

+ const num_version minver;
+ // the maximum DocType version this element is allowed in
+ // ANY_VERSION if the element is supported in all (known) version
+ const num_version maxver;


I think this would be a great place to use std::optional instead of magic values. Just make minver & maxver an std::optional<unsigned int>. Special values are so C++99 😁

Unfortunately std::optional is a feature of C++17, and again, so far we only require C++14. Repeating myself here, but I'd be more than fine with bumping the requirements up to C++17, but if I remember correctly this would have the probably undesired consequence that VLC would now also require C++17 instead of C++14. Or do I misremember?

Correct, for now VLC only requires a C++14 compiler to build. That means people using just that will not be able to compile libmatroska within VLC, and have Matroska support disabled.

The change you propose is probably better. But until we officially make the switch to C++17 we can't use that. So this should not be blocking.

I agree. We can always refactor to use std::optional later if we decided to require C++17.

robUx4 · 2024-01-06T15:35:39Z

Like I wrote elsewhere, I'm not a big fan of the libraries constraining the elements that can be read or written depending on the version. The application should deal with it. Or there should be a switch to turn such a feature off & let the application deal with it. My legitimate use case is writing the EBML header with rather low version numbers & updating it later on when elements requiring newer versions to be written.

Good because that's also my opinion. In VLC we will read invalid elements no matter what.

The API allow to check which elements are valid and maybe skip them if they are invalid. It will probably something like ShouldWrite but for... reading. The host app will be in total control of what to read (mkvalidator could report when a illegal element is read, for example) but the default will be the same as now.

mbunkus · 2024-01-07T12:15:24Z

As I'm confused with the logic in the other PRs wrt. deprecated elements, I thought it would be good to have a function bool IsDeprecated() const and/or bool IsDeprecatedInVersion(num_version version) const directly in EbmlDocVersion. In that function you could add a comment describing how deprecation is expressed with the existing values. Other places would not need to know about the internals & can just call the function.

Additionally functions such as const IsValidInVersion(num_version version) const, num_version GetMinVersion() const, num_version GetMaxVersion() would be nice to have.

The goal is to hide all the internals of EbmlDocVersion from users of the class. Leave how values are stored/used up to the class, don't bother others with it.

And a last nitpick: I'm not the biggest fan of using unsigned char here. Member variables will be aligned to sensible boundaries (32bit?) for faster access on modern CPUs anyway. Just make that unsigned int.

Maybe also rename num_version to version_type to make it optically much clearer that this is a type. Or even value_type; would also match how things are named in the standard library (e.g. std::vector<T>::value_type).

robUx4 · 2024-01-07T12:51:55Z

Are you sure the member variables are usually aligned ? When I compile with AppleClang 15 this test passes in debug and release build:

  static_assert(sizeof(EbmlDocVersion) == 2, "nope");

I think since there's no bigger type, it keeps the smallest alignment which is 1 octet here.

It's all about the balance between the size in memory and the speed of the code. For 200 elements, that would add 1200 bytes in memory with code a little bit faster. Also libmatroska will subclass this class with 2 boolean which are also 1 octet each. That means the size would go to 32 octets instead of 4 octets if we force int in EbmlDocVersion, which means an extra 5600 octets used in memory.

IMO unless the code is much faster it's not worth the extra memory.

robUx4 · 2024-01-07T12:54:33Z

One reason to use a bigger storage size would be to avoid the 256 doctype version limit. It's clearly not on the horizon for Matroska. But maybe other EBML formats might need it.

mbunkus · 2024-01-07T13:18:02Z

Are you sure the member variables are usually aligned ?

Absolutely. That's why you need things such as __attribute__((__packed__)) or the MSVC equivalent of #pragma pack(push, 1) if you want a struct whose memory layout exactly represents something, e.g. structures in a file format.

It's quite possible that sizeof(EbmlDocVersion) is 2, but when you use an EbmlDocVersion inside another class then the following member variables in that embedding class will be aligned.

Here's an example that shows both effects:

[0 mosu@sweet-chili ~/test] cat cpp1.cpp
#include <cstdint>
#include <iostream>

struct s1 {
  uint32_t a;
  unsigned char b;
  uint32_t c;
};

struct __attribute__((__packed__)) s2 {
  uint32_t a;
  unsigned char b;
  uint32_t c;
};

struct embedding {
  s1 a;
  s2 b;
  s1 c;
};

int
main() {
  std::cout << "s1: " << sizeof(s1) << " s2: " << sizeof(s2) << " embedding: " << sizeof(embedding) << "\n";
  return 0;
}
[0 mosu@sweet-chili ~/test] ./cpp1
s1: 12 s2: 9 embedding: 36

One interesting fact about why these values are aligned is that a lot of CPUs cannot even read 32bit values from non-aligned addresses (they might end the process with a SIGBUS), or only with huge performance penalties.

It's a thing, this alignment.

robUx4 · 2024-01-13T13:21:23Z

I'm familiar with the alignment, but that's not exactly what is happening here. As said above the current size of EbmlDocVersion is 2. I did the following class:

class VersionExt : public EbmlDocVersion
{
    public:
        unsigned int val;
};

If the padding of the parent class was overriden, we would have a sizeof of 12. But the sizeof is 8. So the original class does keep its size of 2, the 4 of the unsigned int are added, and there's a padding of 2 to round to a power of 2.

Also this gives a sizeof 12:

class VersionExt : public EbmlDocVersion
{
    public:
        unsigned int val;
        unsigned char a;
};

When the same fields in a different order gives 8:

class VersionExt : public EbmlDocVersion
{
    public:
        unsigned char a;
        unsigned int val;
};

As we're changing the ABI that's something we should keep in mind as we may reorder elements to save on memory at no cost.

I'll switch to unsigned int, not for memory alignment/performance but simply because it allows more values.

This corresponds to the element attributes of the same name in the EBML spec.

It's possible to provide a sub-class with more features, like WebM/DivX support in matroska.

We could have a version with/without but that's way too much. Maybe some macros can be turned into templates and then we can have default values.

mbunkus · 2024-01-14T09:50:39Z

LOL, Github thought I had merged this one when I pushed the libEBML branch… anyway, both are merged & pushed.

robUx4 added api-break breaks the API (e.g. programs using it will have to adjust their source code) abi-break breaks the ABI (e.g. programs linked against the library have to be recompiled) labels Jan 6, 2024

robUx4 mentioned this pull request Jan 6, 2024

[RFC] keep the min/max/webm/divx information for each element Matroska-Org/libmatroska#148

Closed

mbunkus reviewed Jan 6, 2024

View reviewed changes

robUx4 force-pushed the min_max branch from 4d4ef95 to 0e946c9 Compare January 7, 2024 13:08

robUx4 force-pushed the min_max branch from 0e946c9 to 5a996c3 Compare January 13, 2024 13:22

robUx4 requested a review from mbunkus January 13, 2024 16:15

robUx4 added 4 commits January 14, 2024 08:30

keep the minver/maxver values for each element class

fbaa146

This corresponds to the element attributes of the same name in the EBML spec.

use a class for the minver/maxver

ca6403b

It's possible to provide a sub-class with more features, like WebM/DivX support in matroska.

force setting the minver/maxver value when using the macros

cbec74c

We could have a version with/without but that's way too much. Maybe some macros can be turned into templates and then we can have default values.

Add helpers to EbmlDocVersion and make the storage private

73f260a

robUx4 force-pushed the min_max branch from 5a996c3 to 73f260a Compare January 14, 2024 07:31

mbunkus merged commit ab2e85e into Matroska-Org:master Jan 14, 2024
15 checks passed

robUx4 deleted the min_max branch January 14, 2024 09:54

robUx4 mentioned this pull request Feb 26, 2024

Add profile availability in the semantic #117

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] keep track of minver and maxver per element #207

[RFC] keep track of minver and maxver per element #207

robUx4 commented Jan 6, 2024

mbunkus left a comment

mbunkus Jan 6, 2024

robUx4 Jan 6, 2024

robUx4 Jan 7, 2024

mbunkus Jan 7, 2024

robUx4 commented Jan 6, 2024

mbunkus commented Jan 7, 2024

robUx4 commented Jan 7, 2024

robUx4 commented Jan 7, 2024

mbunkus commented Jan 7, 2024 •

edited

Loading

robUx4 commented Jan 13, 2024

mbunkus commented Jan 14, 2024

[RFC] keep track of minver and maxver per element #207

[RFC] keep track of minver and maxver per element #207

Conversation

robUx4 commented Jan 6, 2024

mbunkus left a comment

Choose a reason for hiding this comment

mbunkus Jan 6, 2024

Choose a reason for hiding this comment

robUx4 Jan 6, 2024

Choose a reason for hiding this comment

robUx4 Jan 7, 2024

Choose a reason for hiding this comment

mbunkus Jan 7, 2024

Choose a reason for hiding this comment

robUx4 commented Jan 6, 2024

mbunkus commented Jan 7, 2024

robUx4 commented Jan 7, 2024

robUx4 commented Jan 7, 2024

mbunkus commented Jan 7, 2024 • edited Loading

robUx4 commented Jan 13, 2024

mbunkus commented Jan 14, 2024

mbunkus commented Jan 7, 2024 •

edited

Loading