Fails to build distribution data on bytecode-only architectures #22

Closed
glondu opened this issue Feb 6, 2023 · 11 comments

@glondu

glondu commented Feb 6, 2023

uunf fails to build when ocamlopt is missing. Replacing .native by .byte in pkg/build_support.ml seems to fix the issue.
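
As an illustration of the workaround above, here is a minimal sketch of choosing the executable extension depending on whether `ocamlopt` is available. It is written in Python for brevity and is not the contents of pkg/build_support.ml; the function name `generator_target` and the base name `gen` are hypothetical.

```python
import shutil

def generator_target(base, which=shutil.which):
    """Pick a native target when ocamlopt is on PATH, a bytecode one otherwise.

    `base` is a hypothetical executable base name; `which` is injectable
    so the choice can be exercised without an OCaml install.
    """
    ext = ".native" if which("ocamlopt") else ".byte"
    return base + ext

if __name__ == "__main__":
    # The result depends on the local toolchain, so no fixed output is claimed.
    print(generator_target("gen"))
```

The point is only that the extension is decided from the toolchain actually present rather than hardcoded to `.native`.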

@dbuenzli
Owner

dbuenzli commented Feb 6, 2023

The distribution tarballs contain the generated data. The failing step only builds that generated data, so why don't you build from the tarballs? See #21

(I have no plans to fix this in the short term).

@dbuenzli dbuenzli changed the title Fails to build on bytecode-only architectures Fails to build distribution data on bytecode-only architectures Feb 6, 2023
@dbuenzli
Owner

dbuenzli commented Aug 20, 2023

@glondu, given your request and #21 I am a bit confused about what it means for Debian to build from the sources. I always understood that as building from the distribution tarballs, not from the software development repository.

Did your policies change? Or do you have additional targets to build from the source repo?

I'm happy to make things easier for you but I'd first like to understand what you are trying to do.

Even though I always disliked it, I'm starting to consider committing generated files to the repo, notably because opam sandboxing made it harder to pin the package to the repo, since the build tries to download the Unicode Character Database (ocaml/opam#3771).

@glondu
Author

glondu commented Aug 20, 2023

Did your policies change?

Not recently. The Debian policy is to:

  • build from tarballs if upstream provides them (so that upstream signatures, if any, are preserved)
  • ignore generated stuff and regenerate them
  • not use network during build

Some people even "repack" upstream tarballs (i.e. create a new tarball) without generated stuff, but I don't think this is required. However, repacking might be required, e.g. if it is not possible to regenerate generated stuff from what is present in the tarball (in practice, people directly use the development repository then).

The philosophy behind this policy is that one must be able to patch anything in the package (using the preferred form for modification) and regenerate everything that needs regenerating without access to network.

To better understand things, you might also want to know that there is a mechanism to detect new upstream versions and automate the update of the Debian package. When the upstream is on GitHub, it is simply easier to use the tarballs that GitHub automatically generates for each tag, which are based on the git repository, or even directly follow the git repository itself.

I do take care to use (what I believe are) "official" upstream tarballs, if any, for Debian packages I create or have actively reviewed (e.g. cmdliner). However, I just checked uunf and the person who did the packaging (@SnarkBoojum) chose to use GitHub instead. This can of course be fixed case by case, but it is difficult to automate this kind of change since there is no standard way to know which tarballs are the official upstream ones.

@glondu
Author

glondu commented Aug 20, 2023

FYI, I've identified 4 Debian packages where you are upstream and which were watching GitHub instead of erratique.ch: fpath, uucd, uucp and uunf. I've fixed the watch files to point to erratique.ch.

@dbuenzli
Owner

Thanks for your explanations.

The philosophy behind this policy is that one must be able to patch anything in the package (using the preferred form for modification) and regenerate everything that needs regenerating without access to network.

I see, one question.

If I start committing the generated files to the repo (which I might start doing because I also think it provides better traceability for, say, bug fix releases that are supposed to use the same ucd data), will you still bother regenerating these files yourself?

(I'm afraid I won't commit the 41 MB ucd.xml file that you'd need to do so without pain; that would be too much bloat in my eyes.)

When the upstream is on GitHub, it is simply easier to use the tarballs that GitHub automatically generates for each tag, which are based on the git repository, or even directly follow the git repository itself.

These tarballs generated by GitHub have never been stable checksum-wise; I wouldn't rely on them. For example, I think they change when you move the repo to a new user or organisation.

In general I always apply a function on my sources before packing a release tarball, for example to substitute, among other things, the release version number in the sources. So one should never rely on VCS generated tarballs for my projects.

It's an old practice that I'm gradually trying to rely less on, but I'm not sure I will ever totally get rid of it. Notably because I dislike committing version numbers to the repo: it's the wrong workflow and leads to perpetual release mishaps (forgot to bump the version number before making the release). Version numbers should be defined by the VCS, not in the VCS managed content. And having dumb and flawless release procedures matters a lot when you manage lots of packages.

This can of course be fixed case by case, but it is difficult to automate this kind of change since there is no standard way to know which tarballs are the official upstream ones.

Well in the OCaml world you can reasonably rely on the opam metadata of the official OCaml opam repository. Ideally this should work, but for some reason it doesn't:

> opam info uunf.15.0.0 -f url.src,url.checksum
[ERROR] No printer for "url.src"

but basically opam info uunf.15.0.0 --raw has all the info you need for a given release:

…
url {
  src: "https://erratique.ch/software/uunf/releases/uunf-15.0.0.tbz"
  checksum:
    "sha512=204d923d4e8d910318180c15087fe53d98d8ec0a8d3c3f6c54219e5e09ee5c5bdf57585e5570d895f8d90647c4eeaa45d9e6e75d58edeb9febee053e0dd47fbc"
}

FYI, I've identified 4 Debian packages where you are upstream and which were watching GitHub instead of erratique.ch: fpath, uucd, uucp and uunf. I've fixed the watch files to point to erratique.ch.

Thank you! That's the way it should be: GitHub is just a bug tracker for me; for sources it's a mirror.

@glondu
Author

glondu commented Aug 21, 2023

If I start committing the generated files to the repo (which I might start doing because I also think it provides better traceability for, say, bug fix releases that are supposed to use the same ucd data), will you still bother regenerating these files yourself?

Yes.

(I'm afraid I won't commit the 41 MB ucd.xml file that you'd need to do so without pain; that would be too much bloat in my eyes.)

Note that this file has been committed to the Debian source packages of uucp and uunf, to make all this policy-compliant (albeit suboptimal in terms of disk space).

Maybe this could be the job of a dedicated data package, especially if you use it in several packages... Like unicode-data in Debian. Unfortunately, this one doesn't seem to provide the file you need (nor does any other Debian package... actually, in all Debian, uucp and uunf are the only ones referring to ucd.xml). @SnarkBoojum did ask for it but the bug report has been unanswered for 10 months. If you decide to go this way, I guess you would make your own package anyway, to put it in opam.

These tarballs generated by GitHub have never been stable checksum-wise; I wouldn't rely on them.

I wouldn't either. But many upstreams do not provide anything else...

Well in the OCaml world you can reasonably rely on the opam metadata of the official OCaml opam repository.

Actually, I do use opam to investigate updates and keep the Debian world synchronized. It never occurred to me that I could use it to check that upstream tarballs are also the same.

@dbuenzli
Owner

(nor does any other Debian package... actually, in all Debian, uucp and uunf are the only ones referring to ucd.xml).

The files are called differently at the source because they distribute them with different data layouts (I'm using ucd.xml because the generation works with both flat or grouped data files) and some versions only have partial data.

If you look for this prefix or that prefix there are more references (though I'm not sure if it's in their install procedures).

@SnarkBoojum did ask for it but the bug report has been unanswered for 10 months. If you decide to go this way, I guess you would make your own package anyway, to put it in opam.

It looks to me like this is the best course of action on your side. It would be nice to have either ucd.all.flat.zip or ucd.all.grouped.zip in the unicode-data package.

@dbuenzli
Owner

(though I'm not sure if it's in their install procedures).

Well, at least here https://sources.debian.org/src/angular.js/1.8.3-1/i18n/ucd/src/ you have another copy of the database, if that helps convince the unicode-data maintainers to include the file.

@dbuenzli
Owner

Well in the OCaml world you can reasonably rely on the opam metadata of the official OCaml opam repository. Ideally this should work, but for some reason it doesn't:

> opam info uunf.15.0.0 -f url.src,url.checksum
[ERROR] No printer for "url.src"

Just in case, the correct invocation is (the : field separator needs to be there):

> opam info uunf.15.0.0 -f url.src:,url.checksum: 
url.src:      "https://erratique.ch/software/uunf/releases/uunf-15.0.0.tbz"
url.checksum: "sha512=204d923d4e8d910318180c15087fe53d98d8ec0a8d3c3f6c54219e5e09ee5c5bdf57585e5570d895f8d90647c4eeaa45d9e6e75d58edeb9febee053e0dd47fbc"
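
A minimal Python sketch of how the two fields printed above could be used to check a downloaded tarball. It assumes each field is printed on one line as `name: "value"` and that the checksum is a single `algo=hexdigest` string, as in the output shown. The helper names `parse_opam_fields` and `checksum_matches` are hypothetical and not part of opam.

```python
import hashlib
import re

def parse_opam_fields(output: str) -> dict:
    """Parse lines of the form `name: "value"` into a dict (assumed layout)."""
    fields = {}
    for line in output.splitlines():
        m = re.match(r'^\s*([\w.-]+):\s*"(.*)"\s*$', line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

def checksum_matches(data: bytes, checksum: str) -> bool:
    """Check `data` against an `algo=hexdigest` string such as `sha512=...`."""
    algo, _, expected = checksum.partition("=")
    return hashlib.new(algo, data).hexdigest() == expected.lower()
```

In use, one would read the tarball bytes from disk and pass them together with the parsed `url.checksum` value to `checksum_matches`.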

@glondu
Author

glondu commented Aug 22, 2023

Using opam info turned out to be unreliable, so I've written a script that reads the opam file directly. It is used in another script that checks that upstreams are the same in Debian and in opam.
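
The script itself is not shown in this thread. As a rough illustration of the idea only, here is a Python sketch that pulls `src` out of the `url { ... }` section of an opam file and compares its host with a Debian-side URL. The function names are hypothetical, the regex is a simplification rather than a real opam parser, and comparing hostnames is just one possible notion of "same upstream".

```python
import re
from urllib.parse import urlparse

def opam_url_src(opam_text: str):
    """Return the `src` string of the first `url { ... }` section, or None."""
    section = re.search(r'\burl\s*\{(.*?)\}', opam_text, re.DOTALL)
    if section is None:
        return None
    src = re.search(r'\bsrc:\s*"([^"]*)"', section.group(1))
    return src.group(1) if src else None

def same_upstream_host(opam_src: str, debian_url: str) -> bool:
    """True when both URLs point at the same host (a deliberately coarse check)."""
    return urlparse(opam_src).hostname == urlparse(debian_url).hostname
```

With the uunf 15.0.0 opam data quoted earlier, a Debian watch URL on erratique.ch would match, while one on github.com would not.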

dbuenzli added a commit that referenced this issue Aug 28, 2023
Generation should now work in bytecode only installs (#22).
@dbuenzli
Owner

dbuenzli commented Sep 5, 2024

Since v15.1.0, generated files are committed to the repo (4205ba3 is the commit for the upcoming v16.0.0). I believe this trivially solves the issue, so I'm closing.

@dbuenzli dbuenzli closed this as completed Sep 5, 2024