Add pax support #1

Closed
ygale opened this issue Sep 24, 2015 · 17 comments

@ygale

ygale commented Sep 24, 2015

Continuing here the tar-specific part of the discussion in snoyberg/keter#124.

@ygale
Author

ygale commented Sep 24, 2015

@dcoutts Thanks! In our case, we need read support in keter and write support in yesod-bin. In both cases, it doesn't make sense to hack in support for pax if the tar package will soon be growing support for pax. So it depends on how soon we can expect pax support to be added to tar. Are you thinking of doing it, and if so, do you have any estimate of how soon?

@ygale
Author

ygale commented Sep 24, 2015

The tar library currently supports a format which it calls "Gnu". Either that is actually the format which nowadays is known as "oldgnu", or it is only partial support for "gnu" without the support for unlimited path length.

If we are implementing unlimited path length for pax, it would be a good idea to support that for the gnu format as well, since there are still many archives floating around in that format. It was the default for GNU tar from 1997 (version 1.12) until recently.

Here is a quote from the documentation for GNU tar 1.28, the current stable release version:

"Usually, GNU tar is configured to create archives in 'gnu' format [by default], however, future version will switch to 'posix'." (where "posix" means pax)

@ygale
Author

ygale commented Sep 24, 2015

@dcoutts regarding your comment:

"...so long as it was done in the same way we handle the other standards we already support, i.e. in such a way that you can choose"

I assume you mean just the ability to choose, and not backwards-compatibility with the current API.

A natural way to do that would be to add a kind-promoted Format as a type parameter to the Entry type in place of the entryFormat field, and define a closed type family that maps Format to the corresponding path type. And similarly for extended file attributes. That would allow type-safety for format-specific features while still allowing generic code. But I'm not sure we want to use GHC features quite that recent for this library.
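A minimal sketch of what such a design could look like, assuming GHC 8.0 or later. All names here (`PathOf`, `ShortPath`, `SplitPath`, `LongPath`, the fields of `Entry`) are hypothetical stand-ins for illustration, not the tar package's actual API:

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies   #-}

import Data.Kind (Type)

-- | Archive formats. DataKinds promotes the constructors to the type level.
data Format = V7 | Ustar | Gnu | Pax

-- | Hypothetical path representations, one per class of format.
newtype ShortPath = ShortPath String        -- name field only (100 bytes)
data    SplitPath = SplitPath String String -- ustar prefix and name
newtype LongPath  = LongPath String         -- no length limit

-- | Closed type family mapping each format to its path type.
type family PathOf (f :: Format) :: Type where
  PathOf 'V7    = ShortPath
  PathOf 'Ustar = SplitPath
  PathOf 'Gnu   = LongPath
  PathOf 'Pax   = LongPath

-- | The format is a type parameter instead of an entryFormat field.
data Entry (f :: Format) = Entry
  { entryPath :: PathOf f
  , entrySize :: Integer
  }

-- | Generic code works for every format.
entrySizeOf :: Entry f -> Integer
entrySizeOf = entrySize

-- | Format-specific code can rely on the concrete path type.
paxPath :: Entry 'Pax -> String
paxPath e = case entryPath e of
  LongPath p -> p
```

With this shape, passing an `Entry 'V7` to `paxPath` is rejected at compile time, while `entrySizeOf` accepts entries of any format.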

@dcoutts
Contributor

dcoutts commented Sep 24, 2015

Oh I didn't mean in any fancy type way, just as data as the existing entryFormat stuff.

Yes, there is support for the commonly used subset of the gnu format, but indeed not including its long path name extensions (nor multi-volume, nor sparse files).

As for when, I'm not sure I can commit the time myself any time soon. There are bigger fish to fry with cabal & hackage. However I'm very happy to give advice and do code review if you or anyone else wants to try it. The pax format is actually pretty simple. It's just a special entry type containing utf8 key value pairs. The only slightly complicated thing is that, iirc, the pax metadata is carried in a separate entry alongside each normal entry. This has the consequence that there's no longer a 1:1 relationship between actual entries in the tar sequence and logical files. Fortunately the new pax entry containing metadata appears before the entry to which it applies.
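For illustration, here is a rough sketch of decoding those key/value pairs. In the POSIX pax specification each record has the form `<length> <key>=<value>\n`, where `<length>` is the decimal byte count of the whole record, including the length digits, the space and the trailing newline. The function name `parsePaxRecords` is made up for this sketch, and keys and values are left as raw UTF-8 bytes rather than decoded:

```haskell
import           Control.Monad         (when)
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BC

-- | Parse the body of a pax extended header into (key, value) pairs.
-- Records are length-prefixed, so values may contain '=' or newlines.
parsePaxRecords :: ByteString -> Either String [(ByteString, ByteString)]
parsePaxRecords bs
  | BC.null bs = Right []
  | otherwise  = do
      (len, afterLen) <- maybe (Left "missing record length") Right (BC.readInt bs)
      let digits = BC.length bs - BC.length afterLen
          record = BC.take len bs
      -- Smallest possible record: digits, space, '=', newline.
      when (len < digits + 3 || len > BC.length bs) $
        Left "record length out of range"
      when (BC.take 1 afterLen /= BC.pack " " || BC.last record /= '\n') $
        Left "malformed record framing"
      -- Drop "<len> " from the front and the newline from the end.
      let body         = BC.init (BC.drop (digits + 1) record)
          (key, eqVal) = BC.break (== '=') body
      when (BC.null eqVal) $ Left "record has no '='"
      rest <- parsePaxRecords (BC.drop len bs)
      Right ((key, BC.drop 1 eqVal) : rest)
```

For example, the 12-byte record `12 path=a/b\n` yields the pair `("path", "a/b")`. A real implementation would also want to map well-known keys such as `path` and `linkpath` into a proper record.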

So my suggestion about how to proceed is that pax support can be implemented as an [Entry] -> [Entry] transformation. The Entry type itself would be extended to include the optional extended metadata, and the new format name. Then for reading, our function is of type Entries -> Entries: when we encounter an 'x' type entry, we take the body, decode it and apply the metadata to the following entry. For writing ([Entry] -> [Entry]), we expand pax format entries into two ustar entries, the metadata followed by the normal file.
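A sketch of that pair of transformations on a deliberately simplified model. The `Entry` record below is a hypothetical stand-in, not the tar package's `Entry` type; the decoder and encoder for the metadata body are passed in as parameters, and the `"PaxHeader"` entry name is just a placeholder:

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Simplified, hypothetical entry model for this sketch.
data Entry = Entry
  { entryTypeCode :: Char               -- 'x' marks a pax metadata entry
  , entryPath     :: FilePath
  , entryContent  :: String
  , entryExtended :: Map String String  -- optional extended metadata
  } deriving (Eq, Show)

-- | Reading: fold each 'x' entry into the entry that follows it.
readPax :: (String -> Map String String) -> [Entry] -> [Entry]
readPax decode = go Map.empty
  where
    go _ [] = []  -- metadata with no following entry is dropped here
    go pending (e : es)
      | entryTypeCode e == 'x' =
          go (Map.union (decode (entryContent e)) pending) es
      | otherwise = apply pending e : go Map.empty es

    -- A "path" key, if present, overrides the header's path.
    apply meta e = e
      { entryPath     = Map.findWithDefault (entryPath e) "path" meta
      , entryExtended = meta
      }

-- | Writing: expand an entry with metadata into two entries,
-- the metadata entry followed by the normal one.
writePax :: (Map String String -> String) -> [Entry] -> [Entry]
writePax encode = concatMap expand
  where
    expand e
      | Map.null (entryExtended e) = [e]
      | otherwise =
          [ Entry 'x' "PaxHeader" (encode (entryExtended e)) Map.empty
          , e { entryExtended = Map.empty }
          ]
```

Keeping both directions as plain list transformations means the existing header reader and writer would not need to know about pax at all. Note that in this sketch the written normal entry still carries the long path, which a real writer would have to truncate to fit the ustar header.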

@gbaz

gbaz commented Aug 20, 2017

further documentation on pax is given here: https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5 and the specification, as best as i can tell, is here: http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html

@alexbiehl
Member

alexbiehl commented Aug 20, 2017

I have implemented parsing PAX extended headers here: alexbiehl@51a6a3b. This is still quite hacky but may be used to iterate on.

  • writing ExtendedHeader uses the wrong format
  • it reads key-value pairs, we want a proper record here

@newhoggy

I'm hitting the "filename is too long" issue. What's the status of this feature?

@hvr
Member

hvr commented Apr 14, 2019

@newhoggy the status is basically that somebody needs to submit a PR with a proposed implementation of the feature before there's going to be any progress on this...

@alexbiehl
Member

alexbiehl commented Apr 14, 2019 via email

@newhoggy

newhoggy commented Apr 15, 2019

@alexbiehl Can you rebase on master and push a PR?

Also I got this error whilst compiling:

Codec/Archive/Tar/Types.hs:549:10: error:
    • No instance for (Semigroup (Entries e))
        arising from the superclasses of an instance declaration
    • In the instance declaration for ‘Monoid (Entries e)’
    |
549 | instance Monoid (Entries e) where
    |          ^^^^^^^^^^^^^^^^^^
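
This error is what GHC 8.4 and later (base 4.11) report for any `Monoid` instance without a matching `Semigroup` instance, since `Semigroup` became a superclass of `Monoid`. A minimal sketch of the usual fix, using a stand-in `Entry` type and assuming `Entries` has the `Next`/`Done`/`Fail` shape used in the tar package; the branch's actual definitions may differ:

```haskell
-- | Stand-in for the real entry type, only for this sketch.
newtype Entry = Entry FilePath
  deriving (Eq, Show)

-- | A sequence of entries that ends normally or with an error.
data Entries e
  = Next Entry (Entries e)
  | Done
  | Fail e
  deriving (Eq, Show)

-- | Appending moves the old mappend logic into (<>).
-- A failure in the first sequence ends the combined sequence.
instance Semigroup (Entries e) where
  Next x xs <> ys = Next x (xs <> ys)
  Done      <> ys = ys
  Fail err  <> _  = Fail err

-- | With base >= 4.11, mappend defaults to (<>).
instance Monoid (Entries e) where
  mempty = Done
```

Code that must also build with older GHC versions typically keeps an explicit `mappend = (<>)` and guards the `Semigroup` instance with CPP.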

Thanks!

@alexbiehl
Member

alexbiehl commented Apr 15, 2019 via email

@newhoggy

newhoggy commented May 8, 2023

What's the status of this ticket?

@Bodigrim
Contributor

Bodigrim commented May 8, 2023

@newhoggy I imagine nothing has changed since 2019. PRs are welcome.

@Bodigrim
Contributor

Support for long file names has landed in #77.

@hasufell
Member

"Fortunately the new pax entry containing metadata appears before the entry to which it applies."

Unfortunately not entirely true. You can have a 'K' typecode followed by an 'L' typecode.

Semantically this means you have a symlink where both the filename and the link target are longer than 100 characters.
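
A sketch of how a reader could cope with that, by accumulating overrides across any run of special entries instead of assuming exactly one. The `Entry` record and the function name `resolveGnuLongNames` are hypothetical and heavily simplified; the sketch assumes the long name or target is the body of the 'L' or 'K' entry, NUL-terminated:

```haskell
import Data.Maybe (fromMaybe)

-- | Simplified, hypothetical entry model for this sketch.
data Entry = Entry
  { entryTypeCode   :: Char      -- 'L' long name, 'K' long link target
  , entryPath       :: FilePath
  , entryLinkTarget :: FilePath
  , entryContent    :: String
  } deriving (Eq, Show)

-- | Overrides collected from the special entries seen so far.
data Pending = Pending
  { pendingPath :: Maybe FilePath
  , pendingLink :: Maybe FilePath
  }

-- | Apply every preceding 'L' and 'K' entry to the next normal entry.
resolveGnuLongNames :: [Entry] -> [Entry]
resolveGnuLongNames = go (Pending Nothing Nothing)
  where
    go _ [] = []
    go p (e : es) = case entryTypeCode e of
      'L' -> go (p { pendingPath = Just (body e) }) es
      'K' -> go (p { pendingLink = Just (body e) }) es
      _   -> e { entryPath       = fromMaybe (entryPath e) (pendingPath p)
               , entryLinkTarget = fromMaybe (entryLinkTarget e) (pendingLink p)
               } : go (Pending Nothing Nothing) es

    -- Strip the trailing NUL terminator, if any.
    body = takeWhile (/= '\0') . entryContent
```

The same accumulate-then-apply shape would work for pax 'x' entries, so the two mechanisms could share one pass.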

@hasufell
Member

Is there anything else that needs attention here? I'm not sure whether we have proper pax support now, but it seems most of the individual issues have been fixed.

@Bodigrim
Contributor

Ideally one should implement full support for PAX header blocks. Parsing them as in alexbiehl@51a6a3b is a necessary stepping stone, but the crux is to put this information to good use, not just store it. PRs are welcome, but I do not perceive a burning need, thus closing. Feel free to raise a new issue, focused on PAX header blocks in general instead of long file names.

8 participants