Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide builtins for downloading GitHub release tarballs #138

Closed
dhellmann opened this issue Jul 5, 2024 · 8 comments · Fixed by #164
Closed

provide builtins for downloading GitHub release tarballs #138

dhellmann opened this issue Jul 5, 2024 · 8 comments · Fixed by #164
Assignees
Labels
enhancement New feature or request

Comments

@dhellmann
Copy link
Member

dhellmann commented Jul 5, 2024

Downstream we have several dependencies we're building from release tarballs on GitHub, instead of sdists. It would be convenient if there was an easy way to express that via first-class features of fromager, instead of having to hand-code each override.

overrides.list_all() will need to be updated if we extend the settings schema.

@shubhbapna shubhbapna self-assigned this Jul 9, 2024
@shubhbapna
Copy link
Collaborator

So I have been playing around with things and I think there are two possible implementations:

  1. If include_sdist is False and include_wheels is True, then we can apply a function get_url_for_download that returns the github release tarball URL (maybe for more clarity we can define a new variable download_from_github instead of using include_sdist and include_wheels). The user can also override this function if they want to use some other URL. One issue with this method is that we need a way to get the github URL for that particular package. We could try to use the metadata that's available with the wheels but it doesn't seem to be consistent. Some packages have a clearly defined Download-URL field while others have a Project-URL field. There are some packages which have neither. To fix this we could try using the GitHub API to find the release and download it.
    For vllm the metadata field contains the project URL but for our downstream build it seems like we are getting the release tarball from a separate repository which is not included in the metadata. In this case we will have to override get_url_for_download

  2. This approach much simpler where we just expose get_url_for_download and in sources.py we can change the URL like this:

    url, version = resolve_dist(
        ctx, req, sdist_server_url, include_sdists=True, include_wheels=False
    )
    url_resolver = overrides.find_override_method(req.name, "get_url_for_download")
    if url_resolver:
        url = url_resolver(req)

The advantage of this approach is that we don't have to figure out from where we want to get the github URL for that package. However, we will still have to hand-code an ovveride for get_url_for_download

@dhellmann
Copy link
Member Author

What if we extended the settings schema to support a list of packages to download from GitHub? We could map a project canonical name to the GitHub org and repo where releases can be found. Then our logic for finding the download_source() function would have to take that into account, too.

Would we need information other than the GitHub project URL?

@dhellmann
Copy link
Member Author

On the other hand, if they're downloading sources from GitHub they are almost certainly going to have to provide an override function to build from that tarball. So, maybe it's sufficient to write a reusable function to do the download easily? That would be something they can call from download_source().

@shubhbapna
Copy link
Collaborator

What if we extended the settings schema to support a list of packages to download from GitHub? We could map a project canonical name to the GitHub org and repo where releases can be found. Then our logic for finding the download_source() function would have to take that into account, too.

Would we need information other than the GitHub project URL?

Yes this could work, I think we just need the GitHub project URL. We can also make it generic and allow mapping a project to whatever URL they want to download from

@dhellmann
Copy link
Member Author

OK. Give some thought to what I said about needing an override for the build step before you invest in extending the settings. If you think it makes sense to use the settings, we can think about how to take URL templates so we can do things like plug in the version number.

@shubhbapna
Copy link
Collaborator

On the other hand, if they're downloading sources from GitHub they are almost certainly going to have to provide an override function to build from that tarball. So, maybe it's sufficient to write a reusable function to do the download easily? That would be something they can call from download_source().

Looking at our downstream plugins which override download_source to download a tarball from GitHub, all of them simply just download the file and rename it. I think we can implement both of this within fromager itself if we have a mapping of package to download URL with an additional field in the same mapping that allows the user to specify what they want to rename it to

@dhellmann
Copy link
Member Author

On the other hand, if they're downloading sources from GitHub they are almost certainly going to have to provide an override function to build from that tarball. So, maybe it's sufficient to write a reusable function to do the download easily? That would be something they can call from download_source().

Looking at our downstream plugins which override download_source to download a tarball from GitHub, all of them simply just download the file and rename it. I think we can implement both of this within fromager itself if we have a mapping of package to download URL with an additional field in the same mapping that allows the user to specify what they want to rename it to

Look at the pyarrow plugin more closely. But yeah, it may be useful to have some core functionality for at least downloading and possibly even organizing the unpacked source.

@shubhbapna
Copy link
Collaborator

Looking at the vllm plugin, looks like there is a need for some sort of a version mapping as well

@dhellmann dhellmann added the enhancement New feature or request label Jul 14, 2024
@mergify mergify bot closed this as completed in #164 Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants