Add pypi section to connector metadata #33529

Merged
merged 11 commits into from
Dec 20, 2023

Conversation

flash1293
Contributor

To control the publishing of Python connectors to PyPI, this PR introduces a new metadata flag that a connector sets to opt in to publishing.

The actual publishing logic as part of airbyte-ci will be implemented in a separate PR.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Dec 15, 2023
@flash1293 flash1293 marked this pull request as ready for review December 15, 2023 16:41
@flash1293 flash1293 requested a review from a team December 15, 2023 16:41
Contributor

@alafanechere alafanechere left a comment

Can we nest this under something like remotePackageIndex, which would have fields for pypi / maven / npm / whatever?

@flash1293
Contributor Author

Thanks for the review @alafanechere , could you take another look?

Contributor

@alafanechere alafanechere left a comment

I'd appreciate an additional review from @bnchrch .
Can we also add a validator to make sure that Pypi is only used for Python connectors? (A metadata validation for a Java connector would fail if your new fields are declared in there)

extra = Extra.forbid

enabled: bool

Contributor

What do you think about adding the url to the pypi package?

Contributor Author

Seems in line with the "documentation url" and so on, added!

Collaborator

Since our goal here is to publish, and tooling generally expects package name, could we call this the PyPi package_name? I think that would be my vote. (E.g. airbyte-source-apify instead of https://pypi.org/project/airbyte-source-apify/)

PyPi URL just isn't very helpful because (to my knowledge at least) the full URL can't be passed to pip install or to the publish operation.

Fwiw, I do see "pip url" used frequently, but that is generally an alias for a PyPi package name or a Git ref, if the package isn't on PyPi.
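
To illustrate the point about package names versus URLs, here is a minimal sketch. It assumes the project page follows the `https://pypi.org/project/<name>/` pattern quoted above; the helper names `pypi_project_url` and `pip_install_args` are hypothetical and not part of this PR.

```python
from typing import List, Optional


def pypi_project_url(package_name: str) -> str:
    """Derive the human-facing project page from the package name (assumed URL pattern)."""
    return f"https://pypi.org/project/{package_name}/"


def pip_install_args(package_name: str, version: Optional[str] = None) -> List[str]:
    """Build the argument list for `pip install`, which takes a name, not a project URL."""
    requirement = package_name if version is None else f"{package_name}=={version}"
    return ["pip", "install", requirement]
```

Storing the name keeps the URL derivable, while the reverse direction would require parsing the URL before handing anything to tooling.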

Comment on lines 146 to 149

pypi: Optional[Pypi] = None


Contributor

Shall we go "polymorphic" here?

Suggested change
pypi: Optional[Pypi] = None
index_name: # Would be an enum like Pypi | Maven etc
url: str # Would be the url to the package

Contributor Author

As we will need a different implementation for each package index, and maybe different options further down the road, an explicit separate object seems more future-proof, wdyt?

Contributor

True, what do you think about using RemoteRegistries instead of RemotePackageIndexes:

remoteRegistries:
  pypi:
    url: <url to pypi package>
  DockerHub:
    url: <url to dockerHub images>
    imageAdress: dockerio.
  maven:
...

Dockerhub, maven etc. can come later of course.

Contributor Author

No strong opinion on the wording, changing to remoteRegistries

@flash1293
Contributor Author

Can we also add a validator to make sure that Pypi is only used for Python connectors? (A metadata validation for a Java connector would fail if your new fields are declared in there)

Good point, added that. It checks for the language tag.
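
A rough sketch of what such a language-tag check could look like, not the PR's actual implementation. It assumes the metadata is available as a plain dict with a top-level `data` key, that Python connectors carry a `language:python` tag, and it follows the `(ok, error)` return convention visible in the diff (`return True, None`). The function name is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def sketch_validate_pypi_only_for_python(
    metadata: Dict[str, Any],
) -> Tuple[bool, Optional[str]]:
    """Return (True, None) when valid, otherwise (False, reason)."""
    data = metadata.get("data", {})
    pypi = (data.get("remoteRegistries") or {}).get("pypi") or {}

    # Nothing to check unless PyPI publishing is switched on.
    if not pypi.get("enabled", False):
        return True, None

    # Assumed tag format: "language:python".
    if "language:python" in data.get("tags", []):
        return True, None

    return False, "remoteRegistries.pypi may only be enabled for Python connectors."
```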

from pydantic import BaseModel, Extra, Field


class Pypi(BaseModel):
Collaborator

Suggested change
class Pypi(BaseModel):
class PyPi(BaseModel):

class Config:
extra = Extra.forbid

pypi: Optional[Pypi] = None
Collaborator

Suggested change
pypi: Optional[Pypi] = None
pypi: Optional[PyPi] = None

@flash1293
Contributor Author

Makes sense to me @aaronsteers - adjusted.
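
For readers following the thread, the shape the discussion converged on (a `remoteRegistries` object holding a `pypi` entry with `enabled` and `packageName`) can be sketched as below. This is an illustration only: standard-library dataclasses stand in for the PR's pydantic models, and the `from_dict` helpers are hypothetical, written to mimic the reject-unknown-fields behaviour of `extra = Extra.forbid`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PyPi:
    enabled: bool
    packageName: str  # The name of the package on PyPi.

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "PyPi":
        # Reject unknown keys, similar in spirit to pydantic's Extra.forbid.
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"Unknown pypi fields: {sorted(unknown)}")
        return cls(**raw)


@dataclass(frozen=True)
class RemoteRegistries:
    pypi: Optional[PyPi] = None  # Other registries could be added as separate fields later.

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "RemoteRegistries":
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"Unknown registries: {sorted(unknown)}")
        pypi_raw = raw.get("pypi")
        return cls(pypi=None if pypi_raw is None else PyPi.from_dict(pypi_raw))
```

Keeping one explicit object per registry means each can grow its own options without affecting the others.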

Comment on lines +125 to +126
enabled: bool
packageName: str = Field(..., description="The name of the package on PyPi.")
Contributor

For a followup PR once the publish to PyPi step is implemented:

  • We should add a validator that makes sure that if enabled is True the current connector version is available on PyPi.
    It's basically what we do for docker images: before uploading the metadata file to GCS we validate that the docker image is available on DockerHub.
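
One possible shape for that follow-up check, sketched under assumptions: it queries PyPI's JSON endpoint for a specific release (`https://pypi.org/pypi/<name>/<version>/json`) and treats HTTP 200 as "available". The function names are hypothetical, and the status lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def pypi_release_url(package_name: str, version: str) -> str:
    """URL of PyPI's JSON document for one specific release."""
    return f"https://pypi.org/pypi/{package_name}/{version}/json"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return error.code


def is_version_on_pypi(
    package_name: str,
    version: str,
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """True if the given release appears to exist on PyPI."""
    return get_status(pypi_release_url(package_name, version)) == 200
```

A validator would call this only when `enabled` is true, passing the `packageName` and the connector's current version.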

Contributor Author

You mean as a post-upload validator? If so, that makes a ton of sense to me, thanks for mentioning it.

Contributor

Yes, when we run metadata_service validate <config file path>

Contributor

@alafanechere alafanechere left a comment

Approving to not be blocking - 👍 Feel free to add validation that PyPi is only set for Python connectors in a follow-up PR or this one.

Contributor

@bnchrch bnchrch left a comment

Just got caught up on all this.

Great and practical discussion.

I've got nothing more to add besides LGTM!

@@ -151,3 +151,19 @@ The supported scope types are listed below.
| Scope Type | Value Type | Value Description |
|------------|------------|------------------|
| stream | `list[str]` | List of stream names |

#### `remoteRegistries`
Contributor

💎

@@ -171,12 +171,29 @@ def validate_metadata_base_images_in_dockerhub(
return True, None


def validate_pypi_only_for_python(
Contributor

💎

@flash1293 flash1293 merged commit e3fa594 into master Dec 20, 2023
28 checks passed
@flash1293 flash1293 deleted the flash1293/metadata-pypi branch December 20, 2023 09:53
aaronsteers added a commit that referenced this pull request Jan 8, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024