Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: split embedders in individual files #315

Merged
merged 2 commits into from
Dec 18, 2024
Merged

Conversation

smoya
Copy link
Contributor

@smoya smoya commented Dec 17, 2024

This PR refactors the existing projects/pgai/pgai/embeddings.py file by splitting it into smaller files, each containing a single class for a specific embedding provider (atm: voyageai, openai, ollama).

The primary goal is to improve maintainability and prevent the file from growing indefinitely over time, which can make it harder to manage and navigate.

@smoya smoya force-pushed the chore/splitEmbedders branch 7 times, most recently from 711ebd2 to c802c74 Compare December 17, 2024 15:23
@smoya smoya force-pushed the chore/splitEmbedders branch from c802c74 to b66ed36 Compare December 17, 2024 15:24
@smoya smoya marked this pull request as ready for review December 17, 2024 15:31
@smoya smoya requested a review from a team as a code owner December 17, 2024 15:31
from pydantic import BaseModel
from typing_extensions import override

from ..embeddings import (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class is unmodified, except for this required imports.

from pydantic import BaseModel
from typing_extensions import override

from ..embeddings import (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class is unmodified, except for this required imports.

from pydantic import BaseModel
from typing_extensions import TypedDict, override

from ..embeddings import (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class is unmodified, except for this required imports.

Copy link
Collaborator

@cevian cevian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but lets also have @JamesGuthrie review

from .openai import OpenAI
from .voyageai import VoyageAI

__all__ = ["Ollama", "OpenAI", "VoyageAI"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this __all__? According to this:

if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered

This seems like somewhat esoteric functionality, which we're not actively using, and I don't see the need to support. By removing it we can remove one more instruction in the readme above.

Copy link
Contributor Author

@smoya smoya Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this in order to be able to import all providers through the same module: from .embedders import Ollama, OpenAI, VoyageAI The alternative (unless you know another one) would be to use individual imports such as:

from .embedders.ollama import Ollama
from .embedders.openai import OpenAI
from .embedders.voyageai import VoyageAI

I'm fine with this one and removing the content of the __init__.py. Removing that step which I agree it would be one less step to consider when adding a new provider.

EDIT: Another alternative in between would be just to remove the __all__ from but keep the imports in the __init__.py:

# __init__.py file
from .ollama import Ollama
from .openai import OpenAI
from .voyageai import VoyageAI

In that way we can still do the from .embedders import Ollama, OpenAI, VoyageAI (When I coded this, I believed the __all__ was necessary but it is not).

WDYT @JamesGuthrie ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I don't have a strong opinion, but the second suggestions (imports in __init__.py) is "less magical", so I would prefer it over __all__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the fast answer. Code (and docs) updated with the second option.

from .voyageai import VoyageAI

__all__ = ["Ollama", "OpenAI", "VoyageAI"]
from .ollama import Ollama as Ollama
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundant alias is needed, otherwise the linter will complain and provide the only alternative of adding it to the __all__ var

@smoya smoya merged commit 77673ee into main Dec 18, 2024
5 checks passed
@smoya smoya deleted the chore/splitEmbedders branch December 18, 2024 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants