Discovery taking a very long time for databases with 1000s of tables #76

Open
BuzzCutNorman opened this issue Jul 16, 2024 · 2 comments

Comments

@BuzzCutNorman
Owner

From Meltano Slack on 7/16/2024

I'm using the buzzcutnorman variant of tap-mssql which is based on the SDK.
There is an enormous list of tables in the DB but I only need access to a handful. I think that by default discover is generating a catalog entry for each of the 1000s of tables.
Is there any way to avoid this? I have a select in my meltano.yml already, but that doesn't seem to prevent it; it only stops the records from non-required tables getting emitted.

If it's the problem I think it is, the discovery process takes 5-30 minutes to run (I've heard longer for some people), and the workaround below can reduce that to somewhere between seconds and minutes. Where I've seen it is with folks who have something like 10k tables.

Our "workaround" in tap-postgres is MeltanoLabs/tap-postgres#218, which adds a filter_schemas config.
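
For illustration, the effect of a schema filter like this can be sketched in plain Python. This is not the tap-postgres implementation: the helper name `filter_schema_names` and the shape of the config dict are assumptions, and only the `filter_schemas` key comes from the thread. The idea is that discovery only inspects the schemas that survive the filter.

```python
from __future__ import annotations


def filter_schema_names(all_schemas: list[str], config: dict) -> list[str]:
    """Return only the schemas listed under the `filter_schemas` config key.

    Hypothetical helper: when the key is absent or empty, every schema is
    kept, so the default behaviour is unchanged.
    """
    wanted = config.get("filter_schemas") or []
    if not wanted:
        return list(all_schemas)
    wanted_set = set(wanted)
    # Preserve the order reported by the database.
    return [name for name in all_schemas if name in wanted_set]


schemas = ["dbo", "audit", "staging"]
print(filter_schema_names(schemas, {"filter_schemas": ["dbo"]}))
print(filter_schema_names(schemas, {}))
```

With the filter set, only the named schemas would be handed on to table inspection; without it, nothing changes.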

@visch

visch commented Jul 17, 2024

I'd just add that addressing meltano/sdk#1234 is the long-term fix (I think).

@BuzzCutNorman
Owner Author

Andy Carter is proposing the following change.

    def get_object_names(
        self, engine, inspected, schema_name: str
    ) -> list[tuple[str, bool]]:
        # Each entry is (object_name, is_view).
        if self.config.get('fake_it', False):
            # Short-circuit discovery with a hard-coded list of tables and views.
            table_names = ['my_table']
            view_names = []
            return [(t, False) for t in table_names] + [(v, True) for v in view_names]
        # Otherwise fall back to the default inspection of every object.
        return super().get_object_names(engine, inspected, schema_name)

    def get_schema_names(self, engine, inspected) -> list[str]:
        # Only inspect the dbo schema.
        return ['dbo']
