Process input tables simplification #2143

Merged
merged 2 commits into from
Apr 24, 2024


Member

@RobinL RobinL commented Apr 15, 2024

A small PR laying some of the foundations for analysing blocking rules without a linker.

This is a step towards solving #2142

Since the process_input_tables function is used only during table registration (and is part of the table registration logic), it makes sense to perform it as part of table registration rather than as a separate function call.

This means that e.g. in profile_columns we need a single register_multiple_tables call rather than three lines:

tables = ensure_is_list(table_or_tables)
tables = db_api.process_input_tables(tables)
splink_df_dict = db_api.register_multiple_tables(tables)

We'll need similar logic for any functions that do exploratory analysis without a linker, so we don't want to have to repeatedly write this logic.
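To make the idea concrete, here is a minimal sketch of registration absorbing the list-normalisation and preprocessing steps. `ToyDatabaseAPI`, its method bodies, and the `input_table_{i}` naming are hypothetical stand-ins for illustration only, not Splink's actual implementation; only the function names come from the snippet above.

```python
from typing import Any, Dict, List, Union


def ensure_is_list(value: Union[Any, List[Any]]) -> List[Any]:
    """Wrap a single table in a list; leave an existing list untouched."""
    return value if isinstance(value, list) else [value]


class ToyDatabaseAPI:
    """Hypothetical stand-in for a database API class (assumption, not Splink code)."""

    def __init__(self) -> None:
        self.registered: Dict[str, Any] = {}

    def process_input_tables(self, tables: List[Any]) -> List[Any]:
        # Backend-specific preprocessing would go here; the toy version
        # simply returns a shallow copy of the list.
        return list(tables)

    def register_multiple_tables(
        self, tables: Union[Any, List[Any]]
    ) -> Dict[str, Any]:
        # Normalisation and preprocessing happen inside registration,
        # so callers no longer need to do either step themselves.
        tables = self.process_input_tables(ensure_is_list(tables))
        out: Dict[str, Any] = {}
        for i, table in enumerate(tables):
            name = f"input_table_{i}"  # hypothetical naming scheme
            self.registered[name] = table
            out[name] = table
        return out


# A caller such as profile_columns would then need only one line:
db_api = ToyDatabaseAPI()
splink_df_dict = db_api.register_multiple_tables({"id": [1, 2]})
```

With this shape, any exploratory function that works without a linker can pass a single table or a list of tables straight to the registration call.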

Don't know why the docs build triggered here, but all the other tests are passing

@RobinL RobinL changed the base branch from master to splink4_dev April 15, 2024 12:59
@RobinL RobinL requested a review from ADBond April 22, 2024 13:31
Contributor

@ADBond ADBond left a comment

Great, seems sensible.

def register_table(
    self, input_table, table_name, overwrite=False
) -> SplinkDataFrame:
    input_tables = self.process_input_tables([input_table])
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this call, as it will be handled in register_multiple_tables

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot thanks
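The point raised in this thread can be sketched as follows: if register_table delegates to register_multiple_tables, and the latter already calls process_input_tables, a second call in register_table would preprocess the input twice. The class below is a hypothetical toy, and the `table_names` parameter and the `process_calls` counter are assumptions made for illustration rather than Splink's real signatures.

```python
from typing import Any, Dict, List


class ToyDatabaseAPI:
    """Hypothetical stand-in used only to illustrate the delegation pattern."""

    def __init__(self) -> None:
        self.process_calls = 0  # counts how often preprocessing runs
        self.registered: Dict[str, Any] = {}

    def process_input_tables(self, tables: List[Any]) -> List[Any]:
        self.process_calls += 1
        return list(tables)

    def register_multiple_tables(
        self, tables: List[Any], table_names: List[str], overwrite: bool = False
    ) -> Dict[str, Any]:
        # Preprocessing lives here, once, for every registration path.
        tables = self.process_input_tables(tables)
        out: Dict[str, Any] = {}
        for table, name in zip(tables, table_names):
            if name in self.registered and not overwrite:
                raise ValueError(f"Table '{name}' is already registered")
            self.registered[name] = table
            out[name] = table
        return out

    def register_table(
        self, input_table: Any, table_name: str, overwrite: bool = False
    ) -> Any:
        # No process_input_tables call here: delegation already covers it.
        tables = self.register_multiple_tables(
            [input_table], [table_name], overwrite=overwrite
        )
        return tables[table_name]


api = ToyDatabaseAPI()
table = api.register_table({"id": [1]}, "my_table")
```

After the single register_table call, `api.process_calls` is 1, which is the behaviour the review comment asks for.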

@RobinL RobinL merged commit 0df2cc6 into splink4_dev Apr 24, 2024
15 checks passed
@RobinL RobinL deleted the process_input_tables_simplification branch April 24, 2024 07:42
2 participants