Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate primary key detection functionality #2132

Merged
merged 2 commits into from
Jul 17, 2024

Conversation

amontanez24
Copy link
Contributor

@amontanez24 amontanez24 commented Jul 15, 2024

resolves #2101

@amontanez24 amontanez24 changed the title making it work if it hasn't detected sdtypes and adding unit tests Separate primary key detection functionality Jul 15, 2024
@amontanez24 amontanez24 marked this pull request as ready for review July 16, 2024 02:45
@amontanez24 amontanez24 requested a review from a team as a code owner July 16, 2024 02:45
@amontanez24 amontanez24 requested review from gsheni, frances-h and pvk-developer and removed request for a team and gsheni July 16, 2024 02:45
Comment on lines +561 to +563
original_columns = data.columns
stringified_columns = data.columns.astype(str)
data.columns = stringified_columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this here and in _detect_columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so because the column names that get added to self.columns seem to be the stringified ones. So if we don't do it here then we might get a KeyError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now that I think about it, this could be a problem for every other function that takes data and looks in the columns. Maybe we should change the dict to store the original column names instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we modified the data so the column names are the stringified ones, no? Then at the end we convert it back to the original column names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also do it at the beginning of _detect_columns and then restore them at the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but the idea is eventually people will call this method on their own

@amontanez24 amontanez24 merged commit eac39a8 into main Jul 17, 2024
39 checks passed
@amontanez24 amontanez24 deleted the issue-2101-separate-pk-detection branch July 17, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Separate primary key detection functionality
3 participants