Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Update docstring for read_excel #56543

Merged
merged 7 commits into from
Jan 4, 2024
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 17, 2023

The initial motivation for this change was the typo in xlrd

@phofl phofl added Docs IO Excel read_excel, to_excel labels Dec 17, 2023
@phofl phofl added this to the 2.2 milestone Dec 17, 2023
@phofl phofl requested a review from rhshadrach as a code owner December 17, 2023 23:27
Comment on lines -178 to -179
When ``engine=None``, the following logic will be
used to determine the engine:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's good to have this logic documented, can you just move it out of the versionchanged instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC xlrd used to not only support xlsx files but at one point was even the default so we had to go through some lengths to document that transition as clearly as possible. We are a few years removed from that and since then all default read libraries have specialized in a given extension(s), so I think we can do without going into this detail in the docstring

Copy link
Member

@rhshadrach rhshadrach Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and since then all default read libraries have specialized in a given extension(s)

Where are the default read libraries documented? E.g. both openpyxl and calamine can read xlsx, and both pyxlsb and calamine can read xlsb.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calamine is never the default - you'd have to explicitly use that as an engine. Otherwise this is documented in the Excel section of the IO manual

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#excel-files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted that section could be rewritten to be a little clearer, but I think that is out of scope for what @phofl is doing here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calamine is never the default - you'd have to explicitly use that as an engine.

This is true today, but there is an issue to make it the default for xlsb files. In any case, I don't believe it's documented that Calamine is never the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and added back in without the version changed

@@ -165,31 +165,12 @@
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", "calamine".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, but it appears to me this line is duplicative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this line

@phofl
Copy link
Member Author

phofl commented Jan 3, 2024

/preview

Copy link
Contributor

github-actions bot commented Jan 3, 2024

Website preview of this PR available at: https://pandas.pydata.org/preview/56543/

@phofl
Copy link
Member Author

phofl commented Jan 3, 2024

I've also clarified the user guide

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach merged commit 70fc174 into pandas-dev:main Jan 4, 2024
50 checks passed
@rhshadrach
Copy link
Member

Thanks @phofl

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 4, 2024
@phofl phofl deleted the doc_excel branch January 4, 2024 08:21
phofl added a commit that referenced this pull request Jan 4, 2024
…cel) (#56730)

Backport PR #56543: DOC: Update docstring for read_excel

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants