Skip to content

DOC: Remove docs code header and use explicit imports in the user guide #28038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
datapythonista opened this issue Aug 20, 2019 · 22 comments
Open
Labels

Comments

@datapythonista
Copy link
Member

datapythonista commented Aug 20, 2019

I've been discussing with the Binder team, and looks like it'll be possible soon to make the pandas examples in the docs runnable directly from the docs.

I think this will be amazing, but I think it's one more reason to start being explicit and self-contained with the docs. One option is of course that the trickiness we have now on importing things translate to Binder, and we automatically inject code for imports (and may be seeds...).

But my opinion is that the right thing is to start being explicit on what we run in the examples.

@datapythonista datapythonista added Docs Needs Discussion Requires discussion from core team before further action labels Aug 20, 2019
@TomAugspurger
Copy link
Contributor

we automatically inject code for imports

+1 for removing the {{ header }} stuff in favor of explicit imports.

And also avoid aliasing our own module

I disagree here. Let's keep that as a separate issue.

@jorisvandenbossche
Copy link
Member

Also +1 on explicitly doing and showing the imports.
We could also see if they can be simplified. Eg randn = np.random.randn could be fixed for all occurences to use the full version. And maybe pd.options.display.max_rows = 15 can be removed with the shorter default nowadays (although if the examples use a length of eg 50, that would still be shown).

And also agree with keeping the alias discussion for a separate issue. That will be easier for this one ;)

@datapythonista datapythonista changed the title DOC: Remove docs code header and use explicit imports in the docs DOC: Remove docs code header and use explicit imports in the user guide Aug 20, 2019
@datapythonista datapythonista removed the Needs Discussion Requires discussion from core team before further action label Aug 20, 2019
@datapythonista
Copy link
Member Author

Sorry for mixing things. I think there is agreement on this one, for the user guide. I was also considering the API, but I'll also open a separate issue for that, so we keep the discussions focused. Thanks for the feedback.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

why the move to remove header macro? IIRC this was put in place to make sure that all of the imports are exactly the same everywhere on each doc section

this seems like churn for no gain

@datapythonista
Copy link
Member Author

The code in the header is hidden, and we implemented the union of all the blocks with imports, seeds... to avoid duplication, and because having a huge header file in every file was transparent to the user.

The rendered header added a problem that is that Sphinx doesn't report the right line for the errors anymore (it reports the line on the rendered file, but we look for the error in the template). It was already under discussion whether we wanted to keep it or not.

But the reason for proposing this now, is that the header adds extra complexity to make the code runnable in Binder. In two different ways:

  • Technical complexity in our side (it may be easy to address, but in any case it's added complexity)
  • For the user, if we let them open the code in a Binder notebook, since they may end up with a lot of code that we don't show in the docs, and that in many cases may not even be relevant

The header surely provides value in avoiding repetition, but IMO it's not worth. It's better to repeat the needed parts, and also be explicit about all what we're doing, since with Binder we'd give the users the change to play with the code, and any magic there will be confusing and unappreciated.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

we’ll i still would have a fixed header of imports; having consistency in the docs overrides all other issues ; it easily becomes a mess without enforcement via CI

@datapythonista
Copy link
Member Author

Now we validate in the CI that Sphinx doesn't generate warnings, so if an import is missing the CI should fail.

@jorisvandenbossche
Copy link
Member

having consistency in the docs overrides all other issues

Note that this is only for the doc build, as it is about a hidden code block right now. And then @datapythonista is correct that this is now validated in CI that the doc build runs correctly.

Part of the proposal is also to make it no longer a hidden code block. So there we also need to ensure that what we show to the users is somewhat consistent throughout the user guide. But I think we can certainly achieve that without inserting a common header as well.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

having consistency in the docs overrides all other issues

Note that this is only for the doc build, as it is about a hidden code block right now. And then @datapythonista is correct that this is now validated in CI that the doc build runs correctly.

Part of the proposal is also to make it no longer a hidden code block. So there we also need to ensure that what we show to the users is somewhat consistent throughout the user guide. But I think we can certainly achieve that without inserting a common header as well.

well this was just changed to make it consistent for each file
the concern here is the same as the original

it will get inconsistent thru time across files

so if possible to have a single file with consistent imports that is then objected to the build would be preferable

@jorisvandenbossche
Copy link
Member

Your last sentence is not very clear to me.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

Your last sentence is not very clear to me.

right was a suggestion if we can have still a single place to import and set options for docs (and still have binder work); as this enforces consistency the best imho

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Aug 21, 2019

Well, that's more or less the whole point of the issue that Marc opened: not having it in the actual rst file (but injecting it as we do now) is probably possible to have it work with binder, but it is added complexity (need to ensure the header is injected in the binder pipeline as well, as that will probably be independent from sphinx), and hence his preference to just get rid of the injected header (as we did until very recently)

@datapythonista
Copy link
Member Author

This is what we render automatically with {{ header }}:

.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import numpy as np
   import pandas as pd

   randn = np.random.randn
   np.random.seed(123456)
   np.set_printoptions(precision=4, suppress=True)
   pd.options.display.max_rows = 15

   import os
   os.chdir(r'{}')

randn = np.random.randn is a bad idea, and just used in the release notes of 0.10.0.

I guess the rest is used in almost every page of the user guide.

I think there are two different questions:

  1. Do we want to have a variable for it (like we do now), or do we want to have it in every file
  2. Do we want to hide it , or should it be shown to users

I can see the advantages and disadvantages in both options of 1. But for 2, I think it's vert wrong to hide code, create magic to the users, not let them reproduce exactly our examples, and Binder will make that even much worse.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

agree that assigning to randn is just incorrect - that should just be fixed

ideally we would expose this as user code in the docs
but how do you propose to keep it consistent? that was an extreme problem before

ideally code review can do this but it has over time dropped the ball
it is VERY hard to keep consistently in this large a code base (just look at some of the duplication in tests for example);

anything style related needs to be automatic and in the CI

what if you inline the code and add a style check for this?

@jorisvandenbossche
Copy link
Member

Personally, I think you overestimate the "extreme problem" a bit.

Before, the header was much longer because we imported lots of things with non-standard imports, this initial hidden ipython block was a mess. But that was cleaned up in the docs over time, and now (apart from the single randn thing), all example code uses standard imports. The initial code block with imports is thus much smaller.

I personally don't see a problem with repeating this in every file.

(but in any case, it seems there is agreement over Marc's point 2) -> let's show this code block with imports to the users)

@datapythonista
Copy link
Member Author

datapythonista commented Aug 21, 2019

For now I'm happy to just remove the :suppress: and keep the header. Hiding code from the user is what I think it's a problem, not that much the header itself (sorry I mixed both in the issues)

For the header I'm more open, but I don't think it's that important to be consistent (e.g. having different seeds or different display.max_rows). May be we could have variables for them (e.g. np.random.seed({{ random_seed }})), we would still repeat some code, but we would get correct line numbers for Sphinx errors.

For being inconsistent with the imports, if an import is missing the CI will fail because code blocks will fail and we are now failing the CI for that.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2019

i am fine with changing this but i have seen over time that it is very easy to get inconsistency from well intentioned changes.

TanyaaCJain added a commit to TanyaaCJain/pandas that referenced this issue Aug 22, 2019
- Removes code header in IO doc page.
- Uses explicit imports in the IO page of user guide on first occurence of requirement.
- In reference to pandas-dev#28038
@WillAyd
Copy link
Member

WillAyd commented Sep 5, 2019

This PR is well done but I'm not sure I am on board with the concept. Associating each import with its first reference in the rst feels a little arbitrary and I think is just confusing. Seems like it could be somewhat of a nuisance when moving things around; nothing major but in any case a step backwards from just knowing standard imports are done in the header

Are we dead set on going this route?

@WillAyd
Copy link
Member

WillAyd commented Sep 5, 2019

Sorry last comment was in reference to PR #28089 but posted here in issue

@ghost
Copy link

ghost commented May 17, 2020

Dear @WillAyd , since the matter was discussed on #28089 , can we close this issue ?

@WillAyd
Copy link
Member

WillAyd commented May 19, 2020

I think so but @datapythonista I think can close or clarify

@datapythonista
Copy link
Member Author

Thanks @LukeLima for the triagging. #28089 removed the {{ header }} from one of the pages, but most of them still have it. So, this issue is still relevant, and can't be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants