DOC: Discussion on new getting started page (Numfocus Grant) #26831
cc @pandas-dev/pandas-core |
I don't like the idea much. I'd prefer to have a few conventional end-to-end tutorials (solving a real-world problem), targeting different use cases and user types. If I had never heard about pandas, I think an "introductory tutorial" on how to open data (covering many options and formats) is not what I'd be looking for. And it wouldn't give me relevant information on whether pandas is the tool I need or not. I think the user guide already addresses that quite well, and if we want tl;dr versions of the user guide pages (may be a good idea), I think those could be added at the beginning of each of them. What I'd personally appreciate is proper tutorials, for example one could be "Forecasting the stock market", where it's shown:
Of course that would only be useful for part of the users, but I guess 3 or 4 tutorials could cover the main pandas use cases, or whatever introductory topics we can find. Finding them is surely not easy, but things that come to my mind:
I'd also add a more theoretical one, to teach the concepts (dtypes, data structures,...) |
Good point, but I do not completely agree. I do not think that, to make the documentation more welcoming to absolute beginners, advanced case studies ('proper tutorials') will help as the complexity will easily overwhelm new users. What we want to cover with the new getting started page is to provide different elements in line with http://third-bit.com/2019/04/16/what-docs-when.html:
Note: case studies are mentioned to help users from a competent level to expert level. Still, your point:
is true. Having tutorials in the getting started section on topics of the user guide is making the documentation more scattered. But this is not the purpose. To be clear, we are actually just talking about a new version of the current 10 minutes and nothing more elaborate. So, the aim of the proposed introduction tutorials is not to cover many options and formats, but very short introductions applied on small data sets (e.g. titanic) and real questions (e.g. I'm interested in the titanic passengers older than 18 years). When the document says 'content to cover', this is NOT to explain these functions in detail, but to create a short storyline using these functions to answer the question. By doing so, we align with the how-to section described in http://third-bit.com/2019/04/16/what-docs-when.html |
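As an illustration of the kind of short, question-driven storyline described above, here is a minimal sketch. The rows below are a made-up stand-in for the Titanic data set (the names, ages and column labels are assumptions for illustration, not the real data or the tutorial's actual content):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the Titanic data set;
# the values and column names are invented for this sketch.
titanic = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Age": [22.0, 4.0, 35.0, None],
        "Survived": [0, 1, 1, 0],
    }
)

# Question: which passengers are older than 18 years?
# The comparison yields a boolean Series; rows with a missing age
# compare as False and are therefore left out of the selection.
adults = titanic[titanic["Age"] > 18]
print(adults)
```

The point of such a snippet is the question-then-answer shape rather than a full explanation of boolean indexing, which would stay in the user guide.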
Thanks for the feedback!
I personally think the user guide is failing hugely on that aspect. IMO no beginner should read one of our pages of the user guide; they are way too overwhelming and unstructured. But that is maybe more saying something about those pages needing some love ... :) I always had the long-term idea to split all the user guide pages into multiple subpages, with one introductory page per topic, and then several pages going into more detail on one of the subtopics (e.g. Indexing page: one introduction page explaining the different concepts, and then pages with more detail on advanced indexing, multi-indexing, ..). That's certainly an option to consider (and the effort Stijn is putting in those pages could be easily redirected to user guide intros). We don't necessarily need to restrict the work of this grant strictly to the getting started section. Now, then to the question: what does a beginner need? (which is of course difficult to answer, as it is subjective and you have many different "kinds" of beginners). I personally think such a full "case study" can be easily overwhelming for newcomers, and that it can be useful to have smaller, more focused tutorials. Terminology is always difficult :-) I would call your "proper tutorial" rather a case study or showcase, but yeah, the definitions of and boundaries between tutorials, how-to's, case studies, user guides, ... are not always clear. We used the "A tutorial is a planned lesson that helps people build a mental model of a domain and acquire a few basic skills" of Greg Wilson when speaking about tutorials here. |
I think I'm misunderstanding your document. What you say about covering |
@jorisvandenbossche I agree on what you say (splitting the user guides...). You misunderstood (I phrased poorly) the text you quote. What I think the user guide addresses correctly is providing information about a specific topic (a "tutorial" about IO), which is what I understand from the google doc. Not whether pandas is the right tool for a new user. That's exactly the opposite of what I was trying to say. I think we need to discuss this more in practice. Write some drafts, and try to get people not familiar with pandas involved. And get their feedback as we progress. |
@datapythonista OK, then it would be good if you (and also others of course) could look at the specific objectives/content to cover of the different tutorials proposed in the google doc. That would be very useful. |
Although about the windowing functions, those are maybe quite common in the financial world I suppose? A reason to include them after all; those kinds of things would be good to get feedback on. |
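For readers unfamiliar with the windowing functions mentioned above, a minimal sketch of what such a tutorial step might show is a rolling mean over a short price series. The prices and dates are invented for illustration and do not come from the proposal:

```python
import pandas as pd

# Hypothetical daily closing prices, invented for this sketch.
prices = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range("2019-01-01", periods=5, freq="D"),
    name="price",
)

# 3-day rolling mean: the first two entries are NaN because
# a full window of three observations is not yet available.
rolling_mean = prices.rolling(window=3).mean()
print(rolling_mean)
```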
why are we trying to create a brand new tutorial, rather than simply copying from all of the tutorials out there & consolidating that? |
My view is that the whole point of the tutorials is to let people know about what kind of things they can do with pandas. I don't expect newcomers to pandas to be experts in reindexing, window functions or whatever after reading a tutorial. But if they are not in the tutorial, how do we expect users to know they exist and use them? I'd say people go to the user guide or the API when they know what they are looking for, I'd expect the tutorial to give a quick overview of most features. |
@jreback that is what we are doing, e.g. the links provided by the Twitter responses. |
I personally like this idea and the reference to the Django docs. From a practical use case, workflows more often than not look something like:
I would think this is particularly true for new users coming through say a Jupyter notebook, which I think is very friendly to newcomers. The ten points more or less cover these and can link to deeper discussions from there. I'd probably move plotting to be the last point out of these as kind of an end goal. Specific to point 9, I'd also put some emphasis on data types preceding it (like converting from a string to a date, and why that's important), and relatedly we might want to touch lightly on why dtypes are important (not from a technical perspective, but maybe show object vs numeric somehow) |
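A minimal sketch of the dtype point raised above, assuming a small made-up table in which both the dates and the numbers arrive as text (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data where everything was read in as text.
df = pd.DataFrame(
    {
        "date": ["2019-06-01", "2019-06-02", "2019-06-03"],
        "value": ["1.5", "2.0", "3.5"],
    }
)
print(df.dtypes)  # both columns hold text at this point

# Convert the text columns to a datetime and a numeric dtype.
df["date"] = pd.to_datetime(df["date"])
df["value"] = pd.to_numeric(df["value"])
print(df.dtypes)

# With proper dtypes, date-aware and numeric operations become available.
day_names = df["date"].dt.day_name()
total = df["value"].sum()
print(day_names.tolist(), total)
```

Showing the `dtypes` output before and after the conversion is one way to make the "object vs numeric" difference visible without going into technical detail.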
@stijnvanhoey is there any work on this that can be already seen? I need to start preparing the pandas tutorial for EuroSciPy, and I'll try to reuse material from what we're building for the official docs if it makes sense, so I can get feedback from the audience. |
@datapythonista I missed your question on the update of material. Based on the proposed plan with the 10-section getting started, I'm now compiling a first draft of these 10 sections as notebooks in https://github.com/stijnvanhoey/pandas-getting-started-tutorials. Currently working on sections 9 and 10. The idea is to have a returning pattern in the way these are set up: illustrative figure (just as intro), question/task-oriented (currently organized as quote- After a revision of the material, I'll further adjust/extend the tutorials and start on the way these will be represented in the documentation (from sphinx integration to binder support and proper styling of these elements together with other elements in the getting started page itself). It is a bit of a struggle to find the balance between short/concise and informative enough, so any comments are certainly welcome. I do not know if the material is also fit for a workshop (as it focuses on self-learning), but it would be very interesting to test the material and the flow/logic of the content. Unfortunately I can't make it to EuroSciPy myself, but if I can do anything in the preparation, just ask! At the same time, I'm trying to put together a set of (more) consistent schemas to explain specific concepts, see https://github.com/stijnvanhoey/pandas-getting-started-tutorials/tree/master/schemas. These still need further revision, but feel free to use and try them out already. |
Great, thanks for sharing the repo. I think it'll be quite difficult for me (and probably many others) to give feedback on whether the tutorials are good and clear enough for people starting in pandas. So, I'll see if I can use them in conferences, and gather feedback from the kind of people we're targeting with them. I'll keep you updated, thanks! |
Good point, certainly because we are aiming at real beginners here. I'll pitch/spread the material in the carpentry community in the coming weeks to gather some feedback as well, and check around for local workshops/contacts to test the material. |
Just got the feedback that in the tutorials I'm giving at conferences I should include a tips and tricks section. Also, I think @justmarkham is having a lot of success with this way of teaching pandas. I don't have an exact proposal on how we could include those, but I think it's worth considering. One idea could be to have some info boxes with a "Did you know...?" tip during the tutorial. |
Thanks for letting me know about this discussion, @datapythonista! As for pandas tips and tricks, I'm not sure the best way to integrate them into teaching materials, though I do like the idea of a "did you know" info box. Zooming out: I haven't had a chance to digest the discussion and links above, but I'm happy to take a look and provide feedback, if it's helpful? I've been teaching pandas online for 5 years, and the biggest segment of my audience is beginner users, so I can speak somewhat knowledgeably about the beginner's perspective. |
If you have time to have a look at what's been discussed and give us feedback, that would be extremely valuable. The goal here is to make people new to pandas understand what it does and get started as fast and easily as possible for them. I doubt there are many people that can add as much value as you. :) |
@stijnvanhoey I was checking the notebooks, and as I said when we first discussed, I think we should have tutorials in the documentation, understanding by tutorial a notebook that shows end to end how to solve a real-life problem. I'm not opposed at all to having what you are building, shorter and more focused on the features. But I think some users will appreciate having what I propose. To me personally, when I started with Django it was very useful to follow the tutorial on how to build a poll system end to end, and I wish I had had such a thing when I started with pandas. I added to this page (https://datapythonista.github.io/pandas-web/try.html) a couple of ideas for tutorials I'd like to have, based on use cases similar to the ones that some users may have (data analysis, preprocessing to pass data to scikit-learn, and time series analysis). I'll probably build the first two for EuroSciPy, so we can discuss them once I'm done. |
I do think this is complementary to the getting started notebooks, aka 10x1' material, serving a different purpose. The end-to-end tutorials provide an idea about the capabilities (what sort of tasks can I do with the tools provided by Pandas), whereas the 10x1' to pandas provides an introduction/gateway to the main features (which tools are provided by Pandas). @jorisvandenbossche and myself developed some of these end-to-end tutorials as case studies, oriented on environmental/life sciences:
Try it out on binder. I could certainly rework some of these as well to add to the end-to-end examples. Check them out and let me know. As we're aiming for starters/beginners, work is required to keep it concise and understandable. The challenge will also be to provide a good variety in the topics/themes; maybe the input of the recent questionnaire provides useful info for this? |
Per Joris' email today I gave the notebooks a quick look. Here are just some of my thoughts:
Have to give it a few more passes but those are some quick thoughts. Generally, are we tied to including these in the documentation as Notebooks? I find the standard Notebook rendering pretty ugly and I don't see the interactivity of these really offering that much more to a user than copy / pasting into their own notebook or terminal session. I'm also not sure what kind of maintenance hosting those notebooks directly would incur (if any). W.r.t. the website, I think the design is moving in the right direction. The code rendering is a little large but it sounds like you are already on that. Looking forward to the next round of updates! |
Thanks for the feedback, certainly valid input. Some quick responses:
We could do it in notebooks and integrate into sphinx (e.g. using nbsphinx), but we planned to convert the tutorials content (when more polished) into the documentation pages itself (i.e. rst). With the update of the website, we have the bootstrap classes available to customize the looks and we can provide some custom css (e.g. proper styling of the questions, now in the notebook as a quote) to provide integration with a new getting started page. |
Yes, that big code rendering is a left-over of some of the css of the old website (which is already being removed in a PR, that should fix that issue). More feedback on usability is very welcome on the theme as well, but let's keep that in #15556 |
@stijnvanhoey cool all sounds good - thanks for the response |
Small note here that we made PRs for each notebook (still on the separate repo) which can be reviewed with the ReviewNB app (which makes it easier to review and comment on notebooks): https://github.com/stijnvanhoey/pandas-getting-started-tutorials/pulls |
Read most of the tutorials, and my feedback is the same as before. I wouldn't personally continue after the first tutorial, since it's too theoretical, and I don't really feel engaged or curious about what's being explained.
I read half of the tutorials, and the module name |
this is a convention (so not required) and is the most common import pattern and so appropriate to include |
@datapythonista, I understand your point; this is indeed not required and just a community 'convention'. Whether we actually include it or not is your (core dev) call. I'll adjust the text asap. With respect to the first tutorial, I'll try to convert it to a more engaging version. As mentioned earlier, the 'dtype' details will be removed to decrease its theoretical level. Feel free to add more comments in the repo itself. Thanks for reviewing the material! |
I think I have said this before as well, but: you are welcome to start a specific discussion about this. But as long as we didn't have that, the standard we use in our docs is the status quo which is |
I agree on the first notebook that we should try to make it a bit more attractive. I also commented about that on the review PR: stijnvanhoey/pandas-getting-started-tutorials#2. For us, the goal of those getting started pages is to teach the readers some basic concepts about pandas, to give them a better understanding of how to think about pandas. @datapythonista So if you have some more concrete feedback on what you would do differently, or what you would otherwise cover in the first notebooks, that would be very welcome. If we still want to change the concept more fundamentally (which I have the feeling you are hinting on, Marc? Although it is not fully clear what your alternative is), we should decide that now, to decide what we think is best for Stijn to spend the last 3 days on (over the coming weeks). So for that it would also be welcome if some other people can chime in on this with their thoughts. |
Sorry for the noise with the alias. I agree that's a topic for a separate issue, sorry about that. I think we've already got too many open discussions with the web, so I'll create that issue at a later stage. I don't have more specific feedback. I don't think restarting from zero with a different approach is an option now, with just 3 days of work left. Just wanted to express that for my case personally, and the way I learn, I don't think the notebooks would be useful for me. I don't have any other feedback on that more than what we already discussed in the past. I agree having feedback from several people would be ideal, even more if they are beginners, the target of those documents. Sorry I can't be of much help in this. |
Hello all! I mentioned previously that I'd be happy to review the draft notebooks. Although I'm not a pandas beginner, I've been teaching pandas to beginners for many years and so I am familiar with the beginner perspective. Sorry that it has taken so long, but I've now taken a thorough look through the draft notebooks and will post my specific comments via ReviewNB. However, I also have some high-level comments:
Hope that is helpful! I'll try to get all of my specific comments into ReviewNB today. |
@justmarkham thanks a lot for looking at it and this feedback! That is really valuable and very much appreciated. I created the PRs for the two remaining notebooks. |
One other aspect to discuss: in what kind of format do we want to include this? The plan has always been to, at some point (i.e. now), convert the notebooks in the separate repo to rst, clean them up (convert to use the ipython directive, etc), and include them in that way in our sphinx docs. But there are actually more options, also in light of @TomAugspurger's recent exploration of putting our docs on binder (#27514, which converts our rst files to notebooks using sphinxcontrib-jupyter). For now, we will go with rst as planned, but I'm mentioning it here in case we want to discuss this (although a new issue might be better suited in case we want a general discussion about this). |
With respect to the conversion of the tutorials towards the documentation (getting started) pages, we will css-style the recurring elements (links to user guide, remember sections, ref to raw data,...) according to the following example html: https://stijnvanhoey.github.io/pandas-getting-started-tutorials/. All adjustments are bootstrap (already loaded) classes with some CSS. In practice: the |
What happened with this? Was this finally discontinued? |
The setup is tested on the new sphinx theme; I'm currently preparing a PR for pandas itself. It can be expected in the coming week.
On Fri, 10 Jan 2020 at 18:01, Marc Garcia <notifications@github.com> wrote:
… What happened with this? Was this finally discontinued?
|
As currently prepared together with @jorisvandenbossche, the following document is a proposal on how to adjust the getting started page and section of the documentation: https://docs.google.com/document/d/1Rc_eql5KLrdf0c582KyWfs2ADVNxbJy4jfosnqdrVak/edit?usp=sharing
The general idea is to split the current 10 minutes into 10 x 1' topics of Pandas. Input is very welcome on: