data set switching in pandas section #69

Open
brownsarahm opened this issue Jun 14, 2018 · 5 comments
Comments

@brownsarahm
Contributor

brownsarahm commented Jun 14, 2018

The SAFI_results.csv dataset is used in the OpenRefine lesson and in much of the analysis in this lesson, but the first two parts of the pandas section use the SN7577.tab dataset instead.

Is there a reason to switch contexts? Would it be better to use one dataset throughout?

@brownsarahm changed the title from "data set switching" to "data set switching in pandas section" on Jun 15, 2018
@katrintirok
Contributor

katrintirok commented Aug 13, 2018

After preparing the social science lesson to teach a workshop, I am wondering the same thing. This dataset switching is very confusing, especially since the SN7577.tab data are never(?) introduced. Perhaps the entire lesson should be taught with the same data, and the same data should be used throughout the workshop, which usually binds everything together nicely.

Suggested solution: change episodes 8, 9, 11, 12, and 14 to use the SAFI data.

(Note: the SQL lesson also uses the SN7577 data instead of the SAFI data.)

What does the Curriculum Advisory Committee (CAC) think about this issue?
From the CAC meeting minutes:
SQL lesson: "This lesson needs to be updated to use the SAFI dataset so that it is consistent with the rest of the workshop. It currently uses a database called “SN7577”. To show the advantages and power of using SQL, the data should be split into multiple tables."

@brownsarahm
Contributor Author

@katrintirok and I taught using the SAFI dataset for all of the pandas and matplotlib episodes. Should we contribute those changes?

@vinisalazar
Contributor

vinisalazar commented Jun 24, 2021

Hi,

We are currently teaching this lesson in a workshop and have also considered this. I ended up teaching episodes 8 and 9 using the SAFI dataset and would support switching those episodes to it.

However, the SN7577 dataset was well suited to episodes 11 and 12, because its tables seemed simpler to merge (fewer columns) than the SAFI ones. I would support keeping the SN7577 dataset for episodes 11 and 12.

Best,
Vini

@brownsarahm
Contributor Author

We created subsets of SAFI for episodes 11 and 12. They're available in the files repo we made so learners can download exercises and data in advance, and how we used them is visible in my fork.

@fnattino

Hi everyone, we taught this lesson last week and also found that dealing with multiple datasets was somewhat confusing. Is there any plan to merge @brownsarahm's work on consistently using the SAFI dataset throughout the lesson into the main repository?
