
🧐 Data exploring/inspecting #615

Closed
eliorc opened this issue Nov 14, 2020 · 12 comments · Fixed by #616
Labels
emoji Gitmoji proposals.

Comments

@eliorc
Contributor

eliorc commented Nov 14, 2020

Hello @carloscuesta 😎!

  • Emoji: 🧐
  • Code: :monocle_face:
  • Description: Data science is a fast-growing field in the software industry. Most data science practices map onto existing gitmojis (new architecture -> new features, etc.), but exploring and inspecting data, including exploratory data analysis, has no parallel in classic software engineering. Since writing notebooks to inspect input data, features, model results and the like is very common in data science, I think it deserves its own gitmoji :)

About testing, I read the contribution guide and haven't seen anything about that... So how should I go about testing?

@eliorc eliorc changed the title 🧐 face_with_monocle: Data exploring/inspecting 🧐 fmonocle_face: Data exploring/inspecting Nov 15, 2020
@eliorc eliorc changed the title 🧐 fmonocle_face: Data exploring/inspecting 🧐 monocle_face: Data exploring/inspecting Nov 15, 2020
@vhoyer
Collaborator

vhoyer commented Nov 17, 2020

> About testing, I read the contribution guide and haven't seen anything about that... So how should I go about testing?

Sorry, what do you mean?


Also, I'm not too knowledgeable about data science, and I don't understand why data exploration/inspection would need a commit. And if you mean adding another graph and/or another paragraph to a notebook, then couldn't we use 📝 or ✨ ?

@eliorc
Contributor Author

eliorc commented Nov 18, 2020

> Sorry, what do you mean?

I saw that my PR did not pass all the tests, so I was wondering how to make them pass, and I mentioned that I didn't find anything about it in the documentation.


As for why we need a gitmoji for data exploring, let's look at how data exploration is done.
In data science, we explore data through code, usually in Jupyter notebooks - see example.

These kinds of explorations are made in parts, much like any other piece of code. So you might want to commit things like "Added distribution plots for feature X" and then maybe "Exploration of textual data points" and stuff like that.

As I see it, there is no existing gitmoji equivalent, since these are not new "features" but rather the act of exploring existing ones - this also extends to exploring the results of a model's predictions and similar things.

To sum up, exploring/inspecting does not introduce new features; it is a code-oriented way to explore existing ones.

@johannchopin
Collaborator

Hey @eliorc 👋 Thanks for opening a PR. Like @vhoyer, I don't know a lot about the data science workflow and can't really understand what you mean by 'data exploring'. Is it just about new code that does something? Does it create new files? What exactly happens when you do data exploring, in terms of code or file manipulation?

@vhoyer
Collaborator

vhoyer commented Nov 18, 2020

So, about the tests: the thing that is breaking is probably the snapshots. I will assume you are not familiar with this kind of test, and I like the idea of adding this info to the repo, so I will do that later (mostly because this is not the first time it has raised doubts haha). To resolve your test errors, open the repo and run `npm install && npm test -- -u`, then commit the changes, probably using :camera_flash:. Of course, to do so you will need node installed (which already installs npm alongside it).


I see, I get it now, but why wouldn't you consider adding new blocks in notebooks a new feature? It's adding a table or a graph to the document that was not there before, right? Along the same lines, what would constitute a feature in those cases, in your opinion?

@eliorc
Contributor Author

eliorc commented Nov 18, 2020

@vhoyer

We consider features to be new "functionalities" created in the project: for example, a new model architecture, a new preprocessing pipeline, or a new evaluation scheme. Basically, these are things that can be reused in the future once they have been developed.

Explorations/inspections are a one-off action, and definitely not reusable, as the insights drawn from them are only relevant to the data they were made upon. You might reuse the features you developed to create those insights - but the usage of those features in order to generate the insights is a different thing.

If you want to find the equivalent in classic engineering, I would say that data explorations/inspections are like unit test runs. A run is an action that is executed at a certain point in time and has results - the test runs are not features, just like the explorations are not (and again, maybe you develop features to later be used in the tests, but that's a different thing).
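
To illustrate the distinction, here is a minimal sketch. The dataset and the helper names `summarize` and `histogram` are made up for illustration and do not come from any real project. The two functions are reusable "features", while the lines below them are a one-off exploration whose output only means something for this particular data.

```python
from collections import Counter
from statistics import mean, median


def summarize(values):
    """Reusable helper (a "feature"): basic summary statistics."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


def histogram(values, bin_width):
    """Reusable helper: count values per bin, keyed by the bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)


# One-off exploration: tied to this particular (made-up) dataset.
ages = [23, 25, 31, 35, 35, 41, 47, 52, 58, 64]
summary = summarize(ages)
bins = histogram(ages, bin_width=10)

print(summary)
for lower in sorted(bins):
    # Text-only stand-in for the distribution plot a notebook would show.
    print(f"{lower}-{lower + 9}: {'#' * bins[lower]}")
```

Under this view, a commit that adds `summarize` would be a feature, while a commit that adds the exploration at the bottom would be the kind of change the proposed gitmoji is meant to mark.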

@eliorc
Contributor Author

eliorc commented Nov 18, 2020

@johannchopin what happens is that you write some code in an interactive notebook, execute it, and most likely create plots and even write some markdown to convey your insights. You will usually do it in logical blocks. For example, let's say you have developed 100 different models - you will want to inspect and explore their predictions, looking for their weak points, to get a clearer understanding of what you should do in your next research iteration. I also explained this two comments ago, with examples.
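
A rough sketch of what one such block in a notebook might look like, assuming a classification task. The labels, the model names and the `error_rate_by_class` helper are all hypothetical, and only two models are shown instead of 100.

```python
from collections import defaultdict

# Hypothetical data: true labels and each model's predictions.
labels = ["cat", "dog", "cat", "bird", "dog", "bird"]
predictions = {
    "model_a": ["cat", "dog", "dog", "bird", "dog", "bird"],
    "model_b": ["cat", "cat", "cat", "bird", "cat", "bird"],
}


def error_rate_by_class(y_true, y_pred):
    """Fraction of misclassified examples for each true class."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth != guess:
            errors[truth] += 1
    return {cls: errors[cls] / totals[cls] for cls in totals}


# Inspect every model and record the class it handles worst.
weak_points = {}
for name, preds in predictions.items():
    rates = error_rate_by_class(labels, preds)
    weak_points[name] = max(rates, key=rates.get)
    print(name, rates, "weakest on:", weak_points[name])
```

Nothing here is reusable functionality in the sense discussed above: the block exists to produce an insight about these specific models and this specific data.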

@vhoyer
Collaborator

vhoyer commented Nov 18, 2020

Ok, I agree with its inclusion in gitmoji, will take a look at your PR later, what do the rest of the gang think about it?

@johannchopin
Collaborator

Ok I also agree with the integration of this emoji. But what would be the description exactly? Is there a better emoji for that because IMO 🧐 isn't explicit enough.

@eliorc
Contributor Author

eliorc commented Nov 19, 2020

> Is there a better emoji for that because IMO 🧐 isn't explicit enough.

I have more candidates, but the monocle is my favorite - here are all of them, sorted by my personal preference:

  1. 🧐 :monocle_face:
  2. 🕵️ :detective:
  3. 🔬 :microscope:

@eliorc
Contributor Author

eliorc commented Nov 25, 2020

> But what would be the description exactly?

About that, I think "Data exploration/inspection" covers the use cases of both exploring (EDA and the like) and inspecting (inspecting model results, comparisons, etc.).

@vhoyer
Collaborator

vhoyer commented Nov 25, 2020

So let's vote on the emoji 😅

| vote | emoji |
| --- | --- |
| 🚀 | 🧐 :monocle_face: |
| ❤️ | 🕵️ :detective: |
| 🎉 | 🔬 :microscope: |

@carloscuesta carloscuesta added the emoji Gitmoji proposals. label Dec 10, 2020
@carloscuesta
Owner

carloscuesta commented Dec 10, 2020

Following up on @vhoyer's comment,

It seems that we already have a winner 🎉

Just saw you already opened the PR @eliorc. We should update it with the results of the poll!

Thanks 🙌🏻

@carloscuesta carloscuesta changed the title 🧐 monocle_face: Data exploring/inspecting 🧐 Data exploring/inspecting Dec 10, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 12, 2020
4 participants