
Student Proposal : Support for elm notebooks #14

Open
shivanshxyz opened this issue Apr 10, 2021 · 0 comments

Name - Shivansh Yadav
Email - shivanshyadav20@gmail.com
Github - https://github.com/shivanshxyz
Possible mentor - @razzeee
Slack - Shivansh Yadav

Summary:

The project involves building a data science ecosystem for elm by creating IPython-style notebooks that support multiple code and markdown cells inside a single file.

Project Plan

Project Approach
Initially, the project suggested using the VS Code Notebook API to create notebooks for elm. But as I researched its documentation, I realised it currently has several disadvantages that make it an infeasible approach to pursue.

❌ Disadvantages of the VS Code API -

  • The Notebook API is still a proposal under development, which means it is only available in VS Code Insiders
  • It still has a lot of bugs and is prone to crashing
  • It will not work in other IDEs

✔️ Advantages of using Jupyter kernels -

  • Better community support (Jupyter kernels are already in use for many languages, including Python, TypeScript, JavaScript, C/C++, Julia, etc.)
  • More robust than VS Code API notebooks as of now
  • Supported by the majority of modern IDEs
  • In the most recent presentation by the VS Code developers working on the Notebook API (presentation recording), they mentioned that it will also support existing Jupyter kernels out of the box in the future. So we will get all the benefits of the VS Code API (like better load times and better extension support) in our Jupyter kernel as soon as it reaches a stable VS Code release.

So, after this analysis, I suggest that creating our own Jupyter kernel for elm is the better idea: the VS Code API will support it natively in the future, and people will be able to use our Jupyter kernel with a wide range of other IDEs.

How does it work?

(Figure: how the Jupyter frontend and kernel communicate)

The Jupyter architecture is very flexible. The frontend and the language kernel communicate with each other asynchronously, and the architecture allows multiple frontends to talk to the same kernel. So we can build a single Jupyter kernel and create multiple frontends for it, either using Jupyter's default web view or any of the native frontends in different IDEs, including the proposed VS Code Notebook API whenever it gets a stable release.
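The Jupyter documentation describes a "wrapper kernel" pattern that fits this design. A minimal sketch of what such a kernel class could look like, using ipykernel's `kernelbase.Kernel` API (the class name `ElmKernel` and all details inside `do_execute` are hypothetical placeholders, not an actual implementation):

```python
# Sketch of a wrapper kernel following ipykernel's kernelbase API.
from ipykernel.kernelbase import Kernel


class ElmKernel(Kernel):
    # Metadata that Jupyter frontends read from the kernel.
    implementation = "elm_kernel"
    implementation_version = "0.1"
    banner = "elm notebook kernel (sketch)"
    language_info = {
        "name": "elm",
        "mimetype": "text/x-elm",
        "file_extension": ".elm",
    }

    def do_execute(self, code, silent, store_history=True,
                   user_expressions=None, allow_stdin=False):
        # A real implementation would hand `code` to elm-compiler here.
        # For this sketch we only echo the cell source back on IOPub.
        if not silent:
            self.send_response(self.iopub_socket, "stream",
                               {"name": "stdout", "text": code})
        return {
            "status": "ok",
            "execution_count": self.execution_count,
            "payload": [],
            "user_expressions": {},
        }
```

Such a kernel would be launched with `IPKernelApp.launch_instance(kernel_class=ElmKernel)` and registered with Jupyter via a kernelspec, after which any compatible frontend can connect to it.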

The underlying mechanism of the Jupyter kernel is that it takes a code cell, puts it into a temporary file, and then compiles that file using elm-compiler. As long as the kernel is live, it keeps caching data from earlier code cells for later cells in the notebook to use.
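The cell-to-temp-file step above could be sketched like this (a simplification, assuming an `elm` binary on PATH and an already-prepared elm.json project; the helper names `build_module` and `compile_cells` are hypothetical, with the cache modelled as a list of previously executed cell sources):

```python
import subprocess
import tempfile
from pathlib import Path


def build_module(cells, module_name="Notebook"):
    """Combine the cached sources of all executed cells into one module,
    so later cells can refer to definitions from earlier ones."""
    header = f"module {module_name} exposing (..)"
    return "\n\n".join([header] + list(cells))


def compile_cells(cells, elm_binary="elm"):
    """Write the combined module to a temporary file and run elm make."""
    project = Path(tempfile.mkdtemp())
    src_dir = project / "src"
    src_dir.mkdir()
    (src_dir / "Notebook.elm").write_text(build_module(cells))
    # elm make must run inside a directory with an elm.json,
    # which this sketch does not create.
    return subprocess.run(
        [elm_binary, "make", "src/Notebook.elm", "--output=elm.js"],
        cwd=project, capture_output=True, text=True,
    )
```

Because the whole cache is recompiled each time, a later cell like `y = x + 1` sees the `x` defined in an earlier cell; a production kernel would also need error reporting and smarter invalidation.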

Jupyter communicates with its kernel backend using the ZeroMQ library. ZeroMQ lets us establish communication sockets over transports such as TCP, IPC, or in-process (inproc). Along with its transport, each socket has a type that determines how it communicates and which socket types it can talk to.
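As a toy illustration of socket types (assuming pyzmq is installed): the real Jupyter wire protocol uses ROUTER/DEALER sockets and signed multipart messages, so this REQ/REP round trip over the inproc transport is only a simplified sketch, with made-up message bodies:

```python
import zmq

# One shared context is required for the in-process (inproc) transport.
ctx = zmq.Context.instance()

# "Kernel" side: a REP socket that answers one request at a time.
kernel = ctx.socket(zmq.REP)
kernel.bind("inproc://shell")

# "Frontend" side: a REQ socket that must alternate send and recv.
frontend = ctx.socket(zmq.REQ)
frontend.connect("inproc://shell")

# A toy execute_request/execute_reply exchange.
frontend.send_json({"msg_type": "execute_request", "code": "x = 1"})
request = kernel.recv_json()
kernel.send_json({"msg_type": "execute_reply", "status": "ok"})
reply = frontend.recv_json()

frontend.close()
kernel.close()
```

The REQ/REP pair enforces a strict request-then-reply rhythm, which is why Jupyter itself uses the more flexible ROUTER/DEALER types for its shell channel and a PUB socket for broadcasting output to every connected frontend.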

Benefits

The benefits of elm notebooks will not be limited to data science; they have many other use cases:

  • Creating interactive tutorials (as of now, the elm guide and the elm playground are separate, but by using a notebook as a live playground we can unify them, combining markdown and code cells into an interactive notebook where newcomers can learn and experiment with elm on the go)
  • Programming and Computer Science
  • Statistics, Machine Learning and Data Science
  • Mathematics, Physics, Chemistry, Biology
  • Earth Science and Geo-Spatial data
  • Linguistics and Text Mining
  • Signal Processing
  • Engineering Education

General Objectives and Project Features

At the end of the GSOC period, the notebook should have at least these features/libraries working successfully:

  • Ability to create elm code cells.
  • Ability to add markdown cells for better documentation
  • Ability to cache data from previous cells
  • Ability to run standard elm core packages
  • Support for plot graphs/charts for data visualisation
  • Support for maps for geo-location tasks
  • Support for carrying out mathematical tasks using libraries like brainrape/elm-mathml , elm-explorations/linear-algebra , ianmackenzie/elm-geometry etc.
  • (Optional) Support elm ports to use javascript data science libraries that are not present in elm ecosystem as of now

The underlying goal of the project is not just creating an elm notebook but an entire data science ecosystem that helps new users get on board easily and makes it easier for existing Python data science developers to switch to elm. This involves publishing a well-detailed blog/guide on getting started with data science using elm. In this guide, I will curate everything a newcomer needs to learn data science fundamentals with elm, including visualisations, maps, and mathematical tasks. The guide will also curate a list of elm alternatives to popular Python data science libraries, so that it is easier for Python developers to move into the elm ecosystem.

Timeline

May 17 - June 7 (Community bonding)

Get to know the mentor(s) and learn more about the community. At the same time, I will learn elm in more depth. Even though this project does not require very extensive elm experience, the goal of the program is to find long-term contributors.

Also, I will communicate with the project mentors to get a more specific picture of the goals and refine them ahead of the evaluations.

June 8 - June 22

The goal for this period is to get started with a barebones kernel wrapper around elm-compiler and learn in depth how the kernel works, by building basic code and markdown cells and learning how the frontend makes calls to the kernel. At the same time, I will approach other kernel authors on GitHub to get their opinions on the project and learn about the challenges they faced that I will need to handle.

June 23 - July 11

During this time, I will work on getting all the standard elm packages running on the notebook, running tests and fixing bugs continuously.

July 12 - July 16 (First Evaluation)

During this time, I will get my work reviewed by the mentor(s) and publish a blog to describe my experience working on the project.

July 17 - July 31

During this time, I will focus on supporting all the essential packages required for data science, including visualisations, maps, and mathematical tasks. Optionally, I will also try to support elm ports so that JavaScript data science libraries that are not yet available for elm can be used.

August 1 - August 15

This period is buffer time in case something goes wrong or the project falls behind its timeline.

As we all know, documentation is a very important aspect of any project that cannot be ignored. For this, I will publish the well-detailed blog/guide described above on getting started with data science using elm, covering data science fundamentals, visualisations, maps, and mathematical tasks, along with the curated list of elm alternatives to popular Python data science libraries.

August 16 - August 23 (Code Submission and Final Evaluation)

I will get my final project reviewed by mentor(s) and the community members to get feedback. I will also publish a blog to describe my GSOC experience and what challenges I faced while developing the kernel.

Tech Stack required

Python - for this project, intermediate Python experience is a must.

Jupyter kernel - I have not worked on a Jupyter kernel before, but I am confident I can learn it on the go, as there is official documentation for Jupyter kernels as well as huge community support.

ZeroMQ - the ZeroMQ library will be used to establish communication between our frontend and our kernel.
