This issue was moved to a discussion.

Discuss: File format in the .NET Interactive VS Code extension #467

Closed
jonsequitur opened this issue May 22, 2020 · 20 comments

Comments

@jonsequitur
Contributor

jonsequitur commented May 22, 2020

The primary reason for the experimental file format supported by .NET Interactive Notebooks is that there are a few design decisions to be worked out to enable the Microsoft Python extension for Visual Studio Code and the .NET Interactive Notebooks extension to work well together. Until a design is agreed upon, we want to avoid introducing potential compatibility issues into .ipynb.

Here are some of the challenges:

  • There is not an agreed-upon convention for polyglot notebooks in the .ipynb file format that also takes into account:
    • single-process polyglot kernels, and
    • multiple kernels (whether polyglot or not) within a single notebook.
  • There isn't currently a clear way to indicate which VS Code extension should handle a given .ipynb file, and Python clearly takes precedence.

I'd like to get people's thoughts on the experimental .dib file format (example below) that the .NET Interactive Visual Studio Code extension supports.

.ipynb is also supported, as well as conversion between the two.

These are the goals we've had in mind that led to the current state of the design:

  • It should be trivial to author and read in a simple text editor without special tooling and without concern about escaping, e.g. not JSON, XML, etc.

  • It should be copy-pasteable between different contexts, e.g. a bare script file, a notebook cell, or another .NET Interactive host (e.g. https://github.com/CodeConversations/CodeConversations)

  • It should be able to contain multiple languages, including languages not yet supported by .NET Interactive.

  • It should be amenable to a good tooling experience, including completions and other language services even when multiple languages are present.

  • It should be able to be opened as a notebook (as currently implemented by the .NET Interactive VS Code extension).

  • It should be executable as an automation script. In the simple case this need not be a polyglot script, but it should be able to take advantage of the full capabilities of .NET Interactive, e.g. magic commands, NuGet support, extensibility, variable sharing.

    • When in automated mode, it should be able to advertise and consume command line arguments in a readable, simple-to-author manner.

    • When in interactive mode (e.g. notebook, REPL), the command line inputs should be able to be collected from the user via a prompt or trivially configured inline in the code.

Here's a simple example of the format:

#!powershell

Write-Output "Hello from PowerShell!"

#!fsharp

let hello = "Hello from F#"
hello |> Console.WriteLine

#!csharp

#r "nuget:Humanizer,2.8.1"
using Humanizer;

#!share --from fsharp hello

Console.WriteLine(hello.Replace("F#", "C#").Transform(To.TitleCase));
@zyzhu
Contributor

zyzhu commented May 22, 2020

ipynb is the common notebook file format for all languages (Python, R, Julia etc.) on Jupyter.

If dib is the file format supported in VSCode, is it only limited to .Net languages?

If it's intended to be more widely adopted, is this file format extendable and easily convertible so that other language communities can extend their notebook experience in VSCode?

@jonsequitur
Contributor Author

If dib is the file format supported in VSCode, is it only limited to .Net languages?

.dib isn't language-specific and other languages can be added dynamically by adding subkernels.

If it's intended to be more widely adopted, is this file format extendable and easily convertible so that other language communities can extend their notebook experience in VSCode?

The file format can be extended to the degree that .NET Interactive itself can be extended, whether programmatically within the script or via a NuGet extension.

@dsyme
Contributor

dsyme commented May 22, 2020

I think that's a good list of technical goals.

It's fine to contemplate a new file format but it should be 100% optional until it is an established thing - which roughly means online and client tooling for the format is very widespread (including scripting executors for it), stable and trusted, and the format has been emotionally accepted by most potential users especially in team or collaborative settings. Also TBH Microsoft have a reputation when it comes to file formats, we have to be aware of that and be cautious here.

Note that acceptance of new file formats is a social/network/human/emotional/trust thing - these things are not technical things under our control though we can make technical decisions that will affect whether effective acceptance is eventually achieved (and I don't see any particular technical problem with the technical decisions above). However by their nature all new file formats start at "acceptance level of minus 10000" sadly.... I'm speaking from experience with .fsx as a programming/scripting format - it now has near universal execution tooling via dotnet fsi and fairly widespread tooling but it has taken >10 years to get to that point. And that's not including other forms of tooling.

I think this means loading, saving, editing and executing .ipynb is necessary in very many scenarios (potentially by internally converting to this format on load, but that should not be user-visible).

@jonsequitur
Contributor Author

The file format is optional. The benefits are related to polyglot functionality and automation (e.g. running in production), neither of which are directly supported by Jupyter. Not all users will care about these goals, but many do, and they're becoming more common as notebooks become popular in new areas. For these cases, this format might provide some convenience.

The file format arose organically because it's just the .NET Interactive submission format: magic commands delineating blocks of code in potentially different languages. Since .NET Interactive is submission-centric rather than cell-centric, the following (one cell):

#!pwsh
    < PowerShell code >
#!fsharp 
    < F# code >

...is equivalent to this (two cells):

#!pwsh
    < PowerShell code >

#!fsharp
    < F# code >

This is why the file format started here. We're simply concatenating all of the cells.
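The "concatenate all of the cells" idea can be illustrated with a small Python sketch. It is not the .NET Interactive parser: the `KERNEL_NAMES` set and both helper functions are assumptions made up for this illustration, and any other `#!` line (such as `#!share`) is simply left inside its cell.

```python
# Kernel names treated as cell delimiters. This set is an assumption for the
# sketch; other "#!" lines (for example "#!share") stay inside their cell.
KERNEL_NAMES = {"csharp", "fsharp", "pwsh", "powershell", "html", "javascript"}


def split_cells(text):
    """Split one submission into (kernel, code) pairs at kernel-selector lines."""
    cells = []
    kernel, lines = None, []
    for line in text.splitlines():
        stripped = line.strip()
        name = None
        if stripped.startswith("#!") and len(stripped) > 2:
            name = stripped[2:].split()[0]
        if name in KERNEL_NAMES:
            if kernel is not None:
                cells.append((kernel, "\n".join(lines).strip()))
            kernel, lines = name, []
        elif kernel is not None:
            # Lines before the first kernel selector are ignored in this sketch.
            lines.append(line)
    if kernel is not None:
        cells.append((kernel, "\n".join(lines).strip()))
    return cells


def join_cells(cells):
    """Concatenate cells back into a single submission."""
    return "\n\n".join(f"#!{kernel}\n\n{code}" for kernel, code in cells)


one_cell = "#!pwsh\nWrite-Output 'hi'\n#!fsharp\nprintfn \"hi\""
cells = split_cells(one_cell)
print(cells)
print(join_cells(cells))
```

Under these assumptions, a single submission containing two kernel selectors and two separate cells produce the same list of (kernel, code) pairs, which is the equivalence described above.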

@jonsequitur
Contributor Author

I think this means loading, saving, editing and executing .ipynb is necessary in very many scenarios (potentially by internally converting to this format on load, but that should not be user-visible).

I agree entirely.

Assuming we can sort out the default experience for a given file extension such that we're not in conflict with the VS Code Python extension's Jupyter support, an intuitive workflow might be to allow you to open and work with either format, and when you save, it saves it back to the same file in the same format. You never have to think about the other file format until or unless you want to explicitly convert between them.

@zyzhu
Contributor

zyzhu commented May 22, 2020

I apply this snippet in Jupyter config jupyter_notebook_config.py
https://gist.github.com/jbwhit/881bdeeaae3e4128947c#file-post-save-hook-py

Then Jupyter automatically saves the Python notebook into a .py file and the IFSharp notebook into a .fs file whenever I run any cells. It happens behind the scenes. It also outputs the .html file. I usually share my results with nontechnical people as an html file, since you can open it directly from email. Essentially it just concatenates all code cells into one file.

Isn't this functionality already working in Jupyter? Why do we need another file format? I can only imagine the use case for polyglot notebook. It's hard to decide whether it's .cs, .fs or .ps1

@jonsequitur jonsequitur changed the title Discuss: File format Discuss: File formats May 22, 2020
@jonsequitur
Contributor Author

This looks like a good workflow and for Jupyter it's nice and simple. You don't need an alternative file format and it's not my intention to give the impression that we're requiring one.

I can only imagine the use case for polyglot notebook. It's hard to decide whether it's .cs, .fs or .ps1

Yes, the alternative file format helps with polyglot scenarios (which not everyone has). But people might also find benefits when editing with plain text editors or using code review tools. These considerations motivated a similar approach in the VS Code Python extension allowing execution of "cells" within a .py file.

But this isn't a language-specific tool, so we wanted to explore a format that doesn't bias toward one language. Conceptually, .NET Interactive's polyglot support includes HTML and JavaScript and might grow to other languages like SQL, bash (#465), and so on. These aren't just magic commands but are implemented as in-process subkernels. These are user-extensible. Here's a sample (.dib | .ipynb) that includes PowerShell, C#, HTML, and JavaScript.

@dsyme
Contributor

dsyme commented May 22, 2020

Assuming we can sort out the default experience for a given file extension such that we're not in conflict with the VS Code Python extension's Jupyter support...

It would be quite nice if VSCode simply prompted in this case. Any idea what actually happens?

Also I guess VSCode can't probe into the file to check if an extension claims ownership?

@brettfo
Member

brettfo commented May 23, 2020

@dsyme I just tried it and currently our extension steals the association; there's no prompt and no way to re-associate with the Python extension (short of uninstalling), and since we can really only handle a subset of Jupyter notebooks (i.e., those using the .NET Interactive kernel), we shouldn't take the association outright. There's an open issue at microsoft/vscode#94408 that tracks this. As soon as this is fixed then we can safely associate with .ipynb files.

As for probing, for text files an extension can say whether it takes the association outright or if it should simply be added to the consider list, but nothing yet for notebooks.

@jonsequitur jonsequitur changed the title Discuss: File formats Discuss: File format in the .NET Interactive VS Code extension Jun 13, 2020
@whatevergeek

I kinda like the dib approach...
Thanks for exploring this...

This enables me to run multiple languages in one notebook file. The other thing that I like about it is that it doesn't require me to install python/jupyter.

Please keep dib as it's a lean approach (we'll convert files to ipynb if needed..e.g. run in jupyter ecosystem)...

@bwilsonms

I think it might be better to work on how to call the current line or cell magics from within the already established .ipynb extension. That is already well established by Jupyter and their community. I'd rather have the ability to call the different kernels from within one, as an alternative to being tied to only one.

@jonsequitur
Contributor Author

@bwilsonms The question of magic commands is orthogonal to the file format, I believe.

We do have data sharing across kernels on our roadmap, but not direct invocation, which gets a bit thorny from a dependency standpoint. In effect, the Jupyter frontend (or e.g. VS Code or Azure Data Studio) would need to be aware of the different kernels and route submissions accordingly, so it's a bit out of .NET Interactive's scope to be aware of other kernels. There's also a syntax issue. The reason .NET Interactive's magic commands use the #! prefix is that the IPython magic command syntax would be ambiguous in C#, F#, and PowerShell, where % is an operator. This follows the Jupyter guidance for kernel developers on magic commands.

In case I've misunderstood, can you give a more specific example of what you have in mind?

@jhoneill

jhoneill commented Sep 1, 2020

I've come to this as a PowerShell person who wants to author PowerShell notebooks, preferably with VS code.

  • .NET interactive lets me create an ipynb notebook through Jupyter, using a local web server, a stack of python bits and so on. The learning curve is not too bad, but the end product is kind of ugly.
  • ipynb files are opened with the Python extension which works up to a point, but adding a new cell (for example) fails if the notebook uses a .NET language. And is also kind of ugly.
  • DIB files have a glaring omission which inhibits their use - not saving results. (a workaround is to save as ipynb)

The current experience is you can Open a notebook and Run some code which generates output; Closing the notebook will prompt to Save changes, but later Re-opening the notebook shows the changes were NOT saved. That's referenced in #500 and it's the difference between an interesting proof of concept and something which can be used in the real world.

I'm not clear what the objective for DIB files is.
Avoid competing with the Python extension for ownership of ipynb files ? Using the same format with a different extension would do that.
An experimental way of creating polyglot files ? You have solved that for ipynb export by telling Jupyter everything is c# and using the language magic commands for each block of code.
The top of the list of aims you have

"It should be trivial to author and read in a simple text editor without special tooling and without concern about escaping, e.g. not JSON, XML, etc."

And that's not an aim I'd normally take issue with. But this is a format for doing notebooks in VSCode so "edit anywhere" is not the priority it might be for other types. JSON, XML etc lend themselves to machine parsing, (type .\test.ipynb | ConvertFrom-Json).cells gives access to the source, outputs, etc. Given that files will be edited almost exclusively in VSCode - at least in the short term, ease of building a notebook as the output of one process, or reading it as the input to another, should carry more weight than ease of hand editing the files.
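The machine-parsing point can be shown outside PowerShell as well. Below is a minimal Python sketch using only the standard library; the `summarize_cells` helper and the tiny sample notebook are made up for illustration, and the field names (`cells`, `cell_type`, `source`, `outputs`) are those of the nbformat v4 JSON layout.

```python
import json


def summarize_cells(notebook_text):
    """Return type, source, and output count for each cell of an .ipynb document."""
    notebook = json.loads(notebook_text)
    summary = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        summary.append({
            "type": cell.get("cell_type"),
            "source": source,
            "output_count": len(cell.get("outputs", [])),
        })
    return summary


# A tiny hand-written notebook used only for this example.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"], "outputs": [{"output_type": "execute_result"}]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})

for cell in summarize_cells(sample):
    print(cell["type"], cell["output_count"], repr(cell["source"]))
```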

@jonsequitur
Contributor Author

@jhoneill If you want the result stored in the output file, I'd recommend using .ipynb. We're not trying to reinvent it, and we're improving direct support for it so that the VS Code Python support and .NET Interactive don't interfere with one another, and so that you also won't need to go through Jupyter just for file writing. Hopefully this addresses the friction you're seeing.

Avoid competing with the Python extension for ownership of ipynb files ? Using the same format with a different extension would do that.

We explicitly want to be interoperable with the larger ecosystem of tools that support .ipynb, for example Jupytext, nbdime, and GitHub's notebook display capabilities. In any case this issue of sharing .ipynb has been fixed and should be available in .NET Interactive shortly.

An experimental way of creating polyglot files ? You have solved that for ipynb export by telling Jupyter everything is c# and using the language magic commands for each block of code.

I think we've only partially solved it, as the magic command approach isn't great for tooling. For example, with this approach, the frontend doesn't know the language of the cell without parsing the code, which impacts things like code coloring and potentially completions (since for example rich language services aren't implemented by IPython and so an external service is often used to provide them).
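To make the tooling cost concrete, here is a rough Python sketch of what a frontend would have to do under the magic-command approach: inspect the cell text before it can pick a language for coloring. The `KNOWN_KERNELS` set, the `detect_language` helper, and the `csharp` default are all assumptions for illustration, not part of any real frontend.

```python
# Hypothetical set of kernel names a frontend might recognize.
KNOWN_KERNELS = {"csharp", "fsharp", "pwsh", "powershell", "html", "javascript"}


def detect_language(cell_source, default="csharp"):
    """Guess a cell's language from its first non-blank line."""
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#!"):
            tokens = stripped[2:].split()
            if tokens and tokens[0] in KNOWN_KERNELS:
                return tokens[0]
        # The first non-blank line is not a kernel selector, so fall back.
        return default
    return default


print(detect_language("#!fsharp\nlet hello = 1"))
print(detect_language("Console.WriteLine(1);"))
```

Even this simple version only looks at the first line, so a cell that switches kernels partway through would be colored incorrectly, which is the kind of limitation described above.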

There are numerous other notebook file formats, many of which have different goals not achievable in a single format. You can see a good overview of use cases in the Jupytext docs: https://jupytext.readthedocs.io/en/latest/formats.html. We might not keep .dib around because its goals overlap with some of these pre-existing formats.

@jonsequitur
Contributor Author

See also: #739.

@jhoneill

jhoneill commented Sep 2, 2020

@jonsequitur - thanks for the response:
If you have a good way for two extensions not to get into conflict over ownership of .ipynb that might solve most issues at a stroke. The current .NET Interactive: Open notebook command reads ipynb but drops the outputs, and save doesn't save new outputs only .NET Interactive: Save notebook as specific file format does. Fixing that is critical really.

ipynb is established, and using the file format allows sharing between machines with different toolsets but doesn't guarantee the language(s) in the notebook will be supported. Giving the world a new format only makes sense if there is something which can't be done in or added to an existing format - innovating is a problem because if you add something like polyglot support it's easy enough to add the JSON schema, but then everyone else needs to adopt the change, otherwise the file parses properly but doesn't execute as expected. Hence you may be stuck with the partial solution you have now.

@jonsequitur
Contributor Author

If you have a good way for two extensions not to get into conflict over ownership of .ipynb that might solve most issues at a stroke. The current .NET Interactive: Open notebook command reads ipynb but drops the outputs, and save doesn't save new outputs only .NET Interactive: Save notebook as specific file format does. Fixing that is critical really.

An improvement to this has been merged and we should have a new version of the extension published today. Looking forward to feedback on it.

Giving the world a new format only makes sense if there is something which can't be done in or added to an existing format - innovating is a problem because if you add something like polyglot support it's easy enough to add the JSON schema, but then everyone else needs to adopt the change, otherwise the file parses properly but doesn't execute as expected.

Agreed. Regarding the file format aspect, one example is that certain organizations explicitly don't want the .ipynb outputs for fear of exposing sensitive data, and there are custom tools used in source control to remove them. That's one audience for an output-free file format. I see polyglot as somewhat unrelated to file formats for .NET Interactive at this stage, since .ipynb specifies a kernel, not a language. As with any Jupyter kernel, if you don't have the kernel installed, the notebook won't work. Now, if people start exploring the idea of multi-kernel notebooks, this introduces different challenges. But .NET Interactive doesn't support that.
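As an illustration of the output-removal step mentioned above, here is a minimal Python sketch that clears outputs from a notebook already loaded as a dictionary. It is not taken from any particular source-control tool; the `strip_outputs` name is hypothetical, and the `outputs` and `execution_count` fields follow the nbformat v4 layout for code cells.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook used only for this example.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Report"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "source": ["Get-Secret"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["s3cr3t"]}]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_outputs(notebook)
print(cleaned["cells"][1]["outputs"])
```

The original dictionary is left untouched, so the same notebook can still be saved with outputs elsewhere if that is wanted.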

@Szeraax

Szeraax commented Dec 7, 2021

@jonsequitur

The benefits are related to polyglot functionality and automation (e.g. running in production)

@jhoneill:

DIB files have a glaring omission which inhibits their use - not saving results. (a work round is to save as ipynb)

As a fellow powersheller who has been seriously loving the simple and effective DIB format, I see HUGE value in storing the output in the file. Otherwise, I'm going to have to use Start-Transcript or some other method of saving the output somewhere when used in an automated production environment. I only happened upon this discussion because I was lamenting dib not having saving of results and decided to go do some googling about it.

If you want the result stored in the output file, I'd recommend using .ipynb. We're not trying to reinvent it

Why not? the dib format is awesome. As a powersheller who dabbles in .net/c#, dib is rockingly cool. Inability to save is just KILLER for my hopes.

Dreaming

Mind if I share a dream?

Imagine that instead of having a powershell script in production, you have dib notebook instead. The script is broken down into cells with documentation living right with the actual code.

A finance employee who depends on this script notices that it didn't produce any output and opens the file in VSCode and sees after each code block the last 3 sets of outputs that were generated by that block. This finance user is able to scroll through this notebook and identify exactly which cell didn't produce similar output on its most recent invocation (by comparing with the other ones present). They are able to quickly determine that the cell could not find the file that is expected in a certain folder, resolve the issue (put the file in that folder), and re-run the notebook. They see everything work perfectly like it should before they close it.

How cool is that? No IT needed. The business unit MASSIVELY empowered by having historical output + code + documentation all living in a single place.

Implementing Output Preservation Support

In my mind, there are several ways that you could implement such a feature, and I can't find good documentation easily about the dib file format to see if any of this is already present or in a roadmap.

One approach that seems attractive to me is to introduce a cell level option that indicates the number of prior executions to preserve. You could call it --preserveOutputCount or something. I'd also argue that it's worth adding a notebook level option before any cells occur, called something like #!notebook, where you could also add this preserveOutputCount option if you want to save all cells' output in the notebook. Cell level options would override notebook level options, so if you turn on saving cell contents, you can still turn it off at the cell level.

I'm imagining something like this:

#!notebook --preserveOutputCount 1
#!powershell

Write-Output "Hello from PowerShell!"

#!fsharp --preserveOutputCount 0

let hello = "Hello from F#"
hello |> Console.WriteLine

#!csharp

#r "nuget:Humanizer,2.8.1"
using Humanizer;

#!share --from fsharp hello

Console.WriteLine(hello.Replace("F#", "C#").Transform(To.TitleCase));

Which would save only the most recent execution for each cell immediately below it (except for the fsharp cell, which would not save any output at all when run). You could even let the default be that when RUNNING a notebook in an automated fashion or interactive fashion that it DOES NOT UPDATE the notebook file after execution. Then, to enable saving the output into the file after execution, add another option to the file level. Something like: #!notebook --saveOutputOnExecute

Lastly, the HOW to save output. You don't need to save the STATE, the output doesn't need to be resumable. You can just let it be a plain text field like what markdown is, just without any formatter/markdown support. Maybe call it #!output for that cell.

Rolling this all together, after running each cell twice (notice that my c# cell errors the 2nd time that I run it from VSCode), if I saved the file, it would look like this in notepad:

#!notebook --preserveOutputCount 2
#!powershell

Write-Output "Hello from PowerShell!"

#!output

Hello from PowerShell!

#!output

Hello from PowerShell!

#!fsharp --preserveOutputCount 0

let hello = "Hello from F#"
hello |> Console.WriteLine

#!csharp

#r "nuget:Humanizer,2.8.1"
using Humanizer;

#!share --from fsharp hello

Console.WriteLine(hello.Replace("F#", "C#").Transform(To.TitleCase));

#!output

Error: Humanizer version 2.8.1 cannot be added because version 2.8.2 was added previously.

#!output

Hello From C#
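
The proposal above can be prototyped in a few lines of Python. Everything here is hypothetical: `#!output` and `--preserveOutputCount` are the proposed syntax from this comment, not an existing .NET Interactive feature, and the helper names are invented. The sketch also assumes the newest output is listed first, which is inferred from the example where the second-run error appears above the first-run result.

```python
def effective_count(notebook_count, cell_count=None):
    """Cell-level --preserveOutputCount overrides the notebook-level value."""
    return notebook_count if cell_count is None else cell_count


def record_output(previous, new_output, keep):
    """Add a new output and keep only the `keep` most recent, newest first."""
    if keep <= 0:
        return []
    return ([new_output] + list(previous))[:keep]


def render_cell(kernel, code, outputs):
    """Render a cell followed by its preserved outputs as plain text."""
    parts = [f"#!{kernel}\n\n{code}"]
    parts += [f"#!output\n\n{text}" for text in outputs]
    return "\n\n".join(parts)


# Notebook-level count of 2, run the PowerShell cell three times.
keep = effective_count(2)
outputs = []
for run in ["run 1", "run 2", "run 3"]:
    outputs = record_output(outputs, run, keep)

print(render_cell("powershell", 'Write-Output "Hello from PowerShell!"', outputs))

# The fsharp cell sets its own count of 0, so nothing is preserved.
print(record_output([], "Hello from F#", effective_count(2, cell_count=0)))
```

Because outputs are stored as plain text after the cell, the rendered file stays readable in any editor, which matches the stated goal of not needing to save resumable state.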

Honestly, I think it's super cool to have the idea of lightweight notebooks in VSCode that can bring documentation, code, and historical output all together for use in a real world production environment.

@jonsequitur
Contributor Author

Thanks for sharing this dream, @Szeraax! I think it would actually be fairly simple to prototype with our existing APIs.

There are dreams over here that I think are related, under the Area-Automation label (relating to non-interactive execution of notebooks and scripts). In particular, #1554 also discusses the possibility of behavioral differences between automated and interactive modes, and #483 hints at dealing with different output types but stops short of attempting to say how it might work.

We're trying to come up with a compelling automation story that I think will often be orthogonal to file formats. As part of this we're asking questions about what different file formats are useful for and what they provide versus what capabilities should be independent of file formats. Should we support additional formats as we go? How will those fit in?

We'll try to make this more concrete in the new year.

@Szeraax

Szeraax commented Dec 8, 2021

Ok, I threw down my thoughts on those two that you linked, and I also commented on another person's dream of saving outputs: #839

When I wrote my original comment here, I didn't know about the proposal to compile notebooks.

Let's assume that you:

  1. Get a compiler working for notebooks
  2. Implement the capability to return output cells as plaintext to stdout independent of storing the output cells within the notebook (which also means that you can view the exact same stdout of a run whether it is done interactively in the VSCode output window, dotnet interactive, or a compiled exe)

Back to the compiled notebooks: obviously those wouldn't be editable for storing output cells. So it's either drop the capability to store output cells into the notebook when running compiled notebooks, create a 2nd feature that lets you output to another file regardless of method (VSCode, dotnet, compiled), or pick only one of those two features to implement. I would advise against ONLY creating the feature that saves output into the notebook itself. So either only do that 2nd feature to output to an external file, or do both. (hopefully every time I hit "run all cells" in VSCode, it doesn't have to create a file with all outputs :P)

I would like there to be something richer than just stdout for viewing output cells from the compiled exe. My vision is for business units to increase their programming skills. I WANT them to be seeing code. I want them to be able to figure out root causes on their own (with time, of course!).

Rather than my accounting person coming to me and saying, "block 5 failed on this exe, fix it please", I want that accounting person to be able to AT LEAST look at the markdown + code + output and see if they can tell what is wrong before they ever talk to me.

Let's raise our sights and get ALL business units doing the tech work instead of heaping it onto IT :)

Honestly, I'd be fine if the only feature added between saving output in dib file vs saving output + code + markdown to external was the external one.

As for implementation, what about just creating the #!output cell type that I talked about and throwing all cells out to file? Viewing these files would generally be done in VSCode, so you don't need to do any rich text syntax highlighting. Just do a copy of the script and its #!output cells to a file and bango, you're done.

@dotnet dotnet locked and limited conversation to collaborators Feb 14, 2023
@brettfo brettfo converted this issue into discussion #2736 Feb 14, 2023

8 participants