🪲Bug Hunt🪲: dill goes to jupyter. what gets stupyder? #301

OwenPriceSkelly · 2023-10-11T19:58:02Z

Our new publication workflow takes an "everything but the kitchen sink" approach to serializing user code, which significantly reduces the hoops our sdk and the users need to jump through in order to publish their work. The tradeoff is that the new approach leans heavily on dill's ability to reliably save a python interpreter session to a file, and opens us up to far more edge cases. Now that we're going to try and serialize arbitrary jupyter notebooks with dill, we need to know what kinds of jupyter notebook state dill won't be able to handle as advertised.

At a high level, this is the process:

Given some jupyter notebook, convert it to a plain python script with nbconvert
Run that script in a fresh python interpreter (bringing user definitions, imports etc into scope)
- Ideally, this should reach exactly the same interpreter state as running each cell in order manually would. When might this not be true?
Save the state of that python process to a file using dill.dump_session
Later, give that file to dill.load_session in another fresh process in order to call one of the user-defined functions.

What we are not worried about:

Problems due to a mismatch between python, jupyter, dill versions or OS. Assume both sides of the serialize/deserialize process are properly containerized, just different python processes
- NB: Any version-related bugs or incompatibilities that keep the environment fixed on both sides still count!

What we are worried about:

jupyter notebook magic/state that is lost when run as a plain python script
notebook state that would be preserved but is un-pickleable and causes immediate problems on the serialization side
notebook state that seems to pickle fine but causes problems when deserializing later

Acceptance Criteria

Description of the bug with as much relevant detail as you have
Step-by-step instructions (or, even better, a script) for reproducing the issue.
Bonus points for any ideas on workarounds, but not necessary

Deadline and Prize for Best Bug

Best Bug is subjective (but we have to be able to reproduce it!). More than open to argument on the basis of size, technical interest, relevance to users, fun factor, style points, etc.
Bug description and repro steps should be linked under this issue before November 21
the Best Bug submitted before the deadline takes home this lil guy:

The text was updated successfully, but these errors were encountered:

WillEngler · 2023-10-18T17:29:08Z

Quick thoughts from lunch with Globus Compute engineers:

Do classes defined in the notebook get transferred properly?
Someone could probably shoot themself in the foot by trying to import something from an adjacent file. I don't think we want to support this, but it would be good to make good error messages for it or make it impossible to do.

isaac-darling · 2023-11-08T19:05:02Z

I found a bug (unrelated to dill) while syncing with Owen today that prevented users (me) from converting notebooks with magics in them. Apparently, in order for nbconvert to properly handle ipython magics (!pip install and the like) during conversion, ipython must be installed. Since ipython was not in the pyproject.toml, my session did not have it installed and neither would the users' (necessarily). The fix was quickly merged, adding the proper ipython version to the pyproject.toml.

MaxTuecke · 2023-11-09T23:11:57Z

Not a jupyter issue, but our version of dill cannot serialize classes that inherit from abc. For example

from abc import ABC, abstractmethod
import pandas as pd
import dill

class Preprocessor(ABC):
    @abstractmethod
    def preprocess(self, input_df: pd.DataFrame) -> pd.DataFrame:
        pass
        
class Preprocessor1(Preprocessor):
    def preprocess(self, input_df: pd.DataFrame) -> pd.DataFrame:
        input_df.fillna(0, inplace=True)
        return input_df

p = Preprocessor1()

dill.dump_session("./test.pkl") #fails

will fail to serialize.

Dill will not fail to serialize this on version 0.3.7
Alternatively, dill can serialize this if any classes inheriting from ABC or ABCMeta are in another file. For example

main.py

from a_module import Preprocessor1
import dill

p = Preprocessor1()

dill.dump_session("./test.pkl") #works

a_module.py

from abc import ABC, abstractmethod
import pandas as pd

class Preprocessor(ABC):
    @abstractmethod
    def preprocess(self, input_df: pd.DataFrame) -> pd.DataFrame:
        pass
        
class Preprocessor1(Preprocessor):
    def preprocess(self, input_df: pd.DataFrame) -> pd.DataFrame:
        input_df.fillna(0, inplace=True)
        return input_df

Dill issue for this can be found here: uqfoundation/dill#332

OwenPriceSkelly · 2023-11-30T21:00:37Z

mistakenly closed by GitHub keywords

WillEngler · 2023-12-11T21:50:32Z

All prizes from the bug hunt have been distributed, closing

WillEngler assigned OwenPriceSkelly Nov 20, 2023

OwenPriceSkelly mentioned this issue Nov 30, 2023

340 docs update #353

Merged

OwenPriceSkelly closed this as completed in #353 Nov 30, 2023

OwenPriceSkelly reopened this Nov 30, 2023

WillEngler closed this as completed Dec 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🪲Bug Hunt🪲: dill goes to jupyter. what gets stupyder? #301

🪲Bug Hunt🪲: dill goes to jupyter. what gets stupyder? #301

OwenPriceSkelly commented Oct 11, 2023 •

edited

Loading

WillEngler commented Oct 18, 2023

isaac-darling commented Nov 8, 2023 •

edited

Loading

MaxTuecke commented Nov 9, 2023

OwenPriceSkelly commented Nov 30, 2023

WillEngler commented Dec 11, 2023

🪲Bug Hunt🪲: dill goes to jupyter. what gets stupyder? #301

🪲Bug Hunt🪲: dill goes to jupyter. what gets stupyder? #301

Comments

OwenPriceSkelly commented Oct 11, 2023 • edited Loading

What we are not worried about:

What we are worried about:

Acceptance Criteria

Deadline and Prize for Best Bug

WillEngler commented Oct 18, 2023

isaac-darling commented Nov 8, 2023 • edited Loading

MaxTuecke commented Nov 9, 2023

OwenPriceSkelly commented Nov 30, 2023

WillEngler commented Dec 11, 2023

OwenPriceSkelly commented Oct 11, 2023 •

edited

Loading

isaac-darling commented Nov 8, 2023 •

edited

Loading