Maintain consistency when deserializing to JSON #5114
Actually, we should resolve (1) as otherwise the validation fails:
|
PR Check Results
Ecosystem
✅ ecosystem check detected no changes.
Benchmark
Linux
Windows
|
Re 1: My intuition goes towards using an enum and
Re 2: That's a tricky problem! Maybe https://stackoverflow.com/a/67792465/3549270 would work? |
I tried the following:
> "execution_count": null,
14a16
> "execution_count": null,
112c114,115
< "\n"
---
> "\n",
> ""
395d397
< " model_name = f\"rank_{rank_num}\"\n",
476c478,479
< "\"\"\"))\n"
---
> "\"\"\"))\n",
> ""
504a508
> "execution_count": null,
610,617d613
< "accelerator": "GPU",
< "colab": {
< "gpuType": "T4",
< "include_colab_link": true,
< "machine_shape": "hm",
< "name": "AlphaFold2.ipynb",
< "provenance": []
< },
631d626
< "nbconvert_exporter": "python",
632a628
> "nbconvert_exporter": "python",
633a630,637
> },
> "accelerator": "GPU",
> "colab": {
> "gpuType": "T4",
> "include_colab_link": true,
> "machine_shape": "hm",
> "name": "AlphaFold2.ipynb",
> "provenance": []
This means the edition works correctly (nice!), we add an extra |
I tried this, it didn't work. It just keeps the same order as defined in the struct which is the default behavior.
Nice! Let me take a look at this. |
@konstin Do you mean something like this? (I'm not sure how to use
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
#[serde(tag = "cell_type")]
pub enum Cell {
    #[serde(rename = "code")]
    Code(CodeCell),
    #[serde(rename = "markdown")]
    Markdown(MarkdownCell),
    #[serde(rename = "raw")]
    Raw(RawCell),
}
/// Notebook raw cell.
#[skip_serializing_none]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct RawCell {
    pub attachments: Option<Value>,
    pub id: Option<String>,
    pub metadata: Value,
    pub source: SourceValue,
}
/// Notebook markdown cell.
#[skip_serializing_none]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct MarkdownCell {
    pub attachments: Option<Value>,
    pub id: Option<String>,
    pub metadata: Value,
    pub source: SourceValue,
}
/// Notebook code cell.
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct CodeCell {
    pub execution_count: Option<i64>,
    pub id: Option<String>,
    pub metadata: Value,
    pub outputs: Vec<Value>,
    pub source: SourceValue,
} |
Like this (untested, but I've used flatten in other projects)
use serde::{Deserialize, Serialize};
use serde_json::Value;
use serde_with::skip_serializing_none;
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq)]
#[serde(tag = "cell_type")]
pub enum Cell {
    // Fields of enum variants cannot carry a `pub` qualifier.
    #[serde(rename = "code")]
    Code {
        #[serde(flatten)]
        cell: CellInner,
        execution_count: Option<i64>,
    },
    #[serde(rename = "markdown")]
    Markdown {
        #[serde(flatten)]
        cell: CellInner,
    },
    #[serde(rename = "raw")]
    Raw {
        #[serde(flatten)]
        cell: CellInner,
    },
}
/// Fields shared by every cell type.
#[skip_serializing_none]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
pub struct CellInner {
    pub attachments: Option<Value>,
    pub id: Option<String>,
    pub metadata: Value,
    pub source: SourceValue,
} |
Could you add a notebook saved from the web interface of |
I would argue to go with my implementation because:
|
Oh, I found the reason why the sorting wasn't working earlier. We've declared
@konstin It seems like the feature was added during the initial PR to support Jupyter notebook (https://github.com/astral-sh/ruff/pull/3440/files#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542). Any reason why this was added and can we remove it? |
That's a pity, yours is clearly the better solution then
none beyond "feels useful", feel free to remove it |
`serde_json` feature `preserve_order` is removed
Update
That trailing newline... Well, the JSON string might contain a trailing newline, which is handled by black: https://github.com/psf/black/blob/01b8d3d4095ebdb91d0d39012a517931625c63cb/src/black/__init__.py#L1024. Currently, we don't write the JSON string with a trailing newline. This needs to be handled. |
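One possible way to handle the trailing newline, sketched under the assumption that the goal is to end the output with a newline only when the original file did. The function name `match_trailing_newline` is hypothetical and not part of ruff.

```rust
/// Appends a trailing newline to `serialized` when the original notebook
/// text ended with one and the serialized output does not already.
pub fn match_trailing_newline(original: &str, mut serialized: String) -> String {
    if original.ends_with('\n') && !serialized.ends_with('\n') {
        serialized.push('\n');
    }
    serialized
}
```

The original file contents have to be kept around (or at least a flag recording whether they ended in a newline) until the notebook is written back.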
@konstin I believe the integration test for Jupyter notebook here can be removed? ruff/crates/ruff_cli/src/commands/run.rs Lines 190 to 278 in 4079f98
|
yep feel free to remove it |
The test looks good!
Summary
Maintain consistency while deserializing Jupyter notebook to JSON. The following
changes were made:
Side effect
Removing the `preserve_order` feature means that the order of keys in JSON output (`--format json`) will be in alphabetical order. This is because the value is represented using `serde_json::Value`, whose object map is backed by a `BTreeMap` when `preserve_order` is disabled, thus sorting it as per the string key. For posterity: if this turns out to be not ideal, we could define a struct representing the JSON object, and the order of the struct fields will determine the order in the JSON string.
Test Plan
Add a test case to assert the raw JSON string.
Edit: Both have been fixed, keeping here for posterity
Still a few inconsistencies...
1
`execution_count` is a required field only in code cell while for others it's not required. We use a single `Cell` struct and a `cell_type` field instead of having structs for each cell type (`CodeCell`, `MarkdownCell`). This means that the field `execution_count` will always be added to the JSON string.
What to do then?
`cell_type`.
`Code` struct into 3 distinct types using enum:
2
For some fields, the `additionalProperties` is `true`, which means there could be unknown properties added which we've to keep while deserializing. We do this using:
But, now the order won't be alphabetical. Wherever the `other` field is present, all of the keys in it will be flattened at that position.
In the following example, the `nbconvert_exporter` and `version` keys are extra.
Now, as the `other` field in our struct is at the end, both the extra keys have been moved to the end.