Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify valid write of statefile to disk #108

Merged
merged 1 commit into from
Mar 23, 2023
Merged

Conversation

fonsp
Copy link
Member

@fonsp fonsp commented Mar 23, 2023

I have seen two instances where the statefile written to disk is glitched/corrupt: it is still readable by MsgPack, but the structure is totally wrong.

Reports

Both reports happened on PlutoSliderServer generating a static HTML site on a GHA runner. Both had Export_baked_state = false, i.e. the statefile was a separate .plutostate file on disk, not embedded as base64 in the HTML file.

The first report was on 1 Nov 2022:

telegram-cloud-photo-size-4-5985390626579004155-y

(This is supposed to be structured like this: https://github.com/fonsp/Pluto.jl/blob/v0.19.22-src/src/webserver/Dynamic.jl#L100 )

Luckily enough I stored the original statefile!

broken statefile huh.plutostate.zip

Second report was recently on slack. we don't have the statefile anymore, only a screenshot that @pankgeorg took in the chrome devtools, the notebook file and the GHA logs

image

Notebook file: https://github.com/IdePHICS/EPFLDoctoralLecture2023/blob/606db3cf62f8ee34662d0d1ba2c930f4b8a35073/Resources/Week2/rfim.jl (might have been one of the commits before)

Action run: https://github.com/IdePHICS/EPFLDoctoralLecture2023/actions/runs/4468560473/jobs/7849468519 (nothing out of the ordinary)

logs_71.zip

Observation

You can see that something went wrong while writing ["cell_results"]["<cell id here>"]. For the cell where it went wrong (29612df6-203f-42bc-b53b-86af618d60ec in the Nov case), only part of that Dict is available, and the rest is spread out over parent structures. You see that keys and values are being swapped, and that children of "cell_results" became top-level KV pairs.

Possible causes

notebook_to_js cannot return a Dict like this (that would be such a serious Julia bug that it would have been found before). So something must have gone wrong in one of:

  • Pluto.pack` (MsgPack),
  • writing to the file,
  • in the GHA system (corrupt disk, mistake during git push, etc).

I find a mistake during Pluto.pack unlikely, because this would also have led to bugs in regular Pluto use (with websockets or the static HTML with baked state). I also trust that https://github.com/JuliaIO/MsgPack.jl is well-implemented, also because the spec of MsgPack is not too complex. Still, some possibilities:

A mistake in the GHA system is unlikely (lots of people use it), and i only have reports of the .plutostate files being corrupted.

So I suspect that something went wrong while writing the file (which is something that I know nothing about). This is especially likely because we called Pluto.pack directly on the file handler IO:

open(export_statefile_path, "w") do io
    Pluto.pack(io, original_state)
end

I have later learned that this is not a good idea for stability (e.g. fonsp/Pluto.jl#976 because the program can get interrupted halfway during a write), and performance (because every single write(io, ...) call has overhead).

There might also be a bug in MsgPack where using a file handler as the io can lead to incorrect results. This might not have surfaced before if applications and the MsgPack tests mostly use memory buffers as IO.

So in this PR, we switch this to first Pluto.packing to memory, and then writing that Vector{UInt8} to the file:

write(export_statefile_path, Pluto.pack(original_state))

I suspect that this already fixed the bug. But to also eliminate a problem where write just writes the wrong contents, I added a read to verify the contents.

@fonsp
Copy link
Member Author

fonsp commented Mar 23, 2023

cc @gdalle

@fonsp
Copy link
Member Author

fonsp commented Mar 23, 2023

I also wrote a notebook to explore the Nov statefile a bit:

### A Pluto.jl notebook ###
# v0.19.22

using Markdown
using InteractiveUtils

# ╔═╡ 4e3146c2-280a-4823-8672-3f231bdf2edd
import Pkg

# ╔═╡ 9d1cfd50-4f53-4ce6-a10c-7282afc92055
Pkg.activate()

# ╔═╡ ac54af6d-e9c4-4c26-a5f1-28c1cee68ffa
import Pluto

# ╔═╡ c5843f00-c8c8-11ed-1fae-a59cf7067654
filename = "~/Downloads/broken statefile huh.plutostate" |> expanduser 

# ╔═╡ 592df111-aa07-449d-912a-421820a8428b
data = read(filename)

# ╔═╡ f32cb028-544e-43c5-a114-024e8c48f9d4
d = Pluto.unpack(data)

# ╔═╡ 9d20a49e-a8fb-4178-84cd-c9c91e975d35
rk = keys(d["cell_results"])

# ╔═╡ 9b913b7f-97be-4f37-9c3a-31aef4c30eee
Dict(
	k => keys(d["cell_results"][k])
	for k in keys(d["cell_results"])
)

# ╔═╡ 54cc6550-5139-49e9-9268-bc0fc1a5c1e6
setdiff(rk, d["cell_order"])

# ╔═╡ 5e8b7a2b-becf-4bd4-a634-cd3745659f96
setdiff(d["cell_order"],rk)

# ╔═╡ ebcaa53e-a07a-45df-8890-bbe9fc7a7844
Set(d["cell_order"])

# ╔═╡ b03191f1-4392-4d41-9e49-fdbb911a61a0
broken_cell_id = "29612df6-203f-42bc-b53b-86af618d60ec"

# ╔═╡ 43746f31-a903-4783-9ea3-f1aa323db1a0
d["cell_results"]["4115f7cb-d45f-4cc2-86bf-316c074393f8"]

# ╔═╡ f9c47abd-e78e-412f-b709-d36bd26188b7
d["cell_results"][broken_cell_id]

# ╔═╡ c6c7926f-c217-4d02-a84a-ea9bf6e27f0b
# String(copy(d["cell_results"][broken_cell_id]["output"]["body"])) |> Text

# ╔═╡ 8fcc29f1-52e3-489a-a746-e28a8a9f402c
rawd = copy(data)

# ╔═╡ a56e9ddc-e563-4dca-8505-4b72e56273e5
raws = String(copy(data))

# ╔═╡ 9df36e7c-fdbb-46db-a584-699f8b970262
239343 + 136808

# ╔═╡ 58f37eb2-2327-4079-a332-f74be278a889
raws[376100:378054]

# ╔═╡ 905cbd87-e879-4c4a-aee1-c525604b3213
"""
url(#clip752)\" style=\"stroke:#0000ff; stroke-linecap:round; stroke-linejoin:round; stroke-width:10; stroke-opacity:0.7; fill:none\" points=\"\n  2352.76,1179.34 2352.76,1179.34 \n  \"/>\n</svg>\n


\xb0

persist_js_state¤mime\xadimage/svg+xml\xb2last_run_timestamp\xcbA\xd8\xd8N\xe2\x8e\b-\xb7has_pluto_hook_features¬rootassignee
\xc0
\xa7
cell_id
\xd9
\$
29612df6-203f-42bc-b53b-86af618d60ec

\x88\xa4line\xff\xa3msg\x92\xd9WIndices Base.OneTo(4545) of attribute `markercolor`. Legend entries\nmay be suppressed by passing an empty label.\nFor example,\n    plot([1:2,1:3], [[4,5],[3,4,5]], label=[\"y\" \"\"], markercolor=[1 2])\n\xaatext/plain\xa7cell_id\xd9\$2"

"""


# ╔═╡ 5624ae35-db37-45bb-b17e-2badba0716c9
cell_id_ranges = findall(broken_cell_id, raws)

# ╔═╡ 76d76270-0701-4e06-813e-f42f38e497ad
findall("queued", raws)

# ╔═╡ dffef0df-922d-4b10-92b9-c627a7e899af
[
	raws[first(r)-12:end]
	for r in cell_id_ranges
]

# ╔═╡ 6eb236c2-ee15-4c1f-8a62-38a25f213ec5
Pluto.pack(Dict("running" => false, "aesdf" => [1,2,3])) .|> Char

# ╔═╡ b75e6651-89a9-4fbf-94ba-964ed567059a
bitstring(0xa6)

# ╔═╡ 58ee494a-34c3-46f5-bdfb-33dafe44851b
bitstring(0x82)

# ╔═╡ 2b55dbc2-2d49-4003-a04a-9288051a9645
"29612df6-203f-42bc-b53b-86af618d60ec\x88\xa4line\xff\xa3msg\x92\xd9WIndices Base.OneTo(4545) of attribute `markercolor` does not match data indices 1:4544.\xaatext/plain\xa7cell_id\xd9

# ╔═╡ 5513c11c-cae4-4da0-9064-02ac61b1792c
problem = raws[136773:end]

# ╔═╡ fa741ae3-f265-4709-ac68-7db8681fd912
data[1466+length(broken_cell_id):end]

# ╔═╡ 615a398e-3d63-4e9c-9184-db4b2cdf82a9
# ╠═╡ disabled = true
#=╠═╡
Text(problem)
  ╠═╡ =#

# ╔═╡ a5d6a38f-3793-40e5-a1e6-4e53a20a3c3b
"""
29612df6-203f-42bc-b53b-86af618d60ec
\xb2
upstream_cells_map
\x82
\xae
time_evolution
\x91
\xd9
\$
b48e55b7-4b56-41aa-9796-674d04adf5df
\xa2
p0
\x91
\xd9
\$
6b298184-32c6-412d-a900-b113d6bd3d53
\xd9
\$
b948830f-ead1-4f36-a237-c998f2f7deb8
\x84
\xb4
precedence_heuristic
\t
\xa7
cell_id
\xd9
\$
b948830f-ead1-4f36-a237-c998f2f7deb8
\xb4
downstream_cells_map
\x80
\xb2
upstream_cells_map
\x81
\xa8
getindex
\x90\xd9\$0b26efab-4e93-4d53-9c4d-faea68d12174\x84\xb4precedence_heuristic\t\xa7cell_id\xd9\$0b26efab-4e93-4d53-9c4d-faea68d12174\xb4downstream_cells_map\x81\xb1initial_condition\x91\xd9\$6b298184-32c6-412d-a900-b113d6bd3d53\xb2upstream_cells_map\x81\xa5zeros\x90\xd9\$98994cb9-"


# ╔═╡ Cell order:
# ╠═4e3146c2-280a-4823-8672-3f231bdf2edd
# ╠═9d1cfd50-4f53-4ce6-a10c-7282afc92055
# ╠═ac54af6d-e9c4-4c26-a5f1-28c1cee68ffa
# ╠═c5843f00-c8c8-11ed-1fae-a59cf7067654
# ╠═592df111-aa07-449d-912a-421820a8428b
# ╠═f32cb028-544e-43c5-a114-024e8c48f9d4
# ╠═9d20a49e-a8fb-4178-84cd-c9c91e975d35
# ╠═9b913b7f-97be-4f37-9c3a-31aef4c30eee
# ╠═54cc6550-5139-49e9-9268-bc0fc1a5c1e6
# ╠═5e8b7a2b-becf-4bd4-a634-cd3745659f96
# ╠═ebcaa53e-a07a-45df-8890-bbe9fc7a7844
# ╠═b03191f1-4392-4d41-9e49-fdbb911a61a0
# ╠═43746f31-a903-4783-9ea3-f1aa323db1a0
# ╠═f9c47abd-e78e-412f-b709-d36bd26188b7
# ╠═c6c7926f-c217-4d02-a84a-ea9bf6e27f0b
# ╠═8fcc29f1-52e3-489a-a746-e28a8a9f402c
# ╠═a56e9ddc-e563-4dca-8505-4b72e56273e5
# ╠═9df36e7c-fdbb-46db-a584-699f8b970262
# ╠═58f37eb2-2327-4079-a332-f74be278a889
# ╠═905cbd87-e879-4c4a-aee1-c525604b3213
# ╠═5624ae35-db37-45bb-b17e-2badba0716c9
# ╠═76d76270-0701-4e06-813e-f42f38e497ad
# ╠═dffef0df-922d-4b10-92b9-c627a7e899af
# ╠═6eb236c2-ee15-4c1f-8a62-38a25f213ec5
# ╠═b75e6651-89a9-4fbf-94ba-964ed567059a
# ╠═58ee494a-34c3-46f5-bdfb-33dafe44851b
# ╠═2b55dbc2-2d49-4003-a04a-9288051a9645
# ╠═5513c11c-cae4-4da0-9064-02ac61b1792c
# ╠═fa741ae3-f265-4709-ac68-7db8681fd912
# ╠═615a398e-3d63-4e9c-9184-db4b2cdf82a9
# ╠═a5d6a38f-3793-40e5-a1e6-4e53a20a3c3b

@fonsp fonsp merged commit cd57325 into main Mar 23, 2023
@fonsp fonsp deleted the corrupt-statefile-check branch March 23, 2023 16:37
@fonsp
Copy link
Member Author

fonsp commented Apr 26, 2023

One more, generated with an old version of PlutoSliderServer (before the fix)

random_walks_II_d189129b.plutostate.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant