
downloading & parsing blobs #75

Merged
merged 26 commits into main from feat/blob-resolve
Mar 21, 2024

Conversation

vbar
Collaborator

@vbar vbar commented Mar 19, 2024

No description provided.

@vbar vbar requested a review from zeapoz March 19, 2024 14:34
@@ -13,6 +13,9 @@ pub mod ethereum {

/// zkSync smart contract address.
pub const ZK_SYNC_ADDR: &str = "0x32400084C286CF3E17e7B677ea9583e60a000324";

/// Default Ethereum blob storage URL base.
pub const BLOBS_URL: &str = "https://api.blobscan.com/blobs/";
Member

I'm wondering if we should supply a default URL at all. With the ETH-RPC URL, we made the decision to only give one as an example in the README, as opposed to hard-coding in a default one.

Of course this will depend on if blob providers can be counted on to use compatible formats. But, if it's possible, I'd prefer not tying ourselves to one specific provider.

Collaborator Author

We aren't tying ourselves - it's configurable... We're just being pushed to the only blob archive there is... Anyway, this URL isn't directly analogous to the ETH-RPC URL, because it isn't always needed - if the user only reconstructs old blocks, it isn't necessary...

state-reconstruct-fetcher/src/constants.rs (outdated, resolved)
state-reconstruct-fetcher/src/l1_fetcher.rs (outdated, resolved)
state-reconstruct-fetcher/src/l1_fetcher.rs (outdated, resolved)
state-reconstruct-fetcher/src/l1_fetcher.rs (resolved)
}

async fn get_blob(
kzg_commitment: &[u8],
Member

We should probably encode the KZG commitment before calling the function.

Collaborator Author

Why? The commitment is inherently binary data - naturally it does have a hexadecimal representation, but whether that representation is preceded by '0x', lower- or uppercase, or whether the blob storage server accepts base64 instead should be coded at the point of calling the server, not sooner...
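A minimal std-only sketch of the point being made here: keep the KZG commitment as raw bytes throughout, and only hex-encode it at the place where the request URL is built. The helper names are illustrative, not the PR's actual code.

```rust
/// Default Ethereum blob storage URL base (as in the PR's constants.rs).
const BLOBS_URL: &str = "https://api.blobscan.com/blobs/";

/// Hex-encode bytes (lowercase, no "0x" prefix) without external crates.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Build the blob URL at the call site; everything upstream keeps passing
/// the commitment around as plain `&[u8]`.
fn blob_url(kzg_commitment: &[u8]) -> String {
    format!("{}0x{}", BLOBS_URL, to_hex(kzg_commitment))
}
```

This keeps decisions like "0x" prefix, letter case, or a base64 alternative local to the one function that talks to the server.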

state-reconstruct-fetcher/src/types/v3.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v3.rs (outdated, resolved)
}

let blobs_view = &blobs[..l];
let mut pointer = 0;
Member

We're re-initializing the pointer a lot. Now that we have pretty modular parsing systems, we should not share pointers across functions. Instead, it makes more sense for the callers to always supply byte slices such that a function-local pointer will always be initialized at 0.
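A sketch of this suggestion (function names are hypothetical, not the PR's code): each parser takes exactly the bytes it owns and returns the parsed value together with the unconsumed rest, so any internal offset always starts at 0 and no pointer is shared across functions.

```rust
/// Read a big-endian u32 from the front of a slice, returning the value
/// and the remaining bytes; None if the slice is too short.
fn read_u32_be(bytes: &[u8]) -> Option<(u32, &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    let (head, rest) = bytes.split_at(4);
    Some((u32::from_be_bytes(head.try_into().ok()?), rest))
}

/// Callers chain the leftover slice instead of threading a shared pointer.
fn parse_pair(bytes: &[u8]) -> Option<(u32, u32)> {
    let (a, rest) = read_u32_be(bytes)?;
    let (b, _rest) = read_u32_be(rest)?;
    Some((a, b))
}
```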

Collaborator Author

Frankly I don't think it ever made much sense to have functions take an explicit pointer - but since it's already implemented, I'm not keen to rewrite the code just to change the style...

Member

Yeah, no need to refactor it now since we'll probably end up rewriting a large chunk of this code in the future. Though, I'd recommend leaving some notes where this can be done so we don't forget to.

Comment on lines 197 to 199
Ok(response) => match response.text().await {
Ok(text) => match get_blob_data(&text) {
Ok(data) => {
Member

(Ditto for V3::get_blob_data too)

This could benefit a lot from using the ? operator in combination with let else clauses, and using if let/match clauses more sparingly. At the moment this is quite hard to read, due in part to the excessive indentation. If we don't need to actively handle the error (like when re-polling), I think it's better to use ? to return the error directly (like on get_blob_data(&text)) and make use of the #[from] attribute to automatically convert errors from other crates like serde and reqwest with ? (via thiserror).

Collaborator Author

I want to keep the logging - is it possible to do that with the ? operator?

Member

Not at the same level, but of course it's possible to still log at the caller level (by matching). If you absolutely need logging, I'd recommend sticking to let else.

Though, in my personal opinion, a warning trace saying data is not string followed by an error trace saying the exact same thing (just without the source string) seems a bit superfluous. I'd advise to remove the tracing::warn!s (from get_blob_data) and just keep the errors since they get traced at the caller level anyways.
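A minimal let-else sketch of what's being recommended (the function and message are illustrative, not the PR's code): one log line at the point of failure, then bail out, instead of nesting match arms.

```rust
/// Parse a port number, logging the offending input once on failure.
fn parse_port(text: &str) -> Option<u16> {
    let Ok(port) = text.parse::<u16>() else {
        // Keep the source string in the log so failures can be debugged.
        eprintln!("failed to parse port from {text:?}");
        return None;
    };
    Some(port)
}
```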

Collaborator Author

But the source that couldn't be parsed is the most important thing for debugging - if we don't log it, we won't even know whether it's changed... There's a lot of logging here because there were (and are) a lot of problems parsing blob data...

Member

Yeah, but right now we're logging a warning, and then we also return an error that gets caught and logged by the caller, so essentially we're logging twice for every single one of those errors, right? I think it makes more sense to instead pass in the source string in the error and avoid logging those warnings.

Collaborator Author

Well yes, it's logged twice - like in lots of other places, e.g. in retry_call and its callers... But yes, the error data can be passed up before logging it...

@vbar vbar requested a review from zeapoz March 20, 2024 08:33
src/cli.rs (resolved)
state-reconstruct-fetcher/src/constants.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v2.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v2.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v1.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v3.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v3.rs (outdated, resolved)
state-reconstruct-fetcher/src/types/v3.rs (outdated, resolved)
vbar and others added 8 commits March 20, 2024 14:57
Co-authored-by: Jonathan <94441036+zeapoz@users.noreply.github.com>
Co-authored-by: Jonathan <94441036+zeapoz@users.noreply.github.com>
Co-authored-by: Jonathan <94441036+zeapoz@users.noreply.github.com>
Co-authored-by: Jonathan <94441036+zeapoz@users.noreply.github.com>
Co-authored-by: Jonathan <94441036+zeapoz@users.noreply.github.com>
tracing::error!("Failed to parse calldata: {e}");
cancellation_token.cancel();
return last_block_number_processed;
let blocks = loop {
Member

We should not handle blob errors here, that should only be done within V3.

Collaborator Author

That isn't possible. On the one hand the program must retry on network errors, but on the other hand it cannot retry forever if it is to remain cancellable - so there are two places where retrieval is retried: the high level has access to the CancellationToken, while the low level has a hard-coded retry limit. I admit it's inelegant, but refactoring the whole fetcher around a network retrieval loop would IMHO be too drastic a change... Note that retry_call works the same way.
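A std-only sketch of the two-level scheme described above. The real code uses a tokio CancellationToken; an AtomicBool stands in for it here so the example is dependency-free, and the names are illustrative. The inner loop retries up to a hard-coded limit; the outer loop checks for cancellation between rounds, so retrying never blocks shutdown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Low-level hard-coded retry limit per round.
const INNER_RETRY_LIMIT: u32 = 5;

/// Retry `attempt` indefinitely in bounded rounds, checking the
/// cancellation flag between rounds. Returns None when cancelled.
fn fetch_with_retry<F>(cancelled: &AtomicBool, mut attempt: F) -> Option<Vec<u8>>
where
    F: FnMut() -> Result<Vec<u8>, String>,
{
    loop {
        // High level: remain cancellable between bounded retry rounds.
        if cancelled.load(Ordering::Relaxed) {
            return None;
        }
        // Low level: a hard-coded number of attempts per round.
        for n in 1..=INNER_RETRY_LIMIT {
            match attempt() {
                Ok(data) => return Some(data),
                Err(e) => eprintln!("attempt {n} failed: {e}"),
            }
        }
    }
}
```

Coupling the same attempts to a `tokio::select!` on the CancellationToken would collapse the two levels into one, which is what the next comment suggests.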

Member

I see; ideally we should couple all these function calls with something like a tokio::select! that waits for whichever returns first, the retry function or the CancellationToken.

Collaborator Author

Yes, ideally, but I'm afraid getting that through the layers of the ethers provider would be non-trivial...

Comment on lines +36 to +57
Ok(response) => match response.text().await {
Ok(text) => match get_blob_data(&text) {
Ok(data) => {
let plain = if let Some(p) = data.strip_prefix("0x") {
p
} else {
&data
};
return hex::decode(plain).map_err(|e| {
ParseError::BlobFormatError(plain.to_string(), e.to_string())
});
}
Err(e) => {
tracing::error!("failed parsing response of {url}");
return Err(e);
}
},
Err(e) => {
tracing::error!("attempt {}: {} failed: {:?}", attempt, url, e);
sleep(Duration::from_secs(FAILED_FETCH_RETRY_INTERVAL_S)).await;
}
},
Member

Suggested change
Ok(response) => match response.text().await {
Ok(text) => match get_blob_data(&text) {
Ok(data) => {
let plain = if let Some(p) = data.strip_prefix("0x") {
p
} else {
&data
};
return hex::decode(plain).map_err(|e| {
ParseError::BlobFormatError(plain.to_string(), e.to_string())
});
}
Err(e) => {
tracing::error!("failed parsing response of {url}");
return Err(e);
}
},
Err(e) => {
tracing::error!("attempt {}: {} failed: {:?}", attempt, url, e);
sleep(Duration::from_secs(FAILED_FETCH_RETRY_INTERVAL_S)).await;
}
},
Ok(response) => {
let Ok(text) = response.text().await else {
continue;
};
let data = get_blob_data(&text)?;
let plain = if let Some(p) = data.strip_prefix("0x") {
p
} else {
&data
};
return hex::decode(plain)
    .map_err(|e| ParseError::BlobFormatError(plain.to_string(), e.to_string()));
}

How does this look? It accomplishes pretty much the same thing as it did previously, but in my opinion it's a lot more readable this way.

Collaborator Author

But it isn't the same thing at all... In case the response cannot be retrieved, I want to see the error (I don't think I ever did - I suppose it's there for some complicated HTTP stuff that doesn't really happen, but who knows), not just try again...

Member

It will give you the error if the response fails though, that part hasn't been changed at all.

However, another thing I thought of: we should actually be getting the response as JSON directly, instead of getting it as a string and then parsing it as one.

Collaborator Author

It will give you the error if the response fails though, that part hasn't been changed at all.

Well, no. Let's say I replace the Ok(text) = response.text() with Err(x) = response.text(), to simulate the failure (full executable code is in https://github.com/eqlabs/zksync-state-reconstruct/tree/inversion ). Then I can run cargo run reconstruct l1 --http-url https://eth.llamarpc.com --start-block 19428415 to check some blobs, and get

2024-03-21T06:57:31.341353Z  INFO url = https://api.blobscan.com/blobs/0xb1647613a7b52bae4aa8f359421d1c9e8144d62768ac00946afb81838d24a3f76172fdfd01ab09f5bfa2e1638954e513
2024-03-21T06:57:32.394116Z  INFO attempt 1: got response
2024-03-21T06:57:33.340344Z  INFO attempt 2: got response
2024-03-21T06:57:34.429335Z  INFO attempt 3: got response
2024-03-21T06:57:35.041117Z  INFO attempt 4: got response
2024-03-21T06:57:35.635881Z  INFO attempt 5: got response
2024-03-21T06:57:40.777653Z  INFO PROGRESS: [ 0%] CUR BLOCK L1: 19428483 L2: 460962 TOTAL BLOCKS PROCESSED L1: 68 L2: 0

No error is logged, and really I don't see how it could...

Collaborator Author
@vbar vbar Mar 21, 2024

However, another thing I thought of: we should actually be getting the response as a json directly instead of getting it as a string and then parsing it as one.

That link is for sending JSON, but yes, something like https://docs.rs/reqwest/latest/reqwest/struct.Response.html#method.json should be possible - except I don't see how to get the response text after failing to parse it as JSON... Really, all I want here is error handling - get response, report error on failure, then parse it, report error on failure... If that's more elaborate than some online Rust examples, well, so what?

Member

That code is testing that we can decode the response, not get the response. Getting the response is still handled with a match statement like it was previously.

I'll admit, I don't really see why we would need that level of error logging. Failing to parse it as JSON is in my mind more than good enough, since that tells the user exactly what went wrong - it's not a valid JSON response.

And for the error handling part, if you're insistent, I would highly recommend looking into more advanced thiserror uses, like the #[from] derive. That allows you to do what you want without polluting the code with so much boilerplate. If you go down this road, I'd also suggest making BlobError its own error type that can then be condensed into a ParseError.
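A sketch of the error design suggested here: a dedicated BlobError that carries the source string and converts into ParseError, so `?` works at both levels. The From impl is written out by hand to stay dependency-free; with thiserror it collapses to a `#[from]` attribute on the variant. Variant names are illustrative, not the crate's actual types.

```rust
/// Blob-specific failures; the offending source string travels in the error.
#[derive(Debug, PartialEq)]
enum BlobError {
    InvalidJson(String),
}

/// Higher-level parse error that absorbs blob failures.
#[derive(Debug, PartialEq)]
enum ParseError {
    Blob(BlobError),
}

// This impl is what thiserror's `#[from]` would generate.
impl From<BlobError> for ParseError {
    fn from(e: BlobError) -> Self {
        ParseError::Blob(e)
    }
}

fn decode_blob(text: &str) -> Result<Vec<u8>, BlobError> {
    if !text.starts_with('{') {
        // Keep the unparseable source in the error, as discussed above.
        return Err(BlobError::InvalidJson(text.to_string()));
    }
    Ok(vec![])
}

fn parse(text: &str) -> Result<Vec<u8>, ParseError> {
    // `?` converts BlobError into ParseError via the From impl.
    Ok(decode_blob(text)?)
}
```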

Collaborator Author

There are 2 async calls needed to get the response: send (inside BlobHttpClient::retrieve_url) and text. Both can fail, and both should be tested. I want to log the second one, precisely because I do not expect the error to happen.

I'll admit, I don't really see why we would need that level of error logging.

Well, to debug the problems that we've already had (e.g. the wrong URL returning JSON without a data attribute - looking at it, it's obvious it says NOT_FOUND, but a serde parser will lose that info).

Member

Gotcha, sure. In that case we should definitely be using the ? operator in combination with thiserror-style error handling, since that would be an unrecoverable error. See my comment above for details.

vbar added a commit that referenced this pull request Mar 21, 2024
@vbar vbar merged commit 424cdfd into main Mar 21, 2024
4 checks passed
@vbar vbar deleted the feat/blob-resolve branch March 21, 2024 14:38