feat: IPFS retrieval client #243

Merged
merged 25 commits into from
Jun 13, 2023
Changes from 14 commits
2bce5d3
test: setup env logger in integration tests
bajtos Jun 7, 2023
1a30f47
feat: add Lassie - IPFS retrieval client
bajtos Jun 7, 2023
5be5140
deps: upgrade lassie to v0.2.0
bajtos Jun 7, 2023
884f05a
fixup! remove unused lifetime param
bajtos Jun 7, 2023
93f2c31
feat: fetch('ipfs://bafycid')
bajtos Jun 7, 2023
f74672b
REVERT ME: temporarily disable aarch64
bajtos Jun 7, 2023
31a5b8c
Revert "REVERT ME: temporarily disable aarch64"
bajtos Jun 8, 2023
3c5826a
temporarily use Lassie from git main branch
bajtos Jun 8, 2023
f810edd
upgrade lassie to 0.3.0
bajtos Jun 8, 2023
7e4d990
fix tests + code cleanup
bajtos Jun 8, 2023
d6f4cdf
add support for URL and Request inputs
bajtos Jun 8, 2023
460b94d
fix clippy warning
bajtos Jun 8, 2023
19809a5
Merge remote-tracking branch 'origin/main' into feat-lassie
bajtos Jun 8, 2023
66b5837
add docs for module builders
bajtos Jun 8, 2023
ed48be7
Apply suggestions from code review
bajtos Jun 12, 2023
12d2c7f
fixup! prettier --write
bajtos Jun 12, 2023
3078d7f
add Go to build dependencies
bajtos Jun 12, 2023
75f8239
fix a bug in test assertions
bajtos Jun 12, 2023
00e396d
tweak setup-go config
bajtos Jun 12, 2023
78f629a
fixup!: go-version latest -> stable
bajtos Jun 13, 2023
7db4ee8
fixup! improve code comment
bajtos Jun 13, 2023
371b24a
fixup! remove non-ASCII characters from fetch.js
bajtos Jun 13, 2023
b2e1cca
fix syntax error to fix a failing test
bajtos Jun 13, 2023
831b954
document temp_dir setting in zinnia CLI
bajtos Jun 13, 2023
6093b21
Merge branch 'main' into feat-lassie
bajtos Jun 13, 2023
11 changes: 11 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 14 additions & 1 deletion cli/main.rs
@@ -1,6 +1,7 @@
mod args;

use std::rc::Rc;
use std::sync::Arc;
use std::time::Duration;

use args::{CliArgs, Commands};
@@ -9,7 +10,9 @@ use clap::Parser;
use zinnia_runtime::anyhow::{Context, Error, Result};
use zinnia_runtime::deno_core::error::JsError;
use zinnia_runtime::fmt_errors::format_js_error;
use zinnia_runtime::{colors, resolve_path, run_js_module, BootstrapOptions, ConsoleReporter};
use zinnia_runtime::{
colors, lassie, resolve_path, run_js_module, BootstrapOptions, ConsoleReporter,
};

#[tokio::main(flavor = "current_thread")]
async fn main() {
@@ -32,9 +35,19 @@ async fn main_impl() -> Result<()> {
&file,
&std::env::current_dir().context("unable to get current working directory")?,
)?;

let lassie_daemon = Arc::new(
lassie::Daemon::start(lassie::DaemonConfig {
temp_dir: None, // TODO: Should we use something like ~/.cache/zinnia/lassie?
Member

Station Core will pass $CACHE_ROOT:

https://github.com/filecoin-station/core/blob/7f07e5203c71366fa9bde713b0d28dee9ea0c51d/lib/zinnia.js#L48-L55.

Can we make Zinnia use that if it is set?

Member Author

I am already using CACHE_ROOT in zinniad, see here:

https://github.com/filecoin-station/zinnia/pull/243/files#diff-c74dac62db9c80f1be22978c93249f7b304e05ddb38131e7969efa13effaeb1eR45

The file cli/main.rs implements zinnia, the CLI people use locally when building Station modules. Let's explore together what a good developer experience would look like.

IMO:

  • We should not force zinnia users to always provide CACHE_ROOT. We don't ask them for FIL_WALLET_ADDRESS either. This way, users can type zinnia run main.js, and all works out of the box.
  • I guess allowing CLI users to control CACHE_ROOT can be helpful. I am not sure, though, if an env var provides good ergonomics. Would a project-specific config file be easier to use?
  • How important is this? Can we leave the current solution and open a follow-up GH issue to discuss what a good (and easy-to-implement) solution would look like?
  • Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

Member Author

> Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

I'll be implementing this cleanup in zinnia as part of #245

Member

I wasn't suggesting that CACHE_ROOT must always be passed. I thought that if it's not passed, Lassie should pick its own temp dir; if it is passed (as in Station Core), it should use that, just to keep all of the files together.

The primary use case for changing lassie's temp dir to me is not CLI usage but inside Station.

> Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

That's a great point! What does Lassie use by default rn? If it uses an OS cleaned up dir, I'd suggest to leave it at that and add a comment summarizing our discussion here

Member Author

> That's a great point! What does Lassie use by default rn? If it uses an OS cleaned up dir, I'd suggest to leave it at that and add a comment summarizing our discussion here

Quoting from Lassie comments:
https://github.com/filecoin-project/lassie/blob/afc2ee5a4bc6f5e22ef2cc69396cc9b25f57b854/pkg/lassie/lassie.go#L199-L201

// WithTempDir allows you to specify a custom temp directory for bitswap
// retrievals, used for a temporary block store for the preloader. The default
// is the system temp directory.

I think that should be good enough for now, even if we may end up leaving some temporary files behind when zinnia exits unexpectedly.
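
The fallback discussed in this thread (use CACHE_ROOT when it is set, otherwise let Lassie pick the system temp directory) could look roughly like the sketch below. It is written in JavaScript purely to illustrate the decision logic; the Zinnia CLI itself is Rust, and `resolveLassieTempDir` is a hypothetical name that does not exist in the codebase.

```javascript
const path = require("path");

// Hypothetical helper (not part of Zinnia): choose the temp dir to pass to Lassie.
// Returns null when CACHE_ROOT is unset or empty, meaning
// "let Lassie use the system temp directory".
function resolveLassieTempDir(env) {
  const cacheRoot = env.CACHE_ROOT;
  if (!cacheRoot) return null;
  // Mirrors zinniad, which keeps Lassie files in a "lassie" subdirectory of the cache root
  return path.join(cacheRoot, "lassie");
}

console.log(resolveLassieTempDir({})); // null
console.log(resolveLassieTempDir({ CACHE_ROOT: "cache" }));
```

With this shape, `zinnia run main.js` keeps working without any configuration, while Station Core can still keep all files under one root by exporting CACHE_ROOT. Any directory chosen this way is not cleaned by the operating system, so leftover files would need explicit cleanup.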

port: 0,
})
.context("cannot initialize the IPFS retrieval client Lassie")?,
);

let config = BootstrapOptions::new(
format!("zinnia/{}", env!("CARGO_PKG_VERSION")),
Rc::new(ConsoleReporter::new(Duration::from_millis(500))),
lassie_daemon,
None,
);
run_js_module(&main_module, &config).await?;
18 changes: 14 additions & 4 deletions daemon/main.rs
@@ -4,14 +4,14 @@ mod station_reporter;

use std::path::PathBuf;
use std::rc::Rc;
use std::sync::Arc;
use std::time::Duration;

use args::CliArgs;
use clap::Parser;

use log::{error, info};
use zinnia_runtime::anyhow::{anyhow, Context, Error, Result};
use zinnia_runtime::{get_module_root, resolve_path, run_js_module, BootstrapOptions};
use zinnia_runtime::{get_module_root, lassie, resolve_path, run_js_module, BootstrapOptions};

use crate::station_reporter::{log_info_activity, StationReporter};

@@ -27,7 +27,7 @@ async fn main() {
}

async fn run(config: CliArgs) -> Result<()> {
info!("Starting zinniad with config {config:?}");
log::info!("Starting zinniad with config {config:?}");

if config.files.is_empty() {
return Err(anyhow!("You must provide at least one module to run."));
@@ -41,6 +41,15 @@ async fn run(config: CliArgs) -> Result<()> {
let state_file = PathBuf::from(config.state_root).join("state.json");
log::debug!("Using state file: {}", state_file.display());

let lassie_config = lassie::DaemonConfig {
temp_dir: Some(PathBuf::from(config.cache_root).join("lassie")),
port: 0,
};
let lassie_daemon = Arc::new(
lassie::Daemon::start(lassie_config)
.context("cannot initialize the IPFS retrieval client Lassie")?,
);

log_info_activity("Module Runtime started.");

let file = &config.files[0];
@@ -63,6 +72,7 @@ async fn run(config: CliArgs) -> Result<()> {
Duration::from_millis(200),
module_name.into(),
)),
lassie_daemon,
module_root: Some(module_root),
no_color: true,
is_tty: false,
@@ -88,6 +98,6 @@ fn exit_with_error(error: Error) {
let error_string = format!("{error:?}");
let error_code = 1;

error!("{}", error_string.trim_start_matches("error: "));
log::error!("{}", error_string.trim_start_matches("error: "));
std::process::exit(error_code);
}
27 changes: 27 additions & 0 deletions docs/building-modules.md
@@ -87,6 +87,7 @@ import * as code from "../../other/code.js";
- [Web APIs](#web-apis)
- [Unsupported Web APIs](#unsupported-web-apis)
- [libp2p](#libp2p)
- [IPFS retrieval client](#ipfs-retrieval-client)

### Standard JavaScript APIs

@@ -329,6 +330,32 @@ Report that a single job was completed.

Call this function every time your module completes a job. It's ok to call it frequently.

### IPFS Retrieval Client

Zinnia provides a built-in IPFS retrieval client making it easy to fetch content-addressed data from
IPFS and Filecoin networks. You can retrieve data for a given CID using the web platform API `fetch`
and using the URL scheme `ipfs://`.

Example:

```js
const response = await fetch("ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni");
assert(response.ok);
const data = await response.arrayBuffer();
// data contains binary data in the CAR format
```

> Note: At the moment, Zinnia does not provide any tools to interpret the returned CAR data. We are
> discussing support for reading UnixFS data in
> [zinnia#245](https://github.com/filecoin-station/zinnia/issues/246).

Under the hood, Zinnia handles `ipfs://bafy...` requests by calling Lassie HTTP API. You can learn
more about supported parameters (request headers, query string arguments), response headers and
possible error status codes in
[Lassie's HTTP Specification](https://github.com/filecoin-project/lassie/blob/main/docs/HTTP_SPEC.md).
The format of CAR data returned by the retrieval client is described in
[Lassie's Returned CAR Specification](https://github.com/filecoin-project/lassie/blob/main/docs/CAR.md).
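
A module will usually want to handle failed retrievals explicitly rather than only asserting `response.ok`. The following is a minimal sketch under stated assumptions: `retrieveCar` is a hypothetical helper, not a Zinnia API, and the optional `fetchFn` parameter exists only so the function can be exercised with a stub outside of Zinnia.

```javascript
// Hypothetical helper (not provided by Zinnia): fetch the CAR bytes for a CID.
// `fetchFn` defaults to the global `fetch`; inside Zinnia that is the function
// that understands `ipfs://` URLs.
async function retrieveCar(cid, fetchFn = fetch) {
  const response = await fetchFn(`ipfs://${cid}`);
  if (!response.ok) {
    // The meaning of individual status codes is defined by Lassie's HTTP specification
    throw new Error(`Retrieval of ${cid} failed with HTTP status ${response.status}`);
  }
  // The body is binary data in the CAR format
  return new Uint8Array(await response.arrayBuffer());
}

// Usage inside a Zinnia module (top-level await is available in ES modules):
//   const car = await retrieveCar("bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni");
//   console.log(`Retrieved ${car.byteLength} bytes`);
```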

## Testing Guide

Zinnia provides lightweight tooling for writing and running automated tests.
2 changes: 2 additions & 0 deletions runtime/Cargo.toml
@@ -21,6 +21,8 @@ deno_fetch = "0.129.0"
deno_url = "0.105.0"
deno_web = "0.136.0"
deno_webidl = "0.105.0"
lassie = "0.3.0"

Neat. I didn't know there is a Rust client!

Member Author

It's a thin Rust wrapper embedding the original Go Lassie, I started the project three weeks ago :)

https://github.com/filecoin-station/rusty-lassie

# lassie = { git = "https://github.com/filecoin-station/rusty-lassie.git" }
log.workspace = true
once_cell = "1.18.0"
serde.workspace = true
1 change: 1 addition & 0 deletions runtime/ext.rs
@@ -52,6 +52,7 @@ deno_core::extension!(
"90_zinnia_apis.js",
"98_global_scope.js",
"internals.js",
"fetch.js",
"test.js",
"vendored/asserts.bundle.js",
"99_main.js",
2 changes: 1 addition & 1 deletion runtime/js/98_global_scope.js
@@ -23,7 +23,7 @@ import * as fileReader from "ext:deno_web/10_filereader.js";
import * as formData from "ext:deno_fetch/21_formdata.js";
import * as request from "ext:deno_fetch/23_request.js";
import * as response from "ext:deno_fetch/23_response.js";
import * as fetch from "ext:deno_fetch/26_fetch.js";
import * as fetch from "ext:zinnia_runtime/fetch.js";
import * as messagePort from "ext:deno_web/13_message_port.js";
import * as webidl from "ext:deno_webidl/00_webidl.js";
import DOMException from "ext:deno_web/01_dom_exception.js";
3 changes: 3 additions & 0 deletions runtime/js/99_main.js
@@ -36,6 +36,7 @@ import {
mainRuntimeGlobalProperties,
windowOrWorkerGlobalScope,
} from "ext:zinnia_runtime/98_global_scope.js";
import { setLassieUrl } from "ext:zinnia_runtime/fetch.js";

function formatException(error) {
if (ObjectPrototypeIsPrototypeOf(ErrorPrototype, error)) {
@@ -66,6 +67,8 @@ function runtimeStart(runtimeOptions) {

// deno-lint-ignore prefer-primordials
Error.prepareStackTrace = core.prepareStackTrace;

setLassieUrl(runtimeOptions.lassieUrl);
}

let hasBootstrapped = false;
66 changes: 66 additions & 0 deletions runtime/js/fetch.js
@@ -0,0 +1,66 @@
import { fetch as fetchImpl } from "ext:deno_fetch/26_fetch.js";
import { fromInnerResponse, toInnerResponse } from "ext:deno_fetch/23_response.js";
import { toInnerRequest, fromInnerRequest, Request } from "ext:deno_fetch/23_request.js";
import { guardFromHeaders } from "ext:deno_fetch/20_headers.js";

let ipfsScheme = "ipfs://";
let ipfsBaseUrl = undefined;

export function setLassieUrl(/** @type {string} */ value) {
ipfsBaseUrl = value + "ipfs/";
}

export function fetch(resource, options) {
let request = new Request(resource, options);
// Fortunately, Request#url is a string, not an instance of URL class
Member

What is fortunate about that?

Member Author

Great question! (The original answer got lost in the git history.)

The fetch API accepts a wide range of types for the "resource" argument. Quoting from https://developer.mozilla.org/en-US/docs/Web/API/fetch#parameters:

resource

This defines the resource that you wish to fetch. This can either be:

  • A string or any other object with a stringifier — including a URL object — that provides the URL of the resource you want to fetch.
  • A Request object.

If Request#url preserved the original value, I would need to figure out how our custom fetch wrapper could detect an object with a stringifier and call that stringifier to obtain the resource URL as a string.

What's fortunate: the conversion from "resource in one of the many supported formats" to "resource URL as a string" is already handled by the Request constructor.

Can you suggest how to improve my code comment to make this matter easier to understand for future readers?

Member

Ah, gotcha!

What do you think about

// Fortunately Request#url is always a string, no matter what was used to construct it

Member Author

I decided to write a longer comment, see 7db4ee8

  // The `resource` arg can be a string or any other object with a stringifier — including a URL
  // object — that provides the URL of the resource you want to fetch; or a Request object.
  // See https://developer.mozilla.org/en-US/docs/Web/API/fetch#parameters
  // Fortunately, Request's constructor handles the conversions, and Request#url is always a string.
  // See https://developer.mozilla.org/en-US/docs/Web/API/Request/url
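
The normalization described in this comment can be observed directly. The snippet below is an illustrative sketch that assumes a runtime with a global `Request` class (Deno, browsers, or Node.js 18 and newer); the `example.com` URL is a placeholder chosen for the demonstration.

```javascript
const target = "https://example.com/data";

// The four kinds of `resource` input accepted by fetch() and by the Request constructor
const inputs = [
  target,                       // plain string
  new URL(target),              // URL object
  { toString: () => target },   // any object with a stringifier
  new Request(target),          // an existing Request
];

for (const resource of inputs) {
  const request = new Request(resource);
  // Request#url is a string in every case, so a single startsWith() check is enough
  console.log(typeof request.url, request.url);
}
```

This is why the wrapper can test `request.url.startsWith(ipfsScheme)` without inspecting the type of the original argument.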

// See https://developer.mozilla.org/en-US/docs/Web/API/Request/url
if (request.url.startsWith(ipfsScheme)) {
return fetchFromIpfs(request);
} else {
return fetchImpl(request);
}
}

async function fetchFromIpfs(request) {
// Rewrite request URL to use Lassie
request = buildIpfsRequest(request);

// Call Deno's `fetch` using the rewritten URL to make the actual HTTP request
const response = await fetchImpl(request);

// Patch the response object to hide the fact that we are calling Lassie
// We don't want to leak Lassie's URL
return patchIpfsResponse(response);
}

// Deno's Fetch Request is a thin immutable wrapper around InnerRequest. In order to modify the
// request URL, we must convert Request to InnerRequest first, make changes on the inner object,
// and finally convert the InnerRequest back to a new Request instance.
function buildIpfsRequest(request) {
const inner = toInnerRequest(request);

inner.urlList = /** @type {(() => string)[]}*/ (inner.urlList).map((urlFn) => {
const url = urlFn();
if (!url.startsWith(ipfsScheme)) return urlFn;
const newUrl = ipfsBaseUrl + url.slice(ipfsScheme.length);
return () => newUrl;
});
inner.urlListProcessed = /** @type {string[]} */ (inner.urlListProcessed).map((url) =>
url.startsWith(ipfsScheme) ? ipfsBaseUrl + url.slice(ipfsScheme.length) : url,
);

return fromInnerRequest(inner, request.signal, guardFromHeaders(request.headers));
}

// Deno's Fetch Response is a thin immutable wrapper around InnerResponse. In order to modify the
// response URL, we must convert Response to InnerResponse first, make changes on the inner object,
// and finally convert the InnerResponse back to a new Response instance.
function patchIpfsResponse(response) {
const inner = toInnerResponse(response);

inner.urlList = /** @type {string[]} */ (inner.urlList).map((url) =>
url.startsWith(ipfsBaseUrl) ? "ipfs://" + url.slice(ipfsBaseUrl.length) : url,
);

return fromInnerResponse(inner, guardFromHeaders(response.headers));
}
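
Stripped of Deno's inner request and response plumbing, the URL rewriting done by `buildIpfsRequest` and `patchIpfsResponse` comes down to two string transformations. The sketch below illustrates that round trip in isolation; `toLassieUrl` and `toIpfsUrl` are hypothetical names, and the port `12345` is an arbitrary example, since the real port is assigned when the Lassie daemon starts.

```javascript
const IPFS_SCHEME = "ipfs://";

// Hypothetical helper: map an ipfs:// URL to the local Lassie HTTP endpoint.
// `lassieUrl` is expected to end with a slash, e.g. "http://127.0.0.1:12345/".
function toLassieUrl(url, lassieUrl) {
  const base = lassieUrl + "ipfs/";
  return url.startsWith(IPFS_SCHEME) ? base + url.slice(IPFS_SCHEME.length) : url;
}

// Hypothetical helper: the reverse mapping, applied to response URLs so that
// callers never see the address of the local Lassie daemon.
function toIpfsUrl(url, lassieUrl) {
  const base = lassieUrl + "ipfs/";
  return url.startsWith(base) ? IPFS_SCHEME + url.slice(base.length) : url;
}

const lassieUrl = "http://127.0.0.1:12345/"; // example value only
const rewritten = toLassieUrl("ipfs://bafyexample/path/to/file", lassieUrl);
console.log(rewritten); // http://127.0.0.1:12345/ipfs/bafyexample/path/to/file
console.log(toIpfsUrl(rewritten, lassieUrl)); // ipfs://bafyexample/path/to/file
```

URLs that do not use the `ipfs://` scheme pass through both helpers unchanged, which matches how the wrapper leaves ordinary HTTP requests to Deno's `fetch`.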
2 changes: 2 additions & 0 deletions runtime/lib.rs
@@ -18,4 +18,6 @@ mod reporter;
pub use console_reporter::*;
pub use reporter::*;

pub use lassie;

mod ext;
8 changes: 8 additions & 0 deletions runtime/runtime.rs
@@ -1,5 +1,6 @@
use std::path::PathBuf;
use std::rc::Rc;
use std::sync::Arc;

use deno_core::{located_script_name, serde_json, JsRuntime, ModuleSpecifier, RuntimeOptions};

@@ -35,12 +36,17 @@ pub struct BootstrapOptions {

/// Report activities
pub reporter: Rc<dyn Reporter>,

/// Lassie daemon to use as the IPFS retrieval client. We must use Arc here to allow sharing of
/// the singleton Lassie instance between multiple threads spawned by Rust's test runner.
pub lassie_daemon: Arc<lassie::Daemon>,
}

impl BootstrapOptions {
pub fn new(
agent_version: String,
reporter: Rc<dyn Reporter>,
lassie_daemon: Arc<lassie::Daemon>,
module_root: Option<PathBuf>,
) -> Self {
Self {
@@ -52,6 +58,7 @@ impl BootstrapOptions {
// See https://lotus.filecoin.io/lotus/manage/manage-fil/#public-key-address
wallet_address: String::from("t1abjxfbp274xpdqcpuaykwkfb43omjotacm2p3za"),
reporter,
lassie_daemon,
}
}

@@ -60,6 +67,7 @@ impl BootstrapOptions {
"noColor": self.no_color,
"isTty": self.is_tty,
"walletAddress": self.wallet_address,
"lassieUrl": format!("http://127.0.0.1:{}/", self.lassie_daemon.port()),
});
serde_json::to_string_pretty(&payload).unwrap()
}
11 changes: 10 additions & 1 deletion runtime/tests/fetch_api_tests.rs
@@ -8,8 +8,12 @@ use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;
use zinnia_runtime::{anyhow, deno_core, run_js_module, BootstrapOptions, RecordingReporter};

mod helpers;

#[tokio::test]
async fn fetch_reports_user_agent() -> Result<()> {
let _ = env_logger::builder().is_test(true).try_init();

let user_agent = "zinnia_fetch_api_tests agent/007";
let server_port = start_echo_server().await?;

@@ -29,7 +33,12 @@ assertArrayIncludes(request_lines, ["user-agent: {user_agent}"]);
&std::env::current_dir().context("unable to get current working directory")?,
)?;
let reporter = Rc::new(RecordingReporter::new());
let config = BootstrapOptions::new(user_agent.into(), reporter.clone(), None);
let config = BootstrapOptions::new(
user_agent.into(),
reporter.clone(),
helpers::lassie_daemon(),
None,
);
run_js_module(&main_module, &config).await?;
// the test passes when the JavaScript code does not throw
Ok(())
14 changes: 14 additions & 0 deletions runtime/tests/helpers/mod.rs
@@ -0,0 +1,14 @@
use std::sync::{Arc, OnceLock};

pub fn lassie_daemon() -> Arc<lassie::Daemon> {
static LASSIE_DAEMON: OnceLock<Result<Arc<lassie::Daemon>, lassie::StartError>> =
OnceLock::new();

let result = LASSIE_DAEMON
.get_or_init(|| lassie::Daemon::start(lassie::DaemonConfig::default()).map(Arc::new));

match result {
Ok(ptr) => Arc::clone(ptr),
Err(err) => panic!("could not start Lassie daemon: {err}"),
}
}