bootstrap: implement --snapshot-blob and --build-snapshot
This patch introduces `--build-snapshot` and `--snapshot-blob` options
for creating and using user-land snapshots.

For the initial iteration, user-land CJS modules and ESM are not yet
supported in the snapshot, so only a single file can be snapshotted
(users can bundle their applications into a single script with their
bundler of choice to build a snapshot, though).

A subset of builtins should already work, and support for more builtins
is being added. This PR includes tests checking that the TypeScript
compiler and the marked markdown renderer (and the builtins they use)
can be snapshotted and deserialized.

To generate a snapshot using `snapshot.js` as entry point and write the
snapshot blob to `snapshot.blob`:

```
$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
```

To restore application state from `snapshot.blob`, with `index.js` as
the entry point script for the deserialized application:

```
$ echo "console.log(globalThis.foo)" > index.js
$ node --snapshot-blob snapshot.blob index.js
I am from the snapshot
```

Users can also use the `v8.startupSnapshot` API to specify an entry
point at snapshot building time, thus avoiding the need for an additional
entry script at deserialization time:

```
$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
$ node --snapshot-blob snapshot.blob
I am from the snapshot
```

Note that this patch only adds functionality to the `node` executable
for building run-time user-land snapshots; the generated snapshot is
stored in a separate file on disk. Building a single binary with both
Node.js and an embedded snapshot has already been possible with the
`--node-snapshot-main` option to the `configure` script if the user
compiles Node.js from source. Enabling the `node` executable to produce
a single binary that contains both Node.js and an embedded snapshot
without building Node.js from source is a different task, which should
be layered on top of the SEA (Single Executable Apps) initiative.

Known limitations/bugs that are being fixed upstream:

- V8 hits a DCHECK when deserializing certain mutated globals, e.g.
  `Error.stackTraceLimit` (it should work fine in the release build,
  however): https://chromium-review.googlesource.com/c/v8/v8/+/3319481
- Layout of V8's read-only heap can be inconsistent after
  deserialization, resulting in memory corruption:
  https://bugs.chromium.org/p/v8/issues/detail?id=12921

PR-URL: #38905
Refs: #35711
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
joyeecheung authored and ruyadorno committed Aug 23, 2022
1 parent 1b3fcf7 commit 3561514
Showing 17 changed files with 1,408 additions and 52 deletions.
76 changes: 76 additions & 0 deletions doc/api/cli.md
@@ -100,6 +100,62 @@ If this flag is passed, the behavior can still be set to not abort through
[`process.setUncaughtExceptionCaptureCallback()`][] (and through usage of the
`node:domain` module that uses it).

### `--build-snapshot`

<!-- YAML
added: REPLACEME
-->

> Stability: 1 - Experimental

Generates a snapshot blob when the process exits and writes it to
disk, which can be loaded later with `--snapshot-blob`.

When building the snapshot, if `--snapshot-blob` is not specified,
the generated blob will be written, by default, to `snapshot.blob`
in the current working directory. Otherwise it will be written to
the path specified by `--snapshot-blob`.

```console
$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js

# Run snapshot.js to initialize the application and snapshot the
# state of it into snapshot.blob.
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js

$ echo "console.log(globalThis.foo)" > index.js

# Load the generated snapshot and start the application from index.js.
$ node --snapshot-blob snapshot.blob index.js
I am from the snapshot
```

The [`v8.startupSnapshot` API][] can be used to specify an entry point at
snapshot building time, thus avoiding the need for an additional entry
script at deserialization time:

```console
$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
$ node --snapshot-blob snapshot.blob
I am from the snapshot
```

For more information, check out the [`v8.startupSnapshot` API][] documentation.

Currently, support for run-time snapshots is experimental in that:

1. User-land modules are not yet supported in the snapshot, so only
   a single file can be snapshotted. Users can bundle their applications
   into a single script with their bundler of choice before building
   a snapshot, however.
2. Only a subset of the built-in modules work in the snapshot, though the
   Node.js core test suite checks that a few fairly complex applications
   can be snapshotted. Support for more modules is being added. If any
   crashes or buggy behaviors occur when building a snapshot, please file
   a report in the [Node.js issue tracker][] and link to it in the
   [tracking issue for user-land snapshots][].

### `--completion-bash`

<!-- YAML
@@ -1105,6 +1161,22 @@ minimum allocation from the secure heap. The minimum value is `2`.
The maximum value is the lesser of `--secure-heap` or `2147483647`.
The value given must be a power of two.

### `--snapshot-blob=path`

<!-- YAML
added: REPLACEME
-->

> Stability: 1 - Experimental

When used with `--build-snapshot`, `--snapshot-blob` specifies the path
where the generated snapshot blob will be written. If not specified,
the generated blob will be written, by default, to `snapshot.blob`
in the current working directory.

When used without `--build-snapshot`, `--snapshot-blob` specifies the
path to the blob that will be used to restore the application state.

### `--test`

<!-- YAML
@@ -1727,6 +1799,7 @@ Node.js options that are allowed are:
* `--require`, `-r`
* `--secure-heap-min`
* `--secure-heap`
* `--snapshot-blob`
* `--test-only`
* `--throw-deprecation`
* `--title`
@@ -2100,6 +2173,7 @@ done
[ECMAScript module loader]: esm.md#loaders
[Fetch API]: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
[Modules loaders]: packages.md#modules-loaders
[Node.js issue tracker]: https://github.com/nodejs/node/issues
[OSSL_PROVIDER-legacy]: https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html
[REPL]: repl.md
[ScriptCoverage]: https://chromedevtools.github.io/devtools-protocol/tot/Profiler#type-ScriptCoverage
@@ -2130,6 +2204,7 @@ done
[`tls.DEFAULT_MAX_VERSION`]: tls.md#tlsdefault_max_version
[`tls.DEFAULT_MIN_VERSION`]: tls.md#tlsdefault_min_version
[`unhandledRejection`]: process.md#event-unhandledrejection
[`v8.startupSnapshot` API]: v8.md#startup-snapshot-api
[`worker_threads.threadId`]: worker_threads.md#workerthreadid
[conditional exports]: packages.md#conditional-exports
[context-aware]: addons.md#context-aware-addons
@@ -2145,4 +2220,5 @@ done
[security warning]: #warning-binding-inspector-to-a-public-ipport-combination-is-insecure
[semi-space]: https://www.memorymanagement.org/glossary/s.html#semi.space
[timezone IDs]: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
[tracking issue for user-land snapshots]: https://github.com/nodejs/node/issues/44014
[ways that `TZ` is handled in other environments]: https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html
36 changes: 21 additions & 15 deletions src/env.cc
@@ -248,17 +248,6 @@ std::ostream& operator<<(std::ostream& output,
   return output;
 }

-std::ostream& operator<<(std::ostream& output,
-                         const std::vector<PropInfo>& vec) {
-  output << "{\n";
-  for (const auto& info : vec) {
-    output << " { \"" << info.name << "\", " << std::to_string(info.id) << ", "
-           << std::to_string(info.index) << " },\n";
-  }
-  output << "}";
-  return output;
-}
-
 std::ostream& operator<<(std::ostream& output,
                          const IsolateDataSerializeInfo& i) {
   output << "{\n"
@@ -298,7 +287,7 @@ IsolateDataSerializeInfo IsolateData::Serialize(SnapshotCreator* creator) {
   for (size_t i = 0; i < AsyncWrap::PROVIDERS_LENGTH; i++)
     info.primitive_values.push_back(creator->AddData(async_wrap_provider(i)));

-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -352,7 +341,7 @@ void IsolateData::DeserializeProperties(const IsolateDataSerializeInfo* info) {

   const std::vector<PropInfo>& values = info->template_values;
   i = 0; // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \
@@ -1482,6 +1471,7 @@ std::ostream& operator<<(std::ostream& output,
 AsyncHooks::SerializeInfo AsyncHooks::Serialize(Local<Context> context,
                                                 SnapshotCreator* creator) {
   SerializeInfo info;
+  // TODO(joyeecheung): some of these probably don't need to be serialized.
   info.async_ids_stack = async_ids_stack_.Serialize(context, creator);
   info.fields = fields_.Serialize(context, creator);
   info.async_id_fields = async_id_fields_.Serialize(context, creator);
@@ -1676,7 +1666,7 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   info.should_abort_on_uncaught_toggle =
       should_abort_on_uncaught_toggle_.Serialize(ctx, creator);

-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -1693,6 +1683,22 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   return info;
 }

+std::ostream& operator<<(std::ostream& output,
+                         const std::vector<PropInfo>& vec) {
+  output << "{\n";
+  for (const auto& info : vec) {
+    output << " " << info << ",\n";
+  }
+  output << "}";
+  return output;
+}
+
+std::ostream& operator<<(std::ostream& output, const PropInfo& info) {
+  output << "{ \"" << info.name << "\", " << std::to_string(info.id) << ", "
+         << std::to_string(info.index) << " }";
+  return output;
+}
+
 std::ostream& operator<<(std::ostream& output,
                          const std::vector<std::string>& vec) {
   output << "{\n";
@@ -1774,7 +1780,7 @@ void Environment::DeserializeProperties(const EnvSerializeInfo* info) {

   const std::vector<PropInfo>& values = info->persistent_values;
   size_t i = 0; // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \
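
The two `operator<<` overloads added to `src/env.cc` separate element formatting from list formatting, so a single `PropInfo` can also be printed on its own. A minimal, self-contained sketch of that pattern follows. The `PropInfo` struct here is a simplified stand-in for the one in `src/env.h`, the two-space indent is an arbitrary choice, and `FormatProps` is a hypothetical helper that is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for PropInfo in src/env.h (assumption for this sketch).
struct PropInfo {
  std::string name;  // name for debugging
  uint32_t id;       // position in the property list
  size_t index;      // index in the snapshot
};

// Formats one entry, e.g. { "foo", 1, 2 }.
std::ostream& operator<<(std::ostream& output, const PropInfo& info) {
  output << "{ \"" << info.name << "\", " << info.id << ", " << info.index
         << " }";
  return output;
}

// Formats the list by delegating each element to the overload above.
std::ostream& operator<<(std::ostream& output,
                         const std::vector<PropInfo>& vec) {
  output << "{\n";
  for (const auto& info : vec) {
    output << "  " << info << ",\n";
  }
  output << "}";
  return output;
}

// Hypothetical helper: renders the list into a string for inspection.
std::string FormatProps(const std::vector<PropInfo>& vec) {
  std::ostringstream os;
  os << vec;
  assert(os.good());  // writing to an in-memory stream should not fail
  return os.str();
}
```

Calling `FormatProps` with one entry named `foo`, id `1` and index `2` yields a three-line string: an opening brace, the indented entry followed by a comma, and a closing brace.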
11 changes: 8 additions & 3 deletions src/env.h
@@ -580,7 +580,7 @@ typedef size_t SnapshotIndex;

 struct PropInfo {
   std::string name; // name for debugging
-  size_t id; // In the list - in case there are any empty entries
+  uint32_t id; // In the list - in case there are any empty entries
   SnapshotIndex index; // In the snapshot
 };

@@ -987,8 +987,9 @@ struct EnvSerializeInfo {
 struct SnapshotData {
   enum class DataOwnership { kOwned, kNotOwned };

-  static const size_t kNodeBaseContextIndex = 0;
-  static const size_t kNodeMainContextIndex = kNodeBaseContextIndex + 1;
+  static const uint32_t kMagic = 0x143da19;
+  static const SnapshotIndex kNodeBaseContextIndex = 0;
+  static const SnapshotIndex kNodeMainContextIndex = kNodeBaseContextIndex + 1;

   DataOwnership data_ownership = DataOwnership::kOwned;

@@ -1000,12 +1001,16 @@ struct SnapshotData {
   // TODO(joyeecheung): there should be a vector of env_info once we snapshot
   // the worker environments.
   EnvSerializeInfo env_info;
+
   // A vector of built-in ids and v8::ScriptCompiler::CachedData, this can be
   // shared across Node.js instances because they are supposed to share the
   // read only space. We use native_module::CodeCacheInfo because
   // v8::ScriptCompiler::CachedData is not copyable.
   std::vector<native_module::CodeCacheInfo> code_cache;

+  void ToBlob(FILE* out) const;
+  static void FromBlob(SnapshotData* out, FILE* in);
+
   ~SnapshotData();

   SnapshotData(const SnapshotData&) = delete;
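
The patch adds `kMagic`, `ToBlob()` and `FromBlob()` to `SnapshotData`, but their definitions are not part of this excerpt. As an illustration only, one common way to use a magic number is to write it at the start of the file and reject any file that does not begin with it. The sketch below assumes a made-up layout (magic, payload length, payload bytes, all in native byte order); `WriteBlob`, `ReadBlob` and `RoundTripThroughBlob` are hypothetical names, and the real blob format may well differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Value taken from SnapshotData::kMagic in the patch.
constexpr uint32_t kMagic = 0x143da19;

// Hypothetical layout: [magic: u32][length: u32][payload bytes].
bool WriteBlob(std::FILE* out, const std::string& payload,
               uint32_t magic = kMagic) {
  assert(out != nullptr);
  const uint32_t length = static_cast<uint32_t>(payload.size());
  return std::fwrite(&magic, sizeof(magic), 1, out) == 1 &&
         std::fwrite(&length, sizeof(length), 1, out) == 1 &&
         (length == 0 ||
          std::fwrite(payload.data(), 1, length, out) == length);
}

// Returns false if the file is truncated or does not start with kMagic.
bool ReadBlob(std::FILE* in, std::string* payload) {
  assert(in != nullptr && payload != nullptr);
  uint32_t magic = 0;
  uint32_t length = 0;
  if (std::fread(&magic, sizeof(magic), 1, in) != 1 || magic != kMagic) {
    return false;
  }
  if (std::fread(&length, sizeof(length), 1, in) != 1) return false;
  payload->resize(length);
  return length == 0 || std::fread(&(*payload)[0], 1, length, in) == length;
}

// Writes the payload to a temporary file and reads it back.
// Returns an empty string if any step fails.
std::string RoundTripThroughBlob(const std::string& payload,
                                 uint32_t magic = kMagic) {
  std::FILE* fp = std::tmpfile();
  if (fp == nullptr) return std::string();
  std::string restored;
  bool ok = WriteBlob(fp, payload, magic);
  std::rewind(fp);
  ok = ok && ReadBlob(fp, &restored);
  std::fclose(fp);
  return ok ? restored : std::string();
}
```

A check like this lets the loader fail early on a file that is not a snapshot blob, instead of attempting to deserialize arbitrary bytes.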
133 changes: 111 additions & 22 deletions src/node.cc
@@ -1148,38 +1148,127 @@ void TearDownOncePerProcess() {
   per_process::v8_platform.Dispose();
 }

+int GenerateAndWriteSnapshotData(const SnapshotData** snapshot_data_ptr,
+                                 InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+
+  // node:embedded_snapshot_main indicates that we are using the
+  // embedded snapshot and we are not supposed to clean it up.
+  if (result->args[1] == "node:embedded_snapshot_main") {
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+    if (*snapshot_data_ptr == nullptr) {
+      // The Node.js binary is built without embedded snapshot
+      fprintf(stderr,
+              "node:embedded_snapshot_main was specified as snapshot "
+              "entry point but Node.js was built without embedded "
+              "snapshot.\n");
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+  } else {
+    // Otherwise, load and run the specified main script.
+    std::unique_ptr<SnapshotData> generated_data =
+        std::make_unique<SnapshotData>();
+    result->exit_code = node::SnapshotBuilder::Generate(
+        generated_data.get(), result->args, result->exec_args);
+    if (result->exit_code == 0) {
+      *snapshot_data_ptr = generated_data.release();
+    } else {
+      return result->exit_code;
+    }
+  }
+
+  // Get the path to write the snapshot blob to.
+  std::string snapshot_blob_path;
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    snapshot_blob_path = per_process::cli_options->snapshot_blob;
+  } else {
+    // Defaults to snapshot.blob in the current working directory.
+    snapshot_blob_path = std::string("snapshot.blob");
+  }
+
+  FILE* fp = fopen(snapshot_blob_path.c_str(), "wb");
+  if (fp != nullptr) {
+    (*snapshot_data_ptr)->ToBlob(fp);
+    fclose(fp);
+  } else {
+    fprintf(stderr,
+            "Cannot open %s for writing a snapshot.\n",
+            snapshot_blob_path.c_str());
+    result->exit_code = 1;
+  }
+  return result->exit_code;
+}
+
+int LoadSnapshotDataAndRun(const SnapshotData** snapshot_data_ptr,
+                           InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+  // --snapshot-blob indicates that we are reading a customized snapshot.
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    std::string filename = per_process::cli_options->snapshot_blob;
+    FILE* fp = fopen(filename.c_str(), "rb");
+    if (fp == nullptr) {
+      fprintf(stderr, "Cannot open %s", filename.c_str());
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+    std::unique_ptr<SnapshotData> read_data = std::make_unique<SnapshotData>();
+    SnapshotData::FromBlob(read_data.get(), fp);
+    *snapshot_data_ptr = read_data.release();
+    fclose(fp);
+  } else if (per_process::cli_options->node_snapshot) {
+    // If --snapshot-blob is not specified, we are reading the embedded
+    // snapshot, but we will skip it if --no-node-snapshot is specified.
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+  }
+
+  if ((*snapshot_data_ptr) != nullptr) {
+    NativeModuleLoader::RefreshCodeCache((*snapshot_data_ptr)->code_cache);
+  }
+  NodeMainInstance main_instance(*snapshot_data_ptr,
+                                 uv_default_loop(),
+                                 per_process::v8_platform.Platform(),
+                                 result->args,
+                                 result->exec_args);
+  result->exit_code = main_instance.Run();
+  return result->exit_code;
+}
+
 int Start(int argc, char** argv) {
   InitializationResult result = InitializeOncePerProcess(argc, argv);
   if (result.early_return) {
     return result.exit_code;
   }

-  if (per_process::cli_options->build_snapshot) {
-    fprintf(stderr,
-            "--build-snapshot is not yet supported in the node binary\n");
-    return 1;
-  }
+  DCHECK_EQ(result.exit_code, 0);
+  const SnapshotData* snapshot_data = nullptr;

-  {
-    bool use_node_snapshot = per_process::cli_options->node_snapshot;
-    const SnapshotData* snapshot_data =
-        use_node_snapshot ? SnapshotBuilder::GetEmbeddedSnapshotData()
-                          : nullptr;
-    uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
-
-    if (snapshot_data != nullptr) {
-      NativeModuleLoader::RefreshCodeCache(snapshot_data->code_cache);
+  auto cleanup_process = OnScopeLeave([&]() {
+    TearDownOncePerProcess();
+
+    if (snapshot_data != nullptr &&
+        snapshot_data->data_ownership == SnapshotData::DataOwnership::kOwned) {
+      delete snapshot_data;
     }
+  });
+
+  uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
+
+  // --build-snapshot indicates that we are in snapshot building mode.
+  if (per_process::cli_options->build_snapshot) {
+    if (result.args.size() < 2) {
+      fprintf(stderr,
+              "--build-snapshot must be used with an entry point script.\n"
+              "Usage: node --build-snapshot /path/to/entry.js\n");
+      return 9;
+    }
-    NodeMainInstance main_instance(snapshot_data,
-                                   uv_default_loop(),
-                                   per_process::v8_platform.Platform(),
-                                   result.args,
-                                   result.exec_args);
-    result.exit_code = main_instance.Run();
+    return GenerateAndWriteSnapshotData(&snapshot_data, &result);
   }

-  TearDownOncePerProcess();
-  return result.exit_code;
+  // Without --build-snapshot, we are in snapshot loading mode.
+  return LoadSnapshotDataAndRun(&snapshot_data, &result);
 }

 int Stop(Environment* env) {
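
The control flow of the new `Start()` and its two helpers can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: `SnapshotMode`, `SelectSnapshotMode` and `ResolveBlobOutputPath` are hypothetical names, plain parameters stand in for `per_process::cli_options` and `result.args`, and the `node:embedded_snapshot_main` special case in build mode is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical labels for the branches taken in Start(); not Node.js names.
enum class SnapshotMode {
  kMissingEntryPoint,  // --build-snapshot without a script (exit code 9)
  kBuild,              // run the entry script, then write the blob
  kLoadFromBlob,       // --snapshot-blob without --build-snapshot
  kLoadEmbedded,       // try the snapshot embedded in the binary
  kNoSnapshot          // --no-node-snapshot and no --snapshot-blob
};

// arg_count mirrors result.args.size(): the executable plus positional args.
SnapshotMode SelectSnapshotMode(bool build_snapshot,
                                bool node_snapshot,
                                const std::string& snapshot_blob,
                                size_t arg_count) {
  assert(arg_count >= 1);
  if (build_snapshot) {
    return arg_count < 2 ? SnapshotMode::kMissingEntryPoint
                         : SnapshotMode::kBuild;
  }
  if (!snapshot_blob.empty()) return SnapshotMode::kLoadFromBlob;
  return node_snapshot ? SnapshotMode::kLoadEmbedded
                       : SnapshotMode::kNoSnapshot;
}

// Output path used in build mode; defaults to snapshot.blob in the
// current working directory when --snapshot-blob is not given.
std::string ResolveBlobOutputPath(const std::string& snapshot_blob) {
  return snapshot_blob.empty() ? std::string("snapshot.blob") : snapshot_blob;
}
```

Note that `--snapshot-blob` changes meaning with the mode: in build mode it names the output file, while in load mode it names the input file. In the real code the embedded snapshot may also be absent, in which case `GetEmbeddedSnapshotData()` returns `nullptr` and the process starts without a snapshot.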