node_code_cache.cc and node_snapshot.cc generation is unreproducible #29108
/cc @joyeecheung |
The array contents are generated directly from outputs of V8 APIs...are those supposed to be reproducible? cc @hashseed |
The APIs in question are:
|
For node_code_cache.cc the reason is entropy. This fixes it, although it's probably not a good idea:
diff --git a/tools/code_cache/mkcodecache.cc b/tools/code_cache/mkcodecache.cc
index defa1462ce..5c1c5b8cb4 100644
--- a/tools/code_cache/mkcodecache.cc
+++ b/tools/code_cache/mkcodecache.cc
@@ -38,6 +38,11 @@ int main(int argc, char* argv[]) {
return 1;
}
+ v8::V8::SetEntropySource([] (unsigned char* buffer, size_t length) {
+ memset(buffer, 0, length);
+ return true;
+ });
+
std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
v8::V8::InitializePlatform(platform.get());
v8::V8::Initialize();
For node_snapshot.cc, I suspect there are additionally things like timestamps getting encoded in. |
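To make the mechanism in the patch above easier to see in isolation, here is a minimal sketch that does not link against V8. `EntropySourceFn` is a local stand-in that only mirrors the lambda signature used in the patch, and `FillSeed` is a hypothetical consumer invented for illustration; neither is a V8 API.

```cpp
#include <cstddef>
#include <cstring>

// Local stand-in: same shape as the lambda passed to SetEntropySource in the
// patch (buffer pointer and length in, success flag out). Not a V8 type.
using EntropySourceFn = bool (*)(unsigned char* buffer, std::size_t length);

// Deterministic "entropy": always fills the buffer with zeros.
bool ZeroEntropy(unsigned char* buffer, std::size_t length) {
  std::memset(buffer, 0, length);
  return true;
}

// Hypothetical consumer that asks the installed source for seed material.
// Returns false when no source is installed or the source reports failure.
bool FillSeed(EntropySourceFn source, unsigned char* out, std::size_t length) {
  if (source == nullptr) return false;
  return source(out, length);
}
```

With `ZeroEntropy` installed, every run hands the consumer the same seed bytes, which is why the generated output stops varying. It is also why the approach is questionable if the same source were ever used in a shipped binary.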
Looking into node_snapshot.cc more, after disabling entropy the offsets of the read-only snapshot data and the first context still shift around by 16 bytes for some inexplicable reason. That suggests the startup snapshot data is variable in size? The layout is this:
Note: those shifts also change the checksum in the header that's at offset 8-16. Disabling entropy at least ensures that the blob size is the same, otherwise it fluctuates in size by about 100 bytes... |
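When comparing blobs from two runs, it can help to locate the first byte at which they diverge and to check whether the sizes match. The helper below is a small sketch for that; `FirstDifference` is a hypothetical name, and loading the generated arrays into byte vectors is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the offset of the first byte at which the two blobs differ, or
// std::nullopt when they are byte-for-byte identical. If one blob is a strict
// prefix of the other, the length of the shorter blob is returned.
std::optional<std::size_t> FirstDifference(const std::vector<std::uint8_t>& a,
                                           const std::vector<std::uint8_t>& b) {
  const std::size_t common = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < common; ++i) {
    if (a[i] != b[i]) return i;
  }
  if (a.size() != b.size()) return common;
  return std::nullopt;
}
```

A result inside the 8-16 range would point at the header checksum, while a later offset combined with equal sizes would suggest shifted or reordered payload data.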
I enabled
Second run:
Third run:
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index f52cccb705..63e4b7dd40 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -19,6 +19,12 @@ int wmain(int argc, wchar_t* argv[]) {
int main(int argc, char* argv[]) {
#endif // _WIN32
+ v8::V8::SetFlagsFromString("--profile_deserialization");
+ v8::V8::SetEntropySource([] (unsigned char* buffer, size_t length) {
+ memset(buffer, 0, length);
+ return true;
+ });
+
if (argc < 2) {
std::cerr << "Usage: " << argv[0] << " <path/to/output.cc>\n";
return 1; |
How is entropy used in the code cache process? Like, what exact properties does it affect? I would have suspected that it could be related to hash table impl security, but anything going on there is static either way and does not change between Node.js launches as the code cache is generated at the build time, so it is predictable in runtime given that the Node.js version is fixed. |
The user-visible use of entropy is I poked at it some more and I discovered that changing
diff --git a/tools/code_cache/mkcodecache.cc b/tools/code_cache/mkcodecache.cc
index defa1462ce..bc45d46a43 100644
--- a/tools/code_cache/mkcodecache.cc
+++ b/tools/code_cache/mkcodecache.cc
@@ -26,6 +26,8 @@ int wmain(int argc, wchar_t* argv[]) {
int main(int argc, char* argv[]) {
#endif // _WIN32
+ v8::V8::SetFlagsFromString("--random_seed=42");
+
if (argc < 2) {
std::cerr << "Usage: " << argv[0] << " <path/to/output.cc>\n";
return 1;
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index f52cccb705..29f9e1cbcb 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -19,6 +19,8 @@ int wmain(int argc, wchar_t* argv[]) {
int main(int argc, char* argv[]) {
#endif // _WIN32
+ v8::V8::SetFlagsFromString("--random_seed=42");
+
if (argc < 2) {
std::cerr << "Usage: " << argv[0] << " <path/to/output.cc>\n";
return 1;
The hash seed is reseeded too (i.e., |
Use a fixed random seed to ensure that the generated sources are identical across runs. The final node binary still reseeds itself on start-up so there should be no security implications caused by predictable random numbers (e.g., `Math.random()`, ASLR, the hash seed, etc.) Fixes: nodejs#29108
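The reasoning in the commit message can be illustrated with a standalone sketch. `GenerateBlob` is a hypothetical stand-in for a build-time generator whose output depends on a PRNG, and `std::mt19937_64` is used only as a convenient deterministic generator; it is not the PRNG V8 uses. `FreshSeed` models a runtime that reseeds itself from the operating system at start-up.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical build-time generator: its output depends only on the seed.
std::vector<std::uint8_t> GenerateBlob(std::uint64_t seed, std::size_t length) {
  std::mt19937_64 rng(seed);
  std::vector<std::uint8_t> blob(length);
  for (auto& byte : blob) {
    byte = static_cast<std::uint8_t>(rng() & 0xFF);  // keep the low 8 bits
  }
  return blob;
}

// Models reseeding at start-up: draws a new seed from the OS on every call.
std::uint64_t FreshSeed() {
  std::random_device device;
  const std::uint64_t high = device();
  const std::uint64_t low = device();
  return (high << 32) | low;
}
```

Calling `GenerateBlob(42, n)` twice yields identical bytes, which is the property the build step needs. A runtime that seeds from `FreshSeed()` is unaffected by the fixed build-time seed, in line with the commit's claim that the final binary reseeds itself.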
I surely didn't. 😮 Thanks for the explanation! |
Use a fixed random seed to ensure that the generated sources are identical across runs. The final node binary still reseeds itself on start-up so there should be no security implications caused by predictable random numbers (e.g., `Math.random()`, ASLR, the hash seed, etc.)
Fixes: #29108
PR-URL: #29142
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Refs: nodejs/build#589
Currently, performing these steps on Linux:
1. Extract the source archive to /tmp/node-v12.8.0, build once with default configuration, rename the dir to /tmp/node-v12.8.0-1.
2. Extract the source archive to /tmp/node-v12.8.0, build again with default configuration, rename the dir to /tmp/node-v12.8.0-2.
3. Run diff -qr /tmp/node-v12.8.0-1 /tmp/node-v12.8.0-2.
Produces this result:
Note: it is important to extract from archive (to preserve the same modification timestamps) and build it from the same path, at least at this moment.
Note: on macOS, the diff is larger due to something unreproducible happening at the linker stage, so let's check on Linux only for now. But on macOS, node_code_cache.cc and node_snapshot.cc generation is also affected.
That blocks reproducible builds afaik, and at first glance it seems to be the only major cause behind mismatching binaries produced in the same environment with the process above.
Those files differ in generated array contents.
Perhaps someone knows what exactly is going on there?