Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP: Support http:// https:// and ssl:// stream wrappers #1093

Closed
wants to merge 3 commits into from

Conversation

adamziel
Copy link
Collaborator

@adamziel adamziel commented Mar 7, 2024

What is this PR doing?

Intercepts all network traffic coming from PHP and handles http:// and https:// requests using fetch(). This enables using the native networking features of PHP without implementing custom transport classes.

🚧 Work in progress – this PR needs discussion and cleaning up before it can be shipped 🚧

How is it implemented?

Emscripten can be configured to stream all network traffic through a WebSocket. @php-wasm/node and wp-now use that to access the internet via a local WebSocket->TCP proxy, but the in-browser version of WordPress Playground exposes no such proxy.

This PR ships a "fake" WebSocket implementation that doesn't initiate any ws:// connection. Instead, it mocks the WebSocket interface and analyzes the connection address and transmitted bytes to infer a corresponding fetch() call.

In case of HTTP, it parses the request text, extracts the method, path, and headers (body TBD), and feeds that information to a fetch() call. Then, as the response status, headers, and the data stream comes in, it rewrites it as raw bytes and pretends to emit them as incoming WebSocket data.

In case of HTTPS, it uses the node-forge package to start a HTTPS server with a self-signed certificate that is also added to PHP CA store via the openssl.cafile PHP.ini setting. The outbound traffic is piped to node-forge which handles the SSL handshake, the encryption, and yields unencrypted HTTP request bytes. They are treated the same as in the previous paragraph, and then the response is piped back to node-forge for encryption and then, finally, piped back to PHP.

The current implementation is naive and optimistically assumes the data is intended for either HTTP or HTTPS use.

Upsides

  • file_get_contents(), fopen(), fsockopen() and other functions using stream wrappers will now work with http://, https://, and ssl:// URLs!
  • libcurl can be now be supported without any customizations as the network traffic handler doesn't care about the library that initiated the connection.

Downsides

  • node-forge doesn't seem to work well with the latest TLS so we'd have to force-downgrade it in PHP. It doesn't seem like a big deal in the browser because all SSL security is provided by fetch(), but it's something to note. This approach is , of course, unsuitable and unnecessary outside of web browsers.
  • Raw sockets and CORS-less URLs still can't be supported this way.
  • I'm not sure whether we can reliably distinguish between HTTP / HTTPS traffic and a byte transmission via fsockopen('ssl://somesite.com'). Perhaps not. In this case we could ship a naive heuristic that would cover the majority of cases, and then patch PHP to provide an explicit flag "I'm about to request something via HTTPS".
  • node-forge is slow and the certificates are generated synchronously. It would be far better to use the browser-native asynchronous API like crypto.subtle.generateKey().
  • HTTPS adds noticeable overhead. Sticking to the FetchTransport in WordPress sounds like a good idea since it triggers fetch() directly and without an encryption layer.
  • I think each domain requires a separate cert, which means a bit of slowness added to the first request to every domain. That sounds like a fair trade-off for providing the much-requested networking support.

Remaining work

  • Discuss the approach
  • Clean up the code and the import structure
  • Remove as much synchronous processing as possible
  • Generate SSL certificates lazily

Testing Instructions

That Blueprint above runs the following PHP code:

<?php
echo 'Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: ';
var_dump(strlen(file_get_contents('https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip')));

Related to #724

cc @dmsnell @bgrgicak @brandonpayton @ThomasTheDane

Copy link
Collaborator

@bgrgicak bgrgicak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see progress on this 🚀

Should this eventually become a separate package?

A naive question. Is HTTPS necessary or could we just use HTTP internally?

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

@adamziel When I opened the test link (on Linux) I got back an error Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: int(1887).

@adamziel
Copy link
Collaborator Author

adamziel commented Mar 8, 2024

That's the expected output and not an error :-)

@adamziel
Copy link
Collaborator Author

adamziel commented Mar 8, 2024

A naive question. Is HTTPS necessary or could we just use HTTP internally?

How would that work?

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

That's the expected output and not an error :-)

I should read all the testing instructions before I comment 😕

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

How would that work?

I imagined that we could just send these requests without having an SSL server.
WordPress and CURL requests can be configured to ignore SSL certificates.
But that's probably a bad idea.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is still early on in its development?

were there any specific things you'd like review on?

@@ -44,6 +48,560 @@ const fakeWebsocket = () => {
};
};

async function generateKeys() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and createCertificate() would both be very good functions to document, specifically why they exist, what kind of keys and certificates they generate, and why

},
{
name: 'localityName',
value: 'Blacksburg',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need all of these attributes do we? if not, maybe we should have a good explanation for why we're using the very specific values we are. we could probably make up a virtual/imaginary Playground land in WordPressWorld so that if someone were to examine the certificates it would be clearer that these are for a virtual system and not supposed to be actual legitimate signed certificates

sslCA: true,
emailCA: true,
objCA: true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we surely don't need all these capabilities in the certificate do we? it shouldn't hurt anything, but do we expect, for example, the Playground to sign email signatures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, most of these should be fine to remove

10
);

const ws = this;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy/paste or AI code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick&dirty prototype, ws is used in a function in line 316 below where this refers to a different object.

@adamziel
Copy link
Collaborator Author

were there any specific things you'd like review on?

Mostly the general idea and approach in case there was something off with it.

@dmsnell
Copy link
Member

dmsnell commented Mar 25, 2024

Summarizing a conversation @adamziel and I had: this seems like a fine approach, though I think it's a long way away from where it needs to be with naming and documentation.

This is essentially a TLS proxy standing as a middleman between the PHP code and JS code. What it isn't is writing our own security layer, and this is because ultimately all TLS connections on the JS side will be enforced with the security mechanisms that the browser provides.

Proxying through fetch() is challenging because it requires that we grab the data out of the encrypted stream and present it to PHP in a new encrypted stream, acting like it was never translated, thus the middleman.

It looks like we can provide a ReadableStream to fetch as the body of the network request so I think it's even possible to go beyond what's here and provide a generalized socket interface into PHP. I was on my way to testing that out when I ran out of time. There's still the challenge of detecting when a TLS handshake is beginning, but maybe that's not too hard to do.

@adamziel adamziel added this to the PHP Feature Parity milestone Apr 4, 2024
adamziel added a commit that referenced this pull request Apr 29, 2024
Ships the Node.js version of PHP built with `--with-libcurl` option to support the curl extension.

It also changes two nuances in the overall PHP build process:

* It replaces the `select(2)` function using `-Wl,--wrap=select` emcc
option instead of patching PHP source code – this enables supporting
asynchronous `select(2)` in curl without additional patches.
* Brings the `__wrap_select` implementation more in line with
`select(2)`, add support for `POLLERR`.
* Adds support for polling file descriptors that represent neither child
processes nor streams in `poll(2)` – that's because `libcurl` polls
`/dev/urandom`.

Builds on top of and supersedes
#1133

## Debugging Asyncify problems

The [typical way of resolving Asyncify
crashes](https://wordpress.github.io/wordpress-playground/architecture/wasm-asyncify/)
didn't work during the work on this PR. Functions didn't come up in the
error messages and even raw stack traces. The reasons are unclear.

[The JSPI build of
PHP](#1339) was
more helpful as it enabled logging the current stack trace in all the
asynchronous calls, which quickly revealed all the missing
`ASYNCIFY_ONLY` functions. This is the way to debug any future issues
until we fully migrate to JSPI.

## Testing Instructions

Confirm the CI checks pass. This PR ships a few new tests specifically
targeting networking with curl.


## Related resources

* #85
* #1093

---------

Co-authored-by: Adam Zieliński <adam@adamziel.com>
Co-authored-by: MHO <yannick@chillpills.io>
adamziel added a commit that referenced this pull request May 15, 2024
…1389)

Aligns the boot process between the in-browser Playground Remote and
Node-oriented Playground CLI.

With this PR, both apps use a similar `createPHP()` function that:

* Sets up the SAPI name
* Sets up PHP ini entries
* Sets up the `/phpinfo.php` route
* Sets up platform-level mu0plugins
* Proxies filesystem directories from secondary PHP instances to primary
* Sets up PHP runtime rotation to avoid OOM errors in long-running
primary processes

There are still the following discrepancies:

* The in-browser PHP sets up a SAPI name conditionally, Node.js one
always uses `cli` (it probably shouldn't)
* The in-browser PHP uses a custom spawn handler
* The in-browser PHP uses a different set of php.ini directives
* The in-browser PHP loads more mu-plugins
* The Node.js PHP sets up CA certificates for HTTPS connections (the
in-browser PHP [will fake the CA chain
eventually](#1093))

This is the first step towards a consistent Boot Protocol, see
#1379 for
more details.

## Testing Instructions

* Confirm the CI checks work
* Run `bun packages/playground/cli/src/cli.ts server --login`, confirm
the server starts without issues, test wp-admin and HTTPS-reliant
features like the plugin directories.

We'll need a set of unit tests for these new boot-related features,
let's create them sooner than later.
adamziel added a commit that referenced this pull request May 15, 2024
…1389)

Aligns the boot process between the in-browser Playground Remote and
Node-oriented Playground CLI.

With this PR, both apps use a similar `createPHP()` function that:

* Sets up the SAPI name
* Sets up PHP ini entries
* Sets up the `/phpinfo.php` route
* Sets up platform-level mu0plugins
* Proxies filesystem directories from secondary PHP instances to primary
* Sets up PHP runtime rotation to avoid OOM errors in long-running
primary processes

There are still the following discrepancies:

* The in-browser PHP sets up a SAPI name conditionally, Node.js one
always uses `cli` (it probably shouldn't)
* The in-browser PHP uses a custom spawn handler
* The in-browser PHP uses a different set of php.ini directives
* The in-browser PHP loads more mu-plugins
* The Node.js PHP sets up CA certificates for HTTPS connections (the
in-browser PHP [will fake the CA chain
eventually](#1093))

This is the first step towards a consistent Boot Protocol, see
#1379 for
more details.

## Testing Instructions

* Confirm the CI checks work
* Run `bun packages/playground/cli/src/cli.ts server --login`, confirm
the server starts without issues, test wp-admin and HTTPS-reliant
features like the plugin directories.

We'll need a set of unit tests for these new boot-related features,
let's create them sooner than later.
@adamziel adamziel mentioned this pull request May 17, 2024
@adamziel
Copy link
Collaborator Author

node-forge is quite slow when it comes to certificate generation, perhaps https://pkijs.org/docs/examples/certificates-and-revocation/create-and-validate-certificate would be faster

@adamziel
Copy link
Collaborator Author

adamziel commented Sep 9, 2024

Surfacing this comment:

I got this from Claude, it's probably incorrect but it's fast :D It may or may not be a good starting point for CA cert generation:

/**
 * Generate a CA.pem certificate pair dynamically in the browser with no dependencies
 * using just the Browser-native crypto API.
 */

async function generateCaPem() {
	const certInfo = {
		serialNumber: '1',
		validity: {
			notBefore: new Date(),
			notAfter: new Date(Date.now() + 1000 * 60 * 60 * 24 * 365),
		},
		subject: {
			commonName: 'Root CA',
		},
		issuer: {
			commonName: 'Root CA',
		},
		extensions: {
			basicConstraints: {
				critical: true,
				cA: true,
			},
			keyUsage: {
				digitalSignature: true,
				keyCertSign: true,
			},
		},
	};

	const crypto = window.crypto;
	const encoder = new TextEncoder();
	const decoder = new TextDecoder();

	const caKey = await crypto.subtle.generateKey(
		{
			name: 'RSASSA-PKCS1-v1_5',
			modulusLength: 2048,
			publicExponent: new Uint8Array([1, 0, 1]),
			hash: 'SHA-256',
		},
		true,
		['sign', 'verify']
	);

	// Create a simple ASN.1 structure for the certificate
	const tbs = encoder.encode(JSON.stringify({
		version: 3,
		serialNumber: certInfo.serialNumber,
		issuer: certInfo.issuer,
		subject: certInfo.subject,
		validity: {
			notBefore: certInfo.validity.notBefore.toISOString(),
			notAfter: certInfo.validity.notAfter.toISOString(),
		},
		extensions: certInfo.extensions,
	}));

	const signature = await crypto.subtle.sign(
		{
			name: 'RSASSA-PKCS1-v1_5',
		},
		caKey.privateKey,
		tbs
	);

	// Combine TBS and signature into a simple certificate structure
	const cert = encoder.encode(JSON.stringify({
		tbsCertificate: decoder.decode(tbs),
		signatureAlgorithm: 'sha256WithRSAEncryption',
		signatureValue: btoa(String.fromCharCode(...new Uint8Array(signature))),
	}));

	const caPem = `-----BEGIN CERTIFICATE-----\n${btoa(decoder.decode(cert))}\n-----END CERTIFICATE-----`;
	const caKeyPem = await exportKeyToPem(caKey.privateKey);

	return { caPem, caKeyPem };
}

async function exportKeyToPem(key) {
	const exported = await crypto.subtle.exportKey('pkcs8', key);
	const exportedAsBase64 = btoa(String.fromCharCode(...new Uint8Array(exported)));
	return `-----BEGIN PRIVATE KEY-----\n${exportedAsBase64}\n-----END PRIVATE KEY-----`;
}

generateCaPem().then(({ caPem, caKeyPem }) => {
	console.log({ caPem, caKeyPem });
});

@adamziel
Copy link
Collaborator Author

node-forge is cool, but it can also be slow. Down the road we could potentially replace it with a WASM module

@adamziel adamziel force-pushed the trunk branch 2 times, most recently from 680cd19 to 2e376d2 Compare October 4, 2024 09:24
adamziel added a commit that referenced this pull request Oct 15, 2024
Adjusts setupFetchNetworkTransport() to funnel all PHP-sourced
network requests through the CORS proxy.

This is intentionally not adding that logic to the `fetch` event handler
in the service worker, as that would also affect all the cross-origin
`fetch()` requests initiated by JavaScript shipped with WordPress
plugins.

 ## Changes in this PR

* Adds a `corsProxyUrl` option to `setupFetchNetworkTransport()`
* Adds the constants required by the CORS proxy to run in the local dev setup
* Configures the CORS proxy to provide the Access-Control-Allow-Origin
  headers in the local setup where it's running on a different host.

 ## Concerns

* Is it too permissive?
* I built this to turn Playground into an RSS reader with @akirk's
  Friends plugins. It can fetch the list of feeds as expected, but then
  it usess SimplePie to parse the feeds and that requires curl which we
  don't provide in the web version of Playground. #1093 is a blocker for
  that. Until that's resolved, I'm not compelled to prioritize this.

 ## Testing changes

TBD
adamziel added a commit that referenced this pull request Oct 23, 2024
…nslate it into fetch() (#1926)

Enables HTTPS requests from PHP via `file_get_contents()`, curl, and all
other networking mechanisms. This PR effectively performs a MITM attack
on the PHP instance to decrypt the outbound traffic, run the request
using `fetch()`, and then provide an encrypted response – everything as
if PHP was directly talking to the right server.

## How is it implemented?

Emscripten can be configured to stream all network traffic through a
WebSocket. `@php-wasm/node` and `wp-now` use that to access the internet
via a local WebSocket->TCP proxy, but the in-browser version of
WordPress Playground exposes no such proxy.

This PR ships a "fake" WebSocket class. Instead of starting a `ws://`
connection, it translates the raw HTTP/HTTPS bytes into a `fetch()`
call.

In case of HTTP, the raw request bytes are parsed into a Request object
with a body stream and passes it to `fetch()`. Then, as the response
status, headers, and the body arrive, they're stream-encoded as raw
response bytes and exposed as incoming WebSocket data.

In case of HTTPS, we the raw bytes are first piped through a custom
TCPConnection class as follows:

1. We generate a self-signed CA certificate and tell PHP to trust it
using the `openssl.cafile` PHP.ini option
1. We create a domain-specific child certificate and sign it with the CA
private key.
1. We start accepting raw encrypted bytes, process them as structured
TLS records, and perform the TLS handshake.
1. Encrypted tunnel is established
	* TLSConnection decrypts the encrypted outbound data sent by PHP
	* TLSConnection encrypts the unencrypted inbound data fed back to PHP

From there, the plaintext data is treated by the same HTTP<->fetch()
machinery as described in the previous paragraph.

## Implementation details

This PR ships:

* PHP.wasm bindings to pipe the outbound bytes through a `WebSocket <->
TLS <-> fetch()` pipeline.
* A subset of TLS 1.2 protocol implementation (parts of [RFC
5246](https://datatracker.ietf.org/doc/html/rfc5246), [RFC
6066](https://datatracker.ietf.org/doc/html/rfc6066.html), [RFC
4492](https://datatracker.ietf.org/doc/html/rfc4492#section-5.4), [RFC
8446](https://www.iana.org/go/rfc8446), [RFC
6070](https://www.ietf.org/rfc/rfc6070.txt))
* SSL certificate generator supporting CA certs signed certs

### TLS 1.2

* Parses all TLS record types: handshakes, alerts, application data.
* Performs the full TLS handshake required for ECDH encryption including
the necessary TLS 1.2 extensions.
* Correctly encrypts and decrypts all the post-handshake data.
* Uses `window.crypto()` for encryption.
* Only supports the `TLS1_CK_ECDHE_RSA_WITH_AES_128_GCM_SHA256` mode.
* Doesn't support multiple `ChangeCipherSpec` messages.

### SSL certificate generator

* CA certificate is generated at WASM boot (if networking is enabled)
* Host-specific certificate is generated at every request and signed
with CA private key
* Certificates are created using a custom
[ASN.1/DER](https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/)
encoder and a PEM exporter shipped in this PR
* Only RSA 2048 with SHA-256 supported today

## Avenues explored but not pursued

This work supersedes
#1093 where
`node-forge` was used. Here's why I'm moving to a custom TLS
implementation:

* `node-forge` runs everything synchronously and ships a lot of code.
`window.crypto` is async, faster, bundles less code, and is more
convenient than `node-forge`.
* With `node-forge`, every error made me question fundamentals like the
RSA implementation. With `window.crypto()`, I feel confident assuming
that encryption, hashing, signing etc. are implemented correctly.
* `node-forge` doesn't support TLS 1.3. Neither does this PR, but after
implementing TLS 1.2 I think adding TLS 1.3 support would be reasonably
easy

## Testing instructions

Go to the URL below and confirm you see "Hello-dolly.zip downloaded from
https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this
many bytes: int(1887)"

```
http://localhost:5400/website-server/?php=8.0&wp=6.6&networking=yes&language=&multisite=no&random=f1qv1twpssr#%7B%22landingPage%22:%22/network-test.php%22,%22preferredVersions%22:%7B%22php%22:%228.0%22,%22wp%22:%22latest%22%7D,%22phpExtensionBundles%22:%5B%22kitchen-sink%22%5D,%22steps%22:%5B%7B%22step%22:%22writeFile%22,%22path%22:%22/wordpress/network-test.php%22,%22data%22:%22%3C?php%20echo%20'Hello-dolly.zip%20downloaded%20from%20https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip%20has%20this%20many%20bytes:%20';%20var_dump(strlen(file_get_contents('https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip')));%22%7D%5D%7D
```

From there, you could manipulate the URL in the `file_get_contents()`
call to fetch a different file, file with no CORS headers, invalid URLs
etc. Confirm that each time PHP did something sensible, e.g. displayed
the length, displayed the error message, etc. It should never just hang.

Also, confirm the newly added CI tests work as expected.

## Remaining work

- [x] Add a solid unit and E2E test suite, especially for:
    - [x] Streaming: bytes, pause, more bytes
    - [x] `fetch()` exceptions
    - [x] Slow servers
    - [x] POST requests
- [x] Add abundant docstrings to explain what's happening at each stage
- [x] Core work
- [x] Be more strict in `httpRequestToFetch` about HTTP (plaintext) vs
HTTPS (go through TLS) vs other protocols (reject connection). For
example, check ports, pay attention to parsing errors, etc.
   - [x] Rebuild all the in-browser PHP.wasm versions
   - [x] Don't run any of this code when networking is disabled
- [x] Continue using the custom handler for the Requests library to
enable direct `fetch()` calls without the
encrypt->decrypt->encrypt->decrypt overhead.
- [x] Clean it up

## Follow up work

* Caching – perhaps as a follow-up
  * Ship a precomputed CA cert and private key
  * Memoize host-specific certificates

CC @brandonpayton @bgrgicak @dmsnell @mho22
@adamziel
Copy link
Collaborator Author

Superseded by #1926

@adamziel adamziel closed this Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

PHP feature parity
3 participants