Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom TLS verification logic #30

Closed
4 tasks done
tegefaulkes opened this issue Jun 9, 2023 · 34 comments
Closed
4 tasks done

Custom TLS verification logic #30

tegefaulkes opened this issue Jun 9, 2023 · 34 comments
Assignees
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@tegefaulkes
Copy link
Contributor

tegefaulkes commented Jun 9, 2023

Specification

We need the ability to override the TLS verification logic. This is essential for Polykey to check the identity of a connection by verifying the cert chain. We need to fail any connection that can't prove it's the Node we're trying to connect to.

The boring SslContextBuilder has a method called set_verify_callback which we can use to set a verify function. We need to translate a rust function into an async call we provide in the js space.

The method we provide it will be given the normal verification result as well as a x509 store reference we can use to inspect the cert chain. We return a bool corresponding to our verification result. This result will override the normal verification result.

The set_verify_callback also sets the verification mode. I think this would override what set_verify sets for the mode.

Additional context

Tasks

  • 1. Work out how to call an js async function from the rust code with napi.
  • 2. Work out the details of how set_verify_callback works.
  • 3. Add a way to set the verification callback when creating a quiche config.
  • 4. Add tests for custom verification.
@tegefaulkes tegefaulkes added the development Standard development label Jun 9, 2023
@tegefaulkes tegefaulkes self-assigned this Jun 9, 2023
@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Jun 9, 2023

Few observations about the verify_peer_callback

  1. The callback is called for each cert in the chain.
  2. We're given the cert store containing all the certs to verify
  3. The current cert is gotten with current_cert()
  4. The chain is gotten with chain()
  5. The order of the certs seems to be root cert to leaf. So if a CA cert was provided, that would be checked first and then the certs the other side provided.

Since the verify callback will be called multiple times, we'll have to handle that some how. I think Polykey verification logic expects to be run once over the whole chain. We can shoe horn that in by caching the result of the first run and returning that for the subsequent calls.

@CMCDragonkai
Copy link
Member

Remember that callback will have to be synchronous function. NAPI will be executing it from the rust side.

@tegefaulkes
Copy link
Contributor Author

I've been prototyping how to handle the callback. It has been a frustrating process since none of this is documented very well. I've identified 4-ish ways of calling a callback.

  1. Defining a callback with fn some_fun<T: Fn() -> Result<String>(callback: T){} as the type. This was the most promising.
  2. Defining a callback with fn some_fun(env: Env, callback: JsFunction){}.
  3. Using the CallContext with fn some_fun(ctx: CallContext){}.
  4. Using a ThreadsafeFunction with method 2.
Method 1.

This should be the simplest method. The callback can be defined and used as so.

#[napi]
fn test_callback<T: Fn(String, String) -> Result<bool>>(callback: T) {
  let result = callback("Hello".to_string(), "olleh".to_string()).unwrap();
  println!("result: {}", result);
}

It will work synchronously and should be thread-safe? I'm not sure. The problem with this however is that it doesn't work in a class factory method. @CMCDragonkai has made an issue for it at napi-rs/napi-rs#1374. So I don't think we can use this method here.

Method 2.

Method 2 is similar to 1 but uses JsValues when calling the callback. It depends on env to create new values e.g. env.create_string("hello"). This is not thread safe due to it's dependence on Env to create the JsValues.

We need thread safety to call the callback within the boring validate function.

#[napi]
pub fn callback_two(env: Env, callback: JsFunction) -> napi::Result<()> {
  let result = callback.call(None, &vec![env.create_string("hello").unwrap()]);
  let val = result
    .unwrap()
    .coerce_to_number()
    .unwrap()
    .get_int64()
    .unwrap();
  println!("result was: {}", val);
  Ok(())
}
Method 3.

I can't create a working example of this. The documentation gives examples of this using functions using the #[js_function] macro. But I can't find any info on napi functions defined this way. Defining a function this way doesn't result in a method on the quiche native module.

I don't think this method would work, I don't think it's thread-safe?

Method 4.

I think due to the thread safety, we have to use a ThreadsafeFunction when using the callback. A ThreadsafeFunction can be created and used as such.

#[napi]
pub fn call_threadsafe_function(callback: JsFunction) -> Result<()> {
  let tsfn: ThreadsafeFunction<(u32, String), ErrorStrategy::CalleeHandled> = callback
    .create_threadsafe_function(0, |ctx| {
      let(num, s) = ctx.value;
      println!("value: {}", num);
      println!("value: {}", s);
      ctx.env.create_string("Hello!").map(|v| vec![v])
    })?;
  println!("Making native call");
  thread::spawn(move || {
    tsfn.call(
      Ok((100, "hello!".to_string())),
      ThreadsafeFunctionCallMode::Blocking,
    );
    println!("Result!: {}", result.unwrap());
  });
  Ok(())
}

We should be able to use this within boring's validate_callback function and it will be thread safe. But with the current version of napi there is no way to get the result of calling the ThreadsafeFunction.

The tsfn.call() only returns a status and no value. Looking at the rust docs, the newer version provides call_async and call_with_return_value which both return the result. But we'll need to update the version of napi-rs we're using.

@tegefaulkes
Copy link
Contributor Author

Using tsfn.call_async I can get the result of the callback. But unfortunately I can't do this synchronously. I think it's possible to call an async function synchronously using tokio. but that will have it's own caveats.

@CMCDragonkai
Copy link
Member

Ideally we should not introduce any rust based IO runtimes like Tokio.

If it cannot be used in a factory method, perhaps it should be set after construction of the rust struct? This is supposed into the config object right? Then why not call a method on the config object or does it have to be part of the boring SSL context?

@tegefaulkes
Copy link
Contributor Author

Acording to https://napi.rs/docs/compat-mode/concepts/thread-safe-function, any function has to be converted to a ThreadsafeFunction if we want to call it between threads. And a ThreadsafeFunction can only be called asynchronously.

This means it's not possible to use any js callback functions within the verify_callback for the following reasons.

  1. We can only use a ThreadsafeFunction across threads.
  2. We can only get the result of a ThreadsafeFunction asynchronously.
  3. The verify_callback we provide to set_verify_callback can only be synchronously.

Note that this is only a problem if we want to get the result of the callback. It is possible to call the callback synchronously and provide information such as the cert PEMs and the internal validation result.

So now I think we need to think about a work-around. Strictly we only need to override the verify failing due to a missing CA cert. The cert chain can be verified on the application level and any errors and connection rejection can be handled there.

The callback can used to signal the secured event and provide the certificate information.

So we can side-step this and move forward with the following steps.

  1. Hard code a verify function to allow self-signed cert chains.
  2. Have the hard-coded verify function call a callback to trigger the securedP event.
  3. Have the application handle any application specific cert validation and manually shutdown the connection with an app level error.

@tegefaulkes
Copy link
Contributor Author

It's not ideal but I have something workable.

  1. I've added a flag to withBoringSslCtx called verify_allow_fail. If both verify_peer and verify_allow_fail are set to true then the peer must send it's certs, but any specific problems with certs are ignored. So if no certs are provided then we get a TLSFail. Providing a self-signed or incomplete chain will not result in a failure.
  2. Added a verify_callback which is called with the details of each cert. It will be called in order of root -> inter -> leaf. It will be called separately for each cert in the chain. This will be useful for knowing when the remote certs have been received. We can get most of this information from alternative means but it's a good event for knowing when the connection is secured.

The information being provided to the callback has the following format. A depth of 0 is the last cert to be verified in the chain.

  {
    preSuccess: true,
    errorMessage: 'ok',
    depth: 0,
    length: 2,
    cert: '-----BEGIN CERTIFICATE-----\n' +
      'MIIDLjCCAhagAwIBAgIRAOP1Hv5+GXJMxyBvMi6IFSAwDQYJKoZIhvcNAQELBQAw\n' +
      'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODU4WhcNMjQwNDE4MDU0ODU3\n' +
      'WjAPMQ0wCwYDVQQDEwRyc2ExMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC\n' +
      'AQEAu9DbhhyDmXQZhcUxmJMbUPd8w1YPnZjNTIDREWMtjRaLmeKylPL1cuL8Wmsj\n' +
      'wvFMbx2FhDjSXsQDzjPmFNIbT3F+i2VmolAvKUHG4JqUyfcN3E+rty1Lytvu3vDV\n' +
      '3OC8i1o0Y89QTXiRJIjfSBktPy3y8hskKJ6nAAbL0MQKRyKpzIV2k6epLd1Xx6dl\n' +
      'RtA4QHihcD5HedbD8OaVRLBbeJ/nx38Kb/vW1G6Uivd9yVdLOi306epKlks60bSW\n' +
      '1U5Ffy/2MkVW35/8JSyoAExOHBNix7LYaNB2ajCjLCjcrQI3AFcGuep8OPIS0MvD\n' +
      'y/srxR/PR1nWl3muhClGChkP+wIDAQABo4GDMIGAMA4GA1UdDwEB/wQEAwIFoDAd\n' +
      'BgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFFFzBNyAicB+\n' +
      'yUT8xh+isRcSixzkMB8GA1UdIwQYMBaAFPl6JPRYyNrrxk30v1MV8LTYwqLLMA8G\n' +
      'A1UdEQQIMAaCBHJzYTEwDQYJKoZIhvcNAQELBQADggEBAEQydSNhxW1v+f5J6v0K\n' +
      'vhWSqJmwTwhxhAvQctnophPPJb2cuyfcc0X1x+JlZFJD4eEHu5qvaYMVv7EY4Is8\n' +
      'xliZbQj4L9x0246q7j44XuRlK67rj4moIoT/hLBMwRM300LIABVUHvZjAy0Fl+hs\n' +
      'OuVl7m1iASHoJ07mTHoKkh4tdSjIYiMoF3Nbag4hu01iJYJIK48NzvrPxCzOb4zh\n' +
      'eMhQG5C/2bijWMSPI112ksCV1PT9skuj4/R0bCuMzdJ/RWxOl1sF9pXACwpi/y6M\n' +
      'dEPebTEWIOJtHB05IMb4nLy5DoOwW6o8ZrCjVZqdNps3LCJG2PrMkYVeKYYC3ecU\n' +
      'Pug=\n' +
      '-----END CERTIFICATE-----\n',
    chain: [
      '-----BEGIN CERTIFICATE-----\n' +
        'MIIDLjCCAhagAwIBAgIRAOP1Hv5+GXJMxyBvMi6IFSAwDQYJKoZIhvcNAQELBQAw\n' +
        'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODU4WhcNMjQwNDE4MDU0ODU3\n' +
        'WjAPMQ0wCwYDVQQDEwRyc2ExMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC\n' +
        'AQEAu9DbhhyDmXQZhcUxmJMbUPd8w1YPnZjNTIDREWMtjRaLmeKylPL1cuL8Wmsj\n' +
        'wvFMbx2FhDjSXsQDzjPmFNIbT3F+i2VmolAvKUHG4JqUyfcN3E+rty1Lytvu3vDV\n' +
        '3OC8i1o0Y89QTXiRJIjfSBktPy3y8hskKJ6nAAbL0MQKRyKpzIV2k6epLd1Xx6dl\n' +
        'RtA4QHihcD5HedbD8OaVRLBbeJ/nx38Kb/vW1G6Uivd9yVdLOi306epKlks60bSW\n' +
        '1U5Ffy/2MkVW35/8JSyoAExOHBNix7LYaNB2ajCjLCjcrQI3AFcGuep8OPIS0MvD\n' +
        'y/srxR/PR1nWl3muhClGChkP+wIDAQABo4GDMIGAMA4GA1UdDwEB/wQEAwIFoDAd\n' +
        'BgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHQYDVR0OBBYEFFFzBNyAicB+\n' +
        'yUT8xh+isRcSixzkMB8GA1UdIwQYMBaAFPl6JPRYyNrrxk30v1MV8LTYwqLLMA8G\n' +
        'A1UdEQQIMAaCBHJzYTEwDQYJKoZIhvcNAQELBQADggEBAEQydSNhxW1v+f5J6v0K\n' +
        'vhWSqJmwTwhxhAvQctnophPPJb2cuyfcc0X1x+JlZFJD4eEHu5qvaYMVv7EY4Is8\n' +
        'xliZbQj4L9x0246q7j44XuRlK67rj4moIoT/hLBMwRM300LIABVUHvZjAy0Fl+hs\n' +
        'OuVl7m1iASHoJ07mTHoKkh4tdSjIYiMoF3Nbag4hu01iJYJIK48NzvrPxCzOb4zh\n' +
        'eMhQG5C/2bijWMSPI112ksCV1PT9skuj4/R0bCuMzdJ/RWxOl1sF9pXACwpi/y6M\n' +
        'dEPebTEWIOJtHB05IMb4nLy5DoOwW6o8ZrCjVZqdNps3LCJG2PrMkYVeKYYC3ecU\n' +
        'Pug=\n' +
        '-----END CERTIFICATE-----\n',
      '-----BEGIN CERTIFICATE-----\n' +
        'MIIC8DCCAdigAwIBAgIRAJCwiiQ3TTulJhS9EK55cXUwDQYJKoZIhvcNAQELBQAw\n' +
        'EDEOMAwGA1UEAxMFcnNhQ0EwHhcNMjMwNDE5MDU0ODEyWhcNMjQwNDE4MDU0ODEy\n' +
        'WjAQMQ4wDAYDVQQDEwVyc2FDQTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC\n' +
        'ggEBAOPvjy4GlvFbX7iCAyt/jd0+eHlEFx/zMFwBPLZmqIA3HPJwj2cp5TY5NEWk\n' +
        'MM42qQLzCOmO/K4f9m6WUZw1J9tKGIEc7b+RQ/JisHz8iiFpEDayROrV4Cs/Nd0r\n' +
        'wvTautTXl6bd37rI4V8o0yHB0E/WcazAUJQDwvl7bVL942qB86LQqsEfxMIM7Ddq\n' +
        'S+9dyr67RG1MMw5Qm0889l3yDKkKOKy3I6w4dAdfCZSeZFWXl7GeIVPj1Rc6Fxt8\n' +
        'Tg79Cm5lgMnmqzGBZUS4eitJq6BhTyQfoylHayCkpqhcojVDbcN5D1bH/YTUh/38\n' +
        'klDaOwqCajpdu6VUr1eRGrl55A0CAwEAAaNFMEMwDgYDVR0PAQH/BAQDAgEGMBIG\n' +
        'A1UdEwEB/wQIMAYBAf8CAQEwHQYDVR0OBBYEFPl6JPRYyNrrxk30v1MV8LTYwqLL\n' +
        'MA0GCSqGSIb3DQEBCwUAA4IBAQDIACzucf/ePefGgLVTWMVvZdfW9cXuf0Ib3zWC\n' +
        '6RYbA+cBBIgn0SGToY613vIeKV1uPhFpzYJd+3eFVj9x1gVHTK7Hk6o5Tf780ZKN\n' +
        'd4TkoTp9WER/boaOB+77nH9WopvE+ien1GM2HTgOBUrClPGCZ2OAJBq4xxDDeU2T\n' +
        'rxlRD06692CdVh5uSUtCIsqoCUTi7s+C9p5WG1gOVIkafFmyR2i28E2n0rKlNzXQ\n' +
        'UTo3936xFx8cdIJVzvPFOvMhKrOTzCCcMD8NiXXHlX51HMy3fua16W8dfLKhJp4N\n' +
        '1PFD57chAacE/HKlnB4LugDLCl+MBhpuyxt+Ifk0geqeV15p\n' +
        '-----END CERTIFICATE-----\n'
    ]
  }

The callback has the type (err, data) => void and will not override any verification result.

Now to do verification on the application level we can do the following.

  1. start connection with verify_peer: true, verify_allow_fail: true to force certs but allow failure.
  2. wait for established
  3. wait for secured signified by the callback been called with depth === 0 indicating the whole cert chain should be available. (server certs are available immediately, client certs on a delay.)
  4. Verify the peer's cert chain.
  5. On failure, force close the connection with custom error.

I think this is very similar to the validation process the proxy did.

@tegefaulkes
Copy link
Contributor Author

One problem with this is that the connection can establish before TLS validation is completed. On the application level we need to make sure not to start any streams until the following conditions.

  1. Connection is established
  2. Connection is secured, signalled by the verificationCallback being called,
  3. Verify the remote certificate, force the connection close if verification fails.
  4. Wait for the peer to verify the connection and potentially close.

I'm not sure of a good way to know that the peer has verified, possibly the first stream can signal this?

@CMCDragonkai
Copy link
Member

We talked about this before. There is a trick about the exact quic packets that is received. On the whiteboard somewhere.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Jun 16, 2023

It seems that the verify callback is called for each error in the chain. At minimum that means the callback the called at least once for each cert in the chain if it succeeded with error message 'ok'. Or multiple times if the cert had an error.

This means that it's not clear that any callback is the last one to be made on the chain. Unless it's depth 0 and succeeded with an 'ok' message. I'll need to add a done flag or done callback to signal it. But I need a way in the internal callback to know. I'll do some more digging.

This can be very useful. It means we can override specific verification errors such as 'self signed certificate in certificate chain', 'unable to get local issuer certificate' or 'unable to verify the first certificate'.

I think given this, we can re-create our Polykey custom verification using a combination of two primitive overrides that we can hard code.

  1. Override the result of any specific error code to force it to succeed. Errors such as self signed cert can be allowed.
  2. Provide an optional array of public keys (NodeIds) and fail if the chain does not contain any keys in that list.

This should make verification failures part of quiches internal logic. but it doesn't solve a timing problem with the client connections. The client doesn't know when the server has finished verifying it. So in reality for any connections there are 3 events for when the connection can be considered fully established.

  1. Connection established returns true from quiche.
  2. Server TLS validation completes.
  3. Client TLS validation completes.
  4. We won't expect and TLSFAIL events here and can consider the connection successfully opened.

If I recall, the servers certs are verified early in the connection and happens before the connection is established?
But the client certs are verified after the connection is established and can result in the connection failing with TLSFail after we think the connection is open.

So we still need a way to know from both sides that both TLS verification steps have completed. This is a problem weather we hard-code the verification logic or handle it on the application level. Worst case we'll need to communicate this somehow on the application level. This could just be opening the first stream after verifying locally.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Jun 16, 2023

We talked about this before. There is a trick about the exact quic packets that is received. On the whiteboard somewhere.

Ah I remember that. Ok so we'll have a way of knowing that event on both ends.

There is one thing I need your feedback on. I have outlined two was of handling the verification now.

  1. Application level where we allow the hard coded verification to fail, do our own verification and force the connection to fail with an error.
  2. hard-coded level where we can configure some overrides in the boring verification_callback and re-create our application level verification logic.
  3. Combination of the two where we have the freedom of the hard-coded overrides and the application level.

Option 2. likely works better with the connection lifecycle events. but it's simple enough to do option 3.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Jun 16, 2023

Useful reference for verification error messages.
https://github.com/google/boringssl/blob/e1b8685770d0e82e5a4a3c5d24ad1602e05f2e83/crypto/x509/x509_txt.c

Error codes are also defined in https://github.com/google/boringssl/blob/e1b8685770d0e82e5a4a3c5d24ad1602e05f2e83/include/openssl/x509.h#L2654

So an error such as 'self signed certificate in certificate chain' refers to...
'self signed certificate in certificate chain' | X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN | 19.

@tegefaulkes
Copy link
Contributor Author

Based on the discussion just now. We're going to cut out the verify_callback and allow failure override and handle the custom TLS verification on the application level. It needs the following changes

  1. Connection needs to take a verification callback, this will be run as soon as the remote certs are available.
  2. The connection needs to know when the remote certs should be available. There should be a type of incoming packet where they should be available.
  3. If the verification callback fails, we need to force close the connection with an error resembling a TLSFail.

I'll need to look at the new native tests to divine the stage when certs should be available. Hopefully this doesn't vary based on certificate type or chain size.

@tegefaulkes
Copy link
Contributor Author

Client certs are not sent unless we try to verify the client. So verifyPeer for the server must be true for certs to be sent. So we'll have to override the verify callback anyway so we can set verifyPeer to true so that the client's certs are requested, but override the results.

@CMCDragonkai
Copy link
Member

How about this? Set verify peer to be true but then override the callback to return true if the peer's certificate is fine without the identity check. Then apply the application level check on top. Only do this if one gives a custom JS callback in the first place.

@CMCDragonkai
Copy link
Member

That way the library is compatible outside PK and still works for what we need inside PK.

@CMCDragonkai
Copy link
Member

Client certs are not sent unless we try to verify the client. So verifyPeer for the server must be true for certs to be sent. So we'll have to override the verify callback anyway so we can set verifyPeer to true so that the client's certs are requested, but override the results.

Wait I don't get this. How does the client know not to send the client server if server doesn't enable verify peer?

@tegefaulkes
Copy link
Contributor Author

After reviewing the tests connection lifecycle tests I can see that After each side verifies the TLS and it succeeds, the send a short frame. If the TLS failed, it sends a handshake frame with the error instead. This means we can use receiving a short frame as an indication that the peer succeeded all TLS checks.

The peer certs are available after receiving the initial frame. So the custom TLS verification can be done after receiving the initial frame. This needs to be done synchronously before sending the next frame out.

If a connection receives a short frame then we can reasonably assume the peer will not fail any verification and the connection is fully established.

Two changes need to be done

  1. Add the verifyCallback to the connection creation. This must be called after receiving the initial frame from the peer. Any failure of this callback should immediately close the connection with a descriptive error.
  2. Add a secured promise that resolves when a short frame is first received. or reject with any error such as the TLS fail.

@tegefaulkes
Copy link
Contributor Author

Ok, The custom TLS verification callback has been implemented. It is a simple callback that is called with the remote cert chain in PEM format (certs: Array<string>) => void. If the TLS fails then we throw an error. This is caught and translated into closing the connection.

Now that I think about it, It could just return {code: number, message: string} | undefined. No need to do anything with catching errors here. In any case I need to review how this will work with mapping errors to codes and vice versa.

We have a working secured event that triggers when the peer has verified without error. It is triggered by the first short header received that doesn't trigger the draining state. Creating the client awaits the secure event before resolving, resulting in an error if it failed.

I need to update this so it only triggers a short frame has be received AND sent. Otherwise the server side connection can trigger the secured event even when client verification fails. In this case the client accepts the server's certs and sends the short frame. Ideally the connection is secure when both sides give the thumbs up. It's a simple change.

I added some tests for demonstrating custom TLS success and failure. I also fixed two skipped tests that depended on the secured event.

Some things that still need to be done.

  1. Secure event should only trigger after both sides have secured without error.
  2. use a proper error code for failing TLS verification. Either pretend to be a normal TLS failure or use custom codes and error mapping. I think we just use the normal TLS fail code here? either 304 or 372?
  3. General review.

One last thing. I've been working off of the staging branch here but the changes need to be applied to the PR. There are two ways I can go about this.

  1. Apply the change to the staging branch so we have a working commit with the new feature. This could be pre-published and used. I'll have to re-base the PR on the new feature.
  2. Transplant the changes into the PR directly.

I'm leaning towards option 1 here. It keeps better history but is a little more annoying to PR. Not too bad though.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Jun 19, 2023

When this is done I can go two ways.

  1. Proceed with custom TLS on Polykey.
  2. Take over the js-quic refactoring PR with the monitor and life cycle changes.

Extra note: I need to update js-async-init version for locks dependency, same for polykey. async-init should manually publish.

@tegefaulkes
Copy link
Contributor Author

There may be a problem with OKP certs. If you fail the custom TLS verification callback on the client side, the server side doesn't respect the close frame with the error and will time out instead. I'm not sure why this happens, but it may be another issue with quiche.

I'll dig into it more after transplanting the changes into the main PR.

@CMCDragonkai
Copy link
Member

Can you check that the normal quiche based TLS failure works and examine the received failure code on the close frame and ensure that you are doing the exact code and whatever other information when you force a close frame from JS-based TLS failure. Based on my native tests the OKP works fine when quiche fails TLS normally.

@CMCDragonkai
Copy link
Member

Can you check that the normal quiche based TLS failure works and examine the received failure code on the close frame and ensure that you are doing the exact code and whatever other information when you force a close frame from JS-based TLS failure. Based on my native tests the OKP works fine when quiche fails TLS normally.

The only difference would be what the close frame/packet looks like to the other side. And I'm sure that if this is made exactly equal, we the same behaviour is expected since programs are deterministic.

@tegefaulkes
Copy link
Contributor Author

Unless we manually craft the packet with the error we can't get the same result. A normal TLS failure results in a handshake frame containing the failure. Us closing the connection with the same error code results in a CC frame to close the connection.

@CMCDragonkai
Copy link
Member

That's strange, I thought during TLS failure we still get a CC frame? Did you check this on Wireshark?

If truly there's no CC frame, then you want to force close if that what is happening. Check the native tests that I've done, there is certain conditions where it does wait to timeout.

Alternatively we can craft the packet if we have all the information to send into the UDP socket.

@CMCDragonkai
Copy link
Member

It's also because when the other side checks the actual error object, it gets the TLS code and other information. Where does information come from if not from a close frame?

@tegefaulkes
Copy link
Contributor Author

I need to test some expectation using the native connection tests. Based on the discussion, it seems that the handshake needs to complete before we can close the connection.

What I was doing was closing the connection immediately after receiving the peer certs and verifying them. What I should do is wait for the handshake to complete before verifying and closing the connection.

This means receiving and sending the initial frames.

I'll need to expand the native tests to check this expectation for each type of cert.

@tegefaulkes
Copy link
Contributor Author

Curious, I wasn't aware of this but it seems that connection.timeout() is actually aware of time passing. Calling it multiple times will return an updated time. Calling connection.onTimeout() will not cause a state transition unless the time has actually passed.

  console.log
    
    ----client before timeout----
    established: true,
    draining: true,
    closed: false,
    resumed: false,
    earlyData: false,
    peerCerts: Avaliable,
    timeout: 404,

      at Object.log (tests/native/quiche.tls.test.ts:2732:15)

  console.log
    
    ----client after sleeping 50ms----
    established: true,
    draining: true,
    closed: false,
    resumed: false,
    earlyData: false,
    peerCerts: Avaliable,
    timeout: 353,

      at Object.log (tests/native/quiche.tls.test.ts:2735:15)

  console.log
    
    ----client after sleeping timeout----
    established: true,
    draining: true,
    closed: true,
    resumed: false,
    earlyData: false,
    peerCerts: Avaliable,
    timeout: null,

      at Object.log (tests/native/quiche.tls.test.ts:2738:15)


@tegefaulkes
Copy link
Contributor Author

I've updated the native TLS tests in feature-key-gen branch. The earliest point we can close the connection from the application using connection.close is when the following conditions have been met.

  1. Connection is established.
  2. Connection has sent after establishing.

As an example, for a RSA connection the events for client <-> server will be

  1. initial <-
  2. initial ->
  3. handshake <-
  4. client established
  5. handshake ->
  6. server established
  7. client or server can close connection

For a Ed25519

  1. initial <-
  2. client established
  3. initial ->
  4. server established
  5. client or server can close connection

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 21, 2023 via email

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 21, 2023 via email

@tegefaulkes
Copy link
Contributor Author

Ok, Looks like everything is working now. The main changes were.

  1. custom verification is run after the first outgoing packet where established is true and certs are available. This ensures all the required handshake negotiation has completed.
  2. The securedP event is resolved after the first short is sent and received, connection is not draining and at least 1 packet has been received after the previous conditions. The first received packet after the shorts should be a CC or ping frame.
  3. A ping frame is sent after verification succeeds to ensure a frame is always sent after a short frame. There are some cases with certain certs where that was not the case.

I'll need to review and clean the changes but I think this is addressed now. Changes are still in the custom_verification branch just to have a commit for reference when transplanting the changes to the PR.

@tegefaulkes
Copy link
Contributor Author

Just looked at the spec, I really need to update it with details of the new changes.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jul 6, 2023

Closed by #26.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
Development

No branches or pull requests

2 participants