Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: race condition when reloading TLS certificates #4316

Open
yurishkuro opened this issue Mar 19, 2023 · 6 comments
Open

[Bug]: race condition when reloading TLS certificates #4316

yurishkuro opened this issue Mar 19, 2023 · 6 comments
Labels
bug help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@yurishkuro
Copy link
Member

What happened?

This is a continuation of this discussion: #4260 (comment)

Steps to reproduce

We can probably reproduce it by creating a unit test (which will run with -race) that does the following:

  • updates the certificates a couple of times
  • meanwhile, makes a couple of client/server calls with TLS using the same tls config

Expected behavior

We expect that changes to certificates in a server take effect on future client connections. Instead, the test described above should detect a race condition as we try to update the same certPools that are already being used by the server (and therefore will be read on new client connections).

Relevant log output

No response

Screenshot

No response

Additional context

The rough proposal would be as follows:

  • make the certWatcher the owner of certPools internally, accessible via getter (instead of passing them from options.go)
  • store those pools as atomic.Value
  • when reloading the certs and adding the to the pools, create new pools via copy first, and store them in atomic.Value
  • and most importantly, define tls.Config.GetConfigForClient function by returning a new config with certPools retrieved from certWatcher (similar to how we already define cfg.Get{Client}Certificate functions). This way the server will be using immutable tls configs and cert pools

NB: this probably would not address the issue of reloading root CA for the server or the client, perhaps we should not be even supporting it. GetConfigForClient would only allow supporting reloading of ClientCA on the server, per @tsaarni 's comment.

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Stogage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

@yurishkuro yurishkuro added bug help wanted Features that maintainers are willing to accept but do not have cycles to implement labels Mar 19, 2023
@ChillOrb
Copy link
Contributor

@yurishkuro I am trying to reproduce the error by writing a test case.

Is this the right approach to identify the problem you are referring in the issue?

  1. Can i create self signed server certificates and keys and store them as temp files inside func generateNewCertificateFiles()
  2. Setup tls config and spin up a server
  3. In another goroutine ,in a loop, use generateNewCertificateFiles() to create new certs and reload certs for local server.
  4. In another loop try to make get calls to local server.

@yurishkuro
Copy link
Member Author

Yes, and you can do this all in a unit test, such that if run with -race flag Go should reliably detect race condition.

For step 1 you don't need to generate new certs, we already have working pairs as text fixtures in the tlscfg package.

@ChillOrb
Copy link
Contributor

ChillOrb commented May 1, 2023

@yurishkuro sorry got busy.
I am having trouble in reproducing the race condition 1. this is the test function that i came up with to reproduce the error

  1. I am trying to create option with existing test certs in tlscfg package link
  2. Here i start tls server in background with option above link
  3. In another goroutine , generating new certificates using gen-certs.sh and generate new server config by calling cfg, err := options.Config(zap.NewNop()) then i update the server config server.TLSConfig = cfg this causes another race condition.
  4. In another go routine , crating client with options and getting new config , also making Get calls to background

Not able to reproduce race conditions at

WARNING: DATA RACE
Read at 0x00c0000105e8 by goroutine 22:
  crypto/x509.(*CertPool).addCertFunc()
      /opt/hostedtoolcache/go/1.20.2/x64/src/crypto/x509/cert_pool.go:189 +0x6b8
  crypto/x509.(*CertPool).AppendCertsFromPEM()
      /opt/hostedtoolcache/go/1.20.2/x64/src/crypto/x509/cert_pool.go:227 +0x4f1
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.addCertToPool()
      /home/runner/work/jaeger/jaeger/pkg/config/tlscfg/options.go:142 +0x1d7

as mentioned in comment
I want to know if this approach to test is in the right direction ?

I will try to spend some more time on this

@shashank-iitbhu
Copy link
Contributor

shashank-iitbhu commented Feb 11, 2024

Hey @yurishkuro ! I was trying to reproduce this issue. So I came up with two tests, first test TestConcurrentCertPoolAccessForDataRace link directly provoking a data race by concurrently appending certificates to an x509.CertPool while the second test TestConcurrentConfigAccess link concurrently invokes a method Options.Config that loads and update certificate pools indirectly.

@shashank-iitbhu
Copy link
Contributor

shashank-iitbhu commented Feb 11, 2024

Both the test are provoking Data Race conditions.
Data Race provoked by the first test TestConcurrentCertPoolAccessForDataRace link is similar to the actual Data Race:

==================
WARNING: DATA RACE
Read at 0x00c00012f7a0 by goroutine 8:
  runtime.makeBucketArray()
      /usr/local/go/src/runtime/map.go:346 +0x21c
  crypto/x509.(*CertPool).addCertFunc()
      /usr/local/go/src/crypto/x509/cert_pool.go:189 +0x55c
  crypto/x509.(*CertPool).AppendCertsFromPEM()
      /usr/local/go/src/crypto/x509/cert_pool.go:227 +0x404
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestConcurrentCertPoolAccessForDataRace.func1()
      /Users/shashankmittal/Documents/Developer/jaeger/pkg/config/tlscfg/options_test.go:204 +0xa0
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestConcurrentCertPoolAccessForDataRace.func2()
      /Users/shashankmittal/Documents/Developer/jaeger/pkg/config/tlscfg/options_test.go:212 +0x44

Previous write at 0x00c00012f7a0 by goroutine 16:
  runtime.mapaccessK()
      /usr/local/go/src/runtime/map.go:519 +0x1ec
  crypto/x509.(*CertPool).addCertFunc()
      /usr/local/go/src/crypto/x509/cert_pool.go:193 +0x594
  crypto/x509.(*CertPool).AppendCertsFromPEM()
      /usr/local/go/src/crypto/x509/cert_pool.go:227 +0x404
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestConcurrentCertPoolAccessForDataRace.func1()
      /Users/shashankmittal/Documents/Developer/jaeger/pkg/config/tlscfg/options_test.go:204 +0xa0
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestConcurrentCertPoolAccessForDataRace.func2()
      /Users/shashankmittal/Documents/Developer/jaeger/pkg/config/tlscfg/options_test.go:212 +0x44
==================```

@yurishkuro
Copy link
Member Author

Does it mean your tests reliably reproduce data race? If so, great job! Can you create a PR adding those tests? And since you're deep into this, do you see a way to fix this?

yurishkuro added a commit that referenced this issue Nov 24, 2024
## Which problem is this PR solving?
- Part of #4316 

---------

Signed-off-by: chahatsagarmain <chahatsagar2003@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
yurishkuro pushed a commit that referenced this issue Nov 26, 2024
## Which problem is this PR solving?
- Part of #4316 

## Description of the changes
- 

## How was this change tested?
- 

## Checklist
- [ ] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [ ] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [ ] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

Signed-off-by: chahatsagarmain <chahatsagar2003@gmail.com>
yurishkuro added a commit that referenced this issue Nov 28, 2024
## Which problem is this PR solving?
- Part of #4316 

## Description of the changes
- 

## How was this change tested?
- 

## Checklist
- [ ] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [ ] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [ ] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: chahatsagarmain <chahatsagar2003@gmail.com>
Signed-off-by: chahat sagar <109112505+chahatsagarmain@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
yurishkuro pushed a commit that referenced this issue Nov 28, 2024
## Which problem is this PR solving?
- Part of #4316 

## Description of the changes
- 

## How was this change tested?
- 

## Checklist
- [ ] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [ ] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [ ] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: chahatsagarmain <chahatsagar2003@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

No branches or pull requests

3 participants