Issue with Handling Sensitive Data When Using ENV Variables #1018

Open
oswaldolinhares opened this issue Apr 18, 2024 · 6 comments
oswaldolinhares commented Apr 18, 2024

Description
I'm encountering an issue with the VCR gem, specifically related to handling sensitive data through environment variables. My setup involves making requests to the Telegram API, where I use a URL formatted as https://api.telegram.org/bot_TELEGRAM_BOT_TOKEN_/sendMessage and send a body with the content chat_id=TELEGRAM_CHAT_ID&text=Test+message.

Issue
I have configured VCR to filter sensitive data such as TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID using the following configuration:

c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { ENV.fetch('TELEGRAM_BOT_TOKEN', nil) }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { ENV.fetch('TELEGRAM_CHAT_ID', nil) }

According to my understanding, once sensitive data is declared and replaced in the cassette, there shouldn't be a need to continuously refill the environment variables (ENV). This should be necessary only on the initial cassette creation or when generating a new cassette.

Current Behavior
Currently, despite declaring the environment variables as sensitive and having them filtered out in the cassette, VCR still requires these variables to be filled for the tests to pass. This requires developers to continuously manage these sensitive ENV variables even after the initial cassette creation.

Expected Behavior
I believe the expected behavior should be:

  1. Fill the environment variables.
  2. Declare them as sensitive to ensure they are replaced in the cassette.
  3. Generate the cassette using the necessary environment variables.
  4. Post-cassette generation, there should no longer be a requirement to refill these environment variables unless a new cassette is needed or the existing one is being modified.

Request
Can we have a review of how sensitive data is managed in VCR, especially after the cassette is generated? Ideally, once a cassette is created and sensitive data is replaced, further tests should not require refilling of these sensitive ENV variables.

expected no Exception, got #<URI::InvalidURIError: bad URI(is not URI?): "/bot<TELEGRAM_BOT_TOKEN>/sendMessage">

"The characters < and > make the URL invalid."
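The error above can be reproduced outside VCR with Ruby's standard `uri` library. This is a minimal sketch; the helper name `parseable_uri?` is made up for illustration, and the URLs are the ones quoted in this thread.

```ruby
require 'uri'

# Hypothetical helper: true when Ruby's URI parser accepts the string.
def parseable_uri?(string)
  URI.parse(string)
  true
rescue URI::InvalidURIError
  false
end

# Placeholder wrapped in angle brackets, as in the error message above.
angle = 'https://api.telegram.org/bot<TELEGRAM_BOT_TOKEN>/sendMessage'
# Placeholder wrapped in underscores, as used in the cassette.
underscore = 'https://api.telegram.org/bot_TELEGRAM_BOT_TOKEN_/sendMessage'

puts "angle brackets parse: #{parseable_uri?(angle)}"
puts "underscores parse:    #{parseable_uri?(underscore)}"
```

If the placeholder is never substituted back before the URI is parsed, a placeholder containing `<` and `>` would fail in this way, while an underscore-delimited one would not.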


http_interactions:
- request:
    method: post
    uri: https://api.telegram.org/bot_TELEGRAM_BOT_TOKEN_/sendMessage
    body:
      encoding: UTF-8
      string: chat_id=TELEGRAM_CHAT_ID&text=Test+message
    headers:
      User-Agent:
      - Faraday v2.9.0
      Content-Type:
      - application/x-www-form-urlencoded
      Accept-Encoding:
      - gzip;q=1.0,deflate;q=0.6,identity;q=0.3
      Accept:
      - "*/*"
  response:
    status:
      code: 200
      message: OK
    headers:
      Server:
      - nginx/1.18.0
      Date:
      - Wed, 17 Apr 2024 16:15:43 GMT
      Content-Type:
      - application/json
      Content-Length:
      - '265'
      Connection:
      - keep-alive
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      Access-Control-Allow-Origin:
      - "*"
      Access-Control-Allow-Methods:
      - GET, POST, OPTIONS
      Access-Control-Expose-Headers:
      - Content-Length,Content-Type,Date,Server,Connection
    body:
      encoding: UTF-8
      string: '{"ok":true,"result":{"message_id":41,"from":{"id":7013369343,"is_bot":true,"first_name":"BBzim","username":"BBzimBOT"},"chat":{"id":TELEGRAM_CHAT_ID,"title":"Hotel
        Terezinha","type":"group","all_members_are_administrators":true},"date":1713370543,"text":"Test
        message"}}'
  recorded_at: Wed, 17 Apr 2024 16:15:43 GMT
recorded_with: VCR 6.2.0

Ruby: 3.3.0
Gem: VCR 6.2.0
HTTP: Faraday 2.9.0
Mock: WebMock 3.23.0
Rails: 7.1.3
Rspec: 6.1.1

@olleolleolle
Member

👋 Hi! Hope your day is great.

I think I understand the request: to somehow "know" the secret value to be replaced, even while not knowing the actual secret value? So that developers won't need to lug around some ENV vars they need to be careful about.

The filter_sensitive_data setup code here is taking the known secret from an ENV variable, defaulting to nil.

Let's say we give it some fake values instead of nil? Such as "fake-bot-token" and "fake-chat-id". And then, where you configure your Telegram client in this app, pass in those same default-in-test-environment values, so that requests will be matching the ones in the recorded interactions?
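A rough sketch of that idea, assuming the fake values named above and a hypothetical helper `telegram_setting` that both the VCR filter and the Telegram client would call. The placeholder format `_NAME_` is taken from the cassette in this thread; the VCR block only runs when the gem is loaded.

```ruby
# Illustrative fallbacks; the values are the fake ones suggested above.
FAKE_TELEGRAM = {
  'TELEGRAM_BOT_TOKEN' => 'fake-bot-token',
  'TELEGRAM_CHAT_ID'   => 'fake-chat-id'
}.freeze

# Hypothetical helper: the real value when it is set and non-empty,
# otherwise the fake one. `env` can be ENV or a plain Hash.
def telegram_setting(name, env = ENV)
  value = env[name]
  (value.nil? || value.empty?) ? FAKE_TELEGRAM.fetch(name) : value
end

# Register the filters only when VCR is loaded (e.g. in spec_helper.rb).
if defined?(VCR)
  VCR.configure do |c|
    FAKE_TELEGRAM.each_key do |name|
      c.filter_sensitive_data("_#{name}_") { telegram_setting(name) }
    end
  end
end

puts telegram_setting('TELEGRAM_BOT_TOKEN', {})
```

The client under test would need to read its token and chat id through the same helper (or the same defaults), so that the request it builds and the filter value agree when the ENV variables are blank.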

@oswaldolinhares
Author

oswaldolinhares commented Apr 18, 2024

Hi Olle

I hope you are well. I have been to Sweden, specifically Stockholm, and I was impressed by how friendly the people are and how much they enjoy nature.

I'll describe the process step by step:

Step 1
I add environment variables to my .env file
TELEGRAM_BOT_TOKEN="xyzCaLaBreZo"
TELEGRAM_CHAT_ID="-2222222"

Step 2
I configure VCR to mask these sensitive data
c.filter_sensitive_data('_TELEGRAM_BOT_TOKEN_') { ENV.fetch('TELEGRAM_BOT_TOKEN', nil) }
c.filter_sensitive_data('_TELEGRAM_CHAT_ID_') { ENV.fetch('TELEGRAM_CHAT_ID', nil) }

Note: If I use the suggested format (<TELEGRAM_BOT_TOKEN>) by the platform, it causes an error in the URL, so I use _TELEGRAM_BOT_TOKEN_

Step 3
I run the tests and the content of the ENVs is replaced in the cassette

module Notification
  class TelegramService
    # ...
    def initialize(options = {})
      # Explicit options win; otherwise fall back to the ENV variables (nil if unset).
      @token = options.fetch(:token, ENV.fetch('TELEGRAM_BOT_TOKEN', nil))
      @chat_id = options.fetch(:chat_id, ENV.fetch('TELEGRAM_CHAT_ID', nil))
    end
  end
end

Cassette:
http_interactions:
- request:
    method: post
    uri: https://api.telegram.org/bot_TELEGRAM_BOT_TOKEN_/sendMessage

Step 4
I clear the content of the ENV
TELEGRAM_BOT_TOKEN=""
TELEGRAM_CHAT_ID=""

Expected: That the test would continue working, since I already have the API response in the cassette

What happens: The test fails

IMPORTANT: If I do not use the actual API data the first time, the API will not respond. What I would like is that after the cassette is generated, I would not be obligated to provide the real env value; this would help in CI and also prevent breaking tests for developers who do not have the real ENV content.

@olleolleolle
Member

Thanks for the steps, it helps.

My suggestion is that you keep using a real TELEGRAM_BOT_TOKEN during recordings, and fall back to a fake value when not recording.

What if, in Step 4, there was some content in the ENVs? Unsetting them, or setting them to something blank, could make the outgoing requests fail to match the recording.

@oswaldolinhares
Author

oswaldolinhares commented Apr 21, 2024

Step 1
I add environment variables to my .env file
TELEGRAM_BOT_TOKEN="xyzCaLaBreZo"
TELEGRAM_CHAT_ID="-2222222"

Step 2
I configure VCR to mask these sensitive data
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { ENV.fetch('TELEGRAM_BOT_TOKEN', nil) }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { ENV.fetch('TELEGRAM_CHAT_ID', nil) }

Step 3
I run the tests and the content of the ENVs is replaced in the cassette

Step 4
Run the tests after generating the cassette: tests OK

Step 5
I configure VCR to mask these sensitive data
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { 'token 1'}
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { 'token 2' }

Run tests, tests FAIL

Step 6
I configure VCR to mask these sensitive data
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { 'TELEGRAM_BOT_TOKEN' }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { 'TELEGRAM_CHAT_ID' }
Run tests, tests FAIL

Step 7
I configure VCR to mask these sensitive data
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { ENV.fetch('TELEGRAM_BOT_TOKEN', nil) }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { ENV.fetch('TELEGRAM_CHAT_ID', nil) }
Run tests, tests OK

If I change my ENVs, the tests FAIL

Step 8
I change my .env file
TELEGRAM_BOT_TOKEN=TELEGRAM_BOT_TOKEN
TELEGRAM_CHAT_ID=TELEGRAM_CHAT_ID

I change my filter_sensitive_data configuration
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { 'TELEGRAM_BOT_TOKEN' }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { 'TELEGRAM_CHAT_ID' }
Run tests, tests FAIL

@galori
Collaborator

galori commented Dec 16, 2024

Hi @oswaldolinhares - where is the failure coming from? Will you share a full stack trace and error message?

I have a guess:

VCR is not actually intercepting the request when you are expecting it to play back the cassette. What is happening instead is that the code path is falling through to your HTTP library and/or API client library, and it is attempting to actually make the HTTP request but without the token.

This would be plausible because VCR matches on :method and :uri by default to decide if a recorded cassette matches a request.

This would require your configuration to allow HTTP requests to go through when there is not a matching recording.

The stack trace would provide more insight.

Also, a quick way to prove or disprove this (and this would actually be the fix too) is to configure VCR with match_requests_on: so that it excludes :uri. However, depending on how many interactions are recorded in the same cassette, this may not be enough to differentiate each request. If that's the case, you can provide a custom matcher that ignores the bot_TELEGRAM_BOT_TOKEN_ part of the URI with a simple regex.
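One way such a custom matcher might look, as a sketch only. The matcher name `:tokenless_uri`, the helper `tokenless_telegram_uri`, and the regex are assumptions for illustration; `register_request_matcher` and `match_requests_on` are the VCR hooks for this, and the registration only runs when VCR is loaded.

```ruby
# Hypothetical helper: drop whatever follows "/bot" up to the next slash, so
# URIs compare equal regardless of the token (or placeholder) they contain.
# Assumes Telegram-style paths of the form /bot<token>/<method>.
def tokenless_telegram_uri(uri)
  uri.sub(%r{/bot[^/]*/}, '/bot/')
end

# Two requests match when their token-stripped URIs are identical.
TOKENLESS_URI_MATCHER = lambda do |request_1, request_2|
  tokenless_telegram_uri(request_1.uri) == tokenless_telegram_uri(request_2.uri)
end

if defined?(VCR)
  VCR.configure do |c|
    c.register_request_matcher(:tokenless_uri, &TOKENLESS_URI_MATCHER)
  end
  # Usage, with a made-up cassette name:
  # VCR.use_cassette('telegram', match_requests_on: [:method, :tokenless_uri]) { ... }
end

puts tokenless_telegram_uri('https://api.telegram.org/bot_TELEGRAM_BOT_TOKEN_/sendMessage')
```

Because the API method name after the token segment is still compared, requests to different Telegram endpoints in the same cassette remain distinguishable.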

@vfonic
Contributor

vfonic commented Jan 18, 2025

@oswaldolinhares this step should pass:

Step 7
I configure VCR to mask these sensitive data
c.filter_sensitive_data('TELEGRAM_BOT_TOKEN') { ENV.fetch('TELEGRAM_BOT_TOKEN', nil) }
c.filter_sensitive_data('TELEGRAM_CHAT_ID') { ENV.fetch('TELEGRAM_CHAT_ID', nil) }
Run tests, tests OK

If I change my ENVs, the tests FAIL

This is how VCR is supposed to work.
When you change your ENVs, your tests should pass.

c.filter_sensitive_data(<NEW TEXT THAT REPLACES>) { <TEXT TO BE REPLACED> }

From my understanding, this is what happens:

  1. When storing a new cassette, VCR looks into the request/response and finds all occurrences of <TEXT TO BE REPLACED>. It then replaces them with <NEW TEXT THAT REPLACES> and saves the cassette to a file.
  2. When using an existing cassette, for every request, VCR finds all occurrences of <TEXT TO BE REPLACED>. It then replaces them with <NEW TEXT THAT REPLACES> and searches the cassette for a matching request. If one is found, it returns the response to the web mocking library (most likely the webmock gem).
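The two steps above can be modelled in a few lines of plain Ruby. This is a simplified model of the description, not VCR's actual implementation; `mask`, `telegram_uri`, and `matches_recording?` are made-up names, and skipping the substitution for a blank secret is an assumption of this sketch.

```ruby
PLACEHOLDER = '_TELEGRAM_BOT_TOKEN_'

# Replace every occurrence of the secret with the placeholder.
# A nil or empty secret is left alone (assumption of this sketch).
def mask(text, secret, placeholder = PLACEHOLDER)
  return text if secret.nil? || secret.empty?
  text.gsub(secret, placeholder)
end

# URL shape taken from the issue description.
def telegram_uri(token)
  "https://api.telegram.org/bot#{token}/sendMessage"
end

# Step 1: record with the real token; only the masked URI is stored.
RECORDED_URI = mask(telegram_uri('xyzCaLaBreZo'), 'xyzCaLaBreZo')

# Step 2: on playback, mask the outgoing request with the current secret
# and compare it with what was stored.
def matches_recording?(current_token)
  mask(telegram_uri(current_token.to_s), current_token) == RECORDED_URI
end

puts "same token:      #{matches_recording?('xyzCaLaBreZo')}"
puts "different token: #{matches_recording?('another-token')}"
puts "blank token:     #{matches_recording?('')}"
```

In this model any non-empty token still matches the recording, because the client and the filter read the same value, whereas a blank token yields `/bot/sendMessage`, which no longer equals the stored URI.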

I believe this is not a VCR issue; rather, something is wrong with your VCR setup.
