Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spin up separate database for pytest execution #235

Merged
merged 13 commits into from
Nov 24, 2021

Conversation

earmenda
Copy link
Contributor

@earmenda earmenda commented Nov 18, 2021

Closes #121

Code Changes

  • Added a test mode that allows the server to use the test_database_url config to access the database
  • Update make pytest target to set FIDESCTL_TEST_MODE to true
  • Update make api and make cli to allow passing in a FIDESCTL_TEST_MODE. make api FIDESCTL_TEST_MODE=True
  • Add tests to test_config.py

Steps to Confirm

  • Verify that test database is created if FIDESCTL_TEST_MODE is true
  • Verify that test database is used if FIDESCTL_TEST_MODE is true

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated

Description Of Changes

  • The original issue Have Pytest spin up its own test database #121 I think had the impression that this was natively supported but it requires a script to be executed which comes from a github repo. See here: https://stackoverflow.com/questions/46668233/multiple-databases-in-docker-and-docker-compose
  • Test database creation is now done through a fixture
  • Server context for tests is set through an api but I am still thinking about whether this is a good way to go about this
  • Added a test mode to the server which is configured through an environment variable FIDESCTL_TEST_MODE
  • If in test mode the config test_database_url will be used to access the database
  • Updated make commands for pytest and also launching api/cli using test mode

@earmenda
Copy link
Contributor Author

In this current state the two databases are created successfully but it requires we add this script that comes from a separate repo. It is possible to use a image that includes this script but I don't see why we would do that tbh as it would be limiting.

postgres=# \l
                                     List of databases
     Name      |  Owner   | Encoding |  Collate   |   Ctype    |     Access privileges      
---------------+----------+----------+------------+------------+----------------------------
 fidesctl      | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/postgres              +
               |          |          |            |            | postgres=CTc/postgres     +
               |          |          |            |            | fidesctl=CTc/postgres
 fidesctl_test | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/postgres              +
               |          |          |            |            | postgres=CTc/postgres     +
               |          |          |            |            | fidesctl_test=CTc/postgres

@earmenda
Copy link
Contributor Author

One alternative that is mentioned in the stackoverflow post i linked is to just have two different containers each with a different database to be created. This to me seems almost like a better solution just for the sake of simplicity.

services: 

  db:
    image: postgres
    environment: 
      - POSTGRES_DB
      - POSTGRES_USER
      - POSTGRES_PASSWORD
    ports:
      - ${POSTGRES_DEV_PORT}:5432
    volumes:
      - app-volume:/var/lib/postgresql/data

  db-test:
    image: postgres
    environment: 
      - POSTGRES_DB
      - POSTGRES_USER
      - POSTGRES_PASSWORD
    ports:
      - ${POSTGRES_TEST_PORT}:5432
    # Notice I don't even use a volume here since I don't care to persist test data between runs

volumes:
  app-volume: #

@ThomasLaPiana
Copy link
Contributor

One alternative that is mentioned in the stackoverflow post i linked is to just have two different containers each with a different database to be created. This to me seems almost like a better solution just for the sake of simplicity.

services: 

  db:
    image: postgres
    environment: 
      - POSTGRES_DB
      - POSTGRES_USER
      - POSTGRES_PASSWORD
    ports:
      - ${POSTGRES_DEV_PORT}:5432
    volumes:
      - app-volume:/var/lib/postgresql/data

  db-test:
    image: postgres
    environment: 
      - POSTGRES_DB
      - POSTGRES_USER
      - POSTGRES_PASSWORD
    ports:
      - ${POSTGRES_TEST_PORT}:5432
    # Notice I don't even use a volume here since I don't care to persist test data between runs

volumes:
  app-volume: #

my workflow is usually to run make cli and then pytest from within that shell, so I'd want this container to run as long as I'm running any docker shells. does it seem to be a bit weird to be running two containers in parallel?

Even though its maybe a bit more clunky, i think using the script is a little bit smoother and more scalable for now. There's also a fair argument to be made for what this even solves given our current setup. Hopefully people wouldn't be spinning up a database with those credentials and then running the tests in their production environment?

@ThomasLaPiana
Copy link
Contributor

I'm ok with either solution but I'm now wondering if there is an actual problem to solve here, but I'm probably overlooking something

@NevilleS @seanpreston @ethyca/fides-control any thoughts here?

@ThomasLaPiana ThomasLaPiana marked this pull request as ready for review November 18, 2021 06:12
@brentonmallen1
Copy link
Contributor

It seems like the original concern was to avoid potential impact to production data. I think that's probably worth putting some safe guards around. The script seems like a better option than trying to a esoteric docker image, imho. I would suggest maybe adding a comment stating where it came from for posterity, though.

@ThomasLaPiana
Copy link
Contributor

It seems like the original concern was to avoid potential impact to production data. I think that's probably worth putting some safe guards around. The script seems like a better option than trying to a esoteric docker image, imho. I would suggest maybe adding a comment stating where it came from for posterity, though.

we have a separate test_config.toml file though, so unless someone runs a production application using our docker-compose file (they shouldn't!) and uses the exact same credentials (they shouldn't!) I don't think this issue would happen? When this issue was created, we were still pointing people at our own make commands/compose file, but the setup/installation guide is completely different now

@NevilleS
Copy link
Contributor

That script (that leverages postgres' init scripts) isn't the only way to create a DB though- in fidesops we have code that creates a database from with python using the alembic schema. That feels like the right layer to put this kind of logic: in the application code instead of the docker configuration.

In other words, if the pytest command is what creates & destroys the test db, you make it very easy to avoid any accidents (both in production and in local dev)

@ThomasLaPiana
Copy link
Contributor

@ThomasLaPiana ThomasLaPiana marked this pull request as draft November 18, 2021 21:36
@earmenda
Copy link
Contributor Author

@NevilleS @ThomasLaPiana

Is it possible that the solution posted doesn't actually do what it is meant to? At least for our fidesctl setup it doesn't seem to work well because tests are running in a different process than the api service.

When we run make api usually we start a server in one process and then run pytest in a separate process. The usage of os.environ["TESTING"] = "True" in fidesops is odd for our use case because it would only set this variable for the test process. If fidesops works the same then I don't think setting this variable is doing anything when it is picked up by the server later on:

https://github.com/ethyca/fidesops/blob/3acd69ea1fbdc2e3ba7fabff89ae240ded577b8e/src/fidesops/db/session.py#L25

Maybe I am missing something but you shouldn't be able to modify the environment variables from a different python process.

For now I posted a working solution where we temporarily set the context server side through the api. This is working as we'd want and achieves the same as the fidesops example. This solution still feels not great to me though, but I can't think of any other way that the tests could interact with the server's context other than through the api.

@ThomasLaPiana
Copy link
Contributor

I agree it feels super weird to have code logic dedicated to spinning up a test database, that feels super bad.

If the assumption here is that this ticket exists specifically so that people don't overwrite their production data, what is the simplest way to do that? I think the simplest way is to do the following:

  1. add a new variable to the config called test_database_url
  2. in conftest.py check if test_database_url is set
    • If test_database_url is not set, throw an error and exit the test suite
    • if test_database_url matches database_url, throw an error and exit the test suite
    • If test_database_url is set and does not match database_url, set the value of database_url to that of test_database_url. This ensures that other than a new config value, no other code logic needs to change outside of conftest.py

@earmenda does this implementation make sense? does it seem reasonably elegant? Open to other suggestions, after some thought this is the cleanest I could think of

@earmenda
Copy link
Contributor Author

earmenda commented Nov 22, 2021

@ThomasLaPiana

I think the new config is a good idea but that still means that the execution of pytest would have to indicate to the server that it needs to use that database. Don't we still have that problem with this solution? Were you thinking that the service has a flag which either starts it in default or test domain to determine which database to use?

@earmenda
Copy link
Contributor Author

earmenda commented Nov 22, 2021

Here is what I currently see as some options

  • (current approach) Allow tests to temporarily change the context on the server through api - I don't really like the idea of context switching within the application, even worse having an api for it
  • When running tests the server can be spun up using a test context which points to the test database url. make api --domain=test - I think this is a good option, you get to determine which database to point at when you start up the service
  • Add a flag in api calls which tell the server to use a test database url for the current call - This could work, I am not very familiar with our api framework yet but I imagine based on a common parameter used by all the apis we could define a general behavior.
  • Have alternative test server or endpoints which are stood up with the main api service - eh?

@ThomasLaPiana
Copy link
Contributor

Here is what I currently see as some options

  • (current approach) Allow tests to temporarily change the context on the server through api - I don't really like the idea of context switching within the application, even worse having an api for it
  • When running tests the server can be spun up using a test context which points to the test database url. make api --domain=test - I think this is a good option, you get to determine which database to point at when you start up the service
  • Add a flag in api calls which tell the server to use a test database url for the current call - This could work, I am not very familiar with our api framework yet but I imagine based on a common parameter used by all the apis we could define a general behavior.
  • Have alternative test server or endpoints which are stood up with the main api service - eh?

I think the second option is the best. Maybe the API can check for a FIDESCTL_TEST_MODE=TRUE env var or something and if it is set to true, it will use the test_database_url from the config

@earmenda
Copy link
Contributor Author

@ThomasLaPiana

What do you think about this?

  • Added a check for FIDESCTL_TEST_MODE environment variable to know which database to access
  • configure_db creates the database only if FIDESCTL_TEST_MODE is enabled

With this tests don't really have to do too much differently. I know you mentioned adding a check for whether the test database is being used from the tests but I don't see that much value to that to be honest.

If this looks good I just need to add the variable to our make commands. We could make sure that make pytest uses the test database and potentially add a make test-cli and make test-api targets. What do you think?

@earmenda
Copy link
Contributor Author

With the latest change it is possible to start the server using make by passing in test mode as a parameter

make api FIDESCTL_TEST_MODE=True
make cli FIDESCTL_TEST_MODE=True

Verified that this starts the service in test mode

Test mode is enabled, creating test database if needed

@earmenda
Copy link
Contributor Author

Verified that starting the server in test mode will use the expected database

fidesctl_test=# \l
                                   List of databases
     Name      |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
---------------+----------+----------+------------+------------+-----------------------
 fidesctl      | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 fidesctl_test | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 

Verified that only the test database is used even after successfully running pytest

fidesctl=# \dt
Did not find any relations.

@earmenda
Copy link
Contributor Author

Added some tests to validate the setting of database_url through validator in test_config.py

@earmenda earmenda marked this pull request as ready for review November 24, 2021 02:01
Makefile Outdated Show resolved Hide resolved
@ThomasLaPiana
Copy link
Contributor

@earmenda sorry for hijacking this at the end! You nailed the requirements, I just had a few nits and didn't want to keep kicking it back to you for those small things

Thank you!

@earmenda
Copy link
Contributor Author

@earmenda sorry for hijacking this at the end! You nailed the requirements, I just had a few nits and didn't want to keep kicking it back to you for those small things

Thank you!

no worries, just added a comment to your change though

Copy link
Contributor

@ThomasLaPiana ThomasLaPiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ThomasLaPiana ThomasLaPiana merged commit 5618f69 into main Nov 24, 2021
@ThomasLaPiana ThomasLaPiana deleted the earmenda-pytest-isolated-database branch November 24, 2021 05:31
ThomasLaPiana pushed a commit that referenced this pull request Aug 17, 2022
ThomasLaPiana pushed a commit that referenced this pull request Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have Pytest spin up its own test database
4 participants