How to update from a circleci/postgres ram image? #2

Closed
caleb15 opened this issue Sep 1, 2021 · 12 comments

@caleb15

caleb15 commented Sep 1, 2021

I noticed there are no ram tags like in circleci/postgres. Does cimg-postgres use RAM by default? If not, please consider this a feature request for images with ram tags.

@FelicianoTech
Contributor

FelicianoTech commented Sep 14, 2021

This image does not use a RAM disk by default. When the image goes GA, this will be considered. One of the tasks right now though is to see what the performance difference is. On more modern hardware, some people have reported not much of a difference. I'd like to run a benchmark on CircleCI.

ref: https://schinckel.net/2019/09/25/speeding-up-postgres-using-a-ram-disk/

@bf4

bf4 commented Dec 13, 2021

Couldn't we specify a mount, something like this?

    docker:
      - image: cimg/postgres:13.5-postgis
        options: >-
            --mount type=tmpfs,destination=/var/lib/postgresql/data

@BytesGuy
Contributor

@bf4 It is not possible to set docker-level options like this in CircleCI. What you can do is set the PGDATA env var to a location on the ramdisk which is available on the Docker executor, e.g., PGDATA: /dev/shm/pgdata/data. This is essentially all the old -ram variants used to do: https://github.com/CircleCI-Public/circleci-dockerfiles/blob/master/postgres/images/13.2/ram/Dockerfile

On the topic of whether it is worth using a ramdisk, from some testing I have done today using pgbench, I have found very little difference in the latency when using the ramdisk for this image. Most of the results came back within a margin of error and in some instances using the ramdisk ended up marginally slower.

If anyone has any other benchmarks to share, it would be helpful in determining whether it is worth producing a -ram variant for the service images. As it stands, it seems unlikely there will be much benefit here at the moment. Will defer to @FelicianoTech to have the final say on this though 🙂

@caleb15
Author

caleb15 commented Dec 14, 2021

Hypothesis: When the database has a small amount of data, or if only a certain section of data gets called often, then the ramdisk would provide no benefit because the data would get cached in ram after it's pulled from storage. Where the ramdisk could be helpful is if you have a larger-size database and you're pulling more varied data. I'm guessing a lot of CI users (including us) fall into the first category.

@bf4

bf4 commented Dec 14, 2021 via email

@BytesGuy
Contributor

@caleb15 You raise a valid point. I went back and re-ran my pgbench tests with a higher scaling factor, so it had a much larger database to deal with (around 1.6GB), and ran a 5-minute test.

With ramdisk:

    pghost: /var/run/postgresql pgport: 5432 nclients: 100 duration: 300 dbName: postgresql://postgres@localhost/circle_test
    transaction type: <builtin: select only>
    scaling factor: 100
    query mode: simple
    number of clients: 100
    number of threads: 1
    duration: 300 s
    number of transactions actually processed: 14052503
    latency average = 2.135 ms
    tps = 46840.030647 (including connections establishing)
    tps = 46840.397260 (excluding connections establishing)

Without ramdisk:

    pghost: /var/run/postgresql pgport: 5432 nclients: 100 duration: 300 dbName: postgresql://postgres@localhost/circle_test
    transaction type: <builtin: select only>
    scaling factor: 100
    query mode: simple
    number of clients: 100
    number of threads: 1
    duration: 300 s
    number of transactions actually processed: 16481347
    latency average = 1.820 ms
    tps = 54937.399633 (including connections establishing)
    tps = 54937.801775 (excluding connections establishing)

Re-running these several times showed a range of 1.8-2.4ms latency average regardless of whether the ramdisk was used or not.
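
As a sanity check on the two runs above, here is a minimal Python sketch (not part of the original test) that recomputes the average latency from the client count, duration, and transaction count, and puts the tps gap between the runs into percent. The helper names `average_latency_ms` and `percent_change` are illustrative, and the latency formula is an assumption that happens to match the reported figures.

```python
def average_latency_ms(clients, duration_s, transactions):
    """Average latency implied by a fixed-duration run:
    total client-time divided by the number of transactions."""
    return 1000.0 * clients * duration_s / transactions


def percent_change(baseline, other):
    """Change of `other` relative to `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


# Figures copied from the two 300 s, 100-client runs quoted above.
ram = {"transactions": 14052503, "tps": 46840.397260}
disk = {"transactions": 16481347, "tps": 54937.801775}

ram_latency = average_latency_ms(100, 300, ram["transactions"])
disk_latency = average_latency_ms(100, 300, disk["transactions"])
tps_gap = percent_change(ram["tps"], disk["tps"])

print(f"ramdisk latency:    {ram_latency:.3f} ms")
print(f"no-ramdisk latency: {disk_latency:.3f} ms")
print(f"no-ramdisk tps relative to ramdisk: {tps_gap:+.1f}%")
```

The recomputed latencies agree with the reported 2.135 ms and 1.820 ms. In this single pair of runs the non-ramdisk run had roughly 17% higher tps, which is smaller than the 1.8-2.4 ms spread seen across re-runs.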

Example config for this test:

    jobs:
      build:

        docker:
          - image: cimg/base:2021.11
            environment:
              TEST_DATABASE_URL: postgresql://postgres@localhost/circle_test

          - image: cimg/postgres:14.1
            environment:
              POSTGRES_USER: postgres
              POSTGRES_PASSWORD: password
              POSTGRES_DB: circle_test

        steps:
          - checkout
          - run: sudo apt-get update && sudo apt-get install postgresql-client postgresql-contrib
          - run:
              name: install dockerize
              command: wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && sudo tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && rm dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
              environment:
                DOCKERIZE_VERSION: v0.6.1
          - run: dockerize -wait tcp://localhost:5432 -timeout 2m
          - run: pgbench -i -s 100 -h 127.0.0.1 -p 5432 -d $TEST_DATABASE_URL --quiet
          - run: pgbench -c 100 -T 300 -S -n -p 5432 -d $TEST_DATABASE_URL >> benchmark 2>&1
          - run: tail -n 50 benchmark
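
For anyone who wants to compare several runs, the summary lines at the end of the `benchmark` file can be pulled out with a short script. This is a minimal Python sketch, not something used in the test above: the function name `parse_pgbench_summary` and the `SAMPLE` text are illustrative, and the patterns assume the summary wording quoted earlier in this thread (other pgbench versions may word the tps lines differently).

```python
import re

# Sample text in the summary format quoted earlier in this thread.
SAMPLE = """\
number of transactions actually processed: 14052503
latency average = 2.135 ms
tps = 46840.030647 (including connections establishing)
tps = 46840.397260 (excluding connections establishing)
"""


def parse_pgbench_summary(text):
    """Return the latency (ms) and tps figures found in pgbench summary text.

    Keys that are not present in the text are simply left out of the result.
    """
    result = {}
    latency = re.search(r"^latency average = ([\d.]+) ms", text, re.MULTILINE)
    if latency:
        result["latency_ms"] = float(latency.group(1))
    tps_pattern = r"^tps = ([\d.]+) \((including|excluding) connections establishing\)"
    for match in re.finditer(tps_pattern, text, re.MULTILINE):
        result[f"tps_{match.group(2)}"] = float(match.group(1))
    return result


summary = parse_pgbench_summary(SAMPLE)
print(summary)
```

In a real job the text would come from reading the `benchmark` file rather than from `SAMPLE`.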

@bf4 It seems the env var trick doesn't work without updating the Docker entrypoint, due to a hard-coded path. I spun up an example image here, which you can test if you want. As above though, I cannot see any gains from using a ramdisk in this instance.

https://hub.docker.com/layers/bytesguy/postgres/14.1-ram/images/sha256-77303339ee71e93c367afacc18c79c697286807c46943f8fe737b592663ef877?context=explore

@bf4

bf4 commented Dec 16, 2021

So, we used to run circleci/postgres:13.2-postgis-ram, but when we were playing with Docker and Codespaces we switched to postgis/postgis:13-3.1-alpine, in part because I didn't know about this project at the time. Which is a long way of saying that I'm using the postgis/postgis:13-3.1-alpine image right now. I ran CI a few times with and without PGDATA: /dev/shm/pgdata/data. It wasn't a big difference, but tests ran in about 1hr10min without and 50min with. So it's a little faster for us to point PGDATA to shm (tmpfs). (Looking at the build steps, I can confirm it picked up the env var.)

    - image: postgis/postgis:13-3.1-alpine
      environment:
        PGDATA: /dev/shm/pgdata/data

shrug

@FelicianoTech
Contributor

@BytesGuy Any thoughts here on using the PGDATA env var? If we think this is useful, perhaps we can close out this issue by having Jeff add it to the readme?

@FelicianoTech
Contributor

We've determined that there is no noticeable difference between versions right now that would justify the extra variant.

@caleb15
Author

caleb15 commented Feb 14, 2022

We switched to cimg/postgres. We haven't done any performance tests, but no one has complained about a slowdown. For others wondering about performance, I would recommend upgrading to cimg first; if that's too slow for you, you could then use a custom Docker image.

@dopry

dopry commented Jan 27, 2023

@FelicianoTech I've been running builds with and without PGDATA on shm. The builds on shm seem to run in 8-10 minutes, and builds without shm run in 9-12 minutes. 10-20% is not insignificant. Our builds run a Django test suite that uses fixtures pretty extensively, so large datasets get reloaded with each test case, and individual tests are wrapped in transactions and rolled back. Have you done benchmarks with similar load profiles, where you may be pumping 300K-3M of data into the DB between each test, with hundreds of tests?

@oleksii-leonov

@FelicianoTech
We are running a large Ruby on Rails app integration test suite on CircleCI.

With our custom PostgreSQL 15.3 + PostGIS image with PGDATA placed in shm, tests complete in 9 minutes.
With cimg/postgres:15.3-postgis, they take 14 minutes.

In our case, that's more than 50% longer, so we keep using our own custom PostgreSQL + PostGIS image instead of cimg/postgres:15.3-postgis.

We would like to switch to cimg/postgres:15.3-postgis, but without a -ram option the performance penalty is too big for our use case.
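
To put the build times reported in this thread on a common footing, here is a minimal Python sketch of the arithmetic. The helper names `percent_longer` and `percent_saved` are illustrative, the minute values are the rounded figures quoted in the comments, and the last entry also compares two different images, so it is not a pure shm-versus-disk measurement.

```python
def percent_longer(with_shm_min, without_shm_min):
    """How much longer the non-shm build takes, relative to the shm build."""
    return 100.0 * (without_shm_min - with_shm_min) / with_shm_min


def percent_saved(with_shm_min, without_shm_min):
    """How much shorter the shm build is, relative to the non-shm build."""
    return 100.0 * (without_shm_min - with_shm_min) / without_shm_min


# (label, minutes with PGDATA on shm, minutes without), as quoted in the thread.
reports = [
    ("bf4, postgis/postgis image", 50, 70),
    ("dopry, fastest builds", 8, 9),
    ("dopry, slowest builds", 10, 12),
    ("oleksii-leonov, custom image vs cimg", 9, 14),
]

for label, with_shm, without_shm in reports:
    longer = percent_longer(with_shm, without_shm)
    saved = percent_saved(with_shm, without_shm)
    print(f"{label}: {longer:.1f}% longer without shm, {saved:.1f}% saved with shm")
```

The two percentages differ because they use different baselines, which is worth keeping in mind when comparing figures such as "10-20%" and "50% longer".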
