Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(platform): Add aws-secretsmanager-jdbc driver in dependencies #5968

Merged

Conversation

atul-chegg
Copy link
Contributor

@atul-chegg atul-chegg commented Sep 16, 2022

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@atul-chegg atul-chegg force-pushed the add-aws-mysql-jdbc-driver branch from f83b097 to 0d3b678 Compare September 16, 2022 14:35
@atul-chegg
Copy link
Contributor Author

Issue link #5872

@github-actions
Copy link

github-actions bot commented Sep 19, 2022

Unit Test Results (build & test)

562 tests   562 ✔️  13m 55s ⏱️
139 suites      0 💤
139 files        0

Results for commit aefeed3.

♻️ This comment has been updated with latest results.

@atul-chegg atul-chegg force-pushed the add-aws-mysql-jdbc-driver branch from 37f2f02 to 073250a Compare September 19, 2022 15:02
@shirshanka
Copy link
Contributor

Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?

@anshbansal anshbansal added the product PR or Issue related to the DataHub UI/UX label Sep 19, 2022
@atul-chegg
Copy link
Contributor Author

atul-chegg commented Sep 19, 2022

@shirshanka

Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?

Yes. Please see documentation at https://github.com/aws/aws-secretsmanager-jdbc
We just need to change EBEAN_DATASOURCE_DRIVER to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver for mysql and EBEAN_DATASOURCE_USERNAME to secret ID in env variables. ENV variable EBEAN_DATASOURCE_PASSWORD can be set to empty as it is not used. And scheme for EBEAN_DATASOURCE_URL should be "jdbc-secretsmanager:mysql:"
And yes, container should have access on AWS Secret manager. How to grant that access is out of scope of this MR. In EKS, this access is usually granted using IRSA.

@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Sep 20, 2022
@atul-chegg
Copy link
Contributor Author

I tested the following way.

  1. Checkout
git clone git@github.com:atul-chegg/datahub.git
git checkout add-aws-mysql-jdbc-driver
git merge master
  1. Build image. I use mac.
atul.atri@C02FD3A3MD6M datahub % docker buildx ls
NAME/NODE       DRIVER/ENDPOINT STATUS  BUILDKIT PLATFORMS
default *       docker                           
  default       default         running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux   docker                           
  desktop-linux desktop-linux   running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx create --use desktop-linux
frosty_williams
atul.atri@C02FD3A3MD6M datahub % docker buildx ls   
NAME/NODE          DRIVER/ENDPOINT  STATUS  BUILDKIT PLATFORMS
frosty_williams *  docker-container                  
  frosty_williams0 desktop-linux    running v0.10.4  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default            docker                            
  default          default          running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux      docker                            
  desktop-linux    desktop-linux    running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx build -f ./docker/datahub-gms/Dockerfile --tag atulchegg/datahub-gms:6dc5e46eb --platform='linux/amd64,linux/arm64' .
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker images
REPOSITORY                             TAG               IMAGE ID       CREATED          SIZE
atulchegg/datahub-gms                  6dc5e46eb         b214d3d8105b   20 minutes ago   442MB
  1. Push image
atul.atri@C02FD3A3MD6M datahub % docker login
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker push atulchegg/datahub-gms:6dc5e46eb 
<..output skipped..>
  1. Update helm values for GMS service. Notice the following
    a. Image is changed to custom image containing aws-secretsmanager-jdbc
    b. Added EBEAN_DATASOURCE_* Env variables. This is not done in global values so other services are not affected.
    c. Env variable EBEAN_DATASOURCE_USERNAME points to secret ID in secret manager. The value for this secret ID should be JSON containing usernmae and password.
    {"username": "db-user-name", "password": "db-user-password"}
    d. Scheme in env variable EBEAN_DATASOURCE_URL changed to jdbc-secretsmanager:mysql:
    e. Env variable EBEAN_DATASOURCE_DRIVER is changed to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
datahub-gms:
  image:
    repository: atulchegg/datahub-gms
    tag: 6dc5e46eb
  extraVolumes:
      - name: secrets-store-inline
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: datahub-secrets
  extraVolumeMounts:
    - name: secrets-store-inline
      mountPath: "/mnt/secrets-store"
      readOnly: true
  serviceAccount:
    annotations:
      "eks.amazonaws.com/role-arn": "${datahub_role_arn}"
  service:
    type: LoadBalancer
    port: 443
    targetPort: http
    protocol: TCP
    name: https
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-scheme: internal
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${acm_certificate_id}
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: ${datahub_frontend_sg_id}
      service.beta.kubernetes.io/aws-load-balancer-subnets: ${datahub_frontend_lb_subnets}
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
    prometheus.io/port: '4318'
  extraEnvs:
    - name: EBEAN_DATASOURCE_USERNAME
      value: "test/datahub/mysql/datahub-db/datahub"
    - name: EBEAN_DATASOURCE_PASSWORD
      value: "dummy-pass-not-used"
    - name: EBEAN_DATASOURCE_HOST
      value: "${mysql_server}:3306"
    - name: EBEAN_DATASOURCE_URL
      value: "jdbc-secretsmanager:mysql://${mysql_server}:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
    - name: EBEAN_DATASOURCE_DRIVER
      value: "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver"

It works very well. It picks DB username and password from secret manager those are automatically updated when credentials in secret manager are rotated.

@shirshanka Please let me know if this MR can be reviewed and merged. Please let me know if any more work is needed from my side on this.

@jjoyce0510
Copy link
Collaborator

Running CI. I think the PR looks okay. Will go ahead and merge once CI passes.

@atul-chegg
Copy link
Contributor Author

@jjoyce0510 Please let me know if it can be merged now

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Since we don't currently have a way to load external jars, accepting this PR.
Thanks @atul-chegg for the thorough testing on your end!

It does increase the surface area of jars that datahub needs to ship with which increases our vulnerability surface area as well, so will trim this back to a "provided jar" dependency after we have that mechanism.

@shirshanka shirshanka merged commit d53cbbb into datahub-project:master Sep 23, 2022
@atul-chegg
Copy link
Contributor Author

@shirshanka

Thank you for accepting my MR.

About external jars,

I really do not know how you are going to implement this. But I did some testing.

I started datahub-gms container with sleep 7200 command so that I can login to the container. After SSHing to the container,

  1. I downloaded aws-secretsmanager-jdbc and all it's dependent libraries.
cd /home/datahub
curl -L -O http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.5.0/ivy-2.5.0.jar
java -jar ivy-2.5.0.jar -dependency com.amazonaws.secretsmanager aws-secretsmanager-jdbc 1.0.8 -retrieve "/home/datahub/dependencies/lib/[artifact]-[revision](-[classifier]).[ext]"
  1. Then I tried to start datahub gms service with the following command. I added --lib option
java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --lib /home/datahub/dependencies/lib --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war

But it did not work because of two reasons

  1. aws-secretsmanager-jdbc documentation mentions that real driver ("com.mysql.cj.jdbc.Driver") must be pre-registered (issue: 44). "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver" is just a wrapper that under the hood uses "com.mysql.cj.jdbc.Driver".
  2. Jetty started reporting multiple versions. Gradle, at the time of packaging, resolves dependencies and download correct version of dependent library. But in my testing, I downloaded these dependencies externally, so it caused some multiple version issue.

I hope this information helps datahub team when they implement external jars functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants