feat: s3 preview client #1499

Merged: 1 commit merged on Oct 2, 2021

jroof88
Contributor

@jroof88 jroof88 commented Sep 22, 2021

Summary of Changes

This commit implements a new preview client that gets preview data from S3. The preview client is good for folks who want to persist their preview data somewhere rather than relying on an external API call, which can fail and can take a while depending on the table being queried.

In addition to the base client, I implemented a JSON version of the preview client that my organization is using. It fetches data from S3 in JSON format and works very nicely with marshmallow serialization into the PreviewData format.
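To make the read path concrete, here is a minimal sketch of what a JSON-backed S3 preview client could look like. It is not the code merged in this PR: the class names, the `<database>/<cluster>/<schema>/<tableName>.json` key layout, and the `columns`/`data` payload shape are all assumptions for illustration, and plain dataclasses stand in for the marshmallow schema and Amundsen's real `PreviewData` model. The only real library call assumed is boto3's `get_object(Bucket=..., Key=...)`, whose response carries the object under `"Body"`.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ColumnItem:
    # Hypothetical stand-in for a preview column descriptor.
    column_name: str
    column_type: str


@dataclass
class PreviewData:
    # Hypothetical stand-in for the frontend's PreviewData model
    # (the PR uses marshmallow serialization for this step instead).
    columns: List[ColumnItem] = field(default_factory=list)
    data: List[Dict[str, Any]] = field(default_factory=list)


class S3JSONPreviewClientSketch:
    """Reads precomputed preview JSON from S3 instead of querying the warehouse."""

    def __init__(self, s3_client: Any, bucket: str) -> None:
        # s3_client: anything exposing boto3's get_object(Bucket=..., Key=...).
        self.s3_client = s3_client
        self.bucket = bucket

    @staticmethod
    def object_key(params: Dict[str, str]) -> str:
        # Assumed key layout: <database>/<cluster>/<schema>/<tableName>.json
        return "{database}/{cluster}/{schema}/{tableName}.json".format(**params)

    def get_preview_data(self, params: Dict[str, str]) -> Optional[PreviewData]:
        try:
            response = self.s3_client.get_object(
                Bucket=self.bucket, Key=self.object_key(params)
            )
            payload = json.loads(response["Body"].read())
        except Exception:
            # Missing object, access error, or bad JSON: show no preview.
            return None
        columns = [ColumnItem(**col) for col in payload.get("columns", [])]
        return PreviewData(columns=columns, data=payload.get("data", []))


class InMemoryS3:
    """Tiny single-bucket stand-in for a boto3 S3 client, for local tests."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects  # maps object key -> raw bytes

    def get_object(self, Bucket: str, Key: str) -> Dict[str, Any]:
        # Same response shape as boto3: {"Body": <readable stream>}; a missing
        # key raises KeyError here (boto3 raises a ClientError instead).
        return {"Body": io.BytesIO(self.objects[Key])}
```

Returning `None` instead of raising keeps a missing or malformed preview file from breaking the table page; in production `s3_client` would be a real `boto3.client("s3")`.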

Tests

N/A. I didn't see tests for any other preview client, so I decided not to add any. The preview clients seem fairly custom, so testing is probably overkill.

Documentation

Added a documentation file adjacent to the Apache Superset Preview Client docs 👍

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

@jroof88 jroof88 requested a review from a team as a code owner September 22, 2021 18:07
@boring-cyborg boring-cyborg bot added the area:frontend From the Frontend folder label Sep 22, 2021
@jroof88 jroof88 force-pushed the jroof-amundsen-s3PreviewClient branch from bd0e4e6 to bfc8014 Compare September 22, 2021 18:07
@jroof88 jroof88 changed the title from "WIP FOR CI" to "feat: s3 preview client" Sep 22, 2021
@jroof88 jroof88 force-pushed the jroof-amundsen-s3PreviewClient branch 2 times, most recently from a418829 to 6fae380 Compare September 22, 2021 18:24
@jroof88 jroof88 force-pushed the jroof-amundsen-s3PreviewClient branch from 6fae380 to a62e3db Compare September 22, 2021 21:55
@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Sep 23, 2021
Signed-off-by: jroof88 <jack.roof@samsara.com>
@jroof88 jroof88 force-pushed the jroof-amundsen-s3PreviewClient branch from a62e3db to 457d23f Compare September 25, 2021 01:00
@jroof88
Contributor Author

jroof88 commented Sep 30, 2021

@feng-tao Friendly ping for a review here. This is getting a bit stale.

@feng-tao
Member

Sure, will take a look. Also cc @dkunitsk @youngyjd.

Member

@feng-tao feng-tao left a comment


hey @jroof88, I am fine with the PR. But I'm curious: how do you persist preview data into s3 first? and why not trigger an API call or delta sql to fetch the data?

@jroof88
Contributor Author

jroof88 commented Oct 1, 2021

@feng-tao

how do you persist preview data into s3 first?

A Databricks job scans our data lake, runs a SELECT * FROM x LIMIT 10 on each table, and uploads the results as JSON to S3.
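The write side of that job can be sketched as a small helper that shapes the sampled rows into a JSON document and uploads it. This is an illustration, not the author's Databricks job: the function names and the `columns`/`data` payload shape are hypothetical. It assumes the input looks like PySpark's `df.dtypes` (a list of `(name, type_string)` pairs) and rows that behave like tuples, as PySpark `Row` objects do, and that `s3_client` is a boto3 S3 client, whose `put_object(Bucket=..., Key=..., Body=...)` performs the upload.

```python
import json
from datetime import date
from typing import Any, Dict, Sequence, Tuple


def build_preview_payload(
    dtypes: Sequence[Tuple[str, str]], rows: Sequence[Sequence[Any]]
) -> Dict[str, Any]:
    """Shape sampled rows into a {"columns": [...], "data": [...]} document."""
    names = [name for name, _ in dtypes]
    return {
        "columns": [{"column_name": n, "column_type": t} for n, t in dtypes],
        # One dict per row, keyed by column name.
        "data": [dict(zip(names, row)) for row in rows],
    }


def serialize_preview(payload: Dict[str, Any]) -> bytes:
    # default=str turns dates, decimals, timestamps, etc. into strings.
    return json.dumps(payload, default=str).encode("utf-8")


def upload_preview(s3_client: Any, bucket: str, key: str, payload: Dict[str, Any]) -> None:
    # s3_client is assumed to be a boto3 S3 client.
    s3_client.put_object(Bucket=bucket, Key=key, Body=serialize_preview(payload))


# Example with made-up values: one sampled row from a date-partitioned table.
example = build_preview_payload([("id", "int"), ("ds", "date")], [(1, date(2021, 9, 30))])
print(serialize_preview(example).decode("utf-8"))
```

Using `default=str` is a deliberately blunt choice: the preview UI only displays values, so stringifying anything JSON cannot encode natively is usually good enough.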

why not trigger an API call or delta sql to fetch the data?

I thought about this, but we wanted to get something up quickly and the downsides were too great.

  1. All of the network configuration would've been a headache. We'd have had to somehow run our Amundsen K8s cluster behind a NAT Gateway to get a stable IP for outbound requests, which we could then allowlist on Databricks, since we have IP Allow List turned on for security.
  2. There is no concept of a cache or persistence in the Preview API layer so every time someone landed on a table we would emit a SELECT * FROM x LIMIT 10 from the same table. This seemed costly.
  3. We want to take advantage of partitions in our SELECT * FROM x LIMIT 10. There wouldn't really be an easy way to tell Amundsen how to query specific tables. On Databricks we can run DESCRIBE x to find the partitions and then query recent data; most of our tables are partitioned by date.
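Point 3 can be illustrated with a sketch that picks the newest partition and builds the sampling query. The helper names are hypothetical, and it assumes the partition list has already been fetched from table metadata in the Hive-style `col=value` form that Spark SQL's `SHOW PARTITIONS` returns (multi-level partitions joined by `/`), and that the date partition holds ISO-8601 strings, which sort correctly as plain text. The author's job uses `DESCRIBE x` for discovery; this sketch only covers what happens after the partitions are known.

```python
from typing import Iterable, Optional


def latest_partition(partition_specs: Iterable[str], column: str) -> Optional[str]:
    """Return the newest value of `column` from specs like 'ds=2021-09-30'."""
    values = []
    for spec in partition_specs:
        # Multi-level specs look like 'ds=2021-09-30/region=us'.
        for part in spec.split("/"):
            key, _, value = part.partition("=")
            if key == column:
                values.append(value)
    # ISO-8601 dates sort correctly as plain strings.
    return max(values) if values else None


def preview_query(
    table: str,
    partition_column: Optional[str],
    partition_specs: Iterable[str],
    limit: int = 10,
) -> str:
    latest = latest_partition(partition_specs, partition_column) if partition_column else None
    if latest is None:
        # Unpartitioned (or empty) table: fall back to a plain sample.
        return f"SELECT * FROM {table} LIMIT {limit}"
    # Partition values come from the table's own metadata, not user input,
    # so they are inlined directly here (an assumption worth revisiting).
    return f"SELECT * FROM {table} WHERE {partition_column} = '{latest}' LIMIT {limit}"
```

Filtering on a single partition lets the engine prune everything else, which is what makes the sample both cheap and recent.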

@jroof88 jroof88 requested a review from feng-tao October 1, 2021 17:48
@feng-tao
Member

feng-tao commented Oct 2, 2021

@jroof88 got it, thanks for the context!

@feng-tao
Member

feng-tao commented Oct 2, 2021

@jroof88 is it ok to add your company to the user list?

@feng-tao feng-tao merged commit 20bdb07 into amundsen-io:main Oct 2, 2021
@jroof88
Contributor Author

jroof88 commented Oct 4, 2021

@jroof88 is it ok to add your company to the user list?

@feng-tao Checking with my team and I will get back to you

amommendes pushed a commit to amommendes/amundsen that referenced this pull request Jan 21, 2022
Signed-off-by: Amom Mendes <amommendes@hotmail.com>
ozandogrultan pushed a commit to deliveryhero/amundsen that referenced this pull request Apr 28, 2022
Signed-off-by: Ozan Dogrultan <ozan.dogrultan@deliveryhero.com>
zacr pushed a commit to SaltIO/amundsen that referenced this pull request May 13, 2022
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this pull request Jun 30, 2022
Labels
area:frontend From the Frontend folder keep fresh Disables stalebot from closing an issue