Wizard MVP proposal #3499

Merged: 4 commits merged into master from proposal/wizard on Jun 21, 2022
Conversation

Jonathan-Rosenberg
Contributor

This is a proposal for the time-to-value wizard MVP.
For further info please refer to #3411, #3373, and the Figma design.

@Jonathan-Rosenberg added the exclude-changelog and team/ecosystem labels on Jun 13, 2022
@Jonathan-Rosenberg marked this pull request as ready for review on June 14, 2022 08:11
2. Imported data from <S3 location>.
3. Generated the following configurations:
<Spark configurations with [“<secret1>”, “<secret2>”...] hidden>
4. Instructions to configure Hive Metastore with lakeFS.
Contributor

And, which lakectl metastore commands to run when creating/deleting/merging a branch/tag.

Contributor

I thought that we wanted to have the following:

  1. Have a step in the Wizard that generates the Hive configurations snippet
  2. Include instructions for which lakectl commands to use when creating/deleting/merging a branch/tag (as @johnnyaug mentioned).

Do you think the extra step is unnecessary?

Contributor Author

Point one will generate configurations that are exactly the same as the Spark configurations generated...

@johnnyaug (Contributor) left a comment

Boom

@talSofer (Contributor) left a comment

Thanks!
Can you please add a section that talks about where the templates are stored?

(forgot to hit submit yesterday...)


2. Generated the following configurations:
<Spark configurations with [“<secret1>”, “<secret2>”...] hidden>
3. Instructions to configure Hive Metastore with lakeFS.
4. Generated this README file and committed it.
Contributor

I would remove step number 4

@arielshaqed (Contributor) left a comment

NEAT!

How do we render the README? I assume this requires an additional feature.


The wizard UI component is responsible for the user’s Spark onboarding process. The process is as follows:
1. Create a repository (named as the user wishes) and a ‘main’ branch in it.
2. Import the user’s data to ‘main’ and display a progress bar (which will show a link to required permissions). Only available on cloud-deployed lakeFS.
Contributor

Why only cloud-deployed? I suspect it will only add work to enforce this


The [templating service](https://github.com/treeverse/lakeFS/pull/3373) is responsible for fetching, authenticating, and expanding the required templates and returning them to the client.
**Process**:
1. Get the S3 templates (locations should be specified in the incoming request). The files must be valid [`html/template`](https://pkg.go.dev/html/template) or [`text/template`](https://pkg.go.dev/text/template) parsable template texts.
Contributor

Not sure incoming request should entirely specify the location. Would be happier if a set prefix were configured, and the incoming request could only select below that prefix. Otherwise admins will have a very tough time ensuring the access is safe.

The proposed template service never allows access by arbitrary path precisely for this reason. It protects access to templates on lakefs in some way; if you want to do the same but on S3, we will need to design similar protections for files from S3.
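To make the prefix restriction concrete, here is a minimal sketch of a fetch-and-expand flow under that constraint, using Go's text/template (the function, the fetch callback, and the configured prefix are illustrative assumptions, not part of the proposal):

```go
package templating

import (
	"fmt"
	"io"
	"path"
	"strings"
	"text/template"
)

// configuredPrefix is an illustrative, admin-configured root under which all
// templates must live; incoming requests can only select paths below it.
const configuredPrefix = "templates/"

// Expand fetches the template at loc (relative to the configured prefix),
// parses it as a text/template, and expands it with the given parameters.
// fetch stands in for whatever object-store or lakeFS read the service uses.
func Expand(fetch func(key string) (io.ReadCloser, error), loc string, params map[string]string, w io.Writer) error {
	clean := path.Clean(loc)
	if path.IsAbs(clean) || clean == ".." || strings.HasPrefix(clean, "../") {
		return fmt.Errorf("template location %q escapes the configured prefix", loc)
	}
	rc, err := fetch(configuredPrefix + clean)
	if err != nil {
		return fmt.Errorf("fetch template: %w", err)
	}
	defer rc.Close()
	raw, err := io.ReadAll(rc)
	if err != nil {
		return fmt.Errorf("read template: %w", err)
	}
	tmpl, err := template.New(clean).Parse(string(raw))
	if err != nil {
		return fmt.Errorf("parse template: %w", err)
	}
	return tmpl.Execute(w, params)
}
```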

Comment on lines +240 to +254
2. Templating service- status 200
```json
{
"class": "templating_service",
"name": "successful_call",
"value": "<service name>"
}
```
3. Templating service - status 500 - no access to provided template location
```json
{
"class": "templating_service",
"name": "no_access",
"value": "<service name>"
}
```

Contributor

Don't we send status on our monitoring outputs as field values, rather than as metric names?
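If the outcome should indeed be carried as a field/label value rather than as part of the metric name, a hedged sketch of what that could look like with the Prometheus Go client (metric and label names are made up for illustration; the events above are BI JSON and may not map to Prometheus at all):

```go
package templating

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// templatingCalls counts templating-service calls; the outcome travels as a
// label value instead of being baked into the metric name.
var templatingCalls = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "templating_service_calls_total",
	Help: "Calls to the templating service, labeled by outcome.",
}, []string{"service", "status"})

// reportCall records one call; status would be e.g. "ok", "no_access",
// "4xx" or "5xx".
func reportCall(service, status string) {
	templatingCalls.WithLabelValues(service, status).Inc()
}
```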

Comment on lines +256 to +271
4. Templating service - status 5xx - general
```json
{
"class": "templating_service",
"name": "5xx",
"value": "<service name>"
}
```
5. Templating service - status 4xx - general
```json
{
"class": "templating_service",
"name": "4xx",
"value": "<service name>"
}
```
Contributor

Don't the existing metrics on web endpoints already cover this?

Contributor Author

Possibly. I just listed what we want, not what we need to implement.

3. The wizard creates a repo named ‘spark-repo’ and sets a default ‘main’ branch.
4. The wizard asks the users if they want to import existing data to lakeFS. The user skips this step using the skip button.
5. The wizard asks the user: “How are you using Spark”, and offers three alternatives: Databricks, AWS EMR, and spark-submit. The user chooses ‘Databricks’. It also asks the user for their lakeFS endpoint (and will show a default placeholder pointing to the current URL). The user types the endpoint.
6. The wizard sends a GET request to the templating service with a query string of the format:
Contributor

Optionally, we might use the templating service as the source for the "how are you using this" data. It's some JSON that needs to be generated somewhere, and may have some fields to specify. Looks like another use case for templates.

@arielshaqed (Contributor) left a comment

Thanks!

Still open issues:

  • Rendering the README
  • Monitoring: can we use existing metrics rather than define new ones?


The [templating service](https://github.com/treeverse/lakeFS/pull/3373) is responsible for fetching, authenticating, and expanding the required templates and returning them to the client.
**Process**:
1. Get the template (the location should be specified in the incoming request). The file must be a valid [`html/template`](https://pkg.go.dev/html/template) or [`text/template`](https://pkg.go.dev/text/template) parsable template text.
Contributor

How do we distinguish the two? I suggest the extension before .tt, say PATH.something.tt generates PATH.something and PATH.html.tt generates PATH.html.

Contributor Author

Cool
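A minimal sketch of that convention, assuming a Go implementation: strip the trailing .tt, then choose html/template when the remaining extension is .html and text/template otherwise (all names illustrative):

```go
package templating

import (
	htmltemplate "html/template"
	"io"
	"strings"
	texttemplate "text/template"
)

// executor is the small common surface of html/template and text/template
// needed here; both *Template types satisfy it.
type executor interface {
	Execute(w io.Writer, data any) error
}

// parseByExtension strips the ".tt" suffix and chooses the engine from the
// remaining extension: PATH.html.tt -> html/template (with escaping),
// anything else -> text/template.
func parseByExtension(name, raw string) (string, executor, error) {
	target := strings.TrimSuffix(name, ".tt")
	if strings.HasSuffix(target, ".html") {
		t, err := htmltemplate.New(target).Parse(raw)
		return target, t, err
	}
	t, err := texttemplate.New(target).Parse(raw)
	return target, t, err
}
```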

Comment on lines +123 to +124
4. The wizard asks the users if they want to import existing data to lakeFS. The user specifies the location of the bucket (after they validated that the lakeFS role has the right permissions and that the bucket has the correct policy), and clicks ‘OK’.
* An object counter will show the progress of the import process and will signal once it’s over.
Contributor

I would really like this to link to the import design. Or, if we need to change import, let's call it out and remain with one good implementation of the same feature.

Here, if a counter makes sense during the wizard, it also makes sense during a plain import.

Contributor Author

We don't need to change the import. We don't use it as-is; we just leverage its functionality, not its exact behavior.
It's not the same feature.
There is a counter in the plain import.

@itaiad200 (Contributor) commented on Jun 20, 2022

The UI-Import is an assembly of 5-6 different API calls. To get the behaviour you want here, you'll need to replace 1-2 of those calls, but it's all pretty straightforward.

* SparkEMR.conf.tt
* SparkDatabricks.conf.tt
* SparkXML.conf.tt
* README.md.tt
Contributor

How is the generated README.md rendered?

Contributor Author

We agreed that it will not be rendered at first.


### BI Metrics

Sent directly from the GUI Wizard
Contributor

I think that all of these should identify which wizard it was. Initially we expect to have only one, but later we expect to have more!

Similarly, we might report which Spark configs were generated.

Contributor Author

They identify it as spark_wizard; did you mean something else?

@Jonathan-Rosenberg (Contributor Author)

@arielshaqed, regarding monitoring: if we can leverage existing metrics, we will do that; if not, we'll make amendments to get the knowledge we need. Either way, this is an implementation detail.

2. [Figma Design](https://www.figma.com/file/haD9y599LzW6LvsYBI2xWU/Spark-use-case?node-id=31%3A200)
4. Generates a README file with all actions performed, and will commit it as a first commit (after the import) to the ‘main’ branch of the created repository.

### Templating Service
Contributor

Is this service (a) bundled with the lakeFS server binary, (b) completely decoupled but shipped with the lakeFS binary, or (c) a dedicated centralized service?

Contributor Author

a

Comment on lines +125 to +129
5. The wizard asks the user for their lakeFS endpoint (and will show a default placeholder pointing to the current URL).
6. The wizard will send a GET request to the templating service with a query string of the format:
```
?lakefs_url=https://my-lakefs.io&template_location=databricksConfig.props.tt
```
Contributor

Why can't the service infer the lakeFS URL from the request itself? The UI knows the address of the server, right?
For trickier installations, where the address depends on the caller's flow (e.g. with/without a VPC), supplying the address becomes harder for the user.

Contributor Author

The intention was that the URL would be set by default by the wizard GUI (inferred from the URL it is running at), and we'll let users change it if they want to.
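A hedged sketch of that default-plus-override behaviour on the service side, using the lakefs_url parameter from the query string above (the fallback to the request's own host is an assumption, not part of the proposal):

```go
package templating

import "net/http"

// lakefsURLForRequest returns the lakefs_url query parameter when the wizard
// GUI (or the user, after editing the pre-filled value) supplied one, and
// otherwise falls back to the address this request was served on.
func lakefsURLForRequest(r *http.Request) string {
	if u := r.URL.Query().Get("lakefs_url"); u != "" {
		return u
	}
	scheme := "http"
	if r.TLS != nil {
		scheme = "https"
	}
	return scheme + "://" + r.Host
}
```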

(and will show a default placeholder pointing to the current URL).

```properties
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access_key=ACCESSKEYDONTTELL
spark.hadoop.fs.s3a.secret_key=SECRETKEYDONTTELL
```

Contributor

This kind of breaks our promise when we create credentials: "This is the only time you can see your credentials - make sure to copy them...". I'm not saying we shouldn't do it, only that we should probably rephrase that sentence.

Contributor Author

Good point.
Will add that

```
?lakefs_url=https://my-lakefs.io&template_location=README.md.tt
```
It will do so to generate a README.md file.
10. The returned README file will describe the steps taken, the configurations generated (with secrets hidden), and some commands explaining how to connect Hive Metastore to lakeFS:
Contributor

How does it know which steps were taken? If I run the same wizard 100 times, will it show 600 steps?

Contributor Author

The wizard's intention is to create a repo that is ready to use. Each run will create a different repo (we won't allow overriding repos), so the wizard runs only once per repo, and the README specifies the steps taken while creating that repo.
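For illustration, a hypothetical README.md.tt in Go template syntax showing how a single run's steps could be captured (the field names are invented, not part of the proposal):

```
# {{ .RepoName }}

This repository was created by the Spark wizard, which:

1. Created the repository with a default 'main' branch.
{{- if .ImportLocation }}
2. Imported data from {{ .ImportLocation }}.
{{- end }}
3. Generated the following configurations (secrets hidden):

   {{ .SparkConfigRedacted }}

4. Provided instructions to configure Hive Metastore with lakeFS.
```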

@Jonathan-Rosenberg merged commit 015241e into master on Jun 21, 2022
@Jonathan-Rosenberg deleted the proposal/wizard branch on June 21, 2022 11:06