
Initial suggestion of a template server #3373

Merged: 4 commits from design/template-engine into master, Jun 21, 2022

Conversation

@arielshaqed (Contributor):

Useful basis for decreased TTV, more maintainable Spark clients, and more
configuration of detached GUIs.

@arielshaqed arielshaqed requested a review from a team May 17, 2022 08:15
@arielshaqed arielshaqed added the exclude-changelog PR description should not be included in next release changelog label May 17, 2022
@Jonathan-Rosenberg (Contributor) left a comment:

Do you mind adding some kind of flow chart to show the whole process you're describing here?
And just to clarify: the server will hold the templates configured by the administrators (and might offer some predefined, ready-to-roll templates?), and the server's clients will ask for the templates they need and populate them when they get them back?

```conf
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
spark.hadoop.fs.lakefs.access_key={{.user.credentials.access_key.id}}
spark.hadoop.fs.lakefs.secret_key={{.user.credentials.access_key.secret}}
```
Contributor:

Not sure it's a good idea to expose the secrets through templating

@arielshaqed (Author):

Note that these are just the API access credentials that appeared on the request. So I'm assuming your complaint is also (or even more) relevant to blockstore.s3.credentials below.

If I put them here on a template, the admin can use widely-understood IAM actions to control access to the template, or to the secrets. If they're just on some endpoint, the admin needs to use a single-purpose IAM action to control access to that endpoint. How is using a consistent IAM scheme worse than exposing them on any other random API endpoint?
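For concreteness, here is a sketch of the kind of policy an admin might attach to scope who may read (and therefore expand) a given template, using lakeFS's IAM policy format. The `lakefs-templates` repository name follows the design's default prefix; the `spark.conf.tt` object name is made up for illustration:

```json
{
    "id": "ExpandSparkTemplate",
    "statement": [
        {
            "action": ["fs:ReadObject"],
            "effect": "allow",
            "resource": "arn:lakefs:fs:::repository/lakefs-templates/object/spark.conf.tt"
        }
    ]
}
```

Granting or withholding a policy like this would then control access to the template and to any secrets it expands with one consistent mechanism, which is the argument being made here.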

Contributor:

Thanks for clarifying about these being taken from the request.

So yes, let's discuss returning the underlying storage credentials from the configuration: I'm not referring specifically to templating - I think in general we shouldn't return them from any API endpoint.

> [html/template][html/template] for safety.
>
> The template is read by lakeFS from a specific (configured) prefix, by
> default `lakefs://lakefs-templates/main`. IAM authorizes users to see
Contributor:

I'm not sure what we gain by storing the templates in lakeFS.
This makes it harder for us to create built-in templates.
For example: there is going to be an option to create a new repository pre-configured to be connected to Spark.
So with this suggestion, we will need to pre-upload the template to an existing repository or to the new one, just for the user to get the templated Spark snippet.

@arielshaqed, WDYT?

@arielshaqed (Author):

Just the usual advantages of giving Git versioning semantics to everything. So I believe it makes management easier, at least in terms of history and lineage. For instance, attach the commit ID of the template that generated the file (add this as an available property, say). Now its lineage is clear. Or, a template suddenly seems different. Why did it change?

On the opposite side, so far each time we added direct access to the backing store we got into trouble. #2732 is still open, and is tough to fix. Every new feature we add tries to bypass lakeFS, and many of them end up getting into trouble because of this. #2491 is an attempt to bypass our known-racy implementation of direct access to (some) "repo-level settings", which are implemented by directly accessing the blockstore.

Also, if we implement for S3 we immediately get 2 more open issues to reimplement for Azure and GCS, and maybe another issue to reimplement for using a different S3 endpoint (if I'm using MinIO and want to access the templates controlled by lakeFS).

(I do understand that for some use cases this complicates matters. Would you prefer to leave storing on lakeFS optional, start with an implementation that takes templates from the block adaptor... and later add the option to store them in lakeFS? You would still run into loads of tech debt supporting all object storage types, but at least we could move on to the other important features!)

Contributor:

My only concern was that taking templates from lakeFS complicates our first time-to-value tasks - namely showing the user Spark snippets when creating a repository. If @Jonathan-Rosenberg is happy with the current design, I am too.

@arielshaqed (Author):

Thanks; PTAL!


| template function | value source | IAM permissions |
|:------------------|:--------------------------|:---------------------------------------------|
| config | lakeFS configuration[^1] | `arn:lakefs:template:::config/path/to/field` |
Contributor:

I think there should be a global, fixed whitelist of configurations we allow to expose in this way.
Some values should never be exposed, for example the installation's encryption key.

@arielshaqed (Author):

That makes sense!

I can think of 2 ways to do this:

1. Have a list of configuration properties that may be exported in a configuration setting.
2. Specify an additional (Go...) tag on the Config struct that allows a configuration setting to be exposed (sketched below). I think I prefer this option: it is the most static way possible, but also easy enough for developers to add more properties if needed.

In any case, I think we should still enforce IAM for that setting. Controlling everything by permissions on the path to the template is too hard to get right when you do care, and we can just add an "everything is allowed" default policy and attach it to users.
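As a rough illustration of option 2, a minimal Go sketch. The `template:"exposed"` tag name and the fields shown here are hypothetical, not the actual lakeFS Config struct:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Config stands in for the real lakeFS Config struct. The (assumed)
// `template:"exposed"` tag marks fields the template server may expose;
// untagged fields stay hidden.
type Config struct {
	ListenAddress string `mapstructure:"listen_address" template:"exposed"`
	EncryptKey    string `mapstructure:"encrypt_key"` // never exposed
}

// isExposed walks a dotted field path on the struct type and reports
// whether every field along the path carries the "exposed" tag.
func isExposed(cfg interface{}, path string) bool {
	t := reflect.TypeOf(cfg)
	for _, name := range strings.Split(path, ".") {
		if t.Kind() != reflect.Struct {
			return false
		}
		f, ok := t.FieldByName(name)
		if !ok || f.Tag.Get("template") != "exposed" {
			return false
		}
		t = f.Type
	}
	return true
}

func main() {
	fmt.Println(isExposed(Config{}, "ListenAddress")) // true
	fmt.Println(isExposed(Config{}, "EncryptKey"))    // false
}
```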

Contributor:

Agreed

@arielshaqed (Author) left a comment:

Sorry, managed not to post my 2 pending comments!


```diff
@@ -31,8 +31,7 @@ type S3AuthInfo struct {
 }

 // Output struct of configuration, used to validate. If you read a key using a viper accessor
-// rather than accessing a field of this struct, that key will *not* be validated. So don't
-// do that.
+// rather than accessing a field of this struct, that key will *not* be validated. So don'// do that.
```
Contributor:

?

@arielshaqed (Author):

Whoops, sorry!
Reverted.

@arielshaqed arielshaqed force-pushed the design/template-engine branch from 54b04aa to 3543c87 Compare June 12, 2022 18:50
@arielshaqed arielshaqed requested a review from johnnyaug June 12, 2022 18:50
@arielshaqed (Author):

Added a new_credentials template function, as discussed.

```conf
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
{{with $creds := new_credentials}}
spark.hadoop.fs.lakefs.access_key={{$creds.ID}}
spark.hadoop.fs.lakefs.secret_key={{$creds.Secret}}
```
Contributor:

Suggested change:
```diff
-spark.hadoop.fs.lakefs.secret_key={{$creds.Secret}}
+spark.hadoop.fs.lakefs.secret_key={{$creds.Secret}}
+{{end}}
```

@arielshaqed (Author):

Thanks!
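To make the mechanics concrete, here is a hedged sketch of how a `new_credentials` function could be wired into Go's [html/template] via a FuncMap and used to expand the snippet above. The `Credentials` type and the canned values are assumptions for illustration; a real server would mint credentials through the auth service (`auth:CreateCredentials`) for the requesting user:

```go
package main

import (
	"html/template"
	"os"
)

// Credentials is a hypothetical result type for new_credentials; the
// design doc only fixes the field names ID and Secret used in templates.
type Credentials struct {
	ID     string
	Secret string
}

func main() {
	funcs := template.FuncMap{
		// Canned values for the sketch; a real implementation would call
		// the auth service to create credentials for the requesting user.
		"new_credentials": func() (Credentials, error) {
			return Credentials{ID: "AKIA...", Secret: "..."}, nil
		},
	}

	const snippet = `spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
{{with $creds := new_credentials}}
spark.hadoop.fs.lakefs.access_key={{$creds.ID}}
spark.hadoop.fs.lakefs.secret_key={{$creds.Secret}}
{{end}}`

	t := template.Must(template.New("spark.conf").Funcs(funcs).Parse(snippet))
	if err := t.Execute(os.Stdout, nil); err != nil {
		panic(err)
	}
}
```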

@talSofer (Contributor) left a comment:

Very interesting, thanks!
Added a few questions


> ## Template expansion flow
>
> #### _:warning: This flow assumes templates stored on lakeFS. :warning:_
Contributor:

- Do you mean that we store the templates in a hidden repo?
- What are the main pros that you see in storing templates as objects of a certain repo as opposed to storing them on the server? Are you thinking of making the template repo easily extendable?

@arielshaqed (Author):

Templates are in a repo. I don't know why we would want to hide it.

In general I believe that everything should be versioned, so everything should live in a repo. Among the many advantages of storing in a repo, for templates we have:

1. Ease of configuration. No need to overload the block adapter to read templates.

   In particular, note that so far every time we decided to store files on an object store outside of lakeFS, the implementation contained serious mistakes.

2. Reproducibility and lineage. If templates change behind the user's back, they have no way of understanding why they got a particular expansion at a particular date.

3. Adaptability. I want admins to be free to change templates or add new ones.

4. Ease of development. When developing and testing improved or new templates, having a branch makes everything easier.

I would further claim that, as people who work on a versioning product, the burden of proof is on those who would not store things inside the product, not on those who would.

```conf
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
{{with $creds := new_credentials}}
spark.hadoop.fs.lakefs.access_key={{$creds.ID}}
```
@Jonathan-Rosenberg (Contributor), Jun 13, 2022:

Suggested change:
```diff
-spark.hadoop.fs.lakefs.access_key={{$creds.ID}}
+spark.hadoop.fs.lakefs.access_key={{$creds.Key}}
```

?

@arielshaqed (Author):

Not done: it's called the "ID" everywhere, in AWS and in our docs.

@johnnyaug (Contributor) left a comment:

Looking good!

| template function | value source | IAM permissions |
|:------------------|:--------------------------|:---------------------------------------------|
| config | exportable lakeFS configuration[^1] | `arn:lakefs:template:::config/path/to/field` |
| object | (small) object on lakeFS | IAM read permission for that object |
| contenttype | none; set `Content-Type:` | (none) |
| new_credentials | new credentials added to user | `auth:CreateCredentials` |
Contributor:

This would potentially generate multiple credentials for each user. These credentials should be managed in some way.

@arielshaqed (Author):

True. However, given that there is no "usage" or other metadata associated with a key, there is no way to fetch one again. Let's solve it... later...

@arielshaqed (Author) left a comment:

Thanks!


@arielshaqed (Author):

Go tests and linters are stuck (they never showed up as running), and this is just a design-doc PR with no code. So pulling by the powers vested in me as an admin.

@arielshaqed arielshaqed merged commit 09efbaf into master Jun 21, 2022