# Initial suggestion of a template server #3373

Merged (4 commits), Jun 21, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
`design/open/template-server/template-server.md` (new file, +131 lines):
# Template Server

## About

The template server adds a simple generic templating capability to the
lakeFS web server. It gives administrators and developers server-side
support for serving templated web objects securely.

Among the planned applications:

* Serve capabilities to clients. Serving capabilities from a template
lets developers add them easily but allows administrators to control
what is actually served.
* Serve prepopulated config files for popular setups. For instance we
can serve `.lakectl` and Hadoop/Spark config files. These sometimes
need to be modified, for instance when lakeFS does not know the name
of its endpoint.

This can be an important part of time-to-value. Using templates
we offer admins some flexibility in configuration: for instance they
might define templates with different values, and clients can select
which to receive by path.

## Template format

We use Golang [text/template][text/template] to expand most templates.
For browsers (selected by filename, e.g. `*.html`) we must instead use
[html/template][html/template] for safety.

Accesses to `http://<lakefs>/api/v1/templates/some/path` read the template
from `some/path`.
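
As a rough sketch of the selection rule (the package and function names
here are illustrative, not lakeFS's actual handler code):

```go
package templater

import (
	htmltemplate "html/template"
	"io"
	"path"
	texttemplate "text/template"
)

// executor is satisfied by both *text/template.Template and
// *html/template.Template.
type executor interface {
	Execute(w io.Writer, data interface{}) error
}

// parse picks html/template for browser-bound files, so expansion is
// auto-escaped, and plain text/template for everything else.
func parse(name, src string) (executor, error) {
	if path.Ext(name) == ".html" {
		return htmltemplate.New(name).Parse(src)
	}
	return texttemplate.New(name).Parse(src)
}
```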

The template is read by lakeFS from a specific (configured) prefix, by
default `lakefs://lakefs-templates/`. We might decide to support
various object storage types; `lakefs` gives the best management and lineage
capabilities, but possibly supporting the production blockstore types `s3`,
`azure` and `gcs` would simplify some lakeFS deployments.

If templates are stored on lakeFS, IAM authorizes users to read the template
via `fs:ReadObject`, at which point expansion can start. If we want to store
templates on the blockstore types, we will probably need to add a new IAM
action type for template expansion, or alternatively overload
`fs:ReadObject` to support off-lakeFS URIs. We will supply the following
replacements (at least):

| template function | value source | IAM permissions |
|:------------------|:------------------------------------|:---------------------------------------------|
| config | exportable lakeFS configuration[^1] | `arn:lakefs:template:::config/path/to/field` |
| object | (small) object on lakeFS | IAM read permission for that object |
| contenttype | none; set `Content-Type:` | (none) |
| new_credentials | new credentials added to user | `auth:CreateCredentials` |
> **Reviewer (Contributor):** This would potentially generate multiple
> credentials for each user. These credentials should be managed in some way.
>
> **Author:** True. However, given that there is no "usage" or other metadata
> associated with a key, there is no way to fetch one again. Let's solve
> it... later...
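
As an illustration of the `contenttype` and `object` functions from the
table (the exact call syntax is still open, and the object path below is
hypothetical):

```conf
{{contenttype "application/x-conf"}}
# Inline a small object; subject to an IAM read check on that object.
{{object "lakefs://lakefs-templates/main/snippets/common.conf"}}
```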


A configuration variable will be "exportable" if its struct field is tagged
`` `export:yes` ``, or (if implementing that is too hard) if its type
implements

```go
type Exportable interface {
	Export() string
}
```

IAM is still checked for exportable configuration variables (which will be
ON by default for users, but can be removed).
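
A rough sketch of the tag-based variant, using the canonical Go struct-tag
form `export:"yes"` (the field names below are made up for illustration):

```go
package config

import "reflect"

// Config stands in for the real lakeFS configuration struct.
type Config struct {
	ListenAddress string `export:"yes"`
	DatabaseDSN   string // never exported to templates
}

// exportableFields lists the fields of struct value c that templates may
// read; IAM is still checked per field at expansion time.
func exportableFields(c interface{}) []string {
	t := reflect.TypeOf(c)
	var fields []string
	for i := 0; i < t.NumField(); i++ {
		if t.Field(i).Tag.Get("export") == "yes" {
			fields = append(fields, t.Field(i).Name)
		}
	}
	return fields
}
```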

Adding a freeform dictionary to our config will allow admins to set up
any needed configuration.

We shall supply the user object in `.user` and the parsed query args in
`.query`[^2], to allow conditional operations. We will probably add new
functions or objects.
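
For example, a template could branch on the requesting user or on a query
parameter (hypothetical field names; this assumes `.query` maps a parameter
name to a single string value):

```conf
# Generated for {{.user.Username}}
{{if eq .query.flavor "spark"}}
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
{{end}}
```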

## Template expansion flow

#### _:warning: This flow assumes templates stored on lakeFS. :warning:_
> **Reviewer (Contributor):**
>
> * Do you mean that we store the templates in a hidden repo?
> * What are the main pros that you see in storing templates as objects of a
>   certain repo as opposed to storing them on the server? Are you thinking
>   of making the template repo easily extendable?
>
> **Author:** Templates are in a repo. I don't know why we would want to
> hide it.
>
> In general I believe that everything should be versioned, so everything
> should live in a repo. Among the many advantages of storing in a repo, for
> templates we have:
>
> 1. Ease of configuration. No need to overload the block adapter to read
>    templates. In particular, note that so far every time we decided to
>    store files on an object store outside of lakeFS, the implementation
>    contained serious mistakes.
> 2. Reproducibility and lineage. If templates change behind the user's
>    back, they have no way of understanding why they got a particular
>    expansion at a particular date.
> 3. Adaptability. I want admins to be free to change templates or add new
>    ones.
> 4. Ease of development. When developing and testing improved or new
>    templates, having a branch makes everything easier.
>
> I would further claim that, as people who work on a versioning product,
> the burden of proof is with those who want not to store things inside the
> product, not with those who do.

1. User accesses the template
   `http://<lakefs>/api/v1/templates/main/expand/me`.
1. lakeFS expands this to a lakeFS path (by default
   `lakefs://lakefs-templates/main/expand/me`).
1. lakeFS uses IAM to verify the user has `fs:ReadObject` permission on this
   object.
1. Object is parsed into a text/template.
1. lakeFS creates template
[`Funcs`](https://pkg.go.dev/text/template#FuncMap) that will check IAM
permissions as required for expansion.
1. Template gets expanded and returned to user.
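
A minimal sketch of step 5, assuming a simple authorizer interface (all
names here are illustrative, not lakeFS's actual types):

```go
package templater

import (
	"fmt"
	"text/template"
)

// Authorizer abstracts the IAM check made during expansion.
type Authorizer interface {
	Authorize(user, action, resource string) error
}

// expansionFuncs builds a FuncMap closed over the requesting user, so
// every template function call is authorized individually.
func expansionFuncs(auth Authorizer, user string, conf map[string]string) template.FuncMap {
	return template.FuncMap{
		"config": func(field string) (string, error) {
			arn := fmt.Sprintf("arn:lakefs:template:::config/%s", field)
			if err := auth.Authorize(user, "GetConfig", arn); err != nil {
				return "", err // execution aborts; nothing leaks
			}
			return conf[field], nil
		},
	}
}
```

The map would be installed with `Template.Funcs` before parsing, so a
denied check surfaces as an expansion error rather than an empty value.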

### Credentials

We supply a function `new_credentials` that creates a new set of credentials
for the user, registers them, and returns them. See this [example of template
functions returning multiple values][example_template_multiple].

This lets us implement the requirement to create and return new credentials
as part of a downloaded configuration.
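
A hedged sketch of how `new_credentials` could be wired in (the store
interface and key shapes are invented for illustration, and the IAM
`auth:CreateCredentials` check is elided):

```go
package templater

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Credentials is the single struct value a template destructures with
// {{with $creds := new_credentials}} ... {{$creds.ID}} / {{$creds.Secret}}.
type Credentials struct {
	ID     string
	Secret string
}

// CredentialStore abstracts "registers them" from the text above.
type CredentialStore interface {
	AddCredentials(user string, c Credentials) error
}

// newCredentialsFunc returns the template function, closed over the user.
func newCredentialsFunc(store CredentialStore, user string) func() (Credentials, error) {
	return func() (Credentials, error) {
		c := Credentials{ID: randomHex(20), Secret: randomHex(40)}
		if err := store.AddCredentials(user, c); err != nil {
			return Credentials{}, fmt.Errorf("register credentials: %w", err)
		}
		return c, nil
	}
}

// randomHex returns n hex characters from crypto/rand.
func randomHex(n int) string {
	b := make([]byte, (n+1)/2)
	rand.Read(b) // error ignored for brevity in this sketch
	return hex.EncodeToString(b)[:n]
}
```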

## Examples

(All examples subject to change as we define object format, and decide
what functions are allowed.)

Set a template that looks like this:

```conf
spark.hadoop.fs.lakefs.impl=io.lakefs.LakeFSFileSystem
{{with $creds := new_credentials}}
spark.hadoop.fs.lakefs.access_key={{$creds.ID}}
spark.hadoop.fs.lakefs.secret_key={{$creds.Secret}}
{{end}}
spark.hadoop.fs.lakefs.endpoint={{"local.templates.lakefs_url" | config}}
# Or pass something from "local" to give users a different key. In any case
# requires IAM authorization to fetch.
spark.hadoop.fs.s3a.api.access_key={{"blockstore.s3.credentials.access_key_id" | config}}
spark.hadoop.fs.s3a.api.secret_key={{"blockstore.s3.credentials.secret_access_key" | config}}
```

> **Jonathan-Rosenberg (Contributor), Jun 13, 2022** (on the `access_key`
> line): Suggested change to
> `spark.hadoop.fs.lakefs.access_key={{$creds.Key}}`?
>
> **Author:** Not done: it's called the "ID" everywhere, in AWS and in our
> docs.

> **Reviewer (Contributor)** (on the `secret_key` line): Suggested change:
> add `{{end}}` after this line.
>
> **Author:** Thanks!

This is a complete, ready-to-use configuration for lakeFSFS. In the future
we could leverage it to provide more configuration, but it immediately
provides a single well-known place for setup.

A future lakeFSFS version might read from the configured endpoint when
it is loaded, allowing that to be the only required configuration.


[text/template]: https://pkg.go.dev/text/template
[html/template]: https://pkg.go.dev/html/template
[example_template_multiple]: https://go.dev/play/p/1sfcodvxNvr

[^1]: note that this effectively includes environment variables!
[^2]: might instead be a function.