[WIP/Proposal] Flux User Modules #4296

Closed
nathanielc opened this issue Nov 23, 2021 · 15 comments

@nathanielc
Contributor

We want the Flux ecosystem to facilitate sharing Flux code. Currently this is not possible as Flux can only import code provided as part of the standard library. The proposal is to build the APIs necessary to host and share Flux code. Our plan is to limit the scope with which code can be shared and grow that scope as we learn more. An MVP of this proposal is to enable organizations to share Flux code within their own organization.

Technical Design

There are two parts to sharing Flux code:

  • Host Flux code in a registry
  • Update Flux engine to be able to load code from the filesystem and from a registry

Registry

The registry will need to store and serve Flux modules. An API has been previously designed and a PDF dump of its swagger doc can be seen here. This API will very likely change, but it gives an idea of what we are thinking.

The rough API will look like this:

Method  Route             Description
GET     /modules          Return all modules owned by the user/org
POST    /modules          Create a new module by uploading a zip archive
GET     /module/{path}    Download a specific module
GET     /versions/{path}  Get a list of versions for a specific module

Versioning

Flux modules will be immutable, i.e. a published module cannot be changed or deleted. To make a change to a module, create a new version of that module. Version numbers will follow semantic versioning, where a breaking change must increment the major version number.

Flux Engine

The Flux engine will need to support loading Flux code. This will be the process of downloading the module from the registry and including it during the execution of the Flux script.

We need to design a few aspects of this problem.

Contents of the module itself

A module must contain the code to run and other metadata in order to integrate it into the runtime. This will likely exist as a zip archive of the semantic graph of the packages included in a module. Note that a module may contain multiple packages.

Specification of dependencies

How does a Flux package define which versions of a module it wants to import? Some design considerations:

Flux is a lightweight scripting language, and we do not want to make it onerous for the author of a Flux script to define versions of a dependency. This means we will want clear, sane defaults when a version is not specified.

Resolving versions of dependencies

Resolving a tree of dependencies will need to solve some important questions:

  • Can we have multiple versions of the same Flux module included into the engine simultaneously?
  • If not, how do we pick a version if we need to pick just one?
  • How do we keep this simple?

It is proposed that if we need to resolve to a single version of a dependency, we use Go's minimal version selection method for its simplicity. In short, packages define the minimum version of a package they need, and then the maximum of all those minimums is used. Semantic versioning must be followed closely for this strategy to work. We could possibly enforce/lint semantic versioning when a module is created by directly comparing the API of the module to the previous version and generating the new version number automatically.

@wolffcm

wolffcm commented Dec 9, 2021

Partly I'm commenting just to indicate that I've read this. I like the basic design, though I'm sure there will be lots of complexity and challenges in the details.

We could possibly enforce/lint semantic versioning when a module is created by directly comparing the API of the module to the previous version and generating the new version number automatically.

This seems possible to me, and a really neat idea. We could even optionally auto-generate the version number based on some analysis maybe?

  • Breaking changes bump the major version (an existing function goes away, or its parameter names or types change)
  • Additive changes bump the minor version number (adding a new function, or adding a new parameter with a default)
  • All other changes bump the patch number
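As a rough illustration of the three rules above (not something the proposal specifies), here is a minimal Python sketch. The API model, `classify_change`, and `next_version` are all hypothetical names, and treating a new parameter without a default as a breaking change is my assumption.

```python
from typing import Dict, Tuple

# Hypothetical API model: function name -> {parameter name: (type name, has_default)}
Params = Dict[str, Tuple[str, bool]]
API = Dict[str, Params]


def classify_change(old: API, new: API) -> str:
    """Classify the change from old to new as "major", "minor", or "patch"."""
    additive = any(name not in old for name in new)  # a new function was added
    for name, old_params in old.items():
        if name not in new:
            return "major"  # an existing function went away
        new_params = new[name]
        for pname, (ptype, had_default) in old_params.items():
            if pname not in new_params:
                return "major"  # a parameter was removed or renamed
            new_type, has_default = new_params[pname]
            if new_type != ptype or (had_default and not has_default):
                return "major"  # a parameter changed type or lost its default
        for pname, (_, has_default) in new_params.items():
            if pname not in old_params:
                if not has_default:
                    return "major"  # assumption: a new required parameter breaks callers
                additive = True  # a new parameter with a default
    return "minor" if additive else "patch"


def next_version(current: str, old: API, new: API) -> str:
    """Return the version that should follow current, given the API change."""
    major, minor, patch = (int(part) for part in current.split("."))
    kind = classify_change(old, new)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

A real implementation would compare the type-checked semantic graph rather than a dictionary, and would need a policy for 0.x versions, which this sketch ignores.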

@jsternberg
Contributor

Do we have a plan to allow for a shared registry so that others can upload code and then use it? Or is the plan for code to be scoped by organization?

We would likely abstract this from the Flux engine itself using the dependency subsystem. I see that the API is /modules to list all modules owned by the user/org, which seems like very InfluxDB-specific terminology that we may want to steer clear of if this is something native to Flux itself.

I'll try to write out a bit more later, but we may want to talk about the various pieces and how they fit together. I'll propose something when I have time to think it over and write it out.

@nathanielc
Contributor Author

Do we have a plan to allow for a shared registry so that others can upload code and then use it? Or is the plan for code to be scoped by organization?

Long term, yes, we want what we build today to be sharable on a wider, perhaps even global, scale. That said, it's explicitly not a requirement of this initial implementation. My hope/expectation is that we can use this more limited scope implementation to learn what works well and what doesn't before we try to build a fully public registry.

@timhallinflux
Contributor

From a broader product integration perspective... and thinking through the API, CLI, and UI aspects. I would very much like to see the corresponding issues be created and linked to this feature:

CLI - provide CRUD commands. Leverage the template packager and stacks functionality to ensure that user modules can be packaged and distributed as standalone entities, and/or use the community templates repo (or other repos) to provide a Flux shared registry.

UI - needs a CRUD page to manage the Flux modules. Modules imported through a template should land here...similarly to the way that dashboards/tasks can be moved around.

I would like to see modules allow for labels. This allows a user to provide additional metadata about the module which can be used by the CLI/UI.

@wz2b

wz2b commented Mar 11, 2022

What I am asking for in #23179 is similar to this, and perhaps they can share code, but what I really want are local packages to help reduce boilerplate and avoid repeating myself (DRY). The idea of a 'registry' and community libraries that you can fetch is interesting, but some of the security considerations are a little scary to me. My thought was to store user-defined functions in packages that are stored in the metadata database, and when you do an 'import' of something that is not a standard package, it looks there to see if it's a user-defined package. So I think what I had in mind in #23179 was a whole lot simpler than this, but maybe they can leverage big parts of each other's implementations.

@nathanielc
Contributor Author

The idea of a 'registry' and community libraries that you can fetch is interesting, but some of the security considerations are a little scary to me.

Understood. We intend to build this API as part of the normal product API and then later make it a community feature, meaning that you will first get access to organization-scoped packages that only your org can see and use. We intend for the community bits to come later; they will likely use the same API, allowing for the differences that come with a public registry.

Said another way, we intend to design and build a registry, and any given registry could be private or public. Our plan is to build and host the private per-org registries first and let a public community one come after.

@wz2b

wz2b commented Mar 11, 2022

I'm a big fan of code signing, using either a public chain of trust or a signing CA that Influx Data creates. The purpose of that isn't necessarily to verify the code, but to verify the identity of the person who signed it. So maybe something like that is in the cards.

My outstanding concern about your approach is that if I'm an individual developer on, say, InfluxDB OSS, I don't want to have to mess with building packages ... I just want to stick a function / package in my local system and I want to be able to easily edit it. Performance-wise, I don't even care if it's an import or more like an #include.

@wz2b

wz2b commented Mar 24, 2022

Another idea that occurred to me is that there's already a concept of dashboard variables in the UI. The use of those seems to be restricted to just dashboard use. It seems like another possible vector for what I want would be to add a function type (in addition to Query, Map, etc.) then allow those to be persisted in some way that normal query invocation can import them, from any query or task (outside of the UI).

@nathanielc
Contributor Author

After some internal discussion here is an updated proposal with more specifics:

Flux Modules Specification

A Flux module is a collection of Flux packages.
Flux modules are versioned and can be imported by Flux code using the import statement.

Module Versions

All modules will be versioned using a semantic version number MAJOR.MINOR.PATCH.
Once a module version has been published it cannot be modified.
A new version of the module must be published.
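How immutability gets enforced is not specified here. One possible approach for a filesystem-backed registry, sketched below in Python, is to write each version with exclusive creation so that a second publish of the same version fails. The `publish` function and the directory layout are assumptions for illustration, loosely following the `@v/$version.zip` paths described later in this proposal.

```python
import os


def publish(root: str, module: str, version: str, archive: bytes) -> str:
    """Store a module version under root, refusing to overwrite an existing one."""
    directory = os.path.join(root, *module.split("/"), "@v")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{version}.zip")
    # Mode "x" is exclusive creation: open() raises FileExistsError if the
    # file is already there, so a published version is never replaced.
    with open(path, "xb") as f:
        f.write(archive)
    return path
```

This sketch does not validate `module` or `version`, and it does not handle a write that fails part-way; a real registry would need both.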

Importing Modules

Flux modules are imported using an import statement:

import "myorg/mypackage" // imports the latest version

Imports may specify the minimum version of a module that needs to be imported by including a version number like this:

import "myorg/mypackage@1.5.6" // imports _at least_ version 1.5.6

At least version 1.5.6 of the module myorg/mypackage will be imported when running the above code.
It is possible that a newer version is imported if a transitive dependency specifies that a newer version is required.

If a module has a major version of 2 or greater specify the major version of a module like this:

import "myorg/mypackage/v2" // imports the latest published 2.x version

Imports must not specify the major version for v0 and v1 as it is not necessary.
A change from v0 to v1 may include a breaking change but once v1 is published any future breaking changes will be a new major version.
This means that import "myorg/mypackage" imports the latest 0.x or 1.x version of the module.

An import may also specify the version pre, which selects the most recent pre-release version of the module.

import "myorg/mypackage@pre" // imports the latest pre-release version
import "myorg/mypackage/v2@pre" // imports the latest pre-release 2.x version

Using pre is intended to make developing modules easier as a new version need not be released in order to import and test changes to the module.

NOTE: The expectation is that we can build a workflow that makes it easy to publish a new pre-release version (i.e. a timestamp).
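To make the import forms above concrete, here is a minimal Python sketch of how an import string might be split into its parts. `ImportSpec` and `parse_import` are hypothetical names, and two behaviours are my assumptions rather than part of the proposal: the /vN suffix is only recognised at the end of the path, and a minimum version must belong to the imported major line.

```python
import re
from typing import NamedTuple, Optional


class ImportSpec(NamedTuple):
    module: str           # module path, including any /vN suffix
    major: Optional[int]  # None means the 0.x/1.x line
    version: str          # "latest", "pre", or a minimum MAJOR.MINOR.PATCH


_MAJOR_SUFFIX = re.compile(r"/v(\d+)$")
_SEMVER = re.compile(r"\d+\.\d+\.\d+")


def parse_import(path: str) -> ImportSpec:
    module, has_version, version = path.partition("@")
    if not has_version:
        version = "latest"  # no @version means the latest release
    elif version != "pre" and not _SEMVER.fullmatch(version):
        raise ValueError(f"unsupported version {version!r}")

    match = _MAJOR_SUFFIX.search(module)
    major = int(match.group(1)) if match else None
    if major is not None and major < 2:
        raise ValueError("imports must not specify /v0 or /v1")

    if version not in ("latest", "pre"):
        version_major = int(version.split(".")[0])
        # Assumption: the minimum version must match the imported major line.
        if major is None and version_major > 1:
            raise ValueError(f"version {version} needs a /v{version_major} suffix")
        if major is not None and version_major != major:
            raise ValueError(f"version {version} does not match /v{major}")
    return ImportSpec(module, major, version)
```

For example, `parse_import("myorg/mypackage/v2@pre")` yields the module path `myorg/mypackage/v2`, major version 2, and the version `pre`.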

Version Resolution

When multiple modules depend on different minimum versions of another module, the maximum of those minimum versions is used.
Major versions of a module are considered different modules, therefore multiple major versions of a module may be imported into the same Flux script.

For example given the following Flux code

// a.flux
package a

import "foo@1.1.0"

// b.flux
package b

import "foo@1.2.0"

// main.flux
package main

import "a"
import "b"

Package main depends on module foo via both modules a and b.
However, a and b specify different versions of foo.
The possible versions of foo include 1.1.0 and 1.2.0.
Flux will pick the maximum of these possible versions, so version 1.2.0 of foo is used.
This is safe because module a has specified that it needs at least version 1.1.0 of foo and that constraint is satisfied.
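The selection rule in this example can be sketched in a few lines of Python. This is only an illustration: `resolve` and `parse_version` are hypothetical names, the input is assumed to be the already-flattened list of (module path, minimum version) pairs gathered from all imports, and pre-release versions are not handled.

```python
from typing import Dict, Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requirements: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Pick, for each module path, the maximum of the requested minimum versions."""
    selected: Dict[str, str] = {}
    for module, minimum in requirements:
        current = selected.get(module)
        if current is None or parse_version(minimum) > parse_version(current):
            selected[module] = minimum
    return selected
```

With the requirements from a.flux and b.flux, `resolve([("foo", "1.1.0"), ("foo", "1.2.0")])` selects 1.2.0 for foo. Because the key is the full module path, `foo` and `foo/v2` resolve independently, which matches the rule that major versions are different modules.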

Modules API

The modules API will support the following HTTP paths. This is heavily inspired by the Go proxy API.

In the following table, $base is the anchor point of the API, $module is a module path, and $version is a version.
For example, to download the zip file for a module myorg/mypackage at version 1.5.6, for an API endpoint anchored at http://example.com/fluxmod/, use this path: http://example.com/fluxmod/myorg/mypackage/@v/1.5.6.zip.

Method  Path                               Description
GET     $base/$module/@v/list              Returns a list of known versions of the given module in plain text, one per line.
GET     $base/$module/@v/$version.zip      Returns a zip file of the contents of the module at a specific version.
GET     $base/$module/@latest              Returns the highest released version, or if no released versions exist the highest pre-release version of the given module in plain text on a single line.
GET     $base/$module/@pre                 Returns the highest pre-released version of the given module in plain text on a single line.
POST    $base/$module/@v/$version.zip      Publish a new version of the module where the POST body contains the zip file contents of the module.

This API has been designed such that it can be served from the filesystem and therefore cached on the filesystem.
Therefore a $base URL of file:// should work.

NOTE: The Go proxy API has a specification for case insensitive file systems, do we want/need the same?
From the Go docs:

To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
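For reference, the case-encoding described in the Go docs is small enough to sketch in Python. The function names are hypothetical, only ASCII letters are considered, and the input is assumed not to contain a literal exclamation mark already.

```python
def escape_path(path: str) -> str:
    """Replace each uppercase ASCII letter with "!" plus its lowercase form."""
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in path)


def unescape_path(escaped: str) -> str:
    """Reverse escape_path, rejecting input that could not have been produced by it."""
    out = []
    chars = iter(escaped)
    for c in chars:
        if c == "!":
            following = next(chars, None)
            if following is None or not ("a" <= following <= "z"):
                raise ValueError("'!' must be followed by a lowercase letter")
            out.append(following.upper())
        elif "A" <= c <= "Z":
            raise ValueError("uppercase letters never appear in an encoded path")
        else:
            out.append(c)
    return "".join(out)
```

Here `escape_path("example.com/M")` gives `example.com/!m`, while `example.com/m` is left unchanged, so the two can coexist on a case-insensitive file system.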

Examples

The following examples use a $base of /fluxmod/

GET /fluxmod/myorg/mypackage/@v/list # Return a list of versions for the module myorg/mypackage
GET /fluxmod/foo/@v/1.3.4.zip        # Return a zip file of the foo module at version 1.3.4
GET /fluxmod/foo/v2/@v/2.3.4.zip     # Return a zip file of the foo module at version 2.3.4
GET /fluxmod/foo/@latest             # Return the latest 0.x or 1.x version of foo
GET /fluxmod/foo/@pre                # Return the latest 0.x or 1.x pre-release version of foo
GET /fluxmod/foo/v2/@latest          # Return the latest 2.x release version of foo
GET /fluxmod/foo/v2/@pre             # Return the latest 2.x pre-release version of foo
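
A client would build these paths from $base, $module, and $version. The Python helpers below are a minimal sketch with hypothetical names; they only join strings and do no validation or case-encoding.

```python
def module_url(base: str, module: str, endpoint: str) -> str:
    """Join $base, $module, and an endpoint with exactly one slash between them."""
    return f"{base.rstrip('/')}/{module.strip('/')}/{endpoint}"


def list_url(base: str, module: str) -> str:
    return module_url(base, module, "@v/list")


def zip_url(base: str, module: str, version: str) -> str:
    return module_url(base, module, f"@v/{version}.zip")


def latest_url(base: str, module: str) -> str:
    return module_url(base, module, "@latest")


def pre_url(base: str, module: str) -> str:
    return module_url(base, module, "@pre")
```

For example, `zip_url("/fluxmod/", "foo/v2", "2.3.4")` produces `/fluxmod/foo/v2/@v/2.3.4.zip`, matching the third example above.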

@jsternberg
Contributor

I'm generally good with the above API. I think my only qualm would be the pre api. I think we'll need a special api for @pre because of race conditions with it being a potentially mutable endpoint.

First, we need a way to publish the pre release version. Second, we need a way to download the zip file for the pre-release version since we'll need to do that to avoid a race condition between hitting the get endpoint and the download endpoint. With that, I propose that we remove $base/$module/@pre and replace it with allowing pre.zip to be downloaded and uploaded using the same GET and POST endpoints.

I also wonder if we should make some stipulation that pre.zip will reference the same module as latest whenever the latest version is updated. We should also specify how latest is determined. Is latest the latest uploaded or the highest version? I'd say the highest version.
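
If latest means the highest version, the selection could look like the Python sketch below. It follows semantic versioning precedence (a release sorts above its own pre-releases, numeric pre-release identifiers compare numerically) and the earlier description of @latest falling back to pre-releases when nothing has been released. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def _core_and_pre(version: str) -> Tuple[str, str]:
    """Drop build metadata, then split the core version from the pre-release part."""
    without_build = version.partition("+")[0]
    core, _, pre = without_build.partition("-")
    return core, pre


def version_key(version: str):
    """Sort key implementing semantic versioning precedence."""
    core, pre = _core_and_pre(version)
    numbers = tuple(int(part) for part in core.split("."))
    if not pre:
        return (numbers, 1, ())  # a release sorts above its pre-releases
    identifiers = tuple(
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in pre.split(".")
    )
    return (numbers, 0, identifiers)


def latest(versions: List[str]) -> Optional[str]:
    """Highest released version, or the highest pre-release if none are released."""
    released = [v for v in versions if not _core_and_pre(v)[1]]
    pool = released or versions
    return max(pool, key=version_key) if pool else None


def latest_pre(versions: List[str]) -> Optional[str]:
    """Highest pre-release version, or None if there is none."""
    candidates = [v for v in versions if _core_and_pre(v)[1]]
    return max(candidates, key=version_key) if candidates else None
```

Choosing the highest version rather than the most recent upload keeps the answer independent of upload order, so a patch release on an older minor line does not become latest.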

@pre

pre commented Aug 1, 2022

I’ll have that special API, thanks

@jdstrand
Contributor

jdstrand commented Aug 30, 2022

FYI, I took a look at this with my security hat on (cc @nathanielc). Comments in-line

After some internal discussion here is an updated proposal with more specifics:

Flux Modules Specification

A Flux module is a collection of Flux packages. Flux modules are versioned and can be imported by Flux code using the import statement.

Module Versions

All modules will be versioned using a semantic version number MAJOR.MINOR.PATCH. Once a module version has been published it cannot be modified. A new version of the module must be published.

This is excellent. It makes me curious how the implementation will guarantee this, but that is for another day :)

Importing Modules

Flux modules are imported using an import statement:

import "myorg/mypackage" // imports the latest version

Imports may specify the minimum version of a module that needs to be imported by including a version number like this:

As mentioned elsewhere, this seems to be heavily inspired by Go (fine). Go supports specifying a hash (which gets turned into a pseudo-version). Do you plan to support hashes or exact matches? If not, perhaps it makes sense to update the specification to say this is unsupported.

...

Version Resolution

...
Package main depends on module foo via both modules a and b. However, a and b specify different versions of foo. The possible versions of foo include 1.1.0 and 1.2.0. Flux will pick the maximum of these possible versions, so version 1.2.0 of foo is used. This is safe because module a has specified that it needs at least version 1.1.0 of foo and that constraint is satisfied.

How will version conflicts be handled? Eg, one specifies 1.2.0 and another 2.0.0? Are both downloaded and used? Is this an error? Something else? Adding this answer to the spec would be nice.

Modules API

The modules API will support the following HTTP paths, this is heavily inspired by the Go proxy API.

In the following table, $base is the anchor point of the API, $module is a module path, and $version is a version. For example, to download the zip file for a module myorg/mypackage at version 1.5.6, for an API endpoint anchored at http://example.com/fluxmod/, use this path: http://example.com/fluxmod/myorg/mypackage/@v/1.5.6.zip.

Method  Path                               Description
GET     $base/$module/@v/list              Returns a list of known versions of the given module in plain text, one per line.
GET     $base/$module/@v/$version.zip      Returns a zip file of the contents of the module at a specific version.
GET     $base/$module/@latest              Returns the highest released version, or if no released versions exist the highest pre-release version of the given module in plain text on a single line.
GET     $base/$module/@pre                 Returns the highest pre-released version of the given module in plain text on a single line.
POST    $base/$module/@v/$version.zip      Publish a new version of the module where the POST body contains the zip file contents of the module.

How is authn and authz expected to be done, especially wrt org isolation? Eg, in the short term AIUI modules will be scoped to the org name. In the above, it sounds like the $base is http://example.com/fluxmod/myorg and so the module lives under myorg (fine, but see later comment on disambiguation). Can you add some text on how authn is expected to be done and how the authenticated request is allowed to read or write to myorg (ie, describe authz plans at a high-level)?

AIUI, later phases of this feature will allow cross-org access to modules. Putting aside discussion on how we deal with bad actors and malware, can you add your thoughts on how this might be done? Is it expanding authz? Adding new metadata to the module for 'public' vs 'private' (ie, org-scoped) where 'public' might simply be GET-able via https:// without authn? Something else?

Also, if it is envisioned that myorg corresponds to the organization within InfluxDB 2.x or Cloud 2, there has been quite a bit of talk around lifting the requirement that the organization name be unique within (at least) Cloud 2 (since the APIs and internal services have all been/are in the process of being converted to ignoring org name and always using orgid). This seems like it could be a problem for disambiguating $base (which could be solved via authn/authz if the file:// path/org db column/etc is adjusted to the orgid (plus for a public url, we wouldn't necessarily want to leak the orgid); other solutions certainly exist).

This API has been designed such that it can be served from the filesystem and therefore cached on the filesystem. Therefore a $base URL of file:// should work.

There is great utility in this, but we will have to be careful with how this is handled by remote users if they can control $base. I.e., we don't want to allow arbitrary file writes or reads from the server when $base is set to weird stuff like ../../../../some/path, etc. In writing that, we'll want good input validation regardless of file:// or https://.
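
One way to picture the kind of input validation meant here is the Python sketch below, which checks every segment of a user-supplied module path before joining it onto a server-configured root. The function name and the allowed character set are assumptions, not part of the proposal, and the root itself is assumed to come from trusted configuration rather than from the request.

```python
import re
from pathlib import PurePosixPath

# Assumed rule: a segment starts with a lowercase letter or digit and may then
# contain lowercase letters, digits, ".", "_", and "-".
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9._-]*")


def safe_module_path(root: str, module: str) -> PurePosixPath:
    """Join module onto root, rejecting anything that could escape root."""
    segments = module.split("/")
    for segment in segments:
        # Empty, ".", and ".." segments fail the pattern; ".." inside a
        # segment is rejected too, to stay on the conservative side.
        if not _SEGMENT.fullmatch(segment) or ".." in segment:
            raise ValueError(f"invalid module path segment {segment!r}")
    return PurePosixPath(root).joinpath(*segments)
```

An allow-list per segment rejects `../../../../some/path`, absolute paths, and empty segments in one place, whichever transport ends up serving the module.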

nathanielc added a commit that referenced this issue Sep 23, 2022
The implementation of this work is still in-progress see #4296
nathanielc added a commit that referenced this issue Nov 2, 2022
The implementation of this work is still in-progress see #4296
nathanielc added a commit that referenced this issue Nov 2, 2022
The implementation of this work is still in-progress see #4296
nathanielc added a commit that referenced this issue Nov 8, 2022
The implementation of this work is still in-progress see #4296
nathanielc added a commit that referenced this issue Nov 8, 2022
The implementation of this work is still in-progress see #4296
@poldown

poldown commented Apr 19, 2023

Is this feature scheduled to be implemented sometime soon? I'd be happy to get involved and help implement it.

@garylfowler

This feature is on hold right now, as our focus is getting our new storage engine integrated into all of our products.

@github-actions

This issue has had no recent activity and will be closed soon.
