Feature Request: Enabling configuration of Dataverse using simple file(s) #10684

Open
poikilotherm opened this issue Jul 14, 2024 · 1 comment
Labels
Component: Code Infrastructure formerly "Feature: Code Infrastructure" Component: Containers Anything related to cloudy Dataverse, shipped in containers. Feature: Installation Guide Feature: Installer Size: 10 A percentage of a sprint. 7 hours. Type: Feature a feature request User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh

Comments

@poikilotherm
Contributor

poikilotherm commented Jul 14, 2024

Overview of the Feature Request

Let's enable using a TOML file to configure JVM options for starters. Ideally, this would extend to configuring "DB" options and REST API DTO-based options too, providing a unified approach to configuring a Dataverse instance. Basically, enable something like /etc/dataverse/config.toml, in the spirit of many UNIX/Linux services.

What kind of user is the feature intended for?

Sysadmin

What inspired the request?

Many not-so-experienced Dataverse admins have a hard time setting up their JVM options right.
With our ever-growing list of options (recently the PID providers were added), it's easy to end up with a mess of options.
It's friggin' complicated!

DB options these days cannot be provisioned from the same place as JVM options. Having two very different ways of configuring things, plus API endpoints for model-based config approaches (auth, licenses, ...), does not make it any easier for newbies and non-superhero admins to follow.

Even though our JVM options are starting to be scoped and hierarchical, in practice they still have to be written in a flat structure: system properties in domain.xml, long env var names, etc. There is one exception: the Dir Config Source allows creating folders with files. But it seems to be hardly used in classic installations and is clearly geared towards container usage.

Enabling configuration file(s) makes it much easier to provision a Dataverse instance from a configuration management system. There is loads of tooling around to manage TOML files in idempotent ways, while editing domain.xml is a lot harder. Humans tend to like TOML more than YAML and even more than XML. (Let alone the fact that domain.xml is a VERY large and complex file.) Serving one or more of these files from a K8s ConfigMap, maybe even generated by a K8s Operator, is simple and gives people more "control" over their deployments.

What existing behavior do you want changed?

I want to be able to configure JVM options using a TOML file. Here's an example.

Instead of configuring all these options:

DATAVERSE_PID_PROVIDERS: "zb-test"
DATAVERSE_PID_DEFAULT_PROVIDER: "zb-test"
DATAVERSE_PID_ZB_TEST_TYPE: "datacite"
DATAVERSE_PID_ZB_TEST_LABEL: "DataCite Test Fabrica"
DATAVERSE_PID_ZB_TEST_AUTHORITY: "10.0346"
DATAVERSE_PID_ZB_TEST_SHOULDER: "JUELICH-DATA-BETA/"
DATAVERSE_PID_ZB_TEST_IDENTIFIER_GENERATION_STYLE: "randomString"
DATAVERSE_PID_ZB_TEST_DATACITE_REST_API_URL: "https://api.test.datacite.org/"
DATAVERSE_PID_ZB_TEST_DATACITE_MDS_API_URL: "https://mds.test.datacite.org/"
DATAVERSE_PID_ZB_TEST_DATACITE_USERNAME: "FOO.BAR"
DATAVERSE_PID_ZB_TEST_DATACITE_PASSWORD: "whatever"

Let's put this into a TOML file:

[dataverse.pid]
providers        = "zb-test"
default-provider = "zb-test"

[dataverse.pid.zb-test]
type                        = "datacite"
label                       = "DataCite Test Fabrica"
authority                   = "10.0346"
shoulder                    = "JUELICH-DATA-BETA/"
identifier-generation-style = "randomString"

[dataverse.pid.zb-test.datacite]
rest-api-url = "https://api.test.datacite.org/"
mds-api-url  = "https://mds.test.datacite.org/"
username     = "FOO.BAR"
password     = "whatever"

Isn't this a lot easier to read and maintain? (It's a lot more DRY-compliant and less chatty...)

A different way to write this, which some might prefer, is:

[dataverse.pid]
providers        = "zb-test"
default-provider = "zb-test"

[dataverse.pid.zb-test]
type                        = "datacite"
label                       = "DataCite Test Fabrica"
authority                   = "10.0346"
shoulder                    = "JUELICH-DATA-BETA/"
identifier-generation-style = "randomString"
datacite.rest-api-url       = "https://api.test.datacite.org/"
datacite.mds-api-url        = "https://mds.test.datacite.org/"
datacite.username           = "FOO.BAR"
datacite.password           = "whatever"

Any brand new behavior do you want to add to Dataverse?

It's not really brand new yet when talking about JVM options. It would be brand new when talking about DB options and stuff like auth providers, licenses etc (which are configured by REST API calls with a DTO).

Any open or closed issues related to this feature request?

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Jul 14, 2024
@cmbz
cmbz commented Jul 18, 2024

2024/07/18 - 6.4 proposal request from @poikilotherm
