Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible config #126

Merged
merged 46 commits into from
Dec 23, 2022
Merged

More flexible config #126

merged 46 commits into from
Dec 23, 2022

Conversation

mmalenic
Copy link
Member

@mmalenic mmalenic commented Dec 23, 2022

Closes #102

Apologies for the very large PR, there is a bunch of changes here.

Changes

  • Config is now much more flexible, and can configure from environment variables and a TOML config file.
    • Envy is swapped out for figment, which is a hierarchical system for config.
    • Config files are read using a command line option (--config) or from an environment variable.
    • When no command line options are passed, this results in a default configuration with the same behaviour as before this PR.
    • Environment variables override config files.
  • Add multiple bucket support.
  • Add additional CORS features, like allow origins, methods, headers, and expose headers.
  • Multiple RegexResolvers are allowed, which can be configured using an array, where the first matching resolver is the one used for the query ID mapping.
    • Each resolver can use local storage, or AWS S3 storage.
  • The resolvers also take into account the query parameters. I.e. It is possible to specify that a given resolver only resolves an ID if the format is Bam, for example.
  • The local storage data server can be enabled/disabled using a flag in the config file.
  • Re-done documentation changes for config.
  • Additional tests for new behaviour.
  • Various small refactors and re-organisation of code.
  • A --print-default-config flag can be passed as a command line arg to print the default TOML config to stdout.

Issues

There are still some issues that can be addressed in other PRs.

  • No support for different file endings yet - i.e. BAM files always must end in .bam, issue Arbitrary file endings for mapped resources #127.
  • This does increase the complexity of the configuration, however I've attempted to make it as opt-in as possible. Most of the same environment variables are available. The most significant increase in complexity is in the [[resolvers]] array of tables, however this can still be set from an environment variable, and the additional features are not required. The defaults should be the same when running htsget-rs with no configuration.

This PR and branch is also an experiment on using somewhat more conventional commits to see how we feel about it. I haven't used many of the standard types such as feat or bug, but I have added custom types that mostly describe the scope of the change.

Also happy to revert this and squash it as one if we think it's better.

@mmalenic mmalenic self-assigned this Dec 23, 2022
@mmalenic mmalenic added the enhancement New feature or request label Dec 23, 2022
Copy link
Member

@brainstorm brainstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, that was a beefy PR, sorry it took me a while to go through :-O As you mentioned, try to keep them smaller in the future if you can!

Other than the minor details I commented and running cargo clippy --all-targets --all-features --fix and cargo fmt, I'd advise merging it today instead of later in January as you have this fresh on your head? ;)

Happy xmas and NY otherwise!

| `bucket` | The AWS S3 bucket where resources can be retrieved from. | String | `''` |

Additionally, the resolver component has a feature, which allows resolving IDs based on the other fields present in a query.
This is useful as allows the resolver to match only match an ID, if a particular set of query parameters are also present. For example,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*match only an ID


The resolvers component of htsget-rs is used to map query IDs to the location of the resource. Each query that htsget-rs receives is
'resolved' to a location, which a data server can respond with. A query ID is matched with a regex, and is then mapped with a substitution string that
has access to the regex capture groups. Each resolver is an array of TOML of tables that attempts to match a query ID. This array matches IDs in order, meaning that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each resolver is an array of TOML of tables

I'd rephrase that :-?

This is useful as allows the resolver to match only match an ID, if a particular set of query parameters are also present. For example,
a resolver can be set to only resolve IDs if the format is also BAM.

To set the resolver 'allow guard', add a `[resolver.allow_guard]` table, and set the following options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an "allow guard" resolver?

allow_interval_start = 100
allow_interval_end = 1000
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example, this resolver would... only allow queries to chr1 between positions 100 and 1000 on BAMs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll clarify that a bit in the docs.


#### Configuring htsget-rs with environment variables

All the htsget-rs config options can be set by environment variables. The ticket server, data server and service info options are flattened and can be set directly using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the htsget-rs config options can be set by environment variables

, which is convenient for runtimes such as AWS Lambda

htsget-config/src/config/mod.rs Show resolved Hide resolved
@@ -493,7 +485,7 @@ mod tests {
assert!(router.get_route(&Method::DELETE, &uri).is_none());
},
&config,
formatter_from_config(&config),
formatter_from_config(&config).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try to avoid this series of .unwrap() here and below if possible

Copy link
Member Author

@mmalenic mmalenic Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one could definitely be avoided.

Although, this made me think of a different point. All our tests use unwraps to panic, however there is also support to return a Result from a test which could lead to more informative errors and make them more concise. Do you think it would be good for our tests to return a Result? (obviously this is for a different PR, and not super high priority)

Copy link
Member

@brainstorm brainstorm Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tend to prefer avoiding unwrap() even in small examples for that kind of reason. Different issue/PR if that's a big refactor, agreed.

@mmalenic
Copy link
Member Author

Thanks for the review! Happy Xmas 😃

How do you feel about the commits and the messages?
This branch is actually cherry-picked from https://github.com/umccr/htsget-rs/tree/more-flexible-config so I could reword and squash commits with a bit more ease.

From now on, I would try and just use a particular commit style so that I don't have to reword things.

@brainstorm
Copy link
Member

How do you feel about the commits and the messages?

I still haven't had time to review the integrations with conventional commits, but I think that feat(config) might help in categorising the effort we put on fix/feat/chore/docs categories other than just components? Just a thought, no need to change every single commit now, we're just testing the waters and we don't have automation in place... yet ;)

@mmalenic
Copy link
Member Author

mmalenic commented Dec 23, 2022

but I think that feat(config) might help in categorising the effort we put on fix/feat/chore/docs categories other than just components

Yep, that's fair. Also need to do a bit more review with how this works, but would each feat add one minor version bump when the changelog is generated? Or would there only be one increase in the minor version every time a changelog is generated (as long as there is at least one feat), even if there is more than one feat since the last changelog?

@mmalenic mmalenic merged commit 5f0ea19 into main Dec 23, 2022
@mmalenic mmalenic deleted the more-flexible-config-rename branch December 23, 2022 05:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider multiple buckets support
2 participants