key-value module for std-rfc #965

Merged: 5 commits into nushell:main on Jan 29, 2025


NotTheDr01ds
Contributor

@NotTheDr01ds NotTheDr01ds commented Oct 2, 2024

Overview

std-rfc/kv is a straightforward but flexible interface for setting and getting key-value pairs.

Use-cases

  • Ergonomically assign the result of a pipeline to a "variable". Just "Up Arrow" and append | kv set foo.
  • Use mid-pipeline to assign a "variable" and yet still continue the pipeline
  • Use mid-pipeline to inspect the state (like the inspect command) and examine the results via kv list or kv get after the command completes.
  • Chaining assignments/setters
  • Set universal variables once and access them even after the shell exits (or in other simultaneously running shells).

Features

  • Values can be any Nushell type other than a closure. Values are converted to and from nuons that are stored in a SQLite database.
  • The module's commands can operate on either an in-memory database (using stor) or on-disk (into sqlite).
  • Includes a hook that enables "universal variables" similar to those of the Fish shell. Universal variables are environment variables that are immediately updated and available in all Nushell sessions that are running the hook. Since they are stored in an on-disk SQLite database, they also persist when the shell exits.
  • Because kv pairs are stored as rows in a database, they can be removed, unlike normal variables.
  • kv pairs are easily converted to a record using | transpose -dr. The resulting record is, of course, easily converted to environment variables using load-env.
  • Assignment can come from either pipeline input or a positional parameter. When both are provided, the positional parameter is preferred so that $in can be used.
  • A closure can be used to modify the pipeline input before storing.
  • Can optionally return either the pipeline input (default), the value that was set, or the entire store back to continue the pipeline.
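
The nuon round trip mentioned in the first feature can be sketched on its own, without the module. This is only an illustration of the serialization step: the sample record is hypothetical, and the actual write to the SQLite database is not shown.

```nushell
# Hypothetical sample value; any non-closure value should behave the same way
let value = { name: "foo", count: 42, tags: [a b c] }

# Serialize to a nuon string, the form that would be stored in the database
let stored = ($value | to nuon)

# Deserialize when reading the value back
let restored = ($stored | from nuon)

# The restored value compares equal to the original
$restored == $value
# => true
```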

Examples

Simple get/set (positional)

use std-rfc/kv *
kv set foo 42
kv get foo
# => 42

kv list
# => ╭───┬─────┬───────╮
# => │ # │ key │ value │
# => ├───┼─────┼───────┤
# => │ 0 │ foo │    42 │
# => ╰───┴─────┴───────╯

kv drop foo
# => 42   # Returns the value that was dropped

kv list
# => ╭────────────╮
# => │ empty list │
# => ╰────────────╯

Multiple assignments, conversion to record, and conversion to environment variables

use std-rfc/kv *
42 | kv set a | kv set b | kv set c
# => 42

kv list
# => ╭───┬─────┬───────╮
# => │ # │ key │ value │
# => ├───┼─────┼───────┤
# => │ 0 │ a   │    42 │
# => │ 1 │ b   │    42 │
# => │ 2 │ c   │    42 │
# => ╰───┴─────┴───────╯

kv list | transpose -dr
# => ╭───┬────╮
# => │ a │ 42 │
# => │ b │ 42 │
# => │ c │ 42 │
# => ╰───┴────╯

kv list | transpose -dr | load-env
$env.a
# => 42
$env.b
# => 42

Using list of timestamps, pipeline examples

use std-rfc/kv *
kv set timestamps [
  2024-01-25T00:00:00
  2024-04-17T04:24:32
  2023-09-12T19:01:33
]
kv list
# => ╭───┬────────────┬──────────────────────╮
# => │ # │    key     │        value         │
# => ├───┼────────────┼──────────────────────┤
# => │ 0 │ timestamps │ ╭───┬──────────────╮ │
# => │   │            │ │ 0 │ a year ago   │ │
# => │   │            │ │ 1 │ 9 months ago │ │
# => │   │            │ │ 2 │ a year ago   │ │
# => │   │            │ ╰───┴──────────────╯ │
# => ╰───┴────────────┴──────────────────────╯

kv get timestamps
# => ╭───┬──────────────╮
# => │ 0 │ a year ago   │
# => │ 1 │ 9 months ago │
# => │ 2 │ a year ago   │
# => ╰───┴──────────────╯

# update each timestamp to 4 weeks in the future
kv get timestamps | each { $in + 4wk } | kv set timestamps
# => ╭───┬───────────────╮
# => │ 0 │ 11 months ago │
# => │ 1 │ 8 months ago  │
# => │ 2 │ a year ago    │
# => ╰───┴───────────────╯

# Create a new kv-pair named updated_timestamps
# But return the entire kv store to the pipeline
kv get timestamps
| each { $in + 4wk }
| kv set --return all updated_timestamps
# => ╭───┬────────────────────┬───────────────────────╮
# => │ # │        key         │         value         │
# => ├───┼────────────────────┼───────────────────────┤
# => │ 0 │ timestamps         │ ╭───┬──────────────╮  │
# => │   │                    │ │ 0 │ a year ago   │  │
# => │   │                    │ │ 1 │ 9 months ago │  │
# => │   │                    │ │ 2 │ a year ago   │  │
# => │   │                    │ ╰───┴──────────────╯  │
# => │ 1 │ updated_timestamps │ ╭───┬───────────────╮ │
# => │   │                    │ │ 0 │ 11 months ago │ │
# => │   │                    │ │ 1 │ 8 months ago  │ │
# => │   │                    │ │ 2 │ a year ago    │ │
# => │   │                    │ ╰───┴───────────────╯ │
# => ╰───┴────────────────────┴───────────────────────╯

# Same as above, but then update the original timestamps to use Unix epoch
kv get timestamps
| each { $in + 4wk }
| kv set --return all updated_timestamps
| transpose -dr | get timestamps
| each {format date '%s'}
| kv set timestamps
# => ╭───┬────────────╮
# => │ 0 │ 1706140800 │
# => │ 1 │ 1713327872 │
# => │ 2 │ 1694545293 │
# => ╰───┴────────────╯

kv list
# => ╭───┬────────────────────┬───────────────────────╮
# => │ # │        key         │         value         │
# => ├───┼────────────────────┼───────────────────────┤
# => │ 0 │ updated_timestamps │ ╭───┬───────────────╮ │
# => │   │                    │ │ 0 │ 11 months ago │ │
# => │   │                    │ │ 1 │ 8 months ago  │ │
# => │   │                    │ │ 2 │ a year ago    │ │
# => │   │                    │ ╰───┴───────────────╯ │
# => │ 1 │ timestamps         │ ╭───┬────────────╮    │
# => │   │                    │ │ 0 │ 1706140800 │    │
# => │   │                    │ │ 1 │ 1713327872 │    │
# => │   │                    │ │ 2 │ 1694545293 │    │
# => │   │                    │ ╰───┴────────────╯    │
# => ╰───┴────────────────────┴───────────────────────╯

Note that, in the example above, the following would be roughly equivalent:

use std-rfc/kv *
kv set timestamps1 [
  2024-01-25T00:00:00
  2024-04-17T04:24:32
  2023-09-12T19:01:33
]

[
  2024-01-25T00:00:00
  2024-04-17T04:24:32
  2023-09-12T19:01:33
] | kv set timestamps2
# => ╭───┬──────────────╮
# => │ 0 │ a year ago   │
# => │ 1 │ 9 months ago │
# => │ 2 │ a year ago   │
# => ╰───┴──────────────╯

(kv get timestamps1) == (kv get timestamps2)
# => true

The difference is that the second form (pipeline input) results in pipeline output by default. Since the first form does not have any pipeline input, the value returned to the pipeline output is null.

Using a closure to manipulate the value

ls | kv set -u foo {|| select name }
# The pipeline *input* is returned by default
# => ╭───┬────────┬──────┬────────┬─────────────╮
# => │ # │  name  │ type │  size  │  modified   │
# => ├───┼────────┼──────┼────────┼─────────────┤
# => │ 0 │ mod.nu │ file │ 5.5 kB │ 2 hours ago │
# => ╰───┴────────┴──────┴────────┴─────────────╯

# But the value that was *stored* is the result of the closure
kv get -u foo
# => ╭───┬────────╮
# => │ # │  name  │
# => ├───┼────────┤
# => │ 0 │ mod.nu │
# => ╰───┴────────╯

Note that in the example above, the universal store is used (-u). The behavior of the commands is the same, with the only difference being that the universal kv pairs are stored on-disk and persist across Nushell sessions. Remember to "clean up" test values from the universal store:

kv drop -u foo

Universal variables hook

The module includes a hook that can be added to pre_execution to create and update environment variables from the universal store. If this hook is set during your startup, all Nushell sessions will share the same universal variables.

# Add this line to config.nu
$env.config.hooks.pre_execution ++= [(kv universal-variable-hook)]

Start two different Nushell sessions. In the first:

[
  ('~/.ssh/id_ecdsa' | path expand)
  ('~/.ssh/id_for_raspberrypi' | path expand)
] | kv set -u SSH_KEYS_TO_AUTOLOAD
# => ╭───┬────────────────────────────────────╮
# => │ 0 │ /home/user/.ssh/id_ecdsa           │
# => │ 1 │ /home/user/.ssh/id_for_raspberrypi │
# => ╰───┴────────────────────────────────────╯

In the other shell, an environment variable of the same name is created and updated. There is no need to restart the shell.

$env.SSH_KEYS_TO_AUTOLOAD
# => ╭───┬────────────────────────────────────╮
# => │ 0 │ /home/user/.ssh/id_ecdsa           │
# => │ 1 │ /home/user/.ssh/id_for_raspberrypi │
# => ╰───┴────────────────────────────────────╯

@NotTheDr01ds
Contributor Author

NotTheDr01ds commented Oct 2, 2024

Before anyone asks, I wrote this as a set of space-separated commands, rather than a <module> subcommand, because get is a built-in. Attempting to define:

export def get ...

... was clobbering/shadowing the internal get. It's possible to work around this in one of two ways:

  1. Avoid get, which is possible, since it is mostly syntactic sugar for | $in.<cell-path>
  2. Make the name kv get to avoid shadowing.

Option 1 resulted in less readable code and was also more dangerous: if someone imported the module with use std-rfc/kv * instead of use std-rfc/kv, the module's get would shadow the built-in and clobber almost any other module in memory.

So I went with Option 2 ;-)
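
For readers unfamiliar with the naming trick, here is a minimal sketch of Option 2. It is not the code from this PR: the module name demo-kv and the DEMO_KV environment record are hypothetical stand-ins for the real SQLite-backed store, used only to show that a command named "kv get" leaves the built-in get untouched even after a glob import.

```nushell
# Hypothetical module; a plain env record stands in for the real store
module demo-kv {
    # The multi-word name avoids defining a bare `get` command
    export def --env "kv set" [key: string, value: any] {
        $env.DEMO_KV = ($env.DEMO_KV? | default {} | upsert $key $value)
    }

    export def "kv get" [key: string] {
        # Inside the module, `get` still refers to the built-in
        $env.DEMO_KV? | default {} | get $key
    }
}

use demo-kv *
kv set foo 42
kv get foo
# => 42

# The built-in `get` is not shadowed by the glob import
{ a: 1 } | get a
# => 1
```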

@fdncred
Collaborator

fdncred commented Oct 2, 2024

I have no real problem with this but I wonder what our process should be for adding something to the stdlib. Seems like it should be more formal with voting or something. Not sure.

@NotTheDr01ds
Contributor Author

Agreed - I was thinking the same thing while I was working on these. It's something we have to figure out if we hope to open up std once again.

Possibilities:

  • Telemetry ... Kidding. Mostly. Could be opt-in with an environment variable where it just pings (http get) some URL to update a counter. Longer-term, pie-in-the-sky.

  • Probably something more realistic like an "issue" thread here for commenting and upvotes. One thread per command (or potentially module). Reaches some level of upvotes (currently pretty low) and it gets moved forward?

  • Special-case core-team votes. I seem to recall you saying that right now it takes two core team members to move a PR forward (or something like that; not sure if there's anything official).

@NotTheDr01ds NotTheDr01ds marked this pull request as draft October 3, 2024 15:23
@NotTheDr01ds
Contributor Author

I'll add tests for this one as well before bringing out of draft.

Any thoughts on whether I should use msgpackz in place of nuon for the value pickling?

@fdncred
Collaborator

fdncred commented Oct 3, 2024

You'd have to do some tests to see if it makes any difference because with msgpackz you have to pay for creating msgpack and brotli for compression/decompression.
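
A rough way to run such a test might look like the following. This is only a sketch: the sample table is hypothetical and should be replaced with data representative of real kv values, and a single timeit run is noisy, so the numbers are indicative at best.

```nushell
# Hypothetical sample value; substitute something closer to real usage
let value = (1..1000 | each {|i| { id: $i, name: $"item-($i)" } })

# Time a full round trip for each format
let nuon_time = (timeit { $value | to nuon | from nuon })
let msgpackz_time = (timeit { $value | to msgpackz | from msgpackz })

# Compare the stored sizes as well (nuon is a string, msgpackz is binary)
let nuon_size = ($value | to nuon | str length)
let msgpackz_size = ($value | to msgpackz | bytes length)

{
    nuon: { time: $nuon_time, size: $nuon_size }
    msgpackz: { time: $msgpackz_time, size: $msgpackz_size }
}
```

Whether any difference matters would also depend on how large typical values are and how often they are read versus written.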

@NotTheDr01ds NotTheDr01ds marked this pull request as ready for review January 29, 2025 17:56
@NotTheDr01ds NotTheDr01ds merged commit dbcecf2 into nushell:main Jan 29, 2025
1 check passed
@NotTheDr01ds NotTheDr01ds deleted the std-cand-kv-stor branch February 9, 2025 16:30