[Feature] Dictionary Data Type #66

Open
TheMayoras opened this issue May 21, 2024 · 24 comments
Labels
enhancement New feature or request help wanted Extra attention is needed syntax

Comments

@TheMayoras

The data-types in Amber could be even better if they were extended to include a dictionary/map type. Something like [Text, Num].

@Ph0enixKM
Member

Ph0enixKM commented May 21, 2024

That is definitely something worth exploring. It would be even better to make it conform to JSON format - this way we wouldn't need to do conversions of the type... although I'm not sure if this would be the best idea performance wise....

@b1ek
Member

b1ek commented May 22, 2024


perhaps consider some js-like syntax?

let obj = {
    foo: "bar"
}
echo "foo is " + obj[foo]
echo "0 is " + obj[0]

then i guess it should compile into something kind of like this

obj_k=( foo )
obj_v=( bar )
# Look up a value by key with a linear scan over the key array.
get_obj_by_key() {
	local i
	for i in "${!obj_k[@]}"; do
		if [[ ${obj_k[$i]} == "$1" ]]; then
			echo "${obj_v[$i]}"
			return
		fi
	done
}
# Look up a value by insertion index; the value array can be indexed directly.
get_obj_by_index() {
	echo "${obj_v[$1]}"
}

echo "foo is $(get_obj_by_key foo)"
echo "0 is $(get_obj_by_index 0)"

not sure how to handle nested objects, though
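One way the nesting question could be handled within the parallel-array scheme above, sketched here as an assumption rather than anything agreed in this thread, is to flatten nested keys into dotted paths so that `fruits.apple` becomes an ordinary key. The helper name `obj_get` is hypothetical; the sketch relies only on indexed arrays, so it should also run on the Bash 3.2 that ships with macOS.

```bash
#!/usr/bin/env bash
# Hypothetical compiled form of:
#   let obj = { foo: "bar", fruits: { orange: 1, apple: 3 } }
# Nested keys are flattened into dotted paths; keys and values live in two
# parallel indexed arrays (no associative arrays needed).
obj_k=( foo fruits.orange fruits.apple )
obj_v=( bar 1 3 )

# obj_get KEY: print the value stored under KEY, or return 1 if it is absent.
obj_get() {
	local i
	for i in "${!obj_k[@]}"; do
		if [[ ${obj_k[$i]} == "$1" ]]; then
			printf '%s\n' "${obj_v[$i]}"
			return 0
		fi
	done
	return 1
}

echo "foo is $(obj_get foo)"
echo "apple is $(obj_get fruits.apple)"
```

Lookup is a linear scan over the keys, so its cost grows with the number of entries.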

@Ph0enixKM
Member


I think that JS-like syntax here is spot on. Supporting more complex data could be harder in bash. We can discuss it on the community Discord server.

@Ph0enixKM Ph0enixKM added enhancement New feature or request help wanted Extra attention is needed labels May 22, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Amber Project May 25, 2024
@Ph0enixKM Ph0enixKM changed the title [Feature Request] Dictionary Data Type ✨ [Feature Request] Dictionary Data Type May 25, 2024
@Ph0enixKM Ph0enixKM moved this from 🆕 New to 📋 Todo in Amber Project May 25, 2024
@Ph0enixKM Ph0enixKM moved this from 📋 Todo to 💬 Need clarification in Amber Project May 25, 2024
@b1ek b1ek added this to the Stable release milestone May 29, 2024
@b1ek
Member

b1ek commented May 30, 2024

i feel like we should decide on the syntax we want to adopt.

imo there are 3 ways it could go:

  • create a dynamic, JS-like structure
    • create a wrapper around jq? i'm not sure it's a good idea tbh, as it is not cross-platform by any means and is probably slower than a native bash implementation
    • use bash's associative arrays (bash 4.0+)? that would still be very portable, but wouldn't support VERY old systems (which shouldn't be supported tbh, updating bash is not at all complicated)
  • object-ify amber by adding custom classes, types, etc
    • not sure how this will even work

also another topic for discussion is how we are going to handle serialization to/from JSON and other formats. maybe create an external package like serde for Rust?

@arapower
Contributor

arapower commented May 30, 2024

Bash arrays have a drawback of slowing down as their size increases.
Using files, we could implement something like this.

// Amber
let obj = {
    foo: "bar",
    baz: "quz",
    fruits: {
      orange: 1,
      apple: 3
    }
}
echo "foo is " + obj["foo"]
echo "0 is " + obj[0]
echo "apple is " + obj["fruits"]["apple"]

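#!/bin/sh

# Print the value stored under key $2 in object file $1.
# Each line has the form key=value or key="value"; surrounding quotes are stripped.
# If a key appears more than once, the last occurrence wins.
get_obj_by_key() {
	grep "^$2=" "$1" |
	sed -n '$s/^[^=]*="*\(.*[^"]\)"*$/\1/p'
}

# Print the value on line $2 + 1 of object file $1 (0-based index).
get_obj_by_index() {
	line=$(($2 + 1))
	sed -n "$line"'s/^[^=][^=]*="*\(.*[^"]\)"*$/\1/p' "$1"
}

obj=$(mktemp)
# Remove the temporary file when the script exits.
trap 'rm -f "${obj}"' EXIT

# Nested keys are flattened: fruits.apple is stored as fruits_apple.
echo "foo=\"bar\"" >> "${obj}"
echo "baz=\"quz\"" >> "${obj}"
echo "fruits_orange=1" >> "${obj}"
echo "fruits_apple=3" >> "${obj}"

echo "foo is $(get_obj_by_key "${obj}" foo)"
echo "0 is $(get_obj_by_index "${obj}" 0)"
echo "apple is $(get_obj_by_key "${obj}" fruits_apple)"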

@Ph0enixKM
Member

Ph0enixKM commented May 30, 2024

I agree that adding jq as another dependency is hard to swallow; it would be taking the easy route. The solution using bash's associative arrays is pretty cool and more performant, but it could be more challenging to implement. I think the alternative of using temporary files is also really cool, as it preserves backwards compatibility. We will have to write some bash functions hardcoded into the header, though, and plan how to store data of the Object type in the files.


I think that I like the mktemp version a little bit more. What do you think @b1ek @arapower @boushley @TheMayoras?

@garyrob

garyrob commented May 30, 2024

About the mktemp version... couldn't that also be a non-temp file, thus giving us a persistent key-value datastore, akin to Python's shelve? (Though for small amounts of data.) And secondarily, would it be conceivable then to support TOML as the format, which would seem to allow multiple sets of key-value pairs in the same file? (Maybe that's going too far but it seems like being able to read TOML files would be good, and could dovetail with the other key-value functionality.)
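A persistent variant of the file-based store could look roughly like the sketch below. It assumes a flat `key=value` file (not TOML), keys without backslashes or newlines, and a single writer at a time; the names `kv_get` and `kv_set` are hypothetical.

```bash
#!/bin/sh
# kv_get FILE KEY: print the value stored under KEY; exit status 1 if absent.
# Each line of FILE is "key=value"; the value itself may contain "=".
# If a key appears more than once, the last occurrence wins.
kv_get() {
	awk -v k="$2" '
		index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
		END { if (found) print v; exit !found }
	' "$1" 2>/dev/null
}

# kv_set FILE KEY VALUE: store VALUE under KEY, replacing any previous entry.
# The new content is written to a temporary file next to FILE and then
# renamed over it, so an interrupted write does not leave a half-written store.
kv_set() {
	tmp="$1.tmp.$$"
	{
		[ -f "$1" ] && awk -v k="$2" 'index($0, k "=") != 1' "$1"
		printf '%s=%s\n' "$2" "$3"
	} > "$tmp" && mv "$tmp" "$1"
}

store=$(mktemp)   # a real persistent store would use a fixed path instead
kv_set "$store" greeting hello
kv_set "$store" count 1
kv_set "$store" count 2
echo "count is $(kv_get "$store" count)"
```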

@b1ek
Member

b1ek commented May 30, 2024

mktemp

are you sure you want to rely on temporary files? after all, the script's user might not have the permissions to create them, and it is awful from a security perspective - a third program can easily modify the script's memory

what if we used bash's variables instead of files? that seems pretty much doable

@b1ek b1ek mentioned this issue May 31, 2024
@arapower
Contributor

arapower commented May 31, 2024

the script's user might not have the permissions to do that

It cannot be denied.
However, this also applies to commands like sed and bc that Amber already depends on.

it is awful from a security perspective

Files created by the mktemp command have permissions 600 (readable and writable only by the owner), so I consider their security to be high.

a third program can easily modify the script's memory

This risk is about the same for general programs or shell scripts that create temporary files.

what if we used bash's variables instead of files? that seems pretty much doable

Your previous post mentioned the following:

not sure how to handle nested objects, though

If you have any ideas for a clever implementation using variables, an example would be greatly appreciated.

Handling large amounts of data without using temporary files can also be difficult.
I think implementing a solution that makes appropriate use of external commands to enhance compatibility with sh and similar shells would be easier than relying solely on Bash features.

In the future, there may be cases where temporary files (or directories) are used when implementing other features.
So, it would be beneficial to consider now the proper handling of temporary files.

Since mktemp is not included in POSIX, there may be environments where it does not exist.
In such cases, you can refer to implementations like the following:
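One hedged sketch of both points, proper cleanup and a fallback when `mktemp` is absent, assuming a POSIX `sh` target; the helper name `make_tmp_dir` and the `amber.` prefix are hypothetical:

```bash
#!/bin/sh
# make_tmp_dir: print the path of a new directory only the current user can access.
make_tmp_dir() {
	if command -v mktemp >/dev/null 2>&1; then
		# mktemp -d creates the directory with mode 700.
		mktemp -d "${TMPDIR:-/tmp}/amber.XXXXXX"
		return
	fi
	# Fallback without mktemp: mkdir fails if the path already exists, so
	# another user cannot pre-create it, and umask 077 makes it private.
	# The subshell keeps the umask change from leaking into the caller.
	(
		umask 077
		dir="${TMPDIR:-/tmp}/amber.$$"
		mkdir "$dir" && printf '%s\n' "$dir"
	)
}

tmp_dir=$(make_tmp_dir) || exit 1
# Delete the directory on exit; turn common signals into an exit so the
# EXIT trap also runs when the script is interrupted.
trap 'rm -rf "$tmp_dir"' EXIT
trap 'exit 1' HUP INT TERM

obj="$tmp_dir/obj"
printf 'foo="bar"\n' > "$obj"
```

Keeping every temporary file inside one private directory means a single `trap` is enough to clean up all of them.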

@b1ek
Member

b1ek commented Jun 2, 2024

i've spent this weekend implementing different approaches to objects in bash, trying to get as close as possible to objects in actual dynamically typed languages.

i don't think we could do much by implementing this ourselves; we are pretty much limited by bash. maintaining our own file specification is overkill, not to mention how we are going to handle escaped strings and nested objects, and how this is going to affect code readability and emitted program size.

someone has mentioned linked (associative) arrays in bash. they do not exist in the bash 3.2 that ships with macOS, and they cannot be nested or passed to a function.

like, the best we could do is to depend on jq and store it in string variables or temp files. anything else is either awfully unportable and very limited, or will take an incomprehensible amount of effort to implement.

@arapower
Contributor

arapower commented Jun 2, 2024

I think it would be good to implement it with jq command. Shell variables would be fine for data retention.
If we implement the various Amber functions that manipulate data in JSON format, the functionality we need will naturally become clear.
Then we may decide to implement new functions or possibly reduce the jq dependency.

@b1ek
Member

b1ek commented Jun 8, 2024

I think it would be good to implement it with jq command.

just to make sure, we are going with this?

also we might consider this: https://github.com/kristopolous/TickTick

@Ph0enixKM @boushley @brumik what do you think

@Ph0enixKM
Member

Ph0enixKM commented Jun 8, 2024

We have a couple of routes at this point:

jq route

This is the easiest one. We'd just use jq and call it a day. This requires the user to also install jq in order to do operations on collections and dictionaries.
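Under the jq route, a compiled object could simply be a JSON string held in a shell variable, as suggested earlier in the thread. The sketch below assumes jq is installed; the helpers `obj_get` and `obj_set` are hypothetical, and only the standard jq options `-r`, `-c` and `--arg` are used.

```bash
#!/bin/sh
# Hypothetical compiled form of:
#   let obj = { foo: "bar", fruits: { orange: 1, apple: 3 } }
# With the jq route the object is just a JSON string in a shell variable.
obj='{"foo":"bar","fruits":{"orange":1,"apple":3}}'

# obj_get JSON PATH: print the value at a jq path such as .fruits.apple.
# -r prints strings without their JSON quotes.
obj_get() {
	printf '%s\n' "$1" | jq -r "$2"
}

# obj_set JSON KEY VALUE: print JSON with top-level KEY set to the string VALUE.
# --arg passes KEY and VALUE as jq variables, so shell data never ends up
# spliced into the jq program text.
obj_set() {
	printf '%s\n' "$1" | jq -c --arg k "$2" --arg v "$3" '.[$k] = $v'
}

echo "foo is $(obj_get "$obj" .foo)"
echo "apple is $(obj_get "$obj" .fruits.apple)"
obj=$(obj_set "$obj" foo baz)
echo "$obj"
```

Every access spawns a jq process, which is the performance cost this route trades for simplicity.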

Using native Bash 2.0 structures

We'd have to use hacky ways to get around some limitations. Multidimensional arrays could be emulated with an array that holds the names of other array variables:

# Amber: let arr = [[1, 2, 3], [4, 5, 6]]
arr0=(1 2 3)
arr1=(4 5 6)

arr=(arr0 arr1)

# Amber: echo arr[0][1]
eval echo \${${arr[0]}[1]}

I don't know whether this example breaks; it probably does in some cases. We'd have to test this thoroughly.
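One way to reduce the risk of `eval`, sketched here as an alternative rather than a decided approach, is Bash's indirect expansion: if `ref` holds the text `arr0[1]`, then `${!ref}` expands to `${arr0[1]}` without re-parsing the element's value as code. The helper name `arr_get2` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical compiled form of: let arr = [[1, 2, 3], [4, 5, 6]]
# Each inner array is its own variable; the outer array stores their names.
arr0=(1 2 3)
arr1=(4 5 6)
arr=(arr0 arr1)

# arr_get2 I J: print arr[I][J] using indirect expansion instead of eval.
arr_get2() {
	local ref="${arr[$1]}[$2]"
	printf '%s\n' "${!ref}"
}

arr_get2 0 1   # Amber: echo arr[0][1]
arr_get2 1 2   # Amber: echo arr[1][2]
```

Values containing spaces or `$(...)` come back verbatim here, whereas `eval echo ...` would word-split them or execute them.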

Bumping requirement for Bash to 4.0

This is a bummer, since this way we drop support for macOS (unless we make it work with zsh) and some distros.

Zsh uses a pretty similar syntax, although I wouldn't go for it, since this would introduce ambiguity for some packages that people create with Amber.
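If the requirement were bumped to Bash 4.0, dictionaries could map onto `declare -A` directly. A minimal sketch, assuming the generated script checks the version up front so that older shells (such as the Bash 3.2 shipped with macOS) fail with a clear message instead of a syntax error:

```bash
#!/usr/bin/env bash
# Associative arrays (declare -A) need Bash >= 4.0; stop early on older shells.
if (( BASH_VERSINFO[0] < 4 )); then
	echo "error: this script needs Bash >= 4.0 (found $BASH_VERSION)" >&2
	exit 1
fi

# Hypothetical compiled form of: let obj = { foo: "bar", baz: "quz" }
declare -A obj=([foo]=bar [baz]=quz)

echo "foo is ${obj[foo]}"

# ${obj[key]+set} expands to "set" only if the key exists.
if [[ -n ${obj[missing]+set} ]]; then
	echo "has key 'missing'"
else
	echo "no key 'missing'"
fi

# Iteration order of associative array keys is unspecified.
for key in "${!obj[@]}"; do
	printf '%s=%s\n' "$key" "${obj[$key]}"
done
```

This still leaves nesting unsolved: the values of an associative array are plain strings, so nested objects would need flattened keys or name indirection.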

Building our own implementation of storing object in some way

@arapower had an idea to use temporary files to store data. How about using just variables and keeping the data relatively easy to parse? This way we can keep things fast and also maintain backwards compatibility. We could store not only objects but also lists (and perhaps some other data types as well).
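A variables-only encoding could, for example, keep one `key=value` pair per line inside a single string, which any POSIX shell can parse without files, jq, or Bash 4. The format and the helpers `obj_get` and `obj_set` below are hypothetical; keys may not contain `=` and values may not contain newlines.

```bash
#!/bin/sh
# Hypothetical encoding: one "key=value" pair per line in a single variable.
# Nested objects are flattened into dotted keys such as fruits.apple.
obj='foo=bar
fruits.orange=1
fruits.apple=3'

# obj_get OBJ KEY: print the value of KEY; exit status 1 if KEY is absent.
# The explicit ( ) subshell keeps "exit" from leaving the calling script.
obj_get() {
	printf '%s\n' "$1" | (
		while IFS= read -r line; do
			case $line in
				"$2="*) printf '%s\n' "${line#*=}"; exit 0 ;;
			esac
		done
		exit 1
	)
}

# obj_set OBJ KEY VALUE: print OBJ with KEY set to VALUE, dropping the old entry.
obj_set() {
	printf '%s\n' "$1" | (
		while IFS= read -r line; do
			case $line in
				"$2="* | '') ;;
				*) printf '%s\n' "$line" ;;
			esac
		done
	)
	printf '%s=%s\n' "$2" "$3"
}

echo "apple is $(obj_get "$obj" fruits.apple)"
obj=$(obj_set "$obj" foo baz)
echo "foo is $(obj_get "$obj" foo)"
```

Lookups are linear scans, like the file-based version, but nothing touches the filesystem, which sidesteps the permission concerns raised earlier.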

@brumik
Contributor

brumik commented Jun 9, 2024

@b1ek Sorry to come back late. I had to do some small research on my own.

Here are my two cents: I think Amber has to decide what it wants to be. As far as it was going until now it was a wrapper around bash for people who already do scripting in bash, for systems that support bash.

With this in mind I do not think it is absolutely necessary to have nested arrays, objects and things that are impossible (by default at least) in bash.

I can also see issues with creating a temp file in performance too (and completely agree with @b1ek about security permission and immutable system problems).

If I were to suggest something, it would be not to support a dictionary type at all, only arrays (which can be done with simple constants). This may be an issue for some, but overall it is healthier for the project and for expectations. I really think it is important for users and developers to define the scope of the language. For cross-platform programming (not scripting) there are other languages (like C and Rust, for example).

@Ph0enixKM
Member

Thank you @brumik for your insight. 👏 As we've been discussing this issue, another idea arose: #161. We could build a runtime that gets fetched (if it doesn't already exist) and extends Amber with more functionality. I think that would let Amber remain a shell language while still letting users opt into an extension when their needs require it. Perhaps they have already built something with Amber and need just that one little thing that is not really supported by Bash but is pretty common in other programming languages.

But honestly I think that ultimately the best way would be to just utilize the Bash's features as well as possible and perhaps provide some other functionalities as a form of a library.

@arapower
Contributor

arapower commented Jun 9, 2024

@Ph0enixKM
Wouldn't it be better to go beyond data types and discuss the direction of language design?

@Ph0enixKM
Member


Yes. But not in the scope of this issue. Let’s create a discussion for that.

@Ph0enixKM Ph0enixKM changed the title ✨ [Feature Request] Dictionary Data Type ✨ Dictionary Data Type Jun 16, 2024
@garyrob

garyrob commented Jun 22, 2024

Just a minor comment, possibly moot, about the jq possibility: for some people, it's nontrivial to install on macOS: https://stackoverflow.com/questions/71406984/how-to-instal-jq-without-homebrew

@Mte90
Member

Mte90 commented Jun 24, 2024

For jq as a dependency, we have a bash project by @b1ek that we have to integrate, so that is a completely different issue.

@b1ek
Member

b1ek commented Jun 25, 2024


the problem remains though: it is not available on all systems

@Mte90
Member

Mte90 commented Jun 25, 2024

I think that for any tool used in the generated Bash, whether jq or curl, this dependency checker would check on every run whether the required commands exist and, if not, report an error.

After all, this is the same feature that other scripting languages have; it's just that in the Bash case, fewer tools/commands may be available out of the box.
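A per-run dependency check of this kind could be as small as the sketch below, emitted at the top of a generated script. The function name `require_commands` is hypothetical; the demo lists `sed` and `grep` so that it runs anywhere, while a script using objects under the jq route would list `jq` instead.

```bash
#!/bin/sh
# require_commands CMD...: exit with an error listing every missing command.
require_commands() {
	missing=
	for cmd in "$@"; do
		command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
	done
	if [ -n "$missing" ]; then
		echo "error: this script requires the following missing command(s):$missing" >&2
		exit 1
	fi
}

require_commands sed grep
echo "all dependencies found"
```

Collecting all missing commands before exiting lets the user install everything in one go instead of discovering them one failure at a time.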

@Ph0enixKM
Member


I don't understand what you are trying to say here. I think that Amber should not depend on the jq as it adds more dependencies. We could later on implement some standard library function to parse the JSON format. But that's just an idea.

@Mte90
Member

Mte90 commented Jul 1, 2024

Parsing JSON in pure bash is something I don't like at all; I think we should use tools if they are there, and otherwise report an error saying the script can't run there.
There are various jq alternatives anyway, and as we did in my PR with the new download commands, we can write a wrapper around them.

@Mte90
Member

Mte90 commented Jul 4, 2024

https://github.com/h4l/json.bash

It is a project to manipulate JSON in pure bash, so we should start thinking about creating a system to embed pure bash libraries.
Maybe in the future those libraries will be migrated to Amber.

@Mte90 Mte90 added the syntax label Jul 19, 2024
@Ph0enixKM Ph0enixKM changed the title ✨ Dictionary Data Type [Feature] Dictionary Data Type Sep 16, 2024

7 participants