General metadata system #2

Open
adamchalmers opened this issue Jul 12, 2023 · 18 comments

Comments

@adamchalmers
Collaborator

adamchalmers commented Jul 12, 2023

All KCL objects (lines, paths, points, corners, solids, etc.) should support metadata. This way users can:

  • Add notes for other humans (annotations)
  • Add structured data for APIs to read (e.g. physical properties like mass, which feed into KittyCAD's API for checking the total weight of your assemblies)
  • Add custom properties which get carried through to the GLTF export (for manufacturing devices)

These metadata need to support both statically-typed metadata (which we typecheck), and flexible data (for external devices or clients, whose schema we won't understand).

Right now the fantasy docs describe a Material type, which all 3D objects take as a parameter when they're constructed. In my opinion this could be a special case of the more general and powerful metadata system I described above.

@adamchalmers
Collaborator Author

adamchalmers commented Jul 13, 2023

I see two possible approaches. One copies Attributes from C# and Rust. I'm going to describe the Attributes approach now, and think about the other over lunch.

Attributes

In both C# and Rust you can put attributes on most language items. E.g. in Rust you can put #[doc = "..."] on a type or module, and you can put #[serde(default)] on a field.

Both Rust and C# have a set of built-in attributes (e.g. #[doc] and #[cfg]) but also let you create your own custom attributes, which can be consumed by libraries, e.g. the common serde library uses attributes like #[serde(default)] to customize how fields get serialized into JSON.

Attributes are structured: each attribute defines what schema it follows. E.g. the #[doc = "..."] attribute only accepts a string, and #[serde(...)] accepts a list of specific Serde keywords, like #[serde(default, skip_serializing_if = "Vec::is_empty", serialize_with = "my_custom_serialize_function")]. This is great, because we can use structured attributes for:

  • Materials, e.g. #[material = Aluminium.ISO5052]
  • Visual annotations, e.g.
    • #[note(audience = "all", text = "this is the main gear shaft", color = rgb(255, 20, 20))]
    • #[note(audience = "manufacturing", text = "Make sure to use Donnely nut spacing and cracked system rim-riding grip configuration")]
  • Surface finishes, e.g. #[surface_finish = Paint.Pantone235]
  • Opaque blobs for 3rd party clients like printing services, which get serialized into the GLTF format. E.g. #[kv=("print_with", 0x23C3598A)]
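KCL attributes don't exist yet, so as a rough analogy only: in Python, a decorator can play the role of a structured custom attribute. The sketch assumes a hypothetical note attribute whose schema allows only audience, text and color; anything that doesn't match the schema is rejected up front, which is the "structured" property described above. NOTE_SCHEMA, note and gear_shaft are illustrative names, not a proposed API.

```python
from typing import Any, Callable

# Hypothetical schema for a `note` attribute: allowed field names and their types.
NOTE_SCHEMA: dict[str, type] = {"audience": str, "text": str, "color": tuple}


def note(**fields: Any) -> Callable[[Callable], Callable]:
    """Validate `fields` against NOTE_SCHEMA, then attach them to the decorated function."""
    for key, value in fields.items():
        if key not in NOTE_SCHEMA:
            raise TypeError(f"note: unknown field {key!r}")
        if not isinstance(value, NOTE_SCHEMA[key]):
            raise TypeError(f"note: field {key!r} must be a {NOTE_SCHEMA[key].__name__}")

    def decorate(func: Callable) -> Callable:
        # Record the metadata next to the function; its behaviour is unchanged.
        func.__dict__.setdefault("metadata", []).append(("note", fields))
        return func

    return decorate


@note(audience="all", text="this is the main gear shaft", color=(255, 20, 20))
def gear_shaft():
    return "shaft geometry"


print(gear_shaft.metadata)
```

A third-party library (like the shapeways example) would simply ship its own decorator with its own schema.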

Eventually other clients could create their own libraries with structured attributes. For example, users might create a library with attributes for the shapeways.com 3D printing service, like #[shapeways(plastic = Shapeways.Plastic.Blend35, color = rgb(20, 200, 33))].

Ideally we'd allow users to put attributes on basically anything. Because KCL is such a simple language, it doesn't have many language constructs -- only functions and expressions. In Rust you cannot currently put an attribute on an expression (see their open RFC for this) -- but you can put them almost anywhere. If you want to put an attribute on an expression, you can just move it into its own function, though, so it's not hard to work around. Hopefully because KCL is simpler, the KCL compiler would support attributes almost anywhere.

@adamchalmers
Collaborator Author

tagging @Irev-Dev to read this when he wakes up

@adamchalmers
Collaborator Author

adamchalmers commented Jul 13, 2023

Databases

On reflection, our language is really outputting:

  • A set of objects (points, lines, extrudes, etc)
  • Properties of those objects (geometric, material, text-annotations, 3d-printer-tags, etc)
  • Relationships between objects (these 4 edges are part of the same path, this face is created by that face, extruded 10cm)

Does this kinda sound like a database to anyone else?

Semantics

We could "compile" KCL source code into a database:

  • Relational database, with a table for each primitive object (e.g. points) and a table for each relationship between them (e.g. a "lines" table, with columns start: Point, end: Point which are foreign keys back to the "points" table)
  • Graph database, where each node is a KCL object and each edge is a relation between them.

If we do this, KCL basically becomes a domain-specific language (DSL) for inserting into a database.
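A minimal sketch of the relational option, using Python's built-in sqlite3 module as a stand-in for whatever database we'd actually pick. The table and column names (points, lines, start_id, end_id) and the short ids standing in for UUIDs are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# One table per primitive object, keyed by the object's id (a UUID in KCL).
conn.execute("CREATE TABLE points (id TEXT PRIMARY KEY, x REAL, y REAL, z REAL)")
# One table per relationship: a line is two foreign keys back into `points`.
conn.execute("""
    CREATE TABLE lines (
        id       TEXT PRIMARY KEY,
        start_id TEXT NOT NULL REFERENCES points(id),
        end_id   TEXT NOT NULL REFERENCES points(id)
    )
""")

# "Compiling" a tiny program: two points and the line between them.
conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)",
                 [("p1", 0.0, 0.0, 0.0), ("p2", 10.0, 0.0, 0.0)])
conn.execute("INSERT INTO lines VALUES ('l1', 'p1', 'p2')")

# The schema doubles as a type check: a line to a nonexistent point is rejected.
try:
    conn.execute("INSERT INTO lines VALUES ('l2', 'p1', 'missing')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```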

This has some big advantages:

  • It implies a really easy query language, so you can submit queries like "show me all edges made of aluminium", and make everything else in the UI/visualization fade away except the objects which meet your query
  • Query languages give you an easy way to enforce invariants, e.g. "there should never be a glass face taller than 20 metres". These invariants could be from legal requirements, manufacturing constraints, etc. When users want an invariant, we translate it into a constraint on the underlying database, e.g. ALTER TABLE faces ADD CHECK (material <> 'glass' OR height < 20).
  • Queries can be used to set properties, so you can now easily express things like "fillet any edge between two points with a 'fillet: true' label", or "fillet any edge of any 3D solid whose material is metallic".
  • Databases have schemas, so type-checking becomes really easy! If the underlying database code gives a schema/type error, then we expose that to users as a type error.
  • We already have UUIDs for each object, so every object has a natural key for the database
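Continuing the same sqlite3 analogy (the tables, columns and data here are hypothetical), this shows two of the advantages above: the glass-height invariant as a CHECK constraint, and "show me all edges made of aluminium" as a query. SQLite can't add a constraint through ALTER TABLE, so the sketch declares the CHECK when the table is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invariant: no glass face taller than 20 metres.
conn.execute("""
    CREATE TABLE faces (
        id       TEXT PRIMARY KEY,
        material TEXT NOT NULL,
        height_m REAL NOT NULL,
        CHECK (material != 'glass' OR height_m < 20)
    )
""")
conn.execute("CREATE TABLE edges (id TEXT PRIMARY KEY, material TEXT NOT NULL)")

conn.executemany("INSERT INTO faces VALUES (?, ?, ?)",
                 [("f1", "glass", 12.0), ("f2", "aluminium", 35.0)])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("e1", "aluminium"), ("e2", "steel"), ("e3", "aluminium")])

# A too-tall glass face violates the CHECK constraint and is rejected.
try:
    conn.execute("INSERT INTO faces VALUES ('f3', 'glass', 25.0)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# "Show me all edges made of aluminium."
aluminium_edges = [row[0] for row in conn.execute(
    "SELECT id FROM edges WHERE material = 'aluminium' ORDER BY id")]
print(aluminium_edges)
```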

Syntax

See below

@Irev-Dev

I haven't used attributes much because I've used Rust so little, so they don't click with me quite as much as they would if I had more exposure. Would you mind outlining the advantages of attributes over something like a metadata param with key-value pairs?

@adamchalmers
Collaborator Author

We totally could use a metadata param, we'd just need to put it in every single function that can create an object. So it would be a somewhat inconvenient user experience.

On the other hand, we could make metadata an implicit first parameter to every function...

@Irev-Dev

Yeah cool, sorry, it's taking me a bit to wrap my head around things. So we could use params, but they would likely start looking noisy. Maybe a good rubric is: modeling and 3D geometry are first-class citizens, and metadata is an important thing to layer in on top, so modeling data goes in params and metadata is layered on with attributes.

One thing I wasn't sure about is your example #[note(audience = "all", text = "this is the main gear shaft", color = rgb(255, 20, 20))]. Maybe it's just the example you used, but because this is a general human-readable comment rather than structured data (like the shapeways example), is there some conflict with /// style comments? If I were to guess how they differ, it would be:

  • /// comments are for in-app use and give IDE-level documentation of how to use a function etc.; they have nothing to do with the execution or the output data of the software
  • attributes layer extra metadata onto the things created when the code executes; this information may also be included in the export (depending on the format, but definitely in our GLTF extension)

Sound right?

I had a very tangential idea about attributes; after writing it, it became obvious it should probably be in its own issue, #3.

@adamchalmers
Collaborator Author

adamchalmers commented Jul 14, 2023

For background, I think having structured notes could be useful. You can see in my example I imagine different notes having different intended audiences, so the manufacturers could filter only notes relevant to them, or the designers could filter only notes relevant to them.

The way Rust does it, /// comment is actually shorthand for #[doc = "comment"] -- so here's my proposal. In KCL there are two ways to make a note. These two are equivalent:

  1. Attribute syntax, e.g.
#[note(audience = "all", text = "this is the main gear shaft", color = rgb(255, 20, 20))]
  2. Docstring syntax, e.g.
/// this is the main gear shaft
/// @audience all
/// @color rgb(255, 20, 20)

The former is nice because you can do it all on one line, and it reuses the very general, flexible attribute syntax. The latter is nice because you can write really long comments that span multiple lines, without needing to enable word wrap in your IDE.
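To check that the two note syntaxes can carry the same data, here's a rough Python sketch that turns the docstring form into the same fields as the attribute form. It assumes plain lines become text and lines starting with @key become fields; values are kept as raw strings (rgb(...) is not parsed). parse_note_docstring is a hypothetical helper, not part of any existing KCL tooling.

```python
def parse_note_docstring(lines: list[str]) -> dict[str, str]:
    """Turn `///` doc lines into the same fields as #[note(...)].

    Plain lines are joined into `text`; `@key value` lines become fields.
    """
    fields: dict[str, str] = {}
    text_parts: list[str] = []
    for line in lines:
        body = line.removeprefix("///").strip()
        if body.startswith("@"):
            key, _, value = body[1:].partition(" ")
            fields[key] = value.strip()
        elif body:
            text_parts.append(body)
    fields["text"] = " ".join(text_parts)
    return fields


doc = [
    "/// this is the main gear shaft",
    "/// @audience all",
    "/// @color rgb(255, 20, 20)",
]
print(parse_note_docstring(doc))
```

Joining text across lines is what makes the docstring form convenient for long, multi-line notes.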

@adamchalmers
Collaborator Author

adamchalmers commented Jul 14, 2023

Keyword arguments

(an alternative to attribute syntax)

Motivation

If types like Solid3d and Line are database tables, then to the user, they would probably be structs with

  • a lot of typed fields (e.g. a field for "note" with subfields for "text", "color", "audience" etc) that KittyCAD can interpret/analyze/use
  • an untyped properties map (Map<String, Any>), for properties KittyCAD can't use. When users export their KCL to GLTF, these properties get serialized into the GLTF. These would be opaque (from KittyCAD's perspective) properties for printers or other 3rd parties whose APIs aren't known to us.

The problem with structs is that you have to initialize all their fields. This would be really inconvenient for users -- you might not care about adding a note to every value, or a surface finish, or whatever. So we should define default values for many (perhaps all) of these fields. If users don't set the value explicitly, we use the default value.

Unfortunately, there's an awkward tension between optional arguments and positional arguments, because the compiler needs to know which fields are being omitted! Say you have a function call like cube(side_length, material, note, surface, fillet_properties, extra). How do you omit note and fillet_properties (replacing them with a default value) but keep the rest? If you write cube(side_length, material, surface, extra) then the compiler can't tell which two properties you've omitted.

Python has this problem, but solves it by using keyword arguments (kwargs).

How do kwargs work

For example, the open function in Python's standard lib (docs) is defined as

def open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Here file is a required positional argument, but the others all have their own default values and are usually passed as keyword arguments. You call it by saying:

# You have to specify all positional arguments. There's only one here.
# Default values are used for all keyword arguments.
f1 = open("log.txt")
# You can override the defaults for keyword arguments.
f2 = open("log.txt", encoding="UTF-16")

I really like this, because it allows us to:

  1. Define a lot of structured properties without overwhelming the user
  2. Let users rely on our defaults, and opt into just the metadata they care about
  3. Add new metadata fields to KCL without breaking existing source code (i.e. maintain backwards compatibility). If we introduce a new metadata field, we add it as a keyword argument, with a sensible default. This way old programs written before we introduce the field will still compile, using the default
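Point 3 can be illustrated in Python, whose keyword arguments work the way described here. cube_v1 and cube_v2 are hypothetical stand-ins for two versions of a stdlib cube; adding surface_finish with a default in version 2 leaves calls written against version 1 working unchanged.

```python
# Version 1 of a hypothetical stdlib `cube`: one positional arg, one keyword arg.
def cube_v1(side_length, material="default"):
    return {"side_length": side_length, "material": material}


# Version 2 adds a new metadata field with a sensible default.
def cube_v2(side_length, material="default", surface_finish="none"):
    return {"side_length": side_length, "material": material,
            "surface_finish": surface_finish}


# A call written against version 1 still works unchanged against version 2...
old_call = cube_v2(13, material="Aluminium.ISO5052")
# ...and new code can opt into just the new field.
new_call = cube_v2(13, surface_finish="Paint.Pantone235")
print(old_call)
print(new_call)
```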

So, returning to the problem above: how do you invoke the cube function without specifying all properties?

// Only specify one metadata item
myCube = cube(Distance::foot(13), material=Aluminium.ISO5052)

// Metadata items can be the output of other functions.
// Here the `note` metadata is the output of a `note` function.
// The `note` function could be from the stdlib or defined by the user.
fancyDie = cube(Distance::inch(1), note=note("this is for D&D games", color=rgb(200,10,20), audience="designer"))

// Same as above, but using `let-in` syntax to make it a bit more readable.
fancyDie = let
  myNote = note("this is for D&D games", color=rgb(200,10,20), audience="designer")
in cube(Distance::inch(1), note=myNote)

// You can always specify 'extra' properties that just go straight into GLTF
d = cube(Distance::inch(1), extra={"cnc_mode": "slice", "cnc_blade_width": 4.2})
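For the extra map, one plausible landing spot is glTF's extras property, which glTF 2.0 allows on its objects for application-specific data. This Python sketch assumes a hypothetical cube that copies extra verbatim into a glTF node's extras, while typed metadata like material stays under our control. The node is heavily simplified; a real exporter would also emit meshes, accessors and so on.

```python
import json


def cube(side_length, material="default", extra=None):
    """Hypothetical cube: `material` is typed metadata we interpret; `extra` is an
    opaque map copied verbatim into the glTF node's `extras` object."""
    node = {"name": "cube", "extras": dict(extra or {})}
    return {"side_length": side_length, "material": material, "gltf_node": node}


d = cube(1, extra={"cnc_mode": "slice", "cnc_blade_width": 4.2})
print(json.dumps(d["gltf_node"]))
```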

When you define a function with kwargs, you specify the default values.

// A special die where the "1" pip is replaced by some custom text. Customers want to personalize their dice.
customDie = (d: Distance, custom_text: Text = "KittyCAD") =>
    cube(d)
    |> emboss_top(custom_text)

Improving ergonomics

Shorthand

I think it'll be pretty common for users to define values and then pass them into keyword arguments like cube(dist, note=note) or cube(dist, material=material). Here the left-hand side is the name of the keyword parameter, and the right-hand side is the value being passed in as an argument. We could make that more concise by letting users just write cube(dist, note=) as a shorthand for cube(dist, note=note). Or maybe they'd write cube(dist, =note), either one works I guess.

Delegating kwargs

I think we'll need a way for users to define their own functions and "delegate" all the kwargs to the inner functions they're calling. For example, say you're dealing with a lot of really big cubes throughout your code. You want to avoid repeating yourself all over the codebase, so you define a function for the big cube.

reallyBigCube = cube(Distance::metre(100))

Later on, you realize that each cube needs to have a different note on it. So you add a kwarg for notes, with a default value. You reuse the standard library's Note.default() as the default value for your notes too.

reallyBigCube = (note: Note = Note.default()) =>
    cube(Distance::metre(100), note=note)

So far so good. But as your program evolves, you'll probably want to add more and more properties. This gets unwieldy pretty quickly:

reallyBigCube = (note: Note = Note.default(), material: Material = Material.default(), surface_finish: SurfaceFinish  = SurfaceFinish.default(), ...) =>
    cube(Distance::metre(100), note=note, material=material, surface_finish=surface_finish, ...)

You can omit the type annotations to make this nicer. KCL can infer the types of all your kwargs here, because it knows the underlying types of the stdlib function cube:

reallyBigCube = (note = Note.default(), material = Material.default(), surface_finish  = SurfaceFinish.default(), ...) =>
    cube(Distance::metre(100), note=note, material=material, surface_finish=surface_finish, ...)

But this is still unwieldy. You'll probably want, at some point, to let users of reallyBigCube add any metadata that the underlying cube supports.

So, we should support a syntax like Python's special **kwargs parameter, which collects all keyword args into a dictionary. That'd make it much more concise:

reallyBigCube = (**kwargs) =>
    cube(Distance::metre(100), **kwargs)
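For comparison, Python's **kwargs delegation, which this proposal borrows, looks like the sketch below. The cube signature and its metadata parameters are hypothetical; really_big_cube forwards whatever keyword arguments it receives without listing them.

```python
def cube(side_length, note="", material="default", surface_finish="none"):
    # Hypothetical stdlib cube with several metadata keyword arguments.
    return {"side_length": side_length, "note": note,
            "material": material, "surface_finish": surface_finish}


def really_big_cube(**kwargs):
    # Collect every keyword argument and forward them all to `cube`.
    return cube(100, **kwargs)


print(really_big_cube(note="loading dock", material="concrete"))
```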

This seems much nicer, but I worry that it's hard to compose. What if you have two different shapes? Say you want to put a sphere on top of a cube.

myShape = let
    myCube = cube(Distance::metre(2))
    mySphere = sphere(Distance::metre(0.5)) |> translate(0, 0, 2)
in union(myCube,  mySphere)

Now if you delegate metadata, you've got to send the same metadata to each shape... that's not good...

myShape = (**kwargs) => let
    myCube = cube(Distance::metre(2), **kwargs)
    mySphere = sphere(Distance::metre(0.5), **kwargs) |> translate(0, 0, 2)
in union(myCube,  mySphere)

This locks you into using the same set of metadata for both cube and sphere. We'd need some syntax for overriding that metadata -- unpacking it into a struct, updating some fields, then packing it back together and passing it into the other function calls.

Even worse, how do you delegate kwargs to different underlying types?

myShape = (text, **kwargs) => square(Distance::foot(2)) 
    |> extrude(Distance::foot(10))
    |> emboss(text)

The kind of kwargs that are needed for the square, the extrude and the emboss functions are all likely to be different. So you can't just pass one **kwargs object into all of them.
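The failure mode is easy to reproduce in Python. In this sketch (square, extrude and their color/draft_angle parameters are hypothetical), one kwargs dict is forwarded to two functions with different signatures, and the call fails as soon as a key that only one of them accepts is passed.

```python
def square(side, color="grey"):
    return {"shape": "square", "side": side, "color": color}


def extrude(sketch, distance, draft_angle=0.0):
    return {"shape": "extrusion", "of": sketch, "distance": distance,
            "draft_angle": draft_angle}


def my_shape(**kwargs):
    # One kwargs dict forwarded to two functions with different signatures.
    return extrude(square(2, **kwargs), 10, **kwargs)


print(my_shape())  # fine: nothing to forward

try:
    my_shape(color="red")  # square accepts `color`, extrude does not
except TypeError as err:
    print("TypeError:", err)
```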

@adamchalmers
Copy link
Collaborator Author

On the other hand, a function's keyword arguments are basically equivalent to structs. The set of keyword args accepted by a function is equivalent to a struct, with each individual keyword arg corresponding to a field of that struct (the field is optional and has a default value).

So maybe instead of adding kwargs, we should add struct types. Then each function can explicitly declare a metadata struct, with whatever fields make sense for them. If users omit a field, it's set to a default value. This way, we can add new fields in new versions of KCL without breaking users -- we just define a default value.

I like this because users will probably want struct types anyway, and it simplifies how functions work (which simplifies the type system). I can't help but be a little perturbed that no other static, functional languages use keyword args... maybe there's a reason why.
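As a sketch of the struct alternative, here a Python dataclass stands in for a KCL metadata struct (CubeMetadata and its fields are hypothetical). Each field mirrors a keyword argument with its default, and dataclasses.replace plays the role of field update syntax.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CubeMetadata:
    # Each field plays the role of one keyword argument, with its default.
    note: str = ""
    material: str = "default"
    surface_finish: str = "none"


def cube(side_length, meta=CubeMetadata()):
    # A frozen (immutable) dataclass is safe to use as a default value.
    return {"side_length": side_length, "material": meta.material,
            "note": meta.note, "surface_finish": meta.surface_finish}


# Users set only the fields they care about; the rest use defaults.
steel = CubeMetadata(material="steel")
# Field update: copy the struct with one field changed.
labelled_steel = replace(steel, note="left bracket")
print(cube(2, labelled_steel))
```

Adding a new field with a default later would not break any existing CubeMetadata(...) construction, which is the backwards-compatibility property mentioned above.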

@greg-kcio what do you think?

@greg-kcio

I like the pythonic style of default args and kwargs (disclaimer: I'm biased bc Python is my daily driver).

When using default arguments in Python, you must declare required arguments before optional arguments:

# this is good
def cube(length: Distance, note: Note=Note.default(), material: Material=Material.default()):
  pass

# valid calls:
cube_01 = cube(Distance(10, 'mm'))
cube_02 = cube(Distance(10, 'mm'), Note("This is Cube 02"))  # kwarg without key, but in order
cube_03 = cube(Distance(10, 'mm'), material=Material("PETG"))  # omit the first kwarg and specify the second
cube_04 = cube(Distance(10, 'mm'), Note("This is Cube 02"), Material("PETG"))  # kwargs ordered, without keys
cube_06 = cube(Distance(10, 'mm'), Note("This is Cube 02"), material=Material("PETG"))  # unkeyed args first, then keyed ones
cube_07 = cube(Distance(10, 'mm'), material=Material("PETG"), note=Note("This is Cube 02"))  # kwargs can be out of order when you specify the key
cube_08 = cube(material=Material("PETG"), note=Note("This is Cube 02"), length=Distance(10, 'mm'))  # required args can also be out of order when you specify the key
kwargs = {"material": Material("PETG"), "note": Note("This is Cube 02"), "length": Distance(10, 'mm')}
cube_09 = cube(**kwargs)  # this works too and all the same rules apply

# invalid calls:
cube_05 = cube(Distance(10, 'mm'), note=Note("This is Cube 02"), Material("PETG"))  # SyntaxError: once an argument has a key, every later argument needs one too
cube_10 = cube(note=Note("This is Cube 02"), material=Material("PETG"), Distance(10, 'mm'))  # illegal even though we can reason about the args
cube_11 = cube(Distance(10, 'mm'), Material("PETG"), Note("This is Cube 02"))  # kwargs out of order with no keys: this runs, but binds note=Material(...) and material=Note(...); only a static type checker catches it

# this is illegal because there is a required arg declared after the optional arg
def bad_cube(length: Distance, note: Note=Note.default(), material: Material):
  pass

Personally I prefer leaving no ambiguity for type requirements and would like typed args and kwargs. Python has standard-library types that support this (nominally only; there is no runtime type checking this way): typing.TypedDict and @dataclass. Both are essentially structs (nominally!). So whether we call them structs or dataclasses or something else, it would be convenient to support those as kwarg "types" and unroll them into function arguments. Idk why other languages don't have something similar... it is straightforward in Python since the language is dynamically typed and encourages duck typing.
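A minimal sketch of the TypedDict idea (CubeKwargs and this cube signature are hypothetical). With total=False every key is optional, and because a TypedDict is a plain dict at runtime it can be unrolled with **; the key types are only checked by a static type checker such as mypy.

```python
from typing import TypedDict


class CubeKwargs(TypedDict, total=False):
    # total=False: every key is optional, like a keyword argument with a default.
    note: str
    material: str


def cube(length: float, note: str = "", material: str = "default") -> dict:
    return {"length": length, "note": note, "material": material}


kwargs: CubeKwargs = {"material": "PETG", "note": "This is Cube 09"}
cube_09 = cube(10.0, **kwargs)  # the TypedDict unrolls into keyword arguments
print(cube_09)
```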

@adamchalmers
Collaborator Author

But... if we have structs, then I don't see a need for keyword args anymore. You just have a struct with a field for each kwarg, and if you want them to be optional, you just use Option<String> instead of String. Then your function unwraps the optional with its desired default value.

This solves the composition problem I outlined above. Instead of declaring **kwargs and delegating them to both sphere() and cube(), we can just define sphere_args: Option<...> and cube_args: Option<...>, or define your own union/intersection of them and pass the values around as you want.
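A rough Python sketch of this approach, with Optional standing in for Option (CubeArgs, SphereArgs and the smoothness field are hypothetical). Each function unwraps its own optional fields with its own defaults, and my_shape takes a separate args struct per inner call instead of one shared **kwargs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CubeArgs:
    material: Optional[str] = None  # None means "use cube's default"
    note: Optional[str] = None


@dataclass
class SphereArgs:
    smoothness: Optional[int] = None  # None means "use sphere's default"


def cube(side, args: CubeArgs) -> dict:
    # Unwrap each optional field with this function's own default.
    material = args.material if args.material is not None else "default"
    note = args.note if args.note is not None else ""
    return {"shape": "cube", "side": side, "material": material, "note": note}


def sphere(radius, args: SphereArgs) -> dict:
    smoothness = args.smoothness if args.smoothness is not None else 32
    return {"shape": "sphere", "radius": radius, "smoothness": smoothness}


def my_shape(cube_args: CubeArgs, sphere_args: SphereArgs) -> list:
    # Each inner call gets its own metadata; nothing is shared by accident.
    return [cube(2, cube_args), sphere(0.5, sphere_args)]


print(my_shape(CubeArgs(material="steel"), SphereArgs(smoothness=64)))
```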

@adamchalmers
Collaborator Author

So, I discovered that OCaml has "labeled arguments" (docs), which are basically just like keyword args. You can give any argument a label, and you can also define optional parameters. Declaring an optional T parameter is just syntactic sugar: inside the function it's a T option, which the caller is allowed to leave out.

OCaml doesn't have anything like **kwargs. So, regarding my point above where I said

But this is still unwieldy. You'll probably want, at some point, to let users of reallyBigCube add any metadata that the underlying cube supports.

OCaml solves this by saying "No, you can't declare the function as supporting any metadata that the underlying cube supports". I guess that's OK. You just declare the kwargs you want, and if the function signature becomes very long, that's OK.

Reading the OCaml docs for kwargs definitely reduces my worries about including them in the language. I know you can accomplish the same things using structs, but kwargs seems like a better DX.

@lf94

lf94 commented Jul 14, 2023

It appears ECMA-335 / ECMA-334 attributes are something which are non-extensible by users. This is problematic for users with unforeseen needs.

As proposed by @Irev-Dev, key-value pairs (objects) would be more ideal and flexible, and type-safe too if users are able to define their own types. Sure, there can be a set of standardized keys and values, but there must be a way for users to extend it themselves.

Some will want attributes to propagate and others not. The only solution is syntax that explicitly says whether an attribute propagates. The easiest thing is to have the user repeat the application of an attribute. The best thing is making it easy to combine attributes.

Named arguments are like @adamchalmers said, essentially structures, but in their defense a bit more user-friendly. I'm sure this particular aspect could be bikeshedded a lot, but we're all aware structures are more widespread than kwargs 🙂

@adamchalmers
Collaborator Author

I agree, I think attributes are the wrong way to solve this. These metadata values need to be first-class concepts in the language, so they have to become return values or arguments to functions.

So far in the language design, I've found places where I'd like keyword arguments (here), and other places where they'd help introduce backwards-compatible API changes. On the other hand, I haven't really found places where users would need to design their own structs. I think because this is going to be such a limited, single-purpose language, structs from the stdlib might be enough, i.e. there may be no need for users to define their own types.

If that's so, then avoiding structs would really simplify the language implementation. It frees us from thinking about

  • struct definition syntax
  • struct initialization syntax
  • field access syntax
  • field update syntax
  • default values for struct fields (so you can make backwards-compatible changes)
  • trait implementation syntax

So, for now, keyword arguments seem much simpler to implement and design. I suggest we:

  • Support keyword arguments in function definition/invocation
  • Don't support **kwargs or other methods of gathering all keyword arguments

@lf94

lf94 commented Jul 16, 2023

I haven't really found places where users would need to design their own structs.

It's hard to comment without seeing some potential examples. That would further help everyone else understand the direction of the language. I too rarely, if ever, use a structure that includes things other than measurements and positions. So I completely agree.

I don't think metadata values have to be returned by anything, but user-defined functions with user-defined keyword arguments that generate particular values sound super dang useful (maybe this is what you meant: "metadata [derived] values")!

@adamchalmers
Collaborator Author

OK, I feel good about this discussion. Keyword args, no **kwargs syntax, and maybe we'll get to structs down the road. Thanks!

@Irev-Dev

Irev-Dev commented Jul 18, 2023

Apologies for being late to this. I think JavaScript has some pretty good patterns here: the spread operator, rest syntax and destructuring.

From your example, @adamchalmers:

myShape = (**kwargs) => let
    myCube = cube(Distance::metre(2), **kwargs)
    mySphere = sphere(Distance::metre(0.5), **kwargs) |> translate(0, 0, 2)
in union(myCube,  mySphere)

If I were to mix in some JS syntax here:

myShape = ({...args}) => let
    { forCubeOnly, forSphereOnly: radius, ...rest} = args
    myCube = cube({dis: Distance::metre(2), length: forCubeOnly, ...rest})
    mySphere = sphere({dis: Distance::metre(0.5), radius, ...rest}) |> translate(0, 0, 2)
in union(myCube,  mySphere)

What happens here is that all args are collected in args, then we peel off the key-value pairs we're interested in, forCubeOnly and forSphereOnly (renaming the latter to radius), and the rest are thrown into a new bucket, rest.
When we call cube, we use a key-value pair for length. For the sphere, the key and variable name match for radius, so the shorthand radius is fine instead of radius: radius. In both cases we also dump the rest of the arguments by spreading them into the rest of the object.

Any decent JS dev would immediately refactor by moving the destructuring into the function's parameter definition, which is definitely cleaner:

myShape = ({ forCubeOnly, forSphereOnly: radius, ...rest}) => let
    myCube = cube({dis: Distance::metre(2), length: forCubeOnly, ...rest})
    mySphere = sphere({dis: Distance::metre(0.5), radius, ...rest}) |> translate(0, 0, 2)
in union(myCube,  mySphere)
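Python keyword-only parameters plus **rest get surprisingly close to this destructuring pattern. In the sketch below (cube and sphere are hypothetical stand-ins that just echo their inputs), the named parameters peel off the values we care about and rest collects everything else, which is then spread into both calls, just as ...rest is in the JS version.

```python
def cube(dis, length=None, **metadata):
    return {"shape": "cube", "dis": dis, "length": length, **metadata}


def sphere(dis, radius=None, **metadata):
    return {"shape": "sphere", "dis": dis, "radius": radius, **metadata}


def my_shape(*, for_cube_only, for_sphere_only, **rest):
    # Named keyword-only params peel off the values we want (like JS destructuring);
    # `rest` collects every other keyword argument (like `...rest`).
    radius = for_sphere_only  # the JS example renames this key to `radius`
    my_cube = cube(2, length=for_cube_only, **rest)
    my_sphere = sphere(0.5, radius=radius, **rest)
    return [my_cube, my_sphere]


print(my_shape(for_cube_only=3, for_sphere_only=1, material="steel"))
```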

To me this seems to have all the benefits of kwargs, with the one exception that ordered params aren't part of it. But I actually like that: ordered params are kinda hostile to a new reader.

To hammer this home, imagine you're a mechanical engineer who is very familiar with modelling concepts: every modelling software you've used takes a sketch or similar and extrudes it by some scalar. You're reading someone's example KCL and you see the line myExtrude = extrude(poorlyNameVar, someOtherPoorlyNameVar). You're new to programming, so your first thought isn't "I need to figure out what each of these is", and you get stumped. Compare that with myExtrude = extrude({sketch: poorlyNameVar, distance: someOtherPoorlyNameVar}), which clicks with what you already know about CAD modelling.
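Python's keyword-only parameters are one existing way to enforce the readable form. In this hypothetical extrude, the bare * means callers must write extrude(sketch=..., distance=...), and the positional call extrude(profile, 10) is rejected with a TypeError.

```python
def extrude(*, sketch, distance):
    # The bare `*` makes every parameter keyword-only, so callers must name them.
    return {"sketch": sketch, "distance": distance}


profile = "rectangle sketch"
my_extrude = extrude(sketch=profile, distance=10)

try:
    extrude(profile, 10)  # positional call is rejected
except TypeError as err:
    print("TypeError:", err)
```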

@adamchalmers
Collaborator Author

adamchalmers commented Jul 18, 2023

There are a few programming languages where only keyword arguments are allowed, or at least, the default is keyword arguments. E.g. Swift.

// Declare a function, looks normal.
func greet(person: String) -> String {
    let greeting = "Hello, " + person + "!"
    return greeting
}

// Use a function: note the parameter name on each argument!
print(greet(person: "Anna"))
print(greet(person: "Brian"))

You can also use different labels for the argument (passed by caller) and the parameter (input within the function).

func greet(person: String, from hometown: String) -> String {
    return "Hello \(person)!  Glad you could visit from \(hometown)."
}
print(greet(person: "Bill", from: "Cupertino"))

In this example, the second parameter's argument label is from and its parameter name is hometown.

You can opt into positional arguments (i.e. just a parameter with no argument label, see docs) and also give default values for arguments.

I will say that all these features will complicate our compiler, but they definitely will provide a nice flexible UX. I agree that this will make function calls much more readable for new programmers. The circle(400, 300, (200, 100)) readability problem is real!

@adamchalmers adamchalmers reopened this Jul 18, 2023
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants