Type annotations on maps and arrays #1654

michaelhkay · 2024-12-11T19:48:13Z

Currently maps and arrays have very little type safety. You can say that your function expects array(xs:string), but that involves testing what the array actually contains, and there's nothing to stop you then appending an integer to the array.

I would like to explore the possibility of having arrays and maps annotated with a type (either always, or optionally), and for this type to constrain operations such as array:append() and map:put().
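
To make the idea concrete, here is a rough sketch in Python rather than XPath. The class name `AnnotatedArray` and its fields are hypothetical and only illustrate the proposal; they are not a suggested API. The array carries its member type, and `append` checks new members against it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedArray:
    """Immutable array carrying a member-type annotation (hypothetical sketch)."""
    member_type: type
    members: tuple = ()

    def __post_init__(self):
        # Construction checks every member once against the annotation.
        for m in self.members:
            self._check(m)

    def _check(self, value):
        if not isinstance(value, self.member_type):
            raise TypeError(
                f"{value!r} is not an instance of {self.member_type.__name__}"
            )

    def append(self, value):
        # Like array:append, this returns a new array; the difference is
        # that the annotation constrains what may be appended.
        self._check(value)
        return AnnotatedArray(self.member_type, self.members + (value,))


strings = AnnotatedArray(str, ("a", "b"))
longer = strings.append("c")       # allowed: the result keeps the annotation
try:
    strings.append(42)             # rejected: an integer is not a string
except TypeError as err:
    rejected = str(err)
```

In this sketch the check happens once per operation, instead of re-testing the whole array each time a function declares that it expects array(xs:string).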

@dnovatchev has suggested that ordered maps and unordered maps should be different types, and I think it would be difficult to do that unless we move to structural typing. It's also more consistent with typing of atomic values and nodes - though it raises a question about sequences, where the type is purely descriptive.

This would also have implications for records: presumably a map could be annotated with a record type, and this too could constrain the operations available on the value.

This is a rather big change and I put it forward fairly tentatively, but I'm interested to hear people's views.

dnovatchev · 2024-12-11T20:37:03Z

> @dnovatchev has suggested that ordered maps and unordered maps should be different types, and I think it would be difficult to do that unless we move to structural typing. It's also more consistent with typing of atomic values and nodes - though it raises a question about sequences, where the type is purely descriptive.

The suggestion to have ordered maps introduced as a new type is based on observations and factual data:

XPath users have been using the map type as currently defined in XPath 3.1 for more than 7 years, without the need for ordered maps.
The described use-cases for ordered-maps are mostly for serialization/deserialization of maps to/from JSON. Absent is any documented need for any other uses of the map-type to be concerned with ordering.

The suggestion to have ordered maps introduced as a new type aims also at preserving the conceptual integrity and simplicity of the current documentation, thus not to startle/confuse the long-time users of maps with changes in the type they have been using for years.

Let us spare them the time needed to re-read the changed documentation describing what maps are, and the descriptions of the functions on maps, which would have grown in complexity, only to find out that they don't need this new "feature" and information at all.

> This would also have implications for records: presumably a map could be annotated with a record type, and this too could constrain the operations available on the value.

Again, adding complexity makes things more ... complex - and this is unnecessary. A record doesn't care whether or not the map it is based on is a regular map or an ordered map. All a record needs is that the map has a certain subset of string keys with predefined names.

My plea with everyone is: "Let us keep things simple for the sake of the User."

My main motivation to participate in this CG is to be a User's Advocate - and from the position of the users of XPath it is best not to augment a current, well-known type making it more complex, when this added complexity will rarely be of any use.

ChristianGruen · 2024-12-11T21:15:39Z

> though it raises a question about sequences, where the type is purely descriptive.

Right, this makes me hesitate: If we want an array of type array(xs:int) to keep its type, this should also apply to an xs:int* sequence. Which will be difficult unless we introduce functions like fn:append, which may not be used anyway. But…

> This would also have implications for records: presumably a map could be annotated with a record type, and this too could constrain the operations available on the value.

…I would indeed like to see type annotations attached to records. It seems counterintuitive to me that a record resulting from declare record coord(x, y) can be “destroyed” by removing x or y. In contrast, an array will remain an array, no matter which operation you apply on it. Technically, a record is “just a map”, but this may not be so obvious to everyone as it is for implementers.

dnovatchev · 2024-12-11T22:32:49Z

> Right, this makes me hesitate: If we want an array of type array(xs:int) to keep its type

At present the standard functions defined on arrays take an array of the most general type: array(*) and if their result is a new array, it is also of the same general type array(*).

Why should we change this? It would require redefining all standard functions on arrays and rewriting their documentation, and may result in backwards-compatibility problems.

If a user wants to have an operation that takes an array of a more specific type, and/or produces a new array having a specific type, then the user will write a function with the appropriate signature, that implements the wanted operation.

> …I would indeed like to see type annotations attached to records. It seems counterintuitive to me that a record resulting from declare record coord(x, y) can be “destroyed” by removing x or y. In contrast, an array will remain an array, no matter which operation you apply on it. Technically, a record is “just an array”, but this may not be so obvious to everyone as it is for implementers.

A record can never be destroyed - it is a new record that is produced from any such operation, thus the original record is immutable.

Also a record is not “just an array” - we may say "just a map".

So far I haven't seen any compelling use-case for keeping annotations on maps/records or on arrays.

Let us not make our current and well-established types more complex than they need to be.

As for ordered maps - we have the good precedent that the record was defined as a separate type from map, thus a map doesn't have a "property saying that this map is a record". If this is so, why would we want to do exactly the opposite with ordered maps: not to cleanly specify them as some more special kind of map but just add a property to any map (that will be false in 99.99% of cases) saying whether or not this map is ordered?

michaelhkay · 2024-12-12T00:05:41Z

You (@dnovatchev) say that you want ordered maps to be a different type from maps, and your argument for this is based on keeping things simple for the sake of the user.

I think you need to demonstrate how this will keep things simple. A good start would be to describe how the new type should relate to the existing map type: where does it fit in the type hierarchy, what are the subtyping rules, and how will this affect function signatures. Will existing map functions apply to the new type, or will new functions be defined? If existing functions can handle the new type, then their specifications are going to have to change to say how they do so, and I can't see that the result will be any simpler. A concrete proposal demonstrating this would be useful.

dnovatchev · 2024-12-12T01:06:59Z

> You (@dnovatchev) say that you want ordered maps to be a different type from maps, and your argument for this is based on keeping things simple for the sake of the user.
>
> I think you need to demonstrate how this will keep things simple. A good start would be to describe how the new type should relate to the existing map type: where does it fit in the type hierarchy, what are the subtyping rules, and how will this affect function signatures. Will existing map functions apply to the new type, or will new functions be defined? If existing functions can handle the new type, then their specifications are going to have to change to say how they do so, and I can't see that the result will be any simpler. A concrete proposal demonstrating this would be useful.

@michaelhkay, It seems that you are afraid of having something like inheritance here...

We can perfectly live without mentioning inheritance at all. Let us just call the new type map-container (or whatever better name we find), and make it have a map and an order to be imposed on this map's keys.

Thus existing map functions do not accept a map-container argument at all, and nothing needs to be modified in these functions. If someone needs a map function to be applied to the map contained in the container, they will simply pass just that map as the argument of the function call.
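
A minimal sketch of this composition approach, written in Python for illustration only. The names `MapContainer`, `mapping`, `key_order` and `map_size` are hypothetical; nothing here is a proposed XPath type or function:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class MapContainer:
    """A plain map plus an order imposed on its keys (hypothetical sketch)."""
    mapping: Mapping[Any, Any]
    key_order: Tuple[Any, ...]

    def __post_init__(self):
        # The order must mention every key of the map exactly once.
        keys = set(self.key_order)
        if len(keys) != len(self.key_order) or keys != set(self.mapping):
            raise ValueError("key_order must be a permutation of the map's keys")

    def entries(self):
        # Entries in the imposed order.
        return [(k, self.mapping[k]) for k in self.key_order]


def map_size(m: Mapping[Any, Any]) -> int:
    """Stand-in for an existing map function: it only knows plain maps."""
    return len(m)


container = MapContainer({"b": 2, "a": 1}, ("a", "b"))
ordered = container.entries()
size = map_size(container.mapping)   # pass the contained map, not the container
```

Under this assumption the existing function is untouched; the caller unwraps the container before calling it.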

MarkNicholls · 2024-12-13T15:26:09Z

@michaelhkay

observations:

the type language isn't capable of capturing this.
it's not capable of capturing all sorts of things, but this seems to be something about types that you would usually express in the type language.
(not a massive problem: I have libraries that preserve types in exactly the way you are suggesting. I simply declare the function with a pseudo type signature. That's fine for me, but it's not ideal in the core libraries. I don't enforce it; it's a specification of what the function does rather than a constraint that is enforced.)
the type becomes part of the data structure itself. This is a tad unexpected to people accustomed to static type systems, where types are considered design-time abstractions that are effectively (sometimes literally) erased (I know there are exceptions to this).
does that mean, for example, that if I assign to a variable of type array(Animal) from a variable of type array(Elephant), a new array is effectively constructed with the new, relaxed constraint "Animal"? Does this cause odd effects? I'm not sure. I can now append a frog to something that was originally defined to be an array of elephants; again not the end of the world, it's just following the rules.

I like the idea in principle; I'm just worried that it may make things less transparent without a cascade of large changes, and that it risks some inconsistency (or unexpectedness) leaking out. I think JavaScript or TypeScript had some weird issue with generic lists that didn't obey contra/covariance rules. It largely doesn't matter until it does, and then it's really confusing.

michaelhkay · 2024-12-13T19:47:34Z

@MarkNicholls Not sure what happened to my response, it seems to have got lost.

> the type language isn't capable of capturing this.

I'm not at all sure what you're getting at here. The type system is pretty flexible and I'm sure we can bend it to our convenience. Not that all changes are necessarily good.

> the type becomes part of the data structure itself, this is a tad unexpected

Surely anyone who has grown up with object-oriented languages expects instances to directly identify their types? The structural (essentially predicate-based) typing we currently have with maps and arrays is, I would have thought, less familiar to most of our users.

> does that mean, for example, if I assign to a variable of type array(Animal) ...

I don't know; I haven't yet tried to work through the implications. At present this issue is more of a requirement statement (improve type safety for maps and arrays) than a detailed design.

MarkNicholls · 2024-12-14T21:37:18Z

I think you are suggesting that append should be of type

forall A.function(array(A),A) as array(A)

That's not currently possible, and may not be trivial. Is this something you would allow users of the language to do?
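
For comparison, this is roughly how such a universally quantified signature looks in a language that lets users declare type parameters. It is a Python sketch using immutable tuples as a stand-in for arrays, and the function name `append` is only illustrative:

```python
from typing import Tuple, TypeVar

A = TypeVar("A")


def append(array: Tuple[A, ...], member: A) -> Tuple[A, ...]:
    """forall A. function(array(A), A) as array(A), with tuples as arrays."""
    return array + (member,)


names: Tuple[str, ...] = ("a", "b")
more_names = append(names, "c")   # A can be taken to be str

# Nothing is checked at runtime, and because immutable tuples are covariant
# a static checker is free to pick a wider A here instead of rejecting it.
mixed = append(names, 1)
```

The last line shows why the signature alone does not settle the question: with immutable, covariant containers the member type can silently widen unless the declared result type pins it down.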

> Surely anyone who has grown up with object-oriented languages expects instances to directly identify their types

Well, no: there is nothing in OO itself that means an object "knows" its type. Type erasure, for things like collection classes, is not uncommon (TypeScript, I think, erases all types). I personally don't completely understand the typing in the X languages, but at least in historical terms, types are used to statically verify code (though I accept that over the years this has drifted).

With the assignment question, an interesting example is the inconsistent way C# deals with parametric types e.g.

this code doesn't compile

            List<string> xs = new List<string>(["1"]);
            List<object> ys = xs;

mutable lists are inherently invariant in the parameter.

this code does compile


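            string[] xs = ["1"];
            object[] ys = xs;   // allowed: C# arrays are covariant
            ys[0] = 1;          // index 0 is the only valid index; the store fails at runtime (ArrayTypeMismatchException)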

but throws an error at runtime

In theory the two cases aren't dissimilar: mutable arrays are also inherently invariant. But the C# designers decided to turn a blind eye to that, maybe for historical reasons (this code works in pre-generics C#), or maybe because they judged the logical inconsistency a pragmatic way to allow certain programs to be valid.

The cost in the array case is that types are checked at runtime, whilst the List version is much more efficient: it relies on compile-time checks and doesn't actually need to know its type.

the latter also allows me to write code like this

        static void Set(object[] xs,object x)
        {
            xs[0] = x;
        }

If my program crashes and I look at this code, I can see absolutely nothing wrong with it. Why would it not work?

The answer: it fails if you call it like this:

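            // The array must really be a string[]; a bare ["1"] here would be built as object[] and not fail.
            Set(new string[] { "1" }, 1);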

C# treats the array as covariant (when it isn't), and this hole in the type system lets provably incorrect code compile.

Combining parametric polymorphism and "subtyping" is, to me at least, non-trivial: some languages outlaw combining them completely (e.g. F#), others limit it to certain cases (e.g. C#), and others allow it everywhere (e.g. Scala).

In your case the data structures are immutable, so maybe this is an unfounded concern. I can certainly 'sidestep' the type signature to "add" a Frog to an array of Elephants, simply by assigning it covariantly to an array of Animals. I actually don't think that's a bad thing: the type system is forcing me to do something to explicitly indicate my intent.

So I like the idea, I'm just nervous it's a can of worms. In a context where people generally don't declare their types much (I routinely declare types), it may have limited value and may impose significant runtime cost. I'd use it, but I don't think I'm typical.

michaelhkay · 2024-12-15T00:35:40Z

Thanks all for the feedback. It feels like it is not worth pursuing this idea. Closing unilaterally.

michaelhkay mentioned this issue Dec 11, 2024

JSON maps #1655

Closed

ChristianGruen added the labels XDM (an issue related to the XPath Data Model) and Discussion (a discussion on a general topic) Dec 12, 2024

michaelhkay closed this as completed Dec 15, 2024
