add NamedTuples #22194
I'm excited for this. Thoughts:
|
base/namedtuple.jl
Outdated
getindex(t::NamedTuple, i::Symbol) = getfield(t, i)

function getindex(t::NamedTuple, I::AbstractVector)
    names = unique( Symbol[ isa(i,Symbol) ? i : fieldname(typeof(t),i) for i in I ] )
Are the fields in a NamedTuple required to be unique?
Yes, but I haven't yet done much to enforce that.
So this `getindex` method is kind of a bummer; I played around with a definition like:
@generated function Base.getindex(nt::NamedTuple{names}, inds::Int...) where {names}
    nms = [:(names[inds[$i]]) for i = 1:length(inds)]
    args = [:(getfield(nt, inds[$i])) for i = 1:length(inds)]
    return :(Base.namedtuple(NamedTuple{tuple($(nms...))}, $(args...)))
end
But the problem is you still don't know the values of `inds`, so it has a hard time knowing the exact return type for the NamedTuple. Is there any other way we could get a smarter multi-`getfield`-like type method?
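One way around the inference problem, sketched here on the assumption that the selecting constructor exercised in this PR's tests (`NamedTuple{(:a,:c)}(nt)`) is available, is to move the requested names into the type domain. `select_fields` is a hypothetical helper name, not something in Base:

```julia
# Hypothetical helper: the names travel in a `Val` type parameter, so the
# field names of the result are determined by the argument types alone.
select_fields(nt::NamedTuple, ::Val{names}) where {names} = NamedTuple{names}(nt)

t = (a = 1, b = 2, c = 3)
s = select_fields(t, Val((:a, :c)))
println(s)
```

The trade-off is that the caller has to know the names statically; selecting by runtime integer indices still cannot produce a concretely inferred result.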
Covariance isn't super important, particularly without nice syntax for types. With nicer syntax, I can imagine using
Super cool! @quinnj I thought the idea is to use these for keyword arguments, so I'm not sure if covariance is useful to make As for curly
Could we maybe just have lowering turn And then similarly for
Yes, we could allow
base/namedtuple.jl
Outdated
@@ -0,0 +1,91 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

@generated function namedtuple(::Type{NamedTuple{names,T} where T}, args...) where names
This could perhaps just be simplified to
@generated function Base.namedtuple(syms::NTuple{N, Symbol}, args...) where N
    if length(args) == N
        Expr(:new, :(NamedTuple{syms,$(Tuple{args...})}), Any[ :(args[$i]) for i in 1:N ]...)
    else
        :(throw(ArgumentError("wrong number of arguments to named tuple constructor")))
    end
end
since it's a bit redundant to pass `NamedTuple` to the `namedtuple` function. Also can this be exported? I'm finding it the handiest way to generate non-literal NamedTuples. It has a nice companion in `tuple(args...)`, you just have to pass in the fieldnames as well.
My version is useful since it causes specialization on the names tuple. But we could add
namedtuple(syms::NTuple{N, Symbol}, args...) where N = namedtuple(NamedTuple{syms}, args...)
Though also I believe @vtjnash is working on inference improvements that could make my original method unnecessary.
That would depend on having some kind of dedicated syntax though, right? Like `((x=y for (x, y) in zip(syms, vals))...)` or curly braces or something?
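For reference, a minimal sketch of building a named tuple from names that are only known as values. It assumes a released Julia 1.x rather than the branch under review, where the internal `namedtuple` function discussed above may differ:

```julia
syms = (:a, :b, :c)
vals = (1, 2.0, "three")

# Names as a type parameter, values passed as a plain tuple.
nt1 = NamedTuple{syms}(vals)

# Splatting an iterator of (name, value) pairs after `;`.
nt2 = (; zip(syms, vals)...)

println(nt1)
println(nt2)
```

Both forms build the same value. Because `syms` is an ordinary runtime value here, the compiler can only infer a concrete result type if it manages to constant-propagate the names.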
Have we resolved the question of inferred names that I brought up here? Having a syntax that infers the names like in C# and VB is really quite key for my use of named tuples in Query.jl. I guess I could keep using I also hope to enable the following kind of generator/query syntax very soon, and that would only work if the normal named tuple syntax supports inferred names:
df = DataFrame(<somedata>)
df2 = DataFrame({i.Name, i.Address} for i in df if i.Age > 3)
I think that kind of generator syntax is super julian and it essentially can cover any filtering and projection of tables with almost the shortest syntax I can think of. But it really only works if we have inferred names in the named tuple syntax.
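As a rough illustration of the projection being asked for, here is a sketch that uses a plain vector of named tuples as a stand-in for a table (no DataFrame involved; the sample rows are made up). The inferred-name form assumes Julia 1.5 or later, where `(; i.Name, i.Address)` takes each field name from the property being accessed:

```julia
# Stand-in for a table: a vector of named tuples with invented sample rows.
people = [
    (Name = "Ann", Address = "1 Main St", Age = 5),
    (Name = "Bob", Address = "2 Oak Ave", Age = 2),
    (Name = "Cy",  Address = "3 Elm Rd",  Age = 9),
]

# Explicit names: works wherever named tuples exist.
explicit = [(Name = i.Name, Address = i.Address) for i in people if i.Age > 3]

# Inferred names (assumes Julia 1.5+): the field names come from the
# property names on the right-hand side.
inferred = [(; i.Name, i.Address) for i in people if i.Age > 3]

println(inferred)
```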
Here are some other syntax ideas that would be cool to have. I'm using
x = {a=1, b=2}
y = {c=3, d=4}
z = {x...,y...} # Constructs a named tuple with fields a,b,c and d
z = {e=5, x..., f=8, y.d} # Constructs a named tuple with fields e, a, b, f, d
One thing that would be really nice is if one could use
z[a:f] # Constructs a named tuple with fields a,b,f
This would essentially mean that one can no longer index with a variable into a named tuple, and I'm not sure that is a good idea... But if someone has some other idea how to easily select a subset of fields from a named tuple in a concise way it would be great. The use case for this last example might be something like this:
df = DataFrame(<somedata>)
df2 = DataFrame({i[b:m]...,x=3} for i in df)
This would create a new table that has columns |
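For comparison, the spread examples above can be written with the parenthesized syntax. This is a sketch assuming a released Julia 1.x, where splatting after `;` is enabled; the `y.d` shorthand is spelled out as `d = y.d` so that it does not depend on inferred names:

```julia
x = (a = 1, b = 2)
y = (c = 3, d = 4)

z1 = (; x..., y...)                    # fields a, b, c, d
z2 = (; e = 5, x..., f = 8, d = y.d)   # fields e, a, b, f, d
z3 = merge(x, y)                       # same value as z1

println(z2)
```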
No, we haven't resolved this, but Hopefully it's ok for the range indexing case to need a few extra characters. For instance |
@davidanthoff, note that subsetting & merging are already included in this PR
julia> t = (a=1, b=2, c=3, d=4, e=5, f=6)
(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)
julia> t[[:a, :c, :e]]
(a = 1, c = 3, e = 5)
julia> r = (g=7, h=8, i=9)
(g = 7, h = 8, i = 9)
julia> merge(t, r)
(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8, i = 9)
I can see how a dedicated syntax for named tuples would be convenient, given that you can currently do I don't think I'm a fan of having
@generated function nt(x::T) where {T}
    names = tuple(T.name.names...)
    vals = map(names) do n
        :(getfield(x, $(Expr(:quote, n))))
    end
    return :(Base.namedtuple(NamedTuple{$names}, $(vals...)))
end
which allows you to do
julia> struct A
           a::Int
           b::Int
       end
julia> a = A(1, 2)
A(1, 2)
julia> nt(a)
(a = 1, b = 2)
in fact, that's so handy, it might be worth including in this PR (with a better name of course). Though unfortunately I'm not sure there's a way to get the reverse (i.e. auto-constructor of any |
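A non-generated variant of the `nt` helper above can be sketched with reflection functions from Base. `to_namedtuple` is a hypothetical name, and the example uses the built-in `Complex` type (fields `re` and `im`) instead of defining a new struct:

```julia
# Hypothetical helper: field names from `fieldnames`, values from `getfield`.
to_namedtuple(v::T) where {T} =
    NamedTuple{fieldnames(T)}(ntuple(i -> getfield(v, i), fieldcount(T)))

p = to_namedtuple(3 + 4im)

# Going back works here only because `Complex` accepts its fields
# positionally in declaration order; that is not true of every type.
q = Complex(p...)

println(p)
```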
@quinnj Yeah, the Having said this, there is precedent for that kind of syntax in JavaScript and TypeScript, where it is called spread properties. The main use case in Query.jl for this spread syntax is in joins. You typically end up with two range variables for the two tables you join, and then you might want to construct one table out of the columns of both tables. With this spread syntax you could do this as
I guess one counter data-point is that C# and VB have had this for over a decade and I have not seen any complaints about this. And while it might not seem like a huge thing, it is really quite key for the whole LINQ design (which Query.jl copies heavily from).
The example you give seems useful, but I don't understand how it maps to the projection case in a query like statement. If we stick with the example I gave ( |
Yeah, that would work. This is also generally a larger design area where I'm just not sure what I should do in Query.jl. On some level I'd like to somehow support something similar to all the
x = {fielda=1, fieldb=2, c=3}
y = {starts_with(x, :field)...} # Creates a named tuple with fields fielda and fieldb
# Or maybe
y = {x[starts_with(:field)]...}
# One could even think about set like operations on the names
y = {x[starts_with(:field) - :fieldb]...} # would create a named tuple with fielda
But I'm not sure this is any good... In dplyr there are lots of other utility functions equivalent to |
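Name-based selection like this can also be prototyped as an ordinary function, without new syntax. A minimal sketch, where `select_startswith` is a made-up helper name and the result type depends on runtime values, so it is not type-stable:

```julia
# Hypothetical helper: keep the fields whose names start with `prefix`.
function select_startswith(nt::NamedTuple, prefix::AbstractString)
    ks = filter(k -> startswith(String(k), prefix), keys(nt))
    return NamedTuple{ks}(nt)
end

x = (fielda = 1, fieldb = 2, c = 3)
y = select_startswith(x, "field")
println(y)
```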
Oh, and one more thing: I also still have hope for #21875, so whenever I see multiple dots now, I start to wonder whether it can all be combined or not :) |
My sense is that something interval-like is the most useful application of Splatting is a good topic here. Obviously On the other hand, since That would also fix #9972, since |
The It might make sense to think where else named tuples will pop up and how often we will look at them going forward. If they are rare, maybe it doesn't matter that much if they look a bit odd, but if they will be used a lot in a lot of contexts it would be nice to have something that is aesthetically nice. One major use case might be return values of functions. Essentially I can imagine that any function that currently returns multiple values via a tuple might in the future return a named tuple instead. This would especially make sense in combination with tuple deconstruction syntax and with rest-properties-like syntax. Say we have a function like this:
optimize() = {maximum=5, opt_x=2, iterations=5, algorithm=:myown}
One way to call this is obviously |
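With the parenthesized syntax, the return-value use case looks roughly like the sketch below (the first field is spelled `maximum` here). Positional destructuring relies on a named tuple iterating its values in field order:

```julia
optimize() = (maximum = 5, opt_x = 2, iterations = 5, algorithm = :myown)

# Access by name.
res = optimize()
println(res.opt_x)

# Positional destructuring takes the leading values and ignores the rest.
best, x_at_best = optimize()
println((best, x_at_best))
```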
All of that already works with the parenthesized syntax
Or if we use |
Also: while |
45b2b81
to
d22ab50
Compare
I like
@from i in df begin
    @select (=i.a, =i.b, =i.c, =i.d, =i.e, =i.f, =i.g)
end
just is less nice than the same thing without the |
Here is another syntax idea: |
Well, inside a macro argument you can have any syntax you want :) |
Yes, that works fine. We only need a single context clue to know we're in a named tuple, so multiple Given that brackets are some of the most valuable ASCII real estate, it doesn't seem likely that we'll end up spending |
Yes, but a) it won't help with my plan to make things like this Here is another (maybe wacky) idea:
|
Another idea. Having thought a bit more about But what about putting an extra symbol in front of
@from i in df begin
    @select @(i.a, i.b)
end
Both seem ok to me. This might also nest nicely with #8470: Optically this also is more similar to normal constructor syntax, i.e. essentially the special syntax would be kind of the name of I think this would also work with the various splatting ideas, right? And one could have nice tuple and named tuple deconstruction:
@(a, b, c) = foo() # Deconstructs into a named tuple with names a, b, c
Now, if there was no opportunity cost in using So, what about the following strategy: for now use some syntax for named tuples that is not |
I can totally understand the aversion to |
I really like the idea of having |
I agree. I'm wondering if the "full" analogy between function signatures and tuples might allow for mixtures of positional and keyword arguments like |

namedtuple_fieldtype_a(x) = fieldtype(typeof(x), :a)
@test Base.return_types(namedtuple_fieldtype_a, (NamedTuple,)) == Any[Type]
@test Base.return_types(namedtuple_fieldtype_a, (typeof((b=1,a="")),)) == Any[Type{String}]
tests the secondary code path (`test(x, y) = fieldtype(typeof(x), y)`)?
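The two code paths in question differ only in whether the field name is a constant inside the function or an argument. The sketch below checks runtime results only; it does not assert anything about what inference returns, which is what the tests in the diff cover. `fieldtype_of` is a hypothetical name:

```julia
nt = (b = 1, a = "")

# Primary path: the field name is a literal inside the function body.
fieldtype_a(x) = fieldtype(typeof(x), :a)

# Secondary path: the field name (or index) arrives as an argument.
fieldtype_of(x, name) = fieldtype(typeof(x), name)

println(fieldtype_a(nt))
println(fieldtype_of(nt, :b))
```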
test/namedtuple.jl
Outdated

@test NamedTuple{(:a,:b),Tuple{Int8,Int16}}((1,2)) === (a=Int8(1), b=Int16(2))
@test convert(NamedTuple{(:a,:b),Tuple{Int8,Int16}}, (a=3,b=4)) === (a=Int8(3), b=Int16(4))
@test_throws MethodError convert(NamedTuple{(:a,:b),Tuple{Int8,Int16}}, (x=3,y=4)) === (a=Int8(3), b=Int16(4))
should test for which method throws the error, and move most of the computations (the ones that should succeed – e.g. the constructors) into a let block / outside of the try/catch block
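One possible shape for the restructured test, as a sketch: values that should construct successfully are built in a `let` block, only the failing call sits inside `try`, and the test checks which function the `MethodError` was raised for. It assumes, as the test in the diff does, that the mismatched-names `convert` fails with a `MethodError`:

```julia
using Test

let T = NamedTuple{(:a, :b), Tuple{Int8, Int16}}
    # Computations that are expected to succeed stay outside the failing call.
    good = (a = 3, b = 4)
    bad = (x = 3, y = 4)
    @test convert(T, good) === (a = Int8(3), b = Int16(4))

    # Capture the exception so the throwing function can be inspected.
    err = try
        convert(T, bad)
        nothing
    catch e
        e
    end
    @test err isa MethodError
    @test err.f === convert
end
```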
test/namedtuple.jl
Outdated
@test_throws MethodError convert(NamedTuple{(:a,:b),Tuple{Int8,Int16}}, (x=3,y=4)) === (a=Int8(3), b=Int16(4))

@test NamedTuple{(:a,:c)}((b=1,z=2,c=3,aa=4,a=5)) === (a=5, c=3)
@test NamedTuple{(:a,)}(NamedTuple{(:b,:a),Tuple{Int,Union{Int,Void}}}((1,2))) === NamedTuple{(:a,),Tuple{Union{Int,Void}}}((2,))
I'm pretty sure more spaces would never be amiss in improving readability
add a test for `(a = 1, b = 2, a = 3)` (possibly I just missed seeing this) and equivalently for `NamedTuple{(:a, :b, :a), NTuple{3, Int}}((1, 2, 3))`
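A sketch of what such a check could look like. It assumes a Julia version that rejects repeated names in both forms and deliberately catches any exception type rather than committing to a specific one. The literal is evaluated from a string so that the snippet itself still parses:

```julia
# Type-level construction with a repeated field name.
dup_type_error = try
    NamedTuple{(:a, :b, :a), NTuple{3, Int}}((1, 2, 3))
    false
catch
    true
end

# Literal syntax with a repeated field name, evaluated at run time.
dup_literal_error = try
    Core.eval(@__MODULE__, Meta.parse("(a = 1, b = 2, a = 3)"))
    false
catch
    true
end

println((dup_type_error, dup_literal_error))
```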
d08bf3b
to
6a08d12
Compare
names = Symbol[an...]
for n in bn
    if !sym_in(n, an)
        push!(names, n)
This does not preserve the order of the values. `an` and `bn` need to be switched in the implementation:
names = Symbol[]
for n in an
    if !sym_in(n, bn)
        push!(names, n)
    end
end
append!(names, bn)
return names
I think either behavior could be useful but I went with how `merge` works for ordered dicts and keyword arguments.
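The difference between the two orderings only shows up when a shared name is not in the same position on both sides. A small sketch, using hypothetical helper names and plain `in` rather than the PR's `sym_in`:

```julia
# Ordering used in the PR: keep the left operand's order, append new names.
function names_left_first(an, bn)
    names = Symbol[an...]
    for n in bn
        n in an || push!(names, n)
    end
    return names
end

# Ordering suggested in the review: names unique to the left, then all of the right.
function names_right_last(an, bn)
    names = Symbol[]
    for n in an
        n in bn || push!(names, n)
    end
    append!(names, collect(bn))
    return names
end

a = (b = 2, a = 1)
b = (b = 3, c = 4)
println(names_left_first(keys(a), keys(b)))   # [:b, :a, :c]
println(names_right_last(keys(a), keys(b)))   # [:a, :b, :c]
println(merge(a, b))                          # (b = 3, a = 1, c = 4)
```

In both orderings the value for a shared name comes from the right operand; only the position of that name differs.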
for (k,v) in itr
    oldind = get(inds, k, 0)
    if oldind > 0
        vals[oldind] = v
same here
bump :)
It seems like we've reached the point where it would be best if this was merged and then iterated upon after merging.
Based on #16580, also much work done by quinnj. `(a=1, ...)` syntax is implemented, and `(; ...)` syntax is implemented but not yet enabled.
6a08d12
to
ebfb307
Compare
CI failures look unrelated
Now we just have to wait a few days for Travis' macOS queue to get to this PR.
Thanks Jeff and Jacob and everyone else involved in getting this done, really nice feature :)
Does this make dot overloading easier to implement, or is it orthogonal? Or will named tuples have to be re-implemented once dot overloading lands?
It's orthogonal. Once we have dot overloading we'll probably re-implement named tuples to use it, but it's not necessary.
Is there a plan for Julia 0.6 support, e.g. via Compat? NamedTuples.jl could be used on 0.6, but currently it has different semantics for
I have a sense that there's enough to do for 1.0 already that backporting shouldn't be a priority. Certainly not this implementation, since it touches a fair bit of the Or better yet, use this as an excuse for trying to switch to 0.7 for real work (including package development). I know it's not easy, as I haven't succeeded in pulling this off myself. (On the majority of days over the last two months I've been trying to put in ~30 minutes on Images and its dependencies, but things keep breaking faster than I can keep up.) But it seems important since this is going to be what we use, and there are so many goodies that the potential benefits are huge.
Thanks for this! @JeffBezanson will keyword arguments be based on this? (in order to make them fast)
This adds a built-in type very similar to the type in the NamedTuples package, with the syntax `(x=1, y=2)`. Also owes quite a bit to #16580. I tried to keep this fairly minimal for now; we will probably need more functions.

The primary difference with NamedTuples is that the type itself is not used to construct a NamedTuple from field values. In general we seem to have moved to more "`convert`-like" constructors; for example `Tuple(itr)` will convert an iterable to a tuple, and `Tuple(a, b, c)` is not used to construct `(a, b, c)`. (A bit of discussion on this in #15120 and #20969.)

One big issue is that while the syntax for instances is really nice, the current (implementation-based) syntax for NamedTuple types is nasty: `NamedTuple{(:a, :b), Tuple{Int, Int}}`. Maybe we're ready to use `{a=Int, b=Int}`, and `{Int, Int}` for tuple types? Although there are still other potential uses for that syntax.

Here's a summary of the justification and thinking behind this:

Relational tables are collections of tuples. Experience with tabular data in julia has shown that the main problem with tuples is not being able to refer to components by name. NamedTuples.jl adds that ability, and has worked very well, but suffers from (1) poor syntax, (2) relying on `eval` and thus breaking precompilation and generated functions in various ways. At the same time, we need a new approach to keyword arguments, and there are obvious parallels between the requirements for keyword arguments and the tabular data use cases. Here are some design criteria:

`(a=1, b=2)` to make a keyword argument container the same way `(1, 2)` makes a positional argument container.

`a.b`, `(a=1,)`, `(; x, y)`, `(; a.x, a.b)` are all desirable for both purposes, but only really work with symbol keys. It's generally difficult to justify a container where the keys have to be symbols, but our structs already have this property, and it's also an established property of record types and object types in other languages. So everything makes sense if we just say this is an anonymous struct type, and reuse as much of that internal machinery as possible.

Covariance: we need to be able to join named tuple types and obtain types like `Tuple{Union{T,Null}, Union{T,Null}}` instead of producing a union with 2^n components. Adding another built-in type that shares infrastructure with tuple types is by far the easiest way to get this.

EDIT: Covariance is not essential, and may in fact be a bad idea. With invariance, it's possible for a table type that knows it has nullable columns to specifically construct named tuples with `Union{T,Null}`-typed fields and have that be a concrete type, which avoids blowup in the number of named tuple types in at least some cases.

There are different perspectives on dict-like types: sometimes they are more like relations, containing pairs, sometimes they are primarily collections of keys, and sometimes they are primarily collections of values. Iteration cannot do all of these at once; indeed, iteration has little to do with the concept of "keys". This becomes an issue for splatting during named tuple construction. So far, the interface to key-value splatting has been iterating pairs. But iteration isn't the whole story, since we also compute a merge (union-join) of the splatted pairs. Therefore it makes sense to use a slightly different protocol, as `merge` is more intrinsically tied to keyed collections than iteration is. So here I've made `merge(::NamedTuple, other)` the protocol for splatting in named tuples. There is a fallback definition for `merge` that expects an iterator of pairs, operating like we do now for e.g. keyword arguments. I think this makes sense: if an associative collection iterates pairs it doesn't have to do anything special, but if it wants to iterate keys or values instead, it can still participate by implementing `merge`.
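A short sketch of the splatting protocol as it behaves in released Julia 1.x (an assumption; at the time of this PR the `(; ...)` form was implemented but not yet enabled). Anything that iterates `Symbol => value` pairs goes through the fallback `merge`:

```julia
nt = (a = 1, b = 2)

# Fallback `merge`: the second argument is any iterator of pairs.
from_pairs = merge(nt, [:b => 20, :c => 30])

# Splatting after `;` goes through the same protocol; a Dict iterates pairs.
from_dict = (; nt..., Dict(:z => 0)...)

println(from_pairs)
println(from_dict)
```

A collection that iterates keys or values instead of pairs would need its own `merge(::NamedTuple, ::ItsType)` method to take part in splatting.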