
Dict-like interface isn't looking good! #102

Open
buriy opened this issue Jul 2, 2019 · 7 comments

buriy commented Jul 2, 2019

first_batch = x[{"batch": 1}]
three_examples = x[{"batch": slice(1, 4)}]

It doesn't look good, gives no IDE hints, and constructs a new dict every time.

Would you consider pandas-like interface instead?
x[x.batch==1] and x[(x.batch>=1) & (x.batch<4)] syntax looks much better and allows for IDE hints.
x.batch[1] and x.batch[1:4] can also be used for this specific case -- that would be amazing!
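
For illustration, here is a minimal sketch of how such a per-dimension accessor could be wired up. NamedTensor and DimAccessor below are hypothetical stand-ins, not the library's actual classes:

import torch

class DimAccessor:
    # returned by x.batch; indexing it slices along that one named dimension
    def __init__(self, tensor, name):
        self._tensor = tensor
        self._name = name

    def __getitem__(self, index):
        return self._tensor.index_dim(self._name, index)

class NamedTensor:
    def __init__(self, values, names):
        self.values = values        # underlying torch.Tensor
        self.names = tuple(names)   # one name per dimension

    def index_dim(self, name, index):
        dim = self.names.index(name)
        selector = [slice(None)] * self.values.dim()
        selector[dim] = index
        new_values = self.values[tuple(selector)]
        # an integer index drops the dimension, a slice keeps it
        new_names = (self.names if isinstance(index, slice)
                     else self.names[:dim] + self.names[dim + 1:])
        return NamedTensor(new_values, new_names)

    def __getattr__(self, name):
        if name in self.__dict__.get("names", ()):
            return DimAccessor(self, name)
        raise AttributeError(name)

x = NamedTensor(torch.randn(8, 10, 3), names=("batch", "time", "channel"))
first_batch = x.batch[1]        # same result as x[{"batch": 1}]
three_examples = x.batch[1:4]   # same result as x[{"batch": slice(1, 4)}]

An editor can only autocomplete x.batch if the names are declared somewhere statically, which is exactly the IDE-hint question.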

@EndingCredits

I was just about to make a separate issue for this, but I'll roll it into this one.

I have been thinking about this issue a bit and this is the solution I've come up with:

Method 1:
We set an order for slicing and then use regular positional slicing. N.B.: the order is stored as a variable on the named tensor; we don't do any transposing of the underlying tensor.

e.g. We can set an order by using t['name1','name2','name3'] and then access using t[a, b, c], which returns the equivalent of t[{'name1': a, 'name2': b, 'name3': c}].

This can be used in a number of ways:

  • t1 = t['name1','name2','name3'][a, b, c]
  • t1 = t['name1','name2','name3', a, b, c]
  • t1 = t['name1', a, 'name2', b, 'name3', c]

Plus we can set once and then treat our tensor as a normal tensor for a while:

t = t['name1','name2','name3']
t1 = t[a, b, c]
t2 = t[d, e, f]

Or maybe something like:

with t.order('name1','name2','name3'):
    t1 = t[a, b, c]
    t2 = t[d, e, f]
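
A rough sketch of how the order-then-index parsing might look, delegating to the existing dict-style indexing (OrderedView is a hypothetical name, not part of the library):

class OrderedView:
    def __init__(self, tensor, order=None):
        self.tensor = tensor   # the underlying named tensor
        self.order = order     # tuple of dimension names, or None

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        if all(isinstance(k, str) for k in key):
            # t['name1', 'name2', 'name3'] just records the order
            return OrderedView(self.tensor, order=key)
        # t[a, b, c] pairs positional indices with the recorded order
        assert self.order is not None and len(key) <= len(self.order)
        return self.tensor[{name: idx for name, idx in zip(self.order, key)}]

The mixed forms (names and indices interleaved) would need a slightly more involved scan over the key tuple, but the idea is the same.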

Method 2:

We abuse slicing notation and allow:
t1 = t['name1': x, 'name2': y:z, 'name3': w]
In this case we have three slices where the start member of each slice is set to the name string. This frees up the stop and step parts of the slice for our use, which we can treat as the start and stop of the actual slice. This does remove the ability to use a step component, although that currently isn't implemented anyway and can be achieved through other means.

Example:
assert t['name1': 3, 'name2': 2:5, 'name3': -1] == t[{'name1': 3, 'name2': slice(2,5), 'name3': -1}]
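
For concreteness, a sketch of how __getitem__ could decode that form into the existing dict-style call; parse_name_slices is just an illustrative helper name:

def parse_name_slices(key):
    # each item arrives as slice(name, stop, step): the string start is the
    # dimension name, stop/step carry the single index or the (start, stop) range
    key = key if isinstance(key, tuple) else (key,)
    spec = {}
    for item in key:
        assert isinstance(item, slice) and isinstance(item.start, str)
        if item.step is None:
            spec[item.start] = item.stop                    # 'name1': 3
        else:
            spec[item.start] = slice(item.stop, item.step)  # 'name2': 2:5
    return spec

# t['name1': 3, 'name2': 2:5, 'name3': -1] reaches __getitem__ as this tuple:
key = (slice('name1', 3, None), slice('name2', 2, 5), slice('name3', -1, None))
assert parse_name_slices(key) == {'name1': 3, 'name2': slice(2, 5), 'name3': -1}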

Both these methods can be used together, although there's some question about certain minor details of how this would work, e.g. what the behaviour of t['name1', 'name2', 1, 'name3': 3, 2] should be.

If you want to have a play around, I have a basic implementation here:
https://github.com/EndingCredits/namedtensor


buriy commented Jul 5, 2019

@EndingCredits
Using raw names still isn't good.
You seem to have ignored all the important points in my message.
Someone will write t['neme1'] and it won't work, because it's t['name1'].
t[t.name1] is much better, and allows operations like t[t.name1>5] or t.name1[1:5].
And your t['name1': 3, 'name2': 2:5, 'name3': -1] isn't even valid Python code (list indices must be integers or slices, not tuple).
There are two good sources to look for similar well-known solutions:
Django API and SQLAlchemy API. Pandas used SQLAlchemy API (and did it lousily).

@EndingCredits

I wasn't really commenting on your suggestion, just rolling the discussion into one thread.

Also, looks fine to me


buriy commented Jul 6, 2019

@EndingCredits it's much better when the linter/compiler warns you about this than finding out at run-time.


srush commented Jul 7, 2019

Thanks for the suggestions, this is helpful and I hear the two different approaches.

I agree the dict construction approach is a bit heavyweight, particularly in loops. I need to look into whether that is actually a problem from a memory standpoint, though.

I see why the x.batch approach is appealing in an editor (if you somehow declare the names). However, [x.batch == 1] seems like a strange way to do constant-time indexing, unless you see that being lazy somehow.


buriy commented Jul 8, 2019

@srush for that you might have x.batch[1] as well, and anyway it's no worse than x[{"batch": 1}].

@EndingCredits

Memory-wise I imagine the overhead of constructing dicts is negligible compared to the actual tensor operations, unless somehow a tensor itself is being copied.
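
A quick micro-benchmark (sketch only; exact numbers depend on the machine) is enough to sanity-check that:

import timeit
import torch

x = torch.randn(64, 128, 300)

dict_cost = timeit.timeit(lambda: {"batch": 1}, number=100_000)
index_cost = timeit.timeit(lambda: x[1], number=100_000)
print(f"dict literal:    {dict_cost:.4f}s per 100k calls")
print(f"tensor indexing: {index_cost:.4f}s per 100k calls")

# the dict literal is typically a small fraction of even a basic indexing call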

More importantly, the current way of doing things is just very clunky and a bit unintuitive. I feel any of the methods above would be a big improvement in usability.
