`FeatureStore` abstraction definition #4534
Codecov Report
@@ Coverage Diff @@
## master #4534 +/- ##
==========================================
+ Coverage 82.71% 82.77% +0.05%
==========================================
Files 313 315 +2
Lines 16361 16512 +151
==========================================
+ Hits 13533 13667 +134
- Misses 2828 2845 +17
I put two thoughts in the code. I think what we wrote is cool and Pythonic, but it is not easy to follow when reading this code for the first time.
I don't have a strong opinion on changing the style now, but I want to hear people's thoughts.
Thanks for the PR. The nesting of `TensorAttr` looks pretty cool. Please see my comments below (especially about some confusion I have regarding `index`). I am not really sure why we store `index` in the first place. It looks to be because we require `TensorAttr` to be fully specified. On the other hand, it doesn't make much sense to require it in case the user wants to access the full matrix. Originally, I would have thought that it makes sense to include the `tensor` inside `TensorAttr`, and move any `index` logic to `TensorAttr` instead:
    class FeatureStore:
        def put_tensor(self, tensor, name, group):
            self._put_tensor(TensorAttr(tensor, name, group))
where `TensorAttr` manages any `index` logic:
    class TensorAttr:
        def __getitem__(self, idx: IndexType):
            return self.tensor[idx]

        def __call__(self):
            return self.tensor
This would then result in an API similar to
x_full = store['x', 'paper']()
x_selected = store['x', 'paper'][index]
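A minimal runnable sketch of the API proposed in this comment, for illustration only. It assumes a dict-backed store keyed by `(name, group)`, uses a plain Python list in place of a tensor, and adds a hypothetical `__getitem__` on the store so that the two access patterns work. None of these details are confirmed by the PR itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class TensorAttr:
    """Hypothetical holder bundling a tensor with its name and group."""
    tensor: Any
    name: str
    group: str

    def __getitem__(self, idx):
        # Any index logic lives here rather than in the store.
        return self.tensor[idx]

    def __call__(self):
        # Calling the attribute returns the full tensor.
        return self.tensor


class FeatureStore:
    def __init__(self):
        # Assumption: an in-memory dict keyed by (name, group).
        self._attrs: Dict[Tuple[str, str], TensorAttr] = {}

    def put_tensor(self, tensor, name, group):
        self._put_tensor(TensorAttr(tensor, name, group))

    def _put_tensor(self, attr: TensorAttr):
        self._attrs[(attr.name, attr.group)] = attr

    def __getitem__(self, key: Tuple[str, str]) -> TensorAttr:
        # Hypothetical lookup; returns the TensorAttr, not the tensor.
        name, group = key
        return self._attrs[(name, group)]


store = FeatureStore()
store.put_tensor([10, 11, 12, 13], 'x', 'paper')  # list stands in for a tensor
x_full = store['x', 'paper']()
x_selected = store['x', 'paper'][1:3]
```

In this sketch the store never sees an index: it only resolves `(name, group)` to a `TensorAttr`, and the caller decides whether to take the full tensor or a slice.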
I also agree with @yaoyaowd that it is pretty hard to understand on first read (although `BaseStorage` has the same problem).
There exist two other alternatives to handle
Two additional thoughts:
@rusty1s is the I also feel directly use And if necessary, we define I think it is a better idea to simplify the interface now and iterate fast to get back additional features we need.
@yaoyaowd @rusty1s thank you both for the great discussion. Let me summarize some thoughts and the resulting design changes.
I think this proposal will address many of the main design decisions we discussed above and result in a streamlined, Pythonic API. Please let me know if you have any comments or thoughts!
I like it very much. Whenever something is not fully specified, we return a
@rusty1s @yaoyaowd changes have been made to align with the discussion above. I've tried to add clear documentation for all of the builtins to improve code readability, and have also added tests that showcase functionality. @rusty1s, both points you listed above regarding the Let me know what you think!
This looks way better now. Thanks for updating. Left a bunch of comments nonetheless :P
        key = self._attr_cls.cast(key)
        self.put_tensor(value, key)

    def __getitem__(self, key: TensorAttr):
`getattr` and `setattr` equivalents?
I don't think a `FeatureStore` should implement `getattr` and `setattr`; imo, these should only be implemented for views on the store. I don't think it's particularly clean to have
    store.group_name -> AttrView(group_name=group_name)
as this syntax seems more confusing to me than clarifying.
I thought it makes sense to implement this for stores without group names (like PyG `data` objects). We could require that the output is fully specified in case we allow it.
Isn't the only way for an output to be fully specified through `getattr` to allow for chaining, which necessitates that `getattr` can return an `AttrView`? That feels odd to me.
If `TensorAttr` sets both `group_name` and `index` to `None` by default, then `store.{attr_name}` should give you a tensor. I am okay with leaving this out for now.
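To make this suggestion concrete, here is a hedged sketch of attribute-style access on a store without group names. The class name `FlatStore`, the field order of `TensorAttr`, and the dict-backed storage are all assumptions made for illustration; the thread leaves this feature out, so this is not the merged behavior. A plain list stands in for a tensor.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class TensorAttr:
    # Assumed field order; group_name and index default to None.
    attr_name: str
    group_name: Optional[str] = None
    index: Optional[Any] = None


class FlatStore:
    """Hypothetical store whose tensors have no group name."""

    def __init__(self):
        self._data = {}

    def put_tensor(self, tensor, attr: TensorAttr):
        self._data[(attr.group_name, attr.attr_name)] = tensor

    def get_tensor(self, attr: TensorAttr):
        tensor = self._data[(attr.group_name, attr.attr_name)]
        # With index=None the full tensor is returned.
        return tensor if attr.index is None else tensor[attr.index]

    def __getattr__(self, attr_name: str):
        # Only reached when normal attribute lookup fails.
        if attr_name.startswith('_'):
            raise AttributeError(attr_name)
        try:
            # TensorAttr(attr_name) counts as fully specified here because
            # group_name and index both default to None.
            return self.get_tensor(TensorAttr(attr_name))
        except KeyError:
            raise AttributeError(attr_name) from None


flat = FlatStore()
flat.put_tensor([1, 2, 3], TensorAttr('x'))
x = flat.x  # full tensor via attribute access
x_head = flat.get_tensor(TensorAttr('x', index=slice(0, 2)))
```

Because both optional fields default to `None`, a bare attribute name is enough to identify a tensor, so `__getattr__` can return the tensor directly and never needs to hand back a view for chaining.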
@rusty1s as always, thank you for the detailed comments :) Cleaned up the interface and builtins more. A couple general notes I wanted to share:
Thanks for all the updates :)
Removes `TensorAttr.fully_specify` which was originally added in pyg-team#4534. --------- Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
Defines a `FeatureStore` abstraction and associated tests as a first step to allow for independent scale-out for a graph's features and its `edge_index`. A roadmap for remaining features and implementation details will follow shortly. More details:
- `FeatureStore(MutableMapping)` with basic CRUD operations as abstract methods as well as support for advanced indexing
- `TensorAttr` and `AttrView` as classes that aid in accessing, modifying, and viewing elements in a feature store
- `CastMixin` for straightforward `TensorAttr` creation
- `MyFeatureStore` along with associated tests to showcase API and indexing functionality
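The pieces listed above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: `DictFeatureStore`, the `remove_tensor` name, the field order of `TensorAttr`, and the exact casting rules in `cast` are assumptions, `AttrView` is omitted for brevity, and a plain list stands in for a tensor. Only `_attr_cls.cast(key)` followed by `put_tensor(value, key)` is taken from the diff quoted in the review.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass
from typing import Any, Optional


class CastMixin:
    @classmethod
    def cast(cls, *args, **kwargs):
        # Assumed rules: pass instances through, unpack a tuple or dict.
        if len(args) == 1 and not kwargs:
            elem = args[0]
            if isinstance(elem, cls):
                return elem
            if isinstance(elem, tuple):
                return cls(*elem)
            if isinstance(elem, dict):
                return cls(**elem)
        return cls(*args, **kwargs)


@dataclass
class TensorAttr(CastMixin):
    # Assumed field order, chosen to match store['x', 'paper'] in the thread.
    attr_name: Optional[str] = None
    group_name: Optional[str] = None
    index: Optional[Any] = None


class DictFeatureStore(MutableMapping):
    """Hypothetical in-memory store with dict-like CRUD access."""
    _attr_cls = TensorAttr

    def __init__(self):
        self._store = {}

    def put_tensor(self, tensor, attr):
        self._store[(attr.attr_name, attr.group_name)] = tensor

    def get_tensor(self, attr):
        tensor = self._store[(attr.attr_name, attr.group_name)]
        return tensor if attr.index is None else tensor[attr.index]

    def remove_tensor(self, attr):
        del self._store[(attr.attr_name, attr.group_name)]

    def __setitem__(self, key, value):
        key = self._attr_cls.cast(key)
        self.put_tensor(value, key)

    def __getitem__(self, key):
        return self.get_tensor(self._attr_cls.cast(key))

    def __delitem__(self, key):
        self.remove_tensor(self._attr_cls.cast(key))

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


fs = DictFeatureStore()
fs['x', 'paper'] = [0, 1, 2, 3]                 # create
full = fs['x', 'paper']                         # read the full tensor
part = fs['x', 'paper', slice(1, 3)]            # read with an index
same = fs[TensorAttr('x', 'paper')]             # explicit TensorAttr key
```

The casting step is what lets a tuple, a dict, or an explicit `TensorAttr` all act as keys, so the dict-style builtins stay thin wrappers over the tensor-level methods.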