Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data getter #98

Merged
merged 10 commits into from
Feb 7, 2015
Merged

Conversation

industrial-sloth
Copy link
Contributor

This adds support for getting individual records or sets of records from Data subtypes.

Three methods are added, together with __getitem__ (brackets notation) as a syntactic convenience. The methods are:

  • get: Returns a single record value, specified by record key (example: series.get((12,24,36)), returns series value at (12, 24, 36))
  • getAll: Returns multiple record values, one for each key passed (example: images.getAll([2, 4, 6]) returns a sequence of image arrays from time points 2, 4, and 6.
  • getRange: returns multiple record values specified by slices (example: series.getRange([slice(1), slice(2,3)] returns the single series value with key (0, 2).
  • brackets notation delegates either to get or to getRange depending on whether a single key or a range of keys are passed (example: series[:3,2:3] returns values of keys (0,2), (1,2), (2,2)).

@industrial-sloth
Copy link
Contributor Author

Hmmm, no sooner do I create the PR than I realize what I'm missing - need to add support for a mixture of slices and individual indices in getRange (and the brackets accessor). Coming right up... also apparently I need to resolve a merge conflict...

return True

if not hasattr(sliceOrSlices, '__len__'):
# make my func the...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand this note =)

@freeman-lab
Copy link
Member

@industrial-sloth this is really nice! It's looking great, some minor comments, main one is about possibly reorganizing the tests.

else:
return filteredVals

def getAll(self, keys):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to getMany to avoid confusion with collect

Apparently there are those in the world who do not immediately
recognize a George Clinton / Parliament reference.
Check that requested key is of appropriate arity for RDD keys,
based on a call to rdd.first() to get example key.
Previously threw exceptions. This differs from python dict or
sequence behavior but is now consistent with get and getRange.
@industrial-sloth
Copy link
Contributor Author

Ok, have updated this PR based on offline conversations with @freeman-lab.
Changes:

  • tests are now in a new test_data.py file rather than test_images and test_series
  • the various get methods now call first in order to perform type checking against actual RDD keys, throwing exceptions early if the queried key doesn't match the arity of the actual keys
  • getAll has been renamed to getMany
  • the brackets syntax for the getter methods now returns None or [] when queries fail to return any values, rather than throwing KeyError or IndexError.

@freeman-lab
Copy link
Member

Thanks @industrial-sloth ! LGTM, merging it in.

freeman-lab added a commit that referenced this pull request Feb 7, 2015
@freeman-lab freeman-lab merged commit 1d39057 into thunder-project:master Feb 7, 2015
@industrial-sloth industrial-sloth deleted the data_getter branch February 12, 2015 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants