Communicate if data is sampled to the front end somehow #6

Open
holdenk opened this issue Dec 22, 2017 · 4 comments

@holdenk
Collaborator

holdenk commented Dec 22, 2017

cc @rgbkrk: is there a reasonable way to expose this in the current JSON format?

@rgbkrk
Member

rgbkrk commented Dec 22, 2017

The formal spec for Data Resource has a metadata field which can contain any number of new properties. We'll probably want to settle on our own format for conveying the caveats to the frontend: that we're sampling, how many samples are shown, and how many (estimated) records aren't shown or available. We'd also want something similar when someone does a `.head()` in Spark or pandas, to show that only N records are listed.
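To make those caveats concrete, here is a minimal sketch of a Data Resource-style payload that carries them. Only `name`, `schema`/`fields`, and inline `data` come from the Data Resource and Table Schema specs; the `sampling` property, its keys (`method`, `shown`, `total_estimate`, `not_shown_estimate`), the `sampled_resource` helper, and the example rows are hypothetical names chosen for illustration. Where such a block should actually live is exactly the open question in this thread.

```python
import json


def sampled_resource(name, fields, rows, method, total_rows):
    """Build a Data Resource-style dict whose rows are a subset of a larger table.

    The "sampling" block and all of its keys are hypothetical; they are not
    part of the Data Resource spec.
    """
    shown = len(rows)
    return {
        "name": name,
        "schema": {"fields": fields},
        "data": rows,  # inline data, as Data Resource allows
        # Hypothetical caveats for the frontend: how the subset was taken,
        # how many records are shown, and roughly how many are not.
        "sampling": {
            "method": method,  # e.g. "sample" for df.sample(n), "head" for df.head(n)
            "shown": shown,
            "total_estimate": total_rows,
            "not_shown_estimate": max(total_rows - shown, 0),
        },
    }


resource = sampled_resource(
    name="events",
    fields=[{"name": "id", "type": "integer"}],
    rows=[{"id": 3}, {"id": 17}, {"id": 42}],
    method="sample",
    total_rows=1000,
)
print(json.dumps(resource["sampling"]))
```

The same structure covers the `.head()` case by passing `method="head"`, so the frontend can tell "first N rows" apart from "random N rows".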

@rufuspollock @pwalsh: what would be the best way to signify sampling (or otherwise) in Data Resource JSON? We're totally OK with it being in metadata; we'll just want to at least semi-standardize it for Spark, pandas, and other library <--> Jupyter communication.

@rgbkrk
Member

rgbkrk commented Dec 22, 2017

It just occurred to me that we could indicate sampled data in another way: metadata on the displayed result.

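```python
from IPython.display import JSON, display

# Display metadata is keyed by MIME type; this asks the frontend to render the JSON tree expanded
display(
    JSON({'a': 'b'}),
    metadata={
        "application/json": {
            "expanded": True
        }
    }
)
```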


We'd have to do the same for a data resource object, which would mean something like this:

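```python
import pandas as pd
from IPython.display import display

# Make pandas include the application/vnd.dataresource+json MIME type in its output
pd.set_option("display.html.table_schema", True)
df = pd.DataFrame({"x": range(100)})  # example data standing in for a real frame
# Attach a hint that the displayed rows are a 6-row sample
display(
    df.sample(6),
    metadata={
        "application/vnd.dataresource+json": {
            'samples': 6
        }
    }
)
```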
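On the receiving side, a frontend would look the hint up under the same MIME type in the message's metadata. The sketch below is written in Python for consistency with the rest of this thread (a real frontend would be JavaScript). It assumes the content dict follows the Jupyter messaging protocol's `display_data` shape (`data`, `metadata`, `transient`) and reads the `samples` key proposed above, which is not a published standard; `sampling_caption` and its caption wording are hypothetical.

```python
DATARESOURCE_MIME = "application/vnd.dataresource+json"


def sampling_caption(content):
    """Return a caveat string for a display_data message, or None if unsampled.

    `content` has the shape of a Jupyter display_data message's content:
    {"data": {...}, "metadata": {...}, "transient": {...}}.
    The "samples" key is the proposal from this thread, not a standard.
    """
    # Only annotate outputs that actually carry a data resource
    if DATARESOURCE_MIME not in content.get("data", {}):
        return None
    # Metadata is keyed by MIME type, mirroring the data bundle
    hints = content.get("metadata", {}).get(DATARESOURCE_MIME, {})
    samples = hints.get("samples")
    if samples is None:
        return None
    return f"Showing a random sample of {samples} rows"


message_content = {
    "data": {DATARESOURCE_MIME: {"schema": {"fields": []}, "data": []}},
    "metadata": {DATARESOURCE_MIME: {"samples": 6}},
    "transient": {},
}
print(sampling_caption(message_content))
```

Because a missing hint returns `None`, frontends that see unannotated output simply render the table as usual.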

Either way, it's a contract with the frontend about how we indicate this, so I'd like to version it somehow.
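One possible way to version the contract is to put a version string inside the per-MIME metadata and have the frontend ignore hints whose major version it doesn't recognize, falling back to plain rendering. This is a minimal sketch under that assumption; the `version` key, the "major.minor" string format, `SUPPORTED_MAJOR`, and both helper functions are hypothetical.

```python
SUPPORTED_MAJOR = 1  # hypothetical: the only major version this frontend understands


def make_sampling_hint(samples, total_estimate=None, version="1.0"):
    """Producer side: build a hypothetical versioned sampling hint."""
    hint = {"version": version, "samples": samples}
    if total_estimate is not None:
        hint["total_estimate"] = total_estimate
    return hint


def read_sampling_hint(hint):
    """Consumer side: return the hint if its major version is supported, else None.

    Returning None lets the frontend fall back to plain rendering instead of
    misreading keys from a newer or unversioned format.
    """
    try:
        major = int(str(hint.get("version", "")).split(".")[0])
    except ValueError:
        return None  # missing or malformed version
    return hint if major == SUPPORTED_MAJOR else None


metadata = {
    "application/vnd.dataresource+json": make_sampling_hint(6, total_estimate=1000)
}
print(read_sampling_hint(metadata["application/vnd.dataresource+json"]))
```

The resulting `metadata` dict is what would be passed as `display(df.sample(6), metadata=metadata)`; bumping the major version then signals a breaking change to older frontends.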

@rufuspollock

@rgbkrk, this is for the situation where the data in the data resource was sampled from somewhere else and you want to indicate that it was sampled, is that right?

If so, I think this may be a reasonably common use case (I've certainly come across it myself), in which case it would be great if you could take a minute to open an issue on https://github.com/frictionlessdata/specs/issues so that other folks see this and we can come up with a simple pattern.

@rgbkrk
Member

rgbkrk commented Dec 23, 2017

OK, cool. For the time being, @holdenk, if you pick something for IPython's display metadata field, we can start using it to prototype on the UI side.


3 participants