Spec for model/cube #854

pwalsh · 2015-12-30T10:00:46Z

Fiscal Data Package has a mapping object. This is very handy for building a logical model out of the physical data sources when appropriate. This logical model can in turn be used to automate visualisations and data loaders, for example.

Actually, there is nothing particularly "Fiscal" about this mapping: it is simply an OLAP cube implementation with measures and dimensions. I think we could extract out the generic pattern and expose it as a spec for declaring a model/cube mapping for any tabular data package.
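As a rough illustration of the generic pattern, the sketch below declares measures and dimensions over the fields of a tabular resource and checks that each one points at a real column. The `model` key, its layout, and the `check_model` helper are assumptions made for this example, loosely inspired by the FDP mapping; they are not part of any published spec.

```python
# Hypothetical descriptor: the "model" key and its layout are illustrative only.
descriptor = {
    "name": "budget",
    "schema": {
        "fields": [
            {"name": "year", "type": "integer"},
            {"name": "department", "type": "string"},
            {"name": "amount", "type": "number"},
        ]
    },
    "model": {
        "measures": {"amount": {"source": "amount"}},
        "dimensions": {
            "time": {"attributes": {"year": {"source": "year"}}},
            "entity": {"attributes": {"department": {"source": "department"}}},
        },
    },
}


def check_model(descriptor):
    """Return the source columns named by the model that are missing from the schema."""
    columns = {field["name"] for field in descriptor["schema"]["fields"]}
    model = descriptor["model"]
    sources = [m["source"] for m in model["measures"].values()]
    for dimension in model["dimensions"].values():
        sources += [a["source"] for a in dimension["attributes"].values()]
    return sorted(set(sources) - columns)


missing = check_model(descriptor)  # an empty list means every source resolves
```

Nothing here is specific to fiscal data: the same check would apply to any tabular data package that carries such a model.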

danfowler · 2016-01-04T23:02:52Z

+1 I was thinking in a similar direction (that the mapping/model of FDP should be made generic) when I posted this comment.

s-celles · 2016-02-29T16:01:39Z

+1 also see https://discuss.okfn.org/t/datapackage-for-3-dimensional-arrays-and-maybe-more/2107/2

pwalsh · 2016-07-12T09:00:10Z

@rgrp and all

I'm into this idea of course, but I'd rather see how this plays out with a views spec. Happy to leave this open for a while though while views gets worked on.

rufuspollock · 2016-08-11T14:34:35Z

WONTFIX. I'm going to close as wontfix for now and we can re-open if there is interest / need.

danfowler · 2017-02-07T21:28:37Z

I know this is closed, but I just came across this which seems relevant:

https://json-stat.org/

> The JSON-stat format is a simple lightweight JSON format for data dissemination. It is based in a cube model that arises from the evidence that the most common form of data dissemination is the tabular form. In this cube model, datasets are organized in dimensions. Dimensions are organized in categories.

rufuspollock · 2017-02-08T00:44:20Z

@danfowler thanks - and I am aware of them (I think this started out as a simple version of SDMX).

rufuspollock · 2017-02-08T00:45:42Z

Re-opening. @pwalsh and I have discussed this recently; there is clear interest here and we'd like to start something in the nearish future.

/cc @ericbusboom

ericbusboom · 2017-02-13T19:30:54Z

I've been going through the JSON-stat website, and so far, I'm pretty sure that I don't understand it at all, and that none of my analysts would be able to create a JSON-stat file by hand. I can tell that the format depends on several array properties all having the same length, basically breaking a conceptual object into separate fields, which seems like a maintenance nightmare. There is plenty to learn from here, but I don't think it is a good model for a design.

For my users, my top requirement is that it is easy to create and read the specifications. I want data creators to be able to annotate measures and dimensions from memory, with very little training. Data users must be able to understand the annotations with no training.

I have a strong preference for embedding the measure and dimension classifications into the schema, because it's easier to create and read. This can be as simple as:

1. Defining names in a taxonomy for the types of measures and dimensions
2. Attaching the names to columns in the existing schema

I imagine the names being mostly common terms like "dollars" or "yen" or "weight" or "sex".

I'd further propose that the names have a hierarchical structure to them, to allow for specification and extension. For instance 'weight/lbs' vs 'weight/kg' to distinguish units, or 'race/omb' vs 'race/census' to distinguish between different systems of standards for race.

But, it should also be possible for the user to annotate a column with just "weight." That's not ideal, but I've learned that getting 20% is better than getting 0%.
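A minimal sketch of what this could look like, assuming a small taxonomy keyed by top-level term and a per-field property holding the name. The `semantic` property, the `TAXONOMY` table, and the role assigned to each term are all assumptions for illustration, not an existing Table Schema feature.

```python
# Hypothetical taxonomy: top-level term -> role. The terms are the examples
# mentioned above; the roles assigned to them are assumed.
TAXONOMY = {
    "dollars": "measure",
    "yen": "measure",
    "weight": "measure",
    "sex": "dimension",
    "race": "dimension",
}


def classify(name):
    """Split a name like 'weight/lbs' into (role, term, qualifier)."""
    term, _, qualifier = name.partition("/")
    role = TAXONOMY.get(term)  # None when the term is not in the taxonomy
    return role, term, qualifier or None


# "semantic" is an assumed field property, not part of Table Schema.
fields = [
    {"name": "wt", "type": "number", "semantic": "weight/kg"},
    {"name": "race", "type": "string", "semantic": "race/omb"},
    {"name": "wt_raw", "type": "number", "semantic": "weight"},  # bare term still works
]

roles = {f["name"]: classify(f["semantic"]) for f in fields}
```

The bare "weight" annotation still yields a role; it just carries no qualifier, so a consumer would know the column is a measure without knowing its unit.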

I'd further propose that the names be linked to JSON definitions that can be inlined or well-known. So "race/omb" may have an associated JSON file, possibly similar to the existing JSON-stat or Fiscal Data Package forms. Then, perhaps, users could also define their own term 'race/orgname' and include their own definitions in the package.
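One way the lookup could work, sketched under assumptions: definitions inlined in the package win over well-known ones, and an unknown qualified name falls back to its parent term. The `WELL_KNOWN` registry, its placeholder contents, the `resolve` helper, and the fallback rule itself are hypothetical.

```python
# Hypothetical registry of well-known definitions; the contents are placeholders.
WELL_KNOWN = {
    "race": {"title": "Race"},
    "race/omb": {"title": "Race (OMB categories)"},
}


def resolve(name, inline=None):
    """Find a definition for `name`, preferring inline ones, then parent terms."""
    inline = inline or {}
    parts = name.split("/")
    # Try the full name first ('race/orgname'), then each parent ('race').
    while parts:
        key = "/".join(parts)
        for source in (inline, WELL_KNOWN):
            if key in source:
                return key, source[key]
        parts.pop()
    return None, None


# Definitions a package might ship for its own term.
package_definitions = {"race/orgname": {"title": "Race (in-house categories)"}}

matched, definition = resolve("race/orgname", package_definitions)
```

Without the inline definitions, the same call would resolve to the generic "race" entry rather than failing outright.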

I don't (currently) have strong opinions about the structure of the definitions for the names -- the Fiscal Data Package definitions seem suitably extensible and generalizable -- since the definitions would mostly be created by experts.

However, I am strongly opinionated that the typical user should be able to annotate the dataset with nothing more than applying a measure/dimension name to a column in the existing schema, and those names should be familiar and easy to memorize.

For reference, here are the inputs and outputs of the annotation system I'd produced before. This one has a rich datatype field (rather than a separate field for the measure/dimension annotation), and a parent connection to link columns. The measure/dimension classification is inherent in the rich datatype; "count" is always a measure, "raceeth" is always a dimension. Here is a schema file:

http://test.docker1.civicknowledge.com/bundles/d04p006/file/schema.csv

And here is what the file looks like when rendered for the web:

http://test.docker1.civicknowledge.com/partitions/p04p00f006

As with Tableau, dimensions are green and measures are blue. Errors and uncertainties are grey. Indentation represents parent/child relationships.
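The idea of a rich datatype that implies the role, plus a parent link between columns, can be sketched as follows. This is not the Ambry implementation: the column names, the `margin` datatype, and the `render` helper are invented for illustration, and only "count" and "raceeth" come from the description above.

```python
# Role implied by each rich datatype. "count" and "raceeth" follow the text above;
# "margin" as an error/uncertainty type is an assumed name.
ROLE_BY_DATATYPE = {"count": "measure", "raceeth": "dimension", "margin": "error"}

# Hypothetical schema rows: (column name, rich datatype, parent column or None).
columns = [
    ("raceeth", "raceeth", None),
    ("population", "count", None),
    ("population_moe", "margin", "population"),
]


def render(columns):
    """Return one line per column, indenting children under their parent."""
    children = {}
    for name, datatype, parent in columns:
        children.setdefault(parent, []).append((name, datatype))

    lines = []

    def walk(parent, depth):
        for name, datatype in children.get(parent, []):
            role = ROLE_BY_DATATYPE.get(datatype, "unknown")
            lines.append("  " * depth + f"{name} [{role}]")
            walk(name, depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render(columns)))
```

A web view could map the bracketed role to a colour instead of printing it, with indentation carrying the parent/child relationship.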

pwalsh · 2017-02-13T20:17:03Z

so, now actually reopening, and also ref. frictionlessdata/datapackage#343

pwalsh · 2017-02-15T06:37:50Z

@ericbusboom

I am quite sure I completely get what you want here, and it is very much in line with where I think we need to go to generalise this out of our previous work on FDP.

One question: you say

> But, it should also be possible for the user to annotate a column with just "weight." That's not ideal, but I've learned that getting 20% is better than getting 0%.

Which user? Someone who edits a descriptor file directly, so, someone comfortable with text editing a JSON file?

I ask because I want to distinguish between a canonical representation of something on the descriptor, and an ideal user experience for "end users" who might generate a descriptor via a series of actions.

OpenSpending currently supports a customisation to FDP (unspecified as yet) which does such annotations per field.

ericbusboom · 2017-02-15T20:10:54Z

Ah, good question. I half-thought "user" was the wrong word when I wrote that... I should have said Creator and Wrangler, as described in this analysis model. So, it's the people who are creating the dataset and the data dictionary, not the people who are defining what "weight" means.

> I ask because I want to distinguish between a canonical representation of something on the descriptor, and an ideal user experience for "end users" who might generate a descriptor via a series of actions.

Yes, absolutely. The definition of what "weight" is could be (probably should be) JSON.

I've updated one of my older specifications into a proposal for a semantic datatype category taxonomy. This is basically the system I've linked to previously, used in Ambry.

pwalsh mentioned this issue Dec 30, 2015

change name mapping to model or cube openspending/fiscal-data-package#111

Closed

rufuspollock closed this as completed Aug 11, 2016

pwalsh mentioned this issue Dec 18, 2016

Proper semantic type support frictionlessdata/datapackage#343

Closed

pwalsh reopened this Feb 13, 2017

pwalsh self-assigned this Feb 13, 2017

pwalsh mentioned this issue Aug 29, 2017

[WIP] Requirements for v1 of Fiscal Data Package openspending/openspending#1312

Closed

14 tasks

pwalsh mentioned this issue Jul 1, 2020

rdfType attribute allows only one Class to be provided. frictionlessdata/datapackage#686

Closed

2 tasks

roll transferred this issue from frictionlessdata/datapackage Jan 3, 2024

frictionlessdata locked and limited conversation to collaborators Jan 3, 2024

roll converted this issue into discussion #855 Jan 3, 2024

github-project-automation bot moved this to Done in Open Knowledge Jan 3, 2024

This issue was moved to a discussion.
