This issue was moved to a discussion.
Spec for model/cube #854
Comments
+1 I was thinking in a similar direction (that the |
@rgrp and all I'm into this idea of course, but I'd rather see how this plays out with a |
WONTFIX. I'm going to close as wontfix for now and we can re-open if there is interest / need. |
I know this is closed, but I just came across this which seems relevant:
|
@danfowler thanks - and I am aware of them (I think this started out as a simple version of SDMX). |
Re-opening. @pwalsh and I have discussed this recently and clear interest here and we'd like to start something in the nearish future. /cc @ericbusboom |
I've been going through the JSON-stat website, and so far, I'm pretty sure that I don't understand it at all, and that none of my analysts would be able to create a JSON-stat file by hand. I can tell that the format depends on having several array properties all have the same length, basically breaking a conceptual object into separate fields, which seems like a maintenance nightmare. There is plenty to learn from here, but I don't think it is a good model for a design.

For my users, my top requirement is that it is easy to create and read the specifications. I want data creators to be able to annotate measures and dimensions from memory, with very little training. Data users must be able to understand the annotations with no training.

I have a strong preference for embedding the measure and dimension classifications into the schema, because it's easier to create and read. This can be as simple as:
I imagine the names being mostly common terms like "dollars" or "yen" or "weight" or "sex". I'd further propose that the names have a hierarchical structure to them, to allow for specification and extension. For instance, 'weight/lbs' vs 'weight/kg' to distinguish units, or 'race/omb' vs 'race/census' to distinguish between different systems of standards for race. But it should also be possible for the user to annotate a column with just "weight". That's not ideal, but I've learned that getting 20% is better than getting 0%.

I'd further propose that the names be linked to JSON definitions that can be inlined or well-known. So "race/omb" may have an associated JSON file, possibly similar to the existing JSON-stat or Fiscal Data Package forms. Then, perhaps, users could also define their own term 'race/orgname' and include their own definitions in the package.

I don't (currently) have strong opinions about the structure of the definitions for the names (the Fiscal Data Package definitions seem suitably extensible and generalizable), since the definitions would mostly be created by experts. However, I am strongly opinionated that the typical user should be able to annotate the dataset with nothing more than applying a measure/dimension name to a column in the existing schema, and those names should be familiar and easy to memorize.

For reference, here are the inputs and outputs of the annotation system I'd produced before. This one has a rich datatype field (rather than a separate field for the measure/dimension annotation), and a parent connection to link columns. The measure/dimension classification is inherent in the rich datatype; "count" is always a measure, "raceeth" is always a dimension.
Here is a schema file: http://test.docker1.civicknowledge.com/bundles/d04p006/file/schema.csv
And here is what the file looks like when rendered for the web: http://test.docker1.civicknowledge.com/partitions/p04p00f006
As with Tableau, dimensions are green and measures are blue. Errors and uncertainties are grey. Indentation represents parent/child relationships. |
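The hierarchical naming idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `term` key on a field, the `TERM_DEFINITIONS` registry, and the `role`/`unit`/`standard` properties are all hypothetical names invented here, not part of Table Schema, Ambry, or any proposal in this thread. It shows a bare name such as "weight" still resolving to a measure, a more specific name such as "weight/kg" adding detail, and an unregistered child such as "race/orgname" falling back to its parent.

```python
# Hypothetical registry of term definitions. In the proposal these would be
# JSON documents, either well-known or inlined in the package.
TERM_DEFINITIONS = {
    "weight": {"role": "measure"},
    "weight/kg": {"unit": "kg"},
    "weight/lbs": {"unit": "lbs"},
    "race": {"role": "dimension"},
    "race/omb": {"standard": "OMB"},
}


def resolve_term(name, definitions=TERM_DEFINITIONS):
    """Merge the definitions of every known prefix of a hierarchical name.

    'weight/kg' inherits from 'weight'; more specific entries override
    general ones. Returns None when no prefix of the name is registered.
    """
    parts = name.split("/")
    resolved = None
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if prefix in definitions:
            resolved = {**(resolved or {}), **definitions[prefix]}
    return resolved


def classify_fields(schema):
    """Map each field name to 'measure', 'dimension', or None if unannotated."""
    roles = {}
    for field in schema["fields"]:
        term = field.get("term")  # hypothetical annotation key
        definition = resolve_term(term) if term else None
        roles[field["name"]] = definition.get("role") if definition else None
    return roles


# A schema annotated with nothing more than one name per column.
schema = {
    "fields": [
        {"name": "state", "type": "string"},
        {"name": "race", "type": "string", "term": "race/omb"},
        {"name": "avg_weight", "type": "number", "term": "weight"},
    ]
}
roles = classify_fields(schema)
```

The fallback is what lets a partially annotated column stay useful: a creator who only remembers "weight" still gets a measure, and an expert can refine the definition later without touching the dataset.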
so, now actually reopening, and also ref. frictionlessdata/datapackage#343 |
I am quite sure I completely get what you want here, and it is very much in line with where I think we need to go to generalise this out of our previous work on FDP. One question: you say
Which user? Someone who edits a descriptor file directly, so, someone comfortable with text editing a JSON file? I ask because I want to distinguish between a canonical representation of something on the descriptor, and an ideal user experience for "end users" who might generate a descriptor via a series of actions. OpenSpending currently supports a customisation to FDP (unspecified as yet) which does such annotations per field. |
Ah, good question. I half-thought "user" was the wrong word when I wrote that... I should have said Creator and Wrangler, as described in this analysis model. So, it's the people who are creating the dataset and the data dictionary, not the people who are defining what "weight" means.
Yes, absolutely. The definition of what "weight" is could be (probably should be) JSON. I've updated one of my older specifications into a proposal for a semantic datatype category taxonomy. This is basically the system I've linked to previously, used in Ambry. |
Fiscal Data Package has a `mapping` object. This is very, very handy for building a logical model out of the physical data sources when appropriate. This logical model can in turn be used to automate visualisations and data loaders, for example.

Actually, there is nothing particularly "Fiscal" about this `mapping`: it is simply an OLAP cube implementation with measures and dimensions. I think we could extract out the generic pattern and expose it as a spec for declaring a model/cube mapping for any tabular data package.
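To make the measures-and-dimensions idea concrete, here is a minimal Python sketch of a generic mapping driving a cube-style aggregation. The structure of `MAPPING` (logical names pointing at a `source` column) and the column names are assumptions made for illustration; they are not copied from the Fiscal Data Package spec and are not a proposed format.

```python
from collections import defaultdict

# Hypothetical mapping from logical names to physical column names.
MAPPING = {
    "measures": {"amount": {"source": "amt_usd"}},
    "dimensions": {
        "year": {"source": "fiscal_year"},
        "agency": {"source": "agency_name"},
    },
}


def aggregate(rows, mapping, measure, by):
    """Sum one measure grouped by the given logical dimensions."""
    measure_col = mapping["measures"][measure]["source"]
    dim_cols = [mapping["dimensions"][d]["source"] for d in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in dim_cols)
        totals[key] += float(row[measure_col])
    return dict(totals)


# Physical rows as they might come out of a tabular resource.
rows = [
    {"fiscal_year": 2015, "agency_name": "Health", "amt_usd": "10"},
    {"fiscal_year": 2015, "agency_name": "Roads", "amt_usd": "20"},
    {"fiscal_year": 2016, "agency_name": "Health", "amt_usd": "5"},
]
by_year = aggregate(rows, MAPPING, "amount", ["year"])
by_year_agency = aggregate(rows, MAPPING, "amount", ["year", "agency"])
```

Because callers only use the logical names `amount`, `year`, and `agency`, the same aggregation code would work for any tabular resource that supplies a mapping, which is the sense in which nothing here is specifically fiscal.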