Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make type names JSON Table Schema conformant #40

Open
rufuspollock opened this issue Mar 18, 2013 · 25 comments
Open

Make type names JSON Table Schema conformant #40

rufuspollock opened this issue Mar 18, 2013 · 25 comments

Comments

@rufuspollock
Copy link
Member

See http://www.dataprotocols.org/en/latest/json-table-schema.html#types

  • Decimal => number
  • String => string
  • Integer => integer

etc

@rufuspollock
Copy link
Member Author

/cc @mk270 @domoritz

@mk270
Copy link
Contributor

mk270 commented Mar 18, 2013

You just need a mapping, like the one in ktbh:

classes = {
StringType: "string",
IntegerType: "integer",
FloatType: "number",
DecimalType: "number",
DateType: "date",
DateUtilType: "date"
}

There's no need for everyone depending on messytables to change their code

@rufuspollock
Copy link
Member Author

A mapping where though - messytables flows through into dataconverters and thence dataproxy etc etc

@domoritz
Copy link
Contributor

I really like the json table schema but I don't think that messytables is the right point to set the type names. Especially since we use only classes and not strings to identify types. I'll close this for now.

@rufuspollock
Copy link
Member Author

So where would it go? DataConverters?

@rufuspollock rufuspollock reopened this Mar 19, 2013
@domoritz
Copy link
Contributor

Yes, it should go into DataConverters. Imo, messytables is a basic library that should not care about the actual output to the user.

@mk270
Copy link
Contributor

mk270 commented Mar 19, 2013

Please do NOT stick it in DataConvertors! It's already in one downstream package; it should be horizontally integrated.

Either stick it in a shim, or stick it in messytables.

I shall write a shim; we can then use that, and decide later whether the shim should be in messytables

@mk270
Copy link
Contributor

mk270 commented Mar 20, 2013

I've stuck a shim up at: https://github.com/mk270/messytables-jts

@domoritz
Copy link
Contributor

@mk270 Great. Now the question is whether the mapping should become part of messytables. I think it makes sense as long as it's not too confusing for people who use messytables. Would you like to add it and then we can discuss things based on the proposal?

@mk270
Copy link
Contributor

mk270 commented Mar 23, 2013

Well, people who use messytables aren't going to get confused unless they call any JTS-related functionality.

Let's wait for the JTS spec to be formally frozen, and for messytables-jts to have a few weeks in production, before worrying about whether to incorporate the mapping in messytables itself. There's no hurry, right, and we'd benefit from comparing what other people are doing with it, too?

@rufuspollock
Copy link
Member Author

I'm +1 for getting this in. I think it would be really good if messytables were JTS conformant ...

rufuspollock added a commit that referenced this issue Apr 29, 2013
Add messytables -> jts converter, with docs. Fixes #40
@rufuspollock
Copy link
Member Author

@domoritz hmmm - i should have reviewed this more carefully :-)

I think there was a bit of a confusion here. I don't think we need JTS output from messytables - we just wanted the celltypes to conform to JTS types list.

Questions:

  • What is the feeling on actually doing the latter (i.e. using JTS compatible types ...)
  • Could we remove the current jts.py module as it adds additional dependencies and isn't really needed for anything useful ....

@rufuspollock rufuspollock reopened this May 5, 2013
@domoritz
Copy link
Contributor

domoritz commented May 5, 2013

Hmm. I'm afraid I still don't understand what you want.

What is the feeling on actually doing the latter (i.e. using JTS compatible types ...)

What would that actually mean? The types can already be mapped to the JTS types easily.

Could we remove the current jts.py module as it adds additional dependencies and isn't really needed for anything useful ....

I'll revert 6e98d06 later.

@rufuspollock
Copy link
Member Author

I mean not mapping to JTS types but using JTS types as default :-)

I want messytables to just support JTS types out of the box - not have to use some special version of messytables to get the JTS types ...

@domoritz
Copy link
Contributor

domoritz commented May 5, 2013

Well, that is a huge change because it would break every application that uses messytables. Also, we could not use things like the format for dates any more. How about we add a __str__ method to every messytables type that returns the JTS type?

@mk270
Copy link
Contributor

mk270 commented May 8, 2013

This is going to couple messytables to JTS such that there'll be pressure to break JTS to accommodate new types in messytables, and vice versa.

That means people will be more hesitant to rely on either of them.

@mk270 mk270 closed this as completed May 8, 2013
@rufuspollock
Copy link
Member Author

@mk270 i don't see how depending on a set of types is going to be that tough and if there are key types not supported in JTS it would warrant adding them.

@domoritz can you explain re formats for dates?

Generally we should make not change until we have very clear agreement here :-)

@rufuspollock rufuspollock reopened this May 8, 2013
@domoritz
Copy link
Contributor

domoritz commented May 8, 2013

messytables.types.DateType also cntains the format of the dates when doing type guessing. JTS does not specify a format but recommetnds ISO. I think we should not support JTS at all because as far as I understand it they have different intensions. Messytables if for parsing tabular data from files. JTS is to define a schema. We should not mix them, IMHO.

@rgrp @mk270 What would be the advantage of supporting only JTS in messytables? Doesn't it mkae more sense to have this as a return type from dataconvereters?

@mk270
Copy link
Contributor

mk270 commented May 8, 2013

No, it doesn't make more sense to return it from dataconvertors, as stated further up at #40 (comment)

The only questions are whether JTS should be output by messytables (I don't mind, but it seems like a waste of effort given that messytables-jts exists), and whether messytables should break a bunch of things which depend on it and change its interface the better to reflect JTS.

If we break downstream, then people won't use messytables. So it's like taking messytables in house.

@rufuspollock
Copy link
Member Author

@domoritz JTS does contain an optional format field (but not sure i understand you).

I note that most people using messytables downstream should be using dataconverters (shouldn't they?)

@mk270 Would dataconverters support JTS types (or not)?

@domoritz
Copy link
Contributor

domoritz commented May 8, 2013

@rgrp Well, it's not in the list of attributes. I made a pull request to fix that frictionlessdata/datapackage#45.

Most people (~90%) should probably use dataconverters but not everyone.

I don't see any problems when we tie messytables and jts because jts should be a standard and be used in implementations. @rgrp I'm not 100% sure, I understand what you want. Do you only want us to return strings instead of classes for types or a whole jts with the data? I think we need an example to know what we are talking about.

Is it okay, if I revert the jts commit from messytables for now until we come up with a proper solution?

@rufuspollock
Copy link
Member Author

Yes, ok to revert the JTS stuff - we certainly should not change any interface with folks until we have thoroughly discussed etc.

What i'm motivated by is what you get when we convert the "metadata" we get from messytables to JSON e.g. look at the metadata here (which is messytables type info converted through dataconverters to json):

http://jsonpdataproxy.appspot.com/?url=https://raw.github.com/datasets/gold-prices/master/data/data.csv&max-results=2&guess-types=1

@mk270
Copy link
Contributor

mk270 commented May 11, 2013

Can't you just use messytables-jts in dataconverters?

@domoritz
Copy link
Contributor

👍 for using the jts in dataconverters instead of messytables. I understand the motivation but I don't think that messtables as a low-level library is the right place for it. What I would do is adding a field to the messytables type classes which is called jts_type.

Alternatively, we can keep headers_and_typed_as_jts and rowset_as_jts but I's prefer the jts_type field as it is much easier to use and people can switch to the jts types very easily.

@rufuspollock
Copy link
Member Author

@domoritz this sounds good - can you explain the proposed change to messytables a bit more and how this would be used in e.g. dataconverters ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants