simple example of conversion functions #2

Open
jbenet opened this issue Apr 17, 2014 · 11 comments

Comments

@jbenet
Owner

jbenet commented Apr 17, 2014

not final, food for thought.

Want output

{
  'name': 'Juan Batiz-Benet',
  'city': 'San Francisco, CA'
}

Input Type FOO

{
  'name': {
    '@type': 'pandat/name',
    'label': 'NAME',
    'codec': 'pandat/name-last-name-first'
  },
  'addr': {
    '@type': 'pandat/us-street-address',
    'label': 'ADDR',
  }
}

Output Type BAR

{
  'name': 'pandat/name',
  'city': 'pandat/us-city',
}

You can write a conversion function, use it and/or publish it to pandat:

(excuse this interface, might be simplified some)

var Foo2Bar = pandat.Conversion({'invertible': false}, [Foo], [Bar]);

// Bar maps each key straight to a type string, so no ['@type'] lookup on that side.
Foo2Bar.convert = function(foo) {
  return {
    'name': pandat(Foo.name['@type'], Bar.name, foo.name),
    'city': pandat(Foo.addr['@type'], Bar.city, foo.addr)
  };
};

Or, pandat might be able to generate the function, with some hints about how the names map to each other. (not quite sure what the right interface is here, but will think about it.)
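
A minimal sketch of the hand-written conversion above, in plain JavaScript with no pandat dependency. The `conversions` registry, the `convertValue` helper (standing in for the `pandat(fromType, toType, value)` call), the address parsing rule, and the sample record are all assumptions made up for illustration. The name codec is left out, so the name is assumed to be already decoded.

```javascript
// Hypothetical registry of conversion functions keyed by "fromType -> toType".
var conversions = {
  'pandat/name -> pandat/name': function (name) {
    return name; // same type on both sides, nothing to do
  },
  'pandat/us-street-address -> pandat/us-city': function (addr) {
    // Naive rule for illustration: "street, city, STATE zip" -> "city, STATE".
    var parts = addr.split(',').map(function (s) { return s.trim(); });
    var state = parts[2].split(' ')[0];
    return parts[1] + ', ' + state;
  }
};

// Stand-in for the pandat(fromType, toType, value) call in the example above.
function convertValue(fromType, toType, value) {
  var fn = conversions[fromType + ' -> ' + toType];
  if (!fn) {
    throw new Error('no conversion from ' + fromType + ' to ' + toType);
  }
  return fn(value);
}

var Foo = {
  name: { '@type': 'pandat/name', label: 'NAME' },
  addr: { '@type': 'pandat/us-street-address', label: 'ADDR' }
};
var Bar = { name: 'pandat/name', city: 'pandat/us-city' };

function foo2bar(foo) {
  return {
    name: convertValue(Foo.name['@type'], Bar.name, foo.name),
    city: convertValue(Foo.addr['@type'], Bar.city, foo.addr)
  };
}

var result = foo2bar({
  name: 'Juan Batiz-Benet',
  addr: '123 Example St, San Francisco, CA 94103' // made-up address
});
console.log(result);
```

Keying the registry on the type pair is one way published conversions could be looked up later without the caller knowing which module provides them.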

@yoshuawuyts

I'd like to see something more along the lines of this:

Source:

{ 
  "name": {
    "type": "pandat/name-last-name-first",
    "label": "NAME"
  },
  "city": {
    "type": "pandat/us-street-address",
    "label": "ADDR"
  }
}

Output:

{ 
  "name": {
    "type": "pandat/name",
    "label": "NAME",
    "source": "NAME"
  },
  "addr": {
    "type": "pandat/us-city",
    "label": "CITY",
    "source": "ADDR"
  }
}

Converter:

/**
 * Module dependencies
 */

var object1 = require('./object1.json');
var fooSchema = require('./foo.json');
var barSchema = require('./bar.json');
var pandat = require('pandat');

/**
 * Initialize converter.
 *
 * @param {Object} sourceSchema
 * @param {Object} targetSchema
 * @return {Function}
 */

var converter = pandat.Conversion(fooSchema, barSchema, {invertible: false});

/**
 * Execute conversion
 */

var resultObject = converter(object1);

I think that if you design your relations beforehand, there'll be no need for further declarations. Such an implementation would allow for more flexibility, and result in a cleaner API.
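
One way the "design your relations beforehand" idea could work, as a rough sketch: the output schema names the source label each field comes from, and a converter is generated from the two schemas. `makeConverter` and the placeholder value conversion are hypothetical and not part of pandat. The field keys follow the first example in the thread (`addr` in, `city` out).

```javascript
// Source schema: each field declares a label that identifies it.
var sourceSchema = {
  name: { type: 'pandat/name-last-name-first', label: 'NAME' },
  addr: { type: 'pandat/us-street-address', label: 'ADDR' }
};

// Output schema: each field names the source label it is derived from.
var outputSchema = {
  name: { type: 'pandat/name', label: 'NAME', source: 'NAME' },
  city: { type: 'pandat/us-city', label: 'CITY', source: 'ADDR' }
};

// Hypothetical helper: build a converter from the two schemas.
// convertValue(fromType, toType, value) is supplied by the caller.
function makeConverter(source, output, convertValue) {
  // Index source fields by label so output fields can find them.
  var byLabel = {};
  Object.keys(source).forEach(function (key) {
    byLabel[source[key].label] = { key: key, type: source[key].type };
  });

  return function (record) {
    var result = {};
    Object.keys(output).forEach(function (key) {
      var from = byLabel[output[key].source];
      if (!from) {
        throw new Error('no source field labelled ' + output[key].source);
      }
      result[key] = convertValue(from.type, output[key].type, record[from.key]);
    });
    return result;
  };
}

// Placeholder value conversion that only tags the value, to show the wiring.
var converter = makeConverter(sourceSchema, outputSchema, function (from, to, value) {
  return to + '(' + value + ')';
});

var converted = converter({ name: 'Batiz-Benet, Juan', addr: '123 Example St' });
console.log(converted);
```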

Also: I didn't quite catch the difference between a codec and conversion, could you explain what you mean by that?

@jbenet
Owner Author

jbenet commented Apr 18, 2014

Hello!

> I think that if you design your relations beforehand, there'll be no need for further declarations. Such an implementation would allow for more flexibility, and result in a cleaner API.

Yeah! That's what I meant above by "pandat might be able to generate the function, with some hints about how the names map to each other".

"source": "NAME"

I like this relational mapping, though it can't quite live on the output type, as the output type may be an input type elsewhere. Worth mentioning here: users will be reusing types published by others, so it should be possible to specify just:

var converter = pandat.Conversion('jbenet/foo', 'jbenet/bar')

Given I published foo and bar schemas :)

> I didn't quite catch the difference between a codec and conversion, could you explain what you mean by that?

Yeah. A Codec is a named pair of functions to encode and decode between raw data and typed objects. For example, see https://github.com/jbenet/pandat/blob/master/stdlib/json_codec.js and https://github.com/jbenet/pandat/blob/master/stdlib/xml_codec.js (these are just examples, nothing works yet). Codecs don't have to be as general as JSON or XML. They can be type-specific. See https://github.com/jbenet/pandat/blob/master/stdlib/date_type.js#L23-L35 (again, nothing works yet, and there are errors there :] ). Codecs can be published and installed (npm modules).
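
To make "a named pair of functions" concrete, here is a small sketch. It is not the code in the linked files. The object shape (`name`, `decode`, `encode`) and the "Last, First" parsing rule are assumptions chosen for illustration.

```javascript
// A codec as described above: a named pair of functions.
// decode: raw data -> typed object; encode: typed object -> raw data.
var jsonCodec = {
  name: 'json',
  decode: function (raw) { return JSON.parse(raw); },
  encode: function (obj) { return JSON.stringify(obj); }
};

// A type-specific codec (hypothetical): "Last, First" strings <-> name objects.
var nameLastFirstCodec = {
  name: 'name-last-name-first',
  decode: function (raw) {
    var parts = raw.split(',');
    return { first: parts[1].trim(), last: parts[0].trim() };
  },
  encode: function (name) {
    return name.last + ', ' + name.first;
  }
};

var decoded = nameLastFirstCodec.decode('Batiz-Benet, Juan');
console.log(decoded);
console.log(nameLastFirstCodec.encode(decoded));
```

In this reading, a codec only moves a value between its raw and typed forms, while a Conversion maps one typed value to another.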

A Conversion is a function converting between two types. The example above shows converting between Foo and Bar. While it's certainly possible to generate conversion functions from relations (inferred based on the types, or specified with source/target keys), many conversion functions will be complex and require programming. These would be publishable/installable modules as well.

Let me know if that makes sense. Will put this all in the README.

@yoshuawuyts

Your explanation of Codec makes sense. But before I start suggesting any changes, let me check if I understood it correctly:

A conversion has a:

  • input schema
  • output schema
  • link schema, which plots the transformation from A to B

An input schema has:

  • Types, which define the data type
  • Labels, which act as unique ids
  • Codecs, which prepare the data for conversion

An output schema has:

  • Types, which define the data type
  • Codecs, which prepare the data for consumption

Or outputSchema == inputSchema? Let me know if this sounds about right.

@jbenet
Owner Author

jbenet commented Apr 18, 2014

Output schema == input schema. They're the same thing. They define Types. Types can be used as inputs or outputs in a conversion.

Other than that, right on!

@yoshuawuyts

I really dislike the @something syntax. I don't think keys should be namespaced if they're not used outside pandat/transform.

And couldn't this:

/**
 * Module dependencies
 */

var outputSchema = require('./bar');
var inputSchema = require('./foo');
var linkSchema = require('baz');
var pandat = require('pandat');

/**
 * Initialize converter.
 */

var Foo2Bar = pandat.Conversion({'invertible': false}, [inputSchema], [outputSchema]);

Foo2Bar.convert = function(linkSchema) {
  return {
    'name': pandat(inputSchema.name['@type'], outputSchema.name['@type'], linkSchema.name),
    'city': pandat(inputSchema.addr['@type'], outputSchema.city['@type'], linkSchema.addr)
  };
};

be rewritten to this:

/**
 * Module dependencies
 */

var outputSchema = require('./bar');
var inputSchema = require('./foo');
var linkSchema = require('baz');
var pandat = require('pandat');

/**
 * Export converter.
 */

var converter = module.exports = pandat({'invertible': false});

converter.schema = {
  'name': [inputSchema.name, outputSchema.name, linkSchema.name],
  'city': [inputSchema.addr, outputSchema.city, linkSchema.city],
}

You could use an internal function to execute converter.schema. Not sure if closures are passed around correctly though.

The less friction the API causes, the more developers will love using it. Imo things like @type should be avoided. What do you think?

@jbenet
Owner Author

jbenet commented Apr 19, 2014

> I really dislike the @something syntax.

Take that up with json-ld.org :)

> I don't think keys should be namespaced if they're not used outside pandat/transform.

They are: the goal is for all transformer objects to have a definition in JSON-LD. (Sorry, haven't made it clear in the README.) They'll have their own @context, etc. The trick is that the library can fill in a lot of the standard stuff, so:

(s/pandat/transformer/ in your mind here)

t = pandat.Type({
  'name': {
    '@type': 'pandat/name',
    'label': 'NAME',
    'codec': 'pandat/name-last-name-first'
  },
  'addr': {
    '@type': 'pandat/us-street-address',
    'label': 'ADDR',
  }
})

fills in:

> t.src
{
  '@context': 'http://pandat.io/context/pandat.jsonld',
  '@type': 'Type',
  'codec': 'pandat/identity-codec',
  'schema': {
    'name': {
      '@type': 'pandat/name',
      'label': 'NAME',
      'codec': 'pandat/name-last-name-first'
    },
    'addr': {
      '@type': 'pandat/us-street-address',
      'label': 'ADDR',
    }
  }
}

See https://github.com/jbenet/pandat/blob/master/js/type.js

Though none of this is final. Will try to have working code by end of this weekend.
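
A rough sketch of the "library fills in the standard stuff" behaviour shown above. `makeType` is a hypothetical stand-in for `pandat.Type`, not the code in js/type.js. The default values are copied from the `t.src` output in this comment.

```javascript
// Hypothetical stand-in for pandat.Type: wrap a schema with default metadata.
function makeType(schema) {
  return {
    src: {
      '@context': 'http://pandat.io/context/pandat.jsonld',
      '@type': 'Type',
      'codec': 'pandat/identity-codec',
      'schema': schema
    }
  };
}

var t = makeType({
  'name': {
    '@type': 'pandat/name',
    'label': 'NAME',
    'codec': 'pandat/name-last-name-first'
  },
  'addr': {
    '@type': 'pandat/us-street-address',
    'label': 'ADDR'
  }
});

console.log(JSON.stringify(t.src, null, 2));
```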

@max-mapper

just for the sake of argument, how about this for a minimum viable JSON type:

t = pandat.Type({
  'name': {
    'type': 'name',
    'label': 'NAME',
    'codec': 'name-last-name-first'
  },
  'addr': {
    'type': 'us-street-address',
    'label': 'ADDR',
  }
})

i.e. treat type as @type if @type doesn't exist (agreed that @ symbols in keys are weird), and default all types to the pandat/ namespace if no other 'namespace' is specified
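
The two defaults suggested here could be sketched roughly as follows. `qualify`, `normalizeField`, and `normalizeSchema` are hypothetical names, and applying the default namespace to codec names as well as types is an assumption read off the example above.

```javascript
var DEFAULT_NAMESPACE = 'pandat/';

// Prefix a type or codec name with the default namespace unless it has one.
function qualify(name) {
  return name.indexOf('/') === -1 ? DEFAULT_NAMESPACE + name : name;
}

// Expand one shorthand field description into the explicit form.
function normalizeField(field) {
  var out = {};
  Object.keys(field).forEach(function (key) {
    if (key !== 'type') out[key] = field[key];
  });
  // Use 'type' only when '@type' is not already present.
  var type = field['@type'] !== undefined ? field['@type'] : field.type;
  if (type !== undefined) out['@type'] = qualify(type);
  // Assumption: codec names get the same default namespace as types.
  if (out.codec !== undefined) out.codec = qualify(out.codec);
  return out;
}

function normalizeSchema(schema) {
  var out = {};
  Object.keys(schema).forEach(function (key) {
    out[key] = normalizeField(schema[key]);
  });
  return out;
}

var normalized = normalizeSchema({
  name: { type: 'name', label: 'NAME', codec: 'name-last-name-first' },
  addr: { type: 'us-street-address', label: 'ADDR' }
});
console.log(normalized);
```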

@jbenet
Owner Author

jbenet commented Apr 19, 2014

As for the example, the goal is that most users won't have to write their own conversion functions at all, simply use published ones. Some people will, and in those cases both options should work: doing it in code directly, or with a relational schema (expressing the mapping of one type to the other) that allows transformer to generate the code. Precisely like you suggest! :)

> You could use an internal function to execute converter.schema

👍

@jbenet
Owner Author

jbenet commented Apr 19, 2014

> just for the sake of argument, how about this for a minimum viable JSON type:

Yeah! LGTM! Both filling in the @ and defaulting the namespace. If we run into problems, we'll figure it out then.

@yoshuawuyts

👍

@jbenet
Owner Author

jbenet commented Apr 26, 2014

Turns out the @context can alias type -> @type 👍

> @id is not required for a valid JSON-LD document. Also note that you can alias "@id" to something less strange looking, like "id" or "url", for instance.

From frictionlessdata/datapackage#110 (comment)
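
For illustration, a document that uses this kind of keyword aliasing might look like the `doc` object below. The `expandAliases` function is a toy that only handles top-level keys, not a JSON-LD processor, and the example.org URL is made up.

```javascript
// A JSON-LD document whose @context aliases the keywords @type and @id.
var doc = {
  '@context': {
    'type': '@type',
    'id': '@id'
  },
  'id': 'http://example.org/types/foo', // made-up identifier
  'type': 'Type'
};

// Toy alias expansion for top-level keys only; a real JSON-LD processor does much more.
function expandAliases(document) {
  var context = document['@context'] || {};
  var out = {};
  Object.keys(document).forEach(function (key) {
    if (key === '@context') return;
    var target = context[key];
    var isKeywordAlias = typeof target === 'string' && target.charAt(0) === '@';
    out[isKeywordAlias ? target : key] = document[key];
  });
  return out;
}

var expanded = expandAliases(doc);
console.log(expanded);
```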

3 participants