Links to files #211

Closed
merkys opened this issue Nov 26, 2019 · 11 comments · Fixed by #360
Labels
status/has-concrete-suggestion: This issue has one or more concrete suggestions spelled out that can be brought up for consensus.
topic/response-format: Issue discussing changes and improvements to the API response format.
type/proposal: Proposal for addition/removal of features. May need broad discussion to reach consensus.

@merkys
Member

merkys commented Nov 26, 2019

In the COD, the primary description of a structure is a Crystallographic Information Framework (CIF) file. We at the COD would therefore like a standard way to present a link to such a file in OPTiMaDe entries. In the JSON API format I suggest using links, for example:

{
  "type" : "structure",
  "id" : "1234567",
  "links" : {
    "file" : {
      "href" : "https://www.crystallography.net/cod/1234567.cif",
      "meta" : {
        "type" : "chemical/x-cif"
      }
    }
  }
}

Thus links/file would be a JSON API link object, containing an href field with the URL and a meta/type field with the media type string.
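
A minimal sketch of how a client might read the proposed link, assuming the entry shape from the example above. The helper name `file_link` is hypothetical and not part of any specification. JSON API also allows a link to be a bare URL string, so the sketch handles both forms.

```python
from typing import Optional, Tuple


def file_link(entry: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Return (url, media_type) for the proposed links/file member, or None."""
    link = entry.get("links", {}).get("file")
    if link is None:
        return None
    if isinstance(link, str):
        # JSON API permits a link to be a plain URL string (no meta available).
        return link, None
    # Link object form: href holds the URL, meta/type the media type string.
    return link["href"], link.get("meta", {}).get("type")


entry = {
    "type": "structure",
    "id": "1234567",
    "links": {
        "file": {
            "href": "https://www.crystallography.net/cod/1234567.cif",
            "meta": {"type": "chemical/x-cif"},
        }
    },
}

url, media_type = file_link(entry)
```

The media type is returned separately so a client can decide how to parse the download without guessing from the URL.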

@merkys added the status/has-concrete-suggestion, topic/response-format, and type/proposal labels Nov 26, 2019
@CasperWA
Member

Looks all right to me 👍
It would make more sense to have the type be more explicit (i.e., not a meta field), but then file wouldn't be a valid JSON API link object any more. And since we are already mandating/reserving special fields for the top-level meta, I think your suggestion is indeed the way to go.
It may also be a nice thing to add file_extension or similar, since the link may be a proxy or something?

@ml-evs
Member

ml-evs commented Nov 26, 2019

I was also thinking about the best way of doing this, but to return multiple files. Do I need a separate key per file in this scheme (as example 1 below) or is this better suited to using the relationships field (as example 2)? I guess using relationships becomes messy as the objects aren't entries and probably shouldn't ever appear in the included response field...

1/

{
  "type": "structure",
  "id": "1234567",
  "links": {
    "input_file": {
      "href": "http://example.com/raw/input.file"
    },
    "output_file": {
      "href": "http://example.com/raw/output.file"
    }
  }
}

2/

{
  "type": "structure",
  "id": "1234567",
  "relationships": {
    "files": {
      "links": [
        {"type": "links", "href": "http://example.com/raw/input.file"},
        {"type": "links", "href": "http://example.com/raw/output.file"}
      ]
    }
  }
}

@merkys
Member Author

merkys commented Nov 27, 2019

> It may also be a nice thing to add file_extension or similar, since the link may be a proxy or something?

Maybe filename, to be more general?

@merkys
Member Author

merkys commented Nov 27, 2019

> I was also thinking about the best way of doing this, but to return multiple files. Do I need a separate key per file in this scheme (as example 1 below) or is this better suited to using the relationships field (as example 2)?

You are right, it's better to think about how to support any number of files. My initial idea was the same as shown in your first example, but I don't like the idea of allowing arbitrary keys under links for that (self is already taken, for example). A possible solution would be to require a file_ prefix for file links.
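
As a rough illustration of the file_ prefix idea, a client could pick out file links by key while ignoring reserved members such as self. This is only a sketch: the function name `file_links`, the key names `file_input` and `file_output`, and the self URL are assumptions made up for the example.

```python
def file_links(entry: dict, prefix: str = "file_") -> dict:
    """Collect links whose key starts with the proposed reserved prefix.

    The prefix is stripped from the returned keys.
    """
    return {
        key[len(prefix):]: link
        for key, link in entry.get("links", {}).items()
        if key.startswith(prefix)
    }


entry = {
    "type": "structure",
    "id": "1234567",
    "links": {
        # Reserved JSON API member; must not be mistaken for a file.
        "self": "http://example.com/structures/1234567",
        "file_input": {"href": "http://example.com/raw/input.file"},
        "file_output": {"href": "http://example.com/raw/output.file"},
    },
}

files = file_links(entry)
```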

> I guess using relationships becomes messy as the objects aren't entries and probably shouldn't ever appear in the included response field...

I agree.

@rartino
Contributor

rartino commented Dec 3, 2019

Since we are now using the links endpoint mostly for connections to other API implementations (base URLs), I'm a bit scared about referring to this concept as links in general.

To continue on @ml-evs's model (2), it seems to me that these are indeed relationships to files, rather than relationships to "links"/urls. So, how about:

  • actually having a files/ endpoint
  • a files datatype with properties such as filename, content_type, and download_link, where the latter is a JSON API link object that provides a link to where you can download the file.
  • Use our standard handling of relationships to indicate a relationship between a structures object and a files object, with the description field set to, e.g., "based on cif file".

The end result would thus be more like (with an included section, which would only be there if asked for):

{
  "data": {
    "type": "structure",
    "id": "1234567",
    "relationships": {
      "files": {
        "data": [
          {"type": "files", "id": "4711", "meta": {"description": "Source cif file"}},
          {"type": "files", "id": "4712", "meta": {"description": "Output file"}}
        ]
      }
    }
  },
  "included": [
    {
      "type": "files",
      "id": "4711",
      "attributes": {
        "filename": "1234567.cif",
        "content_type": "chemical/x-cif",
        "download_link": "http://example.com/raw/1234567.cif"
      }
    },
    {
      "type": "files",
      "id": "4712",
      "attributes": {
        "filename": "1234567.cif",
        "content_type": "text/plain",
        "download_link": "http://example.com/raw/output.txt"
      }
    }
  ]
}

(Note: as I am typing this up, I realize that our present description of how to do relationships with descriptions isn't fully clear. Where should that description field inside a meta field really go? The way I have placed it above, inside the resource identifier object, is the only way I can see to have multiple relationships to the same type of object with different descriptions. But intuitively that placement feels a bit weird to me. Am I also allowed to use the meta under the files object if all relationships match the same description...? I'll file an issue on this.)
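
To make the shape of that response concrete, here is a sketch of how a client might pair each files relationship with its included resource. It assumes the layout proposed above, with the description sitting in the meta of each resource identifier object. The function name `related_files` is hypothetical, and the sample response deliberately includes only one of the two files to show the case where included data is absent.

```python
def related_files(response: dict) -> list:
    """Pair each files relationship with its included resource, if present."""
    # Index included resources by (type, id), the JSON API identity of a resource.
    included = {
        (res["type"], res["id"]): res
        for res in response.get("included", [])
    }
    refs = (
        response["data"]
        .get("relationships", {})
        .get("files", {})
        .get("data", [])
    )
    result = []
    for ref in refs:
        resource = included.get((ref["type"], ref["id"]))
        result.append({
            "id": ref["id"],
            "description": ref.get("meta", {}).get("description"),
            # None when the server did not include this file in the response.
            "attributes": resource["attributes"] if resource else None,
        })
    return result


response = {
    "data": {
        "type": "structure",
        "id": "1234567",
        "relationships": {
            "files": {
                "data": [
                    {"type": "files", "id": "4711", "meta": {"description": "Source cif file"}},
                    {"type": "files", "id": "4712", "meta": {"description": "Output file"}},
                ]
            }
        },
    },
    "included": [
        {
            "type": "files",
            "id": "4711",
            "attributes": {
                "filename": "1234567.cif",
                "content_type": "chemical/x-cif",
                "download_link": "http://example.com/raw/1234567.cif",
            },
        }
    ],
}

files = related_files(response)
```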

@merkys
Member Author

merkys commented Dec 3, 2019

> So, how about:
>
>   • actually having a files/ endpoint
>   • a files datatype with properties such as filename, content_type, and download_link, where the latter is a JSON API link object that provides a link to where you can download the file.
>   • Use our standard handling of relationships to indicate a relationship between a structures object and a files object, with the description field set to, e.g., "based on cif file".

I like this a lot. Introducing a files endpoint would also give us a straightforward means to query on them.

@ml-evs
Member

ml-evs commented Dec 3, 2019

I'd support the addition of an optional files/ endpoint too.

I think it would be nice to have some way of grouping files together with an extra field without having to rely on the descriptions. A field, say, "kind" that can be used to query for input, output or auxiliary files. e.g. adding kind=input and kind=output to a potential files endpoint spec (where kind=null indicates neither). Adding this to @rartino's example:

{
  "data": {
    "type": "structure",
    "id": "1234567",
    "relationships": {
      "files": {
        "data": [
          {"type": "files", "id": "4711", "meta": {"description": "Source cif file"}},
          {"type": "files", "id": "4712", "meta": {"description": "DFT code input file"}},
          {"type": "files", "id": "4713", "meta": {"description": "DFT code output file"}},
          {"type": "files", "id": "4714", "meta": {"description": "DFT convergence data"}}
        ]
      }
    }
  },
  "included": [
    {
      "type": "files",
      "id": "4711",
      "attributes": {
        "filename": "1234567.cif",
        "content_type": "chemical/x-cif",
        "kind": "input",
        "download_link": "http://example.com/raw/1234567.cif"
      }
    },
    {
      "type": "files",
      "id": "4712",
      "attributes": {
        "filename": "dft.param",
        "kind": "input",
        "content_type": "text/plain",
        "download_link": "http://example.com/raw/dft.param"
      }
    },
    {
      "type": "files",
      "id": "4713",
      "attributes": {
        "filename": "1234567.cif",
        "content_type": "text/plain",
        "kind": "output",
        "download_link": "http://example.com/raw/1234567.cif"
      }
    },
    {
      "type": "files",
      "id": "4714",
      "attributes": {
        "filename": "convergence.csv",
        "content_type": "text/plain",
        "kind": null,
        "download_link": "http://example.com/raw/convergence.csv"
      }
    }
  ]
}
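
For illustration only, grouping on the proposed kind attribute could look like the sketch below on the client side. The kind attribute is just the suggestion made in this comment, not an agreed field, and the helper name `files_of_kind` is hypothetical. Note that in this sketch a missing kind is treated the same as kind=null.

```python
from typing import Optional


def files_of_kind(included: list, kind: Optional[str]) -> list:
    """Return included files resources whose proposed kind attribute equals kind."""
    return [
        res
        for res in included
        if res.get("type") == "files"
        # .get() returns None for a missing kind, so it matches kind=None too.
        and res.get("attributes", {}).get("kind") == kind
    ]


included = [
    {"type": "files", "id": "4711", "attributes": {"filename": "1234567.cif", "kind": "input"}},
    {"type": "files", "id": "4712", "attributes": {"filename": "dft.param", "kind": "input"}},
    {"type": "files", "id": "4713", "attributes": {"filename": "1234567.cif", "kind": "output"}},
    {"type": "files", "id": "4714", "attributes": {"filename": "convergence.csv", "kind": None}},
]

input_ids = [res["id"] for res in files_of_kind(included, "input")]
unclassified_ids = [res["id"] for res in files_of_kind(included, None)]
```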

@CasperWA
Member

CasperWA commented Dec 3, 2019

> I think it would be nice to have some way of grouping files together with an extra field without having to rely on the descriptions. A field, say, "kind" that can be used to query for input, output or auxiliary files. e.g. adding kind=input and kind=output to a potential files endpoint spec (where kind=null indicates neither).

I am uncomfortable adding a lot to the resource identifier objects. Already adding the meta.description puffs up the response unnecessarily. If one wants the files in the same response, one would simply add files to the include parameter. Or one may go directly to /structures/{entry_id}/relationships/files.

However, I recognize your need for having a grouping option, but this could surely be done by filtering on /structures/{entry_id}/relationships/files or simply on /files?

Also, do the kind values you have put here really make sense for the files themselves? Shouldn't it rather be for the link between the files and a calculation? Or perhaps as a link between a structure and a calculation?
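
The three access routes mentioned in this comment can be sketched as URL builders. This is an assumption-laden illustration: the base URL is a placeholder, the function names are hypothetical, and both the /files endpoint and the `kind="input"` filter expression refer to features that are only being proposed in this thread.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://example.com/optimade"  # placeholder base URL, not a real server


def structure_with_files_url(entry_id: str) -> str:
    """Fetch a structure and ask for its related files in the included section."""
    query = urlencode({"include": "files"})
    return f"{BASE_URL}/structures/{quote(entry_id, safe='')}?{query}"


def files_relationship_url(entry_id: str) -> str:
    """Go directly to the relationship listing for one structure."""
    return f"{BASE_URL}/structures/{quote(entry_id, safe='')}/relationships/files"


def files_filter_url(filter_expr: str) -> str:
    """Query the proposed /files endpoint with a filter expression."""
    return f"{BASE_URL}/files?{urlencode({'filter': filter_expr})}"


with_files = structure_with_files_url("1234567")
relationship = files_relationship_url("1234567")
filtered = files_filter_url('kind="input"')
```

Keeping the grouping in the query rather than in the resource identifier objects is what keeps the relationship data small, which is the concern raised above.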

@ml-evs
Member

ml-evs commented Dec 3, 2019

> I am uncomfortable adding a lot to the resource identifier objects. Already adding the meta.description puffs up the response unnecessarily. If one wants the files in the same response, one would simply add files to the include parameter. Or one may go directly to /structures/{entry_id}/relationships/files.
>
> However, I recognize your need for having a grouping option, but this could surely be done by filtering on /structures/{entry_id}/relationships/files or simply on /files?
>
> Also, do the kind values you have put here really make sense for the files themselves? Shouldn't it rather be for the link between the files and a calculation? Or perhaps as a link between a structure and a calculation?

I typed this up too quickly; I meant for kind to be only an attribute under included, and have edited the example above. I agree regarding the meta and description, but that discussion can be had in the other issue. I also agree regarding calculations. I realise this is preempting the calculations entry type somewhat, but I think it would be useful as an optional field to support.

@mkhorton

mkhorton commented Dec 3, 2019

Briefly, the suggestion of a "files datatype with properties such as filename, content_type, and download_link" seems completely reasonable, but I would advise against having a "kind" attribute. Concepts like "input" and "output" are hazy and ill-defined, and it over-complicates the spec in my opinion. Concepts like this are a much bigger discussion and seem more suited to a future version of the specification.

@merkys
Member Author

merkys commented Dec 4, 2019

I assume this feature to be a post-v1.0 addition. I will draft the proposal later, as we're in feature freeze right now. Or should we rush this feature to v1.0?

5 participants