canonical-data.json: Shared data among test cases #699

rbasso · 2017-03-11T08:02:41Z

While working in #625, we discovered that some canonical-data.json files have data shared among test cases:

grep - Has a files object that encodes files to be "grepped" in multiples test cases.
pov - Has a trees object shared by multiple test cases, referenced by their names.

The current JSON Schema doesn't allow that, so I plan to sidestep this issue putting the offending structures in comments, temporarily, until we decide how deal with that.

I see a few option:

Replicate the information where it is needed: This has the advantage of keeping all test cases independent, but it hurts to ask a programmer to do it.
Change the schema to allow a resources/shared structure: This would avoid repetition, but also increase the complexity of test generators significantly.
Move from JSON to YAML: Using YAML anchors would be a clean and easy solution, but YAML is not well-supported as JSON, and that may be a problem for test generators in some languages.
Change the offending exercises, so that this feature isn't needed: The simple, trustworthy non-solution.

Any other option?

The text was updated successfully, but these errors were encountered:

petertseng · 2017-03-11T08:54:11Z

Doh. And I know that I was at least partially involved in both:

Let me pretend for a moment that we have decided to accept duplication in pov/canonical-data.json, and see what that looks like.

$ grep '"tree"' pov/canonical-data.json | sort | uniq -c                                             
      6         "tree": "cousins",
      1         "tree": "deeply_nested",
      1         "tree": "kids",
      2         "tree": "large_flat",
      2         "tree": "simple",
      2         "tree": "singleton",

So we won't care if deeply_nested and kids go away, and we probably can survive repeating simple and large_flat. cousins would hurt. What are the cases that use it?

Can reroot a tree with cousins: All right, I guess we have to keep it here.
Errors if target does not exist in a large tree: we probably do not have to use cousins for that. We can use a smaller tree that is less painful to write out in full.
Can find path to cousin: OK, I guess we have to keep it here as well.
Can find path from nodes other than x: we most definitely do not have to use cousins for that. We can use a smaller tree that is less painful to write out in full.
Errors if destination does not exist: same; doesn't require cousins.
Errors if source does not exist: same; doesn't require cousins.

In general I was having some doubts about whether the extra indirection of "tree": was making the canonical-data.json file difficult to understand for humans (a test generator obviously would have no problem, given that it has been taught about how to deal with the indirection.

Given this, I could support removing the indirection in "pov" (using smaller trees instead of cousins where possible)

grep... what would someone think about just placing the three files (iliad.txt, midsummer-night.txt, paradise-lost.txt) as normal .txt files in this repo rather than embedding them in JSON? That would remove the need for the files section.

Alternatively, perhaps we think it is acceptable to embed smaller files within each individual case. It could be that we don't need to use those exact three files for all our tests. Being able to tailor the files to each test might help each test be a better check of the respective behaviours.

Even if we think we can solve the problem with these two exercises today, what if another exercise comes along that does in fact need the shared data? Will we accept then that the schema will have to have it, and the test generators will have to deal with it?

I also see that it will be pretty rare to have exercises that need this, which means that a test generator that simply doesn't understand this section will only miss out on a handful of exercises, and maybe that is acceptable for them.

However, it seems the meaning of such a section will be very exercise-dependent. That is surely a negative.

rbasso · 2017-03-11T09:07:46Z

Good analysis, @petertseng.

grep... what would someone think about just placing the three files (iliad.txt, midsummer-night.txt, paradise-lost.txt) as normal .txt files in this repo rather than embedding them in JSON?

Seems a good solution for this exercise.

Even if we think we can solve the problem with these two exercises today, what if another exercise comes along that does in fact need the shared data?

I'm trying to find an example where this is indeed needed.

Maybe that is a canonical-data smell. It could be the case that, instead of carefully designing data to test a specific thing, we are saving work by utilizing a bigger, shared data structure.

Makes sense?

ErikSchierboom · 2017-03-11T20:47:45Z

grep... what would someone think about just placing the three files (iliad.txt, midsummer-night.txt, paradise-lost.txt) as normal .txt files in this repo rather than embedding them in JSON?

Seems like a very reasonable solution.

Given this, I could support removing the indirection in "pov" (using smaller trees instead of cousins where possible)

I think smaller trees could actually be beneficial, as they can help people better focus on what the actual problem is instead of figuring out the structure.

All in all, I think the benefit of reducing duplication by introducing a new, shared data concept does not outweight its advantages. With the existing tests, I feel that we are better off just accepting the duplication, and when possible, make the duplication less by cutting away unnecessary bits of data.

rbasso · 2017-03-12T03:43:44Z

pov is solved, so the only thing we need to decide now is grep.

If we think that it is not worthy to patch the schema, we have ~~two~~ three two options:

Add the text files to x-common.
Duplicate the data and maybe simplify, as we did with pov.
Just put the files section in the comments, and leave it as it is. (edit)

I just checked and there is no exercise with additional files in x-common, which is a nice regularity. So, I'm leaning if favor of the ~~second~~ third option to avoid creating future problems.

Edit: The second option generates a huge file. It seems to make more sense to change as little as possible now and leave the problem for a future redesign of this exercise. Is anyone using a test generator for it?!

ErikSchierboom · 2017-03-12T08:57:41Z

@rbasso I'm in favor of the third option. This is a bit of an odd one, but I don't think we can fully encode all edge-cases without getting into trouble somewhere. Better to have one exception, than to make things more complex for maintaining (2) or less obvious (1).

rbasso · 2017-03-13T11:24:41Z

Is there something else to be decide here or can we close it?

petertseng · 2017-03-13T12:02:22Z

I think we can close.

there is a possibility for more focused cases in grep, but those can be a separate issue.

Reformat 'about' documentation for nextercism

rbasso mentioned this issue Mar 11, 2017

canonical-data.json: Compliance with JSON Schema #625

Closed

75 tasks

rbasso mentioned this issue Mar 12, 2017

grep: Make exercise schema-compliant #712

Merged

rbasso closed this as completed Mar 13, 2017

petertseng mentioned this issue Apr 17, 2017

word-search: add canonical data #755

Merged

emcoding pushed a commit that referenced this issue Nov 19, 2018

Merge pull request #699 from exercism/nextercism-about

80a1fec

Reformat 'about' documentation for nextercism

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

canonical-data.json: Shared data among test cases #699

canonical-data.json: Shared data among test cases #699

rbasso commented Mar 11, 2017 •

edited

Loading

petertseng commented Mar 11, 2017 •

edited

Loading

rbasso commented Mar 11, 2017

ErikSchierboom commented Mar 11, 2017

rbasso commented Mar 12, 2017 •

edited

Loading

ErikSchierboom commented Mar 12, 2017

rbasso commented Mar 13, 2017

petertseng commented Mar 13, 2017

canonical-data.json: Shared data among test cases #699

canonical-data.json: Shared data among test cases #699

Comments

rbasso commented Mar 11, 2017 • edited Loading

petertseng commented Mar 11, 2017 • edited Loading

rbasso commented Mar 11, 2017

ErikSchierboom commented Mar 11, 2017

rbasso commented Mar 12, 2017 • edited Loading

ErikSchierboom commented Mar 12, 2017

rbasso commented Mar 13, 2017

petertseng commented Mar 13, 2017

rbasso commented Mar 11, 2017 •

edited

Loading

petertseng commented Mar 11, 2017 •

edited

Loading

rbasso commented Mar 12, 2017 •

edited

Loading