lint(track_config): add more checks of exercise slugs #417

ee7 · 2021-09-07T13:14:43Z

With this PR, configlet lint now checks that a track-level
config.json file follows the below rules:

- The exercises.concept[].slug value must be unique in exercises.concept[].slug and may not exist in exercises.practice[].slug
- The exercises.practice[].slug value must be unique in exercises.practice[].slug and may not exist in exercises.concept[].slug
- There must be exactly one exercises.practice[].slug value that is the string hello-world
- The exercises.foregone values must not match any of the concept or practice exercise slugs
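
For readers who want to see the four rules in one place, here is a minimal Nim sketch of how they could be checked with `HashSet`s. It is not the code from this PR: the `TrackExercises` type, its field names, the `checkSlugs` proc and the message wording are all hypothetical, and it assumes the slugs have already been extracted from `config.json`.

```nim
import std/[sets, strformat]

type
  # Hypothetical, simplified stand-in for the parsed track-level config.json.
  TrackExercises = object
    conceptSlugs: seq[string]   # values of exercises.concept[].slug
    practiceSlugs: seq[string]  # values of exercises.practice[].slug
    foregone: seq[string]       # values of exercises.foregone

proc checkSlugs(e: TrackExercises): seq[string] =
  ## Returns one message per violation; an empty seq means every check passed.
  var seenConcept = initHashSet[string](e.conceptSlugs.len)
  var seenPractice = initHashSet[string](e.practiceSlugs.len)

  for slug in e.conceptSlugs:
    # containsOrIncl returns true when the slug was already in the set.
    if seenConcept.containsOrIncl(slug):
      result.add(&"duplicate Concept Exercise slug: {slug}")

  var helloWorldCount = 0
  for slug in e.practiceSlugs:
    if seenPractice.containsOrIncl(slug):
      result.add(&"duplicate Practice Exercise slug: {slug}")
    if slug in seenConcept:
      result.add(&"slug used for both a Concept and a Practice Exercise: {slug}")
    if slug == "hello-world":
      inc helloWorldCount

  if helloWorldCount != 1:
    result.add(&"expected exactly one hello-world slug, found {helloWorldCount}")

  for slug in e.foregone:
    if slug in seenConcept or slug in seenPractice:
      result.add(&"foregone slug is also an implemented exercise: {slug}")

# Example input with two violations: `leap` appears in both lists,
# and `lasagna` is both implemented and foregone.
let msgs = checkSlugs(TrackExercises(
  conceptSlugs: @["lasagna", "leap"],
  practiceSlugs: @["hello-world", "leap"],
  foregone: @["lasagna"]))
for m in msgs:
  echo m
```

Collecting messages instead of stopping at the first failure is a choice made for this sketch, so that a single run reports every problem.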

At the time of writing (2021-09-07T13:37:00Z), this PR produces the following diff to the output of configlet lint, per track:

julia

https://github.com/exercism/julia/blob/0a134640aeed/config.json#L102-L296

+The slug `leap` is used for both a Concept Exercise and a Practice Exercise, but must only appear once on the track:
+./config.json
+

Looks like Sascha has a PR for it: exercism/julia#365

With this commit, `configlet lint` now checks that a track-level `config.json` file follows the below rules:

- The `exercises.concept[].slug` value must be unique in `exercises.concept[].slug` and may not exist in `exercises.practice[].slug`
- The `exercises.practice[].slug` value must be unique in `exercises.practice[].slug` and may not exist in `exercises.concept[].slug`
- There must be exactly one `exercises.practice[].slug` value that is the string `hello-world`

ErikSchierboom

Excellent

ErikSchierboom · 2021-09-07T14:53:18Z

src/lint/track_config.nim

+  ##   only exists once on the track.
+  ## - There is exactly one Practice Exercise with the slug `hello-world`.
+  ## - The `foregone` array does not contain a slug of an implemented exercise.
+  var conceptExerciseSlugs = initHashSet[string](200)


Is the 200 just an initial size, or is it a maximum?

It's an initial size - how many items the HashSet should be able to hold without growing. But, like Nim sequences, a HashSet expands automatically if you keep adding items.

In case you're interested, the implementation details for HashSet are here:

lib/pure/collections/sets.nim

lib/pure/collections/setimpl.nim

lib/pure/collections/hashcommon.nim

We allocate an initial number of "slots" using the formula theIntegerThatYouPass * 3 div 2 + 4, and in general the HashSet expands when it's more than two-thirds full.
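
To make the arithmetic concrete, here is a small sketch that evaluates the formula for a couple of sizes. The proc names are hypothetical. The rounding up to a power of two in `slotsFor` is an assumption based on my reading of `slotsNeeded` in hashcommon.nim, not something stated above; for an initial size of 3 the two values coincide.

```nim
import std/math

proc rawSlots(initialSize: Natural): int =
  ## The formula exactly as quoted: initialSize * 3 div 2 + 4.
  initialSize * 3 div 2 + 4

proc slotsFor(initialSize: Natural): int =
  ## Assumption: the stdlib rounds the raw value up to a power of two.
  nextPowerOfTwo(rawSlots(initialSize))

echo rawSlots(3), " -> ", slotsFor(3)      # 8 -> 8
echo rawSlots(200), " -> ", slotsFor(200)  # 304 -> 512
```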

As a quick demonstration: let's try initHashSet(3) (8 slots, by the above calculation), then insert 5 items, and print the representation:

    import std/sets

    var s = initHashSet[string](3)
    s.incl "aaa"
    s.incl "bbb"
    s.incl "ccc"
    s.incl "ddd"
    s.incl "eee"
    echo s.repr

The output looks something like the below.

    [
      data = 0x70000000c050@[
        [Field0 = 735026264, Field1 = 0x70000000d0c0"ccc"],
        [Field0 = 2446311049, Field1 = 0x70000000d090"bbb"],
        [Field0 = 4095589471, Field1 = 0x70000000d0f0"ddd"],
        [Field0 = 1505603105, Field1 = 0x70000000d120"eee"],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 3033554871, Field1 = 0x70000000d060"aaa"]
      ],
      counter = 5
    ]

We can see that a HashSet[string] is just an object (again, see the implementation in lib/pure/collections/sets.nim) with:

- A data field of type seq[tuple[hcode: Hash, key: string]], where Hash is just an int
- A counter field, corresponding to the number of included elements

Now if we add another item, and print the representation again:

    s.incl "fff"
    echo s.repr

We see something like:

    [
      data = 0x70000000e050@[
        [Field0 = 966543936, Field1 = 0x70000000dab0"fff"],
        [Field0 = 1505603105, Field1 = 0x70000000da50"eee"],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 3033554871, Field1 = 0x70000000da80"aaa"],
        [Field0 = 735026264, Field1 = 0x70000000d9c0"ccc"],
        [Field0 = 2446311049, Field1 = 0x70000000d9f0"bbb"],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 0, Field1 = ""],
        [Field0 = 4095589471, Field1 = 0x70000000da20"ddd"]
      ],
      counter = 6
    ]

In particular:

- The HashSet grows as we insert the 6th item
- The resulting HashSet occupies double the memory (because growthFactor == 2)
- (If you stare hard enough) the pointer to the HashSet changes - we allocated a new block of memory

The rehashing condition is (t.dataLen * 2 < t.counter * 3) or (t.dataLen - t.counter < 4). An initialSize of 3 produces a dataLen of 8, so the above condition becomes true when t.counter is 5. With non-tiny sets, we rehash when the HashSet is more than two-thirds full.
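
The condition is easy to check by hand, but a short sketch may help. The two procs below are hypothetical standalone helpers that apply the same boolean expression to plain integers instead of a real HashSet; `firstRehashCount` returns the smallest `counter` for which the condition holds at a given `dataLen`. The 512-slot case is included only to illustrate the two-thirds behaviour for a non-tiny set.

```nim
proc mustRehash(dataLen, counter: int): bool =
  ## Same expression as quoted above, on plain integers.
  (dataLen * 2 < counter * 3) or (dataLen - counter < 4)

proc firstRehashCount(dataLen: int): int =
  ## Smallest counter at which mustRehash becomes true, or -1 if none.
  for counter in 0 .. dataLen:
    if mustRehash(dataLen, counter):
      return counter
  result = -1

echo firstRehashCount(8)    # 5: the second clause fires, since 8 - 5 < 4
echo firstRehashCount(512)  # 342: the first clause fires, just over two-thirds of 512
```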

Needless allocations are an important source of performance problems, so it's a good optimization to avoid growing strings/sequences/HashSets etc when they're large and you're doing a lot of work. But of course, it doesn't matter at all for this PR.

Cool, thanks.

ee7 requested a review from ErikSchierboom as a code owner September 7, 2021 13:14

lint(track_config): add foregone check

6e46110

With this commit, `configlet lint` now checks that a track-level `config.json` file follows the below rule:

- The `exercises.foregone` values must not match any of the concept or practice exercise slugs

ErikSchierboom approved these changes Sep 7, 2021

ee7 added 2 commits September 7, 2021 20:10

lint(track_config): improve HashSet initial size

b0dd888

This signals to the reader that we know the size of the HashSet in advance.

lint(track_config): prefer tighter parameter

45557f9

This makes it clearer that this proc doesn't use `trackConfig.concepts`.

ee7 merged commit ee64be5 into exercism:main Sep 8, 2021

ee7 deleted the lint-track-config-check-exercise-slugs branch September 8, 2021 09:00
