Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lint(track_config): add more checks of exercise slugs #417

Merged
merged 4 commits into from
Sep 8, 2021

Conversation

ee7
Copy link
Member

@ee7 ee7 commented Sep 7, 2021

With this PR, configlet lint now checks that a track-level
config.json file follows the below rules:

  • The exercises.concept[].slug value must be unique in
    exercises.concept[].slug and may not exist in
    exercises.practice[].slug
  • The exercises.practice[].slug value must be unique in
    exercises.practice[].slug and may not exist in
    exercises.concept[].slug
  • There must be exactly one exercises.practice[].slug value that is
    the string hello-world
  • The exercises.foregone values must not match any of the concept or
    practice exercise slugs

At the time of writing (2021-09-07T13:37:00Z), this PR produces the following diff to the output of configlet lint, per track:

julia

https://github.com/exercism/julia/blob/0a134640aeed/config.json#L102-L296

+The slug `leap` is used for both a Concept Exercise and a Practice Exercise, but must only appear once on the track:
+./config.json
+

Looks like Sascha has a PR for it: exercism/julia#365

With this commit, `configlet lint` now checks that a track-level
`config.json` file follows the below rules:

- The `exercises.concept[].slug` value must be unique in
  `exercises.concept[].slug`  and may not exist in
  `exercises.practice[].slug`
- The `exercises.practice[].slug` value must be unique in
  `exercises.practice[].slug` and may not exist in
  `exercises.concept[].slug`
- There must be exactly one `exercises.practice[].slug` value that is
  the string `hello-world`
With this commit, `configlet lint` now checks that a track-level
`config.json` file follows the below rule:

- The `exercises.foregone` values must not match any of the concept or
  practice exercise slugs
Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent

## only exists once on the track.
## - There is exactly one Practice Exercise with the slug `hello-world`.
## - The `foregone` array does not contain a slug of an implemented exercise.
var conceptExerciseSlugs = initHashSet[string](200)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 200 just an initial size, or is it a maximum?

Copy link
Member Author

@ee7 ee7 Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an initial size - how many items the HashSet should support. But, like Nim sequences, a HashSet "expands" automatically if you keep adding items.


In case you're interested, the implementation details for HashSet are here:

We allocate an initial number of "slots" using the formula theIntegerThatYouPass * 3 div 2 + 4, and in general the HashSet expands when it's more than two-thirds full.

As a quick demonstration: let's try initHashSet(3) (8 slots, by the above calculation), then insert 5 items, and print the representation:

import std/sets

var s = initHashSet[string](3)
s.incl "aaa"
s.incl "bbb"
s.incl "ccc"
s.incl "ddd"
s.incl "eee"
echo s.repr

The output looks something like the below.

[
  data = 0x70000000c050@[
    [Field0 = 735026264, Field1 = 0x70000000d0c0"ccc"],
    [Field0 = 2446311049, Field1 = 0x70000000d090"bbb"],
    [Field0 = 4095589471, Field1 = 0x70000000d0f0"ddd"],
    [Field0 = 1505603105, Field1 = 0x70000000d120"eee"],
    [Field0 = 0, Field1 = ""],
    [Field0 = 0, Field1 = ""],
    [Field0 = 0, Field1 = ""],
    [Field0 = 3033554871, Field1 = 0x70000000d060"aaa"]
  ],
  counter = 5
]

We can see that a HashSet[string] is just an object (again, see the implementation here) with:

  • A data field of type seq[tuple[hcode: Hash, key: string]], where Hash is just an int
  • A counter field, corresponding to the number of included elements

Now if we add another item, and print the representation again:

s.incl "fff"
echo s.repr

We see something like:

[
  data = 0x70000000e050@[
    Field0 = 966543936, Field1 = 0x70000000dab0"fff"],
    Field0 = 1505603105, Field1 = 0x70000000da50"eee"],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 3033554871, Field1 = 0x70000000da80"aaa"],
    Field0 = 735026264, Field1 = 0x70000000d9c0"ccc"],
    Field0 = 2446311049, Field1 = 0x70000000d9f0"bbb"],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 0, Field1 = ""],
    Field0 = 4095589471, Field1 = 0x70000000da20"ddd"]
  ],
  counter = 6
]

In particular:

  • The HashSet grows as we insert the 6th item
  • The resulting HashSet occupies double the memory (because growthFactor == 2)
  • (If you stare hard enough) the pointer to the HashSet changes - we allocated a new block of memory
  • The rehashing condition is (t.dataLen * 2 < t.counter * 3) or (t.dataLen - t.counter < 4). An initialSize of 3 produces a dataLen of 8, so the above condition becomes true when t.counter is 5. With non-tiny sets, we rehash when the HashSet is more than two-thirds full.

Needless allocations are an important source of performance problems, so it's a good optimization to avoid growing strings/sequences/HashSets etc when they're large and you're doing a lot of work. But of course, it doesn't matter at all for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks.

This signals to the reader that we know the size of the HashSet in
advance.
This makes it clearer that this proc doesn't use `trackConfig.concepts`.
@ee7 ee7 merged commit ee64be5 into exercism:main Sep 8, 2021
@ee7 ee7 deleted the lint-track-config-check-exercise-slugs branch September 8, 2021 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants