Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IH-633: Transition away from exception-raising Hashtbl.find and Unix.getenv #5751

Merged

Conversation

last-genius
Copy link
Contributor

@last-genius last-genius commented Jun 27, 2024

This PR starts the transition away from exception-raising OCaml's stdlib functions Unix.getenv and Hashtbl.find in favour of Sys.getenv_opt and Hashtbl.find_opt respectively.

This is beneficial because:

  • It handles failure explicitly (in a less expensive way than with exceptions, if failure was handled before in try-catch blocks), as an element of the interface, not a source code/documentation issue.
  • It avoids two traversals where .find functions where gated by .mem functions to avoid raising exceptions, instead doing just one.

A naive benchmark shows the benefit:

┌───────────────────────────────────────────────┬──────────┐
│ Name                                          │ Time/Run │
├───────────────────────────────────────────────┼──────────┤
│ id                                            │   0.93ns │
│ Option.value (Hashtbl.find_opt) ~default      │  13.58ns │
│ Hashtbl.mem then find                         │  27.05ns │
└───────────────────────────────────────────────┴──────────┘
  • It avoids a potential data race between checking if an element is present (.mem) and getting its value (.find)
  • It avoids a potential human error where .mem would be changed but its corresponding .find would not be.

===========

This PR is structured in the following way:

  • There is one commit for the Unix.getenv -> Sys.getenv_opt change along with the quality gate test for it. Unix.getenv usage has been completely eliminated.
  • There is one commit for trivial cases of Hashtbl.find -> find_opt
  • There are two commits for more complex refactoring for Hashtbl.find -> find_opt, in one case I've eliminated unnecessary Hashtbl operations altogether.
  • There is one commit for the quality gate test for Hashtbl.find - there are 36 usages left that do not handle exceptions currently and would need to introduce it on the appropriate level.

Even though the PR is relatively large, most changes are trivial and fall into several classes:

  • Exception already handled
    • Simple try-catch to provide default value => transition to Option.value
    • Some logic in try-catch => transition to match statement instead
    • Re-implementation of _opt by doing try Some (x.find) with _ -> None => just x.find_opt
  • .find gated by .mem => transition to match statement / Option.iter / Option.map depending on the case

===========

These changes passed the BST+BVT test suites.

quality-gate.sh Fixed Show fixed Hide fixed
@last-genius last-genius force-pushed the private/asultanov/opt-refactoring branch 2 times, most recently from e1759ac to a842901 Compare June 27, 2024 15:36
@last-genius last-genius marked this pull request as ready for review June 27, 2024 18:09
ocaml/libs/stunnel/stunnel.ml Outdated Show resolved Hide resolved
ocaml/libs/stunnel/stunnel_cache.ml Outdated Show resolved Hide resolved
ocaml/libs/stunnel/stunnel_cache.ml Outdated Show resolved Hide resolved
ocaml/networkd/bin/network_monitor_thread.ml Outdated Show resolved Hide resolved
ocaml/xapi-idl/lib/xcp_service.ml Show resolved Hide resolved
ocaml/xapi/binpack.ml Outdated Show resolved Hide resolved
| None ->
raise
(Api_errors.Server_error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define internal_error fmt in helpers and use that.

let fail fmt =
    Printf.kprintf
      (fun msg -> raise Api_errors.(Server_error (import_error_generic, [msg])))
      fmt

The above code is not raising internal_error but demonstrates how to define a function that raises an exception and takes printf-style arguments. You can use that to simplify other places that raise internal errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is a similar function defined in xenops_server.ml, for example:

let internal_error fmt =
  Printf.kprintf
    (fun str ->
      error "%s" str ;
      raise (Xenopsd_error (Internal_error str))
    )
    fmt

Will open a separate ticket for this --- there are quite a few such cases with internal Printf.sprintf

@last-genius
Copy link
Contributor Author

Due to the size of this PR, I will be pushing fixup commits to ease the review process. Will squash these at the end.

Copy link
Contributor

@contificate contificate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Pedantically, there is a case where you can η-reduce (fun s -> Stunnel.disconnect s). - nevermind, we shouldn't be making unrelated changes here (I guess do them if you want so long as they are obviously correct and won't impair review?).

Also, if you want to be consistent with using Option.map and friends to outline casing behaviour, there are a few cases where fold could be used, e.g.:

let lookup table r =
  match Hashtbl.find_opt table r with
  | Some x ->
      Ref.of_string x
  | None ->
      Ref.null

could become

let lookup table r =
  Hashtbl.find_opt table r
  |> Option.fold ~none:Ref.null ~some:Ref.of_string

The same change is applicable to attempt_restart, to_wait, etc. This is largely down to individual preference though, I think the explicit match is completely acceptable.

I appreciate these changes. I have no doubt that you've probably mentally recorded other changes that could be made within the vicinity of these changes (but correctly didn't do them here as they're outwith the scope of this changeset): I suggest we look into documenting the potential further changes (of similar vein to these changes) explicitly somewhere; as there's many good stdlib API improvements - of a similar scope to these changes - available to us once we manage to bump the OCaml version up. There was a suggestion to make the quality gate a bit more flexible using semantic grepping, that could be an avenue to categorise all of these improvements (beyond that which is given to us by quality gate which can't capture every case).

ocaml/xapi/db_gc.ml Outdated Show resolved Hide resolved
ocaml/xapi/db_gc.ml Outdated Show resolved Hide resolved
ocaml/xcp-rrdd/bin/rrdd/rrdd_monitor.ml Outdated Show resolved Hide resolved
ocaml/xcp-rrdd/bin/rrdd/rrdd_server.ml Show resolved Hide resolved
ocaml/xenopsd/xc/readln.ml Show resolved Hide resolved
@last-genius
Copy link
Contributor Author

last-genius commented Jul 1, 2024

Pedantically, there is a case where you can η-reduce (fun s -> Stunnel.disconnect s). - nevermind, we shouldn't be making unrelated changes here (I guess do them if you want so long as they are obviously correct and won't impair review?).

Nice catch! I've been trying to keep this PR to changes that do not change the logic of code -- this one does not, however, a simple grep indicates there are at least 50 such cases in the codebase!

$ unbuffer rg 'fun\s+(\w+)\s+->\s+\w+\s+\1\)' --pcre2 --no-heading | tee >(wc -l)
[....]
50

Might just as well put this in a separate ticket/PR - though there is little practical benefit, it gets boiled down to the same code

Also, if you want to be consistent with using Option.map and friends to outline casing behaviour, there are a few cases where fold could be used, e.g.:

I myself find that the parseability of match is better than that of Option.fold -- perhaps it's just a question of style and preference. match is not so nice when one of the arms is heavy and the other is empty, so that's why I've used .map and .iter to immediately indicate the intention without having to search for the other arm.

@contificate
Copy link
Contributor

Might just as well put this in a separate ticket/PR - though there is little practical benefit, it gets boiled down to the same code

Indeed, it's only a stylistic concern; this case is a minor one but there are areas of the codebase where eta-reduction could be applied several times, which cleans things up a lot, visually (typical examples would be with List.fold_left). It's one of the more pedantic style changes, but it's generally accepted as good practice (so long as you don't get carried away with point-free style and get caught out by the value restriction).

@last-genius last-genius force-pushed the private/asultanov/opt-refactoring branch from 0548ee7 to 3ac45d4 Compare July 2, 2024 11:55
@last-genius
Copy link
Contributor Author

Had to rebase on top of master to get the Clock module, so hashes have changed. Otherwise review fixes are still in separate fixup commits.

@last-genius last-genius force-pushed the private/asultanov/opt-refactoring branch from 3ac45d4 to 88ba347 Compare July 2, 2024 11:56
@coveralls

This comment was marked as duplicate.

@xapi-project xapi-project deleted a comment from coveralls Jul 2, 2024
@lindig
Copy link
Contributor

lindig commented Jul 4, 2024

Has your comment being addressed, @psafont? Please revisit this such that we can merge this.

@@ -126,9 +126,13 @@ let canonicalise x =
if not (Filename.is_relative x) then
x
else (* Search the PATH and XCP_PATH for the executable *)
let paths = Re_str.split colon (Sys.getenv "PATH") in
let paths =
Re_str.split colon (Option.value (Sys.getenv_opt "PATH") ~default:"")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for next time: Re is not thread-safe, replace it if it's easy to do so

@psafont
Copy link
Member

psafont commented Jul 4, 2024

Please squash the commits

@last-genius last-genius force-pushed the private/asultanov/opt-refactoring branch from dd27180 to b2eeff2 Compare July 4, 2024 12:44
@last-genius
Copy link
Contributor Author

Squashed the fixes. Commits are still split into several as was suggested by Christian initially, so that we could revert only one if needed later (especially in the case of the more complex refactored files)

explicit handling of failure cases.

OCaml's stdlib has Sys.getenv_opt since 4.05. Some of the newer code
already uses it, and some of the old code handled exceptions (so
could nicely be transitioned to handling options instead). Some,
however, did not handle failure at all. This commit remedies that.

In most cases, getenv is used to query the PATH variable (before
adding another directory to it, for example), in which case there is
a nice default value of "". In some cases, the environment variable
is required to be present to proceed, then there is a failure of some
kind raised with the appropriate message.

A test case was added to the quality-gate.sh script to prevent
introduction of the exception-raising Unix.getenv into new code.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
This avoids two traversals in the cases where Hashtbl.mem is used
right before Hashtbl.find: avoiding two traversals,
possible data races and the possibility where one
would be changed without the other, introducing bugs.

Additionally, it handles failure explicitly where it wasn't
handled before, and moves from exception handling to matching on options
resulting in intentions becoming clearer.

This commit only changes trivial cases where little refactoring was
necessary.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
@last-genius last-genius force-pushed the private/asultanov/opt-refactoring branch from b2eeff2 to d4be15e Compare July 4, 2024 12:45
@xapi-project xapi-project deleted a comment from coveralls Jul 4, 2024
Signed-off-by: Andriy Sultanov <53952748+last-genius@users.noreply.github.com>
@last-genius last-genius merged commit caff014 into xapi-project:master Jul 8, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants