Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamps primus lisp type checker #1047

Merged

Conversation

ivg
Copy link
Member

@ivg ivg commented Feb 6, 2020

This PR fixes the long-broken Primus Lisp typechecker and fixes errors that the type checker was able to find. It also adds the --primus-lisp-type-check so that you can check the types without running the program.

Historical Context

Primus Lisp type checker was an experiment for building a soft type inference with intersecting types. It didn't work well and the error messages were useless, so it was more or less disabled. However, we have recently discovered, that (a) not all our Lisp codes are well-typed (no surprises) and (b) that ill-typed Lisp code not always evaluates to a type error, but instead exhibits weird behavior, e.g., when a lexically scoped suddenly becomes dynamically scoped. In other words, we couldn't trust our lisps anymore.

To reestablish the trust we decided to quickly fix the type checker. This (having to do this quickly) forced us to drop intersecting types and implement more or less common HM style inference (with a diverging type 'a.'a for gradual typing). The type checker still a flow-based and runs on the callgraph. We also added a couple of checks:

  1. for unbound variables
  2. for unresolved functions
  3. for unresolved overloaded functions
  4. for unresolved signals
  5. for not matching method for a signal
  6. for an unbound parameter in a let-binding

The error messages are now much more readable and point to the origin of the problem, not miles away. It also features nice highlighting.

To introduce more type sinks into the system (and find more type errors) we add type signatures to signals (methods are now also type-checked) and a translation of subroutine arguments into Lisp signatures (ideally, we should be able to parse header files, but so far we just rely on whatever is in program.

The fixed type-checker found a few errors, which we fixed :)

ivg added 30 commits January 30, 2020 16:25
Tasks left to be done:
1) still an open question how to deal with user type variables
2) unresolved variables are not reported
3) unresolved function calls are not reported
4) error message really sucks (at best we have location)
A type variable is quantified over the scope of its defining
expression, i.e., for each expression that has a non-ground
signature we instantiate a new type variable of the form `x%m`
where `x` is the name of the type variable and `m` is the id
of the expression.
Well, at least they are more sane than were before :)

1) we track all types that were used during unification and dump them
to users in hope that this will clarify them the issue

2) we disallow unresolved functions and variables and immediately
lower the type to bottom if we see one.

3) the error messages are now more explonatory and include source code
with the relevenat context and highlighting.
also, instead of opening a file, we just keep all the read data in the
source repository.
it looks like that having disjoint types is not really useful right
now in Primus Lisp, so we decided to switch to the classical inference
with conjunctive types (i.e., with only meet operator, no join).

It is still the gradual type as we allow the `any` type which relaxes
the typing discipline and allows ill-typed expressions. For example,
the `concat` primitives returns `any` as it is dependetly typed. In
the same vain, we relax the typing of the `if` form and do not unify
the branches, so it returns `any` (so it types depends on the
condition value).

The switch to the simple scheme enables much more readable and useful
error messages, and we can still type all our lisp programs (modulo real
type errors).
E.g., we should mips init, when we're checking for x86 (we will get
lots of unbound errors).
```
(defun init (main argc argv auxv)
  "GNU libc initialization stub"
  (declare (external "__libc_start_main")
           (context (arch "mips")))
  (set T9 main)
  (set RA @__libc_csu_fini)
  (exit-with (invoke-subroutine main argc argv)))
```
It was allowed to set an undefined variable, which is probably an
error.

why, probably? The Primus.Env semantics let's us create variables on
the fly. Whether it is used or not, good or not is a second question,
so far let's try to disable this on the typing level and later we will
consider whether we need this in runtime.
(whether a method has a corresponding signal is not yet checked,
comming soon)
It is now called automatically after components but before init and
type errors are reported as observations. We don't want right now to
stop processing in case of type errors, just in case if we still have
some false positives.
the primitives shall not be registered conditionally
They were so far referring to unresolved primitives, now the type
checker says no, so I've reimplemented them not as a primitive,
but as a big overload.
1) there is no such function as coerce
2) implement atoi using the new language specific size functions.
and fixes the documentation for `when` and `until`.
Previously it was an ill-typed piece of code as we didn't pass the
size. We employ parameters to set the size, and default to one word.

Parameters are dynamically bound, so you can do as

```lisp
(let ((*indirect-taint-size* 100))
   (taint-introduce 'indirectly k v))
```
ivg added 6 commits February 6, 2020 14:22
this signal is defined in ddtbd, which is not yet integrated into BAP.
the `strchr` function takes an integer so we have to explicitly case
chars to it.
It will typecheck Primus Lisp programs without the neccesity to run
them (so far we need to run a binary to trigger the typechecker, which
was quite unintuitive).

All error messages will be printed with a conclusive message about the
quality of the program :)

Note, that when a program is run it will be automatically typechecked
and the errors will be printed, so there is no need to use this
option, every time you need to run a program.

Note, for those who prefer to type check, there is also the
`--primus-lisp-type-check` alias. But as usual, the least unambiguous
prefix will work, e.g., `--primus-lisp-type` would suffice.
@ivg ivg merged commit 4bae192 into BinaryAnalysisPlatform:master Feb 10, 2020
@ivg ivg deleted the revamps-primus-lisp-type-checker branch June 10, 2020 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant