Adapting jsiek's executable semantics tooling for commit. #237

Merged
merged 31 commits into from
Feb 19, 2021
5 changes: 5 additions & 0 deletions .bazeliskrc
@@ -0,0 +1,5 @@
# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
# Exceptions. See /LICENSE for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

USE_BAZEL_VERSION=latest
19 changes: 18 additions & 1 deletion .pre-commit-config.yaml
@@ -24,6 +24,8 @@ repos:
- id: mixed-line-ending
args: ['--fix=lf']
- id: trailing-whitespace
exclude: '^(.*/testdata/.*\.golden)$'

- repo: https://github.com/google/pre-commit-tool-hooks
rev: 1d04a2848ac54d64bd6474ccec69aac45fa88414 # frozen: v1.1.1
hooks:
@@ -34,7 +36,22 @@ repos:
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
exclude: '^(website/(firebase/.firebaserc|jekyll/(Gemfile.lock|theme/.*))|.bazelversion|compile_flags.txt|.*\.def)$'
- --custom_format
- '\.6c$'
- ''
- '// '
- ''
- --custom_format
- '\.lpp$'
- '/*'
- ''
- '*/'
- --custom_format
- '\.ypp$'
- ''
- '// '
- ''
exclude: '^(website/(firebase/.firebaserc|jekyll/(Gemfile.lock|theme/.*))|.bazelversion|compile_flags.txt|.*\.def|.*/testdata/.*\.golden)$'
- id: check-google-doc-style
- id: markdown-toc
- repo: https://github.com/codespell-project/codespell
2 changes: 2 additions & 0 deletions bazel/cc_toolchains/clang_cc_toolchain_config.bzl
@@ -352,6 +352,8 @@ def _impl(ctx):
# paths. Those might have a system installed libc++
# and we want to find the one next to our Clang.
"-L" + llvm_bindir + "/../lib",
# Link with pthread.
"-lpthread",
],
),
]),
5 changes: 5 additions & 0 deletions bazel/testing/BUILD
@@ -0,0 +1,5 @@
# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
# Exceptions. See /LICENSE for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

exports_files(["golden_test.sh"])
25 changes: 25 additions & 0 deletions bazel/testing/golden_test.bzl
@@ -0,0 +1,25 @@
# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
# Exceptions. See /LICENSE for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

"""Rule for a golden test."""

def golden_test(name, golden, subject, **kwargs):
    """Compares two files. Passes if they are identical.

    Args:
      name: Name of the build rule.
      subject: The generated file to be compared.
      golden: The golden file to be compared.
      **kwargs: Any additional parameters for the generated sh_test.
    """
    native.sh_test(
        name = name,
        srcs = ["//bazel/testing:golden_test.sh"],
        args = [
            "$(location %s)" % golden,
            "$(location %s)" % subject,
        ],
        data = [golden, subject],
        **kwargs
    )
36 changes: 36 additions & 0 deletions bazel/testing/golden_test.sh
@@ -0,0 +1,36 @@
#!/bin/bash
# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
# Exceptions. See /LICENSE for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

set -e -u -o pipefail

GOLDEN=$1
SUBJECT=$2

if [[ $# == 3 && $3 == "--update" ]]; then
  cp "${SUBJECT}" "${GOLDEN}"
  exit $?
fi

CMD=("diff" "-u" "${GOLDEN}" "${SUBJECT}")

if "${CMD[@]}"; then
  echo "PASS"
  exit 0
fi

cat <<EOT
When running under:
${TEST_SRCDIR}
the golden contents of:
${GOLDEN}
do not match generated target:
${SUBJECT}

To update the golden file, run the following:

bazel run ${TEST_TARGET} -- --update
EOT

exit 1
19 changes: 19 additions & 0 deletions docs/project/contribution_tools.md
@@ -23,6 +23,7 @@ contributions.
- [Cargo (optional)](#cargo-optional)
- [Main tools](#main-tools)
- [Bazel and Bazelisk](#bazel-and-bazelisk)
- [Bison and Flex](#bison-and-flex)
- [buildifier](#buildifier)
- [Clang and LLVM](#clang-and-llvm)
- [Ninja](#ninja)
@@ -151,6 +152,24 @@ Our recommended way of installing is:
brew install bazelisk
```

### Bison and Flex

[Bison](https://www.gnu.org/software/bison/) and
[Flex](https://github.com/westes/flex) are used to build executable semantics.
Although we may
[switch to a hermetic toolchain later](https://github.com/carbon-language/carbon-lang/issues/266),
a local install is currently required.

Our recommended way of installing is:

```bash
brew install bison flex
```

On macOS, you will need to explicitly add the installed paths to the `PATH`
environment variable so that the brew-installed versions are used instead of the
older Xcode-installed versions. The `brew` output includes instructions for this.

### buildifier

[Buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier) is
107 changes: 107 additions & 0 deletions executable_semantics/BUILD
@@ -0,0 +1,107 @@
# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
# Exceptions. See /LICENSE for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

# TODO(https://github.com/carbon-language/carbon-lang/issues/266):
# Migrate bison/flex usage to a more hermetic bazel build.

load("@rules_cc//cc:defs.bzl", "cc_binary", "cc_library")
load("//bazel/testing:golden_test.bzl", "golden_test")

cc_binary(
    name = "executable_semantics",
    srcs = ["main.cpp"],
    deps = [":syntax"],
)

cc_library(
    name = "syntax",
    srcs = [
        "syntax.tab.cpp",
        "syntax.yy.cpp",
        "syntax_helpers.cpp",
        "syntax_helpers.h",
    ],
    hdrs = ["syntax.tab.h"],
    # Disable warnings for generated code.
    copts = [
        "-Wno-unneeded-internal-declaration",
        "-Wno-unused-function",
        "-Wno-writable-strings",
Review comment from @dabrahams (Feb 11, 2021) on the line above:

Suggested change
-        "-Wno-writable-strings",
+        "-Wno-writable-strings",
+        "-std=c++14",

Without this change, uses of the "register" keyword in the bison output go from being warnings to being errors on MacOS.

With this change, and brew's bison first in the path, I can report success on MacOS!

Reply (Contributor Author):

Per discussion on Discord, I'm wary of this fix because it feels like a broad stroke. I think it'd probably be better to investigate options further, but I'd rather not block commit on this as the current state works for Linux.

Reply:

Why not take the broad stroke now to unblock Mac users (including the original author of the code) and open an issue to find a better fix?

Reply (Contributor Author):

To validate, I successfully built on mac with no changes. I do get a register-related build error if I use an older version of flex, the one included by xcode.

As I previously noted, a recent version of flex is required, just as with bison. Please ensure you're using a recent version.

I think we need to be really cautious about changing how builds work; it can become more difficult to fix later. While I think no change should be needed here, in the future, please provide build errors and rationale for changing code. Sometimes, by thinking these issues through, we can get to the root of the issue and arrive at a different solution.

Reply:

Thanks for doing the diligent work. It didn't occur to me that the old flex was the cause, but that explains everything.

    ],
    deps = [
        "//executable_semantics/ast:declaration",
        "//executable_semantics/ast:expression",
        "//executable_semantics/ast:expression_or_field_list",
        "//executable_semantics/interpreter",
    ],
)

genrule(
    name = "syntax_bison_srcs",
    srcs = ["syntax.ypp"],
    outs = [
        "syntax.tab.cpp",
        "syntax.tab.h",
    ],
    cmd = "bison " +
          "--output=$(location syntax.tab.cpp) " +
          "--defines=$(location syntax.tab.h) " +
          "$(location syntax.ypp)",
)

genrule(
    name = "syntax_flex_srcs",
    srcs = ["syntax.lpp"],
    outs = ["syntax.yy.cpp"],
    cmd = "flex " +
          "--outfile=$(location syntax.yy.cpp) " +
          "$(location syntax.lpp)",
)

EXAMPLES = [
    "block1",
    "break1",
    "choice1",
    "continue1",
    "fun_recur",
    "fun1",
    "fun2",
    "fun3",
    "fun4",
    "fun5",
    "fun6_fail_type",
    "funptr1",
    "match_int_default",
    "match_int",
    "match_type",
    "next",
    "pattern_init",
    "record1",
    "struct1",
    "struct2",
    "struct3",
    "tuple_assign",
    "tuple_match",
    "tuple1",
    "tuple2",
    "undef1",
    "undef2",
    "while1",
    "zero",
]

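[genrule(
    name = "%s_out" % e,
    srcs = ["testdata/%s.6c" % e],
    outs = ["testdata/%s.out" % e],
    # Capture both stdout and stderr, and ignore the exit status so that
    # programs that exit with an error still produce output to compare
    # against the golden file.
    cmd = "$(location executable_semantics) $< > $@ 2>&1 || true",
    tools = [":executable_semantics"],
) for e in EXAMPLES]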

[golden_test(
    name = "%s_test" % e,
    golden = "testdata/%s.golden" % e,
    subject = "testdata/%s.out" % e,
) for e in EXAMPLES]
133 changes: 133 additions & 0 deletions executable_semantics/README.md
@@ -0,0 +1,133 @@
# Executable Semantics

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

This directory contains a work-in-progress executable semantics. It started as
an executable semantics for Featherweight C and it is migrating into an
executable semantics for the Carbon language. It includes a parser, type
checker, and abstract machine.

This language currently includes several kinds of values: integers, booleans,
functions, and structs. A kind of safe union, called a `choice`, is in progress.
Regarding control flow, it includes if statements, while loops, break, continue,
and function calls; a variant of `switch`, called `match`, is in progress.

The grammar of the language matches the one in Proposal
[#162](https://github.com/carbon-language/carbon-lang/pull/162). The type
checker and abstract machine do not yet have a corresponding proposal.
Nevertheless, they are present here to help test the parser, but they should
not be considered definitive.

The parser is implemented using the flex and bison parser generator tools.

- [`syntax.lpp`](syntax.lpp): the lexer specification
- [`syntax.ypp`](syntax.ypp): the grammar

The parser translates program text into an abstract syntax tree (AST), defined
in the [ast](ast/) subdirectory.

The [type checker](interpreter/typecheck.h) defines what it means for an AST to
be a valid program. The type checker prints an error and exits if the AST is
invalid.

The parser and type checker together specify the static (compile-time)
semantics.

The dynamic (run-time) semantics is specified by an abstract machine. Abstract
machines have several positive characteristics that make them good for
specification:

- Abstract machines operate on the AST of the program (and not some
  lower-level representation such as bytecode), so they directly connect the
  program to its behavior.

- Abstract machines can easily handle language features with complex control
  flow, such as goto, exceptions, coroutines, and even first-class
  continuations.

The one downside of abstract machines is that they are not as simple as a
definitional interpreter (a recursive function that interprets the program).
However, complex control flow is more difficult to handle in a definitional
interpreter.
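To make the contrast concrete, here is a minimal C++ sketch of a definitional interpreter for a hypothetical language with only integer literals and `+`. The names `Exp`, `MakeInt`, `MakeAdd`, and `Eval` are illustrative assumptions, not part of the executable semantics code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical two-form AST (integer literal or `+`), for illustration only;
// it is not the AST defined in executable_semantics/ast.
struct Exp {
  enum class Kind { Int, Add } kind = Kind::Int;
  int value = 0;                        // meaningful when kind == Int
  std::shared_ptr<const Exp> lhs, rhs;  // meaningful when kind == Add
};

std::shared_ptr<const Exp> MakeInt(int v) {
  auto e = std::make_shared<Exp>();
  e->value = v;
  return e;
}

std::shared_ptr<const Exp> MakeAdd(std::shared_ptr<const Exp> l,
                                   std::shared_ptr<const Exp> r) {
  auto e = std::make_shared<Exp>();
  e->kind = Exp::Kind::Add;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Definitional interpreter: the C++ call stack itself remembers
// "what to do next", so there is no explicit to-do list that could be
// inspected or rearranged (which features like goto or exceptions need).
int Eval(const Exp& e) {
  switch (e.kind) {
    case Exp::Kind::Int:
      return e.value;
    case Exp::Kind::Add:
      return Eval(*e.lhs) + Eval(*e.rhs);
  }
  return 0;  // Unreachable: all kinds are handled above.
}
```

`Eval` is short precisely because the pending work is implicit in the recursion; that same implicitness is what makes non-local control flow awkward to express in this style.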

[InterpProgram()](interpreter/interpreter.h) runs an abstract machine using the
[interpreter](interpreter/), as described below.

The abstract machine implements a state-transition system. The state is defined
by the `State` structure, which includes three components: the procedure call
stack, the heap, and the function definitions. The `Step` function updates the
state by executing a little bit of the program. The `Step` function is called
repeatedly to execute the entire program.

An implementation of the language (such as a compiler) must be observationally
equivalent to this abstract machine. The notion of observation is different for
each language, and can include things like input and output. This language is
currently so simple that the only thing that is observable is the final result,
an integer. So an implementation must produce the same final result as the one
produced by the abstract machine. In particular, an implementation does **not**
have to mimic each step of the abstract machine and does not have to use the
same kinds of data structures to store the state of the program.

A procedure call frame, defined by the `Frame` structure, includes a pointer to
the function being called, the environment that maps variables to their
addresses, and a to-do list of actions. Each action corresponds to an expression
or statement in the program. The `Action` structure represents an action. An
action often spawns other actions that need to be completed first and
afterwards uses their results to complete its own work. To keep track of this
process, each action includes a position field `pos` that stores an integer that
starts at `-1` and increments as the action makes progress. For example, suppose
the action associated with an addition expression `e1 + e2` is at the top of the
to-do list:

    (e1 + e2) [-1] :: ...

When this action kicks off (in the `StepExp` function), it increments `pos` to
`0` and pushes `e1` onto the to-do list, so the top of the to-do list now looks
like:

    e1 [-1] :: (e1 + e2) [0] :: ...

Skipping over the processing of `e1`, it eventually turns into an integer value
`n1`:

    n1 :: (e1 + e2) [0] :: ...

Because there is a value at the top of the to-do list, the `Step` function
invokes `HandleValue` which then dispatches on the next action on the to-do
list, in this case the addition. The addition action spawns an action for
subexpression `e2`, increments `pos` to `1`, and remembers `n1`.

    e2 [-1] :: (e1 + e2) [1](n1) :: ...

Skipping over the processing of `e2`, it eventually turns into an integer value
`n2`:

    n2 :: (e1 + e2) [1](n1) :: ...

Again the `Step` function invokes `HandleValue` and dispatches to the addition
action which performs the arithmetic and pushes the result on the to-do list.
Let `n3` be the sum of `n1` and `n2`.

    n3 :: ...
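The walkthrough above can be sketched in C++ as follows. This is a simplified, hypothetical model rather than the actual interpreter: it handles only integer literals and `+`, keeps a single frame's to-do list as a `std::vector` whose `back()` is the top, and borrows the names `Step`, `StepExp`, and `HandleValue` only to mirror the description. `Action::pos` follows the same `-1`, `0`, `1` progression, and `results` plays the role of the remembered `n1`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical two-form AST, standing in for the real one.
struct Exp {
  enum class Kind { Int, Add } kind = Kind::Int;
  int value = 0;
  std::shared_ptr<const Exp> lhs, rhs;
};

std::shared_ptr<const Exp> MakeInt(int v) {
  auto e = std::make_shared<Exp>();
  e->value = v;
  return e;
}

std::shared_ptr<const Exp> MakeAdd(std::shared_ptr<const Exp> l,
                                   std::shared_ptr<const Exp> r) {
  auto e = std::make_shared<Exp>();
  e->kind = Exp::Kind::Add;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// One to-do list entry: either a finished value, or an expression whose
// progress is tracked by `pos` (starting at -1) plus the values of the
// subexpressions it has already received.
struct Action {
  bool is_value = false;
  int value = 0;                   // valid when is_value
  std::shared_ptr<const Exp> exp;  // valid when !is_value
  int pos = -1;
  std::vector<int> results;
};

using TodoList = std::vector<Action>;  // back() is the top of the list

Action ExpAction(std::shared_ptr<const Exp> e) {
  Action a;
  a.exp = std::move(e);
  return a;
}

Action ValAction(int v) {
  Action a;
  a.is_value = true;
  a.value = v;
  return a;
}

// An expression is on top: kick it off.
void StepExp(TodoList& todo) {
  Action& act = todo.back();
  if (act.exp->kind == Exp::Kind::Int) {
    int v = act.exp->value;
    todo.back() = ValAction(v);  // a literal is already a value
    return;
  }
  act.pos = 0;                                   // (e1 + e2) [0]
  std::shared_ptr<const Exp> e1 = act.exp->lhs;  // copy before push_back
  todo.push_back(ExpAction(e1));                 // e1 [-1] :: (e1 + e2) [0]
}

// A value is on top: hand it to the action underneath (always an
// addition in this sketch).
void HandleValue(TodoList& todo) {
  int v = todo.back().value;
  todo.pop_back();
  Action& add = todo.back();
  add.results.push_back(v);  // remember n1 (or n2)
  if (add.pos == 0) {
    add.pos = 1;  // (e1 + e2) [1](n1)
    std::shared_ptr<const Exp> e2 = add.exp->rhs;
    todo.push_back(ExpAction(e2));  // e2 [-1] :: (e1 + e2) [1](n1)
  } else {
    int sum = add.results[0] + add.results[1];
    todo.back() = ValAction(sum);  // n3 :: ...
  }
}

void Step(TodoList& todo) {
  if (todo.back().is_value) HandleValue(todo);
  else StepExp(todo);
}

// Call Step repeatedly until only the final value remains.
int Run(std::shared_ptr<const Exp> e) {
  TodoList todo;
  todo.push_back(ExpAction(std::move(e)));
  while (!(todo.size() == 1 && todo.back().is_value)) {
    Step(todo);
  }
  return todo.back().value;
}
```

For example, `Run(MakeAdd(MakeInt(1), MakeInt(2)))` passes through the same to-do list shapes as the walkthrough (with one extra step where each literal becomes a value) and returns `3`.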

The heap is an array of values. It is used not only for `malloc` but also to
store anything that is mutable, including function parameters and local
variables. A pointer is simply an index into the array. The `malloc` expression
causes the heap to grow (at the end) and returns the index of the last slot. The
dereference expression returns the nth value of the heap, as specified by the
dereferenced pointer. The assignment operation stores the value of the
right-hand side into the heap at the index specified by the left-hand side
lvalue.
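A minimal C++ sketch of the heap just described, assuming every value is a plain `int`. `Heap`, `Address`, `Malloc`, `Deref`, and `Assign` are hypothetical names, and starting fresh slots at `0` is an assumption of this sketch rather than the interpreter's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a pointer is just an index into the heap vector.
using Address = std::size_t;

class Heap {
 public:
  // `malloc`: grow the heap at the end and return the index of the new
  // last slot. (Assumption: fresh slots start out holding 0.)
  Address Malloc() {
    values_.push_back(0);
    return values_.size() - 1;
  }

  // Dereference: return the value stored at `addr`. `at` throws
  // std::out_of_range for an address past the end of the heap.
  int Deref(Address addr) const { return values_.at(addr); }

  // Assignment: store `v` at the address the left-hand side denotes.
  void Assign(Address addr, int v) { values_.at(addr) = v; }

 private:
  std::vector<int> values_;
};
```

Because allocation only appends, an address handed out earlier stays valid for as long as the heap exists.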

As you might expect, function calls push a new frame on the stack and the
`return` statement pops a frame off the stack. The parameter passing semantics
is call-by-value, so the machine applies `CopyVal` to the incoming arguments and
the outgoing return value. Also, the machine is careful to kill the parameters
and local variables when the function call is complete.
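The calling convention can be sketched the same way. In this hypothetical C++ model (`Machine`, `Frame`, `Call`, and `Return` are illustrative names, and copying an `int` stands in for `CopyVal`), each parameter gets a fresh heap slot holding a copy of its argument, and returning marks the frame's slots as dead before popping the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: values are ints, every variable lives in a heap
// slot, and a frame's environment maps variable names to heap addresses.
struct Frame {
  std::map<std::string, std::size_t> env;
};

struct Machine {
  std::vector<int> heap;
  std::vector<bool> alive;   // false once a slot has been killed
  std::vector<Frame> stack;  // back() is the current frame

  // Function call: push a new frame and copy each argument into a fresh
  // heap slot, so the callee cannot modify the caller's variables.
  // Assumes params.size() == args.size().
  void Call(const std::vector<std::string>& params,
            const std::vector<int>& args) {
    Frame frame;
    for (std::size_t i = 0; i < params.size(); ++i) {
      heap.push_back(args[i]);
      alive.push_back(true);
      frame.env[params[i]] = heap.size() - 1;
    }
    stack.push_back(frame);
  }

  // `return`: kill the current frame's variables, pop the frame, and
  // pass the (copied) result back to the caller.
  int Return(int result) {
    for (const auto& entry : stack.back().env) {
      alive[entry.second] = false;
    }
    stack.pop_back();
    return result;
  }
};
```

Since the callee only ever writes to its own slot, a caller variable passed as an argument keeps its value after the call, which is the observable consequence of call-by-value.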

The [`testdata/`](testdata/) subdirectory includes some example programs with
golden output.