Merge pull request #165 from bbrk24/assembly-syntax
Interpret assembly syntax
bbrk24 authored Dec 7, 2024
2 parents 4707bbf + 07ae32d commit c04ee21
Showing 36 changed files with 1,383 additions and 743 deletions.
1 change: 1 addition & 0 deletions .clang-format
@@ -37,3 +37,4 @@ WhitespaceSensitiveMacros:
- testcase
- testgroup
- TEST_ITEM
- DESTRINGIFY_NAME
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file. See [Keep a

## Unreleased

### Added

- The interpreter can now handle assembly syntax, via the `-A` flag on the command line or a checkbox in the online interpreter. Interactive debugging of assembly is not yet supported in the web interpreter.

### Changed

- Fixed C++23 support
31 changes: 31 additions & 0 deletions README.md
@@ -17,6 +17,7 @@ Trilangle is a 2-D, stack-based programming language inspired by [Hexagony].
> - [Interpreter flags](#interpreter-flags)
> - [Exit codes](#exit-codes)
> - [The disassembler](#the-disassembler)
> - [Assembly syntax](#assembly-syntax)
> - [C compiler](#c-compiler)
> - [Sample programs](#sample-programs)
> - [cat](#cat)
@@ -225,6 +226,36 @@ For example, when passing [the cat program below](#cat) with the flags `-Dn`, th
2.2: EXT
```

### Assembly syntax

Trilangle can interpret this syntax as well as produce it. Output generated with `--hide-nops` is currently not guaranteed to be interpretable, because it may be missing jump targets. Each line consists of at most a label, an instruction, and a comment, in that order. The syntax is described by the following extended Backus-Naur form:

```
program = line, {newline, line};
line = [label, [":"]], [multiple_whitespace, instruction], {whitespace}, [comment];
newline = ? U+000A END OF LINE ?;
tab = ? U+0009 CHARACTER TABULATION ?;
whitespace = " " | ? U+000D CARRIAGE RETURN ? | tab;
non_whitespace = ? Any single unicode character not in 'newline' or 'whitespace' ?;
multiple_whitespace = whitespace, {whitespace};
label = non_whitespace, {non_whitespace};
comment = ";", {non_whitespace | whitespace};
instruction = instruction_with_target | instruction_with_argument | plain_instruction;
instruction_with_target = ("BNG" | "TSP" | "JMP"), multiple_whitespace, label;
instruction_with_argument = ("PSI" | "PSC"), multiple_whitespace, number_literal;
plain_instruction = ? Any three-character instruction besides the five already covered ?;
number_literal = character_literal | decimal_literal | hex_literal;
character_literal = "'", (non_whitespace | tab), "'";
decimal_literal = "#", decimal_digit;
hex_literal = "0x", hex_digit, {hex_digit};
decimal_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
hex_digit = "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F" | decimal_digit;
```
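
To make the `number_literal` rule concrete, here is a minimal C++ sketch of a parser for it. The name `parse_number_literal` is hypothetical and is not part of Trilangle. As a simplifying assumption, character literals are limited to one single-byte character, and hex literals are capped at six digits so the value cannot overflow.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical helper (not Trilangle's API): returns the value of a
// number_literal, or std::nullopt if the text does not match the grammar.
std::optional<long> parse_number_literal(std::string_view text) {
    // character_literal = "'", (non_whitespace | tab), "'"
    // Assumption: only single-byte characters are handled in this sketch.
    if (text.size() == 3 && text.front() == '\'' && text.back() == '\'') {
        unsigned char c = static_cast<unsigned char>(text[1]);
        if (c == ' ' || c == '\n' || c == '\r' || c >= 0x80) {
            return std::nullopt;
        }
        return static_cast<long>(c);
    }
    // decimal_literal = "#", decimal_digit (exactly one digit)
    if (text.size() == 2 && text[0] == '#') {
        unsigned char d = static_cast<unsigned char>(text[1]);
        if (!std::isdigit(d)) {
            return std::nullopt;
        }
        return static_cast<long>(d - '0');
    }
    // hex_literal = "0x", hex_digit, {hex_digit}
    // Assumption: at most six hex digits are accepted here.
    if (text.size() >= 3 && text.size() <= 8 && text.substr(0, 2) == "0x") {
        long value = 0;
        for (char ch : text.substr(2)) {
            unsigned char u = static_cast<unsigned char>(ch);
            if (!std::isxdigit(u)) {
                return std::nullopt;
            }
            int digit = std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10;
            assert(digit >= 0 && digit < 16);
            value = value * 16 + digit;
        }
        return value;
    }
    return std::nullopt;
}
```

Under these assumptions, `'A'` yields 65 and `0xff` yields 255, while `#12` is rejected because the grammar allows only one decimal digit after `#`.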

## C compiler

When using the `-c` flag, the input program will be translated into C code. The C code is not meant to be idiomatic or easy to read, as it is a literal translation of the input. Optimizers such as those used by clang and GCC tend to do a good job of improving the program, in some cases removing the heap allocation altogether; MSVC does not.
2 changes: 1 addition & 1 deletion qdeql/disassembly.txt
@@ -296,7 +296,7 @@ ct_bgn_not_end:
ct_bgn_not_end_cleanup:
POP ; Remove the uop from the stack
DEC ; Advance the PC
- JMP ct_find_end_loop
+ JMP ct_bgn_find_end_loop
ct_bgn_found_end:
; If PC is pointing at the end uop, the stack layout is this:
; +======+------+----+
13 changes: 13 additions & 0 deletions src/any_program_holder.hh
@@ -0,0 +1,13 @@
#pragma once

#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include "instruction.hh"

template<typename IP>
class any_program_holder {
public:
virtual void advance(IP& ip, std::function<bool()> go_left) = 0;
virtual instruction at(const IP& ip) = 0;
virtual std::string raw_at(const IP& ip) = 0;
virtual std::pair<size_t, size_t> get_coords(const IP& ip) const = 0;
};
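
As a rough illustration of how this interface might be implemented, the sketch below defines a trivial holder over a list of lines. It is not taken from the commit: `linear_program` is a hypothetical class, the `instruction` struct is a stand-in for the real one in `instruction.hh`, and the virtual destructor is an addition made for the sketch. The interface is repeated so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real class in instruction.hh (assumption for this sketch).
struct instruction {
    std::string name;
};

template<typename IP>
class any_program_holder {
public:
    virtual ~any_program_holder() = default;  // added for the sketch
    virtual void advance(IP& ip, std::function<bool()> go_left) = 0;
    virtual instruction at(const IP& ip) = 0;
    virtual std::string raw_at(const IP& ip) = 0;
    virtual std::pair<std::size_t, std::size_t> get_coords(const IP& ip) const = 0;
};

// Hypothetical holder whose IP is just an index into a list of lines.
class linear_program : public any_program_holder<std::size_t> {
public:
    explicit linear_program(std::vector<std::string> lines) : m_lines(std::move(lines)) {}

    // No branching in this sketch, so go_left is ignored.
    void advance(std::size_t& ip, std::function<bool()>) override { ++ip; }

    instruction at(const std::size_t& ip) override {
        assert(ip < m_lines.size());
        return instruction{ m_lines[ip] };
    }

    std::string raw_at(const std::size_t& ip) override {
        assert(ip < m_lines.size());
        return m_lines[ip];
    }

    // Report the index as the second coordinate, with 0 as the first.
    std::pair<std::size_t, std::size_t> get_coords(const std::size_t& ip) const override {
        return { 0, ip };
    }

private:
    std::vector<std::string> m_lines;
};
```

Keeping the instruction pointer type as a template parameter lets the same interpreter loop drive holders with different notions of position, such as a plain index here.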
293 changes: 293 additions & 0 deletions src/assembly_scanner.cpp
@@ -0,0 +1,293 @@
#include "assembly_scanner.hh"
#include <cassert>
#include <cinttypes>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <set>
#include <sstream>

using std::cerr;
using std::endl;
using std::string;
using std::string_view;
using std::string_view_literals::operator""sv;

#define WHITESPACE " \n\r\t"

[[noreturn]] static void invalid_literal(const string& argument) {
cerr << "Invalid format for literal: " << argument << endl;
exit(EXIT_FAILURE);
}

const std::vector<instruction>& assembly_scanner::get_instructions() {
if (!m_instructions.empty()) {
return m_instructions;
}

// We need to do two passes: one to resolve labels, and one to assign targets to jumps. During the first pass, the
// fragments are actually constructed. However, jumps may not have valid targets yet, so we need some way to store
// the label's name inside an IP. This code relies on the following assumption:
static_assert(sizeof(NONNULL_PTR(const string)) <= sizeof(IP), "Cannot fit string pointer inside IP");
// Using an ordered set over any other container so that references are not invalidated after insertion
std::set<string> label_names;

auto label_to_fake_location = [&](const string& name) -> IP {
auto iter = label_names.find(name);
if (iter == label_names.end()) {
auto p = label_names.insert(name);
iter = p.first;
}
NONNULL_PTR(const string) ptr = &*iter;
return reinterpret_cast<uintptr_t>(ptr);
};

// First pass
size_t line_end = 0;
while (true) {
size_t line_start = m_program.find_first_not_of('\n', line_end);
if (line_start >= m_program.size()) {
break;
}
line_end = m_program.find_first_of('\n', line_start);
string_view curr_line = string_view(m_program).substr(line_start, line_end - line_start);

// Unquoted semicolons are comments. Remove them.
size_t i;
for (i = 0; i < curr_line.size(); ++i) {
if (curr_line[i] != ';') {
continue;
}
if (i == 0 || curr_line[i - 1] != '\'') {
break;
}
}
if (i < curr_line.size()) {
curr_line.remove_suffix(curr_line.size() - i);
}
// If the line is only a comment, move on
if (curr_line.empty()) {
continue;
}
// Remove trailing whitespace. If there's only whitespace, skip this line
i = curr_line.find_last_not_of(WHITESPACE);
if (i == string::npos) {
continue;
} else {
curr_line.remove_suffix(curr_line.size() - i - 1);
}

// Look for labels (non-whitespace in the first column)
i = curr_line.find_first_not_of(WHITESPACE ":");
assert(i != string::npos);
if (i == 0) {
// Label, find end
i = curr_line.find_first_of(WHITESPACE ":");
if (i == string::npos) {
i = curr_line.size();
}
string label(curr_line.substr(0, i));

[[maybe_unused]] auto _0 = label_names.insert(label);
auto [_, inserted] = m_label_locations.insert({ label, m_instructions.size() });

if (!inserted) {
cerr << "Label '" << label << "' appears twice" << endl;
exit(EXIT_FAILURE);
}

// Set i to the first non-whitespace character after the label
i = curr_line.find_first_not_of(WHITESPACE ":", i + 1);
if (i == string::npos) {
// Line was only a label
continue;
}
}

// Remove leading whitespace, and label if there is one
curr_line.remove_prefix(i);

// Line should only be the opcode and, if there is one, the argument
if (curr_line.size() < 3) {
cerr << "Instruction too short: " << curr_line << endl;
exit(EXIT_FAILURE);
}
if (curr_line.size() > 3 && !strchr(WHITESPACE, curr_line[3])) {
cerr << "Instruction too long: " << curr_line << endl;
exit(EXIT_FAILURE);
}

string_view instruction_name(curr_line.data(), 3);
instruction::operation opcode = assembly_scanner::opcode_for_name(instruction_name);
switch (opcode) {
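case instruction::operation::JMP: {
size_t label_start = curr_line.find_first_not_of(WHITESPACE, 3);
// Without this check, substr(npos) below would throw std::out_of_range
if (label_start == string::npos) {
cerr << "Missing label for jump instruction" << endl;
exit(EXIT_FAILURE);
}
instruction::argument arg;
string label(curr_line.substr(label_start));
arg.next = { SIZE_C(0), label_to_fake_location(label) };
m_instructions.push_back({ opcode, arg });
m_slices.push_back(curr_line);
break;
}
case instruction::operation::BNG:
case instruction::operation::TSP: {
size_t label_start = curr_line.find_first_not_of(WHITESPACE, 3);
// Same guard as for JMP: a branch without a label is an error
if (label_start == string::npos) {
cerr << "Missing label for branch instruction" << endl;
exit(EXIT_FAILURE);
}
instruction::argument arg;
string label(curr_line.substr(label_start));
arg.choice = { { SIZE_C(0), m_instructions.size() + 1 }, { SIZE_C(0), label_to_fake_location(label) } };
m_instructions.push_back({ opcode, arg });
m_slices.push_back(curr_line);
break;
}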
case instruction::operation::PSI:
case instruction::operation::PSC: {
size_t arg_start = curr_line.find_first_not_of(WHITESPACE, 3);
if (arg_start == string::npos) {
cerr << "Missing argument for push instruction" << endl;
exit(EXIT_FAILURE);
}

string argument(curr_line.substr(arg_start));
int24_t arg_value;
// Should be in one of three formats:
// - 'c' (single UTF-8 character)
// - 0xff (arbitrary length hex number)
// - #9 (single decimal digit)
if (argument[0] == '\'' && argument.back() == '\'') {
if (argument.size() < 3 || argument.size() > 6) {
// One UTF-8 character is 1 to 4 bytes
invalid_literal(argument);
}
i = 1;
arg_value = parse_unichar([&]() { return argument[i++]; });
if (arg_value < INT24_C(0) || i != argument.size() - 1) {
invalid_literal(argument);
}
} else if (argument[0] == '0' && argument[1] == 'x') {
char* last = nullptr;
unsigned long ul = strtoul(argument.c_str(), &last, 16);
if (*last != '\0' || ul > 0x1f'ffffUL) {
invalid_literal(argument);
}
arg_value = static_cast<int24_t>(ul);
} else if (argument[0] == '#' && argument.size() == 2) {
arg_value = static_cast<int24_t>(argument[1] - '0');
if (arg_value < INT24_C(0) || arg_value > INT24_C(9)) {
invalid_literal(argument);
}
} else {
invalid_literal(argument);
}

if (opcode == instruction::operation::PSI) {
// PSI expects to be given the digit, not the actual value
auto p = arg_value.add_with_overflow('0');
assert(!p.first);
arg_value = p.second;
}

instruction::argument arg;
arg.number = arg_value;
m_instructions.push_back({ opcode, arg });
m_slices.push_back(curr_line);
break;
}
default:
m_instructions.push_back({ opcode, instruction::argument() });
m_slices.push_back(curr_line);
break;
}
}

if (!m_instructions.empty()) {
const auto& last_instr = m_instructions.back();
if (!(last_instr.is_exit() || last_instr.m_op == instruction::operation::JMP)) {
cerr << "Program does not end in an exit instruction or loop" << endl;
exit(EXIT_FAILURE);
}
}

// Second pass
for (auto& instr : m_instructions) {
if (instr.m_op == instruction::operation::JMP) {
fake_location_to_real(instr.m_arg.next);
} else if (instr.m_op == instruction::operation::TSP || instr.m_op == instruction::operation::BNG) {
fake_location_to_real(instr.m_arg.choice.second);
}
}

return m_instructions;
}

void assembly_scanner::advance(IP& ip, std::function<bool()> go_left) {
instruction i = at(ip);

if (i.get_op() == instruction::operation::JMP) {
ip = i.get_arg().next.second;
return;
}

const auto* to_left = i.second_if_branch();
if (to_left != nullptr && go_left()) {
ip = to_left->second;
return;
}

ip++;
}

void assembly_scanner::fake_location_to_real(std::pair<size_t, size_t>& p) const {
uintptr_t reconstructed = static_cast<uintptr_t>(p.second);
auto ptr = reinterpret_cast<NONNULL_PTR(const string)>(reconstructed);
const string& str = *ptr;
auto loc = m_label_locations.find(str);
if (loc == m_label_locations.end()) {
cerr << "Undeclared label '" << str << "'" << endl;
exit(EXIT_FAILURE);
}
p = { SIZE_C(0), loc->second };
}

#define DESTRINGIFY_NAME(op) \
if (name == #op##sv) \
return instruction::operation::op

instruction::operation assembly_scanner::opcode_for_name(const string_view& name) noexcept {
DESTRINGIFY_NAME(BNG);
DESTRINGIFY_NAME(JMP);
DESTRINGIFY_NAME(TKL);
DESTRINGIFY_NAME(TSP);
DESTRINGIFY_NAME(TJN);
DESTRINGIFY_NAME(NOP);
DESTRINGIFY_NAME(ADD);
DESTRINGIFY_NAME(SUB);
DESTRINGIFY_NAME(MUL);
DESTRINGIFY_NAME(DIV);
DESTRINGIFY_NAME(UDV);
DESTRINGIFY_NAME(MOD);
DESTRINGIFY_NAME(PSI);
DESTRINGIFY_NAME(PSC);
DESTRINGIFY_NAME(POP);
DESTRINGIFY_NAME(EXT);
DESTRINGIFY_NAME(INC);
DESTRINGIFY_NAME(DEC);
DESTRINGIFY_NAME(AND);
DESTRINGIFY_NAME(IOR);
DESTRINGIFY_NAME(XOR);
DESTRINGIFY_NAME(NOT);
DESTRINGIFY_NAME(GTC);
DESTRINGIFY_NAME(PTC);
DESTRINGIFY_NAME(GTI);
DESTRINGIFY_NAME(PTI);
DESTRINGIFY_NAME(PTU);
DESTRINGIFY_NAME(IDX);
DESTRINGIFY_NAME(DUP);
DESTRINGIFY_NAME(DP2);
DESTRINGIFY_NAME(RND);
DESTRINGIFY_NAME(EXP);
DESTRINGIFY_NAME(SWP);
DESTRINGIFY_NAME(GTM);
DESTRINGIFY_NAME(GDT);

cerr << "Unrecognized opcode '" << name << '\'' << endl;
exit(EXIT_FAILURE);
}
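
The two-pass label resolution in `get_instructions` can be sketched in a simplified, standalone form. This is not the commit's implementation: instead of storing a string pointer inside the IP, the sketch keeps the label name in a separate field, and it returns `std::nullopt` rather than exiting. The names `simple_line`, `simple_instruction`, and `resolve_labels` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical pre-split input line: any field may be empty.
struct simple_line {
    std::string label;
    std::string opcode;
    std::string target_label;
};

// Hypothetical instruction: an opcode plus an optional jump target.
struct simple_instruction {
    std::string opcode;
    std::string target_label;  // empty when the instruction has no target
    std::size_t target_index;  // valid only after the second pass
};

// Returns std::nullopt if a label is declared twice or a target is undeclared.
std::optional<std::vector<simple_instruction>> resolve_labels(const std::vector<simple_line>& lines) {
    std::map<std::string, std::size_t> label_locations;
    std::vector<simple_instruction> instructions;

    // First pass: record where each label points and build the instructions.
    for (const simple_line& line : lines) {
        if (!line.label.empty()) {
            bool inserted = label_locations.insert({ line.label, instructions.size() }).second;
            if (!inserted) {
                return std::nullopt;
            }
        }
        if (!line.opcode.empty()) {
            instructions.push_back({ line.opcode, line.target_label, 0 });
        }
    }

    // Second pass: every label is now known, so forward references resolve.
    for (simple_instruction& instr : instructions) {
        if (instr.target_label.empty()) {
            continue;
        }
        auto loc = label_locations.find(instr.target_label);
        if (loc == label_locations.end()) {
            return std::nullopt;
        }
        assert(loc->second <= instructions.size());
        instr.target_index = loc->second;
    }
    return instructions;
}
```

The second pass is needed because a jump may name a label that is declared later in the file, so its index is unknown when the jump is first read.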