Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating JSON with --opt=aggressive can result in out of bounds panics in certain cases #792

Closed
yorickpeterse opened this issue Dec 28, 2024 · 12 comments
Labels
accepting contributions Issues that are suitable to be worked on by anybody, not just maintainers bug Defects, unintended behaviour, etc compiler Changes related to the compiler std Changes related to the standard library
Milestone

Comments

@yorickpeterse
Copy link
Collaborator

yorickpeterse commented Dec 28, 2024

Please describe the bug

The following program works fine when using --opt=balanced, but produces a panic when using --opt=aggressive:

import std.fmt (fmt)
import std.io (Buffer)
import std.json (Json, ObjectParser, PullParser)
import std.net.socket (UnixClient, resolve)
import std.stdio (Stdout)
import std.time (Instant)

type inline Address {
  let @family: Int
  let @chunks: Array[Int]
}

type async Main {
  fn async main {
    # resolve('yorickpeterse.com')

    let out = Stdout.new
    let sock = UnixClient
      .new('/run/systemd/resolve/io.systemd.Resolve'.to_path)
      .get

    let start = Instant.new
    let msg = Map.new
    let params = Map.new

    params.set('name', Json.String('yorickpeterse.com'))
    params.set('family', Json.Int(2)) # AF_INET
    msg.set('method', Json.String('io.systemd.Resolve.ResolveHostname'))
    msg.set('parameters', Json.Object(params))

    let buf = Json.Object(msg).to_string.to_byte_array

    buf.push(0)
    sock.write_bytes(buf).get
    buf.clear

    # Read until the trailing NULL byte.
    loop {
      match sock.read(into: buf, size: 8 * 1024) {
        case Ok(0) -> break
        case Ok(_) if buf.last.or(-1) == 0 -> {
          buf.pop
          break
        }
        case Ok(_) -> {}
        case Error(e) -> panic(e.to_string)
      }
    }

    let json = Json.parse(Buffer.new(buf)).get
    let addrs = json
      .query
      .key('parameters')
      .key('addresses')
      .as_array
      .get
      .iter
      .try_reduce([], fn (ary, val) {
        let family = try val.query.key('family').as_int.ok_or(nil)
        let addr = try val.query.key('address').as_array.ok_or(nil)
        let nums = try addr.iter.try_reduce([], fn (ary, val) {
          ary.push(try val.query.as_int.ok_or(nil))
          Result.Ok(ary)
        })

        ary.push(Address(family: family, chunks: nums))
        Result.Ok(ary)
      })
      .or_panic('invalid response')

    out.print(fmt(start.elapsed))

    addrs.into_iter.each(fn (addr) {
      out.print('family: ${addr.family}')
      out.print('digits: ${fmt(addr.chunks)}')
    })
  }
}

The panic is as follows:

Stack trace (the most recent call comes last):
  /var/home/yorickpeterse/Downloads/test.inko:31 in main.Main.main
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:519 in std.json.Json.to_string
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1580 in std.json.Generator.generate
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1614 in std.json.Generator.generate_value
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1638 in std.json.Generator.enter
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1615 in <closure>
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/iter.inko:92 in std.iter.Stream.each_with_index
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/iter.inko:48 in std.iter.Stream.each
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/iter.inko:92 in <closure>
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1622 in <closure>
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/json.inko:1590 in std.json.Generator.generate_value
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/string.inko:603 in std.string.String.escaped
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/iter.inko:48 in std.iter.Stream.each
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/string.inko:609 in <closure>
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/array.inko:304 in std.array.Array.get
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/array.inko:90 in std.array.bounds_check
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/array.inko:81 in std.array.out_of_bounds
  /var/home/yorickpeterse/Projects/inko/inko/std/src/std/process.inko:15 in std.process.panic
Process 'Main' (0x38b00eb0) panicked: the index -3895613677675478935 is out of bounds (size: 96)

Operating system

Fedora

Inko version

main

Reading material

@yorickpeterse yorickpeterse added bug Defects, unintended behaviour, etc compiler Changes related to the compiler std Changes related to the standard library labels Dec 28, 2024
@yorickpeterse yorickpeterse added this to the 0.18.0 milestone Dec 28, 2024
@yorickpeterse yorickpeterse added the accepting contributions Issues that are suitable to be worked on by anybody, not just maintainers label Dec 28, 2024
@yorickpeterse
Copy link
Collaborator Author

In https://github.com/inko-lang/inko/actions/runs/12588558439/job/35086680950 one test fails with --opt=none, which shouldn't happen. Combined with the JSON error (which I think is related), I suspect we're miscompiling something under certain conditions.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 8, 2025

A simpler way to reproduce this:

import std.json (Json)
import std.stdio (Stdout)

type async Main {
  fn async main {
    let out = Stdout.new
    let msg = Map.new

    msg.set('method', Json.String('io.systemd.Resolve.ResolveHostname'))
    out.print(Json.Object(msg).to_string)
  }
}

The resulting executable will sometimes panic, other times it straight up triggers a segmentation fault.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 8, 2025

Some extra notes:

  • Turning Json from an inline type to a regular type has no effect
  • Sometimes the program crashes with a segmentation fault, sometimes it panics, other times it succeeds
  • The number of primary and backup threads has no impact
  • The use (or not) of attributes such as noalias and others has no effect
  • String.byte_unchecked appears to return the correct value
  • String.bytes yields valid values, suggesting the data on the stack gets overwritten somehow

The following program is an even easier way to reproduce it:

import std.json (Json)
import std.stdio (Stdout)

type async Main {
  fn async main {
    let out = Stdout.new

    out.print(Json.String('io.systemd.Resolve.ResolveHostname').to_string)
  }
}

@yorickpeterse
Copy link
Collaborator Author

When the program segfaults, it appears to do so in the X86 lock instruction. This combined with the surrounding assembly suggests a String is dropped multiple times.

@yorickpeterse
Copy link
Collaborator Author

The crash goes away if both std.json.Json and std.option.Option are not inline types. This suggests we may be facing yet another struct return/argument ABI issue, as such issues have resulted in similarly weird behavior in the past.

@yorickpeterse
Copy link
Collaborator Author

A simpler way to reproduce the SEGV:

import std.json (Json)

type async Main {
  fn async main {
    let msg = Map.new

    msg.set('method', Json.String('io.systemd.Resolve.ResolveHostname'))
    Json.Object(msg)
  }
}

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 9, 2025

If Context::argument_type is changed to always return ArgumentType::Pointer, the SEGV goes away, so this is definitely an ABI issue.

I suspect the issue is how we handle struct arguments when generating the loads for method arguments. That is, if the local type is a struct and so is the argument, we just load the argument into that local as-is. These types might not be different though:

diff --git a/compiler/src/llvm/passes.rs b/compiler/src/llvm/passes.rs
index 042bb59d..783e4ee1 100644
--- a/compiler/src/llvm/passes.rs
+++ b/compiler/src/llvm/passes.rs
@@ -1005,6 +1005,11 @@ impl<'shared, 'module, 'ctx> LowerMethod<'shared, 'module, 'ctx> {
             let var = self.variables[reg];
             let typ = self.variable_types[reg];
 
+            if typ.is_struct_type() {
+                println!("local: {}", typ);
+                println!("arg:   {}", arg);
+            }
+
             // Depending on the ABI requirements we may pass a struct in as a
             // pointer, but expect it as a value. In this case we need to load
             // the argument pointer's value into the stack slot, instead of

Produces:

local: "%\"type 0x7fab28006ca0\" = type { i16 }"
arg:   "i16 %0"
local: "%\"type 0x7fab18006ca0\" = type { i16 }"
arg:   "i16 %1"
local: "%\"type 0x7fab28006ed0\" = type { i16, [8 x i8] }"
arg:   "{ i64, i64 } %0"
local: "%\"type 0x7fab0c006e60\" = type { i16, [10 x i8] }"
arg:   "{ i64, i64 } %0"
local: "%\"type 0x7fab28006da0\" = type { i16, [8 x i8] }"
arg:   "{ i64, i64 } %0"

@yorickpeterse
Copy link
Collaborator Author

Based on looking at what Rust does (example), it seems our implementation doesn't quite match Rust as much as I thought. That is, if the struct is between 8 and 16 bytes then Rust seems to express the struct as { word, rest size }. So for 10 bytes the type is { i64, i16 } and not { i64, i64 }. The following patch seems to at least fix the SEGV:

diff --git a/compiler/src/llvm/context.rs b/compiler/src/llvm/context.rs
index e3c78d2e..7d7fa302 100644
--- a/compiler/src/llvm/context.rs
+++ b/compiler/src/llvm/context.rs
@@ -236,7 +236,12 @@ impl Context {
                     // To avoid the complexity of that we take the same approach
                     // as Rust: if the struct is larger than 8 bytes, we turn it
                     // into two 64 bits values.
-                    ArgumentType::Regular(self.two_words().as_basic_type_enum())
+                    let a = self.i64_type().into();
+                    let b = self.custom_int(size_in_bits(bytes - 8)).into();
+
+                    ArgumentType::Regular(
+                        self.struct_type(&[a, b]).as_basic_type_enum(),
+                    )
                 } else {
                     ArgumentType::StructValue(typ)
                 }

Whether this is always correct is something I have yet to figure out.

@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 9, 2025

Some examples for what Rust does when you define a struct with #[repr(C)]:

Fields AMD64 ARM64
i8 i8 i64
i8, i16 i32 i64
i8, i16, i16 i48 i64
i16, i16, i32 i64 i64
i32, i16 i64 i64
i16, i32, i32 { i64, i32 } { i64, i64 }
i16, i16, i64 { i64, i64 } { i64, i64 }
i16, i16, i32, i32 { i64, i32 } { i64, i64 }
i16, i16, i32, i32, i8 { i64, i64 } { i64, i64 }
i64, i8 { i64, i64 } { i64, i64 }
i64, i16 { i64, i64 } { i64, i64 }
i16, i16, i16, i16, i16 { i64, i16 } { i64, i64 }
i16, i16, i16, i16, i16, i8 { i64, i32 } { i64, i64 }

We should also verify what happens when the struct contains floats.

Reading material: OpenSmalltalk/opensmalltalk-vm#443 (comment)

Based on this, the algorithm seems to be as follows: iterate over the fields in order, sum their sizes until they no longer fit in 8 bytes. Round that size up to 8 bytes, then use the rest for the second field. This is why i16, i32, i32 produces { i64, i32 }: i16 + i32 is 6 bytes, and 6 bytes plus the last 4 bytes doesn't fit in 8 bytes, so we round that 6 up to 8 bytes and use the remaining 4 bytes for the second field.

yorickpeterse added a commit that referenced this issue Jan 9, 2025
yorickpeterse added a commit that referenced this issue Jan 9, 2025
yorickpeterse added a commit that referenced this issue Jan 9, 2025
The type definitions generated for small struct arguments and returns on
AMD64 wasn't correct, resulting in segmentation faults when running code
with --opt=aggressive. This fixes this issue and ensures we run tests
with all optimization levels on both AMD64 and ARM64.

This fixes #792.
yorickpeterse added a commit that referenced this issue Jan 9, 2025
The type definitions generated for small struct arguments and returns on
AMD64 and ARM64 wasn't correct, resulting in segmentation faults when
running code with --opt=aggressive. This fixes this issue and ensures we
run tests with all optimization levels on both AMD64 and ARM64.

This fixes #792.
@yorickpeterse
Copy link
Collaborator Author

To add: on ARM64 it's possible to pass up to 4 f64 values in registers, so structs such as { f64, f64, f64, f64 } should be passed as-is instead of using a pointer.

@yorickpeterse
Copy link
Collaborator Author

https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170 says the following:

Types less than or equal to 16 bytes are returned in x0 and x1, with x0 containing the lower-order 8 bytes.

This suggests that we should use the same approach as for AMD64, i.e. { i64, rest }. Using that approach indeed seems to fix the crashes on ARM64 macOS.

yorickpeterse added a commit that referenced this issue Jan 9, 2025
The type definitions generated for small struct arguments and returns on
AMD64 and ARM64 wasn't correct, resulting in segmentation faults when
running code with --opt=aggressive. This fixes this issue and ensures we
run tests with all optimization levels on both AMD64 and ARM64.

This fixes #792.
yorickpeterse added a commit that referenced this issue Jan 10, 2025
When accepting or returning structs, the data must be copied using
memcpy into a temporary slot, this slot must then be loaded into the
final slot. Without this, LLVM appears to generate the wrong code on
platforms such as ARM64.

This fixes #792.
yorickpeterse added a commit that referenced this issue Jan 10, 2025
When accepting or returning structs, the data must be copied using
memcpy into a temporary slot, this slot must then be loaded into the
final slot. Without this, LLVM appears to generate the wrong code on
platforms such as ARM64.

This fixes #792.
@yorickpeterse
Copy link
Collaborator Author

yorickpeterse commented Jan 11, 2025

TODO:

  • Expand the tests for small_amd64_struct
  • Rename this method to binary_struct since it always returns a struct with two fields
  • Fix ARM64 crash/hangs
    • memcpy struct-like return values
    • memcpy struct-like arguments before passing them
    • Handle float aggregates even when they're larger than 16 bytes (up to 4 fields)
    • Add tests
      • arm64::struct_argument
      • arm64::struct_return
      • amd64::struct_argument
      • amd64::struct_return

yorickpeterse added a commit that referenced this issue Jan 12, 2025
When accepting or returning structs, the data must be copied using
memcpy into a temporary slot, this slot must then be loaded into the
final slot. Without this, LLVM appears to generate the wrong code on
platforms such as ARM64.

In addition, for ARM64 we now generate the correct layouts for
homogeneous float structs (e.g. `{ f64, f64, f64, f64 }`).

This fixes #792.
yorickpeterse added a commit that referenced this issue Jan 12, 2025
When accepting or returning structs, the data must be copied using
memcpy into a temporary slot, this slot must then be loaded into the
final slot. Without this, LLVM appears to generate the wrong code on
platforms such as ARM64.

In addition, for ARM64 we now generate the correct layouts for
homogeneous float structs (e.g. `{ f64, f64, f64, f64 }`).

This fixes #792.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepting contributions Issues that are suitable to be worked on by anybody, not just maintainers bug Defects, unintended behaviour, etc compiler Changes related to the compiler std Changes related to the standard library
Projects
None yet
Development

No branches or pull requests

1 participant