Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mir_find_adt, mir_struct_value, and friends #1931

Merged
merged 2 commits into from
Sep 11, 2023
Merged

Conversation

RyanGlScott
Copy link
Contributor

@RyanGlScott RyanGlScott commented Sep 2, 2023

This adds supports for MIR struct types, struct values, and ADT definitions in the MIR backend, piggybacking on the existing support for struct SetupValues. MIR structs are a fair bit more involved than LLVM structs:

  • Unlike in the LLVM backend, where structs are only identified up to their field types, MIR structs each have a distinct name. This means that two structs with the same field types can still be distinct, and SAW must be aware of this difference. As a result, the XSetupStruct instance for MIR uses an Adt (algebraic data type) to encode the information about the struct name, as well as the types of its fields.
  • Building an ADT is somewhat tricky, since each monomorphic instantiation of a struct in MIR has its own distinct, autogenerated identifier. The new mir_find_adt command allows users to look up these autogenerated names by supplying a human-readable name for the struct, along with the types used to instantiate the type parameters of the struct.
  • The mir_adt function builds a struct type from an ADT, and the mir_struct_value function builds a struct value from an ADT and the field values. These behave much like llvm_struct and llvm_struct_value in the LLVM backend. Note that we call it "mir_adt" rather than "mir_struct" to be forward-compatible with enums, another type of ADT that we will add support for in a future patch.

Some careful handling of #[repr(transparent)] structs is required, as they have a different TypeShape than other structs. See the TransparentShape code added in this patch, as well as the test_mir_verify_struct test case which includes a transparent struct example.

This checks off one box in #1859.

@@ -2621,6 +2622,82 @@ the value of an array element.
* `jvm_field_is : JVMValue -> String -> JVMValue -> JVMSetup ()`
specifies the name of an object field.

In the experimental MIR verification implementation, the following functions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if the documentation here makes sense, since MIR structs require a more careful explanation than LLVM structs.

Comment on lines -919 to -921
-- | Check if two 'Mir.Ty's are compatible in SAW. This is a slightly coarser
-- notion of equality to reflect the fact that MIR's type system is richer than
-- Cryptol's type system, and some types which would be distinct in MIR are in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code simply moved to SAWScript.Crucible.MIR.ResolveSetupValue—no other changes were applied.

Comment on lines +313 to +357
-- match the underlying field of a repr(transparent) struct
(MIRVal (TransparentShape _ shp) val, _, MS.SetupStruct adt [z])
| Just y <- Mir.reprTransparentFieldTy col adt ->
matchArg opts sc cc cs prepost md (MIRVal shp val) y z

-- match the fields of a struct point-wise
(MIRVal (StructShape _ _ xsFldShps) (Crucible.AnyValue tpr@(Crucible.StructRepr _) xs),
Mir.TyAdt _ _ _,
MS.SetupStruct adt zs)
| Ctx.sizeInt (Ctx.size xs) == length zs
, let xsTpr = Crucible.StructRepr (FC.fmapFC fieldShapeType xsFldShps)
, Just Refl <- W4.testEquality tpr xsTpr ->
case adt of
Mir.Adt _ Mir.Struct [v] _ _ _ _ ->
let ys = v ^.. Mir.vfields . each . Mir.fty in
sequence_
[ case xFldShp of
ReqField shp ->
matchArg opts sc cc cs prepost md (MIRVal shp x) y z
OptField shp -> do
x' <- liftIO $ readMaybeType sym "field" (shapeType shp) x
matchArg opts sc cc cs prepost md (MIRVal shp x') y z
| (Some (Functor.Pair xFldShp (Crucible.RV x)), y, z) <-
zip3 (FC.toListFC Some (Ctx.zipWith Functor.Pair xsFldShps xs))
ys
zs ]
Mir.Adt _ ak _ _ _ _ _ ->
panic "matchArg" ["AdtKind " ++ show ak ++ " not yet implemented"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code for matching SetupStructs against MIRVals

Comment on lines 171 to 220
MS.SetupStruct adt flds ->
case adt of
_ | adt ^. Mir.adtReprTransparent,
[fld] <- flds -> do
MIRVal shp fld' <- resolveSetupVal mcc env tyenv nameEnv fld
pure $ MIRVal (TransparentShape (mirAdtToTy adt) shp) fld'
Mir.Adt nm Mir.Struct [variant] _ _ _ _ -> do
flds' <- traverse (resolveSetupVal mcc env tyenv nameEnv) flds
let expectedFlds = variant ^. Mir.vfields
let expectedFldsNum = length expectedFlds
let actualFldTys = map (\(MIRVal shp _) -> shapeMirTy shp) flds'
let actualFldsNum = length actualFldTys
unless (expectedFldsNum == actualFldsNum) $
fail $ unlines
[ "Mismatch in number of struct fields"
, "Struct name: " ++ show nm
, "Expected number of fields: " ++ show expectedFldsNum
, "Actual number of fields: " ++ show actualFldsNum
]
zipWithM_
(\expectedFld actualFldTy ->
let expectedFldTy = expectedFld ^. Mir.fty in
let expectedFldName = expectedFld ^. Mir.fName in
unless (checkCompatibleTys expectedFldTy actualFldTy) $
fail $ unlines
[ "Struct field type mismatch"
, "Field name: " ++ show expectedFldName
, "Expected type: " ++ show expectedFldTy
, "Given type: " ++ show actualFldTy
])
expectedFlds
actualFldTys
Some fldAssn <-
pure $ Ctx.fromList $
map (\(MIRVal shp v) ->
if Mir.canInitialize col (shapeMirTy shp)
then Some $ Functor.Pair (ReqField shp) (RV v)
else Some $ Functor.Pair (OptField shp) (RV (W4.justPartExpr sym v)))
flds'
let (fldShpAssn, valAssn) = Ctx.unzip fldAssn
let structShp = StructShape (mirAdtToTy adt) actualFldTys fldShpAssn
let structTpr = StructRepr (FC.fmapFC fieldShapeType fldShpAssn)
pure $ MIRVal structShp (AnyValue structTpr valAssn)
Mir.Adt _ ak _ _ _ _ _ ->
panic "resolveSetupVal" ["AdtKind " ++ show ak ++ " not yet implemented"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code for resolving a SetupStruct into a MIRVal.

@@ -60,7 +60,7 @@ import qualified SAWScript.Crucible.Common.Setup.Value as MS

type instance MS.XSetupNull MIR = Void
type instance MS.XSetupGlobal MIR = Void
type instance MS.XSetupStruct MIR = ()
type instance MS.XSetupStruct MIR = M.Adt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A key change in this PR: we now bundle an Adt along with a MIR SetupStruct. This is used in several parts of the patch to consult the Adt definition of a struct (e.g., what the types of its fields are, whether it is a repr(transparent) struct or not, etc.)

Comment on lines +3881 to +3913
, prim "mir_find_adt" "MIRModule -> String -> [MIRType] -> MIRAdt"
(funVal3 mir_find_adt)
Experimental
[ "Consult the given MIRModule to find an algebraic data type (MIRAdt)"
, "with the given String as an identifier and the given MIRTypes as the"
, "types used to instantiate the type parameters. If such a MIRAdt cannot"
, "be found in the MIRModule, this will raise an error."
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A key function used to look up a MIR ADT from its name and type substitutions.

Comment on lines -57 to +82
newtype JSONMIRType = JSONMIRType { getMIRType :: Ty }
-- | This is like 'Mir.Ty', but with the following differences:
--
-- 1. This only contains the subset of MIR types that are currently supported
-- by the SAW remote API.
--
-- 2. 'JSONTyAdt' only contains a 'ServerName' that points points to an
-- 'Mir.Adt', as opposed to 'Mir.TyAdt', which contains a full 'Mir.Adt'.
-- The advantage of only containing a 'Mir.TyAdt' is that we do not have to
-- represent the entirety of a 'Mir.Adt' definition in JSON each time we want
-- to reference the ADT in a type.
data JSONMIRType
= JSONTyAdt !ServerName
| JSONTyArray !JSONMIRType !Int
| JSONTyBool
| JSONTyFloat !Mir.FloatKind
| JSONTyInt !Mir.BaseSize
| JSONTySlice !JSONMIRType
| JSONTyStr
| JSONTyTuple ![JSONMIRType]
| JSONTyUint !Mir.BaseSize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed JSONMIRType from a newtype around Mir.Ty to a distinct data type solely for the purpose of making the Adt case (JSONMIRType) store a ServerName pointing to a Mir.Adt rather than storing a Mir.Adt directly. Storing a Mir.Adt directly would have meant that we would have had to store the entire Mir.Adt as JSON, which would have been quite expensive, as Mir.Adt is a large data structure. Moreover, it's unnecessary to do so, since the ADT definition is already contained in the MIR module, so storing the Mir.Adt would be redundant.

@RyanGlScott
Copy link
Contributor Author

The CI is failing because I forgot to update some code in crucible-mir-comp. Taking a look at this code, however, I've realized that it won't be straightforward to fix at the moment. In particular, this line scares me a bit:

go (TupleShape _ _ flds) rvs = MS.SetupStruct () <$> goFields flds rvs

The () in MS.SetupStruct () should now be an Adt instead. But MIR tuples (classified by TupleShape) don't have Adts, so it's not obvious to me how I'd obtain an Adt in this case. I'm not entirely sure how this code has managed to get away with conflating MIR structs with tuples for so long, but I don't think this sort of thing will work any more with the design I've adopted.

Perhaps I should wait until I've added a mir_tuple_value function first, as that will likely require adding a SetupTuple constructor to SetupValue. Once that happens, I can use SetupTuple here instead of SetupStruct, which should resolve the tension.

Copy link
Contributor

@spernsteiner spernsteiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, aside from the crux-mir-comp breakage.

I'm not entirely sure how this code has managed to get away with conflating MIR structs with tuples for so long

My guess is that everywhere crux-mir-comp uses a SetupValue, it also has the M.Ty or TypeShape on hand to distinguish structs from tuples (and structs from other structs).

Given that crux-mir-comp never reads the new M.Adt field, maybe you could have it use undefined :: M.Adt or a dummy value when building SetupStructs as a stopgap measure until a proper tuple case is added.

Perhaps I should wait until I've added a mir_tuple_value function first, as that will likely require adding a SetupTuple constructor to SetupValue.

My first thought is that structs and tuples are similar enough that it would be nice to use the same SetupValue constructor for both. But I guess they have different TypeShapes and would probably wind up needing different code paths in most places. Would it make sense to redefine XSetupStruct MIR to have separate ADT and tuple variants, or would it be cleaner to do that split in SetupValue instead? (I also thought about generating fake M.Adts for tuples, but that seems like more trouble than it's worth due to the differences between struct and tuple TypeReprs.)

doc/manual/manual.md Outdated Show resolved Hide resolved
intTests/test_mir_verify_structs/test.saw Show resolved Hide resolved
saw-remote-api/docs/SAW.rst Outdated Show resolved Hide resolved
Copy link
Contributor Author

@RyanGlScott RyanGlScott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that crux-mir-comp never reads the new M.Adt field, maybe you could have it use undefined :: M.Adt or a dummy value when building SetupStructs as a stopgap measure until a proper tuple case is added.

That is a good idea. That being said, I actually don't think it would take that much effort to properly support tuples (see my comments below). I'll submit a separate PR for that and then rebase this work on top it afterwards.

Would it make sense to redefine XSetupStruct MIR to have separate ADT and tuple variants, or would it be cleaner to do that split in SetupValue instead?

I think it would be cleanest to have a SetupTuple constructor (separate from SetupStruct) in SetupValue. Adding a new SetupValue constructor is relatively cheap, so I don't see much advantage to trying to consolidate multiple concepts into SetupStruct. Plus, if we're going to have separate StructShape and TupleShape cases, then I think it makes sense to mirror this split in SetupValue.

doc/manual/manual.md Outdated Show resolved Hide resolved
intTests/test_mir_verify_structs/test.saw Show resolved Hide resolved
saw-remote-api/docs/SAW.rst Outdated Show resolved Hide resolved
@RyanGlScott
Copy link
Contributor Author

I've submitted #1934 to implement support for SetupTuples, and I've verified that that suffices to fix the crucible-mir-comp build failure observed here. I'll rebase this PR on top of #1934 once it lands.

This adds supports for MIR struct types, struct values, and ADT definitions in
the MIR backend, piggybacking on the existing support for struct `SetupValue`s.
MIR structs are a fair bit more involved than LLVM structs:

* Unlike in the LLVM backend, where structs are only identified up to their
  field types, MIR structs each have a distinct name. This means that two
  structs with the same field types can still be distinct, and SAW must be
  aware of this difference. As a result, the `XSetupStruct` instance for `MIR`
  uses an `Adt` (algebraic data type) to encode the information about the
  struct name, as well as the types of its fields.
* Building an ADT is somewhat tricky, since each monomorphic instantiation of
  a struct in MIR has its own distinct, autogenerated identifier. The new
  `mir_find_adt` command allows users to look up these autogenerated names by
  supplying a human-readable name for the struct, along with the types used to
  instantiate the type parameters of the struct.
* The `mir_adt` function builds a struct type from an ADT, and the
  `mir_struct_value` function builds a struct value from an ADT and the field
  values. These behave much like `llvm_struct` and `llvm_struct_value` in the
  LLVM backend. Note that we call it "`mir_adt`" rather than "`mir_struct`" to
  be forward-compatible with enums, another type of ADT that we will add
  support for in a future patch.

Some careful handling of `#[repr(transparent)]` structs is required, as they
have a different `TypeShape` than other structs. See the `TransparentShape`
code added in this patch, as well as the `test_mir_verify_struct` test case
which includes a transparent struct example.

This checks off one box in #1859.
@RyanGlScott RyanGlScott marked this pull request as ready for review September 11, 2023 18:09
@RyanGlScott RyanGlScott added the PR: ready to merge Magic flag for pull requests to ask Mergify to merge given an approval and a successful CI run label Sep 11, 2023
@mergify mergify bot merged commit 583b4e9 into master Sep 11, 2023
38 checks passed
@mergify mergify bot deleted the T1859-mir_struct_value branch September 11, 2023 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: ready to merge Magic flag for pull requests to ask Mergify to merge given an approval and a successful CI run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants