This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Extra types for the Native Interface #34

Closed
wks opened this issue Jun 18, 2015 · 1 comment


wks commented Jun 18, 2015

Philosophy: There should be a subset of Mu types and instructions that can do what C can do. It should be possible to implement the C programming language in this subset of Mu while still being able to access memory in the way specified by the platform's ABI (i.e. be compatible with "good" native programs).

Types

Pointer types

  • ptr<T>: A memory pointer to type T. (Is there a better name? A pointer always points to somewhere in the memory. Maybe "data pointer" or "value pointer"? In C it is called an object pointer, but "object" has a different meaning in Mu.)
  • funcptr<sig>: A function pointer to a function with signature sig.

Pointers are addresses. They can be cast to and from integer values by interpreting the integer as the address. Mu does not check the validity of this cast.
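For comparison, here is a minimal C sketch of the same unchecked round trip between a pointer and an integer address. The helper names (ptr_to_int, int_to_ptr, load_via_address) are made up for illustration and are not part of Mu or of any proposal here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: view a data pointer as its integer address. */
static uintptr_t ptr_to_int(const int64_t *p) {
    return (uintptr_t)p;
}

/* Hypothetical helper: interpret an integer as an address.
 * Nothing checks that the address is valid, as with the Mu cast. */
static int64_t *int_to_ptr(uintptr_t addr) {
    return (int64_t *)addr;
}

/* Round trip: pointer -> integer -> pointer, then load through it. */
static int64_t load_via_address(int64_t *p) {
    uintptr_t addr = ptr_to_int(p);
    int64_t *q = int_to_ptr(addr);
    assert(q == p);   /* converting back yields the original pointer */
    return *q;
}
```

Passing an arbitrary integer to int_to_ptr compiles fine and only fails (if at all) when the result is dereferenced, which is the behaviour described above for Mu.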

ptr<T> can be used with the memory addressing instructions: GETFIELDIREF, GETELEMIREF, ... work on ptr<T> just as they do on iref types. Memory access instructions accept a ptr<T> when given the PTR flag:

// %p is ptr<int<64>>
%result1 = LOAD PTR ACQUIRE <@i64> %p
STORE PTR RELEASE <@i64> %p @const1
%result2 = CMPXCHG PTR SEQ_CST SEQ_CST <@i64> %p @const1 @const2
%result3 = ATOMICRMW PTR SEQ_CST ADD <@i64> %p @const3
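As a rough point of reference, the four pointer-based accesses above correspond to the following C11 <stdatomic.h> calls. This is only a sketch of the intended analogy: the function name demo_atomic_ops and the constants 10, 20 and 5 are placeholders chosen for the example, standing in for @const1, @const2 and @const3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical demo: performs a load, store, compare-exchange and
 * fetch-add on *p, and returns the final value stored in *p.
 * The value seen by the first load is written to *initial_out. */
static int64_t demo_atomic_ops(_Atomic int64_t *p, int64_t *initial_out) {
    /* LOAD PTR ACQUIRE */
    *initial_out = atomic_load_explicit(p, memory_order_acquire);

    /* STORE PTR RELEASE */
    atomic_store_explicit(p, 10, memory_order_release);

    /* CMPXCHG PTR SEQ_CST SEQ_CST: expect 10, replace with 20 */
    int64_t expected = 10;
    bool swapped = atomic_compare_exchange_strong_explicit(
        p, &expected, 20, memory_order_seq_cst, memory_order_seq_cst);
    assert(swapped);

    /* ATOMICRMW PTR SEQ_CST ADD: returns the value before the add */
    int64_t before_add = atomic_fetch_add_explicit(p, 5, memory_order_seq_cst);
    assert(before_add == 20);

    return atomic_load_explicit(p, memory_order_seq_cst);
}
```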

funcptr<sig> can be called with the CCALL instruction:

// assume @write is funcptr<@size_t (@i32 @voidptr @size_t)>
%result = CCALL C <@sig> @write (%fd %buf %sz)   // C means the "C" calling convention
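The C counterpart of this CCALL is an indirect call through a function pointer. The sketch below assumes a POSIX system; the typedef write_fn and the wrapper call_through_pointer are names invented for the example. Note that POSIX declares the return type of write as ssize_t rather than size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Function pointer type matching the POSIX write() prototype. */
typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Hypothetical wrapper: calls whatever function the pointer designates,
 * using the platform's C calling convention. */
static ssize_t call_through_pointer(write_fn fn, int fd,
                                    const void *buf, size_t sz) {
    assert(fn != NULL);
    return fn(fd, buf, sz);
}
```

For instance, call_through_pointer(write, fd, buf, sz) behaves like a direct call to write(fd, buf, sz).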

Union type

I think there is a way to introduce the union type from C without compromising the safety of Mu's reference types.

Define the union type as: union<T1 T2 T3 ...>

T1, T2, T3, ... are its members. The members of a union type cannot contain ref, iref, weakref, func, thread, stack or tagref64 types as they are either object references or opaque references. However, ptr and funcptr are allowed.

A union must live in memory. It cannot be the type of an SSA variable, because that would not make sense: a union is a, err..., "union" of several types (no pun intended), but an SSA variable holds a value of exactly one type.

One may argue that "I want to LOAD a union and STORE it to another location without looking into it, so I need union to be an SSA variable". However, for data transfer, there could be a memcpy-like instruction that copies large structures efficiently, so allowing union-typed SSA variables is unnecessary.
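In C terms, the memcpy-like transfer would look roughly like the sketch below: the union is moved as an opaque block of bytes and never inspected. The type payload and the function copy_payload are made-up names for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative union; its members mirror the kinds allowed above
 * (integers, floating point, raw pointers). */
union payload {
    int64_t as_int;
    double  as_fp;
    void   *as_ptr;
};

/* Copy the whole union from src to dst without looking at which
 * member is currently stored. */
static void copy_payload(union payload *dst, const union payload *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```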

When a union is allocated in Mu memory, its initial value is all zeros: if any member is loaded before a value is stored into the union, the result is always the "zero value" of that member's type (int 0, fp +0.0, ptr NULL).

A union only holds the latest stored member:

  • if a load is not atomic, and there is only one visible store to a member of the union, then
    • if the store accesses the same member as the load, the load gets the value of that store;
    • if the store accesses a different member, the load instruction has undefined behaviour.
  • Union members cannot be accessed atomically.

I am still uncertain how the C memory model plays together with unions. C11 describes a union as "an overlapping nonempty set of member objects" and says "When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values." This implies that storing into one member of a union has the side effect of modifying other members.
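The overlap can be observed from C itself. The sketch below uses an illustrative union (sigval_like, not a real system type) and a hypothetical helper that stores into the int member and then checks that the leading bytes of the whole union now hold that int's representation. The bytes beyond sizeof(int) are the ones C11 leaves unspecified, so the example deliberately does not inspect them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative union with the same shape as POSIX union sigval. */
union sigval_like {
    int   i;
    void *p;
};

static_assert(sizeof(union sigval_like) >= sizeof(void *),
              "the union must be large enough for its largest member");

/* Returns 1 if, after storing to member i, the first sizeof(int) bytes
 * of the union equal the bytes of the stored int. The earlier store to
 * member p has been (partly) overwritten as a side effect. */
static int int_store_lands_in_shared_bytes(int value) {
    union sigval_like u;
    u.p = NULL;     /* first store: pointer member */
    u.i = value;    /* second store: overlaps the pointer's bytes */
    return memcmp(&u, &value, sizeof value) == 0;
}
```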

wks commented Jun 19, 2015

Reasons for not supporting the union type in Mu:

  1. If it is a pure Mu application, it is much better to use OOP for polymorphism. The tagged reference type is an optimisation for run-time type checking.
  2. If the program needs to interface with C programs, it must depend on the binary interface. Therefore the byte representations of native data types are defined. It is possible to represent the native union type with a byte array in Mu and interpret bytes directly.
  3. Union works horribly with concurrency and the memory model.

Example: the following C type

union sigval {
    int sival_int;
    void *sival_ptr;
};

struct sigevent {
    int sigev_notify;
    int sigev_signo;
    union sigval    sigev_value;
    void (*sigev_notify_function)(union sigval);
    pthread_attr_t *sigev_notify_attributes;
};

can be matched by the following Mu types:

.typedef @sigval_storage = array<@i8 8> // 8 byte storage area

.typedef @ptrfunc = funcptr<@blahblah>  // some function pointer
.typedef @ptrdata = ptr<@blahblahblah>  // some data pointer

.typedef @sigevent = struct <@i32 @i32 @sigval_storage @ptrfunc @ptrdata>

The pointer to the sigev_value field is then cast to a pointer to the appropriate member type (@i32 or ptr<void>) for interpretation. On a platform with 8-byte pointers, the size of the @sigval_storage type is also just enough to properly align the next field, @ptrfunc.
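The same byte-array approach can be sketched in C to check that the layout works out. Everything named here (sigevent_bytes, the accessor functions, the placeholder function pointer signature) is hypothetical and only mirrors the Mu types above; memcpy is used instead of a pointer cast to stay within C's aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of the Mu @sigevent struct. */
struct sigevent_bytes {
    int32_t notify;
    int32_t signo;
    unsigned char value_storage[8];    /* plays the role of @sigval_storage */
    void (*notify_function)(void);     /* placeholder signature */
    void *notify_attributes;
};

/* Layout checks: the storage starts at offset 8 and the next field at 16. */
static_assert(offsetof(struct sigevent_bytes, value_storage) == 8,
              "storage follows the two 32-bit fields");
static_assert(offsetof(struct sigevent_bytes, notify_function) == 16,
              "8-byte storage leaves the next field aligned");
static_assert(sizeof(void *) <= 8, "storage must fit a pointer");

/* Interpret the storage as the int member. */
static void set_sival_int(struct sigevent_bytes *ev, int32_t v) {
    memcpy(ev->value_storage, &v, sizeof v);
}
static int32_t get_sival_int(const struct sigevent_bytes *ev) {
    int32_t v;
    memcpy(&v, ev->value_storage, sizeof v);
    return v;
}

/* Interpret the storage as the pointer member. */
static void set_sival_ptr(struct sigevent_bytes *ev, void *p) {
    memcpy(ev->value_storage, &p, sizeof p);
}
static void *get_sival_ptr(const struct sigevent_bytes *ev) {
    void *p;
    memcpy(&p, ev->value_storage, sizeof p);
    return p;
}
```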

This means Mu should define the "byte representation" as C does, i.e. give semantics to casting ptr<T> to ptr<array<int<8> N>> in order to interpret T as a sequence of N bytes. This may be specified in the native interface, but left "implementation-defined" in the core specification. (Note: For heap objects or Mu memory data, the client can only mess with the platform details after pinning the object and getting a pointer. Even so, the reference types still have opaque representations which the client is not supposed to interpret as bytes.)

Appendix: How polymorphism is handled in the socket interface in C:

struct sockaddr {
    __uint8_t sa_len;   
    sa_family_t sa_family;
    char sa_data[14];
};

struct sockaddr_in {
    __uint8_t sin_len;
    sa_family_t sin_family;
    in_port_t sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
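A minimal sketch of how C code dispatches on these structs, assuming a POSIX system with the standard socket headers: the generic struct sockaddr pointer is inspected for its family tag and then cast to the concrete type. The helpers make_inet_addr and port_of are names invented for this example. The layout shown above is the BSD one; some platforms (Linux, for example) have no length field, so the sketch does not touch it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: build an IPv4 socket address for the given port. */
static struct sockaddr_in make_inet_addr(unsigned short port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);   /* ports are stored in network byte order */
    return addr;
}

/* Hypothetical helper: read the family tag through the generic type,
 * then cast to the concrete type. Returns -1 for non-IPv4 addresses. */
static int port_of(const struct sockaddr *sa) {
    assert(sa != NULL);
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
        return ntohs(in->sin_port);
    }
    return -1;
}
```

The family field acts as the type tag, which is the same role a tag byte would play if the union were represented as a byte array in Mu.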
