Rethink Structural Types

Originally, structural types were introduced to make the language fit the underlying foundations better (type theorists prefer structural typing), and to emulate the idea of "duck typing" in dynamic languages, but with static guarantees. But it turned out that almost nobody uses them. It seems that a combination of traits and classes, together with type classes represented by implicit parameters, gives enough flexibility, so the need for duck typing is rarely felt.

However, there is another area where statically typed languages are often more awkward than dynamically typed ones: database access. In a dynamically typed language, it's quite natural to model a row as a record or object, and to select entries with simple dot notation, e.g. `row.columnName`. In a statically typed language, we can do that only if we somehow define a class for every possible row arising from a database manipulation (including rows arising from joins and projections), and set up a scheme to map between a row and the class representing it. This requires a lot of boilerplate code. So quite often one opts for a simpler scheme where column names are represented as strings that are passed to a select operator, e.g. `row.select("columnName")`. But this forgoes all the advantages of static typing and is additionally more awkward to write than the dynamically typed version.

A case in point is the Spark framework. The first version of Spark essentially supported distributed collections using RDDs. Used from Scala, this was very natural: an RDD was just some kind of collection, and it was accessed in the same way as other Scala collections. Collection elements were defined by classes which were mapped transparently to database rows.

Later versions of Spark added database schemas ("data frames") for better optimizations and multi-language support. But, sadly, this meant that some amount of type safety was lost and member access was now done via strings instead of the more natural dot notation.

It seems the most natural type to represent a row in a database schema is a structural type, with one field for each column. Unfortunately, this does not work, at least not with structural types as they are currently defined in Scala. The problem is that accessing a member of a structural type is always implemented in terms of accessing a field or method in a class, using Java reflection. For database access, this is not what we want; instead, we would like to use the field name as a parameter for an operation that's defined by the system. In short, structural types are useless for database access because their member access implementation is not programmable.

The rest of this note describes a way to change that. It lays out a scheme to define programmatically the meaning of accessing a member of a structural type. The scheme is based on the idea of representing structural types programmatically, using "Selectables". It is implemented in PR #1881.

`Selectable` is a trait defined as follows:

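    import scala.reflect.ClassTag

    trait Selectable extends Any {
      // Returns the value associated with the field `name`.
      def selectDynamic(name: String): Any
      // Selects a method; `paramClasses` describe its formal parameter types.
      def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any =
        throw new UnsupportedOperationException("selectDynamicMethod")
    }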

The principal method of a selectable is `selectDynamic`: It takes a field name and returns the value associated with that name in the selectable.

To make this precise, assume `r` is a value with structural type `S`. In general `S` is of the form `C { Rs }`, i.e. it consists of a class reference `C` and refinement declarations `Rs`. We call a field selection `r.f` _structural_ if `f` is a name defined by a declaration in `Rs` whereas `C` defines no member of name `f`. Assuming the selection has type `T`, it is mapped to something equivalent to the following code:

    (r: Selectable).selectDynamic("f").asInstanceOf[T]

That is, we make sure `r` conforms to type `Selectable`, potentially by adding an implicit conversion. We then invoke the `selectDynamic` operation of that instance, passing the name `"f"` as a parameter. We finally cast the resulting value back to the statically known type `T`.
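To make the rewriting concrete, here is a minimal sketch that performs it by hand. It assumes a local trait `RowSelectable` standing in for the proposed `Selectable` (field access only) and a hypothetical map-backed `MapRow` class; the expressions for `name` and `age` are what a structural selection `row.name` of type `String` (resp. `row.age` of type `Int`) would be mapped to.

```scala
// Hypothetical stand-in for the proposed `Selectable` trait (field access only).
trait RowSelectable {
  def selectDynamic(name: String): Any
}

// Hypothetical row type backed by a map from column names to values.
class MapRow(fields: Map[String, Any]) extends RowSelectable {
  def selectDynamic(name: String): Any = fields(name)
}

object Desugaring {
  val row: RowSelectable = new MapRow(Map("name" -> "Emma", "age" -> 42))

  // Hand-written equivalent of `row.name` with static type String:
  // ascribe the selectable type, call selectDynamic with the field name, cast back.
  val name: String = (row: RowSelectable).selectDynamic("name").asInstanceOf[String]

  // Same for an Int-typed field; the cast unboxes the java.lang.Integer stored as Any.
  val age: Int = (row: RowSelectable).selectDynamic("age").asInstanceOf[Int]
}
```

The static type information lives entirely in the final cast; `selectDynamic` itself only ever sees a string.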

`Selectable` also defines another access method called `selectDynamicMethod`. This operation is used to select methods instead of fields. It gets passed the class tags of the selected method's formal parameter types as additional arguments. These can then be used to pick one of several overloaded variants.
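The following sketch illustrates how class tags can disambiguate overloads. The note does not say what `selectDynamicMethod` returns or how the call site uses the result; here it is *assumed* to return a function value that the caller applies to the actual arguments. `OverloadedOps` and `OverloadDemo` are hypothetical names.

```scala
import scala.reflect.ClassTag

// Hypothetical selectable with an overloaded "method" `show`.
// Assumption: selectDynamicMethod returns a function value that the call site applies.
class OverloadedOps {
  // Variants keyed by method name and the runtime classes of their parameter types.
  private val variants: Map[(String, List[Class[_]]), Any] = Map(
    ("show", List(classOf[Int]))    -> ((x: Int) => s"Int($x)"),
    ("show", List(classOf[String])) -> ((s: String) => s"String($s)")
  )

  def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any = {
    // The class tags' runtime classes select the overloaded variant.
    val key = (name, paramClasses.map(_.runtimeClass).toList)
    variants.getOrElse(key, throw new NoSuchMethodException(name))
  }
}

object OverloadDemo {
  val ops = new OverloadedOps

  // What a call `ops.show(3)` might be mapped to under the assumption above.
  val showInt = ops.selectDynamicMethod("show", ClassTag.Int).asInstanceOf[Int => String]
  val viaInt: String = showInt(3)

  // And `ops.show("a")`, disambiguated by the String class tag.
  val showString =
    ops.selectDynamicMethod("show", implicitly[ClassTag[String]]).asInstanceOf[String => String]
  val viaString: String = showString("a")
}
```

Class tags rather than argument values are passed because overload resolution depends on the static parameter types, which are known at compile time.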

Package `scala.reflect` contains an implicit conversion which can map any value to a selectable that emulates reflection-based selection, in a way similar to what was done until now:

    package scala.reflect

    object Selectable {
      implicit def reflectiveSelectable(receiver: Any): scala.Selectable = receiver match {
        case receiver: scala.Selectable => receiver
        case _                          => new scala.reflect.Selectable(receiver)
      }
    }

When imported, `reflectiveSelectable` provides a way to access fields of any structural type using Java reflection. This is similar to the current implementation of structural types. The main difference is that to get reflection-based structural access one now has to add an import like

    import scala.reflect.Selectable.reflectiveSelectable

On the other hand, the previously required language feature import of `reflectiveCalls` would now be redundant and can be dropped.
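For illustration, this is how the opt-in import looks in Scala 3, which provides `scala.reflect.Selectable.reflectiveSelectable` and needs no `reflectiveCalls` import for this. `Counter`, `HasCount` and `readCount` are illustrative names, not part of any library.

```scala
import scala.reflect.Selectable.reflectiveSelectable

// Illustrative class that knows nothing about Selectable or structural types.
class Counter {
  val count: Int = 3
}

// Structural type: any value with a `count` member of type Int.
type HasCount = { val count: Int }

// `x.count` is a structural selection; the import converts `x` to a
// reflection-backed Selectable, which looks up `count` at run time.
def readCount(x: HasCount): Int = x.count
```

Without the import, `x.count` is rejected at compile time, since `HasCount` gives the compiler no `selectDynamic` to call.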

As you can see from its implementation above, `reflectiveSelectable` first checks whether its argument is already a run-time instance of `Selectable`, in which case it is returned directly. This means that reflection-based accesses only take place as a last resort, if no other `Selectable` is defined.
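A reflection-based selectable can be sketched in a few lines of Java reflection. This is a simplified, hypothetical version (`Sel`, `ReflectiveSel`, `Point` are illustrative names): it only looks up public zero-argument methods, which is how Scala compiles public `val`s and case-class fields, and it mirrors the "existing selectable first, reflection last" order described above.

```scala
// Hypothetical stand-in for the proposed `Selectable` trait (field access only).
trait Sel {
  def selectDynamic(name: String): Any
}

// Hypothetical reflection-backed selectable: finds a public zero-argument method
// with the given name on the receiver's runtime class and invokes it.
class ReflectiveSel(receiver: Any) extends Sel {
  def selectDynamic(name: String): Any =
    receiver.getClass.getMethod(name).invoke(receiver.asInstanceOf[AnyRef])
}

object Sel {
  // Same shape as `reflectiveSelectable`: reuse an existing Sel, fall back to reflection.
  def reflective(receiver: Any): Sel = receiver match {
    case s: Sel => s
    case _      => new ReflectiveSel(receiver)
  }
}

// Illustrative data type with no selectable support of its own.
case class Point(x: Int, y: Int)
```

For example, `Sel.reflective(Point(1, 2)).selectDynamic("y")` yields `2` via reflection, while a value that already implements `Sel` is returned unchanged and never touches reflection.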

Other selectable instances can be defined in libraries. For instance, here is a simple class of records that support dynamic selection:

    case class Record(elems: (String, Any)*) extends Selectable {
      def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
    }

`Record` consists of a list of pairs of element names and values. Its `selectDynamic` operation finds the pair with given name and returns its value.

Let's define a record value and cast it to a structural type `Person`:

    type Person = Record { val name: String; val age: Int }
    val person = Record("name" -> "Emma", "age" -> 42).asInstanceOf[Person]

(We get back to the issue of casting below; for now, just note that the cast will succeed, as it checks at run time only the erased portion of `Person`, which is `Record`.)

Then `person.name` will have static type `String`, and will produce `"Emma"` as result.

The safety of this scheme relies on the correctness of the cast. If the cast lies about the structure of the record, the corresponding `selectDynamic` operation would fail. In practice, the cast would likely be part of a database access layer which would ensure its correctness.
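The `Record` example can be tried in Scala 3, where `scala.Selectable` is a marker trait and structural selections on its subclasses are rewritten to calls of their `selectDynamic` method. The sketch also shows what a lying cast does, and adds a hypothetical access-layer helper `asPerson` that checks the structure at run time before casting; `asPerson` and `liar` are illustrative names.

```scala
// Scala 3: structural selections on this subclass of Selectable go to selectDynamic.
case class Record(elems: (String, Any)*) extends Selectable {
  def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
}

type Person = Record { val name: String; val age: Int }

// Hypothetical access-layer helper: cast only after checking that every field
// declared in `Person` is present with a value of the expected runtime class.
def asPerson(r: Record): Option[Person] = {
  val fields = r.elems.toMap
  val ok = fields.get("name").exists(_.isInstanceOf[String]) &&
    fields.get("age").exists(_.isInstanceOf[Int])
  if (ok) Some(r.asInstanceOf[Person]) else None
}

// Unchecked cast: succeeds because only the erased type `Record` is tested.
val person = Record("name" -> "Emma", "age" -> 42).asInstanceOf[Person]

// A lying cast also succeeds; the failure surfaces only when `age` is selected.
val liar = Record("name" -> "Emma").asInstanceOf[Person]
```

Here `person.name` has static type `String` and evaluates to `"Emma"`, whereas `liar.age` throws a `NoSuchElementException` from the `.get` inside `selectDynamic`.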

It would be nice if the correctness of structural types could be ensured in a way less resembling pulling a rabbit out of your hat. Maybe this could be achieved by providing a language-defined bijection between structural types and a recursive generic type structure such as an `HMap`, i.e. an `HList` over pairs of labels and values. The idea is that one can define type-level operations over the generic type which implement data manipulations in a type-safe way. Structural types themselves do not lend themselves to recursive type-level operations because their fundamental shape is a set (of key/value pairs), not a recursive type such as a list. On the other hand, structural types directly provide the subtyping one would expect for records, which HLists or HMaps cannot do.
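The record subtyping mentioned above can be checked by the compiler itself. In this sketch (illustrative type names), the `<:<` witnesses compile only because a structural type with more members conforms to one with fewer, and because member order in a refinement does not matter, reflecting its set-like shape.

```scala
object RecordSubtyping {
  type Named           = AnyRef { val name: String }
  type Person          = AnyRef { val name: String; val age: Int }
  type PersonReordered = AnyRef { val age: Int; val name: String }

  // Width subtyping: a record type with more fields conforms to one with fewer.
  val personIsNamed: Person <:< Named = implicitly[Person <:< Named]

  // Member order is irrelevant: the two refinements conform to each other.
  val reorderedIsPerson: PersonReordered <:< Person = implicitly[PersonReordered <:< Person]
  val personIsReordered: Person <:< PersonReordered = implicitly[Person <:< PersonReordered]
}
```

An `HList`-style encoding such as `(String, Int)` has no comparable relationship to `Tuple1[String]`, which is the gap the paragraph points out.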

### Notes:

1. The scheme does not handle polymorphic methods in structural refinements. For now, such polymorphic methods are flagged as errors. It's not clear whether the use case is common enough to warrant the additional complexity of supporting it.

2. There are clearly some connections with `scala.Dynamic` here, since both select members programmatically. But there are also some differences.

 - Fully dynamic selection is not typesafe, but structural selection is, as long as the correspondence of the structural type with the underlying value is as stated.

 - `Dynamic` is just a marker trait, which gives more leeway in where and how to define reflective access operations. By contrast, `Selectable` is a trait which declares the access operations.

 - One access operation, `selectDynamic`, is shared between both approaches, but the other access operations are different. `Selectable` defines a `selectDynamicMethod`, which takes class tags indicating the method's formal parameter types as additional arguments. `Dynamic` comes with `applyDynamic` and `updateDynamic` methods, which take actual argument values.
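For comparison, here is a small `scala.Dynamic` class (`DynRow` is an illustrative name). It shows that `applyDynamic` and `updateDynamic` receive actual argument values, and that any member name compiles, so a typo is only caught at run time.

```scala
import scala.collection.mutable
import scala.language.dynamics

// Illustrative Dynamic class backed by a mutable map.
class DynRow extends Dynamic {
  private val fields = mutable.Map.empty[String, Any]

  // r.foo         ~>  r.selectDynamic("foo")
  def selectDynamic(name: String): Any = fields(name)

  // r.foo = v     ~>  r.updateDynamic("foo")(v)
  def updateDynamic(name: String)(value: Any): Unit = fields(name) = value

  // r.foo(a, b)   ~>  r.applyDynamic("foo")(a, b): gets the argument values, not their types.
  def applyDynamic(name: String)(args: Any*): String = s"$name(${args.mkString(", ")})"
}
```

With `val r = new DynRow`, the statements `r.age = 42` and `r.age` work as expected, but so does `r.typo` at compile time; it fails only when the map lookup throws.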

It would be interesting to see whether we can arrive at a harmonization between the two schemes. If we only look at `selectDynamic`, this is easy: we can define a trait

    import scala.language.dynamics
    trait DynamicSelectable extends Selectable with Dynamic

So one `selectDynamic` operation can perform double duty for structural and dynamic dispatch. The differences between the other methods are, however, a bit harder to bridge.
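A minimal Scala 3 sketch of this double duty, assuming Scala 3's marker-trait `scala.Selectable` together with `scala.Dynamic` (`Row` is an illustrative name): a selection through a refinement is typed and goes through `Selectable`, while a selection on the bare class falls back to `Dynamic` and yields `Any`. Both end up in the same `selectDynamic`.

```scala
import scala.language.dynamics

// One selectDynamic serves both structural (typed) and dynamic (untyped) selections.
class Row(fields: Map[String, Any]) extends Selectable with Dynamic {
  def selectDynamic(name: String): Any = fields(name)
}

type Person = Row { val name: String; val age: Int }

val row = new Row(Map("name" -> "Emma", "age" -> 42))

// Structural: `age` is declared in the refinement, so the result is cast to Int.
val typedAge: Int = row.asInstanceOf[Person].age

// Dynamic: `Row` has no member `name`, so this becomes row.selectDynamic("name"): Any.
val untypedName: Any = row.name
```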
Rethink Structural Types #1886

Notes:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Rethink Structural Types #1886

Description

Notes:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions