Kotlin protos design

Context

This doc describes a proposal for a set of Kotlin APIs to generate for protocol buffers, as augmentations of the APIs already generated by the Java proto compiler. The logic of this approach is as follows:

  • Kotlin, as a language, is designed specifically to enable an incremental transition from Java. Many Kotlin developers are already using the Java protobuf APIs from Kotlin, and we’d like to provide the same smooth, backwards-compatible transition.
  • Java protos already use bean-style APIs that are automatically recognized by Kotlin and turned into properties. Using the existing Java APIs does not mean giving up on idiomatic Kotlin code for protos.
  • Our optimization infrastructure, much of which is making its way into open source, is specifically equipped to optimize the Java proto APIs at the bytecode level. Continuing to use the same APIs allows us to continue to benefit from these optimizations.

This document does not include proposals for gRPC improvements for use in Kotlin. That is a project we are actively working on, but we are treating it as an orthogonal set of improvements.
For now, we think it is better not to combine the two.

Proposed APIs

We propose generating APIs for the following key improvements to the Java proto APIs:

  • DSL-based construction of message objects. The long chains of builders typical in Java proto creation involve a lot of unnecessary ceremony, which a Kotlin DSL can significantly improve on.
  • Type-level representation of oneof fields. Kotlin’s improved support for smart casts and type matching relative to Java makes it easier to dispatch on different oneof cases.

Proto Creation DSL

At a basic level, the DSL for a proto message is exactly what you’d expect.

val myMessage = message {
  fieldA = value1
  fieldB = subMessage {
    subField = subValue
  }
}

Nothing here is particularly surprising, and it reads very cleanly compared to using the builders generated for Java:

val myMessage = Message.newBuilder()
  .setFieldA(value1)
  .setFieldB(
    SubMessage.newBuilder()
      .setSubField(subValue)
      .build()
  )
  .build()
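
To illustrate how a DSL of this shape could sit on top of the builders, here is a minimal, self-contained sketch. The `Message` and `SubMessage` classes are hand-written stand-ins for Java-generated code, and `MessageDsl`, `SubMessageDsl`, `message` and `subMessage` are hypothetical names chosen for illustration, not the generated API.

```kotlin
// Hand-written stand-ins for Java-generated classes (assumption; real generated code is richer).
class SubMessage private constructor(val subField: String) {
    class Builder {
        private var sub = ""
        fun getSubField(): String = sub
        fun setSubField(value: String): Builder = apply { sub = value }
        fun build(): SubMessage = SubMessage(sub)
    }
    companion object {
        fun newBuilder(): Builder = Builder()
    }
}

class Message private constructor(val fieldA: Int, val fieldB: SubMessage) {
    class Builder {
        private var a = 0
        private var b: SubMessage = SubMessage.newBuilder().build()
        fun getFieldA(): Int = a
        fun setFieldA(value: Int): Builder = apply { a = value }
        fun getFieldB(): SubMessage = b
        fun setFieldB(value: SubMessage): Builder = apply { b = value }
        fun build(): Message = Message(a, b)
    }
    companion object {
        fun newBuilder(): Builder = Builder()
    }
}

// Hypothetical DSL receivers: each property simply forwards to the wrapped builder.
class SubMessageDsl(private val builder: SubMessage.Builder) {
    var subField: String
        get() = builder.getSubField()
        set(value) { builder.setSubField(value) }
}

class MessageDsl(private val builder: Message.Builder) {
    var fieldA: Int
        get() = builder.getFieldA()
        set(value) { builder.setFieldA(value) }
    var fieldB: SubMessage
        get() = builder.getFieldB()
        set(value) { builder.setFieldB(value) }
}

// Hypothetical entry points: create a builder, run the block against the DSL, then build.
inline fun subMessage(block: SubMessageDsl.() -> Unit): SubMessage {
    val builder = SubMessage.newBuilder()
    SubMessageDsl(builder).block()
    return builder.build()
}

inline fun message(block: MessageDsl.() -> Unit): Message {
    val builder = Message.newBuilder()
    MessageDsl(builder).block()
    return builder.build()
}

val myMessage = message {
    fieldA = 42
    fieldB = subMessage { subField = "hello" }
}
```

Because the entry points are `inline` and the DSL classes only forward to the builder, the sketch keeps the builder as the single source of state.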

What is a little surprising, however, is what we’re doing for repeated and map fields, so let’s explain that. Instead of either

  • exposing an unmodifiable collection view and a set of specific mutation methods like addMyRepeatedField, as the current Java API does
  • creating a custom mutable collection wrapper, which runs into a few problems (code size, boxing, difficulty controlling the scope of mutability)

we’re taking a slightly different approach: providing an unmodifiable collection view and extension methods -- available only inside the DSL -- that modify the underlying builder.

Here’s an API sketch:

message MyMessage {
  repeated int32 ints = 1;
}

// defined once, in the proto runtime.  Can be an inline class, if we like.
class DslList<E, P>(private val delegate: List<E>): List<E> by delegate

class MyMessageDsl {
  // never instantiated
  class IntsProxy private constructor()

  val ints: DslList<Int, IntsProxy>

  operator fun DslList<Int, IntsProxy>.plusAssign(newValue: Int) { … }
  // ...more extension methods...
}

(We'll be automatically generating KDoc to clarify this for each proto.)

This allows us to write the code

myMessage {
  ints += 5
  ints += listOf(6, 7)
}

without allowing the mutability outside of the DSL, and avoids boxing the Int. The JIT, or ProGuard or the like for Android, can optimize this down to the equivalent of myMessageBuilder.addInts(5).
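
The following is a minimal runnable sketch of that idea, under stated assumptions: `MyMessage` and `MyMessageBuilder` are hand-written stand-ins for the Java-generated classes, and the `myMessage` entry point is hypothetical. It only aims to show how member extensions on a phantom-typed list can route `+=` to the builder.

```kotlin
// Defined once in a (hypothetical) runtime. P is a phantom type that only tags the owning field.
class DslList<E, P>(private val delegate: List<E>) : List<E> by delegate

// Stand-ins for the Java-generated message and builder (assumption).
class MyMessage(val intsList: List<Int>)

class MyMessageBuilder {
    private val ints = mutableListOf<Int>()
    fun addInts(value: Int): MyMessageBuilder = apply { ints.add(value) }
    fun addAllInts(values: Iterable<Int>): MyMessageBuilder = apply { ints.addAll(values) }
    fun getIntsList(): List<Int> = java.util.Collections.unmodifiableList(ints)
    fun build(): MyMessage = MyMessage(ints.toList())
}

class MyMessageDsl(private val builder: MyMessageBuilder) {
    // Never instantiated; exists only as a type tag.
    class IntsProxy private constructor()

    // Read-only view over the builder's current contents.
    val ints: DslList<Int, IntsProxy>
        get() = DslList(builder.getIntsList())

    // Member extensions: they resolve only where a MyMessageDsl is an implicit receiver.
    operator fun DslList<Int, IntsProxy>.plusAssign(newValue: Int) {
        builder.addInts(newValue)
    }

    operator fun DslList<Int, IntsProxy>.plusAssign(newValues: Iterable<Int>) {
        builder.addAllInts(newValues)
    }
}

// Hypothetical DSL entry point.
inline fun myMessage(block: MyMessageDsl.() -> Unit): MyMessage {
    val builder = MyMessageBuilder()
    MyMessageDsl(builder).block()
    return builder.build()
}

val example = myMessage {
    ints += 5
    ints += listOf(6, 7)
}
```

In this sketch, no `plusAssign` for `DslList<Int, IntsProxy>` is in scope outside a `myMessage { }` block, so the same `+=` should not resolve there.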

This approach may come off as a little strange, but we care very strongly about constraining the scope of mutability -- e.g. avoiding a mutable list attached to the builder getting stored somewhere and further modified -- and about avoiding boxing. The Java proto builder API exposes an unmodifiable list view and additional mutation methods for the same reason. Our goal was to provide the same API guarantees while enabling the use of convenient Kotlin operator overloads and the like, and this was the best balance we could find.

The most notable disadvantage of this approach is that, as with Java builders, you cannot directly assign to a repeated field; you can only modify and append to it: e.g. ints += listOf(6, 7) instead of ints = listOf(6, 7). This seems tolerable: it only requires one more character (+), and it’s a constraint we already have in Java.

Map fields in the proto get similar operator-style syntax:

myMessage {
  mapField[5] = "banana"
  mapField[3] = "apple"
  mapField += otherMap
}
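
A map field could follow the same pattern as the list sketch, with an indexed `set` operator in place of single-element `+=`. The code below is an illustrative sketch only: `DslMap`, `MapFieldProxy`, the stand-in `MyMessageBuilder` and the `myMessage` entry point are assumptions, not the generated API.

```kotlin
// Hypothetical runtime type: a read-only map view tagged with a phantom type P.
class DslMap<K, V, P>(private val delegate: Map<K, V>) : Map<K, V> by delegate

// Stand-ins for the Java-generated message and builder (assumption).
class MyMessage(val mapFieldMap: Map<Int, String>)

class MyMessageBuilder {
    private val entries = linkedMapOf<Int, String>()
    fun putMapField(key: Int, value: String): MyMessageBuilder = apply { entries[key] = value }
    fun putAllMapField(values: Map<Int, String>): MyMessageBuilder = apply { entries.putAll(values) }
    fun getMapFieldMap(): Map<Int, String> = java.util.Collections.unmodifiableMap(entries)
    fun build(): MyMessage = MyMessage(entries.toMap())
}

class MyMessageDsl(private val builder: MyMessageBuilder) {
    // Never instantiated; exists only as a type tag.
    class MapFieldProxy private constructor()

    val mapField: DslMap<Int, String, MapFieldProxy>
        get() = DslMap(builder.getMapFieldMap())

    // mapField[key] = value forwards to the builder's put method.
    operator fun DslMap<Int, String, MapFieldProxy>.set(key: Int, value: String) {
        builder.putMapField(key, value)
    }

    // mapField += otherMap forwards to the builder's putAll method.
    operator fun DslMap<Int, String, MapFieldProxy>.plusAssign(other: Map<Int, String>) {
        builder.putAllMapField(other)
    }
}

// Hypothetical DSL entry point.
inline fun myMessage(block: MyMessageDsl.() -> Unit): MyMessage {
    val builder = MyMessageBuilder()
    MyMessageDsl(builder).block()
    return builder.build()
}

val otherMap = mapOf(7 to "cherry")

val example = myMessage {
    mapField[5] = "banana"
    mapField[3] = "apple"
    mapField += otherMap
}
```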

In the event you need to break populating a proto into multiple methods, you can still do it; just write extension methods on MyMessageDsl.
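
As a small sketch of that pattern, population can be split into helper functions declared as extensions on the DSL type. The `MyMessageDsl` here is a deliberately simplified stand-in with two made-up fields (`name`, `retries`), and the helper names are hypothetical.

```kotlin
// Simplified stand-ins (assumption): a message with two fields and a DSL that holds their values.
class MyMessage(val name: String, val retries: Int)

class MyMessageDsl {
    var name: String = ""
    var retries: Int = 0
    fun build(): MyMessage = MyMessage(name, retries)
}

fun myMessage(block: MyMessageDsl.() -> Unit): MyMessage =
    MyMessageDsl().apply(block).build()

// Each helper populates one part of the message and can live in a separate file.
fun MyMessageDsl.populateIdentity(user: String) {
    name = user
}

fun MyMessageDsl.populateRetryPolicy(aggressive: Boolean) {
    retries = if (aggressive) 5 else 1
}

val example = myMessage {
    populateIdentity("alice")
    populateRetryPolicy(aggressive = true)
}
```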

Creating altered protos

In Java, when given a proto, creating a modified version of it usually looks like myProto.toBuilder().setFoo(foo).setBar(bar).build(). We can use the DSL for this as well.

For the moment, we are planning to use the method name copy, which has specific precedent in Kotlin data classes: myDataObject.copy(fooProperty = foo) is the language’s built-in syntax for creating a modified version of a data object. From there, it looks basically like the baseline DSL:

myProto.copy {
  foo = makeFoo()
  bar = theBar
}
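
One way such a `copy` could be layered on `toBuilder()` is sketched below. `MyProto`, its builder and `MyProtoDsl` are simplified hand-written stand-ins, so this shows the shape of the idea rather than the generated code.

```kotlin
// Stand-in for a Java-generated message with a builder (assumption).
class MyProto private constructor(val foo: String, val bar: Int) {
    class Builder(var foo: String = "", var bar: Int = 0) {
        fun build(): MyProto = MyProto(foo, bar)
    }

    // Starts a builder pre-populated with this message's values.
    fun toBuilder(): Builder = Builder(foo, bar)

    companion object {
        fun newBuilder(): Builder = Builder()
    }
}

// Hypothetical DSL receiver forwarding to the builder.
class MyProtoDsl(private val builder: MyProto.Builder) {
    var foo: String
        get() = builder.foo
        set(value) { builder.foo = value }
    var bar: Int
        get() = builder.bar
        set(value) { builder.bar = value }
}

// copy runs the block against a builder seeded from the receiver and leaves the original untouched.
inline fun MyProto.copy(block: MyProtoDsl.() -> Unit): MyProto {
    val builder = toBuilder()
    MyProtoDsl(builder).block()
    return builder.build()
}

val original: MyProto = MyProto.newBuilder().apply { foo = "a"; bar = 1 }.build()
val modified: MyProto = original.copy { bar = 2 }
```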

Oneof Class Representation

Here is a simplified example of the API we propose:

message MyMessage {
  oneof choice {
    string string_option = 1;
    int32 int_option = 2;
  }
}

object MyMessageKt {
  abstract class ChoiceOneof<V>
      internal constructor(val value: V, val case: ChoiceCase) {
    class StringOption(value: String) :
      ChoiceOneof<String>(value, ChoiceCase.STRING_OPTION)
    class IntOption(value: Int):
      ChoiceOneof<Int>(value, ChoiceCase.INT_OPTION)
    object NotSet: ChoiceOneof<Unit>(Unit, ChoiceCase.CHOICE_NOT_SET)
  }
}

val MyMessage.choice: MyMessageKt.ChoiceOneof<*>
var MyMessageDsl.choice: MyMessageKt.ChoiceOneof<*>

In short, we’re representing oneof options as subclasses of an outer class, named for the oneof. This outer class is not marked sealed, because adding new options to a proto oneof is supposed to be a compatible change. Note that not making the class sealed has really only one effect: that you have to include an else branch in when expressions, which is exactly appropriate to make code remain compatible after new options have been added. Since the oneof class’s constructor is visibility restricted, users can’t add fake oneof options.

(Note that this is a simplification of the actual implementation, but this is basically what users see.)

This lets users write nice code like

fun bar(myMessage: MyMessage) =
  when (val option = myMessage.choice) { 
    // unfortunately, to make smart casts work, we have to store the property in a val
    is StringOption -> doString(option.value)
    is IntOption -> doInt(option.value)
    else -> doUnknown()
  }
// note that option.value is smart-cast to the correct type for that oneof option

val theMessage = myMessage {
  choice = if (useString) StringOption("the string") else IntOption(6)
}
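
For readers who want to try the dispatch pattern in isolation, here is a self-contained sketch that mirrors the simplified classes above. The `ChoiceCase` enum is a stand-in for the Java-generated one and `describe` is a hypothetical helper. It shows that the `else` branch is what absorbs both the not-set case and any option added later.

```kotlin
// Stand-in for the Java-generated case enum (assumption).
enum class ChoiceCase { STRING_OPTION, INT_OPTION, CHOICE_NOT_SET }

// Not sealed on purpose: new options may appear, so `when` needs an else branch.
abstract class ChoiceOneof<V> internal constructor(val value: V, val case: ChoiceCase) {
    class StringOption(value: String) : ChoiceOneof<String>(value, ChoiceCase.STRING_OPTION)
    class IntOption(value: Int) : ChoiceOneof<Int>(value, ChoiceCase.INT_OPTION)
    object NotSet : ChoiceOneof<Unit>(Unit, ChoiceCase.CHOICE_NOT_SET)
}

// `choice` is a stable parameter, so it is smart-cast inside each branch.
fun describe(choice: ChoiceOneof<*>): String =
    when (choice) {
        is ChoiceOneof.StringOption -> "string of length ${choice.value.length}"
        is ChoiceOneof.IntOption -> "int doubled: ${choice.value * 2}"
        else -> "not set or unknown"
    }
```

Here `describe(ChoiceOneof.IntOption(6))` takes the `IntOption` branch with `value` typed as `Int`, while `ChoiceOneof.NotSet` falls through to `else`.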

Open Questions

Field Absence and Kotlin Extensions

Proto3, for better or worse, eliminates the distinction at the encoding level between absent scalar fields and scalar fields set to default. No API design can change that. There are straightforward, if somewhat awkward, workarounds: just using a simple one-field message to wrap scalar fields. Google provides these as well-known types built into protocol buffers.

This was a consciously made decision based on our experience within Google, and it’s worked reasonably well for us. Our experience has been that we haven’t had much trouble with treating absent and default as the same. But it’s not entirely clear to us how much demand exists among our open-source customers for improvements to the workarounds. We could provide helpers for Kotlin that represent optional fields as nullable properties, but we’d prefer not to do so without clearly demonstrated demand.

Here are some features we could provide:

  • For fields with a message type, which can still distinguish between absent and set-to-default, we could provide fieldOrNull extension properties.
  • For fields that use the wrapper types, specifically intended to represent scalar fields which need to distinguish between absent and set-to-default, we could generate nullable fieldValue extension properties: e.g. a proto containing google.protobuf.Int32Value my_field gets the extension MyMessage.myFieldValue: Int?.

Each of these would be provided as a read-only property on the message itself, but as a mutable property on the builder and DSL.
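
A rough sketch of the read-only side of both helpers follows. `MyMessage`, `Details` and `Int32Value` are hand-written stand-ins (the last one only imitates the well-known wrapper type), and the `hasX`/`getX` methods imitate the Java-generated accessors; the mutable builder and DSL counterparts are not shown.

```kotlin
// Stand-ins for generated classes (assumption).
class Int32Value(val value: Int)
class Details(val note: String)

class MyMessage(private val detailsField: Details?, private val myFieldWrapper: Int32Value?) {
    fun hasDetails(): Boolean = detailsField != null
    fun getDetails(): Details = detailsField ?: Details("")      // default instance when absent
    fun hasMyField(): Boolean = myFieldWrapper != null
    fun getMyField(): Int32Value = myFieldWrapper ?: Int32Value(0)
}

// Message-typed field: null when absent, the message otherwise.
val MyMessage.detailsOrNull: Details?
    get() = if (hasDetails()) getDetails() else null

// Wrapper-typed field: null when absent, the unwrapped scalar otherwise.
val MyMessage.myFieldValue: Int?
    get() = if (hasMyField()) getMyField().value else null
```

With these, a field explicitly set to `0` yields `0` from `myFieldValue`, while an absent field yields `null`.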

Destructuring declarations

This is a bit of a long-shot idea, but it might be that we could provide proto fields with number N as componentN() operator functions, so protos could be used in destructuring declarations. That is:

message MyMessage {
  int32 id = 1;
  string value = 2;
}

val (id, value) = myMessage

This seems plausible, but it seems unlikely to be especially useful, and would make choosing field numbers much more complicated than it is. Currently, they are chosen almost entirely arbitrarily, with the only real significance being that lower field numbers are less expensive on the wire. Proto field numbers are not really something that consumers are supposed to care about, and creating incentives to reshuffle proto field numbers -- a breaking change that could cause highly subtle bugs -- seems undesirable. Users can easily enough write their own extension functions component1, etc, and customize their choice of fields for each number.
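
For instance, a user who wants destructuring for a particular message could write something like the following. `MyMessage` is a plain stand-in class here, and `summarize` is a hypothetical helper used only to exercise the declaration.

```kotlin
// Stand-in for a generated message (assumption).
class MyMessage(val id: Int, val value: String)

// User-written component functions: the user decides which field maps to which position.
operator fun MyMessage.component1(): Int = id
operator fun MyMessage.component2(): String = value

fun summarize(myMessage: MyMessage): String {
    val (id, value) = myMessage
    return "$id=$value"
}
```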

Non-Goals

I expect this proposal will surprise many people with what it doesn’t provide, in the vein of “Why don’t you do X?” I want to preemptively address a few of those questions.

Why not factory methods?

That is, methods of the form

fun myMessage(myInt: Int = 0, myString: String = "", …) : MyMessage

Protos are intended to support new fields added between old fields, which would break anyone using traditional, non-named-argument method calls: myMessage(5, 15) breaks if a new String-typed argument is added between the previous integer-valued fields. If we could force users to use named parameters only, this might be more plausible, though that would also forbid Java callers; it seems like a language feature worth pursuing in the future. Even if users use named parameters only, adding a new field to a proto would be only a source-compatible change: the factory method would have to be inline and directly delegate to the builder or DSL for it to be a binary-compatible change as well.
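
The positional-argument problem can be seen in a small sketch. The two factory functions below are hypothetical and stand for the same generated factory before and after a string field is added between two int fields; `MyMessage` is a stand-in class.

```kotlin
// Stand-in message with the fields in declaration order (assumption).
class MyMessage(val first: Int, val name: String, val second: Int)

// The factory as it might look when the proto had only the two int fields.
fun myMessageV1(first: Int = 0, second: Int = 0): MyMessage =
    MyMessage(first, "", second)

// The factory after a string field is added between them.
fun myMessageV2(first: Int = 0, name: String = "", second: Int = 0): MyMessage =
    MyMessage(first, name, second)

val before = myMessageV1(5, 15)

// myMessageV2(5, 15) would no longer compile: 15 is not a String.
// Named arguments keep compiling across the change:
val after = myMessageV2(first = 5, second = 15)
```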

There are some notable tricky aspects to creating a factory-style copy method (MyMessage.copy(myInt: Int = this.myInt, …)) and preserving repeated or map fields without extra performance overhead linear in the number of entries, even when nothing is changed. We can’t use reference equality, because the underlying Java proto currently allocates a new unmodifiable wrapper each time, and either checking equality or unconditionally clearing and rebuilding take linear time. We have some ideas for workarounds, but they’re very hackish.

Using a DSL has none of these problems, and looks syntactically similar in many ways:

// factory-style
myMessage(
  foo = theFoo,
  bar = theBar,
  repeatedBazes = theBazes
)
// DSL-style
myMessage {
  foo = theFoo
  bar = theBar
  repeatedBazes += theBazes
}

The DSL is a bit more awkward when the factory-style call would fit on one line without wrapping its arguments, but it’s not that much of a penalty, and it avoids all of the above issues.

Why not data classes?

The problems described above with factory methods apply just as much to data class constructors. We could not preserve the guarantee that exists today -- part of the fundamental objective of protocol buffers -- that adding new proto fields would not break existing code. Additionally, creating our own classes would break compatibility with existing Java code, which is not currently acceptable.

What about multiplatform?

This is definitely something we’re considering for down the road, but compatibility concerns make this tricky. The solution we’re thinking about is generating a full Kotlin API that includes the Java API as a subset -- meaning that when the Kotlin API is compiled, the resulting bytecode is source- and binary-compatible with the original Java proto API. This is far from trivial, especially in terms of maintaining performance characteristics -- which is tremendously important to us, since essentially all Google infrastructure communicates using protocol buffers. For the moment, we're focusing on the simpler case of extensions on the existing Java APIs.