Automatic Orderings, Monoids and Arbitraries

P. Oscar Boykin edited this page May 3, 2016 · 6 revisions

Ordering

Scalding uses implicit Ordering[K] instances to sort keys. If your key type is a case class, a primitive, a collection, or any nesting of these types, you can generate the Ordering automatically using a macro:

import com.twitter.scalding.serialization.macros.impl.BinaryOrdering._

If you are working with scrooge thrift data, you might instead use:

import com.twitter.scalding.thrift.macros.Macros._

from the scalding-thrift-macros package.

The above macro actually creates an OrderedSerialization[T], which extends Ordering[T] with a Serialization[T] and a means to compare serialized data directly, without allocating objects during the sort. When these are used with Scalding, we have seen 20-80% decreases in the running time of jobs with bigger keys (the bigger the key and the more strings in the key, the bigger the win).
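To make the idea of comparing serialized data concrete, here is a minimal hand-written sketch using only the standard library. It is not the code the macro generates and it does not use Scalding's OrderedSerialization API; the names UserKey, BinaryCompareSketch, serialize and compareBinary are hypothetical and chosen only for illustration. It shows a key being ordered once in memory and once directly from its bytes, with both orderings agreeing.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical key type, used only for this illustration.
case class UserKey(id: Int, name: String)

object BinaryCompareSketch {
  // Ordinary in-memory ordering: compare by id, then by name.
  implicit val userKeyOrdering: Ordering[UserKey] =
    Ordering.by((k: UserKey) => (k.id, k.name))

  // Write the fields in the same order the ordering compares them.
  def serialize(k: UserKey): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(k.id)
    out.writeUTF(k.name)
    out.flush()
    bytes.toByteArray
  }

  // Compare two serialized keys field by field without building a UserKey.
  // Assumption for brevity: readUTF still allocates a String; a tuned
  // implementation would compare the encoded bytes in place.
  def compareBinary(a: Array[Byte], b: Array[Byte]): Int = {
    val inA = new DataInputStream(new ByteArrayInputStream(a))
    val inB = new DataInputStream(new ByteArrayInputStream(b))
    val byId = Integer.compare(inA.readInt(), inB.readInt())
    if (byId != 0) byId
    else inA.readUTF().compareTo(inB.readUTF())
  }
}
```

The property that matters is that the binary comparison always has the same sign as the in-memory Ordering, so the sort can work on bytes and still produce the order the job expects.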

Semigroup, Monoid, Group, Ring, Arbitrary

Algebird has similar macros to provide automatic instances for case classes:

import com.twitter.algebird.macros.caseclass._ // get the algebras if all the elements of the case class have one
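As a rough picture of what "the algebras if all the elements of the case class have one" means, the sketch below builds a monoid for a case class by combining its fields one by one. It is written by hand against a tiny stand-in type class, SimpleMonoid, which is hypothetical and is not Algebird's Monoid; the case class Stats is likewise invented for the example. The Algebird macro is assumed to derive something of this element-wise shape for you.

```scala
// Hypothetical stand-in for a monoid type class (not Algebird's Monoid).
trait SimpleMonoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

object SimpleMonoid {
  // Integers combine by addition, with 0 as the identity.
  implicit val intMonoid: SimpleMonoid[Int] = new SimpleMonoid[Int] {
    val zero: Int = 0
    def plus(a: Int, b: Int): Int = a + b
  }

  // Strings combine by concatenation, with "" as the identity.
  implicit val stringMonoid: SimpleMonoid[String] = new SimpleMonoid[String] {
    val zero: String = ""
    def plus(a: String, b: String): String = a + b
  }
}

// Hypothetical case class whose fields all have a SimpleMonoid.
case class Stats(count: Int, label: String)

object Stats {
  // Element-wise instance: available only when every field has an instance.
  implicit def statsMonoid(implicit
      mi: SimpleMonoid[Int],
      ms: SimpleMonoid[String]
  ): SimpleMonoid[Stats] = new SimpleMonoid[Stats] {
    val zero: Stats = Stats(mi.zero, ms.zero)
    def plus(a: Stats, b: Stats): Stats =
      Stats(mi.plus(a.count, b.count), ms.plus(a.label, b.label))
  }
}
```

With these definitions in scope, implicitly[SimpleMonoid[Stats]].plus(Stats(1, "a"), Stats(2, "b")) yields Stats(3, "ab"). If any field lacked an instance, the implicit for Stats would not resolve, which mirrors the condition stated in the import comment above.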

If you add the algebird-test package to your dependencies, you can also access:

import com.twitter.algebird.ArbitraryCaseClassMacro
import org.scalacheck.Arbitrary // Arbitrary comes from ScalaCheck

case class Foo(i: Int, s: String)
implicit val fooArb: Arbitrary[Foo] = ArbitraryCaseClassMacro.arbitrary[Foo]
