Hadrian Basic Use

Jim Pivarski edited this page Aug 6, 2015 · 20 revisions

Before you begin...

Download any pre-built Hadrian JAR (except for the hadrian-gae WAR). This article was tested with Hadrian 0.7.1; newer versions should work with no modification. Scala >= 2.10 is required.

Launch a Scala prompt using that JAR as a classpath:

> scala -cp hadrian-standalone-TRUNK-jar-with-dependencies.jar

and import com.opendatagroup.hadrian.jvmcompiler.PFAEngine:

Welcome to Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.opendatagroup.hadrian.jvmcompiler.PFAEngine

Simplest possible scoring engines

Let's start with an engine that merely adds 100 to each input. That's something we can write inline.

scala> val engine = PFAEngine.fromJson("""
     | {"input": "double",
     |  "output": "double",
     |  "action": {"+": ["input", 100]}}
     | """).head
engine: com.opendatagroup.hadrian.jvmcompiler.PFAEngine[AnyRef,AnyRef] = PFA_Engine_1@3f792b9b

For convenience, we could have written it in YAML (all of Hadrian's unit tests are written this way).

scala> val engine = PFAEngine.fromYaml("""
     | input: double
     | output: double
     | action: {+: [input, 100]}
     | """).head
engine: com.opendatagroup.hadrian.jvmcompiler.PFAEngine[AnyRef,AnyRef] = PFA_Engine_2@53e211ee

Note that in both cases we asked for the .head of what PFAEngine.fromJson and PFAEngine.fromYaml produce. In general, these functions produce a collection of PFAEngine objects from one PFA file (pass multiplicity = 4 and drop .head to see that). These scoring engines can run in parallel and share memory. For now, though, we're only interested in one scoring engine.
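
As a minimal sketch of the multiplicity option mentioned above, the following asks for four engines from the same inline PFA document. It assumes multiplicity can be passed as a named argument to PFAEngine.fromJson, as the text suggests, and that the Hadrian JAR is on the classpath; the variable name engines is only illustrative.

```scala
import com.opendatagroup.hadrian.jvmcompiler.PFAEngine

// Same PFA as above, but without .head: keep the whole collection.
// Assumption: `multiplicity` is a named parameter of fromJson, per the text.
val engines = PFAEngine.fromJson("""
{"input": "double",
 "output": "double",
 "action": {"+": ["input", 100]}}
""", multiplicity = 4)

// Number of engine instances built from the one PFA document.
println(engines.size)

// Each instance can be called on its own.
engines.foreach(e => println(e.action(java.lang.Double.valueOf(3.14))))
```

Each element of the collection is a separate PFAEngine, so in a real deployment each one could be handed to its own thread.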

Because the engine was created successfully, the PFA has been fully validated. If the PFA were not valid, you would see one of the following:

  • a Jackson exception because the JSON wasn't valid;
  • a SnakeYAML exception because the YAML wasn't valid;
  • PFASyntaxException if Hadrian could not build an AST of the PFA from the JSON, for instance if a JSON field name is misspelled;
  • PFASemanticException if Hadrian could not build Java bytecode from the AST, for instance if data types don't match;
  • PFAInitializationException if Hadrian could not create a scoring engine instance, for instance if the cell/pool data are incorrectly formatted.
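
To see one of these failures without crashing the session, the sketch below wraps engine creation in scala.util.Try and prints the class of whatever exception comes back. The PFA declares a string output but computes a double, so it is expected to fail type checking; which exception class is actually reported is left for the output to show rather than assumed here.

```scala
import scala.util.{Try, Success, Failure}
import com.opendatagroup.hadrian.jvmcompiler.PFAEngine

// Deliberately inconsistent PFA: the action yields a double,
// but the declared output type is string.
val attempt = Try(PFAEngine.fromJson("""
{"input": "double",
 "output": "string",
 "action": {"+": ["input", 100]}}
""").head)

attempt match {
  case Success(_)   => println("PFA was accepted")
  // Print the exception class so it can be compared with the list above.
  case Failure(err) => println(err.getClass.getName + ": " + err.getMessage)
}
```

Catching the failure as a value keeps the example independent of the package in which Hadrian defines its exception classes.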

Now run the scoring engine on some sample input:

scala> engine.action(java.lang.Double.valueOf(3.14))
res0: AnyRef = 103.14

For Java accessibility, the action method takes and returns boxed values of type AnyRef (Object in Java). See [Feeding-Data-to-Hadrian] for a complete menu.
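
Because action works with boxed values, a caller usually boxes each input and unboxes each result. The following sketch scores a small batch this way; it assumes a PFA "double" output arrives as a java.lang.Double, which matches the REPL output above, and the names inputs and outputs are illustrative.

```scala
import com.opendatagroup.hadrian.jvmcompiler.PFAEngine

val engine = PFAEngine.fromJson("""
{"input": "double",
 "output": "double",
 "action": {"+": ["input", 100]}}
""").head

val inputs = Seq(1.0, 2.5)

// Box each input for action, then cast and unbox the AnyRef result.
// Assumption: the result of a "double" output is a java.lang.Double.
val outputs = inputs.map { x =>
  engine.action(java.lang.Double.valueOf(x)).asInstanceOf[java.lang.Double].doubleValue
}

println(outputs)
```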

At runtime, you should only ever see one of the following exceptions:

  • PFARuntimeException if a PFA library function encountered an exceptional case, such as a.max of an empty list.
  • PFAUserException if the PFA has explicit {"error": "my error message"} directives.
  • PFATimeoutException if the PFA has some "options": {"timeout": 1000} set and a calculation takes too long.

Abstract Syntax Tree

The PFA AST is an immutable tree structure built from the serialized JSON, stored in engine.config, which is an EngineConfig. You can query anything about the original PFA file in a structured way through this AST. For instance,

scala> engine.config.action.head
res1: com.opendatagroup.hadrian.ast.Expression = {"+":["input",100]}

scala> engine.config.action.head.getClass.getName
res2: String = com.opendatagroup.hadrian.ast.Call

scala> engine.config.input.avroType
res3: com.opendatagroup.hadrian.datatype.AvroType = "double"

There are also a few methods for recursively walking over the AST. The collect method applies a partial function to all nodes in the tree and produces a list of matches. For instance, to get all Expressions (function calls like "+", symbol references like "input", and literal values like 100), do

scala> import com.opendatagroup.hadrian.ast.Expression
scala> engine.config collect {case x: Expression => x}
res4: Seq[com.opendatagroup.hadrian.ast.Expression] = List({"+":["input",100]}, "input", 100)
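
The partial function can also narrow the match and transform each node. As a small sketch, this variant collects only the integer literals and extracts their values; it relies on LiteralInt having a value field, as used elsewhere on this page, and the name literals is illustrative.

```scala
import com.opendatagroup.hadrian.jvmcompiler.PFAEngine
import com.opendatagroup.hadrian.ast.LiteralInt

val engine = PFAEngine.fromJson("""
{"input": "double",
 "output": "double",
 "action": {"+": ["input", 100]}}
""").head

// Match only LiteralInt nodes and keep their integer values;
// every other node in the tree is skipped by the partial function.
val literals = engine.config collect {case x: LiteralInt => x.value}

println(literals)
```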

You can also build new scoring engines by passing a replacement function to the replace method, which returns a modified copy of the AST. This one turns instances of 100 into 999. You can do quite a lot just by crafting the right partial function.

scala> import com.opendatagroup.hadrian.ast.LiteralInt
scala> engine.config replace {case x: LiteralInt if (x.value == 100) => LiteralInt(999)}
res5: com.opendatagroup.hadrian.ast.Ast = {"name":"Engine_2","method":"map","input":"double","output":"double","action":[{"+":["input",999]}]}
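
The result of replace is an AST, not yet an engine. One way to turn it back into a runnable engine, sketched below, is to serialize it and compile it again with PFAEngine.fromJson. This assumes that toString on the AST yields valid PFA JSON, as the REPL output above suggests; Hadrian may offer a more direct route that is not shown here.

```scala
import com.opendatagroup.hadrian.jvmcompiler.PFAEngine
import com.opendatagroup.hadrian.ast.LiteralInt

val engine = PFAEngine.fromJson("""
{"input": "double",
 "output": "double",
 "action": {"+": ["input", 100]}}
""").head

// Build a modified copy of the AST in which 100 becomes 999.
val modified = engine.config replace {case x: LiteralInt if (x.value == 100) => LiteralInt(999)}

// Assumption: the AST's string form is PFA JSON, so it can be recompiled.
val newEngine = PFAEngine.fromJson(modified.toString).head

val result = newEngine.action(java.lang.Double.valueOf(1.0))
println(result)
```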
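
The walk method is the most general: it visits each node and hands a node-specific context object to a Task, which returns a TaskResult. For instance, the following Task translates the PFA expression into Lisp-style code.

scala> import com.opendatagroup.hadrian.ast._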
scala> import com.opendatagroup.hadrian.datatype._
scala> import com.opendatagroup.hadrian.options.EngineOptions

scala> trait LispCode extends TaskResult

scala> case class LispFunction(car: String, cdr: Seq[LispCode]) extends LispCode {
     |   override def toString() = "(" + car + cdr.map(" " + _.toString).mkString + ")"
     | }

scala> case class LispSymbol(name: String) extends LispCode {
     |   override def toString() = name
     | }

scala> object GenerateLisp extends Task {
     |   def apply(context: AstContext, engineOptions: EngineOptions, resolvedType: Option[Type]): TaskResult = context match {
     |     case Call.Context(_, _, fcn: LibFcn, args: Seq[TaskResult], _, _, _) => LispFunction(fcn.name, args.map(_.asInstanceOf[LispCode]))
     |     case Ref.Context(_, _, name: String) => LispSymbol(name)
     |     case LiteralInt.Context(_, _, value: Int) => LispSymbol(value.toString)
     |   }
     | }

scala> val symbolTable = SymbolTable.blank
scala> symbolTable.put("input", AvroDouble())
scala> engine.config.action.head.walk(GenerateLisp, symbolTable, FunctionTable.blank, new EngineOptions(Map(), Map()))._2
res10: com.opendatagroup.hadrian.ast.TaskResult = (+ input 100)