PMML Evaluator

Table of Contents

Example of using PMML Evaluator
Discussion of the functionName attribute for PMML model
The name of Mining Fields and Local Transformation Derived Fields
Normalize data

Example of using PMML Evaluator

Here is an example of using the NeuralNetworkEvaluator:

           public void testEvaluator() {
                   PMML pmml = PMMLUtil.loadPMML(PMMLFILEPATH);
                   NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
                   List<Map<FieldName, String>> input = CsvUtil.load(EVALUATIONDATASET);
                   for (Map<FieldName, String> maps : input) {
                       // The result type of evaluate() depends on the model's functionName.
                       switch (evaluator.getModel().getFunctionName()) {
                           case REGRESSION:
                               // Regression: one Double per output field.
                               Map<FieldName, Double> regressionTerm = (Map<FieldName, Double>) evaluator.evaluate(maps);
                               for (Double value : regressionTerm.values())
                                   System.out.println(value * 1000);
                               break;
                           case CLASSIFICATION:
                               // Classification: one score per category for each output field.
                               Map<FieldName, ClassificationMap<String>> classificationTerm = (Map<FieldName, ClassificationMap<String>>) evaluator.evaluate(maps);
                               for (ClassificationMap<String> cMap : classificationTerm.values())
                                   for (Map.Entry<String, Double> entry : cMap.entrySet())
                                       System.out.println(entry.getValue() * 1000);
                               break;
                           default:
                               // Other function names are not handled in this example.
                               break;
                       }
                   }
           }

Discussion of the functionName attribute for PMML model

The difference between setting the functionName attribute to classification and to regression:

	regression	classification	notes
Specify function name	`<NeuralNetwork modelName="demoModel" functionName="regression">`	`<NeuralNetwork modelName="demoModel" functionName="classification">`
Output expression	`<FieldRef field="diagnosis_transformed"/>`	`<NormDiscrete field="diagnosis_transformed" value="M"/>`	Neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete.
PMML evaluator	`Map<FieldName, Double> regressionTerm = (Map<FieldName, Double>) evaluator.evaluate(maps);`	`Map<FieldName, ClassificationMap<String>> classificationTerm = (Map<FieldName, ClassificationMap<String>>) evaluator.evaluate(maps);`
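
The dummy-field splitting mentioned in the notes column can be illustrated with a minimal, self-contained sketch. It does not use the JPMML classes; `NormDiscreteSketch` and its methods are hypothetical names, the second category `"B"` is an assumption (the table only shows `"M"`), and missing-value options are ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative dummy-field encoding in the style of PMML NormDiscrete. */
final class NormDiscreteSketch {

    /** Returns 1.0 when the field holds the given category, otherwise 0.0. */
    static double normDiscrete(String fieldValue, String category) {
        // A missing (null) field value simply yields 0.0 in this sketch.
        return category.equals(fieldValue) ? 1.0 : 0.0;
    }

    /** Expands one categorical value into one dummy field per category. */
    static Map<String, Double> dummyFields(String fieldValue, List<String> categories) {
        Map<String, Double> dummies = new LinkedHashMap<>();
        for (String category : categories) {
            dummies.put(category, normDiscrete(fieldValue, category));
        }
        return dummies;
    }

    public static void main(String[] args) {
        // "M" comes from the table above; "B" is a hypothetical second category.
        System.out.println(dummyFields("M", Arrays.asList("M", "B")));
    }
}
```

Each category gets its own numeric field, which is what lets a network with purely numeric neurons consume or produce a categorical value.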

Example 1: regression `<NeuralNetwork modelName="demoModel" functionName="regression">`

            <NeuralOutputs numberOfOutputs="1">
                <NeuralOutput outputNeuron="2,0">
                    <DerivedField optype="continuous" dataType="double">
                        <FieldRef field="diagnosis"/>
                    </DerivedField>
                </NeuralOutput>
            </NeuralOutputs>

Example 2: regression `<NeuralNetwork modelName="demoModel" functionName="regression">`

 <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="13">
        <DerivedField optype="continuous" dataType="double">
          <NormContinuous field="amount of claims">
            <LinearNorm orig="0" norm="0.1"/>
            <LinearNorm orig="1291.68" norm="0.5"/>
            <LinearNorm orig="5327.26" norm="0.9"/>
          </NormContinuous>
        </DerivedField>
      </NeuralOutput>
    </NeuralOutputs>
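
The LinearNorm pairs in Example 2 describe a piecewise-linear mapping between the original and the normalized scale. The sketch below is a rough, library-free illustration of that mapping and its inverse, using the three pairs from the example. `LinearNormSketch` is a hypothetical name, and extrapolating outside the outermost knots from the nearest segment is an assumption of this sketch; it is not the evaluator's actual code.

```java
/** Piecewise-linear normalization in the style of PMML NormContinuous (illustrative only). */
final class LinearNormSketch {

    // Knots copied from Example 2: ORIG[i] maps to NORM[i].
    static final double[] ORIG = {0.0, 1291.68, 5327.26};
    static final double[] NORM = {0.1, 0.5, 0.9};

    /** Maps x from the scale of `from` to the scale of `to` by linear interpolation. */
    static double interpolate(double x, double[] from, double[] to) {
        // Find the segment that contains x. Values outside the knots reuse the
        // first or last segment (extrapolation), which is an assumption here.
        int i = 1;
        while (i < from.length - 1 && x > from[i]) {
            i++;
        }
        double slope = (to[i] - to[i - 1]) / (from[i] - from[i - 1]);
        return to[i - 1] + (x - from[i - 1]) * slope;
    }

    /** Original value to normalized value. */
    static double normalize(double orig) {
        return interpolate(orig, ORIG, NORM);
    }

    /** Normalized value (for example an output neuron's activation) back to the original scale. */
    static double denormalize(double norm) {
        return interpolate(norm, NORM, ORIG);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1291.68));  // close to 0.5
        System.out.println(denormalize(0.9));    // close to 5327.26
    }
}
```

For a regression model, `denormalize` corresponds to the step that turns the output neuron's activation back into a value on the scale of the target field.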

Example 3: classification `<NeuralNetwork modelName="demoModel" functionName="classification">`

Here is an example that uses the classification function: Iris PMML

    <NeuralLayer>
    	<Neuron id="2,0" bias="36.829174221809204">
    		<Con from="1,0" weight="-15.428606782109018" />
    		<Con from="1,1" weight="-58.68586577113855" />
    		<Con from="1,2" weight="-4.533681748641222" />
    	</Neuron>
    	<Neuron id="2,1" bias="-3.832065207474468">
    		<Con from="1,0" weight="4.803555297576479" />
    		<Con from="1,1" weight="4.858790438015236" />
    		<Con from="1,2" weight="-12.562463287384077" />
    	</Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="2">
    	<NeuralOutput outputNeuron="2,0">
    		<DerivedField optype="categorical" dataType="string">
    			<NormDiscrete field="class" value="Iris-setosa" />
    		</DerivedField>
    	</NeuralOutput>
    	<NeuralOutput outputNeuron="2,1">
    		<DerivedField optype="categorical" dataType="string">
    			<NormDiscrete field="class" value="Iris-versicolor" />
    		</DerivedField>
    	</NeuralOutput>
    </NeuralOutputs>

Notes:

Neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete.
The outputNeuron attribute selects the neuron whose activation becomes the score, while the value attribute of NormDiscrete determines the key under which that score appears in the ClassificationMap<String>.
The computed activation of each output neuron is compared with the normalized value of the corresponding target field; the difference between the neuron's activation and the normalized target field determines the prediction error.
For scoring, the normalization of the target field is used to denormalize the predicted value in the output neuron. Therefore, each Neuron instance that represents an output neuron is additionally connected to a normalized field.
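
As a rough illustration of the notes above, the sketch below pairs each output neuron's activation with the NormDiscrete value of its NeuralOutput, producing a map keyed by class name. It uses plain `java.util` maps, not the JPMML `ClassificationMap`. The class and method names are hypothetical, the activation values are made up, and picking the highest score as the predicted label is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative mapping from output neuron activations to per-class scores. */
final class ClassificationOutputSketch {

    /**
     * @param activations   output neuron id -> computed activation
     * @param neuronToClass output neuron id -> NormDiscrete value (class name)
     * @return class name -> score, in the order of the neural outputs
     */
    static Map<String, Double> toClassScores(Map<String, Double> activations,
                                             Map<String, String> neuronToClass) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> output : neuronToClass.entrySet()) {
            // The neuron id selects the score; the NormDiscrete value names it.
            scores.put(output.getValue(), activations.get(output.getKey()));
        }
        return scores;
    }

    /** Returns the class with the highest score (the first one wins on ties). */
    static String bestClass(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Neuron ids and class names follow Example 3; activations are invented.
        Map<String, Double> activations = new LinkedHashMap<>();
        activations.put("2,0", 0.92);
        activations.put("2,1", 0.07);

        Map<String, String> neuronToClass = new LinkedHashMap<>();
        neuronToClass.put("2,0", "Iris-setosa");
        neuronToClass.put("2,1", "Iris-versicolor");

        Map<String, Double> scores = toClassScores(activations, neuronToClass);
        System.out.println(scores + " -> " + bestClass(scores));
    }
}
```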

The name of Mining Fields and Local Transformation Derived Fields

Please check the scope of fields in PMML.

The PMML Evaluator first checks whether the field is an input field. If it is, the evaluator returns the value directly.
If the field is not an input field, the evaluator looks it up among the derived fields in the local transformations and returns the value after the transformation, that is, the normalized data.

    class ExpressionUtil {
        public static FieldValue evaluate(FieldName name, EvaluationContext context) {
            Map.Entry<FieldName, FieldValue> entry = context.getFieldEntry(name); // input fields
            if (entry == null) {
                DerivedField derivedField = context.resolveDerivedField(name); // get local derived fields
                if (derivedField == null) {
                    return null;
                }
                FieldValue value = evaluate(derivedField, context);
                // Make the calculated value available for re-use
                context.declare(name, value);
                return value;
            }
            return entry.getValue();
        }
    }

The implementation that generates neuron inputs from both the local transformations and the mining schema has not been finished yet.

Ignore every mining field whose usageType is supplementary.
If the field is not used by any of the local transformations, create a neuron input using the name of the mining field.
If the field is used by a local transformation, create a neuron input with a FieldRef to the transformed field's name, following the field order of the mining schema. (Q: What if more than one mining field is used in a single local transformation, and what if more than one local transformation uses a single mining field?)
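
The three rules above could be sketched roughly as follows. This is not the unfinished implementation: `NeuronInputSketch` and its parameters are hypothetical, field names are plain strings, and the sketch assumes each derived field reads exactly one mining field. A mining field used by several transformations yields one input per derived field here, which is only one possible answer to the open question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative selection of the field names that feed the neuron inputs. */
final class NeuronInputSketch {

    /**
     * @param miningFields    mining field name -> usageType, in mining schema order
     * @param derivedToSource derived field name -> the mining field it transforms
     * @return field names to reference from the neuron inputs, in schema order
     */
    static List<String> neuronInputFields(Map<String, String> miningFields,
                                          Map<String, String> derivedToSource) {
        List<String> inputs = new ArrayList<>();
        for (Map.Entry<String, String> field : miningFields.entrySet()) {
            // Rule 1: supplementary fields never become neuron inputs.
            if ("supplementary".equals(field.getValue())) {
                continue;
            }
            boolean transformed = false;
            for (Map.Entry<String, String> derived : derivedToSource.entrySet()) {
                if (derived.getValue().equals(field.getKey())) {
                    // Rule 3: reference the transformed field instead of the raw one.
                    inputs.add(derived.getKey());
                    transformed = true;
                }
            }
            if (!transformed) {
                // Rule 2: no local transformation uses this field, so use it directly.
                inputs.add(field.getKey());
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        // All field names below are invented for the example.
        Map<String, String> miningFields = new LinkedHashMap<>();
        miningFields.put("age", "active");
        miningFields.put("income", "active");
        miningFields.put("record_id", "supplementary");

        Map<String, String> derivedToSource = new LinkedHashMap<>();
        derivedToSource.put("income_norm", "income");

        System.out.println(neuronInputFields(miningFields, derivedToSource));
    }
}
```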

Normalize data

Example of normalizing data in SparkLogisticRegressionToPMMLTest

Sample code in SparkLogisticRegressionToPMMLTest

	private void evaluate(SparkTestDataGenerator evalInput) {
		for (Map<FieldName, String> map : evalInput.getEvaluatorInput()) {
			ModelEvaluationContext context = new ModelEvaluationContext(null, evaluator);
			context.declareAll(map);
			Vector vector = new DenseVector(evalInput.normalizeData(context));
			Assert.assertEquals(getPMMLEvaluatorResult(map),mlModel.predict(vector), DELTA);
		}
	}

Sample code in SparkTestDataGenerator

   	public double[] normalizeData(ModelEvaluationContext context) {
		Model model = pmml.getModels().get(0);
		List<DerivedField> derivedFields = model.getLocalTransformations().getDerivedFields();
		List<Double> transformed = new ArrayList<Double>();
		for (DerivedField df : derivedFields) {
			if (df.getExpression() instanceof NormContinuous) {
				NormContinuous norm = (NormContinuous) df.getExpression();
				transformed.add(Double.parseDouble(NormalizationUtil.normalize(norm, context.getField(norm.getField())).getValue().toString()));
			}
                   ...
		}
		int len = transformed.size();
		double[] result = new double[len];
		for (int i = 0; i < len; i++)
			result[i] = transformed.get(i);
		return result;
	}

Shifu: A Distributed Model Training Framework on Hadoop
