Suite of processors to allow content and attribute ingress and egress with NiFi using the Apache Thrift protocol

DavidTurland/nifi-simple-thrift-converter

nifi-simple-thrift-converter

A suite of processors to allow content and attribute ingress and egress with NiFi using the Apache Thrift protocol.

Author : David Turland

Summary

Apache Thrift is an alternative to other serialisation frameworks, e.g. Google Protocol Buffers and Apache Avro.

nifi-simple-thrift-converter comprises three NiFi processors:

  • FromThriftProcessor - deserialises a Thrift object held in the flowfile content into flowfile content and attributes
  • ToThriftProcessor - converts flowfile content and attributes to a Thrift object and serialises it to the flowfile content
  • PutThriftIDL - returns the Thrift IDL as the flowfile content. Useful for extracting the Thrift IDL file from a running NiFi instance

NOTE: The processors are not edge processors. Ingress to and egress from NiFi can be provided by standard NiFi processors, e.g.

  • HandleHttpRequest
  • HandleHttpResponse

These processors will be useful if your existing workflow utilises Thrift, or if you need a Thrift-specific feature, e.g. its broader language support.

Description

nifi-simple-thrift-converter provides a number of features:

  • NiFi-prescribed Thrift specification
  • These processors work with a fixed Thrift IDL specification: a Thrift FlowFile struct which largely mimics a NiFi flowfile, with 'attributes' and 'content'.
  • This allows arbitrary flowfiles to be converted (serialised) to Thrift and, conversely, deserialised Thrift structs to flow intact within NiFi, with the NiFi attribute and content look and feel.

If your requirement is to have custom data structures (captured in, say, JSON) enter, exit and flow around NiFi, then Avro may be the better option, though FlowFile content can always be JSON.

Build and Installation

Thrift Version

Requires thrift>=0.16.0

Build the NiFi Nar file

Prerequisites

  • Ensure Apache Thrift is installed.
    • If your Thrift version does not match thrift.version in pom.xml, define it on the Maven command line, e.g. mvn -Dthrift.version=0.16.0 package
  • NiFi Version
    • If your NiFi version does not match nifi.version in pom.xml, define it on the Maven command line as well, e.g. mvn -Dnifi.version=2.0.0-SNAPSHOT package
  • Build and deploy the processors:
# nifi is installed in dir NIFI_DIR
mvn package
# or maybe
mvn -Dthrift.version=0.16.0 package
# or even
mvn -Dthrift.version=0.16.0 -Dnifi.version=2.0.0-SNAPSHOT package
cp nifi-simple-thrift-converter-nar/target/nifi-simple-thrift-converter-nar-2.0.0-SNAPSHOT.nar \
   $NIFI_DIR/lib

NiFi is also reported to support loading NARs at runtime from an autoload directory; see this property in nifi.properties:

nifi.nar.library.autoload.directory=./extensions

In which case:

cp nifi-simple-thrift-converter-nar/target/nifi-simple-thrift-converter-nar-2.0.0-SNAPSHOT.nar \
   $NIFI_DIR/extensions

Developing clients for NiFi Thrift

Obtaining the thrift file

  • from repo ( nifi-simple-thrift-converter-processors/src/main/resources/flowfile_nifi.thrift)
  • at runtime via PutThriftIDL processor ( see below )

The thrift file (Mapping of content and attributes)

Two classes, FlowFileRequest and FlowFileReply, are used for passing requests to NiFi and holding responses, respectively. Both contain a ThriftFlowFile, which is akin to a NiFi flowfile:

struct ThriftFlowFile {
  3: map<string,string> attributes,
  15: binary content,
}
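To make the struct concrete, here is a minimal sketch of how a ThriftFlowFile could be laid out on the wire, assuming the Thrift binary protocol is the one selected. The field ids 3 and 15 come from the IDL above. The function name `encode_thrift_flowfile` is hypothetical, and a real client would normally use code generated by the Thrift compiler rather than hand-rolled packing. The sketch covers only the inner ThriftFlowFile struct, not the FlowFileRequest or FlowFileReply wrappers, whose layout is not shown here.

```python
import struct

# Thrift binary-protocol type identifiers.
T_STOP, T_STRING, T_MAP = 0, 11, 13


def _write_binary(value: bytes) -> bytes:
    """Length-prefixed bytes: 4-byte big-endian length, then the payload."""
    return struct.pack(">i", len(value)) + value


def _field_header(ttype: int, field_id: int) -> bytes:
    """Field header: 1-byte type followed by a 2-byte big-endian field id."""
    return struct.pack(">bh", ttype, field_id)


def encode_thrift_flowfile(attributes: dict, content: bytes) -> bytes:
    """Hypothetical helper: encode one ThriftFlowFile struct (binary protocol)."""
    out = bytearray()
    # Field 3: map<string,string> attributes
    out += _field_header(T_MAP, 3)
    out += struct.pack(">bbi", T_STRING, T_STRING, len(attributes))
    for key, value in attributes.items():
        out += _write_binary(key.encode("utf-8"))
        out += _write_binary(value.encode("utf-8"))
    # Field 15: binary content (binary shares the STRING type id on the wire)
    out += _field_header(T_STRING, 15)
    out += _write_binary(content)
    # A struct is terminated by a single STOP byte.
    out.append(T_STOP)
    return bytes(out)
```

Calling `encode_thrift_flowfile({"a": "b"}, b"hi")` yields a 29-byte buffer. The other Thrift protocols (compact, JSON) use different layouts, so this sketch applies only when the binary protocol is configured.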
  1. NOTE: To avoid name collisions, attributes in ThriftFlowFile.attributes, e.g. 'xxx', are prefixed with 'thrift.attr' (giving 'thrift.attr.xxx') when copied to flowfile attributes

  2. ThriftFlowFile.content is copied verbatim to flowfile content
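The attribute mapping in the note can be sketched as follows. This is an illustration only: the function name `to_flowfile_attributes` is hypothetical, and the exact prefix string, including the trailing dot separator, is an assumption based on the example in the note rather than something taken from the processor source.

```python
# Assumed prefix (with a dot separator), based on the README's example.
ATTRIBUTE_PREFIX = "thrift.attr."


def to_flowfile_attributes(thrift_attributes, existing=None):
    """Copy Thrift attributes into flowfile attributes under a prefix.

    `existing` stands in for attributes already on the flowfile; prefixing
    keeps a Thrift attribute named 'xxx' from overwriting a native 'xxx'.
    """
    merged = dict(existing or {})
    for name, value in thrift_attributes.items():
        merged[ATTRIBUTE_PREFIX + name] = value
    return merged
```

For example, `to_flowfile_attributes({"xxx": "1"}, {"xxx": "native"})` keeps the native `xxx` value and adds a separate `thrift.attr.xxx` entry.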

Example FromThriftProcessor and ToThriftProcessor

NiFi template: this is a simple round-trip example.

In practice:

  • The FromThriftProcessor SUCCESS would flow to 'many things'
  • The HandleHttpResponse might just return success or failure flowed from some downstream process

FromThriftProcessor and ToThriftProcessor Properties

  • ThriftProtocol - The Thrift protocol to be used (unless dynamic protocol set) for:
    • serialisation (ToThriftProcessor)
    • deserialisation (FromThriftProcessor)
  • DynamicProtocol (TODO) - should the required Thrift protocol be dynamic (per flowfile), with the protocol specified in a Thrift object attribute?
  • ConversionScope - what members of the Thrift FlowFile are converted
    • All - both the 'content' and 'attributes' members
    • Attributes - only the 'attributes' member
    • Content - only the 'content' member
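The effect of ConversionScope can be sketched as a simple selection over the two struct members. The names `ConversionScope` (as a Python enum) and `select_members` are hypothetical, and how the real processors treat a member that falls outside the scope is not documented here, so returning `None` for it is an assumption made for illustration.

```python
from enum import Enum


class ConversionScope(Enum):
    ALL = "All"
    ATTRIBUTES = "Attributes"
    CONTENT = "Content"


def select_members(scope, attributes, content):
    """Return (attributes, content), with None for members outside the scope."""
    use_attributes = scope in (ConversionScope.ALL, ConversionScope.ATTRIBUTES)
    use_content = scope in (ConversionScope.ALL, ConversionScope.CONTENT)
    return (
        attributes if use_attributes else None,
        content if use_content else None,
    )
```

With `ConversionScope.ATTRIBUTES`, only the attribute map is passed through and the content member is left out of the conversion.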

PutThriftIDL Processor

This allows the Thrift IDL file in use to be extracted from a running NiFi instance.

NiFi template

Testing:

curl http://localhost:9091 --output /tmp/flowfile.thrift
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1602    0  1602    0     0  53367      0 --:--:-- --:--:-- --:--:-- 55241
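The same download can be scripted from a client build step. The following is a minimal sketch using only the Python standard library; the function name `fetch_thrift_idl` is hypothetical, and the URL and port are simply those from the curl example, which depend on how the HTTP listener in your flow is configured.

```python
import urllib.request


def fetch_thrift_idl(url, output_path, timeout=10.0):
    """Download the IDL served by the flow and write it to output_path.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        idl = response.read()
    with open(output_path, "wb") as handle:
        handle.write(idl)
    return len(idl)


# Example call (assumes the flow from the curl example is listening):
# fetch_thrift_idl("http://localhost:9091", "/tmp/flowfile.thrift")
```

The saved file can then be passed to the Thrift compiler to generate client code for the language of your choice.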

Unit Tests, Client examples, tests and benchmarks

Unit Tests

These are run automatically.

Client examples, tests and benchmarks

| Client | HTTP Client | Benchmark |
|--------|-------------|-----------|
| Java   | Y           | JMH       |
| Perl   | Y           | Benchmark |
| Python | Y           |           |

Benchmarking notes

The Java HTTP client example is also a JMH benchmark ( https://github.com/openjdk/jmh )

cd examples/java
mvn clean verify
java -jar target/benchmarks.jar

The Perl (NO LONGER MAINTAINED) example also contained a simple Benchmark (10 kB content) (https://perldoc.perl.org/Benchmark)

cd examples/perl
bash build.bash
...
Benchmark: running simple for at least 1 CPU seconds...
    simple:  6 wallclock secs ( 0.94 usr +  0.13 sys =  1.07 CPU) @ 224.30/s (n=240)

TODO

Licence

nifi-simple-thrift-converter is copyright 2021 David Turland

nifi-simple-thrift-converter is released under the Apache Licence 2.0

Author

David Turland. Still passionate, after over 25 years in HPC, about developing robust, performant[1] software, libraries and frameworks that hide the vagaries, yet exploit the possibilities, of whatever future hardware arrives.

Useful Links

Apache Avro

Apache Thrift

Maven .m2/settings.xml

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
    <!-- https://www.baeldung.com/maven-settings-xml
       mvn -X clean
       mvn help:effective-settings
    -->
    <profiles>
        <profile>
            <id>windows-profile</id>
            <activation>
                <activeByDefault>false</activeByDefault>
                <os>
                    <name>Windows 10</name>
                    <family>Windows</family>
                    <arch>amd64</arch>
                    <version>10.0</version>
                </os>
            </activation>
            <properties>
                <thrift.executable>${user.home}\bin\thrift-0.16.0.exe</thrift.executable>
                <thrift.version>0.16.0</thrift.version>
            </properties>
        </profile>
        <profile>
            <id>linux-profile</id>
            <activation>
                <activeByDefault>false</activeByDefault>
                <os>
                    <family>unix</family>
                </os>
            </activation>
            <properties>
                <thrift.executable>/usr/bin/thrift</thrift.executable>
                <thrift.version>0.16.0</thrift.version>
            </properties>
        </profile>
    </profiles>
</settings>

Apache NiFi

Apache Thrift

Thrift is an alternative to, say, protobuf or Avro, where

  • Thrift supports many languages (more than protobuf)
  • Thrift offers an efficient binary format (and JSON, etc.)
  • more likely, though, you already use Thrift in-house

NOTE Avro, a newer self-describing protocol, is already supported within NiFi Services

Testing (mock and unit tests)

Assembling flows:

1: Arbitrary footnote
