AppFlow L2 Constructs #483
Comments
Re: "Automatic activation/deactivation of OnEventFlow and OnScheduleFlow types" There's a feature request to add a CFN template field to create Event&Scheduled flows as "activated". It's relatively simple to implement, will try to have it done within a month. So hopefully L2 CDK will be able to use that to keep it's code simpler. |
Adding some notes here for discussion:
I haven't experienced that. What matters is the existence of particular items in the array, not the order itself. Could I ask you to confirm that, @konls?
Again, AppFlow's CFN handler doesn't care about the Tasks order; it just translates the collection into the SDK's field. AppFlow doesn't make promises on task execution order within each task type group, though the original task order is likely going to be maintained.
We decided to publish this as a third-party library while we conduct the RFC review. All are welcome to try it out and provide feedback; it will really help us out in the review :).
Closing this ticket. We believe the functionality is beneficial, but it does not intersect with the core framework and should be vended and maintained separately.
Description
I've been working on PoC L2 constructs for AppFlow recently and would like to discuss their structure as a proposal for inclusion in the core AWS CDK library. Below I present the current state and the decisions behind some of the choices made.
Flows
AppFlow is a fully managed integration service for transferring data between SaaS services and AWS. From a technical standpoint it delivers its functionality through so-called flows and distinguishes three separate types (so-called flow triggers): OnDemand, Event, and Scheduled. I would like to propose translating these into separate types; currently the approach is to have `OnDemandFlow`, `OnEventFlow`, and `OnScheduleFlow`. This type division comes from the fact that some CFN elements are determined by the particular flow (trigger) type chosen. The naming convention `On{type}Flow` is just to make it more consistent (I know, it's subjective).

The snippet below shows a minimal definition of a flow (omitting the source and destination definitions for brevity; these are presented further in the description) in the proposed implementation:
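A minimal sketch of such a definition, assuming the PoC's naming (the import path, `Mapping.mapAll()`, and the `source`/`destination` objects are illustrative):

```ts
import { App, Stack } from 'aws-cdk-lib';
// Illustrative import path, after the PoC repository linked at the end.
import { OnDemandFlow, Mapping, ISource, IDestination } from 'appflow-cdk';

const app = new App();
const stack = new Stack(app, 'AppFlowStack');

// Source and destination definitions are omitted for brevity,
// as in the original description; see Sources & Destinations below.
declare const source: ISource;
declare const destination: IDestination;

new OnDemandFlow(stack, 'OnDemandFlow', {
  source,
  destination,
  mappings: [Mapping.mapAll()],
});
```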
A single source and a single destination
The Cfn `AWS::AppFlow::Flow` resource allows for providing a single source and multiple destinations. However, AppFlow as of now works with only one source and only one destination, so the implementation enables setting a single source and a single destination only. This could be changed by renaming the `destination` property to `destinations` (and changing the property type to an array); however, that could give the wrong impression that multiple destinations are an option.

EventBridge notifications
Each flow publishes events to the default EventBridge bus. The proposed approach follows the one for the CodeCommit events and would be:

- `onStarted`
- `onFinished`
- `onDeactivated` (only for the `OnEventFlow` and the `OnScheduleFlow`)
- `onStatus` (only for the `OnEventFlow`)

This way one can consume the notifications as in the example below.
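For instance, a sketch that wires the `onFinished` notification to an SNS topic (assuming the `on*` methods return an `events.Rule`, as the CodeCommit ones do):

```ts
import { aws_events_targets as targets, aws_sns as sns } from 'aws-cdk-lib';

declare const flow: OnDemandFlow;  // any of the proposed flow types
declare const topic: sns.ITopic;

// Publish a message to the topic whenever the flow finishes a run.
flow.onFinished('OnFlowFinished').addTarget(new targets.SnsTopic(topic));
```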
Automatic activation/deactivation of `OnEventFlow` and `OnScheduleFlow` types

AppFlow doesn't automatically activate `OnEventFlow` and `OnScheduleFlow` flows. After creation one needs to call an SDK method to start a particular flow. Additionally, the deletion of a flow will fail if the flow has not been deactivated, and deactivating the flow requires yet another SDK call. In the proposed implementation both flow types have an `autoActivate` boolean property that, when set to `true`, will make sure the flow is activated during deployment and deactivated before deletion.

NOTE: currently the implementation uses a custom resource for the AWS API that might require shifting to the more general provider framework, but current (preliminary) tests show that this simple approach works.
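A sketch of how the property would be used (all other values as in the earlier examples; `Schedule` comes from `aws-cdk-lib/aws-events`):

```ts
import { Duration } from 'aws-cdk-lib';
import { Schedule } from 'aws-cdk-lib/aws-events';

new OnScheduleFlow(stack, 'AutoActivatedFlow', {
  source,
  destination,
  mappings: [Mapping.mapAll()],
  schedule: Schedule.rate(Duration.hours(1)),
  // Activate during deployment, deactivate before deletion
  // (backed by the custom resource mentioned in the NOTE above).
  autoActivate: true,
});
```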
Start/End windows for `OnScheduleFlow`

The current issue with CloudFormation templates for AppFlow is that the start and end times for the schedule (if provided) are required to be in the future (evaluated at deployment time), which means that when re-applying a CloudFormation template the user can face an error. This, in turn, means that a Cfn template cannot be immutable, because any update to the flow potentially requires updating the start/end times. With CDK we can apply an approach using the Provider Framework to make sure that the windows are updated for us. For rate-based flows this requires shifting a `startTime` specified in the past a number of cycles forward, both to make sure that the deployment succeeds and to preserve any specific cadence the user expects (say, a flow triggered at minute 00 every hour). This means that with the proposed implementation the CDK code is immutable, but the (resolved) deployment definition will not be (in general).

The minimal example below shows how to declare an `OnScheduleFlow` with a cron schedule.
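A sketch, assuming the PoC's property names (`scheduleProperties` and its shape are illustrative):

```ts
import { Schedule } from 'aws-cdk-lib/aws-events';

new OnScheduleFlow(stack, 'OnScheduleFlow', {
  source,
  destination,
  mappings: [Mapping.mapAll()],
  schedule: Schedule.cron({ minute: '0', hour: '*' }),
  scheduleProperties: {
    // A fixed date that may already be in the past at (re-)deployment time.
    startTime: new Date(Date.parse('2022-01-01T00:00:00Z')),
  },
});
```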
In Cfn the `startTime` definition above would make the deployment fail. This means that for any fixed date a future deployment might fail, and due to the current approach to handling the date in CloudFormation, the IaC definition would have to be modified. With the proposed implementation the `startTime` property in the definition above will be shifted to the nearest future point in time, so the user will not face a failure with such a definition.

NOTE: it still depends on the deployment time. That is, if we shift the time to the nearest future point in time but the other parts of the deployment take a long time, the actual execution of the flow deployment might still happen after the re-calculated `startTime`. An option here could be specifying some artificial offset into the future (say, a hint to add some additional `Duration` when calculating the future point in time).

Applications
The approach is to have the core functionality in a `core` folder, and per-application implementations of connector profiles, sources, and destinations under a dedicated application folder (e.g. `salesforce`, `marketo`, `sapodata`, etc.) with a structure like the one sketched below.
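An illustrative layout (only the folder names mentioned above come from the proposal; the rest is an assumption):

```text
lib/
├── core/         # flows, tasks/operations, ConnectorProfileBase, ConnectorType
├── salesforce/   # SalesforceConnectorProfile, SalesforceSource, ...
├── marketo/
└── sapodata/
```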
Connector Profiles
A connector profile is an instance of a connector. There are native and custom connectors. From the perspective of a consumer there should be no difference in how a connector profile is defined in the CDK. Common functionality is placed in the `ConnectorProfileBase` class, which due to JSII restrictions does not use generics.

An example of importing a connector profile for a Salesforce integration:
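A sketch (the profile name is illustrative):

```ts
const importedProfile = SalesforceConnectorProfile.fromConnectorProfileName(
  stack,
  'ImportedProfile',
  'my-salesforce-profile', // the name of an existing connector profile
);
```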
The `fromConnectorProfileName` method uses an internal `_fromConnectorProfileName` method of `ConnectorProfileBase`, casting to the appropriate type.

An example of instantiating the connector profile for a Salesforce integration:
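A sketch, assuming the OAuth values are kept in a Secrets Manager secret (the JSON key names and the exact property shape are illustrative):

```ts
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

declare const secret: secretsmanager.ISecret;

new SalesforceConnectorProfile(stack, 'SalesforceProfile', {
  oAuth: {
    accessToken: secret.secretValueFromJson('accessToken'),
    flow: {
      refreshTokenGrant: {
        refreshToken: secret.secretValueFromJson('refreshToken'),
      },
    },
  },
  instanceUrl: secret.secretValueFromJson('instanceUrl').toString(),
});
```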
Mind that the elements to be put there can be `IResolvable`, so a `Secret` can be used with the `secretValueFromJson` method.

The properties contain an `oAuth` element somewhat influenced by how OAuth flows are defined for the Amazon Cognito UserPool client in CDK. That is why there is a sub-level element `flow` and, within it, `refreshTokenGrant`: the flow used here is a refresh token grant flow.

ConnectorType
Connector profiles use the `connectorProfileName`/`connectorLabel` properties to set the actual connectors. Custom connectors have the `ConnectorType` property set to `CustomConnector`, and the actual connector type is passed in `ConnectorLabel`. This is why this logic is abstracted away into the `ConnectorType` type. Additionally, connector types are used in Tasks (described in more detail further in the text) as the operations' origin (the term used in this implementation). That is why `ConnectorType` handles this as well.

Public/Private modes
Currently the focus is on the public connection mode, as the private mode is not currently supported by most of the connections and needs more analysis (a more in-depth view can be found, for example, in this blog).
Sources & Destinations
A connector profile can be used for either a source or a destination. In the proposed implementation both are called vertices, and each contains a `connectorType` property. The reasoning behind this is that tasks, when they specify which operator is to be used, require the (currently: source) connector type. Making it a required element of a vertex removes the requirement for a user to specify the connector type each time a task is defined, as it will be done on their behalf.

Sources and destinations are not Cfn resources themselves, which is why the approach follows the usage of a `bind` method.
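A minimal sketch of the vertex idea (the interface shape is inferred from this description; the return type is an assumption):

```ts
import { CfnFlow } from 'aws-cdk-lib/aws-appflow';

// Each vertex exposes its connector type so that tasks can derive the
// operator origin from it instead of requiring it from the user.
interface ISource {
  readonly connectorType: ConnectorType;
  bind(flow: IFlow): CfnFlow.SourceFlowConfigProperty;
}
```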
`OnScheduleFlow` and `incrementalPullConfig`
In Cfn the definition of the `incrementalPullConfig` (which effectively gives the name of the field used for tracking the last pulled timestamp) sits on the `SourceFlowConfig` property. As this is effectively a responsibility of the `OnScheduleFlow`, it has been moved to the flow type's properties.
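A sketch of the placement (`datetimeTypeFieldName` mirrors the underlying Cfn property name; the exact shape is an assumption):

```ts
new OnScheduleFlow(stack, 'IncrementalFlow', {
  source,
  destination,
  mappings: [Mapping.mapAll()],
  schedule: Schedule.rate(Duration.days(1)),
  // Moved here from the Cfn SourceFlowConfig property:
  incrementalPullConfig: {
    datetimeTypeFieldName: 'LastModifiedDate',
  },
});
```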
S3Destination and Glue Catalog

Although in Cfn the Glue catalog configuration is settable at the flow level, it works only when the destination is S3. That is why the implementation shifts these properties to the `S3Destination`, which in turn requires using `Lazy` for populating `metadataCatalogConfig` in the flow.

Tasks
Note: in the current implementation the proposed naming convention is to talk about records, which are tuples in a set, and fields, which are particular elements in a record.
Tasks are steps that can be taken upon fields. Tasks compose higher-level constructs that in this implementation are named `Operations`. It is the operations that are the founding elements of the proposed implementation. There are four operations identified:

- `Transforms` - 1-1 transforms on source fields, like truncation or masking
- `Mappings` - 1-1 or many-to-1 operations from source fields to a destination field
- `Filters` - operations that limit the source data on a particular condition
- `Validations` - operations that work on a per-record level and can have either a record-level consequence (i.e. dropping the record) or a global one (terminating the flow)

The approach to the usage is heavily influenced by the experience of defining Choice tasks in Step Functions. See the example below.
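A sketch showing all four operation kinds on one flow (the helper and condition names are modeled on the PoC and should be treated as illustrative):

```ts
new OnDemandFlow(stack, 'OperationsFlow', {
  source,
  destination,
  // 1-1 transform on a source field
  transforms: [Transform.mask({ name: 'Password' }, '*')],
  // 1-1 mapping from a source field to a destination field
  mappings: [
    Mapping.map({ name: 'Name', dataType: 'string' }, { name: 'Name', dataType: 'string' }),
  ],
  // limit the source data on a condition
  filters: [
    Filter.when(
      FilterCondition.timestampLessThanEquals(
        { name: 'LastModifiedDate', dataType: 'datetime' },
        new Date(),
      ),
    ),
  ],
  // per-record validation with a record-level consequence
  validations: [
    Validation.when(ValidationCondition.isNull('Name'), ValidationAction.ignoreRecord()),
  ],
});
```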
The operations are not resources themselves, but parts of the definition of the Cfn `AWS::AppFlow::Flow` resource. That is why they have a `bind` method. In order to remove the necessity of specifying the (source) connector type in them, the `bind` method takes both the `IFlow` and `ISource` types, where the latter, having the `connectorType` property, allows for specifying the task operator's origin.

Note on field types: the current implementation doesn't remove the need to set field types via `dataType`. There is the `describe-connector-entity` SDK call that could be investigated to remove this, but for now the assumption is that users have to know the record fields and their types in order to use the implementation.

Mappings
When using mappings with the `MAP` task type it is required to have a `Filter` task with the `PROJECTION` operator covering the mapped fields. The current implementation automates building this projection: the user defines only the maps, and the proposed implementation takes care of the rest.

Many-to-one mappings
When creating many-to-one mappings the current AppFlow mechanism creates an intermediate field with the name `${field1Name},${field2Name}`; for example, mapping `FirstName` and `LastName` to a single destination field produces an intermediate field named `FirstName,LastName`. The current implementation follows that. As a consequence, only one many-to-one definition is allowed. It will be identified whether this can be extended.

Filters
Similar to Mappings, Filters also require the fields to be present in the `PROJECTION`; that is why they are included in the projection builder in `FlowBase`.
AppFlow permissions
`AppFlowPermissionsManager` is a class inspired by the AWS CDK core `TokenMap`, and it utilises `Lazy` to make sure we generate only one input to the S3 Bucket, KMS Key, and Secrets Manager Secret resource policies no matter how many times we reuse the resource. The reason behind this is to remove the necessity of manually setting the right permissions for AppFlow in the resource policies, and the fact that sometimes the resources are used multiple times in the stack definition. For example, an S3 Bucket can be used for the ErrorHandling definition with a different prefix, so when we pass the bucket to different source/destination definitions, and potentially flows, we need a single set of permissions for the service principal.
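A sketch of the underlying idea (not the PoC's actual code): collect every requested prefix, but add only one policy statement whose resources resolve lazily at synthesis time.

```ts
import { Lazy, aws_iam as iam, aws_s3 as s3 } from 'aws-cdk-lib';

class BucketPermissions {
  private readonly prefixes = new Set<string>();
  private statementAdded = false;

  constructor(private readonly bucket: s3.IBucket) {}

  grantPut(prefix: string): void {
    this.prefixes.add(prefix);
    if (this.statementAdded) { return; } // one statement, however many call sites
    this.statementAdded = true;
    this.bucket.addToResourcePolicy(new iam.PolicyStatement({
      principals: [new iam.ServicePrincipal('appflow.amazonaws.com')],
      actions: ['s3:PutObject'],
      resources: Lazy.list({
        // Resolved during synthesis, after all grantPut() calls have run.
        produce: () => [...this.prefixes].map(p => this.bucket.arnForObjects(`${p}/*`)),
      }),
    }));
  }
}
```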
PoC implementation

https://github.com/rpawlaszek/appflow-cdk