Introduction

nTangle is a Change Data Capture (CDC) code generation tool and corresponding runtime. Unlike other CDC-based technologies which replicate changes to rows, nTangle is designed to replicate business entity (aggregate) changes.

For example, if a database contains a Person and one-to-many related Address table, a traditional CDC replicator would leverage the CDC-capabilities of the database as the data source and replicate all changes from both tables largely distinct from each other. Additional logic would then be required within the downstream systems to aggregate these distinct changes back into a holistic business entity where required, if possible.

nTangle tackles this differently by packaging the changes at the source into an aggregated entity which is then replicated. With nTangle the CDC-capabilities of the database are leveraged as the trigger, with a corresponding query across all related tables to produce a holistic business entity. Therefore, if a change is made to Person or Address this will result in the publishing of the entity. Where transactional changes are made to both Person and Address a single holistic business entity will be published including all changes.

This has a key advantage of being an excellent candidate within event-streaming scenarios where business entities are to be published based on underlying database changes.


Sidecar database

As of version 3.0.0 the preferred (recommended and default) approach is to use a sidecar database to manage the nTangle runtime artefacts. This is to limit changes to the source database beyond the requirement for CDC itself.

Usage of a sidecar database will also limit impact (load and data) on the source database by minimizing access to the required CDC and related data selection only. Otherwise, the required runtime orchestration will leverage the sidecar database only.

Note that there are no cross database dependencies; as such, the sidecar database can be hosted separately, be on a different version, etc. as required. The .NET orchestrator logic will require access to both databases to function.


Demonstration

The following video provides a high-level demonstration of nTangle (v2, without a sidecar database) and its capabilities.

ntangle.demo.mp4

Status

The included change log details all key changes per published version.


Approach

The nTangle CDC approach is to consolidate the tracking of individual tables (one or more) into an aggregated entity to simplify publishing to an event stream (or equivalent). The advantage is that where a change occurs to any of the rows related to an entity, even where multiple rows are updated, only a single event results. This makes it easier (more logical) for downstream subscribers to consume.

This is achieved by defining (configuring) the entity, being the primary (parent) table, and its related secondary (child) tables. For example, a SalesOrder may be made up of multiple tables; when any of these change, a single SalesOrder event should occur. These relationships are also defined with a cardinality of either OneToMany or OneToOne.

SalesOrder             // Parent
└── SalesOrderAddress  // Child 1:n - One or more addresses (e.g. Billing and Shipping)
└── SalesOrderItem     // Child 1:n - One or more items
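
As a rough illustration of the consolidation described above, the following Python sketch collapses row-level changes across the parent and child tables into one pending event per SalesOrder. The change-record shape, the `consolidate` function and the sample identifiers are hypothetical and only convey the idea; they are not the nTangle implementation.

```python
# Hypothetical row-level changes as (table, operation, sales_order_id) tuples.
# Real CDC rows look different; this shape is an assumption for illustration.
changes = [
    ("SalesOrderItem", "update", 1001),
    ("SalesOrderItem", "create", 1001),
    ("SalesOrderAddress", "update", 1001),
    ("SalesOrder", "update", 1002),
]


def consolidate(changes):
    """Group row-level changes by parent key: one pending event per entity."""
    pending = {}
    for table, _operation, order_id in changes:
        # Every change, parent or child, is attributed to the owning SalesOrder.
        pending.setdefault(order_id, set()).add(table)
    return pending


events = consolidate(changes)
for order_id, tables in events.items():
    print(f"SalesOrder {order_id}: one event (changed tables: {sorted(tables)})")
```

Four row-level changes result in two entity-level events here, one for each affected SalesOrder.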

The CDC capability is used specifically as a trigger for change (being Create, Update or Delete). The resulting data that is published is the latest, not a snapshot in time (CDC captured). The reason for this is two-fold:

  1. Given how the CDC data is batch retrieved there is no guarantee that the CDC captured data represents a final intended state (transactionally consistent) suitable for publishing; and,
  2. This process is intended to be running near real-time so getting the latest version will produce the most current committed version as at that time.
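
The distinction between using CDC as a trigger and using it as the data source can be sketched as follows. This is a minimal Python illustration that uses in-memory dictionaries in place of a database; the names `current_rows`, `cdc_batch` and `entities_to_publish` are hypothetical.

```python
# Stand-in for the current committed state of the database (assumption).
current_rows = {1001: {"id": 1001, "status": "Shipped"}}

# Stand-in for a CDC batch: each change carries the row image at capture time,
# which may already be stale by the time the batch is processed.
cdc_batch = [
    {"key": 1001, "captured": {"id": 1001, "status": "Pending"}},
    {"key": 1001, "captured": {"id": 1001, "status": "Packed"}},
]


def entities_to_publish(cdc_batch, current_rows):
    """Use the CDC batch only to find which keys changed, then read the latest state."""
    # dict.fromkeys keeps first-seen order while removing duplicate keys.
    changed_keys = dict.fromkeys(change["key"] for change in cdc_batch)
    # The captured images are ignored; keys no longer present are skipped here.
    return [current_rows[key] for key in changed_keys if key in current_rows]


print(entities_to_publish(cdc_batch, current_rows))
```

In this sketch the two captured intermediate states are never published; only the latest committed version of the entity is returned.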

To further guarantee only a single event for a specific version is published the resulting entity is JSON serialized and hashed; this value is checked (and saved) against the prior version to ensure a publish contains data that is actionable. This will minimize redundant publishing, whilst also making the underlying processing more efficient.
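
A minimal Python sketch of this hash-based version check is shown below. The use of SHA-256, the canonical JSON settings and the in-memory `VersionTracker` class are assumptions made for illustration; the source does not state which hash algorithm or storage nTangle uses.

```python
import hashlib
import json


def entity_hash(entity):
    """Serialize the entity to canonical JSON and hash it (SHA-256 is an assumption)."""
    payload = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class VersionTracker:
    """Hypothetical in-memory store of the last published hash per entity key."""

    def __init__(self):
        self._hashes = {}

    def should_publish(self, key, entity):
        new_hash = entity_hash(entity)
        if self._hashes.get(key) == new_hash:
            return False  # Same content as the prior version: skip the publish.
        self._hashes[key] = new_hash  # Save the hash for the next comparison.
        return True


tracker = VersionTracker()
order = {"id": 1001, "status": "Shipped"}
print(tracker.should_publish(1001, order))                  # first version
print(tracker.should_publish(1001, dict(order)))            # unchanged content
print(tracker.should_publish(1001, {**order, "status": "Delivered"}))  # changed
```

Sorting the keys before hashing keeps the hash stable regardless of property order, so only a genuine change in content triggers a publish.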


Change-data-capture (CDC)

This official documentation describes the Microsoft SQL Server CDC-capabilities.

Although references are made to Microsoft SQL Server throughout, nTangle is intended to be largely agnostic to the database technology; support for other databases may be added in the future based on demand and their capabilities.


Architecture

The underlying nTangle architecture for Microsoft SQL Server is described here.


Capabilities

nTangle has been created to provide a seamless means to create a CDC-enabled aggregated entity publishing solution. The nTangle solution is composed of the following:

  1. Code generation - a configuration file defines the database tables, none or more relationships, and other functionality-based properties, that are used to drive the database-driven code-generation to create the required solution artefacts.
  2. Runtime - the generated solution artefacts depend on a number of .NET runtime components/capabilities. The code-generated solution uses these at runtime to execute and orchestrate the CDC-triggered aggregated entity publishing process.

Code-generation

The code-generation is managed via a console application using CodeGenConsole. This internally leverages OnRamp to enable the underlying code-generation capabilities.

Additionally, the code-generator inspects (queries) the database, leveraging DbEx, to infer the underlying schema for all tables and their columns. The inferred schema is used to validate the references within the configuration, and also minimizes the configuration required where the inferred schema information can be used. The code-generation adopts a gen-many philosophy; therefore, where schema changes are made, the code-generation can be executed again to update accordingly.

As stated, the code-generation is driven by a configuration file, typically named ntangle.yaml. Both YAML and JSON formats are supported; there is also a corresponding JSON schema to enable editor intellisense, etc.

The nTangle configuration is as follows:

Root
└── Table(s)
  └── Join(s)
    └── JoinOn(s)
    └── JoinMapping(s)
  └── TableMapping(s)

Documentation related to each of the above is as follows:

  • Root - defines the root configuration settings.
  • Table - defines the primary table as being the entity aggregate.
  • Join - defines none or more table joins to include within the entity.
  • JoinOn - defines the join on column characteristics.
  • JoinMapping - defines global identifier mappings for any of the join table columns.
  • TableMapping - defines global identifier mappings for any of the primary table columns.

An example ntangle.yaml configuration file exists within the SqlServerSidecarDemo sample. The SqlServerSidecarDemo.CodeGen sample also demonstrates how to invoke the code generator from the underlying Program.

The code-generator will output a number of generated artefacts; these will be either database-related (see SqlServerSidecarDemo.Database sample) or corresponding .NET runtime components (see SqlServerSidecarDemo.Publisher sample).

The following NTangle namespaces provide the code-generation capabilities:

| Namespace | Description |
|-----------|-------------|
| Config | The internal capabilities that support the YAML/JSON configuration. |
| Console | The code-generation tooling capabilities, primarily CodeGenConsole. |
| Generators | The internal code-generators used to select configuration for one or more Templates as orchestrated by the underlying Scripts. |

Runtime

Generally, a runtime publisher is required to orchestrate the CDC-triggered aggregated entity publishing process (see SqlServerSidecarDemo.Publisher sample). This in turn takes a dependency on the nTangle runtime.

The following NTangle namespaces provide the runtime capabilities:

| Namespace | Description |
|-----------|-------------|
| Cdc | CDC-orchestration capabilities, primarily EntitySidecarOrchestrator. |
| Data | Database access capabilities to support the likes of batch tracking, identifier mapping and versioning. |
| Events | Event capabilities, leveraging and extending the capabilities enabled by CoreEx. |
| Services | Service hosting capabilities, primarily the CdcHostedService. |

Additional documentation

The following are references to additional documentation.

  • Microsoft SQL Server - deep-dive of the Microsoft SQL Server nTangle architecture and implementation.

Samples

The following samples are provided to guide usage:

| Sample | Description |
|--------|-------------|
| SqlServerSidecarDemo | A sample as an end-to-end solution to demonstrate the usage of nTangle against a Microsoft SQL Server database leveraging a sidecar database. This is the preferred and default approach to use nTangle. |
| SqlServerDemo | A sample as an end-to-end solution to demonstrate the usage of nTangle against a single Microsoft SQL Server database. This is the legacy approach to use nTangle. |

However, the best way to follow along and learn is to use the NTangle.Template tool, which includes instructions to guide end-to-end setup and execution.


Tooling

The following tools are provided to support development:

| Tool | Description |
|------|-------------|
| NTangle.Template | This is the .NET template used to accelerate the creation of an nTangle solution and all projects using dotnet new. This leverages the .NET Core templating functionality. |
| NTangle.ArtefactGenerate.Tool | This is an internal tool used for nTangle development that provides a means to auto-generate the corresponding JSON Schema and markdown documentation from the related .NET configuration entities. |

License

nTangle is open source under the MIT license and is free for commercial use.


Contributing

One of the easiest ways to contribute is to participate in discussions on GitHub issues. You can also contribute by submitting pull requests (PR) with code changes. Contributions are welcome. See information on contributing, as well as our code of conduct.


Security

See our security disclosure policy.


Who is Avanade?

Avanade is the leading provider of innovative digital and cloud services, business solutions and design-led experiences on the Microsoft ecosystem, and the power behind the Accenture Microsoft Business Group.