
Rethink ConnectorConfigurationFactory #1607

Closed
mandy-chessell opened this issue Sep 20, 2019 · 21 comments
Assignees: mandy-chessell
Labels: consumability (Makes the software easier to use or understand. Includes docs, error messages, logs, API definitions), enhancement (New feature or request), no-issue-activity (Issues automatically marked as stale because they have not had recent activity.)

Comments


mandy-chessell commented Sep 20, 2019

The ConnectorConfigurationFactory is used by the admin services to create default Connection objects when configuring OMAG Servers.

The first problem with it is that it is located in the adapters module, which is confusing. The adapters module should contain implementations of connectors for specific technologies, so the connector configuration factory does not belong there. It should be moved under the admin services module.

The second problem is that this class pulls the connector implementations used in our default set-up into the OMAG Server Platform. These remain even when that technology is not being used. Most are file-based connectors that are pure Java and do not pull in client libraries. However, we do pull in the Kafka event bus provider. Is this a problem for an organization that does not use Kafka? Should we change this to a hardcoded definition using strings?

There is a similar problem for the governance daemons, whose purpose is to connect to lots of different technologies. The hard-coding approach quickly becomes untenable: we cannot have all of these connectors loaded into the OMAG Server Platform.

Should we have an admin client that is responsible for creating the Connection objects using the connector implementations and then sending the connection objects only to the admin services to store?

@mandy-chessell

Another approach is to maintain the connection definitions as an open metadata archive, so they are generated as part of the build from the actual implementation without creating dependencies in the server.

This would make it easy to load them into an open metadata repository.
It would require some thought to enable the platform/admin services to load them, as archive loading is typically done by the OMRS at server startup.

@mandy-chessell mandy-chessell self-assigned this Sep 20, 2019
@mandy-chessell mandy-chessell added the consumability and enhancement labels Sep 20, 2019

mandy-chessell commented Sep 20, 2019

A final approach is to manage them in the build, as with the archive approach, but serialize the connection objects directly to JSON files. The admin services are already able to work with serialized connection objects because they are used in the configuration documents.
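
To make this serialization idea concrete, here is a minimal sketch using Jackson with a simplified stand-in bean. It is not the real org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection class, and the field values are illustrative:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Simplified stand-in for Egeria's Connection bean -- the real class is
// org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection;
// the fields here are illustrative only.
class ConnectionStub
{
    public String displayName;
    public String connectorProviderClassName;
    public String endpointAddress;
}

public class ConnectionSerializer
{
    public static void main(String[] args) throws Exception
    {
        ConnectionStub connection = new ConnectionStub();
        connection.displayName = "Default Event Bus Connection";
        // Class name from memory; treat as illustrative.
        connection.connectorProviderClassName =
                "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider";
        connection.endpointAddress = "localhost:9092";

        // Serialize at build time; the admin services can later read this
        // JSON back without any compile-time dependency on the connector.
        String json = new ObjectMapper().writeValueAsString(connection);
        System.out.println(json);
    }
}
```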

@mandy-chessell mandy-chessell added the functionality-call (To discuss and agree during weekly functionality call(s)) label Sep 20, 2019
@planetf1

See also the comment on #1606 about provided scope - but this causes its own set of issues...
In a pluggable environment, the organization should be able to choose what to deploy, and if they don't want to use our Kafka connector, then in an ideal world there should be no remnants of Kafka on their system. I do think always loading the default set is not appropriate; we should load whatever is configured for that deployment.

@mandy-chessell

I would ideally like to get to a clean architecture in the code if possible rather than rely on build options - just to keep it simple to understand. The connector broker dynamically loads classes as long as they are on the class path. If we had a common place where the connector jars could be assembled during the build, and we removed the compile dependencies, then as long as that directory is on the platform's class path, the platform would start lean and load only the connectors requested by the servers running on it.
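
A minimal sketch of the dynamic-loading pattern being described (this is not the real OCF connector broker code, and the provider class name is from memory, so treat it as illustrative):

```java
// Illustrative sketch only -- not the real OCF connector broker code.
public class DynamicProviderLoader
{
    /**
     * Instantiate a connector provider by class name.  This works for any
     * jar that is on the platform's class path, so the platform itself
     * needs no compile-time dependency on the implementation.
     */
    public static Object loadProvider(String providerClassName) throws Exception
    {
        Class<?> providerClass = Class.forName(providerClassName);
        return providerClass.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception
    {
        // Class name from memory; treat as illustrative.
        Object provider = loadProvider(
                "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider");
        System.out.println("Loaded: " + provider.getClass().getName());
    }
}
```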

@planetf1

I completely agree - my suggestion doesn't replace or negate that. It's about the connectors themselves, which are necessarily dependent on more external libraries; it is an optional consideration for the way we build them.

@mandy-chessell

ahhh :)


cmgrote commented Oct 29, 2019

I think the challenge we'll face with putting together the connection objects at build time rather than run-time is the typical run-time differences that shouldn't impact the build -- in particular when it comes to testing across the SDLC. That is, we ideally want a consistent artefact that we can test across various lifecycle environments, without needing to build a different artefact for each one to cater for differences like credentials or hostnames (endpoints).

So if we went with the serialization approach, I think we'd need some way of embedding variables / parameters that could be injected at run-time (?)
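
One possible shape for that injection, purely as a hypothetical sketch (the ${VAR} placeholder syntax and resolution from environment variables are assumptions, not an agreed design):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: resolve ${VAR} placeholders in a serialized
// connection against environment variables at load time, so a single
// build artefact can be promoted unchanged across SDLC environments.
public class PlaceholderResolver
{
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z0-9_]+)}");

    public static String resolve(String serializedConnection)
    {
        Matcher matcher = PLACEHOLDER.matcher(serializedConnection);
        StringBuffer result = new StringBuffer();
        while (matcher.find())
        {
            String value = System.getenv(matcher.group(1));
            // Leave the placeholder in place if the variable is not set.
            matcher.appendReplacement(result,
                    Matcher.quoteReplacement(value != null ? value : matcher.group(0)));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(resolve("{\"endpointAddress\":\"${KAFKA_ENDPOINT}\"}"));
    }
}
```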


mandy-chessell commented Oct 29, 2019

@cmgrote if I am understanding the comment correctly, it is a little off topic.

All server Connection parameters that need to be injected at runtime should be in the configuration document for the server. These properties can be changed by updating the config document to accommodate any changes during the SDLC.

The Connection objects for the Platform are passed in over REST after the platform has started.

The ConnectorConfigurationFactory is an admin component that builds the Connection objects for default components. It was being misused by various governance servers which were not storing Connection objects in their part of the ConfigurationDocument. I think that is all cleared up.

Its current implementation causes the classes for the default connectors to be loaded by the class loader, because the classes are referenced explicitly in order to extract the class name.
If the class names were hardcoded strings, the class loader would not be involved and the classes for the default components would not be loaded.
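
For illustration, the difference between the two styles might look like this (the class and constant names are illustrative):

```java
public class DefaultConnectorDefinitions
{
    // Current style: referencing the class directly forces the class loader
    // to load the provider class (and pull in the client libraries it
    // depends on) even when that technology is never used:
    //
    //     String className = KafkaOpenMetadataTopicProvider.class.getName();

    // Alternative style: with a hardcoded string the class loader is only
    // involved later, if a server configuration actually requests this
    // connector.  (Class name from memory; treat as illustrative.)
    public static final String KAFKA_PROVIDER_CLASS_NAME =
            "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider";
}
```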

This approach creates 2 challenges:

  • keeping the class name strings up to date (they rarely change, so not a big deal).
  • ensuring the connector implementations are available on the classpath of the appropriate platform in all environments.

It is the second part that needs work.


cmgrote commented Oct 29, 2019

Makes sense -- I think I've confused myself with the earlier comments referring to "connection definitions" and "connection objects"... Based on the statement above, I think what we're saying is:

  • keep the class name strings up to date for the ConnectorProvider classes (as these are what the ConnectorConfigurationFactory currently references explicitly, rather than as strings)
  • ensure that the implementations of these (ConnectorProvider and Connector that they will ultimately reference) are available on the classpath of the appropriate platform in all environments

When we were talking about "connections" I was thinking about the connection parameters we typically send in via a request that get pulled through a bean into the endpoint, credentials, etc. that are ultimately used to configure the connector...

For the first part it might even make sense to leverage a "resources" file that puts these string values into a .properties file or something along those lines, so that someone can easily override even the default ConnectorProviders they want to use across their environments? (I'm thinking of the likes of swapping between the simple plain-text file-based server config store and the encrypted one, for example.) That would also de-couple any changes from any need to re-compile the code itself (?)
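
A hypothetical sketch of that resources-file idea (the file name and property key are invented for illustration):

```java
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch: default connector provider class names read from a
// resources file, so a deployment can override them (for example, swapping
// the plain-text server config store for the encrypted one) without
// recompiling.  The file name and property key are invented for illustration.
public class DefaultProviderRegistry
{
    public static String getProviderClassName(String key) throws Exception
    {
        Properties defaults = new Properties();
        try (InputStream in = DefaultProviderRegistry.class
                .getResourceAsStream("/default-connector-providers.properties"))
        {
            if (in != null)
            {
                defaults.load(in);
            }
        }
        return defaults.getProperty(key);
    }

    public static void main(String[] args) throws Exception
    {
        // default-connector-providers.properties might contain a line like:
        //   server.config.store=org.example.EncryptedServerConfigStoreProvider
        System.out.println(getProviderClassName("server.config.store"));
    }
}
```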

@mandy-chessell

When I refer to connections I am referring to the object that is passed into the connector broker to configure the connector. This is only the information that is passed to a connector to configure it.

We do not want any other back-door configuration.

@mandy-chessell

There is no need to recompile the code when we switch connectors - we just change the configuration document and a different connector is used.

The only time the names of the default connector provider classes will change is if we refactor the code - in which case we need to recompile anyway.


cmgrote commented Oct 29, 2019

When I refer to connections I am referring to the object that is passed into the connector broker to configure the connector.

I'm back to my original question, then... Isn't this object that's passed to the connector broker where we would typically put information like endpoint, credentials, etc (?) If we serialize these, and our connector relies on endpoint and credential information, wouldn't we be locking them in as part of the serialization (?)

(As explained in https://egeria.odpi.org/open-metadata-implementation/frameworks/open-connector-framework/docs/concepts/connection.html / in the code under org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection)

There is no need to recompile the code when we switch connectors - we just change the configuration document and a different connector is used.

Right, once we have the platform up and can change the configuration document -- but I thought part of the challenge was for scenarios where someone might want something other than the default config to bring up the platform in the first place (a sort of bootstrapping question) (?) But maybe we're suggesting that the platform always starts up the first time with the default config, and only replaces it with something else when overridden by config document changes. (Which is probably fair: since without any config document setup, we probably don't really load much of anything anyway?)

Thinking we probably need a whiteboard and to walkthrough some scenarios step-by-step -- maybe something for our next session 🙂

@mandy-chessell

You are right that the connection information that the administrator adds to the connection object is serialized into the configuration document for the server and is used each time the server comes up - this is desired behaviour.

I think you are getting confused between platform connectors and server connectors.

The platform comes up with its default connectors and these are overridden by REST calls. A configuration document does not control these connectors.

The connectors I am concerned about are the server ones covering, for example, the Kafka event bus. These are controlled by the configuration documents - one configuration document per server.

@mandy-chessell

In 1.3/1.4 there is now a lib directory in the server assembly that holds the connector implementations. This gets them onto the class path so Egeria can load the connector implementations.

@planetf1

#2375 will aim to move to using (and proving) the assemblies in our containers and fix #2215, and #2378 will document best practice for developers using an IDE.

@planetf1 planetf1 removed this from the 2020.03 (1.6) milestone Mar 25, 2020
@planetf1

In 2.7 I think we can start to move away from fat connectors & chassis and slim down our jars. That aggregation will instead be done in the assembly. We should then be able to confirm whether this is now addressed and/or complete the remaining work.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity label Jun 30, 2021
@mandy-chessell mandy-chessell removed the no-issue-activity label Jun 30, 2021
@planetf1 planetf1 removed this from the 2021.05 (2.10) milestone Jun 30, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity label Aug 31, 2021