
Data Engine Proxy Config not using Connections #1732

Closed
mandy-chessell opened this issue Oct 12, 2019 · 17 comments

@mandy-chessell
Contributor

I was reviewing the documentation for the governance servers and noticed this in the data engine proxy server descriptions ...

```json
{
    "class": "DataEngineProxyConfig",
    "dataEngineProxyProvider": "org.odpi.egeria.connectors.ibm.datastage.dataengineconnector.DataStageConnectorProvider",
    "dataEngineConfig": {
        "ibm.igc.services.host": "myhost.somewhere.com",
        "ibm.igc.services.port": "9445",
        "ibm.igc.username": "igcuser",
        "ibm.igc.password": "password"
    }
}
```

A governance server's config should consist of Connection objects for the connectors it supports, plus any properties that control its behaviour. The connectors should abstract the technology they integrate with.

So the data engine proxy config should have a Connection object for the data engine connector, with the connector provider class name in the Connection's ConnectorType, the security properties in the Connection's properties, and the endpoint information in the Connection's Endpoint.

The Connection object is turned into a connector when the governance server calls the ConnectorBroker. The governance server does not need to know about the contents of the Connection object.

mandy-chessell added the invalid label Oct 12, 2019
@cmgrote
Member

cmgrote commented Oct 14, 2019

Hi @mandy-chessell -- to be sure I fix this correctly, is there an example of another service you can point me at that separates these bits out along these lines? 🙏

@mandy-chessell
Contributor Author

mandy-chessell commented Oct 14, 2019

@cmgrote
The metadata server's configuration document follows this pattern, and so does the discovery server's. I am going through all of the other governance servers to make sure they are following our architecture patterns too - I am sure you are not alone :)

@cmgrote
Member

cmgrote commented Oct 14, 2019

I am sure you are not alone :)

No worries on that -- just want to fix it the first time rather than iterate, to make sure we can include it in the 1.1 release 😄

@cmgrote
Member

cmgrote commented Oct 14, 2019

I think I got lucky in that my implementation wasn't far off, just that I wasn't putting the properties in the correct object(s)... Just to double-check, the new configuration would look something like this?

```json
{
    "class": "DataEngineProxyConfig",
    "dataEngineProxyConnection": {
        "class": "Connection",
        "connectorType": {
            "class": "ConnectorType",
            "connectorProviderClassName": "org.odpi.egeria.connectors.ibm.datastage.dataengineconnector.DataStageConnectorProvider"
        },
        "endpoint": {
            "class": "Endpoint",
            "address": "myhost.somewhere.com:9445",
            "protocol": "https"
        },
        "userId": "igcuser",
        "clearPassword": "password"
    },
    "pollIntervalInSeconds": 60
}
```

(The only thing specific to the Data Engine Proxy is the pollIntervalInSeconds -- the rest is just general Connection information.)

Two follow-up questions, assuming that looks better now:

  1. Do I need these "class": "..." entries in every sub-object, or will Jackson figure those out itself from the top-level "class": "DataEngineProxyConfig" and the declared types of these sub-objects?
  2. If I do need to specify them, do I need a fully-qualified classname like I've used for the connectorProviderClassName?

@mandy-chessell
Contributor Author

mandy-chessell commented Oct 14, 2019

The "class" entry is added by Jackson in the client if you have the Jackson annotations specifying that it should include a class attribute; it tells the server the real Java class that should be used to deserialize the object.

The OCF beans are set up to expect the class attribute, since this is necessary to correctly deserialize a subclass of the object specified on the interface. They use the simple class name (like "Connection"). The annotations below define the class names of the subtypes.
These are the definitions in Referenceable (Connection's superclass). The name value is what goes in the JSON. It means we can change the interface class without impacting the API.

```java
@JsonAutoDetect(getterVisibility=PUBLIC_ONLY, setterVisibility=PUBLIC_ONLY, fieldVisibility=NONE)
@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonIgnoreProperties(ignoreUnknown=true)
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME,
        include = JsonTypeInfo.As.PROPERTY,
        property = "class")
@JsonSubTypes(
        {
                @JsonSubTypes.Type(value = Asset.class, name = "Asset"),
                @JsonSubTypes.Type(value = Certification.class, name = "Certification"),
                @JsonSubTypes.Type(value = Comment.class, name = "Comment"),
                @JsonSubTypes.Type(value = Connection.class, name = "Connection"),
                @JsonSubTypes.Type(value = ConnectorType.class, name = "ConnectorType"),
                @JsonSubTypes.Type(value = Endpoint.class, name = "Endpoint"),
                @JsonSubTypes.Type(value = ExternalIdentifier.class, name = "ExternalIdentifier"),
                @JsonSubTypes.Type(value = ExternalReference.class, name = "ExternalReference"),
                @JsonSubTypes.Type(value = License.class, name = "License"),
                @JsonSubTypes.Type(value = Location.class, name = "Location"),
                @JsonSubTypes.Type(value = Note.class, name = "Note"),
                @JsonSubTypes.Type(value = NoteLog.class, name = "NoteLog"),
                @JsonSubTypes.Type(value = RelatedMediaReference.class, name = "RelatedMediaReference"),
                @JsonSubTypes.Type(value = SchemaElement.class, name = "SchemaElement")
        })
```

And this is the annotation in Connection that ensures that if you pass a VirtualConnection as a Connection, the subtype is correctly deserialized.

```java
@JsonAutoDetect(getterVisibility=PUBLIC_ONLY, setterVisibility=PUBLIC_ONLY, fieldVisibility=NONE)
@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonIgnoreProperties(ignoreUnknown=true)
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME,
        include = JsonTypeInfo.As.PROPERTY,
        property = "class")
@JsonSubTypes(
        {
                @JsonSubTypes.Type(value = VirtualConnection.class, name = "VirtualConnection")
        })
```
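
For illustration only, here is a minimal sketch of the round trip these annotations enable, assuming the standard Jackson ObjectMapper; the package of the OCF Connection bean is assumed here and may differ from the actual module layout:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
// Assumed location of the OCF bean -- adjust to the real Egeria package if different.
import org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection;

public class ConnectionRoundTripSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // JSON sent by a client: the "class" property is the simple type name
        // that @JsonTypeInfo/@JsonSubTypes use to pick the Java class.
        String json = "{\"class\":\"Connection\",\"userId\":\"igcuser\",\"clearPassword\":\"password\"}";

        // Deserialize: "Connection" resolves to the Connection bean (a
        // "VirtualConnection" value would resolve to that subtype instead).
        Connection connection = mapper.readValue(json, Connection.class);

        // Serialize: Jackson writes the "class" property back out, so the
        // concrete subtype survives the round trip.
        System.out.println(mapper.writeValueAsString(connection));
    }
}
```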

The other question is how you are initializing the connector. It needs to be through the ConnectorBroker. If the Connection object is set up correctly then the ConnectorBroker will load the right connector provider and request that it creates the connector instance. This way the proxy server is kept independent of the connector implementation.

@cmgrote
Member

cmgrote commented Oct 14, 2019

The other question is how you are initializing the connector. It needs to be through the ConnectorBroker. If the Connection object is set up correctly then the ConnectorBroker will load the right connector provider and request that it creates the connector instance. This way the proxy server is kept independent of the connector implementation.

I was always initializing the connector via the ConnectorBroker -- it's just that I wasn't leveraging the properties of the Connection object itself the way I really should have (i.e. putting the server address, username and password information there rather than in my own custom Map of properties):

```java
// ...
        Connection dataEngineProxy = dataEngineProxyConfig.getDataEngineProxyConnection();
        if (dataEngineProxy != null) {
            try {
                ConnectorBroker connectorBroker = new ConnectorBroker();
                dataEngineProxyConnector = (DataEngineConnectorBase) connectorBroker.getConnector(dataEngineProxy);
// ...
```

@cmgrote
Member

cmgrote commented Oct 14, 2019

How do I persist the credential information (userId, clearPassword) of a Connection object as part of the configuration? At the moment these seem to be stripped out of the Connection object before being persisted into e.g. the server config -- which means I won't be able to reuse a Connection I have previously configured across server restarts (?). (Actually, it doesn't even seem to be passed through the connectionProperties parameter of the connector's initialize method, so it's always null -- but maybe I'm doing something wrong?)

@mandy-chessell
Contributor Author

What is stripping them out of the connection? The only part that is hidden is the secured properties.

@cmgrote
Member

cmgrote commented Oct 14, 2019

🤷‍♂️ Something between the Connection I send in via admin services config and when that hands over to the Connector::initialize() method (via ConnectorBroker::getConnector())... Will try to dig deeper later tonight / tmw morning...

@mandy-chessell
Contributor Author

The admin services should just store the connection object when the server is configured - this is easy to check by querying the configuration document.

When the server is started, the relevant parts of the configuration document (including the connection object) should be passed to the initialization routine for the data engine proxy services. This initialization routine then passes the connection object to the connector broker on the getConnector method.

The admin services do not call the connector broker.

@cmgrote
Member

cmgrote commented Oct 14, 2019

Hmm, I think it is my config service: it is generating a new Connection object via the ConnectorConfigurationFactory, and it only passes the configuration properties through to my method there, from which it generates a new Connection (thereby losing everything other than the configuration properties).

I'm a bit confused again whether I need to actually be using the ConnectorConfigurationFactory for this, or whether I can just create my own method entirely within the data-engine-proxy-services-server module (?) ... and also confused over whether I need to track a localURL (for the proxy) in addition to the remote URL of the data engine itself (?)

I'm assuming I cannot just pass the Connection object I outlined in a comment further up, because I need to give it some UUIDs to make it a unique Connection instance, which is what ConnectorConfigurationFactory was doing -- but it seems a bit awkward to have a method there to which I send a Connection just to get a Connection back. (And to have to maintain this in the general OMAG services rather than in my own contained area around data-engine-proxy stuff...)

Should I try to self-contain as much as possible, and put in a PR to see if I'm at least moving in the right direction?

@mandy-chessell
Contributor Author

You should not be using the connector configuration factory - this class was for creating the default connection objects for the OMRS and OMASs. It needs to be removed because it is creating code dependencies on these connectors irrespective of whether the deployment environment is using them or not. See #1607.

The data engine proxy would definitely need the URL and server name of the metadata server where the Data Engine OMAS is running in order to create the client. It would probably not need its own URL for its own processing. However, we use this value to deploy the configuration document from the server platform where the configuration is done to the platform where the proxy server will run.

The connection object needs to be created by the caller of the admin service that sets up the data engine proxy's configuration document. The data engine proxy code should not be creating the connection object. We only do that in the admin code for default connectors that are part of our implementation. The data engine proxy should have no idea what type of data engine it is dealing with - there is no obvious default data engine.
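
To illustrate that split of responsibilities, here is a rough caller-side sketch (not code from this thread): the caller builds the Connection and the DataEngineProxyConfig and hands them to the admin service, which simply stores them. The OCF bean package and the setter names are inferred from the JSON shown earlier, so treat them as assumptions:

```java
import org.odpi.openmetadata.adminservices.configuration.properties.DataEngineProxyConfig;
// The OCF bean package below is assumed -- adjust to the actual Egeria module layout.
import org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection;
import org.odpi.openmetadata.frameworks.connectors.properties.beans.ConnectorType;
import org.odpi.openmetadata.frameworks.connectors.properties.beans.Endpoint;

public class DataEngineProxyCallerSketch {
    public static void main(String[] args) {
        // The caller -- not the proxy -- decides which connector provider,
        // endpoint and credentials go into the Connection.
        ConnectorType connectorType = new ConnectorType();
        connectorType.setConnectorProviderClassName(
                "org.odpi.egeria.connectors.ibm.datastage.dataengineconnector.DataStageConnectorProvider");

        Endpoint endpoint = new Endpoint();
        endpoint.setAddress("myhost.somewhere.com:9445");
        endpoint.setProtocol("https");

        Connection dataEngineConnection = new Connection();
        dataEngineConnection.setConnectorType(connectorType);
        dataEngineConnection.setEndpoint(endpoint);
        dataEngineConnection.setUserId("igcuser");
        dataEngineConnection.setClearPassword("password");

        DataEngineProxyConfig config = new DataEngineProxyConfig();
        config.setDataEngineProxyConnection(dataEngineConnection); // matches the getter used elsewhere in the thread
        config.setPollIntervalInSeconds(60);                       // setter name inferred from the JSON field

        // The caller then sends this config to the data engine proxy admin endpoint;
        // the admin service stores it in the server's configuration document unchanged.
    }
}
```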

@cmgrote
Member

cmgrote commented Oct 14, 2019

You should not be using the connector configuration factory - this class was for creating the default connection objects for the OMRS and OMASs. It needs to be removed because it is creating code dependencies on these connectors irrespective of whether the deployment environment is using them or not. See #1607.

Excellent! Dropping out of that 👍

The data engine proxy would definitely need the URL and server name of the metadata server where the Data Engine OMAS is running in order to create the client. It would probably not need its own URL for its own processing. However, we use this value to deploy the configuration document from the server platform where the configuration is done to the platform where the proxy server will run.

Agreed, I have the URL and server name of the Data Engine OMAS. Need to wrap my head around that last sentence, though...

The connection object needs to be created by the caller of the admin service that sets up the data engine proxy's configuration document. The data engine proxy code should not be creating the connection object. We only do that in the admin code for default connectors that are part of our implementation. The data engine proxy should have no idea what type of data engine it is dealing with - there is no obvious default data engine.

I think there's some devil in the detail on this one... Here's my current call-path:

  1. I expect configuration against a REST endpoint defined by org.odpi.openmetadata.adminservices.spring.DataEngineProxyResource -- the configuration I am expecting is of type org.odpi.openmetadata.adminservices.configuration.properties.DataEngineProxyConfig (inside which are parameters for the Data Engine OMAS server name, the Data Engine OMAS URL and a generic Connection object). The REST endpoint simply saves the DataEngineProxyConfig into the server configuration (including the OMAS info and the Connection object).
  2. (There is also an "enable" REST endpoint that can pull the configuration stored by the step above so that it does not need to be re-configured every time the server starts up.)
  3. As part of org.odpi.openmetadata.adminservices.activateWithStoredConfig, if any DataEngineProxyConfig is found in the server's configuration document, a new org.odpi.openmetadata.governanceservers.dataengineproxy.admin.DataEngineProxyOperationalServices instance is created.
  4. This DataEngineProxyOperationalServices instance is then initialized, which instantiates a Data Engine OMAS Client (using server and URL in the DataEngineProxyConfig), creates a new ConnectorBroker, calls getConnector() with the generic Connection object stored as part of the DataEngineProxyConfig, and then starts polling for changes through the DataEngineConnectorBase that this returns (which is simply a default class from which I would expect every DataEngineConnector will extend to ensure that the DataEngineInterface is implemented).

So I'm creating and storing a generic Connection object as part of step (1). Step (3) reads this back as part of the DataEngineProxyConfig (which contains both that generic Connection object and (separately) the Data Engine OMAS information). Only at step (4) is the Connection object being passed along to the broker to instantiate a Connector -- the only idea that the Data Engine Proxy itself has of the type of connection is that it adheres to the DataEngineConnectorBase set of methods (there are specific operations that need to be available on a DataEngineConnector -- which I think is fair enough? -- and these are defined by DataEngineInterface that DataEngineConnectorBase implements).
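
For reference, here is a minimal structural sketch of the relationship described in step 4. The ConnectorBase package is assumed, and the single method shown on the interface is purely illustrative rather than the real operation list:

```java
import java.util.Date;
import java.util.List;

// Assumed location of the OCF base class that the ConnectorBroker can instantiate.
import org.odpi.openmetadata.frameworks.connectors.ConnectorBase;

// Illustrative stand-in for the real DataEngineInterface: it names the operations
// every data engine connector must provide to the proxy's polling loop.
interface DataEngineInterfaceSketch {
    List<String> getChangedProcesses(Date since); // placeholder operation only
}

// The base class ties the two worlds together: every data engine connector is an
// OCF connector (so the ConnectorBroker can build it from a Connection) and is
// guaranteed to expose the data-engine operations the proxy relies on.
abstract class DataEngineConnectorBaseSketch extends ConnectorBase implements DataEngineInterfaceSketch {
    // Concrete connectors (e.g. the DataStage connector) extend this class and
    // implement the interface operations against their own technology.
}
```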

The two bits that I'm still not sure about:

  • I'm using this DataEngineProxyConfig both for my initial configuration REST calls, as well as for how I'm storing the information in the server configuration document -- is that problematic?
  • Your statement that "It would probably not need its own URL for its own processing. However we use this value to deploy the configuration document from the server platform where the configuration is done to the platform where the proxy server will run." (I'm not confident I'm keeping these multiple URLs the way I might need to be keeping them...)

cmgrote added a commit to cmgrote/egeria that referenced this issue Oct 14, 2019
@mandy-chessell
Contributor Author

Step 2 - the enable method does not seem to do anything useful; from the code, I see it extracts the configuration from the config store and then saves it back into the config store. By saving the config into the configuration document, as in setDataEngineProxyConfig(), the configuration is available for server restart. The 'instance' admin service reads the configuration document for the server and starts up the requested services. This includes passing the data engine proxy config to the initialization service for the data engine proxy server. The DataEngineProxy server seems to be missing the DELETE configuration call. Some services also have a GET configuration call so that the administrator can retrieve just the specific config for the governance service.

Using the DataEngineProxyConfig for the REST calls and for the store is fine. The alternative is to have separate REST calls for the 3 pieces of information. If in a future release you want to extend it with other config parameters then you add another admin call.

You do not need to manage the localServerURL - it is stored in the main config document and managed by the admin services. If you want to know more about the deployment service, it is demonstrated in the egeria-server-start lab notebook.
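
As a rough sketch of the missing calls mentioned above (a GET to retrieve just this service's config and a DELETE to clear it), assuming a Spring resource alongside DataEngineProxyResource; the paths, method names and the in-memory store are placeholders, not the real admin-services API:

```java
import java.util.HashMap;
import java.util.Map;

import org.odpi.openmetadata.adminservices.configuration.properties.DataEngineProxyConfig;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/open-metadata/admin-services/users/{userId}/servers/{serverName}")
public class DataEngineProxyResourceSketch {

    // Placeholder for the real configuration document store managed by the admin services.
    private final Map<String, DataEngineProxyConfig> configDocuments = new HashMap<>();

    // Lets an administrator retrieve just the data engine proxy's stored config.
    @GetMapping("/data-engine-proxy-service/configuration")
    public DataEngineProxyConfig getDataEngineProxyConfig(@PathVariable String userId,
                                                          @PathVariable String serverName) {
        return configDocuments.get(serverName);
    }

    // Removes the data engine proxy config from the server's configuration document.
    @DeleteMapping("/data-engine-proxy-service/configuration")
    public void clearDataEngineProxyConfig(@PathVariable String userId,
                                           @PathVariable String serverName) {
        configDocuments.remove(serverName);
    }
}
```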

@mandy-chessell
Contributor Author

One other point - there is a lot of missing JavaDoc in the data engine proxy server config code in master - is this addressed in the latest pull request?

@cmgrote
Member

cmgrote commented Oct 15, 2019

Not yet, but I’ll push another commit later today to fix the JavaDoc gaps, method gaps and remove unnecessary methods. Thanks for all the guidance!

cmgrote added a commit to cmgrote/egeria that referenced this issue Oct 15, 2019
@cmgrote
Member

cmgrote commented Oct 17, 2019

Should be resolved now by referenced PRs that have been merged -- closing (I would suggest new issues with the specific gaps if there's anything left that I missed!)

cmgrote closed this as completed Oct 17, 2019