
Allow DataEndpoint singleton to auto-discover available data endpoints #157

Merged: 7 commits, Jun 19, 2017

smgallo commented Jun 12, 2017

Description

Developers no longer have to modify ETL\DataEndpoint.php to list each available data endpoint and its configuration code. Instead, the DataEndpoint singleton now discovers the available data endpoints automatically. Under PSR-4, the contiguous sub-namespace names after the "namespace prefix" (ETL in our case) correspond to a subdirectory within a "base directory", and the namespace separators represent directory separators. From this, we can assume that the ETL\DataEndpoint sub-namespace is in the DataEndpoint subdirectory, relative to where the DataEndpoint singleton class is defined. Using this knowledge, we recursively examine each file in the DataEndpoint subdirectory and use reflection to inspect the classes it defines. Any class that implements iDataEndpoint and also defines an ENDPOINT_NAME constant is a valid endpoint definition. The ENDPOINT_NAME constant defines the endpoint name to be used in configuration files. Conflicts in endpoint names are logged.
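
The discovery rule above can be sketched as two steps. The first maps a file path to a class name using PSR-4. The second keeps only the classes that implement the required interface and define the name constant, and reports duplicate names. This is a minimal sketch under stated assumptions, not the project's implementation. psr4ClassFromPath() and collectEndpoints() are hypothetical helpers, and the interface and constant names are passed in as parameters.

```php
<?php
// Sketch only: hypothetical helpers, not the project's ETL\DataEndpoint code.

/**
 * Map a PHP file under $baseDir to a fully qualified class name using PSR-4
 * rules: each subdirectory becomes a sub-namespace of $nsPrefix.
 * Returns null for paths outside $baseDir or files not ending in ".php".
 */
function psr4ClassFromPath($baseDir, $nsPrefix, $path)
{
    $baseDir = rtrim($baseDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strpos($path, $baseDir) !== 0 || substr($path, -4) !== '.php') {
        return null;
    }
    $relative = substr($path, strlen($baseDir), -4); // Strip base dir and ".php"
    return $nsPrefix . '\\' . str_replace(DIRECTORY_SEPARATOR, '\\', $relative);
}

/**
 * Keep concrete classes that implement $interface and define $constName.
 * Returns a map of endpoint name => class name; a duplicate name is reported
 * through $warn and the first class seen keeps the name.
 */
function collectEndpoints(array $classNames, $interface, $constName, $warn)
{
    $map = array();
    foreach ($classNames as $className) {
        if (!class_exists($className)) {
            continue; // Unknown class, or an interface/trait
        }
        $rc = new ReflectionClass($className);
        if ($rc->isAbstract() || !$rc->implementsInterface($interface) || !$rc->hasConstant($constName)) {
            continue;
        }
        $name = $rc->getConstant($constName);
        if (isset($map[$name])) {
            $warn(sprintf("Endpoint name '%s' defined by both %s and %s", $name, $map[$name], $className));
            continue;
        }
        $map[$name] = $className;
    }
    return $map;
}
```

In a full implementation, the candidate class names would come from walking the DataEndpoint directory, loading each file, and passing each path through psr4ClassFromPath(). That loop is omitted here.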

The etl_overseer script has been modified to list the available endpoints:

$ php etl_overseer.php -c ../../../etc/etl/etl.json -l endpoint-types                       
Name    Class
directoryscanner        ETL\DataEndpoint\DirectoryScanner
file    ETL\DataEndpoint\File
jsonfile        ETL\DataEndpoint\JsonFile
mysql   ETL\DataEndpoint\Mysql
oracle  ETL\DataEndpoint\Oracle
postgres        ETL\DataEndpoint\Postgres
rest    ETL\DataEndpoint\Rest
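
The listing pairs each endpoint name with the class that implements it. Printing that from a name => class map could look like the minimal sketch below. formatEndpointListing() is a hypothetical helper, not the etl_overseer code. The tab delimiter is also an assumption.

```php
<?php
/**
 * Hypothetical helper: render a name => class map as a header line plus one
 * "name<TAB>class" line per endpoint, sorted by endpoint name.
 */
function formatEndpointListing(array $endpoints)
{
    ksort($endpoints); // Sort by endpoint name; $endpoints is a local copy
    $lines = array("Name\tClass");
    foreach ($endpoints as $name => $className) {
        $lines[] = $name . "\t" . $className;
    }
    return implode(PHP_EOL, $lines) . PHP_EOL;
}

echo formatEndpointListing(array(
    'mysql' => 'ETL\\DataEndpoint\\Mysql',
    'file'  => 'ETL\\DataEndpoint\\File',
));
```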

Motivation and Context

Make it easier to define data endpoints: adding a new endpoint no longer requires editing ETL\DataEndpoint.php.

Tests performed

Ran the existing tests, several of which use the File and JsonFile endpoints.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

smgallo added the enhancement (Enhancement of the functionality of an existing feature) and Category:ETL (Extract Transform Load) labels Jun 12, 2017
smgallo added this to the v7.0.0 milestone Jun 12, 2017
smgallo requested review from plessbd, jpwhite4 and tyearke June 12, 2017 20:40
smgallo requested a review from ryanrath June 13, 2017 13:43
ryanrath left a comment

Just a few minor misspellings / questions is all.

* @var string
*/

private static $dataEndpointRelativeNs = 'DataEndpoint';

ryanrath: Is it intended that these be changed at some point during the class's lifetime? I just want to understand the choice of private static vs. const. The same question applies to $dataEndpointRequiredInterface and $endpointNameConstant.

smgallo: Actually, no, they don't change programmatically, so it does make sense for them to be constants. I'll make that change.
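
Values that never change at run time can be declared as class constants and read through self::. The sketch below is a minimal illustration. DataEndpointConstSketch and endpointNameOf() are hypothetical names. The value of the interface constant ('ETL\DataEndpoint\iDataEndpoint') is a guess, because the PR does not show it.

```php
<?php
// Hypothetical class; the interface namespace below is an assumption.
class DataEndpointConstSketch
{
    // These never change while the class is in use, so class constants
    // state that intent better than private static properties.
    const DATA_ENDPOINT_RELATIVE_NS = 'DataEndpoint';
    const DATA_ENDPOINT_REQUIRED_INTERFACE = 'ETL\\DataEndpoint\\iDataEndpoint';
    const ENDPOINT_NAME_CONSTANT = 'ENDPOINT_NAME';

    /**
     * Read the endpoint name that a discovered class declares.
     */
    public static function endpointNameOf($className)
    {
        // constant() accepts "ClassName::CONSTANT" strings.
        return constant($className . '::' . self::ENDPOINT_NAME_CONSTANT);
    }
}
```

To make these private, you would need "private const", which requires PHP 7.1 (the first version with visibility modifiers on class constants). On earlier versions, class constants are always public.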

} // getDataEndpointInfo()

/** -----------------------------------------------------------------------------------------
* Discover the list of currently supported data endpoints and constuct a list mapping

ryanrath: s/constuct/construct

// file resides represent sub-namespaces.

// The endpoint directory is relative to the directory where this file is found
$endpointDir = dirname(__FILE__) . '/' . strtr(self::$dataEndpointRelativeNs, '\\', '/');

ryanrath: More for my own understanding than anything else: is a hard-coded '/' okay here, as opposed to using, say, DIRECTORY_SEPARATOR?

smgallo: DIRECTORY_SEPARATOR is a more portable way to specify it (not that it matters on the platforms that we support). I'll make that change as well.
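
The portable form of the path construction swaps the hard-coded '/' for DIRECTORY_SEPARATOR in both places. endpointDir() below is a hypothetical helper used only to show this. It uses __DIR__, which is equivalent to dirname(__FILE__) on PHP 5.3 and later.

```php
<?php
/**
 * Hypothetical helper: build the directory that holds a relative
 * sub-namespace, turning namespace separators into the platform's
 * directory separator.
 */
function endpointDir($baseDir, $relativeNs)
{
    return $baseDir . DIRECTORY_SEPARATOR . strtr($relativeNs, '\\', DIRECTORY_SEPARATOR);
}

// __DIR__ gives the same value as dirname(__FILE__).
echo endpointDir(__DIR__, 'DataEndpoint'), PHP_EOL;
```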


class DataEndpoint
{
/**
* Namesapce, relative to the current namespace, where data endpoint classes are

ryanrath: s/Namesapce/Namespace

/** -----------------------------------------------------------------------------------------
* Discover the list of currently supported data endpoints and constuct a list mapping
* their names to the classes that implement them. All data endpoints must implement
* the interface specified in self::$dataEndpointRequiredInterface. By automatically

ryanrath: The comment should either mention that these classes must also define the ENDPOINT_NAME constant, or iDataEndpoint should gain a method that returns this value so that implementing the interface is all that is required.
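
The second option might look like the following. The interface declares a static accessor, each endpoint returns its own ENDPOINT_NAME, and discovery then needs only the interface check. All names here (iNamedEndpointSketch, RestEndpointSketch, getEndpointName(), endpointNameIfValid()) are hypothetical, and the sketch is not necessarily what was merged.

```php
<?php
// Hypothetical names throughout; a sketch of the reviewer's alternative.
interface iNamedEndpointSketch
{
    /**
     * @return string Name used for this endpoint in ETL configuration files.
     */
    public static function getEndpointName();
}

class RestEndpointSketch implements iNamedEndpointSketch
{
    const ENDPOINT_NAME = 'rest';

    public static function getEndpointName()
    {
        return static::ENDPOINT_NAME;
    }
}

/**
 * With the name exposed through the interface, discovery only checks that
 * the class is concrete and implements the interface.
 */
function endpointNameIfValid($className)
{
    if (!class_exists($className)) {
        return null;
    }
    $reflection = new ReflectionClass($className);
    if ($reflection->isAbstract() || !$reflection->implementsInterface('iNamedEndpointSketch')) {
        return null;
    }
    return $className::getEndpointName();
}
```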

* @param $options A DataEndpointOptions object containing options parsed from the ETL config.
* @param DataEndpointOptions $options A DataEndpointOptions object containing options
* parsed from the ETL config.
* @param Log $logger A PEAR Log object or null to use the null logger.
*
* @return A data endpoint object implementing the iDataEndpoint interface.

ryanrath: Missing a return type.
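
A complete @return tag names the type before the description. The factory below is hypothetical (iFactoryEndpointSketch, FileEndpointSketch, and createEndpointSketch() are not from the PR). It only shows a filled-in docblock on a function that returns an interface-typed object. On PHP 7 and later, a return type declaration could enforce the same type at run time.

```php
<?php
// Hypothetical interface, class, and factory for illustration only.
interface iFactoryEndpointSketch {}
class FileEndpointSketch implements iFactoryEndpointSketch {}

/**
 * Create a data endpoint from its configured type name.
 *
 * @param string $type     Endpoint name from the ETL configuration.
 * @param array  $registry Map of endpoint name => implementing class name.
 *
 * @return iFactoryEndpointSketch A data endpoint object implementing the interface.
 *
 * @throws InvalidArgumentException If $type is not a known endpoint name.
 */
function createEndpointSketch($type, array $registry)
{
    if (!isset($registry[$type])) {
        throw new InvalidArgumentException("Unknown data endpoint type '$type'");
    }
    $className = $registry[$type];
    return new $className();
}
```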

$dotDirFilterIterator = new \CallbackFilterIterator(
$iterator,
function ($current, $key, $iterator) {
if ( $iterator->isDot() ) {

ryanrath: This can be simplified to: return ! $iterator->isDot();

smgallo: Good points.
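
Here is the simplified filter in a self-contained form. nonDotEntries() is a hypothetical helper that filters a single directory level. CallbackFilterIterator passes the inner iterator as the callback's third argument, so isDot() can be called on it directly.

```php
<?php
/**
 * Hypothetical helper: iterate one directory level, skipping "." and "..".
 */
function nonDotEntries($dir)
{
    // RecursiveDirectoryIterator returns "." and ".." unless SKIP_DOTS is set.
    $iterator = new RecursiveDirectoryIterator($dir);
    return new CallbackFilterIterator(
        $iterator,
        function ($current, $key, $iterator) {
            // The reviewer's simplification: return the negated test directly
            return ! $iterator->isDot();
        }
    );
}
```

Another option is to pass FilesystemIterator::SKIP_DOTS to the RecursiveDirectoryIterator constructor. That drops the dot entries without a callback.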

);
}

if ( null !== $this->maxRecursionDepth ) {

ryanrath: You may want to move the lines from here through line 464 into the try block above, since all of these operations depend on $flattenedIterator existing. That would also match the pattern used elsewhere in the code: a try block that creates a new iterator, possibly does a few things with it, and finally sets $iterator to the new iterator.
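
The suggested pattern might look like the sketch below. buildFlattenedIterator() is hypothetical. $maxRecursionDepth and $flattenedIterator mirror names from the snippet above, but the directory handling and the exception wrapping are assumptions, not the project's code. The new iterator is created and fully configured inside the try block, and it is assigned to $iterator only once every step has succeeded.

```php
<?php
/**
 * Hypothetical helper: flatten a directory tree, optionally limiting depth.
 */
function buildFlattenedIterator($dir, $maxRecursionDepth = null)
{
    if (null !== $maxRecursionDepth && $maxRecursionDepth < 0) {
        throw new InvalidArgumentException('maxRecursionDepth must be >= 0');
    }
    try {
        // Throws UnexpectedValueException if $dir cannot be opened.
        $dirIterator = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);

        // Create and configure the new iterator inside the try block...
        $flattenedIterator = new RecursiveIteratorIterator($dirIterator, RecursiveIteratorIterator::SELF_FIRST);
        if (null !== $maxRecursionDepth) {
            $flattenedIterator->setMaxDepth($maxRecursionDepth);
        }

        // ...and replace the iterator only after every step succeeded.
        $iterator = $flattenedIterator;
    } catch (UnexpectedValueException $e) {
        throw new RuntimeException("Cannot scan directory '$dir': " . $e->getMessage(), 0, $e);
    }
    return $iterator;
}
```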

ryanrath left a comment

Looks good to me

smgallo merged commit 9800e7c into ubccr:xdmod7.0 Jun 19, 2017
smgallo deleted the etl/auto-discover-data-endpoints branch June 19, 2017 16:13
ryanrath pushed a commit to ryanrath/xdmod that referenced this pull request Jul 24, 2017
ubccr#157)

* Remove commented out code, improve control structure flow
* Allow DataEndpoint to auto-discover classes implementing iDataEndpoint
* Add overseer option to list available data endpoint types
chakrabortyr pushed a commit to chakrabortyr/xdmod that referenced this pull request Oct 17, 2017
ubccr#157)

* Remove commented out code, improve control structure flow
* Allow DataEndpoint to auto-discover classes implementing iDataEndpoint
* Add overseer option to list available data endpoint types
2 participants