-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for cloud data to xdmod-shredder and xdmod-ingestor #739
Add support for cloud data to xdmod-shredder and xdmod-ingestor #739
Conversation
…vent data ingestion
…d etl pipeline function. removed previously added command line flags
… Generalcloud.php to Genericcloud.php. replaces etl_overseer script calls in bootstrap.sh to use xdmod-shredder and xdmod-ingestor instead
…gest data for that realm. moved ingestion of organizations and resource_types to their own etl json files so they can be ingested independent of the jobs pipelines.
@@ -108,6 +111,7 @@ function main() | |||
break; | |||
case 'aggregate': | |||
$aggregate = true; | |||
$realmToAggregate = $value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs the $value for an optional argument is false if it is not specified. Assuming the documentation is correct you'll need to check for === false rather than === null on line 287. Please confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpwhite4 Yeah, that does seem to be the case. It should be changed now.
902b7ba
to
e228e6d
Compare
edd9609
to
64abdda
Compare
bin/xdmod-shredder
Outdated
@@ -233,7 +233,7 @@ function main() | |||
$logger->notice("Job errors written to '$jobErrorLogFile'"); | |||
} | |||
|
|||
if (!$dryRun) { | |||
if (!$dryRun && in_array($format, array('pbs', 'slurm', 'lsf', 'sge', 'uge'))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you explain the reason for this change.
@@ -323,4 +323,54 @@ public static function quoteVariables(array $variables, VariableStore $variableS | |||
|
|||
return $localVariableMap; | |||
} // quoteVariables() | |||
|
|||
public static function runEtlPipeline(array $pipelines, $logger, array $params = array()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this should be a separate pull request that also removes similar code from the code base:
$overseer = new EtlOverseer($overseerOptions, $this->logger); |
$overseer = new EtlOverseer($overseerOptions, $this->logger); |
$overseer = new \ETL\EtlOverseer($overseerOptions, $logger); |
Line 2251 in 133a3ae
$overseer = new EtlOverseer($overseerOptions, $log); |
Specifically so we dont have the many places this is done in our code base and we dont forget about it.
You do state in the description that there is a subsequent PR for this can you please link to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the branch, https://github.com/eiffel777/xdmod/tree/move-runEtlPipeline-references-to-etl-utilities-class, that has the changes talked about in the last paragraph of the description. It's a bit behind xdmod8.1 right now so there is a some extra code in the diff that can be ignored
@@ -7,14 +7,14 @@ | |||
|
|||
namespace OpenXdmod\Shredder; | |||
|
|||
use CCR\DB\iDatabase; | |||
|
|||
class OpenStack extends aCloud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the filename match the class name OpenStack
vs Openstack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch. It's changed now.
private function realmEnabled($realm) | ||
{ | ||
$realms = $this->warehouseDb->query("SELECT * FROM moddb.realms WHERE display = :realm", [':realm' => $realm]); | ||
return (count($realms) > 0) ? true : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (count($realms) > 0) ? true : false; | |
return (count($realms) > 0); |
@@ -234,11 +234,16 @@ function main() | |||
} | |||
|
|||
if (!$dryRun) { | |||
$logger->notice('Normalizing data'); | |||
$logger->notice('Normalizing data!'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so excited we are normalizing this data, before I was just fine.
This PR adds support for shredding and ingesting cloud event logs in either the OpenStack or Eucalyptus(soon to be renamed Generic Cloud) log file formats.
For shredding the cloud log files two shredder classes are added that implement the shredDirectory function that is used when xdmod-shredder is called using the -d option. Shredding cloud logs by file using the -i flag is not supported and throws an exception.
For ingesting the shredded data a new command line option called datatype is added that allows you to specify what sort of data you want to ingest. Currently only openstack and genericcloud are supported.
The aggregate option now takes an optional parameter to specify what data you want to aggregate. Currently this is either jobs or cloud. If no parameter is specified it will attempt to aggregate both jobs and cloud data.
If no command line options are specified when running
xdmod-ingestor
then it will try to ingest both cloud and jobs data. There may be cases where only the jobs realm is being used butxdmod-ingestor
with no command line options is run. In this examplexdmod-ingestor
will attempt to ingest jobs data and cloud data. In order to prevent errors from trying to ingest cloud data that doesn't exist the new function realmEnabled is used in order to only attempt ingestion of realms that are enabled.There are some pipeline actions that needed to exist outside of the jobs pipelines becuase of foreign key constraints on tables used to ingest resources. This was done for actions pertaining to ingesting organizations and resource_types from their respective config files.
The runEtlPipeline function was added to /ETL/Utilities class. It is similar to the runEtlPipeline function in DataWarehouseInitializer but is not private and does not require you to instantiate the DataWarehouseInitializer class and the two other classes that are passed to the DataWarehouseInitializer constructor but not used in runEtlPipeline. In a subsequent pull request I will remove the runEtlPipeline function from the DataWarehouseInitializer class and point anything that uses it to the ETL\Utilities::runEtlPipeline().
Motivation and Context
We want to have a unified way of shredding and ingesting all types of data that xdmod needs.
Tests performed
Manual testing in docker and added new tests for shredder classes that test to make sure if a directory that does not exist is passed to the shredder then false is returned.
Types of changes
Checklist: