Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ingestion Job management API for Feast Core #548

Merged
merged 66 commits into from
Apr 7, 2020
Merged

Add Ingestion Job management API for Feast Core #548

merged 66 commits into from
Apr 7, 2020

Conversation

mrzzy
Copy link
Collaborator

@mrzzy mrzzy commented Mar 18, 2020

What this PR does / why we need it:
PR to implement a Ingestion Job Management API into feast core.
This PR adds:

  • extends CoreService API to give the ability to list, stop, and restart ingestion jobs.
  • extends Python sdk's Client to add support for the additions to the core service API.
  • extends the CLI with feast ingest-job list, describe, stop, restart
  • tests for the additions where appropriate.

Which issue(s) this PR fixes:

Fixes: #302

Does this PR introduce a user-facing change?: Yes

Added Ingestion Job Management to Feast Core and the Python SDK:
- Listing Ingestion Jobs with filters
- Restarting Ingestion Jobs
- Stopping Ingestion Jobs

Added ingest-job subcommand to CLI for Ingestion Job Management:
-  feast ingest-job list, describe, stop, restart

@feast-ci-bot
Copy link
Collaborator

Hi @mrzzy. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@woop
Copy link
Member

woop commented Mar 23, 2020

/ok-to-test

@mrzzy
Copy link
Collaborator Author

mrzzy commented Mar 23, 2020

/retest

@woop woop changed the title [WIP] Job management api for feast core [WIP] Job management API for Feast Core Mar 28, 2020
@mrzzy mrzzy changed the title [WIP] Job management API for Feast Core Job management API for Feast Core Mar 31, 2020
@mrzzy mrzzy changed the title Job management API for Feast Core Ingestion Job management API for Feast Core Mar 31, 2020
@mrzzy mrzzy marked this pull request as ready for review March 31, 2020 04:48
@mrzzy mrzzy requested review from woop and zhilingc March 31, 2020 04:49
@mrzzy
Copy link
Collaborator Author

mrzzy commented Mar 31, 2020

/retest

@mrzzy mrzzy changed the title Ingestion Job management API for Feast Core Add Ingestion Job management API for Feast Core Mar 31, 2020
@mrzzy
Copy link
Collaborator Author

mrzzy commented Mar 31, 2020

This PR is non longer a work-in-progress and is ready for review.

I have manually tested to this to work with both Direct and Dataflow runners, although I like to ask a reviewer to double check that the integration with Dataflow is fully working.

@woop
Copy link
Member

woop commented Mar 31, 2020

This PR is non longer a work-in-progress and is ready for review.

Thanks for this @mrzzy. Much needed

I have manually tested to this to work with both Direct and Dataflow runners, although I like to ask a reviewer to double check that the integration with Dataflow is fully working.

How would the reviewer double check integration with Dataflow is working? Normally a reviewer only reviews the code, and the code should ideally contain all the necessary automation to test the functionality that is implemented.

@mrzzy
Copy link
Collaborator Author

mrzzy commented Mar 31, 2020

I agree that this is the ideal case, however the current end to end tests are only configured to test with the Direct runner, leaving a correctness blind spot with the Dataflow runner that has be addressed with manual testing.

@woop
Copy link
Member

woop commented Mar 31, 2020

I agree that this is the ideal case, however the current end to end tests are only configured to test with the Direct runner, leaving a correctness blind spot with the Dataflow runner that has be addressed with manual testing.

Ideally the tests would be updated to incorporate Dataflow testing as well. We can possibly leave it out of scope for now. The reviewer in this case won't have time to actually spin up Dataflow and test this all out.

@mrzzy would you mind creating a new GitHub issue to add a Dataflow based test?

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 1, 2020

I opened issue #595 to track adding Dataflow Test Coverage to Feast

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 7, 2020

/retest

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 7, 2020

/test test-core-and-ingestion

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 7, 2020

/test test-end-to-end-batch

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 7, 2020

/test test-python-sdk

@mrzzy
Copy link
Collaborator Author

mrzzy commented Apr 7, 2020

@zhilingc Thanks for the review and taking the time out of your day to test the dataflow runner.
I have implemented your comments into the PR:

  • disabled stopJobs() for suspended or aborted jobs due to possible race condition.
  • added __str__() and __repr__() to IngestJob to render it in a human readable format.
  • Added feast ingest-job list, describe, stop, restart to feast CLI
  • Renamed Job to RetrivalJob to prevent confusion with IngestJob in python sdk.

@zhilingc
Copy link
Collaborator

zhilingc commented Apr 7, 2020

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrzzy, zhilingc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 9139fe3 into feast-dev:master Apr 7, 2020
khorshuheng pushed a commit to khorshuheng/feast that referenced this pull request Apr 15, 2020
* Added protobuf definitions for a Job Management API

* Added query methods to JobRepository to query jobs by store and featureset

* Added hashCode() & equals() to Job model compare and hash jobs

This would allow jobs to be used as elements in HashSets and keys in
HashMaps

* Added toIngestionProto() to Job object to convert Job model to ingestion job proto

* Added query methods to FeatureSetRepository to query by exact (name, project) or (name,version)

* Added listJobs() to JobService to handle request list ingestion jobs requests

* Added missing filter field in ListIngestionJobsRequest protobuf

* Added code to setup mockups for testing JobService

* Revert "Added hashCode() & equals() to Job model compare and hash jobs"

This reverts commit ba995bb712cbd38f0f4bef47efbe433d5ec07521 as
it caused core tests to fail

* Added JobServiceTest unit tests for JobService's listJobs()

* Added stopJobs() to JobService to handle requests to stop jobs

* Added getTransitionalStates() to JobStatus to return collection of transitional states

* Changed stopJobs() to throw unsupported error on transitional job statuses

* Moved conversion of JobStatus to IngestionJobStatus proto to JobStatus.

* Make findFeatureSet() match only one featureset.

Limit findFeatureSets() as feature set references as composite keys
should match one and only one featureset.

* Added matchFeatureSets() to SpecService to match Feature Sets from References

This commit is adds temporary support for FeatureSetReference to
SpecService via listFeatureSets(). In the future, this should merged
together with listFeatureSets() as their functionality is almost the
same.

* Refactor JobService listJobs() to use SpecService to provide featureset matching

* Revert findBy methods added to FeatureSetRepository as no longer used.

JobService no longer depends on FeatureSetRepository directly, instead
via SpecService

* Added restartJob() to JobManagers to restart ingestion/import jobs

* Added restartJob() to JobService to restart ingestion jobs

* Update job model (due to new extId) when restartJob() in JobService

* Use assertThat() & equalTo() instead of assertEquals() in JobService

* Revert "Use assertThat() & equalTo() instead of assertEquals() in JobService" due to failed tests

This reverts commit bb3cf0ae1e7cfbde7a2f997cacc661d040c0a35f.

* Use hamcrest’s assertThat() and equalTo() instead of Junit's assertEquals

* Hook up JobService list, stop, restart job methods to CoreServiceImpl.

* Throw InvalidArgumentException instead when calling listJobs() with invalid FeatureSetReference

Throw InvalidArgumentException instead of UnsupportedOperationException
for better semantics: UnsupportedOperationException should be reserved
for operations that fail due to failed preconditions

* Fixed typos in javadocs: InvalidArgumentException should be IllegalArgumentException

* Fixed in findByFeatureSetsIn() query in JobRepository

* Renamed toIngestionProto() methods to toProto() to follow code convention

* Make JobService's listJobs() a transaction to prevent DB data race conditions

* Moved matchFeatureSets() in SpecService to private method in JobService

* Make the map that maps between JobStatus and IngestionJobStatus static

* Make JobService's listJobs() to return all ingestion jobs on empty filter

* Fixed issue where the jobManager map that JobService built used wrong keys

* Fix issue where actual Job Status is not synced with database.

Issue occurs when the job is aborted/restarted, but the JobStatus
has not yet been updated by JobUpdateTask. Hence another call
to abortJob() & restartJob() that should be rejected due invalid status
is allowed through

* Log stopJob() & restartJob() operations to make debugging easier

* Use Runner.name() instead of runner.toString() to build JobManager map

* Move documentation on JobService operations to CoreService protobuf definition

* Added IngestJob to python sdk as native representation of IngestionJob proto

* Make empty filter on JobService's listJobs() select all ingestion jobs

* Added bindings for Job management API to python sdk client.

* Fixed __connect_core() to connect to Feast CoreService when calling on Job API calls

* Auto reload IngestJob.status and IngestJob.external_id on get property.

* Added IngestJob.wait() to wait for job status to transtion

* Added basic job api e2e test to exercise job api

* Reorder the operations e2etest to make sure that jobs are running after test

* Added e2e test for all types exercising job api

* Fixed typo in function arguments

* Added unit tests for Ingestion Job API additions in python sdk

* Rename "ingestion" to "ingest" for more consistent naming

* Disable support for restarting Job in a terminal state due to possible race conditions

* Added __str__ and __repr__ to IngestJob to render ingestjob in human readable string

* Added FeatureSetRef to represent references to featursets

* Admend client's list_ingest_jobs() to accept feature references directly

* Fixed typo in IngestJob.store property

* Fixed issue with FeatureSetRef.from_str not converting version to int

* Make the grpc error message more apparent on stop_ingest_job() and restart_ingest_job()

* Added feast ingest-job list, describe, stop, restart to CLI

* Rename Job to RetrievalJob to prevent confusion with IngestJob

* Updated e2e tests to use FeatureSetRef in list_ingest_jobs()

* Fixed due e2e tests to cater to new limitations on stop_ingest_job()

* Increase timeout on test_all_types_ingest_jobs() e2e test.

* Configure IngestJob.wait() to backoff with a exponentially larger wait duration

* Added print statements to debug e2e test failure.

* Revert "Added print statements to debug e2e test failure."

This reverts commit 146fb2b.

* Fixed issue of test waiting for aborted job to become running causing timeout.

Co-authored-by: Zhu Zhanyan <zhu.zhanyan@gojek.com>
khorshuheng pushed a commit that referenced this pull request Apr 15, 2020
* Added protobuf definitions for a Job Management API

* Added query methods to JobRepository to query jobs by store and featureset

* Added hashCode() & equals() to Job model compare and hash jobs

This would allow jobs to be used as elements in HashSets and keys in
HashMaps

* Added toIngestionProto() to Job object to convert Job model to ingestion job proto

* Added query methods to FeatureSetRepository to query by exact (name, project) or (name,version)

* Added listJobs() to JobService to handle request list ingestion jobs requests

* Added missing filter field in ListIngestionJobsRequest protobuf

* Added code to setup mockups for testing JobService

* Revert "Added hashCode() & equals() to Job model compare and hash jobs"

This reverts commit ba995bb712cbd38f0f4bef47efbe433d5ec07521 as
it caused core tests to fail

* Added JobServiceTest unit tests for JobService's listJobs()

* Added stopJobs() to JobService to handle requests to stop jobs

* Added getTransitionalStates() to JobStatus to return collection of transitional states

* Changed stopJobs() to throw unsupported error on transitional job statuses

* Moved conversion of JobStatus to IngestionJobStatus proto to JobStatus.

* Make findFeatureSet() match only one featureset.

Limit findFeatureSets() as feature set references as composite keys
should match one and only one featureset.

* Added matchFeatureSets() to SpecService to match Feature Sets from References

This commit is adds temporary support for FeatureSetReference to
SpecService via listFeatureSets(). In the future, this should merged
together with listFeatureSets() as their functionality is almost the
same.

* Refactor JobService listJobs() to use SpecService to provide featureset matching

* Revert findBy methods added to FeatureSetRepository as no longer used.

JobService no longer depends on FeatureSetRepository directly, instead
via SpecService

* Added restartJob() to JobManagers to restart ingestion/import jobs

* Added restartJob() to JobService to restart ingestion jobs

* Update job model (due to new extId) when restartJob() in JobService

* Use assertThat() & equalTo() instead of assertEquals() in JobService

* Revert "Use assertThat() & equalTo() instead of assertEquals() in JobService" due to failed tests

This reverts commit bb3cf0ae1e7cfbde7a2f997cacc661d040c0a35f.

* Use hamcrest’s assertThat() and equalTo() instead of Junit's assertEquals

* Hook up JobService list, stop, restart job methods to CoreServiceImpl.

* Throw InvalidArgumentException instead when calling listJobs() with invalid FeatureSetReference

Throw InvalidArgumentException instead of UnsupportedOperationException
for better semantics: UnsupportedOperationException should be reserved
for operations that fail due to failed preconditions

* Fixed typos in javadocs: InvalidArgumentException should be IllegalArgumentException

* Fixed in findByFeatureSetsIn() query in JobRepository

* Renamed toIngestionProto() methods to toProto() to follow code convention

* Make JobService's listJobs() a transaction to prevent DB data race conditions

* Moved matchFeatureSets() in SpecService to private method in JobService

* Make the map that maps between JobStatus and IngestionJobStatus static

* Make JobService's listJobs() to return all ingestion jobs on empty filter

* Fixed issue where the jobManager map that JobService built used wrong keys

* Fix issue where actual Job Status is not synced with database.

Issue occurs when the job is aborted/restarted, but the JobStatus
has not yet been updated by JobUpdateTask. Hence another call
to abortJob() & restartJob() that should be rejected due invalid status
is allowed through

* Log stopJob() & restartJob() operations to make debugging easier

* Use Runner.name() instead of runner.toString() to build JobManager map

* Move documentation on JobService operations to CoreService protobuf definition

* Added IngestJob to python sdk as native representation of IngestionJob proto

* Make empty filter on JobService's listJobs() select all ingestion jobs

* Added bindings for Job management API to python sdk client.

* Fixed __connect_core() to connect to Feast CoreService when calling on Job API calls

* Auto reload IngestJob.status and IngestJob.external_id on get property.

* Added IngestJob.wait() to wait for job status to transtion

* Added basic job api e2e test to exercise job api

* Reorder the operations e2etest to make sure that jobs are running after test

* Added e2e test for all types exercising job api

* Fixed typo in function arguments

* Added unit tests for Ingestion Job API additions in python sdk

* Rename "ingestion" to "ingest" for more consistent naming

* Disable support for restarting Job in a terminal state due to possible race conditions

* Added __str__ and __repr__ to IngestJob to render ingestjob in human readable string

* Added FeatureSetRef to represent references to featursets

* Admend client's list_ingest_jobs() to accept feature references directly

* Fixed typo in IngestJob.store property

* Fixed issue with FeatureSetRef.from_str not converting version to int

* Make the grpc error message more apparent on stop_ingest_job() and restart_ingest_job()

* Added feast ingest-job list, describe, stop, restart to CLI

* Rename Job to RetrievalJob to prevent confusion with IngestJob

* Updated e2e tests to use FeatureSetRef in list_ingest_jobs()

* Fixed due e2e tests to cater to new limitations on stop_ingest_job()

* Increase timeout on test_all_types_ingest_jobs() e2e test.

* Configure IngestJob.wait() to backoff with a exponentially larger wait duration

* Added print statements to debug e2e test failure.

* Revert "Added print statements to debug e2e test failure."

This reverts commit 146fb2b.

* Fixed issue of test waiting for aborted job to become running causing timeout.

Co-authored-by: Zhu Zhanyan <zhu.zhanyan@gojek.com>
@ches ches added this to the v0.5.0 milestone Apr 26, 2020
@ches ches added the kind/feature New feature or request label Apr 26, 2020
@woop woop mentioned this pull request Apr 30, 2020
@mrzzy mrzzy mentioned this pull request Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add job management APIs
5 participants