Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provider native operations state machine #14405

Merged
merged 12 commits into from
May 4, 2017

Conversation

agrare
Copy link
Member

@agrare agrare commented Mar 20, 2017

This provides the basics for a provider extensible CRUD state machine.

The basic workflow is:

  1. Run a native operation and wait for it to complete
  2. Start a refresh and wait for it to complete
  3. Notify the user about the operation completing successfully or failing

State transition diagram:

    *                          /------\
    | :initialize              |      |  :poll_native_task
    v               :start     v      |
 waiting_to_start  -------> running ----------> refreshing <-\
                               |     :refresh        |       | :poll_refresh
                               v                     |-------/
                             error <-----------------|
                               |                     |
                               v                     |
       finished <---------- notifying --------------/
                   :finish              :notify

Examples of a successful and an unsuccessful notification from a NativeCrud job:
screenshot from 2017-03-21 16-29-35

@agrare agrare added the wip label Mar 20, 2017
@agrare agrare force-pushed the provider_native_crud_state_machine branch 3 times, most recently from 310e18c to 52e4085 Compare March 20, 2017 17:25
@agrare agrare requested a review from Fryguy March 20, 2017 18:13
@Fryguy Fryguy requested a review from blomquisg March 20, 2017 21:36
@Fryguy
Copy link
Member

Fryguy commented Mar 20, 2017

Overall LGTM. Still wonder if it might be nicer to use the better state machine system.

@agrare agrare force-pushed the provider_native_crud_state_machine branch from 52e4085 to 434a591 Compare March 21, 2017 19:07
@agrare agrare changed the title [WIP] Provider native crud state machine Provider native crud state machine Mar 21, 2017
@agrare agrare removed the wip label Mar 21, 2017
@blomquisg
Copy link
Member

I love the ascii art state diagram.

Also:

Overall LGTM. Still wonder if it might be nicer to use the better state machine system.

LOL, I heard this coming before I read it.

Copy link
Member

@blomquisg blomquisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited about this!

I have a few comments, but nothing really mandatory.

# finished <---------- notifying --------------/
# :finish :notify
#
def load_transitions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this method be protected?

I'm thinking of the possibility of later converging on one state machine. And this feels a bit like an internal detail that I would hope could be hidden away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly can, done

end
end

if refresh_finished
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can refresh_finished? become a private method that's called by poll_refresh?

Then, this method becomes:

def poll_refresh
  if refreshed_finished?
    queue_signal(:notify)
  else
    queue_signal(:poll_refresh, :deliver_on => Time.now.utc + 1.minute)
  end
end

I suppose you could one-line that into a ternary, but the if/else looks fine to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can refresh_finished? become a private method that's called by poll_refresh?

Yeah I like that a lot, I'll just need to handle the refresh task error differently since right now it is returning out of poll_refresh and signaling :error

@@ -0,0 +1,129 @@
class ManageIQ::Providers::NativeCrud < Job
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard...

ManageIQ::Providers::NativeOperationWorkflow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not Crud

I guess it's for any operation thats to be scheduled (async) and which needs a refresh after it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like Workflow, aligns better with the miq_provision_workflow/miq_request_workflow

# v |
# finished <---------- notifying --------------/
# :finish :notify
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@agrare agrare force-pushed the provider_native_crud_state_machine branch 4 times, most recently from f59205e to db27d48 Compare March 22, 2017 20:05
@agrare agrare changed the title Provider native crud state machine Provider native operations state machine Mar 22, 2017
@agrare
Copy link
Member Author

agrare commented Mar 22, 2017

Okay moved everything from NativeCrud -> NativeOperationWorkflow

@miq-bot
Copy link
Member

miq-bot commented Mar 27, 2017

This pull request is not mergeable. Please rebase and repush.

@agrare agrare force-pushed the provider_native_crud_state_machine branch from db27d48 to 103c4e0 Compare March 28, 2017 21:03
@miq-bot
Copy link
Member

miq-bot commented Apr 5, 2017

This pull request is not mergeable. Please rebase and repush.

@agrare agrare force-pushed the provider_native_crud_state_machine branch from 103c4e0 to 454a00b Compare April 5, 2017 12:50
@job.signal(:start)
end

it 'doesn\'t allow poll_native_task signal' do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: if you use double quote, then you don't have to escape the '

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I started with ' and wanted to keep it consistent...maybe should switch to " everywhere here

@agrare agrare force-pushed the provider_native_crud_state_machine branch from 3ca48a4 to 1764f21 Compare May 2, 2017 14:55
@miq-bot
Copy link
Member

miq-bot commented May 2, 2017

Checked commits agrare/manageiq@a32c8a1~...1764f21 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 0 offenses detected
Everything looks fine. 🏆

@Fryguy
Copy link
Member

Fryguy commented May 2, 2017

This LGTM... @blomquisg if you're happy with it, please merge.

@durandom
Copy link
Member

durandom commented May 3, 2017

@agrare do you plan using this anywhere? Otherwise I could use https://www.pivotaltracker.com/story/show/144430761 to make use of this...

@agrare
Copy link
Member Author

agrare commented May 3, 2017

@durandom I was going to start moving some vm power ops over to use this, but by all means start using/modifying it for whatever you want :)

mresti pushed a commit to mresti/manageiq that referenced this pull request May 3, 2017
the sleep here is also needed because tower needs some time
to actually propagate it's updates
if we would return immediately it _could_ be that the we get
the old playbooks

the whole sleep business is a workaround anyway until we get
proper polling via ManageIQ#14405
mresti pushed a commit to mresti/manageiq that referenced this pull request May 3, 2017
the sleep here is also needed because tower needs some time
to actually propagate it's updates
if we would return immediately it _could_ be that the we get
the old playbooks

the whole sleep business is a workaround anyway until we get
proper polling via ManageIQ#14405
@blomquisg blomquisg merged commit 4872781 into ManageIQ:master May 4, 2017
@blomquisg blomquisg added this to the Sprint 60 Ending May 8, 2017 milestone May 4, 2017
@agrare agrare deleted the provider_native_crud_state_machine branch May 4, 2017 19:42
@agrare agrare restored the provider_native_crud_state_machine branch May 4, 2017 19:42
@agrare agrare deleted the provider_native_crud_state_machine branch May 4, 2017 19:42
juliancheal pushed a commit to ManageIQ/manageiq-providers-ansible_tower that referenced this pull request May 11, 2017
the sleep here is also needed because tower needs some time
to actually propagate it's updates
if we would return immediately it _could_ be that the we get
the old playbooks

the whole sleep business is a workaround anyway until we get
proper polling via ManageIQ/manageiq#14405


(transferred from ManageIQ/manageiq@85090c9)
@agrare
Copy link
Member Author

agrare commented Mar 23, 2018

@Glutexo yes it is used by https://github.com/ManageIQ/manageiq/blob/master/app/models/service/resource_linking.rb#L21-L23 via https://github.com/ManageIQ/manageiq/blob/master/app/models/service/linking_workflow.rb.

It is really based around something that needs to do an EmsRefresh and get the resulting object though.

@Glutexo
Copy link
Member

Glutexo commented Mar 23, 2018

Sorry, my bad, I deleted my comment instead of editing it. N00b. Pasting here for the context:

Is this actually used somewhere already? I‘ve found a reference to this PR in ManageIQ::Providers::AnsibleTower::Shared::AutomationManager::ConfigurationScriptSource. There is some nasty sleep loop going on, waiting for a project refresh (cloning/sync) in the provider finishes. I guess it’s there instead of its own job so a right error/success notification is sent. It this the case your ManageIQ::Providers::NativeOperationWorkflow can help with? If so, would you mind sharing how, please?

@jameswnl @agrare

@agrare
Copy link
Member Author

agrare commented Mar 23, 2018

@Glutexo if you don't need to do the EmsRefresh step you can probably just use the first few states, you should be able to provide your own state transitions in a subclass

@Glutexo
Copy link
Member

Glutexo commented Mar 23, 2018

Thanks, @agrare! I’ll take a look at it for reference.

I guess there are two reasons why there are those nasty things going on:

  • Doing a refresh (unfortunately untargeted this time, but I guess it can be fixed) after creating a resource in EMS, verifying it is really present in there.
  • Waiting for the resource (project) to do its initial refresh/update/sync (which takes some time) in the EMS right after it is created. This is done so we can deliver success/error notification depending on the result of the sync.

From what you said it looks like the NaturalWorkflow is designed just exactly to solve the former case and it can be extended to support the latter as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants