Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of ProgressMeter.jl and added a reset_callback to DPWSolver #69

Merged
merged 15 commits into from
Sep 7, 2020

Conversation

mossr
Copy link
Member

@mossr mossr commented Aug 11, 2020

Being an "anytime" algorithm, it's nice to know how long MCTS will take :)
Therefore, I added support for using @showprogress from ProgressMeter.jl within action_info for the DPWPlanner. Note the progress meter is off by default.

I also added a reset_callback(mdp, s) option for the DWPSolver so you can specify a callback function to reset your MDP to a given state. I used these changes for a recent DASC 2020 paper.

@zsunberg
Copy link
Member

@mossr , thanks for the contribution!

Do you know if the progress meter addition could have any performance consequences? There is a very wide variety of ways this package is used - some people use it to plan for minutes or hours; some use it to plan for 0.01 seconds. I am a bit concerned that if it is used for 0.01 seconds, the progress meter might take a significant portion of that. I would be more comfortable using code where we will be sure that julia will optimize out all progress meter code.

(In the long run, it would be better to have a package that is designed for different use cases than just running POMDPs.jl simulations like this one 😄)

I am not sure I understand the need for reset_callback. Is it because you have a simulator where the state is not actually separate from the mdp?

(In the long run, the solution to this would be to have a better package that uses an interface that does not assume that the mdp model and state are completely separate objects, like CommonRLInterface.jl 😄. I think with all the julia experience we have acquired, we could make a much much better MCTS package that is easier for people to use and extend for many different use cases. If you happen to be interested in working on this, let me know and we can zoom about it! I have wanted to work on it for a long time, but have had to focus on other things.)

@findmyway
Copy link

I think with all the julia experience we have acquired, we could make a much much better MCTS package that is easier for people to use and extend for many different use cases

Count me in! 😉

src/dpw.jl Outdated
@@ -15,9 +15,10 @@ POMDPs.action(p::DPWPlanner, s) = first(action_info(p, s))
"""
Construct an MCTSDPW tree and choose the best action. Also output some information.
"""
function POMDPModelTools.action_info(p::DPWPlanner, s; tree_in_info=false)
function POMDPModelTools.action_info(p::DPWPlanner, s; tree_in_info=false, show_progress=false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest using an Union{Progress, Nothing} here. So that @zsunberg will have no concern about performance😁

@mossr
Copy link
Member Author

mossr commented Aug 17, 2020

@zsunberg Great point regarding performance. I modified my branch based on the suggestion from @findmyway to use the Progress type instead of @showprogress. This will optimize the loop to skip any additional computation when show_progress=false (which is the default). I attempted a solution using Requires.jl, but found an issue with that package that I filed here (if you're curious): JuliaPackaging/Requires.jl#88 (tl;dr POMDPSimulators loads ProgressMeter which triggers MCTS to think that it has direct access to ProgressMeter).

Regarding reset_callback, your explanation is correct. We use this feature for adaptive stress testing (POMDPStressTesting.jl) when the simulator state is not truly separate from the MDP state.
I updated the description of reset_callback and comments around where it's used to clarify this.

I am interested in a general MCTS package, but—like you—don't have the time at the moment. Maybe in 6-12 months time we can reconvene about this! 😄

@codecov
Copy link

codecov bot commented Aug 17, 2020

Codecov Report

Merging #69 into master will increase coverage by 0.20%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #69      +/-   ##
==========================================
+ Coverage   85.30%   85.51%   +0.20%     
==========================================
  Files          11       11              
  Lines         415      421       +6     
==========================================
+ Hits          354      360       +6     
  Misses         61       61              
Impacted Files Coverage Δ
src/MCTS.jl 100.00% <ø> (ø)
src/dpw.jl 96.19% <100.00%> (+0.19%) ⬆️
src/dpw_types.jl 83.87% <100.00%> (+0.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1434f7e...a3e9c2b. Read the comment docs.

@zsunberg
Copy link
Member

Sorry for not responding for a while. I need to request a few more changes:

  1. For consistency we should have the progress option available in the solver constructor
  2. Should we need to call finish! on the progress option after the loop in case it terminates early because of time?
  3. I don't think this code branch

    MCTS.jl/src/dpw.jl

    Lines 97 to 99 in c006ef2

    if sol.reset_callback !== nothing
    sol.reset_callback(dpw.mdp, s) # Optional: used to reset/reinitialize MDP to a given state.
    end
    can be optimized out because the compiler does not know what type reset_callback is. Can you either verify that it is optimized out or make reset_callback inferrable (probably by adding it to DPWPlanner)? (or if you have another argument about why it is ok, I am happy to hear it)

@mossr
Copy link
Member Author

mossr commented Sep 4, 2020

Hey @zsunberg, to counter your delay I delayed 🙃

Regarding your comments (which were all warranted):

  1. Agreed regarding show_progress—I moved the option to the solver.
  2. Done. I now call finish! inside the timeout conditional right before we break.
  3. This one was fun to prove! I changed the type of reset_callback from Any to Function with a default of (mdp, s)->false. Beautifully, Julia optimizes out that call as shown in my test example below.
mutable struct Obj value end # Some random structure

obj = Obj(0) # With an instantiation

# Function under test, which includes an inner call to `callback`
function perf_test(obj, callback::Function)
    callback(obj)
    return (10,20)
end

First, it's cleaner to show the @code_typed output for the two cases: 1) when the callback input function is our default (i.e. blindly returns false) and 2) when it actually performs some operation on its input:

julia> @code_typed perf_test(obj, o->false)
CodeInfo(
1return (10, 20)
) => Tuple{Int64,Int64}

julia> @code_typed perf_test(obj, o->o.value+=1)
CodeInfo(
1 ─     invoke callback(_2::Obj)::Any
└──     return (10, 20)
) => Tuple{Int64,Int64}

Now to compare using @code_native, which is what's ultimately run by the machine (and thus harder to initially follow), I wanted to first create another function called perf_test_empty that simply does not call the callback internally:

function perf_test_empty(obj, callback::Function)
    return (10,20)
end

And running both functions, where each of them takes in the blind false callback:

julia> @code_native perf_test(obj, o->false)
        .text
; ┌ @ REPL[6]:2 within `perf_test'
        pushq   %rbp
        movq    %rsp, %rbp
; │ @ REPL[6]:3 within `perf_test'
        movabsq $736989064, %rax        # imm = 0x2BED8F88
        vmovups (%rax), %xmm0
        vmovups %xmm0, (%rcx)
        movq    %rcx, %rax
        popq    %rbp
        retq
        nopl    (%rax,%rax)
; └

julia> @code_native perf_test_empty(obj, o->false)
        .text
; ┌ @ REPL[7]:2 within `perf_test_empty'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $736990256, %rax        # imm = 0x2BED9430
        vmovups (%rax), %xmm0
        vmovups %xmm0, (%rcx)
        movq    %rcx, %rax
        popq    %rbp
        retq
        nopl    (%rax,%rax)
; └

We can also show what perf_test and perf_test_empty look like when they both take in the callback that modifies its inputs:

julia> @code_native perf_test(obj, o->o.value+=1)
        .text
; ┌ @ REPL[6]:2 within `perf_test'
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rsi
        subq    $40, %rsp
        movq    %rcx, %rsi
        movq    %rdx, -16(%rbp)
        movabsq $"japi1_#27_17172", %rax
        leaq    -16(%rbp), %rdx
        movl    $811572056, %ecx        # imm = 0x305F9B58
        movl    $1, %r8d
        callq   *%rax
; │ @ REPL[6]:3 within `perf_test'
        movabsq $736989448, %rax        # imm = 0x2BED9108
        vmovups (%rax), %xmm0
        vmovups %xmm0, (%rsi)
        movq    %rsi, %rax
        addq    $40, %rsp
        popq    %rsi
        popq    %rbp
        retq
        nopw    (%rax,%rax)
; └

julia> @code_native perf_test_empty(obj, o->o.value+=1)
        .text
; ┌ @ REPL[7]:2 within `perf_test_empty'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $736990128, %rax        # imm = 0x2BED93B0
        vmovups (%rax), %xmm0
        vmovups %xmm0, (%rcx)
        movq    %rcx, %rax
        popq    %rbp
        retq
        nopl    (%rax,%rax)
; └

Therefore, I now just call p.solver.reset_callback(p.mdp, s) without wrapping the !== nothing conditional and Julia should optimize it out! Note, I did the same exercise with the !== nothing conditional and—as you suspected—Julia did not optimize that branch out. So I'm pretty happy with the above (and committed) solution.

@mossr
Copy link
Member Author

mossr commented Sep 4, 2020

Note I relaxed the Colors.jl lower bound version requirement due to conflicts between Flux v0.10 and MCTS v0.4.3 (without issue)

@zsunberg
Copy link
Member

zsunberg commented Sep 7, 2020

Ok, we are close... Thanks for being thorough! Unfortunately, I don't think reset_callback will optimize it out in MCTS. The reason it was able to do so in the example you provided is because julia compiles a specialized version of perf_test for the concrete argument types, so essentially you have a function barrier there. This won't work in MCTS because the solver type does not contain information about the concrete type of reset_callback.

I went ahead and fixed it by putting it into the planner object with a type parameter. (I am not sure if this is the absolute best pattern, but it is what I have been using and I am just maintaining consistency)

I also added Colors 0.12 back into the compatibility in addition to 0.11. I don't think there is any reason to limit it to just 0.11.

If the tests pass, I will merge this.

@mossr
Copy link
Member Author

mossr commented Sep 7, 2020

Interesting and good to know—thanks for going ahead and fixing it!

@zsunberg zsunberg merged commit 6782345 into JuliaPOMDP:master Sep 7, 2020
@zsunberg
Copy link
Member

zsunberg commented Sep 7, 2020

Ok, @mossr is there anything else to add before we register a new version?

@mossr
Copy link
Member Author

mossr commented Sep 7, 2020

@zsunberg Nothing code-wise, but I thought we could address the Github linguist issue I opened beforehand: #71

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants