Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "partr: fix multiqueue resorting stability" #35589

Merged
merged 1 commit into from
Apr 27, 2020
Merged

Conversation

JeffBezanson
Copy link
Sponsor Member

This reverts commit c23554a.

This caused pfib to run ~forever.

@JeffBezanson JeffBezanson added the domain:multithreading Base.Threads and related functionality label Apr 24, 2020
@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 25, 2020

Hm, any idea why? I know I split out that commit so it could be bisected (and reverted) easily, but it'd be good to understand

@JeffBezanson
Copy link
Sponsor Member Author

All the tasks have the same priority, so I think this was causing it to iterate through most of them on each operation.

@JeffBezanson
Copy link
Sponsor Member Author

@kpamnany What is the right thing to do here to dequeue tasks in the right order without being O(n^2)?

@JeffBezanson JeffBezanson merged commit 47087cb into master Apr 27, 2020
@JeffBezanson JeffBezanson deleted the jb/siftdown branch April 27, 2020 20:46
@kpamnany
Copy link
Contributor

@JeffBezanson: What's the 'right' order? Originally partr used <= in both sift_up() and sift_down(). So in multiq_insert(), the most recently added task was moved to the top of the heap and after a multiq_deletemin(), the task from the bottom of the heap moved back down to the bottom. This honors the depth first strategy.

Looks like this has been changed to < in both... but not sure why.

@JeffBezanson
Copy link
Sponsor Member Author

Looks like this has been changed to < in both... but not sure why.

Because using <= causes the code to traverse a large part of the structure on each operation, making the scheduler O(n^2) and then we spend all of run time there.

@kpamnany
Copy link
Contributor

Have you confirmed that with profiling? If not, I should be able to finally get started on doing some of that this week.

It shouldn't really happen that sifting consumes a lot of runtime. There are 2 * jl_n_threads heaps that are 8-ary trees laid out breadth-first, i.e. 8 task movements in a single cache line. If you have a workload that creates a lot of tasks which yield repeatedly, then perhaps it could get slow -- that's a bad profile for this approach.

If you increase heap_c to 4 here you'll get some speedup for these workloads, but there'll be some overhead when there aren't many tasks.

@JeffBezanson
Copy link
Sponsor Member Author

Yes. Try running this:

function pfib(n::Int)
    if n <= 1
        return n
    end
    t = Threads.@spawn pfib(n-2)
    return pfib(n-1) + fetch(t)::Int
end
pfib(25)

with 2 threads.
With <:

  0.176900 seconds (729.64 k allocations: 87.096 MiB, 12.04% gc time)

with <=:

  6.887915 seconds (728.51 k allocations: 87.061 MiB, 0.49% gc time)

I admit this is a pathological case, but I don't think it's ok for the default scheduler to have any worst case this bad.

@kpamnany
Copy link
Contributor

Yes. There's a 7.5X slowdown with fib in the C partr as well. And we aren't even using the priorities right now AFAICT. This needs some thought.

mbauman added a commit that referenced this pull request Apr 28, 2020
* origin/master: (833 commits)
  Improve typesubtract for tuples (#35600)
  Make searchsorted*/findnext/findprev return values of keytype (#32978)
  fix buggy rand(RandomDevice(), Bool) (#35590)
  remove `Ref` allocation on task switch (#35606)
  Revert "partr: fix multiqueue resorting stability" (#35589)
  exclude types with free variables from `type_morespecific` (#35555)
  fix small typo in NEWS.md (#35611)
  enable inline allocation of structs with pointers (#34126)
  SparseArrays: Speed up right-division by diagonal matrices (#35533)
  Allow hashing 1D OffsetArrays
  NEWS item for introspection macros (#35594)
  Special case empty covec-diagonal-vec product (#35557)
  [GCChecker] fix a few tests by looking through casts
  Use norm instead of abs in generic lu factorization (#34575)
  [GCChecker,NFC] run clang-format -style=llvm
  [GCChecker] fix tests and add Makefile
  Add introspection macros support for dot syntax (#35522)
  Specialize `union` for `OneTo` (#35577)
  add pop!(vector, idx, [default]) (#35513)
  bump Pkg version (#35584)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants