Skip to content
This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

Use loop-based DFS to resolve #341 (SystemStackError) #343

Merged

Conversation

craigjbass
Copy link
Contributor

This resolves performance issues in #341.

I tried to write a spec for this, but I couldn't get it to fail consistently. Stack-size is platform specific.

Copy link
Member

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Nice work! It's always great when users solve their own problems :).

I want to let @yujinakayama review this before we merge since he knows this code the best.

return to_enum(__method__) unless block_given?

yield self
outstanding_nodes = []
outstanding_nodes << self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these two lines be combined into outstanding_nodes = [self]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like combining these lines causes a segfault on 1.9.2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bizarre! Let’s keep them separate then.

@@ -45,13 +45,18 @@ def location
@location ||= args.find { |arg| arg.is_a?(Location) }
end

def each(&block)
def each
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the lack of spec to cover this (which is completely understandable!), can you leave a comment on this method explaining why it uses a loop-based approach instead of recursion?

@myronmarston
Copy link
Member

Hmmm, according to this blog post we can configure the stack size with the RUBY_THREAD_VM_STACK_SIZE environment variable. That might provide a way to write a consistent spec. Here's how I imagine that could work:

  • Figure out a value of RUBY_THREAD_VM_STACK_SIZE that makes a spec you've written consistently fail w/o your fix.
  • In our travis build, we can export that ENV variable to ensure the spec's preconditions are satisfied.
  • In the spec (or in a before hook), you can add a line like this to skip the spec when the pre-condition isn't satisfied:
unless ENV['RUBY_THREAD_VM_STACK_SIZE'] == expected_value
  message = "This spec is only valid when RUBY_THREAD_VM_STACK_SIZE is set to #{expected_value}"
  ENV['CI'] ? fail(message) : skip(message)
end

That should give us a spec that is skipped in non-CI environments (unless the developer sets the ENV var), covers the fix on travis, and ensures we don't accidentally remove the ENV export for CI in the future.

Thoughts?

@craigjbass
Copy link
Contributor Author

@myronmarston I had to set it to RUBY_THREAD_VM_STACK_SIZE=20250 to get consistent results natively. That's a stack size of 20KiB.

I then investigated the alpine stack size defaults, and they seem normal:

 ✘ cbass@craig-lenovo  ~/Projects/stack-level-too-deep   master  docker run -it rspec-stack-level-too-deep ash
/app # echo $RUBY_THREAD_VM_STACK_SIZE

/app # irb
irb(main):001:0> RubyVM::DEFAULT_PARAMS
=> {:thread_vm_stack_size=>1048576, :thread_machine_stack_size=>1048576, :fiber_vm_stack_size=>131072, :fiber_machine_stack_size=>524288}

Something strange seems to occur when executing ruby inside Docker.

@craigjbass
Copy link
Contributor Author

craigjbass commented Jan 28, 2018

More digging... https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

Possibly related to the fact Alpine linux uses musl glibc which has a far smaller stack size. 😬

@yujinakayama
Copy link
Member

yujinakayama commented Jan 29, 2018

@craigjbass Thanks for the fix!

The new loop-based logic basically looks good, but it can be simplified as follows:

        def each
          return to_enum(__method__) unless block_given?

          node_queue = []
          node_queue.push(self)

          while (current_node = node_queue.shift)
            yield current_node
            node_queue.concat(current_node.children)
          end
        end

@craigjbass
Copy link
Contributor Author

Hey @yujinakayama the assignment inside the while loop conditional feels a bit obfuscated to me.

You comfortable with this code being in the repo?

@craigjbass craigjbass force-pushed the use-loop-based-depth-first-search branch from 9514946 to bbb2c88 Compare January 29, 2018 10:33
@craigjbass
Copy link
Contributor Author

@yujinakayama have refactored and squashed my commits

@craigjbass
Copy link
Contributor Author

craigjbass commented Jan 29, 2018

Hey - what are the steps to getting this merged and a release rolled out

@myronmarston myronmarston merged commit 492e223 into rspec:master Jan 30, 2018
myronmarston added a commit that referenced this pull request Jan 30, 2018
@myronmarston
Copy link
Member

Hey - what are the steps to getting this merged and a release rolled out

3.7.1 has been released with the fix.

@craigjbass craigjbass deleted the use-loop-based-depth-first-search branch February 8, 2018 15:43
@jayniz
Copy link

jayniz commented Feb 9, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants