Commit 13739c6

Fix sibling bug to #177
While #177 is reported as being caused by a comment, the underlying behavior is a problem due to the newline that we generated (from a comment). The prior commit fixed that problem by preserving whitespace before the comment. That guarantees that a block will form there from the frontier before it will be expanded there via a "neighbors" method. Since empty lines are valid Ruby code, it will be hidden and be safe.

## Problem setup

This failure mode is not fixed by the prior commit, because the indentation is 0. To provide good results, we must make the algorithm less greedy. One heuristic/signal to follow is developer-added newlines. If a developer puts a newline between pieces of code, it's more likely they're unrelated. For example:

```
port = rand(1000...9999)
stub_request(:any, "localhost:#{port}")

query = Cutlass::FunctionQuery.new(
  port: port
).call

expect(WebMock).to have_requested(:post, "localhost:#{port}").
  with(body: "{}")
```

This code is split into three chunks by the developer. Each is likely (but not guaranteed) to be intended to stand on its own (in terms of syntax).

This behavior is good for scanning neighbors (same indent or higher) within a method, but bad for parsing neighbors across methods.

## Problem

Code is expanded to capture all neighbors, and then it decreases indent level, which allows it to capture surrounding scope (think moving from within the method to also capturing the `def/end` definition). Once the indentation level has been decreased, we go back to scanning neighbors, but now neighbors also contain keywords.

For example:

```
1 def bark
2
3 end
4
5 def sit
6 end
```

In this case if lines 4, 5, and 6 are in a block when it tries to expand neighbors it will expand up. If it stops after line 2 or 3 it may cause problems since there's a valid kw/end pair, but the block will be checked without it.

TL;DR: It's good to stop scanning code after hitting a newline when you're in a method, but it causes a problem scanning code between methods when everything inside of one of the methods is an empty line.

In this case it grabs the end on line 3 and since the problem was an extra end, the program now compiles correctly. It incorrectly assumes that the block it captured was causing the problem.

## Extra bit of context

One other technical detail is that after we've decided to stop scanning code for a new neighbor block expansion, we look around the block and grab any empty newlines. Basically, adding empty newlines before or after a code block does not affect the parsing of that block.

## The fix

Since we know that this problem only happens when there's a newline inside of a method, and we know this particular failure mode is due to having an invalid block (capturing an extra end, but not its keyword), we have all the metadata we need to detect this scenario and correct it.

We know that the next line above our block must be code or empty (since we grabbed extra newlines). Same for code below it. We can count all the keywords and ends in the block. If they are balanced, it's likely (but not guaranteed) we formed the block correctly. If they're imbalanced, look above or below (depending on the nature of the imbalance) and check to see if adding that line would balance the count.

This concept of balance and "leaning" comes from work in #152 and has proven useful, but has not been formally introduced into the main branch.

## Outcome

Adding this extra check introduced no regressions and fixed the test case. It might be possible there's a mirror or similar problem that we're not handling. That will come out in time. It might also be possible that this causes a worse case in some code not under test. That too would come out in time.

One other possible concern with adding logic in this area (which is a hot codepath) is performance. This extra count check will be performed for every block.

In general the two most helpful performance strategies I've found are reducing the total number of blocks (therefore reducing overall N internal iterations) and making better matches (running the parser to determine whether a code block is valid is a major bottleneck). If we can split valid code into valid blocks, then it's only evaluated by the parser once, whereas invalid code must be continuously re-checked by the parser until it becomes valid, or is determined to be the cause of the core problem.

This extra logic should very rarely result in a change, but when it does it should tend to produce slightly larger blocks (by one line) and more accurate blocks.

Informally it seems to have no impact on performance:

```
This branch:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.01s user 1.62s system 113% cpu 4.076 total
```

```
On main:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.02s user 1.64s system 113% cpu 4.098 total
```
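The keyword/end balance check described under "The fix" can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in this commit: `Line`, `lean`, and the regexes used to spot keywords are hypothetical stand-ins for the project's own line metadata.

```ruby
# Hypothetical stand-in for a source line; not the project's CodeLine class.
Line = Struct.new(:text) do
  # Assumption: a line counts as a keyword line if it opens with one of these.
  def kw?
    text.match?(/\A\s*(def|class|module|if|unless|while)\b/)
  end

  def end?
    text.match?(/\A\s*end\b/)
  end
end

# Classify a block by comparing keyword and end counts.
def lean(lines)
  kw_count = lines.count(&:kw?)
  end_count = lines.count(&:end?)

  case end_count - kw_count
  when 0 then :balanced          # nothing to fix
  when 1 then :needs_kw_above    # one dangling end, look at the line above
  when -1 then :needs_end_below  # one dangling keyword, look at the line below
  else :unknown                  # more than one line off, leave it alone
  end
end

source = ["def bark", "", "end", "", "def sit", "end"].map { |t| Line.new(t) }

# Lines 2 through 6 of the example: the `end` on line 3 has no keyword.
dangling = source[1..5]

p lean(dangling)
p lean(source)
```

The sketch only reports which direction the block leans; deciding whether the adjacent line is actually a matching keyword or end (and at an acceptable indentation) is a separate step.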
1 parent dc18f9c commit 13739c6

7 files changed: +197 -12 lines changed

lib/syntax_suggest/around_block_scan.rb

Lines changed: 49 additions & 2 deletions
@@ -61,7 +61,6 @@ def stop_after_kw

     def scan_while
       stop_next = false
-
       kw_count = 0
       end_count = 0
       index = before_lines.reverse_each.take_while do |line|
@@ -166,7 +165,55 @@ def on_falling_indent
       end
     end

-    def scan_neighbors
+    # Scanning is intentionally conservative because
+    # we have no way of rolling back an agressive block (at this time)
+    #
+    # If a block was stopped for some trivial reason, (like an empty line)
+    # but the next line would have caused it to be balanced then we
+    # can check that condition and grab just one more line either up or
+    # down.
+    #
+    # For example, below if we're scanning up, line 2 might cause
+    # the scanning to stop. This is because empty lines might
+    # denote logical breaks where the user intended to chunk code
+    # which is a good place to stop and check validity. Unfortunately
+    # it also means we might have a "dangling" keyword or end.
+    #
+    #   1 def bark
+    #   2
+    #   3 end
+    #
+    # If lines 2 and 3 are in the block, then when this method is
+    # run it would see it is unbalanced, but that acquiring line 1
+    # would make it balanced, so that's what it does.
+    def lookahead_balance_one_line
+      kw_count = 0
+      end_count = 0
+      lines.each do |line|
+        kw_count += 1 if line.is_kw?
+        end_count += 1 if line.is_end?
+      end
+
+      return self if kw_count == end_count # nothing to balance
+
+      # More ends than keywords, check if we can balance expanding up
+      if (end_count - kw_count) == 1 && next_up
+        return self unless next_up.is_kw?
+        return self unless next_up.indent >= @orig_indent
+
+        @before_index = next_up.index
+
+      # More keywords than ends, check if we can balance by expanding down
+      elsif (kw_count - end_count) == 1 && next_down
+        return self unless next_down.is_end?
+        return self unless next_down.indent >= @orig_indent
+
+        @after_index = next_down.index
+      end
+      self
+    end
+
+    def scan_neighbors_not_empty
       scan_while { |line| line.not_empty? && line.indent >= @orig_indent }
     end

lib/syntax_suggest/block_expand.rb

Lines changed: 91 additions & 6 deletions
@@ -35,14 +35,31 @@ def initialize(code_lines:)
       @code_lines = code_lines
     end

+    # Main interface. Expand current indentation, before
+    # expanding to a lower indentation
     def call(block)
       if (next_block = expand_neighbors(block))
-        return next_block
+        next_block
+      else
+        expand_indent(block)
       end
-
-      expand_indent(block)
     end

+    # Expands code to the next lowest indentation
+    #
+    # For example:
+    #
+    #   1 def dog
+    #   2   print "dog"
+    #   3 end
+    #
+    # If a block starts on line 2 then it has captured all it's "neighbors" (code at
+    # the same indentation or higher). To continue expanding, this block must capture
+    # lines one and three which are at a different indentation level.
+    #
+    # This method allows fully expanded blocks to decrease their indentation level (so
+    # they can expand to capture more code up and down). It does this conservatively
+    # as there's no undo (currently).
     def expand_indent(block)
       AroundBlockScan.new(code_lines: @code_lines, block: block)
         .skip(:hidden?)
@@ -51,14 +68,82 @@ def expand_indent(block)
         .code_block
     end

+    # A neighbor is code that is at or above the current indent line.
+    #
+    # First we build a block with all neighbors. If we can't go further
+    # then we decrease the indentation threshold and expand via indentation
+    # i.e. `expand_indent`
+    #
+    # Handles two general cases.
+    #
+    # ## Case #1: Check code inside of methods/classes/etc.
+    #
+    # It's important to note, that not everything in a given indentation level can be parsed
+    # as valid code even if it's part of valid code. For example:
+    #
+    #   1 hash = {
+    #   2   name: "richard",
+    #   3   dog: "cinco",
+    #   4 }
+    #
+    # In this case lines 2 and 3 will be neighbors, but they're invalid until `expand_indent`
+    # is called on them.
+    #
+    # When we are adding code within a method or class (at the same indentation level),
+    # use the empty lines to denote the programmer intended logical chunks.
+    # Stop and check each one. For example:
+    #
+    #   1 def dog
+    #   2   print "dog"
+    #   3
+    #   4   hash = {
+    #   5 end
+    #
+    # If we did not stop parsing at empty newlines then the block might mistakenly grab all
+    # the contents (lines 2, 3, and 4) and report them as being problems, instead of only
+    # line 4.
+    #
+    # ## Case #2: Expand/grab other logical blocks
+    #
+    # Once the search algorithm has converted all lines into blocks at a given indentation
+    # it will then `expand_indent`. Once the blocks that generates are expanded as neighbors
+    # we then begin seeing neighbors being other logical blocks i.e. a block's neighbors
+    # may be another method or class (something with keywords/ends).
+    #
+    # For example:
+    #
+    #   1 def bark
+    #   2
+    #   3 end
+    #   4
+    #   5 def sit
+    #   6 end
+    #
+    # In this case if lines 4, 5, and 6 are in a block when it tries to expand neighbors
+    # it will expand up. If it stops after line 2 or 3 it may cause problems since there's a
+    # valid kw/end pair, but the block will be checked without it.
+    #
+    # We try to resolve this edge case with `lookahead_balance_one_line` below.
     def expand_neighbors(block)
-      expanded_lines = AroundBlockScan.new(code_lines: @code_lines, block: block)
+      neighbors = AroundBlockScan.new(code_lines: @code_lines, block: block)
         .skip(:hidden?)
         .stop_after_kw
-        .scan_neighbors
-        .scan_while { |line| line.empty? } # Slurp up empties
+        .scan_neighbors_not_empty
+
+      # Slurp up empties
+      with_empties = neighbors
+        .scan_while { |line| line.empty? }
+
+      # If next line is kw and it will balance us, take it
+      expanded_lines = with_empties
+        .lookahead_balance_one_line
         .lines

+      # Don't allocate a block if it won't be used
+      #
+      # If nothing was taken, return nil to indicate that status
+      # used in `def call` to determine if
+      # we need to expand up/out (`expand_indent`)
       if block.lines == expanded_lines
         nil
       else
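The two-stage flow in `call` (try neighbors first, fall back to a lower indentation when nothing new was taken) can be shown on a toy model. This is a simplified sketch, assuming blocks are plain index ranges over a hypothetical `INDENTS` table; the real `expand_indent` is more conservative than the one-line-each-way version here.

```ruby
# Indentation of each line in a toy file (an assumption for illustration):
#   0: def dog
#   1:   print "a"
#   2:   print "b"
#   3: end
INDENTS = [0, 2, 2, 0]

# Grow the range over adjacent lines at the same indentation or deeper.
# Returns nil when nothing new was taken, mirroring the nil signal above.
def expand_neighbors(range)
  orig = range.map { |i| INDENTS[i] }.min
  first, last = range.first, range.last
  first -= 1 while first > 0 && INDENTS[first - 1] >= orig
  last += 1 while last < INDENTS.size - 1 && INDENTS[last + 1] >= orig
  expanded = (first..last)
  expanded == range ? nil : expanded
end

# Simplified stand-in: take one line above and one below, whatever the indent.
def expand_indent(range)
  ([range.first - 1, 0].max..[range.last + 1, INDENTS.size - 1].min)
end

def expand(range)
  expand_neighbors(range) || expand_indent(range)
end

step1 = expand(1..1)  # neighbors available: grows to both print lines
step2 = expand(step1) # no neighbors left: falls back to the def/end lines

p step1
p step2
```

Returning nil rather than an unchanged block lets the caller tell "expanded" apart from "stuck" without comparing blocks a second time.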

lib/syntax_suggest/capture_code_context.rb

Lines changed: 0 additions & 1 deletion
@@ -76,7 +76,6 @@ def call
     #     end
     #   end
     #
-    #
     def capture_falling_indent(block)
       AroundBlockScan.new(
         block: block,

spec/integration/syntax_suggest_spec.rb

Lines changed: 25 additions & 0 deletions
@@ -234,5 +234,30 @@ def sit
         > 10 end # extra end
       EOM
     end
+
+    it "space inside of a method" do
+      source = <<~'EOM'
+        class Dog # 1
+          def bark # 2
+
+          end # 4
+
+          def sit # 6
+            print "sit" # 7
+          end # 8
+        end # 9
+        end # extra end
+      EOM
+
+      io = StringIO.new
+      SyntaxSuggest.call(
+        io: io,
+        source: source
+      )
+      out = io.string
+      expect(out).to include(<<~EOM)
+        > 10 end # extra end
+      EOM
+    end
   end
 end

spec/unit/around_block_scan_spec.rb

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ module SyntaxSuggest
       code_lines = CodeLine.from_source(source)
       block = CodeBlock.new(lines: code_lines[1])
       expand = AroundBlockScan.new(code_lines: code_lines, block: block)
-        .scan_neighbors
+        .scan_neighbors_not_empty

       expect(expand.code_block.to_s).to eq(source)
       expand.scan_while { |line| false }
@@ -151,7 +151,7 @@ def foo
       expand = AroundBlockScan.new(code_lines: code_lines, block: block)
       expand.skip(:empty?)
       expand.skip(:hidden?)
-      expand.scan_neighbors
+      expand.scan_neighbors_not_empty

       expect(expand.code_block.to_s).to eq(<<~EOM.indent(4))

spec/unit/block_expand_spec.rb

Lines changed: 30 additions & 0 deletions
@@ -4,6 +4,36 @@

 module SyntaxSuggest
   RSpec.describe BlockExpand do
+    it "empty line in methods" do
+      source_string = <<~EOM
+        class Dog # index 0
+          def bark # index 1
+
+          end # index 3
+
+          def sit # index 5
+            print "sit" # index 6
+          end # index 7
+        end # index 8
+        end # extra end
+      EOM
+
+      code_lines = code_line_array(source_string)
+
+      sit = code_lines[4..7]
+      sit.each(&:mark_invisible)
+
+      block = CodeBlock.new(lines: sit)
+      expansion = BlockExpand.new(code_lines: code_lines)
+      block = expansion.expand_neighbors(block)
+
+      expect(block.to_s).to eq(<<~EOM.indent(2))
+        def bark # index 1
+
+        end # index 3
+      EOM
+    end
+
     it "captures multiple empty and hidden lines" do
       source_string = <<~EOM
         def foo

spec/unit/code_search_spec.rb

Lines changed: 0 additions & 1 deletion
@@ -338,7 +338,6 @@ def dog
         end
       EOM
       search.call
-      puts "done"

       expect(search.invalid_blocks.join).to eq(<<~'EOM')
         Foo.call do
