Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common/swdptap: make SWD timing more consistent #1714

Merged
merged 4 commits into from
Jul 12, 2024

Conversation

tlyu
Copy link
Contributor

@tlyu tlyu commented Jan 4, 2024

Make SWD timing more consistent: output changes (including floating) take place after the falling edge, and reads take place before the rising edge.

This includes moving a gpio_get to occur after the clock-low delay, which would help reduce incorrect reads from lower-bandwidth target connections.

Eliminate redundant GPIO operations, typically ones that set the clock low, by making the bit string input/output primitives always end by outputting a falling clock edge.

Detailed description

This makes the bit banging code more consistent and easier to analyze. It also removes some redundant clearing of the clock, typically because most existing big banging functions ended on a falling clock edge, and also began with an explicit clearing of the clock. This is no longer necessary, if we adopt a convention that the "resting" phase of the clock is low.

It also moves a misplaced gpio_get from before a clock-low delay to after it, so it comes right before the rising clock edge. This can help avoid incorrect reads from lower-bandwidth target connections. It might be helpful for @stansotn or someone else with such target connections to test it out.

I'll mark it ready for review after testing on BMDA, though I suspect that BMDA might slow things down too enough to make the differences harder to see on the logic analyzer.

Note that adopting an inlined busy-loop function as in #1688 might be needed to gain more significant clock consistency.

Your checklist for this pull request

  • I've read the Code of Conduct
  • I've read the guidelines for contributing to this repository
  • It builds for hardware native (make PROBE_HOST=native)
  • It builds as BMDA (make PROBE_HOST=hosted)
  • I've tested it to the best of my ability
  • My commit messages provide a useful short description of what the commits do

Closing issues

N/A

@tlyu tlyu marked this pull request as ready for review January 8, 2024 21:00
@tlyu
Copy link
Contributor Author

tlyu commented Jan 8, 2024

I think the cycles saved by this change are small enough that my logic analyzer can't reliably see them, so I'm going to mark this as ready for review.

dragonmux
dragonmux previously approved these changes Jan 13, 2024
Copy link
Member

@dragonmux dragonmux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code perspective this all looks good to us. We'll try to get this tested on our local setup in the next couple of days to see what the before-and-after on the timing and waveforms looks like with a mind to merge this if it all looks fine.

We'll post our testing results as images and traces here to be able to reference back to if needed.

@dragonmux dragonmux added this to the v2.0 release milestone Jan 13, 2024
@dragonmux dragonmux added Enhancement General project improvement BMP Firmware Black Magic Probe Firmware (not PC hosted software) labels Jan 13, 2024
@tlyu
Copy link
Contributor Author

tlyu commented Jan 13, 2024

I wonder if it's useful to hang some extra capacitance off of the probe wiring, to slow things down enough to A/B test the change to the bit read timing. (I don't have a MSO or similar at the moment, which would likely help a lot with this kind of test.)

Copy link
Member

@dragonmux dragonmux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've just given this a test, screenshots of the resulting waveforms to follow, and while the delay routines on minimum delay now have a very slight duty cycle imbalance on SWCLK, the clarity improvements and other changes more than make up for that and it doesn't cause bad behaviour from parts tested. If all involved are happy, we're happy to merge this.

src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
@dragonmux
Copy link
Member

dragonmux commented Jan 16, 2024

Here are our results from testing this PR

first is a set of before captures from main:

no-delay SWD out:
no-delay SWD out signaling

no-delay SWD in:
no-delay SWD in signaling

minimum delay SWD out:
minimum delay SWD out signaling

minimum delay SWD in:
minimum delay SWD in signaling

Now follow the after captures taken without any extra nop's:

no-delay SWD out:
no-delay SWD out signaling

no-delay SWD in:
no-delay SWD in signaling

minimum delay SWD out:
minimum delay SWD out signaling

minimum delay SWD in:
minimum delay SWD in signaling

And finally, the no-delay routines with one extra nop:

no-delay SWD out:
no-delay SWD out signaling

no-delay SWD in:
no-delay SWD in signaling

In the case of the extra no-ops we may have inserted the extra in the wrong spot looking more closely at how it changed the waveforms - but this gives you a basis to work from at least.

@tlyu
Copy link
Contributor Author

tlyu commented Jan 17, 2024

Thanks for the review!

I noticed similar changes as what you saw, but was also limited to a logic analyzer with 24MHz sample rate. Many of the asymmetries are on the order of one LA sample, and aren't that consistent. I can take a closer look to see if there are any asymmetries that are somewhat more consistent, and try to adjust the timing on those.

On BMP native clocked at 72MHz (default), it seems like a 1-cycle difference could easily result in an inconsistent 1-sample asymmetry at 24MSamp/sec. It would be nice to get some traces from a faster logic analyzer.

@dragonmux
Copy link
Member

What we tend to do is assume that the asymmetry errs on the side that appears more commonly - so that final capture is showing an asymmetry ~2/3 of the way to the right between the two seen asymetries - they're both asymetrical on the side of the high time being low; just.. slightly closer to even than the captures showing one high period would otherwise suggest

Agreed, though, that getting some high sample density captures would be great - if we can get our scope set up for it'll we'll try to get some captures with that - but that'll be a few days out.

@tlyu
Copy link
Contributor Author

tlyu commented Jan 17, 2024

Force-push is to resolve a merge conflict; I haven't dug into the asymmetries in detail yet.

@tlyu
Copy link
Contributor Author

tlyu commented Jan 17, 2024

After poking at this a bit, staring at LA traces and disassembly, most of the calculations that occur inside the loop happen during the clock low phase for output. Input calculations can be moved around a bit. In any case, moving the clock clear to the top of the loop for output makes it easier to rebalance the duty cycle.

(gpio_set_val has enough branching that it takes a comparable number of cycles to the loop increment)

@tlyu
Copy link
Contributor Author

tlyu commented Jan 18, 2024

Thanks for the review and detailed timing analysis! I've made some adjustments that seem to improve the duty cycle for me, at least as well as I can see at 24MSamp/sec.

@dragonmux
Copy link
Member

Ah, sorry the ball got dropped a bit on this one - could you rebase this on main please and we'll look at getting this merged ASAP as we think it's probably ready to go otherwise.

Copy link
Member

@dragonmux dragonmux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies this PR slipped away from us.. We've a few review notes from this round.

With the up-vs-down loops, we don't recall if we've already had a conversation about it and what the outcome was, so please forgive us if you've already explained to us before if there's a merit for the proposed change and we've just forgotten it. If there's a clearly demonstrated technical reason for using the count-down style here, please add comments in the code explaining the choice and otherwise ignore those 4 lints.

src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
src/platforms/common/swdptap.c Show resolved Hide resolved
src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
src/platforms/common/swdptap.c Outdated Show resolved Hide resolved
@dragonmux
Copy link
Member

We're working to tidy up and merge outstanding good quality PRs at the moment, and notice this one's not seen any movement since our review. If you're happy with us sorting the outstanding items and rebasing this on main, then we'll get cracking with that in the next week with the intention of landing this by mid month.

@tlyu
Copy link
Contributor Author

tlyu commented Jul 7, 2024

Catching up… I'll try to look at it this week as well. I first have to see if I can even get the tree to build on macOS at the moment (I remember hearing about some Meson problems).

@dragonmux
Copy link
Member

Sounds good! 🤞🏼 you can figure out the macOS build - there was a set of bug fixes pushed to main recently that solved some issues around BMDA on the fruity OS when doing a cross-build to build the firmware. Once rebased on main, that should be a reasonable process.

@tlyu
Copy link
Contributor Author

tlyu commented Jul 8, 2024

Got Meson to work on macOS. Native doesn't fit into ROM with the default config on main.

Still puzzling through at least one merge conflict.

@dragonmux
Copy link
Member

That sounds like some fairly reasonable progress. Which compiler version are you using for main to not be fitting (as far as we know, 12.2.Rel1 builds still fit)?

@tlyu
Copy link
Contributor Author

tlyu commented Jul 8, 2024

arm-none-eabi-gcc (Arm GNU Toolchain 13.2.rel1 (Build arm-13.7)) 13.2.1 20231009

@dragonmux
Copy link
Member

Ah, yes, that toolchain has worse codegen than 12.2.Rel1 so that makes sense of that - this is why we've held back on upgrading toolchains for a while now because the project's trying to squeeze every possible bit of space out at the moment..

tlyu added 2 commits July 12, 2024 09:35
Make SWD timing more consistent: output changes (including floating)
take place after the falling edge, and reads take place before the
rising edge.

This includes moving a gpio_get to occur after the clock-low delay,
which would help reduce incorrect reads from lower-bandwidth target
connections.

Eliminate redundant GPIO operations, typically ones that set the clock
low, by making the bit string input/output primitives always end by
outputting a falling clock edge.
Make clock cycle more symmetric, especially for `no_delay` variants.

`swdptap_seq_out_no_delay` seems to need the clock clear to be moved
back to the top of the loop, to ensure minimal delay between the falling
edge and the SWDIO change.

Also use a different loop counter that compiles to a `SUBS` `BCC`/`BCS`
pair, instead of using an explicit `CMP`.
@tlyu tlyu force-pushed the fix/swd-timing branch from 75aa822 to caf82d5 Compare July 12, 2024 15:13
@tlyu
Copy link
Contributor Author

tlyu commented Jul 12, 2024

Force push is to resolve merge conflicts. I'll check the other comments shortly.

Copy link
Member

@dragonmux dragonmux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review items address, so LGTM - merging. Thank you for the contribution!

@dragonmux dragonmux merged commit 117028d into blackmagic-debug:main Jul 12, 2024
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BMP Firmware Black Magic Probe Firmware (not PC hosted software) Enhancement General project improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants