-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntcore] Fix memory leak due to missing deallocations before reference clearing #6439
[ntcore] Fix memory leak due to missing deallocations before reference clearing #6439
Conversation
if(_buf.len == 0) { | ||
_buf.Deallocate(); | ||
} | ||
} | ||
m_bufs.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does m_bufs.clear() do if already deallocated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand it has nothing to do with the deallocation since the Buffer
class does not have a destructor. Previously, this meant that all the references stored in the vector were lost without being properly deallocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspected it would be something along those lines, in that case this lgtm
After some more vigorous testing I can confirm that there are more leaks that this fix does not account for. Additionally, there is a rare case where an attempted deallocation for an invalid memory address occurs and As before,
I am no longer entirely sure what the correct solution to this is. The way that the code was written sort of implies that As previously mentioned, I do not entirely understand how this system is supposed to work, therefore I would appreciate some feedback as to what direction to take for fixing this. I can also close this pull request and create an issue instead if a solution is likely to require much more effort. |
I will take a look this weekend. Essentially for any buffers getting sent over the wire with write(), they should be getting freed in the write callback after the write completes. Any buffers that are NOT sent should either be retained (as they haven’t been sent yet, but will be in the future) or freed immediately (if they were sent immediately via TryWrite). uv::Buffer is not RAII as it is a light wrapper over a C structure that is passed back and forth with the libuv C API. It would be much less error prone to be RAII but the wrapping of the libuv C library doesn’t allow that. |
* fix the leak, includes all debug helpers * remove debugging for merge * formatting * FORMATTING
After digging much deeper I have found the true leak (my previous fix and "hopeful" results were likely because of a lower network utilization compared with the first test). The problem occurs after a failed The trick is that the span of frames returned by Edit: I just stress-tested the project for which this has been giving me issues for over an hour and had no leaks! |
Thanks for the detailed analysis and fix! |
* [ci] Upgrade to new macOS runner (wpilibsuite#6328) * [wpimath] Make units math functions constexpr (wpilibsuite#6345) * [wpimath] Feed forwards: Use correct 'k' value in error message (wpilibsuite#6360) * [build] Fix tcpsockets header publishing (wpilibsuite#6367) * [cscore] Use Raw for CvSink and CvSource (wpilibsuite#6364) Eventually we want to get to a point where we can remove OpenCV from the internals of cscore. The start to doing that is converting the existing CvSource and CvSink methods to RawFrame. For CvSource, this is 100% a free operation. We can do everything the existing code could have done (with one small exception we can fairly easily fix). For CvSink, by defaut this change would incur one extra copy, but no extra allocations. A set of direct methods were added to CvSink to add a method to avoid this extra copy. * [cscore] Add BGRA support (wpilibsuite#6365) * [apriltag] Fix field length in 2024 JSON (wpilibsuite#6373) Fixes wpilibsuite#6371 * [wpimath] ExponentialProfile: Return copy of input state (wpilibsuite#6370) As State is mutable, this avoids accidental modification of the passed-in object by the caller modifying the return value. * [hal,wpiutil] Error out of HAL_Initialize if SetupRioNow fails (wpilibsuite#6374) * [ntcore] Don't send value update to client setting value (wpilibsuite#6375) * [ntcore] Add hidden subscribe option (wpilibsuite#6376) This allows creating subscribers that aren't communicated with the network. * [sysid] Relax peak acceleration search (wpilibsuite#6378) * [apriltag] Add AprilTagFieldLayout.loadField() (wpilibsuite#6377) * [hal] HAL_RefreshDSData: Zero out control word on DS disconnect, use cache in sim (wpilibsuite#6380) * [glass] Fix FMS game data display and editing (wpilibsuite#6381) Also don't require Enter for editing game data or match time. * [wpiunits] Fix Distance class javadocs to state the correct dimension (NFC) (wpilibsuite#6363) * [ci] Upgrade wpiformat (wpilibsuite#6395) * [sysid] Fix position feedback latency compensation (wpilibsuite#6392) * [ci] Windows cmake: update vcpkg version (wpilibsuite#6397) We need fmtlib 10.2.1 to work around a compiler bug. Also, reducing the number of jobs is no longer required with Actions runner upgrades. * [wpiutil] Rate-limit FPGA error from Now() (wpilibsuite#6394) * [examples] Remove unused private variables (wpilibsuite#6403) * [wpiutil, hal] Crash on failure for SetupNowRio() and wpi::Now() when not configured (wpilibsuite#6417) This is an unrecoverable condition, so always terminate. * [sysid] Fix arm characterization crash (wpilibsuite#6422) Fixes wpilibsuite#6421. * [wpilibj] Call abort() on Rio on caught Java exception (wpilibsuite#6420) On Rio, we simply want to restart the robot program as quickly as possible, and don't want to risk a hang somewhere that will keep that from happening. The main downside of this is it won't wait for threads to finish (e.g. data logs won't get a final flush). * [wpimath] Add structured data support for DifferentialDriveWheelPositions (wpilibsuite#6412) * [wpimath] LinearSystemId: Don't throw if Kv = 0 (wpilibsuite#6424) That's just a system with no back-EMF. * [sysid] Fix crash on negative feedforward gains (wpilibsuite#6425) LinearSystemId's linear system factories throw on negative feedforward gains, but SysId can compute the feedback gains just fine in that case. Now we construct the system manually instead. Fixes wpilibsuite#6423. * [sim] GUI: Use shift to enable docking features (wpilibsuite#6429) * [hal] Raise SIGKILL instead of calling abort() (wpilibsuite#6427) We don't need to generate a core dump here if core dumps are enabled. * [hal] Use SIGKILL instead of SIGILL (wpilibsuite#6431) Fix typo. * [docs] Add docs for features not supported on PDH (NFC) (wpilibsuite#6436) * [sysid] Fix "Sample" docs typo (NFC) (wpilibsuite#6435) * [sysid] Fix wrong position Kd with unnormalized time (wpilibsuite#6433) * [ntcore] Fix memory leak in WebSocketConnection (wpilibsuite#6439) * [commands] Trigger: pass m_loop to new Trigger in composition functions (wpilibsuite#6441) * [ci] Work around asan actions bug (wpilibsuite#6442) * [wpiutil] Upgrade to LLVM 18.1.1 (wpilibsuite#6405) * [glass] Don't limit window name+label to 128 chars (wpilibsuite#6447) * [build] Upgrade to wpiformat 2024.33 (wpilibsuite#6449) This upgrades to clang-format and clang-tidy 18.1.1. This has the constructor attribute formatting fix, so we can remove our WPI_DEPRECATED macro. * [ci] Pin wpiformat version in comment command (wpilibsuite#6457) * [commands] Cache button and POV triggers This is a common footgun for teams. * [ci] Pin wpiformat version in comment command (wpilibsuite#6458) * Run java format (wpilibsuite#6462) * [wpiutil] DataLog: Don't constantly retry creating logs when low on space (wpilibsuite#6468) When low on space, a log file won't be created. This is detected as a "deletion", and the DataLog thread will continously try to create a log, fail to do so because of low space, detect it as a "deletion", and do so in a loop. If there's not enough space, the DataLog will be marked as stopped, preventing this infinite loop. Calls to start() will hit this code path and mark it as stopped again. * [wpimath] Make more LinearSystemId functions not throw if Kv = 0 (wpilibsuite#6465) * [wpimath] Add Pair.toString() (wpilibsuite#6463) * [ci] Fix 2023 docker image usage (wpilibsuite#6459) * Add reference to development to CONTRIBUTING.md (wpilibsuite#6467) * Revert "[commands] Cache button and POV triggers" Also revert the associated formatting commit. This was an accidental merge. This reverts commit ff929d4. This reverts commit 2392c9f. * [ci] Use mirror repository for liblzma (wpilibsuite#6499) Uses https://github.com/bminor/xz to work around suspended repository. We will revert this once vcpkg updates to point to an accessible repo. * [wpilib] LinearSystemSim: Add missing clamp function and getInput() (wpilibsuite#6493) * [wpimath] Support formatting Eigen array types (wpilibsuite#6496) * [wpilibj] Fix EncoderSim.setDistancePerPulse parameter name and comment (NFC) (wpilibsuite#6481) * MAINTAINERS.md: Remove reference to marketplace (wpilibsuite#6470) * [ci] Revert "Use mirror repository for liblzma (wpilibsuite#6499)" (wpilibsuite#6506) This reverts commit c46847b. * [wpimath] Document ChassisSpeeds::Discretize() math (NFC) (wpilibsuite#6509) * [build] Add exports to CMake subprojects (wpilibsuite#6505) This allows consuming allwpilib via FetchContent. * README.md: Link straight to contributing in contents (wpilibsuite#6525) Avoids need to click twice to get to contributing.md. * Update README-CMAKE.md (wpilibsuite#6522) * [examples] Fix memory over-allocation in Apriltag examples (wpilibsuite#6517) Change hamming distance to 1, add comment about memory usage. * [wpimath] Rotation2d: add Measure<Angle> getter (Java) (wpilibsuite#6492) * [commands] Fix double composition error truncation (wpilibsuite#6501) * [commands] WrappedCommand: Call wrapped command initSendable (wpilibsuite#6471) * [wpilibj] DataLogManager: Fix behavior when low on space (wpilibsuite#6486) Uses getUsableSpace in Java, matching how C++ determines available space (C++ calls it available, but they mean the same thing.) This fixes a bug where logs wouldn't get deleted due to incorrect available space detection. The DataLog thread now also checks if the state was marked as stopped after a call to StartLogFile. * [wpilib] Add flash update capability to ADI IMUs (wpilibsuite#6450) * [wpiunits] Add isNear function implementation (wpilibsuite#6396) This implementation uses a tolerance in the same units as the measure it checks. * [commands] WaitCommand: add Measure<Time> overload (wpilibsuite#6386) Also add waitTime() factory. * [apriltag] Cache layout loaded from AprilTagFields resource json (wpilibsuite#6385) * [commands] Add Trigger.onChange() (wpilibsuite#6390) * [wpiunits] Add Acceleration and MOI Units (wpilibsuite#6495) --------- Co-authored-by: Tyler Veness <calcmogul@gmail.com> Co-authored-by: fodfodfod <94200657+fodfodfod@users.noreply.github.com> Co-authored-by: Joe Wildfong <57462350+JoeWildfong@users.noreply.github.com> Co-authored-by: Thad House <ThadHouse@users.noreply.github.com> Co-authored-by: shueja <32416547+shueja@users.noreply.github.com> Co-authored-by: Peter Johnson <johnson.peter@gmail.com> Co-authored-by: Eli Barnett <emichaelbarnett@gmail.com> Co-authored-by: Joseph Eng <91924258+KangarooKoala@users.noreply.github.com> Co-authored-by: DeltaDizzy <EJBraswell@gmail.com> Co-authored-by: Dean Brettle <dean@brettle.com> Co-authored-by: sciencewhiz <sciencewhiz@users.noreply.github.com> Co-authored-by: Sam Richter <60528506+S1ink@users.noreply.github.com> Co-authored-by: person4268 <28717044+person4268@users.noreply.github.com> Co-authored-by: Gold856 <117957790+Gold856@users.noreply.github.com> Co-authored-by: Wispy <101812473+WispySparks@users.noreply.github.com> Co-authored-by: Isaac Turner <spacey_sooty@outlook.com> Co-authored-by: Ryan Blue <ryanzblue@gmail.com> Co-authored-by: Nicholas Armstrong <narmstro@warren.k12.in.us> Co-authored-by: Carl Hauser <chauser@users.noreply.github.com> Co-authored-by: HarryXChen <51322624+HarryXChen3@users.noreply.github.com> Co-authored-by: Starlight220 <53231611+Starlight220@users.noreply.github.com> Co-authored-by: Juan Jose Chong <juchong@users.noreply.github.com> Co-authored-by: vichik <54233741+vichik123@users.noreply.github.com> Co-authored-by: Jacob Hotz <77470805+Jacob1010-h@users.noreply.github.com>
UPDATE: Please see this comment.
I stumbled upon a memory leak having to do with calls to
nt::net::WebSocketConnection::AllocBuf()
withinnt::net::WebSocketConnection::write_impl()
(only seems to happen as a result of sending large binary buffers over NT). I'm not sure if this is the correct solution but it seemed to fix the issue, so feel free to make or suggest changes as necessary.For reference, here is a screenshot of the memory allocation trace - there is pretty obviously something wrong: