Skip to content

Conversation

@SolidWallOfCode
Copy link
Member

Remove and replace ts::Buffer.

@SolidWallOfCode SolidWallOfCode added this to the 10.0.0 milestone May 1, 2023
@SolidWallOfCode SolidWallOfCode self-assigned this May 1, 2023
@bryancall bryancall requested a review from bneradt May 1, 2023 22:12
@bneradt
Copy link
Contributor

bneradt commented May 1, 2023

It seems like somehow your PR is impacting the behavior of the immediate shutdown oriented tests. The ones that misconfigure ATS then expect certain logs and response codes:

https://ci.trafficserver.apache.org/job/Github_Builds/job/autest/6935/console

Test : Checking that ReturnCode == 33 - Failed
        Reason: Returned Value -2 != 33
     Time-Out : Process finishes within expected time - Passed
        Reason: Returned value: 1.0000324249267578 < 600.0
     file /tmp/sandbox/emergency/ts/log/diags.log : should contain "testing emergency shutdown" - Failed
        Reason: Cannot read /tmp/sandbox/emergency/ts/log/diags.log: [Errno 2] No such file or directory: '/tmp/sandbox/emergency/ts/log/diags.log'
     file /tmp/sandbox/emergency/ts/log/traffic.out : should NOT contain "failed to shutdown" - Failed
        Reason: Cannot read /tmp/sandbox/emergency/ts/log/traffic.out: [Errno 2] No such file or directory: '/tmp/sandbox/emergency/ts/log/traffic.out'

Note the difference in return code (-2 instead of 33) and the diags.log and traffic.out files are not there. These tests are generally pretty reliable in master - your other recent PRs haven't failed these tests.

Unfortunately, the archived sandbox doesn't have much information, since none of the logs were emitted:

https://ci.trafficserver.apache.org/job/Github_Builds/job/autest/6935/artifact/output/9664/sandbox/emergency/

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to investigate why the shutdown autests are having issues in CI with this PR.

@SolidWallOfCode
Copy link
Member Author

@bneradt - actually, they've all failed and are now successful only because of retries.

@bneradt
Copy link
Contributor

bneradt commented May 2, 2023

@bneradt - actually, they've all failed and are now successful only because of retries.

Exactly, some passed because of a retry. This is what I notice:

  • Your first run for this PR had some shutdown test failures.
  • Your second had a shutdown failure which I analyzed and commented upon above.
  • You have three or so other PRs you filed today which have had clean runs without any retries.

My observation is that it seems like this patch is causing issues with some of the shutdown tests (fatal.test.py, emergency.test.py, etc.). I'm not sure why that's the case, but I don't want us to just re-run autest a bunch of times, get this merged in once all the tests pass by happenstance, then destabilize those tests for the rest of CI.

@bneradt
Copy link
Contributor

bneradt commented May 2, 2023

@bneradt - actually, they've all failed and are now successful only because of retries.

Maybe I misunderstood you. When you say they all failed are you referring to CI for the other PR's?

@SolidWallOfCode
Copy link
Member Author

SolidWallOfCode commented May 2, 2023

Yes, for every one of these PRs I have had to run CI multiple times. For instance #9661 took three tries. Note on this run, it was a failure on Rocky - all of the AuTests passed.

@bneradt
Copy link
Contributor

bneradt commented May 2, 2023

Yes, for every one of these PRs I have had to run CI multiple times. For instance #9661 took three tries. Note on this run, it was a failure on Rocky - all of the AuTests passed.

Ah, OK. Then I take back my comment. This PR isn't the problem then. Clearly we'll need to figure out what has caused CI to not be stable anymore. It was running pretty consistently for the last couple months.

@bneradt
Copy link
Contributor

bneradt commented May 2, 2023

[approve ci rocky]

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks fine as is. But when reviewing the code, doesn't this need to call free on these sowc::memSpan elements?

https://github.com/apache/trafficserver/pull/9664/files#diff-222342fe7ac7184aa61268fff5299a17c55b3984daeba4ee796ba8ab9ca482cfR445

Clearing the vector isn't going to free them. Or are they free'd elsewhere?

@SolidWallOfCode
Copy link
Member Author

That looks like a leak. I suspect we don't see it in the field because

  • Those structures are process static, so it's only "leaked" on process shutdown.
  • I'm not sure anything actually uses the multiple text matcher.

The correct solution is to leverage YAML to get rid of the entire ControlMatcher library, but short of that it would be better to put the text in a std::string member and have views into that, rather than using ats_strdup.

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks fine. Again, you didn't introduce the leak here.

We can address the leak in a separate PR if you prefer. Using a std::string as you suggest makes sense. Or we can loop over the vector and call free on them.

@SolidWallOfCode
Copy link
Member Author

No, looping over the vector is the wrong approach. It validates doing separate allocations for each string, instead of one for all of them. Don't you remember @zwoop was whining about exactly that at the last PR review?

@SolidWallOfCode SolidWallOfCode merged commit 1572373 into apache:master May 2, 2023
cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master: (33 commits)
  Add error log for invalid OCSP response (apache#9674)
  Add new settings to specify TLS versions to use (apache#9667)
  Remove flask from tests/Pipfile (apache#9688)
  Doc: Add example of --enable-lto build option with LLVM (apache#9654)
  Added Zhengxi to the asf contributors (apache#9685)
  Don't build native QUIC implementation (apache#9670)
  Stabilize autest tls_hooks17 (apache#9671)
  Cleanup: remove ts::buffer from HostDB. (apache#9677)
  Fix leak in MultiTextMod in ControlBase. (apache#9675)
  Cleanup: remove TsBuffer.h from MIME.cc (apache#9661)
  Cleanup: remove ts::Buffer from ControlBase. (apache#9664)
  setup pre-commit hook at cmake generation time (apache#9669)
  Updated parent retry attempt logic (apache#9620)
  Try to do less work in hot function HttpHookState::getNext (apache#9660)
  cmake build, fixed warning using older openssl APIs (apache#9648)
  rename attempts to retry_attempts (apache#9655)
  OCSP: Fix unitialized variable error. (apache#9662)
  Add support for CONNECT method on HTTP/2 connection (apache#9616)
  Remove deprecated debug output functions from 10 source files. (apache#9657)
  Remove deprecated debug output functions from 14 source files. (apache#9658)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants