
Building with clang's libc++ fails #569

Closed

adisuissa opened this issue Nov 9, 2020 · 4 comments · Fixed by #573

Comments

@adisuissa
Contributor

PR #549 modified Nighthawk's use of MonotonicTime to SystemTime, which breaks the build with clang's/llvm's libc++ (--config=libc++) and produces the following error:

ERROR: /home/.../git/nighthawk/source/client/BUILD:11:17: C++ compilation of rule '//source/client:nighthawk_client_lib' failed (Exit 1) clang failed: error executing command /home/.../clang10/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer '-stdlib=libc++' -MD -MF ... (remaining 765 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from source/client/process_impl.cc:1:
In file included from bazel-out/k8-fastbuild/bin/source/client/_virtual_includes/nighthawk_client_lib/client/process_impl.h:3:
In file included from /home/.../clang10/bin/../include/c++/v1/map:479:
In file included from /home/.../clang10/bin/../include/c++/v1/__tree:15:
/home/.../clang10/bin/../include/c++/v1/memory:3028:32: error: no matching constructor for initialization of 'Nighthawk::Client::ClientWorkerImpl'
    return unique_ptr<_Tp>(new _Tp(_VSTD::forward<_Args>(__args)...));
                               ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
source/client/process_impl.cc:188:29: note: in instantiation of function template specialization 'std::__1::make_unique<Nighthawk::Client::ClientWorkerImpl, Envoy::Api::Api &, Envoy::ThreadLocal::InstanceImpl &, std::__1::unique_ptr<Envoy::Upstream::ClusterManager, std::__1::default_delete<Envoy::Upstream::ClusterManager> > &, const Nighthawk::Client::BenchmarkClientFactoryImpl &, const Nighthawk::Client::TerminationPredicateFactoryImpl &, const Nighthawk::Client::SequencerFactoryImpl &, const Nighthawk::Client::RequestSourceFactoryImpl &, Envoy::Stats::ThreadLocalStoreImpl &, int &, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1, 1000000000> > >, std::__1::shared_ptr<Envoy::Tracing::HttpTracer> &, Nighthawk::Client::ClientWorkerImpl::HardCodedWarmupStyle>' requested here
    workers_.push_back(std::make_unique<ClientWorkerImpl>(
                            ^
bazel-out/k8-fastbuild/bin/source/client/_virtual_includes/nighthawk_client_lib/client/client_worker_impl.h:29:3: note: candidate constructor not viable: no known conversion from 'time_point<[...], duration<[...], ratio<[...], 1000000000>>>' to 'const time_point<[...], duration<[...], ratio<[...], 1000000>>>' for 10th argument
  ClientWorkerImpl(Envoy::Api::Api& api, Envoy::ThreadLocal::Instance& tls,
  ^
bazel-out/k8-fastbuild/bin/source/client/_virtual_includes/nighthawk_client_lib/client/client_worker_impl.h:25:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 12 were provided
class ClientWorkerImpl : public WorkerImpl, virtual public ClientWorker {
      ^
1 error generated.
Target //source/client:nighthawk_client_lib failed to build

After executing:

bazel build //source/client:nighthawk_client_lib --config=libc++

The cause seems to be the different resolution that llvm's libc++ and gcc's libstdc++ use for std::chrono::system_clock: libstdc++ defines system_clock::duration as nanoseconds, while libc++ uses microseconds.
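For reference, a standalone repro of the mismatch (this is not Nighthawk code, just an illustration of the libc++/libstdc++ difference implied by the diagnostic above):

```cpp
#include <chrono>

using NanoTimePoint =
    std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;

int main() {
  NanoTimePoint ns_time{};
  // libstdc++ defines system_clock::duration as nanoseconds, so this implicit
  // conversion compiles; libc++ defines it as microseconds, so converting from
  // a nanosecond time_point would be lossy and is rejected, which matches the
  // "no known conversion" note in the error output.
  std::chrono::system_clock::time_point t = ns_time;
  (void)t;
  return 0;
}
```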

Possible ways to solve:

  1. Revert PR #549 and change back to MonotonicTime.
  2. Use MonotonicTime in this part of the code.
  3. Change the required resolution in this code to milliseconds instead of nanoseconds.
  4. Change Envoy's definition of SystemTime to explicitly use nanoseconds, i.e. `using SystemTime = std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;`, and update the necessary places in Envoy's code to cast correctly wherever system_clock is used, e.g. `std::chrono::time_point_cast<std::chrono::system_clock::duration>(time)` (see the sketch below).
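As a rough sketch of the explicit cast mentioned in option 4 (the helper below is illustrative only, not existing Nighthawk or Envoy code), `std::chrono::time_point_cast` makes the narrowing explicit so the conversion builds against either standard library:

```cpp
#include <chrono>

// Illustrative helper: take a system_clock time_point of any resolution and
// explicitly cast it to the library's native system_clock::duration.
template <typename Duration>
std::chrono::system_clock::time_point
toNativeSystemTime(std::chrono::time_point<std::chrono::system_clock, Duration> tp) {
  // time_point_cast performs the (possibly truncating) conversion explicitly,
  // so this compiles with both libstdc++ (nanoseconds) and libc++ (microseconds).
  return std::chrono::time_point_cast<std::chrono::system_clock::duration>(tp);
}
```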

@oschaaf Am I missing anything? What's your opinion about this?

@oschaaf
Member

oschaaf commented Nov 9, 2020

Thanks for the great analysis. The reason for making this change was to prepare for something like master...oschaaf:schedule-date. That would allow one to set a specific schedule date/time for starting load test execution, which in turn would be useful when synchronizing the starts of multiple clients, on the premise that the machines have their system times synchronized via external tooling. This functionality intersects with the workers of a single client synchronizing their start times, which happens in the code you pointed out.
Losing precision in these paths seems suboptimal to me. I need to think about this for a bit to determine what I think the best resolution would be.

@oschaaf
Member

oschaaf commented Nov 11, 2020

After discussing this on Slack with @dubious90 and mulling it over, I think the problem is best described as us overloading two things:

  • Set a schedule for execution. Millisecond resolution is fine for this.
  • Set an inter-worker delay. This needs the highest resolution we can get, so monotonic time is probably more appropriate here now that we have learned system time is problematic. The inter-worker start delay needs to be high-resolution because it is used to offset worker start times so that request-release timings are spread evenly across workers.

In terms of the options listed above, I'd say this is best represented by 3), plus a change to decouple scheduling execution from offsetting worker starts. I propose to address this as part of the forthcoming change that adds the actual scheduling feature.
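A minimal sketch of what that decoupling could look like (all names below are illustrative, not actual Nighthawk code; the type aliases mirror std::chrono's clocks):

```cpp
#include <chrono>

using SystemTime = std::chrono::time_point<std::chrono::system_clock>;
using MonotonicTime = std::chrono::time_point<std::chrono::steady_clock>;

// The two concerns, kept separate: a coarse wall-clock schedule and a
// high-resolution per-worker offset expressed against the monotonic clock.
struct WorkerStartPlan {
  SystemTime scheduled_start;                   // millisecond resolution suffices
  std::chrono::nanoseconds inter_worker_offset; // needs the highest resolution we can get
};

// Translate the wall-clock schedule into a monotonic target, then apply the
// per-worker offset on the monotonic timeline, where precision is preserved.
inline MonotonicTime workerStartTime(SystemTime now_sys, MonotonicTime now_mono,
                                     const WorkerStartPlan& plan, int worker_index) {
  const auto until_start = std::chrono::duration_cast<std::chrono::nanoseconds>(
      plan.scheduled_start - now_sys);
  return now_mono + until_start + plan.inter_worker_offset * worker_index;
}
```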
@adisuissa does that sound good?

@adisuissa
Contributor Author

Yes, I can see how this could fulfill the requirement for synchronized start across machines.
SGTM.

@kushthedude
Contributor

Does this mean we are going to see scheduling of load tests in Nighthawk soon, @oschaaf?

oschaaf added a commit to oschaaf/nighthawk that referenced this issue Nov 16, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
dubious90 pushed a commit that referenced this issue Dec 7, 2020
Adds a new schedule option to the proto API, which allows specifying a date/time
at which execution should start.

Should also fix #569 by reducing reliance on `Envoy::SystemTime` to a minimum:
based on the value of the schedule option, a monotonic time is computed
and propagated. This makes us use `Envoy::MonotonicTime` again in places
where sub-millisecond resolution is a must-have. (The fix will need confirmation, as I
couldn't get `--config=libc++` to work for other reasons.)

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
mum4k pushed a commit that referenced this issue Dec 11, 2020
List of changes:

- Update Envoy to 8188e232a9e0b15111d30f4724cbc7bf77d3964a (Dec 8th)
- Sync .bazelrc
- Modify extensions_build_config.bzl, add required DISABLED_BY_DEFAULT_EXTENSIONS
- Rework our h1 connection pool derivation to keep our own prefetching feature working.
   Note: We enable the allow_prefetch runtime feature for this.
- Changes to reflect changed Envoy constructors and method signatures: 
   OnPoolReady(), ProdClusterManagerFactory(), allocateConnPool() 
- Modified log filtering in the integration test: `SIGTERM` -> `ENVOY_SIGTERM`.
- Dropped include of `ares.h` as small cleanup, we no longer need that.
- In `options_impl.cc` there's a small change to how we define `elapsed_since_epoch` to unbreak
   building with `libc++` (a regression of #569).
   Filed #594 to avoid future regressions.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>