Distributed caching: hazelcast_node argument prints WARNING, does nothing #1414

Closed
dfabulich opened this issue Jun 16, 2016 · 6 comments

@dfabulich
Contributor

Consider this repository: https://github.com/dfabulich/hazelcast-test

It has two targets: an artificially slow one, and another that depends on it.

genrule(
    name='x',
    outs=['x.txt'],
    cmd='sleep 5; echo hello > $@',
)

genrule(
    name='y',
    srcs=['x'],
    outs=['y.txt'],
    cmd='cp $(location x) $@',
)

tools/bazel.rc tells Bazel to use a Hazelcast node for distributed artifact caching, following the instructions in the remote module README (https://raw.githubusercontent.com/bazelbuild/bazel/79adf59e2973754c8c0415fcab45cd58c7c34697/src/main/java/com/google/devtools/build/lib/remote/README.md) and adding --genrule_strategy=remote per the instructions on issue #1412:

build --hazelcast_node=127.0.0.1:5701 --spawn_strategy=remote --genrule_strategy=remote
java -jar hazelcast-3.5.4.jar &

bazel clean && bazel build :y
bazel clean && bazel build :y

Expected: The second run should be fast, as it retrieves the built x file from the cache.
Actual: It sleeps 5s even with the cache. The log includes a warning message.

$ bazel clean && bazel build :y
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
INFO: Waiting for response from Bazel server (pid 31149)...
INFO: Found 1 target...
WARNING: Genrule Cannot instantiate remote action cache. Running locally.
WARNING: Genrule Cannot instantiate remote action cache. Running locally.
Target //:y up-to-date:
  bazel-genfiles/y.txt
INFO: Elapsed time: 13.006s, Critical Path: 5.02s

@hhclam

@dfabulich
Contributor Author

There's something weird happening here. It was working on my machine, or at least, better than this, for a few minutes. But then, I dunno, something happened, and now it always fails.

I can't even repro #1413, which I filed earlier, because this bug (#1414) blocks it.

@dfabulich
Contributor Author

Yeah, I've got two supposedly identical EC2 machines, and it works on one of them, fails on the other.

@hermione521 hermione521 added type: bug P3 We're not considering working on this, but happy to review a PR. (No assignee) category: performance labels Jun 16, 2016
@hermione521
Contributor

Also cc @philwo

@hermione521 hermione521 added under investigation and removed category: performance P3 We're not considering working on this, but happy to review a PR. (No assignee) type: bug labels Jun 16, 2016
@philwo
Member

philwo commented Jun 20, 2016

This warning means that remote caching does not work: "Cannot instantiate remote action cache. Running locally."

I'm not sure how this can happen. What's the value of your --hazelcast_node flag? Are you sure that the machine that prints this warning can connect to the host specified there? (Maybe EC2's firewall or a firewall on the host running the Hazelcast node is in the way, or maybe the Hazelcast node only listens on localhost?)

(FYI @hhclam)
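One way to answer the connectivity question from the machine that prints the warning is a plain TCP probe against the `host:port` value passed to --hazelcast_node. This is a minimal sketch using only the Python standard library; `parse_node` and `can_connect` are hypothetical helper names, and a successful connect only shows that something is listening on that port, not that it is a working Hazelcast node.

```python
import socket
from typing import Tuple


def parse_node(value: str) -> Tuple[str, int]:
    """Split a 'host:port' string such as '127.0.0.1:5701'."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % value)
    return host, int(port)


def can_connect(value: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    host, port = parse_node(value)
    try:
        # create_connection raises OSError on refusal, timeout, or
        # an unreachable host, which covers the firewall cases above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect("127.0.0.1:5701")` on both EC2 machines would show whether they differ at the network level or only inside Bazel.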

@dslomov
Contributor

dslomov commented Jun 28, 2016

@dfabulich any updates on this? Closing for now, but reply if you have more information.

@dslomov dslomov closed this as completed Jun 28, 2016
@dfabulich
Contributor Author

I've struggled to reproduce this, but I have found something odd that I think may help me to reproduce it.

When the hazelcast server is down entirely, I can reproduce the problem with the Bazel 0.3.0 deb install on Ubuntu Vivid, but not when I build the 0.3.0 tag locally.

With the official Bazel 0.3.0 deb install on Ubuntu Vivid:

Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Build timestamp: 1465558703
Build timestamp as int: 1465558703

When I run bazel build :y with https://github.com/dfabulich/hazelcast-test without launching Hazelcast at all, I see this warning in the logs:

$ bazel build :y
.
INFO: Waiting for response from Bazel server (pid 117542)...
INFO: Found 1 target...
WARNING: Genrule Cannot instantiate remote action cache. Running locally.
WARNING: Genrule Cannot instantiate remote action cache. Running locally.
Target //:y up-to-date:
  bazel-genfiles/y.txt
INFO: Elapsed time: 13.121s, Critical Path: 5.01s

To debug this, I did a local build of Bazel, like this:

$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ git checkout 0.3.0
$ git checkout -b 0.3.0
$ bazel version
Build label: 0.3.0
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 10 11:38:23 2016 (1465558703)
Build timestamp: 1465558703
Build timestamp as int: 1465558703
$ bazel build //src:bazel
$ cd bazel-bin/src
$ export PATH=`pwd`:$PATH
$ bazel version
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Jan 01 00:00:00 1970 (0)
Build timestamp: Thu Jan 01 00:00:00 1970 (0)
Build timestamp as int: 0
$ cd ../hazelcast-test
$ bazel build :y
Found non-responsive server process (pid=118475). Killing it.
.
INFO: Waiting for response from Bazel server (pid 118572)...
bazel crash in async thread:
java.lang.IllegalStateException: Unable to connect to any address in the config! The following addresses were tried:[/127.0.0.1:5701]
    at com.hazelcast.client.spi.impl.ClusterListenerSupport.connectToOne(ClusterListenerSupport.java:215)
    at com.hazelcast.client.spi.impl.ClusterListenerSupport.connectToCluster(ClusterListenerSupport.java:148)
    at com.hazelcast.client.spi.impl.ClientClusterServiceImpl.start(ClientClusterServiceImpl.java:183)
    at com.hazelcast.client.impl.HazelcastClientInstanceImpl.start(HazelcastClientInstanceImpl.java:262)
    at com.hazelcast.client.HazelcastClient.newHazelcastClient(HazelcastClient.java:86)
    at com.google.devtools.build.lib.remote.HazelcastCacheFactory.create(HazelcastCacheFactory.java:41)
    at com.google.devtools.build.lib.remote.RemoteModule.buildStarting(RemoteModule.java:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:95)
    at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:154)
    at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:80)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
    at com.google.common.eventbus.Subscriber.dispatchEvent(Subscriber.java:76)
    at com.google.common.eventbus.Dispatcher$PerThreadQueuedDispatcher.dispatch(Dispatcher.java:119)
    at com.google.common.eventbus.EventBus.post(EventBus.java:215)
    at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:155)
    at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:344)
    at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:71)
    at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:478)
    at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:316)
    at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:49)
    at com.google.devtools.build.lib.server.RPCService.executeRequest(RPCService.java:70)
    at com.google.devtools.build.lib.server.AfUnixServer.executeRequest(AfUnixServer.java:411)
    at com.google.devtools.build.lib.server.AfUnixServer.serve(AfUnixServer.java:215)
    at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:868)
    at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:650)
    at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:56)
Error: unexpected EOF from Bazel server.
Contents of '/mnt/bamboo-ebs/bazel-cache/bazel/_bazel_bamboo/6b8b6f341a63448110098a124eda529a/server/jvm.out':
java.lang.IllegalStateException: Unable to connect to any address in the config! The following addresses were tried:[/127.0.0.1:5701]
    at com.hazelcast.client.spi.impl.ClusterListenerSupport.connectToOne(ClusterListenerSupport.java:215)
    at com.hazelcast.client.spi.impl.ClusterListenerSupport.connectToCluster(ClusterListenerSupport.java:148)
    at com.hazelcast.client.spi.impl.ClientClusterServiceImpl.start(ClientClusterServiceImpl.java:183)
    at com.hazelcast.client.impl.HazelcastClientInstanceImpl.start(HazelcastClientInstanceImpl.java:262)
    at com.hazelcast.client.HazelcastClient.newHazelcastClient(HazelcastClient.java:86)
    at com.google.devtools.build.lib.remote.HazelcastCacheFactory.create(HazelcastCacheFactory.java:41)
    at com.google.devtools.build.lib.remote.RemoteModule.buildStarting(RemoteModule.java:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:95)
    at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:154)
    at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:80)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
    at com.google.common.eventbus.Subscriber.dispatchEvent(Subscriber.java:76)
    at com.google.common.eventbus.Dispatcher$PerThreadQueuedDispatcher.dispatch(Dispatcher.java:119)
    at com.google.common.eventbus.EventBus.post(EventBus.java:215)
    at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:155)
    at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:344)
    at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:71)
    at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:478)
    at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:316)
    at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:49)
    at com.google.devtools.build.lib.server.RPCService.executeRequest(RPCService.java:70)
    at com.google.devtools.build.lib.server.AfUnixServer.executeRequest(AfUnixServer.java:411)
    at com.google.devtools.build.lib.server.AfUnixServer.serve(AfUnixServer.java:215)
    at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:868)
    at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:650)
    at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:56)

Expected: When the Hazelcast node is down, both servers should respond in the same way, throwing an exception on startup.
Actual: The Bazel 0.3.0 deb build just logs a warning with no explanation of what's wrong; the local build throws a clear exception as expected/desired.
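The difference between the two behaviors can be illustrated with a small sketch. This is not Bazel's code: `open_cache`, `CacheUnavailableError`, and the `connect` callable are all hypothetical names, used only to contrast failing fast with falling back while still reporting the underlying cause.

```python
class CacheUnavailableError(RuntimeError):
    """Raised when the cache cannot be reached and fail_fast is set."""


def open_cache(connect, node, fail_fast=True, warn=print):
    """Try to open a cache client for `node` using `connect`.

    `connect` is any callable that returns a client or raises.
    With fail_fast=True the original error is re-raised with context;
    otherwise a warning that names the cause is emitted and None is
    returned so the caller can run locally.
    """
    try:
        return connect(node)
    except Exception as exc:
        message = "Cannot instantiate remote action cache at %s: %s" % (node, exc)
        if fail_fast:
            raise CacheUnavailableError(message) from exc
        warn("WARNING: %s. Running locally." % message)
        return None
```

Either mode would be easier to diagnose than the bare warning, because both include the address that was tried and the reason the connection failed.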
