-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid UUID.randomUUID() in file system related startup code #5450
base: master
Are you sure you want to change the base?
Conversation
@@ -45,7 +45,7 @@ static File setupCacheDir(String fileCacheDir) { | |||
|
|||
// the cacheDir will be suffixed a unique id to avoid eavesdropping from other processes/users | |||
// also this ensures that if process A deletes cacheDir, it won't affect process B | |||
String cacheDirName = fileCacheDir + "-" + UUID.randomUUID(); | |||
String cacheDirName = fileCacheDir + "-" + System.nanoTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanoTime is not absolute - it's relative to the process. Meaning that another application starting can have it again, without any need to be simultaneous - is it what you expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanoTime is not absolute - it's relative to the process
Correct. But we do think it can be problematic, I'm happy to use Random.getRandom()
or System.currentTimeMillis()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we use Math.random()
it get better - but is still not granted to be unique - because it still uses System::nanoTime and Random per se doesn't guarantee uniqueness across processes (try printing new Random(42).nextInt() running it twice with 2 diff processes...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we are not looking for that such strong of a guarantee here, but I'll let the maintainers be the judge of that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about an optimistic attempt? Something like (simplifying):
for(;;) {
try {
String cacheDirName = fileCacheDir + "-" + System.nanoTime();
Files.createDirectories(cacheDirName);
break;
} catch(FileAlreadyExistException ignore) {
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, you could use a random instead of System.nanoTime
, I think it would be faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@franz1981 is Random.nextLong()
faster than System.nanoTime()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope - or better - usually nanoTime (if not on the cloud with unreliable time sources) uses a thing called rdts
which is as cheap as reading a memory area
@@ -36,7 +36,7 @@ public DefaultDeploymentManager(VertxImpl vertx) { | |||
} | |||
|
|||
private String generateDeploymentID() { | |||
return UUID.randomUUID().toString(); | |||
return Long.valueOf(System.nanoTime()).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be globally unique when running in clustered mode with HA enabled
So perhaps something like:
if (vertx.isClustered() && vertx.haManager()!=null) {
return UUID.randomUUID().toString();
}
// Use a counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty common to deploy verticles concurrently. Even when Vert.x is not clustered, the returned value should be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it to use Random, is that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant incrementing an AtomicLong
counter instead of using a random value (uniqueness is guaranteed and it shouldn't change the perf results you got)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, fixed
what seems to take time is the initialization of SecureRandom.getDfaultPrng due to loading providers, I think w ecould generate a faster UUID by using a given provider |
But those are not public APIs, no? |
I think we should have a way to specify the exact cache dir (e.g. |
Sure, that would make sense for us too |
b28445c
to
7c9fc8f
Compare
I have updated the PR per suggestions |
Is there anything else you would like me to do for this one? |
For usability, it seems to me adding a boolean to the options would be enough (it's what's computed in the end to determine if a UUID should be added to the path). But it's a matter of taste so I'm fine with keeping an extra dir option if you choose so @vietj |
Is there anything more you want me to do with this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @geoand
@vietj PTAL |
🙏🏽 |
vertx-core/src/main/java/io/vertx/core/file/FileSystemOptions.java
Outdated
Show resolved
Hide resolved
@@ -27,6 +28,8 @@ public class DefaultDeploymentManager implements DeploymentManager { | |||
|
|||
public static final Logger log = LoggerFactory.getLogger(DefaultDeploymentManager.class); | |||
|
|||
private static final AtomicLong nextId = new AtomicLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea of cache dir is to avoid that, and keep the same behaviour we have, so please no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what do you propose? This was added as a response to #5450 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you confused two changes @vietj : the cache dir that used a random UUID, and the verticle id generator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, can we have the verticle id generator in another PR then ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep distinct PR for the changelog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distinct PR or commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR updated
8ba99cf
to
c749aaa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test with a real vertx instance and check those cases
- a missing dir is created
- an existing dir is reused
- an error is thrown when a non dir file exist already
Sure, I'll do that when I'm back from JFokus |
Aren't those cases already covered by the tests for |
good question, I don't know :-) |
That's what I understand from looking at FileResolverTestBase |
public void testGetTheExactCacheDirWithoutHacks() { | ||
String cacheDir = new FileResolverImpl(new FileSystemOptions().setExactFileCacheDir(cacheBaseDir + "-exact")).cacheDir(); | ||
if (cacheDir != null) { | ||
System.out.println(cacheDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
println
@@ -486,4 +488,16 @@ public void testGetTheCacheDirWithoutHacks() { | |||
} | |||
} | |||
} | |||
|
|||
@Test | |||
public void testGetTheExactCacheDirWithoutHacks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test should be moved to FileCacheTest
instead, FileResolverTestBase
tests the behaviour of resolver implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests that assesses the behaviour of creating a vertx instance when
- the cache dir already exists and is a directory (I guess it reuses the directory)
- the cache dir does not exist (it should create the missing directory)
- the cache dir string is not a valid value
- the cache dir does not exists and cannot be created, e.g. the parent path points to a file
- the cache dir exists but is not a directory, e.g. it is a file
I added all but |
This is done because bootstrapping the plumbing needed by the JDK to produce a UUID value is expensive, it thus doesn't make sense to pay this cost when the property isn't actually needed
// ensure that the argument doesn't end with separator | ||
if (fileCacheDir.endsWith(File.separator)) { | ||
fileCacheDir = fileCacheDir.substring(0, fileCacheDir.length() - File.separator.length()); | ||
} | ||
|
||
// the cacheDir will be suffixed a unique id to avoid eavesdropping from other processes/users | ||
// also this ensures that if process A deletes cacheDir, it won't affect process B | ||
String cacheDirName = fileCacheDir + "-" + UUID.randomUUID(); | ||
File cacheDir = new File(cacheDirName); | ||
File cacheDir = isEffectiveValue ? new File(fileCacheDir) : new File(fileCacheDir + "-" + UUID.randomUUID()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this change, break the comment above? when isEffectiveValue
is true, 2 vert.x instances will interfere with each other's cache. While this is probably ok for the same application, if 2 applications differ, then it could cause invalid states.
One example is (regardless if the 2 applications are the same or not) The moment the 1st terminates, it deletes the cache and would also mean it was deleted for the second, causing inconsistencies and errors.
Motivation:
This is done because bootstrapping the plumbing
needed by the JDK to produce a UUID value
is expensive, it thus doesn't make sense to
pay this cost when the property isn't actually
needed
Explain here the context, and why you're making that change, what is the problem you're trying to solve.
We are making an effort in Quarkus to improve startup time even further by eliminating various bottlenecks across the board.
The first call to
UUID.randomUUID()
is definitely heavy (as shown in the following flamegraph) and if we can avoid it a startup code (as we have in the development branch of Quarkus), it would be nice.P.S. Ideally we would like to have this in Vert.x 4 as well.