-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeouts in Testing involving Worker Manager and Async Operations #178
Comments
This seems to be related to issues in PolykeyAgent tests. Issues in this cause issues elsewhere. |
@scottmmorris can you add in your notes regarding what the problem is here too. And what you have tried, as well as @DrFacepalm to add in his suggestions. There's a nuclear option to just test the domain individually:
But there may be a way to fix jest properly to be able to do a full Try search on jest docs/configuration/issues about problems involving lots of asynchronous tests and timeout errors.
Also check if there are async operations you are forgetting to await for: function f() {
await asyncf();
doSomethingAsync(); // <- PROBLEM!
await didSomethingelse();
} |
Try to find the deadlocks or where 2 test files/test domains are affecting each other's performance. Also have a read about the jest process architecture. |
I am going to try and restart my laptop and run these tests again because it seems as if something from the 2nd or 3rd tests has caused failures in subsequent tests. |
Restarting the laptop did not fix anything, running the bin tests by themselves resulted in failure. It seems like pinpointing where the source of the async test conflict might be very difficult. I might start to look into issues on jest docs. |
Working on logging start and finishing for each test suite and realised that for the GitBackend tests the before and after each hooks were not inside the describe wrapper. I read somewhere in the jest issues that this condition has sometimes lead to similar async issues. Will continue to update for the logging and report on the progress. EDIT: No difference after changing the 'before/after' hooks but that was a bit hopeful anyway |
So after locking through the configuration file here I found a config option called |
For me to figure this out, I'm going to to have to start off from the domains that I know well. That means:
So I'm going to try get all these tests running at the same time. |
I don't think we are using jest mocks anywhere as far as I know so I don't think this will affect anything. |
Ok here's the sanity test that proves what By default Jest configures this timeout to be If you have a test that looks like this (using test.only('somedumbthing', async () => {
await sleep(6000);
}); This will result in an timeout error that looks like:
Therefore the timeout here does not affect the total runtime of testing, but just the time it takes to run 1 asynchronous test. So if 2 asynchronous tests take 4 seconds each, that's fine. The default 5 seconds is a bit low, so we raise to 10 seconds. However it is possible to set the timeout on a per-test basis if we believe that a specific test will take longer than usual. |
To give specific tests a specific timeout that is not the default, use the third parameter of the Example given here: jestjs/jest#5055 (comment)
|
So the first thing I'm going to do is to reset If their test requires more time, either break it up into smaller tests, and if not possible, then use a third parameter to specify a longer length of time. |
Each test file should be affecting each other's timeout... that's the other important thing. |
So for example above, all the time is within 5 seconds. I reckon a rule of thumb. If a test breaks the 5 seconds limit, then it is likely to double and go over 10 seconds. In that case, give it a 20 second timer to complete. So we will need to do that for all tests! |
The other test I had to do is to see if test files are run in parallel, while tests within a test file are run serially. This sanity check proves this is the case.
The way I did did this is by adding Running Showed that Then for each test inside the test file, they were run serially. So in terms of test interference, it's possible that test files running in parallel will slowdown the execution of each test which could change the runtime performance which could affect the timeout issues. One way to change this is to use the option |
This:
Demonstrates that using |
Note that:
This means jest uses by default the number of workers equal to your core count minus 1. So on my big laptop that's |
I have a hunch that the most troublesome tests are going to be ones involving the networking tests using utp-native. |
I've noticed that when running more tests, each individual test can double in the time taken to execute. For example:
The initial run I had:
The last run I did:
Therefore the addition of keys domain into the testing is so heavy that almost all other tests when run in parallel start to double in their execution time. |
So based on the rule of thumb I can see that a few tests require special time doubling as now they have hit 5 second mark.
These are the ones to double:
|
Ok I've made a realization. Our symmetric encryption and decryption is incredibly slow. This significantly slows all DB access including get and put operations. This requires further investigation since this affects ALL domains. For example:
Takes at least 2 seconds to execute... This few get, put and delete operations should be ALOT faster than 2 seconds. |
We seem to be having problems with the After some digging I found inside // default timeout per test
// some tests may take longer in which case you should specify the timeout
// explicitly for each test by using the third parameter of test function
jest.setTimeout(global.defaultTimeout); this seems to be overriding the timeout parameter set for each test. I don't know whats going on here since the timeout parameter seemed to be working before. Do you know what is happening here? @CMCDragonkai |
Look at the tests I did above, that worked fine. And there was no such problem as you mention. |
Make sure you're own the same dependency as we are. Maybe the change in pkg.nix has changed things. |
The tests above don't include any bin tests which usually exceed the default timeout of 10000ms. |
Pretty sure I did hit the timeout even without the bin tests. That's why
I added in some overrides for specific test cases that did it...
Can you reproduce in a smaller example like demo lib where the per-test
timeouts are being overridden by the global timeout setting?
…On 8/6/21 2:25 PM, Brian Botha wrote:
The tests above don't include any bin tests which usually exceed the
default timeout of 10000ms.
I'm not sure the tests above would've revealed this issue.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#178 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAE4OHKEXF6JMFEDFO7KXL3T3NP27ANCNFSM452EL5TQ>.
|
I'll look at that in a minuet. just finishing off some small changes for |
Do a sanity check on this with a single setTimeout test. And then fiddle with the parameters to clarify the behaviour here. |
To optimise the test cases:
|
Also try tuning the networking delay for the punch packets for optimising the startup of forward and reverse proxy during testing. |
To add to the discussion, there's 2 indicators that the jest test timeout parameter isn't working as expected:
However, I believe @tegefaulkes also tested the case of setting the global as a larger value and the passed parameter as a smaller value, and it still didn't override (at least in the error message). |
Similarly, trying to test the import type { NodeId, NodeAddress } from '@/nodes/types';
import os from 'os';
import path from 'path';
import fs from 'fs';
import Logger, { LogLevel, StreamHandler } from '@matrixai/logger';
import * as utils from './utils';
import * as testUtils from './utils';
import { PolykeyAgent } from '@';
const logger = new Logger('pkWithStdio Test', LogLevel.WARN, [
new StreamHandler(),
]);
let senderDataDir: string, receiverDataDir: string;
let senderNodePath: string, receiverNodePath: string;
let senderPasswordFile: string, receiverPasswordFile: string;
let senderPolykeyAgent: PolykeyAgent, receiverPolykeyAgent: PolykeyAgent;
let senderNodeId: NodeId, receiverNodeId: NodeId;
function genCommandsSender(options: Array<string>) {
return ['notifications', ...options, '-np', senderNodePath];
}
function genCommandsReceiver(options: Array<string>) {
return ['notifications', ...options, '-np', receiverNodePath];
}
describe('CLI Notifications', () => {
beforeEach(async () => {
senderDataDir = await fs.promises.mkdtemp(
path.join(os.tmpdir(), 'polykey-test-'),
);
receiverDataDir = await fs.promises.mkdtemp(
path.join(os.tmpdir(), 'polykey-test-'),
);
senderNodePath = path.join(senderDataDir, 'sender');
receiverNodePath = path.join(receiverDataDir, 'receiver');
senderPasswordFile = path.join(senderDataDir, 'passwordFile');
receiverPasswordFile = path.join(senderDataDir, 'passwordFile');
await fs.promises.writeFile(senderPasswordFile, 'password');
await fs.promises.writeFile(receiverPasswordFile, 'password');
senderPolykeyAgent = new PolykeyAgent({
nodePath: senderNodePath,
logger: logger,
});
receiverPolykeyAgent = new PolykeyAgent({
nodePath: receiverNodePath,
logger: logger,
});
await senderPolykeyAgent.start({
password: 'password',
});
await receiverPolykeyAgent.start({
password: 'password',
});
senderNodeId = senderPolykeyAgent.nodes.getNodeId();
receiverNodeId = receiverPolykeyAgent.nodes.getNodeId();
await senderPolykeyAgent.nodes.setNode(receiverNodeId, {
ip: receiverPolykeyAgent.revProxy.getIngressHost(),
port: receiverPolykeyAgent.revProxy.getIngressPort(),
} as NodeAddress);
// Authorize session
await utils.pk([
'agent',
'unlock',
'-np',
senderNodePath,
'--password-file',
senderPasswordFile,
]);
await utils.pk([
'agent',
'unlock',
'-np',
receiverNodePath,
'--password-file',
receiverPasswordFile,
]);
await receiverPolykeyAgent.notifications.clearNotifications();
});
afterEach(async () => {
await senderPolykeyAgent.stop();
await receiverPolykeyAgent.stop();
await fs.promises.rmdir(senderDataDir, { recursive: true });
await fs.promises.rmdir(receiverDataDir, { recursive: true });
});
describe('commandSendNotification', () => {
test('Should send notification with permission.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const commands = genCommandsSender(['send', receiverNodeId, 'msg']);
const result = await testUtils.pkWithStdio(commands);
expect(result.code).toBe(0); // Succeeds
expect(result.stdout).toContain('msg');
const notifications =
await receiverPolykeyAgent.notifications.readNotifications();
expect(notifications).toContain('msg');
}, global.defaultTimeout * 5);
test('Should send notification without permission.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {},
vaults: {},
});
const commands = genCommandsSender(['send', receiverNodeId, 'msg']);
const result = await testUtils.pkWithStdio(commands);
expect(result.code).toBe(0); // Succeeds
expect(result.stdout).toContain('msg');
const notifications =
await receiverPolykeyAgent.notifications.readNotifications();
expect(notifications).toEqual([]); // Notification should be sent but not received
}, global.defaultTimeout * 5);
});
describe('commandReadNotifications', () => {
test('Should read all notifications.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const senderCommands1 = genCommandsSender([
'send',
receiverNodeId,
'msg1',
]);
const senderCommands2 = genCommandsSender([
'send',
receiverNodeId,
'msg2',
]);
const senderCommands3 = genCommandsSender([
'send',
receiverNodeId,
'msg3',
]);
await testUtils.pkWithStdio(senderCommands1);
await testUtils.pkWithStdio(senderCommands2);
await testUtils.pkWithStdio(senderCommands3);
const receiverCommands = genCommandsReceiver(['read']);
const result1 = await testUtils.pkWithStdio(receiverCommands);
expect(result1.code).toBe(0);
expect(result1.stdout).toContain('msg1');
expect(result1.stdout).toContain('msg2');
expect(result1.stdout).toContain('msg3');
const senderCommands4 = genCommandsSender([
'send',
receiverNodeId,
'msg4',
]);
await testUtils.pkWithStdio(senderCommands4);
const result2 = await testUtils.pkWithStdio(receiverCommands);
expect(result2.code).toBe(0);
expect(result2.stdout).toContain('msg1');
expect(result2.stdout).toContain('msg2');
expect(result2.stdout).toContain('msg3');
expect(result2.stdout).toContain('msg4');
}, global.defaultTimeout * 5);
test('Should read all unread notifications.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const senderCommands1 = genCommandsSender([
'send',
receiverNodeId,
'msg1',
]);
const senderCommands2 = genCommandsSender([
'send',
receiverNodeId,
'msg2',
]);
const senderCommands3 = genCommandsSender([
'send',
receiverNodeId,
'msg3',
]);
await testUtils.pkWithStdio(senderCommands1);
await testUtils.pkWithStdio(senderCommands2);
await testUtils.pkWithStdio(senderCommands3);
const receiverCommands1 = genCommandsReceiver(['read']);
await testUtils.pkWithStdio(receiverCommands1);
const senderCommands4 = genCommandsSender([
'send',
receiverNodeId,
'msg4',
]);
await testUtils.pkWithStdio(senderCommands4);
const receiverCommands2 = genCommandsReceiver(['read', '--unread']);
const result = await testUtils.pkWithStdio(receiverCommands2);
expect(result.code).toBe(0);
expect(result.stdout).not.toContain('msg1'); // previously read notifications should be ignored
expect(result.stdout).not.toContain('msg2');
expect(result.stdout).not.toContain('msg3');
expect(result.stdout).toContain('msg4');
}, global.defaultTimeout * 5);
test('Should read a fixed number of notifications.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const senderCommands1 = genCommandsSender([
'send',
receiverNodeId,
'msg1',
]);
const senderCommands2 = genCommandsSender([
'send',
receiverNodeId,
'msg2',
]);
const senderCommands3 = genCommandsSender([
'send',
receiverNodeId,
'msg3',
]);
await testUtils.pkWithStdio(senderCommands1);
await testUtils.pkWithStdio(senderCommands2);
await testUtils.pkWithStdio(senderCommands3);
const receiverCommands = genCommandsReceiver(['read', '-n', '2']);
const result = await testUtils.pkWithStdio(receiverCommands);
expect(result.code).toBe(0);
expect(result.stdout).not.toContain('msg1'); // oldest notification not included
expect(result.stdout).toContain('msg2');
expect(result.stdout).toContain('msg3');
}, global.defaultTimeout * 5);
test('Should read notifications in reverse order.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const senderCommands1 = genCommandsSender([
'send',
receiverNodeId,
'msg1',
]);
const senderCommands2 = genCommandsSender([
'send',
receiverNodeId,
'msg2',
]);
const senderCommands3 = genCommandsSender([
'send',
receiverNodeId,
'msg3',
]);
await testUtils.pkWithStdio(senderCommands1);
await testUtils.pkWithStdio(senderCommands2);
await testUtils.pkWithStdio(senderCommands3);
const receiverCommands = genCommandsReceiver([
'read',
'-u',
'true',
'-n',
'1',
'-o',
'oldest',
]);
const result1 = await testUtils.pkWithStdio(receiverCommands);
const result2 = await testUtils.pkWithStdio(receiverCommands);
const result3 = await testUtils.pkWithStdio(receiverCommands);
expect(result1.code).toBe(0);
expect(result2.code).toBe(0);
expect(result3.code).toBe(0);
expect(result1.stdout).toContain('msg1');
expect(result2.stdout).toContain('msg2');
expect(result3.stdout).toContain('msg3');
}, global.defaultTimeout * 5);
test('Should read no notifications.', async () => {
const receiverCommands = genCommandsReceiver(['read']);
const result = await testUtils.pkWithStdio(receiverCommands);
expect(result.code).toBe(0);
expect(result.stdout).toEqual('No notifications to display\n');
}, global.defaultTimeout * 5);
});
describe('commandClearNotifications', () => {
test('Should remove all notifications.', async () => {
await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
gestalt: {
notify: null,
},
vaults: {},
});
const senderCommands1 = genCommandsSender([
'send',
receiverNodeId,
'msg1',
]);
const senderCommands2 = genCommandsSender([
'send',
receiverNodeId,
'msg2',
]);
const senderCommands3 = genCommandsSender([
'send',
receiverNodeId,
'msg3',
]);
await testUtils.pkWithStdio(senderCommands1);
await testUtils.pkWithStdio(senderCommands2);
await testUtils.pkWithStdio(senderCommands3);
const receiverCommandsClear = genCommandsReceiver(['clear']);
const receiverCommandsRead = genCommandsReceiver(['read']);
await testUtils.pkWithStdio(receiverCommandsClear);
const result = await testUtils.pkWithStdio(receiverCommandsRead);
expect(result.code).toBe(0);
expect(result.stdout).toEqual('No notifications to display\n'); // should be no notifications left
}, global.defaultTimeout * 5);
});
}); But calling |
Small sanity tests: For the following sleep program (6000 ms sleep): import { sleep } from '@/utils';
describe('Timeouts', () => {
test('sleep', async () => {
await sleep(6000);
})
});
For the following sleep program (6000 ms sleep), with a custom 1000 ms timeout: import { sleep } from '@/utils';
describe('Timeouts', () => {
test('sleep', async () => {
await sleep(6000);
}, 1000)
});
For the following sleep program, with the global timeout set as 500: import { sleep } from '@/utils';
describe('Timeouts', () => {
test('sleep', async () => {
await sleep(6000);
}, 2000)
}); import path from 'path';
declare global {
namespace NodeJS {
interface Global {
projectDir: string;
testDir: string;
defaultTimeout: number;
}
}
}
global.projectDir = path.join(__dirname, '../');
global.testDir = __dirname;
global.defaultTimeout = 500;
So therefore, the parameter is overriding the global, even with a larger value. Similarly, without changing the global timeout from 500ms, we can do this with a larger passed timeout to make this test pass: import { sleep } from '@/utils';
describe('Timeouts', () => {
test('sleep', async () => {
await sleep(6000);
}, 10000)
});
|
Alright, I think I've figured this out. After trying to find a minimal example to write up an issue upstream, I built the following test file, just to see what outputs we'd get in the console: // global timeout = 1000
import { sleep } from '@/utils';
beforeEach(async() => {
await sleep(5000);
})
describe('Hello', () => {
test('time out?', async () => {
await sleep(5000);
}, 8000);
}); By passing the timeout of 8000 to the test function, we expect this to override the global timeout of 1000ms. And it does do this. However, the test still fails:
But why does the console specify that the test timed out as a result of the 1000 ms global timeout? It's not the test function that's timing out - it's the import { sleep } from '@/utils';
beforeEach(async() => {
await sleep(5000);
}, 2000)
describe('Hello', () => {
test('time out?', async () => {
await sleep(5000);
}, 8000);
}); We get the following:
Changing it to 8000:
|
Therefore, the solution for our case is to also provide this timeout parameter for any Alternatively, we increase this default timeout to some large value to not have to think about this. |
I did some testing with optimizing the tests. I started with looking at the
after applying the optimisations;
The tests ran more than 10 faster..
Using this method should be mostly safe against tests affecting each other. |
This is great.
@tegefaulkes Are you just doing a full DB clear to achieve this? |
I just removed any possible permissions using. afterEach(async () => {
//This handles the cheap teardown between tests.
//Clean up any dangling permissions.
await polykeyAgent.gestalts.unsetGestaltActionByNode(node1.id, 'notify');
await polykeyAgent.gestalts.unsetGestaltActionByNode(node1.id, 'scan');
await polykeyAgent.gestalts.unsetGestaltActionByNode(node2.id, 'notify');
await polykeyAgent.gestalts.unsetGestaltActionByNode(node2.id, 'scan');
await polykeyAgent.gestalts.unsetGestaltActionByNode(node3.id, 'notify');
await polykeyAgent.gestalts.unsetGestaltActionByNode(node3.id, 'scan');
await polykeyAgent.gestalts.unsetGestaltActionByIdentity(
identity1.providerId,
identity1.identityId,
'notify',
); Is there a method in the DB to clear everything? |
I believe there's a |
If this is the case, might be worthwhile to introduce a utils function that recursively goes to the appropriate sublevels and clears them. |
it would be useful to have functions that clears certain aspects of the databases.
|
Yes clear clears all sublevels. Go to the EFS MR. And see the new DB tests there. It demonstrates how it works. |
However it's not guaranteed that clear is atomic. So when I need atomicity I make sure to generate ops for deletion instead. |
So if you do a clear from the root level, it will clear everything sublevel. That's how fresh works. |
!202 has me merged, closing issue. |
Running
npm run test
causes these errors to occur, which causes failures in unrelated modules. The failing test suites fun without error when run in isolation however.From what I can tell, it seems to be related to the
WorkerManager
@CMCDragonkaiThe text was updated successfully, but these errors were encountered: