
Timeouts in Testing involving Worker Manager and Async Operations #178

Closed
DrFacepalm opened this issue May 31, 2021 · 57 comments
Assignees
Labels
bug Something isn't working

Comments

@DrFacepalm
Contributor

DrFacepalm commented May 31, 2021

Running npm run test causes these errors to occur, which causes failures in unrelated modules. However, the failing test suites run without error when run in isolation.

From what I can tell, it seems to be related to the WorkerManager @CMCDragonkai

A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --detectOpenHandles to find leaks.
Summary of all failing tests
 FAIL  tests/PolykeyAgent.test.ts (26.341 s)
  ● Polykey › async start constructs node path

    Timeout: Did not receive an init message from worker after 10000ms. Make sure the worker calls expose().

      at Timeout._onTimeout (node_modules/threads/dist/master/spawn.js:35:53)

  ● Polykey › async stop leaves the node path

    Timeout: Did not receive an init message from worker after 10000ms. Make sure the worker calls expose().

      at Timeout._onTimeout (node_modules/threads/dist/master/spawn.js:35:53)

 FAIL  tests/bin/polykey.test.ts (41.051 s)
  ● CLI echoes › should echo

    Timeout: Did not receive an init message from worker after 10000ms. Make sure the worker calls expose().

      at Timeout._onTimeout (node_modules/threads/dist/master/spawn.js:35:53)
@DrFacepalm
Contributor Author

This seems to be related to issues in the PolykeyAgent tests. Issues there cause issues elsewhere.

@CMCDragonkai CMCDragonkai added the bug Something isn't working label Jul 3, 2021
@CMCDragonkai CMCDragonkai added this to the Polykey CLI 1.0.0 Release milestone Jul 3, 2021
@CMCDragonkai CMCDragonkai changed the title Worker Manager Causes test failures. Timeouts in Testing involving Worker Manager and Async Operations Jul 9, 2021
@CMCDragonkai
Member

@scottmmorris can you add in your notes regarding what the problem is here too. And what you have tried, as well as @DrFacepalm to add in his suggestions.

There's a nuclear option to just test the domain individually:

npm test -- tests/bin
npm test -- tests/agent

But there may be a way to configure jest properly so that a full npm run test works with no problems.

Try searching the jest docs/configuration/issues for problems involving lots of asynchronous tests and timeout errors.

  • "too many tests"
  • "running out timeout, many asynchronous"

Also check if there are async operations you are forgetting to await:

async function f() {
    await asyncf();
    doSomethingAsync(); // <- PROBLEM! missing await
    await didSomethingelse();
}

@CMCDragonkai
Member

Try to find the deadlocks or where 2 test files/test domains are affecting each other's performance.

Also have a read about the jest process architecture.

@scottmmorris
Contributor

npm run test -- ./tests/bin - PASSED
npm run test -- ./tests/bin ./tests/keys - PASSED
npm run test -- ./tests/bin ./tests/nodes - FAILED
npm run test -- ./tests/bin ./tests/keys - FAILED
npm run test -- ./tests/bin - FAILED

I am going to try and restart my laptop and run these tests again because it seems as if something from the 2nd or 3rd tests has caused failures in subsequent tests.

@scottmmorris
Contributor

scottmmorris commented Jul 9, 2021

Restarting the laptop did not fix anything; running the bin tests by themselves resulted in failure. It seems like pinpointing the source of the async test conflict might be very difficult. I might start to look into issues on the jest docs.

@scottmmorris
Contributor

[screenshot attached]

This might be useful if we decide to go with the 'nuclear' option.

@scottmmorris
Contributor

scottmmorris commented Jul 9, 2021

Working on logging the start and finish of each test suite, and realised that for the GitBackend tests the before and after each hooks were not inside the describe wrapper. I read somewhere in the jest issues that this condition has sometimes led to similar async issues. Will continue to update on the logging and report on the progress.

EDIT: No difference after changing the 'before/after' hooks but that was a bit hopeful anyway

@scottmmorris
Contributor

scottmmorris commented Jul 9, 2021

So after looking through the configuration file here I found a config option called clearMocks. I remember reading somewhere else about people running into the same issue as us and the problem was that they were not clearing their mocks properly. Initial tests (without the CLI) succeeded but with the CLI added back in the errors came back. The other config option that I tried was resetModules, which didn't make a difference either.
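For reference, these options are set in the Jest configuration; a minimal sketch, assuming a jest.config.js style setup (this repo's actual config file and values may differ):

```javascript
// Hypothetical jest.config.js fragment; not this repo's actual config.
module.exports = {
  // Automatically clear mock calls and instances before every test
  clearMocks: true,
  // Reset the module registry before each individual test
  resetModules: true,
};
```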

@CMCDragonkai
Member

For me to figure this out, I'm going to have to start off from the domains that I know well.

That means:

  • acl
  • db
  • keys
  • grpc
  • network
  • gestalts

So I'm going to try to get all these tests running at the same time.

@CMCDragonkai
Member

So after looking through the configuration file here I found a config option called clearMocks. I remember reading somewhere else about people running into the same issue as us and the problem was that they were not clearing their mocks properly. Initial tests (without the CLI) succeeded but with the CLI added back in the errors came back. The other config option that I tried was resetModules, which didn't make a difference either.

I don't think we are using jest mocks anywhere as far as I know so I don't think this will affect anything.

@CMCDragonkai
Member

Ok here's the sanity test that proves what jest.setTimeout(...); actually does.

By default Jest configures this timeout to be 5000 which means 5 seconds.

If you have a test that looks like this (using sleep from utils):

  test.only('somedumbthing', async () => {
    await sleep(6000);
  });

This will result in a timeout error that looks like:

    : Timeout - Async callback was not invoked within the 5000 ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 5000 ms timeout specified by jest.setTimeout.Error:

Therefore the timeout here does not affect the total runtime of testing, but just the time it takes to run 1 asynchronous test. So if 2 asynchronous tests take 4 seconds each, that's fine.

The default 5 seconds is a bit low, so we raise it to 10 seconds.

However it is possible to set the timeout on a per-test basis if we believe that a specific test will take longer than usual.
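Conceptually, a per-test timeout behaves like racing the test body against a timer. A minimal sketch of that idea (an illustration only, NOT Jest's actual implementation; `withTimeout` and `sleep` are hypothetical helpers):

```javascript
// Conceptual sketch: race a promise against a timer, the way a
// per-test timeout conceptually works. Hypothetical helper, not Jest code.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timeout after ${ms} ms`)),
      ms,
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A task faster than the timeout resolves normally; a slower one rejects.
withTimeout(sleep(100), 500)
  .then(() => console.log('fast task: ok'))
  .then(() => withTimeout(sleep(600), 500))
  .catch((e) => console.log('slow task:', e.message));
```

This is why two 4-second tests pass under a 5-second timeout: each test gets its own race against the timer.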

@CMCDragonkai
Member

To give specific tests a specific timeout that is not the default, use the third parameter of the test function.

Example given here: jestjs/jest#5055 (comment)

test('example', async () => {
  await new Promise(resolve => setTimeout(resolve, 1000));
}, 500);

@CMCDragonkai
Member

So the first thing I'm going to do is to reset jest.setTimeout to using 10000 by default. Therefore we must make the whole team realize that all async tests have a timeout of 10 seconds.

If their test requires more time, either break it up into smaller tests, and if not possible, then use a third parameter to specify a longer length of time.

@CMCDragonkai
Member

Each test file should not be affecting each other's timeout... that's the other important thing.

@CMCDragonkai
Member

CMCDragonkai commented Jul 11, 2021

[nix-shell:~/Projects/js-polykey]$ npm test -- tests/acl

> @matrixai/polykey@0.0.41 test /home/cmcdragonkai/Projects/js-polykey
> jest "tests/acl"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/acl/utils.test.ts
  utils
    ✓ merging permissions (3 ms)

 PASS  tests/acl/ACL.test.ts (14.25 s)
  ACL
    ✓ trust and untrust gestalts (1623 ms)
    ✓ setting and unsetting vault actions (1429 ms)
    ✓ joining existing gestalt permissions (2857 ms)
    ✓ joining existing vault permisisons (1824 ms)
    ✓ node removal (3085 ms)
    ✓ vault removal (1044 ms)
    ✓ transactional operations (1672 ms)

Test Suites: 2 passed, 2 total
Tests:       8 passed, 8 total
Snapshots:   0 total
Time:        14.521 s, estimated 16 s
Ran all test suites matching /tests\/acl/i.
GLOBAL TEARDOWN

So in the example above, each test completes within 5 seconds. I reckon that's a good rule of thumb.

If a test breaks the 5 second limit, then it is likely to double and go over 10 seconds.

In that case, give it a 20 second timer to complete.

So we will need to do that for all tests!

@CMCDragonkai
Member

The other test I had to do was to see if test files are run in parallel, while tests within a test file are run serially.

This sanity check proves this is the case.

WARN:ACL Test:FIRST
WARN:GestaltGraph Test:FIRST
WARN:ACL Test:SECOND
WARN:GestaltGraph Test:SECOND
WARN:ACL Test:LAST
WARN:GestaltGraph Test:LAST

The way I did this is by adding logger.warn('...'); to each test in acl and gestalts.

Running npm test -- tests/acl tests/gestalts showed that all the test files within these directories were run in parallel.

Then for each test inside the test file, they were run serially.

So in terms of test interference, it's possible that test files running in parallel slow down the execution of each test, which could change runtime performance and thus contribute to the timeout issues.

One way to change this is to use the --runInBand option, which makes Jest run even separate test files serially.

@CMCDragonkai
Member

This:

> npm test -- --runInBand tests/acl tests/gestalts

WARN:GestaltGraph Test:FIRST
WARN:GestaltGraph Test:SECOND
WARN:GestaltGraph Test:LAST
WARN:ACL Test:FIRST
WARN:ACL Test:SECOND
WARN:ACL Test:LAST

Demonstrates that with --runInBand all the test files are run sequentially. This can be important in case different processes are interfering with each other, but that should not currently be happening. Only CPU time can be a problem, especially in CI/CD scenarios. In that case it might be better for our GitLab CI/CD to use --runInBand.

@CMCDragonkai
Member

Note that:

Alias: -w. Specifies the maximum number of workers the worker-pool will spawn for running tests. In single run mode, this defaults to the number of the cores available on your machine minus one for the main thread. In watch mode, this defaults to half of the available cores on your machine to ensure Jest is unobtrusive and does not grind your machine to a halt. It may be useful to adjust this in resource limited environments like CIs but the defaults should be adequate for most use-cases.

This means jest by default uses a number of workers equal to your core count minus 1. On my big laptop nproc is 8, so the jest worker pool would be 7, and by default only 7 test files can run in parallel. Therefore, assuming we have similar CPUs between CI/CD, my laptop and all the devs in Matrix AI, we shouldn't have too much variance in CPU execution time for each test. Maybe +-30% of the test time. So the doubling rule of thumb above should be sufficient.
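The worker count can be capped either on the CLI (npm test -- --maxWorkers=4) or in the Jest configuration; a minimal sketch, assuming a jest.config.js style setup (hypothetical, not this repo's actual config):

```javascript
// Hypothetical jest.config.js fragment; not this repo's actual config.
module.exports = {
  // Cap the worker pool, e.g. for resource-limited CI runners.
  // Accepts a number or a percentage of cores, e.g. '50%'.
  maxWorkers: 4,
};
```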

@CMCDragonkai
Member

I have a hunch that the most troublesome tests are going to be ones involving the networking tests using utp-native.

@CMCDragonkai
Member

I've noticed that when running more tests, each individual test can double in the time taken to execute. For example:

 PASS  tests/acl/ACL.test.ts (11.913 s)
  ACL
    ✓ trust and untrust gestalts (1703 ms)
    ✓ setting and unsetting vault actions (1229 ms)
    ✓ joining existing gestalt permissions (1926 ms)
    ✓ joining existing vault permisisons (2180 ms)
    ✓ node removal (1532 ms)
    ✓ vault removal (1508 ms)
    ✓ transactional operations (1709 ms)
 PASS  tests/acl/ACL.test.ts (22.147 s)
  ACL
    ✓ trust and untrust gestalts (2043 ms)
    ✓ setting and unsetting vault actions (2199 ms)
    ✓ joining existing gestalt permissions (3612 ms)
    ✓ joining existing vault permisisons (3791 ms)
    ✓ node removal (2650 ms)
    ✓ vault removal (3973 ms)
    ✓ transactional operations (2822 ms)

The initial run I had:

npm test -- --runInBand tests/acl tests/gestalts

The last run I did:

npm test -- tests/acl tests/gestalts tests/db tests/keys

Therefore the addition of the keys domain into the testing is so heavy that almost all other tests run in parallel start to double in execution time.

@CMCDragonkai
Member

So based on the rule of thumb, I can see that a few tests require special time doubling, as they have now hit the 5 second mark.

[nix-shell:~/Projects/js-polykey]$ npm test -- tests/acl tests/gestalts tests/db tests/keys

> @matrixai/polykey@0.0.41 test /home/cmcdragonkai/Projects/js-polykey
> jest "tests/acl" "tests/gestalts" "tests/db" "tests/keys"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/acl/utils.test.ts
  utils
    ✓ merging permissions (4 ms)

 PASS  tests/keys/utils.test.ts (10.413 s)
  utils
    ✓ key pair copy (850 ms)
    ✓ to and from der encoding (1219 ms)
    ✓ certificate copy (2194 ms)
    ✓ encryption and decryption of private key (5246 ms)

 PASS  tests/db/DB.test.ts (17.774 s)
  DB
    ✓ construction has no side effects (3105 ms)
    ✓ async start constructs the db leveldb (1892 ms)
    ✓ start and stop preserves the db key (2487 ms)
    ✓ get and put and del (3272 ms)
    ✓ db levels are leveldbs (2802 ms)
    ✓ clearing a level clears all sublevels (3189 ms)

 PASS  tests/acl/ACL.test.ts (22.147 s)
  ACL
    ✓ trust and untrust gestalts (2043 ms)
    ✓ setting and unsetting vault actions (2199 ms)
    ✓ joining existing gestalt permissions (3612 ms)
    ✓ joining existing vault permisisons (3791 ms)
    ✓ node removal (2650 ms)
    ✓ vault removal (3973 ms)
    ✓ transactional operations (2822 ms)

 PASS  tests/gestalts/GestaltGraph.test.ts (43.62 s)
  GestaltGraph
    ✓ get, set and unset node (1914 ms)
    ✓ get, set and unset identity (2598 ms)
    ✓ setting independent node and identity gestalts (2148 ms)
    ✓ start and stop preserves state (4339 ms)
    ✓ link node to node (1367 ms)
    ✓ link node to identity (3547 ms)
    ✓ link node to node and identity (2201 ms)
    ✓ getting all gestalts (1052 ms)
    ✓ new node gestalts creates a new acl record (3765 ms)
    ✓ new identity gestalts does not create a new acl record (1536 ms)
    ✓ set and unset gestalt actions (1214 ms)
    ✓ linking 2 new nodes results in a merged permission (2004 ms)
    ✓ linking 2 existing nodes results in a merged permission (1227 ms)
    ✓ link existing node to new node (1804 ms)
    ✓ linking new node and new identity results in a merged permission (1534 ms)
    ✓ linking existing node and existing identity results in merged permission (1620 ms)
    ✓ link existing node to new identity (1276 ms)
    ✓ link new node to existing identity (1273 ms)
    ✓ splitting node and node results in split inherited permissions (1361 ms)
    ✓ splitting node and identity results in split inherited permissions unless the identity is a loner (2482 ms)
    ✓ removing a gestalt removes the permission (2178 ms)

 PASS  tests/keys/KeyManager.test.ts (50.876 s)
  KeyManager
    ✓ construction has no side effects (4 ms)
    ✓ async start constructs root key pair and root cert and root certs (4399 ms)
    ✓ can get root key pair and root cert (2503 ms)
    ✓ uses WorkerManager for generating root key pair (4774 ms)
    ✓ encrypting and decrypting with root key (1331 ms)
    ✓ uses WorkerManager for encryption and decryption with root key (4253 ms)
    ✓ encrypting beyond maximum size (1722 ms)
    ✓ signing and verifying with root key (2078 ms)
    ✓ uses WorkerManager for signing and verifying with root key (4406 ms)
    ✓ can change root key password (2790 ms)
    ✓ can reset root certificate (3656 ms)
    ✓ can reset root key pair (4793 ms)
    ✓ can renew root key pair (4840 ms)
    ✓ order of certificate chain should be leaf to root (8418 ms)

Test Suites: 6 passed, 6 total
Tests:       53 passed, 53 total
Snapshots:   0 total
Time:        51.249 s, estimated 56 s
Ran all test suites matching /tests\/acl|tests\/gestalts|tests\/db|tests\/keys/i.
GLOBAL TEARDOWN

These are the ones to double:

✓ encryption and decryption of private key (5246 ms)
✓ order of certificate chain should be leaf to root (8418 ms)

@CMCDragonkai
Member

Ok I've made a realization. Our symmetric encryption and decryption is incredibly slow. This significantly slows all DB access including get and put operations. This requires further investigation since this affects ALL domains.

For example:

  test('get and put and del', async () => {
    const dbPath = `${dataDir}/db`;
    const db = new DB({ dbPath, logger });
    await db.start({
      keyPair: keyManager.getRootKeyPair()
    });
    await db.put([], 'a', 'value0');
    expect(await db.get([], 'a')).toBe('value0');
    await db.del([], 'a');
    expect(await db.get([], 'a')).toBeUndefined();
    await db.level<string>('level1');
    await db.put(['level1'], 'a', 'value1');
    expect(await db.get(['level1'], 'a')).toBe('value1');
    await db.del(['level1'], 'a');
    expect(await db.get(['level1'], 'a')).toBeUndefined();
    await db.stop();
  }, global.defaultTimeout * 2);

Takes at least 2 seconds to execute... These few get, put and del operations should be a LOT faster than 2 seconds.

@tegefaulkes
Contributor

We seem to be having problems with global.defaultTimeout. When I run the tests they seem to time out after 10000ms, regardless of the timeout parameter I set for each test.

After some digging I found inside setupAfterEnv.ts

// default timeout per test
// some tests may take longer in which case you should specify the timeout
// explicitly for each test by using the third parameter of test function
jest.setTimeout(global.defaultTimeout);

this seems to be overriding the timeout parameter set for each test.
If I remove the jest.setTimeout(global.defaultTimeout) the timeout defaults to 5000ms, which is jest's own default.
However, the timeout parameter for each test still doesn't override this.

I don't know what's going on here since the timeout parameter seemed to be working before.
@joshuakarp has confirmed this problem on his end as well.

Do you know what is happening here? @CMCDragonkai

@CMCDragonkai
Member

Look at the tests I did above; they worked fine, and there was no such problem as you mention.

@CMCDragonkai
Member

Make sure you're on the same dependencies as we are. Maybe the change in pkg.nix has changed things.

@tegefaulkes
Contributor

The tests above don't include any bin tests which usually exceed the default timeout of 10000ms.
I'm not sure the tests above would've revealed this issue.

@CMCDragonkai
Member

CMCDragonkai commented Aug 6, 2021 via email

@tegefaulkes
Contributor

I'll look at that in a minute. Just finishing off some small changes for nodes-cli.

@CMCDragonkai
Member

Do a sanity check on this with a single setTimeout test. And then fiddle with the parameters to clarify the behaviour here.

@CMCDragonkai
Member

To optimise the test cases:

  1. Start using beforeAll and afterAll to share state
  2. Try to break up the tests into smaller tests so that each test can be faster to run
  3. Don't rely entirely on the jest-reported times for benchmarking; a better way is to use an actual bench library, or just use Date and subtract the delta
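The Date-delta approach in point 3 can be sketched as follows (a hypothetical helper, not part of js-polykey):

```javascript
// Hypothetical helper (not part of js-polykey): wall-clock timing via
// Date deltas. For serious benchmarking, prefer a real bench library.
async function timeIt(label, fn) {
  const start = Date.now();
  await fn();
  const delta = Date.now() - start;
  console.log(`${label}: ${delta} ms`);
  return delta;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Example: measure an async operation that sleeps for ~100 ms
timeIt('sleep(100)', () => sleep(100));
```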

@CMCDragonkai
Member

Also try tuning the networking delay for the punch packets for optimising the startup of forward and reverse proxy during testing.

@joshuakarp
Contributor

To add to the discussion, there are 2 indicators that the jest test timeout parameter isn't working as expected:

However, I believe @tegefaulkes also tested the case of setting the global as a larger value and the passed parameter as a smaller value, and it still didn't override (at least in the error message).

@joshuakarp
Contributor

Similarly, I'm trying to test tests/bin/notifications.test.ts at the moment, and I was having the same kind of timeout issues, where every test failed (even with the per-test timeout values set to 5 times the global):

import type { NodeId, NodeAddress } from '@/nodes/types';

import os from 'os';
import path from 'path';
import fs from 'fs';
import Logger, { LogLevel, StreamHandler } from '@matrixai/logger';
import * as utils from './utils';
import * as testUtils from './utils';
import { PolykeyAgent } from '@';

const logger = new Logger('pkWithStdio Test', LogLevel.WARN, [
  new StreamHandler(),
]);
let senderDataDir: string, receiverDataDir: string;
let senderNodePath: string, receiverNodePath: string;
let senderPasswordFile: string, receiverPasswordFile: string;
let senderPolykeyAgent: PolykeyAgent, receiverPolykeyAgent: PolykeyAgent;
let senderNodeId: NodeId, receiverNodeId: NodeId;

function genCommandsSender(options: Array<string>) {
  return ['notifications', ...options, '-np', senderNodePath];
}

function genCommandsReceiver(options: Array<string>) {
  return ['notifications', ...options, '-np', receiverNodePath];
}

describe('CLI Notifications', () => {
  beforeEach(async () => {
    senderDataDir = await fs.promises.mkdtemp(
      path.join(os.tmpdir(), 'polykey-test-'),
    );
    receiverDataDir = await fs.promises.mkdtemp(
      path.join(os.tmpdir(), 'polykey-test-'),
    );
    senderNodePath = path.join(senderDataDir, 'sender');
    receiverNodePath = path.join(receiverDataDir, 'receiver');
    senderPasswordFile = path.join(senderDataDir, 'passwordFile');
    receiverPasswordFile = path.join(receiverDataDir, 'passwordFile');
    await fs.promises.writeFile(senderPasswordFile, 'password');
    await fs.promises.writeFile(receiverPasswordFile, 'password');
    senderPolykeyAgent = new PolykeyAgent({
      nodePath: senderNodePath,
      logger: logger,
    });
    receiverPolykeyAgent = new PolykeyAgent({
      nodePath: receiverNodePath,
      logger: logger,
    });
    await senderPolykeyAgent.start({
      password: 'password',
    });
    await receiverPolykeyAgent.start({
      password: 'password',
    });
    senderNodeId = senderPolykeyAgent.nodes.getNodeId();
    receiverNodeId = receiverPolykeyAgent.nodes.getNodeId();
    await senderPolykeyAgent.nodes.setNode(receiverNodeId, {
      ip: receiverPolykeyAgent.revProxy.getIngressHost(),
      port: receiverPolykeyAgent.revProxy.getIngressPort(),
    } as NodeAddress);

    // Authorize session
    await utils.pk([
      'agent',
      'unlock',
      '-np',
      senderNodePath,
      '--password-file',
      senderPasswordFile,
    ]);
    await utils.pk([
      'agent',
      'unlock',
      '-np',
      receiverNodePath,
      '--password-file',
      receiverPasswordFile,
    ]);
    await receiverPolykeyAgent.notifications.clearNotifications();
  });
  afterEach(async () => {
    await senderPolykeyAgent.stop();
    await receiverPolykeyAgent.stop();
    await fs.promises.rmdir(senderDataDir, { recursive: true });
    await fs.promises.rmdir(receiverDataDir, { recursive: true });
  });
  describe('commandSendNotification', () => {
    test('Should send notification with permission.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const commands = genCommandsSender(['send', receiverNodeId, 'msg']);
      const result = await testUtils.pkWithStdio(commands);
      expect(result.code).toBe(0); // Succeeds
      expect(result.stdout).toContain('msg');
      const notifications =
        await receiverPolykeyAgent.notifications.readNotifications();
      expect(notifications).toContain('msg');
    }, global.defaultTimeout * 5);
    test('Should send notification without permission.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {},
        vaults: {},
      });
      const commands = genCommandsSender(['send', receiverNodeId, 'msg']);
      const result = await testUtils.pkWithStdio(commands);
      expect(result.code).toBe(0); // Succeeds
      expect(result.stdout).toContain('msg');
      const notifications =
        await receiverPolykeyAgent.notifications.readNotifications();
      expect(notifications).toEqual([]); // Notification should be sent but not received
    }, global.defaultTimeout * 5);
  });
  describe('commandReadNotifications', () => {
    test('Should read all notifications.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const senderCommands1 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg1',
      ]);
      const senderCommands2 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg2',
      ]);
      const senderCommands3 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg3',
      ]);
      await testUtils.pkWithStdio(senderCommands1);
      await testUtils.pkWithStdio(senderCommands2);
      await testUtils.pkWithStdio(senderCommands3);
      const receiverCommands = genCommandsReceiver(['read']);
      const result1 = await testUtils.pkWithStdio(receiverCommands);
      expect(result1.code).toBe(0);
      expect(result1.stdout).toContain('msg1');
      expect(result1.stdout).toContain('msg2');
      expect(result1.stdout).toContain('msg3');
      const senderCommands4 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg4',
      ]);
      await testUtils.pkWithStdio(senderCommands4);
      const result2 = await testUtils.pkWithStdio(receiverCommands);
      expect(result2.code).toBe(0);
      expect(result2.stdout).toContain('msg1');
      expect(result2.stdout).toContain('msg2');
      expect(result2.stdout).toContain('msg3');
      expect(result2.stdout).toContain('msg4');
    }, global.defaultTimeout * 5);
    test('Should read all unread notifications.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const senderCommands1 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg1',
      ]);
      const senderCommands2 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg2',
      ]);
      const senderCommands3 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg3',
      ]);
      await testUtils.pkWithStdio(senderCommands1);
      await testUtils.pkWithStdio(senderCommands2);
      await testUtils.pkWithStdio(senderCommands3);
      const receiverCommands1 = genCommandsReceiver(['read']);
      await testUtils.pkWithStdio(receiverCommands1);
      const senderCommands4 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg4',
      ]);
      await testUtils.pkWithStdio(senderCommands4);
      const receiverCommands2 = genCommandsReceiver(['read', '--unread']);
      const result = await testUtils.pkWithStdio(receiverCommands2);
      expect(result.code).toBe(0);
      expect(result.stdout).not.toContain('msg1'); // previously read notifications should be ignored
      expect(result.stdout).not.toContain('msg2');
      expect(result.stdout).not.toContain('msg3');
      expect(result.stdout).toContain('msg4');
    }, global.defaultTimeout * 5);
    test('Should read a fixed number of notifications.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const senderCommands1 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg1',
      ]);
      const senderCommands2 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg2',
      ]);
      const senderCommands3 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg3',
      ]);
      await testUtils.pkWithStdio(senderCommands1);
      await testUtils.pkWithStdio(senderCommands2);
      await testUtils.pkWithStdio(senderCommands3);
      const receiverCommands = genCommandsReceiver(['read', '-n', '2']);
      const result = await testUtils.pkWithStdio(receiverCommands);
      expect(result.code).toBe(0);
      expect(result.stdout).not.toContain('msg1'); // oldest notification not included
      expect(result.stdout).toContain('msg2');
      expect(result.stdout).toContain('msg3');
    }, global.defaultTimeout * 5);
    test('Should read notifications in reverse order.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const senderCommands1 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg1',
      ]);
      const senderCommands2 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg2',
      ]);
      const senderCommands3 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg3',
      ]);
      await testUtils.pkWithStdio(senderCommands1);
      await testUtils.pkWithStdio(senderCommands2);
      await testUtils.pkWithStdio(senderCommands3);
      const receiverCommands = genCommandsReceiver([
        'read',
        '-u',
        'true',
        '-n',
        '1',
        '-o',
        'oldest',
      ]);
      const result1 = await testUtils.pkWithStdio(receiverCommands);
      const result2 = await testUtils.pkWithStdio(receiverCommands);
      const result3 = await testUtils.pkWithStdio(receiverCommands);
      expect(result1.code).toBe(0);
      expect(result2.code).toBe(0);
      expect(result3.code).toBe(0);
      expect(result1.stdout).toContain('msg1');
      expect(result2.stdout).toContain('msg2');
      expect(result3.stdout).toContain('msg3');
    }, global.defaultTimeout * 5);
    test('Should read no notifications.', async () => {
      const receiverCommands = genCommandsReceiver(['read']);
      const result = await testUtils.pkWithStdio(receiverCommands);
      expect(result.code).toBe(0);
      expect(result.stdout).toEqual('No notifications to display\n');
    }, global.defaultTimeout * 5);
  });
  describe('commandClearNotifications', () => {
    test('Should remove all notifications.', async () => {
      await receiverPolykeyAgent.acl.setNodePerm(senderNodeId, {
        gestalt: {
          notify: null,
        },
        vaults: {},
      });
      const senderCommands1 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg1',
      ]);
      const senderCommands2 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg2',
      ]);
      const senderCommands3 = genCommandsSender([
        'send',
        receiverNodeId,
        'msg3',
      ]);
      await testUtils.pkWithStdio(senderCommands1);
      await testUtils.pkWithStdio(senderCommands2);
      await testUtils.pkWithStdio(senderCommands3);
      const receiverCommandsClear = genCommandsReceiver(['clear']);
      const receiverCommandsRead = genCommandsReceiver(['read']);
      await testUtils.pkWithStdio(receiverCommandsClear);
      const result = await testUtils.pkWithStdio(receiverCommandsRead);
      expect(result.code).toBe(0);
      expect(result.stdout).toEqual('No notifications to display\n'); // should be no notifications left
    }, global.defaultTimeout * 5);
  });
});

But calling jest.setTimeout(50000) at the start of the tests resolved this entirely. No more timeouts.
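One tidy way to apply that suite-wide is a Jest setup file registered via the standard `setupFilesAfterEnv` option (a sketch; the file path here is hypothetical):

```
// jest.config.js (sketch)
module.exports = {
  setupFilesAfterEnv: ['<rootDir>/tests/setup.ts'],
};

// tests/setup.ts
jest.setTimeout(50000); // default timeout for every test and hook
```

This avoids having to repeat the `jest.setTimeout` call at the top of each test file.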

@joshuakarp

joshuakarp commented Aug 9, 2021

Small sanity tests:

For the following sleep program (6000 ms sleep):

import { sleep } from '@/utils';

describe('Timeouts', () => {
  test('sleep', async () => {
    await sleep(6000);
  })
});
  1. global timeout = 5000ms: failed
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/sanity.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/sanity.test.ts"

Determining test suites to run...
GLOBAL SETUP
 FAIL  tests/sanity.test.ts (5.379 s)
  Timeouts
    ✕ sleep (5004 ms)

  ● Timeouts › sleep

    : Timeout - Async callback was not invoked within the 5000 ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 5000 ms timeout specified by jest.setTimeout.Error:

      2 |
      3 | describe('Timeouts', () => {
    > 4 |   test('sleep', async () => {
        |   ^
      5 |     await sleep(6000);
      6 |   })
      7 | });

      at new Spec (node_modules/jest-jasmine2/build/jasmine/Spec.js:116:22)
      at Suite.<anonymous> (tests/sanity.test.ts:4:3)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        5.426 s
Ran all test suites matching /tests\/sanity.test.ts/i.
GLOBAL TEARDOWN
npm ERR! Test failed.  See above for more details.

  2. global timeout = 10000ms: passed
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/sanity.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/sanity.test.ts"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/sanity.test.ts (6.365 s)
  Timeouts
    ✓ sleep (6006 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        6.425 s
Ran all test suites matching /tests\/sanity.test.ts/i.
GLOBAL TEARDOWN
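For reference, the `sleep` helper imported from `@/utils` in these snippets can be assumed to be a plain promise-wrapped timer along these lines (the actual implementation may differ):

```typescript
// Resolve after ms milliseconds; never rejects.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Usage: wait ~50 ms, then continue.
sleep(50).then(() => console.log('woke up'));
```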

For the following sleep program (6000 ms sleep), with a custom 1000 ms timeout:

import { sleep } from '@/utils';

describe('Timeouts', () => {
  test('sleep', async () => {
    await sleep(6000);
  }, 1000)
});
  1. global timeout = 5000ms: failed (with the message stating the expected 1000 ms timeout)
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/sanity.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/sanity.test.ts"

Determining test suites to run...
GLOBAL SETUP
 FAIL  tests/sanity.test.ts
  Timeouts
    ✕ sleep (1007 ms)

  ● Timeouts › sleep

    : Timeout - Async callback was not invoked within the 1000 ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 1000 ms timeout specified by jest.setTimeout.Error:

      2 |
      3 | describe('Timeouts', () => {
    > 4 |   test('sleep', async () => {
        |   ^
      5 |     await sleep(6000);
      6 |   }, 1000)
      7 | });

      at new Spec (node_modules/jest-jasmine2/build/jasmine/Spec.js:116:22)
      at Suite.<anonymous> (tests/sanity.test.ts:4:3)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        4.01 s, estimated 5 s
Ran all test suites matching /tests\/sanity.test.ts/i.
GLOBAL TEARDOWN
Jest did not exit one second after the test run has completed.

This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.

For the following sleep program (with a 2000 ms timeout parameter), and the global timeout set to 500 in the global setup file:

import { sleep } from '@/utils';

describe('Timeouts', () => {
  test('sleep', async () => {
    await sleep(6000);
  }, 2000)
});
import path from 'path';

declare global {
  namespace NodeJS {
    interface Global {
      projectDir: string;
      testDir: string;
      defaultTimeout: number;
    }
  }
}

global.projectDir = path.join(__dirname, '../');
global.testDir = __dirname;
global.defaultTimeout = 500;
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/sanity.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/sanity.test.ts"

Determining test suites to run...
GLOBAL SETUP
 FAIL  tests/sanity.test.ts
  Timeouts
    ✕ sleep (2006 ms)

  ● Timeouts › sleep

    : Timeout - Async callback was not invoked within the 2000 ms timeout specified by jest.setTimeout.Timeout - Async callback was not invoked within the 2000 ms timeout specified by jest.setTimeout.Error:

      2 |
      3 | describe('Timeouts', () => {
    > 4 |   test('sleep', async () => {
        |   ^
      5 |     await sleep(6000);
      6 |   }, 2000)
      7 | });

      at new Spec (node_modules/jest-jasmine2/build/jasmine/Spec.js:116:22)
      at Suite.<anonymous> (tests/sanity.test.ts:4:3)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        4.893 s
Ran all test suites matching /tests\/sanity.test.ts/i.
GLOBAL TEARDOWN
Jest did not exit one second after the test run has completed.

This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.

So the per-test timeout parameter overrides the global timeout, even when the global value is larger.
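That precedence can be summarised in a one-liner (an illustrative model, not Jest's actual source): the per-test parameter, when present, is used unconditionally, and the global value is only a fallback.

```typescript
// Model of Jest's timeout resolution: test(name, fn, timeout) overrides
// jest.setTimeout's global value even when the override is smaller.
function effectiveTimeout(globalTimeout: number, perTestTimeout?: number): number {
  return perTestTimeout ?? globalTimeout;
}

console.log(effectiveTimeout(5000));        // no override: global 5000 applies
console.log(effectiveTimeout(5000, 1000));  // smaller override still wins: 1000
console.log(effectiveTimeout(500, 10000));  // larger override wins too: 10000
```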

Similarly, without changing the global timeout from 500 ms, we can pass a larger timeout to make this test pass:

import { sleep } from '@/utils';

describe('Timeouts', () => {
  test('sleep', async () => {
    await sleep(6000);
  }, 10000)
});
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/sanity.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/sanity.test.ts"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/sanity.test.ts (9.295 s)
  Timeouts
    ✓ sleep (6005 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        9.369 s
Ran all test suites matching /tests\/sanity.test.ts/i.
GLOBAL TEARDOWN

@joshuakarp

Alright, I think I've figured this out.

After trying to find a minimal example to write up an issue upstream, I built the following test file, just to see what outputs we'd get in the console:

// global timeout = 1000
import { sleep } from '@/utils';

beforeEach(async() => {
  await sleep(5000);
})

describe('Hello', () => {
  test('time out?', async () => {
    await sleep(5000);
  }, 8000);
});

By passing the timeout of 8000 to the test function, we expect this to override the global timeout of 1000ms. And it does do this. However, the test still fails:

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/bin/timeout.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/bin/timeout.test.ts"

Determining test suites to run...
GLOBAL SETUP
 FAIL  tests/bin/timeout.test.ts (10.58 s)
  Hello
    ✕ time out? (6010 ms)

  ● Hello › time out?

    Timeout - Async callback was not invoked within the 1000 ms timeout specified by jest.setTimeout.Error: Timeout - Async callback was not invoked within the 1000 ms timeout specified by jest.setTimeout.

      at mapper (node_modules/jest-jasmine2/build/queueRunner.js:27:45)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        10.628 s
Ran all test suites matching /tests\/bin\/timeout.test.ts/i.
GLOBAL TEARDOWN
npm ERR! Test failed.  See above for more details.

But why does the console specify that the test timed out as a result of the 1000 ms global timeout?

It's not the test function that's timing out - it's the beforeEach call. I realised that you can actually pass a timeout parameter to the beforeEach call as well (just like any other test function). Then, if we change the timeout for beforeEach to 2000 ms:

import { sleep } from '@/utils';

beforeEach(async() => {
  await sleep(5000);
}, 2000)

describe('Hello', () => {
  test('time out?', async () => {
    await sleep(5000);
  }, 8000);
});

We get the following:

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/bin/timeout.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/bin/timeout.test.ts"

Determining test suites to run...
GLOBAL SETUP
 FAIL  tests/bin/timeout.test.ts (10.311 s)
  Hello
    ✕ time out? (7010 ms)

  ● Hello › time out?

    Timeout - Async callback was not invoked within the 2000 ms timeout specified by jest.setTimeout.Error: Timeout - Async callback was not invoked within the 2000 ms timeout specified by jest.setTimeout.

      at mapper (node_modules/jest-jasmine2/build/queueRunner.js:27:45)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        10.368 s, estimated 11 s
Ran all test suites matching /tests\/bin\/timeout.test.ts/i.
GLOBAL TEARDOWN
npm ERR! Test failed.  See above for more details.

Changing it to 8000:

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm test -- tests/bin/timeout.test.ts 

> @matrixai/polykey@0.0.41 test /home/josh/Documents/MatrixAI/js-polykey
> jest "tests/bin/timeout.test.ts"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/bin/timeout.test.ts (13.189 s)
  Hello
    ✓ time out? (10005 ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        13.257 s
Ran all test suites matching /tests\/bin\/timeout.test.ts/i.
GLOBAL TEARDOWN

@joshuakarp

Therefore, the solution for our case is to also provide this timeout parameter for any beforeEach/afterEach/beforeAll/afterAll calls in test files that are expected to exceed our 10000 ms global default timeout.

Alternatively, we increase this default timeout to some large value to not have to think about this.
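Under the hood each hook and each test is effectively raced against its own timer, which is why a slow beforeEach fails with its own (possibly global) timeout even when the test body was given a generous one. A minimal model of that race (illustrative only, not Jest's implementation):

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Reject if p does not settle within ms milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timeout - Async callback was not invoked within the ${ms} ms timeout`)),
      ms,
    );
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// A 50 ms "hook" passes under a 200 ms timeout but fails under a 10 ms one,
// regardless of the timeout given to the test body itself.
withTimeout(sleep(50), 200).then(() => console.log('hook ok'));
withTimeout(sleep(50), 10).catch((e: Error) => console.log(e.message));
```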

@joshuakarp

@DrFacepalm

@tegefaulkes

I did some testing with optimizing the tests, starting with tests/bin/identities.test.ts:

PASS tests/bin/identities.test.ts (151.822 s)  
 CLI Identities  
   commandAllowGestalts  
     ✓ Should allow permissions on node. (6248 ms)  
     ✓ Should allow permissions on Identity. (6057 ms)  
     ✓ Should fail on invalid inputs. (6263 ms)  
   commandDisallowGestalts  
     ✓ Should disallow permissions on Node. (6307 ms)  
     ✓ Should disallow permissions on Identity. (7155 ms)  
     ✓ Should fail on invalid inputs. (6871 ms)  
   commandPermissionsGestalts  
     ✓ Should get permissions on Node. (6576 ms)  
     ✓ Should get permissons on Identity. (6470 ms)  
   commandTrustGestalts  
     ✓ Should set trust on Node. (6477 ms)  
     ✓ Should set trust on Identity. (5940 ms)  
     ✓ Should fail on invalid inputs. (6312 ms)  
   commandUntrustGestalts  
     ✓ Should unset trust on Node. (5977 ms)  
     ✓ Should unset trust on Identity. (6133 ms)  
     ✓ Should fail on invalid inputs. (6214 ms)  
   commandClaimKeynode  
     ✓ Should claim a keynode. (6302 ms)  
   commandAuthenticateProvider  
     ✓ Should authenticate provider. (6234 ms)  
   commandGetGestalts  
     ✓ Should list gestalt by Node (5907 ms)  
     ✓ Should list gestalt by Identity (6008 ms)  
   commandListGestalts  
     ✓ Should list gestalts with permissions. (5820 ms)  
   commandSearchIdentities  
     ✓ Should find a connected identity. (6347 ms)  
   commandDiscoverGestalts  
     ✓ Should start discovery by Node (6081 ms)  
     ✓ Should start discovery by Identity (6658 ms)  
   Should fail when  
     ✓ Session is not running. (8176 ms)  
  
Test Suites: 1 passed, 1 total  
Tests: 23 passed, 23 total  
Snapshots: 0 total  
Time: 151.86 s  
Ran all test suites matching /tests\/bin\/identities.test.ts/i.  
GLOBAL TEARDOWN

After applying these optimisations:

  • constructing the PolykeyAgent only once in a beforeAll
  • destroying the temp space once in an afterAll
  • cleaning up any possible side effects inside a beforeEach.

the tests ran more than 10× faster:

Determining test suites to run...  
GLOBAL SETUP  
PASS tests/bin/identities.test.ts (11.237 s)  
 CLI Identities  
   commandAllowGestalts  
     ✓ Should allow permissions on node. (65 ms)  
     ✓ Should allow permissions on Identity. (36 ms)  
     ✓ Should fail on invalid inputs. (26 ms)  
   commandDisallowGestalts  
     ✓ Should disallow permissions on Node. (16 ms)  
     ✓ Should disallow permissions on Identity. (19 ms)  
     ✓ Should fail on invalid inputs. (24 ms)  
   commandPermissionsGestalts  
     ✓ Should get permissions on Node. (19 ms)  
     ✓ Should get permissons on Identity. (19 ms)  
   commandTrustGestalts  
     ✓ Should set trust on Node. (21 ms)  
     ✓ Should set trust on Identity. (18 ms)  
     ✓ Should fail on invalid inputs. (19 ms)  
   commandUntrustGestalts  
     ✓ Should unset trust on Node. (17 ms)  
     ✓ Should unset trust on Identity. (22 ms)  
     ✓ Should fail on invalid inputs. (20 ms)  
   commandClaimKeynode  
     ✓ Should claim a keynode. (24 ms)  
   commandAuthenticateProvider  
     ✓ Should authenticate provider. (18 ms)  
   commandGetGestalts  
     ✓ Should list gestalt by Node (14 ms)  
     ✓ Should list gestalt by Identity (13 ms)  
   commandListGestalts  
     ✓ Should list gestalts with permissions. (60 ms)  
   commandSearchIdentities  
     ✓ Should find a connected identity. (13 ms)  
   commandDiscoverGestalts  
     ✓ Should start discovery by Node (19 ms)  
     ✓ Should start discovery by Identity (13 ms)  
   Should fail when  
     ✓ Session is not running. (69 ms)  
  
Test Suites: 1 passed, 1 total  
Tests: 23 passed, 23 total  
Snapshots: 0 total  
Time: 11.288 s  
Ran all test suites matching /tests\/bin\/identities.test.ts/i.  
GLOBAL TEARDOWN

Using this method should be mostly safe against tests affecting each other.
Since we can clean up any possible side effects in the afterEach, the state can be reasonably clean between tests. And since the afterEach runs regardless of the result of the test, failing tests shouldn't poison the state for other tests, though this is not a guarantee.

@joshuakarp

This is great.

  • cleaning up any possible side effects inside a beforeEach.

@tegefaulkes Are you just doing a full DB clear to achieve this?

tegefaulkes commented Aug 11, 2021

I just removed any possible permissions using:

  afterEach(async () => {
    // This handles the cheap teardown between tests.

    // Clean up any dangling permissions.
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node1.id, 'notify');
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node1.id, 'scan');
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node2.id, 'notify');
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node2.id, 'scan');
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node3.id, 'notify');
    await polykeyAgent.gestalts.unsetGestaltActionByNode(node3.id, 'scan');

    await polykeyAgent.gestalts.unsetGestaltActionByIdentity(
      identity1.providerId,
      identity1.identityId,
      'notify',
    );
    // ... (remaining identity permissions elided)
  });

Is there a method in the DB to clear everything?

@joshuakarp

I believe there's a db.clear function. However, I'm not sure of the effect with how we're using subleveldown (i.e. can we do a top level polykeyAgent.db.clear to completely wipe all db entries in the sublevels? Or will this delete the sublevels completely - might be the latter.) @CMCDragonkai may know about this, otherwise might be worth looking into (especially for tests which require lots of db insertions).

@joshuakarp

Or will this delete the sublevels completely - might be the latter.

If this is the case, might be worthwhile to introduce a utils function that recursively goes to the appropriate sublevels and clears them.
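The recursive idea could look something like this sketch, modelling sublevels as nested Maps rather than real subleveldown instances (a real utility would walk actual sublevels and batch the deletions):

```typescript
// Illustrative model only: a "level" maps keys to plain values or to nested
// sublevels (here represented as nested Maps).
type Level = Map<string, unknown>;

function clearRecursive(level: Level): void {
  for (const value of level.values()) {
    if (value instanceof Map) {
      clearRecursive(value as Level); // clear the sublevel's contents first
    }
  }
  level.clear(); // then drop this level's own entries
}

// Usage: wipe a root level containing a nested "perms" sublevel.
const perms: Level = new Map<string, unknown>([['node1', 'notify']]);
const root: Level = new Map<string, unknown>([['perms', perms], ['k', 'v']]);
clearRecursive(root);
console.log(root.size, perms.size); // 0 0
```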

@tegefaulkes

It would be useful to have functions that clear certain aspects of the database, e.g. claims, permissions, known nodes, etc. But even the simplest form of just cleaning the slate would be useful for tests. Then the process could just be:

  1. set up the needed state for the test.
  2. run the test.
  3. clean the slate.
  4. repeat.

@CMCDragonkai

Yes, clear clears all sublevels. Go to the EFS MR and see the new DB tests there; they demonstrate how it works.

@CMCDragonkai

However, it's not guaranteed that clear is atomic, so when I need atomicity I make sure to generate ops for deletion instead.

@CMCDragonkai

So if you do a clear from the root level, it will clear every sublevel. That's how fresh works.

@tegefaulkes

!202 has been merged, closing issue.
