Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#12): cloud-based execution of move contact #177

Merged
merged 25 commits into from
Jun 29, 2024

Conversation

paulpascal
Copy link
Contributor

@paulpascal paulpascal commented May 18, 2024

Closes: #12

This PR introduces a cloud-based execution of the move-contact feature, enhancing user experience by replacing the previous cht-conf command-line approach.

To achieve this, we've implemented bullmq, a powerful and flexible queue system built on Redis.

Here's a breakdown of the process:

  1. Upon initiating a contact move, essential details such as contact_id, parent_id, instanceUrl, and the current sessionToken (encrypted) are compiled and submitted as a job to the MOVE_CONTACT_QUEUE.

  2. A dedicated worker monitors this queue, processing incoming jobs:

    • It has the flexibility to defer processing based on predefined logic, allowing for scheduled execution.
    • Alternatively, it can proceed to execute the cht-conf command via a child_process, utilizing all pertinent job data, including the decrypted sessionToken.
  3. Upon successful execution, the job is marked as completed. In case of any failures, the job is flagged accordingly. Users can conveniently track job statuses directly from the user management tool's dashboard.

Note

Enabling cht-conf to support session tokens required implementing some modifications, detailed here.
(the session token is initial got from _session request)

@paulpascal paulpascal force-pushed the 12-move-contact-cloud-based branch from d8eb298 to d770b31 Compare May 20, 2024 13:32
Copy link
Collaborator

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good work here - great to see a compose file that I suspect works out of the box!

My big question is how we're building and publishing images. I have some smaller feedback points and tried to make code ```suggestions when possible - but lemme know if I got any of those wrong!

docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
@paulpascal
Copy link
Contributor Author

Thanks @mrjones-plip for this great review, I will be going through them one by one and get back to you.
About the publishing, I was also thinking, should it be possible to do it the same way we did for the current version (using publish script). Maybe it will need some updates on the current docker stuff I did? (Forgive if I'm sinning here 🥲 I am not a 😅pro on docker)

cc: @kennsippell

Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work Paul!

Dockerfile Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
shared/config/index.ts Outdated Show resolved Hide resolved
shared/encryption/index.ts Outdated Show resolved Hide resolved
worker/worker.ts Outdated Show resolved Hide resolved
worker/worker.ts Outdated Show resolved Hide resolved
worker/jobs/move-contact/move-contact-job.processor.ts Outdated Show resolved Hide resolved
worker/jobs/move-contact/move-contact-job.processor.ts Outdated Show resolved Hide resolved
worker/worker.ts Outdated Show resolved Hide resolved
@paulpascal
Copy link
Contributor Author

Thanks @kennsippell , great review. Really appreciate. Will be making the right adjustment

@kennsippell
Copy link
Member

kennsippell commented May 21, 2024

Testing today:

  1. If redis service is not running, the worker process logs errors but doesn't exit. Seems it should fail fast.
  2. The Cht command output: output with all the new lines is extremely verbose. I think we can just output the raw data. It'd also be nice to output the actual command being executed.'
  3. Can we add json_docs and upload-docs.*.json into .gitignore since this is created when worker thread is run in active directory.
  4. We need to think about how to handle the case where the docker container is killed (eg. deployment) while a job is executing. This can leave contacts in a broken state.
  5. adding new job to MOVE_CONTACT_QUEUE with [object Object]

Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff Paul. I did another review here but I got tired and didn't look through tests yet.

src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
src/shared/queue-config.ts Outdated Show resolved Hide resolved
src/shared/queue-config.ts Outdated Show resolved Hide resolved
src/worker/move-contact-worker.ts Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
test/worker/move-contact-worker.spec.ts Outdated Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
src/worker/move-contact-worker.ts Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
@paulpascal paulpascal requested a review from kennsippell June 10, 2024 09:53
@paulpascal paulpascal requested a review from mrjones-plip June 10, 2024 09:53
@paulpascal
Copy link
Contributor Author

Hi, @mrjones-plip it took a bit but finally got all the docker feedback addressed 🥲. Would you like to check it back when ever you have time?

@kennsippell, i have fixed the delay stuff as well.

Thanks

Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very nice progress. This feels close.

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
package.json Show resolved Hide resolved
src/lib/authentication.ts Outdated Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
test/lib/authentication.spec.ts Outdated Show resolved Hide resolved
test/lib/move.spec.ts Show resolved Hide resolved
test/worker/move-contact-worker.spec.ts Outdated Show resolved Hide resolved
job.log(`[${new Date().toISOString()}]: ${result.message}`);
const errorMessage = `Job ${job.id} failed with the following error: ${result.message}`;
console.error(errorMessage);
throw new Error(errorMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point it might be nice to handle errors a bit better. maybe not in this PR, but what if cht-conf has an error - the most common scenario here is losing connectivity with the server during a heavy move. should the job be retried?

i'll make a quick comment about aborted/canceled jobs or generally handling any case where a contact's move is interrupted

  • the good news is that if a move is interrupted, regardless of the state cht-conf can be rerun. it will aggregate all documents (those in the new position already and those in the old position) and move them all again. this is really an incredible win - if a job fails, it just needs to be retried.
  • the bad news is that cht-user-management won't let you retry an aborted job if the top-level contact successfully moves. it will give an error like "A already has B as parent" or whatever.

It's going to happen, and we're going to need to deal with it at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really important, the way I was dealing with it is to see from the Board, the issue with the failed job (from the logs).
If it has failed due to the lost of connectivity, then I just retry from the Board.

But we can handle this properly in another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make an issue so we don't forget?

Copy link
Collaborator

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! Looking really good here. Most of my feedback is trying both make it easy to set up the dev environment, while also keeping sane defaults for production.

While I did easily stand up an instance using the bash script, I'm unable to connect to redis for some reason.

both the main service and the work service have this error, where 192.168.16.2 is the IP of the redis container:

docker logs cht-user-management-cht-user-management-1

Error: connect ECONNREFUSED 192.168.16.2:6378
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '192.168.16.2',
  port: 6378
}

does this work for you?

README.md Outdated Show resolved Hide resolved
env.example Outdated Show resolved Hide resolved
docker-local-setup.sh Outdated Show resolved Hide resolved
Dockerfile.worker Outdated Show resolved Hide resolved
Dockerfile.worker Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Dockerfile.worker Outdated Show resolved Hide resolved
env.example Outdated Show resolved Hide resolved
@paulpascal
Copy link
Contributor Author

Thanks @mrjones-plip and @kennsippell for your great reviews 🔥.

Really appreciate that, will go through it thoroughly.

Co-authored-by: Ashley <8253488+mrjones-plip@users.noreply.github.com>
Co-authored-by: Kenn Sippell <kennsippell@gmail.com>
@paulpascal paulpascal force-pushed the 12-move-contact-cloud-based branch from d88902a to c5cc101 Compare June 19, 2024 16:34
@paulpascal
Copy link
Contributor Author

paulpascal commented Jun 19, 2024

docker logs cht-user-management-cht-user-management-1

Error: connect ECONNREFUSED 192.168.16.2:6378
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '192.168.16.2',
  port: 6378
}

does this work for you?

Yes @mrjones-plip, it works for me, we can please have a little sync on that if its okay with you.

@paulpascal paulpascal force-pushed the 12-move-contact-cloud-based branch from 594f62f to 31e91d1 Compare June 22, 2024 02:57
Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do one more, but I think we're basically there.

README.md Outdated Show resolved Hide resolved
src/lib/authentication.ts Outdated Show resolved Hide resolved
src/lib/queues.ts Outdated Show resolved Hide resolved
src/lib/queues.ts Outdated Show resolved Hide resolved
src/lib/move.ts Outdated Show resolved Hide resolved
job.log(`[${new Date().toISOString()}]: ${result.message}`);
const errorMessage = `Job ${job.id} failed with the following error: ${result.message}`;
console.error(errorMessage);
throw new Error(errorMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make an issue so we don't forget?

src/worker/move-contact-worker.ts Show resolved Hide resolved
src/worker/move-contact-worker.ts Outdated Show resolved Hide resolved
src/lib/authentication.ts Outdated Show resolved Hide resolved
test/worker/move-contact-worker.spec.ts Outdated Show resolved Hide resolved
Copy link
Collaborator

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @paulpascal! I REALLY love that NODE_ENV is exposed in the .env file now so it makes it trivial to run in dev mode in docker - so handy!

I'm not getting any more errors for redis service in the docker log, so I assume this is working? I don't know how to test.

I did have one issue that I wasn't able log in - here's what I did:

  1. deploy 4.8.1 instance with docker helper 192-168-68-26.local-ip.medicmobile.org:10446
  2. set CHT_DEV_HTTP=true and CHT_DEV_URL_PORT=192-168-68-26.local-ip.medicmobile.org:10446 and NODE_ENV=dev
  3. in CHT Core, create a test user that is role User Manager
  4. start user man tool with: ./docker-local-setup.sh build
  5. try to login with that user on 127.0.0.1:3000/login

expected: I can log in
actual: get error "login failed. please try again"

Here's my .env in case it's helpful:

NODE_ENV=dev
COOKIE_PRIVATE_KEY=quux
QUEUE_PRIVATE_KEY=smang
CONFIG_NAME=chis-ke
PORT=3000
EXTERNAL_PORT=3000
INTERFACE=0.0.0.0
CHT_DEV_HTTP=true
CHT_DEV_URL_PORT=192-168-68-26.local-ip.medicmobile.org:10446

docker-local-setup.sh Show resolved Hide resolved
env.example Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
env.example Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
paulpascal and others added 2 commits June 25, 2024 01:41
Co-authored-by: mrjones <8253488+mrjones-plip@users.noreply.github.com>
Co-authored-by: Kenn Sippell <kennsippell@gmail.com>
Copy link
Collaborator

@mrjones-plip mrjones-plip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a docker and docker compose perspective, with the updated env.example and updated readme.md - I think this PR is ready to go from my perspective. I still was unable to log in, so please be sure and test this with a clean set up to ensure there's no regressions!

@paulpascal
Copy link
Contributor Author

... I still was unable to log in, so please be sure and test this with a clean set up to ensure there's no regressions!

Thanks @mrjones-plip , I will make sure to have that working.

@paulpascal paulpascal force-pushed the 12-move-contact-cloud-based branch from ef2573a to cf89625 Compare June 27, 2024 02:37
Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Let's go!

src/server.ts Show resolved Hide resolved
test/scripts/wait_for_redis.sh Outdated Show resolved Hide resolved
@paulpascal paulpascal merged commit 975d57c into main Jun 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cloud-Based execution of move-contacts commands
3 participants