Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce new MultiContainerRun function to run pods with two containers #3095

Merged
merged 7 commits into from
Oct 3, 2024

Conversation

hairyhum
Copy link
Contributor

@hairyhum hairyhum commented Sep 6, 2024

Change Overview

In order to separate data dump generation by database tools and export by kando, a pod performing database dump can have two separate containers and set up the pipe over file instead of anonymous pipe.

The goal of this function is to enable database-specific blueprints to use official database images separately from kanister-tools image instead of building a custom image for each databse blueprint.

Pull request type

Please check the type of change your PR introduces:

  • 🚧 Work in Progress
  • 🌈 Refactoring (no functional changes, no api changes)
  • 🐹 Trivial/Minor
  • 🐛 Bugfix
  • 🌻 Feature
  • 🗺️ Documentation
  • 🤖 Test
  • 🏗️ Build

Issues

  • fixes #issue-number

Test Plan

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

@hairyhum
Copy link
Contributor Author

hairyhum commented Sep 6, 2024

There is an example in tests and docs.
Another example of blueprint:

apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: kube-task-parallel-bp
  namespace: kanister
actions:
  dump:
    phases:
    - func: MultiContainerRun
      name: dump
      args:
        namespace: kanister
        sharedVolumeMedium: Memory
        sharedVolumeSizeLimit: 1Gi
        sharedVolumeDir: "/tmp/"
        backgroundImage: ubuntu:latest
        backgroundCommand:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          mkfifo /tmp/pipe-file
          for i in {1..10}
          do
            echo $i
            sleep 1
          done > /tmp/pipe-file
        outputImage: ghcr.io/kanisterio/kanister-tools:v9.99.9-dev
        outputCommand:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          while [ ! -e /tmp/pipe-file  ]; do sleep 1; done
          kando output numbers "$(cat /tmp/pipe-file)"

docs/functions.rst Outdated Show resolved Hide resolved
@viveksinghggits
Copy link
Contributor

Hi @hairyhum ,
Are we doing this to solve any performance issue that were reported? Can you please share some background just for my understanding.

@hairyhum
Copy link
Contributor Author

hairyhum commented Sep 9, 2024

Hi @hairyhum , Are we doing this to solve any performance issue that were reported? Can you please share some background just for my understanding.

The main goal of this change is to separate image management for databse dump blueprints. Not the performance.

@hairyhum hairyhum changed the title feat: Introduce new Dump function to run pods with two containers feat: Introduce new KubeTaskParallel function to run pods with two containers Sep 11, 2024
@hairyhum
Copy link
Contributor Author

Renamed function into KubeTaskParallel and the arguments for containers to background and output containers.

@hairyhum hairyhum force-pushed the dump-function branch 2 times, most recently from 514f8b2 to fdc0a1b Compare September 11, 2024 16:11
Copy link
Contributor

@e-sumin e-sumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but for me function name looks a bit misleading.
But, frankly, I can't propose better name. Maybe other reviewers will have an ideas ?

pkg/function/kube_task_parallel.go Outdated Show resolved Hide resolved
Copy link
Contributor

@e-sumin e-sumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@viveksinghggits viveksinghggits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me overall. just left some nit comments/formatting suggestion.

@@ -150,6 +150,82 @@ Example:
- |
echo "Example"

KubeTaskParallel
Copy link
Contributor

@viveksinghggits viveksinghggits Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should the name of the function in next community call? It's good right now, but by reading the name it looks like in this function we are doing things parallelly that kubetask does sequentially, which doesn't seem to be the case right?
I think it doesn't necessarily have to have KubeTask in its name right? Maybe MultiContainerUpload or something in that line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid the "purpose" style names like "dump", "upload" or "export". While this function is used for that purpose it's deliberately made generic.

Copy link
Contributor

@viveksinghggits viveksinghggits Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's take opinions from other folks as well. @pavannd1 @PrasadG193 @mlavi .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow KubeTaskParallel doesn't sound good to me. I agree that it would be better if we avoid KubeTask* prefix for the new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use MultiContainerPod or MultiContainerRun, but we already use KubeTaslk instead of Pod in the other function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like MultiContainerRun. Do we need to change the arguments name as well? that are currently named background and ouptut I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard. Okay with MultiContainerRun or maybe MultiContainerJob

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have better names than background and output for containers.

Copy link
Contributor

@pavannd1 pavannd1 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the use of the named pipe - one of the containers pushes data while the other one receives it.
Here are some options I could think of:

  • input and output
  • producer and consumer
  • writer and reader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the use of the named pipe - one of the containers pushes data while the other one receives it.

That is not a part of the function itself, but a part of the command. The only practical difference in the implementation of those containers is that only one of them sends its ouput into the function output. Otherwise you can make the output container wirte data to a file and you can make the background container read from the file.

docs/functions.rst Outdated Show resolved Hide resolved
docs/functions.rst Outdated Show resolved Hide resolved
Comment on lines +171 to +172
`backgroundImage`, Yes, `string`, image to be used in "background" container
`backgroundCommand`, Yes, `[]string`, command list to execute in "background" container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call this container snapshot or backup container? or maybe data container? just a suggestion, if you think background is ok, I am good.

Comment on lines +173 to +174
`outputImage`, Yes, `string`, image to be used in "output" container
`outputCommand`, Yes, `[]string`, command list to execute in "output" container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be called upload container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as a note before, trying to name things for what they do rather than what they can be used for.

@hairyhum hairyhum changed the title feat: Introduce new KubeTaskParallel function to run pods with two containers feat: Introduce new MultiContainerRun function to run pods with two containers Oct 2, 2024
@hairyhum
Copy link
Contributor Author

hairyhum commented Oct 2, 2024

Renamed function to `MultiContainerRun'

…ntainers

In order to separate data dump generation by database tools and export
by `kando`, a pod performing database dump can have two separate containers
and set up the pipe over file instead of anonymous pipe
Init container can be used to set up data in the shared volume.
@hairyhum hairyhum added the kueue label Oct 3, 2024
@mergify mergify bot merged commit 7723a4a into master Oct 3, 2024
18 checks passed
@mergify mergify bot deleted the dump-function branch October 3, 2024 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants