This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

CORTX-30537: add a way to disable formulaic pool versions #2123

Merged
merged 1 commit into from
Sep 1, 2022

Conversation

mssawant
Hare automatically generates formulaic pool versions (i.e. different
combinations of layouts) based on the tolerances for failure domains
and the corresponding layout parameters. In some configurations these
additional pool versions are not required, for example when other
fault-tolerance and recovery methods are used instead (e.g. automatic
data recovery on process restart). Thus, there needs to be a way to
disable the generation of formulaic pool versions.

Solution:

  • Add a flag to the CDF to disable formulaic pool versions.
  • Update cfgen to read the corresponding flag and skip generating
    formulaic pool versions.
  • Add the corresponding flag to the Hare mini-provisioner and set it
    to False by default.
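
The steps above can be illustrated with a minimal Python sketch. It is not the code from this PR: the flag name `disable_formulaic_pvers`, the helper names, and the dictionary shape of the pool description are hypothetical, and the enumeration of failure combinations is a simplification of whatever cfgen actually does. It only shows the intended control flow: read a boolean from the pool description, default it to False, and return only the actual pool version when it is set.

```python
from itertools import product
from typing import Dict, List

# Failure-domain levels, from widest to narrowest (assumed ordering).
LEVELS = ("site", "rack", "encl", "ctrl", "disk")


def failure_vectors(allowed: Dict[str, int]) -> List[Dict[str, int]]:
    """Enumerate every non-zero failure combination within the tolerances."""
    ranges = [range(allowed.get(level, 0) + 1) for level in LEVELS]
    vectors = []
    for combo in product(*ranges):
        if any(combo):  # skip the all-zero vector: that is the actual pver
            vectors.append(dict(zip(LEVELS, combo)))
    return vectors


def pool_versions(pool: dict) -> List[dict]:
    """Return the pool versions to generate for one pool description."""
    pvers = [{"kind": "actual", "pool": pool["name"]}]
    # Hypothetical flag name; absent means False, so formulaic pool
    # versions are still generated by default.
    if pool.get("disable_formulaic_pvers", False):
        return pvers
    for vec in failure_vectors(pool.get("allowed_failures", {})):
        pvers.append({"kind": "formulaic", "pool": pool["name"],
                      "failures": vec})
    return pvers


if __name__ == "__main__":
    pool = {"name": "the_pool", "allowed_failures": {"ctrl": 1, "disk": 1}}
    print(len(pool_versions(pool)))
    print(len(pool_versions({**pool, "disable_formulaic_pvers": True})))
```

Defaulting the flag to False keeps existing cluster descriptions working unchanged; only deployments that opt in lose the formulaic pool versions.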

Signed-off-by: Mandar Sawant <mandar.sawant@seagate.com>

Contributor

@Shreya-18 Shreya-18 left a comment

Looks good!

@vaibhavparatwar
Contributor

retest this please

@vaibhavparatwar
Contributor

Deployment jobs are failing, and premerge also failed. This needs to be triaged. @mssawant, please paste the test results.

@stale
stale bot commented Jun 29, 2022

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 6 days. Tagging @mssawant for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

andriytk added a commit to andriytk/cortx-motr that referenced this pull request Jul 6, 2022
Currently, if no clean pool version (actual or formulaic)
can be found for the new object on its creation, -ENOENT is
returned, which is not good. We want the user to be able to
create new objects even if this implies the degraded i/o on
them.

Solution: return the actual pver at conf_pver_find_locked()
in case when nothing better (cleaner) can be found.

Closes Seagate#1958.
Relates Seagate/cortx-hare#2123.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
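
The fallback described in this commit message can be modelled with a short Python sketch. This is an illustration only: the real change is in Motr's C function conf_pver_find_locked(), and the `PoolVersion` class, the `find_pver` helper and its `fall_back_to_actual` parameter are hypothetical names introduced here. It shows the selection order the message implies: prefer a clean pool version, otherwise return the actual one instead of failing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolVersion:
    name: str
    kind: str       # "actual" or "formulaic"
    is_clean: bool  # True if no device in this pool version has failed


def find_pver(candidates: List[PoolVersion],
              fall_back_to_actual: bool = True) -> Optional[PoolVersion]:
    """Pick a pool version for a new object."""
    # First choice: any clean pool version, actual or formulaic.
    for pver in candidates:
        if pver.is_clean:
            return pver
    # Nothing clean: return the actual pool version so the object can
    # still be created, at the cost of degraded I/O.
    if fall_back_to_actual:
        for pver in candidates:
            if pver.kind == "actual":
                return pver
    # Models the earlier behaviour, where -ENOENT was returned.
    return None


if __name__ == "__main__":
    pvers = [PoolVersion("pv-actual", "actual", False),
             PoolVersion("pv-formulaic", "formulaic", False)]
    print(find_pver(pvers))
    print(find_pver(pvers, fall_back_to_actual=False))
```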
andriytk added a commit to andriytk/cortx-motr that referenced this pull request Jul 7, 2022
Currently, if no clean pool version (actual or formulaic)
can be found for the new object on its creation, -ENOENT is
returned, which is not good. We want the user to be able to
create new objects even if this implies the degraded i/o on
them.

Solution: return the actual pver at conf_pver_find_locked()
in case when nothing better (cleaner) can be found.

Closes Seagate#1958.
Relates Seagate/cortx-hare#2123.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
rkothiya pushed a commit to Seagate/cortx-motr that referenced this pull request Jul 11, 2022
Currently, if no clean pool version (actual or formulaic)
can be found for the new object on its creation, -ENOENT is
returned, which is not good. We want the user to be able to
create new objects even if this implies the degraded i/o on
them.

Solution: return the actual pver at conf_pver_find_locked()
in case when nothing better (cleaner) can be found.

Closes #1958.
Relates Seagate/cortx-hare#2123.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
@vaibhavparatwar
Contributor

@mssawant can we merge this (after proper testing)? Per the discussion with @chandradharraval, the dependent Motr PR Seagate/cortx-motr#1959 is already merged.

@stale stale bot removed the needs-attention label Jul 21, 2022
@vaibhavparatwar
Contributor

@mssawant deployment, premerge and sanity are failing on this PR. This needs to be checked.

@vaibhavparatwar
Contributor

retest this please

@mssawant
Author

retest this please

@vaibhavparatwar
Contributor

retest this please

@hessio hessio added the Status: Checks Failed Checks have failed on this PR label Aug 8, 2022
@stale
stale bot commented Aug 16, 2022

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 6 days. Tagging @mssawant for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@vaibhavparatwar
Contributor

retest this please

@stale stale bot removed the needs-attention label Aug 17, 2022
@mssawant mssawant force-pushed the formulaic-diable branch 2 times, most recently from 520f601 to 8b7f5fe Compare August 23, 2022 14:24
@mssawant
Author

mssawant commented Sep 1, 2022

Hare Sanity is failing at HA pod deployment:

16:26:06  ERROR: Rollout of deployment/cortx-ha timed out after 241 seconds
16:26:06  
16:26:06  ERROR: A timeout occurred while waiting for one or more resources during the CORTX cluster installation.

@pavankrishnat
Contributor

retest this please

@mssawant
Author

mssawant commented Sep 1, 2022

Fixed the make test issue.
Pre-merge is failing due to a shutdown timeout and was aborted:

11:43:53  Stopping m0d@0x7200000000000001:0x2 (ios) at localhost:0f1314cc... 
11:43:53  Stopping m0d@0x7200000000000001:0x3 (ios) at localhost:0f1314cc... 
11:43:53  Stopped m0d@0x7200000000000001:0x3 (ios) at localhost:0f1314cc
11:43:53  Stopped m0d@0x7200000000000001:0x2 (ios) at localhost:0f1314cc
11:43:53  Stopping m0d@0x7200000000000001:0x1 (confd) at localhost:0f1314cc... 
11:47:52  Cancelling nested steps due to timeout
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] sshCommand
11:47:52  Executing command on cortx-vm-name[ssc-vm-g2-rhev4-3193.colo.seagate.com]: 
11:47:52                      cluster_status=$(( hctl status ) 2>&1)
11:47:52                      if [ "$cluster_status" != "Cluster is not running" ]; then hctl shutdown; fi
11:47:52                       sudo: false
11:47:53  Stopping m0d@0x7200000000000001:0x1 (confd) at localhost:0f1314cc... 
11:48:55  Stopped m0d@0x7200000000000001:0x1 (confd) at localhost:0f1314cc
11:48:55  Stopping hare-hax at localhost:0f1314cc... 
11:50:25  Stopped hare-hax at localhost:0f1314cc
11:50:25  Making sure that RC leader can be re-elected next time
11:50:25  Stopping hare-consul-agent at localhost:0f1314cc... 

Using context: premerge-test
Timeout has been exceeded
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.Request.call(Request.java:177)
	at hudson.remoting.Channel.call(Channel.java:997)
	at org.jenkinsci.plugins.sshsteps.steps.CommandStep$Execution.run(CommandStep.java:72)
	at org.jenkinsci.plugins.sshsteps.util.SSHStepExecution.lambda$start$0(SSHStepExecution.java:84)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Finished: ABORTED

@mssawant mssawant merged commit bca30e0 into Seagate:main Sep 1, 2022
Labels
cla-signed Status: Checks Failed Checks have failed on this PR
7 participants