Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor handling of init systems #355

Closed
wants to merge 15 commits into from

Conversation

codenrhoden
Copy link
Contributor

The idea here is to create and use classes with a common interface to abstract working with the init systems. Should simplify the code immensely.

This will be a pre-req to fixing http://tracker.ceph.com/issues/13087, which has us use the new ceph-detect-init tool when it is present.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1759/
Test FAILed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1760/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1761/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1762/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1763/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1764/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1765/
Test PASSed.

@codenrhoden codenrhoden changed the title DNM: Refactor handling of init systems Refactor handling of init systems Sep 15, 2015
@codenrhoden
Copy link
Contributor Author

I think this in a good state to review now.

I've tested on CentOS 7, using both Hammer release, and latest Infernalis dev packages. This covered both sysvinit and systemd.

I did the same thing on Trusty, and that successfully gets upstart on each.

Using the latest Infernalis packages, I did get an error when trying activate file-based OSDs on both Trusty and CentOS 7::

[trusty1][WARNIN] 2015-09-15 23:30:57.355914 7f60eede0980 -1 filestore(/tmp/osd1) mkfs: write_version_stamp() failed: (13) Permission denied
[trusty1][WARNIN] 2015-09-15 23:30:57.355938 7f60eede0980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
[trusty1][WARNIN] 2015-09-15 23:30:57.357761 7f60eede0980 -1  ** ERROR: error creating empty object store in /tmp/osd1: (13) Permission denied

This seems more to do with running the daemons as the ceph users than anything to do with the init system, though.

I should still do a couple tests using RGW and MDS, but it's looking good so far.

@alfredodeza @liewegas feel free to take a look. There is additional clean up that could be done, but this is all the obvious stuff. It is nice to not have distro-specific versions of the "mon create" stuff anymore. :)

Once this is in, I can work on switching over to ceph-detect-init when its presence is detected.

@liewegas
Copy link
Member

On Tue, 15 Sep 2015, Travis Rhoden wrote:

[trusty1][WARNIN] 2015-09-15 23:30:57.355914 7f60eede0980 -1 filestore(/tmp/osd1) mkfs: write_version_stamp() failed: (13) Permission denied
[trusty1][WARNIN] 2015-09-15 23:30:57.355938 7f60eede0980 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
[trusty1][WARNIN] 2015-09-15 23:30:57.357761 7f60eede0980 -1 ** ERROR: error creating empty object store in /tmp/osd1: (13) Permission denied

if it's dcrypt, that's http://tracker.ceph.com/issues/13000

This seems more to do with running the daemons as the ceph users than anything to do with the init system, though.

yeah

@codenrhoden
Copy link
Contributor Author

FWIW, the permissions error is because I was using directories for OSDs (which I rarely do) and created them as root.

If I did...

chown -R ceph:ceph /tmp/osd1
ceph-deploy osd prepare <host>:/tmp/osd1
ceph-deploy osd activate <host>:/tmp/osd1

things worked just fine. So as part of Infernalis, we'll probably need to update docs to say that if you are using directories for OSD testing, you need to set the right perms. :) Block devices are all taken care of automatically by the tooling. I wonder if ceph-disk should do something similar when pointed at a directory?

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1766/
Test PASSed.

@ceph-jenkins
Copy link
Collaborator

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-deploy-pull-requests/1767/
Test PASSed.

@codenrhoden
Copy link
Contributor Author

RGW and MDS services seem correct on Trusty/Upstart.

I had to push one more commit to fix RGW on sysvinit. I wasn't properly handling that the 'ceph' and 'ceph-radosgw' init scripts have completely different options (like -c, or that the rgw script can't take a daemon name).

That's all fixed now.

@codenrhoden
Copy link
Contributor Author

@dmick @alfredodeza care to take a look at this one?

def _get_init_script(self, service):
return 'ceph-radosgw' if service == 'ceph-radosgw' else 'ceph'

def _build_service_command(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any tests for this one. It seems like it is one of the few that has a lot of behavior (maybe there are tests for it already?)

It might get tricky to test it out because the InitSystem class has eager behavior in __init__ (it connects remotely to find out what the hostname is).

If that results in making it difficult to test maybe hostname should be a @property so that is only called/used when needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alfredodeza Thanks for pointing out the eager behavior in __init__. I've removed that in the two places I had it and used @property instead. A definite improvement.

@codenrhoden
Copy link
Contributor Author

Oh man - I never realized this never went in. That's a shame, it was a big improvement! :)

I suppose this should just be closed at this point.

@liewegas
Copy link
Member

can it be rebased?

@osynge
Copy link

osynge commented Sep 27, 2016

@codenrhoden I think this could be rebased and made to work again and is a very big improvement.

@codenrhoden
Copy link
Contributor Author

@liewegas @osynge I'm sure it can be. I just need to go through any fixes/improvements from the last year to see if there is anything that needs to be added.

I'll try to get to that over the next week or two.

I'm going to be working on some Ceph related stuff over the next few weeks, so I'm happy to help out here!

@vasukulkarni
Copy link
Contributor

@codenrhoden sorry this seems to have become old, the newer centos and ubuntu is only systemd now and the current logic although simple handles it better and is well tested.

Travis Rhoden added 7 commits September 27, 2016 13:54
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
There is no longer any need to have a per-distro
implementation of mon create.

Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
@codenrhoden
Copy link
Contributor Author

well whether it makes sense or not, I'll take a stab at it. I've just pushed an initial rebase, but no testing has been done at this point.

@vasukulkarni
Copy link
Contributor

@codenrhoden
Copy link
Contributor Author

@vasukulkarni: thanks for the links. I haven't done much testing yet, so I don't expect things to work. However, the errors in Pulpito are not code related. They are all failing when doing git clone -b RM-13087 git://git.ceph.com/ceph-deploy.git. The problem is there is no RM-13087 branch at git.ceph.com/ceph-deploy.git. That branch only exists in my (codenrhoden) fork.

Not sure how Pulpito normally handles that.

If the branch can be overriden, I believe git clone -b refs/pull/355/head ... might work, but only if refs/pull/* are also copied over to git.ceph.com.

Travis Rhoden and others added 8 commits October 4, 2016 19:16
Make sure we enable ceph/ceph.target appropriately

Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Since we have a single place handling "service ceph ..." vs
"service ceph-radosgw", we need to handle the fact that their
options are different.

Refactor code to build the exectuable command all in one
place

Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Use @Property instead, to only reach out to remote node when
the attr in question is needed, rather than setting it on
initialization.

Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@redhat.com>
Signed-off-by: Travis Rhoden <trhoden@gmail.com>
@codenrhoden
Copy link
Contributor Author

We'll see what Jenkins thinks of the latest changes. I was able to get all tests passing on py27 and py34 plus flake8 with latest changes (updating pytest and mock version requirements). I wasn't easily able to get a py26 env installed on CentOS 7.2. I guess it's no longer in EPEL?

@codenrhoden
Copy link
Contributor Author

codenrhoden commented Oct 4, 2016

  py26: commands succeeded
  py27: commands succeeded
ERROR:   py34: InterpreterNotFound: python3.4
  flake8: commands succeeded

So, if Python 3.4 is going to be tested, you should probably add it to the Jenkins nodes. :)

I also want to be clear that this was just focusing on the unit tests via Jenkins. Still haven't done "real world" testing yet.

@vasukulkarni
Copy link
Contributor

@codenrhoden that failure as you said is because its in your git repo which i didn't notice , it only looks for branh in ceph repo https://github.com/ceph/ceph-qa-suite/blob/master/tasks/ceph_deploy.py#L66, If you can create the branch in ceph-deploy repo it will be easy to get some runs on hammer/jewel

@alfredodeza
Copy link
Contributor

ceph is systemd only now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants