This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Role tasks mess up the entire system's permissions #109

Closed
walterdolce opened this issue Oct 16, 2019 · 10 comments
Labels
wontfix This will not be worked on

Comments

@walterdolce

Under certain circumstances, this role seems to wreak havoc on the host it provisions: it changes the ownership of the entire root folder (!?).

Example output:

TASK [cloudalchemy.node-exporter : Install dependencies] *********************************************************************************************************************

TASK [cloudalchemy.node-exporter : Create the node_exporter group] ***********************************************************************************************************
changed: [the_vm_ip]

TASK [cloudalchemy.node-exporter : Create the node_exporter user] ************************************************************************************************************
fatal: [the_vm_ip]: FAILED! => {"changed": false, "msg": "[Errno 1] Operation not permitted: '/proc/sys'"}

After this failure, I SSH'd into the system and found this....

system-username@vm-hostname:~$ ls -la /
total 88
drwxr-xr-x  23 node-exp users  4096 Oct 16 12:43 .
drwxr-xr-x  23 node-exp users  4096 Oct 16 12:43 ..
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:31 bin
drwxr-xr-x   3 node-exp users  4096 Oct 10 19:32 boot
drwxr-xr-x  16 node-exp users  3580 Oct 16 12:43 dev
drwxr-xr-x 103 node-exp users  4096 Oct 16 13:01 etc
drwxr-xr-x  18 node-exp users  4096 Oct 16 12:59 home
lrwxrwxrwx   1 root     root     31 Oct 10 19:32 initrd.img -> boot/initrd.img-4.15.0-1046-gcp
lrwxrwxrwx   1 root     root     31 Oct 10 19:32 initrd.img.old -> boot/initrd.img-4.15.0-1046-gcp
drwxr-xr-x  20 node-exp users  4096 Oct 16 12:51 lib
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 lib64
drwx------   2 node-exp users 16384 Oct 10 19:31 lost+found
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 media
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 mnt
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 opt
dr-xr-xr-x 717 node-exp users     0 Oct 16 12:43 proc
drwx------   4 node-exp users  4096 Oct 16 12:51 root
drwxr-xr-x  23 node-exp users   940 Oct 16 13:02 run
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:31 sbin
drwxr-xr-x   2 node-exp users  4096 Oct 16 12:43 snap
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 srv
dr-xr-xr-x  13 node-exp users     0 Oct 16 12:48 sys
drwxrwxrwt   8 node-exp users  4096 Oct 16 13:01 tmp
drwxr-xr-x  10 node-exp users  4096 Oct 10 19:29 usr
drwxr-xr-x  14 node-exp users  4096 Oct 16 12:51 var
lrwxrwxrwx   1 root     root     28 Oct 10 19:32 vmlinuz -> boot/vmlinuz-4.15.0-1046-gcp
lrwxrwxrwx   1 root     root     28 Oct 10 19:32 vmlinuz.old -> boot/vmlinuz-4.15.0-1046-gcp

The above is a VM running in GCP. I have another VM running in GCP where I have run the same version of the role against it and this did not happen.

This is the playbook where this is happening:

# Playbook where the issue happens
---
- hosts: "{{ hosts_group }}"
  gather_facts: true
  become: true
  roles:
    - role: lifeofguenter.oracle-java
      become: yes
    - role: jobscore.beats
      become: yes
    - role: torian.logstash
      become: yes
    - role: cloudalchemy.node-exporter

And this is the playbook where this does not happen:

# Playbook where problem does not occur
---
- hosts: "{{ hosts_group }}"
  gather_facts: yes
  roles:
    - role: jobscore.beats
      become: yes
    - role: torian.logstash
      become: yes
    - role: cloudalchemy.node-exporter

And this is the requirements.yml file used in both projects:

---
- src: https://github.com/jobscore/ansible-role-beats/archive/v0.1.1.tar.gz
  name: jobscore.beats
- src: https://github.com/torian/ansible-role-logstash/archive/1.2.0.tar.gz
  name: torian.logstash
- src: https://github.com/lifeofguenter/ansible-role-oracle-java/archive/1.0.2.tar.gz
  name: lifeofguenter.oracle-java
- src: https://github.com/cloudalchemy/ansible-node-exporter/archive/0.15.0.tar.gz
  name: cloudalchemy.node-exporter

The only visible difference is the become: true defined in the playbook where this happens. But still, why would the role change the permissions of the entire system? 🤔

@paulfantom
Member

paulfantom commented Oct 16, 2019

The task that changes file permissions is https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/configure.yml#L11-L19. However, it changes permissions only on the directory specified in the node_exporter_textfile_dir Ansible variable, which is set to /var/lib/node_exporter by default in https://github.com/cloudalchemy/ansible-node-exporter/blob/master/defaults/main.yml#L8. It looks to me like you set this to / somewhere and ran the role as root.
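
One way to rule this out is to print the resolved variable and stop before any ownership change if it points at the filesystem root. This is a minimal sketch, not part of the role: both tasks are hypothetical, and only the variable name and its default value come from the role.

```yaml
# Hypothetical pre-flight checks; not taken from the role itself.
- name: Show the resolved textfile directory
  debug:
    var: node_exporter_textfile_dir

- name: Refuse to continue if the textfile directory is the filesystem root
  assert:
    that:
      # Fall back to the role's documented default if the variable is unset here.
      - node_exporter_textfile_dir | default('/var/lib/node_exporter') != '/'
    msg: "node_exporter_textfile_dir must not be /"
```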

@walterdolce
Author

I have just tried to remove become: true from the playbook where this is happening. The problem persists.

@walterdolce
Author

Thanks for the quick reply @paulfantom!
I found the problem: the home: / setting in the task that creates the node_exporter user.

I removed home and the role did not mess up the permissions after a full provisioning. I suspect Ansible changes the ownership of the folder when the user module runs. And I tend to think this is expected (in the end, if you tell it where its home is, everything in that location should belong to that user:group).

The root folder is not a "home" for anyone. And I think it shouldn't be treated as such. I will send a PR immediately.

@paulfantom
Member

paulfantom commented Oct 16, 2019

Just to bring the discussion into one place.

Setting home: / was implemented to be compatible with the https://github.com/dev-sec/ansible-os-hardening role, which I don't want to break, especially since it is not uncommon to have a system user with a home directory pointing to /.

Also, looking at the failing test output from your PR, it looks to me more like a problem with the user module in Ansible, specifically with the createhome alias option. We set it to false, which should prevent creating the home directory (and changing directory permissions), but it seems the directory is created either way. When you removed home: /, it created the /home/node-exp directory, which is definitely not proper for a system user.

> And I tend to think this is expected (in the end, if you tell it where its home is, everything in that location should belong to that user:group).

Not if you don't want to create a directory, but only set it in /etc/passwd. Essentially, control over the -m switch in useradd.
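
The distinction maps onto the user module's options roughly as follows. This is a sketch, not a copy of the role's task: the user name node-exp is taken from the listing above, while the group name and shell path are assumptions for illustration. create_home is the current name of the option, with createhome as its alias.

```yaml
- name: Create a system user whose passwd entry points at / without creating a home
  user:
    name: node-exp             # as seen in the ls output above
    group: node-exp            # assumed group name; must already exist
    system: true
    shell: /usr/sbin/nologin   # assumed path (Debian/Ubuntu layout)
    home: /                    # intended to be recorded in /etc/passwd only
    create_home: false         # intended to correspond to useradd without -m
```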

@paulfantom
Member

I quickly tested if #111 works, but if you could give it a try that would be awesome.

@walterdolce
Author

Hmmm. That's interesting and these are all good points.

I just went to double-check which version of Ansible I'm using in the projects where this happened and where it didn't. Where this happens, I have Ansible pinned at this commit, whereas in the project where it didn't happen I am using Ansible 2.8.5. Interestingly, I have a third project with the same setup using Ansible 2.7.12. The issue didn't occur there either (though in that project I'm using AWS EC2 instances).

It's probably worth mentioning that where this problem occurred, I am also using the dev-sec.os-hardening and dev-sec.ssh-hardening Ansible roles (both at version 5.0.0).

@paulfantom
Member

The last time I tested the compatibility of this role with dev-sec.os-hardening was when the latter was at a 4.x.y release, so there may have been some changes. However, I would assume dev-sec.os-hardening wouldn't do such a stupid thing as changing file permissions based on /etc/passwd, so the problem must be somewhere else.

I checked the latest Ansible code for the user module, and it seems the problem shouldn't be in the alias, but it is possible that there were some changes between 2.7.12, 2.8.5, and now.

You seem to have a skew of hundreds of commits between 2.8.5 and the commit you linked to. Also, if this doesn't happen with the released Ansible versions 2.8.5 and 2.7.12 but does happen with Ansible built from some commit, then I would suspect the problem is in your Ansible build (which is not exactly a problem on our side, as you are supposed to use released versions; otherwise you are essentially on your own).
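
To compare setups, it can help to have each project print the Ansible version that actually runs the play. A minimal sketch using the ansible_version magic variable; the task is an illustration, not something the role ships:

```yaml
- name: Report the Ansible version running this play
  debug:
    msg: "Running Ansible {{ ansible_version.full }}"
```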

As for EC2, this doesn't matter, as the role has no way of checking anything in layers below the operating system.

Just to be even surer this is not happening on any entry in the matrix of supported operating systems and Ansible versions, I included a permission test in #112.
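
A check of this kind could look roughly like the following. This is only a sketch and not the contents of #112; it assumes / should be owned by root:root on the target, and the registered variable name root_dir is hypothetical.

```yaml
- name: Read ownership of the filesystem root
  stat:
    path: /
  register: root_dir   # hypothetical variable name

- name: Verify that ownership of / was not changed
  assert:
    that:
      - root_dir.stat.pw_name == "root"
      - root_dir.stat.gr_name == "root"
    msg: "/ is owned by {{ root_dir.stat.pw_name }}:{{ root_dir.stat.gr_name }}, expected root:root"
```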

@walterdolce
Author

Yes, at some point we found a bug in a version of Ansible, but a devel build (the commit in question) did not have that bug, so we pinned to that commit. I will first try the latest version of Ansible to see what happens.

Excellent MTTR (Mean Time To Reaction in this case), BTW 👏

@stale

stale bot commented Dec 1, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Dec 1, 2019
@stale stale bot closed this as completed Dec 15, 2019
@lock

lock bot commented Jan 14, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 14, 2020

2 participants