Add a daily cronjob emergency deletion of logs/files #75

Open
wants to merge 2 commits into
base: master
from


anadahz
Member

@anadahz anadahz commented Sep 1, 2016

This script checks the root filesystem's disk usage and deletes log files
and a number of other files when the root filesystem reaches one of two
critical levels of disk space usage.
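A minimal sketch of the check described above, assuming POSIX `df -P` is available. The two thresholds use the 95% and 98% figures mentioned later in this thread; mapping 95% to "critical" and 98% to "very critical", and the names `WARN_PCT`, `CRIT_PCT`, `root_usage_pct` and `usage_level`, are hypothetical and not taken from the PR's actual script. It only reports; it deletes nothing.

```shell
#!/bin/sh
# Sketch only (not the PR's script): report how full a filesystem is and
# classify it against two assumed thresholds. Nothing is deleted.
# WARN_PCT / CRIT_PCT are hypothetical names; 95 and 98 follow the
# percentages mentioned in this thread.
WARN_PCT=${WARN_PCT:-95}
CRIT_PCT=${CRIT_PCT:-98}

# root_usage_pct [MOUNTPOINT]: print used space as a bare integer.
# `df -P` guarantees one line per filesystem; column 5 is e.g. "87%".
root_usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# usage_level PCT: map a percentage to ok, critical or very-critical.
usage_level() {
    if [ "$1" -ge "$CRIT_PCT" ]; then
        echo very-critical
    elif [ "$1" -ge "$WARN_PCT" ]; then
        echo critical
    else
        echo ok
    fi
}

pct=$(root_usage_pct /)
echo "root filesystem: ${pct}% used, level: $(usage_level "$pct")"
```

A daily cron job could call `usage_level` and only act when it returns something other than `ok`.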

@hellais
Member

hellais commented Sep 1, 2016

I don't think deleting OS files like this is a good way to handle this type of situation. It can leave the system in an inconsistent state (for example, the manpages no longer exist), or it could delete log files that services still have open handles on, leading to errors in those applications.

I think the better solution for future versions is to have a better filesystem layout, where the data files live on a separate partition from the OS files.

Edit: also, how much space are we really going to save by deleting the log files and the manpages? Note: handling of critical usage levels for ooniprobe-related files is already implemented in ooniprobe itself, and the quota is checked every hour.

@anadahz
Member Author

anadahz commented Sep 1, 2016

This script will delete older log files and unneeded OS files (some of these files are already deleted when the image is built, see: https://github.com/TheTorProject/lepidopter/blob/master/lepidopter-fh/cleanup.sh).
Since we are going to keep updating lepidopter in the long run, newer packages will add more docs, manpages and translations, and will increase the number of package files retrieved into the apt cache.
If we ever reach these thresholds (95% and 98%), I guess deleting 1-2 days of logs is the least of our concerns.

@hellais try it on your Pi; you will be surprised by how much disk space it frees up.

We are using this script as a safeguard so that we do not end up with cluttered disks that make lepidopter unresponsive and stop it from providing ooniprobe reports.
In any case this script will not be triggered if ooniprobe's quota works as expected.
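The log-age criterion discussed above ("deleting 1-2 days of logs") can be previewed without deleting anything. This is a dry-run sketch that uses the same `find /var/log/ -type f -mtime +1` test that appears in the trace later in this thread; the helper names `stale_logs` and `stale_logs_kib` are hypothetical. Note that `find` truncates a file's age to whole days before comparing, so `-mtime +1` selects files last modified at least 48 hours ago, not 24.

```shell
#!/bin/sh
# Dry-run sketch: list the log files an age-based cleanup would select and
# total their disk usage. Nothing is deleted; function names are hypothetical.

# stale_logs DIR DAYS: print regular files under DIR modified more than DAYS
# days ago. find truncates ages to whole days, so "-mtime +1" only matches
# files last modified at least 48 hours ago.
stale_logs() {
    find "$1" -type f -mtime +"$2" 2>/dev/null
}

# stale_logs_kib DIR DAYS: sum of `du -k` (disk usage in KiB) of those files.
# du only needs to stat each file, so unreadable logs are still counted.
stale_logs_kib() {
    stale_logs "$1" "$2" | while IFS= read -r f; do
        du -k "$f"
    done | awk '{ s += $1 } END { print s + 0 }'
}

echo "would remove $(stale_logs /var/log 1 | wc -l) files," \
     "about $(stale_logs_kib /var/log 1) KiB"
```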

@hellais
Member

hellais commented Sep 1, 2016

Just ran this on a lepidopter that has been running for many months now and I get this:

```
+ find /usr/share/doc -depth -type f '!' -name copyright
+ xargs du -sch
32K     .
32K     total
+ find /usr/share/doc -empty
+ xargs du -sch
4.0K    /usr/share/doc/git/contrib/buildsystems
4.0K    /usr/share/doc/git/contrib/credential
4.0K    /usr/share/doc/git/contrib/subtree
4.0K    /usr/share/doc/libssl-doc/demos/engines
4.0K    /usr/share/doc/xz-utils/extra
4.0K    /usr/share/doc/python-pycparser/examples
4.0K    /usr/share/doc/adduser/examples/adduser.local.conf.examples
4.0K    /usr/share/doc/netcat-traditional/examples
32K     total
+ du -sch /usr/share/man /usr/share/groff /usr/share/info /usr/share/lintian /usr/share/linda /var/cache/man /usr/share/locale
du: cannot access '/usr/share/man': No such file or directory
du: cannot access '/usr/share/groff': No such file or directory
du: cannot access '/usr/share/info': No such file or directory
du: cannot access '/usr/share/lintian': No such file or directory
du: cannot access '/usr/share/linda': No such file or directory
28K     /var/cache/man
4.0K    /usr/share/locale
32K     total
+ find /var/log/ -type f -mtime +1
+ xargs du -sch
80K     /var/log/daemon.log.1
4.0K    /var/log/tor/log.4.gz
4.0K    /var/log/tor/log.5.gz
4.0K    /var/log/tor/log.3.gz
4.0K    /var/log/debug.1
4.0K    /var/log/syslog.7.gz
4.0K    /var/log/auth.log.3.gz
4.0K    /var/log/auth.log.4.gz
4.0K    /var/log/kern.log.2.gz
4.0K    /var/log/messages.2.gz
928K    /var/log/ooni/ooniprobe.log.2016_8_25
932K    /var/log/ooni/ooniprobe.log.2016_8_28
932K    /var/log/ooni/ooniprobe.log.2016_8_29
0       /var/log/ooni/cronjobs.log
992K    /var/log/ooni/ooniprobe.log.2016_8_30
932K    /var/log/ooni/ooniprobe.log.2016_8_27
932K    /var/log/ooni/ooniprobe.log.2016_8_26
4.0K    /var/log/messages.4.gz
4.0K    /var/log/syslog.4.gz
4.0K    /var/log/syslog.5.gz
4.0K    /var/log/debug.2.gz
8.0K    /var/log/daemon.log.3.gz
8.0K    /var/log/daemon.log.4.gz
8.0K    /var/log/daemon.log.2.gz
4.0K    /var/log/messages.3.gz
8.0K    /var/log/kern.log.3.gz
4.0K    /var/log/auth.log.2.gz
36K     /var/log/auth.log.1
4.0K    /var/log/syslog.3.gz
0       /var/log/debug
0       /var/log/btmp.1
4.0K    /var/log/wtmp.1
8.0K    /var/log/kern.log.4.gz
4.0K    /var/log/syslog.6.gz
4.0K    /var/log/kern.log.1
4.0K    /var/log/messages.1
5.8M    total
+ echo 'simulating apt-get clean'
simulating apt-get clean
+ du -sch /var/cache/apt/archives/lock /var/cache/apt/archives/partial '/var/cache/apt/archives/partial/*' /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin
0       /var/cache/apt/archives/lock
4.0K    /var/cache/apt/archives/partial
du: cannot access '/var/cache/apt/archives/partial/*': No such file or directory
36M     /var/cache/apt/pkgcache.bin
36M     /var/cache/apt/srcpkgcache.bin
71M     total
```

All in all, it seems to clean up less than 80 MB of data, 71 MB of which is the apt-get cache. This just seems to push the problem forward: it will not give a Raspberry Pi in a critical state much more mileage (80 MB is about one day's worth of ooniprobe measurements), at the risk of breaking other system services.
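The estimate above is built by adding up several `du -sch` totals, some of which fail on paths that do not exist. A dry-run helper can produce a single figure instead by summing `du -sk` over only the paths that exist. This is a sketch: `reclaimable_kib` is a hypothetical name, the example paths are copied from the trace, and nothing is removed. Paths that overlap (a directory and a file inside it) would be counted twice.

```shell
#!/bin/sh
# Dry-run sketch: estimate reclaimable space by summing `du -sk` over the
# candidate paths that exist. Missing paths are skipped instead of producing
# "cannot access" errors. reclaimable_kib is a hypothetical helper.

# reclaimable_kib PATH...: print the total disk usage, in KiB, of the
# existing PATHs. Overlapping paths are counted twice.
reclaimable_kib() {
    total=0
    for p in "$@"; do
        [ -e "$p" ] || continue
        kib=$(du -sk "$p" 2>/dev/null | awk '{ print $1 }')
        total=$((total + ${kib:-0}))
    done
    echo "$total"
}

# Example with paths taken from the trace above; the result is in KiB.
reclaimable_kib /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin \
    /var/cache/man /usr/share/man /usr/share/locale
```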

@anadahz
Member Author

anadahz commented Sep 5, 2016

My initial idea, as discussed with @bassosimone, was to delete or archive ooniprobe reports. Since ooniprobe already takes care of this, I decided not to have the emergency deletion process remove any ooniprobe reports.
80M (71M of apt cache files and 9M of other files) is still a significant amount of disk space, enough to let ooniprobe keep running and enforce the quota.

@hellais as I mentioned already, the emergency cleanup will not be activated if the quota functionality in ooniprobe-agent works as expected. This script ensures that lepidopter does not end up in a state that cannot be recovered without user intervention. The files in #75 (comment) are already deleted during the build process to reduce the size of lepidopter's final image.

If we decide to drop this and take our chances, I would at least like to know how well the quota enforcement works and whether it has been tested at all.

@hellais
Member

hellais commented Nov 1, 2016

I suggest we instead go for an approach where we enable log rotation for ooniprobe logs.

In the past there were some issues with using logrotate for ooniprobe, because Twisted's own log rotation was competing with logrotate.

This should be resolved by https://github.com/TheTorProject/ooni-probe/pull/664.

I would suggest we test whether this feature does in fact work as expected and, if so, include log rotation for ooniprobe in later versions as well.
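Assuming ooniprobe no longer rotates its own log once the linked change is in place, a logrotate stanza along these lines could be shipped with lepidopter. This is a sketch: the log path comes from the trace in this thread, while the retention values, the destination `/etc/logrotate.d/ooniprobe` and the helper names are assumptions. `copytruncate` copies the live file and truncates it in place, so ooniprobe can keep its open file handle, at the cost of possibly losing lines written between the copy and the truncate.

```shell
#!/bin/sh
# Sketch: emit a logrotate stanza for ooniprobe's log. The path is taken from
# this thread's trace; retention values and the default destination are
# assumptions. Helper names are hypothetical.

ooniprobe_logrotate_conf() {
    cat <<'EOF'
/var/log/ooni/ooniprobe.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}

# install_conf [DEST]: write the stanza and, if logrotate is installed, run
# it in debug mode (-d), which parses the config without rotating anything.
install_conf() {
    dest=${1:-/etc/logrotate.d/ooniprobe}
    ooniprobe_logrotate_conf > "$dest" || return 1
    if command -v logrotate >/dev/null 2>&1; then
        logrotate -d "$dest"
    fi
}
```

Running `install_conf /tmp/ooniprobe.conf` first is a cheap way to check the stanza before installing it system-wide.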
