build.opensuse.org

Eduardo J. edited this page Oct 16, 2024 · 14 revisions

Deploying

We deploy using Ansible; see ansible-obs for details.

Application Server

We use Apache in combination with Passenger as the application server.

Passenger

Passenger has a command-line tool that you can use to get more information:

passenger-status # shows an overview
passenger-status --show=server # shows detailed information

You can also gracefully restart the Passenger threads (they stop once they have processed their last request, and new ones get booted) like this:

touch tmp/restart.txt

Apache

The systemd unit for Apache is called apache2.

systemctl status|start|stop|restart apache2

Put apache into maintenance mode

  1. Put apache into maintenance mode in /etc/sysconfig/apache2

      APACHE_SERVER_FLAGS="STATUS MAINTENANCE"
    
  2. Restart apache

      rcapache2 restart
    
  3. Do whatever you need to do to fix the problem.

  4. Put apache out of maintenance mode in /etc/sysconfig/apache2

      APACHE_SERVER_FLAGS="STATUS"
    
  5. Restart apache

      rcapache2 restart
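
The flag change in steps 1 and 4 could be scripted. The following is only a minimal sketch, not the tooling actually used on the reference server: the function name set_apache_flags is hypothetical, and it assumes the file contains a single APACHE_SERVER_FLAGS="..." line and that the flags contain no | or & characters. It only edits the file; restarting Apache stays a separate, manual step.

```shell
# Hypothetical helper: rewrite the APACHE_SERVER_FLAGS line in a sysconfig file.
# Usage: set_apache_flags FILE "FLAGS"
set_apache_flags() {
  file="$1"
  flags="$2"
  tmp=$(mktemp) || return 1
  # Replace the whole assignment line; all other lines are copied unchanged.
  if sed "s|^APACHE_SERVER_FLAGS=.*|APACHE_SERVER_FLAGS=\"$flags\"|" "$file" > "$tmp"; then
    # Write back into the same file so its ownership and permissions are kept.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Enter maintenance mode (then restart Apache):
#   set_apache_flags /etc/sysconfig/apache2 "STATUS MAINTENANCE"
# Leave maintenance mode again (then restart Apache):
#   set_apache_flags /etc/sysconfig/apache2 "STATUS"
```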
    

Log Files

  • Apache: /srv/www/obs/api/log/apache_access.log and /srv/www/obs/api/log/error.log
  • Passenger: /var/log/apache2/passenger_log

Ruby on Rails

We run the Rails application as the user wwwrun. Whatever you want to do, do it as this user as well, to avoid creating files or running services with the wrong permissions. To do so, just prepend run_in_api to the command, for example:

run_in_api rails console

pry, which we use as the Rails console, has a nice history feature. You can look at the history here:

/var/lib/wwwrun/.local/share/pry/pry_history

You can toggle admin rights for a user with

run_in_api rake user:toggle_admin_rights theusername

Log Files

  • Ruby on Rails: /srv/www/obs/api/log/production.log
  • Ruby on Rails calling the backend: /srv/www/obs/api/log/backend_access.log

Troubleshooting Assets

From time to time we have issues with the CSS/JS assets. If application.css or application.js is missing (you will notice unusual errors in your browser's JavaScript console, especially a 404 when trying to retrieve the file), there is probably more than one Sprockets manifest in production. Go to the public/assets folder, check which manifest comes from the package, and delete the one that doesn't. After that, reload the application.

cd public/assets
rpm -qf .sprockets-manifest*
rm .sprockets-manifest-$SOMEHASH.json
cd ../..
touch tmp/restart.txt
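
Before deleting anything, it may help to confirm that there really is more than one manifest. This is a minimal sketch under the assumption that manifests match .sprockets-manifest-*.json directly inside the assets directory; the function name count_manifests is hypothetical.

```shell
# Hypothetical helper: count Sprockets manifests directly inside a directory.
# Usage: count_manifests [DIR]   (DIR defaults to public/assets)
count_manifests() {
  dir="${1:-public/assets}"
  count=0
  for f in "$dir"/.sprockets-manifest-*.json; do
    # An unmatched glob stays literal, so only count paths that exist.
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

A result greater than 1 points to the duplicate-manifest situation described above.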

Local Services

For delayed jobs, sphinx, postfix etc. there are systemd units and a target. You can issue commands for all services...

systemctl stop obs-api-support.target

...or on single units.

systemctl stop obs-sphinx.service

To make sure all units are running fine (displayed as active (running), in green):

systemctl list-dependencies obs-api-support.target

To get an overview of events/jobs, you can run:

run_in_api rails runner script/delayed_job_stats.rb

Log Files

  • searchd: /srv/www/obs/api/log/production.searchd.log and /srv/www/obs/api/log/production.searchd.query.log
  • clockworkd: /srv/www/obs/api/log/clockworkd.clock.output
  • Postfix: /var/log/mail
  • systemd unit logs are available via journalctl -u $service, for example journalctl -u obs-clockwork.service. You can also filter messages by time range (using either a timestamp or a keyword like "yesterday"). A timestamp that contains a space must be quoted. See man journalctl for details.
    journalctl -u $SERVICE --since now|today|yesterday|tomorrow --until "YYYY-MM-DD HH:MM:SS"

Remote Services

We also use a couple of services that are somewhere else in the network.

RabbitMQ

We push a lot of events and metrics to https://rabbit.opensuse.org/

The HTML frontend of that service shows you live events, so click that link to see if it's working in general.

When you see exceptions like SSL_connect SYSCALL returned=5 errno=0 state=unknown state, this usually means that there is an issue with the RabbitMQ server or the connection to it. The maintenance window of the RabbitMQ server is Thursday, 8:00am to 10:00am CET, which can also cause this issue.

Errbit may also report AMQ::Protocol::EmptyResponseError: Empty response received from the server. This does not happen when the reference server is deployed, but when the RabbitMQ machine is (for regular security updates, for example).

Errbit

We push exceptions to https://errbit.opensuse.org

To make sure that Errbit is working, run

run_in_api rake airbrake:test

and it will send an Airbrake event to our Errbit.

Removing a bunch of exceptions from Errbit

Remember:

GitHub

OBS packages are built whenever a PR gets merged to master. This might delay publishing of built packages. To prevent this, disable the OBS integration in GitHub:

  1. Go to the settings of the OBS GitHub project
  2. Select the 'Integration & services' tab and click on 'Edit' in the OBS column
  3. Uncheck the 'Activate' checkbox and click 'Update services'
  4. Once the deployment is done, activate the checkbox again ;-)

Linux

Of course we deploy on Linux so you can use a lot of the cool tools/features it brings.

Bash History

We record shell commands with timestamps in /root/.bash_history. You can view them in chronological order with

history | tr --squeeze-repeats " " | cut -d " " -f 3- | sort

Who is/was logged in?

journalctl -u sshd --utc -o json --since "6 hours ago" | ruby /root/who.rb

Poor Man's Analytics with awk

Grep the top 15 IPs requesting Webui::PackageController#view_file

grep 'controller=Webui::PackageController action=view_file' log/production.log |awk 'match($0,/\yhost=(\S+)/, arr) {print arr[1]}'|sort |uniq -c|sort -n|tail -n 15
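
The one-liner above relies on gawk extensions (the three-argument match and \y). Where gawk is not available, the same counting can be sketched with grep and sed. This is a minimal sketch, assuming a grep that supports -o and log lines that carry a whitespace-separated host=... field; the function name top_hosts is hypothetical.

```shell
# Hypothetical helper: print the N most frequent host= values read from stdin.
# Usage: top_hosts [N] < logfile   (N defaults to 15)
top_hosts() {
  n="${1:-15}"
  # Match "host=..." only at line start or after whitespace,
  # so that fields such as "vhost=" are not counted.
  grep -oE '(^|[[:space:]])host=[^[:space:]]+' \
    | sed 's/^[[:space:]]*host=//' \
    | sort | uniq -c | sort -n | tail -n "$n"
}
```

It can be fed the same way as the awk version, for example: grep 'controller=Webui::PackageController action=view_file' log/production.log | top_hosts 15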

See Exceptions

Grep the exceptions we saw in the log file (minus the boring ones we've filtered out).

grep -A 1 FATAL log/production.log |grep -v "ActiveRecord::RecordNotFound\|ActionController::RoutingError\|ActionController::InvalidAuthenticityToken\|ActionController::UnknownFormat\|ActiveRecord::RecordNotUnique"|grep -v FATAL|grep -v -- '--'

Grep the request UUID for the full backtrace.

What are people currently accessing?

For instance, when some thread is taking a very long time or starts blocking the database, you want to see what is happening now (threads running) instead of what happened in the past (log entries).

passenger-status --show server|grep path

What files are modified in production?

In production, some files can differ from what we expect to have after the package installation. Some of those files are permanently modified, such as configuration files. Others can be temporarily modified, for example when applying monkey patches.

It is helpful to verify the packages and discover which of the installed files differ.

rpm -V obs-api
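
rpm -V marks configuration files with a c between the attribute flags and the path. To make temporary modifications stand out, those lines can be filtered away. This is a minimal sketch, assuming paths without whitespace; the function name non_config_changes is hypothetical.

```shell
# Hypothetical helper: drop configuration-file lines from `rpm -V` output.
# Usage: rpm -V obs-api | non_config_changes
non_config_changes() {
  # Config lines look like "S.5....T.  c /path": the second field is "c".
  # On all other lines the second field is the path, so they are printed.
  awk '!(NF >= 3 && $2 == "c")'
}
```

Whatever remains in the output is a non-configuration file that differs from the packaged version.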

Troubleshooting

Update Email in OBS

The reference server uses a proxy (IDP) to handle users. Sometimes users have problems updating their email. These are the steps:

  • Change the email in IDP.
  • Verify the email in the account system by entering your username at https://idp-portal.suse.com/univention/self-service/#page=verifyaccount (in a perfect world, this would be triggered automatically by the request to change emails, but it is not yet).
  • Wait a few minutes to sync and then log into OBS for it to pick up the update.