darix edited this page Aug 6, 2020 · 3 revisions

Symlink and mirrorbrain

mirrorbrain assumes that all mirrors match the directory layout of download.opensuse.org under their basedir. A mirror may carry only part of the data; mirrorbrain handles that fine as long as the layout matches.

Current behavior

We only support in-tree symlinks right now. The algorithm for this is:

  1. resolve the symlink
  2. remove the base directory from the resolved symlink
  3. look up the new relative path in the DB
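
The three steps above can be sketched as follows. This is a minimal illustration, not the actual mirrorbrain code: the function name lookup_path is hypothetical, and it assumes that the symlink target lies inside the base directory.

```python
import os

def lookup_path(basedir: str, request_path: str) -> str:
    """Return the canonical path, relative to basedir, to use as the DB key.

    Hypothetical helper; assumes every symlink on the path stays in-tree.
    """
    base = os.path.realpath(basedir)
    # Step 1: resolve the symlink (realpath also collapses "." and "..").
    resolved = os.path.realpath(os.path.join(base, request_path.lstrip("/")))
    # Step 2: remove the base directory from the resolved path.
    relative = os.path.relpath(resolved, base)
    # Step 3: `relative` is the path that would be looked up in the DB.
    return relative
```

With a symlink update/openSUSE-stable pointing to ../leap/15.2, a request for update/openSUSE-stable/f.rpm would be looked up as leap/15.2/f.rpm.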

This has a few advantages:

  1. we can redirect even to mirrors which do not handle symlinks.
  2. if we want to run statistics, we do the whole symlink resolving once and can track the downloads only on the canonical path.

Disadvantages

  1. Some of our main mirrors set up their whole main tree via symlinks to out-of-tree directories. As a result, step 2 removes the wrong base directory from the path resolved in step 1. This can be worked around by making the base dir of the other path the same length as the mirrorbrain base directory. While it works, this is really ugly.
  2. With the current way of scanning files, we still learn both the symlinked and the expanded path, which adds unneeded rows to the database.
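
To illustrate the first disadvantage, here is a small sketch. It assumes that step 2 drops a fixed number of leading characters without checking the prefix, which is what the length-matching workaround suggests; the function name and all paths are made up for the example.

```python
def strip_by_length(resolved: str, basedir: str) -> str:
    """Drop len(basedir) leading characters without checking the prefix.

    Hypothetical model of step 2, not the actual implementation.
    """
    return resolved[len(basedir):]

basedir = "/srv/mirror/pub"  # hypothetical mirrorbrain base directory

# In-tree target: the remainder is the expected relative path.
in_tree = strip_by_length("/srv/mirror/pub/leap/15.2/repo", basedir)

# Out-of-tree target with a shorter prefix: part of the real path is cut off.
broken = strip_by_length("/data/disk1/leap/15.2/repo", basedir)

# Out-of-tree target whose prefix happens to have the same length as basedir:
# the workaround described above.
workaround = strip_by_length("/data/storage01/leap/15.2/repo", basedir)

print(in_tree, broken, workaround)
```

Only the first and the last call yield /leap/15.2/repo; the middle one returns a truncated path that will never match a row in the DB.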

Align symlink handling with other goals

  1. We want mirrorbrain hosts (like download.opensuse.org) to work without having the whole filesystem available to them. This makes scale-out, geo-based distribution much easier, but it raises the question of how to handle in-tree symlinks in this case.
  2. Support the case for mirrors that create their base tree with symlinks.

Possible solutions

Custom realpath

This is more of a short-term solution, as it still requires file system access.

char *mb_realpath(const char *path, const char *basedir, char *resolved_path);

The idea: walk the path backwards and call readlink() on each path element to see whether it is a symlink. If the symlink target is still within the basedir, resolve the relative path up to that point. If we encounter a symlink that points outside of the basedir, stop and treat the rest of the path as a directory.
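
A rough Python sketch of that idea, to make the intended behaviour concrete. It is not a port of any existing code and deviates from the description in one respect: it walks the path front to back, which keeps the bookkeeping simple. It also assumes the request path is already free of "." and ".." components, and that basedir itself contains no symlinks.

```python
import os

def mb_realpath(path: str, basedir: str, max_links: int = 40) -> str:
    """Resolve in-tree symlinks in `path`; leave out-of-tree ones untouched.

    Returns a path relative to basedir. Sketch only, see assumptions above.
    """
    base = os.path.abspath(basedir)
    pending = [p for p in path.split("/") if p]
    resolved = []  # components relative to base that need no more resolving
    links_seen = 0
    while pending:
        part = pending.pop(0)
        candidate = os.path.join(base, *resolved, part)
        if os.path.islink(candidate):
            target = os.readlink(candidate)
            # Where the link points, computed lexically from its own directory.
            dest = os.path.normpath(
                os.path.join(os.path.dirname(candidate), target))
            if dest == base or dest.startswith(base + os.sep):
                links_seen += 1
                if links_seen > max_links:
                    raise OSError("too many levels of symbolic links")
                # In-tree: restart the walk from the link target.
                rel = os.path.relpath(dest, base)
                pending = ([] if rel == "." else rel.split(os.sep)) + pending
                resolved = []
                continue
            # Out-of-tree: stop and keep the rest of the path as-is.
            resolved.append(part)
            resolved.extend(pending)
            break
        resolved.append(part)
    return "/".join(resolved)
```

Because out-of-tree symlinks are never followed, the result always stays relative to the mirrorbrain base directory, which avoids the prefix problem from step 2 of the current algorithm.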

This would still need careful consideration, as realpath actually does a lot more than just resolve symlinks. To quote the man page:

"realpath() expands all symbolic links and resolves references to /./, /../ and extra '/' characters in the null-terminated string named by path to produce a canonicalized absolute pathname. The resulting pathname is stored as a null-terminated string, up to a maximum of PATH_MAX bytes, in the buffer pointed to by resolved_path. The resulting path will have no symbolic link, /./ or /../ components. If resolved_path is specified as NULL, then realpath() uses malloc(3) to allocate a buffer of up to PATH_MAX bytes to hold the resolved pathname, and returns a pointer to this buffer. The caller should deallocate this buffer using free(3)."

Last but not least, this approach does not align nicely with a mirrorbrain without FS access.

Config file approach

On download.opensuse.org in particular, we have two forms of indirection:

  1. rewrite rules
  2. in-tree symlinks

This is a bit annoying in itself:

  1. It is hard to get an overview of what is going on when you try to debug why people get pointed to a certain file.
  2. rewrite rules would need to be replicated between download.opensuse.org and downloadcontent.opensuse.org
  3. rewrite rules are always resolved before we hit the mirrorbrain code, unlike symlinks, which we have to resolve ourselves.

Maybe we want to unify this mapping into one config file which then can be used by all parts of mirrorbrain:

mirrorbrain:
  mappings:
    'update/openSUSE-stable':
      target: 'leap/15.2'
      symlink: true
    'repositories/openSUSE:Factory/':
      target: 'tumbleweed/'

The first mapping would be applied to the filesystem, e.g. via a command such as mb mappings apply.

For the second mapping, the same command would generate rewrite rules for each webserver involved (in our case Apache httpd and nginx).

# nginx
rewrite ^/repositories/openSUSE:Factory/(.*)$ /tumbleweed/$1 last;

# Apache
RewriteEngine on
RewriteRule ^/repositories/openSUSE:Factory/(.*)$ /tumbleweed/$1 [L]
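
How such a command might turn the mapping into the rules above, as a small sketch. The mb mappings apply command does not exist yet, so the function name and the dict (which mirrors the YAML above rather than parsing it) are assumptions made for illustration.

```python
import re

# Mirrors the YAML mapping above; a real tool would load the config file.
MAPPINGS = {
    "update/openSUSE-stable": {"target": "leap/15.2", "symlink": True},
    "repositories/openSUSE:Factory/": {"target": "tumbleweed/"},
}

def rewrite_rules(mappings: dict) -> dict:
    """Build nginx and Apache rewrite rules for all non-symlink mappings."""
    rules = {"nginx": [], "apache": ["RewriteEngine on"]}
    for source, entry in mappings.items():
        if entry.get("symlink", False):
            continue  # applied on the filesystem, not in the web server
        # re.escape follows Python's regex syntax; assumed to be close
        # enough to the PCRE syntax used by nginx and Apache for plain paths.
        src = re.escape(source.strip("/"))
        dst = entry["target"].strip("/")
        rules["nginx"].append(f"rewrite ^/{src}/(.*)$ /{dst}/$1 last;")
        rules["apache"].append(f"RewriteRule ^/{src}/(.*)$ /{dst}/$1 [L]")
    return rules

for server, lines in rewrite_rules(MAPPINGS).items():
    print(f"# {server}")
    print("\n".join(lines))
```

Generating both rule sets from one source keeps nginx and Apache hosts in sync, which addresses the replication problem mentioned above.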

That mapping file would allow us to resolve all known mappings (symlinks or rewrite rules) in mod_mirrorbrain now, or in other server implementations in the future, and to store only canonical paths in the DB. We could even store the whole mapping information in the DB and load it from there on startup/reload.

CREATE TABLE mb_file_mappings (
  id bigint GENERATED ALWAYS AS IDENTITY,
  source text,
  target text,
  symlink boolean DEFAULT 'f'
);
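
Once the rows are loaded, resolving a request path without file system access could look roughly like the sketch below. The function name, the longest-prefix rule and the decision to follow chained mappings are assumptions, not something the design above has settled.

```python
def canonical_path(path: str, rows: list, max_hops: int = 16) -> str:
    """Map a request path to its canonical path using (source, target) rows.

    `rows` stands in for the contents of mb_file_mappings. The longest
    matching source prefix wins; chained mappings are followed up to
    max_hops times so that a loop in the table cannot hang the lookup.
    """
    path = path.strip("/")
    table = sorted(
        ((source.strip("/"), target.strip("/")) for source, target in rows),
        key=lambda row: len(row[0]),
        reverse=True,
    )
    for _ in range(max_hops):
        for source, target in table:
            if path == source or path.startswith(source + "/"):
                path = target + path[len(source):]
                break
        else:
            return path  # no mapping applies any more
    raise ValueError("mapping loop detected")

rows = [
    ("update/openSUSE-stable", "leap/15.2"),
    ("repositories/openSUSE:Factory/", "tumbleweed/"),
]
print(canonical_path("/update/openSUSE-stable/repo/f.rpm", rows))
```

Since symlinks and rewrite rules are treated the same way here, the DB lookup only ever sees canonical paths.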