When a Pelican cache needs to be put into downtime, operators still create the downtime in topology (for example, that is what feeds the daily monitoring reports).
However, it seems that if Pelican server ads are present, they override the downtime from topology and we continue to use the cache. For example, check out sc-cache.chtc.wisc.edu, which @matyasselmeci operates: it should be in a months-long downtime but it is actively receiving redirects.
Unlike other attributes, where we always prefer the Pelican version, we should have the director enforce a downtime if either source (topology or the Pelican origin/cache) has a downtime listed.
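To make the proposed rule concrete, here is a minimal sketch in Go of "a downtime from either source wins". The `ServerStatus` type, its fields, and both function names are hypothetical and chosen only for illustration; they are not Pelican's actual director types.

```go
package main

// ServerStatus is a hypothetical summary of what the director knows about
// one server. The field names are illustrative, not Pelican's real types.
type ServerStatus struct {
	Name             string
	TopologyDowntime bool // a downtime is listed in topology
	PelicanDowntime  bool // a downtime is listed by the Pelican origin/cache
}

// InDowntime reports whether the server should be treated as down.
// Unlike other attributes, neither source is preferred: either one suffices.
func InDowntime(s ServerStatus) bool {
	return s.TopologyDowntime || s.PelicanDowntime
}

// FilterRedirectable returns only the servers that may receive redirects,
// preserving the input order.
func FilterRedirectable(servers []ServerStatus) []ServerStatus {
	out := []ServerStatus{}
	for _, s := range servers {
		if !InDowntime(s) {
			out = append(out, s)
		}
	}
	return out
}
```

Under this sketch, a cache with a topology downtime would be filtered out even while it keeps sending Pelican server ads.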
Marking as critical for 7.9.0 but I think it should be backported to 7.8.x as well.
For this case, if a cache is registered at both sides, isn't it the admin's job to also disable the cache in the Pelican director? If the cache only lives in the topology, putting it in downtime removes it from the topology json, and there seems to be no explicit way of telling from the topology json whether a server is in downtime.
As @matyasselmeci pointed out, we can add ?include_downed=1 to show all the servers and then take the difference between the two lists to find the servers that are in downtime, but I'm not entirely sure this is the approach we want to take.
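For reference, the difference itself is cheap to compute. A minimal sketch in Go, assuming the server names have already been extracted from the two topology responses (one fetched with ?include_downed=1, one without); the function name and the plain string-slice inputs are assumptions, and fetching and parsing the responses are left out:

```go
package main

import "sort"

// DownedServers returns the names present in all (fetched with
// ?include_downed=1) but missing from active (fetched without it).
// The result is de-duplicated and sorted.
func DownedServers(all, active []string) []string {
	activeSet := make(map[string]struct{}, len(active))
	for _, name := range active {
		activeSet[name] = struct{}{}
	}

	seen := make(map[string]struct{})
	downed := []string{}
	for _, name := range all {
		if _, isActive := activeSet[name]; isActive {
			continue
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		downed = append(downed, name)
	}
	sort.Strings(downed)
	return downed
}
```

The cost is one extra topology request per refresh, and the result only says that a server is down, not for how long or why.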
> For this case, if a cache is registered at both sides, isn't it the admin's job to also disable the cache in the Pelican director?
Which admin?
If you mean the cache admin: I don't want a cache admin to have to repeat themselves. They should only have to declare the downtime once. Since there's no way for a cache admin to declare a downtime in Pelican (see #1251), they have to do this via topology (plus our monitoring infrastructure only looks at topology right now).
If you're talking about the central services admin: I don't think they should be declaring downtimes for all caches.
That makes sense. I'll figure out a way to let the director admin know that a downtime was fetched from topology rather than set at the director. That can then be expanded to show a generic source of downtime: topology, director UI, director configuration, or origin/cache server.
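One possible shape for that generic source, as a rough Go sketch. Every name here (`DowntimeSource`, the constants, `Downtime`, `SourcesFor`) is hypothetical and not taken from the Pelican code base:

```go
package main

// DowntimeSource identifies where a downtime was declared (hypothetical type).
type DowntimeSource int

const (
	SourceTopology DowntimeSource = iota
	SourceDirectorUI
	SourceDirectorConfig
	SourceServer
)

// String returns a label suitable for showing to the director admin.
func (s DowntimeSource) String() string {
	switch s {
	case SourceTopology:
		return "topology"
	case SourceDirectorUI:
		return "director UI"
	case SourceDirectorConfig:
		return "director configuration"
	case SourceServer:
		return "origin/cache server"
	default:
		return "unknown"
	}
}

// Downtime pairs a server name with the source that declared the downtime.
type Downtime struct {
	Server string
	Source DowntimeSource
}

// SourcesFor lists, in input order, the label of every source that has
// declared a downtime for the given server.
func SourcesFor(server string, downtimes []Downtime) []string {
	labels := []string{}
	for _, d := range downtimes {
		if d.Server == server {
			labels = append(labels, d.Source.String())
		}
	}
	return labels
}
```

Keeping one record per source, instead of a single flag per server, would let the UI show that a downtime came from topology even when the director itself has none set.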