If the upstreams don't agree on the latest registry, mirror all of them #8

Open
johnnychen94 opened this issue Aug 12, 2020 · 2 comments

@johnnychen94
Owner

It makes little sense to implement the complex logic below here: a storage server doesn't really care which registry hash is the latest one. Simply pulling down all registry tarballs, along with whatever packages they contain, would be good enough.

```julia
# `RegistryMeta`, `url_exists` and `verify_registry_hash` are defined elsewhere in this package.
function query_latest_hash(registry::RegistryMeta, upstreams::AbstractVector{<:AbstractString})
    # collect current registry hashes from servers
    uuid = registry.uuid
    hash_info = Dict{String, Vector{String}}() # Dict(hashA => [serverA, serverB], ...)
    servers = String[] # [serverA, serverB]
    for server in upstreams
        hash = query_latest_hash(registry, server)
        isnothing(hash) && continue
        push!(get!(hash_info, hash, String[]), server)
        push!(servers, server)
    end
    if isempty(hash_info)
        # none of the upstreams contains the registry we want to mirror
        @warn "failed to find available registry" registry=registry.name upstreams=upstreams
        return nothing
    end
    # for each hash, check which other servers know about it
    # (a hash might be known to many upstreams)
    for (hash, hash_servers) in hash_info
        for server in servers
            server in hash_servers && continue
            url_exists("$server/registry/$uuid/$hash") || continue
            push!(hash_servers, server)
        end
    end
    # Ideally, there is an upstream server that knows all hashes, and we use the hash
    # reported by that server as the latest hash.
    # In practice, we pick the first verified hash known to the fewest servers as the latest
    # hash; sorting lexicographically first makes the (stable) second sort break ties by name.
    hashes = sort!(collect(keys(hash_info)))
    sort!(hashes, by = hash -> length(hash_info[hash]))
    idx = findfirst(x -> verify_registry_hash(registry.source_url, x), hashes)
    isnothing(idx) && return nothing # no candidate hash passed verification
    return hashes[idx]
end
```
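The selection step at the end of `query_latest_hash` can be exercised without any network access. The sketch below is a hypothetical, self-contained stand-in, not code from this project: `url_exists` checks are replaced by a `Dict` mapping each server to the set of hashes it can serve, and `verify_registry_hash` by an arbitrary predicate. The names `pick_latest_hash` and `known` are illustrative only.

```julia
# Hypothetical offline sketch of the hash-selection step in `query_latest_hash`.
# `known[server]` is the set of registry hashes that `server` can serve
# (standing in for `url_exists` checks); `verify(hash)` stands in for
# `verify_registry_hash`.
function pick_latest_hash(known::Dict{String,Set{String}}, verify::Function)
    # every hash that at least one server knows, in lexicographic order
    hashes = sort!(collect(reduce(union, values(known); init = Set{String}())))
    # number of servers that know a given hash
    count_known(h) = count(s -> h in known[s], keys(known))
    # stable sort: fewest servers first, lexicographic order breaks ties
    sort!(hashes; by = count_known, alg = MergeSort)
    idx = findfirst(verify, hashes)
    return isnothing(idx) ? nothing : hashes[idx]
end

# serverA reports "h2" and still serves the older "h1"; serverB only has "h1".
known = Dict("serverA" => Set(["h1", "h2"]), "serverB" => Set(["h1"]))
pick_latest_hash(known, h -> true)        # "h2", known to a single server
pick_latest_hash(known, h -> h != "h2")   # "h1", if "h2" fails verification
```

The "fewest servers" heuristic works because an up-to-date server usually still serves older hashes, while a lagging server cannot serve newer ones.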

@skyzh

skyzh commented Aug 12, 2020

Is there any way to tell which one is the "latest" (e.g. from a timestamp)?

@johnnychen94
Owner Author

johnnychen94 commented Aug 12, 2020

Unfortunately not; at least, the current Pkg/Storage server protocol doesn't specify this.

JuliaLang/Pkg.jl#1377
> One subtlety is how the Pkg Server determines what the latest version of each registry is. It can get a map from registry UUIDs to version hashes from each Storage Server, but hashes are unordered: if multiple Storage Servers reply with different hashes, which one should the Pkg Server use? When Storage Servers disagree on the latest hash of a registry, the Pkg Server should ask each Storage Server about the hashes that the other servers returned: if Service A knows about Service B's hash but B doesn't know about A's hash, then A's hash is more recent and should be used. If each server doesn't know about the other's hash, then neither hash is strictly newer than the other one and either could be used. The Pkg Server can break the tie any way it wants, e.g. randomly or by using the lexicographically earlier hash.
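The pairwise rule in the quote can be written down directly. This is a hypothetical illustration, not code from Pkg.jl or this project: `reported[server]` is the hash each server returns as its latest, and `knows(server, hash)` is an assumed predicate (in practice something like `url_exists("$server/registry/$uuid/$hash")`). A hash is kept unless another reported hash is strictly newer; remaining ties are broken lexicographically, one of the options the quote allows.

```julia
# Hypothetical sketch of the tie-breaking rule quoted from JuliaLang/Pkg.jl#1377.
# `reported[server]` is the hash a server returns as its latest;
# `knows(server, hash)` says whether that server can serve `hash`.

# `a` (reported by `sa`) is strictly newer than `b` (reported by `sb`)
# if `sa` knows `b` but `sb` does not know `a`.
is_newer(knows, sa, a, sb, b) = knows(sa, b) && !knows(sb, a)

function latest_hash(reported::Dict{String,String}, knows::Function)
    isempty(reported) && return nothing
    entries = collect(reported)  # [server => hash, ...]
    candidates = String[]
    for (sa, a) in entries
        # keep `a` unless some other reported hash is strictly newer than it
        dominated = any(p -> p.second != a && is_newer(knows, p.first, p.second, sa, a), entries)
        dominated || push!(candidates, a)
    end
    # only possible if "newer" forms a cycle; then fall back to every reported hash
    isempty(candidates) && append!(candidates, last.(entries))
    return minimum(candidates)  # tie-break: lexicographically earliest hash
end

known = Dict("serverA" => Set(["h1", "h2"]), "serverB" => Set(["h1"]))
latest_hash(Dict("serverA" => "h2", "serverB" => "h1"), (s, h) -> h in known[s])  # "h2"
```

Unlike the "fewest servers" count, this compares hashes pairwise, so it needs one `knows` query per (server, other hash) pair.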

For a storage server, mirroring all registry tarballs covers the "latest" tarball as a special case. This frees us from the complex "which one is the latest" logic: we can use a plain `for` loop that mirrors the same registry once per hash the upstreams report (most of these cases will be trivial).
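A minimal sketch of that plain loop, under stated assumptions: `query_hash(server)` stands in for the single-server `query_latest_hash(registry, server)` method, and `mirror(server, hash)` is a hypothetical callback that downloads one registry tarball plus the packages it references. Both are passed in as functions so the loop can be shown without network access; `mirror_all_hashes` is an illustrative name, not an existing function in this project.

```julia
# Hypothetical sketch: mirror every distinct registry hash reported by the upstreams,
# instead of deciding which one is "latest".
# `query_hash(server)` returns the hash a server reports (or `nothing`);
# `mirror(server, hash)` downloads that registry tarball and its packages.
function mirror_all_hashes(upstreams::AbstractVector{<:AbstractString},
                           query_hash::Function, mirror::Function)
    mirrored = String[]
    for server in upstreams
        hash = query_hash(server)
        isnothing(hash) && continue   # this upstream doesn't have the registry
        hash in mirrored && continue  # already mirrored from an earlier upstream
        mirror(server, hash)
        push!(mirrored, hash)
    end
    isempty(mirrored) && @warn "failed to find available registry" upstreams
    return mirrored
end

reports = Dict("serverA" => "h2", "serverB" => "h1", "serverC" => "h2")
calls = Tuple{String,String}[]
mirror_all_hashes(["serverA", "serverB", "serverC"], s -> get(reports, s, nothing),
                  (s, h) -> push!(calls, (s, h)))
# calls == [("serverA", "h2"), ("serverB", "h1")]
```

When all upstreams agree, the loop mirrors a single hash and every later iteration is skipped.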
