Performance Matrix #40
Could you provide more insights on how to weight all tests based on the algorithm?
As discussed, I'd approach that from a code perspective and maybe later try to optimize or convert it to a (complex) query. In that sense, here's how to roughly compute a cell value:

    # Mirror, Country, Region, Test, get_mirrors(), get_mirrors_in(),
    # get_all_tests_for() and median_of() are assumed to exist elsewhere

    def to_percentage(scores: list[float]) -> list[float]:
        """Share of each score in the total"""
        total = sum(scores)
        if not total:
            # avoid dividing by zero when there is no score at all
            return [0.0 for _ in scores]
        return [score / total for score in scores]

    def get_percentage_for(mirror: Mirror, within: list[Mirror]) -> float:
        """Share of `mirror` among `within`, 0 if it is not part of it"""
        # list.index() raises ValueError on a missing item (it never returns -1)
        if mirror not in within:
            return 0
        index = within.index(mirror)
        return to_percentage([mirror_.score for mirror_ in within])[index]

    def get_list_of_mirrors_serving_country(country: Country) -> list[Mirror]:
        def is_serving_country(mirror: Mirror) -> bool:
            # exclude mirrors serving a single country that is not this one
            # we don't have nor plan to have this kind of config though
            if mirror.only_country and mirror.country != country:
                return False
            # exclude mirrors serving a single region if not ours
            if mirror.only_region and mirror.region != country.region:
                return False
            # same but using the countries field
            if mirror.other_countries and country not in mirror.other_countries_mapped:
                return False
            return True

        return list(filter(is_serving_country, get_mirrors()))

    def get_list_of_mirrors_serving_region(region: Region) -> list[Mirror]:
        def is_serving_region(mirror: Mirror) -> bool:
            # exclude mirrors serving a single country not in our region
            if mirror.only_country and mirror.country not in region.all_countries:
                return False
            # exclude mirrors serving a single region if not ours
            if mirror.only_region and mirror.region != region:
                return False
            # same but using the countries field:
            # exclude the mirror as soon as one country of the region is not listed
            if mirror.other_countries:
                for country in region.all_countries:
                    if country not in mirror.other_countries_mapped:
                        return False
            return True

        return list(filter(is_serving_region, get_mirrors()))

    def is_mb_possible(test: Test) -> bool:
        """Whether such a combination is possible with MB configuration"""
        # exclude mirrors serving a single country that is not the test's
        if test.mirror.only_country and test.mirror.country != test.country:
            return False
        # exclude mirrors serving a single region if not ours
        if test.mirror.only_region and test.mirror.region != test.country.region:
            return False
        # same but using the countries field
        if (
            test.mirror.other_countries
            and test.country not in test.mirror.other_countries_mapped
        ):
            return False
        return True

    def compute_country_perf(mirror: Mirror, country: Country) -> float:
        """performance value for a country/mirror cell

        represents the weighted median speed in the context of serving mirrors"""
        all_tests = list(
            filter(is_mb_possible, get_all_tests_for(mirror=mirror, country=country))
        )
        mirrors_in_country = get_mirrors_in(country=country)

        # single mirror in that country: all tests are considered
        # this condition is unnecessary as elif below already takes care of it
        # but this explains what goes on
        if mirrors_in_country == [mirror]:
            return median_of(all_tests)
        # there are multiple mirrors for this country, balance results
        elif mirrors_in_country:
            mirror_weight_for_country = get_percentage_for(mirror, mirrors_in_country)
            return median_of(all_tests) * mirror_weight_for_country

        # no mirror for this country
        mirrors_in_region = get_mirrors_in(region=country.region)
        # we have mirror(s) in the region. those are serving all requests for this country
        if mirrors_in_region:
            # weight among the region's mirrors (mirrors_in_country is empty here)
            mirror_weight_for_region = get_percentage_for(mirror, mirrors_in_region)
            return median_of(all_tests) * mirror_weight_for_region

        # we don't have a mirror in the region, using fallback mirrors
        serving_mirrors = get_list_of_mirrors_serving_country(country)
        mirror_weight_general = get_percentage_for(mirror, serving_mirrors)
        return median_of(all_tests) * mirror_weight_general

    def compute_region_perf(mirror: Mirror, region: Region) -> float:
        """performance value for a region/mirror cell"""
        all_tests = list(
            filter(is_mb_possible, get_all_tests_for(mirror=mirror, region=region))
        )
        mirrors_in_region = get_mirrors_in(region=region)

        # we have mirror(s) in the region. those are serving all requests
        if mirrors_in_region:
            mirror_weight_for_region = get_percentage_for(mirror, mirrors_in_region)
            return median_of(all_tests) * mirror_weight_for_region

        # we don't have a mirror in the region, using fallback mirrors
        other_mirrors = get_list_of_mirrors_serving_region(region)
        mirror_weight_general = get_percentage_for(mirror, other_mirrors)
        return median_of(all_tests) * mirror_weight_general

With a matrix like this one, I'd expect:
Because we are now to configure mirrorbrain for regions (#38 and #39), we need to be able to extract some data by region. This assumes that the region and score data are all in the database.
We thus need the following (to be discussed):
Performance Matrix
It's important we're able to tell whether this new mirrorbrain configuration (serving per-region with 2 fallback mirrors) is an improvement over the current configuration.
For this we need a way to measure the service's performance. That is: what speed can people expect to get when downloading from us?
To get closer to reality (which we cannot measure), we have to get closer to how mirrorbrain decides which mirror to direct each request to.
Here's thus the simplified mirrorbrain algorithm we'll be mimicking:
That `sorted_by_score` would return a mirror x% of the time, with x being based on the score.
The output of this would be a table with a row per mirror and a column per country (or the opposite), each cell giving the median download speed. In addition to a column per country, we'd need one column per region.
What's important here is that we don't use the `Test` table directly (that we already have) but weight all tests based on that algorithm.
Once this ticket is implemented, the output of this matrix will be our baseline data.
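To make the "a mirror x% of the time" idea concrete, here is a minimal sketch of a score-weighted pick. It assumes the probability of picking a mirror is simply its score divided by the sum of the candidates' scores; the `Mirror` class, `pick_mirror` and `expected_share` are hypothetical names for illustration, not mirrorbrain's actual code.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    """Hypothetical stand-in for a mirror record (name and score only)."""

    name: str
    score: int


def pick_mirror(candidates: list[Mirror], rng: random.Random) -> Mirror:
    """Pick one candidate with a probability proportional to its score.

    Assumption: this proportional rule approximates the real selection.
    """
    weights = [mirror.score for mirror in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def expected_share(candidates: list[Mirror]) -> dict[str, float]:
    """Theoretical share of requests each candidate would receive."""
    total = sum(mirror.score for mirror in candidates)
    return {mirror.name: mirror.score / total for mirror in candidates}


if __name__ == "__main__":
    # made-up scores, for illustration only
    mirrors = [Mirror("mirror-a", 300), Mirror("mirror-b", 100)]
    rng = random.Random(0)
    draws = Counter(pick_mirror(mirrors, rng).name for _ in range(10_000))
    print("expected:", expected_share(mirrors))
    print("observed:", {name: count / 10_000 for name, count in draws.items()})
```

Over many draws the observed shares should approach the expected ones, which is why the matrix can use the share as a weight instead of simulating individual requests.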
Once we're happy with our baseline data, we'll update mirrorbrain config (recording the date) and let the tests run for a while (at least a month) then we can reprocess the performance matrix and compare the results with the baseline.
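The baseline comparison could be as simple as a cell-by-cell relative change between two matrices. The sketch below assumes each matrix is stored as a dict keyed by `(mirror, country_or_region)`; the `Matrix` alias and the `compare` function are hypothetical, and the values in the usage example are made up.

```python
from typing import Optional

# (mirror name, country or region code) -> weighted median speed
Matrix = dict[tuple[str, str], float]


def compare(baseline: Matrix, current: Matrix) -> dict[tuple[str, str], Optional[float]]:
    """Relative change per cell: (current - baseline) / baseline.

    A cell gets None when it is missing on either side or when the
    baseline value is zero, since no ratio can be computed then.
    """
    changes: dict[tuple[str, str], Optional[float]] = {}
    for key in sorted(baseline.keys() | current.keys()):
        before = baseline.get(key)
        after = current.get(key)
        if before is None or after is None or before == 0:
            changes[key] = None
        else:
            changes[key] = (after - before) / before
    return changes


if __name__ == "__main__":
    # made-up values, for illustration only
    baseline = {("mirror-a", "FR"): 10.0, ("mirror-a", "IN"): 5.0}
    current = {("mirror-a", "FR"): 12.0, ("mirror-b", "IN"): 4.0}
    for cell, change in compare(baseline, current).items():
        print(cell, "n/a" if change is None else f"{change:+.0%}")
```

Cells that appear or disappear between the two runs are kept as `None` on purpose: with an `onlyRegion` flag in place, a mirror losing its values for some countries is itself a result worth seeing.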
Baseline examples
If I want to look at results for France, I know it's gonna be simple because France has a mirror (2 actually), so there will be data only for 2 mirrors. Of those, roughly 97% of the tests will be using the kiwix mirror (score is `3000` ATM) and roughly 3% the mblibrary-fr one (score is `100`). How we compute the weighted speed is open for discussion: we can either pick x% of the tests and take the median of those, or pick the general median and weight it using the percentage.
Another example is India. With a single mirror, there will be only one matching row and that value will be the median of that mirror's tests from India.
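The two weighting options mentioned for France can be put side by side in a small sketch. The scores are the ones quoted above (3000 and 100); the speeds are made up, and `share`, `median_of_sample` and `weighted_general_median` are hypothetical helper names.

```python
import random
import statistics


def share(score: float, all_scores: list[float]) -> float:
    """Fraction of requests a mirror is expected to receive, based on scores."""
    return score / sum(all_scores)


def median_of_sample(speeds: list[float], weight: float, rng: random.Random) -> float:
    """Option 1: keep a random `weight` fraction of the tests, then take their median."""
    keep = max(1, round(len(speeds) * weight))
    return statistics.median(rng.sample(speeds, keep))


def weighted_general_median(speeds: list[float], weight: float) -> float:
    """Option 2: take the median of all tests, then scale it by the weight."""
    return statistics.median(speeds) * weight


if __name__ == "__main__":
    scores = [3000, 100]  # kiwix and mblibrary-fr scores quoted in the issue
    kiwix_share = share(3000, scores)
    speeds = [8.0, 9.5, 10.0, 11.0, 12.5, 14.0]  # made-up test speeds
    rng = random.Random(0)
    print(f"kiwix share: {kiwix_share:.1%}")
    print("option 1:", median_of_sample(speeds, kiwix_share, rng))
    print("option 2:", weighted_general_median(speeds, kiwix_share))
```

The two options answer different questions: option 1 stays a speed (the median of a subset of the tests), while option 2 produces a scaled value that is only meaningful relative to the other cells of the same column.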
If I look at a mirror instead of a country, say the `fau.de` mirror line, it will have values for most countries because there is no `onlyXX` parameter set in mirrorbrain at the moment, so any Test not from a country which has its own mirror has a 14% chance of being assigned to fau.de and thus uses its speed.
Future examples
After kiwix/container-images#263, results should be significantly different (whether this requires code change to build this matrix is TBD):
Results for countries with a mirror should be somewhat similar to the baseline as this is still prioritized. New scores will play a big role though.
Countries in the EU, North America and Asia should not change much as they have a mirror in their region. Scores would also play a role. Countries in South America, on the other hand, should change a lot, and so should those in Africa and Oceania.
If I look at a mirror, say the `fau.de` mirror line, it will have values for many European countries because there are plenty of European countries but not a mirror in each. There should not be results for France, UK, Denmark, Sweden, Moldova and Israel because each has a mirror. There should not be any results from outside the EU either, because fau.de will have the `onlyRegion` flag set.
Only the kiwix mirror and the Moldova one will have results for countries in regions where there is no mirror (Africa, Oceania).