Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start plumbing redirect choice info into Director's response body #2054

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jhiemstrawisc
Copy link
Member

One of the most frequent user/integrator issues I've encountered is "why is the Director choosing this list of caches/origins to redirect me to." As this redirection logic grows more complex, it may be useful to empower curious users to inspect some of the underlying logic themselves (with the appropriate debug logging flags). Not only does provide information about the Director's choices, but it might also let users identify issues that we can solve, such as client IPs that aren't resolvable with MaxMind.

This PR should accomplish two things:

  • Add information about the Director's redirect choices to its response body for both cache & origin redirects
  • Display that information in the client when debug information is turned on

@jhiemstrawisc jhiemstrawisc added this to the v7.15 milestone Feb 26, 2025
@jhiemstrawisc jhiemstrawisc added enhancement New feature or request client Issue affecting the OSDF client director Issue relating to the director component labels Feb 26, 2025
Copy link
Collaborator

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think it's a violation of the GeoIP database license to provide this information in a web service.

I think we can release whether or not there's an accurate estimate, whether we leveraged a location override, etc. Just can't state a location.

@jhiemstrawisc
Copy link
Member Author

jhiemstrawisc commented Feb 27, 2025

@bbockelm, can you point me to where you're determining that this would violate their license? I spent some time trying to determine this today, and I came to the opposite conclusion. My observations were:

We use the GeoLite2 databases, so we fall under the "GeoLite2 End User License Agreement"

  • The first section of the "GeoLite2 End User License Agreement" states:
    "The following policies are incorporated into this Agreement by reference and provide additional terms and conditions incorporated herein by this reference and/or related to the use of the Website:

    Creative Commons Corporation Attribution-ShareAlike 4.0 International License (the "Creative Commons License")"

  • The "Creative Commons" license that's used as the base of their agreement states:
    "You are free to:

    Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
    Adapt — remix, transform, and build upon the material for any purpose, even commercially.
    The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Attribution — You must give[ appropriate credit ](https://creativecommons.org/licenses/by-sa/4.0/#ref-appropriate-credit), provide a link to the license, and indicate if changes were made . You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the[ same license ](https://creativecommons.org/licenses/by-sa/4.0/#ref-same-license)as the original.
No additional restrictions — You may not apply legal terms or[ technological measures ](https://creativecommons.org/licenses/by-sa/4.0/#ref-technological-measures)that legally restrict others from doing anything the license permits."
  • The only clause in the "Additional Restrictions" section of the the "GeoLite2 End User License Agreement" that might apply to our situation here states:
    "Except as explicitly permitted by the Creative Commons License, you will not disclose the Services to any third party or after notifying MaxMind of the anticipated disclosure and obtaining MaxMind's prior written consent to the disclosure. To the extent you disclose the Services to a third party as permitted by this Agreement, you will impose upon the third party the same or substantially similar contractual duties imposed on you and the rights provided to MaxMind as in this Agreement, including those in Section 3 (LIMITED GRANT OF RIGHTS), Section 6 (ADDITIONAL RESTRICTIONS), and the DPA and, where not inconsistent with the other terms of this Agreement, as in the Creative Commons License. You are responsible for the acts or omissions of any third parties with which you share the Services."

Since this section begins with "Except as explicitly permitted by the Creative Commons License...", it seems to me that we are explicitly allowed to "Share" and "Adapt" the information I've plumbed back to clients in this PR.

If I'm misinterpreting all this legalese and you're right, then we're probably already violating the user agreement by exposing geolite-derived lat/lon coordinates for all of the origins/caches through the Director web ui. Do we need to remove that feature or put it behind an auth wall?

@bbockelm
Copy link
Collaborator

@jhiemstrawisc - I couldn't be happier to hear that I'm wrong and operating off old information.

A bit behind on things currently so I'm not able to double-check your assertions. @turetske - could you take a look as well?

@bbockelm bbockelm self-requested a review February 28, 2025 00:57
@turetske
Copy link
Collaborator

@jhiemstrawisc @bbockelm
I don't believe that interpretation is correct. And actually, we have a much larger problem because we currently violate their End User Agreement as is.

As Justin pointed out, their language of "Except as explicitly permitted by the Creative Commons License" seems to be contradictory since the Creative Commons license allows for almost anything.

But, at the top of the license is the line:

"This Agreement controls in the event of any conflict with the above-referenced documents, except as otherwise provided in Section 7 (Personal Data). Thereafter, for any conflicts among the above 4 documents, the priority and precedence of interpretation is DPA, PP, WT and Creative Commons License."

Which I believe means that in the event of a conflict, what's written in the GeoLite2 End User License Agreement takes precedence over the Creative Commons license.

There is also this line in the agreement:

"In addition and if you are using the Services for internal use, subject to the terms and conditions of this Agreement, MaxMind also hereby grants you a non-exclusive, non-transferable limited license to access and use the Services for your own internal business purposes." Which I think we actually violate just as is, since we aren't using it just for internal business purposes. By which they mean using it to, say, develop targeted ads based on ip location whereas we are actually distributing a tool that uses the services.

I believe in order to explicitly share the data, MaxMind would insist we actually need a commercial license:

Specifically this line:

"If you would like to include data from MaxMind’s GeoLite2 databases in a product or service you provide to your users or customers, you will need a Commercial Redistribution License for GeoLite2"

@jhiemstrawisc
Copy link
Member Author

I followed up with the MaxMind sales department about what our obligations are here. I explained the information we wanted to provide as part of this PR, and how we integrate with Maxmind. Their response:

As long as you are able to meet our Attribution Requirement (here), I can confirm that your use of the GeoLite data is permitted under the GeoLite2 EULA.

Their "attribution requirement" link states explicitly:

The GeoLite2 End User License Agreement also allows you to use data from GeoLite2 databases and web services to build applications that are available to users outside your company or organization, or to display that data to users outside your organization, as long as you attribute the data to MaxMind.

So I think our only obligation is to say the lat/lon/accuracy information comes from MaxMind in the client response. I've asked for further clarification on whether we need that attribution per client request, or whether we can just add it to the Director's web page somewhere.

@jhiemstrawisc
Copy link
Member Author

In response to my question about per-request attribution vs general web page attribution, they said sticking one attribution in the Director's web UI is sufficient. The set of further guidelines they provided is attached
GeoLite attribution requirements.pdf

There was a bug previously that assumed all incoming weights were positive.
However, some of the weights may be negative, which is a trick used to make
sure the weights are always sorted at the end of a list. When negative weights
made it into adaptive sort, it triggered an infinite loop.

This fixes the bug by normalizing the weights in a way that guarantees negative
weights are treated as the smallest possible ranges.

Additionally, rather than generate random numbers over and over and over only
to see them consistently falling in an already-visited range, this changes the
function to remove and rescale ranges as we go. One consequence of this is that
it no longer makes sense for the stochastic sort function to return the weights
because these weights only make sense in the context of the exact set of ranges
that existed when the weight was created.
Most of these changes are to make sure the redirectInfo object is picking
up the values it should.

There's also a new test called "test-adaptive-sort" that covers a
previously-untested set of code. When I started writing the test, I found
the infinite loop bug mentioned in this PR's previous commit.
NOTE: The new test is not super rigorous because I don't know that it makes
a lot of sense to spend time designing a good test for something as stochastic
as our adaptive sort. In the future, when we're convinced adaptive sort does
exactly what we want, we should revisit the test and make sure its assumptions
hold true.
@jhiemstrawisc jhiemstrawisc marked this pull request as ready for review March 4, 2025 14:32
@jhiemstrawisc jhiemstrawisc requested a review from turetske March 4, 2025 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client Issue affecting the OSDF client director Issue relating to the director component enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plumb info about Director's geolocation decisions back to Clients
3 participants