repository_url key should normalize against minor differences #7

Open
DuaneOBrien opened this issue Oct 20, 2020 · 1 comment
Labels: effort: 2, good first issue (Good for newcomers.), hacktoberfest, help, type: fix (Iterations on existing features or infrastructure.), work: obvious (The situation is obvious, best practices used.)

@DuaneOBrien

Nearly identical repository URLs can fragment the data across several keys. Sample:

```json
  {
    "repository_url": "http://tomcat.apache.org",
    "score": 3793
  },
  {
    "repository_url": "https://tomcat.apache.org/",
    "score": 3293
  },
  {
    "repository_url": "http://tomcat.apache.org/",
    "score": 12
  }
```

It seems like we could strip the protocol and any trailing slashes or whitespace to collapse these entries into a single key, while still getting the same results.
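A minimal sketch of the proposed normalization, written in Python for illustration (this thread does not say what language the project uses, and `normalize_repository_url` is a hypothetical name, not the project's API). It does only what the comment above suggests: trim whitespace, drop a leading `http://` or `https://`, and drop trailing slashes. Treating `http` and `https` as the same key assumes both point at the same repository.

```python
from collections import defaultdict


def normalize_repository_url(url: str) -> str:
    """Hypothetical helper: collapse minor variants of a repository URL.

    Strips surrounding whitespace, a leading "http://" or "https://"
    (matched case-insensitively), and any trailing slashes.
    """
    key = url.strip()
    lowered = key.lower()
    for prefix in ("https://", "http://"):
        if lowered.startswith(prefix):
            key = key[len(prefix):]
            break
    # Remove trailing slashes, then any whitespace they were hiding.
    return key.rstrip("/").strip()


# The three entries from the sample above.
samples = [
    {"repository_url": "http://tomcat.apache.org", "score": 3793},
    {"repository_url": "https://tomcat.apache.org/", "score": 3293},
    {"repository_url": "http://tomcat.apache.org/", "score": 12},
]

# Group scores under the normalized key instead of the raw URL.
grouped = defaultdict(list)
for entry in samples:
    grouped[normalize_repository_url(entry["repository_url"])].append(entry["score"])

print(dict(grouped))  # {'tomcat.apache.org': [3793, 3293, 12]}
```

The sketch only groups the scores; whether fragmented scores should then be summed, re-scored, or handled some other way is left open, since the issue does not say.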

@mjpitz
Member

mjpitz commented Oct 21, 2020

The new API scheme introduced the concept of a ProviderURL, which is intended to handle this. This definitely seems like an easy fix we can make right now.

@mjpitz added the effort: 2, good first issue, hacktoberfest, help, type: fix, and work: obvious labels on Oct 21, 2020