Misleading "Downloads by Python version over time" is only showing a subset of used versions #105

Open
lesteve opened this issue Nov 30, 2024 · 2 comments · May be fixed by #116
Labels
bug Something isn't working

Comments

@lesteve

lesteve commented Nov 30, 2024

Describe the bug

The "Downloads by Python version over time" chart only shows a subset of Python versions, which is confusing.

To Reproduce
Steps to reproduce the behavior:
Look at the scikit-learn dashboard for the last year https://clickpy.clickhouse.com/dashboard/scikit-learn?min_date=2024-01-01&max_date=2025-01-01

It makes you think that nobody is using scikit-learn with Python >= 3.11, which is rather surprising.

Looking at the query, there is a LIMIT 4 on the last line, which I believe is the source of the problem (there may be a reason for it, maybe efficiency?):

```sql
SELECT
    python_minor AS name,
    -- daily buckets for ranges of at most 6 months, weekly buckets otherwise
    if(date_diff('month', {min_date:Date32}, {max_date:Date32}) <= 6,
       toStartOfDay(date)::Date32,
       toStartOfWeek(date)::Date32) AS x,
    sum(count) AS y
FROM pypi.pypi_downloads_per_day_by_version_by_python
WHERE (date >= {min_date:Date32})
  AND (date < if(date_diff('month', {min_date:Date32}, {max_date:Date32}) <= 6,
                 toStartOfDay({max_date:Date32})::Date32,
                 toStartOfWeek({max_date:Date32})::Date32))
  AND (project = {package_name:String})
  AND 1=1 AND python_minor != ''
  AND 1=1 AND 1=1
GROUP BY name, x
ORDER BY x ASC, y DESC
-- keeps only the 4 largest versions within each bucket x
LIMIT 4 BY x
```
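How the query chooses the bucket x can be sketched in Python. This is a reconstruction under assumptions, not clickpy's code. It assumes that ClickHouse's date_diff('month', ...) counts month boundaries crossed, and that toStartOfWeek uses its default mode 0, in which weeks start on Sunday. The helper names month_diff, start_of_week, bucket and upper_bound are hypothetical.

```python
from datetime import date, timedelta


def month_diff(start: date, end: date) -> int:
    # Number of month boundaries crossed; assumed to match date_diff('month', ...)
    return (end.year - start.year) * 12 + (end.month - start.month)


def start_of_week(d: date) -> date:
    # Assumes toStartOfWeek's default mode 0, where weeks start on Sunday.
    # date.weekday() is 0 for Monday and 6 for Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def bucket(d: date, min_date: date, max_date: date) -> date:
    # Value of x for a row: daily buckets for ranges of at most 6 months, weekly otherwise
    if month_diff(min_date, max_date) <= 6:
        return d
    return start_of_week(d)


def upper_bound(min_date: date, max_date: date) -> date:
    # Exclusive upper bound applied to `date` in the WHERE clause
    if month_diff(min_date, max_date) <= 6:
        return max_date
    return start_of_week(max_date)


# Hypothetical usage with the range from the dashboard URL
lo, hi = date(2024, 1, 1), date(2025, 1, 1)
print(month_diff(lo, hi))                  # 12, so weekly buckets
print(bucket(date(2024, 11, 30), lo, hi))  # 2024-11-24 (a Sunday)
print(upper_bound(lo, hi))                 # 2024-12-29
```

For the one-year range in the dashboard URL, the sketch groups rows by week. Under these assumptions, LIMIT 4 BY x therefore keeps four versions per week, and the WHERE clause drops the final partial week.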

If I remove the LIMIT 4, I do see that scikit-learn is used with Python >= 3.11, as expected.
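The effect of ORDER BY x ASC, y DESC LIMIT 4 BY x can be reproduced with a small Python sketch. Within each bucket x, only the four rows with the highest y survive. A version that never ranks in the top four of any bucket therefore disappears from the chart entirely. The function limit_by and the download counts are hypothetical and were chosen only to illustrate this behavior.

```python
from collections import defaultdict


def limit_by(rows, n):
    """Mimic `ORDER BY x ASC, y DESC LIMIT n BY x` on (name, x, y) tuples."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    kept, per_bucket = [], defaultdict(int)
    for name, x, y in ordered:
        # Keep only the first n rows seen for each bucket x
        if per_bucket[x] < n:
            kept.append((name, x, y))
            per_bucket[x] += 1
    return kept


# Hypothetical weekly download counts (not real scikit-learn data)
weekly = [("3.8", 900), ("3.9", 800), ("3.10", 700), ("3.7", 600), ("3.11", 500), ("3.12", 400)]
rows = [(ver, week, count) for week in ("2024-11-17", "2024-11-24") for ver, count in weekly]

shown = sorted({name for name, _x, _y in limit_by(rows, 4)})
print(shown)  # ['3.10', '3.7', '3.8', '3.9']: 3.11 and 3.12 never appear
```

Raising n, or removing the limit, brings 3.11 and 3.12 back. This matches what the report observes when the LIMIT 4 is removed.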

Expected behavior

All Python versions are shown.


Desktop (please complete the following information):

  • I don't think it is relevant.


lesteve added the bug label Nov 30, 2024
@gingerwizard
Collaborator

@lesteve, I think this was added to make things render better over large time periods; otherwise we get a chart that just isn't usable. We had a few options here:

  • Get the top N for the whole period and only show these: unfortunately, this fails to show newly released versions.
  • Get the top 4 for each period and use these: this is what we do now, and it generally works (except when the same 4 versions are the top 4 in every period).
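The two options can be compared in a small Python sketch. The made-up counts include a version (3.13 here, hypothetical) that appears partway through the range. Ranking by total over the whole period drops it, while ranking within each period picks it up once it grows. The function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def global_top_n(rows, n):
    """Option 1: rank versions by total downloads over the whole range."""
    totals = Counter()
    for name, _x, y in rows:
        totals[name] += y
    return {name for name, _total in totals.most_common(n)}


def per_period_top_n(rows, n):
    """Option 2: take the top n versions within each period, then union them."""
    by_period = defaultdict(list)
    for name, x, y in rows:
        by_period[x].append((y, name))
    shown = set()
    for entries in by_period.values():
        shown.update(name for _y, name in sorted(entries, reverse=True)[:n])
    return shown


# Hypothetical counts: "3.13" only shows up in the last two weeks
rows = [
    ("3.10", "w1", 500), ("3.11", "w1", 400), ("3.12", "w1", 300),
    ("3.10", "w2", 450), ("3.11", "w2", 420), ("3.12", "w2", 350), ("3.13", "w2", 100),
    ("3.10", "w3", 400), ("3.11", "w3", 410), ("3.12", "w3", 380), ("3.13", "w3", 390),
]

print(sorted(global_top_n(rows, 3)))      # ['3.10', '3.11', '3.12']
print(sorted(per_period_top_n(rows, 3)))  # ['3.10', '3.11', '3.12', '3.13']
```

Option 2 only helps if the smaller versions reach the top n in at least one period. If the same versions lead every period, the union is still just those versions, which is the failure reported in this issue.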

I think we should remove the limit but bucket anything beyond the top 10 into an "Other" bucket, similar to the SQL UI here.
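The proposed compromise can be sketched in Python as follows. The thread does not say whether the top 10 is chosen per period or over the whole range. This sketch assumes per period, which keeps each period's total intact because nothing is dropped, only folded into "Other". A per-period choice can still produce more than 10 distinct series across the whole chart. The function top_k_with_other and the data are hypothetical. In ClickHouse itself, this would presumably be expressed with a window function such as row_number() OVER (PARTITION BY x ORDER BY y DESC), but that query is not shown here.

```python
from collections import defaultdict


def top_k_with_other(rows, k=10, other="Other"):
    """Per period, keep the k largest versions and sum the rest into one bucket."""
    by_period = defaultdict(list)
    for name, x, y in rows:
        by_period[x].append((name, y))
    out = []
    for x in sorted(by_period):
        ranked = sorted(by_period[x], key=lambda item: item[1], reverse=True)
        out.extend((name, x, y) for name, y in ranked[:k])
        rest = sum(y for _name, y in ranked[k:])
        if rest:
            # Nothing is dropped: the remainder keeps each period's total intact
            out.append((other, x, rest))
    return out


# Hypothetical counts for one week, with k=2 to keep the output short
rows = [("3.9", "w1", 50), ("3.10", "w1", 30), ("3.11", "w1", 20), ("3.12", "w1", 10)]
print(top_k_with_other(rows, k=2))
# [('3.9', 'w1', 50), ('3.10', 'w1', 30), ('Other', 'w1', 30)]
```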

@lesteve
Author

lesteve commented Dec 4, 2024

> I think we should remove the limit but bucket anything over 10 into an other bucket - similar to sql UI here

Sounds like a good compromise indeed!
