Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support COUNT(DISTINCT a.b) #600

Closed
LeoniePhiline opened this issue Feb 9, 2023 · 0 comments · Fixed by #700
Closed

Support COUNT(DISTINCT a.b) #600

LeoniePhiline opened this issue Feb 9, 2023 · 0 comments · Fixed by #700

Comments

@LeoniePhiline
Copy link

LeoniePhiline commented Feb 9, 2023

Ref:

Motivation

I am aggregating data and need to count not all rows, but merely distinct rows where some might be present multiple times.

My use case is this (MariaDB):

Table (just to give you context for the query further below - I am not currently looking to build this CREATE TABLE statement in sea-query):

CREATE TABLE tx_digibordauthentication_activity
(
  uid          bigint unsigned    NOT NULL auto_increment,

  date_time    datetime           NOT NULL,

  person       int(11) unsigned   NOT NULL DEFAULT '0',
  institution  int(11) unsigned   NOT NULL DEFAULT '0',
  subscription int(11) unsigned   NOT NULL DEFAULT '0',

  year_month      varchar(7) AS (DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP, "UTC", "Europe/Amsterdam"), '%Y-%m')) PERSISTENT,
  year_week       varchar(7) AS (DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP, "UTC", "Europe/Amsterdam"), '%x-W%v')) PERSISTENT,
  year_month_day  varchar(10) AS (DATE_FORMAT(CONVERT_TZ(UTC_TIMESTAMP, "UTC", "Europe/Amsterdam"), '%Y-%m-%d')) PERSISTENT,

  PRIMARY KEY (uid),

  KEY institution_last_activity (institution, date_time),
  KEY institution_monthly (institution, year_month, person),
  KEY institution_weekly (institution, year_week, person),
  KEY institution_daily (institution, year_month_day, person),

  KEY subscription_last_activity (subscription, date_time),
  KEY subscription_monthly (subscription, year_month, person),
  KEY subscription_weekly (subscription, year_week, person),
  KEY subscription_daily (subscription, year_month_day, person),
);

Query - here sea-query does not support all I need - i.e. COUNT(DISTINCT a.b) -, and some stuff gets very tricky even with Expr::cust_with_exprs() etc.:

SELECT p.`year_month`,
  COUNT(a.uid) AS all_activity,
  IF(
    LAG(p.`year_month`) OVER (ORDER BY p.`year_month` DESC) IS NULL,
    NULL,
    (COUNT(a.uid) / (LEAD(COUNT(a.uid)) OVER (ORDER BY p.`year_month` DESC)) - 1) * 100
  ) AS all_activity_change_percent,
  COUNT(DISTINCT a.year_month_day) AS active_days,
  IF(
    LAG(p.`year_month`) OVER (ORDER BY p.`year_month` DESC) IS NULL,
    NULL,
    (COUNT(DISTINCT a.year_month_day) / (LEAD(COUNT(DISTINCT a.year_month_day)) OVER (ORDER BY p.`year_month` DESC)) - 1) * 100
  ) AS active_days_change_percent,
  COUNT(DISTINCT a.person) AS active_personnel,
  IF(
    LAG(p.`year_month`) OVER (ORDER BY p.`year_month` DESC) IS NULL,
    NULL,
    (COUNT(DISTINCT a.person) / (LEAD(COUNT(DISTINCT a.person)) OVER (ORDER BY p.`year_month` DESC)) - 1) * 100
  ) AS active_personnel_change_percent
FROM (
  SELECT DISTINCT `year_month`
  FROM tx_digibordauthentication_activity
  ORDER BY `year_month` DESC
) p
LEFT JOIN tx_digibordauthentication_activity a
  ON a.year_month = p.year_month
    AND a.institution = 2050
GROUP BY p.`year_month`
ORDER BY p.`year_month` DESC;

Proposed Solutions

Add Expr::count_distinct() and Func::count_distinct()

Additional Information

I am missing a few things here and I will open separate issues for these:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant