
feat: add initial version of learning analytics computation and dashboard illustrations #4211

Merged
merged 58 commits into from
Dec 23, 2024

Conversation

sjschlapbach
Member

@sjschlapbach sjschlapbach commented Aug 26, 2024

Summary by CodeRabbit

  • New Features

    • Introduced new analytics functionalities including participant activity performance tracking, aggregated course analytics, and enhanced performance metrics.
    • Added new components for improved user experience in analytics dashboards, including filters, navigation options, and visualizations.
    • Implemented new Jupyter notebooks for structured data analysis and visualization of participant and course analytics.
  • Bug Fixes

    • Improved error handling in participant response evaluation and tracking functions.
  • Documentation

    • Added comprehensive README files for new services detailing setup instructions and available commands.
  • Chores

    • Updated configuration files for improved project structure and path mappings.

…ponent (#4181)

Co-authored-by: Julius Schlapbach <80708107+sjschlapbach@users.noreply.github.com>
Co-authored-by: sjschlapbach <julius.schlapbach@gmx.net>

aviator-app bot commented Aug 26, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

…on approach for analytics (#4182)

Co-authored-by: Julius Schlapbach <80708107+sjschlapbach@users.noreply.github.com>
Co-authored-by: sjschlapbach <julius.schlapbach@gmx.net>

cypress bot commented Aug 26, 2024

klicker-uzh    Run #3837

Run Properties
  • Run status: Passed (status check passed)
  • Run duration: 12m 31s
  • Commit: d14d9dc9ce: feat: add initial version of learning analytics computation and dashboard illust...
  • Committer: Julius Schlapbach
  • Project: klicker-uzh
  • Branch Review: v3
View all properties for this run ↗︎

Test results
  • Failures: 0
  • Flaky: 0
  • Pending (tests annotated with .skip): 0
  • Skipped (failure in a mocha hook): 0
  • Passing: 148
View all changes introduced in this branch ↗︎


cypress bot commented Aug 26, 2024

klicker-uzh    Run #3836

Run Properties
  • Run status: Passed (status check passed)
  • Run duration: 12m 02s
  • Commit: ee3b015605 ℹ️: Merge f8c8cb1e99c88e7bf1f385d0f032629faeca0837 into 3ec83538f1fb6ab46177ded80aac...
  • Committer: Julius Schlapbach
  • Project: klicker-uzh
  • Branch Review: v3-analytics
View all properties for this run ↗︎

Test results
  • Failures: 0
  • Flaky: 0
  • Pending (tests annotated with .skip): 0
  • Skipped (failure in a mocha hook): 0
  • Passing: 148
View all changes introduced in this branch ↗︎


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 22

Outside diff range, codebase verification and nitpick comments (1)
apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py (1)

8-12: Consider adding type hints for function parameters.

Adding type hints for db, df_courses, and verbose can improve code readability and maintainability.

Apply this diff to add type hints:

-def compute_participant_course_analytics(db, df_courses, verbose=False):
+def compute_participant_course_analytics(db: DatabaseType, df_courses: pd.DataFrame, verbose: bool = False):
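For illustration, the fully annotated signature might look like the sketch below. The `"Prisma"` client type and the `int` return type are assumptions inferred from the surrounding review comments (the function is said to return the count of courses without responses), not taken verbatim from the PR:

```python
# Illustrative annotated signature only; "Prisma" and "pd.DataFrame" are quoted
# string annotations so the sketch stays self-contained without the real imports.
def compute_participant_course_analytics(
    db: "Prisma",                # assumed DB client type; adjust to the project's wrapper
    df_courses: "pd.DataFrame",  # one row per course to process
    verbose: bool = False,       # enable progress logging
) -> int:
    """Compute per-participant course analytics; return the number of courses without responses."""
    raise NotImplementedError  # placeholder body; the real logic lives in the PR
```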
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 107aecd and a1736ff.

Files ignored due to path filters (1)
  • apps/analytics/poetry.lock is excluded by !**/*.lock
Files selected for processing (44)
  • .gitignore (1 hunks)
  • apps/analytics/pyproject.toml (2 hunks)
  • apps/analytics/src/init.py (1 hunks)
  • apps/analytics/src/modules/init.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/init.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/aggregate_analytics.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/compute_correctness.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/compute_participant_analytics.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/get_participant_responses.py (1 hunks)
  • apps/analytics/src/modules/participant_analytics/save_participant_analytics.py (1 hunks)
  • apps/analytics/src/notebooks/archive/daily_participant_analytics.ipynb (1 hunks)
  • apps/analytics/src/notebooks/archive/test.ipynb (1 hunks)
  • apps/analytics/src/notebooks/participant_analytics.ipynb (1 hunks)
  • apps/frontend-control/src/components/sessions/EmbeddingModal.tsx (1 hunks)
  • apps/frontend-control/tsconfig.json (1 hunks)
  • apps/frontend-manage/next.config.js (1 hunks)
  • apps/frontend-manage/src/components/common/Header.tsx (1 hunks)
  • apps/frontend-manage/src/components/courses/actions/EvaluationLinkLiveQuiz.tsx (1 hunks)
  • apps/frontend-manage/src/components/courses/actions/RunningLiveQuizLink.tsx (1 hunks)
  • apps/frontend-manage/src/components/courses/actions/StartLiveQuizButton.tsx (1 hunks)
  • apps/frontend-manage/src/components/interaction/AudienceInteraction.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/EmbeddingModal.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/Session.tsx (2 hunks)
  • apps/frontend-manage/src/components/sessions/cockpit/CancelSessionModal.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/cockpit/SessionTimeline.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/creation/liveQuiz/LiveSessionWizard.tsx (2 hunks)
  • apps/frontend-manage/src/components/sessions/evaluation/EvaluationControlBar.tsx (1 hunks)
  • apps/frontend-manage/src/pages/analytics/index.tsx (1 hunks)
  • apps/frontend-manage/src/pages/index.tsx (1 hunks)
  • apps/frontend-manage/src/pages/quizzes/[id]/cockpit.tsx (2 hunks)
  • apps/frontend-manage/src/pages/quizzes/index.tsx (2 hunks)
  • apps/frontend-manage/tsconfig.json (1 hunks)
  • apps/frontend-pwa/tsconfig.json (1 hunks)
  • apps/office-addin/src/content/components/App.tsx (1 hunks)
  • apps/office-addin/src/content/components/URLForm.tsx (2 hunks)
  • cypress/cypress/e2e/F-live-quiz-workflow.cy.ts (10 hunks)
  • cypress/cypress/e2e/H-practice-quiz-workflow.cy.ts (3 hunks)
  • cypress/cypress/e2e/K-group-activity-workflow.cy.ts (1 hunks)
  • packages/i18n/messages/de.ts (1 hunks)
  • packages/i18n/messages/en.ts (1 hunks)
  • packages/prisma/src/prisma/migrations/20240826113537_participant_analytics_correctness_counts/migration.sql (1 hunks)
  • packages/prisma/src/prisma/migrations/20240827152536_computed_at_analytics/migration.sql (1 hunks)
  • packages/prisma/src/prisma/schema/analytics.prisma (9 hunks)
Files skipped from review due to trivial changes (5)
  • .gitignore
  • apps/frontend-manage/src/components/courses/actions/EvaluationLinkLiveQuiz.tsx
  • apps/frontend-manage/src/components/sessions/evaluation/EvaluationControlBar.tsx
  • apps/office-addin/src/content/components/App.tsx
  • packages/i18n/messages/de.ts
Additional context used
Ruff
apps/analytics/src/__init__.py

1-1: from .modules import * used; unable to detect undefined names

(F403)


2-2: from .notebooks import * used; unable to detect undefined names

(F403)

apps/analytics/src/modules/__init__.py

1-1: .participant_analytics.compute_correctness imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


1-1: .participant_analytics.get_participant_responses imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

apps/analytics/src/modules/participant_analytics/__init__.py

1-1: .compute_correctness.compute_correctness imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .get_participant_responses.get_participant_responses imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .aggregate_analytics.aggregate_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_participant_analytics.save_participant_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


5-5: .compute_participant_course_analytics.compute_participant_course_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

apps/analytics/src/modules/participant_analytics/compute_correctness.py

10-12: Use instance_dict["elementData"].get("options", None) instead of an if block

Replace with instance_dict["elementData"].get("options", None)

(SIM401)
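The SIM401 suggestion can be shown with a minimal standalone sketch; the payload shape below is hypothetical, mirroring only what the rule flags:

```python
# Hypothetical instance payload; the "options" key is intentionally absent.
instance_dict = {"elementData": {"type": "SC"}}

# Before: the explicit if-block that SIM401 flags.
if "options" in instance_dict["elementData"]:
    options = instance_dict["elementData"]["options"]
else:
    options = None

# After: dict.get collapses the branch into a single expression.
options_simplified = instance_dict["elementData"].get("options", None)

assert options == options_simplified  # both paths yield the same result
```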

Additional comments not posted (98)
apps/frontend-manage/src/pages/analytics/index.tsx (3)

4-6: LGTM!

The function is correctly implemented and follows best practices for component rendering.

The code changes are approved.


8-13: LGTM!

The function is correctly implemented and follows best practices for localization.

The code changes are approved.


16-16: LGTM!

The default export is correctly implemented.

The code changes are approved.

apps/frontend-manage/tsconfig.json (1)

19-19: LGTM!

The added path mapping improves code readability and maintainability by standardizing import paths.

The code changes are approved.

apps/analytics/pyproject.toml (3)

5-5: LGTM!

The expanded authors list reflects a change in the project's collaborative authorship.

The code changes are approved.


20-21: LGTM!

The new section for development dependencies enhances type checking during development.

The code changes are approved.


31-32: LGTM!

The new configuration for pyright enforces stricter type checks across the codebase.

The code changes are approved.

packages/prisma/src/prisma/migrations/20240826113537_participant_analytics_correctness_counts/migration.sql (1)

1-16: Verify the impact of dropping columns and ensure correct data types for new columns.

Dropping columns meanFirstCorrectCount, meanLastCorrectCount, and collectedAchievements will result in data loss. Ensure that this data is no longer needed or has been backed up. Additionally, verify that the new columns firstCorrectCount, firstWrongCount, lastCorrectCount, and lastWrongCount have the correct data types and constraints.

Run the following script to verify the data types and constraints of the new columns:

Verification successful

Verify the impact of dropping columns and ensure correct data types for new columns.

The new columns firstCorrectCount, firstWrongCount, lastCorrectCount, and lastWrongCount have the correct data type REAL, consistent with existing similar fields. However, dropping columns meanFirstCorrectCount, meanLastCorrectCount, and collectedAchievements will result in data loss. Ensure that this data is no longer needed or has been backed up.

  • File: packages/prisma/src/prisma/migrations/20240826113537_participant_analytics_correctness_counts/migration.sql
  • Model Definition: packages/prisma/src/prisma/schema/analytics.prisma
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the data types and constraints of the new columns in the `ParticipantAnalytics` table.

# Test: Check the schema of the `ParticipantAnalytics` table. Expect: Correct data types and constraints for the new columns.
psql -c '\d+ "ParticipantAnalytics"'

Length of output: 110


Script:

#!/bin/bash
# Search for files related to the `ParticipantAnalytics` table schema or ORM models
fd -e sql -e js -e ts -e prisma

# Search for mentions of `ParticipantAnalytics` within those files to gather schema information
rg 'ParticipantAnalytics' --type sql --type js --type ts --type prisma

Length of output: 19081


Script:

#!/bin/bash
# Search for mentions of `ParticipantAnalytics` within `.prisma` files to gather schema information
rg 'ParticipantAnalytics' --type-add 'prisma:*.prisma' --type prisma

Length of output: 550


Script:

#!/bin/bash
# Extract the definition of the `ParticipantAnalytics` model from the `analytics.prisma` file
rg 'model ParticipantAnalytics {' -A 20 packages/prisma/src/prisma/schema/analytics.prisma

Length of output: 212


Script:

#!/bin/bash
# Correct the regex pattern and extract the definition of the `ParticipantAnalytics` model from the `analytics.prisma` file
rg 'model ParticipantAnalytics \{' -A 20 packages/prisma/src/prisma/schema/analytics.prisma

Length of output: 1004

apps/frontend-control/tsconfig.json (1)

19-19: LGTM!

The new path alias mapping ~/* pointing to src/* improves code readability and maintainability by providing a clearer structure for imports.

The code changes are approved.

apps/frontend-manage/src/components/courses/actions/RunningLiveQuizLink.tsx (1)

17-17: LGTM!

The modified URL structure correctly reflects the shift in navigation logic, directing users to a quiz cockpit instead of a session cockpit.

The code changes are approved.

apps/frontend-pwa/tsconfig.json (1)

19-19: LGTM!

The new path mapping for the alias ~/* enhances module resolution and simplifies import statements.

The code changes are approved.

apps/frontend-manage/next.config.js (1)

11-39: LGTM!

The new asynchronous redirects function ensures seamless redirection from old session URLs to new quiz URLs, improving user experience and maintaining link integrity.

The code changes are approved.

apps/frontend-manage/src/components/courses/actions/StartLiveQuizButton.tsx (1)

27-27: LGTM!

The URL path modification in the router.push method correctly reflects the reorganization of the application's routing strategy.

The code changes are approved.

apps/analytics/src/modules/participant_analytics/compute_participant_analytics.py (5)

1-4: LGTM!

The import statements are correctly importing the necessary functions.

The code changes are approved.


7-11: LGTM!

The function definition and initial data retrieval are correctly implemented.

The code changes are approved.


12-19: LGTM!

The correctness computation is correctly implemented.

The code changes are approved.


21-25: LGTM!

The analytics aggregation and saving are correctly implemented.

The code changes are approved.


27-30: LGTM!

The data cleanup is correctly implemented.

The code changes are approved.

apps/office-addin/src/content/components/URLForm.tsx (2)

24-26: LGTM!

The regex for URL validation is correctly updated to match quiz URLs.

The code changes are approved.


45-45: LGTM!

The placeholder text is correctly updated to match the new URL format and provides clear guidance to the user.

The code changes are approved.

packages/prisma/src/prisma/migrations/20240827152536_computed_at_analytics/migration.sql (2)

1-11: Address the warnings.

The warnings indicate that new columns are added without default values. Ensure that the tables are empty or provide default values for the new columns.

Verify that the tables are empty or provide default values for the new columns.


12-36: LGTM!

The SQL statements to alter tables and add new columns are correctly implemented.

The code changes are approved.

apps/frontend-manage/src/components/sessions/cockpit/CancelSessionModal.tsx (1)

48-48: Verify the updated navigation route.

The route in the router.push function has been updated from /sessions to /quizzes. Ensure this change aligns with the intended user experience and that all related navigation flows are updated accordingly.

apps/analytics/src/modules/participant_analytics/get_participant_responses.py (5)

5-11: Verify the structure of detail.

Ensure that detail always contains either practiceQuiz or microLearning to avoid potential NoneType errors.


14-21: Verify the structure of participant_dict.

Ensure that participant_dict always contains detailQuestionResponses to avoid potential KeyError.


24-31: Verify the DataFrame structure.

Ensure that the DataFrame structure aligns with the expected schema for participant details.


35-49: Verify the structure of detail.

Ensure that detail always contains practiceQuiz or microLearning or handle cases where neither is present.


52-86: Verify the database query and verbose logging.

Ensure that the database query retrieves the expected data and that the verbose logging provides useful information.
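Several of these comments concern the same pattern: `detail` may contain neither `practiceQuiz` nor `microLearning`. A defensive lookup along the lines the review asks for could look like this (field names taken from the review; the surrounding shape is hypothetical):

```python
def extract_activity(detail: dict):
    """Return whichever activity is linked to the response detail, or None.

    Avoids a KeyError / NoneType error when neither "practiceQuiz" nor
    "microLearning" is present; the caller decides whether to skip or raise.
    """
    return detail.get("practiceQuiz") or detail.get("microLearning")


assert extract_activity({"practiceQuiz": {"id": 1}}) == {"id": 1}
assert extract_activity({}) is None  # neither activity present: no crash
```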

apps/frontend-control/src/components/sessions/EmbeddingModal.tsx (1)

38-38: Verify the updated URL structure.

The URL structure in the LazyHMACLink function has been updated from /sessions to /quizzes. Ensure this change aligns with the intended functionality and that all related URL structures are updated accordingly.

apps/frontend-manage/src/components/sessions/EmbeddingModal.tsx (1)

32-32: LGTM!

The URL path change from /sessions/ to /quizzes/ is consistent with the shift to a quiz-based model.

The code changes are approved.

apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py (9)

1-5: LGTM!

The necessary modules are imported correctly.

The code changes are approved.


21-39: LGTM!

The database query to find all participants and corresponding linked responses is correctly implemented.

The code changes are approved.


41-51: LGTM!

The creation of dataframes containing all detail responses and question responses is correctly implemented.

The code changes are approved.


64-80: LGTM!

The creation of pandas dataframes containing all question responses and details is correctly implemented.

The code changes are approved.


81-90: LGTM!

The computation of the correctness of each question response detail is correctly implemented.

The code changes are approved.


92-94: LGTM!

The computation of participant analytics is correctly implemented.

The code changes are approved.


95-103: LGTM!

The saving of aggregated analytics into the database is correctly implemented.

The code changes are approved.


105-108: LGTM!

The deletion of dataframes to avoid conflicts in the next iteration is correctly implemented.

The code changes are approved.


110-110: LGTM!

The function returns the count of courses without responses correctly.

The code changes are approved.

apps/frontend-manage/src/pages/quizzes/index.tsx (1)

18-18: LGTM!

The renaming of the component from SessionList to QuizList is consistent with the shift to a quiz-based model.

The code changes are approved.

apps/analytics/src/modules/participant_analytics/save_participant_analytics.py (3)

1-1: Import statement is correct.

The import statement for datetime is correct.

The code changes are approved.


4-5: Ensure timestamp consistency.

The computedAt timestamp is set to the current date with a fixed time of "00:00:00.000Z". Ensure this is the intended behavior and that the timestamp format is consistent with other parts of the application.

Verify that the computedAt timestamp format is consistent with other parts of the application.
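If the fixed-midnight behavior is intended, one way to make it explicit is a small helper like the sketch below (an illustration, not the PR's actual code):

```python
from datetime import datetime, timezone
from typing import Optional


def computed_at_midnight_utc(now: Optional[datetime] = None) -> str:
    """Return the current date pinned to 00:00:00.000 UTC in ISO-8601 'Z' format.

    Reproduces the fixed-midnight timestamp the review asks about; whether that
    behavior is intended is exactly the open question.
    """
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.strftime("%Y-%m-%dT%H:%M:%S.000Z")


stamp = computed_at_midnight_utc(datetime(2024, 8, 27, 15, 25, tzinfo=timezone.utc))
assert stamp == "2024-08-27T00:00:00.000Z"  # time component is always fixed
```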


91-92: Raise appropriate exceptions.

The function raises a ValueError for unknown analytics types. Ensure this is the appropriate exception type and that it is handled correctly in the calling code.

Verify that the ValueError is the appropriate exception type and that it is handled correctly in the calling code.

apps/frontend-manage/src/pages/quizzes/[id]/cockpit.tsx (2)

41-42: Update prefetch paths.

The prefetch paths have been updated to reflect the transition from sessions to quizzes. Ensure that these new paths are correct and that the corresponding pages exist.

Verify that the new prefetch paths are correct and that the corresponding pages exist.


118-118: Update push route.

The push route has been updated to reflect the transition from sessions to quizzes. Ensure that the new route is correct and that the corresponding page exists.

Verify that the new push route is correct and that the corresponding page exists.

apps/analytics/src/modules/participant_analytics/aggregate_analytics.py (9)

1-1: Import statement is correct.

The import statement for pandas is correct.

The code changes are approved.


4-21: Ensure aggregation logic is correct.

The aggregation logic for df_analytics_counts seems correct. Ensure that the column names and aggregation functions match the expected schema.

Verify that the column names and aggregation functions match the expected schema.
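As a reference point for the schema check, a per-participant/course aggregation in pandas typically follows the pattern below; the column names here are illustrative stand-ins, not the PR's actual schema:

```python
import pandas as pd

# Toy response data standing in for the analytics input.
df = pd.DataFrame({
    "participantId": ["p1", "p1", "p2"],
    "courseId": ["c1", "c1", "c1"],
    "score": [10, 20, 5],
    "responseCount": [1, 1, 1],
})

# Named aggregation keeps the output column names explicit, which makes a
# schema mismatch visible at a glance.
df_counts = (
    df.groupby(["participantId", "courseId"])
      .agg(totalScore=("score", "sum"), responseCount=("responseCount", "sum"))
      .reset_index()
)
```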


23-31: Ensure correctness of unstack operation.

The unstack operation in df_analytics_corr_temp seems correct. Ensure that the resulting dataframe has the expected structure.

Verify that the resulting dataframe from the unstack operation has the expected structure.


33-65: Ensure correctness of mean calculations.

The mean calculations for correctness columns seem correct. Ensure that the division operations handle cases where the denominator is zero.

Verify that the division operations handle cases where the denominator is zero.
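One way to handle the zero-denominator case explicitly (a sketch with illustrative column names, not the PR's code):

```python
import pandas as pd

# A participant with zero attempts produces a 0/0 division.
df = pd.DataFrame({"correctCount": [3, 0], "totalCount": [4, 0]})

# Plain division yields NaN for 0/0 and inf for x/0; replacing both with 0.0
# keeps the downstream analytics free of NaN/inf instead of propagating them.
df["meanCorrect"] = (
    df["correctCount"]
    .div(df["totalCount"])
    .replace(float("inf"), 0.0)
    .fillna(0.0)
)
```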


67-79: Ensure correctness of aggregation logic.

The aggregation logic for df_analytics_correctness seems correct. Ensure that the column names and aggregation functions match the expected schema.

Verify that the column names and aggregation functions match the expected schema.


81-94: Ensure correctness of column mapping.

The column mapping for df_analytics_counts seems correct. Ensure that the new column names match the expected schema.

Verify that the new column names match the expected schema.


96-126: Ensure correctness of course analytics aggregation.

The aggregation logic for df_course_analytics seems correct. Ensure that the column names and aggregation functions match the expected schema.

Verify that the column names and aggregation functions match the expected schema.


127-144: Ensure correctness of dataframe merge.

The merge operations for the dataframes seem correct. Ensure that the resulting dataframe has the expected structure and that there are no duplicate rows.

Verify that the resulting dataframe from the merge operations has the expected structure and that there are no duplicate rows.
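The duplicate-row concern can be checked mechanically with pandas' `validate` argument; a sketch with illustrative frames:

```python
import pandas as pd

# Toy frames standing in for the counts and correctness aggregates.
left = pd.DataFrame({
    "participantId": ["p1", "p2"], "courseId": ["c1", "c1"], "responseCount": [2, 1],
})
right = pd.DataFrame({
    "participantId": ["p1", "p2"], "courseId": ["c1", "c1"], "meanCorrect": [0.5, 1.0],
})

# validate="one_to_one" makes pandas raise MergeError if either side would
# duplicate rows, turning the reviewer's concern into an enforced invariant.
merged = left.merge(
    right, on=["participantId", "courseId"], how="left", validate="one_to_one"
)
```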


145-145: Return statement is correct.

The return statement is correct.

The code changes are approved.

apps/frontend-manage/src/components/common/Header.tsx (5)

33-35: LGTM!

The changes to the navigation item label and cy attribute are consistent with the new focus of the application.

The code changes are approved.


38-41: LGTM!

The changes to the navigation item label, href, and cy attributes are consistent with the new focus of the application.

The code changes are approved.


49-54: LGTM!

The addition of the new navigation item for "analytics" enhances the functionality of the component by providing users with more options.

The code changes are approved.


59-66: LGTM!

The updates to the class names for the Navigation component indicate a visual redesign, transitioning from a dark theme to a lighter theme.

The code changes are approved.


74-143: LGTM!

The updates to the class names, properties, icon sizes, and hover effects for Navigation.ButtonItem and Navigation.TriggerItem components enhance the user interface and experience. The onClick handlers have been correctly updated to route to new paths.

The code changes are approved.

apps/analytics/src/modules/participant_analytics/compute_correctness.py (2)

17-96: LGTM!

The function is well-structured and correctly computes correctness columns based on the element type.

The code changes are approved.


99-164: LGTM!

The function is well-structured and correctly processes the data to compute correctness.

The code changes are approved.

packages/prisma/src/prisma/schema/analytics.prisma (6)

19-20: LGTM!

The addition of the computedAt field enhances the granularity of data captured for each participant.

The code changes are approved.


38-41: LGTM!

The addition of the new fields allows for more detailed tracking of correctness over questions attempted.

The code changes are approved.


51-52: LGTM!

The addition of the createdAt and updatedAt fields enhances the temporal context of the data.

The code changes are approved.


Line range hint 64-79: LGTM!

The addition of the computedAt, createdAt, and updatedAt fields enhances the granularity and temporal context of the data captured for each participant.

The code changes are approved.


98-100: LGTM!

The addition of the createdAt and updatedAt fields enhances the temporal context of the data.

The code changes are approved.


119-121: LGTM!

The addition of the createdAt and updatedAt fields enhances the temporal context of the data.

The code changes are approved.

Also applies to: 140-142, 161-162

apps/analytics/src/notebooks/participant_analytics.ipynb (5)

4-8: LGTM!

The markdown cell is correctly formatted and provides a clear section header.

The code changes are approved.


11-27: LGTM!

The imports and setup are correctly implemented.

The code changes are approved.


54-58: LGTM!

The markdown cell is correctly formatted and provides a clear section header.

The code changes are approved.


113-117: LGTM!

The markdown cell is correctly formatted and provides a clear section header.

The code changes are approved.


150-157: LGTM!

The database disconnection is correctly implemented.

The code changes are approved.

apps/frontend-manage/src/components/interaction/AudienceInteraction.tsx (1)

91-91: LGTM! But verify the URL structure consistency.

The change correctly updates the URL structure to reference a quizzes endpoint. Ensure that the new URL structure is consistent with the rest of the application.

The code changes are approved.

Run the following script to verify the URL structure consistency:

Verification successful

URL Structure Consistency Verified

The /quizzes/ URL structure is consistently used across the application, confirming that the update in AudienceInteraction.tsx aligns with the existing structure. This supports the correctness of the change.

  • Files with consistent URL usage:
    • apps/frontend-manage/next.config.js
    • apps/frontend-manage/src/components/interaction/AudienceInteraction.tsx
    • Various components and test files
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the URL structure consistency for quizzes endpoint.

# Test: Search for the URL structure. Expect: Consistent URL structure across the application.
rg --type js --type tsx --type ts $'/quizzes/'

Length of output: 73


Script:

#!/bin/bash
# Description: Verify the URL structure consistency for quizzes endpoint.

# Test: Search for the URL structure. Expect: Consistent URL structure across the application.
rg --type javascript --type typescript '/quizzes/'

Length of output: 85


Script:

#!/bin/bash
# Description: Verify the URL structure consistency for quizzes endpoint.

# Test: Search for the URL structure. Expect: Consistent URL structure across the application.
rg --type js --type ts '/quizzes/'

Length of output: 2487

apps/analytics/src/notebooks/archive/test.ipynb (3)

16-27: LGTM!

The database connection setup is correctly implemented.

The code changes are approved.


128-139: LGTM!

The code correctly processes response details and activity analytics.

The code changes are approved.


4-13: Avoid hardcoding the database URL.

Hardcoding the database URL is not a best practice. Consider using environment variables or configuration files to manage sensitive information.

Apply this diff to avoid hardcoding the database URL:

- os.environ['DATABASE_URL'] = 'postgresql://klicker:klicker@localhost:5432/klicker-prod'
+ os.environ['DATABASE_URL'] = os.getenv('DATABASE_URL', 'default_value')

Likely invalid or redundant comment.
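A common alternative to the hardcoded URL is to read it from the environment and fail fast when it is missing; a sketch, assuming the value is supplied by the shell or an untracked `.env` file (neither is part of this PR):

```python
import os


def database_url() -> str:
    """Read the connection string from the environment; fail fast when unset."""
    url = os.environ.get("DATABASE_URL")
    if url is None:
        raise RuntimeError(
            "DATABASE_URL is not set; export it or load it from an untracked .env file"
        )
    return url


# Demo only: in practice the value comes from the shell or a .env loader,
# never from a string committed to the repository.
os.environ["DATABASE_URL"] = "postgresql://user:pass@localhost:5432/db"
```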

apps/frontend-manage/src/components/sessions/cockpit/SessionTimeline.tsx (1)

198-198: Verify consistency of routing changes.

The href attribute has been updated to reflect the transition from sessions to quizzes. Ensure that this change is consistent across the application and does not break any existing functionality.

Run the following script to verify the consistency of routing changes:

apps/frontend-manage/src/components/sessions/creation/liveQuiz/LiveSessionWizard.tsx (2)

241-241: Verify consistency of routing changes.

The router.push method call has been updated to redirect users to /quizzes. Ensure that this change is consistent across the application and does not break any existing functionality.

Run the following script to verify the consistency of routing changes:


259-259: Verify consistency of routing changes.

The router.push method call has been updated to redirect users to /quizzes/${data?.createSession?.id}/cockpit. Ensure that this change is consistent across the application and does not break any existing functionality.

Run the following script to verify the consistency of routing changes:

apps/frontend-manage/src/components/sessions/Session.tsx (2)

195-195: Verify consistency of routing changes.

The href attribute has been updated to redirect users to /quizzes/${session.id}/evaluation. Ensure that this change is consistent across the application and does not break any existing functionality.

Run the following script to verify the consistency of routing changes:


180-180: Verify consistency of routing changes.

The href attribute has been updated to redirect users to /quizzes/${session.id}/cockpit. Ensure that this change is consistent across the application and does not break any existing functionality.

Run the following script to verify the consistency of routing changes:

apps/analytics/src/notebooks/archive/daily_participant_analytics.ipynb (5)

4-10: LGTM!

The markdown cell provides useful context for the notebook.

The content is approved.


13-23: LGTM!

The imports are necessary for the notebook's functionality.

The code changes are approved.


72-76: LGTM!

The markdown cell provides useful context for the notebook.

The content is approved.


79-104: Validate the column selection.

Ensure that all necessary columns are included in the DataFrame. Consider adding a validation step to check for missing columns.

Apply this diff to add a validation step:

 df_details = convert_to_df(participant_response_details)
+ required_columns = ['score', 'pointsAwarded', 'xpAwarded', 'timeSpent', 'response', 'elementInstanceId', 'participantId', 'courseId']
+ missing_columns = [col for col in required_columns if col not in df_details.columns]
+ if missing_columns:
+     raise ValueError(f"Missing columns: {', '.join(missing_columns)}")
 df_details = df_details[['score', 'pointsAwarded', 'xpAwarded', 'timeSpent', 'response', 'elementInstanceId', 'participantId', 'courseId']]

325-331: LGTM!

The database disconnection is correctly implemented.

The code changes are approved.

apps/frontend-manage/src/pages/index.tsx (1)

87-88: LGTM! But verify the new routes.

The changes correctly update the prefetching behavior. Ensure that the new routes /quizzes/running and /quizzes are valid and accessible.

The code changes are approved.

Run the following script to verify the new routes:
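The verification script was collapsed in this export; one way to sketch the route check, assuming the Next.js pages-router layout of `apps/frontend-manage` (the exact paths are assumptions), would be to confirm a page file exists for each prefetched route:

```shell
#!/bin/bash
# Confirm that a page file backs each prefetched route.
for route in quizzes/running quizzes; do
  if ls apps/frontend-manage/src/pages/"$route".tsx \
        apps/frontend-manage/src/pages/"$route"/index.tsx 2>/dev/null | grep -q .; then
    echo "route /$route: page found"
  else
    echo "route /$route: no page file found"
  fi
done
```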

cypress/cypress/e2e/H-practice-quiz-workflow.cy.ts (3)

30-30: LGTM! But verify the consistency of the new selector.

The change correctly updates the selector to match the new UI structure. Ensure that the new selector data-cy="library" is used consistently across the codebase.

The code changes are approved.

Run the following script to verify the consistency of the new selector:

Verification successful

Selector data-cy="library" is used consistently across the codebase.

The selector appears in multiple test files, confirming its consistent application. No further action is needed regarding this change.

  • cypress/cypress/e2e/K-group-activity-workflow.cy.ts
  • cypress/cypress/e2e/H-practice-quiz-workflow.cy.ts
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts --type tsx --type jsx 'data-cy="library"'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts 'data-cy="library"'

Length of output: 417


397-397: LGTM! But verify the consistency of the new selector.

The change correctly updates the selector to match the new UI structure. Ensure that the new selector data-cy="library" is used consistently across the codebase.

The code changes are approved.

Run the following script to verify the consistency of the new selector:

Verification successful

The new selector is used consistently across the codebase.

The data-cy="library" selector appears in multiple test files, confirming its consistent usage.

  • cypress/cypress/e2e/K-group-activity-workflow.cy.ts
  • cypress/cypress/e2e/H-practice-quiz-workflow.cy.ts
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts --type tsx --type jsx 'data-cy="library"'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in JavaScript and TypeScript files.
rg --type js --type ts 'data-cy="library"'

Length of output: 417


201-201: LGTM! But verify the consistency of the new selector.

The change correctly updates the selector to match the new UI structure. Ensure that the new selector data-cy="library" is used consistently across the codebase.

The code changes are approved.

Run the following script to verify the consistency of the new selector:

Verification successful

The selector data-cy="library" is used consistently across the codebase.

The new selector appears in multiple test files, indicating that the change has been applied correctly. No inconsistencies were found.

  • cypress/cypress/e2e/H-practice-quiz-workflow.cy.ts
  • cypress/cypress/e2e/K-group-activity-workflow.cy.ts
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts --type tsx --type jsx 'data-cy="library"'

Length of output: 94


Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type javascript --type typescript --type jsx 'data-cy="library"'

Length of output: 106


Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts --type jsx 'data-cy="library"'

Length of output: 83


Script:

#!/bin/bash
# Description: Verify that the new selector is used consistently across the codebase.

# Test: Search for the new selector in the codebase. Expect: Only occurrences of the new selector.
rg --type js --type ts 'data-cy="library"'

Length of output: 417

cypress/cypress/e2e/K-group-activity-workflow.cy.ts (1)

19-19: Verify the correctness of the new selector.

Ensure that the new selector [data-cy="library"] correctly targets the intended UI element and that the test passes with this change.

Run the following script to verify the correctness of the new selector:
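The script body is missing from this export; a minimal sketch of the selector check, assuming repo-root execution (search roots are placeholders), could be:

```shell
#!/bin/bash
# List all usages of the data-cy="library" selector; hits should be limited to
# the Cypress specs and the component that defines it.
grep -rn --include='*.ts' --include='*.tsx' 'data-cy="library"' cypress/ apps/ 2>/dev/null || true
```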

cypress/cypress/e2e/F-live-quiz-workflow.cy.ts (4)

653-653: Verify the correctness of the new selector.

Ensure that the new selector [data-cy="quizzes"] correctly targets the intended UI element and that the test passes with this change.

Run the following script to verify the correctness of the new selector:
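The script was again collapsed in this export; a sketch of an equivalent check for the `data-cy="quizzes"` selector (search roots are assumptions) might simply count the files using it:

```shell
#!/bin/bash
# Count the files that use the data-cy="quizzes" selector.
count=$(grep -rl 'data-cy="quizzes"' cypress/ apps/ 2>/dev/null | wc -l)
echo "data-cy=\"quizzes\" found in $count file(s)"
```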


680-680: Verify the correctness of the new selector.

Ensure that the new selector [data-cy="quizzes"] correctly targets the intended UI element and that the test passes with this change.

Run the following script to verify the correctness of the new selector:


760-760: Verify the correctness of the new selector.

Ensure that the new selector [data-cy="quizzes"] correctly targets the intended UI element and that the test passes with this change.

Run the following script to verify the correctness of the new selector:


689-689: Verify the correctness of the new URL path.

Ensure that the new URL path /quizzes/ correctly targets the intended endpoint and that the test passes with this change.

Run the following script to verify the correctness of the new URL path:

Verification successful

The new URL path /quizzes/ is correctly integrated across the codebase.

The change from /sessions/ to /quizzes/ is consistent and appears intentional, with the new path being used in multiple relevant contexts, such as routing and component links.

  • Files and contexts using /quizzes/:
    • apps/office-addin/src/content/components/URLForm.tsx
    • cypress/cypress/e2e/F-live-quiz-workflow.cy.ts
    • apps/frontend-manage/next.config.js
    • Various components in apps/frontend-manage/src

This confirms that the new URL path is correctly targeting the intended endpoint.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the new URL path `/quizzes/`.

# Test: Search for the new URL path usage. Expect: Only occurrences of the new URL path.
rg --type js --type ts -A 5 $'/quizzes/'

Length of output: 11105

packages/i18n/messages/en.ts (3)

696-696: LGTM!

The entry library is correctly added and follows the existing pattern.

The code changes are approved.


697-697: LGTM!

The entry quizzes is correctly added and follows the existing pattern.

The code changes are approved.


698-698: LGTM!

The entry analytics is correctly added and follows the existing pattern.

The code changes are approved.

# Conflicts:
#	apps/frontend-manage/src/components/common/Header.tsx
#	apps/office-addin/src/content/components/URLForm.tsx

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a1736ff and 9a3af74.

Files selected for processing (13)
  • apps/frontend-control/src/components/sessions/EmbeddingModal.tsx (1 hunks)
  • apps/frontend-manage/src/components/common/Header.tsx (1 hunks)
  • apps/frontend-manage/src/components/courses/actions/EvaluationLinkLiveQuiz.tsx (1 hunks)
  • apps/frontend-manage/src/components/courses/actions/RunningLiveQuizLink.tsx (1 hunks)
  • apps/frontend-manage/src/components/interaction/AudienceInteraction.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/EmbeddingModal.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/Session.tsx (2 hunks)
  • apps/frontend-manage/src/components/sessions/cockpit/CancelSessionModal.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/cockpit/SessionTimeline.tsx (1 hunks)
  • apps/frontend-manage/src/components/sessions/evaluation/EvaluationControlBar.tsx (1 hunks)
  • apps/frontend-manage/src/pages/index.tsx (1 hunks)
  • apps/office-addin/src/content/components/URLForm.tsx (2 hunks)
  • packages/markdown/src/public/utilities.css (1 hunks)
Files skipped from review due to trivial changes (2)
  • apps/frontend-manage/src/components/courses/actions/EvaluationLinkLiveQuiz.tsx
  • packages/markdown/src/public/utilities.css
Files skipped from review as they are similar to previous changes (10)
  • apps/frontend-control/src/components/sessions/EmbeddingModal.tsx
  • apps/frontend-manage/src/components/courses/actions/RunningLiveQuizLink.tsx
  • apps/frontend-manage/src/components/interaction/AudienceInteraction.tsx
  • apps/frontend-manage/src/components/sessions/EmbeddingModal.tsx
  • apps/frontend-manage/src/components/sessions/Session.tsx
  • apps/frontend-manage/src/components/sessions/cockpit/CancelSessionModal.tsx
  • apps/frontend-manage/src/components/sessions/cockpit/SessionTimeline.tsx
  • apps/frontend-manage/src/components/sessions/evaluation/EvaluationControlBar.tsx
  • apps/frontend-manage/src/pages/index.tsx
  • apps/office-addin/src/content/components/URLForm.tsx
Additional comments not posted (6)
apps/frontend-manage/src/components/common/Header.tsx (6)

33-35: LGTM!

The changes to the first navigation item are straightforward and align with the new focus of the application.

The code changes are approved.


38-41: LGTM!

The changes to the second navigation item reflect the shift in the application's navigation structure.

The code changes are approved.


49-54: LGTM!

The addition of the new navigation item for "analytics" enhances the component's functionality by providing users with more options.

The code changes are approved.


59-66: LGTM!

The stylistic changes to the Navigation component indicate a visual redesign of the header.

The code changes are approved.


74-78: LGTM!

The changes to the Navigation.ButtonItem component improve the visual consistency of the navigation items.

The code changes are approved.


86-142: LGTM!

The changes to the Navigation.TriggerItem component improve the visual consistency and functionality of the navigation items.

The code changes are approved.


@sjschlapbach sjschlapbach marked this pull request as draft August 30, 2024 15:22

gitguardian bot commented Aug 30, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 1509424 | Triggered | Generic Password | a47b11f | util/_restore-db-dev.sh |
| 1509424 | Triggered | Generic Password | 01eee07 | docker-compose.yml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@sjschlapbach sjschlapbach marked this pull request as ready for review December 22, 2024 20:19

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

🛑 Comments failed to post (62)
apps/frontend-manage/src/components/analytics/activity/LevelTag.tsx (1)

6-24: 🛠️ Refactor suggestion

Consider refactoring to reduce duplication and improve accessibility.

The current implementation has duplicated JSX structure and relies solely on colors to convey information.

Consider applying these improvements:

+interface TagConfig {
+  bgColor: string;
+  label: string;
+  ariaLabel: string;
+}
+
+const TAG_CONFIGS: Record<1 | 2 | 3, TagConfig> = {
+  3: { bgColor: 'bg-green-700', label: 'levelHigh', ariaLabel: 'High level' },
+  2: { bgColor: 'bg-yellow-500', label: 'levelMedium', ariaLabel: 'Medium level' },
+  1: { bgColor: 'bg-red-600', label: 'levelLow', ariaLabel: 'Low level' },
+};
+
 function LevelTag({ level }: { level: 1 | 2 | 3 }) {
   const t = useTranslations()
+  const config = TAG_CONFIGS[level]
 
-  if (level === 3) {
-    return (
-      <div className="w-max rounded-md border bg-green-700 px-1 py-0.5 text-white">
-        {t('manage.analytics.levelHigh')}
-      </div>
-    )
-  } else if (level === 2) {
-    return (
-      <div className="w-max rounded-md border bg-yellow-500 px-1 py-0.5 text-white">
-        {t('manage.analytics.levelMedium')}
-      </div>
-    )
-  } else {
-    return (
-      <div className="w-max rounded-md border bg-red-600 px-1 py-0.5 text-white">
-        {t('manage.analytics.levelLow')}
-      </div>
-    )
-  }
+  return (
+    <div 
+      className={`w-max rounded-md border ${config.bgColor} px-1 py-0.5 text-white`}
+      role="status"
+      aria-label={config.ariaLabel}
+    >
+      {t(`manage.analytics.${config.label}`)}
+    </div>
+  )
 }

This refactoring:

  1. Reduces code duplication
  2. Adds proper ARIA attributes for accessibility
  3. Makes the component more maintainable
  4. Centralizes the configuration
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

interface TagConfig {
  bgColor: string;
  label: string;
  ariaLabel: string;
}

const TAG_CONFIGS: Record<1 | 2 | 3, TagConfig> = {
  3: { bgColor: 'bg-green-700', label: 'levelHigh', ariaLabel: 'High level' },
  2: { bgColor: 'bg-yellow-500', label: 'levelMedium', ariaLabel: 'Medium level' },
  1: { bgColor: 'bg-red-600', label: 'levelLow', ariaLabel: 'Low level' },
};

function LevelTag({ level }: { level: 1 | 2 | 3 }) {
  const t = useTranslations()
  const config = TAG_CONFIGS[level]

  return (
    <div 
      className={`w-max rounded-md border ${config.bgColor} px-1 py-0.5 text-white`}
      role="status"
      aria-label={config.ariaLabel}
    >
      {t(`manage.analytics.${config.label}`)}
    </div>
  )
}
apps/frontend-manage/src/components/analytics/StackedBarChartLabel.tsx (1)

14-26: 🛠️ Refactor suggestion

Enhance accessibility and flexibility of the text element.

The SVG text element needs accessibility improvements and could be more flexible.

Consider these improvements:

   <text
     x={x + width / 2}
     y={y + height / 2 + 1}
     fill="white"
     fontSize={14}
     textAnchor="middle"
     dominantBaseline="middle"
-    className="font-bold"
+    className="font-bold select-none"
+    role="presentation"
+    aria-hidden="true"
   >
-    {value} %
+    {Math.round(value)}%
   </text>

Also consider:

  1. Making fill and fontSize configurable via props
  2. Adding text truncation for large numbers
  3. Documenting the purpose of the 1px vertical offset or removing it if unnecessary

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/participant_performance.ipynb (2)

94-102: 🛠️ Refactor suggestion

Ensure proper database connection cleanup

The database disconnection should be guaranteed even if errors occur.

Consider using a context manager:

+from contextlib import contextmanager
+
+@contextmanager
+def database_connection():
+    db = Prisma()
+    try:
+        db.connect()
+        yield db
+    finally:
+        db.disconnect()
+
+with database_connection() as db:
+    # Your database operations here
+    pass

Committable suggestion skipped: line range outside the PR's diff.


70-91: 🛠️ Refactor suggestion

Add error handling and performance optimizations

The main computation loop lacks error handling and could be optimized for performance.

Consider these improvements:

  1. Add try-catch blocks for database operations
  2. Implement batch processing for database operations
  3. Add progress tracking using tqdm
  4. Implement memory-efficient processing for large datasets
+from tqdm import tqdm
+
 # Fetch all courses from the database
-df_courses = get_running_past_courses(db)
+try:
+    df_courses = get_running_past_courses(db)
+except Exception as e:
+    print(f"Error fetching courses: {e}")
+    db.disconnect()
+    raise

-for idx, course in df_courses.iterrows():
+for idx, course in tqdm(df_courses.iterrows(), total=len(df_courses)):
     course_id = course["id"]
-    print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
+    try:
         # fetch all question responses linked to this course
         question_responses = db.questionresponse.find_many(where={"courseId": course_id})
         df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))

         if df_responses.empty:
             print("No responses linked to course", course_id)
             continue

         df_performance = compute_response_error_rates(df_responses)
         df_performance = compute_performance_levels(df_performance)

         # store computed performance analytics in the corresponding database table
         save_participant_performance(db, df_performance, course_id)
+    except Exception as e:
+        print(f"Error processing course {course_id}: {e}")
+        continue
📝 Committable suggestion


    from tqdm import tqdm

    # Fetch all courses from the database
    try:
        df_courses = get_running_past_courses(db)
    except Exception as e:
        print(f"Error fetching courses: {e}")
        db.disconnect()
        raise

    # Iterate over the course and fetch all question responses linked to it
    for idx, course in tqdm(df_courses.iterrows(), total=len(df_courses)):
        course_id = course["id"]
        try:
            # fetch all question responses linked to this course
            question_responses = db.questionresponse.find_many(where={"courseId": course_id})
            df_responses = pd.DataFrame(list(map(lambda x: x.dict(), question_responses)))

            # if no responses are linked to the course, skip the iteration
            if df_responses.empty:
                print("No responses linked to course", course_id)
                continue

            df_performance = compute_response_error_rates(df_responses)
            df_performance = compute_performance_levels(df_performance)

            # store computed performance analytics in the corresponding database table
            save_participant_performance(db, df_performance, course_id)
        except Exception as e:
            print(f"Error processing course {course_id}: {e}")
            continue
apps/analytics/src/notebooks/participant_course_analytics.ipynb (2)

47-55: ⚠️ Potential issue

Security: Remove hardcoded database credentials.

Several critical issues in the database setup:

  1. Database credentials should not be hardcoded in the notebook
  2. Missing error handling for database connection
  3. The verbose flag is defined but never used

Consider this safer approach:

-os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+# Load database URL from environment variable
+if "DATABASE_URL" not in os.environ:
+    raise ValueError("DATABASE_URL environment variable is required")

-db.connect()
+try:
+    db.connect()
+except Exception as e:
+    print(f"Failed to connect to database: {e}")
+    raise
📝 Committable suggestion


db = Prisma()

# set the environment variable DATABASE_URL to the connection string of your database
# Load database URL from environment variable
if "DATABASE_URL" not in os.environ:
    raise ValueError("DATABASE_URL environment variable is required")

try:
    db.connect()
except Exception as e:
    print(f"Failed to connect to database: {e}")
    raise

# Script settings
verbose = False

71-92: 🛠️ Refactor suggestion

Enhance robustness of the computation logic.

The main computation loop could benefit from several improvements:

  1. Better progress tracking for long-running operations
  2. Error handling for computation functions
  3. Transaction management for database operations

Consider this enhanced implementation:

+from tqdm.notebook import tqdm
+from contextlib import contextmanager

+@contextmanager
+def db_transaction(db):
+    try:
+        yield
+        db.commit()
+    except Exception as e:
+        db.rollback()
+        raise

 # find all courses that started in the past
 df_courses = get_running_past_courses(db)

 # iterate over all courses and compute the participant course analytics
-for idx, course in df_courses.iterrows():
+for idx, course in tqdm(df_courses.iterrows(), total=len(df_courses), desc="Processing courses"):
     print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])

     # compute the number of active weeks per participant and activity level
-    df_activity = get_active_weeks(db, course)
+    try:
+        df_activity = get_active_weeks(db, course)
+    except Exception as e:
+        print(f"Error processing course {course['id']}: {e}")
+        continue

     # if the dataframe is empty, no participant was active in the course
     if df_activity.empty:
         print("No participant was active in the course, skipping")
         continue

     # compute the number of active days per week and mean elements per day
-    df_activity = compute_participant_activity(
-        db, df_activity, course["id"], course["startDate"], course["endDate"]
-    )
+    try:
+        df_activity = compute_participant_activity(
+            db, df_activity, course["id"], course["startDate"], course["endDate"]
+        )
+    except Exception as e:
+        print(f"Error computing activity for course {course['id']}: {e}")
+        continue

     # store the computed participant course analytics
-    save_participant_course_analytics(db, df_activity)
+    with db_transaction(db):
+        save_participant_course_analytics(db, df_activity)
📝 Committable suggestion


    from tqdm.notebook import tqdm
    from contextlib import contextmanager

    @contextmanager
    def db_transaction(db):
        try:
            yield
            db.commit()
        except Exception as e:
            db.rollback()
            raise

    # find all courses that started in the past
    df_courses = get_running_past_courses(db)

    # iterate over all courses and compute the participant course analytics
    for idx, course in tqdm(df_courses.iterrows(), total=len(df_courses), desc="Processing courses"):
        print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])

        # compute the number of active weeks per participant and activity level
        try:
            df_activity = get_active_weeks(db, course)
        except Exception as e:
            print(f"Error processing course {course['id']}: {e}")
            continue

        # if the dataframe is empty, no participant was active in the course
        if df_activity.empty:
            print("No participant was active in the course, skipping")
            continue

        # compute the number of active days per week and mean elements per day
        try:
            df_activity = compute_participant_activity(
                db, df_activity, course["id"], course["startDate"], course["endDate"]
            )
        except Exception as e:
            print(f"Error computing activity for course {course['id']}: {e}")
            continue

        # store the computed participant course analytics
        with db_transaction(db):
            save_participant_course_analytics(db, df_activity)
apps/frontend-manage/src/components/analytics/ActivityTypeFilter.tsx (1)

18-28: 🛠️ Refactor suggestion

Consider deriving activity types from enum and sharing code with ElementTypeFilter.

  1. The items array is hardcoded with specific ActivityTypes. Consider deriving it from the ActivityType enum for maintainability.
  2. There's significant code duplication with ElementTypeFilter. Consider creating a shared generic filter component.
+import { useMemo } from 'react'

 function ActivityTypeFilter({
   activityType,
   setActivityType,
 }: {
   activityType: ActivityType | 'all'
   setActivityType: Dispatch<SetStateAction<ActivityType | 'all'>>
 }) {
   const t = useTranslations()
+  
+  const items = useMemo(() => [
+    { value: 'all', label: t('manage.analytics.allActivityTypes') },
+    ...Object.values(ActivityType).map((value) => ({
+      value,
+      label: t(`shared.generic.${value.toLowerCase()}s`),
+    })),
+  ], [t])

   return (
     <SelectField
       label={t('manage.analytics.activityType')}
-      items={[
-        { value: 'all', label: t('manage.analytics.allActivityTypes') },
-        {
-          value: ActivityType.PracticeQuiz,
-          label: t('shared.generic.practiceQuizzes'),
-        },
-        {
-          value: ActivityType.MicroLearning,
-          label: t('shared.generic.microlearnings'),
-        },
-      ]}
+      items={items}
       value={activityType}

Alternative: Create a shared generic filter component:

interface TypeFilterProps<T> {
  type: T | 'all'
  setType: Dispatch<SetStateAction<T | 'all'>>
  enumValues: Record<string, T>
  labelPrefix: string
  translationKey: string
}

function TypeFilter<T extends string>({
  type,
  setType,
  enumValues,
  labelPrefix,
  translationKey,
}: TypeFilterProps<T>) {
  const t = useTranslations()
  
  const items = useMemo(() => [
    { value: 'all', label: t(`${labelPrefix}.all${translationKey}Types`) },
    ...Object.values(enumValues).map((value) => ({
      value,
      label: t(`${labelPrefix}.${value.toLowerCase()}s`),
    })),
  ], [t, enumValues, labelPrefix, translationKey])

  return (
    <SelectField
      label={t(`${labelPrefix}.${translationKey}Type`)}
      items={items}
      value={type}
      onChange={(value) => setType(value as T | 'all')}
      className={{ select: { root: 'w-52', trigger: 'h-8' } }}
    />
  )
}

// Usage:
<TypeFilter
  type={activityType}
  setType={setActivityType}
  enumValues={ActivityType}
  labelPrefix="manage.analytics"
  translationKey="activity"
/>
apps/frontend-manage/src/components/analytics/ActivitiesElementsSwitch.tsx (1)

21-40: 🛠️ Refactor suggestion

Enhance accessibility for button group

The button group implementation should include proper ARIA attributes for better screen reader support.

-      <div className="flex flex-row">
+      <div 
+        className="flex flex-row"
+        role="group"
+        aria-label={t('manage.analytics.viewTypeSelector')}
+      >
         <Button
           basic
           onClick={() => setType('activity')}
+          aria-pressed={type === 'activity'}
           className={{
             root: `py-0.25 h-8 rounded-l border !border-r-0 border-solid px-2 ${type === 'activity' ? 'bg-primary-100 border-primary-100 text-white' : ''}`,
           }}
         >
           {t('manage.analytics.activities')}
         </Button>
         <Button
           basic
           onClick={() => setType('instance')}
+          aria-pressed={type === 'instance'}
           className={{
             root: `h-8 rounded-r border !border-l-0 border-solid px-2 py-0.5 ${type === 'instance' ? 'bg-primary-100 border-primary-100 text-white' : ''}`,
           }}
         >
           {t('manage.analytics.elements')}
         </Button>
       </div>
📝 Committable suggestion


      <div 
        className="flex flex-row"
        role="group"
        aria-label={t('manage.analytics.viewTypeSelector')}
      >
        <Button
          basic
          onClick={() => setType('activity')}
          aria-pressed={type === 'activity'}
          className={{
            root: `py-0.25 h-8 rounded-l border !border-r-0 border-solid px-2 ${type === 'activity' ? 'bg-primary-100 border-primary-100 text-white' : ''}`,
          }}
        >
          {t('manage.analytics.activities')}
        </Button>
        <Button
          basic
          onClick={() => setType('instance')}
          aria-pressed={type === 'instance'}
          className={{
            root: `h-8 rounded-r border !border-l-0 border-solid px-2 py-0.5 ${type === 'instance' ? 'bg-primary-100 border-primary-100 text-white' : ''}`,
          }}
        >
          {t('manage.analytics.elements')}
        </Button>
      </div>
apps/frontend-manage/src/components/analytics/activity/useTotalStudentActivityHistogram.ts (2)

21-24: ⚠️ Potential issue

Add error handling for empty participant activity

The hook should handle the case when there's no participant activity data.

   return useMemo(() => {
+    if (!participantActivity?.length) {
+      return []
+    }
     const activeWeeks = participantActivity.map((p) => p.activeWeeks)
     const histogram = groupStudentActivityBins(activeWeeks, courseWeeks)
     const stats = computeHistogramStatistics(activeWeeks)
📝 Committable suggestion


  return useMemo(() => {
    if (!participantActivity?.length) {
      return []
    }
    const activeWeeks = participantActivity.map((p) => p.activeWeeks)
    const histogram = groupStudentActivityBins(activeWeeks, courseWeeks)
    const stats = computeHistogramStatistics(activeWeeks)

6-12: ⚠️ Potential issue

Add input validation for histogram binning

The groupStudentActivityBins function should validate inputs to prevent potential issues with large totalWeeks values or negative numbers.

 function groupStudentActivityBins(activeWeeks: number[], totalWeeks: number) {
+  if (totalWeeks < 0 || totalWeeks > 52) {
+    throw new Error('Total weeks must be between 0 and 52')
+  }
+  if (activeWeeks.some(weeks => weeks < 0 || weeks > totalWeeks)) {
+    throw new Error('Active weeks must be between 0 and total weeks')
+  }
   const histogram = new Array(totalWeeks + 1).fill(0)
   activeWeeks.forEach((weeks) => {
     histogram[weeks] = (histogram[weeks] || 0) + 1
   })
   return histogram
 }
📝 Committable suggestion


function groupStudentActivityBins(activeWeeks: number[], totalWeeks: number) {
  if (totalWeeks < 0 || totalWeeks > 52) {
    throw new Error('Total weeks must be between 0 and 52')
  }
  if (activeWeeks.some(weeks => weeks < 0 || weeks > totalWeeks)) {
    throw new Error('Active weeks must be between 0 and total weeks')
  }
  const histogram = new Array(totalWeeks + 1).fill(0)
  activeWeeks.forEach((weeks) => {
    histogram[weeks] = (histogram[weeks] || 0) + 1
  })
  return histogram
}
apps/frontend-manage/src/components/analytics/activity/DailyActivityTimeSeries.tsx (1)

21-34: 🛠️ Refactor suggestion

Optimize data transformation logic

The current implementation has several potential improvements:

  1. Date formatting logic could be extracted to a utility function
  2. Percentage calculation should handle division by zero
  3. Consider using useMemo for the transformed data to prevent unnecessary recalculations
+const formatDate = (date: Date): string => {
+  return date.toLocaleDateString('en-GB', {
+    day: '2-digit',
+    month: '2-digit',
+    year: 'numeric',
+  }).replace(/\//g, '-')
+}

+const calculatePercentage = (active: number, total: number): number => {
+  return total > 0 ? (active / total) * 100 : 0
+}

function DailyActivityTimeSeries({ activity, courseParticipants }: DailyActivityTimeSeriesProps) {
  const t = useTranslations()
+  const transformedData = useMemo(() => 
+    activity.map((item) => ({
+      date: formatDate(new Date(item.date)),
+      activeParticipants: calculatePercentage(item.activeParticipants, courseParticipants)
+    })), [activity, courseParticipants])

   return (
     // ...
-    <ActivityTimeSeriesPlot
-      activityData={activity.map((item) => {
-        const date = new Date(item.date)
-        return {
-          date: date.toLocaleDateString('en-GB', {
-            day: '2-digit',
-            month: '2-digit',
-            year: 'numeric',
-          }).replace(/\//g, '-'),
-          activeParticipants: (item.activeParticipants / courseParticipants) * 100,
-        }
-      })}
-    />
+    <ActivityTimeSeriesPlot activityData={transformedData} />
   )

Committable suggestion skipped: line range outside the PR's diff.

apps/frontend-manage/src/components/analytics/activity/ActivityTimeSeriesPlot.tsx (1)

14-24: 🛠️ Refactor suggestion

Enhance type safety and validation for chart data

The component could benefit from stronger typing and validation:

  1. Define explicit interfaces for the data structure
  2. Add runtime validation for required props
+interface ActivityDataPoint {
+  date: string
+  activeParticipants: number
+  activeParticipantsReference?: number
+}

+interface ActivityTimeSeriesPlotProps {
+  singleCourse?: boolean
+  currentCourse?: string
+  comparisonCourse?: string
+  activityData: ActivityDataPoint[]
+}

-function ActivityTimeSeriesPlot({
+function ActivityTimeSeriesPlot({
   singleCourse = true,
   currentCourse,
   comparisonCourse,
   activityData,
-}: {
-  singleCourse?: boolean
-  currentCourse?: string
-  comparisonCourse?: string
-  activityData: { date: string; activeParticipants: number }[]
-}) {
+}: ActivityTimeSeriesPlotProps) {
📝 Committable suggestion


interface ActivityDataPoint {
  date: string
  activeParticipants: number
  activeParticipantsReference?: number
}

interface ActivityTimeSeriesPlotProps {
  singleCourse?: boolean
  currentCourse?: string
  comparisonCourse?: string
  activityData: ActivityDataPoint[]
}

function ActivityTimeSeriesPlot({
  singleCourse = true,
  currentCourse,
  comparisonCourse,
  activityData,
}: ActivityTimeSeriesPlotProps) {
apps/frontend-manage/src/components/analytics/activity/SuspendedCourseComparison.tsx (1)

24-31: ⚠️ Potential issue

Improve error handling and data transformation

The GraphQL query handling could be improved:

  1. Add error handling for the query
  2. Memoize the transformed courses data
  3. Add loading state for initial data fetch
+import { ApolloError } from '@apollo/client'
+import { UserNotification } from '@uzh-bf/design-system'

function SuspendedCourseComparison({ ... }) {
-  const { data } = useSuspenseQuery(GetUserCoursesDocument)
+  const { data, error } = useSuspenseQuery(GetUserCoursesDocument)

+  const courses = useMemo(() => 
+    data?.userCourses
+      ?.filter((course) => course.id !== router.query.courseId)
+      .map((course) => ({
+        label: course.name,
+        value: course.id,
+      })) ?? [],
+    [data?.userCourses, router.query.courseId]
+  )

+  if (error) {
+    return (
+      <UserNotification
+        message={t('shared.errors.failedToLoadCourses')}
+        type="error"
+      />
+    )
+  }

-  const courses =
-    data.userCourses
-      ?.filter((course) => course.id !== router.query.courseId)
-      .map((course) => ({
-        label: course.name,
-        value: course.id,
-      })) ?? []

Committable suggestion skipped: line range outside the PR's diff.

apps/frontend-manage/src/components/analytics/activity/WeeklyActivityTimeSeries.tsx (2)

40-73: 🛠️ Refactor suggestion

Refactor complex data transformation logic

The data transformation logic is complex and could benefit from better organization and safety checks.

  1. Extract data transformation logic:
const formatComparisonData = (
  activity: ParticipantActivityTimestamp[],
  courseParticipants: number,
  secondActivity: ParticipantActivityTimestamp[],
  secondParticipants: number
) => {
  return activity.map((item, idx) => ({
    date: t('manage.analytics.weekN', { number: idx + 1 }),
    activeParticipants: (item.activeParticipants / courseParticipants) * 100,
    activeParticipantsReference: secondActivity.length > idx
      ? (secondActivity[idx].activeParticipants / secondParticipants) * 100
      : undefined,
  }))
}

const formatSingleCourseData = (
  activity: ParticipantActivityTimestamp[],
  courseParticipants: number
) => {
  return activity.map((item) => ({
    date: new Date(item.date).toLocaleDateString('en-GB', {
      day: '2-digit',
      month: '2-digit',
      year: 'numeric',
    }).replace(/\//g, '-'),
    activeParticipants: (item.activeParticipants / courseParticipants) * 100,
  }))
}
  2. Add safety check for division by zero:
- (secondActivity[idx].activeParticipants / secondParticipants) * 100
+ secondParticipants > 0 
+   ? (secondActivity[idx].activeParticipants / secondParticipants) * 100 
+   : 0

26-33: 🛠️ Refactor suggestion

Add error handling for the GraphQL query

The query implementation should handle error states to improve user experience.

Add error handling:

- const { data, loading } = useQuery(GetCourseWeeklyActivityDocument, {
+ const { data, loading, error } = useQuery(GetCourseWeeklyActivityDocument, {
    variables: { courseId: courseComparison?.id },
    skip: typeof courseComparison === 'undefined',
  })
+ 
+ if (error) {
+   return (
+     <UserNotification
+       message={t('manage.analytics.errorLoadingComparison')}
+       type="error"
+     />
+   )
+ }

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/instance_activity_performance/agg_activity_performance.py (1)

1-6: ⚠️ Potential issue

Fix the to_dict() call and add type hints, docstring, and error handling

The function has several issues that need to be addressed:

  1. The to_dict() call on line 4 has no effect as its result is not assigned to any variable
  2. Missing type hints for better code maintainability
  3. Missing docstring for documentation
  4. No error handling for edge cases

Here's the suggested fix:

-def agg_activity_performance(df_instance_performance):
-    activity_performance = df_instance_performance.mean()
-    activity_performance.drop("instanceId", inplace=True)
-    activity_performance.to_dict()
-
-    return activity_performance
+from typing import Dict
+import pandas as pd
+
+def agg_activity_performance(df_instance_performance: pd.DataFrame) -> Dict[str, float]:
+    """
+    Compute mean performance metrics across all instances.
+    
+    Args:
+        df_instance_performance: DataFrame containing performance metrics with instanceId
+        
+    Returns:
+        Dictionary of aggregated performance metrics
+        
+    Raises:
+        ValueError: If DataFrame is empty or missing required columns
+    """
+    if df_instance_performance.empty:
+        raise ValueError("DataFrame is empty")
+        
+    if "instanceId" not in df_instance_performance.columns:
+        raise ValueError("DataFrame missing required column: instanceId")
+        
+    activity_performance = df_instance_performance.mean()
+    activity_performance.drop("instanceId", inplace=True)
+    return activity_performance.to_dict()
📝 Committable suggestion


from typing import Dict
import pandas as pd

def agg_activity_performance(df_instance_performance: pd.DataFrame) -> Dict[str, float]:
    """
    Compute mean performance metrics across all instances.
    
    Args:
        df_instance_performance: DataFrame containing performance metrics with instanceId
        
    Returns:
        Dictionary of aggregated performance metrics
        
    Raises:
        ValueError: If DataFrame is empty or missing required columns
    """
    if df_instance_performance.empty:
        raise ValueError("DataFrame is empty")
        
    if "instanceId" not in df_instance_performance.columns:
        raise ValueError("DataFrame missing required column: instanceId")
        
    activity_performance = df_instance_performance.mean()
    activity_performance.drop("instanceId", inplace=True)
    return activity_performance.to_dict()
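As a standalone sanity check of the corrected return value (toy data; the column names are illustrative, not the repo's actual schema):

```python
import pandas as pd

def agg_activity_performance(df_instance_performance):
    # Mean across instances, drop the ID column, return a plain dict
    activity_performance = df_instance_performance.mean()
    activity_performance.drop("instanceId", inplace=True)
    return activity_performance.to_dict()

df = pd.DataFrame({
    "instanceId": [1, 2],
    "totalCorrectRate": [0.5, 0.7],
    "averageTimeSpent": [10.0, 30.0],
})

result = agg_activity_performance(df)
# instanceId is gone; the remaining metrics are averaged across instances
```

Without the `return` on `to_dict()`, the caller would receive `None` instead of the metrics dictionary, which is exactly the silent bug the original comment flags.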
apps/analytics/src/modules/instance_activity_performance/get_course_activities.py (1)

1-14: 🛠️ Refactor suggestion

Add type hints, error handling, and refactor duplicate query logic

The function needs improvements in several areas:

  1. Missing type hints
  2. Duplicate query structure
  3. No error handling for database operations
  4. No validation for course_id

Here's the suggested refactor:

+from typing import List, Dict, Tuple
+from prisma import Prisma
+
+def get_activities_by_type(db: Prisma, course_id: str, collection: str) -> List[Dict]:
+    """Helper function to fetch activities from a specific collection"""
+    try:
+        results = getattr(db, collection).find_many(
+            where={"courseId": course_id},
+            include={"stacks": {"include": {"elements": {"include": {"responses": True}}}}},
+        )
+        return list(map(lambda x: x.dict(), results))
+    except Exception as e:
+        raise Exception(f"Failed to fetch {collection}: {str(e)}")
+
+def get_course_activities(db: Prisma, course_id: str) -> Tuple[List[Dict], List[Dict]]:
+    """
+    Fetch practice quizzes and microlearning activities for a course.
+    
+    Args:
+        db: Prisma database instance
+        course_id: ID of the course
+        
+    Returns:
+        Tuple of (practice_quizzes, microlearning_activities)
+        
+    Raises:
+        ValueError: If course_id is invalid
+        Exception: If database operations fail
+    """
+    if not course_id:
+        raise ValueError("course_id is required")
+        
+    return (
+        get_activities_by_type(db, course_id, "practicequiz"),
+        get_activities_by_type(db, course_id, "microlearning")
+    )
📝 Committable suggestion


from typing import List, Dict, Tuple
from prisma import Prisma

def get_activities_by_type(db: Prisma, course_id: str, collection: str) -> List[Dict]:
    """Helper function to fetch activities from a specific collection"""
    try:
        results = getattr(db, collection).find_many(
            where={"courseId": course_id},
            include={"stacks": {"include": {"elements": {"include": {"responses": True}}}}},
        )
        return list(map(lambda x: x.dict(), results))
    except Exception as e:
        raise Exception(f"Failed to fetch {collection}: {str(e)}")

def get_course_activities(db: Prisma, course_id: str) -> Tuple[List[Dict], List[Dict]]:
    """
    Fetch practice quizzes and microlearning activities for a course.
    
    Args:
        db: Prisma database instance
        course_id: ID of the course
        
    Returns:
        Tuple of (practice_quizzes, microlearning_activities)
        
    Raises:
        ValueError: If course_id is invalid
        Exception: If database operations fail
    """
    if not course_id:
        raise ValueError("course_id is required")
        
    return (
        get_activities_by_type(db, course_id, "practicequiz"),
        get_activities_by_type(db, course_id, "microlearning")
    )
apps/analytics/src/modules/instance_activity_performance/save_instance_performances.py (1)

1-44: 🛠️ Refactor suggestion

Add type hints, validation, error handling, and optimize performance

The function needs several improvements:

  1. Missing type hints
  2. No validation of required DataFrame columns
  3. Inefficient row-by-row processing
  4. No error handling for database operations
  5. Hard-coded column names

Here's the suggested refactor:

+from typing import List, Dict, Set
+import pandas as pd
+from prisma import Prisma
+
+REQUIRED_COLUMNS: Set[str] = {
+    "instanceId", "responseCount", "totalErrorRate", "totalPartialRate",
+    "totalCorrectRate", "averageTimeSpent"
+}
+
+DETAILED_COLUMNS: Set[str] = {
+    "firstErrorRate", "firstPartialRate", "firstCorrectRate",
+    "lastErrorRate", "lastPartialRate", "lastCorrectRate"
+}
+
+def validate_dataframe(df: pd.DataFrame, total_only: bool) -> None:
+    """Validate DataFrame has required columns"""
+    missing_columns = REQUIRED_COLUMNS - set(df.columns)
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+        
+    if not total_only:
+        missing_detailed = DETAILED_COLUMNS - set(df.columns)
+        if missing_detailed:
+            raise ValueError(f"Missing detailed columns: {missing_detailed}")
+
+def prepare_upsert_data(row: pd.Series, course_id: str, total_only: bool) -> Dict:
+    """Prepare data for upsert operation"""
+    values = {col: row[col] for col in REQUIRED_COLUMNS if col != "instanceId"}
+    
+    if not total_only:
+        values.update({col: row[col] for col in DETAILED_COLUMNS})
+        
+    create_values = values.copy()
+    create_values.update({
+        "instance": {"connect": {"id": row["instanceId"]}},
+        "course": {"connect": {"id": course_id}},
+    })
+    
+    return {
+        "where": {"instanceId": row["instanceId"]},
+        "data": {"create": create_values, "update": values}
+    }
+
+def save_instance_performances(
+    db: Prisma,
+    df_instance_performance: pd.DataFrame,
+    course_id: str,
+    total_only: bool = False
+) -> None:
+    """
+    Save or update instance performance metrics in the database.
+    
+    Args:
+        db: Prisma database instance
+        df_instance_performance: DataFrame with performance metrics
+        course_id: ID of the course
+        total_only: If True, only save total metrics
+        
+    Raises:
+        ValueError: If DataFrame is invalid or missing required columns
+        Exception: If database operations fail
+    """
+    if df_instance_performance.empty:
+        raise ValueError("DataFrame is empty")
+        
+    if not course_id:
+        raise ValueError("course_id is required")
+        
+    validate_dataframe(df_instance_performance, total_only)
+    
+    try:
+        # Prepare all upsert operations
+        upsert_data = [
+            prepare_upsert_data(row, course_id, total_only)
+            for _, row in df_instance_performance.iterrows()
+        ]
+        
+        # Execute upserts in batches via Prisma's batch context manager
+        batch_size = 100
+        for i in range(0, len(upsert_data), batch_size):
+            with db.batch_() as batcher:
+                for data in upsert_data[i : i + batch_size]:
+                    batcher.instanceperformance.upsert(**data)
+    except Exception as e:
+        raise Exception(f"Failed to save instance performances: {str(e)}")

This refactor:

  1. Adds type hints for better code maintainability
  2. Validates DataFrame columns before processing
  3. Uses batch processing for better performance
  4. Adds proper error handling
  5. Extracts magic strings into constants
  6. Separates concerns into helper functions
📝 Committable suggestion


from typing import List, Dict, Set
import pandas as pd
from prisma import Prisma

REQUIRED_COLUMNS: Set[str] = {
    "instanceId", "responseCount", "totalErrorRate", "totalPartialRate",
    "totalCorrectRate", "averageTimeSpent"
}

DETAILED_COLUMNS: Set[str] = {
    "firstErrorRate", "firstPartialRate", "firstCorrectRate",
    "lastErrorRate", "lastPartialRate", "lastCorrectRate"
}

def validate_dataframe(df: pd.DataFrame, total_only: bool) -> None:
    """Validate DataFrame has required columns"""
    missing_columns = REQUIRED_COLUMNS - set(df.columns)
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")
        
    if not total_only:
        missing_detailed = DETAILED_COLUMNS - set(df.columns)
        if missing_detailed:
            raise ValueError(f"Missing detailed columns: {missing_detailed}")

def prepare_upsert_data(row: pd.Series, course_id: str, total_only: bool) -> Dict:
    """Prepare data for upsert operation"""
    values = {col: row[col] for col in REQUIRED_COLUMNS if col != "instanceId"}
    
    if not total_only:
        values.update({col: row[col] for col in DETAILED_COLUMNS})
        
    create_values = values.copy()
    create_values.update({
        "instance": {"connect": {"id": row["instanceId"]}},
        "course": {"connect": {"id": course_id}},
    })
    
    return {
        "where": {"instanceId": row["instanceId"]},
        "data": {"create": create_values, "update": values}
    }

def save_instance_performances(
    db: Prisma,
    df_instance_performance: pd.DataFrame,
    course_id: str,
    total_only: bool = False
) -> None:
    """
    Save or update instance performance metrics in the database.
    
    Args:
        db: Prisma database instance
        df_instance_performance: DataFrame with performance metrics
        course_id: ID of the course
        total_only: If True, only save total metrics
        
    Raises:
        ValueError: If DataFrame is invalid or missing required columns
        Exception: If database operations fail
    """
    if df_instance_performance.empty:
        raise ValueError("DataFrame is empty")
        
    if not course_id:
        raise ValueError("course_id is required")
        
    validate_dataframe(df_instance_performance, total_only)
    
    try:
        # Prepare all upsert operations
        upsert_data = [
            prepare_upsert_data(row, course_id, total_only)
            for _, row in df_instance_performance.iterrows()
        ]
        
        # Execute upserts in batches via Prisma's batch context manager
        batch_size = 100
        for i in range(0, len(upsert_data), batch_size):
            with db.batch_() as batcher:
                for data in upsert_data[i : i + batch_size]:
                    batcher.instanceperformance.upsert(**data)
    except Exception as e:
        raise Exception(f"Failed to save instance performances: {str(e)}")
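The column-validation step can be exercised standalone with toy data (a minimal sketch; the column set mirrors the suggestion above but is an assumption, not the repo's schema):

```python
import pandas as pd

REQUIRED_COLUMNS = {"instanceId", "responseCount", "totalCorrectRate"}

def validate_dataframe(df: pd.DataFrame) -> None:
    # Fail fast with a precise message instead of a KeyError mid-write
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")

# A frame with all required columns passes silently
validate_dataframe(pd.DataFrame(columns=sorted(REQUIRED_COLUMNS)))

# A frame missing columns fails with an actionable message
error_message = ""
try:
    validate_dataframe(pd.DataFrame(columns=["instanceId"]))
except ValueError as e:
    error_message = str(e)
```

Validating up front means a malformed frame is rejected before any upsert runs, so the database is never left with a partial batch.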
apps/analytics/src/modules/instance_activity_performance/save_activity_performance.py (1)

38-41: 🛠️ Refactor suggestion

Add error handling for database operations

The database operation should be wrapped in try-except to handle potential errors.

+    try:
         db.activityperformance.upsert(
             where=where_clause,
             data={"create": create_values, "update": values},
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to save activity performance: {str(e)}") from e
📝 Committable suggestion


    try:
        db.activityperformance.upsert(
            where=where_clause,
            data={"create": create_values, "update": values},
        )
    except Exception as e:
        raise RuntimeError(f"Failed to save activity performance: {str(e)}") from e
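The `from e` clause chains the original exception as `__cause__`, so the underlying database error is preserved in the traceback rather than swallowed. A minimal illustration with stand-in names (not the real Prisma call):

```python
def flaky_upsert():
    # Stand-in for the db.activityperformance.upsert call
    raise ConnectionError("connection refused")

def save_activity_performance():
    try:
        flaky_upsert()
    except Exception as e:
        # `from e` chains the original error so tracebacks show both failures
        raise RuntimeError(f"Failed to save activity performance: {str(e)}") from e

cause = None
try:
    save_activity_performance()
except RuntimeError as err:
    cause = err.__cause__
```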
apps/analytics/src/modules/instance_activity_performance/compute_instance_performance.py (3)

25-29: 🛠️ Refactor suggestion

Add error handling for DataFrame creation

The DataFrame creation from instance responses lacks error handling.

-            df_responses = pd.DataFrame(instance["responses"])
+            try:
+                df_responses = pd.DataFrame(instance["responses"])
+            except Exception as e:
+                print(f"Warning: Failed to process responses for instance {instance['id']}: {str(e)}")
+                continue
📝 Committable suggestion


            try:
                df_responses = pd.DataFrame(instance["responses"])
            except Exception as e:
                print(f"Warning: Failed to process responses for instance {instance['id']}: {str(e)}")
                continue

            if df_responses.empty:
                continue


73-81: ⚠️ Potential issue

Potential division by zero risk

The calculations of rates don't handle the case where trialsCount is zero.

-            df_responses["responseErrorRate"] = (
-                df_responses["wrongCount"] / df_responses["trialsCount"]
-            )
-            df_responses["responsePartialRate"] = (
-                df_responses["partialCorrectCount"] / df_responses["trialsCount"]
-            )
-            df_responses["responseCorrectRate"] = (
-                df_responses["correctCount"] / df_responses["trialsCount"]
-            )
+            # Guard per row: zero denominators become NA, then fill with 0.
+            # A single .any() check would not help rows where trialsCount == 0.
+            safe_trials = df_responses["trialsCount"].replace(0, pd.NA)
+            df_responses["responseErrorRate"] = (
+                df_responses["wrongCount"].div(safe_trials).fillna(0)
+            )
+            df_responses["responsePartialRate"] = (
+                df_responses["partialCorrectCount"].div(safe_trials).fillna(0)
+            )
+            df_responses["responseCorrectRate"] = (
+                df_responses["correctCount"].div(safe_trials).fillna(0)
+            )

Committable suggestion skipped: line range outside the PR's diff.
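Division by zero can be guarded per row without branching; a minimal pandas sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({
    "wrongCount": [2, 0, 3],
    "trialsCount": [4, 0, 6],  # middle row would otherwise divide by zero
})

# Replace zero denominators with NA, divide, then fill the gaps with 0
safe_trials = df["trialsCount"].replace(0, pd.NA)
df["responseErrorRate"] = df["wrongCount"].div(safe_trials).fillna(0).astype(float)
```

This keeps the computation vectorized and yields a well-defined 0 rate for instances with no trials instead of `inf`/`NaN`.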


4-21: 🛠️ Refactor suggestion

Consider splitting the function and adding type hints

The function is quite long and handles multiple responsibilities. Consider breaking it down and adding type hints.

+from typing import Dict, Any
+import pandas as pd
+
-def compute_instance_performance(db, activity, total_only=False):
+def compute_instance_performance(
+    db: Any,
+    activity: Dict[str, Any],
+    total_only: bool = False
+) -> pd.DataFrame:
+    """Compute performance metrics for activity instances.
+    
+    Args:
+        db: Database connection
+        activity: Activity data containing stacks and elements
+        total_only: If True, compute only total metrics
+    
+    Returns:
+        DataFrame containing performance metrics for each instance
+    """

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/instance_activity_performance.ipynb (2)

134-135: 🛠️ Refactor suggestion

Ensure database disconnection in case of errors

The database disconnection should be guaranteed even if errors occur.

-    # Disconnect from the database
-    db.disconnect()
+    try:
+        # Disconnect from the database
+        db.disconnect()
+    except Exception as e:
+        print(f"Error disconnecting from database: {str(e)}")
+        raise
📝 Committable suggestion


    try:
        # Disconnect from the database
        db.disconnect()
    except Exception as e:
        print(f"Error disconnecting from database: {str(e)}")
        raise

52-57: ⚠️ Potential issue

Security concern: Hardcoded database credentials

The database connection string contains hardcoded credentials, which is a security risk.

-    os.environ["DATABASE_URL"] = "postgresql://klicker:klicker@localhost:5432/klicker-prod"
+    # Load database URL from environment variable or .env file
+    if "DATABASE_URL" not in os.environ:
+        raise ValueError("DATABASE_URL environment variable must be set")

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/participant_performance/save_participant_performance.py (1)

1-30: 🛠️ Refactor suggestion

Add error handling and input validation

The function needs several improvements for robustness and performance:

  1. Add error handling for database operations
  2. Validate DataFrame columns before processing
  3. Consider batch processing for better performance
  4. Reduce code duplication in create/update blocks

Here's a suggested implementation:

 def save_participant_performance(db, df_performance, course_id):
+    required_columns = [
+        "participantId", "firstErrorRate", "firstPerformance",
+        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
+    ]
+    
+    # Validate input DataFrame
+    missing_columns = [col for col in required_columns if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    # Prepare common data structure
+    performance_data = {
+        "firstErrorRate": "firstErrorRate",
+        "firstPerformance": "firstPerformance",
+        "lastErrorRate": "lastErrorRate",
+        "lastPerformance": "lastPerformance",
+        "totalErrorRate": "totalErrorRate",
+        "totalPerformance": "totalPerformance",
+    }
+
+    try:
         for _, row in df_performance.iterrows():
             db.participantperformance.upsert(
                 where={
                     "participantId_courseId": {
                         "participantId": row["participantId"],
                         "courseId": course_id,
                     }
                 },
                 data={
                     "create": {
-                        "firstErrorRate": row["firstErrorRate"],
-                        "firstPerformance": row["firstPerformance"],
-                        "lastErrorRate": row["lastErrorRate"],
-                        "lastPerformance": row["lastPerformance"],
-                        "totalErrorRate": row["totalErrorRate"],
-                        "totalPerformance": row["totalPerformance"],
+                        **{k: row[v] for k, v in performance_data.items()},
                         "participant": {"connect": {"id": row["participantId"]}},
                         "course": {"connect": {"id": course_id}},
                     },
                     "update": {
-                        "firstErrorRate": row["firstErrorRate"],
-                        "firstPerformance": row["firstPerformance"],
-                        "lastErrorRate": row["lastErrorRate"],
-                        "lastPerformance": row["lastPerformance"],
-                        "totalErrorRate": row["totalErrorRate"],
-                        "totalPerformance": row["totalPerformance"],
+                        **{k: row[v] for k, v in performance_data.items()},
                     },
                 },
             )
+    except Exception as e:
+        raise Exception(f"Failed to save participant performance: {str(e)}")
📝 Committable suggestion


def save_participant_performance(db, df_performance, course_id):
    required_columns = [
        "participantId", "firstErrorRate", "firstPerformance",
        "lastErrorRate", "lastPerformance", "totalErrorRate", "totalPerformance"
    ]
    
    # Validate input DataFrame
    missing_columns = [col for col in required_columns if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    # Prepare common data structure
    performance_data = {
        "firstErrorRate": "firstErrorRate",
        "firstPerformance": "firstPerformance",
        "lastErrorRate": "lastErrorRate",
        "lastPerformance": "lastPerformance",
        "totalErrorRate": "totalErrorRate",
        "totalPerformance": "totalPerformance",
    }

    try:
        for _, row in df_performance.iterrows():
            db.participantperformance.upsert(
                where={
                    "participantId_courseId": {
                        "participantId": row["participantId"],
                        "courseId": course_id,
                    }
                },
                data={
                    "create": {
                        **{k: row[v] for k, v in performance_data.items()},
                        "participant": {"connect": {"id": row["participantId"]}},
                        "course": {"connect": {"id": course_id}},
                    },
                    "update": {
                        **{k: row[v] for k, v in performance_data.items()},
                    },
                },
            )
    except Exception as e:
        raise Exception(f"Failed to save participant performance: {str(e)}")
apps/analytics/src/modules/participant_performance/compute_performance_levels.py (1)

1-39: 🛠️ Refactor suggestion

Improve robustness and maintainability

The function needs several improvements:

  1. Add input validation
  2. Make performance thresholds configurable
  3. Handle edge cases
  4. Reduce code duplication

Here's a suggested implementation:

+from typing import Dict, List, Literal
+
+PerformanceLevel = Literal["LOW", "MEDIUM", "HIGH"]
+
+def compute_performance_levels(
+    df_performance,
+    quantiles: List[float] = [0.25, 0.75],
+    metrics: Dict[str, str] = {
+        "firstErrorRate": "firstPerformance",
+        "lastErrorRate": "lastPerformance",
+        "totalErrorRate": "totalPerformance",
+    },
+) -> pd.DataFrame:
+    """
+    Compute performance levels based on error rate quantiles.
+    
+    Args:
+        df_performance: DataFrame containing error rate columns
+        quantiles: List of quantile values for thresholds
+        metrics: Dictionary mapping error rate columns to performance columns
+    
+    Returns:
+        DataFrame with added performance level columns
+    """
+    if df_performance.empty:
+        return df_performance
+
+    # Validate input DataFrame
+    missing_columns = [col for col in metrics.keys() if col not in df_performance.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
+    def assign_performance_levels(
+        df: pd.DataFrame, error_col: str, perf_col: str
+    ) -> pd.DataFrame:
+        qs = df[error_col].quantile(quantiles)
+        df[perf_col] = "MEDIUM"
+        df.loc[df[error_col] <= qs[quantiles[0]], perf_col] = "HIGH"
+        df.loc[df[error_col] >= qs[quantiles[1]], perf_col] = "LOW"
+        return df
+
+    for error_col, perf_col in metrics.items():
+        df_performance = assign_performance_levels(df_performance, error_col, perf_col)
+
+    return df_performance
-def compute_performance_levels(df_performance):
-    # set the performance levels based on the quantiles
-    first_qs = df_performance.firstErrorRate.quantile([0.25, 0.75])
-    last_qs = df_performance.lastErrorRate.quantile([0.25, 0.75])
-    total_qs = df_performance.totalErrorRate.quantile([0.25, 0.75])
-
-    first_q1 = first_qs[0.25]
-    first_q3 = first_qs[0.75]
-    last_q1 = last_qs[0.25]
-    last_q3 = last_qs[0.75]
-    total_q1 = total_qs[0.25]
-    total_q3 = total_qs[0.75]
-
-    # set the performance levels based on the quantiles
-    df_performance["firstPerformance"] = "MEDIUM"
-    df_performance.loc[
-        df_performance.firstErrorRate <= first_q1, "firstPerformance"
-    ] = "HIGH"
-    df_performance.loc[
-        df_performance.firstErrorRate >= first_q3, "firstPerformance"
-    ] = "LOW"
-
-    df_performance["lastPerformance"] = "MEDIUM"
-    df_performance.loc[df_performance.lastErrorRate <= last_q1, "lastPerformance"] = (
-        "HIGH"
-    )
-    df_performance.loc[df_performance.lastErrorRate >= last_q3, "lastPerformance"] = (
-        "LOW"
-    )
-
-    df_performance["totalPerformance"] = "MEDIUM"
-    df_performance.loc[
-        df_performance.totalErrorRate <= total_q1, "totalPerformance"
-    ] = "HIGH"
-    df_performance.loc[
-        df_performance.totalErrorRate >= total_q3, "totalPerformance"
-    ] = "LOW"
-
-    return df_performance
📝 Committable suggestion


from typing import Dict, List, Literal

PerformanceLevel = Literal["LOW", "MEDIUM", "HIGH"]

def compute_performance_levels(
    df_performance,
    quantiles: List[float] = [0.25, 0.75],
    metrics: Dict[str, str] = {
        "firstErrorRate": "firstPerformance",
        "lastErrorRate": "lastPerformance",
        "totalErrorRate": "totalPerformance",
    },
) -> pd.DataFrame:
    """
    Compute performance levels based on error rate quantiles.
    
    Args:
        df_performance: DataFrame containing error rate columns
        quantiles: List of quantile values for thresholds
        metrics: Dictionary mapping error rate columns to performance columns
    
    Returns:
        DataFrame with added performance level columns
    """
    if df_performance.empty:
        return df_performance

    # Validate input DataFrame
    missing_columns = [col for col in metrics.keys() if col not in df_performance.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    def assign_performance_levels(
        df: pd.DataFrame, error_col: str, perf_col: str
    ) -> pd.DataFrame:
        qs = df[error_col].quantile(quantiles)
        df[perf_col] = "MEDIUM"
        df.loc[df[error_col] <= qs[quantiles[0]], perf_col] = "HIGH"
        df.loc[df[error_col] >= qs[quantiles[1]], perf_col] = "LOW"
        return df

    for error_col, perf_col in metrics.items():
        df_performance = assign_performance_levels(df_performance, error_col, perf_col)

    return df_performance
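For intuition, the quantile bucketing at the heart of both versions can be exercised in isolation (the sample error rates below are made up):

```python
import pandas as pd

df = pd.DataFrame({"firstErrorRate": [0.0, 0.2, 0.5, 0.8, 1.0]})

# participants at or below the 25% quantile are HIGH performers,
# those at or above the 75% quantile are LOW performers
qs = df["firstErrorRate"].quantile([0.25, 0.75])
df["firstPerformance"] = "MEDIUM"
df.loc[df["firstErrorRate"] <= qs[0.25], "firstPerformance"] = "HIGH"
df.loc[df["firstErrorRate"] >= qs[0.75], "firstPerformance"] = "LOW"

print(df["firstPerformance"].tolist())  # ['HIGH', 'HIGH', 'MEDIUM', 'LOW', 'LOW']
```

Note that with quantile thresholds the boundary values themselves (here 0.2 and 0.8) fall into the HIGH/LOW buckets because the comparisons are inclusive.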
apps/analytics/src/modules/participant_performance/compute_response_error_rates.py (1)

1-54: 🛠️ Refactor suggestion

Add validation and optimize DataFrame operations

The function needs several improvements:

  1. Add input validation and edge case handling
  2. Optimize DataFrame operations
  3. Add proper documentation
  4. Handle potential division by zero

Here's a suggested implementation:

+from typing import List
+import pandas as pd
+
 def compute_response_error_rates(df_responses):
+    """
+    Compute various error rates from participant responses.
+    
+    Args:
+        df_responses: DataFrame containing response data with columns:
+            - participantId
+            - wrongCount
+            - trialsCount
+            - firstResponseCorrectness
+            - lastResponseCorrectness
+    
+    Returns:
+        DataFrame containing computed error rates and counts
+    """
+    required_columns = [
+        "participantId", "wrongCount", "trialsCount",
+        "firstResponseCorrectness", "lastResponseCorrectness"
+    ]
+    
+    # Validate input DataFrame
+    if df_responses.empty:
+        return pd.DataFrame()
+        
+    missing_columns = [col for col in required_columns if col not in df_responses.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
+
     # compute the error rate for each response itself
-    df_responses["responseErrorRate"] = (
-        df_responses["wrongCount"] / df_responses["trialsCount"]
-    )
+    df_responses["responseErrorRate"] = df_responses["wrongCount"].div(
+        df_responses["trialsCount"]
+    ).fillna(0)

     # compute the total number of responses, number of wrong first and last responses,
     # total number of wrong responses, and the average total error rate
-    df_response_count = (
-        df_responses.groupby("participantId").size().reset_index(name="responseCount")
-    )
-    df_first_response_wrong_count = (
-        df_responses[df_responses["firstResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongFirstResponseCount")
-    )
-    df_last_response_wrong_count = (
-        df_responses[df_responses["lastResponseCorrectness"] == "WRONG"]
-        .groupby("participantId")
-        .size()
-        .reset_index(name="wrongLastResponseCount")
-    )
-    df_total_error_rate = (
-        df_responses[["participantId", "responseErrorRate"]]
-        .groupby("participantId")
-        .agg("mean")
-        .reset_index()
-        .rename(
-            columns={
-                "responseErrorRate": "totalErrorRate",
-            }
-        )
-    )
+    agg_dict = {
+        "responseCount": ("wrongCount", "size"),  # size of any non-key column equals the group size; the groupby key itself is excluded from named aggregation
+        "wrongFirstResponseCount": ("firstResponseCorrectness", 
+            lambda x: (x == "WRONG").sum()),
+        "wrongLastResponseCount": ("lastResponseCorrectness", 
+            lambda x: (x == "WRONG").sum()),
+        "totalErrorRate": ("responseErrorRate", "mean")
+    }
+    
+    df_performance = df_responses.groupby("participantId").agg(**agg_dict).reset_index()

-    # combine the dataframes into a single one
-    df_performance = (
-        df_response_count.merge(
-            df_first_response_wrong_count, on="participantId", how="left"
-        )
-        .merge(df_last_response_wrong_count, on="participantId", how="left")
-        .merge(df_total_error_rate, on="participantId", how="left")
-        .fillna(0)
-    )
+    # compute the first and last error rates safely
+    df_performance["firstErrorRate"] = df_performance["wrongFirstResponseCount"].div(
+        df_performance["responseCount"]
+    ).fillna(0)
+    df_performance["lastErrorRate"] = df_performance["wrongLastResponseCount"].div(
+        df_performance["responseCount"]
+    ).fillna(0)

-    # compute the first and last error rates
-    df_performance["firstErrorRate"] = (
-        df_performance["wrongFirstResponseCount"] / df_performance["responseCount"]
-    )
-    df_performance["lastErrorRate"] = (
-        df_performance["wrongLastResponseCount"] / df_performance["responseCount"]
-    )

     return df_performance

The improvements include:

  1. Added input validation and edge case handling
  2. Optimized DataFrame operations by using a single groupby
  3. Added proper documentation with type hints
  4. Added safe division operations with fillna
📝 Committable suggestion


from typing import List
import pandas as pd

def compute_response_error_rates(df_responses):
    """
    Compute various error rates from participant responses.
    
    Args:
        df_responses: DataFrame containing response data with columns:
            - participantId
            - wrongCount
            - trialsCount
            - firstResponseCorrectness
            - lastResponseCorrectness
    
    Returns:
        DataFrame containing computed error rates and counts
    """
    required_columns = [
        "participantId", "wrongCount", "trialsCount",
        "firstResponseCorrectness", "lastResponseCorrectness"
    ]
    
    # Validate input DataFrame
    if df_responses.empty:
        return pd.DataFrame()
        
    missing_columns = [col for col in required_columns if col not in df_responses.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")

    # compute the error rate for each response itself
    df_responses["responseErrorRate"] = df_responses["wrongCount"].div(
        df_responses["trialsCount"]
    ).fillna(0)

    # compute the total number of responses, number of wrong first and last responses,
    # total number of wrong responses, and the average total error rate
    agg_dict = {
        "responseCount": ("wrongCount", "size"),  # size of any non-key column equals the group size; the groupby key itself is excluded from named aggregation
        "wrongFirstResponseCount": ("firstResponseCorrectness", 
            lambda x: (x == "WRONG").sum()),
        "wrongLastResponseCount": ("lastResponseCorrectness", 
            lambda x: (x == "WRONG").sum()),
        "totalErrorRate": ("responseErrorRate", "mean")
    }
    
    df_performance = df_responses.groupby("participantId").agg(**agg_dict).reset_index()

    # compute the first and last error rates safely
    df_performance["firstErrorRate"] = df_performance["wrongFirstResponseCount"].div(
        df_performance["responseCount"]
    ).fillna(0)
    df_performance["lastErrorRate"] = df_performance["wrongLastResponseCount"].div(
        df_performance["responseCount"]
    ).fillna(0)

    return df_performance
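To make the safe-division point concrete: plain `/` yields `NaN` for a 0/0 row, while `.div(...).fillna(0)` maps it to 0 (toy data, not from the PR):

```python
import pandas as pd

df = pd.DataFrame({"wrongCount": [2, 0], "trialsCount": [4, 0]})

plain = df["wrongCount"] / df["trialsCount"]               # second entry is NaN (0/0)
safe = df["wrongCount"].div(df["trialsCount"]).fillna(0)   # NaN becomes 0

print(safe.tolist())  # [0.5, 0.0]
```

One caveat: a nonzero numerator over a zero denominator still produces `inf`, which `fillna` does not touch; if that case can occur, something like `.replace([float("inf")], 0)` would also be needed.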
apps/analytics/src/modules/participant_analytics/get_participant_responses.py (1)

5-11: ⚠️ Potential issue

Ensure fallback handling if both practiceQuiz and microLearning are None.
If both keys are None, the else-branch subscripts `None` for `"courseId"`, raising a `TypeError` at runtime. Consider adding an explicit check or fallback.

 def map_details(detail, participantId):
-    courseId = (
-        detail["practiceQuiz"]["courseId"]
-        if detail["practiceQuiz"]
-        else detail["microLearning"]["courseId"]
-    )
+    if detail["practiceQuiz"] is not None:
+        courseId = detail["practiceQuiz"]["courseId"]
+    elif detail["microLearning"] is not None:
+        courseId = detail["microLearning"]["courseId"]
+    else:
+        # Handle the edge case explicitly
+        courseId = None
     return {**detail, "participantId": participantId, "courseId": courseId}
📝 Committable suggestion


def map_details(detail, participantId):
    if detail["practiceQuiz"] is not None:
        courseId = detail["practiceQuiz"]["courseId"]
    elif detail["microLearning"] is not None:
        courseId = detail["microLearning"]["courseId"]
    else:
        # Handle the edge case explicitly
        courseId = None
    return {**detail, "participantId": participantId, "courseId": courseId}
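An equivalent, more compact fallback uses `or` chaining over dict-style access — a sketch, assuming `detail` behaves like a dict:

```python
def map_details(detail, participantId):
    # the first non-None source wins; the empty dict covers the both-None case
    source = detail.get("practiceQuiz") or detail.get("microLearning") or {}
    return {**detail, "participantId": participantId, "courseId": source.get("courseId")}

row = map_details({"practiceQuiz": None, "microLearning": {"courseId": "c1"}}, "p1")
print(row["courseId"])  # c1
```

The explicit if/elif version in the suggestion is arguably clearer for reviewers; the chained form trades that clarity for brevity.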
apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py (1)

104-107: ⚠️ Potential issue

Fix double appending to avoid incorrect counts.
The code appends zero when the frame is empty and then unconditionally appends the length, so an empty frame adds two entries (0 and 0) and skews the collected counts. The logic should do one or the other.

    if df_weekday.empty:
-        collector.append(0)
-    collector.append(len(df_weekday))
+        collector.append(0)
+    else:
+        collector.append(len(df_weekday))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        if df_weekday.empty:
            collector.append(0)
        else:
            collector.append(len(df_weekday))
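Since `len()` of an empty DataFrame is already 0, the branch can even collapse into a single unconditional append:

```python
import pandas as pd

collector = []
for df_weekday in [pd.DataFrame(), pd.DataFrame({"activity": [1, 2, 3]})]:
    # covers both the empty and the non-empty case in one statement
    collector.append(len(df_weekday))

print(collector)  # [0, 3]
```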
apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py (1)

40-58: 💡 Codebase verification

Add error handling for find_unique_or_raise operation
The find_unique_or_raise operation is used within a loop processing analytics data, and a failure to find a course would interrupt the entire analytics aggregation. Add a try-except block to handle missing courses gracefully:

try:
    course = db.course.find_unique_or_raise(
        where={"id": row["courseId"]},
        include={...}
    )
except Exception as e:
    # Log the error and skip this course
    print(f"Error processing course {row['courseId']}: {str(e)}")
    continue
🔗 Analysis chain

Handle potential exceptions from find_unique_or_raise.
If the course is unexpectedly missing, the operation will raise an exception. Consider wrapping this call to provide a fallback or a more graceful error.


Let me gather more context about the error handling patterns in this codebase.


Let me try a different approach to check the error handling patterns.


Let me check one more thing to understand if there's any global error handling or if this is part of a larger try-except block.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Checks for rescue or fallback usage of find_unique_or_raise calls across the codebase.
rg --context 3 'find_unique_or_raise'

Length of output: 936


Script:

#!/bin/bash
# Check for try-except blocks around database operations in the analytics module
rg -A 5 -B 5 "try:.*\n.*db\." apps/analytics/src/modules/

# Check the implementation of find_unique_or_raise to understand its behavior
ast-grep --pattern 'def find_unique_or_raise'

Length of output: 299


Script:

#!/bin/bash
# Check for try-except blocks around database operations in the analytics module
rg -U -A 5 -B 5 "try:" apps/analytics/src/modules/

# Check for any existing error handling in save_aggregated_analytics.py
rg -A 5 -B 5 "except|raise|error" apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py

Length of output: 763


Script:

#!/bin/bash
# Get the full content of save_aggregated_analytics.py to understand the broader context
cat apps/analytics/src/modules/aggregated_analytics/save_aggregated_analytics.py

Length of output: 4843

apps/analytics/src/modules/activity_progress/save_practice_quiz_progress.py (2)

1-9: 🛠️ Refactor suggestion

Add type hints and docstring documentation

The function lacks type hints and documentation. This is particularly important for analytics code where data integrity is crucial.

Add type hints and documentation:

+ from typing import Any
+
+ def save_practice_quiz_progress(
+     db: Any,
+     course_participants: int,
+     started_count: int,
+     completed_count: int,
+     repeated_count: int,
+     course_id: str,
+     quiz_id: str,
+ ) -> None:
+     """Save or update practice quiz progress statistics.
+
+     Args:
+         db: Database connection instance
+         course_participants: Total number of course participants
+         started_count: Number of participants who started the quiz
+         completed_count: Number of participants who completed the quiz
+         repeated_count: Number of participants who repeated the quiz
+         course_id: Unique identifier of the course
+         quiz_id: Unique identifier of the practice quiz
+     """
📝 Committable suggestion


from typing import Any

def save_practice_quiz_progress(
    db: Any,
    course_participants: int,
    started_count: int,
    completed_count: int,
    repeated_count: int,
    course_id: str,
    quiz_id: str,
) -> None:
    """Save or update practice quiz progress statistics.

    Args:
        db: Database connection instance
        course_participants: Total number of course participants
        started_count: Number of participants who started the quiz
        completed_count: Number of participants who completed the quiz
        repeated_count: Number of participants who repeated the quiz
        course_id: Unique identifier of the course
        quiz_id: Unique identifier of the practice quiz
    """

1-23: ⚠️ Potential issue

Add input validation and error handling

The function should validate input parameters and handle potential database errors.

Add validation and error handling:

+ from typing import Any
+
  def save_practice_quiz_progress(
      db: Any,
      course_participants: int,
      started_count: int,
      completed_count: int,
      repeated_count: int,
      course_id: str,
      quiz_id: str,
  ) -> None:
+     # Validate input parameters
+     if not all(isinstance(count, int) and count >= 0 for count in [
+         course_participants, started_count, completed_count, repeated_count
+     ]):
+         raise ValueError("All count parameters must be non-negative integers")
+
+     if not all(isinstance(id, str) and id for id in [course_id, quiz_id]):
+         raise ValueError("Course ID and Quiz ID must be non-empty strings")
+
      values = {
          "totalCourseParticipants": course_participants,
          "startedCount": started_count,
          "completedCount": completed_count,
          "repeatedCount": repeated_count,
      }
      creation_values = values.copy()
      creation_values["course"] = {"connect": {"id": course_id}}
      creation_values["practiceQuiz"] = {"connect": {"id": quiz_id}}

+     try:
          db.activityprogress.upsert(
              where={"practiceQuizId": quiz_id},
              data={"create": creation_values, "update": values},
          )
+     except Exception as e:
+         raise RuntimeError(f"Failed to save practice quiz progress: {str(e)}") from e
📝 Committable suggestion


from typing import Any

def save_practice_quiz_progress(
    db: Any,
    course_participants: int,
    started_count: int,
    completed_count: int,
    repeated_count: int,
    course_id: str,
    quiz_id: str,
) -> None:
    # Validate input parameters
    if not all(isinstance(count, int) and count >= 0 for count in [
        course_participants, started_count, completed_count, repeated_count
    ]):
        raise ValueError("All count parameters must be non-negative integers")

    if not all(isinstance(id, str) and id for id in [course_id, quiz_id]):
        raise ValueError("Course ID and Quiz ID must be non-empty strings")

    values = {
        "totalCourseParticipants": course_participants,
        "startedCount": started_count,
        "completedCount": completed_count,
        "repeatedCount": repeated_count,
    }
    creation_values = values.copy()
    creation_values["course"] = {"connect": {"id": course_id}}
    creation_values["practiceQuiz"] = {"connect": {"id": quiz_id}}

    try:
        db.activityprogress.upsert(
            where={"practiceQuizId": quiz_id},
            data={"create": creation_values, "update": values},
        )
    except Exception as e:
        raise RuntimeError(f"Failed to save practice quiz progress: {str(e)}") from e
apps/analytics/pyproject.toml (1)

31-32: 💡 Codebase verification

Based on the evidence gathered, I can now provide the final response:

Type hints required for strict type checking

The codebase lacks proper type hints which are required for strict type checking mode. Found issues:

  • Missing return type annotations in all functions
  • Missing parameter type hints (e.g., db, df_activity, df_analytics, timestamp)
  • No typing imports present despite handling complex types like DataFrames
  • 21 out of 38 Python files contain functions without type hints

Key files needing attention:

  • apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py
  • apps/analytics/src/modules/participant_analytics/save_participant_analytics.py
  • And 19 other files identified with missing type hints
🔗 Analysis chain

Verify strict type checking compatibility

The strict type checking mode is enabled, which is great for code quality. However, ensure all modules have proper type hints to prevent build failures.


Let me verify the type hints in more detail by checking the function signatures in these files.


Let me check for any type annotations in the function bodies and variable declarations to get a more complete picture.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for missing type hints in Python files

# Find Python files without type hints
echo "Files potentially missing type hints:"
rg -l "def [^(]*\([^:]*\)\s*[^->]*:" -g "*.py"

# Count total Python files for comparison
echo -e "\nTotal Python files:"
fd -e py | wc -l

Length of output: 1916


Script:

#!/bin/bash
# Check a sample of files for their function signatures
for file in $(head -n 3 <<EOF
apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py
apps/analytics/src/modules/participant_analytics/save_participant_analytics.py
apps/analytics/src/modules/participant_performance/save_participant_performance.py
EOF
); do
  echo "=== $file ==="
  rg "^def" "$file" -A 1
done

# Also check if type imports are present
echo -e "\nChecking type-related imports:"
rg "^from typing import|^import typing" apps/analytics/src/modules/

Length of output: 2294


Script:

#!/bin/bash
# Check for type annotations in the first few files
for file in $(head -n 2 <<EOF
apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py
apps/analytics/src/modules/participant_analytics/save_participant_analytics.py
EOF
); do
  echo "=== $file ==="
  # Show the entire file content to see type annotations
  cat "$file"
  echo -e "\n"
done

# Also check if there are any type-related configurations in setup files
echo "=== Checking for type-related configurations ==="
fd "setup.py|setup.cfg|mypy.ini|pytest.ini" apps/analytics/

Length of output: 6883
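As an illustration of the annotation style strict mode expects — the function name and parameters below are hypothetical, loosely modeled on the save helpers listed above:

```python
from typing import Any

import pandas as pd


def save_analytics(db: Any, df_analytics: pd.DataFrame, timestamp: str) -> None:
    """Persist one batch of analytics rows (illustrative stub only)."""
    for _, row in df_analytics.iterrows():
        # a real implementation would upsert `row` via `db` here
        _ = (db, row, timestamp)
```

Annotating `db` as `Any` keeps the checker quiet until a more precise client type is plugged in; tightening it later narrows the strict-mode guarantees file by file.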

apps/analytics/src/modules/participant_course_analytics/get_running_past_courses.py (2)

6-16: ⚠️ Potential issue

Add timezone handling and optimize database query

The current date handling doesn't account for timezones, which could lead to inconsistencies. Also, loading all participations might cause memory issues with large datasets.

Consider these improvements:

+ from datetime import datetime, timezone
  def get_running_past_courses(db):
-     curr_date = datetime.now().strftime("%Y-%m-%d")
+     curr_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
      courses = db.course.find_many(
          where={
              # Incremental scripts can add this statement to reduce the amount of required computations
              # 'endDate': {
              #     'gt': datetime.now().strftime('%Y-%m-%d') + 'T00:00:00.000Z'
              # }
              "startDate": {"lte": curr_date + "T23:59:59.999Z"},
          },
-         include={"participations": True},
+         include={
+             "participations": {
+                 "select": {
+                     "id": True,
+                     "userId": True,
+                     "status": True
+                 }
+             }
+         },
      )
📝 Committable suggestion


from datetime import datetime, timezone


def get_running_past_courses(db):
    curr_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    courses = db.course.find_many(
        where={
            # Incremental scripts can add this statement to reduce the amount of required computations
            # 'endDate': {
            #     'gt': datetime.now().strftime('%Y-%m-%d') + 'T00:00:00.000Z'
            # }
            "startDate": {"lte": curr_date + "T23:59:59.999Z"},
        },
        include={
            "participations": {
                "select": {
                    "id": True,
                    "userId": True,
                    "status": True
                }
            }
        },
    )
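The day-boundary string built above can be checked in isolation (UTC assumed, as in the suggestion):

```python
from datetime import datetime, timezone

curr_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
upper_bound = curr_date + "T23:59:59.999Z"

# e.g. 2024-12-23T23:59:59.999Z — an ISO-8601 instant usable as an inclusive end-of-day bound
print(upper_bound)
```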

18-23: 🛠️ Refactor suggestion

Optimize memory usage and add error handling

The DataFrame conversion could be memory-intensive for large datasets. Also, add error handling for the database query.

+ from typing import Any
+ import logging
+
+ def get_running_past_courses(db: Any) -> pd.DataFrame:
+     """Retrieve courses that have started in the past.
+
+     Args:
+         db: Database connection instance
+
+     Returns:
+         DataFrame containing course information
+     """
+     try:
          df_courses = pd.DataFrame(list(map(lambda x: x.dict(), courses)))
          print(
              "Found {} courses with a start date before {}".format(
                  len(df_courses), curr_date
              )
          )
+     except Exception as e:
+         logging.error(f"Failed to retrieve courses: {str(e)}")
+         raise
+
+     # Consider chunking for large datasets
+     if len(df_courses) > 1000:
+         logging.warning(f"Large dataset detected: {len(df_courses)} courses")

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/modules/aggregated_analytics/aggregate_participant_analytics.py (1)

1-8: 🛠️ Refactor suggestion

Add type hints and improve error handling

The function should include type hints and better error handling for input validation.

-def aggregate_participant_analytics(df_participant_analytics, verbose=False):
+from pandas import DataFrame
+
+def aggregate_participant_analytics(df_participant_analytics: DataFrame, verbose: bool = False) -> DataFrame | None:
+    """Aggregate participant analytics by courseId.
+    
+    Args:
+        df_participant_analytics: DataFrame containing participant analytics
+        verbose: If True, print debug information
+    
+    Returns:
+        DataFrame with aggregated analytics or None if input is empty
+    
+    Raises:
+        ValueError: If required columns are missing
+    """
     # if the dataframe is empty, return None
     if df_participant_analytics.empty:
         if verbose:
             print("No participant analytics to aggregate")
         return None
+
+    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
+    missing_columns = required_columns - set(df_participant_analytics.columns)
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {missing_columns}")
📝 Committable suggestion


from pandas import DataFrame

def aggregate_participant_analytics(df_participant_analytics: DataFrame, verbose: bool = False) -> DataFrame | None:
    """Aggregate participant analytics by courseId.
    
    Args:
        df_participant_analytics: DataFrame containing participant analytics
        verbose: If True, print debug information
    
    Returns:
        DataFrame with aggregated analytics or None if input is empty
    
    Raises:
        ValueError: If required columns are missing
    """
    # if the dataframe is empty, return None
    if df_participant_analytics.empty:
        if verbose:
            print("No participant analytics to aggregate")
        return None

    required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}
    missing_columns = required_columns - set(df_participant_analytics.columns)
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")
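The set-difference validation used in the suggestion is cheap to demonstrate on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"courseId": ["c1"], "id": ["a1"], "responseCount": [3]})
required_columns = {"courseId", "id", "responseCount", "totalScore", "totalPoints", "totalXp"}

# the set difference yields exactly the columns that still need to be supplied
missing_columns = required_columns - set(df.columns)
print(sorted(missing_columns))  # ['totalPoints', 'totalScore', 'totalXp']
```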
apps/analytics/src/modules/aggregated_analytics/load_participant_analytics.py (2)

4-10: 🛠️ Refactor suggestion

Add type hints and error handling to convert_to_df

The function needs type hints and should handle potential conversion errors.

-def convert_to_df(analytics):
+from typing import List, Dict, Any
+import pandas as pd
+
+def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
+    """Convert analytics data to pandas DataFrame.
+    
+    Args:
+        analytics: List of analytics dictionaries
+        
+    Returns:
+        DataFrame containing the analytics data
+        
+    Raises:
+        ValueError: If analytics data is invalid
+    """
     # convert the database query result into a pandas dataframe
+    if not isinstance(analytics, list):
+        raise ValueError("Analytics must be a list of dictionaries")
     rows = []
     for item in analytics:
-        rows.append(dict(item))
+        try:
+            rows.append(dict(item))
+        except (TypeError, ValueError) as e:
+            raise ValueError(f"Failed to convert item to dictionary: {e}")
📝 Committable suggestion


from typing import List, Dict, Any
import pandas as pd

def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
    """Convert analytics data to pandas DataFrame.
    
    Args:
        analytics: List of analytics dictionaries
        
    Returns:
        DataFrame containing the analytics data
        
    Raises:
        ValueError: If analytics data is invalid
    """
    # convert the database query result into a pandas dataframe
    if not isinstance(analytics, list):
        raise ValueError("Analytics must be a list of dictionaries")
    rows = []
    for item in analytics:
        try:
            rows.append(dict(item))
        except (TypeError, ValueError) as e:
            raise ValueError(f"Failed to convert item to dictionary: {e}")

    return pd.DataFrame(rows)
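As a sanity check, the suggested converter can be exercised standalone (a minimal sketch; the input dictionaries are made up for illustration):

```python
from typing import Any, Dict, List

import pandas as pd


def convert_to_df(analytics: List[Dict[str, Any]]) -> pd.DataFrame:
    # convert the query result into a pandas dataframe, validating the input first
    if not isinstance(analytics, list):
        raise ValueError("Analytics must be a list of dictionaries")
    rows = []
    for item in analytics:
        try:
            rows.append(dict(item))
        except (TypeError, ValueError) as e:
            raise ValueError(f"Failed to convert item to dictionary: {e}")
    return pd.DataFrame(rows)


df = convert_to_df([{"id": "a", "totalScore": 10}, {"id": "b", "totalScore": 7}])
print(df.shape)  # (2, 2)
```

Both the happy path and the `ValueError` branch are cheap to cover in a unit test, which is one advantage of pulling the validation into the function itself.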

13-30: ⚠️ Potential issue

Fix critical issues in load_participant_analytics

The function has several critical issues that need to be addressed.

-def load_participant_analytics(db, timestamp, analytics_type, verbose=False):
+from datetime import datetime
+from typing import Optional
+
+def load_participant_analytics(
+    db: Any,
+    timestamp: datetime,
+    analytics_type: str,
+    verbose: bool = False
+) -> pd.DataFrame:
+    """Load participant analytics from database.
+    
+    Args:
+        db: Database connection
+        timestamp: Timestamp for filtering analytics
+        analytics_type: Type of analytics to load
+        verbose: If True, print debug information
+        
+    Returns:
+        DataFrame containing the loaded analytics
+        
+    Raises:
+        Exception: If database query fails
+    """
+    try:
         participant_analytics = db.participantanalytics.find_many(
             where={"timestamp": timestamp, "type": analytics_type},
         )
 
         if verbose:
-            # Print the first participant analytics
-            print(
-                "Found {} analytics for the timespan from {} to {}".format(
-                    len(participant_analytics), start_date, end_date
-                )
-            )
+            print(
+                "Found {} analytics for timestamp {}".format(
+                    len(participant_analytics),
+                    timestamp.strftime("%Y-%m-%d %H:%M:%S"),
+                )
+            )
-            print(participant_analytics[0])
+            if participant_analytics:
+                print(participant_analytics[0])
 
-        # convert the analytics to a dataframe
         df_loaded_analytics = convert_to_df(participant_analytics)
-
         return df_loaded_analytics
+    except Exception as e:
+        raise Exception(f"Failed to load participant analytics: {e}")
📝 Committable suggestion


from datetime import datetime
from typing import Optional

def load_participant_analytics(
    db: Any,
    timestamp: datetime,
    analytics_type: str,
    verbose: bool = False
) -> pd.DataFrame:
    """Load participant analytics from database.
    
    Args:
        db: Database connection
        timestamp: Timestamp for filtering analytics
        analytics_type: Type of analytics to load
        verbose: If True, print debug information
        
    Returns:
        DataFrame containing the loaded analytics
        
    Raises:
        Exception: If database query fails
    """
    try:
        participant_analytics = db.participantanalytics.find_many(
            where={"timestamp": timestamp, "type": analytics_type},
        )

        if verbose:
            print(
                "Found {} analytics for timestamp {}".format(
                    len(participant_analytics),
                    timestamp.strftime("%Y-%m-%d %H:%M:%S"),
                )
            )
            if participant_analytics:
                print(participant_analytics[0])

        df_loaded_analytics = convert_to_df(participant_analytics)
        return df_loaded_analytics
    except Exception as e:
        raise Exception(f"Failed to load participant analytics: {e}")
🧰 Tools
🪛 Ruff (0.8.2)

22-22: Undefined name start_date

(F821)


22-22: Undefined name end_date

(F821)

apps/analytics/src/modules/participant_course_analytics/save_participant_course_analytics.py (1)

1-26: ⚠️ Potential issue

Improve performance and add error handling

The current implementation processes rows one by one and lacks error handling, which could be problematic for large datasets.

-def save_participant_course_analytics(db, df_activity):
+from typing import Any
+import pandas as pd
+from tqdm import tqdm
+
+def save_participant_course_analytics(db: Any, df_activity: pd.DataFrame, batch_size: int = 100) -> None:
+    """Save participant course analytics to database.
+    
+    Args:
+        db: Database connection
+        df_activity: DataFrame containing activity data
+        batch_size: Number of records to process in each batch
+        
+    Raises:
+        Exception: If database operations fail
+    """
+    total_rows = len(df_activity)
+    
+    try:
+        # Process in batches with progress bar
+        for start_idx in tqdm(range(0, total_rows, batch_size), desc="Saving analytics"):
+            batch = df_activity.iloc[start_idx:start_idx + batch_size]
+            
+            for _, row in batch.iterrows():
+                try:
                     db.participantcourseanalytics.upsert(
                         where={
                             "courseId_participantId": {
                                 "courseId": row["courseId"],
                                 "participantId": row["participantId"],
                             }
                         },
                         data={
                             "create": {
                                 "activeWeeks": row["activeWeeks"],
                                 "activeDaysPerWeek": row["activeDaysPerWeek"],
                                 "meanElementsPerDay": row["meanElementsPerDay"],
                                 "activityLevel": row["activityLevel"],
                                 "course": {"connect": {"id": row["courseId"]}},
                                 "participant": {"connect": {"id": row["participantId"]}},
                             },
                             "update": {
                                 "activeWeeks": row["activeWeeks"],
                                 "activeDaysPerWeek": row["activeDaysPerWeek"],
                                 "meanElementsPerDay": row["meanElementsPerDay"],
                                 "activityLevel": row["activityLevel"],
                             },
                         },
                     )
+                except Exception as e:
+                    print(f"Error processing row {row.name}: {e}")
+    except Exception as e:
+        raise Exception(f"Failed to save participant course analytics: {e}")

Additionally, consider implementing bulk upsert if your database supports it for better performance.

Would you like me to provide an example implementation of bulk upsert?

📝 Committable suggestion


from typing import Any
import pandas as pd
from tqdm import tqdm

def save_participant_course_analytics(db: Any, df_activity: pd.DataFrame, batch_size: int = 100) -> None:
    """Save participant course analytics to database.
    
    Args:
        db: Database connection
        df_activity: DataFrame containing activity data
        batch_size: Number of records to process in each batch
        
    Raises:
        Exception: If database operations fail
    """
    total_rows = len(df_activity)
    
    try:
        # Process in batches with progress bar
        for start_idx in tqdm(range(0, total_rows, batch_size), desc="Saving analytics"):
            batch = df_activity.iloc[start_idx:start_idx + batch_size]
            
            for _, row in batch.iterrows():
                try:
                    db.participantcourseanalytics.upsert(
                        where={
                            "courseId_participantId": {
                                "courseId": row["courseId"],
                                "participantId": row["participantId"],
                            }
                        },
                        data={
                            "create": {
                                "activeWeeks": row["activeWeeks"],
                                "activeDaysPerWeek": row["activeDaysPerWeek"],
                                "meanElementsPerDay": row["meanElementsPerDay"],
                                "activityLevel": row["activityLevel"],
                                "course": {"connect": {"id": row["courseId"]}},
                                "participant": {"connect": {"id": row["participantId"]}},
                            },
                            "update": {
                                "activeWeeks": row["activeWeeks"],
                                "activeDaysPerWeek": row["activeDaysPerWeek"],
                                "meanElementsPerDay": row["meanElementsPerDay"],
                                "activityLevel": row["activityLevel"],
                            },
                        },
                    )
                except Exception as e:
                    print(f"Error processing row {row.name}: {e}")
    except Exception as e:
        raise Exception(f"Failed to save participant course analytics: {e}")
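Whether a true bulk upsert is available depends on the Prisma client in use, but the chunking step itself is client-agnostic. A standalone sketch of a batch iterator (names are illustrative) that the loop above could consume:

```python
from typing import Iterator

import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int = 100) -> Iterator[pd.DataFrame]:
    """Yield consecutive row batches of a DataFrame."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(df), batch_size):
        # iloc slicing past the end is safe: the last batch is simply shorter
        yield df.iloc[start:start + batch_size]


df = pd.DataFrame({"courseId": ["c1"] * 5, "participantId": list("abcde")})
sizes = [len(batch) for batch in iter_batches(df, batch_size=2)]
print(sizes)  # [2, 2, 1]
```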
apps/analytics/src/modules/aggregated_analytics/compute_aggregated_analytics.py (2)

31-34: 🛠️ Refactor suggestion

Add error handling for database operations.

Consider wrapping the database operation in a try-except block to handle potential errors gracefully.

     if df_aggregated_analytics is not None:
+        try:
             save_aggregated_analytics(
                 db, df_aggregated_analytics, timestamp, analytics_type
             )
+        except Exception as e:
+            print(f"Error saving aggregated analytics: {e}")
+            raise
📝 Committable suggestion


    if df_aggregated_analytics is not None:
        try:
            save_aggregated_analytics(
                db, df_aggregated_analytics, timestamp, analytics_type
            )
        except Exception as e:
            print(f"Error saving aggregated analytics: {e}")
            raise

19-28: 🛠️ Refactor suggestion

Use f-strings for string formatting and add error handling.

Replace string concatenation with f-strings and consider adding error handling for database operations.

     if df_aggregated_analytics is not None and verbose:
-        print("Aggregated analytics for time range:" + start_date + " to " + end_date)
+        print(f"Aggregated analytics for time range: {start_date} to {end_date}")
         print(df_aggregated_analytics.head())
     elif df_aggregated_analytics is None:
-        print(
-            "No aggregated analytics to compute for time range:"
-            + start_date
-            + " to "
-            + end_date
-        )
+        print(f"No aggregated analytics to compute for time range: {start_date} to {end_date}")
📝 Committable suggestion


    if df_aggregated_analytics is not None and verbose:
        print(f"Aggregated analytics for time range: {start_date} to {end_date}")
        print(df_aggregated_analytics.head())
    elif df_aggregated_analytics is None:
        print(f"No aggregated analytics to compute for time range: {start_date} to {end_date}")
apps/analytics/src/modules/participant_course_analytics/get_active_weeks.py (2)

4-7: 🛠️ Refactor suggestion

Add input validation and type hints.

The function should validate the input course object and include type hints for better maintainability.

-def get_active_weeks(db, course):
+from typing import Dict, Any
+import pandas as pd
+
+def get_active_weeks(db, course: Dict[str, Any]) -> pd.DataFrame:
+    """Calculate the number of active weeks for participants in a course.
+
+    Args:
+        db: Database connection object
+        course: Course object containing id and participations
+
+    Returns:
+        DataFrame with participant activity metrics
+    """
+    if not course or "id" not in course or "participations" not in course:
+        raise ValueError("Invalid course object")
+
     course_id = course["id"]
     participations = course["participations"]
📝 Committable suggestion


from typing import Dict, Any
import pandas as pd

def get_active_weeks(db, course: Dict[str, Any]) -> pd.DataFrame:
    """Calculate the number of active weeks for participants in a course.

    Args:
        db: Database connection object
        course: Course object containing id and participations

    Returns:
        DataFrame with participant activity metrics
    """
    if not course or "id" not in course or "participations" not in course:
        raise ValueError("Invalid course object")

    course_id = course["id"]
    participations = course["participations"]

14-20: 🛠️ Refactor suggestion

Optimize database query performance.

Consider fetching all weekly analytics in a single query instead of querying for each participant.

-        weekly_analytics = db.participantanalytics.find_many(
-            where={
-                "type": "WEEKLY",
-                "courseId": course_id,
-                "participantId": participant_id,
-            },
-        )
+    # Fetch all weekly analytics for the course in one query
+    all_weekly_analytics = db.participantanalytics.find_many(
+        where={
+            "type": "WEEKLY",
+            "courseId": course_id,
+            "participantId": {"in": [p["participantId"] for p in participations]},
+        },
+    )
+    
+    # Group by participant
+    analytics_by_participant = {}
+    for analytic in all_weekly_analytics:
+        pid = analytic["participantId"]
+        if pid not in analytics_by_participant:
+            analytics_by_participant[pid] = []
+        analytics_by_participant[pid].append(analytic)

Committable suggestion skipped: line range outside the PR's diff.
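The grouping step in the suggestion can be written a little more compactly with `collections.defaultdict`; a standalone sketch with illustrative sample records:

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_participant(analytics: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    # bucket each analytics record under its participantId
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in analytics:
        grouped[record["participantId"]].append(record)
    return dict(grouped)


records = [
    {"participantId": "p1", "responseCount": 3},
    {"participantId": "p2", "responseCount": 1},
    {"participantId": "p1", "responseCount": 5},
]
grouped = group_by_participant(records)
print(len(grouped["p1"]))  # 2
```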

apps/analytics/src/modules/activity_progress/compute_progress_counts.py (1)

4-8: 🛠️ Refactor suggestion

Add input validation and type hints.

The function should validate the input activity object and include type hints.

-def compute_progress_counts(activity):
+from typing import Dict, Any, Tuple
+
+def compute_progress_counts(activity: Dict[str, Any]) -> Tuple[int, int, int]:
+    """Compute progress counts for an activity.
+
+    Args:
+        activity: Activity object containing responses and stacks
+
+    Returns:
+        Tuple of (started_count, completed_count, repeated_count)
+    """
+    if not activity or "responses" not in activity or "stacks" not in activity:
+        raise ValueError("Invalid activity object")
+
     started_count = 0
     completed_count = 0
     repeated_count = 0
📝 Committable suggestion


from typing import Dict, Any, Tuple

def compute_progress_counts(activity: Dict[str, Any]) -> Tuple[int, int, int]:
    """Compute progress counts for an activity.

    Args:
        activity: Activity object containing responses and stacks

    Returns:
        Tuple of (started_count, completed_count, repeated_count)
    """
    if not activity or "responses" not in activity or "stacks" not in activity:
        raise ValueError("Invalid activity object")

    started_count = 0
    completed_count = 0
    repeated_count = 0
apps/analytics/src/modules/participant_course_analytics/compute_participant_activity.py (2)

56-67: 🛠️ Refactor suggestion

Add type hints and docstring for better maintainability

The helper function lacks type hints and documentation.

Add type hints and docstring:

-def sum_active_days_per_week(week_end, daily_analytics):
+from datetime import datetime
+from typing import List, Dict, Any
+
+def sum_active_days_per_week(week_end: datetime, daily_analytics: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """Filter daily analytics entries for a specific week.
+
+    Args:
+        week_end: The end date of the week
+        daily_analytics: List of daily analytics entries
+
+    Returns:
+        List of analytics entries within the specified week
+    """
     week_start = week_end - pd.DateOffset(days=6)
📝 Committable suggestion


from datetime import datetime
from typing import List, Dict, Any

def sum_active_days_per_week(week_end: datetime, daily_analytics: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter daily analytics entries for a specific week.

    Args:
        week_end: The end date of the week
        daily_analytics: List of daily analytics entries

    Returns:
        List of analytics entries within the specified week
    """
    week_start = week_end - pd.DateOffset(days=6)

    # filter the daily analytics entries for the current week
    week_analytics = list(
        filter(
            lambda daily: week_start <= dict(daily)["timestamp"] <= week_end,
            daily_analytics,
        )
    )

    return week_analytics
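The 7-day window logic is easy to verify in isolation; a standalone usage sketch with made-up timestamps:

```python
import pandas as pd


def sum_active_days_per_week(week_end, daily_analytics):
    # keep entries whose timestamp falls in the 7-day window ending at week_end
    week_start = week_end - pd.DateOffset(days=6)
    return [d for d in daily_analytics if week_start <= d["timestamp"] <= week_end]


daily = [
    {"timestamp": pd.Timestamp("2024-01-01")},
    {"timestamp": pd.Timestamp("2024-01-05")},
    {"timestamp": pd.Timestamp("2023-12-20")},
]
week = sum_active_days_per_week(pd.Timestamp("2024-01-05"), daily)
print(len(week))  # 2
```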

4-53: ⚠️ Potential issue

Add error handling and input validation

The function lacks validation for input parameters and error handling for database operations.

Consider adding these validations:

 def compute_participant_activity(db, df_activity, course_id, course_start, course_end):
+    if course_end < course_start:
+        raise ValueError("course_end must be after course_start")
+    if df_activity is None or df_activity.empty:
+        return df_activity
+
     # compute course duration in days
     course_duration = (course_end - course_start).days + 1

Committable suggestion skipped: line range outside the PR's diff.
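The inclusive duration computation in the context line above (`(course_end - course_start).days + 1`) counts both the start and end day as part of the course; a quick standalone check with hypothetical dates:

```python
from datetime import date

# inclusive day count: both the start and end day belong to the course
course_start = date(2024, 3, 1)
course_end = date(2024, 5, 31)
course_duration = (course_end - course_start).days + 1
print(course_duration)  # 92
```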

apps/analytics/src/notebooks/aggregated_course_analytics.ipynb (1)

66-74: ⚠️ Potential issue

Add error handling for database operations

The course processing loop lacks error handling for database operations.

Add try-except blocks:

 # find all courses that started in the past
-df_courses = get_running_past_courses(db)
+try:
+    df_courses = get_running_past_courses(db)
+except Exception as e:
+    print(f"Error fetching courses: {e}")
+    db.disconnect()
+    raise

 # iterate over all courses and compute the participant course analytics
 for idx, course in df_courses.iterrows():
     print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])

Committable suggestion skipped: line range outside the PR's diff.

apps/analytics/src/notebooks/activity_progress.ipynb (1)

73-113: 🛠️ Refactor suggestion

Optimize database operations and add error handling

The processing loop makes multiple database calls without proper error handling or batching.

Consider these improvements:

  1. Add error handling:
 for idx, course in df_courses.iterrows():
+    try:
         course_id = course["id"]
         print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
         
         # ... processing logic ...
+    except Exception as e:
+        print(f"Error processing course {course_id}: {e}")
+        continue
  2. Consider batching database operations:
def batch_save_progress(db, progress_data, batch_size=100):
    """Batch save progress data to reduce database calls"""
    for i in range(0, len(progress_data), batch_size):
        # queue the whole chunk on a batcher so it is sent as one request
        with db.batch_() as batcher:
            for item in progress_data[i:i + batch_size]:
                batcher.practicequizprogress.create(data=item)
apps/analytics/src/modules/participant_course_analytics/__init__.py (1)

1-4: 🛠️ Refactor suggestion

Define module's public API using __all__

The imports appear to be intended for re-export, but they're currently flagged as unused. To explicitly define the module's public API and silence the linter warnings, add an __all__ declaration.

Add this after the imports:

from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics
+
+__all__ = [
+    'get_running_past_courses',
+    'get_active_weeks',
+    'compute_participant_activity',
+    'save_participant_course_analytics',
+]
📝 Committable suggestion


from .get_running_past_courses import get_running_past_courses
from .get_active_weeks import get_active_weeks
from .compute_participant_activity import compute_participant_activity
from .save_participant_course_analytics import save_participant_course_analytics

__all__ = [
    'get_running_past_courses',
    'get_active_weeks',
    'compute_participant_activity',
    'save_participant_course_analytics',
]
🧰 Tools
🪛 Ruff (0.8.2)

1-1: .get_running_past_courses.get_running_past_courses imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .get_active_weeks.get_active_weeks imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .compute_participant_activity.compute_participant_activity imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


4-4: .save_participant_course_analytics.save_participant_course_analytics imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

apps/analytics/README.md (3)

5-9: 🛠️ Refactor suggestion

Add missing critical requirements.

Please include:

  • Database requirements (type and version) since Prisma is used
  • Minimum system requirements (memory, disk space)
  • More specific version ranges (e.g., "Python >=3.12.0, <3.13.0")

1-24: 🛠️ Refactor suggestion

Add essential documentation sections.

Consider adding the following sections to make the documentation more complete:

  • Configuration guide (environment variables, config files)
  • Contributing guidelines
  • Development workflow
  • Troubleshooting guide
  • API documentation (if applicable)
🧰 Tools
🪛 LanguageTool

[grammar] ~16-~16: The word ‘install’ is not a noun.
Context: ...onment not matching specifications, the install command or script execution might fail....

(A_INSTALL)


15-15: 💡 Codebase verification

Update README to accurately describe the sync-schema script's functionality

The script exists and is functional, but the README's description is not entirely accurate. The script specifically syncs .prisma files from packages/prisma/src/prisma/schema/ to apps/analytics/prisma/schema/, excluding js.prisma. The current README description oversimplifies this and could mislead users.

Suggested locations to update:

  • apps/analytics/README.md: Update the instruction to: "Make sure that all required Prisma schema files are available in apps/analytics/prisma/schema/. If this is not the case, run the util/sync-schema.sh script to sync them from the main schema directory."
🔗 Analysis chain

Verify the sync-schema script location and functionality.

The README references util/sync-schema.sh but its existence and location should be verified.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the sync-schema script exists and is executable
if [ -f "util/sync-schema.sh" ]; then
    echo "Script exists"
    ls -l "util/sync-schema.sh"
else
    echo "Script not found at util/sync-schema.sh"
    # Search for the script in the repository
    fd "sync-schema.sh"
fi

Length of output: 165


Script:

#!/bin/bash
# Let's verify the content of the script to ensure it matches the README's description
cat util/sync-schema.sh

# Also check for .prisma files in the prisma directory
fd -e prisma

Length of output: 883

apps/frontend-manage/src/components/analytics/overview/QuizDashboardLabel.tsx (1)

5-14: 🛠️ Refactor suggestion

Add TypeScript return type and improve accessibility

The component should have explicit TypeScript types and better accessibility support.

Apply these improvements:

-function QuizDashboardLabel() {
+function QuizDashboardLabel({ className }: { className?: string }): JSX.Element {
   const t = useTranslations()
 
   return (
     <>
-      <FontAwesomeIcon className="mr-1" icon={faChartPie} />
+      <FontAwesomeIcon 
+        className="mr-1" 
+        icon={faChartPie}
+        aria-hidden="true"
+      />
-      <div>{t('manage.analytics.quizDashboard')}</div>
+      <div className={className}>{t('manage.analytics.quizDashboard')}</div>
     </>
   )
 }
📝 Committable suggestion


function QuizDashboardLabel({ className }: { className?: string }): JSX.Element {
  const t = useTranslations()

  return (
    <>
      <FontAwesomeIcon 
        className="mr-1" 
        icon={faChartPie}
        aria-hidden="true"
      />
      <div className={className}>{t('manage.analytics.quizDashboard')}</div>
    </>
  )
}
apps/frontend-manage/src/components/analytics/overview/ActivityDashboardLabel.tsx (1)

1-16: 🛠️ Refactor suggestion

Consider creating a shared base component

This component shares identical structure with QuizDashboardLabel and PerformanceDashboardLabel. Consider creating a reusable base component.

Create a new shared component:

// components/analytics/overview/DashboardLabel.tsx
import { IconDefinition } from '@fortawesome/free-solid-svg-icons'
import { FontAwesomeIcon } from '@fortawesome/react-fontawesome'

interface DashboardLabelProps {
  icon: IconDefinition
  translationKey: string
  className?: string
}

export function DashboardLabel({ 
  icon, 
  translationKey,
  className 
}: DashboardLabelProps): JSX.Element {
  const t = useTranslations()
  
  return (
    <>
      <FontAwesomeIcon 
        className="mr-1" 
        icon={icon}
        aria-hidden="true"
      />
      <div className={className}>{t(translationKey)}</div>
    </>
  )
}

Then simplify this component to:

-import { faChartLine } from '@fortawesome/free-solid-svg-icons'
-import { FontAwesomeIcon } from '@fortawesome/react-fontawesome'
-import { useTranslations } from 'next-intl'
+import { DashboardLabel } from './DashboardLabel'
+import { faChartLine } from '@fortawesome/free-solid-svg-icons'

-function ActivityDashboardLabel() {
+function ActivityDashboardLabel({ className }: { className?: string }): JSX.Element {
-  const t = useTranslations()
-
-  return (
-    <>
-      <FontAwesomeIcon className="mr-1" icon={faChartLine} />
-      <div>{t('manage.analytics.activityDashboard')}</div>
-    </>
-  )
+  return <DashboardLabel
+    icon={faChartLine}
+    translationKey="manage.analytics.activityDashboard"
+    className={className}
+  />
 }
apps/frontend-manage/src/components/analytics/overview/PerformanceDashboardLabel.tsx (2)

1-16: 🛠️ Refactor suggestion

Apply shared component pattern consistently

This component should also use the shared DashboardLabel component proposed earlier.

Apply the same pattern:

-import { faChartSimple } from '@fortawesome/free-solid-svg-icons'
-import { FontAwesomeIcon } from '@fortawesome/react-fontawesome'
-import { useTranslations } from 'next-intl'
+import { DashboardLabel } from './DashboardLabel'
+import { faChartSimple } from '@fortawesome/free-solid-svg-icons'

-function PerformanceDashboardLabel() {
+function PerformanceDashboardLabel({ className }: { className?: string }): JSX.Element {
-  const t = useTranslations()
-
-  return (
-    <>
-      <FontAwesomeIcon className="mr-1" icon={faChartSimple} />
-      <div>{t('manage.analytics.performanceDashboard')}</div>
-    </>
-  )
+  return <DashboardLabel
+    icon={faChartSimple}
+    translationKey="manage.analytics.performanceDashboard"
+    className={className}
+  />
 }

Committable suggestion skipped: line range outside the PR's diff.


1-16: 💡 Codebase verification

Translation key not found in message files

The translation key 'manage.analytics.performanceDashboard' appears to be missing from both English and German translation files. Please add the translation key to the message files at:

  • packages/i18n/messages/en.ts
  • packages/i18n/messages/de.ts
🔗 Analysis chain

Verify translation keys exist

Ensure that all translation keys used in these components are defined in the translation files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for translation key definitions
rg -g '*.{json,ts,js}' "manage\.analytics\.(quizDashboard|activityDashboard|performanceDashboard)"

Length of output: 100


Script:

#!/bin/bash
# Search for translation files in the repository
fd -e json messages

# After finding the translation files, let's check their content
fd -e json messages -x cat {}

Length of output: 55


Script:

#!/bin/bash
# Try to find translation files in common i18n/locales directories
fd -e json . -p "i18n|locales|translations"

# Let's also check for any files that might contain translation definitions
rg -g '*.{json,ts,js,tsx}' -i "translations|messages|i18n" --files-with-matches

Length of output: 22851


Script:

#!/bin/bash
# Let's check the content of the translation files we found
cat packages/i18n/messages/en.ts packages/i18n/messages/de.ts

Length of output: 131031


Script:

#!/bin/bash
# Let's check if the translation key exists in the messages files
rg "manage\.analytics\.performanceDashboard" packages/i18n/messages/en.ts packages/i18n/messages/de.ts

Length of output: 104

apps/frontend-manage/src/components/analytics/overview/AnalyticsCourseLabel.tsx (1)

13-18: 🛠️ Refactor suggestion

Optimize date comparisons and add error handling.

The date comparisons could be memoized, and error handling should be added for invalid dates.

+import { useMemo } from 'react'

 function AnalyticsCourseLabel({
   course,
 }: {
   course: Pick<Course, 'id' | 'name' | 'startDate' | 'endDate'>
 }) {
-  const isPast = course.endDate
-    ? dayjs(course.endDate).isBefore(dayjs())
-    : false
-  const isFuture = course.startDate
-    ? dayjs(course.startDate).isAfter(dayjs())
-    : false
+  const { isPast, isFuture } = useMemo(() => {
+    try {
+      return {
+        isPast: course.endDate ? dayjs(course.endDate).isBefore(dayjs()) : false,
+        isFuture: course.startDate ? dayjs(course.startDate).isAfter(dayjs()) : false
+      }
+    } catch (error) {
+      console.error('Error parsing course dates:', error)
+      return { isPast: false, isFuture: false }
+    }
+  }, [course.startDate, course.endDate])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  const { isPast, isFuture } = useMemo(() => {
    try {
      return {
        isPast: course.endDate ? dayjs(course.endDate).isBefore(dayjs()) : false,
        isFuture: course.startDate ? dayjs(course.startDate).isAfter(dayjs()) : false
      }
    } catch (error) {
      console.error('Error parsing course dates:', error)
      return { isPast: false, isFuture: false }
    }
  }, [course.startDate, course.endDate])
apps/frontend-manage/src/components/analytics/overview/CourseDashboardList.tsx (2)

7-11: 🛠️ Refactor suggestion

Add loading and error states.

The component should handle loading and error states for better user experience.

 function CourseDashboardList({
   courses,
+  isLoading,
+  error,
 }: {
   courses?: Pick<Course, 'id' | 'name' | 'startDate' | 'endDate'>[] | null
+  isLoading?: boolean
+  error?: Error
 }) {

Committable suggestion skipped: line range outside the PR's diff.


1-42: 🛠️ Refactor suggestion

Wrap component with error boundary.

Consider implementing an error boundary to gracefully handle runtime errors.

+import { ErrorBoundary } from '@uzh-bf/design-system'

 function CourseDashboardList({
   // ... props
 }) {
   // ... component logic
 }

-export default CourseDashboardList
+export default function CourseDashboardListWithErrorBoundary(props) {
+  return (
+    <ErrorBoundary fallback={<ErrorMessage />}>
+      <CourseDashboardList {...props} />
+    </ErrorBoundary>
+  )
+}

Committable suggestion skipped: line range outside the PR's diff.

apps/frontend-manage/src/components/analytics/overview/DashboardButtons.tsx (1)

11-48: 🛠️ Refactor suggestion

Optimize component performance and organization

Consider these improvements for better performance and maintainability:

+const DASHBOARD_BUTTONS = [
+  {
+    href: (courseId: string) => `/analytics/${courseId}/activity`,
+    label: <ActivityDashboardLabel />,
+    getCy: (courseName: string) => `activity-dashboard-button-${courseName}`,
+  },
+  // ... other buttons
+];

 function DashboardButtons({ course }: { course: Pick<Course, 'id' | 'name'> }) {
   const router = useRouter()
+  const handleNavigation = useCallback((href: string) => {
+    router.push(href)
+  }, [router])
 
   return (
     <div
-      key={`dashboards-${course.id}`}
       className="flex h-10 flex-row items-center gap-2"
     >
-      {[
-        {
-          href: `/analytics/${course.id}/activity`,
-          // ...
-        },
-      ].map((button, ix) => (
+      {DASHBOARD_BUTTONS.map((button, ix) => (
         <Button
           className={{
             root: 'h-8.5 border-uzh-blue-40 shadow-none',
           }}
-          onClick={() => router.push(button.href)}
-          data={{ cy: button.cy }}
+          onClick={() => handleNavigation(button.href(course.id))}
+          data={{ cy: button.getCy(course.name) }}
           key={`dashboard-button-${ix}-course-${course.id}`}
         >
           {button.label}
         </Button>

Committable suggestion skipped: line range outside the PR's diff.

apps/frontend-manage/src/components/analytics/overview/AnalyticsNavigation.tsx (2)

29-35: ⚠️ Potential issue

Add error handling for GraphQL query

The component currently handles loading state but doesn't handle query errors, which could lead to runtime errors.

-  const { data, loading } = useQuery(GetUserCoursesDocument)
+  const { data, loading, error } = useQuery(GetUserCoursesDocument)
   const router = useRouter()
   const t = useTranslations()
 
+  if (error) {
+    return (
+      <div className="text-red-600">
+        {t('shared.errors.failedToLoadCourses')}
+      </div>
+    )
+  }
+
   if (loading) {
     return <Loader />
   }

Committable suggestion skipped: line range outside the PR's diff.


46-62: 🛠️ Refactor suggestion

Optimize SelectField implementation and add null safety

The current implementation could benefit from performance optimizations and better null handling.

+  const courseItems = useMemo(
+    () =>
+      data?.userCourses?.map((course) => ({
+        label: course.name,
+        value: course.id,
+      })) ?? [],
+    [data?.userCourses]
+  )
+
+  const handleCourseChange = useCallback(
+    (value: string) => {
+      router.push({ pathname: `/analytics/${value}/${slug}` })
+    },
+    [router, slug]
+  )

   <div className="hidden justify-center md:flex">
     <SelectField
       label={`${t('shared.generic.course')}:`}
       labelType="large"
-      value={router.query.courseId as string}
+      value={router.query.courseId?.toString() ?? ''}
       items={courseItems}
-      onChange={(value) => {
-        router.push({ pathname: `/analytics/${value}/${slug}` })
-      }}
+      onChange={handleCourseChange}
       className={{ select: { trigger: 'h-8' } }}
     />
   </div>

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/prisma/src/prisma/schema/element.prisma (2)

Line range hint 156-166: Track migration tasks for ElementStackType

The migration path is well-documented, but we should ensure these TODOs are tracked in the issue tracker:

  1. Drop type field after migration
  2. Rename typeNEW to ElementStackType and make it required

Would you like me to create GitHub issues to track these migration tasks?


Line range hint 1-400: Consider implementing analytics-specific indices

Given the focus on learning analytics, consider adding database indices to optimize analytics queries, particularly for:

  • ElementInstance.instancePerformance lookups
  • ElementType filtering for new types (CONTENT, FLASHCARD)

This will improve the performance of analytics queries, especially as the dataset grows.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e721db and a47b11f.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • packages/graphql/src/services/stacks.ts (0 hunks)
  • packages/prisma/src/prisma/schema/element.prisma (1 hunks)
💤 Files with no reviewable changes (1)
  • packages/graphql/src/services/stacks.ts
🔇 Additional comments (2)
packages/prisma/src/prisma/schema/element.prisma (2)

90-94: Verify InstancePerformance model definition

The addition of instancePerformance field for analytics tracking looks good, but we need to ensure the referenced type is properly defined.

✅ Verification successful

Based on the search results, I can see that the InstancePerformance model is properly defined in packages/prisma/src/prisma/schema/analytics.prisma with the following fields:

  • id (Int, auto-incremented primary key)
  • responseCount (Int)
  • averageTimeSpent (Float)
  • firstErrorRate, firstPartialRate, firstCorrectRate (Float, optional)
  • lastErrorRate, lastPartialRate, lastCorrectRate (Float, optional)
  • totalErrorRate, totalPartialRate, totalCorrectRate (Float)
  • instance relation (ElementInstance, one-to-one)
  • course relation (Course, many-to-one)

Additionally, I found the migration file 20241206141120_learning_analytics_performance_progress/migration.sql that introduced these analytics-related changes.

InstancePerformance model is properly defined

The addition of the instancePerformance field to track analytics is well-structured and properly integrated with the necessary relations and migrations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify InstancePerformance model definition and usage

# Look for InstancePerformance model definition
echo "Checking for InstancePerformance model definition..."
rg -l "model InstancePerformance" --type prisma

# Check for related migrations
echo "Checking for related migrations..."
fd -e sql -e prisma . | rg -l "InstancePerformance"

Length of output: 326


Script:

#!/bin/bash
# Let's try a different approach to find the model definition and related files

# Search for InstancePerformance in all files
echo "Searching for InstancePerformance model..."
rg "model InstancePerformance" -A 5

# Find all .prisma files and check their content
echo "Checking .prisma files..."
fd -e prisma . -x cat {}

# Look for any migrations or schema changes
echo "Looking for migrations..."
fd -g "*migration*.sql" .

Length of output: 64668


Line range hint 6-15: Verify handling of new ElementTypes across the codebase

The addition of CONTENT and FLASHCARD types is aligned with the learning analytics feature. However, we should ensure proper handling across related components.

✅ Verification successful

Let me gather more specific information about the switch statements by modifying the pattern to better match ElementType cases.


Based on the comprehensive results, I can now provide the final response:

New ElementTypes are properly handled across the codebase

The new CONTENT and FLASHCARD types are well-implemented throughout the codebase with proper handling in:

  • Type-specific response processing in frontend components
  • Data transformation and migration scripts
  • GraphQL services and resolvers
  • Evaluation and grading logic

The codebase shows consistent handling of these types alongside existing ones, with proper type checks and corresponding business logic in:

  • Response processing (packages/graphql/src/services/stacks.ts)
  • Results aggregation (packages/graphql/src/scripts/2024-05-07_compute_aggregated_FCCT_responses.ts)
  • Frontend components (apps/frontend-pwa/src/components/practiceQuiz/ElementStack.tsx)
  • Data migration scripts (packages/prisma/src/scripts/)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify implementation of new ElementTypes

# Check for CONTENT type handling
echo "Checking CONTENT type implementation..."
rg -l "ElementType.CONTENT|type: ['|\"]CONTENT" --type ts

# Check for FLASHCARD type handling
echo "Checking FLASHCARD type implementation..."
rg -l "ElementType.FLASHCARD|type: ['|\"]FLASHCARD" --type ts

# Check for potential switch statements that need updating
echo "Checking switch statements on ElementType..."
ast-grep --pattern 'switch ($expr) { $$$case ElementType$$$ }'

Length of output: 2416


Script:

#!/bin/bash
# Check for switch statements specifically handling ElementType cases with context
rg "switch.*ElementType.*\{" -A 10 --type ts

# Check for if-else chains handling ElementType
rg "if.*ElementType\." -A 5 --type ts

# Look for type-specific handling in function signatures
ast-grep --pattern 'function $name($_: { type: ElementType, $$$ }) { $$$ }'

Length of output: 39299


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (46)
apps/analytics/src/scripts/2_initial_aggregated_course_analytics.py (4)

4-4: Remove unused import 'os'
The import statement for os is never used in this file. Removing it will help avoid clutter.

Suggested fix:

- import os
 from prisma import Prisma
🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


6-6: Remove unused import 'pandas'
Although you use a DataFrame object from another module, the direct import pandas as pd is never referenced here. Consider removing it.

Suggested fix:

- import pandas as pd
🧰 Tools
🪛 Ruff (0.8.2)

6-6: pandas imported but unused

Remove unused import: pandas

(F401)


32-32: Remove extraneous 'f' prefix in the print statement
The string does not contain any placeholders and therefore does not need an f-string.

Suggested fix:

- print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
+ print("Processing course", idx, "of", len(df_courses), "with id", course["id"])
🧰 Tools
🪛 Ruff (0.8.2)

32-32: f-string without any placeholders

Remove extraneous f prefix

(F541)


1-38: Consider adding tests to validate the script's functionality
This script could benefit from automated tests to confirm that course analytics are computed correctly and that any future modifications won't break existing functionality.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


6-6: pandas imported but unused

Remove unused import: pandas

(F401)


32-32: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/6_initial_activity_progress.py (4)

4-4: Remove the unused 'os' import.

This import is flagged as unused by static analysis tools, so removing it will reduce clutter and potential confusion.

Apply this diff to remove the unused import:

- import os
 from prisma import Prisma
 import pandas as pd
 import sys
🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


6-6: Remove the unused 'pandas' import.

Similarly, this import is also unused. Removing it keeps dependencies clear and avoids unnecessary clutter.

Apply this diff to remove the unused import:

 import prisma
-import pandas as pd
 import sys
🧰 Tools
🪛 Ruff (0.8.2)

6-6: pandas imported but unused

Remove unused import: pandas

(F401)


42-42: Fix the extraneous 'f' prefix in the print statement.

Since the statement does not contain any interpolation placeholders, the f prefix is unnecessary. Alternatively, you could keep the f-string and add actual placeholders.

Apply one of the following diffs:

Option A: Remove the f prefix if no interpolation is intended:

-print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
+print("Processing course", idx, "of", len(df_courses), "with id", course_id)

Option B: Keep the f prefix and use placeholders for improved readability:

-print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
+print(f"Processing course {idx} of {len(df_courses)} with id {course_id}")
🧰 Tools
🪛 Ruff (0.8.2)

42-42: f-string without any placeholders

Remove extraneous f prefix

(F541)


39-76: Consider adding error handling for database and function calls.

The current script assumes that all operations succeed. In production, exceptions (e.g., network disruptions, incomplete data, or stale connections) could occur.

As a follow-up, you could wrap critical sections in try/except blocks or use context managers for database operations:

try:
    db.connect()
    # Perform queries and computations
finally:
    db.disconnect()

This helps ensure the script remains robust even under unexpected conditions.

🧰 Tools
🪛 Ruff (0.8.2)

42-42: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/4_initial_participant_performance.py (5)

4-4: Remove the unused import.

The import statement for "os" is not referenced anywhere in this script and should be removed to comply with best practices and to avoid clutter.

- import os
🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


28-30: Consider using the verbose variable or removing it.

You have defined "verbose = False" but it does not appear to be used. If you intended to gate logging or more detailed console output behind it, add that behavior; otherwise, remove the unused variable to keep the code clean.


38-38: Remove unnecessary f-string.

This print statement uses the 'f' prefix but doesn't utilize any placeholder expressions in curly braces.

- print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
+ print("Processing course", idx, "of", len(df_courses), "with id", course_id)
🧰 Tools
🪛 Ruff (0.8.2)

38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)


25-27: Add exception handling for database connection.

While the script correctly connects and disconnects from Prisma, consider wrapping the database operations in a try-except block to handle potential connection errors. This helps avoid incomplete runs in production environments.
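A runnable sketch of such a guard, with a stub class standing in for the real Prisma client (all names here are illustrative):

```python
class FakeDb:
    """Stub standing in for the real Prisma client so the pattern runs standalone."""

    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def fetch_courses(self):
        raise RuntimeError("simulated network disruption")


errors = []
db = FakeDb()
db.connect()
try:
    db.fetch_courses()  # any query may raise mid-run
except RuntimeError as err:
    errors.append(str(err))  # record the failure instead of crashing the run
finally:
    db.disconnect()  # the connection is released on every code path
```

The same shape applies to the real script: connect, run the queries inside try, and disconnect in finally so a failed query never leaves a dangling connection.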


35-53: Data integrity & performance assurance.

This block processes each course and saves performance analytics. For large datasets, consider chunking or batched writes. Also, double-check that the data insertion logic properly handles potential duplicates or conflicts in the database.

🧰 Tools
🪛 Ruff (0.8.2)

38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)
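The chunking idea can be sketched with a small helper; the batch size and the list standing in for the database write are illustrative only:

```python
def chunked(rows, size):
    """Yield consecutive fixed-size slices of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


written = []  # stand-in for the database: one entry per batched write call
for batch in chunked(list(range(10)), 4):
    written.append(batch)  # in the real script: one createMany/upsert per batch
```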

apps/analytics/src/scripts/5_initial_participant_course_analytics.py (4)

5-7: Remove unused imports "os", "json", and "datetime".
These imports are flagged by static analysis as unused. Please remove them to keep the codebase clean and comply with best practices.

-import os
-import json
-from datetime import datetime
🧰 Tools
🪛 Ruff (0.8.2)

5-5: os imported but unused

Remove unused import: os

(F401)


6-6: json imported but unused

Remove unused import: json

(F401)


7-7: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)


9-9: Remove unused import "pandas".
Although DataFrames appear in this script, they are returned by the helper functions; the pd namespace itself is never referenced here.

-import pandas as pd
🧰 Tools
🪛 Ruff (0.8.2)

9-9: pandas imported but unused

Remove unused import: pandas

(F401)


31-31: "verbose" variable is defined but never used.
Consider removing it or incorporating logging statements controlled by this variable to make the script more informative.


40-40: Fix the extraneous f-string usage.
No placeholders exist in the string as currently written. Either remove the "f" prefix or convert it to a proper f-string with placeholders.

- print(f"Processing course", idx, "of", len(df_courses), "with id", course["id"])
+ print(f"Processing course {idx} of {len(df_courses)} with id {course['id']}")
🧰 Tools
🪛 Ruff (0.8.2)

40-40: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/package.json (2)

5-7: Consider adding @types/node and stricter version constraints

  1. Since this is a Node.js project (as specified in engines), consider adding @types/node as a dev dependency for better TypeScript support.
  2. For development tools like nodemon, consider using caret ranges (^), which allow minor updates, instead of tilde ranges (~), which are restricted to patch-level updates, to stay consistent with the version ranges used in the other workspaces.
 "devDependencies": {
-  "nodemon": "~3.1.7"
+  "nodemon": "^3.1.7",
+  "@types/node": "^20.0.0"
 }

8-15: Improve script configuration and documentation

  1. The script, script:prod, and script:qa commands are incomplete as they require a module name parameter.
  2. The watch paths in the dev script could be more specific to avoid unnecessary rebuilds.

Consider adding:

  1. Documentation comments for each script's purpose and usage
  2. More specific watch paths in the dev script
  3. A helper script that validates the module parameter
 "scripts": {
+  // Watch and reload the analytics service in development
   "dev": "doppler run --config dev -- nodemon --exec 'poetry run poe main' --watch src,prisma --ext py,prisma",
+  // Generate Prisma client
   "generate": "poetry run poe generate",
+  // Run the analytics service
   "main": "doppler run --config dev -- poetry run poe main",
+  // Run a specific Python module in development (usage: pnpm script <module_name>)
   "script": "doppler run --config dev -- poetry run python -m",
+  // Run a specific Python module in production (usage: pnpm script:prod <module_name>)
   "script:prod": "doppler run --config prd -- poetry run python -m",
+  // Run a specific Python module in QA (usage: pnpm script:qa <module_name>)
   "script:qa": "doppler run --config qa -- poetry run python -m"
 }
apps/analytics/src/scripts/0_initial_participant_analytics.py (2)

4-5: Remove unused imports.
“os” and “json” are not utilized here, so consider removing them.

Apply this diff to remove the unused imports:

- import os
- import json
🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: json imported but unused

Remove unused import: json

(F401)


42-91: Consolidate repetitive daily, weekly, and monthly analytics logic.
Currently, there is repetitive code for daily, weekly, and monthly computations. Consider extracting a reusable function or loop structure to handle these intervals. This approach reduces duplication and simplifies future maintenance.
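One way to fold the three near-identical blocks into a single loop is to parameterize the computation by a pandas frequency alias; `count_responses` and the column names below are illustrative, not taken from the script:

```python
import pandas as pd


def count_responses(df, freq):
    """Count responses per participant for one interval length."""
    return (
        df.groupby(["participant", pd.Grouper(key="createdAt", freq=freq)])
        .size()
        .rename("responseCount")
        .reset_index()
    )


df = pd.DataFrame(
    {
        "participant": ["a", "a", "b"],
        "createdAt": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    }
)

# one loop over interval specs replaces three copies of the same logic
results = {
    label: count_responses(df, freq)
    for label, freq in [("daily", "D"), ("weekly", "W"), ("monthly", "MS")]
}
```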

apps/analytics/src/scripts/3_initial_instance_activity_performance.py (2)

4-6: Remove unused imports to optimize your script.
Currently, the script doesn’t use the "os", "json", "datetime", or "pandas" modules. Removing them would comply with static analysis guidance and improve maintainability.

Here is a suggested diff:

-import os
-import json
-from datetime import datetime
-import pandas as pd

Also applies to: 8-8

🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: json imported but unused

Remove unused import: json

(F401)


6-6: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)


45-45: Either convert the print statement to a standard print or add placeholders to the f-string.
The current f-string on this line does not use any placeholders, which triggers a static analysis warning. Consider the following improvement for clearer output:

-print(f"Processing course", idx, "of", len(df_courses), "with id", course_id)
+print(f"Processing course {idx} of {len(df_courses)} with id {course_id}")
🧰 Tools
🪛 Ruff (0.8.2)

45-45: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/1_initial_aggregated_analytics.py (4)

4-4: Remove unused import 'os'
According to the static analysis (F401), the 'os' module is not used in the script. Removing it will clean up the imports.

- import os
🧰 Tools
🪛 Ruff (0.8.2)

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: Remove unused import 'json'
As indicated by the static analysis (F401), the 'json' module is not used in this script. Consider removing it to keep imports concise.

- import json
🧰 Tools
🪛 Ruff (0.8.2)

5-5: json imported but unused

Remove unused import: json

(F401)


37-45: Consider using logging instead of print statements
Print statements provide basic info, but using the logging module allows for configurable log levels, better formatting, and integration with external log monitoring. This can be especially beneficial for batch jobs running multiple computations.
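A minimal sketch of that switch; the in-memory handler is only there to make the example self-checking — in the real script a StreamHandler (or a plain logging.basicConfig call) would be attached instead:

```python
import logging

logger = logging.getLogger("analytics")
logger.setLevel(logging.INFO)

captured = []  # collect formatted messages in-memory for this example


class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())


logger.addHandler(ListHandler())

idx, total, course_id = 1, 10, "abc123"
logger.info("Processing course %d of %d with id %s", idx, total, course_id)
logger.debug("per-response details")  # filtered out at INFO level
```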


92-92: Remove extraneous 'f' prefix
Static analysis (F541) indicates this f-string has no placeholders. You can simply remove the prefix.

- print(f"Computing course-wide aggregated analytics")
+ print("Computing course-wide aggregated analytics")
🧰 Tools
🪛 Ruff (0.8.2)

92-92: f-string without any placeholders

Remove extraneous f prefix

(F541)

packages/prisma/src/prisma/schema/analytics.prisma (4)

14-18: Consider reusing existing enums or centralizing naming conventions.
The new "PerformanceLevel" enum is well-defined, but it duplicates the idea of "ActivityLevel" declared earlier. If they're semantically different, clarify it in comments or rename to avoid confusion.


42-47: Validate optional counts and avoid negative or inconsistent values.
These new float fields capturing first/last correctness/wrongness rates can cause confusion if left uninitialized or negative. Consider enforcing data integrity checks during inserts/updates.


155-163: Evaluate future extensibility of weekly participant activity fields.
Storing each weekday’s activity count in a single row is fine for now, but if you envisage adding multiple time windows (e.g., monthly or daily details), consider a separate table or structure.


270-288: ActivityProgress model is straightforward.
The fields are self-explanatory for progress tracking. Consider enforcing minimum values (e.g., repeatedCount >= 0) to guard against data anomalies.

apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py (3)

1-2: Iterate carefully over DataFrame rows.
Using “iterrows()” for potentially large data sets can impact performance. Consider vectorized or batch upsert approaches if performance becomes an issue.
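For illustration, the two access patterns side by side; `to_dict("records")` (or itertuples) skips the per-row Series construction that makes iterrows() slow on large frames:

```python
import pandas as pd

df = pd.DataFrame({"participantId": ["a", "b"], "score": [10, 20]})

# slow path: iterrows() materializes a full Series object for every row
slow = [(row["participantId"], int(row["score"])) for _, row in df.iterrows()]

# faster path: one conversion to plain dicts for the whole frame
fast = [(rec["participantId"], int(rec["score"])) for rec in df.to_dict("records")]
```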


3-7: Creation and update values are logically consistent.
The approach of mirroring fields in both dictionaries for upsert is clear. Optionally, extract a shared dictionary when the create and update payloads are identical to reduce duplication.

Also applies to: 8-11


25-26: MicroLearnings section mirrors the quiz logic well.
No immediate issues. Could be refactored into a smaller shared routine if duplication grows over time.

Also applies to: 27-35

apps/analytics/src/scripts/7_participant_activity_performance.py (2)

5-5: Unused imports detected by static analysis.
“import os” and “import pandas as pd” are flagged as unused. Remove them if you’re not using these modules.

Also applies to: 7-7

🧰 Tools
🪛 Ruff (0.8.2)

5-5: os imported but unused

Remove unused import: os

(F401)


38-38: Remove unnecessary f-string usage.
The print statement includes “f” but no placeholders, so it can be simplified.

🧰 Tools
🪛 Ruff (0.8.2)

38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/modules/participant_activity_performance/prepare_participant_activity_data.py (1)

59-72: Nested loops are generally OK but watch out for performance.
This straightforward approach is fine for moderate data sizes. If you handle large courses or many responses, look into bulk operations or indexes to speed queries.
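If the nested loops ever become a bottleneck, the per-participant/per-activity passes can usually be collapsed into one grouped aggregation; the column names here are illustrative:

```python
import pandas as pd

responses = pd.DataFrame(
    {
        "participantId": ["a", "a", "b", "b"],
        "activityId": ["q1", "q1", "q1", "q2"],
        "correct": [1, 0, 1, 1],
    }
)

# one grouped pass instead of a loop over participants nested in a loop
# over activities, each issuing its own lookup
summary = responses.groupby(["participantId", "activityId"], as_index=False).agg(
    responseCount=("correct", "size"),
    correctCount=("correct", "sum"),
)
```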

apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (2)

20-20: Rename loop index to avoid confusion.
The variable idx is unused within the loop body. Rename it to _idx (or simply _) to adhere to static analysis warnings and clarify intent.

🧰 Tools
🪛 Ruff (0.8.2)

20-20: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)


22-82: DataFrame concatenation in a loop can be costly.
Repeatedly calling pd.concat() in a loop might degrade performance. Consider building a list of dictionaries and creating a single DataFrame at the end.
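A sketch of the suggested pattern — accumulate plain dicts and build the frame once at the end; the per-participant metric here is a placeholder:

```python
import pandas as pd

rows = []  # appending to a list is O(1) per iteration
for participant_id in ["a", "b", "c"]:
    # ...per-participant metrics would be computed here...
    rows.append({"participantId": participant_id, "responseCount": 1})

# one DataFrame construction at the end replaces repeated pd.concat calls,
# which recopy all previously accumulated rows on every iteration
df_result = pd.DataFrame(rows)
```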

apps/analytics/src/notebooks/participant_activity_performance.ipynb (1)

98-98: Validate database disconnection logic.
Calling db.disconnect() is good practice. Ensure the client is not disconnected prematurely, especially if multiple notebooks or scripts reuse the same Prisma client instance within the same session.

packages/prisma/src/prisma/schema/participant.prisma (2)

51-52: Confirm optional vs. mandatory relationships.
Both fields 'participantActivityPerformances' and 'coursePerformances' are declared as arrays. Make sure the foreign key constraints and cascade rules behave as intended (e.g., no orphaned records if a participant is removed).


51-52: Validate indexing strategy for performance data.
Consider adding composite indexes or partial indexes on these new performance-related fields if you plan to query large volumes of participant performance data frequently.

packages/prisma/src/prisma/schema/quiz.prisma (2)

34-36: Ensure performance vs. progress fields are well-defined.
These newly-added fields in 'PracticeQuiz' can lead to confusion if there's conceptual overlap (e.g., performance vs. completion). Provide clear definitions and usage guidelines for each field to avoid data inconsistencies.


219-221: Sustain consistent naming conventions.
'performance' and 'progress' are added to 'MicroLearning'. For clarity, consider harmonizing them with any existing naming patterns (e.g., 'activityPerformance', 'activityProgress') to maintain consistency across the codebase.

packages/prisma/src/prisma/migrations/20241223092511_participant_activity_performance/migration.sql (1)

4-5: Consider adding constraints for score and completion fields.

The totalScore and completion fields should have value constraints:

  • completion should be between 0 and 1 (or 0 and 100 if using percentage)
  • totalScore might benefit from a non-negative constraint
ALTER TABLE "ParticipantActivityPerformance"
ADD CONSTRAINT "completion_range" CHECK ("completion" >= 0 AND "completion" <= 1),
ADD CONSTRAINT "score_non_negative" CHECK ("totalScore" >= 0);
apps/analytics/src/modules/participant_activity_performance/__init__.py (1)

1-3: Define module's public API explicitly.

Add __all__ to explicitly declare the module's public interface and silence the unused import warnings.

from .prepare_participant_activity_data import prepare_participant_activity_data
from .save_participant_activity_performance import save_participant_activity_performance
from .agg_participant_activity_performance import agg_participant_activity_performance
+
+__all__ = [
+    'prepare_participant_activity_data',
+    'save_participant_activity_performance',
+    'agg_participant_activity_performance',
+]
🧰 Tools
🪛 Ruff (0.8.2)

1-1: .prepare_participant_activity_data.prepare_participant_activity_data imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .save_participant_activity_performance.save_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .agg_participant_activity_performance.agg_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a47b11f and 08acc27.

📒 Files selected for processing (21)
  • apps/analytics/package.json (1 hunks)
  • apps/analytics/src/__init__.py (1 hunks)
  • apps/analytics/src/modules/__init__.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/__init__.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/prepare_participant_activity_data.py (1 hunks)
  • apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py (1 hunks)
  • apps/analytics/src/notebooks/participant_activity_performance.ipynb (1 hunks)
  • apps/analytics/src/notebooks/participant_analytics.ipynb (1 hunks)
  • apps/analytics/src/scripts/0_initial_participant_analytics.py (1 hunks)
  • apps/analytics/src/scripts/1_initial_aggregated_analytics.py (1 hunks)
  • apps/analytics/src/scripts/2_initial_aggregated_course_analytics.py (1 hunks)
  • apps/analytics/src/scripts/3_initial_instance_activity_performance.py (1 hunks)
  • apps/analytics/src/scripts/4_initial_participant_performance.py (1 hunks)
  • apps/analytics/src/scripts/5_initial_participant_course_analytics.py (1 hunks)
  • apps/analytics/src/scripts/6_initial_activity_progress.py (1 hunks)
  • apps/analytics/src/scripts/7_participant_activity_performance.py (1 hunks)
  • packages/prisma/src/prisma/migrations/20241223092511_participant_activity_performance/migration.sql (1 hunks)
  • packages/prisma/src/prisma/schema/analytics.prisma (8 hunks)
  • packages/prisma/src/prisma/schema/participant.prisma (1 hunks)
  • packages/prisma/src/prisma/schema/quiz.prisma (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/analytics/src/notebooks/participant_analytics.ipynb
🧰 Additional context used
🪛 Ruff (0.8.2)
apps/analytics/src/scripts/2_initial_aggregated_course_analytics.py

4-4: os imported but unused

Remove unused import: os

(F401)


6-6: pandas imported but unused

Remove unused import: pandas

(F401)


32-32: f-string without any placeholders

Remove extraneous f prefix

(F541)
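For context, the F541 fix is purely mechanical: an f-string without placeholders evaluates to the same value as a plain string, so the prefix is dropped. A small illustration (the strings here are hypothetical, not taken from the flagged scripts):

```python
# Ruff F541: an f-string with no placeholders behaves exactly like a
# plain string literal, so the `f` prefix is just noise.
msg_bad = f"Computing initial analytics..."   # would be flagged by F541
msg_good = "Computing initial analytics..."   # identical value, no prefix

# Keep the prefix only when the string actually interpolates something:
course_id = "abc123"
msg_interp = f"Computing analytics for course {course_id}"

print(msg_bad == msg_good)  # True
```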

apps/analytics/src/scripts/6_initial_activity_progress.py

4-4: os imported but unused

Remove unused import: os

(F401)


6-6: pandas imported but unused

Remove unused import: pandas

(F401)


42-42: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/4_initial_participant_performance.py

4-4: os imported but unused

Remove unused import: os

(F401)


38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/1_initial_aggregated_analytics.py

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: json imported but unused

Remove unused import: json

(F401)


92-92: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/0_initial_participant_analytics.py

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: json imported but unused

Remove unused import: json

(F401)

apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py

20-20: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

apps/analytics/src/scripts/7_participant_activity_performance.py

5-5: os imported but unused

Remove unused import: os

(F401)


7-7: pandas imported but unused

Remove unused import: pandas

(F401)


38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/3_initial_instance_activity_performance.py

4-4: os imported but unused

Remove unused import: os

(F401)


5-5: json imported but unused

Remove unused import: json

(F401)


6-6: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)


8-8: pandas imported but unused

Remove unused import: pandas

(F401)


45-45: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/scripts/5_initial_participant_course_analytics.py

5-5: os imported but unused

Remove unused import: os

(F401)


6-6: json imported but unused

Remove unused import: json

(F401)


7-7: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)


9-9: pandas imported but unused

Remove unused import: pandas

(F401)


40-40: f-string without any placeholders

Remove extraneous f prefix

(F541)

apps/analytics/src/__init__.py

1-1: from .modules import * used; unable to detect undefined names

(F403)


2-2: from .notebooks import * used; unable to detect undefined names

(F403)


3-3: from .scripts import * used; unable to detect undefined names

(F403)

apps/analytics/src/modules/__init__.py

1-1: from .participant_analytics import * used; unable to detect undefined names

(F403)


2-2: from .aggregated_analytics import * used; unable to detect undefined names

(F403)


3-3: from .participant_course_analytics import * used; unable to detect undefined names

(F403)


4-4: from .aggregated_course_analytics import * used; unable to detect undefined names

(F403)


5-5: from .participant_performance import * used; unable to detect undefined names

(F403)


6-6: from .instance_activity_performance import * used; unable to detect undefined names

(F403)


7-7: from .activity_progress import * used; unable to detect undefined names

(F403)


8-8: from .participant_activity_performance import * used; unable to detect undefined names

(F403)

apps/analytics/src/modules/participant_activity_performance/__init__.py

1-1: .prepare_participant_activity_data.prepare_participant_activity_data imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .save_participant_activity_performance.save_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .agg_participant_activity_performance.agg_participant_activity_performance imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)
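Both fixes Ruff suggests for these re-exports are mechanical: either a redundant alias (`from .mod import name as name`) or an `__all__` list in `apps/analytics/src/modules/participant_activity_performance/__init__.py`. The effect of `__all__` on star imports can be demonstrated with a throwaway module (toy module for illustration only; the real package uses relative imports):

```python
import sys
import types

# Build a throwaway module that mimics the package's __init__.py pattern:
# one public re-export, one private helper, and an __all__ declaration.
mod = types.ModuleType("toy_pkg")
exec(
    "def prepare_participant_activity_data(): return 'prepared'\n"
    "def _helper(): return 'hidden'\n"
    "__all__ = ['prepare_participant_activity_data']\n",
    mod.__dict__,
)
sys.modules["toy_pkg"] = mod

# A star import now pulls in exactly the names listed in __all__,
# which also signals to Ruff that the re-export is intentional.
ns = {}
exec("from toy_pkg import *", ns)
print(sorted(n for n in ns if not n.startswith("_")))
# ['prepare_participant_activity_data']
```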

🔇 Additional comments (17)
apps/analytics/package.json (1)

16-21: Verify Node.js version consistency

The Node.js version is strictly pinned to 20, which might be too restrictive. Let's verify if this matches the root configuration and consider using a more flexible version range.

✅ Verification successful

Node.js version is consistent with root configuration

The Node.js version (=20) in apps/analytics/package.json exactly matches the version specified in the root package.json. Most packages in the monorepo also use the same strict version, with only one exception (apps/office-addin using >=16). Since this follows the project-wide convention, the strict version pinning appears to be intentional.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Node.js version consistency across the project
# Get Node.js version from root package.json
echo "Root package.json Node.js version:"
jq -r '.engines.node' package.json

# List all package.json files and their Node.js version requirements
echo "\nNode.js versions across all package.json files:"
fd package.json --type f --exec sh -c 'echo "{}" && jq -r ".engines.node // \"not specified\"" "{}"'

Length of output: 1198

apps/analytics/src/scripts/0_initial_participant_analytics.py (1)

36-40: Verify time zone handling.
Using naive date/time strings (e.g., “YYYY-MM-DD”) and converting them to “T00:00:00.000Z” might overlook actual time zone offsets. If your data or users span multiple time zones, ensure that time zone conversions are addressed or confirm that naive handling is sufficient.

✅ Verification successful

Let me search for timezone-related code and patterns in the codebase to better understand the context and potential impact.


Let me check the specific timezone handling in the files where we found datetime usage.


Based on the analysis of the codebase, I can now provide a final response:

Time zone handling is consistently implemented and safe.
The codebase consistently handles timezone-sensitive operations by:

  1. Using UTC for all database timestamps (appending 'Z' suffix)
  2. Properly localizing timestamps when needed (e.g., tz_localize("UTC") in weekday activity calculations)
  3. Following a consistent pattern where date strings are converted to timestamps with explicit timezone markers (T00:00:00.000Z or T23:59:59.999Z)

The naive date handling in the reviewed code is intentional and appropriate because:

  • Start/end dates are always converted to UTC timestamps before database operations
  • Date ranges are used for binning/aggregation purposes where timezone precision isn't critical
  • All timestamp comparisons use consistent UTC timezone markers
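The conversion pattern described above can be sketched with the stdlib alone (illustrative helper, not taken from the codebase):

```python
from datetime import datetime, timezone

def day_bounds_utc(date_str: str) -> tuple[str, str]:
    """Convert a naive 'YYYY-MM-DD' string into explicit UTC day bounds,
    mirroring the 'T00:00:00.000Z' / 'T23:59:59.999Z' convention used
    by the analytics scripts."""
    # Validate the input format and pin the timezone to UTC explicitly.
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start = day.strftime("%Y-%m-%dT00:00:00.000Z")
    end = day.strftime("%Y-%m-%dT23:59:59.999Z")
    return start, end

print(day_bounds_utc("2024-08-26"))
# ('2024-08-26T00:00:00.000Z', '2024-08-26T23:59:59.999Z')
```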
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for timezone-related imports and usage
rg -l "timezone|pytz|tzinfo|tz" apps/analytics/src/

# Look for datetime usage patterns
ast-grep --pattern 'datetime.now()' apps/analytics/src/

# Search for any configuration files that might contain timezone settings
fd -e yaml -e yml -e json -e conf -e ini . apps/analytics/src/

# Check if there are any other files using date ranges
rg "date_range" apps/analytics/src/

Length of output: 5141


Script:

#!/bin/bash
# Check the content of files where we found datetime usage, focusing on timezone handling
rg -B 5 -A 5 "T00:00:00.000Z|T23:59:59.999Z" apps/analytics/src/

# Check the content of the weekday activity file that had timezone references
cat apps/analytics/src/modules/aggregated_course_analytics/compute_weekday_activity.py

# Look for any timezone-related configurations or imports at the top of analytics files
rg -B 5 "^import|^from.*import" apps/analytics/src/modules/participant_analytics/compute_participant_course_analytics.py
rg -B 5 "^import|^from.*import" apps/analytics/src/scripts/0_initial_participant_analytics.py

Length of output: 37608

apps/analytics/src/scripts/3_initial_instance_activity_performance.py (3)

50-68: Review the logic for handling practice quizzes.
The core flow looks consistent:
• Compute instance performance.
• Skip if empty.
• Aggregate activity performance.
• Save instance performance data.
• Save activity performance data.

Everything seems logical. Ensure you have comprehensive tests covering courses with no quizzes and courses where quizzes have zero participants.


33-34: Database connection management is implemented correctly.
The connection is opened once and closed at the end, which is good practice. Make sure intermittent connection errors are still handled explicitly.

Also applies to: 90-91


69-89: Align microlearning flow with practice quizzes to ensure consistent data handling.
The structure parallels the practice quiz flow. Confirm that the “total_only” parameter is correctly used and tested.

packages/prisma/src/prisma/schema/analytics.prisma (6)

57-59: Consistent timestamp fields added.
The added “createdAt” and “updatedAt” fields across multiple models follow standard best practices. Make sure to maintain them consistently, especially when performing bulk or manual updates.

Also applies to: 70-70, 83-85, 104-106, 125-127, 146-148


164-169: No issues spotted with the relational mapping.
Ensuring a unique course ID for aggregated course analytics is consistent with single-record logic per course.


223-249: ActivityPerformance model looks consistent with your approach.
No critical issues found, but be mindful of large-scale expansions if you plan to store many fields per day or per attempt.


251-268: Two unique constraints might cause unexpected duplication errors.
With "@@unique([participantId, practiceQuizId])" and "@@unique([participantId, microLearningId])", ensure your logic never tries to fill both IDs in a single record.
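One way to enforce the "exactly one foreign key" invariant at the application layer before upserting (a sketch; the field names follow the schema, but the helper itself is hypothetical):

```python
def validate_activity_keys(record: dict) -> None:
    """Ensure exactly one of practiceQuizId / microLearningId is set,
    matching the pair of @@unique constraints on the Prisma model."""
    pq = record.get("practiceQuizId")
    ml = record.get("microLearningId")
    # Both set or both missing violate the exclusive-reference invariant.
    if (pq is None) == (ml is None):
        raise ValueError(
            "ParticipantActivityPerformance rows must reference exactly "
            "one of practiceQuizId or microLearningId"
        )

# Valid: references a practice quiz only.
validate_activity_keys(
    {"participantId": "p1", "practiceQuizId": "q1", "microLearningId": None}
)
```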


25-26: Double-check time precision requirements for analytics.
Storing timestamps as a date only (without time) might limit or complicate more granular analyses or daily cross-overs. Confirm that this meets your analytics needs.

✅ Verification successful

Let me search for more context about the analytics schema and its usage to better understand the time precision requirements.


Let me try a different approach to search for analytics-related files and usage patterns.


Let me search for the actual usage of these timestamp fields in the analytics processing code.


Based on the code analysis, I can now provide a definitive response about the timestamp precision requirements:

@db.Date precision is appropriate for the analytics timestamps

The timestamp fields in the analytics schema are correctly using @db.Date precision because:

  • The analytics data is aggregated on a daily, weekly, monthly, or course-wide basis as shown by the AnalyticsType enum
  • The Python code processes analytics in day-level granularity, with timestamps always set to the start or end of a day (e.g., T00:00:00.000Z or T23:59:59.999Z)
  • No sub-day (hourly/minute) analytics are being computed or stored
  • Cross-day analysis is handled through the separate daily aggregations, not through time components
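The day-level binning this implies can be illustrated with a stdlib-only sketch (hypothetical helper and sample events):

```python
from collections import Counter
from datetime import datetime, timezone

def daily_bins(timestamps):
    """Bin UTC timestamps by calendar day -- the granularity at which the
    analytics models store their @db.Date timestamps."""
    return Counter(
        ts.astimezone(timezone.utc).date().isoformat() for ts in timestamps
    )

# Two events on the same UTC day collapse into one daily bucket.
events = [
    datetime(2024, 12, 23, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 12, 23, 21, 30, tzinfo=timezone.utc),
    datetime(2024, 12, 24, 0, 5, tzinfo=timezone.utc),
]
print(daily_bins(events))
# Counter({'2024-12-23': 2, '2024-12-24': 1})
```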
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for files containing analytics schema references
rg -l "analytics" --type prisma

# Search for any analytics-related queries or operations
rg "analytics" -A 5 --type typescript --type javascript

Length of output: 153


Script:

#!/bin/bash
# Search for files containing "analytics" in their name or content
fd analytics

# Search for content with analytics references using basic grep
rg "analytics" -A 5

Length of output: 74237


Script:

#!/bin/bash
# Search for timestamp usage in Python files
rg "timestamp" --type py -A 3 apps/analytics/

# Search for timestamp in the schema file
cat packages/prisma/src/prisma/schema/analytics.prisma

Length of output: 29003


195-221: Optionally, constrain first/last/total rates in “InstancePerformance.”
It’s good to track attempt-based rates. However, ensure partial rates, error rates, etc., sum to sensible values (e.g., 1.0) if that’s a design requirement.

✅ Verification successful

Let me search for any related schema constraints or validation logic in the codebase.


Let me search for any validation or calculation logic related to these rates.


Based on the code analysis so far, I can see that the rates are used in GraphQL types and services, but I haven't found explicit validation logic. Let me try one more search pattern to look for any rate-related calculations or validations.


Based on the code analysis, I can now provide a final response:

The rates in InstancePerformance are already constrained by design.

The codebase shows that these rates are handled consistently throughout the application:

  • The rates are used in pairs (error/partial/correct) across first, last, and total attempts
  • Frontend components like PerformanceRatesBarChart enforce that rates sum to 100% by calculating roundedCorrectRate = 100 - roundedErrorRate - roundedPartialRate
  • The analytics service maintains consistency by deriving rates from actual performance data

No additional database constraints are necessary as the data integrity is maintained at the application level.
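The frontend convention described above (deriving the correct rate so the three rates always sum to 100) can be sketched as follows (illustrative Python translation of the described TypeScript logic):

```python
def rounded_rates(error: float, partial: float) -> tuple[int, int, int]:
    """Mirror the PerformanceRatesBarChart convention described above:
    round the error and partial rates, then derive the correct rate as
    the remainder so the three values sum to exactly 100."""
    rounded_error = round(error)
    rounded_partial = round(partial)
    rounded_correct = 100 - rounded_error - rounded_partial
    return rounded_error, rounded_partial, rounded_correct

e, p, c = rounded_rates(33.4, 33.3)
print(e, p, c, e + p + c)  # 33 33 34 100
```

Deriving the third value from the other two sidesteps the classic rounding pitfall where three independently rounded percentages sum to 99 or 101.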

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for any validation logic related to InstancePerformance rates
rg -A 5 "InstancePerformance.*Rate" --type ts

# Look for any test files related to InstancePerformance
fd "InstancePerformance.*test" --type file

# Search for any service/utility files that handle rate calculations
rg -A 5 "(first|last|total)(Error|Partial|Correct)Rate" --type ts

Length of output: 65904


Script:

#!/bin/bash
# Search for any validation or calculation logic for the rates
rg -A 5 "calculatePerformance|validateRates|computeRates" --type ts

# Look for any database triggers or constraints
fd "analytics.*sql" --type file

# Search for any test files that verify rate calculations
rg -A 5 "test.*Rate" --type ts

Length of output: 136


Script:

#!/bin/bash
# Search for any files that might contain rate calculations or validations
rg -A 5 "Rate.*=|updateRates|setRates|rate.*validation" --type ts

# Look for any prisma middleware or hooks that might handle rate validation
fd "middleware.*ts|hooks.*ts" --type file

# Search for any utility functions that handle rates
rg -A 5 "utils.*rate" --type-add 'ts:*.{ts,tsx}' --type ts

Length of output: 31660

apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py (1)

13-14: PracticeQuizzes upsert logic is sound.
The “where” clause ensures uniqueness on participant and quiz ID, preventing duplicates. Confirm that you handle missing or invalid “activityId” values upstream.

Also applies to: 15-23

✅ Verification successful

Let me gather information about how activityId is handled before reaching this point.


Let me search for the validation of activityId in the Python codebase.


Let me search for the complete implementation and any validation in the surrounding code.


Let me check how activityId is prepared before reaching this function.


Based on the complete code analysis, I can now generate the final response:

The activityId handling is properly validated upstream. The prepare_participant_activity_data.py ensures that:

  1. Only activities with valid statuses ("PUBLISHED", "ENDED", "GRADED") are processed
  2. ActivityIds are extracted directly from the database query results
  3. The data is properly structured before reaching the upsert operation

The uniqueness constraint in the upsert operation (participantId_practiceQuizId and participantId_microLearningId) provides additional safety against duplicates.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for the function containing this code and its context
ast-grep --pattern 'def save_participant_activity_performance($$$)'

# Also search for any validation or preprocessing of activityId
rg "activityId" -A 5 -B 5

Length of output: 65632


Script:

#!/bin/bash
# Search for the save_participant_activity_performance function implementation
rg -A 10 -B 10 "def save_participant_activity_performance" apps/analytics/src/modules/participant_activity_performance/

# Search for any validation of activityId
rg "activityId.*validate" -A 5 -B 5

Length of output: 1735


Script:

#!/bin/bash
# Get the complete file content to understand the validation context
cat apps/analytics/src/modules/participant_activity_performance/save_participant_activity_performance.py

# Also search for any input validation in related files
rg -g "*.py" "activityId" apps/analytics/src/modules/participant_activity_performance/

Length of output: 3763


Script:

#!/bin/bash
# Check the prepare_participant_activity_data.py file for activityId handling
cat apps/analytics/src/modules/participant_activity_performance/prepare_participant_activity_data.py

# Also check the aggregation file
cat apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py

Length of output: 6351

apps/analytics/src/modules/participant_activity_performance/agg_participant_activity_performance.py (2)

5-17: Grouping approach for aggregated data is clear.
The logic to sum scores, trial counts, and response counts is straightforward. Keep an eye on memory usage for large DataFrame merges.


85-87: Upserting data is correct.
Storing the aggregated performance data using save_participant_activity_performance is consistent with the rest of this module’s design.

apps/analytics/src/notebooks/participant_activity_performance.ipynb (1)

70-89: Verify handling of empty responses.
When no responses are submitted, the script correctly skips analytics processing. However, confirm that you handle edge cases such as partially completed courses or incomplete data for other fields (like missing participant metadata) before continuing analytics computations.

✅ Verification successful

Let me gather information about how edge cases are handled in the codebase.


Let me try a different approach to search for these functions and their error handling.


Let me try one more search to find the implementation of these functions.


Based on the code analysis, I can now provide a final response:

Edge cases are properly handled in the analytics pipeline.
The codebase shows comprehensive handling of empty or missing data:

  • Empty responses are explicitly checked and skipped: if df_responses.empty: continue
  • Missing participant data is handled: if df_activity.empty: print("No participant was active in the course, skipping")
  • Null checks are performed before processing: if df_details is None: print("No participant responses found...")
  • Empty performance data is gracefully handled: if df_instance_performance.empty: continue

The implementation shows robust error handling for all the edge cases mentioned in the review comment, including empty responses, partially completed courses, and missing participant metadata.
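The guard pattern listed above boils down to an early-return check before any aggregation. A stdlib-only sketch (the real pipeline operates on pandas DataFrames and uses `df.empty`; here responses are a plain list of dicts, and the function name is hypothetical):

```python
def compute_activity_analytics(responses):
    """Guarded analytics step following the checks listed above:
    skip gracefully when there is nothing to aggregate."""
    # Mirrors the `if df_responses.empty: continue` / `is None` guards.
    if not responses:
        print("No participant responses found, skipping")
        return None
    totals = {}
    for r in responses:
        totals[r["participantId"]] = totals.get(r["participantId"], 0) + r["score"]
    return totals

print(compute_activity_analytics([]))  # prints the skip message, returns None
```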

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for prepare_participant_activity_data function implementation
ast-grep --pattern 'def prepare_participant_activity_data($$$)'

# Search for agg_participant_activity_performance function implementation
ast-grep --pattern 'def agg_participant_activity_performance($$$)'

# Look for any error handling or data validation in these functions
rg -A 5 "raise|error|exception|assert|validate|check" --type python

Length of output: 235


Script:

#!/bin/bash
# Search for the functions in any file
rg -A 10 "def prepare_participant_activity_data"

# Search for the aggregation function
rg -A 10 "def agg_participant_activity_performance"

# Look for error handling patterns
rg -A 5 "raise|error|exception|assert|validate|check"

# Find all Python files that might contain these functions
fd ".py$"

Length of output: 125936


Script:

#!/bin/bash
# Look for the prepare_participant_activity_data function implementation
fd -e py -x cat {} | rg -A 10 "def prepare_participant_activity_data"

# Look for the agg_participant_activity_performance function implementation
fd -e py -x cat {} | rg -A 10 "def agg_participant_activity_performance"

# Look for any data validation in these files
fd -e py -x cat {} | rg -A 5 "if.*empty|if.*None|if not|raise|assert"

Length of output: 8692

packages/prisma/src/prisma/migrations/20241223092511_participant_activity_performance/migration.sql (1)

20-26: Review CASCADE deletion impact.

The ON DELETE CASCADE behavior on all foreign keys means that deleting a participant or activity will automatically remove all their performance records. Verify if this aligns with data retention requirements.

apps/analytics/src/__init__.py (1)

1-3: Avoid wildcard imports.

Wildcard imports can lead to namespace pollution and make it difficult to track which names are present in the namespace. Import only the necessary components explicitly.
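As an illustration of the F403 concern, using `os.path` as a stand-in for the analytics modules:

```python
# A star import pulls an unknown set of names into the namespace, which
# is why Ruff (F403) can no longer detect undefined names afterwards.
ns = {}
exec("from os.path import *", ns)
print(len([n for n in ns if not n.startswith("_")]))  # dozens of names

# Explicit imports document exactly which names the module relies on:
from os.path import basename, join

print(join("apps", "analytics"))
print(basename("apps/analytics/src/__init__.py"))  # __init__.py
```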

🧰 Tools
🪛 Ruff (0.8.2)

1-1: from .modules import * used; unable to detect undefined names

(F403)


2-2: from .notebooks import * used; unable to detect undefined names

(F403)


3-3: from .scripts import * used; unable to detect undefined names

(F403)

@uzh-bf uzh-bf deleted a comment from coderabbitai bot Dec 23, 2024

coderabbitai bot commented Dec 23, 2024

Important

Review skipped

More than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review.

132 files out of 222 files are above the max files limit of 75. Please upgrade to Pro plan to get higher limits.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sjschlapbach sjschlapbach changed the title from "feat: add learning analytics" to "feat: add initial version of learning analytics computation and dashboard illustrations" on Dec 23, 2024
@sjschlapbach sjschlapbach merged commit d14d9dc into v3 Dec 23, 2024
32 of 34 checks passed
@sjschlapbach sjschlapbach deleted the v3-analytics branch December 23, 2024 19:57

Quality Gate failed

Failed conditions
3.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud
