[RFC]: Developer dashboard for tracking stdlib ecosystem build failures #64

Closed
6 tasks done
Rec0iL99 opened this issue Mar 28, 2024 · 3 comments
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@Rec0iL99

Rec0iL99 commented Mar 28, 2024

Full name

Joel Mathew Koshy

University status

Yes

University name

University of Windsor

University program

Master of Applied Computing

Expected graduation

September 2024

Short biography

I am a master's student at the University of Windsor, Ontario, Canada. After completing my bachelor's degree in India I moved to Canada last year to pursue a master's degree. I specialize in building full-stack web applications using tools such as React, JavaScript/TypeScript, Node.js, PostgreSQL, Redis, and GraphQL. As part of my academic journey, I have completed relevant courses such as Advanced Software Engineering, Advanced System Programming in C, Advanced Database Topics, Data Structures and Algorithms, Operating Systems, Theory of Computation, and Internet Applications and Distributed Systems. Additionally, I am an active contributor to the ESLint open-source project, a crucial tool in the JavaScript ecosystem. I contribute by addressing GitHub issues, handling pull requests, and debugging problems that software developers encounter while using the tool.

Timezone

Eastern Daylight Saving Time, Toronto, ON (GMT-4)

Contact details

email:koshy21@uwindsor.ca,email:joelmathewkoshy@gmail.com,github:Rec0iL99,gitter:@rec0il99-5e378041d73408ce4fd8811e

Platform

Mac

Editor

My preferred code editor is VSCode, and I favour it for its lightweight and extensive nature. The ability to customize your editor by building your own extension or downloading one makes VSCode a favourite. A code editor that has caught my eye recently is Zed, and I'm exploring the editor more since VSCode has started showing performance problems when I open large projects.

Programming experience

I took a few programming courses in C++ during high school and became interested in programming, which led me to pursue a career in software development. For my first software project, I built an Android app for my church to maintain contact records of its members. During college, I dedicated most of my time to developing mobile apps. It wasn't until one of my professors needed a website that I began exploring JavaScript. Since then, I have been developing web apps in React and JavaScript, and I still thoroughly enjoy it. I'm currently learning Rust as a new programming language since most projects in the JavaScript ecosystem have started adopting Rust into parts of their projects.

JavaScript experience

Since 2020, I have been programming in JavaScript. Recently, I began diving into the advanced features of JavaScript through my work with ESLint. JavaScript is indeed a fascinating language, especially because of its flexibility. From websites and mobile apps to desktop apps and command-line tools, you can build virtually anything with JavaScript. One of my favourite features of JavaScript is the spread operator. For example, you can natively merge two arrays in JavaScript by using the spread operator.

const first = [1, 2, 3, 4]
const second = [5, 6]

// [1, 2, 3, 4, 5, 6]
console.log([...first, ...second])

Destructuring objects in JavaScript is also something I enjoy using since it makes code more readable.

// a sample React component
function Example({ name }) {
  // the props argument passed to the functional component is destructured to retrieve name
}

Node.js experience

I have a good knowledge of Node.js, as I have been developing backend APIs using Node.js and Express.js for a while now. One project that I built using Node.js is CodeRoyale, a coding contest platform. This platform enables users to create rooms, invite friends using generated links, form teams, and engage in real-time coding competitions. Another project that I'm proud of is Grupo, which allows developers to create real-time chat rooms like Gitter.

C/Fortran experience

I recently completed a course in Advanced System Programming, where I had the opportunity to work with C and Linux system calls. As part of my class project, I implemented a socket server that allows multiple clients to connect to the server and perform operations such as running bash commands, requesting files from the server, and downloading files from the server as compressed .tar.gz files. I had a lot of fun taking this course, and I'm proud of what I accomplished throughout it.

Interest in stdlib

I was searching for GSoC organizations to contribute to during the GSoC period, and I came across stdlib. Previously, people used Python for data visualization, and achieving this natively in web browsers used to be difficult, if not impossible. Now, this can be achieved easily using a library like stdlib, which provides data visualization APIs, something that fascinates me. The library also offers other utility functions for data manipulation, math operations, string manipulation, etc.

Version control

Yes

Contributions to stdlib

stdlib-js/stdlib#1851

Goals

Technology Stack

I aim to build the developer dashboard with minimal tooling, using tools that the core maintainers are already familiar with, to ensure easy maintenance post-GSoC and minimal ramp-up time. I also aim to keep the stdlib community well informed of the dashboard's development status over the summer, for example by creating a tracking issue.

Frontend

Technology/Library | Info/Reason
React + Vite (JavaScript) | Vite (with React) is known for its efficient build process and performance. After chatting with one of the mentors, I realized that JavaScript would be a better choice than a typed system like TypeScript: the core stdlib is written in JavaScript, so contributors to the core library could easily switch over and contribute to the dashboard, increasing the pool of potential contributors to the dashboard repo.
react-router-dom | Enables routing for React components, including dynamic routes, e.g. /modules/:module-name. The stdlib website already uses this library, so the maintainers are familiar with it.
@tanstack/react-table | Helps in building complex data tables in React with ease.
Tailwind CSS | CSS framework for building the UI, which has been used in stdlib before. Therefore, the core maintainers are familiar with this technology.
Nivo | Chart component library for React that is actively maintained and quite popular among React developers.

Backend

Technology/Library | Info/Reason
Fastify | The stdlib website already uses Fastify for the server backend, and the maintainers are familiar with it. Note: I haven't built a backend using Fastify before, but I have experience with Express.js. I believe the skills I have gained with Express.js are transferable to building the backend with Fastify.
pg | A lightweight Node.js client for PostgreSQL databases, widely used by Node.js developers, actively maintained, and backed by a good open-source community.

Misc

Technology/Library | Info/Reason
ESLint | Since we won't be using a typed system like TypeScript, using a popular linter like ESLint will help with finding problems or bugs in JavaScript code faster.

Proposed Methodology

Note: All the design prototypes shared here will not reflect the end product as is. There will be elements added and design changes according to the requirements of the mentors and stdlib community.

Index Page

This would be the front page that users see when opening the dashboard (e.g. stdlib.io/dashboard). It would have a data table containing information on the various repositories hosted by the stdlib GitHub organization.

[Screenshots: index page design mockups]

Table columns

  • The first column of the table (Repository) lists the names of the repos hosted by stdlib. The list can be sorted alphabetically, in ascending or descending order. The repo name, archived status, visibility (public or private), and URL will be retrieved from the PostgreSQL database.

  • The Build Status column shows the current build status of the repo. A green tick means the build was successful, that is, all the workflow jobs had a conclusion of status:pass. A red cross indicates that the build is currently failing.

  • The PRs and Issues columns in the table list the GitHub pull requests and issues that are open in the central stdlib repository (if the repository or module is part of it). For repos that have PRs and issues open in the central repository, we can use GitHub search queries to get the issues and PRs related to that specific stdlib module.

    For example:

    https://github.com/stdlib-js/stdlib/pulls?q=is%3Apr+is%3Aopen+math%2Fbase%2Fspecial%2Fsin+in%3Atitle+

    This GitHub search query fetches all PRs that have the math/base/special/sin module in the title. We can do the same for issues by specifying is:issue.

    Note: By default, GitHub searches for a keyword everywhere: titles, comments, descriptions, etc. Using in:title limits the search to the title of the PR or issue (GitHub docs).

  • Data for columns Latest commit, NPM version, NPM downloads, Node version, License, Tarball size, Latest Tag, and Latest GitHub event will be retrieved from the PostgreSQL database.

  • The Priority:Filter(Urgent | High | Low | Normal) column is for PRs and issues that have a priority label set on them. This can help maintainers/contributors filter priority issues or PRs to work on. The filter for urgent, high, low, or normal can be set using the dropdown beside the search bar.

    for example, https://github.com/stdlib-js/stdlib/issues?q=is%3Aopen+label%3A%22priority%3A+High%22+math%2Fbase%2Fspecial%2Fgammaln+in%3Atitle+

  • The Needs Review column is for issues and PRs that have the Needs Review label set. This will help maintainers triage issues and pull requests faster.

  • The columns dropdown enables the user to select what columns in the data table should not be shown.
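
The search links described in the PRs/Issues and Priority bullets could be generated per module rather than written by hand. Below is a minimal sketch; the helper name `buildSearchUrl` and its option names are hypothetical, and it only reproduces the URL pattern shown in the examples above (without the trailing `+`).

```javascript
// Hypothetical helper: builds a GitHub search URL against the central stdlib repo.
const BASE = 'https://github.com/stdlib-js/stdlib';

// type: 'pr' or 'issue'; pkg: module path such as 'math/base/special/sin';
// label (optional): a label name such as 'priority: High'.
function buildSearchUrl({ type, pkg, label }) {
  const terms = [`is:${type}`, 'is:open'];
  if (label) {
    terms.push(`label:"${label}"`);
  }
  // in:title restricts the keyword match to the title of the PR or issue.
  terms.push(pkg, 'in:title');

  const path = type === 'pr' ? 'pulls' : 'issues';
  // Encode each term, using '+' for spaces as in the example URLs.
  const q = terms
    .map((term) => encodeURIComponent(term).replace(/%20/g, '+'))
    .join('+');
  return `${BASE}/${path}?q=${q}`;
}

console.log(buildSearchUrl({ type: 'pr', pkg: 'math/base/special/sin' }));
```

As noted in the comments on this proposal, links like these are cheap to render, but fetching counts through the GitHub search API for thousands of repos would run into rate limits, so counts are better served from the database.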

Pagination

stdlib hosts more than 3500 repositories, and fetching the data for all 3500+ repos at once from the backend API would not be ideal. Instead, the index page will use offset-based pagination with infinite scroll.

Working: When the page first renders, a request is sent to the backend API for a list of repositories and their respective data. The backend API responds with the data for 10 repositories (this number can change) out of the more than 3500 in the database. This data is then rendered in the data table. When the user scrolls to the end of the page, a new request is sent to the backend API for the next page of repository data. This cycle continues until the backend API has no more repository data to send. Each request from the frontend contains an offset value, which is the number of rows to skip from the start of the ordered result set (that is, the number of rows already loaded).
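
The request cycle described above can be sketched independently of React. In this minimal sketch, `createPager` and `fetchPage` are hypothetical names: `fetchPage(offset, limit)` stands in for whatever call the frontend ends up making to the backend API, and a page shorter than the page size is assumed to mean there is no more data.

```javascript
// fetchPage is a hypothetical function: (offset, limit) => Promise<Array of rows>.
function createPager(fetchPage, pageSize = 10) {
  let offset = 0;   // number of rows already loaded
  let done = false; // true once the backend has no more rows

  return {
    // Called when the user scrolls to the end of the table.
    async loadNext() {
      if (done) {
        return [];
      }
      const rows = await fetchPage(offset, pageSize);
      offset += rows.length;
      // A short (or empty) page means there is nothing left to fetch.
      if (rows.length < pageSize) {
        done = true;
      }
      return rows;
    },
    get finished() {
      return done;
    }
  };
}
```

A scroll handler (or an intersection observer on the last table row) would call `loadNext()` and append the returned rows to the table until `finished` becomes true.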

Example SQL query,

select r.*
from stdlib_github.repository r
order by r.name
limit 10 offset 0

Page | Offset
1 | 0
2 | 10
3 | 20
.. | ..

offset = (page - 1) * 10

This query will retrieve 10 repos and their data from the database. The equation to calculate the offset value for each page is mentioned above.
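
On the backend, the offset equation and the query above could be combined as follows. This is a minimal sketch: `pageToOffset` and `buildRepoPageQuery` are hypothetical helper names, and the returned `{ text, values }` object is the parameterized query shape that pg's `query()` accepts, so the page size and offset are never concatenated into the SQL string.

```javascript
// offset = (page - 1) * pageSize, with page numbering starting at 1.
function pageToOffset(page, pageSize = 10) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  return (page - 1) * pageSize;
}

// Builds a parameterized query object for one page of repositories.
function buildRepoPageQuery(page, pageSize = 10) {
  return {
    text: 'select r.* from stdlib_github.repository r order by r.name limit $1 offset $2',
    values: [pageSize, pageToOffset(page, pageSize)]
  };
}

console.log(buildRepoPageQuery(3));
```

The `order by r.name` clause matters here: without a stable ordering, offset-based pages can overlap or skip rows between requests.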

Search

For the first iteration of the dashboard, we could include search filters for repo names and build status.

For example, if a maintainer wanted to search for a repo that contains the keyword math, the search would return all the repos that contain math in the repo title.

> math
---Search Results
- math-base-special-atan2
- math-base-tools-evalrational-compile
- math-base-tools-evalrational-compile-c
- ...

Example SQL query,

select r.*
from stdlib_github.repository r
where r.name like '%math%'

The user can also search based on build status.

For example, specifying build:fail will return repos that have a build status of failing.

> math build:fail
---Search Results
- math-base-tools-evalrational-compile
- ...

This search filter can also be combined with the keyword search like the example above.
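
A minimal sketch of how a search string such as `math build:fail` might be turned into a parameterized query. The function names are hypothetical, and so are the `build_status` column and the `build:pass` value: the actual schema may store build results differently.

```javascript
// Splits the search box input into free-text keywords and a build filter.
function parseSearch(input) {
  const filters = { keywords: [], build: null };
  for (const token of input.trim().split(/\s+/).filter(Boolean)) {
    const match = /^build:(pass|fail)$/.exec(token);
    if (match) {
      filters.build = match[1];
    } else {
      filters.keywords.push(token);
    }
  }
  return filters;
}

// Builds a { text, values } query object (the shape pg's query() accepts).
function buildSearchQuery(input) {
  const { keywords, build } = parseSearch(input);
  const clauses = [];
  const values = [];
  for (const keyword of keywords) {
    // Escape LIKE wildcards so user input is matched literally.
    values.push(`%${keyword.replace(/[\\%_]/g, '\\$&')}%`);
    clauses.push(`r.name like $${values.length}`);
  }
  if (build) {
    values.push(build);
    // ASSUMPTION: build_status is a hypothetical column name.
    clauses.push(`r.build_status = $${values.length}`);
  }
  const where = clauses.length ? ` where ${clauses.join(' and ')}` : '';
  return { text: `select r.* from stdlib_github.repository r${where}`, values };
}

console.log(buildSearchQuery('math build:fail'));
```

Passing the keyword as a bound value rather than interpolating it into `'%math%'` keeps the search safe against SQL injection.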

Filtering

Clicking on the filter button beside the search bar will open up a modal/dialog that allows the user to filter repositories based on their status, such as private, public, or archived. For example, selecting "public" from the dropdown select instead of "all" will only render public repositories in the data table. Optionally the user will be able to select public or private repos that are archived from the same filter modal.
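
The filter modal's selection could map to SQL conditions along these lines. This is a minimal sketch under assumptions: `buildVisibilityFilter` is a hypothetical helper, and the boolean columns `private` and `archived` are guesses at how the repository table stores this information.

```javascript
// visibility: 'all' | 'public' | 'private'; archivedOnly: restrict to archived repos.
function buildVisibilityFilter({ visibility = 'all', archivedOnly = false } = {}) {
  const clauses = [];
  const values = [];
  if (visibility === 'public' || visibility === 'private') {
    values.push(visibility === 'private');
    // ASSUMPTION: a boolean "private" column exists on the repository table.
    clauses.push(`r.private = $${values.length}`);
  }
  if (archivedOnly) {
    values.push(true);
    // ASSUMPTION: a boolean "archived" column exists on the repository table.
    clauses.push(`r.archived = $${values.length}`);
  }
  return { where: clauses.join(' and '), values };
}

console.log(buildVisibilityFilter({ visibility: 'public', archivedOnly: true }));
```

Selecting "all" yields an empty `where` string, so the caller can omit the clause entirely.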

Data Refresh

The user will be able to retrieve the latest data from the database by clicking on the refresh icon button in the header. This triggers a new network request to the backend API for the latest details while preserving the pages that have already been loaded (the pagination state is not reset).

Sorting button

The user will be able to sort the columns in the data table by clicking on the sort icon button. For example, clicking on the sort button in the build status column will bring the packages that have builds failing to the top of the data table.
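
As an illustration of the build status sort, here is a minimal comparator sketch that brings failing builds to the top and breaks ties by name. The row shape (`name`, `build`) and the status values `'fail'` and `'pass'` are assumptions made for the example.

```javascript
// Lower rank sorts first; unknown statuses go last.
const BUILD_RANK = { fail: 0, pass: 1 };

function rank(status) {
  return BUILD_RANK[status] ?? 2;
}

// Returns a new array with failing builds first, then alphabetical by name.
function sortByBuildStatus(repos) {
  return [...repos].sort((a, b) => {
    const diff = rank(a.build) - rank(b.build);
    return diff !== 0 ? diff : a.name.localeCompare(b.name);
  });
}

const rows = [
  { name: 'math-base-special-atan2', build: 'pass' },
  { name: 'math-base-tools-evalrational-compile', build: 'fail' },
  { name: 'array-base', build: 'pass' }
];
console.log(sortByBuildStatus(rows).map((r) => r.name));
```

Because the data is paginated, sorting in the browser only reorders the rows already loaded; sorting across all repositories would need an equivalent `order by` on the backend.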

Individual Repository Page

[Screenshots: individual repository page design mockups]

  • Individual repo pages can be accessed by providing the repo name in the URL.

    stdlib.io/dashboard/repositories/:repo-name

    example, stdlib.io/dashboard/repositories/array-base

  • The header shows the current build status as failing or passing to notify the user of any failing workflows. The globe icon shows if the repo is public or private. Clicking on the GitHub icons redirects the user to the respective GitHub repo page.

  • This page shows quick facts on the repository such as the current version, license, node version, tarball size and published at.

  • The page contains tabs for quick navigation to view workflows, tags, commits, events, NPM downloads, and metrics.

  • The metrics tab will contain a heatmap chart that shows stats such as downloads, build failures, stars, etc. for the whole calendar year (inspired by the GitHub contribution graph). Hovering over a point in the heatmap will show the stat count. The user will also be able to filter stats by version and year using the dropdowns.

  • Alternatively (if the mentors are not fans of the heatmap), the metrics tab would show a line chart of the stats for the calendar year.

  • If time permits, a line chart tab could be implemented that lets the user compare stats between different versions of the package (inspired by npm trends). This would provide insights such as which version of the package is downloaded the most.
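
Before a heatmap can be drawn, raw events (downloads, build failures, stars) need to be bucketed by day. The sketch below shows one way to do that; `toDailyCounts` and the `timestamp` field are hypothetical, and the `{ day, value }` output shape is chosen to match what calendar-style chart components commonly expect, which should be checked against the charting library's documentation.

```javascript
// Groups events by UTC calendar day for a given year.
// events: array of objects with an ISO 8601 `timestamp` string (assumed field name).
function toDailyCounts(events, year) {
  const counts = new Map();
  for (const { timestamp } of events) {
    const date = new Date(timestamp);
    // Skips other years; invalid dates yield NaN and are skipped as well.
    if (date.getUTCFullYear() !== year) {
      continue;
    }
    const day = date.toISOString().slice(0, 10); // 'YYYY-MM-DD'
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, value]) => ({ day, value }));
}

const sample = [
  { timestamp: '2024-03-28T10:00:00Z' },
  { timestamp: '2024-03-28T23:59:59Z' },
  { timestamp: '2024-04-01T00:00:00Z' },
  { timestamp: '2023-12-31T23:00:00Z' }
];
console.log(toDailyCounts(sample, 2024));
```

In practice this grouping would more likely be done in SQL with a `group by` on the date, which avoids shipping raw events to the browser.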

Metrics Dashboard

[Screenshot: metrics dashboard page skeleton]

Based on feedback from mentors, I've created this page (a skeleton) to display total usage metrics. I believe that if and when the proposal is accepted and I gain access to the raw data on usage metrics in the private database, I will be able to design a better way to display the usage metrics.

Figma Doc
Database Schema Visualizer

Why this project?

I have been working closely with the TSC team maintaining ESLint, and keeping track of issues, PRs, releases, build events, etc., can be frustrating sometimes, as the team may lose track. I can only imagine how difficult this is for the core maintainers of stdlib, where the GitHub organization hosts more than 3500 repositories. Having a developer dashboard to manage and keep track of all these repos in one place for the maintainers and making the process of maintaining stdlib easier is something that excites me to contribute to this project. Developing this dashboard involves tailoring it to the specific needs of the stdlib maintainers and community. I believe this project will be a great help to the maintainers and community of stdlib.

Qualifications

This project requires experience working with React, Node.js and PostgreSQL, all of which I have used in my projects CodeRoyale and Grupo, as mentioned earlier. In these projects, I implemented various features such as building API routes, pagination, writing complex SQL queries, fetching data from the GitHub API, etc. These skills will enable me to implement this proposal effectively and on time.

Prior art

This project has been implemented before for the NPM GitHub organization for tracking build failures.

https://npm.github.io/statusboard

Commitment

By the time development starts, I will be in my last semester with a single course to complete. Hence, I will be able to allocate 25 hours/week for this project. If my proposal is accepted, I will request my academic program coordinator to substitute my course with the developer dashboard project, in which case I can work more than 35 hours/week.

Schedule

Assuming a 12 week schedule,

Community Bonding Period:

  • Get familiar with the PostgreSQL database schema
  • Polish the proposal and finalize the UI mockups after discussion with the mentors and community
  • Discuss with the mentors on how the code should be structured and other execution steps
  • Get the development environment ready
  • Initialize the starting code for the frontend and backend

Week 1, 2, 3:

  • Start with building the backend API routes specifically for the index page
  • Build out the react-table component for displaying repository data
  • Implement infinite scroll pagination
  • Build the frontend index page along with needed components and establish a connection with the API

Week 4 & 5:

  • Implement search for the index page
  • Implement APIs for individual repository pages
  • Implement individual repository page in frontend by integrating with API

Week 6 & 7: (midterm)

  • Midterm review by the mentors
  • Implement the Tags tab as mentioned in the UI mockup
  • Implement Workflows tab of the individual repo page

Week 8 & 9:

  • Implement Events tab of the individual repo page
  • Implement NPM downloads tab of the individual repo page
  • Implement Commits tab of the individual repo page

Week 10 & 11:

  • Implement the package metrics tab of the individual repo page
  • Implement the metrics dashboard page

Week 12 (final week):

  • I prefer to keep this as the buffer week to implement any tasks spilled over
  • Document the code
  • Deploy the web app

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

#4

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@Rec0iL99 Rec0iL99 added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 28, 2024
@Planeshifter
Member

Thanks for your comprehensive proposal, which demonstrates that you put thought and effort into the design for the developer dashboard! Your proposal outlines a thoughtful selection of technologies for the frontend and has clear, well-defined goals.
One suggestion of mine would be to think about some of the challenges you may encounter (for example, you would need to be mindful of the GitHub rate limits when doing GitHub search queries to obtain the number of open issues and PRs for the various tools) and what contingency plans you would have in case of falling behind schedule, especially given your academic commitments during the summer.
Your experience aligns well with the project requirements and you have demonstrated your open-source chops through your contributions to ESLint, which are much appreciated!

@kgryte
Member

kgryte commented Apr 1, 2024

Thanks for sharing a draft of your proposal, @Rec0iL99! A few comments:

  • I appreciate some of your ideas for the drill down pages. This information is similar to a view you might get through viewing a repository's "insights" page. I think one thing we'd be particularly interested in is seeing metrics over time. E.g., download counts across the calendar year broken out by version. Or build status failures over time. Or stars and forks.
  • One thing we don't have a great insight into are our total usage metrics. While we've collected a good amount of raw data, we don't have a readily available sense regarding general adoption across our entire ecosystem (e.g., which stdlib packages are currently "trending", etc). It may be worth considering how you might display that information.
  • The priority filter idea is an interesting idea, so long as we have ways of linking issues of the main project repository to standalone repos. While the issue title is one way, this doesn't readily scale for issues affecting two or more packages.
  • I second Philipp's concern regarding the GitHub API rate limit. Preferably, we'd be able to run more complex search/filter queries against our PG database directly.

@Rec0iL99
Author

Rec0iL99 commented Apr 1, 2024

Thank you @Planeshifter @kgryte for the feedback. I'll incorporate the feedback into my proposal for submission tomorrow.

@kgryte kgryte closed this as completed Apr 30, 2024