[RFC]: Add support for the multivariate normal distribution #70

Closed
6 tasks done
BrianP2002 opened this issue Mar 30, 2024 · 2 comments
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@BrianP2002

Full name

Lin ha

University status

Yes

University name

University of Wisconsin-Madison

University program

Computer Science, Mathematics, Data Science

Expected graduation

2025 Spring

Short biography

I am pursuing a bachelor's degree in mathematics, computer science, and data science at the University of Wisconsin-Madison. I am familiar with Java, C/C++, JavaScript, Python, and R, and I know a little HTML/CSS. My main interests in CS are arithmetic algorithms, cryptology, and optimization problems.

Timezone

US Central Time (UTC−06:00)

Contact details

email: halinbr2002@gmail.com, github: https://github.com/BrianP2002, phone: +1 6089773640

Platform

Linux

Editor

My first choice of code editor is VSCode, and my second choice is Vim.
For VSCode, the main reason I like it is that it has a rich and mature ecosystem for most languages and tools. It is easy to set up a suitable working environment quickly, with mainstream tools such as Git and Docker integrated. There are also many beautiful themes I love (especially Monokai).
For Vim, I like it because it is quick to open and use. This lightweight editor has saved me a lot of time when checking logs and outputs on a Linux VM.

Programming experience

Python: I use Python for most of my machine learning work, such as training NLP models with spaCy and handling big data analysis with Hadoop-related software (Cassandra, Spark, Kafka). I also write code to assist with my math homework, such as checking whether a matrix is totally unimodular and simulating a Feistel cipher in CFB mode. I am also familiar with Django; I am currently working on a project that helps patients understand doctors' notes and uses Django as the backend.
C/C++: I implemented a simple shell in C that supports pipes, running commands in detached mode, and output redirection. I also implemented several other school projects, including a simple automatic garbage collector and a multi-threaded merge sort. Sometimes I use C++ instead of Python for code that assists with my math homework, such as implementing the simplex method in tableau form to solve LP problems.
Java: I used Java to implement a personalized version of iperf to test network connectivity and performance on Mininet. I am also familiar with casting, data encapsulation, multithreading, and network communication in Java.
JavaScript: I will describe this in detail in the next section.
R: I learned R mostly from classes such as data modeling. I am familiar with using R to plot various kinds of statistical diagrams, run hypothesis tests, and evaluate regression models.

JavaScript experience

During high school, my partner and I built a game bot AI (for generals.io) for a research project. I was in charge of data collection: I wrote a crawler in JavaScript to gather game replay data and handled data transformation, including slicing and replaying players' matches. Currently, I am working on a project developing a Chrome extension that helps people read doctors' notes. I helped with the front end, which lets people directly select and highlight content they want to learn about, using mainly JavaScript together with some Chrome-provided APIs.
The feature of JavaScript I like most is its flexibility. As stated above, it can be used in various environments and for various jobs. The reason is the very mature ecosystem around JavaScript, which makes it a very welcoming language.
The thing I dislike most is JavaScript's loose typing system, which caused me a lot of trouble and confusion when I was learning the language. I prefer a stricter, more explicit type system.

Node.js experience

I am not very experienced in using Node.js, but I am familiar with the basic concepts and usage of it.

C/Fortran experience

I am experienced in C. C and C++ were the first two programming languages I learned. I took the Computer Organization and Operating System courses, both of which use C heavily. Thanks to these courses, I am familiar with C's memory model, multithreading, multiprocessing, data encapsulation, etc.

Interest in stdlib

As a student studying math, I really appreciate the purpose and goal of projects like stdlib. The existence of these libraries makes our lives a lot easier. For instance, I don't need to write all the code from scratch to simulate random variables from various distributions. Therefore, I'd like to help develop this library and make it better.

Version control

Yes

Contributions to stdlib

I have not yet contributed to stdlib, but I believe this is a great time to start.

Goals

Basic expectation: implement the multivariate normal distribution in the same way as the distributions already implemented, including but not limited to the following functions:

  • PDF: Probability density function.
  • logPDF: Log of the probability density function.
  • CDF: Cumulative distribution function.
  • logCDF: Log of the cumulative distribution function.
  • MGF: Moment generating function for the multivariate normal distribution.
  • Entropy: Compute the differential entropy of the multivariate normal.
  • mean/
    One should be able to create a multivariate normal RV by calling x = multiNormal([a_1, b_1], [a_2, b_2], ...) or by giving a 2-D covariance matrix, x = multiNormal([[c_11, c_12, ...], [c_21, c_22], ...]).
    Also, this multivariate normal distribution should work with the plot function to generate the expected diagrams, like the existing implementations of other distributions.
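
As a rough illustration of the logPDF and entropy goals above, here is a minimal sketch in plain JavaScript. It assumes the covariance matrix is passed as a nested array and is symmetric positive definite. The names `cholesky`, `mvnLogPDF`, and `mvnEntropy` are hypothetical and are not part of any existing stdlib API; an actual implementation would more likely build on stdlib's ndarray and linear algebra packages rather than nested arrays.

```javascript
'use strict';

// Hypothetical helper: returns the lower-triangular Cholesky factor L of a
// symmetric positive-definite matrix `sigma`, so that sigma = L * L^T.
function cholesky( sigma ) {
	const n = sigma.length;
	const L = [];
	for ( let i = 0; i < n; i++ ) {
		L.push( new Array( n ).fill( 0.0 ) );
	}
	for ( let i = 0; i < n; i++ ) {
		for ( let j = 0; j <= i; j++ ) {
			let s = sigma[ i ][ j ];
			for ( let k = 0; k < j; k++ ) {
				s -= L[ i ][ k ] * L[ j ][ k ];
			}
			if ( i === j ) {
				if ( !( s > 0.0 ) ) {
					throw new RangeError( 'covariance matrix must be positive definite' );
				}
				L[ i ][ i ] = Math.sqrt( s );
			} else {
				L[ i ][ j ] = s / L[ j ][ j ];
			}
		}
	}
	return L;
}

// Sum of 2*ln(L_ii), which equals ln(det(sigma)).
function logDetFromCholesky( L ) {
	let out = 0.0;
	for ( let i = 0; i < L.length; i++ ) {
		out += 2.0 * Math.log( L[ i ][ i ] );
	}
	return out;
}

// Hypothetical name: log-density of N(mu, sigma) evaluated at the point x.
function mvnLogPDF( x, mu, sigma ) {
	const n = mu.length;
	const L = cholesky( sigma );
	const z = new Array( n );
	let quad = 0.0;

	// Solve L*z = (x - mu) by forward substitution; the sum of z_i^2 is the
	// quadratic form (x - mu)^T * sigma^{-1} * (x - mu).
	for ( let i = 0; i < n; i++ ) {
		let s = x[ i ] - mu[ i ];
		for ( let k = 0; k < i; k++ ) {
			s -= L[ i ][ k ] * z[ k ];
		}
		z[ i ] = s / L[ i ][ i ];
		quad += z[ i ] * z[ i ];
	}
	return -0.5 * ( ( n * Math.log( 2.0 * Math.PI ) ) + logDetFromCholesky( L ) + quad );
}

// Hypothetical name: differential entropy, 0.5 * ln( det( 2*pi*e*sigma ) ).
function mvnEntropy( sigma ) {
	const n = sigma.length;
	const L = cholesky( sigma );
	return 0.5 * ( ( n * ( 1.0 + Math.log( 2.0 * Math.PI ) ) ) + logDetFromCholesky( L ) );
}

// Example usage with a 2-D covariance matrix:
const sigma = [ [ 4.0, 2.0 ], [ 2.0, 3.0 ] ];
console.log( mvnLogPDF( [ 3.0, 1.0 ], [ 1.0, 0.0 ], sigma ) );
console.log( Math.exp( mvnLogPDF( [ 3.0, 1.0 ], [ 1.0, 0.0 ], sigma ) ) ); // PDF
console.log( mvnEntropy( sigma ) );
```

Working from the Cholesky factor avoids forming the inverse or the determinant of the covariance matrix explicitly, and computing the log-density first (then exponentiating for the PDF) helps avoid underflow in higher dimensions.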

Bigger Picture
Beyond the basic expectation, I will consider implementing several other multivariate distributions, such as the multivariate hypergeometric, exponential, and Bernoulli distributions.

Why this project?

I am deeply interested in contributing to this project, driven by my strong desire to apply my mathematical background to a math-related open-source library. With a foundation in both computer science and mathematics, especially in the realm of probability, I find this project to be a perfect match for my skills and interests. My academic and practical experiences have equipped me with a robust understanding of mathematical concepts and their computational implementations, making me keenly aware of the challenges and opportunities in developing mathematically rigorous and efficient algorithms. I am eager to contribute by leveraging my knowledge in probability and mathematical analysis. Joining this project represents a unique opportunity for me to merge my passion for mathematics with my computer science expertise, contributing to a library that is pivotal in advancing open-source, math-centric computing solutions.

Qualifications

I have taken college-level probability theory and stochastic processes courses at the university. I am also doing research under a professor in statistics, mostly on stochastic processes and multivariate probability distributions. In addition, I am familiar with tools like Wolfram Alpha and R's random distribution packages, and with writing my own code (mostly in Python) to solve problems in linear programming, cryptology, group theory, etc. Overall, I have a matching mathematical background and an understanding of the needs of target users, which makes me a suitable candidate for this project.

Prior art

SciPy has a decent implementation of the multivariate normal distribution.
R also has a package that implements the multivariate normal distribution.
Julia also supports the multivariate normal distribution.

Commitment

I will finish all my final exams in mid-May, and I can work about 20-30 hrs/week for 12 weeks. I will be located in the US Central timezone and will respond quickly to all messages and video meeting requests.

Schedule

Assuming a 12-week schedule,

  • Community Bonding Period:
    Task: Do research about similar implementations in other languages, write user stories, and turn them into the feature list.
    Goal: Summarize the implementation details, write out pseudocode, and list several potential use cases.

  • Week 1-2:
    Task: Write the first draft, do some basic testing, communicate with mentors, and receive feedback.
    Goal: Have something runnable and meaningful to demo (it may still be fragile).

  • Week 3-4:
    Task: Revise the first draft, look for potential bugs, and do more testing.
    Goal: No obvious bugs and a concise calling interface.

  • Week 5-6 (midterm):
    Task: Optimize the code, communicate with the community, and think about whether anything can be extended.
    Goal: Deliverable code that is ready to be used and tested by a small group of potential users.

  • Week 7-8:
    Task: Run an alpha test with potential users, listen to their feedback (including bug reports, feature requests, potential improvements, and optimizations), and revise the code.

  • Week 9-10:
    Task: Run a beta test with a wider range of developers. Meanwhile, further optimize the code for speed and make modifications to align with users' requirements.

  • Week 11:
    Task: Final round of testing; actively communicate with the stdlib developer community and ask for their suggestions on the pre-publish stage.
    Goal: Code that is ready to be published.

  • Week 12:
    Task: Submit the code and wrap up.

  • Final Week:

Notes:

  • The community bonding period is a 3-week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

No response

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@BrianP2002 BrianP2002 added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 30, 2024
@Planeshifter
Member

Planeshifter commented Mar 31, 2024

Thanks for your proposal! To strengthen it, I would suggest highlighting your plans for integrating this new distribution with the existing stdlib codebase, especially since it relies on multi-dimensional arrays, which are not part of the native JavaScript language.

It's good that you reference prior art in other languages such as R and Julia, but it would also be beneficial to discuss in more depth how you will implement the multivariate normal distribution, or how it is implemented in these reference implementations. For example, how will the covariance matrix be handled? What numerical methods will be used? This way, aside from your highly relevant academic experience and achievements, you could further demonstrate your experience with numerical computing and assure the reviewers that you have the necessary skills to pull off this project.

@kgryte
Member

kgryte commented Apr 1, 2024

@BrianP2002 Following up on Philipp's comments, I'd also like to add

  1. As part of our application requirements, for any application to be considered, a contributor must land a patch to the main project repository. If this requirement is not fulfilled, we will not consider the respective application.
  2. In your timeline, you mentioned user feedback. How do you plan to acquire such feedback? Who is your target audience? My sense is that you are highly unlikely to get substantial feedback or identify a sufficient body of potential users, especially given JavaScript's current standing as a language for scientific computation. In that case, if you don't have a clear user feedback plan, I suggest expanding the technical activities of your proposal accordingly, potentially to include other multivariate distributions or higher-level functionality which relies on the multivariate normal distribution.

@kgryte kgryte closed this as completed Apr 30, 2024