Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: achieve ndarray API parity with built-in JavaScript arrays #48

Closed
6 tasks done
headlessNode opened this issue Mar 20, 2024 · 9 comments
Closed
6 tasks done
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@headlessNode
Copy link
Member

headlessNode commented Mar 20, 2024

Full name

Muhammad Haris

University status

Graduated

University name

Air University

University program

Mechanical Engineering

Expected graduation

Graduated 2022

Short biography

I am a Mechanical Engineering graduate and embarked on a journey of self-learning programming around June 2023. Since then, I have delved into various programming languages, including C++ and Assembly Language, while my primary focus has been on web development. I am Proficient in HTML, CSS, and JavaScript, and continuously expanding my skill set with the goal of learning React and back-end technologies in the future. Through personal projects, open source contributions and dedicated study, I aim to transition into a career in web development by the end of 2024.

Timezone

PKT (UTC + 5)

Contact details

email: harriskhan047@outlook.com, github: headlessNode

Platform

Linux

Editor

My preferred code editor is Visual Studio Code. I use it primarily because it's widely used and recommended in the programming community. While I haven't explored other options extensively, I find VSCode intuitive and feature-rich, which suits my needs for coding projects. However, I'm open to exploring other editors in the future if the need arises for a specific project.

Programming experience

I embarked on my programming journey with The Odin Project, an open-source platform tailored for self-learners interested in web development. Through this project-based learning program, I tackled various projects of differing complexities to gain practical experience.

Below are several projects that mark milestones in my learning journey:

  • Battleship: A Battleship game, the aim of this project was implementing Test-Driven Development (TDD) principles to write robust and efficient code.
    Links: Repo, Live

  • Task tracker: A To-do List App, the aim of this project was to reinforce knowledge of local storage, streamline code bundling with Node.js and Webpack, and implement SOLID principles.
    Links: Repo, Live

  • Weather app: A simple weather app, the aim of this project was to familiarize myself with API integration and asynchronous operations , Promises, or async/await syntax to ensure proper sequencing and error handling.
    Links: Repo, Live

  • Tic-Tac-Toe: A tic-tac-toe game, the aim of this project was to get familiar with code modularization and adherence to SOLID principles.
    Links: Repo, Live

These projects instilled in me the importance of writing clean, maintainable code and adhering to best practices in software development. Additionally, I've completed several other projects like to deepen my understanding of fundamental data structures like Trees, Hash-maps, and Linked Lists. For more projects, you can visit my GitHub profile.

JavaScript experience

In terms of JavaScript experience, I've gained practical knowledge through hands-on projects, as showcased above. I've completed projects that cover fundamental data structures like Hashmaps, LinkedLists, Binary Search Trees, and Graphs, which have deepened my understanding of JavaScript's versatility.

One of my favorite features of JavaScript is its asynchronous nature, which enhances application responsiveness. This feature is particularly powerful when handling tasks like fetching data or processing user input without freezing the UI. It provides a smoother user experience by executing tasks concurrently.

In terms of JavaScript's weaker points, one aspect that can be challenging is its reliance on callbacks. Callbacks can lead to famously dreaded, callback hell or deeply nested code structures, making code difficult to read and maintain. This issue not only affects code readability but also raises security concerns, as callbacks can give external entities control over the program's execution flow when fetching data from external APIs. However, with the introduction of Promises and Async/await syntax, this issue has been mitigated to some extent, providing cleaner and more manageable asynchronous code.

Node.js experience

I have some experience with Node.js primarily in the context of setting up build tools and development environments for front-end projects. I've used Node.js to install and configure libraries such as ESLint, Prettier, Webpack and more. Libraries which are essential for maintaining code quality and optimizing the development workflow. While my experience with Node.js is currently more focused on front-end development, I'm eager to expand my skills and explore its back-end capabilities as I progress in my learning journey.

C/Fortran experience

I don't have specific experience with C or Fortran but I do have some familiarity with C++. During my bachelor's degree, I utilized C++ for projects that involved solving differential equations. Additionally, I've primarily used C++ for tackling challenge questions on platforms like LeetCode. While my experience in C++ is more focused on algorithmic problem-solving, I'm open to exploring other programming languages and deepening my understanding of low-level programming concepts.

Interest in stdlib

What interests me about stdlib is the aspect of ndarrays. While I haven't yet had the opportunity to use stdlib for my projects, the concept of ndarrays immediately caught my attention. I have experience with simple 2-D arrays but I wasn't familiar with ndarrays until I saw this project achieve ndarray API parity with built-in JavaScript arrays on stdlib page. As I further researched this domain, I found the implementations of ndarrays and various methods associated with it really interesting, particularly the methods for efficiently iterating through the ndarrays of various shapes and strides.

Moreover, aside the technical aspects, I appreciate the welcoming and warm open-source community surrounding stdlib. I've had the pleasure of engaging in discussions about the project on the Element channel, and I found the community to be incredibly helpful and engaging.

Version control

Yes

Contributions to stdlib

Open pull requests:

Merged pull requests:

Goals

ndarrays (n-dimensional arrays), offer data structures that are crucial for many data processing tasks. The proposed project aims to enhance the functionality and usability of ndarrays, by achieving parity with built-in JavaScript arrays with focus on efficient ndarray operations for contiguous and non-contiguous data.

  1. Achieving Parity with Built-in JavaScript Arrays:
  • This involves implementing essential array methods and functionalities within stdlib's ndarray module, ensuring seamless compatibility and ease of transition for developers accustomed to working with JavaScript arrays.

  • The project will involve achieving parity with following JavaScript array methods but may extend to include others:

      1.1. forEach
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an output array of the same shape.
                         Reduction: Does not perform reduction; applies a function to each element.
                         Suitability: Suitable for ndarrays; iterates over each element without changing the array's structure.
      1.2. map:
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an output array of the same shape.
                         Reduction: Does not perform reduction; creates a new array with transformed elements.
                         Suitability: Suitable; transforms each element into a new value, maintaining the array's structure.
      1.3. reduce:
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a reduced array (single value or array with fewer dimensions).
                         Reduction: Performs reduction by applying a function to each element and accumulating the result.
                         Suitability: Suitable for ndarrays; reduces the array to a single value or a lower-dimensional array.
      1.4. reduceRight:
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a reduced array (single value or array with fewer dimensions).
                         Reduction: Performs reduction by applying a function to from right to left and accumulating the result.
                         Suitability: Suitable for ndarrays.
      1.5. filter
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a flattened array containing all filtered elements.
                         Reduction: Can be considered a form of reduction.
                         Suitability: Suitable; selects elements based on a condition.
      1.6. indexOf
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an array of indices indicating the position of the element, -1 if not found.
                         Reduction: Can be considered a form of reduction; returns an array of indices.
                         Suitability: Suitable for ndarrays; allows searching for position of elements.
      1.7. lastIndexOf
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an array of indices, the position of the last occurrence, -1 if not found.
                         Reduction: Can be considered a form of reduction; returns an array of indices.
                         Suitability: Suitable for ndarrays; allows searching for elements from the end.
      1.8. find
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a single value (element satisfying the condition).
                         Reduction: Does not perform reduction; returns the first matching element.
                         Suitability: Suitable for ndarrays; finds the first element that satisfies a condition.
      1.9. findIndex
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an array of indices indicating position of element if found.
                         Reduction: Can be considered a form of reduction.
                         Suitability: Suitable for ndarrays.
      1.10. findLast
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a single value (element satisfying the condition).
                         Reduction: Does not perform reduction; returns the last matching element.
                         Suitability: Suitable for ndarrays.
      1.11. findLastIndex
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns an array of indices of last occurrence indicating position of element if found.
                         Reduction: Can be considered a form of reduction.
                         Suitability: Suitable for ndarrays.
      1.12. sort
                         Mutation: Mutates the input ndarray by sorting the elements.
                         Shape Consistency: Returns an array of the same shape.
                         Reduction: Does not perform reduction; rearranges elements in sorted order.
                         Suitability: Suitable for ndarrays; sorts elements along one or more dimensions.
      1.13. toSorted
                         Mutation: Mutates the input ndarray by sorting the elements.
                         Shape Consistency: Returns an array of the same shape.
                         Reduction: Does not perform reduction.
                         Suitability: Suitable for ndarrays.
      1.14. toReversed
                         Mutation: Mutates the input ndarray.
                         Shape Consistency: Returns an array of the same shape.
                         Reduction: Does not perform reduction.
                         Suitability: Suitable for ndarrays.
      1.15. fill
                         Mutation: Mutates the input ndarray by filling it with a specified value, overwriting existing elements.
                         Shape Consistency: Does not change the shape of the array; the size and dimensions remain the same.
                         Reduction: Not a reduction operation; it modifies the existing array by replacing its elements.
                         Suitability: Suitable for ndarrays.
      1.16. flat
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a flattened array; changes the shape of the array to one dimension.
                         Reduction: Not a reduction operation.
                         Suitability: Suitable for ndarrays; simplifies processing or manipulation.
      1.17. flatMap
                         Mutation: Does not mutate the input ndarray.
                         Shape Consistency: Returns a flattened array; changes the shape of the array to one dimension.
                         Reduction: Not a reduction operation.
                         Suitability: Suitable for ndarrays.

  1. Performance Optimization:
  • Performance optimization is another key focus area for this project. This includes enhancing iteration efficiency, minimizing memory overhead, and other techniques to improve computation speed. These techniques could include but are not limited to employing multiple checks to determine if blocked iteration can be avoided or if shape and stride can be ignored without affecting the outcome etc, and efficient memory management will be explored and implemented to enhance the performance of ndarray operations.
  1. Error Handling and Documentation:
  • Clear and comprehensive error messages will be provided to aid developers in identifying and resolving issues more effectively. Additionally, documentation will be created or updated to provide detailed usage examples, explanations of method behaviors, and best practices, ensuring a smooth developer experience.

Why this project?

Since I have only experience working with simple 2-D arrays, the opportunity to work on ndarrays within the stdlib framework really excites me. The prospect of contributing to a project that focuses on enhancing the functionality and efficiency of multidimensional arrays in JavaScript is highly appealing to me.

Firstly, the ndarray module plays a crucial role in numerical computing and data analysis tasks. By improving the ndarray API and optimizing its performance, the developers can work more efficiently with complex data structures with ease.

Secondly, I am intrigued by the technical challenges associated with achieving ndarray API parity with built-in JavaScript arrays. This involves implementing various methods and functionalities to ensure that ndarrays offer the same level of flexibility as native arrays while also optimizing their performance.

Furthermore, I am excited about contributing to an active and vibrant community of developers allowing me to learn from others, share my insights, and collaborate on solving problems together. The opportunity to make meaningful contributions to a project with real-world applications is both fulfilling and motivating.

Qualifications

As a self-taught programmer with a background in Mechanical Engineering, I have sharpened my skills through hands-on projects and self-directed learning. While I may not have formal qualifications in specific areas like statistics, I have a solid foundation in Numerical Methods and Differential Equations, which I acquired during my studies in Mechanical Engineering. Also my experience and approach to learning make me well-suited to execute on my proposal for the following reasons:

  1. Programming Skills:
  • I have developed proficiency in languages like JavaScript, HTML, CSS, and C++. Through projects like the Battleship game, Weather app, Task trackr and various data structure implementations, I have demonstrated my ability to tackle complex programming challenges and write clean, maintainable code.

  • I have also completed several projects that required mathematical reasoning and problem-solving skills, such as implementing algorithms for differential equations and exploring fundamental data structures like Hashmaps, Linked Lists, and Binary Search Trees. These experiences have provided me with a solid foundation for tackling new challenges and adapting to different problem domains.

  1. Self-Directed Learning:
  • My journey with The Odin Project and other self-learning platforms has equipped me with the ability to learn new concepts and technologies independently. I am proactive in seeking out documentation, and other resources to deepen my understanding of relevant topics.
  1. Adaptability and Resourcefulness:
  • My background in Mechanical Engineering has taught me how to approach problems systematically, break them down into manageable tasks, and leverage available resources to find solutions.

I am committed to bridging any knowledge gaps through self-study and collaboration with domain experts. I am passionate about expanding my skill set and contributing meaningfully to projects.

Prior art

During my research to understand ndarrays and achieving ndarray API parity with built-in JavaScript methods, I came across NumPy's implementation of ndarrays. It offers an ndarray object along with a set of array manipulation functionalities like ndarray.reshape to reshape the array to a new shape, ndarray.flatten to flatten the array into a 1-D array, ndarray.sort to sort the elements of the array and many others.

While there isn't direct implementation of ndarray API parity with built-in JavaScript, exploring how these NumPy handles array manipulation, iteration, and performance optimization aided me in formulating a rough concept of how we can achieve our goals.

Commitment

Given my current status as a full-time learner in programming and web development, I don't have any other commitments, I can commit to full-time working hours (35 - 40 hours/week) on the project.

Schedule

Assuming a 12 week schedule,

  • Community Bonding Period: Actively engage with the stdlib community and mentors to establish rapport and to discuss finer details and clarify any doubts. Continue to familiarize with the ndarrays implementation and methods offered by stdlib. Research existing implementations and best practices for efficient iteration over non-contiguous data, focusing on how its handled in NumPy.

  • Week 1: Generate a list of built-in array API methods applicable for ndarrays, considering factors such as mutation, shape preservation, reduction, and suitability for ndarrays. Share research findings and proposed approach with the mentors. Begin implementations of custom kernels for forEach, map, and flat API. Draft documentation for each method.

  • Week 2: Focus on implementing forEach, map, and flat methods for ndarrays. Open PR for these methods. Continue documentation efforts. Seek feedback from mentors on effectiveness and suitability.

  • Week 3: Based on the feedback received, Iterate on the implementations and refine them. Start working on prototype implementations for reduce and reduceRight methods. Draft documentation for each method.

  • Week 4: Open PR from reduce and reduceRight methods. Share progress with mentors and gather feedback. Refine the implementation based on the feedback received. Start Implementation of indexOf and lastIndexOf methods.

  • Week 5: Focus on the implementation of indexOf and lastIndexOf. Continue drafting documentation efforts. Open PR for indexOf and lastIndexOf method implementations. Seek feedback and iterate over the implementations based on the feedback.

  • Week 6: (midterm) Seek feedback and evaluate progress made so far. Start implementation of findIndex and findLastIndex. Continue documentation efforts.

  • Week 7: Open PR for findIndex and findLastIndex implementations. Start working on implementing filter, find, and findLast.

  • Week 8: Open PR for filter, find and findLast methods. Continue documentation efforts. Begin working on implementations of sort and toSorted methods.

  • Week 9: Open PR for sort and toSorted methods. Gather feedback. Iterate over the implementations based on the feedback. Begin implementations of toReversed method. Continue documentation efforts.

  • Week 10: Open PR for toReversed method. Begin working on implementations of fill method.

  • Week 11: Open PR for fill method implementation. Seek feedback. Iterate over the implemented methods based on the feedback.

  • Week 12: Seek feedback from the mentors. Finalize any pending tasks and address any remaining issues or bugs. Conduct final round of testing and quality assurance checks. Polish documentation and ensure it is comprehensive and well-organized. Prepare final project report summarizing achievements and challenges faced.

  • Final Week: Address any last-minute feedback. Submit project deliverables and documentation.

  • Post Gsoc: After GSoC, my focus will shift towards incorporating support for bool dtype and integrating the JavaScript API methods reliant on boolean arrays, such as every, some, and includes.

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to setup your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@headlessNode headlessNode added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 20, 2024
@kgryte
Copy link
Member

kgryte commented Mar 21, 2024

A few comments:

  1. indexOf, includes, lastIndexOf, every, some, and find should be capable of operating over a list of specified dimensions. In which case, you won't return a single value, but an array of reduced rank.
  2. filter should return a flattened array, as otherwise you'd need to support ragged dimensions, which we cannot do.
  3. In order to support return an array of booleans, we need to add support for bool dtypes in ndarrays, which we do not currently support. Related: [Idea]: add support for boolean arrays in stdlib #43
  4. You've omitted various other built-in array methods (e.g., concat, copyWithin, fill, findIndex, findLast, findLastIndex, flat, flatMap, join, reduceRight, toReversed, toSorted, with). Can you comment on why you omitted those?
  5. I suggest updating your timeline with when you expect to open various PRs. Currently, the language is fairly vague (e.g., "begin experimenting", "prototyping", etc).

@headlessNode
Copy link
Member Author

@kgryte Thank you for your detailed response.

  1. I've modified the indexOf. However, I think that some, every, find and includes will not return an array but a single boolean indicating whether the condition holds true or otherwise, Please guide me in this matter.

  2. I've modified the specifications for filter method.

  3. related to point 1.

  4. I didn't include concat, copyWithin, join, and with because I'm unsure of how they would be implemented for ndarrays. As for the remaining methods, I've now added them. The omission of those was due to their similarities to the ones already included, but I recognize their importance and have included them in the proposal. Apologies for the oversight, and thank you for bringing it to my attention.

  5. Timeline updated.

Again, thank you for providing the feedback. Your suggestions have been really helpful in coming up with the project proposal and refining it.

@kgryte
Copy link
Member

kgryte commented Mar 21, 2024

Re: 1. You're assuming that the operation will be across the entire array. Meaning, if we're provided a 10D array or a 1D array, it is the same. You're proposal suggests that in both cases the result of, e.g., every will be the equivalent of flattening any input array to 1D and then checking each element.

What I am saying is that, while that is fine for a default, we're also likely to want to perform row-wise or column-wise operations. Consider the matrix

x = [
  0 1 0
  1 0 1
  0 0 0
]

We should be able to call every( x, axis ), where axis specifies the dimension over which to perform the reduction. E.g., if say, axis=0 means across rows, then the result should be

y = [
  true true true
]

and if, say, axis=1 means across columns, then the result should be

y = [
  true
  true
  false
]

See the difference? In the first case, y should be 1x3 and in the second case 3x1, assuming that keepdims is true, as in NumPy.

@headlessNode
Copy link
Member Author

@kgryte Thank you for the detailed explanation. I could research how NumPy achieves the similar by the np.all, np.any, np.where etc methods. But then, as you said, we'd need add the support for bool dtypes. I've gone through the #43 issue, and it looks like it requires considerable time and effort.

I see two possible paths forward. The first option would involve incorporating the bool dtype support project into this one, potentially sacrificing some of the array API methods that are not directly related to this issue. However, this approach might affect the scope and timeline of the project.

Alternatively, we could focus on implementing methods that do not depend on returning an array of booleans for now. After GSoC, we could revisit the issue and work on adding support for bool dtypes, enabling us to implement methods that depend on them.

I lean towards this second option due to it being more focused and achievable project scope.

I'm open to any suggestions or alternative approaches you may have regarding this matter. So please let me know your thoughts.

@kgryte
Copy link
Member

kgryte commented Mar 22, 2024

Pursuing the second option seems sensible.

@headlessNode
Copy link
Member Author

Great, I'll update the proposal accordingly. Thank you for the guidance and support.

@headlessNode
Copy link
Member Author

Hi @kgryte, before submitting the final proposal to the GSoC website, I wanted to confirm with you, if it would be appropriate for me to include what each package of the implementation will include?

For instance, for the implementation of forEach, we will have distinct modules for dimensions up to 10d. For each dimension, we'll include simple nested iteration, accessor, and blocked iteration implementations. And, for dimensions exceeding 10d, we'll include nd and nd_accessor implementations.

@kgryte
Copy link
Member

kgryte commented Mar 31, 2024

@headlessNode Yes, that seems reasonable. Note, however, this structure may not be necessary for all the APIs in this proposal. For APIs needing element-wise iteration (e.g., forEach, map), this should suffice. For others, such as reduction APIs, what those implementations will look like is TBD.

@headlessNode
Copy link
Member Author

@kgryte Thanks for the confirmation. And yes, in the final proposal, I already added an implementation overview for the forEach method. I explicitly mentioned that this implementation overview may not encompass each of the proposed methods.

@kgryte kgryte closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024 2024 GSoC proposal. rfc Project proposal.
Projects
None yet
Development

No branches or pull requests

2 participants