Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: expand support for additional pseudorandom number generators #47

Closed
6 tasks done
PraneGIT opened this issue Mar 19, 2024 · 3 comments
Closed
6 tasks done
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@PraneGIT
Copy link

PraneGIT commented Mar 19, 2024

Full name

Pranjal Jha

University status

Yes

University name

IIIT Jabalpur

University program

Computer Science Engineering

Expected graduation

31-5-2025

Short biography

Hey, I am Pranjal Jha, a developer with a passion for backend development and proficiency in JavaScript, coupled with my academic background as a third-year Computer Science student at IIIT Jabalpur, along with internship experience in Android development and proficiency in Node.js,Golang, C, and C++, I believe I am uniquely suited for the project focused on expanding the selection of PRNGs in stdlib.

My diverse skill set spanning across mobile development, backend development, and proficiency in multiple programming languages gives me a holistic understanding of software development. This enables me to approach the project from various angles and ensure compatibility across different platforms and environments. My passion for backend development, demonstrated through my proficiency in Golang and Node.js, drives my interest in projects like expanding the standard library's PRNGs. I understand the importance of robust and efficient random number generation in backend systems, especially in applications requiring numerical simulations or cryptographic operations.

Timezone

India Standard Time

Contact details

email : pranjaljha00@gmail.com

Platform

Linux

Editor

I predominantly use Visual Studio Code (VS Code) as my preferred code editor. Its seamless integration with Ubuntu's ecosystem, along with its extensive range of features and plugins, makes it a versatile and efficient tool for software development.
VS Code's intuitive user interface, robust debugging capabilities, and support for various programming languages enhance my productivity and streamline my workflow.

Programming experience

I have experience in competitive programming, on codeforces I am rated about 1300 and use C++ as my primary language. Apart from compitative programming I have made several projects like gRPC – complete which is simple gRPC project that demonstrates unary,client-server,bidirectional communication between a server and a client using Go programming language, while attempting a simple implementation of containerization and creating a pod/deployment.
Apart from that I am also experienced in android development, I have also made an expense splitting app ,which helps in split bills and transactions for user with his friends.

JavaScript experience

In JavaScript, I have substantial experience gained through both academic projects and practical applications. I've developed interactive web applications, dynamic user interfaces, and backend services using frameworks like Node.js. I'm proficient in concepts such as asynchronous programming, closures, callbacks, promises, and ES6 features, enabling me to write clean, efficient, and maintainable code. Additionally, I have experience with popular libraries and frameworks like React.js and Express.js

M favorite JS feature is async/await. This feature allows for asynchronous code to be written in a synchronous-like manner, enhancing code readability and maintainability, especially in scenarios where multiple asynchronous operations need to be coordinated or chained together.

I personally do not like the === operator,In many cases, developers prefer strict equality comparisons (===), which require both value and type to be the same for the comparison to evaluate to true.But languages like C++ or even Java handel it much better.

Node.js experience

I have made several personal projects in NodeJS, mostly using the express library for making server side applications using postgres SQL,mySQL,mongoDB as their database. I have also implemented many middlewares in these applications for auth and routing.

C/Fortran experience

In C, I have a solid foundation acquired through coursework, personal projects, and competitive programming. I'm comfortable with low-level programming concepts, memory management, data structures, and algorithm implementation. I've worked on projects ranging from system-level programming to embedded systems development.

Interest in stdlib

stdlib combines the 2 things I really like working on maths and web dev,
In stdlib I am thoroughly impressed by the already implementations of Base (i.e., lower-level) pseudorandom number generators (PRNGs).
and their implementations.
It would be a learning experience for me to implement more of these PRNGs which are not yet implemented in this esteemed organization.

Version control

Yes

Contributions to stdlib

so far in stdlib I have merged 3 pull requests and have 1 unmerged open pull request

detail for merged PRs:

  1. feat: add iter/do-while-each:

adds an iterator utility to invoke a function for each iterated value until a predicate function returns false.
This is the iterator equivalent of @stdlib/utils/do-while-each.

2 . feat: add utils/every-own-by:

adds a utility to test whether every "own" property of a provided object satisfies a predicate function.
This is the object keys equivalent of @stdlib/utils/every-by.

  1. feat: add iter/sequences/tribonacci:

adding the package @stdlib/math/iter/sequences/tribonacci.
The package is similar in structure as @stdlib/math/iter/sequences/fibonacci.
Package: @stdlib/math/iter/sequences/tribonacci

Detail for unmerged PR:

  1. refactor: update blas/ext/base/sdssumpw to follow current project conventions:

refactors @stdlib/blas/ext/base/sdssumpw to follow current project conventions.
As outlined in Refactor and update existing base BLAS packages according to current project conventions #788.

Goals

Currently, the library offers limited options for PRNGs. This project proposes adding 8 new algorithms (PCG32, Philox, Xorshiro++/Xoshiro**,Arc4Random,ChaCha20 & TinyMT32/64) to provide developers with a wider range of choices depending on their specific needs. The inclusion of a cryptographically secure PRNG like ChaCha20, empowers developers to build more robust security features into their applications, especially those involving sensitive data.

This project has the potential to significantly enhance the @StdLib\random library, providing developers with a broader spectrum of high-performance, statistically sound PRNG

image

All module to be implemented will be similar in structure to @stdlib/random/base/mt19937

providing a comparative analysis of five pseudorandom number generator (PRNG) algorithms: (PCG32, Philox, Xorshiro++/Xoshiro**,Arc4Random,ChaCha20 & TinyMT32/64). Each entry highlights the key strengths and limitations of the respective PRNG, aiding developers in selecting the most suitable option for their specific needs:

  1. PCG Family (focusing on PCG32):
  • Strengths:
    • PCG generators excel in empirical statistical tests, demonstrating exceptional randomness properties.
    • These generators offer periods of virtually any size, ensuring an immense pool of unique random sequences.
    • Their high-quality randomness and customizable period lengths make them adaptable to diverse applications.
    • PCG generators are recognized for their speed and efficient memory usage.
  • Limitations:
    • While robust for general-purpose randomness, PCG is not demonstrably cryptographically secure, limiting its use in security-sensitive applications.
    • Being a newer entrant compared to some other PRNGs, PCG may not have undergone as extensive scrutiny or achieved widespread adoption.
  1. Philox:
  • Strengths:
    • Philox boasts exceptional speed, making it ideal for tasks requiring generation of vast quantities of random numbers.
    • This PRNG features a remarkably long period, significantly reducing the likelihood of encountering repeating sequences in generated numbers.
    • Philox thrives in parallel computing environments, allowing simultaneous generation of multiple independent random streams.
  • Limitations:
    • Philox is not considered cryptographically secure, limiting its use in security-critical applications.
    • Some statistical tests have shown minor biases in Philox's output, which might be a concern for applications requiring the highest quality randomness.
  1. xoshiro128**:
  • Strengths:
    • xoshiro128** is relatively easy to implement, requiring only simple bitwise operations.
    • Its simplicity translates to ease of implementation and application across various contexts.
  • Limitations:
    • xoshiro128**'s relatively short period may limit its usefulness in applications demanding very long sequences of unique random numbers.
    • Following the generation of a single random number, xoshiro128**'s output becomes predictable, raising concerns for specific applications.
  1. xoshiro128++:
  • Strengths:
    • xoshiro128++ variant typically offer better statistical properties compared to basic xoshiro128** algorithms.
    • Its simplicity translates to ease of implementation and application across various contexts.
    • xoshiro128++ excels at producing 32-bit numbers or streams of random bits.
  • Limitations:
    • Similar to xoshiro128**, xoshiro128++ may have a limited period length, which can impact its suitability for applications requiring very long sequences of random numbers.
  1. ChaCha20:
  • Strengths:
    • ChaCha20 is demonstrably cryptographically secure, making it suitable for applications requiring secure random numbers.
    • It exhibits good efficiency, particularly compared to some other cryptographic PRNGs.
    • ChaCha20 supports variants like ChaCha8 that trade off some security for enhanced speed.
  • Limitations:
    • Implementations may be more complex compared to non-cryptographic PRNGs, potentially requiring more computational resources.
    • Despite a sizeable internal state, ChaCha20's period might be smaller than anticipated, necessitating caution in some applications.
  1. Arc4random:
  • Strengths:
    • Arc4random is considered cryptographically secure for most purposes, making it suitable for security-sensitive applications.
    • It balances randomness quality and performance effectively, demonstrating efficiency in both time and space complexity.
    • Readily available in BSD-based systems, Arc4random simplifies integration into applications requiring high-quality randomness.
  • Limitations:
    • Proper initialization is crucial to ensure randomness. Improper usage may lead to predictability issues.
    • Improper usage or implementation flaws could introduce biases in generated random numbers, potentially leading to security vulnerabilities.
  1. TinyMT32:
  • Strengths:
    • TinyMT32 is known for its high speed and low memory usage, making it suitable for resource-constrained environments.
  • Limitations:
    • The small state size of TinyMT32 may limit its suitability for applications requiring very large periods or high-dimensional random numbers.
  1. TinyMT64:
  • Strengths:
    • TinyMT64 offers a longer period compared to TinyMT32, making it suitable for applications requiring more extensive sequences of random numbers.
  • Limitations:
    • While larger than TinyMT32, the state size of TinyMT64 may still be insufficient for some applications requiring extremely long periods or high-dimensional random numbers.

Why this project?

Developing high-quality PRNG algorithms requires a deep understanding of mathematical concepts, algorithmic efficiency, and cryptographic principles. As someone passionate about algorithms and cryptography, I am excited about the technical challenges involved in designing and implementing robust PRNGs.
As someone with a background in computer science, cryptography, and software development, this project aligns perfectly with my interests and expertise. I bring a combination of theoretical knowledge and practical experience to the table, enabling me to make meaningful contributions to the project.

Qualifications

With experience in JavaScript, C, Kotlin and C++, I have a solid foundation in programming languages commonly used for software development. This enables me to understand and implement algorithms efficiently across different platforms.

As a 3rd year student at IIIT Jabalpur my coursework in computer science , including courses in mathematics and cryptography, has equipped me with a strong mathematical foundation. Understanding the mathematical principles behind PRNG algorithms is crucial for designing and implementing them effectively. Through competitive programming on platforms like Codeforces, I have honed my skills in algorithm design and analysis. This experience allows me to approach the development of PRNG algorithms with a systematic and analytical mindset, ensuring their correctness and efficiency. Courses in advanced cryptography have provided me with insights into cryptographic primitives, randomness requirements, and security considerations.

This knowledge is invaluable for designing PRNGs that meet the stringent requirements of cryptographic applications .As an Android developer with internship experience, I am familiar with software development processes, version control systems, and collaborative tools. I understand the importance of writing clean, maintainable code and adhering to best practices in software engineering.

Prior art

Here is what I found while researching for the project:
https://www.pcg-random.org/
https://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2075r2.pdf
https://www.jstatsoft.org/article/download/v008i14/916
https://xilinx.github.io/Vitis_Libraries/security/2019.2/guide_L1/internals/chacha20.html#:~:text=Chacha20%20is%20a%20cipher%20stream,4*32%2Dbit%20words
https://man.openbsd.org/arc4random.3
https://prng.di.unimi.it/

which In my opinion provides enough resources to implement these set of PRNGs in stdlib.

Commitment

1 May - 26 May -> Bonding Period
27 May - 7 July -> 40 hours/week ( 40 * 6 )
8 July - 17 August -> 20 hours/week ( 20 * 6 )
Total = 240 + 120 = 340 hours

Schedule

Assuming a 12 week schedule,

  • Community Bonding Period:
    During this period, I will work to explore the project's codebase. Paying attention to the organization of the code, existing PRNG implementations (mainly of mt19937), and any relevant documentation or comments.

  • Week 1:
    Familiarization and Setup:

    • Study the documentation and source code of existing PRNG implementations (such as MT19937).
    • Understand the basics of PCG, Philox, Xorshift, ChaCha20, and Arc4random.
    • Make sure to understand the already implemented MT19937 file structure to implement other PRNGs like this.
    • Will add the stream, iter , array , and strided higher level wrappers initial setup.
  • Week 2:
    Implement PCG Family:

    • Begin by implementing PCG, focusing on PCG32 initially.
    • Test the implementation thoroughly to ensure correctness and adherence to the specifications.
    • Document the implementation process and any challenges encountered.
  • Week 3:
    Extend PCG Implementation:

    • Extend the PCG implementation to support additional variants if necessary (e.g., PCG64).
    • Explore and implement features like jump-ahead and distance-between-states.
    • Conduct performance tests to assess the speed and memory usage of the implemented variants.
  • Week 4:
    Implement Philox PRNG:

    • Study the Philox algorithm and its specifications.
    • Implement the Philox PRNG in accordance with the provided documentation.
    • Test the implementation for speed, period length, and potential biases.
  • Week 5:
    Implement xoshiro128**:

    • Address any identified issues or optimizations required.
    • Test thoroughly for speed, period length, and potential biases.
  • Week 6: (midterm)
    Implement xoshiro128++:

    • Rather than using multiplication, it is possible to use addition as a faster non-linear transformation.
    • Study the xoshiro128 algorithm and available implementations.
    • Implement the chosen variant, ensuring correctness and efficiency.
  • Week 7:
    Implement ChaCha20 PRNG:

    • Study the ChaCha20 algorithm and available implementations.
    • Choose appropriate parameters and variants for implementation.
    • Implement the ChaCha20 PRNG according to the chosen specifications.
  • Week 8:
    Implement Arc4random

    • Study the Arc4random algorithm and its implementation in BSD systems.
    • Implement Arc4random, ensuring compatibility and security considerations.
    • Test the implementation against known test vectors and cryptographic standards.
  • Week 9:
    Implement TinyMT32:

    • Implement TinyMT32, ensuring correctness and efficiency.
    • Thorough testing for statistical soundness.
  • Week 10:
    Implement TinyMT64:

    • Implement TinyMT64, ensuring correctness and efficiency.
    • Thorough testing for statistical soundness.
  • Week 11:
    Documentation and Finalization:

    • Apply optimization techniques to improve performance without sacrificing randomness quality.
    • Benchmark the PRNGs against each other and against existing implementations to evaluate performance gains.
  • Week 12:
    Documentation and Finalization:

    • Conduct final testing and validation of the entire PRNG library.
    • Address any remaining issues, bugs, or optimizations.
    • Prepare comprehensive documentation and final deliverables for submission
  • Final Week:
    I will try to complete and wrap up all the things during this week and submit it and take suggestions from mentors.

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to setup your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

stdlib-js/stdlib#200
stdlib-js/stdlib#199
stdlib-js/stdlib#201

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@PraneGIT PraneGIT added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 19, 2024
@kgryte
Copy link
Member

kgryte commented Mar 26, 2024

@PraneGIT A few comments:

  1. In general, we are not concerned with cryptographically secure PRNGs at this point, as we don't (yet) want the burden and responsibility of ensuring cryptographically secure anything.
  2. I think a reasonable target is one new PRNG every week or two, so aiming for 8 PRNGs seems reasonable to me.
  3. Apart from the base implementation, we'll want to add the stream, iter, array, and strided higher level wrappers. Ideally, this will be largely automated, but there may be some initial setup work.
  4. One thing to consider is that we are biased toward 32-bit generators. 64-bit generators are possible, but we'd need to figure out the backward compatibility story, as we support older environments lacking, e.g., BigInt support.

Other PRNGs:

There are likely many more. The main thing is that they be reasonably well-known and permissively licensed. We cannot use anything GNU licensed. So nothing straight from GSL.

@PraneGIT
Copy link
Author

PraneGIT commented Mar 29, 2024

@kgryte Thank you for your feedback and suggestions regarding the proposed project. I appreciate the opportunity to refine the proposal based on the project's requirements and priorities.
Upon further research I have decided to work on these PRNGs, I have added Xoshiro++/** as I believe they are more useful and popular 32-bit generators.

Also I have added TinyMT32/64 in my proposal.
Now I have a total of 8 PRNGs to implement this summer.
I will add the stream, iter, array, and strided higher level wrappers within the 1st week of my project and will ensure that all selected PRNGs adhere to permissive licenses and are compatible with the project's licensing requirements. This includes avoiding PRNGs licensed under GNU licenses.

please have a review of updated issue.

@kgryte
Copy link
Member

kgryte commented Mar 31, 2024

Thanks for the updates.

@kgryte kgryte closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024 2024 GSoC proposal. rfc Project proposal.
Projects
None yet
Development

No branches or pull requests

2 participants