Performance Test on data.table Issues.qmd file #16

Merged

DorisAmoakohene
Contributor

Working on a GitHub Action blog post that explains how to run an action for every pull request.

@DorisAmoakohene
Contributor Author

@tdhock @Anirban166 can you please proofread this. Thanks

output: html_document
---

Since August 2023, I have been working on Expanding the open-source Ecosystem around data.table In R. data.table has become a widely adopted tool for data manipulation tasks, especially in scenarios involving large datasets. Its popularity stems from its remarkable speed and memory efficiency.
Contributor

I have been working on expanding... -> I think it would be more appropriate to say that you have been working on performance testing, which could be useful for expanding the ecosystem, because it could increase confidence that code contributions are maintaining the efficiency of data.table

Contributor Author

noted

data.table is an extension of R's data.frame, designed to handle large datasets efficiently. It provides a syntax that is both concise and expressive, allowing users to perform complex data manipulations with ease. Its efficiency is particularly evident when dealing with tasks like filtering, grouping, aggregating, and joining data.

The development team behind data.table is committed to continuously improving its performance. Over the years, several major version changes have been introduced, aiming to enhance speed and efficiency. These changes include algorithmic optimizations, memory management improvements, and enhancements to parallel processing capabilities. Upgrading to the latest version ensures that users can leverage the most recent performance enhancements.
most recent performance enhancements.
Contributor

please delete repeated most recent performance enhancements

Contributor Author

done.


# **Benchmarking for Performance Evaluation**

To evaluate data.table performance, it is essential to employ benchmarking methodologies. The approach I used utilizes the atime_versions function from the atime package, which measures the actual execution time of specific operations. This function allows for accurate comparisons between different versions of the data.table package, by benchmarking against realistic use cases and giving a graphical visualization of the results.
Contributor

time and memory usage

Contributor Author

done


# **Why do we run Performance Tests on commits?**

Running performance tests on commits helps maintain a high-performance standard for the package, detect and fix performance regressions, optimize code, validate performance improvements, and ensure consistent performance over time. It is an essential practice to deliver a performant and reliable package to end-users.
Contributor

and to encourage confidence in code contributions from new people

Contributor Author

resolved

# **What are the Performance Tests?**
The goal of our atime Performance Tests is to gather memory and responsiveness time metrics while simulating the full range of member interactions with the data.table repository
Contributor

responsiveness? member interactions?

Contributor Author

updated

## In atime code, there are five main parts:
Contributor

When using atime_versions, there are five main arguments

Contributor Author

thanks, updated

1. `pkg.path`: This variable represents the path to the package being benchmarked. It specifies the location of the `data.table` package on your system.
Contributor

variable -> argument

Contributor

also it should be a git clone, not just the package

Contributor Author

updated


2. `N`: This argument is a sequence of numbers that defines the different data sizes at which the performance of the operation is measured.

3. `setup`: This section contains the setup code for generating the dataset used in the benchmarking process.
Contributor

it should depend on N

Contributor Author

well noted

4. `expr`: This section contains the expression that represents the operation being benchmarked. It uses the `data.table::`[.data.table`` syntax to perform the operation on the dataset.
Contributor

please clarify why data.table::[.data.table is necessary

Contributor Author

done


5. `...` : This section specifies the different versions of the data.table packages that will be tested. It includes three versions: "Before," "Regression," and "Fixed." Each version is associated with a specific commit id.
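
The relationship between `N`, `setup`, and `expr` in the list above can be sketched in base R, without atime: for each data size in `N`, the setup code builds a dataset of that size, and only the expression is timed. This is a simplified stand-in for the idea, not the atime implementation; the names `N.values`, `time.one.size`, and `timings` are hypothetical, and `tapply` is used here only as an example operation.

```r
# A base-R sketch of the N / setup / expr split (not the atime implementation).
N.values <- 10^seq(1, 4)  # data sizes to try: 10, 100, 1000, 10000

time.one.size <- function(N) {
  # setup: build a dataset whose size depends on N (not timed)
  set.seed(1)
  DF <- data.frame(id = sample(10, N, replace = TRUE), v = rnorm(N))
  # expr: the operation being measured (timed)
  elapsed <- system.time(tapply(DF$v, DF$id, mean))[["elapsed"]]
  data.frame(N = N, seconds = elapsed)
}

# One row of timings per data size.
timings <- do.call(rbind, lapply(N.values, time.one.size))
print(timings)
```

Keeping data generation outside the timed expression means the measurements reflect the operation itself rather than the cost of building the data.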

The result from running the atime versions will be a list of the seconds.limit (numeric input param) and timings (data table of results).
Contributor

either delete seconds.limit, or explain it better please

Contributor Author

okay

Lastly, I run a github action. The action defines test.list as a list with names corresponding to different tests. Each element of the test.list should be a list with named arguments N, setup, expr, which was passed as arguments in your atime::atime_versions test. For further elaboration on the process of performing asymptotic time testing using the atime package, please refer to [this ](https://github.com/marketplace/actions/r-asymptotic-testing)
Contributor

is the github action working already? on which repo did you set it up?

Contributor

if not, maybe delete?

Contributor Author

Not yet, it's still being worked on, so I will delete it for now, and later I can write about that.

## We run the full performance, Pull Request (PR):
1. Before the issue is made (Before)
Contributor

issue -> performance regression

Contributor Author

well noted


2. when the PR is first submitted (Regression)
Contributor

performance regression

Contributor Author

done

3. when the PR is merged to the destination branch(Fixed)
Contributor

PR which fixes the performance regression

Contributor Author

noted

# **APPROACH**
1. To begin, conduct the atime test for the different code branches (before regression, regression, fix regression) to identify potential performance issues. Here is an example of how to perform the [atime test](https://github.com/DorisAmoakohene/Efficiency-and-Preformance-Test.RData.table)
Contributor

Preformance? typo?

Contributor Author

Typo, fixed

NB: Set up the necessary environment and dependencies, ensuring that the data.table package and the atime package are installed and loaded.
Contributor

please use full English words instead of abbreviations like NB

Contributor Author

okay


3. Utilize the atime_versions function to track the fixes across different versions.

4. Pass the following named arguments to atime::atime_versions: N, setup, expr, and the different code branches. More documentation of the atime package can be found [here](https://github.com/tdhock/atime/tree/compare-dt-tidy).
Contributor

please change link to main github https://github.com/tdhock/atime instead of branch

Contributor Author

alright

5. Use the plot function to visually present the execution times of the expression evaluated across different versions of the data.table package.
Run the GitHub Action by writing tests in inst/atime/tests.R.
Contributor

why inst/atime/tests.R here?

Contributor Author

deleted. updated.


# Lets run some examples to see how this work.
Contributor

works

Contributor Author

updated

The first example we will discuss is an issue reported on a performance regression when performing group computations, specifically when running R's C eval on each group (q7 and q8) in the db-benchmark, indicating a slowness in the implementation of the code.[link to comment that reported Regression](https://github.com/Rdatatable/data.table/issues/4200)
Contributor

which PR caused the regression?

Contributor Author

updated


atime.list.4200 <- atime::atime_versions(
pkg.path=tdir,
pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
Contributor

please explain what is pkg.edit.fun

Contributor Author

done

Comment on lines 152 to 155
png("atime.list.4200.png")
plot(atime.list.4200)+
labs(title = "groupby with dogroups (R expression) performance regression")
dev.off()
Contributor

probably delete for blog

Contributor Author

okay

Comment on lines 170 to 172
3. Red Line (“Before”): Indicates performance before fixing the regression.

4. Green Line (“Fixed”): Shows improved performance after fixing.
Contributor

please change colors (red and green should not be used on same plot, for color blind people), try https://r-graph-gallery.com/38-rcolorbrewers-palettes.html or https://colorbrewer2.org/#type=qualitative&scheme=Accent&n=3

Contributor Author

okay


5. Blue Line (“Regression”): Represents an ideal or target performance level.
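
One way to act on the review suggestion to avoid red and green on the same plot, sketched in base R (R 4.0.0 or later): take three colors from the Okabe-Ito palette, which was designed to stay distinguishable under common forms of color blindness, and name them after the three versions. The variable `version.colors` is hypothetical, and how the colors are then applied to the atime plot (for example through a manual color scale) is left out of this sketch.

```r
# Three color-blind-friendly colors from base R's Okabe-Ito palette (R >= 4.0.0).
version.colors <- unname(grDevices::palette.colors(3, palette = "Okabe-Ito"))
# Label each color with the version it will represent in the plot.
names(version.colors) <- c("Before", "Regression", "Fixed")
print(version.colors)
```

Referring to the lines by version name rather than by color in the text also keeps the description valid if the palette changes later.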

In both graphs, as data size (N) increases, there’s an initial increase in median time, but it significantly reduces fix, indicating performance improvement. The regression issue was successfully addressed
Contributor

what are both graphs? isn't there just one with three lines?
please clarify "significantly reduces fix"

Contributor Author

clarified

```{r,warning = FALSE, message = FALSE}
atime.list.5366 <- atime::atime_versions(
pkg.path=tdir,
pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
Contributor

if this is the same pkg.edit.fun, please define it once above and use it in both calls to atime_versions. If it is different, please explain why

Contributor Author

It's the same, so I now define it just once.

Comment on lines 227 to 229
"Before"="be2f72e6f5c90622fe72e1c315ca05769a9dc854",
"Regression"="e793f53466d99f86e70fc2611b708ae8c601a451",
"Fixed"="58409197426ced4714af842650b0cc3b9e2cb842")
Contributor

would be good to explain where these commit ids come from

Contributor

need comments or explanation in text


# **Why do we run performance tests on commits?**

Running performance tests on commits helps maintain a high-performance standard for the package, detect and fix performance regressions, optimize code, validate performance improvements, ensure consistent performance over time and to encourage confidence in code contributions from new people.
Contributor

isn't this section redundant with the first section?

Contributor

please move sections/sentences together if they are very similar


# **Benchmarking for performance evaluation**

To evaluate data.table performance, it is essential to employ benchmarking methodologies. The approach I used utilizes the atime_versions function from the atime package, which measures the actual execution time of specific operations. This function allows for accurate comparisons between different versions of the data.table package, by benchmarking against time and memory usage and giving a graphical visualization of the results.
Contributor

this sentence which introduces atime/versions should be moved closer to the section below where you explain its usage


5. `expr`: This section contains the expression that represents the operation being benchmarked. It uses the `data.table::`[.data.table`` syntax to perform the operation on the dataset.

In the given syntax `data.table::`[.data.table``, the first part `data.table::` installs and loads different versions of the data.table package based on the specified commit IDs. Following that, the expression specified within `[.data.table`` is executed on each installed version. This process is repeated for all the specified commit IDs in the code.
Contributor

if your code contains backticks, then you need to use one more backtick in the delimiters.
data.table::`[.data.table`
``data.table::`[.data.table` ``

Contributor

and you should write that data.table:: will be translated to data.table.SHA1:: for some version hash SHA1

Contributor Author

okay
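
The renaming discussed in this thread can be illustrated with plain string substitution. This is only a sketch of the idea that `data.table::` is translated to `data.table.SHA::` for a given commit; it is not how atime itself rewrites code, and the helper `versioned.call` is hypothetical. The SHA is the one quoted in the post's own example.

```r
# Hypothetical helper: rewrite "PKG::" (and so "PKG:::") to "PKG.SHA::" in code text.
versioned.call <- function(code, pkg, sha) {
  gsub(paste0(pkg, "::"), paste0(pkg, ".", sha, "::"), code, fixed = TRUE)
}

# The expression as written by the user, as text.
code <- "data.table:::`[.data.table`(DT, , .(v3=mean(v3, na.rm=TRUE)), by=id3)"
sha <- "ec1259af1bf13fc0c96a1d3f9e84d55d8106a9a4"
versioned.code <- versioned.call(code, "data.table", sha)
cat(versioned.code, "\n")
```

Because each commit is installed under its own package name, several versions can be loaded side by side and the same expression can be timed against each of them.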


1. `pkg.path`: This argument specifies the location on your system where you have stored a git clone of the `data.table` package.

2. `pkg.edit.fun`: The default behavior of pkg.edit.fun is designed to work with Rcpp packages and involves replacing instances of "PKG" with "PKG.SHA" in the package code. Any occurrences of the string "PKG" within the package code will be replaced with "PKG.SHA", where "SHA" represents the commit SHA/ids associated with the version being installed.
Contributor

please explain why we need a more complicated pkg.edit.fun for data.table

Contributor Author

Does this explain why we need a more complicated pkg.edit.fun for data.table?

The data.table package needs a complex pkg.edit.fun function due to its use of Rcpp, versioning, and naming considerations. Thus, the pkg.edit.fun function plays a crucial role in addressing these challenges and ensuring the smooth operation of the data.table package.

Contributor

data.table does not use Rcpp

@@ -0,0 +1,249 @@
---
title: "Visualizing Performance Regression of data.table with Atime"
Contributor

I'd have everything without caps except for V / Visualizing since it's the first word

Contributor Author

okay


```
data.table.ec1259af1bf13fc0c96a1d3f9e84d55d8106a9a4:::`[.data.table`(DT, , .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE)

Contributor

Remove the blank line at the end for each code segment (including the ones below)

Contributor Author

sure


1. `pkg.path`: This argument specifies the location on your system where you have stored a git clone of the `data.table` package.

2. `pkg.edit.fun`: The default behavior of pkg.edit.fun is designed to work with Rcpp packages and involves replacing instances of "PKG" with "PKG.SHA" in the package code. Any occurrences of the string "PKG" within the package code will be replaced with "PKG.SHA", where "SHA" represents the commit SHA/ids associated with the version being installed.
Contributor

Use consistent capitalization - 'ids' here but 'IDs' below like in line 169 (and probably just use 'commit SHA' instead of 'commit SHA/ids').

Also, I agree with Toby that pkg.edit.fun needs more explaining. I do not clearly understand why it is used from that and I would not use 'Thus, the pkg.edit.fun function plays a crucial role in addressing these challenges and ensuring the smooth operation of the data.table package.' without explaining in detail

Contributor

or in comparison to line 51 and 52 below (ids vs IDs)

Contributor Author

alright


In this example, the expression `[.data.table` is executed on the `DT` dataset using the specified commit ID (`ec1259af1bf13fc0c96a1d3f9e84d55d8106a9a4`) of the data.table package. The expression calculates the mean of the `v3` column (ignoring missing values) grouped by `id3`, and the `verbose=TRUE` argument enables verbose output during the operation. This process is typically repeated for all commit IDs in your code to compare the performance of different versions of the data.table package.

6. `...` : This section specifies the different versions of the data.table packages that will be tested. It includes three versions: "Before," "Regression," and "Fixed." Each version is associated with a specific commit id.
Contributor

I'm not sure if this can be labelled as a 'section', as it looks like an ellipsis (used for varying arguments usually) if not just a placeholder for continuation or incomplete code/output (also where is it used below?)

Contributor Author

That is how the named versions are defined in ?atime::atime_versions:

... : named versions.


1. X-Axis (N): Represents the size of the data (N) on a logarithmic scale.

2.Y-Axis: Represents the median time in milliseconds (logarithmic scale).
Contributor

Space between '2.' and 'Y-Axis'

Contributor Author

okay


4.“Fixed”: Shows improved performance after fixing.

5. “Regression”: Represents an ideal or target performance level.
Contributor

Spaces again for the two above (make it consistent with one space like this line)

Contributor Author

okay


# Example Two

In the specific case of issue *#5366*, there was a significant slowdown in the performance of data.table's time-based rolling aggregation compared to pandas' rolling aggregation. The cause of this regression was identified to be related to the addition of the snprintf function in the assign.c file. To address this issue, a fix was implemented by creating the targetDesc function and adding the snprintf function in the assign.c file. This fix resolved the regression and improved the performance of the time-based rolling aggregation in data.table.
Contributor

I would use backticks for file and function names, e.g.: `assign.c`, `snprintf`

Contributor Author

okay


In summary, the graph visually demonstrates how fixing the regression issue led to improved performance in time-based rolling operations. The Fixed line represents the desired outcome, showing faster processing times for larger sample sizes.


Contributor

Spacing (again)

Overall this post looks better than the other one, but both could do with more attention to small things when writing (especially being consistent). Please convert the PR from draft to regular mode when done with the revisions.

Contributor Author

thanks


# **Conclusion**

In this blog post, we have delved into the use of the atime code to compare the asymptotic time and memory usage of different versions of the data.table package. Specifically, we explored the comparisons between the "Before," "Regression," and "Fixed" versions, as well as different versions implementing the same computation.
Contributor

please add a description of the github action, and how data.table is now using this to review PRs like this Rdatatable/data.table#5427 (comment)

Contributor Author

addressed

@tdhock
Contributor

tdhock commented Jun 12, 2024

hi @DorisAmoakohene @Anirban166 can you please revise and tell me when you think it is ready to merge?

@DorisAmoakohene DorisAmoakohene marked this pull request as ready for review June 12, 2024 19:48
@DorisAmoakohene
Contributor Author

yes @tdhock, this is ready for review

# **Github Action**
The data.table project has implemented a GitHub Action to automatically review pull requests.
Contributor

review -> run performance tests

Contributor Author

updated

The process is automated using a GitHub action implemented by @anirban166. This action runs the "atime" package for every pull request and generates plots of the results in a comment within the pull request. [See an example in this pull request](https://github.com/Rdatatable/data.table/pull/5427#issuecomment-2075471806)
Contributor

"atime" -> `atime`

Contributor Author

done

This action allows the maintainers to easily determine if a pull request has any impact on the time or memory usage of the data.table package. To learn more you can visit [Anirban page](https://github.com/Anirban166/Autocomment-atime-results) or this [link](https://github.com/tdhock/atime?tab=readme-ov-file#github-action-for-continuous-performance-testing)
Contributor

Anirban page -> Anirban's documentation?

Contributor Author

yes

Contributor

@tdhock tdhock left a comment

hi @DorisAmoakohene please proof-read one more time from start to finish
then you can ask Kelly for review / merge

@DorisAmoakohene
Contributor Author

@tdhock @Anirban166

hi @DorisAmoakohene please proof-read one more time from start to finish then you can ask Kelly for review / merge

alright

@DorisAmoakohene
Contributor Author

DorisAmoakohene commented Jun 21, 2024

@kbodwin Could you please review this blog for merge? This blog provides a performance testing analysis using atime on the performance of different versions of the data.table package.

@kbodwin kbodwin changed the base branch from main to dev August 12, 2024 17:35
@kbodwin kbodwin merged commit f0d6c89 into rdatatable-community:dev Aug 12, 2024
4 participants