[REVIEW]: Distributed Parallelization of xPU Stencil Computations in Julia #137
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @mloubout, @georgebisbas, it looks like you're currently assigned to review this paper 🎉. Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post. ⭐ Important ⭐ If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/JuliaCon/proceedings-review) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿. To fix this, do the following two things:
For a list of things I can do to help you, just type:
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
👋 @mloubout, please update us on how your review is going (this is an automated reminder).
👋 @georgebisbas, please update us on how your review is going (this is an automated reminder).
Review: The abstract is written clearly and conveys the message efficiently. The motivation and design principles are laid out clearly. The results show good performance and validate the author's claims. Some minor comments:
very minor:
With these addressed, this should be accepted for publication.
It seems like the repository https://github.com/omlins/ImplicitGlobalGrid.jl is a fork and has no Issues tab.
Yes, this is expected, because it is a fork created for the single purpose of the publication of this paper. The original repository is the following: https://github.com/eth-cscs/ImplicitGlobalGrid.jl
Thanks @omlins. Where should we open an Issue for the paper according to the instructions? At the original repo?
I do not think the instructions specify this for such a configuration. Thus, I would suggest opening issues concerning the manuscript on the fork (https://github.com/omlins/ImplicitGlobalGrid.jl) and potential issues concerning the package on the original repository (https://github.com/eth-cscs/ImplicitGlobalGrid.jl).
Now I can see an Issues tab.
Yes, I added it upon your question.
ImplicitGlobalGrid.jl offers an approach to automating distributed-memory parallelism for stencil computations in Julia, leveraging a high-level symbolic interface and presenting highly competitive performance. This work has a high overlap with #138. As discussed in the ParallelStencil.jl review, some changes should be applied to publish these as separate works. I understand that this work extends the optimization suite available in ParallelStencil.jl by allowing the use of automated distributed-memory parallelism (DMP). Since both packages will be published independently, mentioning that the DMP is built upon ParallelStencil.jl would not hurt.

I suggest avoiding the term xPU in the title. With the recent rise of "exotic" accelerators like Google's TPU, Graphcore's IPU, or expected QPUs, it would probably not hurt just to use: "Distributed Parallelization of Stencil Computations in Julia". This is clarified at the end of the Introduction; however, I still advocate avoiding this term. It may also not help when someone looks for your work in the future.

The package's functionality is confirmed, and the installation, testing, and usage documentation are present and well-defined. The paper is well-written, clearly presenting the motivation and example usage of the package. It may be good to state explicitly that users get automated DMP by only slightly editing the source code of a solver; i.e., the text could value the symbolic interface a bit more. I am interested in how communication-computation overlap is performed. The diffusion example presented is well-written.

Minor comments (not aiming to be addressed, but possibly interesting points for future work and further discussion):
Performance evaluation:
Good to see that you are using a non-standard stencil kernel in Figure 3. Please add more info on this stencil kernel's computation and memory requirements. I also second the comments posted by @mloubout. Assuming the above concerns are addressed, this should be accepted for publication.
@mloubout: Thank you very much for the thorough review. We are working on addressing the reviews. In the following you can find replies to two of the issues that you raised. We will respond to the remaining issues alongside the improvements of the manuscript.
The reason for the scaling up to 2197 GPUs or nodes is that 13^3 (=2197) nodes is the biggest cubic node topology that can be submitted in the normal queue of Piz Daint (up to 2400 nodes, see: https://user.cscs.ch/access/running/piz_daint/#slurm-batch-queues). The number of 2197 GPUs may appear surprising when it is not explained, but there is nothing negative about it, as the reported parallel efficiency confirms. Thus, we would like to add an explanation while keeping the plot as is.
Thank you for pointing this out; we will try to emphasize the performance difference.
@georgebisbas: Thank you very much for the thorough review. We are working on addressing the reviews. In the following you can find replies to many of the issues that you raised. We will respond to the remaining issues alongside the improvements of the manuscript.
Thank you for this statement; it makes clear that we need to emphasize in the document that ImplicitGlobalGrid.jl is not built upon ParallelStencil.jl. ImplicitGlobalGrid.jl does not in any way assume or require that codes using it also use ParallelStencil.jl (and vice versa). In the documentation, there is a multi-GPU example without ParallelStencil.jl (https://github.com/eth-cscs/ImplicitGlobalGrid.jl?tab=readme-ov-file#50-lines-multi-gpu-example). In addition to pointing this out, we will also refer to the example in the documentation. The three main reasons why we chose an example using ParallelStencil.jl are 1) to illustrate interoperability with ParallelStencil.jl, which is a natural choice in Julia for the task at hand, 2) to achieve high performance on a single node, and 3) to illustrate the following:
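To make the independence from ParallelStencil.jl concrete, here is a minimal sketch along the lines of the linked 50-lines multi-GPU example; the CUDA compute kernel is elided and the grid sizes are illustrative:

```julia
using CUDA, ImplicitGlobalGrid

function diffusion3D()
    nx, ny, nz = 64, 64, 64                    # local (per-GPU) grid size (illustrative)
    me, dims   = init_global_grid(nx, ny, nz)  # set up the implicit global grid (MPI Cartesian topology)
    T  = CUDA.zeros(Float64, nx, ny, nz)       # local field including boundary/halo cells
    T2 = copy(T)
    for it = 1:100
        # ... launch a plain CUDA.jl kernel here that computes the inner points of T2 from T ...
        update_halo!(T2)                       # exchange halos with the neighboring processes
        T, T2 = T2, T
    end
    finalize_global_grid()
    return
end

diffusion3D()
```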
The previous point showed that we had failed to point out that ImplicitGlobalGrid.jl is not built upon ParallelStencil.jl, which unfortunately comes up again in this comment: the symbolic interface is not part of ImplicitGlobalGrid.jl, but of ParallelStencil.jl, and plays no role in the distributed parallelization (with the exception of the
As noted in the previous two points, ImplicitGlobalGrid.jl does not itself perform communication-computation overlap, but it performs the communication in a way that enables other packages to do it easily and optimally: "all data transfers are performed on non-blocking high-priority streams or queues" (page 2, paragraph 1). Other packages, e.g. ParallelStencil.jl, can leverage this to perform communication-computation overlap: "ParallelStencil.jl, e.g., can do so with a simple macro call (Fig. 1, line 36)" (page 2, paragraph 1). Thus, what ImplicitGlobalGrid.jl does concerning communication-computation overlap is already described there.
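For illustration, a sketch of how such an overlap looks on the caller's side with ParallelStencil.jl's @hide_communication; the boundary-width tuple and the kernel name are illustrative placeholders:

```julia
# Inside the time loop of a solver using both packages: the computation of the
# boundary region and the halo exchange are overlapped with the computation of
# the inner points.
@hide_communication (16, 8, 4) begin
    @parallel diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)
    update_halo!(T2)
end
```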
Thank you for the suggestion. We can understand your point. However, if we removed the 'xPU', then many people would assume that this work applies only to CPUs when reading the title. The 'xPU' is important for us to emphasize the backend portability, and we believe it is normal that the 'x' of 'xPU' refers only to a subset of what exists. Furthermore, ImplicitGlobalGrid.jl was designed to be easily extendable with new backends, whether for other GPU kinds or other kinds of xPUs (and we will certainly add new backends in the future for any hardware that is of interest to us). We will try to emphasize this more.
Regarding the scaling up to 2197 GPUs or nodes, please see our reply to @mloubout above: 13^3 (=2197) nodes is the biggest cubic node topology that can be submitted in the normal queue of Piz Daint. We would like to add an explanation while keeping the plot as is.
Given that CPU-only supercomputers are becoming rare and that the challenge is greater when GPUs are involved, we considered it sufficient to focus on GPUs in this short paper. We see the interest of the CPU functionality mostly in code validation and possibly prototyping.
We do not have any use cases where we have a strictly fixed problem size. Thus, we considered presenting weak scaling more important.
The single-node performance is very high thanks to the implementation using ParallelStencil.jl.
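For reference, this is roughly what a compute kernel looks like with ParallelStencil.jl (a sketch along the lines of the diffusion example in the ParallelStencil.jl documentation; the math-close notation compiles to an optimized GPU or CPU kernel):

```julia
using CUDA, ParallelStencil, ParallelStencil.FiniteDifferences3D
@init_parallel_stencil(CUDA, Float64, 3)

# One explicit step of 3-D linear heat diffusion on the inner points:
@parallel function diffusion3D_step!(T2, T, Ci, lam, dt, dx, dy, dz)
    @inn(T2) = @inn(T) + dt * (lam * @inn(Ci) *
                               (@d2_xi(T) / dx^2 + @d2_yi(T) / dy^2 + @d2_zi(T) / dz^2))
    return
end
```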
Are there any performance numbers, such as Gpts/s or GCells/s, or any roofline model, that we could look at?
@editorialbot generate pdf
The error might be due to missing DOIs. @lucaferranti, is my understanding correct? @omlins, could you add the missing DOIs and try to generate the PDF again? Thank you!
@fcdimitr: please excuse the delayed reply. We aim to finalize this extended abstract hopefully within this week and the next. I have updated the bibliography file; I think the PDF should now generate.
Thank you for pointing this out. In our approach, a field will only have halos in a given dimension if the corresponding overlap between the local fields is at least two cells wide; no halos are created if the overlap is only one cell wide (redundant computation is done instead), which avoids the need to deal with asymmetrical halos. We will see how to add this to the document.
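A sketch of what this looks like at initialization, assuming the overlaps keyword argument of init_global_grid (the concrete values are illustrative):

```julia
# Overlap of 2 cells in x and y: halos are created in these dimensions.
# Overlap of 1 cell in z: no halo is created; the one-cell-wide boundary
# region is covered by redundant computation instead.
me, dims = init_global_grid(nx, ny, nz; overlaps=(2, 2, 1))
```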
Thank you for your comment. We will be happy to do a comparison with related work such as the one you mentioned, with other key work in the vast fields of distributed-memory parallelization and architecture-agnostic programming, and, more generally speaking, with important work that shares our objective of contributing to solving the challenge of the three "P"s: (scalable) Performance, (performance) Portability, and Productivity. However, given the large amount of work our contribution is related to, we aim to do so at a later point in a full paper submission. We did not have the space to do so in this very short extended abstract format of "at most two pages of content including references" (see the author's guide: https://juliacon.github.io/proceedings-guide/author/), which "lays out in a concise fashion the methodology and use cases of the work presented at the conference", as opposed to a full paper submission, which, "compared to an extended abstract, (...) compares the work to other approaches taken in the field and gives some additional insights on the conference contribution".
@editorialbot generate pdf
Hi @omlins! I'm just checking in; what is the status of this? Thank you!
We aim to address the remaining issues in the manuscript in the next couple of days. Please excuse the delay.
@editorialbot generate pdf
@editorialbot generate pdf
@mloubout, @georgebisbas: Dear reviewers, we have improved the manuscript, addressing all the issues raised.
Hi!
I highly appreciate that, but why not present a 1024^3 problem of heat diffusion, just up to the point where the local decomposed problem becomes small enough to hit a strong-scaling plateau?
It would be good to add a pointer to the ParallelStencil.jl paper in the manuscript.
@editorialbot generate pdf
@georgebisbas: Thank you very much for the additional comments. In the following you can find the replies:
We have extended the caption of Figure 2 with more information, similar to Figure 3.
We have added the pointer to the ParallelStencil.jl paper in the caption of Figure 2.
We have added contribution guidelines to the readme.
Thank you for your suggestion. It could certainly be interesting to dive into a strong-scaling analysis. However, we believe that this does not fit into this very short extended abstract of two pages. Thus, we would like to follow your original suggestion and leave this for future work.
@fcdimitr please wait until I update the meta files and info to solve the non-showing DOI and special character issues before taking further action. 🙏
@editorialbot generate pdf
@fcdimitr The DOI and author name related issues are solved. Thanks for waiting. Since there seem to be no remaining issues, we can proceed.
Thank you @luraess! What is the Zenodo archival DOI? 👋 @mloubout and @georgebisbas, could you please mark all checkboxes in your list as done? Thank you!
@omlins will share the DOI.
Done, thanks.
@fcdimitr: here you can find the DOI for the paper: https://doi.org/10.5281/zenodo.14056962
Submitting author: @omlins (Samuel Omlin)
Repository: https://github.com/omlins/ImplicitGlobalGrid.jl
Branch with paper.md (empty if default branch):
Version:
Editor: @fcdimitr
Reviewers: @mloubout, @georgebisbas
Archive: Pending
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@mloubout & @georgebisbas, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @fcdimitr know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @mloubout
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Paper format
Does the paper.tex file include a list of authors with their affiliations?
Content
Review checklist for @georgebisbas
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Paper format
Does the paper.tex file include a list of authors with their affiliations?
Content