Adding progress bar to forward gravity calculation of prisms #312

Closed
mdtanker opened this issue May 20, 2022 · 7 comments · Fixed by #315

Labels
enhancement Idea or request for a new feature

Comments

@mdtanker
Member

Description of the desired feature:

During a call to prism_gravity (or prism_layer.gravity) it would be great to have a progress bar which updates as the calculations advance through the gravity observation points (coordinates). @LL-Geo suggested the numba_progress package, which seems to be a good fit. Looking through harmonica/forward/prism.py, I assume implementing this feature would be done within the jit_prism_gravity function, and when calling it, users would write something along the lines of:

with ProgressBar(total=max_iter) as numba_progress:
    topo_prisms.prism_layer.gravity(
        coordinates=(df_grav.x, df_grav.y, df_grav.z),
        field='g_z',
        progress=numba_progress,
    )

Are you willing to help implement and maintain this feature?

Yes I'm willing to try, but will likely need some assistance since it's my first code contribution, and I'm unfamiliar with numba_progress.

mdtanker added the enhancement label on May 20, 2022
@leouieda
Member

@mdtanker thank you for opening this feature request! I wasn't aware of numba-progress so it's good to know it exists.

My main concern with this is: does it have an impact on performance? Adding a function call to a jit-compiled loop (which usually runs many times) has the potential to slow down the process. It would be good to try this out and measure the performance impact, as in the sketch below. It's unlikely to be zero, but I have no idea if it would be large.
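
For example, the overhead could be measured by timing the same forward calculation with and without the progress bar. A rough sketch with a toy model (assuming harmonica's public prism_gravity function accepts it; the interesting number would be the before/after comparison once a progress bar option exists):

import time

import numpy as np
import harmonica as hm

# Toy model: a single prism and a 200 x 200 grid of observation points
prisms = [[-100, 100, -100, 100, -200, -100]]
density = [2670]
east, north = np.meshgrid(np.linspace(-500, 500, 200), np.linspace(-500, 500, 200))
coordinates = (east, north, np.full_like(east, 50))

# Warm-up call so numba compilation time is not counted in the benchmark
hm.prism_gravity(coordinates, prisms, density, field='g_z')

# Time the calculation; repeat with the progress bar enabled once that
# option exists and compare the two elapsed times
start = time.perf_counter()
result = hm.prism_gravity(coordinates, prisms, density, field='g_z')
print(f"Elapsed: {time.perf_counter() - start:.2f} s")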

If the performance hit is small, particularly when not using a progress bar, then it would be great to have as an optional feature. The dependency on numba-progress should be optional, since this is a new and small project (only 10 commits and 2 releases) which could break or be discontinued in the future.

The numba_progress variable would need to be passed all the way down to this function: https://github.com/fatiando/harmonica/blob/main/harmonica/forward/prism.py#L189

Maybe @santisoler or @LL-Geo can advise on how you could go about trying this out?

@mdtanker
Member Author

Thanks for the fast response @leouieda. I've had a go at implementing numba-progress and I think I've got it working, at least in a minimal way. To do it, I added an argument progress_proxy to the function jit_prism_gravity, and then added progress_proxy.update(1) within the outer for loop (over observation points), as below:

    # Iterate over computation points and prisms
    for l in prange(coordinates[0].size):
        for m in range(prisms.shape[0]):
            # Iterate over the prism boundaries to compute the result of the
            # integration (see Nagy et al., 2000)
            for i in range(2):
                for j in range(2):
                    for k in range(2):
                        shift_east = prisms[m, 1 - i]
                        shift_north = prisms[m, 3 - j]
                        shift_upward = prisms[m, 5 - k]
                        # If i, j or k is 1, the shift_* will refer to the
                        # lower boundary, meaning the corresponding term should
                        # have a minus sign
                        out[l] += (
                            density[m]
                            * (-1) ** (i + j + k)
                            * kernel(
                                shift_east - coordinates[0][l],
                                shift_north - coordinates[1][l],
                                shift_upward - coordinates[2][l],
                            )
                        )
        progress_proxy.update(1)

To get this to work, I added the progress_proxy argument to the functions prism_gravity, prism_layer, and gravity, as well as to the return of gravity and the dispatcher call in prism_gravity. I don't think I understand the code enough to know why I need to do this, but it seemed to make it work!
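
My guess at why: each of those functions just wraps the next one down and forwards its arguments, so a new argument has to be threaded through every level until it reaches the jitted loop. A toy sketch of that chain (not harmonica's actual code):

def jit_prism_gravity(n_points, progress_proxy=None):
    # Innermost (jitted) loop: the only place the proxy is actually used
    for _ in range(n_points):
        if progress_proxy is not None:
            progress_proxy.update(1)


def prism_gravity(n_points, progress_proxy=None):
    # Public function: forwards the proxy down to the jitted implementation
    jit_prism_gravity(n_points, progress_proxy=progress_proxy)


def gravity(n_points, progress_proxy=None):
    # Accessor method (like prism_layer.gravity): forwards it again
    return prism_gravity(n_points, progress_proxy=progress_proxy)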

To calculate the forward gravity of a layer of prisms I was previously using something like the following:

for k, v in layers.items():
    df_grav[f'{k}_forward_grav'] = v['prisms'].prism_layer.gravity(
        coordinates=(df_grav.x, df_grav.y, df_grav.z),
        field='g_z',
    )

Where layers is a dictionary containing 3 layers of prisms (ice, water, and sediment), and df_grav is a dataframe of gravity observations. Each layer has 16641 points and the same number of gravity observations. The above cell takes 1m 36.5s to run.

Below is how I've updated this to show a progress bar for each of the 3 layers of prisms.

from numba_progress import ProgressBar

for k, v in layers.items():
    with ProgressBar(total=len(df_grav)) as progress:
        df_grav[f'{k}_forward_grav'] = v['prisms'].prism_layer.gravity(
            coordinates=(df_grav.x, df_grav.y, df_grav.z),
            field='g_z',
            progress_proxy=progress,
        )

100%|██████████| 16641.0/16641 [00:27<00:00, 608.82it/s]
100%|██████████| 16641.0/16641 [00:34<00:00, 489.12it/s]
100%|██████████| 16641.0/16641 [00:35<00:00, 464.67it/s]

This took 1m 37s, so based on this, the progress bar seems not to slow the runtime at all.

Any suggestions for avoiding the need to add the progress_proxy argument in so many places in the code?

@santisoler
Member

This looks great! Thanks @mdtanker for the idea! 🥇

Any suggestions for avoiding the need to add the progress_proxy argument in so many places in the code?

Actually, I was thinking that we don't need to expose the ProgressBar to the user, nor ask them to pass a ProgressBar instance as an argument. Instead we can add a progressbar flag to prism_gravity (like the one we have in pooch.retrieve: https://www.fatiando.org/pooch/latest/api/generated/pooch.retrieve.html#pooch.retrieve) and initialize a ProgressBar inside the function, which will be passed down to the jitted functions.

So, your last example would look like this:

for k, v in layers.items():
    df_grav[f'{k}_forward_grav'] = v['prisms'].prism_layer.gravity(
        coordinates=(df_grav.x, df_grav.y, df_grav.z),
        field='g_z',
        progressbar=True,
    )

Since numba-progress will be an optional dependency, the progressbar should be False by default and the jitted functions should be able to work even if no ProgressBar is being passed to them.

@LL-Geo
Member

Thanks for opening this feature request @mdtanker! I think if we only update the progressbar in the outer loop (over the observation points), it will have a small impact on performance: with the example above, that is 16641 updates instead of one per prism for every point (16641² ≈ 2.8 × 10⁸ updates).
I agree with @santisoler's solution 👍. A default of progressbar=False will be a good option.

@mdtanker
Member Author

Thanks for the suggestions! I have it working and everything seems alright, except that if called with parallel=False the progress bar only shows 0% and 100%, with no intermediate progress. Not sure why this is happening, but since parallel defaults to True, maybe it's no big deal.

Here is an example without the progress bar, which took 1m 28.2s:

for k, v in layers.items():
    df_grav[f'{k}_forward_grav'] = v['prisms'].prism_layer.gravity(
        coordinates=(df_grav.x, df_grav.y, df_grav.z),
        field='g_z',
    )

And an example with progressbar=True, which took 1m 34.6s:

for k, v in layers.items():
    df_grav[f'{k}_forward_grav'] = v['prisms'].prism_layer.gravity(
        coordinates=(df_grav.x, df_grav.y, df_grav.z),
        field='g_z',
        progressbar=True,
    )
100%|██████████| 16641.0/16641 [00:29<00:00, 557.69it/s]
100%|██████████| 16641.0/16641 [00:31<00:00, 527.07it/s]
100%|██████████| 16641.0/16641 [00:33<00:00, 503.12it/s]

In the prism.py file, I've added the following before the dispatcher call:

if progressbar is True:
    from numba_progress import ProgressBar

    progress_proxy = ProgressBar(total=coordinates[0].size)
else:
    progress_proxy = None

And the following after the dispatcher call:

try:
    progress_proxy.close()
except:
    pass

And this is the updated function:

def jit_prism_gravity(coordinates, prisms, density, kernel, out, progress_proxy):
    # Iterate over computation points and prisms
    for l in prange(coordinates[0].size):
        # Update progress bar if called
        if progress_proxy is not None:
            progress_proxy.update(1) 
        for m in range(prisms.shape[0]):
            # Iterate over the prism boundaries to compute the result of the
            # integration (see Nagy et al., 2000)
            for i in range(2):
                for j in range(2):
                    for k in range(2):
                        shift_east = prisms[m, 1 - i]
                        shift_north = prisms[m, 3 - j]
                        shift_upward = prisms[m, 5 - k]
                        # If i, j or k is 1, the shift_* will refer to the
                        # lower boundary, meaning the corresponding term should
                        # have a minus sign
                        out[l] += (
                            density[m]
                            * (-1) ** (i + j + k)
                            * kernel(
                                shift_east - coordinates[0][l],
                                shift_north - coordinates[1][l],
                                shift_upward - coordinates[2][l],
                            )
                        )

Is the next step to open a pull request to implement this?

@santisoler
Member

Looking awesome! Congrats!

I have it working and everything seems alright, except that if called with parallel=False the progress bar only shows 0% and 100%, with no intermediate progress. Not sure why this is happening, but since parallel defaults to True, maybe it's no big deal.

Thanks for noticing it. This is not a good behaviour: any user who keeps waiting for ~1 minute without seeing the progressbar moving will likely stop the computation. The progressbar should work with parallel set to either True or False. But don't worry about it now, we can look into it later. That's just troubleshooting.

And the following after the dispatcher call:

try:
    progress_proxy.close()
except:
    pass

Minor suggestion here: we try to avoid try: ...; except: ... statements unless there's a specific exception we want to catch. In this particular case we can just close the progress_proxy if progressbar is True, so we can replace it with:

if progressbar:
    progress_proxy.close()

But there is a good case in which we would want to use try and except:

if progressbar is True:
    from numba_progress import ProgressBar

    progress_proxy = ProgressBar(total=coordinates[0].size)
else:
    progress_proxy = None

If a user sets progressbar=True but numba_progress is not installed, they would get an ImportError with a probably non-straightforward message. We can catch that exception and raise our own ImportError with a more informative message, as sketched below. Check how we do a similar thing with pyvista here: https://github.com/fatiando/harmonica/blob/main/harmonica/visualization/prism.py
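
Something along these lines would do it (a sketch of the guard; the exact message and placement can be sorted out in the PR):

# At the top of prism.py: attempt the optional import and remember if it failed
try:
    from numba_progress import ProgressBar
except ImportError:
    ProgressBar = None

# Inside prism_gravity, before the dispatcher call
if progressbar:
    if ProgressBar is None:
        raise ImportError(
            "Missing optional dependency 'numba_progress' required "
            "if progressbar=True."
        )
    progress_proxy = ProgressBar(total=coordinates[0].size)
else:
    progress_proxy = None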

Is the next step to open a pull request to implement this?

Absolutely! Feel free to open a PR with the changes. It doesn't need to be perfect or final. We can continue this discussion there, with actual shared code that we can both run and test.

@mdtanker
Member Author

Good catch, I've added the ImportError and will try and open a PR soon. Thanks!
