
Add AVHRRMTA_G-NAVO-L2P-v1.0 and AVHRRMTB_G-NAVO-L2P-v1.0 #15

Merged
merged 9 commits into develop from feature/PODAAC-4171_PODAAC-4173 on Feb 18, 2022

Conversation

@sliu008 (Contributor) commented Feb 3, 2022

Description

Add AVHRRMTA_G-NAVO-L2P-v1.0 and AVHRRMTB_G-NAVO-L2P-v1.0 to associations.txt
Issue #10: handle empty granules

Overview of work done

The fork process dies suddenly with signal 7, which is SIGBUS on Linux, i.e. a memory alignment/access error. This doesn't happen when we run concise locally, but we hit it when running in Harmony. I added code to limit how much memory is allocated and to make the read process wait for the write process to delete the allocated shared memory before continuing to read and allocate more. Running in single-core mode also works for these collections. I tried increasing memory via environment variables for the Docker container, but that didn't seem to have any effect and the fork process still died.
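
As a minimal standalone sketch of that throttling pattern (not the actual merge_worker.py code; the names, chunk sizes, and queue layout here are illustrative only):

```python
import time
from multiprocessing import Process, Queue, Value
from multiprocessing.shared_memory import SharedMemory

import numpy as np

MEMORY_CAP_BYTES = 60_000_000  # mirrors the 60 MB cap discussed in this PR


def reader(out_queue, allocated):
    """Copy chunks into shared memory, pausing when the cap would be exceeded."""
    for i in range(10):
        chunk = np.arange(1_000_000, dtype=np.float64)  # stand-in for a resized variable
        # Throttle: wait for the writer to free shared memory before allocating more.
        while allocated.value + chunk.nbytes > MEMORY_CAP_BYTES:
            time.sleep(0.1)
        shm = SharedMemory(create=True, size=chunk.nbytes)
        np.ndarray(chunk.shape, dtype=chunk.dtype, buffer=shm.buf)[:] = chunk
        with allocated.get_lock():
            allocated.value += chunk.nbytes
        out_queue.put((i, chunk.shape, str(chunk.dtype), shm.name))
        shm.close()
    out_queue.put(None)  # sentinel: nothing left to read


def writer(out_queue, allocated):
    """Drain the queue, copy data out of shared memory, then release it."""
    while (item := out_queue.get()) is not None:
        _, shape, dtype, shm_name = item
        shm = SharedMemory(name=shm_name)
        data = np.ndarray(shape, dtype=dtype, buffer=shm.buf).copy()  # the "write" step
        shm.close()
        shm.unlink()  # frees /dev/shm so the reader can continue
        with allocated.get_lock():
            allocated.value -= data.nbytes


if __name__ == '__main__':
    queue, allocated = Queue(), Value('q', 0)
    procs = [Process(target=reader, args=(queue, allocated)),
             Process(target=writer, args=(queue, allocated))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

The key point of the pattern is that the reader never holds more than the cap's worth of shared memory at once, so the write process, not /dev/shm, becomes the limiting factor.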

Bashing into the concise Docker container and running concise via the merge CLI resulted in the error below:

dockeruser@podaac-concise-6df5775c76-psszf:~$ merge -c 2 /home/dockeruser/test2/ /home/dockeruser/test2.nc
Traceback (most recent call last):
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge_worker.py", line 113, in _run_multi_core
    i, var_path, shape, memory_name = out_queue.get_nowait()
  File "<string>", line 2, in get_nowait
  File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dockeruser/.local/bin/merge", line 8, in <module>
    sys.exit(main())
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge_cli.py", line 40, in main
    merge_netcdf_files(input_files, args.output_path, process_count=args.cores)
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge.py", line 62, in merge_netcdf_files
    run_merge(merged_dataset, input_files, var_info, max_dims, process_count)
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge_worker.py", line 36, in run_merge
    _run_multi_core(merged_dataset, file_list, var_info, max_dims, 2)
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge_worker.py", line 115, in _run_multi_core
    _check_exit(processes)
  File "/home/dockeruser/.local/lib/python3.9/site-packages/podaac/merger/merge_worker.py", line 194, in _check_exit
    raise RuntimeError(f'Merging failed - exit code: {process.exitcode}')
RuntimeError: Merging failed - exit code: -7
/usr/local/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
dockeruser@podaac-concise-6df5775c76-psszf:/usr/local/lib/python3.9/multiprocessing$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          59G   43G   14G  76% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1        59G   43G   14G  76% /tmp
grpcfuse        932G  189G  726G  21% /tmp/metadata
shm              64M     0   64M   0% /dev/shm
tmpfs           3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /sys/firmware

Shared memory (/dev/shm) is only 64 MB; will ask the Harmony team to see if we can increase it.
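
For reference, a quick standard-library check of the /dev/shm size from inside the container (not part of concise):

```python
import shutil

# /dev/shm backs multiprocessing.shared_memory on Linux, so its size is the real cap.
total, used, free = shutil.disk_usage('/dev/shm')
print(f"/dev/shm: {total / 1e6:.0f} MB total, {free / 1e6:.0f} MB free")
```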

  • Updated UAT and OPS associations files
  • Added additional logging
  • Added code to throttle the read process if too much memory is allocated
  • Handled empty granule files (a rough sketch follows this list)
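
One hedged sketch of the empty-granule idea is to skip inputs whose variables hold no data before merging; `is_empty_granule` is a hypothetical helper name for illustration, not the actual concise code:

```python
import netCDF4 as nc


def is_empty_granule(path):
    """Hypothetical helper: True if every variable in the file has zero elements."""
    with nc.Dataset(path, 'r') as ds:
        return all(var.size == 0 for var in ds.variables.values())


# Example use: drop empty granules before handing the list to the merger.
# input_files = [f for f in input_files if not is_empty_granule(f)]
```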

Overview of verification done

Tested the collections in UAT with local Harmony.

Overview of integration done

[Screenshot: integration test results, 2022-02-10 10:42 AM]

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

@sliu008 marked this pull request as ready for review February 10, 2022 18:50
resized_arr = resize_var(ds_var, var_meta, max_dims)

# Limit how much memory we allocate to 60 MB
while memory_limit.value + resized_arr.nbytes > 60000000 and not out_queue.empty():
Reviewer (Member) commented:
Should this value (60000000) be an environment variable so we could change it at some point in the future? How does this affect the processing, does it just take longer? I'm wondering if we are at risk of not being able to use concise on certain data because we're limiting ourselves to 60MB? Like, could there be a dataset that won't fit into this 60MB and therefore wouldn't be able to be processed?

@sliu008 (Contributor, Author) replied Feb 14, 2022:

  1. I can make it read from an environment variable with a default (see the sketch after this list). I was hoping we could set the shared memory size via Harmony somewhere, but it doesn't seem like we can change it at the moment.
  2. The version of Kubernetes that Harmony currently runs doesn't have a way to set the shared memory size. We can mount memory into /dev/shm, but it ends up taking half the memory of the Kubernetes pod, which is the default when no size is specified. There is a fix for this in a later version of Kubernetes. Another thing Harmony would need to enable is LocalStorageCapacityIsolation, which I'm not familiar with.
  3. It doesn't take any longer: concise runs two processes, one read and one write, and it finishes when the write is done. Reading is faster than writing, so even if we throttle the read, the whole thing only finishes as fast as the write process.
  4. Yes, we are limited by the 60 MB: if a variable is bigger than that, the process will still try to write it to shared memory and crash.
  5. Another option is to go with single-core processing, so we don't use shared memory at all until we have a way to set the shared memory size limit in Harmony.
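
For point 1, a minimal sketch of reading the cap from an environment variable with a default; `CONCISE_MAX_SHM_BYTES` is a hypothetical name, not an existing concise setting:

```python
import os

# Hypothetical environment variable; falls back to the current 60 MB cap.
MAX_SHM_BYTES = int(os.environ.get('CONCISE_MAX_SHM_BYTES', '60000000'))
```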

@sonarqubecloud bot commented Feb 17, 2022:

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 1 (rating A)

Coverage: 87.1%
Duplication: 0.0%

@sliu008 merged commit f6ccf5a into develop Feb 18, 2022
@sliu008 deleted the feature/PODAAC-4171_PODAAC-4173 branch February 18, 2022 00:29