Add version restrictions to tensorflow for pypi #1485

Merged
merged 8 commits into from
Sep 14, 2023

Conversation

@roomrys (Collaborator) commented Sep 11, 2023

Description

Add version restrictions to tensorflow for the pypi extra. The conda packages for 1.3.2 were not affected (since tensorflow is pulled in from anaconda), but the PyPI-only package installed via pip install sleap[pypi] had conflicts between the installed versions of tensorflow and keras.
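One way to spot this kind of conflict is to compare the `tensorflow` and `keras` entries in a `pip freeze` listing. The sketch below assumes the two packages are expected to share the same major.minor version, which is an assumption about the TensorFlow 2.x releases discussed here rather than something stated in this PR. The helper names are hypothetical and not part of SLEAP.

```python
from typing import Dict, Optional


def parse_freeze(text: str) -> Dict[str, str]:
    """Map lowercased package names to versions from `pip freeze`-style lines."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a plain `name==version` pin.
        if line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def major_minor(version: str) -> str:
    """Reduce a version such as '2.10.1' to its '2.10' prefix."""
    return ".".join(version.split(".")[:2])


def tf_keras_mismatch(freeze_text: str) -> Optional[str]:
    """Return a message if tensorflow and keras differ in major.minor, else None.

    Assumption: matching major.minor versions indicate a compatible pair.
    """
    versions = parse_freeze(freeze_text)
    tf_version = versions.get("tensorflow")
    keras_version = versions.get("keras")
    if tf_version is None or keras_version is None:
        return None  # Nothing to compare.
    if major_minor(tf_version) != major_minor(keras_version):
        return f"tensorflow {tf_version} does not match keras {keras_version}"
    return None
```

Passing the output of `pip freeze` to `tf_keras_mismatch` returns `None` when the pair lines up and a short message otherwise. Packages such as `Keras-Applications` are ignored because only the exact name `keras` is looked up.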

Types of changes

  • Bugfix
  • New feature
  • Refactor / Code style update (no logical changes)
  • Build / CI changes
  • Documentation Update
  • Other (explain)

Does this address any currently open issues?

Outside contributors checklist

  • Review the guidelines for contributing to this repository
  • Read and sign the CLA and add yourself to the authors list
  • Make sure you are making a pull request against the develop branch (not main). Also you should start your branch off develop
  • Add tests that prove your fix is effective or that your feature works
  • Add necessary documentation (if appropriate)

Thank you for contributing to SLEAP!

❤️

Summary by CodeRabbit

  • Chore: Updated the project's dependencies in pypi_requirements.txt to ensure compatibility across different platforms and machine types.
  • Bug Fix: Added specific version constraints for python-rapidjson to address a Microsoft Visual C++ requirement error on Windows.
  • New Feature: Introduced support for TensorFlow on Apple Silicon machines by adding tensorflow-macos and tensorflow-metal dependencies.
  • Chore: Added and updated version constraints for several dependencies including tensorflow, tensorflow-hub, urllib3, google-auth, and protobuf to maintain compatibility and stability.

@roomrys roomrys marked this pull request as ready for review September 11, 2023 22:05
@coderabbitai (bot) commented Sep 11, 2023

Walkthrough

The changes primarily focus on updating and refining the project's dependencies in pypi_requirements.txt. The modifications ensure compatibility across different platforms and machine types, with specific attention to TensorFlow versions and Apple Silicon support. Additionally, version constraints have been added or updated for several packages to maintain stability and avoid potential conflicts.

Changes

File: pypi_requirements.txt
Summary: Updated and refined dependencies. Added platform-specific constraints for TensorFlow and TensorFlow-Hub. Introduced support for Apple Silicon with TensorFlow-Macos and TensorFlow-Metal. Adjusted version constraints for various packages including Python-Rapidjson, Protobuf, and others.

🐇💻

"In the land of code, where the shadows lie,

We update, refine, and rectify.

With each change, a new dawn begins,

A tale of versions, of losses and wins.

Across platforms wide, and machines so varied,

Our code shall run, no longer harried.

So here's to the changes, big and small,

For in the end, they benefit us all." 🎉🥕



@roomrys roomrys mentioned this pull request Sep 11, 2023
11 tasks

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between e4fca4f and a4d2da8 commits.
Files selected for processing (1)
  • pypi_requirements.txt (1 hunks)
Files skipped from review due to trivial changes (1)
  • pypi_requirements.txt


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between a4d2da8 and 3ec76be commits.
Files selected for processing (1)
  • pypi_requirements.txt (1 hunks)
Additional comments (Suppressed): 4
pypi_requirements.txt (4)
  • 34-35: The version constraints for tensorflow have been updated to handle different platforms and machine types. Ensure that these changes do not introduce any compatibility issues with other parts of the codebase that use tensorflow.

  • 36-36: The version constraint for tensorflow-hub has been updated. Verify if this change is compatible with the rest of the project.

  • 37-38: tensorflow-macos and tensorflow-metal are new dependencies added specifically for macOS arm64 machines. Make sure that these additions do not cause any conflicts or issues on other platforms.

  • 42-43: A new dependency on Protobuf has been added due to requirements from tensorboard and tensorflow. This could potentially lead to version conflicts if other parts of the project depend on a different version of Protobuf. Please verify this.
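The platform split described in these comments can be sketched as a small selection function. The package names come from the comments above. The exact markers and version pins used in `pypi_requirements.txt` are not shown in this conversation, so the conditions below (`sys.platform == "darwin"` together with an `arm64` machine) are an assumption about how such a split is commonly expressed, and the function name is hypothetical.

```python
import platform
import sys
from typing import List, Optional


def tensorflow_packages(
    sys_platform: Optional[str] = None, machine: Optional[str] = None
) -> List[str]:
    """Pick the TensorFlow distribution names for a platform.

    Assumption: macOS on arm64 gets tensorflow-macos and tensorflow-metal,
    and every other platform gets the regular tensorflow package.
    """
    # Fall back to the running interpreter's values when none are given.
    sys_platform = sys.platform if sys_platform is None else sys_platform
    machine = platform.machine() if machine is None else machine
    if sys_platform == "darwin" and machine == "arm64":
        return ["tensorflow-macos", "tensorflow-metal"]
    return ["tensorflow"]
```

Calling `tensorflow_packages()` with no arguments reports the choice for the current machine, which can help when checking that the Apple Silicon packages would not be selected on other platforms.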


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between 3ec76be and bd8e335 commits.
Files selected for processing (1)
  • pypi_requirements.txt (2 hunks)
Files skipped from review due to trivial changes (1)
  • pypi_requirements.txt


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between bd8e335 and 1f69080 commits.
Files selected for processing (1)
  • pypi_requirements.txt (2 hunks)
Files skipped from review due to trivial changes (1)
  • pypi_requirements.txt

@roomrys (Collaborator, Author) commented Sep 13, 2023

Windows 10 w/ tensorflow 2.10

The Windows PyPI package fails during inference with the following errors:

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
2023-09-13 10:39:49.109423: W tensorflow/core/framework/op_kernel.cc:1768] UNKNOWN: JIT compilation failed.

I think we will have to give instructions for finding the CUDA_DIR after all...
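A possible starting point for such instructions is to search likely CUDA roots for the `nvvm/libdevice` directory named in the error. The candidate locations below (the `CUDA_PATH`, `CUDA_HOME`, and `CONDA_PREFIX` environment variables) are assumptions about common install layouts, and the function names are hypothetical. `XLA_FLAGS=--xla_gpu_cuda_data_dir=...` is the setting XLA uses to locate a CUDA data directory; whether it resolves this particular failure was not verified in this thread.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def find_cuda_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that contains nvvm/libdevice, else None."""
    for candidate in candidates:
        if not candidate:
            continue  # Skip unset environment variables.
        root = Path(candidate)
        if (root / "nvvm" / "libdevice").is_dir():
            return root
    return None


def default_candidates() -> List[str]:
    """Assumed locations to try; adjust these for the actual install."""
    return [os.environ.get(name, "") for name in ("CUDA_PATH", "CUDA_HOME", "CONDA_PREFIX")]


if __name__ == "__main__":
    cuda_dir = find_cuda_dir(default_candidates())
    if cuda_dir is None:
        print("No nvvm/libdevice directory found in the candidate locations.")
    else:
        # Print the setting rather than applying it, so the user can review it.
        print(f"XLA_FLAGS=--xla_gpu_cuda_data_dir={cuda_dir}")
```

The script only prints a suggested setting. It does not change the environment, so it is safe to run before deciding how to document the CUDA_DIR lookup.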

Full Traceback
Using already trained model for centroid: C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\models\230323_125426.centroid.n=101\training_config.json
Using already trained model for centered_instance: C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\models\230522_142651.centered_instance.n=101\training_config.json
Command line call:
sleap-track C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/courtship_labels.slp --only-suggested-frames -m C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\models\230323_125426.centroid.n=101\training_config.json -m C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\models\230522_142651.centered_instance.n=101\training_config.json --tracking.tracker none -o C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\predictions\courtship_labels.slp.230913_103927.predictions.slp --verbosity json --no-empty-frames

Started inference at: 2023-09-13 10:39:31.701518
Args:
{
    'data_path': 'C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/courtship_labels.slp',
    'models': [
        'C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\\models\\230323_125426.centroid.n=101\\training_config.json',
        'C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\\models\\230522_142651.centered_instance.n=101\\training_config.json'
    ],
    'frames': '',
    'only_labeled_frames': False,
    'only_suggested_frames': True,
    'output': 'C:/Users/Liezl/Projects/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship\\predictions\\courtship_labels.slp.230913_103927.predictions.slp',
    'no_empty_frames': True,
    'verbosity': 'json',
    'video.dataset': None,
    'video.input_format': 'channels_last',
    'video.index': '',
    'cpu': False,
    'first_gpu': False,
    'last_gpu': False,
    'gpu': 'auto',
    'max_edge_length_ratio': 0.25,
    'dist_penalty_weight': 1.0,
    'batch_size': 4,
2023-09-13 10:39:32.960204: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    'open_in_gui': False,
    'peak_threshold': 0.2,
    'max_instances': None,
    'tracking.tracker': 'none',
    'tracking.max_tracking': None,
    'tracking.max_tracks': None,
    'tracking.target_instance_count': None,
    'tracking.pre_cull_to_target': None,
    'tracking.pre_cull_iou_threshold': None,
    'tracking.post_connect_single_breaks': None,
2023-09-13 10:39:33.478075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9601 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6
    'tracking.clean_instance_count': None,
    'tracking.clean_iou_threshold': None,
    'tracking.similarity': None,
    'tracking.match': None,
    'tracking.robust': None,
    'tracking.track_window': None,
    'tracking.min_new_track_points': None,
    'tracking.min_match_points': None,
    'tracking.img_scale': None,
    'tracking.of_window_size': None,
    'tracking.of_max_levels': None,
    'tracking.save_shifted_instances': None,
    'tracking.kf_node_indices': None,
    'tracking.kf_init_frame_count': None
}

INFO:sleap.nn.inference:Auto-selected GPU 0 with 11560 MiB of free memory.
2023-09-13 10:39:39.541627: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -45 } dim { size: -46 } dim { size: -47 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -15 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 3060" frequency: 1777 num_cores: 28 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 2359296 shared_memory_size_per_multiprocessor: 102400 memory_size: 10067378176 bandwidth: 360048000 } outputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: -48 } dim { size: -49 } dim { size: 1 } } }
2023-09-13 10:39:39.544449: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1024 } dim { size: 1024 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -15 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 3600 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 524288 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -15 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }    
2023-09-13 10:39:39.549962: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -90 } dim { size: -91 } dim { size: -92 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -20 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -20 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 3060" frequency: 1777 num_cores: 28 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 2359296 shared_memory_size_per_multiprocessor: 102400 memory_size: 10067378176 bandwidth: 360048000 } outputs { dtype: DT_FLOAT shape { dim { size: -20 } dim { size: -94 } dim { size: -95 } dim { size: 1 } } }
2023-09-13 10:39:41.003211: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8201
2023-09-13 10:39:47.575359: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
2023-09-13 10:39:49.109423: W tensorflow/core/framework/op_kernel.cc:1768] UNKNOWN: JIT compilation failed.
Traceback (most recent call last):
  File "C:\Users\Liezl\Projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\Scripts\sleap-track-script.py", line 11, in <module>
Versions:
    load_entry_point('sleap==1.3.2', 'console_scripts', 'sleap-track')()
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 5424, in main
    labels_pr = predictor.predict(provider)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 526, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2633, in _make_labeled_frames_from_generator
    for ex in generator:
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 457, in _predict_generator     
    ex = process_batch(ex)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 399, in process_batch
    preds = self.inference_model.predict_on_batch(ex, numpy=True)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 1069, in predict_on_batch      
    outs = super().predict_on_batch(data, **kwargs)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 2474, in predict_on_batch   
    outputs = self.predict_function(iterator)
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\tensorflow\python\eager\execute.py", line 55, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:     

Detected at node 'mod' defined at (most recent call last):
    File "C:\Users\Liezl\Projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\Scripts\sleap-track-script.py", line 11, in <module>
      load_entry_point('sleap==1.3.2', 'console_scripts', 'sleap-track')()       
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 5424, in main
      labels_pr = predictor.predict(provider)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 526, in predict
      self._make_labeled_frames_from_generator(generator, data)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2633, in _make_labeled_frames_from_generator
      for ex in generator:
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 457, in _predict_generator   
      ex = process_batch(ex)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 399, in process_batch        
      preds = self.inference_model.predict_on_batch(ex, numpy=True)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 1069, in predict_on_batch    
      outs = super().predict_on_batch(data, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 2474, in predict_on_batch 
      outputs = self.predict_function(iterator)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 2041, in predict_function 
      return step_function(self, iterator)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 2027, in step_function    
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 2015, in run_step
      outputs = model.predict_step(data)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 1983, in predict_step     
      return self(x, training=False)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\training.py", line 557, in __call__
      return super().__call__(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__       
      outputs = call_fn(inputs, *args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2256, in call
      if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2265, in call
      peaks_output = self.instance_peaks(crop_output)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__       
      outputs = call_fn(inputs, *args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2101, in call
      if self.offsets_ind is None:
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\inference.py", line 2103, in call
      peak_points, peak_vals = sleap.nn.peak_finding.find_global_peaks(
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\peak_finding.py", line 366, in find_global_peaks 
      rough_peaks, peak_vals = find_global_peaks_rough(
    File "c:\users\liezl\projects\sleap-estimates-animal-poses\pull-requests\sleap\v0\lib\site-packages\sleap\nn\peak_finding.py", line 224, in find_global_peaks_rough
      channel_subs = tf.range(total_peaks, dtype=tf.int64) % channels
Node: 'mod'
JIT compilation failed.
         [[{{node mod}}]] [Op:__inference_predict_function_5208]
pip freeze
absl-py==1.4.0
astunparse==1.6.3
attrs==21.4.0
backports.zoneinfo==0.2.1
cached-property==1.5.2
cachetools==5.3.1
cattrs==1.1.1
certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
commonmark==0.9.1
cycler==0.11.0
efficientnet==1.0.0
flatbuffers==23.5.26
fonttools==4.38.0
gast==0.4.0
google-auth==2.23.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.58.0
h5py==3.1.0
hdmf==3.6.1
idna==3.4
image-classifiers==1.0.0
imageio==2.15.0
imgaug==0.4.0
imgstore==0.2.9
importlib-metadata==4.2.0
importlib-resources==5.12.0
joblib==1.3.2
jsmin==3.0.1
jsonpickle==1.2
jsonschema==4.17.3
keras==2.10.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.5
libclang==16.0.6
Markdown==3.4.4
MarkupSafe==2.1.3
matplotlib==3.5.3
ndx-pose==0.1.1
networkx==2.6.3
nixio==1.5.3
numpy==1.21.6
oauthlib==3.2.2
opencv-python==4.5.5.64
opt-einsum==3.3.0
packaging==23.1
pandas==1.3.5
Pillow==8.4.0
pkgutil-resolve-name==1.3.10
protobuf==3.19.6
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
Pygments==2.16.1
pykalman==0.9.5
pynwb==2.3.3
pyparsing==3.1.1
pyrsistent==0.19.3
PySide2==5.14.1
python-dateutil==2.8.2
python-rapidjson==1.10
pytz==2023.3.post1
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==25.1.1
qimage2ndarray==1.10.0
QtPy==2.4.0
requests==2.31.0
requests-oauthlib==1.3.1
rich==10.16.1
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
scikit-image==0.19.3
scikit-learn==1.0.2
scikit-video==1.1.11
scipy==1.7.3
seaborn==0.12.2
segmentation-models==1.0.1
shapely==2.0.1
shiboken2==5.14.1
six==1.16.0
sleap==1.3.2
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.10.1
tensorflow-estimator==2.10.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.3.0
threadpoolctl==3.1.0
tifffile==2021.11.2
typing-extensions==4.7.1
tzdata==2023.3
tzlocal==5.0.1
urllib3==1.26.16
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between 1f69080 and 9ad9e6b commits.
Files selected for processing (1)
  • pypi_requirements.txt (2 hunks)
Files skipped from review due to trivial changes (1)
  • pypi_requirements.txt

@roomrys (Collaborator, Author) commented Sep 13, 2023

I was able to get inference working on Windows by downgrading to tensorflow<2.9. I tried two different conda environments, one including cuda-nvcc:

mamba create -y -c conda-forge -c nvidia pip python=3.7.12 cudatoolkit=11.3.1 cudnn=8.2.1 cuda-nvcc=11.3 -n env0

and one excluding cuda-nvcc:

mamba create -y -c conda-forge -c nvidia pip python=3.7.12 cudatoolkit=11.3.1 cudnn=8.2.1 -n env1

I then pip installed into each environment. Both environments worked, but the one excluding cuda-nvcc logged the following errors (although inference ran successfully):

2023-09-13 15:00:35.826160: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8201
2023-09-13 15:00:36.755582: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2023-09-13 15:00:36.756557: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2023-09-13 15:00:36.756881: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas.exe --version
2023-09-13 15:00:36.759495: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2023-09-13 15:00:36.760083: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
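The warnings above say that TensorFlow could not launch `ptxas.exe`, which fits the environment that omits cuda-nvcc. A quick check of whether `ptxas` is reachable on the search path might look like the following sketch. The function name is hypothetical, and it only reports what `shutil.which` finds.

```python
import shutil
from typing import Optional


def locate_ptxas(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of ptxas if it is on the search path, else None.

    When `path` is None, the PATH environment variable is searched.
    """
    return shutil.which("ptxas", path=path)


if __name__ == "__main__":
    found = locate_ptxas()
    if found is None:
        print("ptxas was not found on PATH.")
    else:
        print(f"ptxas found at {found}")
```

Running this inside each environment before starting inference shows whether the fallback to driver-side PTX compilation is likely to be triggered.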

I also ran pip install .[pypi] inside a venv (created from the Python 3.7.12 in my "including cuda-nvcc" environment) and was able to run training and inference on the GPU with no errors.

conda list
# Name                    Version                   Build  Channel
absl-py                   1.4.0                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
attrs                     21.4.0                   pypi_0    pypi
backports-zoneinfo        0.2.1                    pypi_0    pypi
ca-certificates           2023.7.22            h56e8100_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
cachetools                5.3.1                    pypi_0    pypi
cattrs                    1.1.1                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
commonmark                0.9.1                    pypi_0    pypi
cuda-nvcc                 11.3.58              hb8d16a4_0    nvidia
cudatoolkit               11.3.1              hf2f0253_12    conda-forge
cudnn                     8.2.1.32             h754d62a_0    conda-forge
cycler                    0.11.0                   pypi_0    pypi
efficientnet              1.0.0                    pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
fonttools                 4.38.0                   pypi_0    pypi
gast                      0.5.4                    pypi_0    pypi
google-auth               2.23.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.58.0                   pypi_0    pypi
h5py                      3.1.0                    pypi_0    pypi
hdmf                      3.6.1                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
image-classifiers         1.0.0                    pypi_0    pypi
imageio                   2.15.0                   pypi_0    pypi
imgaug                    0.4.0                    pypi_0    pypi
imgstore                  0.2.9                    pypi_0    pypi
importlib-metadata        4.2.0                    pypi_0    pypi
importlib-resources       5.12.0                   pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jsmin                     3.0.1                    pypi_0    pypi
jsonpickle                1.2                      pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
keras                     2.8.0                    pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
libclang                  16.0.6                   pypi_0    pypi
libsqlite                 3.43.0               hcfcfb64_0    conda-forge
markdown                  3.3.4                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.5.3                    pypi_0    pypi
ndx-pose                  0.1.1                    pypi_0    pypi
networkx                  2.6.3                    pypi_0    pypi
nixio                     1.5.3                    pypi_0    pypi
numpy                     1.21.6                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencv-python             4.5.5.64                 pypi_0    pypi
openssl                   3.1.2                hcfcfb64_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
pillow                    8.4.0                    pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10                   pypi_0    pypi
protobuf                  3.19.6                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pygments                  2.16.1                   pypi_0    pypi
pykalman                  0.9.5                    pypi_0    pypi
pynwb                     2.3.3                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
pyside2                   5.14.1                   pypi_0    pypi
python                    3.7.12          h900ac77_100_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python-rapidjson          1.10                     pypi_0    pypi
pytz                      2023.3.post1             pypi_0    pypi
pywavelets                1.3.0                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.1                   pypi_0    pypi
qimage2ndarray            1.10.0                   pypi_0    pypi
qtpy                      2.4.0                    pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rich                      10.16.1                  pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
ruamel-yaml               0.17.32                  pypi_0    pypi
ruamel-yaml-clib          0.2.7                    pypi_0    pypi
scikit-image              0.19.3                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scikit-video              1.1.11                   pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
seaborn                   0.12.2                   pypi_0    pypi
segmentation-models       1.0.1                    pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
shapely                   2.0.1                    pypi_0    pypi
shiboken2                 5.14.1                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sleap                     1.3.2                    pypi_0    pypi
sqlite                    3.43.0               hcfcfb64_0    conda-forge
tensorboard               2.8.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                2.8.4                    pypi_0    pypi
tensorflow-estimator      2.8.0                    pypi_0    pypi
tensorflow-hub            0.14.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.31.0                   pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2021.11.2                pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
tzlocal                   5.0.1                    pypi_0    pypi
ucrt                      10.0.22621.0         h57928b3_0    conda-forge
urllib3                   1.26.16                  pypi_0    pypi
vc                        14.3                h64f974e_17    conda-forge
vc14_runtime              14.36.32532         hdcecf7f_17    conda-forge
vs2015_runtime            14.36.32532         h05e6639_17    conda-forge
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0                   pypi_0    pypi
zipp                      3.15.0                   pypi_0    pypi

I also ran pip install .[pypi] inside a simple Python-only environment:

mamba create -y -c conda-forge pip python=3.7.12 -n env2

and was able to run training/inference without utilizing the GPU.

conda list
# Name                    Version                   Build  Channel
absl-py                   1.4.0                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
attrs                     21.4.0                   pypi_0    pypi
backports-zoneinfo        0.2.1                    pypi_0    pypi
ca-certificates           2023.7.22            h56e8100_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
cachetools                5.3.1                    pypi_0    pypi
cattrs                    1.1.1                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
commonmark                0.9.1                    pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
efficientnet              1.0.0                    pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
fonttools                 4.38.0                   pypi_0    pypi
gast                      0.5.4                    pypi_0    pypi
google-auth               2.23.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.58.0                   pypi_0    pypi
h5py                      3.1.0                    pypi_0    pypi
hdmf                      3.6.1                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
image-classifiers         1.0.0                    pypi_0    pypi
imageio                   2.15.0                   pypi_0    pypi
imgaug                    0.4.0                    pypi_0    pypi
imgstore                  0.2.9                    pypi_0    pypi
importlib-metadata        4.2.0                    pypi_0    pypi
importlib-resources       5.12.0                   pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jsmin                     3.0.1                    pypi_0    pypi
jsonpickle                1.2                      pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
keras                     2.8.0                    pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
libclang                  16.0.6                   pypi_0    pypi
libsqlite                 3.43.0               hcfcfb64_0    conda-forge
markdown                  3.3.4                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.5.3                    pypi_0    pypi
ndx-pose                  0.1.1                    pypi_0    pypi
networkx                  2.6.3                    pypi_0    pypi
nixio                     1.5.3                    pypi_0    pypi
numpy                     1.21.6                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencv-python             4.5.5.64                 pypi_0    pypi
openssl                   3.1.2                hcfcfb64_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
pillow                    8.4.0                    pypi_0    pypi
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10                   pypi_0    pypi
protobuf                  3.19.6                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pygments                  2.16.1                   pypi_0    pypi
pykalman                  0.9.5                    pypi_0    pypi
pynwb                     2.3.3                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
pyside2                   5.14.1                   pypi_0    pypi
python                    3.7.12          h900ac77_100_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python-rapidjson          1.10                     pypi_0    pypi
pytz                      2023.3.post1             pypi_0    pypi
pywavelets                1.3.0                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.1                   pypi_0    pypi
qimage2ndarray            1.10.0                   pypi_0    pypi
qtpy                      2.4.0                    pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rich                      10.16.1                  pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
ruamel-yaml               0.17.32                  pypi_0    pypi
ruamel-yaml-clib          0.2.7                    pypi_0    pypi
scikit-image              0.19.3                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scikit-video              1.1.11                   pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
seaborn                   0.12.2                   pypi_0    pypi
segmentation-models       1.0.1                    pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
shapely                   2.0.1                    pypi_0    pypi
shiboken2                 5.14.1                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sleap                     1.3.2                    pypi_0    pypi
sqlite                    3.43.0               hcfcfb64_0    conda-forge
tensorboard               2.8.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                2.8.4                    pypi_0    pypi
tensorflow-estimator      2.8.0                    pypi_0    pypi
tensorflow-hub            0.14.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.31.0                   pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2021.11.2                pypi_0    pypi
typing-extensions         4.7.1                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
tzlocal                   5.0.1                    pypi_0    pypi
ucrt                      10.0.22621.0         h57928b3_0    conda-forge
urllib3                   1.26.16                  pypi_0    pypi
vc                        14.3                h64f974e_17    conda-forge
vc14_runtime              14.36.32532         hdcecf7f_17    conda-forge
vs2015_runtime            14.36.32532         h05e6639_17    conda-forge
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0                   pypi_0    pypi
zipp                      3.15.0                   pypi_0    pypi
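A quick way to confirm from a listing like the ones above that tensorflow and keras resolved to the same release series is sketched below. This is only an illustration: the helper names (`parse_conda_list`, `major_minor`, `same_release_series`) are hypothetical and not part of SLEAP, and it assumes the conflict this PR addresses would show up as a mismatch in the major.minor version of the two packages.

```python
from typing import Dict, Tuple


def parse_conda_list(text: str) -> Dict[str, str]:
    """Map package name -> version from whitespace-separated `conda list` rows."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "# Name  Version ..." header row.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        versions[fields[0]] = fields[1]
    return versions


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.8.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_release_series(versions: Dict[str, str],
                        a: str = "tensorflow",
                        b: str = "keras") -> bool:
    """True when both packages share the same major.minor version."""
    return major_minor(versions[a]) == major_minor(versions[b])


# Rows copied from the listing above.
listing = """
# Name                    Version                   Build  Channel
keras                     2.8.0                    pypi_0    pypi
tensorflow                2.8.4                    pypi_0    pypi
"""
print(same_release_series(parse_conda_list(listing)))
```

With the rows shown (tensorflow 2.8.4 and keras 2.8.0) the check prints `True`; a listing where pip had pulled a newer keras would print `False`.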

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits
Files that changed from the base of the PR and between 9ad9e6b and 8db46b4 commits.
Files selected for processing (1)
  • pypi_requirements.txt (2 hunks)
Files skipped from review due to trivial changes (1)
  • pypi_requirements.txt

@roomrys
Collaborator Author

roomrys commented Sep 14, 2023

Seeing as the virtual environments still have access to the conda-installed libraries, I am redoing some tests on Linux:

Environment with python, cudatoolkit, and cudnn

 mm create --name w0 pip python=3.7.12 cudatoolkit=11.3 cudnn=8.2

then pip install .[pypi] directly into that environment.

Running sleap-label does not automatically find the GPUs; I need to set LD_LIBRARY_PATH manually for them to be detected. Running training then fails with the following errors:

2023-09-14 08:17:07.657595: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-14 08:17:07.668602: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-09-14 08:17:07.669125: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.669220: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-09-14 08:17:07.669298: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
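The warnings above say XLA looks for `${CUDA_DIR}/nvvm/libdevice` and suggest setting `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`. A rough sketch of automating that lookup is below. The function names `find_cuda_data_dir` and `xla_env` are hypothetical, and whether the environment created above actually ships `libdevice.10.bc` in that layout is an assumption that was not verified here.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def find_cuda_data_dir(prefix: str) -> Optional[str]:
    """Return a directory D under `prefix` where D/nvvm/libdevice/libdevice.10.bc exists."""
    for path in sorted(Path(prefix).rglob("libdevice.10.bc")):
        # Only accept the <cuda_dir>/nvvm/libdevice/ layout named in the warning.
        if path.parent.name == "libdevice" and path.parent.parent.name == "nvvm":
            return str(path.parent.parent.parent)
    return None


def xla_env(prefix: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and append --xla_gpu_cuda_data_dir if libdevice is found."""
    env = dict(os.environ if base_env is None else base_env)
    cuda_dir = find_cuda_data_dir(prefix)
    if cuda_dir is not None:
        flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
        # Preserve any XLA flags that were already set.
        env["XLA_FLAGS"] = (env.get("XLA_FLAGS", "") + " " + flag).strip()
    return env


if __name__ == "__main__":
    # CONDA_PREFIX is set by conda/mamba when an environment is active.
    prefix = os.environ.get("CONDA_PREFIX", ".")
    print(xla_env(prefix).get("XLA_FLAGS", "libdevice not found under " + prefix))
```

The returned mapping could be passed as `env=` to `subprocess.run` when launching training. If nothing is found, the environment is left unchanged, which would match the failure shown in the log.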
Full traceback
INFO:sleap.nn.training:
2023-09-14 08:16:54.606234: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:54.610455: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:54.611185: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
INFO:sleap.nn.training:Auto-selected GPU 0 with 23473 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: /home/talmolab/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/drosophila-melanogaster-courtship/courtship_labels.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 90 / Validation = 10.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2023-09-14 08:16:55.460776: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-14 08:16:55.461344: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.462209: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.462956: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.758043: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.758799: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.759477: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 08:16:55.760127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21554 MB memory:  -> device: 0, name: NVIDIA RTX A5000, pci bus id: 0000:01:00.0, compute capability: 8.6
INFO:sleap.nn.training:Loaded test example. [1.434s]
INFO:sleap.nn.training:  Input shape: (512, 512, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 1,953,105
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CentroidConfmapsHead(anchor_part='thorax', sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs: 
INFO:sleap.nn.training:    [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 256, 256, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'")
INFO:sleap.nn.training:Training from scratch
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 90
INFO:sleap.nn.training:Validation set: n = 10
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: )
INFO:sleap.nn.training:  ZMQ controller subcribed to: tcp://127.0.0.1:9000
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set
INFO:sleap.nn.training:  ZMQ progress reporter publish on: tcp://127.0.0.1:9001
INFO:sleap.nn.training:Created run path: /home/talmolab/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/drosophila-melanogaster-courtship/models/230914_081652.centroid.n=100
INFO:sleap.nn.training:Setting up visualization...
2023-09-14 08:16:58.267206: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A5000" frequency: 1695 num_cores: 64 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 102400 memory_size: 22601400320 bandwidth: 768096000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
2023-09-14 08:16:59.090655: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A5000" frequency: 1695 num_cores: 64 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 102400 memory_size: 22601400320 bandwidth: 768096000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
INFO:sleap.nn.training:Finished trainer set up. [3.7s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [3.8s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/2
2023-09-14 08:17:04.871161: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:428] Loaded cuDNN version 8201
2023-09-14 08:17:05.765371: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-09-14 08:17:05.765853: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-09-14 08:17:05.765879: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-09-14 08:17:05.766411: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-09-14 08:17:05.766476: W tensorflow/compiler/xla/stream_executor/gpu/redzone_allocator.cc:318] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2023-09-14 08:17:07.655116: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7f19a1d00690 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-09-14 08:17:07.655136: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA RTX A5000, Compute Capability 8.6
2023-09-14 08:17:07.657595: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-14 08:17:07.668602: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-09-14 08:17:07.669125: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.669220: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-09-14 08:17:07.669298: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.682158: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.682370: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.748199: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.748372: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.761001: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.761240: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.850496: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.850682: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.862536: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.862739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.948463: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.948639: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:07.960910: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:07.961108: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.237140: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.237317: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.249632: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.249831: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.326741: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.326956: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.339581: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.339800: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.523160: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.523343: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.535760: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.535985: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.595202: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.595381: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.606972: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.607144: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.647198: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.647379: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.659597: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.659800: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.673450: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.673647: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.686303: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.686580: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.736072: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.736290: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.748625: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.748846: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.761664: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.761849: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.774337: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.774532: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.828803: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.828981: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.841308: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.841520: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.854495: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.854732: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.867139: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.867364: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.910598: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.910789: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:08.923174: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:08.923376: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:09.006807: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:09.006985: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:09.019378: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:09.019604: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:09.040892: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:09.041088: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-09-14 08:17:09.053238: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-09-14 08:17:09.053464: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "/home/talmolab/micromamba/envs/w0/bin/sleap-train", line 8, in <module>
    sys.exit(main())
  File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
    trainer.train()
  File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
    verbose=2,
  File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_33' defined at (most recent call last):
    File "/home/talmolab/micromamba/envs/w0/bin/sleap-train", line 8, in <module>
      sys.exit(main())
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
      trainer.train()
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
      verbose=2,
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1169, in _internal_apply_gradients
      grads_and_vars,
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1217, in _distributed_apply_gradients_fn
      var, apply_grad_to_update_var, args=(grad,), group=False
    File "/home/talmolab/micromamba/envs/w0/lib/python3.7/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_33'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_33}}]] [Op:__inference_train_function_11726]
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: /home/talmolab/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/drosophila-melanogaster-courtship/models/230914_081652.centroid.n=100
Resetting monitor window.
pip freeze
absl-py==1.4.0
astunparse==1.6.3
attrs==21.4.0
backports.zoneinfo==0.2.1
cachetools==5.3.1
cattrs==1.1.1
certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
commonmark==0.9.1
cycler==0.11.0
efficientnet==1.0.0
flatbuffers==23.5.26
fonttools==4.38.0
gast==0.4.0
google-auth==2.23.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.58.0
h5py==3.8.0
hdmf==3.6.1
idna==3.4
image-classifiers==1.0.0
imageio==2.15.0
imgaug==0.4.0
imgstore==0.2.9
importlib-metadata==4.2.0
importlib-resources==5.12.0
joblib==1.3.2
jsmin==3.0.1
jsonpickle==1.2
jsonschema==4.17.3
keras==2.11.0
Keras-Applications==1.0.8
kiwisolver==1.4.5
libclang==16.0.6
Markdown==3.3.4
MarkupSafe==2.1.3
matplotlib==3.5.3
ndx-pose==0.1.1
networkx==2.6.3
nixio==1.5.3
numpy==1.21.6
oauthlib==3.2.2
opencv-python==4.5.5.64
opt-einsum==3.3.0
packaging==23.1
pandas==1.3.5
Pillow==8.4.0
pkgutil_resolve_name==1.3.10
protobuf==3.19.6
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
Pygments==2.16.1
pykalman==0.9.5
pynwb==2.3.3
pyparsing==3.1.1
pyrsistent==0.19.3
PySide2==5.14.1
python-dateutil==2.8.2
python-rapidjson==1.11
pytz==2023.3.post1
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==25.1.1
qimage2ndarray==1.10.0
QtPy==2.4.0
requests==2.31.0
requests-oauthlib==1.3.1
rich==10.16.1
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
scikit-image==0.19.3
scikit-learn==1.0.2
scikit-video==1.1.11
scipy==1.7.3
seaborn==0.12.2
segmentation-models==1.0.1
shapely==2.0.1
shiboken2==5.14.1
six==1.16.0
sleap @ file:///home/talmolab/sleap-estimates-animal-poses/pull-requests/sleap
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.34.0
termcolor==2.3.0
threadpoolctl==3.1.0
tifffile==2021.11.2
typing_extensions==4.7.1
tzlocal==5.0.1
urllib3==1.26.16
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0
micromamba list
  Name              Version    Build                 Channel    
──────────────────────────────────────────────────────────────────
  _libgcc_mutex     0.1        conda_forge           conda-forge
  _openmp_mutex     4.5        2_gnu                 conda-forge
  ca-certificates   2023.7.22  hbcca054_0            conda-forge
  cudatoolkit       11.3.1     hb98b00a_12           conda-forge
  cudnn             8.2.1.32   h86fa8c9_0            conda-forge
  ld_impl_linux-64  2.40       h41732ed_0            conda-forge
  libffi            3.4.2      h7f98852_5            conda-forge
  libgcc-ng         13.2.0     h807b86a_0            conda-forge
  libgomp           13.2.0     h807b86a_0            conda-forge
  libnsl            2.0.0      h7f98852_0            conda-forge
  libsqlite         3.43.0     h2797004_0            conda-forge
  libstdcxx-ng      13.2.0     h7e041cc_0            conda-forge
  libzlib           1.2.13     hd590300_5            conda-forge
  ncurses           6.4        hcb278e6_0            conda-forge
  openssl           3.1.2      hd590300_0            conda-forge
  pip               23.2.1     pyhd8ed1ab_0          conda-forge
  python            3.7.12     hf930737_100_cpython  conda-forge
  readline          8.2        h8228510_1            conda-forge
  setuptools        68.2.2     pyhd8ed1ab_0          conda-forge
  sqlite            3.43.0     h2c6b66d_0            conda-forge
  tk                8.6.12     h27826a3_0            conda-forge
  wheel             0.41.2     pyhd8ed1ab_0          conda-forge
  xz                5.2.6      h166bdaf_0            conda-forge

Downgrading from tensorflow<=2.11 to tensorflow<=2.9

Using the same environment setup as above (swapping tensorflow versions).
sleap-label detects the GPUs automatically. Training runs on the GPU and passes, but inference fails with the same error seen on Windows with tensorflow>2.9:

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
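
Both failures point at the same missing file: the tensorflow<=2.11 run looks for ./libdevice.10.bc, and this run looks under ${CUDA_DIR}/nvvm/libdevice. One way to check whether the file exists anywhere in the environment is to search the environment prefix. This is a minimal sketch, assuming the environment is active so that CONDA_PREFIX is set; `find_libdevice` is a hypothetical helper for illustration, not part of SLEAP or TensorFlow.

```python
import os
from pathlib import Path


def find_libdevice(prefix):
    """Return the sorted paths of every libdevice*.bc file under `prefix`."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    return sorted(root.rglob("libdevice*.bc"))


if __name__ == "__main__":
    # Assumption: run inside the activated environment, so CONDA_PREFIX is set.
    prefix = os.environ.get("CONDA_PREFIX", "")
    hits = find_libdevice(prefix) if prefix else []
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("no libdevice*.bc found under", prefix or "<CONDA_PREFIX unset>")
```

An empty result would suggest the file is not installed in the environment at all; a hit outside an nvvm/libdevice directory would suggest it is installed but not where this TensorFlow build looks for it.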
Full Traceback
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?Polling: /home/talmolab/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/drosophila-melanogaster-courtship/models/230914_084452.centroid.n=100/viz/validation.*.png
Polling: /home/talmolab/sleap-estimates-animal-poses/datasets/drosophila-melanogaster-courtship/drosophila-melanogaster-courtship/models/230914_084539.centered_instance.n=100/viz/validation.*.png
2023-09-14 08:46:06.147329: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1024 } dim { size: 1024 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "103" frequency: 3600 num_cores: 16 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 49152 l2_cache_size: 524288 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -24 } dim { size: -25 } dim { size: 1 } } }
2023-09-14 08:46:06.155804: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -50 } dim { size: -51 } dim { size: -52 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A5000" frequency: 1695 num_cores: 64 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 6291456 shared_memory_size_per_multiprocessor: 102400 memory_size: 22673883136 bandwidth: 768096000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -54 } dim { size: -55 } dim { size: 1 } } }
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
2023-09-14 08:46:06.560091: W tensorflow/core/framework/op_kernel.cc:1733] UNKNOWN: JIT compilation failed.
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% ETA: -:--:-- ?
Traceback (most recent call last):
  File "/home/talmolab/micromamba/envs/w1/bin/sleap-train", line 8, in <module>
    sys.exit(main())
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
    trainer.train()
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 953, in train
    self.evaluate()
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 966, in evaluate
    split_name="train",
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/evals.py", line 744, in evaluate_model
    labels_pr: Labels = predictor.predict(labels_gt, make_labels=True)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
    for ex in generator:
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 436, in _predict_generator
    ex = process_batch(ex)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 399, in process_batch
    preds = self.inference_model.predict_on_batch(ex, numpy=True)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 1069, in predict_on_batch
    outs = super().predict_on_batch(data, **kwargs)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 2230, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 55, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'mod' defined at (most recent call last):
    File "/home/talmolab/micromamba/envs/w1/bin/sleap-train", line 8, in <module>
      sys.exit(main())
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
      trainer.train()
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 953, in train
      self.evaluate()
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 966, in evaluate
      split_name="train",
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/evals.py", line 744, in evaluate_model
      labels_pr: Labels = predictor.predict(labels_gt, make_labels=True)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
      self._make_labeled_frames_from_generator(generator, data)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
      for ex in generator:
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 436, in _predict_generator
      ex = process_batch(ex)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 399, in process_batch
      preds = self.inference_model.predict_on_batch(ex, numpy=True)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 1069, in predict_on_batch
      outs = super().predict_on_batch(data, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 2230, in predict_on_batch
      outputs = self.predict_function(iterator)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1845, in predict_function
      return step_function(self, iterator)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1834, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1823, in run_step
      outputs = model.predict_step(data)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1791, in predict_step
      return self(x, training=False)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2256, in call
      if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2265, in call
      peaks_output = self.instance_peaks(crop_output)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2101, in call
      if self.offsets_ind is None:
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2103, in call
      peak_points, peak_vals = sleap.nn.peak_finding.find_global_peaks(
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/peak_finding.py", line 366, in find_global_peaks
      rough_peaks, peak_vals = find_global_peaks_rough(
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/peak_finding.py", line 224, in find_global_peaks_rough
      channel_subs = tf.range(total_peaks, dtype=tf.int64) % channels
Node: 'mod'
Detected at node 'mod' defined at (most recent call last):
    File "/home/talmolab/micromamba/envs/w1/bin/sleap-train", line 8, in <module>
      sys.exit(main())
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
      trainer.train()
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 953, in train
      self.evaluate()
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/training.py", line 966, in evaluate
      split_name="train",
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/evals.py", line 744, in evaluate_model
      labels_pr: Labels = predictor.predict(labels_gt, make_labels=True)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
      self._make_labeled_frames_from_generator(generator, data)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
      for ex in generator:
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 436, in _predict_generator
      ex = process_batch(ex)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 399, in process_batch
      preds = self.inference_model.predict_on_batch(ex, numpy=True)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 1069, in predict_on_batch
      outs = super().predict_on_batch(data, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 2230, in predict_on_batch
      outputs = self.predict_function(iterator)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1845, in predict_function
      return step_function(self, iterator)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1834, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1823, in run_step
      outputs = model.predict_step(data)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 1791, in predict_step
      return self(x, training=False)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2256, in call
      if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2265, in call
      peaks_output = self.instance_peaks(crop_output)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2101, in call
      if self.offsets_ind is None:
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/inference.py", line 2103, in call
      peak_points, peak_vals = sleap.nn.peak_finding.find_global_peaks(
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/peak_finding.py", line 366, in find_global_peaks
      rough_peaks, peak_vals = find_global_peaks_rough(
    File "/home/talmolab/micromamba/envs/w1/lib/python3.7/site-packages/sleap/nn/peak_finding.py", line 224, in find_global_peaks_rough
      channel_subs = tf.range(total_peaks, dtype=tf.int64) % channels
Node: 'mod'
2 root error(s) found.
  (0) UNKNOWN:  JIT compilation failed.
         [[{{node mod}}]]
         [[top_down_inference_model/find_instance_peaks_1/RaggedFromValueRowIds_1/RowPartitionFromValueRowIds/assert_non_negative/assert_less_equal/Assert/AssertGuard/pivot_f/_159/_387]]
  (1) UNKNOWN:  JIT compilation failed.
         [[{{node mod}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_predict_function_38065]
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
pip freeze
absl-py==1.4.0
astunparse==1.6.3
attrs==21.4.0
backports.zoneinfo==0.2.1
cachetools==5.3.1
cattrs==1.1.1
certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
commonmark==0.9.1
cycler==0.11.0
efficientnet==1.0.0
flatbuffers==1.12
fonttools==4.38.0
gast==0.4.0
google-auth==2.23.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.58.0
h5py==3.8.0
hdmf==3.6.1
idna==3.4
image-classifiers==1.0.0
imageio==2.15.0
imgaug==0.4.0
imgstore==0.2.9
importlib-metadata==4.2.0
importlib-resources==5.12.0
joblib==1.3.2
jsmin==3.0.1
jsonpickle==1.2
jsonschema==4.17.3
keras==2.9.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.5
libclang==16.0.6
Markdown==3.3.4
MarkupSafe==2.1.3
matplotlib==3.5.3
ndx-pose==0.1.1
networkx==2.6.3
nixio==1.5.3
numpy==1.21.6
oauthlib==3.2.2
opencv-python==4.5.5.64
opt-einsum==3.3.0
packaging==23.1
pandas==1.3.5
Pillow==8.4.0
pkgutil_resolve_name==1.3.10
protobuf==3.19.6
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
Pygments==2.16.1
pykalman==0.9.5
pynwb==2.3.3
pyparsing==3.1.1
pyrsistent==0.19.3
PySide2==5.14.1
python-dateutil==2.8.2
python-rapidjson==1.11
pytz==2023.3.post1
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==25.1.1
qimage2ndarray==1.10.0
QtPy==2.4.0
requests==2.31.0
requests-oauthlib==1.3.1
rich==10.16.1
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
scikit-image==0.19.3
scikit-learn==1.0.2
scikit-video==1.1.11
scipy==1.7.3
seaborn==0.12.2
segmentation-models==1.0.1
shapely==2.0.1
shiboken2==5.14.1
six==1.16.0
sleap @ file:///home/talmolab/sleap-estimates-animal-poses/pull-requests/sleap
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.0
tensorflow-estimator==2.9.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.34.0
termcolor==2.3.0
threadpoolctl==3.1.0
tifffile==2021.11.2
typing_extensions==4.7.1
tzlocal==5.0.1
urllib3==1.26.16
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0
micromamba list
  Name              Version    Build                 Channel    
──────────────────────────────────────────────────────────────────
  _libgcc_mutex     0.1        conda_forge           conda-forge
  _openmp_mutex     4.5        2_gnu                 conda-forge
  ca-certificates   2023.7.22  hbcca054_0            conda-forge
  cudatoolkit       11.3.1     hb98b00a_12           conda-forge
  cudnn             8.2.1.32   h86fa8c9_0            conda-forge
  ld_impl_linux-64  2.40       h41732ed_0            conda-forge
  libffi            3.4.2      h7f98852_5            conda-forge
  libgcc-ng         13.2.0     h807b86a_0            conda-forge
  libgomp           13.2.0     h807b86a_0            conda-forge
  libnsl            2.0.0      h7f98852_0            conda-forge
  libsqlite         3.43.0     h2797004_0            conda-forge
  libstdcxx-ng      13.2.0     h7e041cc_0            conda-forge
  libzlib           1.2.13     hd590300_5            conda-forge
  ncurses           6.4        hcb278e6_0            conda-forge
  openssl           3.1.2      hd590300_0            conda-forge
  pip               23.2.1     pyhd8ed1ab_0          conda-forge
  python            3.7.12     hf930737_100_cpython  conda-forge
  readline          8.2        h8228510_1            conda-forge
  setuptools        68.2.2     pyhd8ed1ab_0          conda-forge
  sqlite            3.43.0     h2c6b66d_0            conda-forge
  tk                8.6.12     h27826a3_0            conda-forge
  wheel             0.41.2     pyhd8ed1ab_0          conda-forge
  xz                5.2.6      h166bdaf_0            conda-forge

Downgrading from tensorflow<=2.9 to tensorflow<2.9

Same environment setup (swapping versions of tensorflow). Running sleap-label does not automatically detect GPUs. To detect GPUs, the user must set LD_LIBRARY_PATH. The following commands write an activation script that sets LD_LIBRARY_PATH whenever the environment is activated, then source it so the current terminal picks up the change (they must be run inside the active environment):

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo '#!/bin/sh' >> $CONDA_PREFIX/etc/conda/activate.d/sleap_activate.sh
echo 'export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/sleap_activate.sh
source $CONDA_PREFIX/etc/conda/activate.d/sleap_activate.sh
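
Because the commands above append with `>>`, running them a second time adds duplicate lines to the script. A minimal sketch of a re-runnable variant is below; the function name `write_sleap_activate_hook` is hypothetical (it is not part of SLEAP), and the script contents are the same two lines as above.

```shell
# Hypothetical helper (not part of SLEAP): write the activation hook under a
# given environment prefix. Safe to run more than once.
write_sleap_activate_hook() {
    prefix="$1"
    hook_dir="$prefix/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # '>' truncates the file, so re-running never duplicates lines.
    # Single quotes keep $CONDA_PREFIX unexpanded until the hook is sourced.
    printf '%s\n' \
        '#!/bin/sh' \
        'export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH' \
        > "$hook_dir/sleap_activate.sh"
}
```

Inside the active environment this would be called as `write_sleap_activate_hook "$CONDA_PREFIX"`, followed by the same `source` command as above.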

Running training works. Running inference works.

pip freeze
absl-py==1.4.0
astunparse==1.6.3
attrs==21.4.0
backports.zoneinfo==0.2.1
cachetools==5.3.1
cattrs==1.1.1
certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
commonmark==0.9.1
cycler==0.11.0
efficientnet==1.0.0
flatbuffers==23.5.26
fonttools==4.38.0
gast==0.5.4
google-auth==2.23.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.58.0
h5py==3.8.0
hdmf==3.6.1
idna==3.4
image-classifiers==1.0.0
imageio==2.15.0
imgaug==0.4.0
imgstore==0.2.9
importlib-metadata==4.2.0
importlib-resources==5.12.0
joblib==1.3.2
jsmin==3.0.1
jsonpickle==1.2
jsonschema==4.17.3
keras==2.8.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.5
libclang==16.0.6
Markdown==3.3.4
MarkupSafe==2.1.3
matplotlib==3.5.3
ndx-pose==0.1.1
networkx==2.6.3
nixio==1.5.3
numpy==1.21.6
oauthlib==3.2.2
opencv-python==4.5.5.64
opt-einsum==3.3.0
packaging==23.1
pandas==1.3.5
Pillow==8.4.0
pkgutil_resolve_name==1.3.10
protobuf==3.19.6
psutil==5.9.5
pyasn1==0.5.0
pyasn1-modules==0.3.0
Pygments==2.16.1
pykalman==0.9.5
pynwb==2.3.3
pyparsing==3.1.1
pyrsistent==0.19.3
PySide2==5.14.1
python-dateutil==2.8.2
python-rapidjson==1.11
pytz==2023.3.post1
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==25.1.1
qimage2ndarray==1.10.0
QtPy==2.4.0
requests==2.31.0
requests-oauthlib==1.3.1
rich==10.16.1
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
scikit-image==0.19.3
scikit-learn==1.0.2
scikit-video==1.1.11
scipy==1.7.3
seaborn==0.12.2
segmentation-models==1.0.1
shapely==2.0.1
shiboken2==5.14.1
six==1.16.0
sleap @ file:///home/talmolab/sleap-estimates-animal-poses/pull-requests/sleap
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.8.4
tensorflow-estimator==2.8.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.34.0
termcolor==2.3.0
threadpoolctl==3.1.0
tifffile==2021.11.2
typing_extensions==4.7.1
tzlocal==5.0.1
urllib3==1.26.16
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0
micromamba list
  Name              Version    Build                 Channel    
──────────────────────────────────────────────────────────────────
  _libgcc_mutex     0.1        conda_forge           conda-forge
  _openmp_mutex     4.5        2_gnu                 conda-forge
  ca-certificates   2023.7.22  hbcca054_0            conda-forge
  cudatoolkit       11.3.1     hb98b00a_12           conda-forge
  cudnn             8.2.1.32   h86fa8c9_0            conda-forge
  ld_impl_linux-64  2.40       h41732ed_0            conda-forge
  libffi            3.4.2      h7f98852_5            conda-forge
  libgcc-ng         13.2.0     h807b86a_0            conda-forge
  libgomp           13.2.0     h807b86a_0            conda-forge
  libnsl            2.0.0      h7f98852_0            conda-forge
  libsqlite         3.43.0     h2797004_0            conda-forge
  libstdcxx-ng      13.2.0     h7e041cc_0            conda-forge
  libzlib           1.2.13     hd590300_5            conda-forge
  ncurses           6.4        hcb278e6_0            conda-forge
  openssl           3.1.2      hd590300_0            conda-forge
  pip               23.2.1     pyhd8ed1ab_0          conda-forge
  python            3.7.12     hf930737_100_cpython  conda-forge
  readline          8.2        h8228510_1            conda-forge
  setuptools        68.2.2     pyhd8ed1ab_0          conda-forge
  sqlite            3.43.0     h2c6b66d_0            conda-forge
  tk                8.6.12     h27826a3_0            conda-forge
  wheel             0.41.2     pyhd8ed1ab_0          conda-forge
  xz                5.2.6      h166bdaf_0            conda-forge
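
Since the original problem was a mismatch between the tensorflow and keras versions, a `pip freeze` listing like the ones above can be checked mechanically. The sketch below assumes that "compatible" means the two packages share the same major.minor version (as in tensorflow==2.8.4 with keras==2.8.0); that rule and the function name `tf_keras_minor_match` are assumptions for illustration, not something the PR defines.

```shell
# Hypothetical helper: reads `pip freeze` output on stdin and succeeds only if
# tensorflow and keras are both present with the same major.minor version.
tf_keras_minor_match() {
    awk -F'==' '
        $1 == "tensorflow" { split($2, v, "."); tf = v[1] "." v[2] }
        $1 == "keras"      { split($2, v, "."); k  = v[1] "." v[2] }
        END { exit !(tf != "" && tf == k) }
    '
}
```

For example, `pip freeze | tf_keras_minor_match && echo "versions match"` prints the message only when the two versions line up under that assumption.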

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits: Files that changed from the base of the PR and between 8db46b4 and 034e58b commits.
Files selected for processing (1)
  • pypi_requirements.txt (2 hunks)
Additional comments (Suppressed): 5
pypi_requirements.txt (5)
  • 13-13: The version constraint for numpy is well defined, ensuring compatibility with other packages.

  • 22-23: Good use of platform-specific constraints for the python-rapidjson package to address the Microsoft Visual C++ requirement on Windows.

  • 35-36: Ensure that all parts of the codebase that use tensorflow and tensorflow-hub are compatible with the specified version ranges.

  • 38-39: These lines add support for Apple Silicon machines by including tensorflow-macos and tensorflow-metal dependencies. However, as mentioned in the comment, these dependencies are untested. It would be beneficial to test these dependencies to ensure they work as expected on Apple Silicon machines.

  • 46-46: Adding a version constraint to protobuf due to compatibility requirements with tensorboard and tensorflow is a good practice. However, it's important to verify that this version constraint doesn't cause conflicts with other packages that depend on protobuf.

@roomrys
Collaborator Author

roomrys commented Sep 14, 2023

The latest commit is running training/inference successfully on colab!
https://colab.research.google.com/drive/18Mk3zD8Z2ewnxiFjb0ICPOkiGTA6_3QA?usp=sharing

@roomrys roomrys merged commit 6eed6d9 into develop Sep 14, 2023
@roomrys roomrys deleted the liezl/hotfix-1.3.2 branch September 14, 2023 21:24
roomrys added a commit that referenced this pull request Sep 15, 2023
* Do not try to remove item if already deleted (#1498)

* Set `LD_LIBRARY_PATH` on `mamba activate` (#1496)

* Add version restrictions to tensorflow for pypi (#1485)

* Remove `imageio` pin (#1501)

* Reset LD_LIBRARY_PATH on deactivate (#1502)

* Brown bag bump to 1.3.3 (#1484)