Keras3 #166

Merged 47 commits on Nov 11, 2024
812c0c4
Add keras tests
wsmoses Nov 8, 2024
c985ed8
fix
wsmoses Nov 8, 2024
b2ff8d4
fix
wsmoses Nov 8, 2024
b5180b4
fix
wsmoses Nov 8, 2024
be8156b
MLIR AD
wsmoses Nov 8, 2024
bfc2c3d
fixups
wsmoses Nov 9, 2024
5636ead
lets see what CI says
wsmoses Nov 9, 2024
e760cd3
cleanup
wsmoses Nov 9, 2024
f42f425
Update test_utils.py
wsmoses Nov 9, 2024
753abdc
Update BUILD
wsmoses Nov 9, 2024
5d4dad3
fmt
wsmoses Nov 9, 2024
f793b6b
fix ngcm
wsmoses Nov 9, 2024
20f1853
fix
wsmoses Nov 9, 2024
c3fe341
fmt
wsmoses Nov 9, 2024
c48a1b3
fix
wsmoses Nov 9, 2024
d7874cb
more debug
wsmoses Nov 9, 2024
e8cbea9
continue debugging
wsmoses Nov 9, 2024
732ad64
print if not exist
wsmoses Nov 9, 2024
6b9ca06
cleanup bench vs xla
wsmoses Nov 9, 2024
22be91d
more dbg
wsmoses Nov 9, 2024
14341ab
fixup
wsmoses Nov 9, 2024
c5d642f
cleanup
wsmoses Nov 9, 2024
2363b7a
hopefully finally works
wsmoses Nov 9, 2024
e8c66a7
fix
wsmoses Nov 9, 2024
e031be0
does exclusive fix
wsmoses Nov 10, 2024
1576b6b
fix
wsmoses Nov 10, 2024
221db3b
also load cusolver
wsmoses Nov 10, 2024
f7071bc
also jitlink
wsmoses Nov 10, 2024
8618a60
fix
wsmoses Nov 10, 2024
d828c5a
cusparse
wsmoses Nov 10, 2024
41a168c
cleanup
wsmoses Nov 10, 2024
61b0b5a
Cleanup
wsmoses Nov 10, 2024
b42dd0d
Update gpu_pipeline.yml
wsmoses Nov 10, 2024
477beca
Update pipeline.yml
wsmoses Nov 10, 2024
3615b75
Update test_utils.py
wsmoses Nov 10, 2024
34aa00a
Update keras_test.py
wsmoses Nov 10, 2024
a7d9a7e
Update neuralgcm_test.py
wsmoses Nov 10, 2024
e22aef5
Update jaxmd.py
wsmoses Nov 10, 2024
5e7dfbf
fix
wsmoses Nov 10, 2024
3e2d98f
Update jaxmd.py
wsmoses Nov 11, 2024
0b460b7
try to bump keras nlp version
wsmoses Nov 11, 2024
8533f3a
Update gpu_pipeline.yml
wsmoses Nov 11, 2024
149137c
fix
wsmoses Nov 11, 2024
c59309f
Update keras_test.py
wsmoses Nov 11, 2024
4f3ca1d
Update keras_test.py
wsmoses Nov 11, 2024
2154129
Update keras_test.py
wsmoses Nov 11, 2024
6416e6e
Update keras_test.py
wsmoses Nov 11, 2024
58 changes: 15 additions & 43 deletions .buildkite/gpu_pipeline.yml
@@ -8,16 +8,16 @@ steps:
timeout_in_minutes: 180
if: build.tag == null
plugins:
# - cache#v0.6.0:
# manifest: .buildkite/gpu_pipeline.yml
# path: .local/bin/bazel
# restore: file
# save: file
# - cache#v0.6.0:
# manifest: WORKSPACE
# path: .baztmp
# restore: file
# save: file
- cache#v1.3.0:
manifest: .buildkite/pipeline.yml
path: .local/bin/bazel
restore: file
save: file
- cache#v1.3.0:
manifest: workspace.bzl
path: .baztmp
restore: file
save: file
commands: |
pwd
env
@@ -27,16 +27,6 @@ steps:
echo "openssl md5 | cut -d' ' -f2" > .local/bin/md5
chmod +x .local/bin/md5

# No one tells us what to do
unset NV_LIBCUBLAS_VERSION
unset NVIDIA_VISIBLE_DEVICES
unset NV_NVML_DEV_VERSION
unset NV_LIBNCCL_DEV_PACKAGE
unset NV_LIBNCCL_DEV_PACKAGE_VERSION
unset NVIDIA_REQUIRE_CUDA
unset NV_LIBCUBLAS_DEV_PACKAGE
unset NV_NVTX_VERSION

if [ ! -f ".local/bin/bazel" ]; then
curl -fLO https://github.com/bazelbuild/bazelisk/releases/download/v1.19.0/bazelisk-linux-amd64
mv bazel* .local/bin/bazel
@@ -49,32 +39,14 @@ steps:

echo "--- :python: Test"

export CUDA_DIR=`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cuda_nvcc_cu12/site-packages/nvidia/cuda_nvcc
export XLA_FLAGS=--xla_gpu_cuda_data_dir=\$CUDA_DIR
export LD_LIBRARY_PATH="`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cusolver_cu12/site-packages/nvidia/cusolver:\$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cudnn_cu12/site-packages/nvidia/cudnn/lib:\$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="`pwd`/bazel-bin/test/test.runfiles/pypi_nvidia_cublas_cu12/site-packages/nvidia/cublas/lib:\$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cuda_cupti_cu12/site-packages/nvidia/cuda_cupti/lib:\$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cuda_runtime_cu12/site-packages/nvidia/cuda_runtime/lib:\$LD_LIBRARY_PATH"
export PATH="`pwd`/bazel-bin/test/llama.runfiles/pypi_nvidia_cuda_nvcc_cu12/site-packages/nvidia/cuda_nvcc/bin:\$PATH"
export TF_CPP_MIN_LOG_LEVEL=0
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp run --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL //builddeps:requirements.update || echo "no req update"
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp build --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --test_output=errors //:wheel
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --test_output=errors //test/... || echo "fail1"
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --cache_test_results=no -s //test:bench_vs_xla
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --cache_test_results=no -s //test:llama
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --cache_test_results=no -s //test:jaxmd
HERMETIC_PYTHON_VERSION="3.12" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --repo_env CUDA_DIR --repo_env XLA_FLAGS --action_env XLA_FLAGS --repo_env TF_CPP_MIN_LOG_LEVEL --action_env TF_CPP_MIN_LOG_LEVEL --cache_test_results=no -s //test:neuralgcm_test
# bazel-bin/test/llama.runfiles/python_*/bin/python3 -m pip install bazel-bin/*.whl git+https://github.com/wsmoses/maxtext aqtp tensorboardX google-cloud-storage datasets gcsfs
# bazel-bin/test/llama.runfiles/python_*/bin/python3 test/maxtext.py > maxtext.log
cat bazel-out/*/testlogs/test/llama/test.log
cat bazel-out/*/testlogs/test/bench_vs_xla/test.log
cat bazel-out/*/testlogs/test/jaxmd/test.log
cat bazel-out/*/testlogs/test/neuralgcm_test/test.log
# cat maxtext.log
HERMETIC_PYTHON_VERSION="3.11" .local/bin/bazel --output_user_root=`pwd`/.baztmp run //builddeps:requirements.update
HERMETIC_PYTHON_VERSION="3.11" .local/bin/bazel --output_user_root=`pwd`/.baztmp build --test_output=errors //:wheel
HERMETIC_PYTHON_VERSION="3.11" .local/bin/bazel --output_user_root=`pwd`/.baztmp test --test_output=errors //test/...

artifact_paths:
- "bazel-out/*/testlogs/test/llama/test.log"
- "bazel-out/*/testlogs/test/bench_vs_xla/test.log"
- "bazel-out/*/testlogs/test/jaxmd/test.log"
- "bazel-out/*/testlogs/test/neuralgcm_test/test.log"
- "bazel-out/*/testlogs/test/keras_test/test.log"
- "maxtext.log"
28 changes: 14 additions & 14 deletions .buildkite/pipeline.yml
@@ -15,16 +15,16 @@ steps:
arch: "{{matrix.arch}}"
if: build.tag == null
plugins:
- cache#v1.3.0:
manifest: .buildkite/pipeline.yml
path: .local/bin/bazel
restore: file
save: file
- cache#v1.3.0:
manifest: workspace.bzl
path: .baztmp
restore: file
save: file
# - cache#v1.3.0:
# manifest: .buildkite/pipeline.yml
# path: .local/bin/bazel
# restore: file
# save: file
# - cache#v1.3.0:
# manifest: workspace.bzl
# path: .baztmp
# restore: file
# save: file
commands: |
mkdir -p .local/bin
export PATH="`pwd`/.local/bin:`pwd`/conda/bin:\$PATH"
@@ -51,15 +51,15 @@ steps:
mkdir -p .baztmp
HERMETIC_PYTHON_VERSION={{matrix.python}} bazel --output_user_root=`pwd`/.baztmp run //builddeps:requirements.update
HERMETIC_PYTHON_VERSION={{matrix.python}} bazel --output_user_root=`pwd`/.baztmp test --test_output=errors //test/...
HERMETIC_PYTHON_VERSION={{matrix.python}} bazel --output_user_root=`pwd`/.baztmp test --cache_test_results=no //test:bench_vs_xla
HERMETIC_PYTHON_VERSION={{matrix.python}} bazel --output_user_root=`pwd`/.baztmp test --cache_test_results=no //test:llama
cat bazel-out/*/testlogs/test/llama/test.log
rm -f bazel-bin/*.whl
HERMETIC_PYTHON_VERSION={{matrix.python}} bazel --output_user_root=`pwd`/.baztmp build :wheel
cp bazel-bin/*.whl .
artifact_paths:
- "*.whl"
- "bazel-out/*/testlogs/test/llama/test.log"
- "bazel-out/*/testlogs/test/llama/bench_vs_xla.log"
- "bazel-out/*/testlogs/test/bench_vs_xla/test.log"
- "bazel-out/*/testlogs/test/jaxmd/test.log"
- "bazel-out/*/testlogs/test/neuralgcm_test/test.log"
- "bazel-out/*/testlogs/test/keras_test/test.log"

timeout_in_minutes: 180