From db1c7cf201d1bd87d764e0f1e6048453b68fb07b Mon Sep 17 00:00:00 2001 From: Anik Chand Date: Thu, 9 Oct 2025 00:45:10 +0530 Subject: [PATCH 1/7] Normalize structure: Add Introduction and Testing sections (#227) --- ...producible-machine-learning-cloud-pods.mdx | 46 +++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index d99138ee..318e9a46 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -12,6 +12,10 @@ pro: true leadimage: "reproducible-machine-learning-cloud-pods-featured-image.png" --- +# Creating reproducible machine learning applications using Cloud Pods for persistent state snapshots + +## Introduction + [LocalStack Cloud Pods](/aws/capabilities/state-management/cloud-pods) enable you to create persistent state snapshots of your LocalStack instance, which can then be versioned, shared, and restored. It allows next-generation state management and team collaboration for your local cloud development environment, which you can utilize to create persistent shareable cloud sandboxes. Cloud Pods works directly with the [LocalStack CLI](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal) to save, merge, and restore snapshots of your LocalStack state. @@ -248,6 +252,48 @@ null > END RequestId: 6... ``` +## Testing the Application + +After deploying and invoking the Lambdas, verify the end-to-end ML workflow: data loading, training, model persistence, and inference. This confirms reproducibility—re-running after Pod restore should yield identical results. 
+ +### Expected Outputs from Training (ml-train Invocation) + +Invoke with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp` + +- Logs show dataset load (1797 samples), training on 50% split, and S3 uploads for `model.joblib` and `test-set.npy`. +- No explicit accuracy during training (focus is on saving), but the SVM classifier fits successfully. + +### Expected Outputs from Inference (ml-predict Invocation) + +Invoke with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp` + +- Downloads model and test set from S3. +- Runs predictions on the test set (898 samples). +- **Sample prediction result** (first 20): `[8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2]` +- **Expected accuracy**: ~96.9% (calculated as `accuracy_score(y_test, predicted)`—e.g., 870/898 correct). Full logs in LocalStack output (with `DEBUG=1`): +--> prediction result: [8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 9 6 7 8 9 ... 9 5 4 8 8 4 9 0 8 9 8] + + +To compute accuracy locally (optional extension): Add to `infer.py` after predictions: + +```python +from sklearn.metrics import accuracy_score +# Assuming y_test saved similarly +y_test = np.load('y-test.npy') # You'd need to save this during training +accuracy = accuracy_score(y_test, predicted) +print(f"Model accuracy: {accuracy:.4f}") +``` + +Expected: Model accuracy: 0.9689 + +### Validation After Pod Restore + +- Save Pod: `localstack pod save reproducible-ml` +- (In a new instance) Load: `localstack pod load reproducible-ml` +- Re-invoke `ml-predict`: Outputs should match exactly, proving state persistence (S3 objects, Lambdas intact). + +If mismatches occur: Check Pod merge strategy (default: overwrite) or logs for S3/Lambda errors. + ## Creating a Cloud Pod After deploying the Lambda functions, we can create a Cloud Pod to share our local infrastructure and instance state with other LocalStack users in the organization. 
From df35c9959101412f5dc9deb9fb70a60bbcadc94f Mon Sep 17 00:00:00 2001 From: Quetzalli Date: Mon, 13 Oct 2025 11:36:53 -0700 Subject: [PATCH 2/7] delete header --- .../aws/tutorials/reproducible-machine-learning-cloud-pods.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index 318e9a46..2fbe6ff9 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -12,7 +12,6 @@ pro: true leadimage: "reproducible-machine-learning-cloud-pods-featured-image.png" --- -# Creating reproducible machine learning applications using Cloud Pods for persistent state snapshots ## Introduction From 0f89dd2c94c14d591444df2344f99caf6b8b8629 Mon Sep 17 00:00:00 2001 From: Quetzalli Date: Mon, 13 Oct 2025 11:48:49 -0700 Subject: [PATCH 3/7] editorial review --- .../reproducible-machine-learning-cloud-pods.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index 2fbe6ff9..87617786 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -253,18 +253,18 @@ null ## Testing the Application -After deploying and invoking the Lambdas, verify the end-to-end ML workflow: data loading, training, model persistence, and inference. This confirms reproducibility—re-running after Pod restore should yield identical results. +After deploying and invoking the Lambdas, verify the end-to-end ML workflow _(data loading, training, model persistence, and inference)_. 
This confirms reproducibility, and re-running after Pod restore should yield identical results. -### Expected Outputs from Training (ml-train Invocation) +### Expected Outputs from Training -Invoke with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp` +Invoke `ml-train` with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp` - Logs show dataset load (1797 samples), training on 50% split, and S3 uploads for `model.joblib` and `test-set.npy`. -- No explicit accuracy during training (focus is on saving), but the SVM classifier fits successfully. +- No explicit accuracy during training (focus is on savings), but the SVM classifier fits successfully. ### Expected Outputs from Inference (ml-predict Invocation) -Invoke with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp` +Invoke `ml-predict` with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp` - Downloads model and test set from S3. - Runs predictions on the test set (898 samples). @@ -283,7 +283,7 @@ accuracy = accuracy_score(y_test, predicted) print(f"Model accuracy: {accuracy:.4f}") ``` -Expected: Model accuracy: 0.9689 +Expected Model accuracy: 0.9689 ### Validation After Pod Restore @@ -291,7 +291,7 @@ Expected: Model accuracy: 0.9689 - (In a new instance) Load: `localstack pod load reproducible-ml` - Re-invoke `ml-predict`: Outputs should match exactly, proving state persistence (S3 objects, Lambdas intact). -If mismatches occur: Check Pod merge strategy (default: overwrite) or logs for S3/Lambda errors. +If a mismatch occurs, check the Pod's merge strategy `(default: overwrite)` or logs for S3/Lambda errors. 
## Creating a Cloud Pod From 81668e34503bb7de3ecb8c1ae998954920dace53 Mon Sep 17 00:00:00 2001 From: Brian Rinaldi Date: Wed, 15 Oct 2025 16:51:57 -0400 Subject: [PATCH 4/7] Update reproducible-machine-learning-cloud-pods.mdx --- .../tutorials/reproducible-machine-learning-cloud-pods.mdx | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index 87617786..8a7174e1 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -41,7 +41,7 @@ For this tutorial, you will need the following: - [LocalStack Pro](https://localstack.cloud/pricing/) - [awslocal](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal) -- [Optical recognition of handwritten digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits) +- [Optical recognition of handwritten digits dataset](https://github.com/localstack-samples/localstack-pro-samples/raw/refs/heads/master/reproducible-ml/digits.csv.gz) ([Source](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)) If you don't have a subscription to LocalStack Pro, you can request a trial license upon sign-up. For this tutorial to work, you must have the LocalStack CLI installed, which must be version 1.3 or higher. 
@@ -196,6 +196,7 @@ zip lambda.zip train.py zip infer.zip infer.py awslocal s3 mb s3://reproducible-ml awslocal s3 cp lambda.zip s3://reproducible-ml/lambda.zip +awslocal s3 cp infer.zip s3://reproducible-ml/infer.zip awslocal s3 cp digits.csv.gz s3://reproducible-ml/digits.csv.gz ``` @@ -212,7 +213,9 @@ awslocal lambda create-function --function-name ml-train \ --timeout 600 \ --code '{"S3Bucket":"reproducible-ml","S3Key":"lambda.zip"}' \ --layers arn:aws:lambda:us-east-1:446751924810:layer:python-3-8-scikit-learn-0-23-1:2 +``` +```bash awslocal lambda create-function --function-name ml-predict \ --runtime python3.8 \ --role arn:aws:iam::000000000000:role/lambda-role \ From c55651f1619a401c239e0b59bb209f484f532191 Mon Sep 17 00:00:00 2001 From: Brian Rinaldi Date: Wed, 15 Oct 2025 16:53:28 -0400 Subject: [PATCH 5/7] Update reproducible-machine-learning-cloud-pods.mdx --- .../tutorials/reproducible-machine-learning-cloud-pods.mdx | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index 8a7174e1..e4c3266c 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -161,6 +161,10 @@ Now, we will create a new file called `infer.py` which will contain a second han This function will be used to perform predictions on new data with the model we trained previously. 
```python +import boto3 +import numpy +from joblib import load + def handler(event, context): # download the model and the test set from S3 s3_client = boto3.client("s3") From bb56fca4e27e60029cc7750db4270d30029a05c6 Mon Sep 17 00:00:00 2001 From: Brian Rinaldi Date: Thu, 16 Oct 2025 10:07:26 -0400 Subject: [PATCH 6/7] Fixing the example code --- ...producible-machine-learning-cloud-pods.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index e4c3266c..228f9ea7 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -75,9 +75,9 @@ It is similar to a Python dictionary but provides attribute-style access and can def load_digits(*, n_class=10, return_X_y=False, as_frame=False): # download files from S3 s3_client = boto3.client("s3") - s3_client.download_file(Bucket="pods-test", Key="digits.csv.gz", Filename="digits.csv.gz") + s3_client.download_file(Bucket="reproducible-ml", Key="digits.csv.gz", Filename="/tmp/digits.csv.gz") - data = numpy.loadtxt('digits.csv.gz', delimiter=',') + data = numpy.loadtxt('/tmp/digits.csv.gz', delimiter=',') target = data[:, -1].astype(numpy.int, copy=False) flat_data = data[:, :-1] images = flat_data.view() @@ -141,12 +141,12 @@ def handler(event, context): s3_client = boto3.client("s3") buffer = io.BytesIO() dump(clf, buffer) - s3_client.put_object(Body=buffer.getvalue(), Bucket="pods-test", Key="model.joblib") + s3_client.put_object(Body=buffer.getvalue(), Bucket="reproducible-ml", Key="model.joblib") # Save the test-set to the S3 bucket - numpy.save('test-set.npy', X_test) - with open('test-set.npy', 'rb') as f: - s3_client.put_object(Body=f, Bucket="pods-test", Key="test-set.npy") + numpy.save('/tmp/test-set.npy', X_test) + 
with open('/tmp/test-set.npy', 'rb') as f: + s3_client.put_object(Body=f, Bucket="reproducible-ml", Key="test-set.npy") ``` First, we loaded the images and flattened them into 1-dimensional arrays. @@ -168,13 +168,13 @@ from joblib import load def handler(event, context): # download the model and the test set from S3 s3_client = boto3.client("s3") - s3_client.download_file(Bucket="pods-test", Key="test-set.npy", Filename="test-set.npy") - s3_client.download_file(Bucket="pods-test", Key="model.joblib", Filename="model.joblib") + s3_client.download_file(Bucket="reproducible-ml", Key="test-set.npy", Filename="/tmp/test-set.npy") + s3_client.download_file(Bucket="reproducible-ml", Key="model.joblib", Filename="/tmp/model.joblib") - with open("test-set.npy", "rb") as f: + with open("/tmp/test-set.npy", "rb") as f: X_test = numpy.load(f) - clf = load("model.joblib") + clf = load("/tmp/model.joblib") predicted = clf.predict(X_test) print("--> prediction result:", predicted) From a4b2a55fd561e0306a2a58e358f2e3d6af545081 Mon Sep 17 00:00:00 2001 From: Brian Rinaldi Date: Thu, 16 Oct 2025 11:11:11 -0400 Subject: [PATCH 7/7] Update reproducible-machine-learning-cloud-pods.mdx --- ...producible-machine-learning-cloud-pods.mdx | 84 +++++++++---------- 1 file changed, 42 insertions(+), 42 deletions(-) diff --git a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx index 228f9ea7..c4652ea0 100644 --- a/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx +++ b/src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx @@ -258,48 +258,6 @@ null > END RequestId: 6... ``` -## Testing the Application - -After deploying and invoking the Lambdas, verify the end-to-end ML workflow _(data loading, training, model persistence, and inference)_. This confirms reproducibility, and re-running after Pod restore should yield identical results. 
- -### Expected Outputs from Training - -Invoke `ml-train` with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp` - -- Logs show dataset load (1797 samples), training on 50% split, and S3 uploads for `model.joblib` and `test-set.npy`. -- No explicit accuracy during training (focus is on savings), but the SVM classifier fits successfully. - -### Expected Outputs from Inference (ml-predict Invocation) - -Invoke `ml-predict` with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp` - -- Downloads model and test set from S3. -- Runs predictions on the test set (898 samples). -- **Sample prediction result** (first 20): `[8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2]` -- **Expected accuracy**: ~96.9% (calculated as `accuracy_score(y_test, predicted)`—e.g., 870/898 correct). Full logs in LocalStack output (with `DEBUG=1`): ---> prediction result: [8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 9 6 7 8 9 ... 9 5 4 8 8 4 9 0 8 9 8] - - -To compute accuracy locally (optional extension): Add to `infer.py` after predictions: - -```python -from sklearn.metrics import accuracy_score -# Assuming y_test saved similarly -y_test = np.load('y-test.npy') # You'd need to save this during training -accuracy = accuracy_score(y_test, predicted) -print(f"Model accuracy: {accuracy:.4f}") -``` - -Expected Model accuracy: 0.9689 - -### Validation After Pod Restore - -- Save Pod: `localstack pod save reproducible-ml` -- (In a new instance) Load: `localstack pod load reproducible-ml` -- Re-invoke `ml-predict`: Outputs should match exactly, proving state persistence (S3 objects, Lambdas intact). - -If a mismatch occurs, check the Pod's merge strategy `(default: overwrite)` or logs for S3/Lambda errors. - ## Creating a Cloud Pod After deploying the Lambda functions, we can create a Cloud Pod to share our local infrastructure and instance state with other LocalStack users in the organization. 
@@ -383,6 +341,48 @@ The available merge strategies are: ![State Merge mechanisms with LocalStack Cloud Pods](/images/aws/cloud-pods-state-merge-mechanisms.png) +## Testing the Application + +After deploying and invoking the Lambdas, first verify the end-to-end ML workflow (data loading, training, and inference). Once the application runs successfully and you have saved a Cloud Pod, re-running it after restoring the Pod should yield identical results. + +### Expected Outputs from Training + +Invoke `ml-train` with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp` + +- Logs show dataset load (1797 samples), training on a 50% split, and S3 uploads for `model.joblib` and `test-set.npy`. +- No explicit accuracy is reported during training (the focus is on saving the model), but the SVM classifier fits successfully. + +### Expected Outputs from Inference + +Invoke `ml-predict` with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp` + +- Downloads model and test set from S3. +- Runs predictions on the test set (898 samples). +- **Sample prediction result** (first 20): `[8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2]` +- **Expected accuracy**: ~96.9% (calculated as `accuracy_score(y_test, predicted)`, e.g., 870/898 correct). The full prediction log appears in the LocalStack output (with `DEBUG=1`): +--> prediction result: [8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 9 6 7 8 9 ... 
9 5 4 8 8 4 9 0 8 9 8] + + +To compute accuracy as an optional extension, add the following to `infer.py` after the predictions: + +```python +from sklearn.metrics import accuracy_score +# Assumes y_test was saved during training and downloaded to /tmp like test-set.npy (not done in this tutorial) +y_test = numpy.load('/tmp/y-test.npy') # infer.py imports numpy, not np +accuracy = accuracy_score(y_test, predicted) +print(f"Model accuracy: {accuracy:.4f}") +``` + +Expected output: `Model accuracy: 0.9689` + +### Validation After Pod Restore + +- Save the Pod: `localstack pod save reproducible-ml` +- Load it in a new instance: `localstack pod load reproducible-ml` +- Re-invoke `ml-predict`: The output should match exactly, which shows that the S3 objects and Lambda functions were restored intact. + +If a mismatch occurs, check the Pod's merge strategy (default: `overwrite`) or the logs for S3/Lambda errors. + ## Conclusion In conclusion, LocalStack Cloud Pods facilitate collaboration and debugging among team members by allowing the sharing of local cloud infrastructure and instance state.
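
Note on the Testing section added by this series: the accuracy figure and the "outputs should match exactly" check after a Pod restore can both be expressed without scikit-learn. The following is a minimal sketch under stated assumptions, not part of the tutorial's code. The helper names `accuracy` and `fingerprint` are hypothetical, and the label lists are toy values for illustration, not the tutorial's data.

```python
import hashlib


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fingerprint(predictions):
    """Hash a prediction sequence so two runs can be compared by digest."""
    text = " ".join(str(int(p)) for p in predictions)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Toy labels for illustration only (hypothetical, not the digits dataset).
    y_true = [8, 8, 4, 9, 0]
    before_restore = [8, 8, 4, 9, 1]
    after_restore = [8, 8, 4, 9, 1]

    print(f"Model accuracy: {accuracy(y_true, before_restore):.4f}")
    same = fingerprint(before_restore) == fingerprint(after_restore)
    print("Identical after restore:", same)
```

Comparing digests rather than eyeballing the logged arrays is one way to confirm that a restored Pod reproduces the same predictions. In practice the two sequences would come from the `--> prediction result:` log lines of the runs before and after `localstack pod load`.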