@@ -12,6 +12,9 @@ pro: true
leadimage: "reproducible-machine-learning-cloud-pods-featured-image.png"
---


## Introduction

[LocalStack Cloud Pods](/aws/capabilities/state-management/cloud-pods) enable you to create persistent state snapshots of your LocalStack instance, which can then be versioned, shared, and restored.
Cloud Pods add state management and team collaboration to your local cloud development environment, letting you create persistent, shareable cloud sandboxes.
Cloud Pods works directly with the [LocalStack CLI](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal) to save, merge, and restore snapshots of your LocalStack state.
@@ -38,7 +41,7 @@ For this tutorial, you will need the following:

- [LocalStack Pro](https://localstack.cloud/pricing/)
- [awslocal](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal)
- [Optical recognition of handwritten digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)
- [Optical recognition of handwritten digits dataset](https://github.com/localstack-samples/localstack-pro-samples/raw/refs/heads/master/reproducible-ml/digits.csv.gz) ([Source](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits))

If you don't have a subscription to LocalStack Pro, you can request a trial license upon sign-up.
For this tutorial to work, you must have the LocalStack CLI installed, which must be version 1.3 or higher.
@@ -158,6 +161,10 @@ Now, we will create a new file called `infer.py` which will contain a second han
This function will be used to perform predictions on new data with the model we trained previously.

```python
import boto3
import numpy
from joblib import load

def handler(event, context):
    # download the model and the test set from S3
    s3_client = boto3.client("s3")
@@ -193,6 +200,7 @@ zip lambda.zip train.py
zip infer.zip infer.py
awslocal s3 mb s3://reproducible-ml
awslocal s3 cp lambda.zip s3://reproducible-ml/lambda.zip
awslocal s3 cp infer.zip s3://reproducible-ml/infer.zip
awslocal s3 cp digits.csv.gz s3://reproducible-ml/digits.csv.gz
```
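The two `zip` commands above can also be scripted. The following is a minimal sketch using only Python's standard `zipfile` module; the helper name `package_handler` is hypothetical and not part of the tutorial. Unlike `zip`, which updates an existing archive, this sketch always writes a fresh one.

```python
import zipfile
from pathlib import Path


def package_handler(source: Path, archive: Path) -> Path:
    """Write `source` into a new zip archive at `archive`.

    Hypothetical helper: roughly equivalent to `zip <archive> <source>`,
    except that an existing archive is overwritten rather than updated.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root so the handler module
        # can be imported by its plain file name.
        zf.write(source, arcname=source.name)
    return archive
```

Assuming `train.py` and `infer.py` are in the current directory, `package_handler(Path("train.py"), Path("lambda.zip"))` and `package_handler(Path("infer.py"), Path("infer.zip"))` would produce the two archives that the `awslocal s3 cp` commands upload.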

@@ -209,7 +217,9 @@ awslocal lambda create-function --function-name ml-train \
--timeout 600 \
--code '{"S3Bucket":"reproducible-ml","S3Key":"lambda.zip"}' \
--layers arn:aws:lambda:us-east-1:446751924810:layer:python-3-8-scikit-learn-0-23-1:2
```

```bash
awslocal lambda create-function --function-name ml-predict \
--runtime python3.8 \
--role arn:aws:iam::000000000000:role/lambda-role \
@@ -248,6 +258,48 @@ null
> END RequestId: 6...
```

## Testing the Application

After deploying and invoking the Lambda functions, verify the end-to-end ML workflow: data loading, training, model persistence, and inference. Re-running the same checks after restoring the Cloud Pod should yield identical results, which confirms that the setup is reproducible.

### Expected Outputs from Training

Invoke `ml-train` with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp`

- The logs show the dataset being loaded (1797 samples), the model being trained on a 50% split, and `model.joblib` and `test-set.npy` being uploaded to S3.
- Training does not report an accuracy score, since this step focuses on saving the model, but the SVM classifier should fit without errors.

### Expected Outputs from Inference (ml-predict Invocation)

Invoke `ml-predict` with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp`

- Downloads the model and the test set from S3.
- Runs predictions on the test set (898 samples).
- **Sample prediction result** (first 20): `[8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2]`
- **Expected accuracy**: approximately 96.9%, computed as `accuracy_score(y_test, predicted)` (for example, 870 of 898 correct). The full prediction log appears in the LocalStack output when `DEBUG=1` is set:
  `--> prediction result: [8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 9 6 7 8 9 ... 9 5 4 8 8 4 9 0 8 9 8]`


To compute the accuracy yourself (an optional extension), add the following to `infer.py` after the predictions are made:

```python
from sklearn.metrics import accuracy_score
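# Assumes the test labels were saved as `y-test.npy` during training and
# downloaded from S3 beforehand; you would need to add that step yourself.
y_test = numpy.load('y-test.npy')  # `infer.py` imports `numpy` without an alias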
accuracy = accuracy_score(y_test, predicted)
print(f"Model accuracy: {accuracy:.4f}")
```

Expected output (approximate; the exact value depends on the train/test split): `Model accuracy: 0.9689`
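Accuracy here is simply the fraction of predictions that equal the true labels. The sketch below shows the arithmetic in plain Python, without scikit-learn; the function name `accuracy` and the toy labels are illustrative assumptions, not taken from the tutorial or the digits dataset. Note that 870 of 898 rounds to 0.9688, slightly below the 0.9689 quoted above, so treat both figures as approximate.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute the accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (hypothetical): 4 of 5 predictions match.
print(f"{accuracy([8, 8, 4, 9, 0], [8, 8, 4, 9, 1]):.4f}")  # prints 0.8000

# The example ratio from the text: 870 correct out of 898.
print(f"{870 / 898:.4f}")  # prints 0.9688
```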

### Validation After Pod Restore

- Save the Pod: `localstack pod save reproducible-ml`
- Load it in a new LocalStack instance: `localstack pod load reproducible-ml`
- Re-invoke `ml-predict`. The output should match the pre-save output exactly, which shows that the state (S3 objects and Lambda functions) was persisted.

If the outputs differ, check the Pod's merge strategy (default: `overwrite`) and the LocalStack logs for S3 or Lambda errors.
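One way to compare outputs before and after a restore without eyeballing long arrays is to reduce each prediction list to a digest. This is a minimal sketch, not part of the tutorial: the `fingerprint` helper is hypothetical, and the sample values are just the first ten predictions quoted earlier.

```python
import hashlib
import json


def fingerprint(predictions):
    """Return a SHA-256 hex digest of a sequence of integer predictions."""
    # Cast to plain ints so NumPy integers and Python ints serialize identically.
    payload = json.dumps([int(p) for p in predictions]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical values: predictions recorded before `pod save` and after `pod load`.
before = fingerprint([8, 8, 4, 9, 0, 8, 9, 8, 1, 2])
after = fingerprint([8, 8, 4, 9, 0, 8, 9, 8, 1, 2])
print("match" if before == after else "mismatch")  # prints match
```

Identical digests mean the two runs produced the same predictions in the same order; any single differing value changes the digest.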

## Creating a Cloud Pod

After deploying the Lambda functions, we can create a Cloud Pod to share our local infrastructure and instance state with other LocalStack users in the organization.