Incorrect reporting of memory utilisation

**Describe the bug**
I'm running into issues with batch transform that I assume are caused by an OOM condition. The main problem is that, as far as I can see, there is no way to explicitly configure the batch_size for a batch transform.

Instead, the batch_size appears to be controlled by `MaxPayloadInMB`, which has a minimum of 1. I added logging in my `predict_fn` and observed a mix of batches: some contain 1000 examples and some contain 10k+ examples. The huge batches are pretty much 1MB in size. I have no idea where the batches of 1000 come from (I'm wondering if it's splitting off the last batch when it is smaller than the 1MB payload).

The issue is that the large batches seem to occasionally cause the worker to crash. I suspect it's running out of memory (the obvious workaround is to pick a machine with more memory). When I look at the logs, the maximum utilisation appears to be around 50%, but on closer inspection that metric appears wrong: in the example below, MemoryUsed=3537.828125 and MemoryAvailable=3843.3515625, yet MemoryUtilization is reported as 50%.
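Until the batch size can be configured directly, one possible mitigation is to cap the amount of work done per forward pass inside `predict_fn`. The sketch below is only an illustration in plain Python: `iter_chunks`, `predict_in_chunks`, the `predict_chunk` callback and the default `chunk_size` of 1000 are hypothetical names and values, not part of any SageMaker or TorchServe API.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def predict_in_chunks(
    items: Sequence[T],
    predict_chunk: Callable[[Sequence[T]], List[R]],
    chunk_size: int = 1000,  # hypothetical cap, tune to the instance's memory
) -> List[R]:
    """Run `predict_chunk` on bounded slices and concatenate the results in order."""
    results: List[R] = []
    for chunk in iter_chunks(items, chunk_size):
        results.extend(predict_chunk(chunk))
    return results
```

Inside `predict_fn`, the model call would be passed as `predict_chunk`, so a 10k+ example payload is processed as a series of smaller batches instead of one large one. This bounds the memory used by inference but not the memory needed to deserialize the payload itself.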

**Expected behavior**
MemoryUtilization = 100.0 * MemoryUsed / MemoryAvailable
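As a quick check, the formula above can be applied to the values from the logs. This is a minimal sketch; the function name `memory_utilization_percent` is made up for illustration and is not taken from the toolkit.

```python
def memory_utilization_percent(used_mb: float, available_mb: float) -> float:
    """Utilisation as defined in this issue: 100 * used / available."""
    if available_mb <= 0:
        raise ValueError("available_mb must be positive")
    return 100.0 * used_mb / available_mb


# Values copied from the TS_METRICS log lines in this issue.
used = 3537.828125
available = 3843.3515625
reported = 50.0

expected = memory_utilization_percent(used, available)
print(f"expected {expected:.2f}%, reported {reported:.1f}%")
```

With these inputs the expected value comes out at roughly 92%, well above the reported 50%.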

**Screenshots or logs**
```text
2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:3843.3515625|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:3537.828125|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:50.0|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
```
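To pull these three values out of a larger log export, something like the following could be used. It is a sketch that assumes every metric line has the `TS_METRICS - <Name>.<Unit>:<value>` shape seen above; `parse_ts_metrics` and `METRIC_RE` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable

# Matches e.g. "TS_METRICS - MemoryUsed.Megabytes:3537.828125"
METRIC_RE = re.compile(r"TS_METRICS - (\w+)\.(\w+):([0-9.]+)")


def parse_ts_metrics(lines: Iterable[str]) -> Dict[str, float]:
    """Return {metric name: value}, keeping the last value seen for each name."""
    metrics: Dict[str, float] = {}
    for line in lines:
        match = METRIC_RE.search(line)
        if match:
            name, _unit, value = match.groups()
            metrics[name] = float(value)
    return metrics
```

Feeding the three lines above through `parse_ts_metrics` gives a dictionary keyed by `MemoryAvailable`, `MemoryUsed` and `MemoryUtilization`, which makes it easy to compare the reported percentage against the other two values across many samples.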


**System information**
A description of your system. Please provide:
- **Toolkit version**: pytorch
- **Framework version**: 1.13.1
- **Python version**: 3.9
- **CPU or GPU**: CPU
- **Custom Docker image (Y/N)**: No



Incorrect reporting of memory utilisation #141
