Introduce a simpler cache dedicated to just decode JPEGs #1550
Conversation
@@ -535,7 +551,7 @@ impl<'a> TryFrom<&'a Tensor> for ::ndarray::ArrayViewD<'a, half::f16> {
 #[cfg(feature = "image")]
 #[derive(thiserror::Error, Debug)]
-pub enum ImageError {
+pub enum TensorImageError {
Differentiating this from the image::ImageError had me scratching my head for a bit.
Looking very good! But I'm very concerned about the tensor clone operation at the end of try_decode_tensor_if_necessary - I looked at Tensor and TensorData and they don't contain any Arc/Cow etc. We need to come up with something more sophisticated that gets a handle to the tensor.
match &self.data {
    TensorData::U8(buf) | TensorData::JPEG(buf) => buf.len(),
    TensorData::U16(buf) => buf.len(),
    TensorData::U32(buf) => buf.len(),
    TensorData::U64(buf) => buf.len(),
    TensorData::I8(buf) => buf.len(),
    TensorData::I16(buf) => buf.len(),
    TensorData::I32(buf) => buf.len(),
    TensorData::I64(buf) => buf.len(),
    TensorData::F32(buf) => buf.len(),
    TensorData::F64(buf) => buf.len(),
this looks so sad :D
pub enum TensorDecodeError {
    // TODO(jleibs): It would be nice to just transparently wrap
    // `image::ImageError` and `tensor::TensorImageError` but neither implements
    // `Clone`, which we need if we ant to cache the Result.
Suggested change:
- // `Clone`, which we need if we ant to cache the Result.
+ // `Clone`, which we need if we want to cache the Result.
Isn't the latter our own, so we could implement Clone?
Yes... except it wraps ImageError :-D
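A common Rust workaround for this (a hedged sketch, not necessarily what this PR ends up doing) is to wrap the non-Clone source errors in an Arc, which lets the enum derive Clone so the whole Result can sit in a cache. The variant names below are illustrative, and it assumes `image::ImageError` and the PR's `TensorImageError` are in scope:

```rust
use std::sync::Arc;

/// Illustrative sketch only: wrapping the non-`Clone` source errors in `Arc`
/// lets the enum derive `Clone`, so the cached `Result` stays cheap to copy.
#[derive(thiserror::Error, Debug, Clone)]
pub enum TensorDecodeError {
    #[error("failed to decode image: {0}")]
    Image(Arc<image::ImageError>),

    #[error("failed to convert tensor: {0}")]
    TensorImage(Arc<TensorImageError>),
}
```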
};

let memory_used = match &tensor {
    Ok(tensor) => tensor.size_in_bytes() as u64,
nit: I'd expect usize all the way for CPU-side memory sizes. But admittedly it's a mess - I have this casting issue again and again in re_renderer...
8e59ca2 to 60ba461
@Wumpf ok, I pulled in the shim to make these arrow Buffer objects so they can be cheaply cloned:
/// Can be removed when: [arrow2-convert#103](https://github.com/DataEngineeringLabs/arrow2-convert/pull/103) lands
#[derive(Clone, Debug, PartialEq, ArrowField, ArrowSerialize)]
#[arrow_field(transparent)]
pub struct BinaryBuffer(pub Buffer<u8>);
Isn't ByteBuffer a more fitting name? 😬
Arrow calls it a BinaryArray, so this is the Buffer that a binary array deserializes into.
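For context, a minimal sketch (assuming arrow2's `Buffer` API) of why this shim addresses the earlier clone concern: `Buffer` is backed by shared, reference-counted memory, so cloning the wrapper is O(1) instead of copying the JPEG bytes.

```rust
use arrow2::buffer::Buffer;

fn main() {
    // `Buffer<u8>` shares its backing allocation, so `clone()` only bumps
    // a reference count rather than copying the underlying bytes.
    let jpeg_bytes: Buffer<u8> = vec![0xFF, 0xD8, 0xFF, 0xE0].into();
    let handle = jpeg_bytes.clone();

    assert_eq!(jpeg_bytes.as_slice(), handle.as_slice());
}
```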
TensorData::I32(buf) => buf.len(),
TensorData::I64(buf) => buf.len(),
TensorData::F32(buf) => buf.len(),
TensorData::F64(buf) => buf.len(),
Oh wow, it is quite surprising to me that Buffer<f64>::len() is the number of bytes, and not the number of elements, but it seems right: https://docs.rs/arrow2/latest/arrow2/buffer/struct.Buffer.html#method.len
I filed an issue for this: jorgecarleitao/arrow2#1430
let max_image_cache_use = 1_000_000_000;
self.image.new_frame(max_image_cache_use);
let max_decode_cache_use = 1_000_000_000;
Since the decode cache is RAM-only, we could make it quite a bit bigger. Maybe 8 GB?
Suggested change:
- let max_decode_cache_use = 1_000_000_000;
+ let max_decode_cache_use = 8_000_000_000;
but Web!! If we go there, this needs to have a different limit on Web
good catch - yeah, 8 GB is quite a high limit for a 4 GiB system :)
8 still seems high in general. Splitting the difference and going with 4 normally and keeping it as 1 for wasm.
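Something along these lines (an illustrative sketch, not the exact code that landed; the constant name is made up) would express that platform-dependent budget:

```rust
// Per-platform decode-cache budget: wasm32 has a much smaller address space.
#[cfg(target_arch = "wasm32")]
const MAX_DECODE_CACHE_USE: u64 = 1_000_000_000; // 1 GB on web

#[cfg(not(target_arch = "wasm32"))]
const MAX_DECODE_CACHE_USE: u64 = 4_000_000_000; // 4 GB on native
```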
        lookup.tensor.clone()
    }
    _ => Ok(maybe_encoded_tensor),
Maybe we should have TensorData be split into just two categories: Compressed and Raw, so we can do an exhaustive match here. Or at least TensorData::Compressed(CompressedTensor) + enum CompressedTensor { Jpeg(…) } so it is easier to add png etc. ...but we can do that in another PR though.
Yeah, I had the exact same thought as I was working through this. I think I'd like to split the tensor into 3 components. One for the meta info, one for native buffers, and one for compressed buffers. Then use the compressed data -> native as a very minimal prototype for a cached "derived component".
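A rough sketch of the two-category split being floated here (all names are hypothetical, not actual rerun types):

```rust
/// Hypothetical sketch of the proposed split; illustrative names only.
pub enum CompressedTensorData {
    Jpeg(Vec<u8>),
    // Adding e.g. `Png(Vec<u8>)` later becomes a local change.
}

pub enum RawTensorData {
    U8(Vec<u8>),
    F32(Vec<f32>),
    // ... remaining native dtypes
}

pub enum TensorData {
    Raw(RawTensorData),
    Compressed(CompressedTensorData),
}

/// With this shape, call sites like the decode cache get an exhaustive
/// two-arm match instead of listing every dtype.
pub fn needs_decode(data: &TensorData) -> bool {
    match data {
        TensorData::Raw(_) => false,
        TensorData::Compressed(_) => true,
    }
}
```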
TensorData::I32(buf) => buf.len(),
TensorData::I64(buf) => buf.len(),
TensorData::F32(buf) => buf.len(),
TensorData::F64(buf) => buf.len(),
I filed an issue for this: jorgecarleitao/arrow2#1430
Nice! But - completely unrelated to anything here - oh my god
This cache stores a Tensor entity built from the decoded data. This is now used in the few places where we have queried Tensors, but the TensorImageCache no longer needs to worry about JPEG-encoded data.
Ideally in the future this would just become something like a derived component from a new CompressedTensor. Then anything querying for Tensors would find this automatically with decoding and caching happening at the store/query layer instead. Some views (like the selection view) could then optionally handle CompressedTensor for a few UI elements like "download original image."
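Roughly, the entry stored by the decode cache described above might look like this (a sketch under assumed names, not the actual implementation; only `Tensor` and `TensorDecodeError` come from this PR):

```rust
// Illustrative only: what the decode cache could store per JPEG tensor.
struct DecodeCacheEntry {
    /// The decoded tensor, or the (cloneable) error from a failed decode,
    /// so a bad JPEG isn't re-decoded every frame.
    tensor: Result<Tensor, TensorDecodeError>,
    /// Bytes used, counted against the decode-cache budget when purging.
    memory_used: u64,
    /// When this entry was last accessed, so stale entries can be evicted.
    last_use_frame: u64,
}
```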
Related to:
Checklist