Merge pull request #54 from sz3/glshake

Encoder improvements. 118 KB/s
sz3 · Mar 6, 2021 · 9ffb94e
2 parents 9b90e78 + 61fff8f
commit 9ffb94e
Showing 20 changed files with 282 additions and 117 deletions.
diff --git a/DETAILS.md b/DETAILS.md
@@ -3,7 +3,7 @@
 
 ## The premise
 
-Conceptually, cimbar is built on top of `image hashing`:
+Cimbar is a grid of colored tiles. Conceptually, it is built on the idea of `image hashing`:
 
 ![example image hash](https://github.com/sz3/cimbar-samples/blob/v0.5/docs/imagehash.png)
 
@@ -84,7 +84,7 @@ These properties may appear to be magical as you consider them more, and they do
 2. wirehair requires the file contents to be stored in RAM
  * this relates to the size limit!
 
-This constraint is less of an obstacle than it may seem -- the fountain codes are essentially being used as a wire format, and the encoder and decoder could agree on a scheme to split up, and then reassemble, larger files. Cimbar does not yet implement this, however!
+This constraint is less of an obstacle than it may seem -- the fountain codes are essentially being used as a wire format, and the encoder and decoder could agree on a scheme to split up, and then reassemble, larger files. Cimbar does not (yet?) implement this, however!
 
 ## Implementation: Decoder
 

diff --git a/PERFORMANCE.md b/PERFORMANCE.md
@@ -9,22 +9,32 @@
  * There are 4 or 8 possible colors, encoding an additional 2-3 bits per tile.
  * These 6-7 bits per tile work out to a maximum of 9300-10850 bytes per barcode, though in practice this number is reduced by error correction.
  * The default ecc setting is 30/155, which is how we go from 9300 -> 7500 bytes of real data for a 4-color cimbar image.
-  * Reed Solomon is not an ideal for this use case -- specifically, it corrects byte errors, and cimbar errors tend to involve 1-3 bits at a time. However, since Reed Solomon implementations are ubiquitous, I used it for this prototype.
+  * Reed Solomon is not perfect for this use case -- specifically, it corrects byte errors, and cimbar errors tend to involve 1-3 bits at a time. However, since Reed Solomon implementations are ubiquitous, it is currently in use.
 
 ## Current sustained benchmark
 
 * 4-color cimbar with ecc=30:
- * 2,980,556 bytes (after compression) in 36s -> 662 kilobits/s (~82 KB/s)
+ * 4,717,525 bytes (after compression) in 45s -> 838 kilobits/s (~104 KB/s)
 
 * 8-color cimbar with ecc=30:
- * 2,980,556 bytes in 31s -> 769 kilobits/s (~96 KB/s)
+ * 4,717,525 bytes in 40s -> 943 kilobits/s (~118 KB/s)
 
 * details:
- * these numbers are use https://github.com/sz3/cfc, running with 4 CPU threads on a Qualcomm Snapdragon 625
+ * cimbar has built-in compression using zstd. What's being measured here is bits over the wire, i.e. data after compression is applied.
+ * these numbers are using https://github.com/sz3/cfc, running with 4 CPU threads on a Qualcomm Snapdragon 625
   * perhaps I will buy a new cell phone to inflate the benchmark numbers.
- * the sender commandline is `./cimbar_send /path/to/file -s`
-  * the `shakycam` option allows cfc to quickly discard ghosted frames, and spend more time decoding real data.
+ * the sender is the cimbar.org wasm implementation. An equivalent command line is `./cimbar_send /path/to/file -s`
+  * cimbar.org uses the `shakycam` option to allow the receiver to detect/discard "in between" frames as part of the scan step. This allows it to spend more processing time decoding real data.
  * burst rate can be higher (or lower)
   * to this end, lower ecc settings *can* provide better burst rates
- * 8-color cimbar is considerably more sensitive to lighting conditions. Notably, decoding has some issues with dim screens.
+ * 4-color cimbar is currently preferred, and will give more consistent transfer speeds.
+ * 8-color cimbar should be considered a prototype within a prototype. It is considerably more sensitive to lighting conditions and color tints.
 
+* other notes:
+  * having better lighting in the frame often leads to better results -- this is why cimbar.org has a (mostly) white background. cfc uses android's auto-exposure, auto-focus, etc (it's a very simple app). Good ambient light -- or a white background -- can lead to more consistent quality frame capture.
+  * because of the lighting/exposure question, I usually "shoot" in landscape instead of portrait.
+  * cfc currently has a low resolution, so the cimbar frame should take up as much of the display as possible (trust the guide brackets)
+  * similarly, it's best to keep the camera angle straight-on -- instead of at an angle -- to decode the whole image successfully. Decodes should still happen at higher angles, but the "smaller" part of the image may have more errors than the ECC can deal with.
+  * other things to be wary of:
+  * glare from light sources.
+  * shaky hands.
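
Review note: the capacity and benchmark figures in this PERFORMANCE.md hunk can be cross-checked with a little arithmetic. The sketch below is not part of libcimbar. The helper names are hypothetical, and the tile count of 12400 is an assumption obtained by inverting the stated "9300 bytes at 6 bits per tile" figure.

```cpp
#include <cstdint>

// Sustained throughput in bits per second for `bytes` sent in `seconds`.
inline double bits_per_second(std::uint64_t bytes, double seconds)
{
    return static_cast<double>(bytes) * 8.0 / seconds;
}

// Raw bytes per barcode for `tiles` tiles carrying `bits_per_tile` bits each.
// The tile count passed in is an assumption, not a libcimbar constant.
inline unsigned raw_bytes_per_frame(unsigned tiles, unsigned bits_per_tile)
{
    return tiles * bits_per_tile / 8;
}

// Payload bytes left when `ecc` out of every `block` bytes are parity.
inline unsigned payload_bytes(unsigned raw_bytes, unsigned ecc, unsigned block)
{
    return raw_bytes / block * (block - ecc);
}
```

Under these assumptions, `raw_bytes_per_frame(12400, 6)` gives 9300 and `payload_bytes(9300, 30, 155)` gives 7500, matching the text. `bits_per_second(4717525, 45)` is roughly 838,671 and `bits_per_second(4717525, 40)` is roughly 943,505, consistent with the quoted 838 and 943 kilobits/s.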
diff --git a/README.md b/README.md
@@ -5,7 +5,9 @@
 
 Behold: an experimental barcode format for air-gapped data transfer.
 
-It can sustain speeds of 770+ kilobits/s (~96 KB/s) using nothing but a smartphone camera!
+It can sustain speeds of 943+ kilobits/s (~118 KB/s) using just a computer monitor and a smartphone camera!
+
+![A non-animated cimbar code](https://github.com/sz3/cimbar-samples/blob/v0.5/6bit/4cecc30f.png)
 
 ## Explain?
 
@@ -21,11 +23,13 @@ No internet/bluetooth/NFC/etc is used. All data is transmitted through the camer
 
 `cimbar` is a high-density 2D barcode format. Data is stored in a grid of colored tiles -- bits are encoded based on which tile is chosen, and which color is chosen to draw the tile. Reed Solomon error correction is applied on the data, to account for the lossy nature of the video -> digital decoding. Sub-1% error rates are expected, and corrected.
 
-`libcimbar`, this optimized implementation, includes a simple protocol for file encoding based on fountain codes (`wirehair`). Files of up to 33MB can be encoded in a series of cimbar codes, which can be output as images or a live video feed. Once enough distinct image frames have been decoded successfully, the file will be reconstructed successfully. This is true even if the images are received out of order, or if some have been corrupted or are missing.
+`libcimbar`, this optimized implementation, includes a simple protocol for file encoding built on fountain codes (`wirehair`) and zstd compression. Files of up to 33MB (after compression!) are encoded in a series of cimbar codes, which can be output as images or a live video feed. Once enough distinct image frames have been decoded successfully, the file will be reconstructed and decompressed successfully. This is true even if the images are received out of order, or if some have been corrupted or are missing.
 
 ## Platforms
 
-The code is written in C++, and developed/tested on amd64+linux, arm64+android, and emscripten+wasm. It probably works, or can be made to work, on other platforms.
+The code is written in C++, and developed/tested on amd64+linux, arm64+android (decoder only), and emscripten+WASM (encoder only). It probably works, or can be made to work, on other platforms.
+
+Crucially, because the encoder compiles to asmjs and wasm, it can run on anything with a modern web browser. There are [releases](https://github.com/sz3/libcimbar/releases/latest) if you wish to run the encoder locally instead of via cimbar.org.
 
 ## Library dependencies
 

diff --git a/TODO.md b/TODO.md
@@ -8,15 +8,15 @@ libcimbar is fairly optimized, to achieve the *proof* part of proof-of-concept.
 Performance optimizations aside, there are a number of paths that might be interesting to pursue. Some I may take a look at, but most I will leave to any enterprising developer who wants to take up the cause:
 
 * proper metadata/header information?
- * would be nice to be able to determine ecc/#colors/#smybols from the cimbar image itself?
+ * would be nice to be able to determine ecc/#colors/#symbols from the cimbar image itself?
  * The bottom right corner is the obvious place to reclaim space to make this possible.
 * multi-frame decoding?
  * when decoding a static cimbar image, it would be useful to be able to use prior (unsuccessful) decode attempts to inform a future decode, and -- hopefully -- increase the probability of success. Currently, all frames are decoded independently.
   * there is already a granular confidence metric that could be reused -- the `distance` that's tracked when decoding symbol tiles...
 * optimal symbol set?
 * the 16-symbol (4 bit) set is hand-drawn. I started with ~40 or so hand-drawn symbols, and used the 16 that performed best with each other.
  * there is surely a more optimal set -- a more rigorous approach should yield lower error rates!
- * but, more importantly, it may be possible to go up to 32 symbols, and encode 5 bits per tile?
+ * but, more importantly, it may be possible to go up to 32 symbols, and encode 5 symbol bits per tile?
 * optimal symbol size?
  * the symbols that make up each cell on the cimbar grid are 8x8 (in a 9x9 grid).
  * this is because imagehash was on 8x8 tiles!
@@ -25,16 +25,24 @@ Performance optimizations aside, there are a number of paths that might be inter
 * optimal color set?
 * the 4-color (2 bit) palettes seem reasonable. 8-color, perhaps less so?
 * this may be a limitation of the algorithm/approach, however. Notably, since each symbol is drawn with one palette color, all colors need sufficient contrast against the backdrop (#000 or #FFF, depending). This constrains the color space somewhat, and less distinct colors == more errors.
+ * in addition to contrast, there is interplay (that I don't currently understand) between the overall brightness of the image and the exposure time needed for high framerate capture. More clean frames == more throughput.
 * optimal grid size?
  * 1024x1024 is a remnant of the early prototyping process. There is nothing inherently special about it (except that it fits on a 1920x1080 screen, which seems good)
   * the tile grid itself is 1008x1008 (1008 == 9x112 -- there are 112 tile rows and columns)
  * a smaller grid would be less information dense, but more resilient to errors. Probably.
 * optimal grid shape?
  * it's a square because QR codes are square. That's it. Should it be?
+ * I'm strongly considering 4:3 for the next revision.
 * more efficient ECC?
- * LDPC?
- * Reed Solomon operates on bytes. Most decode errors tend to average out at 1-3 bits. It's not a total disaster, because it works. However, it would be nice to have denser error correction codes.
+ * QC-LDPC?
+ * Reed Solomon operates on bytes. Most decode errors tend to average out at 1-3 bits. (In the pathological case, a single read error will span two bytes.) It's not a total disaster -- it still works. 
+ * I expect that state of the art ECC will allow 6-15% better throughput.
+  * it's a wide range due to various unknowns (unknowns to me, anyway)
 * proper GPU support (OpenCV + openCL) on android?
+ * It *might* be useful. [CFC](https://github.com/sz3/cfc) is the current test bed for this.
+* wasm decoder?
+ * probably needs to use Web Workers
+ * in-browser GPGPU support would be interesting (but I'm not counting on it)
 * ???
  * still reading? Of course there's more! There's always more!
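
Review note: the ECC item above says a single read error can span two bytes in the pathological case. A minimal sketch of why, assuming tiles are packed back to back into the byte stream starting at bit 0. That packing order is an assumption made for illustration, libcimbar's actual interleaving may differ, and both function names are hypothetical.

```cpp
// True when tile `tile_index` (each tile holding `bits_per_tile` bits,
// packed sequentially from bit 0) straddles a byte boundary.
inline bool spans_two_bytes(unsigned tile_index, unsigned bits_per_tile)
{
    const unsigned first_bit = tile_index * bits_per_tile;
    const unsigned last_bit = first_bit + bits_per_tile - 1;
    return first_bit / 8 != last_bit / 8;
}

// Number of straddling tiles among the first `tiles` tiles.
inline unsigned count_spanning(unsigned tiles, unsigned bits_per_tile)
{
    unsigned n = 0;
    for (unsigned t = 0; t < tiles; ++t)
        if (spans_two_bytes(t, bits_per_tile))
            ++n;
    return n;
}
```

With 6-bit tiles under this packing, tiles 1 and 2 of every group of four cross a byte boundary, so one misread tile can corrupt two Reed Solomon symbols even though only a few bits are wrong.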
 

diff --git a/src/exe/cimbar/cimbar.cpp b/src/exe/cimbar/cimbar.cpp
@@ -74,7 +74,7 @@ int main(int argc, char** argv)
  unsigned ecc = cimbar::Config::ecc_bytes();
  options.add_options()
  ("i,in", "Encoded pngs/jpgs/etc (for decode), or file to encode", cxxopts::value<vector<string>>())
- ("o,out", "Output file or directory.", cxxopts::value<string>())
+ ("o,out", "Output file prefix (encoding) or directory (decoding).", cxxopts::value<string>())
  ("c,color-bits", "Color bits. [0-3]", cxxopts::value<int>()->default_value(turbo::str::str(colorBits)))
  ("e,ecc", "ECC level", cxxopts::value<unsigned>()->default_value(turbo::str::str(ecc)))
  ("f,fountain", "Attempt fountain encode/decode", cxxopts::value<bool>())

diff --git a/src/exe/cimbar_send/send.cpp b/src/exe/cimbar_send/send.cpp
@@ -55,18 +55,11 @@ int main(int argc, char** argv)
  fps = defaultFps;
  unsigned delay = 1000 / fps;
 
- bool dark = true;
  bool use_rotatecam = result.count("rotatecam");
  bool use_shakycam = result.count("shakycam");
+ int window_size = 1080;
 
- cimbar::shaky_cam cam(cimbar::Config::image_size(), 1080, 1080, dark);
- // if we don't need the shakycam, we'll just turn it off
- // we could use a separate code path (just do a mat copyTo),
- // but this is fine.
- if (!use_shakycam)
- cam.toggle();
-
- cimbar::window w(cam.width(), cam.height(), "cimbar_send");
+ cimbar::window w(window_size, window_size, "cimbar_send");
  if (!w.is_good())
  {
  std::cerr << "failed to create window :(" << std::endl;
@@ -76,21 +69,22 @@ int main(int argc, char** argv)
  bool running = true;
  bool start = true;
 
- auto draw = [&w, &cam, use_rotatecam, delay, &running, &start] (const cv::Mat& frame, unsigned) {
+ auto draw = [&w, use_rotatecam, use_shakycam, delay, &running, &start] (const cv::Mat& frame, unsigned) {
  if (!start and w.should_close())
  return running = false;
  start = false;
 
- cv::Mat& windowImg = cam.draw(frame);
- w.show(windowImg, delay);
+ w.show(frame, delay);
  if (use_rotatecam)
  w.rotate();
+ if (use_shakycam)
+ w.shake();
  return true;
  };
 
  Encoder en(ecc, cimbar::Config::symbol_bits(), colorBits);
  while (running)
  for (const string& f : infiles)
- en.encode_fountain(f, draw, compressionLevel);
+ en.encode_fountain(f, draw, compressionLevel, 8.0, window_size);
  return 0;
 }
diff --git a/src/lib/cimb_translator/CimbWriter.cpp b/src/lib/cimb_translator/CimbWriter.cpp
@@ -33,39 +33,45 @@ namespace {
  string name = dark? "guide-vertical-dark" : "guide-vertical-light";
  return cimbar::load_img(fmt::format("bitmap/{}.png", name));
  }
-
- void paste(cv::Mat& canvas, const cv::Mat& img, int x, int y)
- {
- img.copyTo(canvas(cv::Rect(x, y, img.cols, img.rows)));
- }
 }
 
-CimbWriter::CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark)
+CimbWriter::CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark, int size)
  : _positions(Config::cell_spacing(), Config::num_cells(), Config::cell_size(), Config::corner_padding(), Config::interleave_blocks(), Config::interleave_partitions())
  , _encoder(symbol_bits, color_bits)
 {
- unsigned size = cimbar::Config::image_size();
+ if (size > cimbar::Config::image_size())
+ _offset = (size - cimbar::Config::image_size()) / 2;
+ else
+ size = cimbar::Config::image_size();
 
  cv::Scalar bgcolor = dark? cv::Scalar(0, 0, 0) : cv::Scalar(0xFF, 0xFF, 0xFF);
  _image = cv::Mat(size, size, CV_8UC3, bgcolor);
 
+ // from here on, we only care about the internal size
+ size = cimbar::Config::image_size();
+
  cv::Mat anchor = getAnchor(dark);
- paste(_image, anchor, 0, 0);
- paste(_image, anchor, 0, size - anchor.cols);
- paste(_image, anchor, size - anchor.rows, 0);
+ paste(anchor, 0, 0);
+ paste(anchor, 0, size - anchor.cols);
+ paste(anchor, size - anchor.rows, 0);
 
  cv::Mat secondaryAnchor = getSecondaryAnchor(dark);
- paste(_image, secondaryAnchor, size - anchor.rows, size - anchor.cols);
+ paste(secondaryAnchor, size - anchor.rows, size - anchor.cols);
 
  cv::Mat hg = getHorizontalGuide(dark);
- paste(_image, hg, (size/2) - (hg.cols/2), 2);
- paste(_image, hg, (size/2) - (hg.cols/2), size-4);
- paste(_image, hg, (size/2) - (hg.cols/2) - hg.cols, size-4);
- paste(_image, hg, (size/2) - (hg.cols/2) + hg.cols, size-4);
+ paste(hg, (size/2) - (hg.cols/2), 2);
+ paste(hg, (size/2) - (hg.cols/2), size-4);
+ paste(hg, (size/2) - (hg.cols/2) - hg.cols, size-4);
+ paste(hg, (size/2) - (hg.cols/2) + hg.cols, size-4);
 
  cv::Mat vg = getVerticalGuide(dark);
- paste(_image, vg, 2, (size/2) - (vg.rows/2));
- paste(_image, vg, size-4, (size/2) - (vg.rows/2));
+ paste(vg, 2, (size/2) - (vg.rows/2));
+ paste(vg, size-4, (size/2) - (vg.rows/2));
+}
+
+void CimbWriter::paste(const cv::Mat& img, int x, int y)
+{
+ img.copyTo(_image(cv::Rect(x+_offset, y+_offset, img.cols, img.rows)));
 }
 
 bool CimbWriter::write(unsigned bits)
@@ -77,7 +83,7 @@ bool CimbWriter::write(unsigned bits)
 
  CellPositions::coordinate xy = _positions.next();
  cv::Mat cell = _encoder.encode(bits);
- paste(_image, cell, xy.first, xy.second);
+ paste(cell, xy.first, xy.second);
  return true;
 }
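
Review note: the new `size` parameter centers the fixed-size cimbar image on a larger canvas by shifting every paste by `_offset`. The sketch below restates that arithmetic without OpenCV. `Rect`, `center_offset`, `effective_canvas`, and `paste_rect` are hypothetical stand-ins for illustration, not libcimbar names.

```cpp
// Hypothetical stand-in for cv::Rect.
struct Rect
{
    int x;
    int y;
    int width;
    int height;
};

// Offset applied to every paste: half the extra border, or 0 when the
// requested canvas is not larger than the image (mirrors the constructor).
inline int center_offset(int canvas_size, int image_size)
{
    return canvas_size > image_size ? (canvas_size - image_size) / 2 : 0;
}

// Canvas edge length actually allocated: never smaller than the image.
inline int effective_canvas(int canvas_size, int image_size)
{
    return canvas_size > image_size ? canvas_size : image_size;
}

// Destination rectangle for an element placed at image coordinates (x, y).
inline Rect paste_rect(int x, int y, int width, int height, int offset)
{
    return Rect{x + offset, y + offset, width, height};
}
```

For the 1040 canvas used in `testCustomSize` and the 1024 default image, `center_offset(1040, 1024)` is 8. For the 1080 window in `cimbar_send` it is 28. The default `size=0` leaves the offset at 0 and the canvas at 1024.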
 

diff --git a/src/lib/cimb_translator/CimbWriter.h b/src/lib/cimb_translator/CimbWriter.h
@@ -7,15 +7,19 @@
 class CimbWriter
 {
 public:
- CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark=true);
+ CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark=true, int size=0);
 
  bool write(unsigned bits);
  bool done() const;
 
  cv::Mat image() const;
 
+protected:
+ void paste(const cv::Mat& img, int x, int y);
+
 protected:
  cv::Mat _image;
  CellPositions _positions;
  CimbEncoder _encoder;
+ unsigned _offset = 0;
 };
diff --git a/src/lib/cimb_translator/test/CimbWriterTest.cpp b/src/lib/cimb_translator/test/CimbWriterTest.cpp
@@ -20,5 +20,23 @@ TEST_CASE( "CimbWriterTest/testSimple", "[unit]" )
  }
 
  cv::Mat img = cw.image();
+ assertEquals(1024, img.cols);
+ assertEquals(1024, img.rows);
  assertEquals( 0xeecc8800efce8c08, image_hash::average_hash(img) );
 }
+
+TEST_CASE( "CimbWriterTest/testCustomSize", "[unit]" )
+{
+ CimbWriter cw(4, 2, true, 1040);
+
+ while (1)
+ {
+ if (!cw.write(0))
+ break;
+ }
+
+ cv::Mat img = cw.image();
+ assertEquals(1040, img.cols);
+ assertEquals(1040, img.rows);
+ assertEquals( 0xab00ab02af0abfab, image_hash::average_hash(img) );
+}
diff --git a/src/lib/encoder/Encoder.h b/src/lib/encoder/Encoder.h
@@ -15,8 +15,8 @@ class Encoder : public SimpleEncoder
  using SimpleEncoder::SimpleEncoder;
 
  unsigned encode(const std::string& filename, std::string output_prefix);
- unsigned encode_fountain(const std::string& filename, std::string output_prefix, int compression_level=6, double redundancy=1.2);
- unsigned encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level=6, double redundancy=4.0);
+ unsigned encode_fountain(const std::string& filename, std::string output_prefix, int compression_level=6, double redundancy=1.2, int canvas_size=0);
+ unsigned encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level=6, double redundancy=4.0, int canvas_size=0);
 };
 
 inline unsigned Encoder::encode(const std::string& filename, std::string output_prefix)
@@ -39,7 +39,7 @@ inline unsigned Encoder::encode(const std::string& filename, std::string output_
  return i;
 }
 
-inline unsigned Encoder::encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level, double redundancy)
+inline unsigned Encoder::encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level, double redundancy, int canvas_size)
 {
  std::ifstream infile(filename);
  fountain_encoder_stream::ptr fes = create_fountain_encoder(infile, compression_level);
@@ -56,7 +56,7 @@ inline unsigned Encoder::encode_fountain(const std::string& filename, const std:
  unsigned i = 0;
  while (i < requiredFrames)
  {
- auto frame = encode_next(*fes);
+ auto frame = encode_next(*fes, canvas_size);
  if (!frame)
  break;
 
@@ -67,13 +67,13 @@ inline unsigned Encoder::encode_fountain(const std::string& filename, const std:
  return i;
 }
 
-inline unsigned Encoder::encode_fountain(const std::string& filename, std::string output_prefix, int compression_level, double redundancy)
+inline unsigned Encoder::encode_fountain(const std::string& filename, std::string output_prefix, int compression_level, double redundancy, int canvas_size)
 {
  std::function<bool(const cv::Mat&, unsigned)> fun = [output_prefix] (const cv::Mat& frame, unsigned i) {
  std::string output = fmt::format("{}_{}.png", output_prefix, i);
  cv::Mat bgr;
  cv::cvtColor(frame, bgr, cv::COLOR_RGB2BGR);
  return cv::imwrite(output, bgr);
  };
- return encode_fountain(filename, fun, compression_level, redundancy);
+ return encode_fountain(filename, fun, compression_level, redundancy, canvas_size);
 }