Encoder improvements. 118 KB/s #54

Merged 26 commits on Mar 6, 2021
0de861b
quick WIP for GLSL transpose for the shakycam effect
sz3 Feb 20, 2021
cb84a59
hmm
sz3 Feb 20, 2021
d869656
use loop_iterator?
sz3 Feb 20, 2021
355ef7c
Is this better?
sz3 Feb 20, 2021
f6b6496
I don't know about this one chief
sz3 Feb 20, 2021
396939a
It's something like this, but...
sz3 Feb 20, 2021
a2ffe08
maybe this instead -- variable canvas size for CimbWriter
sz3 Feb 20, 2021
87338c4
Reimplement shakycam for opencv_highgui
sz3 Mar 2, 2021
a49fe7f
Only dark mode for now
sz3 Mar 2, 2021
c15ac29
The motivation of the shaky_cam is to detect cross-frame overlap,
sz3 Mar 3, 2021
6933d3e
fix color bug with highgui
sz3 Mar 3, 2021
21e39db
Update help
sz3 Mar 3, 2021
0626afc
Tired of the mouse cursor getting in the way...
sz3 Mar 4, 2021
f979012
:thinking:
sz3 Mar 4, 2021
4c5f826
I like this interface slightly more.
sz3 Mar 4, 2021
5b86c29
WIP: having more light (white) helps the camera
sz3 Mar 4, 2021
a865dd4
Simpler gradient, same effect
sz3 Mar 4, 2021
e340559
New numbers, and they are quick
sz3 Mar 4, 2021
48b2d1a
Spending my time on important things, like gradients
sz3 Mar 4, 2021
7247339
Fullscreen toggle?
sz3 Mar 4, 2021
f8408b3
I think document.documentElement is what I actually want
sz3 Mar 5, 2021
bf2c492
Misc code cleanup
sz3 Mar 5, 2021
206c786
Add a few tests for the larger size
sz3 Mar 5, 2021
8e5fa90
Misc docs updates, including new performance numbers!
sz3 Mar 5, 2021
fc04ea6
more docs?
sz3 Mar 5, 2021
61fff8f
I don't think I mentioned compression anywhere.
sz3 Mar 5, 2021
4 changes: 2 additions & 2 deletions DETAILS.md
Expand Up @@ -3,7 +3,7 @@

## The premise

Conceptually, cimbar is built on top of `image hashing`:
Cimbar is a grid of colored tiles. Conceptually, it is built on the idea of `image hashing`:

![example image hash](https://github.com/sz3/cimbar-samples/blob/v0.5/docs/imagehash.png)

Expand Down Expand Up @@ -84,7 +84,7 @@ These properties may appear to be magical as you consider them more, and they do
2. wirehair requires the file contents to be stored in RAM
* this relates to the size limit!

This constraint is less of an obstacle than it may seem -- the fountain codes are essentially being used as a wire format, and the encoder and decoder could agree on a scheme to split up, and then reassemble, larger files. Cimbar does not yet implement this, however!
This constraint is less of an obstacle than it may seem -- the fountain codes are essentially being used as a wire format, and the encoder and decoder could agree on a scheme to split up, and then reassemble, larger files. Cimbar does not (yet?) implement this, however!

## Implementation: Decoder

24 changes: 17 additions & 7 deletions PERFORMANCE.md
Expand Up @@ -9,22 +9,32 @@
* There are 4 or 8 possible colors, encoding an additional 2-3 bits per tile.
* These 6-7 bits per tile work out to a maximum of 9300-10850 bytes per barcode, though in practice this number is reduced by error correction.
* The default ecc setting is 30/155, which is how we go from 9300 -> 7500 bytes of real data for a 4-color cimbar image.
* Reed Solomon is not an ideal for this use case -- specifically, it corrects byte errors, and cimbar errors tend to involve 1-3 bits at a time. However, since Reed Solomon implementations are ubiquitous, I used it for this prototype.
* Reed Solomon is not perfect for this use case -- specifically, it corrects byte errors, and cimbar errors tend to involve 1-3 bits at a time. However, since Reed Solomon implementations are ubiquitous, it is currently in use.
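
The capacity figures above can be checked with a little arithmetic. This is only a sketch: the tile count of 12400 is not stated in the text (it is back-derived from 9300 bytes at 6 bits per tile), the reading of "30/155" as 30 parity bytes per 155-byte block is an interpretation, and the function names are illustrative rather than part of libcimbar.

```cpp
// Back-of-the-envelope check of the capacity figures quoted above.
// Assumption: 12400 data tiles per frame, inferred from 9300 bytes * 8 / 6 bits.
constexpr unsigned kTiles = 12400;

// Raw capacity in bytes for a given number of bits per tile.
constexpr unsigned raw_bytes(unsigned tiles, unsigned bits_per_tile)
{
	return tiles * bits_per_tile / 8;
}

// Payload left after error correction, assuming each block of `block` bytes
// carries `ecc` parity bytes (the default quoted above is 30/155).
constexpr unsigned payload_bytes(unsigned raw, unsigned ecc, unsigned block)
{
	return raw / block * (block - ecc);
}

static_assert(raw_bytes(kTiles, 6) == 9300, "4-color: 6 bits per tile");
static_assert(raw_bytes(kTiles, 7) == 10850, "8-color: 7 bits per tile");
static_assert(payload_bytes(9300, 30, 155) == 7500, "9300 -> 7500 at ecc=30");
```

Under these assumptions, the 9300 and 10850 byte maximums and the 9300 -> 7500 reduction all fall out of the same tile count.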

## Current sustained benchmark

* 4-color cimbar with ecc=30:
* 2,980,556 bytes (after compression) in 36s -> 662 kilobits/s (~82 KB/s)
* 4,717,525 bytes (after compression) in 45s -> 838 kilobits/s (~104 KB/s)
Review comment (owner, author):
4-color being over 100 KB/s is very nice.


* 8-color cimbar with ecc=30:
* 2,980,556 bytes in 31s -> 769 kilobits/s (~96 KB/s)
* 4,717,525 bytes in 40s -> 943 kilobits/s (~118 KB/s)

* details:
* these numbers are use https://github.com/sz3/cfc, running with 4 CPU threads on a Qualcomm Snapdragon 625
* cimbar has built-in compression using zstd. What's being measured here is bits over the wire, e.g. data after compression is applied.
* these numbers are using https://github.com/sz3/cfc, running with 4 CPU threads on a Qualcomm Snapdragon 625
* perhaps I will buy a new cell phone to inflate the benchmark numbers.
* the sender commandline is `./cimbar_send /path/to/file -s`
* the `shakycam` option allows cfc to quickly discard ghosted frames, and spend more time decoding real data.
* the sender is the cimbar.org wasm implementation. An equivalent command line is `./cimbar_send /path/to/file -s`
* cimbar.org uses the `shakycam` option to allow the receiver to detect/discard "in between" frames as part of the scan step. This allows it to spend more processing time decoding real data.
* burst rate can be higher (or lower)
* to this end, lower ecc settings *can* provide better burst rates
* 8-color cimbar is considerably more sensitive to lighting conditions. Notably, decoding has some issues with dim screens.
* 4-color cimbar is currently preferred, and will give more consistent transfer speeds.
* 8-color cimbar should be considered a prototype within a prototype. It is considerably more sensitive to lighting conditions and color tints.

* other notes:
* having better lighting in the frame often leads to better results -- this is why cimbar.org has a (mostly) white background. cfc uses android's auto-exposure, auto-focus, etc (it's a very simple app). Good ambient light -- or a white background -- can lead to more consistent quality frame capture.
Review comment (owner, author):
Misc notes for how to achieve peak throughput. There may be other stuff, but this is what I know.

* because of the lighting/exposure question, I usually "shoot" in landscape instead of portrait.
* cfc currently has a low resolution, so the cimbar frame should take up as much of the display as possible (trust the guide brackets)
* similarly, it's best to keep the camera angle straight-on -- instead of at an angle -- to decode the whole image successfully. Decodes should still happen at higher angles, but the "smaller" part of the image may have more errors than the ECC can deal with.
* other things to be wary of:
* glare from light sources.
* shaky hands.
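
The benchmark conversions (bytes and seconds to kilobits/s and KB/s) can be reproduced with the sketch below. It assumes 1 kilobit = 1000 bits and 1 KB = 1000 bytes, which appears to match the quoted numbers; the helper names are made up for illustration.

```cpp
// Convert a transfer (bytes over the wire, elapsed seconds) into the units
// used in the benchmark list above.
// Assumption: decimal units (1 kilobit = 1000 bits, 1 KB = 1000 bytes).

// Whole kilobits per second, truncated.
constexpr unsigned long long kilobits_per_second(unsigned long long bytes, unsigned seconds)
{
	return bytes * 8 / seconds / 1000;
}

// Kilobytes per second as a fractional value.
constexpr double kilobytes_per_second(unsigned long long bytes, unsigned seconds)
{
	return static_cast<double>(bytes) / seconds / 1000.0;
}

// 4-color run: 4,717,525 bytes in 45s
static_assert(kilobits_per_second(4717525, 45) == 838, "4-color kilobits/s");
// 8-color run: 4,717,525 bytes in 40s
static_assert(kilobits_per_second(4717525, 40) == 943, "8-color kilobits/s");
```

With these definitions the 4-color run works out to just under 105 KB/s and the 8-color run to just under 118 KB/s, in line with the approximate figures in the list.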
10 changes: 7 additions & 3 deletions README.md
Expand Up @@ -5,7 +5,9 @@

Behold: an experimental barcode format for air-gapped data transfer.

It can sustain speeds of 770+ kilobits/s (~96 KB/s) using nothing but a smartphone camera!
It can sustain speeds of 943+ kilobits/s (~118 KB/s) using just a computer monitor and a smartphone camera!

![A non-animated cimbar code](https://github.com/sz3/cimbar-samples/blob/v0.5/6bit/4cecc30f.png)

## Explain?

Expand All @@ -21,11 +23,13 @@ No internet/bluetooth/NFC/etc is used. All data is transmitted through the camer

`cimbar` is a high-density 2D barcode format. Data is stored in a grid of colored tiles -- bits are encoded based on which tile is chosen, and which color is chosen to draw the tile. Reed Solomon error correction is applied on the data, to account for the lossy nature of the video -> digital decoding. Sub-1% error rates are expected, and corrected.

`libcimbar`, this optimized implementation, includes a simple protocol for file encoding based on fountain codes (`wirehair`). Files of up to 33MB can be encoded in a series of cimbar codes, which can be output as images or a live video feed. Once enough distinct image frames have been decoded successfully, the file will be reconstructed successfully. This is true even if the images are received out of order, or if some have been corrupted or are missing.
`libcimbar`, this optimized implementation, includes a simple protocol for file encoding built on fountain codes (`wirehair`) and zstd compression. Files of up to 33MB (after compression!) are encoded in a series of cimbar codes, which can be output as images or a live video feed. Once enough distinct image frames have been decoded successfully, the file will be reconstructed and decompressed successfully. This is true even if the images are received out of order, or if some have been corrupted or are missing.

## Platforms

The code is written in C++, and developed/tested on amd64+linux, arm64+android, and emscripten+wasm. It probably works, or can be made to work, on other platforms.
The code is written in C++, and developed/tested on amd64+linux, arm64+android (decoder only), and emscripten+WASM (encoder only). It probably works, or can be made to work, on other platforms.

Crucially, because the encoder compiles to asmjs and wasm, it can run on anything with a modern web browser. There are [releases](https://github.com/sz3/libcimbar/releases/latest) if you wish to run the encoder locally instead of via cimbar.org.

## Library dependencies

16 changes: 12 additions & 4 deletions TODO.md
Expand Up @@ -8,15 +8,15 @@ libcimbar is fairly optimized, to achieve the *proof* part of proof-of-concept.
Performance optimizations aside, there are a number of paths that might be interesting to pursue. Some I may take a look at, but most I will leave to any enterprising developer who wants to take up the cause:

* proper metadata/header information?
* would be nice to be able to determine ecc/#colors/#smybols from the cimbar image itself?
* would be nice to be able to determine ecc/#colors/#symbols from the cimbar image itself?
* The bottom right corner is the obvious place to reclaim space to make this possible.
* multi-frame decoding?
* when decoding a static cimbar image, it would be useful to be able to use prior (unsuccessful) decode attempts to inform a future decode, and -- hopefully -- increase the probability of success. Currently, all frames are decoded independently.
* there is already a granular confidence metric that could be reused -- the `distance` that's tracked when decoding symbol tiles...
* optimal symbol set?
* the 16-symbol (4 bit) set is hand-drawn. I started with ~40 or so hand-drawn symbols, and used the 16 that performed best with each other.
* there is surely a more optimal set -- a more rigorous approach should yield lower error rates!
* but, more importantly, it may be possible to go up to 32 symbols, and encode 5 bits per tile?
* but, more importantly, it may be possible to go up to 32 symbols, and encode 5 symbol bits per tile?
* optimal symbol size?
* the symbols that make up each cell on the cimbar grid are 8x8 (in a 9x9 grid).
* this is because imagehash was on 8x8 tiles!
Expand All @@ -25,16 +25,24 @@ Performance optimizations aside, there are a number of paths that might be inter
* optimal color set?
* the 4-color (2 bit) palettes seem reasonable. 8-color, perhaps less so?
* this may be a limitation of the algorithm/approach, however. Notably, since each symbol is drawn with one palette color, all colors need sufficient contrast against the backdrop (#000 or #FFF, depending). This constrains the color space somewhat, and less distinct colors == more errors.
* in addition to contrast, there is interplay (that I don't currently understand) between the overall brightness of the image and the exposure time needed for high framerate capture. More clean frames == more throughput.
* optimal grid size?
* 1024x1024 is a remnant of the early prototyping process. There is nothing inherently special about it (except that it fits on a 1920x1080 screen, which seems good)
* the tile grid itself is 1008x1008 (1008 == 9x112 -- there are 112 tile rows and columns)
* a smaller grid would be less information dense, but more resilient to errors. Probably.
* optimal grid shape?
* it's a square because QR codes are square. That's it. Should it be?
* I'm strongly considering 4:3 for the next revision.
* more efficient ECC?
* LDPC?
* Reed Solomon operates on bytes. Most decode errors tend to average out at 1-3 bits. It's not a total disaster, because it works. However, it would be nice to have denser error correction codes.
* QC-LDPC?
* Reed Solomon operates on bytes. Most decode errors tend to average out at 1-3 bits. (In the pathological case, a single read error will span two bytes.) It's not a total disaster -- it still works.
* I expect that state of the art ECC will allow 6-15% better throughput.
* it's a wide range due to various unknowns (unknowns to me, anyway)
* proper GPU support (OpenCV + openCL) on android?
* It *might* be useful. [CFC](https://github.com/sz3/cfc) is the current test bed for this.
* wasm decoder?
* probably needs to use Web Workers
* in-browser GPGPU support would be interesting (but I'm not counting on it)
* ???
* still reading? Of course there's more! There's always more!
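
The ECC note above (a single read error spanning two bytes) is easier to see with a small sketch. It assumes tiles are packed back to back into the bitstream at a fixed number of bits each; libcimbar's actual layout involves interleaving and may differ, so treat this as an illustration of the byte-boundary problem and nothing more. All names are hypothetical.

```cpp
// Which byte(s) of a bitstream does tile number `tile` touch, if tiles are
// packed back to back at `bits_per_tile` bits each?
// Assumption: contiguous packing, no interleaving (not the real layout).
constexpr unsigned first_byte(unsigned tile, unsigned bits_per_tile)
{
	return tile * bits_per_tile / 8;
}

constexpr unsigned last_byte(unsigned tile, unsigned bits_per_tile)
{
	return (tile * bits_per_tile + bits_per_tile - 1) / 8;
}

// True when one misread tile would corrupt two Reed Solomon bytes.
constexpr bool spans_two_bytes(unsigned tile, unsigned bits_per_tile)
{
	return first_byte(tile, bits_per_tile) != last_byte(tile, bits_per_tile);
}

// At 6 bits per tile, tiles 1 and 2 of every group of 4 straddle a byte boundary.
static_assert(!spans_two_bytes(0, 6), "bits 0-5 sit in byte 0");
static_assert(spans_two_bytes(1, 6), "bits 6-11 cross bytes 0 and 1");
static_assert(spans_two_bytes(2, 6), "bits 12-17 cross bytes 1 and 2");
static_assert(!spans_two_bytes(3, 6), "bits 18-23 sit in byte 2");
```

Under this packing, half of all 6-bit tiles touch two bytes, so a 1-3 bit read error can cost two byte-level corrections.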

2 changes: 1 addition & 1 deletion src/exe/cimbar/cimbar.cpp
Expand Up @@ -74,7 +74,7 @@ int main(int argc, char** argv)
unsigned ecc = cimbar::Config::ecc_bytes();
options.add_options()
("i,in", "Encoded pngs/jpgs/etc (for decode), or file to encode", cxxopts::value<vector<string>>())
("o,out", "Output file or directory.", cxxopts::value<string>())
("o,out", "Output file prefix (encoding) or directory (decoding).", cxxopts::value<string>())
("c,color-bits", "Color bits. [0-3]", cxxopts::value<int>()->default_value(turbo::str::str(colorBits)))
("e,ecc", "ECC level", cxxopts::value<unsigned>()->default_value(turbo::str::str(ecc)))
("f,fountain", "Attempt fountain encode/decode", cxxopts::value<bool>())
20 changes: 7 additions & 13 deletions src/exe/cimbar_send/send.cpp
Expand Up @@ -55,18 +55,11 @@ int main(int argc, char** argv)
fps = defaultFps;
unsigned delay = 1000 / fps;

bool dark = true;
bool use_rotatecam = result.count("rotatecam");
bool use_shakycam = result.count("shakycam");
int window_size = 1080;

cimbar::shaky_cam cam(cimbar::Config::image_size(), 1080, 1080, dark);
// if we don't need the shakycam, we'll just turn it off
// we could use a separate code path (just do a mat copyTo),
// but this is fine.
if (!use_shakycam)
cam.toggle();

cimbar::window w(cam.width(), cam.height(), "cimbar_send");
cimbar::window w(window_size, window_size, "cimbar_send");
if (!w.is_good())
{
std::cerr << "failed to create window :(" << std::endl;
Expand All @@ -76,21 +69,22 @@ int main(int argc, char** argv)
bool running = true;
bool start = true;

auto draw = [&w, &cam, use_rotatecam, delay, &running, &start] (const cv::Mat& frame, unsigned) {
auto draw = [&w, use_rotatecam, use_shakycam, delay, &running, &start] (const cv::Mat& frame, unsigned) {
if (!start and w.should_close())
return running = false;
start = false;

cv::Mat& windowImg = cam.draw(frame);
w.show(windowImg, delay);
w.show(frame, delay);
if (use_rotatecam)
w.rotate();
if (use_shakycam)
w.shake();
return true;
};

Encoder en(ecc, cimbar::Config::symbol_bits(), colorBits);
while (running)
for (const string& f : infiles)
en.encode_fountain(f, draw, compressionLevel);
en.encode_fountain(f, draw, compressionLevel, 8.0, window_size);
Review comment (owner, author):
I may go a different direction with this at some point. We're passing in the size here to embed the 1024x1024 image in a larger image -- which will then get passed to the cimbar::window (which won't have to worry about resizing it).

This is somewhat silly -- the window should be able to render an image on a larger background.

But you know what else is silly? Trying to get transpose and scaling to work in a GLSL shader without hitting weird half-pixel blurring issues.

return 0;
}
42 changes: 24 additions & 18 deletions src/lib/cimb_translator/CimbWriter.cpp
Expand Up @@ -33,39 +33,45 @@ namespace {
string name = dark? "guide-vertical-dark" : "guide-vertical-light";
return cimbar::load_img(fmt::format("bitmap/{}.png", name));
}

void paste(cv::Mat& canvas, const cv::Mat& img, int x, int y)
{
img.copyTo(canvas(cv::Rect(x, y, img.cols, img.rows)));
}
}

CimbWriter::CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark)
CimbWriter::CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark, int size)
: _positions(Config::cell_spacing(), Config::num_cells(), Config::cell_size(), Config::corner_padding(), Config::interleave_blocks(), Config::interleave_partitions())
, _encoder(symbol_bits, color_bits)
{
unsigned size = cimbar::Config::image_size();
if (size > cimbar::Config::image_size())
_offset = (size - cimbar::Config::image_size()) / 2;
else
size = cimbar::Config::image_size();

cv::Scalar bgcolor = dark? cv::Scalar(0, 0, 0) : cv::Scalar(0xFF, 0xFF, 0xFF);
_image = cv::Mat(size, size, CV_8UC3, bgcolor);

// from here on, we only care about the internal size
size = cimbar::Config::image_size();

cv::Mat anchor = getAnchor(dark);
paste(_image, anchor, 0, 0);
paste(_image, anchor, 0, size - anchor.cols);
paste(_image, anchor, size - anchor.rows, 0);
paste(anchor, 0, 0);
paste(anchor, 0, size - anchor.cols);
paste(anchor, size - anchor.rows, 0);

cv::Mat secondaryAnchor = getSecondaryAnchor(dark);
paste(_image, secondaryAnchor, size - anchor.rows, size - anchor.cols);
paste(secondaryAnchor, size - anchor.rows, size - anchor.cols);

cv::Mat hg = getHorizontalGuide(dark);
paste(_image, hg, (size/2) - (hg.cols/2), 2);
paste(_image, hg, (size/2) - (hg.cols/2), size-4);
paste(_image, hg, (size/2) - (hg.cols/2) - hg.cols, size-4);
paste(_image, hg, (size/2) - (hg.cols/2) + hg.cols, size-4);
paste(hg, (size/2) - (hg.cols/2), 2);
paste(hg, (size/2) - (hg.cols/2), size-4);
paste(hg, (size/2) - (hg.cols/2) - hg.cols, size-4);
paste(hg, (size/2) - (hg.cols/2) + hg.cols, size-4);

cv::Mat vg = getVerticalGuide(dark);
paste(_image, vg, 2, (size/2) - (vg.rows/2));
paste(_image, vg, size-4, (size/2) - (vg.rows/2));
paste(vg, 2, (size/2) - (vg.rows/2));
paste(vg, size-4, (size/2) - (vg.rows/2));
}

void CimbWriter::paste(const cv::Mat& img, int x, int y)
{
img.copyTo(_image(cv::Rect(x+_offset, y+_offset, img.cols, img.rows)));
}

bool CimbWriter::write(unsigned bits)
Expand All @@ -77,7 +83,7 @@ bool CimbWriter::write(unsigned bits)

CellPositions::coordinate xy = _positions.next();
cv::Mat cell = _encoder.encode(bits);
paste(_image, cell, xy.first, xy.second);
paste(cell, xy.first, xy.second);
return true;
}
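
The centering rule in the new constructor can be isolated from OpenCV as a rough sketch. Here `image_size` stands in for `cimbar::Config::image_size()` (1024 according to the tests in this PR), and the three helper functions are hypothetical names, not part of `CimbWriter`.

```cpp
// Sketch of the canvas centering logic, without OpenCV.
// Assumption: image_size is the fixed barcode size (1024 in the tests).

// The canvas is never smaller than the barcode; a request of 0 (the default)
// or anything below image_size falls back to image_size.
constexpr unsigned canvas_size(unsigned requested, unsigned image_size)
{
	return requested > image_size ? requested : image_size;
}

// Equal margin on every side, so the barcode sits in the middle of the canvas.
constexpr unsigned canvas_offset(unsigned requested, unsigned image_size)
{
	return requested > image_size ? (requested - image_size) / 2 : 0;
}

// Where a paste at barcode-local coordinate `v` lands on the canvas.
constexpr unsigned to_canvas(unsigned v, unsigned offset)
{
	return v + offset;
}

static_assert(canvas_offset(1040, 1024) == 8, "testCustomSize: 8px margin");
static_assert(canvas_size(0, 1024) == 1024, "default keeps the 1024 canvas");
```

For the 1080 window used by `cimbar_send`, the same rule gives a 28 pixel margin on each side.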

6 changes: 5 additions & 1 deletion src/lib/cimb_translator/CimbWriter.h
Expand Up @@ -7,15 +7,19 @@
class CimbWriter
{
public:
CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark=true);
CimbWriter(unsigned symbol_bits, unsigned color_bits, bool dark=true, int size=0);

bool write(unsigned bits);
bool done() const;

cv::Mat image() const;

protected:
void paste(const cv::Mat& img, int x, int y);

protected:
cv::Mat _image;
CellPositions _positions;
CimbEncoder _encoder;
unsigned _offset = 0;
};
18 changes: 18 additions & 0 deletions src/lib/cimb_translator/test/CimbWriterTest.cpp
Expand Up @@ -20,5 +20,23 @@ TEST_CASE( "CimbWriterTest/testSimple", "[unit]" )
}

cv::Mat img = cw.image();
assertEquals(1024, img.cols);
assertEquals(1024, img.rows);
assertEquals( 0xeecc8800efce8c08, image_hash::average_hash(img) );
}

TEST_CASE( "CimbWriterTest/testCustomSize", "[unit]" )
{
CimbWriter cw(4, 2, true, 1040);

while (1)
{
if (!cw.write(0))
break;
}

cv::Mat img = cw.image();
assertEquals(1040, img.cols);
assertEquals(1040, img.rows);
assertEquals( 0xab00ab02af0abfab, image_hash::average_hash(img) );
}
12 changes: 6 additions & 6 deletions src/lib/encoder/Encoder.h
Expand Up @@ -15,8 +15,8 @@ class Encoder : public SimpleEncoder
using SimpleEncoder::SimpleEncoder;

unsigned encode(const std::string& filename, std::string output_prefix);
unsigned encode_fountain(const std::string& filename, std::string output_prefix, int compression_level=6, double redundancy=1.2);
unsigned encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level=6, double redundancy=4.0);
unsigned encode_fountain(const std::string& filename, std::string output_prefix, int compression_level=6, double redundancy=1.2, int canvas_size=0);
unsigned encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level=6, double redundancy=4.0, int canvas_size=0);
};

inline unsigned Encoder::encode(const std::string& filename, std::string output_prefix)
Expand All @@ -39,7 +39,7 @@ inline unsigned Encoder::encode(const std::string& filename, std::string output_
return i;
}

inline unsigned Encoder::encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level, double redundancy)
inline unsigned Encoder::encode_fountain(const std::string& filename, const std::function<bool(const cv::Mat&, unsigned)>& on_frame, int compression_level, double redundancy, int canvas_size)
{
std::ifstream infile(filename);
fountain_encoder_stream::ptr fes = create_fountain_encoder(infile, compression_level);
Expand All @@ -56,7 +56,7 @@ inline unsigned Encoder::encode_fountain(const std::string& filename, const std:
unsigned i = 0;
while (i < requiredFrames)
{
auto frame = encode_next(*fes);
auto frame = encode_next(*fes, canvas_size);
if (!frame)
break;

Expand All @@ -67,13 +67,13 @@ inline unsigned Encoder::encode_fountain(const std::string& filename, const std:
return i;
}

inline unsigned Encoder::encode_fountain(const std::string& filename, std::string output_prefix, int compression_level, double redundancy)
inline unsigned Encoder::encode_fountain(const std::string& filename, std::string output_prefix, int compression_level, double redundancy, int canvas_size)
{
std::function<bool(const cv::Mat&, unsigned)> fun = [output_prefix] (const cv::Mat& frame, unsigned i) {
std::string output = fmt::format("{}_{}.png", output_prefix, i);
cv::Mat bgr;
cv::cvtColor(frame, bgr, cv::COLOR_RGB2BGR);
return cv::imwrite(output, bgr);
};
return encode_fountain(filename, fun, compression_level, redundancy);
return encode_fountain(filename, fun, compression_level, redundancy, canvas_size);
}
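
The callback-driven shape of `encode_fountain` can be sketched in isolation. This is a simplified stand-in, not the library's code: `Frame` is an `int` where the real code passes a `cv::Mat`, `emit_frames` is a hypothetical name, and stopping early when the callback returns false is an assumption (the diff does not show that part of the loop).

```cpp
#include <cassert>
#include <functional>

// Simplified stand-in for the frame loop. The real callback receives a cv::Mat.
using Frame = int;
using FrameCallback = std::function<bool(const Frame&, unsigned)>;

// Emit up to `required_frames` frames and return how many were delivered.
// Assumption: a false return from the callback ends the loop early.
inline unsigned emit_frames(unsigned required_frames, const FrameCallback& on_frame)
{
	assert(static_cast<bool>(on_frame)); // an empty std::function would throw on call

	unsigned i = 0;
	while (i < required_frames)
	{
		Frame frame = static_cast<Frame>(i); // placeholder for encode_next()
		if (!on_frame(frame, i))
			break;
		++i;
	}
	return i;
}
```

The same loop can then serve both overloads: one callback writes each frame to `{prefix}_{i}.png`, another shows it in a window and returns false once the window is closed.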