Font subsetting features? #9

Closed
papandreou opened this issue Jul 1, 2019 · 52 comments

@papandreou
Contributor

Hey, I read that harfbuzz is gaining support for font subsetting and is working towards replacing pyftsubset from fonttools: https://github.com/harfbuzz/harfbuzz/projects/4

I'm not sure what the status of that effort is, but I'm drooling over the idea of using it in subfont without shelling out to python. Also, I'm really looking forward to the ability to do axis trimming of variable fonts some time in the future, so it would be great to try switching to harfbuzz.

I don't have any practical experience with this module or harfbuzz itself, but it looks like the subsetting features aren't exposed?

@ebraminio
Contributor

ebraminio commented Jul 2, 2019

Hey there! Great idea! I just added a build recipe in edb6f50. Here is the pure .wasm binary for hb-subset:
hb-subset.wasm.zip
It is a reduced build of HarfBuzz, so some functionality may not work as expected, but we can go further once we have a hello world working! :)

You may want to try porting the following C code to JavaScript, along the lines of https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html, but using the .wasm I've given you. It should work, but I will try it myself later.

#include <stdbool.h>
#include <stdio.h>
#include <hb.h>
#include <hb-subset.h>

int main (int argc, char **argv)
{
  if (argc < 3) { fprintf (stderr, "usage: %s input-font output-font\n", argv[0]); return 1; }

  /* Creating a face */
  /* Or, if the font is already in memory:
     hb_blob_t *blob = hb_blob_create (font_data, font_data_length, HB_MEMORY_MODE_READONLY, NULL, NULL); */
  hb_blob_t *blob = hb_blob_create_from_file (argv[1]);
  hb_face_t *face = hb_face_create (blob, 0 /* this is ttcIndex */);
  hb_blob_destroy (blob); /* the face keeps a reference to the blob, so you can destroy it here */

  /* Add your glyph indices here and subset */
  hb_set_t *glyphs = hb_set_create ();
  hb_set_add (glyphs, 0);
  hb_set_add (glyphs, 3);
  hb_subset_input_t *input = hb_subset_input_create_or_fail ();
  hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
  hb_set_union (input_glyphs, glyphs);
  hb_subset_input_set_drop_hints (input, true);
  //hb_subset_input_set_drop_layout (input, true);
  hb_face_t *subset = hb_subset (face, input);

  /* Clean up */
  hb_subset_input_destroy (input);
  hb_set_destroy (glyphs);
  hb_face_destroy (face);

  /* Get result blob (from the subset face, not the source blob destroyed above) */
  hb_blob_t *result = hb_face_reference_blob (subset);
  unsigned int length;
  const char *data = hb_blob_get_data (result, &length);

  /* Write. If you like! */
  FILE *f = fopen (argv[2], "wb");
  if (f)
  {
    fwrite (data, 1, length, f);
    fclose (f);
  }

  /* Clean up */
  hb_blob_destroy (result);
  hb_face_destroy (subset);
  return f ? 0 : 1;
}

@behdad
Member

behdad commented Jul 2, 2019

Note that the HarfBuzz subsetter is not a complete replacement for pyftsubset yet. It should be later this year.

@papandreou
Contributor Author

@ebraminio, thanks a lot for looking into this! It seems very promising.

I followed your instructions, and it does produce a functional font 🎉

const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);

(async () => {
    const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/subset/hb-subset.wasm'));
    exports.memory.grow(400); // each page is 64kb in size
    const fontBlob = await readFileAsync(__dirname + '/roboto-black.ttf');

    const heapu8 = new Uint8Array(exports.memory.buffer);
    const fontBuffer = exports.malloc(fontBlob.byteLength);
    heapu8.set(new Uint8Array(fontBlob), fontBuffer);

    /* Creating a face */
    const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
    const face = exports.hb_face_create(blob, 0);
    exports.hb_blob_destroy(blob);

    /* Add your glyph indices here and subset */
    // hb_set_t *glyphs = hb_set_create ();
    const glyphs = exports.hb_set_create();

    // hb_set_add (glyphs, 0);
    exports.hb_set_add(glyphs, 0);
    // hb_set_add (glyphs, 3);
    exports.hb_set_add(glyphs, 3);
    // hb_subset_input_t *input = hb_subset_input_create_or_fail ();
    const input = exports.hb_subset_input_create_or_fail();
    // hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
    const input_glyphs = exports.hb_subset_input_glyph_set(input);
    // hb_set_union (input_glyphs, glyphs);
    exports.hb_set_union(input_glyphs, glyphs);
    // hb_subset_input_set_drop_hints (input, true);
    exports.hb_subset_input_set_drop_hints(input, true);
    // //hb_subset_input_set_drop_layout (input, true);
    // exports.hb_subset_input_set_drop_layout(input, true);
    // hb_face_t *subset = hb_subset (face, input);
    const subset = exports.hb_subset(face, input);

    /* Clean up */
    exports.hb_subset_input_destroy(input);

    /* Get result blob */
    const result = exports.hb_face_reference_blob(subset);

    // unsigned int length;
    // const char *data = hb_blob_get_data (blob, &length);
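    const lengthPointer = exports.malloc(2); // Not sure this is the idiomatic way to do it :)
    // NB: `blob` was already destroyed above; the subset data belongs to `result`
    const data = exports.hb_blob_get_data(blob, lengthPointer);
    // NB: `+` binds tighter than `<<`, and `unsigned int` is 4 bytes, so this does not read the length correctly
    const length = heapu8[lengthPointer] + heapu8[lengthPointer + 1] << 8;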
    const subsetFontBlob = heapu8.slice(data, data + length);

    await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);

    /* Clean up */
    exports.hb_blob_destroy(result);
    exports.hb_face_destroy(subset);
})();

... Although the resulting file size is 51968 bytes, whereas the original roboto-black.ttf I used is 44828 bytes. Seems like I'm missing something :)
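For reference, a sketch of reading an unsigned int out-parameter (such as the length argument of hb_blob_get_data) back out of the module's memory. It assumes a wasm32 build, where unsigned int is 4 bytes and WebAssembly linear memory is little-endian; readUint32 is a hypothetical helper, not part of harfbuzzjs, and the commented usage lines assume the exports from the snippet above.

```javascript
// Hypothetical helper: read a 32-bit unsigned out-parameter (e.g. the
// `unsigned int *length` of hb_blob_get_data) from wasm32 linear memory.
function readUint32(memory, pointer) {
  // Build the view on demand: growing memory replaces memory.buffer, and
  // views created on the old buffer become unusable.
  const view = new DataView(memory.buffer);
  return view.getUint32(pointer, true); // WebAssembly memory is little-endian
}

// Usage against the exports from the snippet above (assumed, not tested here):
//   const lengthPointer = exports.malloc(4);
//   const data = exports.hb_blob_get_data(result, lengthPointer);
//   const length = readUint32(exports.memory, lengthPointer);
//   const subsetFontBlob = new Uint8Array(exports.memory.buffer).slice(data, data + length);

// Self-contained demo with a standalone memory instance:
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
new DataView(memory.buffer).setUint32(16, 51968, true);
console.log(readUint32(memory, 16)); // 51968
```

Creating the DataView inside the helper, rather than caching a Uint8Array once, avoids reading through a stale view if the module grows its memory during subsetting.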

@behdad
Member

behdad commented Jul 2, 2019

> ... Although the resulting file size is 51968 bytes, whereas the original roboto-black.ttf I used is 44828 bytes. Seems like I'm missing something :)

Weird. Can you try with HarfBuzz native (hb-subset command) and report a bug against upstream harfbuzz?

@ebraminio
Contributor

ebraminio commented Jul 3, 2019

> I followed your instructions, and it does produce a functional font 🎉

Cool!

I am uploading a new wasm build that exports hb_blob_get_length, so that hack is no longer needed: hb-subset.zip

const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);

(async () => {
    const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/hb-subset.wasm'));
    exports.memory.grow(400); // each page is 64kb in size
    const fontBlob = await readFileAsync(__dirname + '/Roboto-Black.ttf');

    const heapu8 = new Uint8Array(exports.memory.buffer);
    const fontBuffer = exports.malloc(fontBlob.byteLength);
    heapu8.set(new Uint8Array(fontBlob), fontBuffer);

    /* Creating a face */
    const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
    const face = exports.hb_face_create(blob, 0);
    exports.hb_blob_destroy(blob);

    /* Add your glyph indices here and subset */
    // hb_set_t *glyphs = hb_set_create ();
    const glyphs = exports.hb_set_create();

    // hb_set_add (glyphs, 0);
    exports.hb_set_add(glyphs, 0);
    // hb_set_add (glyphs, 3);
    exports.hb_set_add(glyphs, 3);
    // hb_subset_input_t *input = hb_subset_input_create_or_fail ();
    const input = exports.hb_subset_input_create_or_fail();
    // hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
    const input_glyphs = exports.hb_subset_input_glyph_set(input);
    // hb_set_union (input_glyphs, glyphs);
    exports.hb_set_union(input_glyphs, glyphs);
    // hb_subset_input_set_drop_hints (input, true);
    exports.hb_subset_input_set_drop_hints(input, true);
    // //hb_subset_input_set_drop_layout (input, true);
    // exports.hb_subset_input_set_drop_layout(input, true);
    // hb_face_t *subset = hb_subset (face, input);
    const subset = exports.hb_subset(face, input);

    /* Clean up */
    exports.hb_subset_input_destroy(input);

    /* Get result blob */
    const result = exports.hb_face_reference_blob(subset);

    const data = exports.hb_blob_get_data(result, 0);
    const subsetFontBlob = heapu8.slice(data, data + exports.hb_blob_get_length(result));

    await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);

    /* Clean up */
    exports.hb_blob_destroy(result);
    exports.hb_face_destroy(subset);
})();

I used https://github.com/google/fonts/blob/master/apache/roboto/Roboto-Black.ttf (167 KB), and the result is 2 KB :)

@ebraminio
Contributor

Here is a wasm build with the whole https://github.com/harfbuzz/harfbuzz/blob/master/src/hb-subset.h API: hb-subset.zip. Please note, however, that the hb-subset API is not considered stable.

@ebraminio
Contributor

Also, these are HB_TINY builds. I don't know what implications that has for hb-subset functionality, but I guess there are some.

@papandreou
Contributor Author

> Weird. Can you try with HarfBuzz native (hb-subset command) and report a bug against upstream harfbuzz?

@behdad, with hb-subset roboto-black.ttf -o roboto-black-subset.ttf foo on harfbuzz master, the resulting file is 2372 bytes, so there's nothing wrong with that 😌

When I use the latest hb-subset.wasm from @ebraminio and the updated JavaScript, roboto-black-subset.ttf comes out as 980 bytes 🎉. Must just have been my first attempt that was buggy.

I tried adding R and O to the subset (exports.hb_set_add(glyphs, 82); exports.hb_set_add(glyphs, 79);), but unfortunately Chrome doesn't seem to pick up the characters when I load an HTML page that references the font:

<!DOCTYPE html>
<html>
    <head>
        <style>
            @font-face {
                font-family: roboto;
                src: url(roboto-black.ttf);
            }
            @font-face {
                font-family: robotosubset;
                src: url(roboto-black-subset.ttf);
            }
        </style>
    </head>
    <body>
        <div style="font-family: roboto; font-size: 50px;">ROBOTO</div>
        <div style="font-family: robotosubset; font-size: 50px;">ROBOTO (subset)</div>
    </body>
</html>

It just renders the fallback font:

[Screenshot, 2019-07-04 at 00:49:29]

I'm guessing that's what will work later this year? :)

@ebraminio
Contributor

Can you check the native command's output too? There should be no difference between it and the wasm output. Also, do you want hb_shape in the wasm build? That adds about 170 KB, but it would let you replicate the hb-subset command. I think that, because this is an HB_TINY build, you should enable glyph ID retention with hb_subset_input_set_retain_gids; I should check this myself.

@behdad
Member

behdad commented Jul 5, 2019

No, it's not supposed to be broken like that. Please test with a natively built (non-JS) hb-subset and report back.

@ebraminio
Contributor

ebraminio commented Jul 5, 2019

Okay, here is a working version now! :)

const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);

(async () => {
    const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/hb-subset.wasm'));
    exports.memory.grow(400); // each page is 64kb in size
    const fontBlob = await readFileAsync(__dirname + '/Roboto-Black.ttf');

    const heapu8 = new Uint8Array(exports.memory.buffer);
    const fontBuffer = exports.malloc(fontBlob.byteLength);
    heapu8.set(new Uint8Array(fontBlob), fontBuffer);

    /* Creating a face */
    const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
    const face = exports.hb_face_create(blob, 0);
    exports.hb_blob_destroy(blob);

    /* Add your glyph indices here and subset */
    const glyphs = exports.hb_set_create();

    exports.hb_set_add(glyphs, 'a'.charCodeAt(0));
    exports.hb_set_add(glyphs, 'b'.charCodeAt(0));
    exports.hb_set_add(glyphs, 'c'.charCodeAt(0));

    const input = exports.hb_subset_input_create_or_fail();
    const input_glyphs = exports.hb_subset_input_unicode_set(input);
    exports.hb_set_union(input_glyphs, glyphs);
    exports.hb_subset_input_set_drop_hints(input, true);
    const subset = exports.hb_subset(face, input);

    /* Clean up */
    exports.hb_subset_input_destroy(input);

    /* Get result blob */
    const result = exports.hb_face_reference_blob(subset);

    const data = exports.hb_blob_get_data(result, 0);
    const subsetFontBlob = heapu8.slice(data, data + exports.hb_blob_get_length(result));

    await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);

    /* Clean up */
    exports.hb_blob_destroy(result);
    exports.hb_face_destroy(subset);
})();

The wasm file: hb-subset.wasm.zip
Why is the wasm file bigger? Because I am bringing back CFF subsetting support and GSUB/GPOS subsetting (https://github.com/harfbuzz/harfbuzzjs/blob/master/subset/config-override.h), which has its own size cost.
Input: https://github.com/google/fonts/blob/master/apache/roboto/Roboto-Black.ttf
Output: roboto-black-subset.ttf.zip

[Screenshot]

And I can say this is what we expect :)
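The key change in this version is that it fills hb_subset_input_unicode_set with character codes, whereas the earlier snippets added values to hb_subset_input_glyph_set, which holds glyph IDs. Glyph IDs are font-specific, which is likely why adding 82 and 79 there did not keep "R" and "O". A minimal sketch of the working step as a reusable function, assuming the same raw wasm exports as above; addTextToUnicodeSet and the fake exports object are hypothetical.

```javascript
// Hypothetical helper: keep the characters of `text` by adding their Unicode
// code points to the subset input's Unicode set, as the working snippet does.
function addTextToUnicodeSet(exports, input, text) {
  const unicodes = exports.hb_subset_input_unicode_set(input);
  // for...of iterates by code point, so characters outside the BMP, which take
  // two UTF-16 code units, become one entry rather than two surrogate halves.
  for (const char of text) {
    exports.hb_set_add(unicodes, char.codePointAt(0));
  }
}

// Demo with a stand-in for the wasm exports that just records the calls:
const added = [];
const fakeExports = {
  hb_subset_input_unicode_set: () => 42, // pretend set handle
  hb_set_add: (set, codePoint) => added.push(codePoint),
};
addTextToUnicodeSet(fakeExports, 1, 'RO\u{1F600}');
console.log(added); // [ 82, 79, 128512 ]
```

Using codePointAt with for...of rather than charCodeAt in an index loop matters only for characters above U+FFFF, but it costs nothing for the common case.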

@papandreou
Contributor Author

The font output by the natively built hb-subset works fine and is 2428 bytes when I include only R and O.

With the latest code snippet and wasm file from @ebraminio, I get a file that also works and is only 1212 bytes 🎉

They produce seemingly identical renderings in Chrome on OSX:

[Screenshot, 2019-07-07 at 10:35:50]

I don't know how to take TTFs apart, but as far as https://fontdrop.info/ can tell, the two font files are identical except for some differences in the maxp table, where the wasm one has more zero values:

subset from hb-subset

subset from wasm

Diffing the output of the ttx tool on the two fonts shows that the fpgm, prep, and cvt tables are also missing from the wasm output, as are some instructions in the glyf table. I guess that accounts for the size difference: https://gist.github.com/papandreou/cc1f32ee847c055f47069ea5e8dd9554

The question is then: do I need those things when I'm targeting a web browser, or should I consider that a feature? 🤔

@ebraminio
Contributor

Remove the exports.hb_subset_input_set_drop_hints call and see what happens! :) (That is the difference, AFAIK.)

@papandreou
Contributor Author

@ebraminio, yes, that's exactly it! Then the only difference between the two files (both 2428 bytes) is:

--- roboto-black-subset-with-hbsubset.ttx	2019-07-07 11:54:12.000000000 +0200
+++ roboto-black-subset-with-wasm.ttx	2019-07-07 11:54:05.000000000 +0200
@@ -12,7 +12,7 @@
     <!-- Most of this table will be recalculated by the compiler -->
     <tableVersion value="1.0"/>
     <fontRevision value="1.0"/>
-    <checkSumAdjustment value="0x97ec4f24"/>
+    <checkSumAdjustment value="0x6f737206"/>
     <magicNumber value="0x5f0f3cf5"/>
     <flags value="00000000 00011111"/>
     <unitsPerEm value="2048"/>

... Not exactly sure what that means, but probably no biggie 😅

@papandreou
Contributor Author

This all looks really promising, thank you so much. I'm thinking of doing a spike in subfont where I introduce an experimental flag that does the subsetting using this wasm build + some woff/woff2 encoder instead of pyftsubset.

Given that @behdad said that

> HarfBuzz subsetter is not a complete replacement for pyftsubset yet

... can you think of any reason why it wouldn't be complete enough for the web use case?

We presently use a command line of:

pyftsubset --output-file=... --obfuscate_names --flavor=woff2 --text=...

I've noted that you said the hb-subset API isn't stable, and I'm willing to adapt to changes along the way ;)

@papandreou
Contributor Author

papandreou commented Jul 7, 2019

I guess I'll need to find a way to preserve the ligatures that are utilized by the text on the web page.

Edit: Looks like hb-subset removes ligatures, so this is probably one of the things that aren't supported yet?

@ebraminio
Contributor

> @ebraminio, yes, that's exactly it! Then the only difference between the two files (both 2428 bytes) is:

[Screenshot]

They are identical in my case. Do you get different hashes? A difference could mean something bad has happened, so let us know.

@papandreou
Contributor Author

papandreou commented Jul 8, 2019

Yeah, I get different hashes when I subset the roboto-black.ttf I have on my machine:

$ openssl dgst -sha256 *.ttf
SHA256(roboto-black-subset-wasm.ttf)= f8000ccef392c0a8bd4104dd7a3089e14d37cec8ac78d7ba6cb123401e1483c6
SHA256(roboto-black-subset-native.ttf)= 11adff663b3e95d10b04312ff65e2292e2b5340814022b3b8bbde578ae0633c8

(still only that checkSumAdjustment thing)

If I download the Roboto that you have been using (https://github.com/google/fonts/blob/master/apache/roboto/Roboto-Black.ttf) and repeat the experiment (abc without dropping hints), I get identical files. Both hashes match the sum that you got:

$ openssl dgst -sha256 *.ttf
SHA256(roboto-black-subset-native.ttf)= 54e1f60ddd2309051ba9d72b9e815f8d26f5b82f29662aec5b9afa9879eba25f
SHA256(roboto-black-subset-wasm.ttf)= 54e1f60ddd2309051ba9d72b9e815f8d26f5b82f29662aec5b9afa9879eba25f

So the difference is triggered by the file that I was testing with from the start. Here is a copy of it: roboto-black.zip

@ebraminio
Contributor

ebraminio commented Jul 8, 2019

Reproduced it locally, something fishy is going on

@ebraminio
Contributor

ebraminio commented Jul 8, 2019

> Reproduced it locally, something fishy is going

harfbuzz/harfbuzz#1823 is related; let's see what happens there before proceeding. Such differences are very important and should be investigated.

> ... can you think of any reason why it wouldn't be complete enough for the web use case?

> I guess I'll need to find a way to preserve the ligatures that are utilized by the text on the web page.

> Edit: Looks like hb-subset removes ligatures, so this is probably one of the things that aren't supported yet?

I'd like to know the answers to these too, but as I'm not that familiar with the subsetter yet, I can't help much. Maybe Behdad can help.

@ebraminio
Contributor

ebraminio commented Jul 10, 2019

Hey @papandreou, your report led to finding real issues in the HarfBuzz subset module and its tests, which are now fixed. Feel free to test the new binary and report any differences you see.
Built with -Oz (optimized for size): hb-subset.wasm.zip
Built with -O3 (optimized for speed; now the default build, since the browser isn't the main use case for this .wasm module): hb-subset.wasm.zip

~/harfbuzzjs/subset $ ./build.sh && node test.js && clang test.cc -o test -I ../harfbuzz/src/ -fno-rtti -fno-exceptions -lm && ./test && sha512sum roboto-black-subset-c.ttf roboto-black-subset-js.ttf
Already up to date.
6ebf7ad534667060b514a1af750fefe08944737b6967155204c2fa4c650328750c605d2680fa8c63531632349f6f25b02e43ea211ea3febacf44d5e22b92a93d  roboto-black-subset-c.ttf
6ebf7ad534667060b514a1af750fefe08944737b6967155204c2fa4c650328750c605d2680fa8c63531632349f6f25b02e43ea211ea3febacf44d5e22b92a93d  roboto-black-subset-js.ttf

@papandreou
Contributor Author

@ebraminio, happy to help! I can confirm that both new wasm builds resolve the issue with the different checksum adjustments. Now I also get the 6ebf7ad... sha512 for both files.

@papandreou
Contributor Author

@behdad, I looked a bit into the harfbuzz docs and source code, but couldn't figure out how to learn which ligatures are present in a font, or how to add them to a subset. Can you confirm that it's not implemented yet, or offer a few pointers? 🙏

@behdad
Member

behdad commented Jul 15, 2019

> @behdad, I looked a bit into the harfbuzz docs and source code, but couldn't figure out how to learn which ligatures are present in a font, or how to add them to a subset. Can you confirm that it's not implemented yet, or offer a few pointers?

You won't need to do anything. When it's implemented, it should automatically include ligatures and other relevant features in the subset, based on the Unicode characters you requested.

Check back in a month. It should be in better shape then, and ligatures should definitely be done by then.

@ebraminio
Contributor

Great! So let's close this and continue the discussion of HarfBuzz feature requests in the harfbuzz repository itself, but keep posting wasm builds here. Thanks :)

@ebraminio
Contributor

ebraminio commented Sep 1, 2019

Nice! Apparently Behdad is still working on finishing GSUB/GPOS support in the subsetter. I am also working on related aspects of this in the harfbuzz repo itself.

Just about the licensing: HarfBuzz and this project itself are released under MIT, but we use a few minor pieces from the Intel Zephyr libc that are under Apache 2.0 and BSD (the latter further down in the file) here: https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/zephyr-string.c. Maybe we should replace these small pieces with consistently licensed ones, preferably MIT. Either way, be careful to mention them in a release.

@papandreou
Contributor Author

That sounds great! I'm trying to follow along and I'm very impressed with the progress that you're making 👍

Wrt. the licensing I don't mind a mix of MIT, Apache 2.0 and BSD, but from the perspective of the library you're probably right. The devil's always in the details. Either way I'll just await your progress and will then report back. Once we have those basic examples rendering the same way before and after subsetting with harfbuzz, I can also enhance the test suite and maybe help finding more discrepancies.

@ebraminio
Contributor

ebraminio commented Sep 1, 2019

> I can also enhance the test suite and maybe help finding more discrepancies.

That sounds just great :)

Here is a new build: hb-subset.wasm.zip. Note that I am optimizing these builds for speed, but I can optimize for size if you like. Feel free to check whether it works as expected :) I will upload an online demo for it too.

papandreou added a commit to Munter/subfont that referenced this issue Sep 1, 2019
@papandreou
Contributor Author

Thanks for the new build! The screenshot-based tests show the same difference as before, but I guess that's still expected :)

For now my use case is purely "server side", so it's not hugely important for me whether the wasm is optimized for speed or size at this stage. Maybe a slight preference for speed :)

@ebraminio
Contributor

Online demo: https://harfbuzz.github.io/harfbuzzjs/subset/

@papandreou
Contributor Author

@ebraminio, just checking in to see how it's going with ligature/GSUB/GPOS support. I saw that a lot of commits have landed in harfbuzz related to that, but I don't really understand what the status is :)

I built hb-subset.wasm and linked it into my harfbuzz subfont branch just now, but I still see the same rendering differences as shown in the screenshots on: Munter/subfont#56

I'm still using your instructions in the function that generates the subset: https://github.com/Munter/subfont/blob/4a3bf697c7e7ebdd17515cc0620fff2ebf8d5b22/lib/subsetLocalFontWithHarfbuzz.js#L30-L71

Maybe I need some additional commands to include those other tables?

@ebraminio
Contributor

ebraminio commented Feb 9, 2020 via email

@papandreou
Contributor Author

@ebraminio, hi again! I'm continuing to maintain my harfbuzz branch in subfont, but I still get the same differences when comparing against the rendering of the web page before subsetting:

[Screenshots: rendering diffs for three test pages]

Is it still the same state regarding the subsetting of GSUB and GPOS, or do I need to change something to take advantage of it? The code that uses hb-subset.wasm is still pretty much the same as what you originally provided: https://github.com/Munter/subfont/blob/95d90850f04d2064b01ae62894daec1834e4d640/lib/subsetLocalFontWithHarfbuzz.js#L30-L71

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

The situation with GPOS, and especially GSUB, hasn't improved much since last time, unfortunately. But it's still a bit unclear what you are comparing: are you comparing the hb-subset command-line result against the hb-subset.wasm result, or a pyftsubset result against hb-subset? The best way to move things forward here is to compare hb-subset (the command-line tool that ships with the package) against pyftsubset using fontTools' ttx. If there is any difference between hb-subset.wasm and hb-subset, that sounds like a serious issue. Anyway, I am uploading a new hb-subset.wasm for you: hb-subset.zip. But it would be best if you compiled HarfBuzz yourself and filed bugs against harfbuzz for missing things (using a ttx comparison); I can help you with that if you like.

@papandreou
Contributor Author

@ebraminio, thanks for the update!

> But it's still a bit unclear what you are comparing: are you comparing the hb-subset command-line result against the hb-subset.wasm result, or a pyftsubset result against hb-subset?

These tests are comparing the rendering of some small webpages in headless Chrome before and after applying subsetting. So a rendering with the original fonts vs. the ones created with various configurations and subsetting tools. So the magenta parts of the above images show that subsetting the font using hb-subset.wasm results in the webpage rendering differently.

The same test cases are run with pyftsubset where they pass. Since the "before" screenshot is the same whether hb-subset.wasm or pyftsubset is used, the diffs effectively also serve as a direct comparison of what comes out of the two subsetting tools.

> The best way to move things forward here is to compare hb-subset (the command-line tool that ships with the package) against pyftsubset using fontTools' ttx. If there is any difference between hb-subset.wasm and hb-subset, that sounds like a serious issue.

Okay, thanks! I'll try that!

> Anyway, I am uploading a new hb-subset.wasm for you: hb-subset.zip. But it would be best if you compiled HarfBuzz yourself and filed bugs against harfbuzz for missing things (using a ttx comparison); I can help you with that if you like.

Thanks! That build results in the same diffs. I'm already compiling it myself using /subset/build.sh and harfbuzz master, so that was to be expected.

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

> Okay, thanks! I'll try that!
> I'm already compiling it myself using /subset/build.sh and harfbuzz master, so that was to be expected.

Oh, so you can easily build HarfBuzz itself too: git clone https://github.com/harfbuzz/harfbuzz && cd harfbuzz && meson build && ninja -Cbuild && meson test -Cbuild --suite subset --no-suite slow (or run the whole test suite with ninja -Cbuild test), then use build/util/hb-subset

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

As for why I'd like you to interact with the HarfBuzz C++ project itself: I believe that if you give feedback on exactly what is missing (by comparing the ttx output of pyftsubset against the hb-subset command line, not the wasm build, which should be identical anyway) on the main project, by filing bugs there, that will speed up development there, or at least let outsiders know which things are missing so they can contribute too.

@papandreou
Contributor Author

@ebraminio,

> if there is something different between hb-subset.wasm and hb-subset, that sounds like a serious issue

I tried rewiring the harfbuzz subsetting code to shell out to the hb-subset binary instead, and interestingly that improves the situation compared to using the wasm build. Now the "Hello, world!" case from above passes, and the one with the Chinese characters comes closer:

[Screenshot: rendering diff]

The "Waffle stuffings" case with ligatures is still off.

... But those differences do seem to indicate that there's something wrong in harfbuzzjs or the way that I use it

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

Oh, that definitely looks bad. Can you help me reproduce it like the previous case? Some ttx / binary difference analysis would be awesome too.

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

> or the way that I use it

The hb-subset command doesn't do the equivalent of exports.hb_subset_input_set_drop_hints(input, true); (you can probably drop that call anyway). I guess that explains at least part of the difference.

@papandreou
Contributor Author

> Oh, that definitely looks bad. Can you help me reproduce it like the previous case? Some ttx / binary difference analysis would be awesome too.

Source font: Roboto-400.zip

subsetWithWasm.js

/* global WebAssembly */
const { readFile, writeFile } = require('fs').promises;
const _ = require('lodash');

const loadAndInitializeHarfbuzz = _.once(async () => {
  const {
    instance: { exports },
  } = await WebAssembly.instantiate(
    await readFile(require.resolve('harfbuzzjs/subset/hb-subset.wasm'))
  );
  exports.memory.grow(400); // each page is 64kb in size

  const heapu8 = new Uint8Array(exports.memory.buffer);
  return [exports, heapu8];
});

(async () => {
  const originalFont = await readFile('Roboto-400.ttf');
  const text = 'Hello, world!';
  const [exports, heapu8] = await loadAndInitializeHarfbuzz();

  const fontBuffer = exports.malloc(originalFont.byteLength);
  heapu8.set(new Uint8Array(originalFont), fontBuffer);

  // Create the face
  const blob = exports.hb_blob_create(
    fontBuffer,
    originalFont.byteLength,
    2, // HB_MEMORY_MODE_WRITABLE
    0,
    0
  );
  const face = exports.hb_face_create(blob, 0);
  exports.hb_blob_destroy(blob);

  // Add the text's Unicode code points and subset
  const glyphs = exports.hb_set_create();

  for (const char of text) {
    exports.hb_set_add(glyphs, char.codePointAt(0));
  }

  const input = exports.hb_subset_input_create_or_fail();
  const inputGlyphs = exports.hb_subset_input_unicode_set(input);
  exports.hb_set_union(inputGlyphs, glyphs);
  exports.hb_subset_input_set_drop_hints(input, true);
  const subset = exports.hb_subset(face, input);

  // Clean up
  exports.hb_subset_input_destroy(input);

  // Get result blob
  const result = exports.hb_face_reference_blob(subset);

  const offset = exports.hb_blob_get_data(result, 0);
  const subsetFontBlob = heapu8.slice(
    offset,
    offset + exports.hb_blob_get_length(result)
  );

  // Clean up
  exports.hb_blob_destroy(result);
  exports.hb_face_destroy(subset);

  await writeFile('roboto-400-wasm.ttf', subsetFontBlob);
})();
../harfbuzz/build/util/hb-subset -o roboto-400-hb-subset.ttf Roboto-400.ttf 'Hello, world'
node subsetWithWasm.js
diff -U1000 <(ttx -o - roboto-400-hb-subset.ttf) <(ttx -o - roboto-400-wasm.ttf) > hb-subset_vs_wasm_ttx.diff
ls -la roboto-400* | cut -d" " -f 5,9
3372 roboto-400-hb-subset.ttf
1660 roboto-400-wasm.ttf

ttx diff

Seems like <hdmx>, <fpgm>, <prep>, <cvt> and some <instruction> elements from <glyf> are missing in the wasm-generated version. I suspect I'm missing some commands in subsetWithWasm.js?

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

> I suspect I'm missing some commands in subsetWithWasm.js?

Yes, just remove exports.hb_subset_input_set_drop_hints(input, true); from your JS code :)

@papandreou
Contributor Author

Ah, right, if I remove exports.hb_subset_input_set_drop_hints(input, true);, the files come out identical. I thought I had already tried that :)

@papandreou
Contributor Author

Ported that change back into the subfont code, which achieved the same results 😌

Thanks again! I'll direct my future inquiries at the main harfbuzz project when possible.

@ebraminio
Contributor

ebraminio commented Jul 17, 2020

> Thanks again! I'll direct my future inquiries at the main harfbuzz project when possible.

Eagerly looking forward to it, thanks :)

@yisibl
Contributor

yisibl commented Aug 13, 2020

@ebraminio, if we want to support a preview on the subset demo page, do we need to add hb_font_create to the build file? Can you add a live demo on the demo page?

@ebraminio
Contributor

@yisibl, I guess this helps you: https://harfbuzz.github.io/harfbuzzjs/subset/ (you shouldn't need hb_font_create)

@yisibl
Contributor

yisibl commented Aug 13, 2020

If we add the SVG preview shown in the picture below to the subset demo[1], don't we need to call hb_font_create[2]?

[Screenshot]

@ebraminio
Contributor

ebraminio commented Aug 13, 2020

Correct. That falls more into the context of shaping than subsetting. You can take the binary from there and use the browser's font renderer, or turn it into SVG with the hbjs wrapper, or do without the wrapper: https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html

@papandreou
Contributor Author

I ended up "liberating" the code we came up with here as a separate module: https://github.com/papandreou/subset-font
