Font subsetting features? #9
Comments
Hey there! Great idea! I just added a build recipe in edb6f50. Here is the pure .wasm binary for hb-subset. You may want to port the C code below to JavaScript, along the lines of https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html but using the .wasm I've given you. It should work, but I will try it myself later.
/* Creating a face */
hb_blob_t *blob = hb_blob_create (font_data, font_data_length, HB_MEMORY_MODE_READONLY, nullptr, nullptr);
/* Or if you like to read an actual file: hb_blob_t *blob = hb_blob_create_from_file (path); */
hb_face_t *face = hb_face_create (blob, 0/*this is ttcIndex*/);
hb_blob_destroy (blob); /* face keeps a reference to it, so you can destroy it here */
/* Add your glyph indices here and subset */
hb_set_t *glyphs = hb_set_create ();
hb_set_add (glyphs, 0);
hb_set_add (glyphs, 3);
hb_subset_input_t *input = hb_subset_input_create_or_fail ();
hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
hb_set_union (input_glyphs, glyphs);
hb_subset_input_set_drop_hints (input, true);
//hb_subset_input_set_drop_layout (input, true);
hb_face_t *subset = hb_subset (face, input);
/* Clean up */
hb_subset_input_destroy (input);
/* Get result blob */
hb_blob_t *result = hb_face_reference_blob (subset);
unsigned int length;
const char *data = hb_blob_get_data (result, &length); /* read from the result blob; the input blob was destroyed above */
/* Write. If you like! */
FILE *f = fopen (output_path, "wb");
fwrite (data, 1, length, f);
fclose (f);
/* Clean up */
hb_blob_destroy (result);
hb_face_destroy (subset); |
Note that the HarfBuzz subsetter is not yet a complete replacement for pyftsubset. It should be later this year. |
@ebraminio, thanks a lot for looking into this! It seems very promising. I followed your instructions, and it does produce a functional font 🎉
const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);
(async () => {
const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/subset/hb-subset.wasm'));
exports.memory.grow(400); // each page is 64kb in size
const fontBlob = await readFileAsync(__dirname + '/roboto-black.ttf');
const heapu8 = new Uint8Array(exports.memory.buffer);
const fontBuffer = exports.malloc(fontBlob.byteLength);
heapu8.set(new Uint8Array(fontBlob), fontBuffer);
/* Creating a face */
const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
const face = exports.hb_face_create(blob, 0);
exports.hb_blob_destroy(blob);
/* Add your glyph indices here and subset */
// hb_set_t *glyphs = hb_set_create ();
const glyphs = exports.hb_set_create();
// hb_set_add (glyphs, 0);
exports.hb_set_add(glyphs, 0);
// hb_set_add (glyphs, 3);
exports.hb_set_add(glyphs, 3);
// hb_subset_input_t *input = hb_subset_input_create_or_fail ();
const input = exports.hb_subset_input_create_or_fail();
// hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
const input_glyphs = exports.hb_subset_input_glyph_set(input);
// hb_set_union (input_glyphs, glyphs);
exports.hb_set_union(input_glyphs, glyphs);
// hb_subset_input_set_drop_hints (input, true);
exports.hb_subset_input_set_drop_hints(input, true);
// //hb_subset_input_set_drop_layout (input, true);
// exports.hb_subset_input_set_drop_layout(input, true);
// hb_face_t *subset = hb_subset (face, input);
const subset = exports.hb_subset(face, input);
/* Clean up */
exports.hb_subset_input_destroy(input);
/* Get result blob */
const result = exports.hb_face_reference_blob(subset);
// unsigned int length;
// const char *data = hb_blob_get_data (blob, &length);
const lengthPointer = exports.malloc(2); // Not sure this is the idiomatic way to do it :)
const data = exports.hb_blob_get_data(blob, lengthPointer);
const length = heapu8[lengthPointer] + heapu8[lengthPointer + 1] << 8;
const subsetFontBlob = heapu8.slice(data, data + length);
await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);
/* Clean up */
exports.hb_blob_destroy(result);
exports.hb_face_destroy(subset);
})();
... Although the resulting file size is 51968 bytes, whereas the original |
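A side note on the length read in the snippet above: in a wasm32 build `unsigned int` is 4 bytes, so a 2-byte allocation is too small for the out-parameter, and in JavaScript `+` binds tighter than `<<`, so `a + b << 8` evaluates as `(a + b) << 8`. The snippet also passes `blob` rather than `result` to `hb_blob_get_data`. Any of these could plausibly contribute to an odd size. Below is a minimal sketch of reading a little-endian 32-bit value out of linear memory with a `DataView`; the `readUint32` helper and the standalone `WebAssembly.Memory` are illustrative stand-ins, not part of harfbuzzjs.

```javascript
// Stand-in for exports.memory; a real module would provide its own.
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: read an unsigned 32-bit integer at a byte offset.
// WebAssembly linear memory is little-endian, hence the `true` flag.
function readUint32(mem, pointer) {
  return new DataView(mem.buffer).getUint32(pointer, true);
}

// Simulate C code writing `unsigned int length = 2428` at address 16.
const lengthPointer = 16;
new DataView(memory.buffer).setUint32(lengthPointer, 2428, true);

// The expression from the snippet above: parsed as (low + high) << 8.
const heapu8 = new Uint8Array(memory.buffer);
const buggy = heapu8[lengthPointer] + heapu8[lengthPointer + 1] << 8;

// Reading all four bytes with the right byte order.
const correct = readUint32(memory, lengthPointer);

console.log({ buggy, correct });
```

With a real module, the pointer would come from `exports.malloc(4)` and be filled in by `hb_blob_get_data`; calling `hb_blob_get_length` sidesteps the out-parameter entirely.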
Weird. Can you try with HarfBuzz native (hb-subset command) and report a bug against upstream harfbuzz? |
Cool! I am uploading a new wasm which has
const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);
(async () => {
const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/hb-subset.wasm'));
exports.memory.grow(400); // each page is 64kb in size
const fontBlob = await readFileAsync(__dirname + '/Roboto-Black.ttf');
const heapu8 = new Uint8Array(exports.memory.buffer);
const fontBuffer = exports.malloc(fontBlob.byteLength);
heapu8.set(new Uint8Array(fontBlob), fontBuffer);
/* Creating a face */
const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
const face = exports.hb_face_create(blob, 0);
exports.hb_blob_destroy(blob);
/* Add your glyph indices here and subset */
// hb_set_t *glyphs = hb_set_create ();
const glyphs = exports.hb_set_create();
// hb_set_add (glyphs, 0);
exports.hb_set_add(glyphs, 0);
// hb_set_add (glyphs, 3);
exports.hb_set_add(glyphs, 3);
// hb_subset_input_t *input = hb_subset_input_create_or_fail ();
const input = exports.hb_subset_input_create_or_fail();
// hb_set_t *input_glyphs = hb_subset_input_glyph_set (input);
const input_glyphs = exports.hb_subset_input_glyph_set(input);
// hb_set_union (input_glyphs, glyphs);
exports.hb_set_union(input_glyphs, glyphs);
// hb_subset_input_set_drop_hints (input, true);
exports.hb_subset_input_set_drop_hints(input, true);
// //hb_subset_input_set_drop_layout (input, true);
// exports.hb_subset_input_set_drop_layout(input, true);
// hb_face_t *subset = hb_subset (face, input);
const subset = exports.hb_subset(face, input);
/* Clean up */
exports.hb_subset_input_destroy(input);
/* Get result blob */
const result = exports.hb_face_reference_blob(subset);
const data = exports.hb_blob_get_data(result, 0);
const subsetFontBlob = heapu8.slice(data, data + exports.hb_blob_get_length(result));
await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);
/* Clean up */
exports.hb_blob_destroy(result);
exports.hb_face_destroy(subset);
})();
I used https://github.com/google/fonts/blob/master/apache/roboto/Roboto-Black.ttf (167 KB) and the result is 2 KB :) |
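One thing to watch for with the `heapu8` view used above: growing a WebAssembly memory detaches the previous `ArrayBuffer`, so a `Uint8Array` created before a `grow` no longer sees the heap. Whether `malloc` in a given build grows memory on demand is an assumption here, not something the thread states. A small self-contained sketch using a standalone `WebAssembly.Memory` in place of `exports.memory`:

```javascript
const PAGE_SIZE = 64 * 1024; // one WebAssembly page is 64 KiB

// Stand-in for exports.memory, starting with a single page.
const memory = new WebAssembly.Memory({ initial: 1 });

// A view created before growing...
const staleView = new Uint8Array(memory.buffer);

// ...is detached once the memory grows by one more page.
memory.grow(1);

// A view created afterwards covers the whole, larger heap.
const freshView = new Uint8Array(memory.buffer);

const staleLength = staleView.length;
const freshLength = freshView.length;
console.log({ staleLength, freshLength, pages: freshLength / PAGE_SIZE });
```

Creating the view after the last call that may grow memory, or re-creating it just before each read, avoids working with a detached buffer.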
Here is a wasm version with all of the https://github.com/harfbuzz/harfbuzz/blob/master/src/hb-subset.h API: hb-subset.zip. Please note, however, that the hb-subset API is not considered stable. |
Also, the builds here are HB_TINY builds. I don't know what implications that has for hb-subset functionality, but I guess there are some. |
@behdad, with When I use the latest I tried adding
<!DOCTYPE html>
<html>
<head>
<style>
@font-face {
font-family: roboto;
src: url(roboto-black.ttf);
}
@font-face {
font-family: robotosubset;
src: url(roboto-black-subset.ttf);
}
</style>
</head>
<body>
<div style="font-family: roboto; font-size: 50px;">ROBOTO</div>
<div style="font-family: robotosubset; font-size: 50px;">ROBOTO (subset)</div>
</body>
</html>
It just renders the fallback font: I'm guessing that's what will work later this year? :) |
Can you also check the output of the native command? There should be no difference between it and the wasm output. Also, do you want hb_shape in the wasm build? That adds some 170 KB in size but lets you replicate the hb-subset command. I think that, because this is an HB_TINY build, you should enable retain-gids via hb_subset_input_set_retain_gids; I should check this myself. |
No it's not supposed to be broken like that. Please test with non-JS natively built hb-subset and report. |
Okay, here is a working version now! :)
const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const writeFileAsync = require('util').promisify(fs.writeFile);
(async () => {
const { instance: { exports } } = await WebAssembly.instantiate(await readFileAsync(__dirname + '/hb-subset.wasm'));
exports.memory.grow(400); // each page is 64kb in size
const fontBlob = await readFileAsync(__dirname + '/Roboto-Black.ttf');
const heapu8 = new Uint8Array(exports.memory.buffer);
const fontBuffer = exports.malloc(fontBlob.byteLength);
heapu8.set(new Uint8Array(fontBlob), fontBuffer);
/* Creating a face */
const blob = exports.hb_blob_create(fontBuffer, fontBlob.byteLength, 2/*HB_MEMORY_MODE_WRITABLE*/, 0, 0);
const face = exports.hb_face_create(blob, 0);
exports.hb_blob_destroy(blob);
/* Add your glyph indices here and subset */
const glyphs = exports.hb_set_create();
exports.hb_set_add(glyphs, 'a'.charCodeAt(0));
exports.hb_set_add(glyphs, 'b'.charCodeAt(0));
exports.hb_set_add(glyphs, 'c'.charCodeAt(0));
const input = exports.hb_subset_input_create_or_fail();
const input_glyphs = exports.hb_subset_input_unicode_set(input);
exports.hb_set_union(input_glyphs, glyphs);
exports.hb_subset_input_set_drop_hints(input, true);
const subset = exports.hb_subset(face, input);
/* Clean up */
exports.hb_subset_input_destroy(input);
/* Get result blob */
const result = exports.hb_face_reference_blob(subset);
const data = exports.hb_blob_get_data(result, 0);
const subsetFontBlob = heapu8.slice(data, data + exports.hb_blob_get_length(result));
await writeFileAsync(__dirname + '/roboto-black-subset.ttf', subsetFontBlob);
/* Clean up */
exports.hb_blob_destroy(result);
exports.hb_face_destroy(subset);
})();
The wasm file: hb-subset.wasm.zip
And I can say this is the expected result :) |
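The three `hb_set_add` calls above add Unicode code points one at a time. For arbitrary text, a sketch like the following collects the unique code points first. `uniqueCodePoints` is a hypothetical helper; it iterates the string and uses `codePointAt` rather than `charCodeAt`, so that characters outside the Basic Multilingual Plane are not split into surrogate halves.

```javascript
// Hypothetical helper: sorted, de-duplicated Unicode code points of a string.
function uniqueCodePoints(text) {
  const seen = new Set();
  // for...of walks the string by code point, not by UTF-16 code unit.
  for (const ch of text) {
    seen.add(ch.codePointAt(0));
  }
  return [...seen].sort((a, b) => a - b);
}

const codePoints = uniqueCodePoints('abca😀');
console.log(
  codePoints.map((cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'))
);
```

Each value could then be passed to `exports.hb_set_add(set, codePoint)` in place of the hard-coded characters.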
The font output by the natively built With the latest code snippet and wasm file from @ebraminio I also get a functional file of only 1212 bytes 🎉 They produce seemingly identical renderings in Chrome on OSX: I don't know how to take ttfs apart, but as far as https://fontdrop.info/ can tell the two font files are identical except some differences in the Diffing the output of the The question is then, do I need those things when I'm targeting a web browser, or should I consider that a feature? 🤔 |
Remove the exports.hb_subset_input_set_drop_hints call and see what happens! :) (That is the difference, AFAIK.) |
@ebraminio, yes, that's exactly it! Then the only difference between the two files (both 2428 bytes) is:
--- roboto-black-subset-with-hbsubset.ttx 2019-07-07 11:54:12.000000000 +0200
+++ roboto-black-subset-with-wasm.ttx 2019-07-07 11:54:05.000000000 +0200
@@ -12,7 +12,7 @@
<!-- Most of this table will be recalculated by the compiler -->
<tableVersion value="1.0"/>
<fontRevision value="1.0"/>
- <checkSumAdjustment value="0x97ec4f24"/>
+ <checkSumAdjustment value="0x6f737206"/>
<magicNumber value="0x5f0f3cf5"/>
<flags value="00000000 00011111"/>
<unitsPerEm value="2048"/>
... Not exactly sure what that means, but probably no biggie 😅 |
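For context on the diff above: per the OpenType specification, `checkSumAdjustment` in the `head` table is derived from a checksum over the entire font file, computed with the field itself set to zero. A differing value therefore suggests either that the two files differ somewhere in their bytes or that one tool computed the field differently. A minimal sketch of that arithmetic, with illustrative function names and a toy buffer instead of a real font:

```javascript
// Sum of big-endian uint32 words, modulo 2^32. Input is zero-padded to a
// multiple of four bytes, as the checksum algorithm requires.
function tableChecksum(bytes) {
  const padded = new Uint8Array(Math.ceil(bytes.length / 4) * 4);
  padded.set(bytes);
  const view = new DataView(padded.buffer);
  let sum = 0;
  for (let i = 0; i < padded.length; i += 4) {
    sum = (sum + view.getUint32(i, false)) >>> 0; // keep within 32 bits
  }
  return sum;
}

// The value stored in head.checkSumAdjustment, given the whole font's bytes
// with that field already zeroed out.
function checkSumAdjustment(fontBytesWithZeroedField) {
  return (0xb1b0afba - tableChecksum(fontBytesWithZeroedField)) >>> 0;
}

// Toy data, not a real font: two words, 1 and 2.
const toy = Uint8Array.of(0, 0, 0, 1, 0, 0, 0, 2);
console.log(tableChecksum(toy), checkSumAdjustment(toy).toString(16));
```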
This all looks really promising, thank you so much. I'm thinking of doing a spike in Given that @behdad said that
... can you think of any reason why it wouldn't be complete enough for the web use case? We presently use a command line of:
I've noted that you said that the |
I guess I'll need to find a way to preserve the ligatures that are utilized by the text on the web page. Edit: Looks like |
They are identical in my case, do you have different hashes? A difference can mean something bad has happened so let us know. |
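One way to compare two subset outputs by hash in Node, as a sketch. The in-memory buffers are made-up stand-ins for the two font files, which would normally be read with `fs.readFileSync`, and both helper names are illustrative.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 digest of a Buffer or Uint8Array.
function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

// Offset of the first differing byte, or -1 if the inputs are identical.
// If one input is a prefix of the other, returns the shorter length.
function firstDifference(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i += 1) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Toy stand-ins for the native and wasm outputs (not real font data).
const nativeOutput = Buffer.from([0, 1, 0, 0, 10, 20, 30, 40]);
const wasmOutput = Buffer.from([0, 1, 0, 0, 10, 21, 30, 40]);

const sameHash = sha256Hex(nativeOutput) === sha256Hex(wasmOutput);
const offset = firstDifference(nativeOutput, wasmOutput);
console.log({ sameHash, offset });
```

The byte offset gives a starting point for working out which table a difference falls in.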
Yeah, I get different hashes when I subset the
(still only that If I download the Roboto that you have been using (https://github.com/google/fonts/blob/master/apache/roboto/Roboto-Black.ttf) and repeat the experiment (
So the difference is triggered by the file that I was testing with from the start. Here is a copy of it: roboto-black.zip |
Reproduced it locally, something fishy is going on |
harfbuzz/harfbuzz#1823 is related. Let's see what happens there before proceeding; such differences are very important and should be investigated.
I'd like to know the answers to these as well, but as I haven't made myself that familiar with the subsetter I can't help much, so maybe Behdad can help. |
Hey @papandreou, your report led to finding real issues in the harfbuzz subset module and its testing, which are now fixed. Feel free to test the new binary and report any differences you see.
|
@ebraminio, happy to help! I can confirm that both new wasm builds resolve the issue with the different checksum adjustments. Now I also get the |
@behdad, I looked a bit into the harfbuzz docs and source code, but couldn't figure out how to learn which ligatures are present in a font, or how to add them to a subset. Can you confirm that it's not implemented yet, or offer a few pointers? 🙏 |
You won't need to do anything. When it's implemented, it should automatically include ligatures and other relevant features in the subset based on the Unicode characters you requested. Check back in a month. It should be in better shape then. Ligatures will definitely be done by then. |
Great! So let's close this and continue in the harfbuzz repository itself for harfbuzz feature requests, but continue here for posting wasm builds. Thanks :) |
Nice! Apparently Behdad is still working on finishing GSUB/GPOS support in the subsetter. I am also working on related aspects in the harfbuzz repo itself. About the licensing: HarfBuzz and this project are released under MIT, but we use minor elements from the Intel Zephyr libc that are under Apache 2.0 and BSD (the one below), here: https://github.com/harfbuzz/harfbuzzjs/blob/master/libc/zephyr-string.c Maybe we should replace these small pieces with consistently licensed ones, preferably MIT. Until then, be careful to mention those licenses in a release. |
That sounds great! I'm trying to follow along and I'm very impressed with the progress that you're making 👍 Wrt. the licensing I don't mind a mix of MIT, Apache 2.0 and BSD, but from the perspective of the library you're probably right. The devil's always in the details. Either way I'll just await your progress and will then report back. Once we have those basic examples rendering the same way before and after subsetting with harfbuzz, I can also enhance the test suite and maybe help finding more discrepancies. |
That sounds just great :) Here is a new build: hb-subset.wasm.zip. Note that I am optimizing for speed in these builds but can optimize for size if you'd like. Feel free to check whether it works as expected :) I will also upload an online demo for it. |
Thanks for the new build! The screenshot-based tests show the same difference as before, but I guess that's still expected :) For now my use case is purely "server side", so it's not hugely important for me whether the wasm is optimized for speed or size at this stage. Maybe a slight preference for speed :) |
Online demo: https://harfbuzz.github.io/harfbuzzjs/subset/ |
@ebraminio, just checking in to see how it's going with ligature/GSUB/GPOS support. I saw that a lot of commits have landed in harfbuzz related to that, but I don't really understand what the status is :) I built hb-subset.wasm and linked it into my harfbuzz subfont branch just now, but I still see the same rendering differences as shown in the screenshots on Munter/subfont#56. I'm still using your instructions in the function that generates the subset: https://github.com/Munter/subfont/blob/4a3bf697c7e7ebdd17515cc0620fff2ebf8d5b22/lib/subsetLocalFontWithHarfbuzz.js#L30-L71 Maybe I need some additional commands to include those other tables? |
Hey Andreas, some progress on GSUB but none on GPOS yet :/ We will get there soon, hopefully; subsetting has become an interesting part of the library. (Sent by email in reply to Andreas Lind's comment of Sun, Feb 9, 2020, 01:37.) |
@ebraminio, hi again! I'm continuing to maintain my harfbuzz branch in subfont, but I still get the same differences when comparing against the rendering of the web page before: Is it still the same state regarding the subsetting of GSUB and GPOS, or do I need to change something to take advantage of it? The code that uses |
Unfortunately the situation with GPOS and especially GSUB hasn't improved much since last time, but it's still a bit unclear to me what you are comparing. Are you comparing |
@ebraminio, thanks for the update!
These tests are comparing the rendering of some small webpages in headless Chrome before and after applying subsetting. So a rendering with the original fonts vs. the ones created with various configurations and subsetting tools. So the magenta parts of the above images show that subsetting the font using The same test cases are run with
Okay, thanks! I'll try that!
Thanks! That build results in the same diffs. I'm already compiling it myself using |
Oh, so you can build harfbuzz itself also easily, |
The reason I'd like you to interact with the harfbuzz C++ project itself: I believe that if you give feedback on exactly what is missing (by comparing the ttx output of pyftsubset with that of the hb-subset command line, not the wasm one, which should be identical anyway) on the main project (by filing bugs there), that will speed up development there, or at least let outsiders know which things are missing so that they may contribute as well. |
I tried rewiring the harfbuzz subsetting code to shell out to the The "Waffle stuffings" case with ligatures is still off. ... But those differences do seem to indicate that there's something wrong in harfbuzzjs or the way that I use it |
Oh, that definitely looks bad. Can you help me reproduce it, like in the previous case? Some ttx / binary difference analysis would be awesome too. |
|
Source font: Roboto-400.zip
subsetWithWasm.js
/* global WebAssembly */
const fs = require('fs');
const readFileAsync = require('util').promisify(fs.readFile);
const _ = require('lodash');
const { readFile, writeFile } = require('fs').promises;
const loadAndInitializeHarfbuzz = _.once(async () => {
const {
instance: { exports },
} = await WebAssembly.instantiate(
await readFileAsync(require.resolve('harfbuzzjs/subset/hb-subset.wasm'))
);
exports.memory.grow(400); // each page is 64kb in size
const heapu8 = new Uint8Array(exports.memory.buffer);
return [exports, heapu8];
});
(async () => {
const originalFont = await readFile('Roboto-400.ttf');
const text = 'Hello, world!';
const [exports, heapu8] = await loadAndInitializeHarfbuzz();
const fontBuffer = exports.malloc(originalFont.byteLength);
heapu8.set(new Uint8Array(originalFont), fontBuffer);
// Create the face
const blob = exports.hb_blob_create(
fontBuffer,
originalFont.byteLength,
2, // HB_MEMORY_MODE_WRITABLE
0,
0
);
const face = exports.hb_face_create(blob, 0);
exports.hb_blob_destroy(blob);
// Add glyph indices and subset
const glyphs = exports.hb_set_create();
for (let i = 0; i < text.length; i += 1) {
exports.hb_set_add(glyphs, text.charCodeAt(i));
}
const input = exports.hb_subset_input_create_or_fail();
const inputGlyphs = exports.hb_subset_input_unicode_set(input);
exports.hb_set_union(inputGlyphs, glyphs);
exports.hb_subset_input_set_drop_hints(input, true);
const subset = exports.hb_subset(face, input);
// Clean up
exports.hb_subset_input_destroy(input);
// Get result blob
const result = exports.hb_face_reference_blob(subset);
const offset = exports.hb_blob_get_data(result, 0);
const subsetFontBlob = heapu8.slice(
offset,
offset + exports.hb_blob_get_length(result)
);
// Clean up
exports.hb_blob_destroy(result);
exports.hb_face_destroy(subset);
await writeFile('roboto-400-wasm.ttf', subsetFontBlob);
})();
Seems like |
Yes, just remove |
Ah, right, if I remove |
Ported that change back into the subfont code, which achieved the same results 😌 Thanks again! I'll direct my future inquiries at the main harfbuzz project when possible. |
Eagerly looking forward to it, thanks :) |
@ebraminio If we need to support preview on the demo page of subset, do we need to add |
@yisibl Guess this helps you https://harfbuzz.github.io/harfbuzzjs/subset/ you shouldn't need |
If we add the SVG preview in the picture below to subset Demo[1], don’t need to call
|
Correct. That falls into the context of shaping rather than subsetting. You can take the binary from there and even use the browser's font renderer, or turn it into SVG using the hbjs wrapper, or go without it: https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/nohbjs.html |
I ended up "liberating" the code we came up with here as a separate module: https://github.com/papandreou/subset-font |
Hey, I read that harfbuzz is gaining support for font subsetting and is working towards replacing pyftsubset from fonttools: https://github.com/harfbuzz/harfbuzz/projects/4
I'm not sure what the status of that effort is, but I'm drooling over the idea of using it in subfont without shelling out to python. Also, I'm really looking forward to the ability to do axis trimming of variable fonts some time in the future, so it would be great to try switching to harfbuzz.
I don't have any practical experience with this module or harfbuzz itself, but it looks like the subsetting features aren't exposed?