Feature: Parse Ink data #4
Comments
Hi Markus,
Hey Sebastian! First, thanks for your work with libmson! It's been a great help for understanding the OneNote file format while working on onenote.rs! Parsing Ink data is an interesting challenge as in my experience this is a widely used feature but Microsoft doesn't see the need to publish the specification for it:
Too bad… Anyway, I have a somewhat working version of Ink parsing locally but haven't had the time to polish it up for publishing the code. From my investigation I was able to determine that the Ink paths are stored as multi-byte data according to the ISF specification, specifically the section on Multi-byte Encoding of Signed Numbers. The algorithm is as follows:
This should make it possible to parse Ink paths into SVG data. I'll try to push the source code for this soon. During implementation I found that there's a second type of Ink path, embedded in rich text objects. This seems to be used for ink paths that OneNote detects to be handwriting. It uses the same data format, but it's really tricky to get the alignment right (inside of one2html) so that the resulting image doesn't get messed up with lines and handwriting objects all over the place. This last bit has kept me from finishing this up for GitHub 🙈
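A minimal sketch of this kind of multi-byte decoding, assuming the layout as I understand it from the ISF spec: each byte carries 7 payload bits, least significant group first, a set high bit means another byte follows, and for signed values the lowest bit of the decoded number is the sign while the remaining bits are the magnitude. The function names are hypothetical and this is not the code from onenote.rs:

```rust
/// Decode a byte stream of multi-byte encoded unsigned integers.
/// Assumption: 7 payload bits per byte, least significant group first,
/// high bit (0x80) set when another byte of the same value follows.
fn decode_unsigned(input: &[u8]) -> Vec<u64> {
    let mut values = Vec::new();
    let mut current: u64 = 0;
    let mut shift: u32 = 0;

    for &byte in input {
        // Bits shifted beyond 64 are ignored instead of panicking.
        current |= u64::from(byte & 0x7F).checked_shl(shift).unwrap_or(0);
        if byte & 0x80 != 0 {
            // Continuation flag set: the next byte holds higher bits.
            shift += 7;
        } else {
            // Last byte of this value.
            values.push(current);
            current = 0;
            shift = 0;
        }
    }

    // A trailing value whose last byte still has the continuation flag
    // set is incomplete and is dropped here.
    values
}

/// Decode signed values: the lowest bit of each decoded number is taken
/// as the sign (1 = negative), the remaining bits as the magnitude.
fn decode_signed(input: &[u8]) -> Vec<i64> {
    decode_unsigned(input)
        .into_iter()
        .map(|raw| {
            let magnitude = (raw >> 1) as i64;
            if raw & 1 == 1 { -magnitude } else { magnitude }
        })
        .collect()
}
```

Under these assumptions, `decode_signed(&[0x0A, 0x0B])` yields `[5, -5]`, and the two bytes `0xC9 0x01` decode to `-100`.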
I am glad it was of some help. It's mostly a playground right now for challenging myself to learn more about software development, since I didn't study computer science. I hope to improve its structure as I get more experienced.
This is good news to me. Thank you for sharing this information. I should have read that spec until the end! =) It just didn't occur to me that it would contain the information I needed. Regarding:
Could you check the data of the PropertyID Not sure how this is included in ISF, but the Metric Table is an array of 32-byte structures.
On my machine, I only ever have the tag_x and tag_y, although I tested input from a Wacom tablet.
I wish it were so easy 🙈 The spec contains multiple types of multi-byte encodings, and for the first few tries I tried applying the more complicated techniques described in the Compression Algorithms for Packet Data section. This would have made the most sense to me: according to the ISF spec, coordinate streams are packet data which may be encoded using bit packing or Huffman tables. But it turns out that OneNote doesn't use most of the ISF spec except for the simplest form of multi-byte encoding described there…
Yes, that looks right to me. As far as I can tell, By the way, I've now pushed my ink parsing implementation in a407e1e
In I noticed that the first value in InkPath is actually the number of data elements in the vector. For example, a multi_byte structure with 113 elements would contain 112 in the first element. So the actual vector ( Although I haven't attempted to learn Rust yet, I tried to follow that structure in your library. As far as I understand the Is my understanding correct? Or am I missing some language feature which ignores the first element in a multibyte vector? Might this be a source of the "weirdness" you described in the embedded inks? Best regards,
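To illustrate the length-prefix observation above, here is a small sketch, assuming the first decoded value holds the number of elements that follow it (so a vector of 113 values starts with 112). The helper name `split_length_prefix` is hypothetical and not part of onenote.rs or libmson:

```rust
/// Split a decoded multi-byte vector into its payload.
/// Assumption: the first element is the count of the elements after it.
/// Returns None if the vector is empty or the count does not match.
fn split_length_prefix(values: &[i64]) -> Option<&[i64]> {
    let (&count, payload) = values.split_first()?;
    if count >= 0 && count as u64 == payload.len() as u64 {
        Some(payload)
    } else {
        None
    }
}
```

A parser that treats the prefix as a coordinate would see one value too many, which matches the odd-length symptom; checking the prefix against the payload length makes that mismatch visible early.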
Very interesting! Didn't notice that 🙂
Yes, this is correct. Lines 261 to 268 in 99db021
I took another shot at embedded ink data and figured out that my two main issues were an incorrect usage of ID mapping tables (global ID table in regular OneNote files) and an incorrect indentation of paragraphs (I didn't properly use the
I have to admit that the code for this project isn't really clean at the moment (and probably not at all a good representation of clean Rust code). In some places I went overboard with trying to use a functional style even when it only complicates everything. Cleaning up the code base is on my list but has low priority as getting ink parsing to work was more important to me 🙂
Great, I didn't notice that part before. I spent an hour or two trying to figure out why my parser would always read an odd length, believing I had made a mistake somewhere... only when I printed the actual size next to the vector did I notice the correlation, haha.
I wouldn't be able to tell. Moreover, I believe I'm in a similar boat.
Same here. When I got your code the first time I was psyched how well it worked already. You obviously have a good understanding of the top-level document graph. That's something where I have to catch up.
Same for me :) In the end I decided to accept that it doesn't make sense to me and to not be bothered by it
What really helped me was the overview in the OneNote JS API documentation: It skips over some details (e.g. ink words being contained in ink lines and ink paragraphs) but helped me get a sense of the overall object graph. Playing with the Add-In JS API for OneNote was also helpful, as it contains more details and makes it possible to inspect how OneNote itself structures its data (see also the API docs).