Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker PostMessage Data Corruption #1408

Closed
NathanaelA opened this issue Jun 26, 2019 · 5 comments
Closed

Worker PostMessage Data Corruption #1408

NathanaelA opened this issue Jun 26, 2019 · 5 comments
Assignees
Labels
Milestone

Comments

@NathanaelA
Copy link
Contributor

Environment
Provide version numbers for the following components (information can be retrieved by running tns info in your project folder or by inspecting the package.json of the project):

  • CLI: 5.4 (In Legacy Mode, as app isn't webpack compatible yet)
  • Cross-platform modules: 5.4
  • Android Runtime: 5.4

Describe the bug
I just upgraded an app from NS 2.x to NS 5.4 (finally convinced my client!!!) and I was seeing some really weird results. I did some digging into it and finally tracked it down to the Worker's data it is sending it back to the primary thread is partially corrupted.

Basically the process is I send a massive text file from the primary thread to the background worker for it to parse and it create a nice memory JSON database. It saves this DB so that on app restarts it can use the already parsed data, and not have to re-download it and then can return it to the primary thread. I already pulled both the original downloaded data and the parsed data and everything looks perfect on the worker thread.

In my NS 2 design I would just do this from the worker:
global.postMessage({cmd: WORKERMESSAGE.ONDATA, results: db._getAllData()});
The db._getAllData() call would actually return a pure object result set (i.e. it is not JSON stringified.) So it would be like '{x: 1, y:2, table1: [ {entry1}, {entry2} ... ]....}`. So I am passing pure data objects over the wire. This worked in NS 2 perfectly.

In NS5 (and as a workaround) I have now had to do this:
global.postMessage({cmd: WORKERMESSAGE.ONDATA, results: JSON.stringify(db._getAllData())}); and then JSON.parse it on the primary thread... As you can imagine 5 megs of parsing is a small hit; vs how fast it used to be to push plain old data over the message... The data size currently returned is: 5,345,087 of JSON.stringified data.

Now that you see what works; and fails this is the actual issue on how it fails. It always appears to fail on the same records (i.e. this gives the semi-random effect as the majority of them are good in the lower ranges.

So outputting the records:
0, 1, 4, 7-35, are correct,
2, 3, 5, 6, 36, 37, 43, 52, 65-76 are broken.
So their is like only 20 broken records in the 1000's and 2000's but only one broken record in the 3000's and then looks like the majority of the 5000's and 6000's are broken...
As you can see it doesn't follow any rime or reason which are broken -- and it appears that the same records get corrupted each time, as I've noticed record 2 & 3 are always broken (which is what initially clued me into their being an issue)

So what I did was find one of the records that was showing anomalies and output that record on the Worker side and got:

 ==== object dump start ====
JS: setDetailId: "69233060"
JS: .... LOTS of CORRECT Data Removed ....
JS: fkConceptIdList: []
JS: fkTagIdList: [6]
JS: iud: ""
JS: cMin: [0, 0]
JS: cHalfLayer: [0, 0]
JS: cLayer: [1, 18]
JS: cCarrier: [2, 18]
JS: changed: "0"

However on the Primary side; when I used the same function to output the exact same record immediately after I received in; this is what I had:

==== object dump start ====
JS: setDetailId: "69233060"
... LOTS of CORRECT Data Removed ...
JS: fkConceptIdList: "[Circular]"
JS: fkTagIdList: "[Circular]"
JS: iud: ""
JS: cMin: "[Circular]"
JS: cHalfLayer: "[Circular]"
JS: cLayer: "[Circular]"
JS: cCarrier: "[Circular]"
JS: changed: "0"

As you can see EVERY single ARRAY in the record was converted to a text string called "[Circular]" even empty arrays.

It isn't all arrays going over the bridge; but it is all arrays in the record is showing corruption... So if the record triggers a "[Circular]" then ALL arrays in that specific record will be that. In addition the data shown is exactly how the records look. Zero, One or Two element arrays in 99.9% of all cases...

To Reproduce
Not sure how to easily reproduce outside of my app. Might be able to create a worker that reads in my raw data and passes it over to the primary thread. I do have copies of it extracted from the app as part of my diagnostics. However, this is clients proprietary data; so I'd rather not share if I don't have to. However, you might look to see what triggers it to print "[Circular]" and see the logic issue that may have been put into place since NS2.

Expected behavior
It to work like NS 2 did and not corrupt the data in memory...

@darind
Copy link
Collaborator

darind commented Jun 26, 2019

Thanks for the detailed bug description.

The android runtime is not designed to share V8 isolate or any objects between multiple threads. So the only way to do communication between the workers is to serialize the data by making a copy of it. Here's a very good explanation of why this is necessary and even if this article is about Node.js, the exact same limitations apply to the {N} android runtime: https://nodeaddons.com/how-not-to-access-node-js-from-c-worker-threads/

So even if in NS 2 you were passing plain javascript objects, behind the scenes we have always been JSON stringifying them (and you were already paying the price).

The difference I can see between NS 2 and 5 is that we have introduced a so called smart JSON stringify function for serializing the data between the workers instead of relying on the built-in V8 JSON.stringify: 593c75f

When this function detects circular object references it will serialize them as [Circular] in the resulting string. This is why you are observing those litterals with the current version of the runtime.

I am not quite sure how the V8's built-in JSON.stringify function was handling those circular object graphs but I am pretty sure you might also be getting wrong data with NS 2. They might not have broken the app because there were no [Circular] magic strings but I can bet 2 cents that your data was wrong back in NS 2 days (probably missing values or something).

With this in mind, I would recommend redesigning the current object graph that you wish to transfer between the worker and the main thread so that it doesn't have any circular object references in it.

If the data is backed by some native Java objects you might also checkout the following approach to share it between the workers and void paying the serialization price that comes with it.

@NathanaelA

This comment was marked as abuse.

@darind
Copy link
Collaborator

darind commented Jun 27, 2019

@NathanaelA, the next thing that comes to mind is that the problem might be caused by the new JSON stringify routine. Since the data is confidential and you can't send it over, I have prepared a custom version of the android runtime that uses the old serialization function: https://we.tl/t-U81LAnmcs7

If the issue still persists, the next step would be to try and pinpoint which part of your data is causing it. You can remove any extra fields and and also anonymise the values so that you can send it to us for more thorough troubleshooting.

@NathanaelA

This comment was marked as abuse.

@darind
Copy link
Collaborator

darind commented Jun 27, 2019

@NathanaelA, thanks, your sample data is spot on!

In 6.0.0 we will revert back to the built-in JSON.stringify method for cross workers communication: #1411

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants