-
-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEWABLE] A second approach to storing non-serializable data in Redux #4047
Conversation
Note from that linked redux-persist commit this is a really straightforward change to make (I wish it had been there from the beginning!), so that's not the challenge, once we commit to and finish doing some version of #2915. |
@gnprice, locally, I've just been attempting to do the work of #2915. I think there are enough merge conflicts that it'll be easiest just to do it from scratch, rather than just using that PR as-is, if we're going to do it. But there's a lot of useful context in that PR's commit messages. I just found additional useful context at #2762 and in this discussion. Without manually tweaking any lines of code, I've managed to reduce the problems flagged by
|
Here is that output from ESLint:
|
Would you suggest manually fixing those remaining problems? I can try to keep a careful record of exactly what I did to fix them, so it's less necessary to just assume I've done it all correctly. Alternatively, we could have ESLint ignore these errors, to avoid the possibility that a person (such as myself) accidentally introduced bugs by trying to make this output go away with manual tweaking. |
For the part that's in the PR now: I think I quite like this approach! It feels like a lot less code than the "outside-in" approach in #3952. Thanks for prototyping both 😄 In particular, when I imagine going and converting things to For vendoring redux-persist so we can apply that patch to it: I think let's start by leaving that code in its current style, and just telling ESLint and Prettier not to operate on it. For ESLint, you can do this by adding to the
Based on a rough rebase of #2915, I think that eliminates all the lint errors other than missing dependencies. And I think all of these are not direct signs of problems -- they're either stylistic choices, or asking the code to use helpful features of modern JS. But it'd be useful for you to go through them to check that for yourself, too (and perhaps organize and/or comment better than my rough example above.) Then we can always follow up to modernize and assimilate the code. But that lets us move forward so we can do that at our leisure. (Even if the code is old-fashioned or sometimes messy, it's the same code we were relying on before anyway!) |
For Prettier, it looks like adding to Like this, since it's apparently gitignore syntax:
|
bbad956
to
8567842
Compare
Hi @gnprice, I just pushed my changes, getting our fork of I agree that it's much nicer to see that pretty much all of the changes in that commit are there strictly to adapt to I figured it would be nice to show you the |
It's less code, but it's less code in the way that (EDIT: this reads like hyperbole, and in this world, it is; it arises from my error below.) From #3952 (comment):
The problem is that this is a terrifyingly fragile global property. It's true today – by chance, rather than by design! – but it's totally plausible that a year from now no one will remember that this is something we need to worry about, and that a new object-as-map will be added which can have arbitrary keys. With the types we have at hand, I don't believe this could do more than cause a denial-of-service (e.g., after connecting to a Zulip server with a poorly- or maliciously-chosen emoji name, the app crashes on startup until it's uninstalled); but as more types are added to the deserializer, the chances that someone will be able to inject a construct that has more subtle security implications increases combinatorically.
(EDIT: As shown by the later solution to this issue, this last was incorrect.) |
Hmm, OK. Take a look at UserPresence:
I haven't gone through to verify whether and how an arbitrary
In
where
Here's an excerpt of the default reviver that accomplishes that, in node_modules/immutable/serialize.js:
Does this sound like a valid example of a simple denial-of-service you're thinking of? I don't mean it to distract from your larger concern that
|
And would it be an acceptable defense against all such attacks to make the replacer drop all user-supplied keys that are Or to use some kind of secret to prove that a key was persisted as |
Yes, although I think there's a shorter path to it: the rehydrator attempts to revive the entire value at
We can't. Although I'm pretty sure it won't be true in the (Also, Christopher Null has a few opinions on the subject.) |
Now we're getting into something feasible, I think. I can't see any way to make a secret work, but one might be able to escape any key matching EDIT: Which suggests, I think, that we should really be characterizing this entire class of security holes as failures of the serialization function to be injective. |
I just found a flaw in my example above; glad I spelled out the steps. I assumed it would be revived as an Immutable.Map, but actually, that wouldn't happen unless the value at I agree it would be problematic to drop data on purpose, too. Even in the presence case, that would violate the UserPresence JSDoc (above):
(It's already violated, but in a minor way that doesn't violate our types — still, I should probably fix that JSDoc.)
Lol. 😆 |
I think this is a straightforward solution to all the problems discussed in the last day's messages on this thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @chrisbobbe ! This looks great.
A couple of remarks below about the workflow for libdefs. Then I think the one substantive thing remaining is to make the replacer/reviver fully round-trip-clean, by escaping __serializedType__
properties as discussed above.
When doing that, one thing that will probably help is to split up the main "redux: Store parsed ZulipVersion in Redux state" into two commits:
- One inserts
redux-persist
into things, plus our fix for proper round-tripping, but doesn't yet do anything about ZulipVersion. - Then the next starts replacing/reviving ZulipVersion objects. ... hmm and maybe even stops there, touching only the replacer and reviver, so splitting into three commits?
- And then the last commit is the one that actually changes the reducer to maintain a
ZulipVersion
object instead of a string, and adds the migration to match.- In fact if we wanted we could even split out as a fourth commit the changes to the action types, and so the changes at the various places we dispatch those actions. But that would add a bit of churn at the reducers, and the reducer + action changes aren't a particularly complex diff combined, so I think it's just as good to keep these together as one commit.
/** | ||
* Flowtype definitions for index | ||
* Generated by Flowgen from a Typescript Definition | ||
* Flowgen v1.10.0 | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting!
Where did this file come from -- did you have to run Flowgen yourself (and how)? Or did you run something like yarn update-libdefs
and find that that ran Flowgen automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I'm also kind of puzzled because I don't see any TypeScript in that library!
$ find node_modules/remotedev-serialize/
node_modules/remotedev-serialize/
node_modules/remotedev-serialize/README.md
node_modules/remotedev-serialize/helpers
node_modules/remotedev-serialize/helpers/index.js
node_modules/remotedev-serialize/immutable
node_modules/remotedev-serialize/immutable/serialize.js
node_modules/remotedev-serialize/immutable/index.js
node_modules/remotedev-serialize/package.json
node_modules/remotedev-serialize/index.js
node_modules/remotedev-serialize/constants
node_modules/remotedev-serialize/constants/options.js
So I wonder where the information is coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! It comes from DefinitelyTyped, and I did indeed run Flowgen myself. I'll get this all meticulously documented in the next revision.
declare export type Reviver = (key: string, value: any, reviver: DefaultReviver) => any; | ||
declare export function immutable( | ||
immutable: any, | ||
refs?: Refs | null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to find a way that this kind of fix won't get automatically clobbered when we, say, upgrade this library. (Or when we just run yarn update-libdefs
, after upgrading any of our libraries.) How to do that depends on what the workflow is that generated these files in the first place, though. Maybe discuss in a chat thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! If it can fit in seamlessly enough, perhaps, to pick up this thread? That's where I learned that putting things in flowtyped/npm
means they might get clobbered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure -- a relevant existing thread is even better than a new one 🙂
…ing. `remotedev-serialize` has not been used yet (it was only added in the previous commit); this work is purely preventive. Ray points out [1] a security hole (or eventual security hole) that would arise from a standard use of `remotedev-serialize`, where its replacers/revivers (custom or default) operate on objects-as-map in Redux where the keys are arbitrary (i.e., not within well-defined namespaces, such as numbers, emails, JSON-serialized narrows, etc.), especially if those keys can be manipulated by users. `remotedev-serialize` relies on a magic string [2], '__serializedType__'. This string acts as an identifier, but it gets thrust into an area where data belongs (and has always belonged), so collisions are a problem unless we do something to ensure that it remains unique against all keys that represent data. So, solve this by escaping that magic string where it occurs in data, so that it doesn't activate the hidden functionality. Develop and recommend a wrapper that has largely the same API as `remotedev-serialize` [3] and takes care of all the necessary escaping, so development can continue without having to think about and make judgments related to this security hole. [1]: zulip#4047 (comment) [2]: zulip#4062 (comment) [3]: except that it strongly recommends using a `const` variable containing '__serializedType__', instead of making new string literals every time. This variable is also used in the wrapper's implementation, so consumers can be certain that it's the same one that is properly escaped.
8567842
to
fe4159a
Compare
Edited to update commit IDs / topic lines, following my force-push noted at #4047 (comment). OK, I've pushed my next revision. I explain why tests are failing at the tip, at the tip: 89c5d96 [DO NOT MERGE]: Rehydrate time trial results. That commit, and its parent (they're both labeled [DO NOT MERGE]), are NEVER intended to be merged and exist only for discussion. I'm hoping 235458e deps: Add answers all your questions about the workflow for libdefs.
I think you might have meant From "insert 235458e deps: Add I think it makes sense that these are four separate commits; what do you think? (Arguably, I could split "Add
Done; that's 3fa900a redux: Replace/revive ZulipVersion instances.
Done as a904564 redux: Store parsed ZulipVersion in Redux state, instead of string.
Makes sense. If you change your mind, I'm happy to split this out. |
As always, I'm happy to be convinced that my approach, or explicit or implicit assumptions, are wrong. 🙂 I do have some questions. As I mentioned here, it's harder to test that 724a3be redux [security]: Add gets a [DO NOT MERGE] tag — I'll go and add that right now. EDIT: Done, as d94379a [DO NOT MERGE] redux [security]: Add .) |
Another thing is that the types for
I can't use the return type of |
…th proper escaping. Query: I have tests for `escape` and `unescape`, here, but I don't have tests to ensure these functions are called in the correct places. This is harder to test for (shall I mock `remotedev-serialize`?). Below the line is my draft commit message, written before this query occurred to me. ------------------------------------------------------------------- `remotedev-serialize` has not been used yet (it was only added in the previous commit); this work is purely preventive. Ray points out [1] a security hole (or eventual security hole) that would arise from a standard use of `remotedev-serialize`, where its replacers/revivers (custom or default) operate on objects-as-map in Redux where the keys are arbitrary (i.e., not within well-defined namespaces, such as numbers, emails, JSON-serialized narrows, etc.), especially if those keys can be manipulated by users. `remotedev-serialize` relies on a magic string [2], '__serializedType__'. This string acts as an identifier, but it gets thrust into an area where data belongs (and has always belonged), so collisions are a problem unless we do something to ensure that it remains unique against all keys that represent data. So, solve this by escaping that magic string where it occurs in data, so that it doesn't activate the hidden functionality. Develop and recommend a wrapper that has largely the same API as `remotedev-serialize` [3] and takes care of all the necessary escaping, so development can continue without having to think about and make judgments related to this security hole. [1]: zulip#4047 (comment) [2]: zulip#4062 (comment) [3]: except that it strongly recommends using a `const` variable containing '__serializedType__', instead of making new string literals every time. This variable is also used in the wrapper's implementation, so consumers can be certain that it's the same one that is properly escaped.
fe4159a
to
89c5d96
Compare
(Added @ray-kraesig as a reviewer because he prompted the preventive measures about the security hole and may like to request or follow revisions to the implementation.) |
}); | ||
|
||
describe('escape and unescape are inverses of each other', () => { | ||
for (let i = 0; i < 100; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: now moot, see #4047 (comment).
I'm not sure why I chose 100 here. We can't (or I can't!) say with what probability this will flake, which I suppose was a prerequisite for using a randomized algorithm in src/utils/__tests__/backoffMachine-test.js
, under 'timeouts are random from zero to 100ms, 200ms, 400ms, 800ms...'.
src/boot/SerializeEscaped.js
Outdated
} | ||
}; | ||
|
||
const escapeKeys = makeObjectKeyRenamer(escape); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: this and my comment that follows it are now moot; see #4047 (comment).
I wouldn't insist on using a makeObjectKeyRenamer
factory; we could just have a renameObjectKeys(obj, renameFn)
and pass escape
or unescape
as renameFn
every time. But, since escapeKeys
and unescapeKeys
are used several times, I think it's more readable that they exist as separate functions, with those names, and a factory function seemed a natural way to accomplish that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, since
escapeKeys
andunescapeKeys
are used several times
— er, no they're not! 🙃 I must have been thinking of a previous iteration I never submitted for review.
05e003b
to
f7ccec3
Compare
OK, I've pushed my branch, using zulip/remotedev-serialize@5f9f759a4 as Greg put together yesterday (thanks again!), instead of the wrapper I wrote with the |
Oh, and this is still a draft because of f7ccec3 [DO NOT MERGE] Proof-of-concept for state.narrows being Immutable.Map. at the tip; see note at #4047 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
Small comments below; otherwise I think this is all ready to merge (apart from the demo commit at the tip.)
Keys with value `undefined` don't survive round-trips into JSON, and the zulipVersion will be persisted in JSON. Use `null`, to maintain that Account objects always have the same number of keys, and we don't have to worry about the distinction between an optional `string` property and a property that's `string | void`. The same strategy is in use for the ackedPushToken. As a side-effect, it's much simpler to distinguish between the second and third behaviors of `makeAccount` in the list below, for tests. We don't need the `shouldHaveZulipVersion` param anymore: 1. Supply our own valid string value for the `zulipVersion` field in the returned Account object 2. Make the returned Account object have an empty `zulipVersion` 3. Make the returned Account object have a default, valid string value for `zulipVersion`
In zulip#2762 (never merged), we tried to use v5 of redux-persist and found that it had breaking changes that unacceptably reduced our flexibility with migrations, so we decided never to use it, and instead to vendor it and make our own changes as needed. More background on the limitations of v5 is at https://chat.zulip.org/#narrow/stream/243-mobile-team/subject/redux-persist.20v5/near/624465. Its new way of doing migrations means you can't write a migration that is aware of the entire state tree, unless you limit yourself to storing the entire tree in a single key-value entry. We need to be able to split it up for performance (avoiding a rewrite to disk of everything each time a small change is made) and because each key-value pair is has a hard limit of 2MB, after which the data is just dropped. Even after compression, our entire state tree would be within a factor of 4 of this limit, which is unacceptable. More generally, the redux-persist maintainers seem not to have deep familiarity with apps that depend on correct, effective migrations to persist data across updates. Our immediate need is to incorporate the changes from rt2zz/redux-persist@a94f29139 and rt2zz/redux-persist@13a0b600b (not released until 6.0.0) so that we can pass custom serialize/deserialize functions for replace/revive logic so we can store non-serializable data (such as a ZulipVersion) in Redux. PR zulip#2915 was opened a while ago to vendor redux-persist, but it grew stale and gathered merge conflicts. First, ensure that the version of redux-persist in node_modules is 4.10.2, which is what we'd been using. Then, copy the contents of `node_modules/redux-persist/src` to `src/third/redux-persist` (this contains all the needed source code) and copy `node_modules/redux-persist/es/index.js.flow` there too; it has the needed type definitions. Then, address redux-persist's two dependencies that weren't listed in our own package.json: - `json-stringify-safe`: Add to package.json. - `lodash`: The only bit of lodash that it uses is `isPlainObject`, so add `lodash.isplainobject` to the package.json and make this change, the only change to our vendored copy's source code: In `src/third/redux-persist/utils/isStatePlainEnough.js: ```diff - import isPlainObject from 'lodash/isPlainObject' + import isPlainObject from 'lodash.isplainobject' ``` Then run `yarn`. Rather than fixing all the issues ESLint and Prettier flagged, tell those tools to ignore them -- we didn't introduce them, and we'd rather not accidentally introduce bugs by even these small changes. Some of the output indicated potential code-quality issues (see note at `no-use-before-define` in .eslintrc.yaml), some, just differing style choices (e.g., no semicolons), and others, just the fact that redux-persist wasn't using the very latest JavaScript features.
See the previous commit for motivation.
Incorporate the changes from rt2zz/redux-persist@a94f29139 and rt2zz/redux-persist@13a0b600b so that we can pass custom serialize/deserialize functions for replace/revive logic, to eventually store non-serializable data (such as a ZulipVersion; that's this series) in Redux.
In an upcoming commit, we will set up a system of "replacer" and "reviver" transforms so that we can store non-serializable objects, such as ZulipVersion instances, in Redux, and have a serializable format be persisted to disk. This same system will be used for non-serializable objects besides ZulipVersion instances; see zulip#3949 and zulip#3950, which will be much easier after this series. We use our own fork of `remotedev-serialize`, at 5f9f759a4, which Greg authored atop `zalmoxisus/remotedev-serialize.git` at 0.1.8, because it fixes an important bug. Also, the project seems unmaintained, and the maintainer hasn't been active anywhere on GitHub. The bug is that data fails to round-trip when it contains a "__serializedType__" key, which is used by the reviver to identify and reverse a particular replace transform. Such data, if present, could trigger hidden functionality. Also, explain the process of getting the `remotedev-serialize` libdef set up, in docs/howto/libdefs.md.
`null` is passed as the `refs` param in an example in the official docs: https://github.com/zalmoxisus/remotedev-serialize#passing-custom-serialization-functions
This sets up how we'd like to plug in custom replacers and revivers, but doesn't add any yet. We can use the '__serializedType__' string freely because we're using `zulip/remotedev-serialize@5f9f759a4`, which contains a fix for proper round-tripping of that string, to avoid collisions when that string occurs as a key representing data, and the unplanned behavior such collisions would cause.
No ZulipVersion instances are stored in Redux in this commit; this just handles the replace/revive logic for when they are, in an upcoming commit.
Storing the parsed ZulipVersion instance in Redux makes it easier to work with, so we don't have to think about constructing a new instance as part of every server feature check. The one tricky bit is that our Redux state is serialized by redux-persist so that it can be stored in ZulipAsyncStorage, but the ZulipVersion instance itself is not serializable. Thankfully, it can be fully represented by its raw version string, which is serializable. So, on entry into ZulipAsyncStorage, we just have to convert the ZulipVersion instance into the raw version string, with .raw() ("replace" it), and on exit, make a new ZulipVersion instance, by calling the constructor with that raw version string ("revive" it). We considered two main strategies for locating the bit of state to be transformed (in this case, the `zulipVersion` field, which stores a ZulipVersion value, in elements of the `state.accounts` array): 1) The "outside-in" strategy, of identifying the value by the extrinsic property of where it is; i.e., that it's at the field named 'zulipVersion' on elements of the `state.accounts` array, or 2) The "inside-out" strategy, of identifying the value by its intrinsic property of being `instanceof ZulipVersion`. We chose the latter. When we work on zulip#3950, converting our object-as-map fields to use Map or Immutable.Map, we'll be making similar, sweeping changes to many different parts of the state, so it's most natural for the bulk of our logic to be independent of the location in state, and focus instead on the type of non-serializable object being stored. This approach conveniently clears the path to use ImmutableJS for additional reasons discussed later. An exploration of the "outside-in" approach is left as the un-merged PR zulip#3952. The main advantage of that approach is that we wouldn't need to write migrations, but it had the significant drawback of requiring a lot of code (for locating the bit of state to be transformed) that was 1) boilerplate, and 2) difficult to get fully type-checked, which is a bad combination. These concerns were not sufficiently alleviated by the complexity of that boilerplate being bounded by the complexity of the reducer(s) in charge of the corresponding part(s) of the state: it's better just to not have to write the boilerplate. There's nothing stopping us from mixing the two approaches, in future, but it would be nice to stick to one as far as possible, for simplicity. For the "inside-out" implementation, we use `remotedev-serialize` (added in a recent commit) [1], with custom replacer and reviver functions, defined in the previous commit. As Greg explains at zulip#3952 (comment): """ * at save time, values that are `instanceof ZulipVersion` get turned into objects like `{__serializedType__: 'ZulipVersion', data: '2.0.0-rc1'}`, before being JSON-serialized * at load time, after JSON deserialization, values that are objects with a `__serializedType__` property get transformed in a way identified by that property's value: in particular if `__serializedType__: 'ZulipVersion'`, then `data` is fed to `new ZulipVersion` """ Since we've never dealt with the Zulip version in this "live" (i.e., non-serializable) form, we have to write a migration. This was straightforward, but we MUST remember to write this kind of migration in the future. For making a value "live", where it wasn't before, the migration needs to: 1) As input, take the previous shape of the data. Don't confuse this with the *current* way of storing the "dead" shape. Just like any other migration, the previous shape is the input. 2) As output, give the "live" form of the data. Once it's in Redux, the replacer will take care of persisting it in the correct "dead" form. As mentioned above, this approach further clears the way for ImmutableJS; zulip#3949 and zulip#3950 will be much easier after this. I've neglected to mention that the primary purpose of `remotedev-serialize` is to replace and revive ImmutableJS objects! Its "default" replacers and revivers do this. The custom replacer/reviver functions we use here are a nice feature provided on the side. (We added `immutable` as a dependency in a recent commit, since `remotedev-serialize` depends on it.) Fixes: zulip#3951 [1]: We actually use our own fork, `zulip/remotedev-serialize@5f9f759a4`, which fixes a bug where "__serializedType__" keys in data fail to round-trip properly.
It may feel smoother not to have to call the constructor when doing server version checks.
f7ccec3
to
cf728a8
Compare
OK! Just pushed those changes. |
cf728a8
to
8e7c7b8
Compare
Merged! Thanks again. For the workflow: I think after seeing this experiment, what I'd like to do in a similar situation in the future is:
This way the usual PR workflows can behave as normal for the main branch. Does that make sense? We can talk more in #mobile-team, too. |
Thank you! Glad to see a clearer way to replacing our objects-as-map with good data structures, etc.
Sure, that makes sense! |
[don't merge] redux: Store parsed ZulipVersion in Redux state, instead of string.
This commit should be read for its contrast with its counterpart,
with the same title, at #3952.
I've tried to make this PR minimally different from that one, so a
git diff
orgit range-diff
might be informative.Here, we use the "inside-out" approach described there:
As expected, it requires writing a (straightforward) migration.
Unfortunately, I haven't gotten far enough to actually see this
working, but I expect that our code in a working version won't need
to be drastically different, so I'm submitting this for comparison
anyhow. There appear to be some inaccuracies in the Flow libdef I
cobbled together from the TypeScript with
flowgen
andflow-typed create-stub
, but these shouldn't be difficult to fix.The current blocker, which blocks any use of ImmutableJS with Redux
(an integration this PR depends on, as I noted at the bottom of
#3951 (comment)),
is that we're stuck with an old version of
redux-persist
.Not until rt2zz/redux-persist@a94f29139, released in v6.0.0 (we're
on ^4.10.2), do we get the ability to supply our own
serialize/deserialize functions in the config passed to
persistStore
.The only other customization for replace/revive transforms that
redux-persist
allows is via a Transform created bycreateTransform
, as we use in #3952. But its contract requiresthat you supply functions that translate between serializable and
non-serializable objects, without allowing us to take responsibility
for the actual serialization. It would be a silly exercise, but it
might be possible, to make it work with
createTransform
with anextra round-trip into JSAN [sic]. JSAN is what
remotedev-serialize
serializes into, and it claims to be more flexible than JSON.
I believe #2915 is our latest written knowledge about the drawbacks
of
redux-persist
and why it ceases to be useful for us at version5 and above, and it looks like we have a plan to vendor it.