-
Notifications
You must be signed in to change notification settings - Fork 2
Rewrite the WXR package to store data in IndexedDB #23
Conversation
1f013a1
to
4ef1ab6
Compare
@akirk: I haven't yet managed to address the final step (actually downloading the WXR file), but the rest of the PR is in a reviewable state, if you're able to take some time to look at it. |
For now I only did a high level review and it looks already great with tests covering a good range of cases. One thing I don't fully grasp is the streaming aspect. In my experience, the benefit of "Streaming X" is that it's possible to consume data before all of the input has been fed into the "Streaming X". Is this eventually the plan? I am not sure I am seeing this concept in the PR yet since there are |
Thanks for checking it out, @akirk! The streaming part of it is not focussed on the idea of "downloading the WXR while the data is still being gathered", though that's something which we could certainly explore. As you noted, there would likely be some gotchas to watch out for. The goal of streaming is to download the WXR file as its being generated, rather than generating the entire WXR file and then downloading it. This allows us to generate arbitrarily large WXR files, without running into memory limits. For WXR 1.2, this is a fairly academic distinction, since the files would very rarely be large enough to cause these problems. This feature will come into its own in WXR 2.0, when we're adding media files to the download. |
2058282
to
86b104d
Compare
Are you planning to add |
Sadly, I couldn't get the streaming working on Firefox, so I've switched over to using the webextensions downloads API: it has to generate the full WXR file, so we don't get the memory benefits of streaming, but we can leave that until we're generating WXR files big enough to cause such problems. This PR is far too big to be properly reviewable, but hopefully the test coverage makes up for it. 🙂 It's the kind of thing we're going to need to iterate on, so I'm not super concerned about it landing in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some weirdness resolving the dependencies and had to remove my node_modules directory and install it again.
In my review I manipulated the tests a little since right now they follow the "golden path." So for example I added the same author twice and it is put into the WXR twice. This is by design and I think it makes sense to not burden the WXR library with data integrity checks but we need to be clear about that which I referred to in my review comment about README.md.
|
||
wxr.addAuthor( { | ||
login: 'someone-else', | ||
} ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I add another author pento
it will be added to the WXR and result in a duplicate author.
@@ -10,4 +10,39 @@ Install the module | |||
npm install @wordpress/wxr --save-dev | |||
``` | |||
|
|||
## Usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should include a section about the "mission," i.e. creating a compatible WXR and not being responsible for data integrity checks.
I'm not sure why, but there have been some oddities with
Aye, there are certainly issues along these lines. Given that the library already does some type checking, I think it'd be good to handle some data integrity checks: fields like the user login, or the post slug, could have a I think there's likely to be value in improving these checks over time, too: more comprehensive integrity checks means better results for end users, as we're more likely to catch data weirdness before they get to importing their WXR into WordPress. |
The initial WXR package was hacked together quite quickly, for the purposes of providing a proof of concept. In order to allow us to grow quickly, it needs to be largely rewritten. This PR addresses the following:
TODO
Features
WXRDriver
class.Tests
Fixes #3.