Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performances problems fixes on big files with a lot of references on same big object #195

Merged
merged 2 commits into from
Dec 9, 2020

Conversation

paztis
Copy link
Contributor

@paztis paztis commented Oct 20, 2020

fixes #176.

add processedObject list to avoid recursion in already processed tree parts
add dereferencedCache to avoid to re-treat object already dereferenced

… parts

add dereferencedCache to avoid to  re-treat object already dereferenced
@paztis
Copy link
Contributor Author

paztis commented Oct 20, 2020

I was working on a file that takes 10s to be processed (11 millions of passes in the crawl function)
I add caches to avoid this, and now we only pas 35000 times inside

@paztis paztis mentioned this pull request Oct 20, 2020
@philsturgeon philsturgeon requested a review from P0lip October 20, 2020 19:39
@marbemac
Copy link
Contributor

Very nice - just curious, could the bundle method benefit from a similar caching mechanism?

@paztis
Copy link
Contributor Author

paztis commented Oct 21, 2020

I don't look at the bundle usage. I only use this one, but potentially yes.

@paztis
Copy link
Contributor Author

paztis commented Oct 21, 2020

the same dev will be to redo

@paztis
Copy link
Contributor Author

paztis commented Oct 26, 2020

@P0lip any updates on the review ?

@marbemac
Copy link
Contributor

He's on vacation until early November, we'll have time to check it out then!

@P0lip
Copy link
Member

P0lip commented Oct 28, 2020

Hey @paztis.
As Marc stated, I'm off until the end of this week.
I'll try to get back to you on Monday, next week (02 Nov)

@Eli-Black-Work
Copy link

Hey @paztis.
As Marc stated, I'm off until the end of this week.
I'll try to get back to you on Monday, next week (02 Nov)

Enjoy your vacation 🙂

Copy link
Member

@P0lip P0lip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to add some tests for the properties merging (line 125).
Looking pretty solid otherwise

lib/dereference.js Show resolved Hide resolved
}
return {
circular: cache.circular,
value: Object.assign({}, cache.value, extraKeys),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might screw the order of properties and override some of them.
I'd suggest we bail out to regular scenario, same as we do in case of storing the cache at line 176.
Alternatively, we could maintain 'before$RefKeys' and after$RefKeys and merge them in that order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will set extraKeys after basic ones
Is this order important ?

I do it to support some unit tests where an object has a description, then a reference of it overrides the description.
The cache key is the same for both (reference is unique). But after dewreference, ths description in both versions is not the same.
You've got specific unit tests for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this order important ?

Yeah it might matter at times if you have the property with the same key - it'll get overriden, won't it?
IMHO the current implementation may have a minor bug here.
I went through tests and we don't seem to have anything that'd validate the scenario I mentioned.

The below scenario fails

  it("should preserve the order of properties", async () => {
    let parser = new $RefParser();
    const schema = await parser.dereference(
      // This file doesn't actually need to exist. But its path will be used to resolve external $refs
      path.abs("path/that/does/not/exist.yaml"),
      // This schema object does not contain any external $refs
      {
        foo: {
          minLength: 1,
        },
        bar: {
          $ref: "#/foo",
        },
        baz: {
          $ref: "#/foo",
          minLength: 4,
        },
      },
      // An options object MUST be passed (even if it's empty)
      {});

    expect(schema).to.deep.equal({
      foo: {
        minLength: 1
      },
      bar: {
        minLength: 1
      },
      baz: {
        minLength: 4
      },
    });
  });

An improved implementation could look as follows

  if (refKeys.length > 1) {
      const value = {};
      for (let key of refKeys) {
        if (key === "$ref") {
          Object.assign(value, cache.value);
        }
        else {
          value[key] = $ref[key];
        }
      }

      return {
        circular: cache.circular,
        value,
      };
    }

This is questionable, however, since, if I am not mistaken, until draft 2019-09 you couldn't place $refs to any other properties, but as of 2019-09 you can and IIRC they are merged similarly to YAML merge keys.
I took at look at the spec, but couldn't really find any information regarding the order of merging.
Perhaps @philsturgeon has a clue on this one?

lib/dereference.js Show resolved Hide resolved
Co-authored-by: Jakub Rożek <jakub@rozek.tech>
@steverice
Copy link

FWIW this resolves the same issue that I have with an OpenAPI schema containing many (hundreds) of refs.
@P0lip @philsturgeon is this on track to be merged soon?

@P0lip
Copy link
Member

P0lip commented Dec 9, 2020

@steverice there is one (minor) suggestion left to be addressed.
I think we can merge it as is and I could apply the comment later on. It's non-blocking.

@P0lip P0lip merged commit 13b0092 into APIDevTools:master Dec 9, 2020
@tahpot
Copy link

tahpot commented Dec 10, 2020

Thanks to all for the work on this. Fantastic!

@Eli-Black-Work
Copy link

@P0lip Can we get a new release on NPM? 🙂 (Beta would be fine)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue
6 participants