Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java - Move method references for ParsableFactory to model types to reduce size #3152

Closed
papegaaij opened this issue Aug 15, 2023 · 14 comments
Closed
Assignees
Labels
enhancement New feature or request Java
Milestone

Comments

@papegaaij
Copy link
Contributor

Each unique method reference seems to add about 500 bytes to a class file. Using, for example, ODataError::createFromDiscriminatorValue in almost every request builder will cost a lot. Moving these to the model types allows for reuse from different request builder classes. A quick count showed that this might save as much as 5 to 10% on the total size of the jar.

@baywet
Copy link
Member

baywet commented Aug 15, 2023

I'm not sure I follow the change here, could you please provide examples?

@baywet baywet self-assigned this Aug 15, 2023
@baywet baywet added this to the Backlog milestone Aug 15, 2023
@baywet baywet added this to Kiota Aug 15, 2023
@github-project-automation github-project-automation bot moved this to Todo in Kiota Aug 15, 2023
@papegaaij
Copy link
Contributor Author

For ODataError, the change looks like this:
Add this method to ODataError:

public static com.microsoft.kiota.serialization.ParsableFactory<ODataError> factory() {
    return ODataError::createFromDiscriminatorValue;
}

and on all places where ODataError::createFromDiscriminatorValue was used, change it to ODataError.factory(), like this:

final HashMap<String, ParsableFactory<? extends Parsable>> errorMapping = new HashMap<String, ParsableFactory<? extends Parsable>>();
errorMapping.put("4XX", ODataError.factory());
errorMapping.put("5XX", ODataError.factory());

Do the same for all other createFromDiscriminatorValue methods.

I also tried reducing the size on the side of ODataError even more, by removing the createFromDiscriminatorValue and use this, but that didn't help much (only a few bytes):

public static com.microsoft.kiota.serialization.ParsableFactory<ODataError> factory() {
    return parseNode -> new ODataError();
}

@baywet
Copy link
Member

baywet commented Aug 15, 2023

So this

final HashMap<String, ParsableFactory<? extends Parsable>> errorMapping = new HashMap<String, ParsableFactory<? extends Parsable>>();
errorMapping.put("4XX", ODataError.factory());
errorMapping.put("5XX", ODataError.factory());

instead of

final HashMap<String, ParsableFactory<? extends Parsable>> errorMapping = new HashMap<String, ParsableFactory<? extends Parsable>>();
errorMapping.put("4XX", ODataError::createFromDiscriminatorValue);
errorMapping.put("5XX", ODataError::createFromDiscriminatorValue);

I'm not sure how that saves byte code since we're introducing a new method in between?
Do you know whether java compilation is adding an anonymous class or something similar for this kind of notation style? ODataError::createFromDiscriminatorValue (kind of like for anonymous classes being added for object initialization)

@papegaaij
Copy link
Contributor Author

The Java compiler adds what's called bootstrap methods for these method references. These methods use some very large signatures, which will end up in de constant pool of the class. A bootstrap method will be reused when the same method reference is used multiple times and some of the data is shared between multiple bootstrap methods, so the biggest gain is if you remove all method references from a class. It may even be beneficial to move all method references to a single class, but I haven't tried that.

To get an impression on what's in a class file, you can use javap -v. If you use that, you can also see that the nullable annotations take quite some space.

@papegaaij
Copy link
Contributor Author

papegaaij commented Aug 16, 2023

This is what those bootstrap methods actually look like in the byte code (as you can see, the signatures are quite large):

    BootstrapMethods:
      0: #335 REF_invokeStatic java/lang/invoke/LambdaMetafactory.metafactory:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
        Method arguments:
          #342 (Lcom/microsoft/kiota/serialization/ParseNode;)Lcom/microsoft/kiota/serialization/Parsable;
          #344 REF_invokeStatic apisdk/models/odataerrors/ODataError.createFromDiscriminatorValue:(Lcom/microsoft/kiota/serialization/ParseNode;)Lapisdk/models/odataerrors/ODataError;
          #342 (Lcom/microsoft/kiota/serialization/ParseNode;)Lcom/microsoft/kiota/serialization/Parsable;
      1: #335 REF_invokeStatic java/lang/invoke/LambdaMetafactory.metafactory:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
        Method arguments:
          #342 (Lcom/microsoft/kiota/serialization/ParseNode;)Lcom/microsoft/kiota/serialization/Parsable;
          #351 REF_invokeStatic apisdk/models/ServicePrincipal.createFromDiscriminatorValue:(Lcom/microsoft/kiota/serialization/ParseNode;)Lapisdk/models/ServicePrincipal;
          #357 (Lcom/microsoft/kiota/serialization/ParseNode;)Lapisdk/models/ServicePrincipal;

@baywet
Copy link
Member

baywet commented Aug 16, 2023

Thanks for the additional information.
Basing calculation on

Each unique method reference seems to add about 500 bytes to a class file

We have on average at least 3 of those per operation:

  • 2 error mappings
  • 1 return type mapping (that doesn't account for models that refer to other models)

Which we could reduce down to 1 of those per referenced type

  • 1 total for the error mappings
  • the savings depend on how many times is the model used as a return type, let's assume the worst case only once for simplicity

Which means we'd reduce by at least 2 (error mappings) x 20k operations x 500 bytes ~= 20MB for Graph v1. (currently sitting at 43MB). This sounds like a lot so I'd like to make sure:

  • I understood your findings correctly, please call out any flaw in the reasoning
  • The base hypothesis of 500 bytes per reference is correct, can you double check that aspect please

@papegaaij
Copy link
Contributor Author

You are mostly correct. Identical method references are shared, so most operations have 2 (1 for errors and 1 for the response). Multiple operations in 1 file still have 2.

Every unique method reference is indeed about 500 bytes. This very simple test compiles to 1332 bytes:

public class Test {
    public static void main(String[] args) throws Exception {
        java.util.List.of(1, 2, 3).stream().map(Object::toString).toList();
    }
}

When Object::toString is replaced with null, it compiles to 708 bytes, a difference of 524 bytes. Adding a second method reference to the first example adds 84 bytes. Note that the size of a method reference is greatly influenced by the size of the signature of the method involved and whether these signatures can be reused or not.

I think the most efficient way by far would be to generate 1 class with all factory methods for ParsableFactories for all types. So for every X.createFromDiscriminatorValue method, there will be an xFactory method in this class. This will look like this:

public class ParsableFactories {
    public static com.microsoft.kiota.serialization.ParsableFactory<X> xFactory() {
        return X::createFromDiscriminatorValue;
    }
}

I count 17265 unique (per file) references to ::createFromDiscriminatorValue over 9471 files and 3594 createFromDiscriminatorValue methods. Given the size of the signatures, I would say the first reference would add about 650 bytes and any subsequent reference about 250. This would give a total size of all references at this moment of 8104650 bytes.

After moving them to a single class, this class will be about 1.4MB (taking 400 bytes per method). The calls to the factory methods will probably take about 100 bytes each (per unique call), adding back another 1.7MB. The total savings will then be about 5MB in class size, which will be about 2MB in jar file size.

@papegaaij
Copy link
Contributor Author

One thing you need to take into account with this solution is the maximum size of the constant pool of a class: 65535 entries. A quick test shows that every method will probably take 9 entries, which will limit this to a maximum of about 7000 types. This is still twice as much as needed for graph, but something you might want to keep in the back of your mind.

@baywet
Copy link
Member

baywet commented Aug 17, 2023

Thanks for the detailed analysis here. This would be doable but would require introducing a new piece of infrastructure.
Let's see where we get once #3149 and #3150 are implemented and review then.

@baywet
Copy link
Member

baywet commented Aug 31, 2023

Update: the jar file only decreased 2MB with the changes we've implemented above
https://oss.sonatype.org/content/repositories/snapshots/com/microsoft/graph/microsoft-graph/6.0.0-SNAPSHOT/microsoft-graph-6.0.0-20230815.175622-33.jar
https://oss.sonatype.org/content/repositories/snapshots/com/microsoft/graph/microsoft-graph/6.0.0-SNAPSHOT/microsoft-graph-6.0.0-20230830.230429-35.jar

This is consistent with the early estimates you had provided, moving the factory references would most likely result in a 5-10% reduction ~2 to 4MB. While appreciable, this is not going to be enough to allow supporting both a sync and an async API.

@papegaaij
Copy link
Contributor Author

Good to hear this helped to reduce the size of the SDKs. I must say I was a bit surprised myself to find out that method references take so much space.

@baywet baywet added the Java label Nov 3, 2023
@baywet baywet removed this from the Backlog milestone Nov 3, 2023
@baywet baywet added this to the Kiota v1.9 milestone Nov 3, 2023
@baywet baywet assigned ramsessanchez and unassigned baywet Nov 3, 2023
@baywet baywet added enhancement New feature or request and removed question Needs: Attention 👋 labels Nov 3, 2023
@baywet
Copy link
Member

baywet commented Nov 3, 2023

Assigning to @ramsessanchez as we're looking at wrapping Java up.

To recap, in the models we want an additional method:

// notice the unique name to avoid conflict in inheritance
public static com.microsoft.kiota.serialization.ParsableFactory<ODataError> oDataErrorfactory() {
    return ODataError::createFromDiscriminatorValue;
}

and in the request builders we want the following changes

final HashMap<String, ParsableFactory<? extends Parsable>> errorMapping = new HashMap<String, ParsableFactory<? extends Parsable>>();
-errorMapping.put("4XX", ODataError::createFromDiscriminatorValue);
-errorMapping.put("5XX", ODataError::createFromDiscriminatorValue);
+errorMapping.put("4XX", ODataError.oDataErrorfactory());
+errorMapping.put("5XX", ODataError.oDataErrorfactory());
-return this.requestAdapter.sendAsync(requestInfo, MessageCollectionResponse::createFromDiscriminatorValue, errorMapping);
+return this.requestAdapter.sendAsync(requestInfo, MessageCollectionResponse.messageCollectionResponseFactory(), errorMapping);

And before this change gets merged in, we want to validate we achieved a size decrease of the jar (should be about 2MB)

@baywet
Copy link
Member

baywet commented Nov 3, 2023

Also for reference, the previous version of the Graph SDK (different generator), is only 11MB. This new version supports much more API surface and pattern of course, and we expect it to get smaller with that change as well as because we'll be stripping the completable futures.

@baywet
Copy link
Member

baywet commented Dec 7, 2023

With the recent code reduction changes we've made, we've been able to reduce the full service library from 111MB down to 46MB. (and a few hundred operations were added during that time, plus new capabilities)
While 46MB is not great for mobile scenarios, it's acceptable for server side scenarios.
Also, people who care about the size, can perfectly generate their own client with only the endpoints they need, which will reduce the jar further down to 1MB or lower.
This was initially suggested in a scenario we'd provide both a sync and an async API, we've now decided on sync only.
Closing.

@baywet baywet closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Kiota Dec 7, 2023
@baywet baywet modified the milestones: Kiota v1.10, Backlog Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Java
Projects
Archived in project
Development

No branches or pull requests

3 participants