Implements collections of bundles. #1702

Merged
merged 9 commits into from
Aug 13, 2024

cmnbroad
Collaborator

Also contains some commits with minor updates to take advantage of Java 17 features such as text blocks.

@cmnbroad cmnbroad mentioned this pull request Jan 23, 2024
Member

@lbergelson lbergelson left a comment

A few little issues.

I recommend we change from returning bundle Collection to bundle List. It's rare we actually want an unsorted bag of file names.

@@ -38,7 +38,9 @@
public class Bundle implements Iterable<BundleResource>, Serializable {
private static final long serialVersionUID = 1L;

private final Map<String, BundleResource> resources = new LinkedHashMap<>();
// don't use LinkedHashMap here; using HashMap resolves unnatural resource ordering issues that arise
Member

I'm not sure I follow what's going on here when you have a LinkedHashMap. Shouldn't a serialized LinkedHashMap deserialize to the same map?

Collaborator Author

Hm. I was having problems when roundtripping these through JSON, but I can no longer reproduce the issue, so I'm reverting to our beloved LinkedHashMap.

Member

Long Live LinkedHashMap!

Collaborator Author

So I realized what the issue was here. JSONObject internally uses a HashMap (implying, I think, that JSON doesn't preserve the serialized order of JSON attributes), so when you roundtrip through JSON, the iteration order from JSON is based on the HashMap order. If we use LinkedHashMap in Bundle, then the order after a roundtrip gets scrambled, and tests fail (but only for some cases because sometimes the roundtrip order matches and sometimes it differs).
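A minimal sketch of the failure mode described above, assuming only that the JSON layer stores properties in a plain HashMap, as the comment describes for JSONObject (RFC 8259 likewise defines a JSON object as an unordered collection of name/value pairs). The class, method names, and resource keys are hypothetical, and the JSON layer is simulated with a HashMap rather than a real JSON library. Whether the key order survives depends on how the particular keys hash, which fits the intermittent test failures; the entries themselves always survive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundTripOrderDemo {
    // Hypothetical stand-in for a JSON roundtrip: the intermediate HashMap plays the role
    // of the JSON object, so the rebuilt LinkedHashMap inherits HashMap iteration order
    // rather than the original insertion order.
    static Map<String, String> roundTrip(Map<String, String> original) {
        Map<String, String> jsonLike = new HashMap<>(original);
        return new LinkedHashMap<>(jsonLike);
    }

    static List<String> keyOrder(Map<String, String> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Illustrative content types and paths, inserted in a deliberate order.
        Map<String, String> resources = new LinkedHashMap<>();
        resources.put("ALIGNED_READS", "reads.bam");
        resources.put("READS_INDEX", "reads.bai");
        resources.put("REFERENCE", "ref.fasta");

        Map<String, String> after = roundTrip(resources);
        // May be true or false, depending on how these keys hash.
        System.out.println("order preserved: " + keyOrder(resources).equals(keyOrder(after)));
        // Map.equals compares entries only, so this is true regardless of order.
        System.out.println("same entries: " + resources.equals(after));
    }
}
```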

Member

Ahah, that makes sense. Seems like a weird decision to not maintain internal order, but it's good to know about. Can we change our tests to use an order-independent comparison?

Collaborator Author

I added a resource-order-independent equals method for use in the tests.
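A sketch of what a resource-order-independent comparison could look like. `SimpleBundle`, `Resource`, and `equalsIgnoringResourceOrder` are hypothetical stand-ins, not the htsjdk classes or the method actually added in this PR. The sketch assumes, as in a bundle keyed by content type, that no resource appears twice, so comparing sizes plus sets is sufficient.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

public final class OrderIndependentEquality {
    // Hypothetical stand-ins for a bundle and its resources; the real classes differ.
    record Resource(String contentType, String path) {}
    record SimpleBundle(String primaryContentType, List<Resource> resources) {}

    // Equal if the primary content type matches and both bundles hold the same resources,
    // regardless of encounter order. Assumes resources are unique within a bundle.
    static boolean equalsIgnoringResourceOrder(SimpleBundle a, SimpleBundle b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        return Objects.equals(a.primaryContentType(), b.primaryContentType())
                && a.resources().size() == b.resources().size()
                && new HashSet<>(a.resources()).equals(new HashSet<>(b.resources()));
    }
}
```

The record's generated `equals` compares the resource lists element by element, so it stays order-sensitive; the helper is only meant for tests where encounter order is known to be unreliable.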

public static final String JSON_SCHEMA_NAME = "htsbundle";
public static final String JSON_SCHEMA_VERSION = "0.1.0"; // TODO: bump this to 1.0.0

final private static Set<String> TOP_LEVEL_PROPERTIES = Collections.unmodifiableSet(
new HashSet<String>() {
new HashSet<>() {
Member

Can these static declarations be migrated to use Set.of() instead of these anonymous classes?

Collaborator Author

Oh yeah, that's an improvement. Done.
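For reference, the kind of migration being suggested, sketched with illustrative property names (only `primary` appears in this diff; the others are assumptions). `Set.of` (Java 9+) returns an unmodifiable set directly, so neither the anonymous `HashSet` subclass nor the `Collections.unmodifiableSet` wrapper is needed. Note that `Set.of` rejects duplicate elements and nulls, and its iteration order is unspecified, just as with `HashSet`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class TopLevelProperties {
    // Before: "double-brace" initialization defines an anonymous HashSet subclass,
    // which then has to be wrapped to make it unmodifiable.
    static final Set<String> OLD_STYLE = Collections.unmodifiableSet(
            new HashSet<String>() {{
                add("schemaName");
                add("schemaVersion");
                add("primary");
            }});

    // After: Set.of builds an unmodifiable set with no anonymous class.
    static final Set<String> NEW_STYLE = Set.of("schemaName", "schemaVersion", "primary");
}
```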

final Collection<Bundle> bundles = toBundleCollection(jsonString, ioPathConstructor);
if (bundles.size() > 1) {
throw new IllegalArgumentException(
String.format("A JSON string with more than one bundle was provided but only a single Bundle is allowed",
Member

The format string is missing the template variables.

Collaborator Author

Oh, fixed.
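This kind of bug is easy to miss because `String.format` silently ignores arguments that have no matching format specifier. A minimal sketch of a corrected check, assuming the message should report the bundle count; the class and method names are hypothetical, and the actual fix in the PR may word the message differently. It also uses `isEmpty()` for the no-bundle case, as suggested further down in the review.

```java
import java.util.List;

public class SingleBundleCheck {
    // Returns the only element of the list, rejecting empty or multi-element lists.
    static <T> T requireSingleBundle(List<T> bundles) {
        if (bundles.size() > 1) {
            // %d is the specifier that receives bundles.size(); without it the
            // argument would be silently dropped from the message.
            throw new IllegalArgumentException(String.format(
                    "A JSON string with %d bundles was provided but only a single Bundle is allowed",
                    bundles.size()));
        }
        if (bundles.isEmpty()) {
            throw new IllegalArgumentException("A JSON string with no bundles was provided");
        }
        return bundles.get(0);
    }
}
```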

}
}
if (bundles.size() < 1) {
Member

Nitpicky, but isEmpty() might be better.

Collaborator Author

Done.

final String primaryContentType = getRequiredPropertyAsString(jsonObject, JSON_PROPERTY_PRIMARY);
final Collection<BundleResource> bundleResources = toBundleResources(jsonObject, ioPathConstructor);
if (bundleResources.isEmpty()) {
LOG.warn("Empty resource bundle found in: ", jsonObject.toString());
Member

Should we allow empty bundles at all?

Collaborator Author

@cmnbroad cmnbroad Mar 12, 2024

Actually, we already don't. I'll remove this (the Bundle constructor will throw anyway), and add a negative test to BundleTest to demonstrate that.
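A sketch of constructor-level validation plus a negative test in miniature, along the lines described above. `NonEmptyBundle`, its checks, and `emptyBundleIsRejected` are hypothetical; htsjdk's `Bundle` constructor and `BundleTest` may differ in detail.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NonEmptyBundle {
    record Resource(String contentType, String path) {}

    private final Map<String, Resource> resources = new LinkedHashMap<>();
    private final String primaryContentType;

    // Rejects empty input up front, so a caller such as a JSON parser needs no
    // separate "empty bundle" warning path.
    NonEmptyBundle(String primaryContentType, Collection<Resource> resources) {
        if (resources == null || resources.isEmpty()) {
            throw new IllegalArgumentException("A bundle must contain at least one resource");
        }
        for (Resource r : resources) {
            this.resources.put(r.contentType(), r);
        }
        if (!this.resources.containsKey(primaryContentType)) {
            throw new IllegalArgumentException(
                    "No resource found for primary content type " + primaryContentType);
        }
        this.primaryContentType = primaryContentType;
    }

    String primaryContentType() {
        return primaryContentType;
    }

    // Negative test: constructing an empty bundle must throw.
    static boolean emptyBundleIsRejected() {
        try {
            new NonEmptyBundle("ALIGNED_READS", List.of());
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```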

return new IOPathResource(
ioPathConstructor.apply(getRequiredPropertyAsString(jsonObject, JSON_PROPERTY_PATH)),
contentType,
format == null ? null : format);
Member

I think this ternary is unnecessary since you're not dereferencing format here.

Collaborator Author

Fixed.

"}\n",
BundleResourceType.ALIGNED_READS,
Arrays.asList(BundleResourceTestData.readsWithFormat)
"""
Member

This is SO much nicer.
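For readers less familiar with the feature: a text block (standard since Java 15, so available under Java 17) lets multi-line test data such as JSON be written without escaped quotes or explicit `\n` terminators. A sketch comparing the two styles; the JSON content and property names are illustrative, not copied from the PR's test data. The common leading indentation, determined by the least-indented line including the closing delimiter, is stripped, and because the closing `"""` sits on its own line the resulting string ends with a newline.

```java
public class TextBlockDemo {
    // Before: JSON test data built by concatenating escaped string literals.
    static final String CONCATENATED =
            "{\n" +
            "  \"schemaName\": \"htsbundle\",\n" +
            "  \"primary\": \"ALIGNED_READS\",\n" +
            "  \"ALIGNED_READS\": {\"path\": \"reads.bam\", \"format\": \"BAM\"}\n" +
            "}\n";

    // After: the same JSON as a text block; incidental indentation is removed by the compiler.
    static final String TEXT_BLOCK = """
            {
              "schemaName": "htsbundle",
              "primary": "ALIGNED_READS",
              "ALIGNED_READS": {"path": "reads.bam", "format": "BAM"}
            }
            """;
}
```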

@@ -30,11 +32,12 @@ public class BundleJSON {
public static final String JSON_PROPERTY_PRIMARY = "primary";
public static final String JSON_PROPERTY_PATH = "path";
public static final String JSON_PROPERTY_FORMAT = "format";

public static final String JSON_SCHEMA_NAME = "htsbundle";
public static final String JSON_SCHEMA_VERSION = "0.1.0"; // TODO: bump this to 1.0.0
Member

Should this be 0.2.0 now?

Collaborator Author

Will update the schema in a separate PR when I move the bundle classes out of the beta package.

* @return Collection<Bundle>
* @param <T> IOPath-derived class to use for IOPathResources
*/
public static <T extends IOPath> Collection<Bundle> toBundleCollection(
Member

Do we really want this to be Collection and not something with an order? I would think List might be better. The input order is usually important in some way: changing the encounter order can change which errors are generated, or the result of floating-point aggregations, even when the operation seems like it should be agnostic to file order.

Collaborator Author

Changed to Lists.
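The floating-point point above is easy to demonstrate: double addition is not associative, so summing the same values in a different encounter order can change the result. A minimal sketch; `sumInOrder` is a hypothetical helper, and the values are chosen only to make the rounding visible (near 1e16 adjacent doubles are 2 apart, so adding 1.0 rounds back to 1e16).

```java
import java.util.List;

public class EncounterOrderDemo {
    // Sums values in the list's encounter order. A List makes that order part of the
    // contract; a general Collection (e.g. one backed by a HashSet) would not.
    static double sumInOrder(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same three values, two different orders.
        System.out.println(sumInOrder(List.of(1e16, 1.0, -1e16))); // prints 0.0: the 1.0 is lost against 1e16
        System.out.println(sumInOrder(List.of(1e16, -1e16, 1.0))); // prints 1.0
    }
}
```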

@cmnbroad cmnbroad force-pushed the cn_bundle_updates branch 2 times, most recently from 437912b to 1be2b85 on April 8, 2024 21:02
@cmnbroad
Collaborator Author

@lbergelson I updated the names and organization of the bundle content type strings in this, which is why there are now so many changes.

Member

@lbergelson lbergelson left a comment

@cmnbroad Looks good to me. I'm not 100% sure why you want to decouple the content type names from the enums, but that seems fine.

@cmnbroad cmnbroad merged commit 204a0db into master Aug 13, 2024
4 checks passed
@cmnbroad cmnbroad deleted the cn_bundle_updates branch August 13, 2024 22:00
2 participants