Skip to content
This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

Refactor Integrate Asset #2898

Merged
merged 79 commits into from
Jul 13, 2022

Conversation

BigRoy
Copy link
Collaborator

@BigRoy BigRoy commented Mar 16, 2022

Brief description

Draft attempt at refactoring/rewriting some logic of the Integrator. My first step was to try and figure out the pieces that are currently putting it together to really figure out all logic that's there and clean up what really was redundant.

Note: The current state it not tested and 100% sure won't work as is. It's a WIP to initiate first discussion - and I'd be happy to revamp a lot more. But I hope to find some time tomorrow to improve readability of the code further so it's easier to understand what we are actually altering. A lot of it might also be "commenting" specific areas of the code.

Related topics

@BigRoy
Copy link
Collaborator Author

BigRoy commented Mar 16, 2022

Notable changes

Wanted to mark these changes specifically since they were obvious changes that I felt could break things. So naming them here to keep an eye out. But the refactor is so massive that it's likely a ton more.


Removed Slate exception:

            # exception for slate workflow
            if index_frame_start and "slate" in instance.data["families"]:
                index_frame_start -= 1

This "fix" should be moved to outside of the integrator itself.


Removed this logic for 'single file' processing for representations.
It will now always use name of the representation

            template_data["representation"] = repre['ext']

I believe this behavior actually was a bug because other areas of the code believed it was actually using name instead?


def register_subset(self, instance):
# todo: rely less on self.prepare_anatomy to create this value
asset = instance.data.get("assetEntity") # <- from prepare_anatomy :(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually quite amazed that after all of this refactoring on this very early draft this is really the only nitpick the hound has this time. 🥇

destination_indexes = list(src_collection.indexes)
destination_padding = len(get_first_frame_padded(src_collection))
if repre.get("frameStart") is not None:
index_frame_start = int(repre.get("frameStart"))

# TODO use frame padding from right template group
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkolar do you happen to know what the "right template group" would be? :) This is an old to do comment and I'm wondering if I could easily resolve this as I go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it's always getting frame_padding from render template. However different families can use their own templates for the publishing, hence there is a chance for this to be wrong. In practice, we never saw it happen though. Usually, studios use the same padding across the board.

For completeness, this is where you can choose what template will be used for publishing in various situations project_settings/global/publish/IntegrateAssetNew/template_name_profiles

Copy link
Collaborator Author

@BigRoy BigRoy Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - I can see the entries there - and I also see the Templates in Anatomy however it's not clear how using a custom template name I'd overwrite the frame padding value instead? I'll leave the todo for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think we planned to allow defining custom padding in each template category, but eventually removed it to reduce complexity. Looks like it would be safe to remove this TODO

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use anatomy.templates[template_name]["frame_padding"].

Copy link
Collaborator Author

@BigRoy BigRoy Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use anatomy.templates[template_name]["frame_padding"].

Not tested yet, but implemented with 3e095bc

"parent": version["_id"],
"name": repre['name'],
"data": data,
"dependencies": instance.data.get("dependencies", "").split(),
Copy link
Collaborator Author

@BigRoy BigRoy Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone confirm for me that this dependencies is unused/deprecated?

I believe it originated from the RigLoader in Maya which stored 'dependencies' on load. Oddly enough it's storing just the representation id of the actual rig that was loaded (which seems odd) - the value also didn't get updated on updates of the rigs.

It seems extra weird that the value on the instance wasn't just a list to begin with but expecting a concatenated string?

Anyway, with 'input dependencies' the source rig representation for a published alembic would already be retrievable and thus also more reliable than this old unused logic. @mkolar Correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe this is is not used anywhere anymore

@BigRoy
Copy link
Collaborator Author

BigRoy commented Mar 17, 2022

The current refactored state as of 56bcd8c now can actually integrate e.g. a simple Maya Look (with a texture) and a Maya Pointcache. So it's somewhat functional as before.

Likely Hero Versions are broken. Likely Reviews as well. Likely site sync functionality isn't completely functional either.

Also haven't tested what happens if a publish actually fails and whether the file transaction rolls back correctly. But separate "tests" can be implemented to define our expected behavior explicitly and adapt the FileTransaction class to behave as we'd want.

Nonetheless would love input at this stage @mkolar @iLLiCiTiT @m-u-r-p-h-y as to what I should continue to do from here.
I've mostly focused on breaking the code down into more understandable 'parts'.

I've noticed however that if representations fails that the Subset and Version are registered nonetheless. Would be extra nice if we could somehow even merge those transactions together so that we can write all of those even closer together in the code.

@BigRoy
Copy link
Collaborator Author

BigRoy commented Mar 23, 2022

With the current state being "barebones functional and highly untested" with "more than likely broken in major areas" I do feel it's a good moment to digest what the integrator currently does (and why - if I can) and note some of the changes made (and why).

What did the Integrator do that I have removed or have moved into its own

  • Separated registering "subset", "version", "representation" logic more into isolation where possible.
  • It previously prepared anatomy data which I mostly removed with this commit due to it being collected by CollectAnatomyContextData and CollectAnatomyInstanceData. It should not be the responsibility of the integrator.
  • I've moved the "site sync" functionality in such a way that it's more encapsulated and we can discuss how to best clean that up. The logic now comes down to defining the sites once (because they got cached originally anyway)
    • The available "sites" logic is here and only used by a single call to compute_resource_sync_sites here - other than that it's quite "contained".
    • Then those sites are used to prepare the file info which is the data Site Sync uses.

What does the Integrator currently do?

It runs per instance and for each instance it does this:

  1. Register the documents in the database
    1. Register the subset
    2. Register the version
    3. Register the representation - still quite messy
  2. Prepare data for template formatting to define publish file destinations
    1. To be moved, see comment in code: Quite some template data preparation for subset group
    2. Handle the template formatting to published files - still quite complex
  3. Integrate the files safely

Why still the complexity?

The complexity comes mostly from the fact that:


Is the Hound failing to run or how come I'm not making mistakes? I'm a bit worried the dog is sleeping and bites me later. 🔥

@BigRoy BigRoy added the type: refactor Structural changes not affecting functionality label Mar 24, 2022
@BigRoy
Copy link
Collaborator Author

BigRoy commented Mar 24, 2022

Few questions!

  1. Is it correct that the "archived_representation" type during the Integrator only got created 'during integration' and then directly deleted. Originally it would archive representations and then before registering the new representations delete all archived ones (existing_repres defined here). So if instead I'd turn the registering into a single Bulk Write I wouldn't need the intermediate "archived_representation". Correct?

  2. Can I just move the "intent" anatomy data to CollectAnatomyContextData? Intent is defined at the beginning, no? Since it's set directly in Pyblish Pype UI. Or would there be a reason not to?

  3. Could we move _get_subset_group into a global Collector of its own to just store instance.data["subsetGroup"]?

  4. Is it correct that some publishes might not have a rootless path stored due to instance.data["source"] being taken directly if it exists? Looking over the code base e.g. After Effects expects instance.data["source"] set in a validator and Maya VRayScene collector sets it and Maya Collect Render.

  5. Glancing over the original integrator code it looks to me as if site sync would not work for published Look textures - as those files not part of a representation do not get their file info sorted for site sync. For representation files they were originally collected here but I wasn't entirely sure how files directly in instance.data["transfers"] would be processed too. I'm pretty sure it's broken in the current refactored state but I'm also doubting whether it worked in the original. Anyone can confirm that published textures in a Maya look publish in Site Sync used to work as expected? Likely I'm misreading the original code somewhere due to how it overrides the global instance.data["transfers"]. E.g. maybe originally both representations would have the "resources" files added uniquely as site sync entries? If it does work I'd love to know whether this is the case that the entries are duplicated.

  6. Shouldn't version data fps rely on instance.data["fps"] over context.data["fps"]? Originally context data would override instance data. Is that as expected?

dst_collection.indexes.update(set(destination_indexes))
dst_collection.padding = destination_padding
assert len(src_collection.indexes) == \
len(dst_collection.indexes), "This is a bug"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAY! The hound works - and I was expecting is to bark about this line! 🐕‍🦺 ❤️

@mkolar
Copy link
Member

mkolar commented Apr 28, 2022

Just noting that this will need a merge of develop, that is removing avalon-core and some magic to make it compliant with that PR. Unfortunately the last step interfered with this.

# Conflicts:
#	openpype/plugins/publish/integrate_new.py
@BigRoy
Copy link
Collaborator Author

BigRoy commented May 2, 2022

Just noting that this will need a merge of develop, that is removing avalon-core and some magic to make it compliant with that PR. Unfortunately the last step interfered with this.

@mkolar Should be fixed.

Please let me know what you need on this PR from my end to make this testable by OP team.

@antirotor
Copy link
Member

Please let me know what you need on this PR from my end to make this testable by OP team.

I'll try to test it this week per battle plan. I'll let you know what happened :)

@BigRoy
Copy link
Collaborator Author

BigRoy commented May 11, 2022

Looking forward to hearing about the test runs!

@antirotor
Copy link
Member

antirotor commented May 13, 2022

So far so good!

I've tested:

Maya 2022

  • Animation
  • Ass StandIn
  • Assembly
  • Camera
  • Camera Rig
  • Layout
  • Look
  • Maya Scene
  • Model
  • Multiverse USD
  • Multiverse USD Composition
  • Multiverse USD Override
  • Point Cache
  • Redshift Proxy
  • Render
  • Render Setup Preset
  • Review
  • Rig
  • Set Dress
  • Unreal Skeletal Mesh
  • Unreal Static Mesh
  • VRay Proxy
  • VRay Scene
  • Xgen Interactive
  • Yeti Cache
  • Yeti Rig

Blender 3.0

  • Action
  • Animation
  • Camera
  • Layout
  • Model
  • Point Cache
  • Rig

Houdini 19

  • Camera
  • Arnold ASS
  • Composite (Image Sequence)
  • Houdini Digital Asset (Hda)
  • Point Cache
  • Redshift ROP
  • USD
  • USD Render
  • VDB Cache

@BigRoy
Copy link
Collaborator Author

BigRoy commented May 16, 2022

So far so good!

Nice! We should actually add the following to testing too:

Preferably each should be tested for publishes with a single file, multiple files, sequences and publishes with resources like Maya lookdev

  • Hero versions
  • Site Sync
  • A failing publish (e.g. Failing Extractor - does the Integrator fail nicely?)
  • Farm Publishing (e.g. Maya Renders with Deadline)
  • A publish that writes into an existing version (e.g. Maya Render re-publish with different frame range -> customize frame range in Deadline)
  • A Python 2 host

@antirotor
Copy link
Member

Would you be so kind to split it in separate integrator so we can slowly add hosts there for better testing? Because I have a feeling that for Maya, this can be used right away (for example).

Comment on lines 95 to 102
# todo: some code actually expects the dict itself and others doesn't
# question: what should it be?
intent = context.data.get("intent")
if intent and isinstance(intent, dict):
intent = intent.get("value")
if intent:
context_data["intent"] = intent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent should be a dictionary with "value" and "label", to be able tell if you want use value or label of the intent in templates.

Suggested change
# todo: some code actually expects the dict itself and others doesn't
# question: what should it be?
intent = context.data.get("intent")
if intent and isinstance(intent, dict):
intent = intent.get("value")
if intent:
context_data["intent"] = intent

Copy link
Collaborator Author

@BigRoy BigRoy Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed with recent commit. However, this logic with the if statement still feels like we should somehow correct other logic elsewhere so that we know it's always a dict to begin with.

@mkolar
Copy link
Member

mkolar commented Jul 5, 2022

@BigRoy could you please separate the integrator into another file? that way we could actually merge it and start moving host by host to it. I've tested quite a bit in maya and the chances are if it works there it's good in other places, However, to be able to deply it into production I think we should expose the family lists of both the existing and the re-worked integrator to the settings temporarily. That way if something goes wrong with the new one, we can easily reassign a family to the old one and unblock production situation.

@BigRoy
Copy link
Collaborator Author

BigRoy commented Jul 5, 2022

Sorry, I've been so busy with day to day production that this keeps slipping through. Beginning of new day now, let's start! :) On it.

@BigRoy BigRoy requested a review from mkolar July 5, 2022 08:35
@BigRoy
Copy link
Collaborator Author

BigRoy commented Jul 5, 2022

@mkolar Could you check if I missed anything crucial?

@mkolar
Copy link
Member

mkolar commented Jul 5, 2022

Thx. gonna try now

@mkolar
Copy link
Member

mkolar commented Jul 13, 2022

@BigRoy This should probably still be addressed #2898 (comment)

  • adding the hosts and families into the settings so we can safely switch between the integrators. To move it along. I'll merge this into an intermediate branch, where we can finish it off right away and get it inot develop for some long term testing.

@mkolar mkolar changed the base branch from develop to feature/refactor_integrator July 13, 2022 14:22
@mkolar mkolar merged commit f2ce000 into ynput:feature/refactor_integrator Jul 13, 2022
@mkolar mkolar mentioned this pull request Jul 18, 2022
@BigRoy BigRoy deleted the refactor_integrator branch March 20, 2024 15:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: refactor Structural changes not affecting functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants