Skip to content

Conversation

@pfefferle
Copy link
Member

While debugging the Outbox-Scheduler, I realized that the Announce Outbox Items had the wrong _activitypub_object_id. It was the Activity ID instead of the original Object ID of the post/comment/...

This PR checks recursively for the deepest nested ID possible. This way we always have the Object ID which is needed to invalidate Outbox-Activities.

Proposed changes:

  • Properly search for the ID in nested Activities/Objects

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

  • Write a new Post
  • Check the Outbox-Items

Before the PR, the Announce Activity would use the ID of the Create Activity instead of the Object ID of the Post.

Changelog entry

  • Automatically create a changelog entry from the details below.
Changelog Entry Details

Significance

  • Patch
  • Minor
  • Major

Type

  • Added - for new features
  • Changed - for changes in existing functionality
  • Deprecated - for soon-to-be removed features
  • Removed - for now removed features
  • Fixed - for any bug fixes
  • Security - in case of vulnerabilities

Message

Support multiple layers of nested Outbox activities when searching for the Object ID.

@pfefferle pfefferle self-assigned this Mar 28, 2025
@pfefferle pfefferle added the [Type] Bug Something isn't working label Mar 28, 2025
@pfefferle pfefferle requested a review from a team March 28, 2025 08:59
pfefferle and others added 5 commits March 28, 2025 14:56
* Update with activities

* Remove comments as the function gets used recursively now

---------

Co-authored-by: Konstantin Obenland <obenland@gmx.de>
@pfefferle pfefferle requested a review from obenland March 31, 2025 08:17
@pfefferle pfefferle requested a review from a team March 31, 2025 14:55
@pfefferle
Copy link
Member Author

@obenland I merged your PR (thanks for the input) and improved the set_object function of the Activity.

@obenland
Copy link
Member

ade7adc and ccd8c2a feel unrelated to the original premise of this PR, unless I'm missing something?

I'm also not sure it's needed, necessarily? Is it used anywhere yet?
If we want class-specific initialization, why can't we call the the specific class directly instead of relying on Activity->set_object to handle that? Feels like that would provide more clarity.

@pfefferle
Copy link
Member Author

@obenland this is to ensure that an object is always an object and not accidentally an array. It is used by every init_from_array that means by a lot of transformers and by the dispatcher, to migrate the db data to a proper object tree. So if we do not fill the objects properly (Generic_Object does not transform anything on set_object) we would need the fallback I had before.

@pfefferle
Copy link
Member Author

There are still places that needs to be optimized, but I think this provides a solid foundation to ensure objects.

@obenland
Copy link
Member

this is to ensure that an object is always an object and not accidentally an array

Does this part not do that?

So if we do not fill the objects properly

Are there any instances where we don't do that? If there are, we should fix them specifically. Creating a big fallback for an eventuality, I think, would do us a disservice as it removes any confidence in data integrity in the future.

Objects are not spec'd to contain other objects, only activities can be nested (according to my current understanding of things, please correct me if I'm wrong). Let's make make sure we deal with Activities as appropriate so this is not needed.

@pfefferle
Copy link
Member Author

pfefferle commented Mar 31, 2025

This is true if you see only one level of inheritance... We currently have the tree of

Activity (Announce) > Activity (Create) > Object (Note)

The Generic_Object has no special handling for set_object, so the object might end up as an array. Things like this are very risky to produce artefacts like that: https://github.com/Automattic/wordpress-activitypub/pull/1515/files#diff-2b79b0caa05a137854d850f948bf52e99bf50ad9636c834bc300894dd0b49a5eR84

@pfefferle
Copy link
Member Author

Are there any instances where we don't do that? If there are, we should fix them specifically. Creating a big fallback for an eventuality, I think, would do us a disservice as it removes any confidence in data integrity in the future.

This is not eventually. Every Activity that comes through the inbox is an array and we need something to properly transform these, if we do not want to have array handlers as fallbacks everywhere.

@pfefferle
Copy link
Member Author

pfefferle commented Mar 31, 2025

This code example:

<?php

$test = array(
	'type' => 'Announce',
	'id' => 'https://example.com/test',
	'object' => array(
		'id' => 'https://example.com/test2',
		'type' => 'Create',
		'object' => array(
			'id' => 'https://example.com/test3',
			'type' => 'Note',
			'content' => 'Hello, world!',
		),
	)
);

$object = \Activitypub\Activity\Activity::init_from_array( $test );

echo '<pre>';
var_dump( $object );
echo '</pre>';

currently returns:

object(Activitypub\Activity\Activity)#1327 (41) {
  ["id":protected]=>
  string(24) "https://example.com/test"
  ["type":protected]=>
  string(8) "Announce"
  ["object":protected]=>
  object(Activitypub\Activity\Generic_Object)#1353 (3) {
    ["id":protected]=>
    string(25) "https://example.com/test2"
    ["type"]=>
    string(8) "Activity"
    ["object"]=>
    array(3) {
      ["id"]=>
      string(25) "https://example.com/test3"
      ["type"]=>
      string(4) "Note"
      ["content"]=>
      string(13) "Hello, world!"
    }
  }
}

So the nested object is still an array!

With the change the result is:

object(Activitypub\Activity\Activity)#1332 (41) {
  ["id":protected]=>
  string(24) "https://example.com/test"
  ["type":protected]=>
  string(8) "Announce"
  ["object":protected]=>
  object(Activitypub\Activity\Activity)#1358 (41) {
    ["id":protected]=>
    string(25) "https://example.com/test2"
    ["type":protected]=>
    string(6) "Create"
    ["object":protected]=>
    object(Activitypub\Activity\Base_Object)#1352 (35) {
      ["id":protected]=>
      string(25) "https://example.com/test3"
      ["type":protected]=>
      string(4) "Note"
      ["attachment":protected]=>
      NULL
      ["attributed_to":protected]=>
      NULL
      ["audience":protected]=>
      NULL
      ["content":protected]=>
      string(13) "Hello, world!"
    }
  }
}

Copy link
Contributor

@mattwiebe mattwiebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of complexity! Could we not just run a migration on our Announces instead to fix the underlying issue?

@pfefferle
Copy link
Member Author

pfefferle commented Mar 31, 2025

@mattwiebe this is always an issue! this IS the migration we always have to do, because we get data through the inbox and have to transform it. This is not a backwards compatibility thing! This is to handle Activity-JSONs properly!

@pfefferle
Copy link
Member Author

pfefferle commented Mar 31, 2025

We use this mechanism quite often and if the result is not properly transformed, before we store it in the DB or if we do not process it properly by retrieving it from the DB, we will get issue when we strictly rely on objects:

https://github.com/search?q=repo%3AAutomattic%2Fwordpress-activitypub+%3A%3Ainit_from_array&type=code

@pfefferle
Copy link
Member Author

pfefferle commented Mar 31, 2025

Even Outbox::get_activity has this issue and does not properly nest the data from the db for the Announces of the Blog-User, and we are responsible for the whole path there!

Copy link
Member

@obenland obenland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the examples, that was helpful

@pfefferle pfefferle merged commit b022733 into trunk Mar 31, 2025
11 checks passed
@pfefferle pfefferle deleted the fix/outbox-object-id branch March 31, 2025 21:21
jsit added a commit to jsit/wordpress-activitypub that referenced this pull request Nov 1, 2025
* tag '5.6.1': (47 commits)
  Release 5.6.1 (Automattic#1542)
  Fix: Use specified time formatting for Outbox Activities (Automattic#1537)
  Fix: Default value for empty `movedTo` field (Automattic#1539)
  Fix: Post Interactions settings (Automattic#1540)
  Release 5.6.0 (Automattic#1534)
  lover significance (Automattic#1533)
  Scheduler: Don't federate Deletes of unfederated posts. (Automattic#1528)
  Move: use `$from` for object in `Move::externally` (Automattic#1531)
  Fix: Properly traverse the object hierarchy to get the ID (Automattic#1518)
  Mentions: Standardize around only displaying username (Automattic#1510)
  Tests: Reduce chance of time offsets (Automattic#1527)
  Import: Don't federate new attachments (Automattic#1526)
  Importer: Format import count number. (Automattic#1525)
  Move: Add a Mastodon importer (Automattic#1502)
  Move: use `$from` for object in `Move::internally` (Automattic#1516)
  Fix: Do not send self-replies (Automattic#1517)
  Fix: Delete Undone comments instead of trashing them (Automattic#1520)
  Dispatcher: Adhere to passed batch size. (Automattic#1514)
  Add interaction settings (Automattic#1395)
  Fix: Set proper default, to show Welcome screen by default (Automattic#1512)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants