This repository was archived by the owner on Jul 16, 2025. It is now read-only.

feat!: add UUID to all messages #364

Merged
merged 1 commit into from
Jun 30, 2025

OskarStark
Contributor

@OskarStark OskarStark commented Jun 29, 2025

Summary

  • Added unique identifiers (UUIDv7) to all message types
  • Each message now automatically gets a unique ID upon instantiation
  • Added comprehensive tests for the new ID functionality

Breaking Change 🚨

This is a breaking change as MessageInterface now requires the getId(): Uuid method to be implemented by all message classes.

Implementation Details

  • Added symfony/uid package dependency (^6.4 || ^7.1)
  • Added public readonly Uuid $id property to all message classes
  • IDs are generated automatically in constructors using Uuid::v7()
  • Added getId() method to all message implementations

Why UUID v7?

UUID v7 offers significant advantages over other UUID versions:

  • Time-ordered: Natural chronological sorting without additional timestamp fields
  • Millisecond precision: Captures creation time with high accuracy
  • Better database performance: Sequential nature improves B-tree index locality
  • Globally unique: No coordination needed between distributed systems
  • Extractable timestamp: Creation time can be retrieved from the ID itself
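
The time-ordering and extractable-timestamp properties follow from the UUID v7 bit layout in RFC 9562: the first 48 bits hold the Unix time in milliseconds. As a rough illustration, here is a minimal plain-PHP sketch of that layout. It is not the symfony/uid implementation, and the function names `uuidV7Sketch` and `uuidV7MillisSketch` are hypothetical, chosen only for this example.

```php
<?php

declare(strict_types=1);

/**
 * Builds a UUID v7 string using the RFC 9562 layout:
 * 48-bit Unix time in milliseconds, 4-bit version, 12 random bits,
 * 2-bit variant, 62 random bits.
 * Illustrative only; not how symfony/uid generates its IDs internally.
 */
function uuidV7Sketch(?int $unixMillis = null): string
{
    $unixMillis ??= (int) floor(microtime(true) * 1000);

    $bytes = random_bytes(16);

    // Overwrite the first 6 bytes with the big-endian millisecond timestamp.
    $timeHex = str_pad(dechex($unixMillis), 12, '0', STR_PAD_LEFT);
    $bytes = hex2bin($timeHex) . substr($bytes, 6);

    // Version 7 goes into the high nibble of byte 6.
    $bytes[6] = chr((ord($bytes[6]) & 0x0F) | 0x70);
    // The RFC variant (binary 10) goes into the two high bits of byte 8.
    $bytes[8] = chr((ord($bytes[8]) & 0x3F) | 0x80);

    $hex = bin2hex($bytes);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

/** Reads the creation time (Unix milliseconds) back out of a UUID v7 string. */
function uuidV7MillisSketch(string $uuid): int
{
    return (int) hexdec(substr(str_replace('-', '', $uuid), 0, 12));
}

// Two IDs created one millisecond apart.
$earlier = uuidV7Sketch(1751240712123);
$later = uuidV7Sketch(1751240712124);

// Because the timestamp leads the string, plain string comparison follows creation order.
var_dump(strcmp($earlier, $later) < 0);
var_dump(uuidV7MillisSketch($earlier));
```

Note that ordering between two IDs created within the same millisecond depends on how the generator fills the remaining bits; this sketch uses plain random bits and makes no ordering guarantee within one millisecond.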

Practical Example

$message = new UserMessage(new Text('Hello'));
$timestamp = $message->getId()->getDateTime(); // Returns \DateTimeImmutable
echo $timestamp->format('Y-m-d H:i:s.u'); // e.g., "2025-06-29 23:45:12.123000" (UUID v7 stores milliseconds, so the last three digits are zero)

Test Coverage

Added tests for each message type to ensure:

  • ID is properly generated and accessible
  • ID remains consistent for the same message instance
  • Different message instances have different IDs
  • Messages with identical content still receive unique IDs
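
The checks listed above can be sketched without the library. The example below is only an illustration: `SketchId`, `SketchMessageInterface`, and `SketchUserMessage` are hypothetical stand-ins (the real classes use `Symfony\Component\Uid\Uuid`, and the real tests live in the PR), and the stand-in ID is plain random hex rather than a UUID v7. It assumes PHP 8.1 or later for readonly properties.

```php
<?php

declare(strict_types=1);

// Stand-in for Symfony\Component\Uid\Uuid so the sketch runs without Composer.
final class SketchId
{
    private function __construct(private readonly string $value)
    {
    }

    public static function generate(): self
    {
        // Random hex only; the PR itself uses Uuid::v7().
        return new self(bin2hex(random_bytes(16)));
    }

    public function equals(self $other): bool
    {
        return $this->value === $other->value;
    }
}

// Hypothetical, reduced version of the message contract described in the PR.
interface SketchMessageInterface
{
    public function getId(): SketchId;
}

final class SketchUserMessage implements SketchMessageInterface
{
    public readonly SketchId $id;

    public function __construct(public readonly string $content)
    {
        // The ID is assigned once, at construction time.
        $this->id = SketchId::generate();
    }

    public function getId(): SketchId
    {
        return $this->id;
    }
}

$first = new SketchUserMessage('Hello');
$second = new SketchUserMessage('Hello');

// Same instance: the ID is stable across calls.
var_dump($first->getId()->equals($first->getId())); // bool(true)

// Identical content, different instances: each one received its own ID.
var_dump($first->getId()->equals($second->getId()));
```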

Closes #77
Closes #344

🤖 Generated with Claude Code

@OskarStark OskarStark changed the title feat\!: add unique ID to all messages feat!: add UUID to all messages Jun 29, 2025
@OskarStark OskarStark force-pushed the feat/message-uid-clean branch from 7f5b2f8 to 54c82db Compare June 29, 2025 21:23
@OskarStark OskarStark added the BC BREAK Backwards compatibility break label Jun 29, 2025
@OskarStark OskarStark force-pushed the feat/message-uid-clean branch from b8c85bd to 5191313 Compare June 29, 2025 21:26
@OskarStark
Contributor Author

OskarStark commented Jun 30, 2025

@chr-hertel I have a question regarding the normalized message format:

Currently, the normalizers include the UUID as an id field in the serialized format. In our project, we also need the timestamp extracted from the UUID v7 in the normalized data.

Should we:

  1. Add the timestamp directly to the normalizers in this library (e.g., 'timestamp' => $data->getId()->getDateTime()->format('c'))
  2. Or keep the normalizers minimal and let projects that need the timestamp decorate the normalizers?

I'm leaning towards option 2 to keep the library focused, but wanted to get your thoughts on this, knowing that I need the timestamp in my project 😄

Example of what option 1 would look like:

public function normalize(mixed $data, ?string $format = null, array $context = []): array
{
    return [
        'role' => $data->getRole()->value,
        'content' => $data->content,
        'id' => $data->getId()->toRfc4122(),
        'timestamp' => $data->getId()->getDateTime()->format('c'), // ISO 8601 format
    ];
}

What do you think?

cc @DZunke @llupa

@OskarStark OskarStark force-pushed the feat/message-uid-clean branch 4 times, most recently from 5b00f02 to dea59a2 Compare June 30, 2025 10:00
@OskarStark OskarStark requested a review from chr-hertel June 30, 2025 10:03
@DZunke
Contributor

DZunke commented Jun 30, 2025

I would say go with option 2 to keep the library as minimal as possible. Having the UUIDv7 already allows people to extract the timestamp when overriding the normalizer, as shown in the example. Messages are sortable by it, and the timestamp can also be extracted on demand afterwards by anyone who needs it and does not want to override the normalizer. So the flexibility is there.

As a compromise, if we go with option 1, it would be nice to make the format configurable via the normalizer context, because ISO 8601 is not the only format in use.

@OskarStark
Contributor Author

OskarStark commented Jun 30, 2025

Great discussion! I think Dennis's suggestion for configurable timestamps is excellent and would provide the flexibility needed.

However, I believe we should proceed step by step. This PR already implements the core UUID functionality and ID serialization, which is the foundation. Let's get this merged first, then we can tackle the configurable timestamp feature in a follow-up PR.

The current implementation gives us:

  • Unique message IDs (UUID v7 with embedded timestamps)
  • ID serialization in all normalizers
  • Solid foundation for future timestamp features

Once this is merged, we can discuss and implement the configurable timestamp context approach in a separate, focused PR. What do you think?

@DZunke
Contributor

DZunke commented Jun 30, 2025

Sounds pretty good to me. The options would then only have to be passed through to the normalizer context. But it could be a good next step, as you mentioned. I already saw the code you had here as an example before you edited your message, and it looked good too.

The only thing I would then think about is how the options are transported. Currently it is this flexible array construction, but maybe the options need a bit more thought, for example structuring them by feature with a context class or something in that direction. But this is a topic of its own.

Generally, even though I was against bloating the Message in earlier discussions, I think the MR gives a good foundation for further progress, not least because there is an interface now and the facade approach from the last discussion is obsolete.

@OskarStark
Contributor Author

OskarStark commented Jun 30, 2025

You mean this code, right ❓

For reference, here's how we could implement the configurable timestamp feature in a follow-up PR:

// Example implementation for SystemMessageNormalizer
public function normalize(mixed $data, ?string $format = null, array $context = []): array
{
    $array = [
        'role' => $data->getRole()->value,
        'content' => $data->content,
        'id' => $data->getId()->toRfc4122(),
    ];

    // Add timestamp if requested in context
    if ($context['include_timestamp'] ?? false) {
        $timestampFormat = $context['timestamp_format'] ?? 'c'; // ISO 8601 as default
        $array['timestamp'] = $data->getId()->getDateTime()->format($timestampFormat);
    }

    return $array;
}

Usage examples:

// Default behavior (no timestamp)
$normalizer->normalize($message);

// Include ISO 8601 timestamp
$normalizer->normalize($message, null, ['include_timestamp' => true]);

// Include Unix timestamp
$normalizer->normalize($message, null, [
    'include_timestamp' => true,
    'timestamp_format' => 'U'
]);

// Include custom format
$normalizer->normalize($message, null, [
    'include_timestamp' => true,
    'timestamp_format' => 'Y-m-d H:i:s'
]);

This approach would need to be applied to all message normalizers (User, System, Assistant, ToolCall) for consistency.

@DZunke
Contributor

DZunke commented Jun 30, 2025

Yep! That is what I saw in the last message, just without the examples. Thanks for re-referencing it!

Contributor Author

I like this more than having the preg match written over and over again

@OskarStark OskarStark force-pushed the feat/message-uid-clean branch 4 times, most recently from 56c7c92 to e3e1439 Compare June 30, 2025 11:30
@OskarStark OskarStark force-pushed the feat/message-uid-clean branch 2 times, most recently from 90f7995 to 14ac0ba Compare June 30, 2025 11:35
@OskarStark
Contributor Author

CI 💚

@OskarStark OskarStark force-pushed the feat/message-uid-clean branch from 14ac0ba to 07b8111 Compare June 30, 2025 11:36
@OskarStark
Contributor Author

So could I get an approval from you, @DZunke? 😄

@DZunke
Contributor

DZunke commented Jun 30, 2025

@OskarStark sure, I scrolled through the code a second time and yeah, it looks fine from my side. Thanks again!

@chr-hertel
Member

So, @OskarStark, how much of the code and comments was handcrafted and how much was AI? 😆

@OskarStark
Contributor Author

Everything is AI, except 3 of my comments 😄

Member

@chr-hertel chr-hertel left a comment

The normalizers are intended for the contract of the API payload, so we cannot just add the ID there. OpenAI might tolerate that, but Mistral and Anthropic, for example, fail with invalid request errors.
The UID is intended for userland code anyway, and I guess after a revert in the normalizers this is good to be merged :)
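
To illustrate the separation described in this review, here is a minimal sketch of userland code that keeps the provider payload free of extra fields while still recording the ID and its embedded timestamp on the application side. All names (`SketchMessage`, `toApiPayload`, `toStorageRecord`) are hypothetical and not part of the library, and the message stand-in holds the UUID as a plain string so the example runs without symfony/uid.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a message object; the real classes carry a Uuid, not a string.
final class SketchMessage
{
    public function __construct(
        public readonly string $id,      // UUID v7 in its canonical string form
        public readonly string $role,
        public readonly string $content,
    ) {
    }
}

/** Payload sent to the provider: only the fields covered by the API contract. */
function toApiPayload(SketchMessage $message): array
{
    return ['role' => $message->role, 'content' => $message->content];
}

/** Record kept by the application: the ID and its embedded timestamp are added here. */
function toStorageRecord(SketchMessage $message, string $format = 'c'): array
{
    // The first 48 bits of a UUID v7 hold the Unix time in milliseconds.
    $millis = (int) hexdec(substr(str_replace('-', '', $message->id), 0, 12));
    $createdAt = \DateTimeImmutable::createFromFormat(
        'U.v',
        sprintf('%d.%03d', intdiv($millis, 1000), $millis % 1000)
    );

    return toApiPayload($message) + [
        'id' => $message->id,
        'timestamp' => $createdAt->format($format),
    ];
}

$message = new SketchMessage('0197be14-9fbb-7000-8000-000000000000', 'user', 'Hello');

print_r(toApiPayload($message));          // role and content only
print_r(toStorageRecord($message));       // adds id and an ISO 8601 timestamp
print_r(toStorageRecord($message, 'U'));  // same record with a Unix timestamp
```

With the real message classes, the timestamp would come from `$message->getId()->getDateTime()` as in the PR description, so the manual hex parsing here is only needed because the sketch avoids the dependency.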

@OskarStark OskarStark force-pushed the feat/message-uid-clean branch from 499c48e to a6d73c1 Compare June 30, 2025 20:41
All message types now include a UUID v7 identifier that provides:
- Unique identification for each message
- Embedded timestamp information
- Sortable message ordering

The UUID is available on message objects for userland code but is not serialized in API payloads to ensure compatibility with all LLM providers (Mistral and Anthropic fail with invalid request errors when extra fields are present).

BREAKING CHANGE: Message constructors now generate a UUID automatically

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Member

@chr-hertel chr-hertel left a comment

Really nice one - finally settled :D thanks @OskarStark

@chr-hertel chr-hertel merged commit 4d4a7fa into main Jun 30, 2025
7 checks passed
@chr-hertel chr-hertel deleted the feat/message-uid-clean branch June 30, 2025 21:03
@OskarStark
Contributor Author

Can we please get a release? 😄 So @llupa can proceed? Thanks

OskarStark added a commit to symfony/ai that referenced this pull request Jul 1, 2025
This PR was merged into the main branch.

Discussion
----------

feat!: add UUID to all messages

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | yes
| Issues        |
| License       | MIT

Cherry picking php-llm/llm-chain#364

Commits
-------

4fa1279 feat!: add UUID to all messages (#364)
symfony-splitter pushed a commit to symfony/ai-platform that referenced this pull request Jul 11, 2025
Labels
BC BREAK Backwards compatibility break
Development

Successfully merging this pull request may close these issues.

Introduce UID for Messages
3 participants