Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Adopts *Writable* BaggageContext (Better) #167

Closed
wants to merge 7 commits into from

Conversation

ktoso
Copy link
Contributor

@ktoso ktoso commented Sep 16, 2020

This adopts the baggage API in the shape of slashmo/gsoc-swift-baggage-context#34

I have discussed with @fabianfett and @tomerd a little bit about this.


This adoption is preferable I believe, because of comments that @adam-fowler raised in slashmo/gsoc-swift-baggage-context#34 (comment) In general, this means is also ready for the Context requiring a set/get on the baggage, making lifes easier for other folks.

Specifically, we talked with @fabianfett why the Lambda.Context context type was a class -- since it was immutable, it avoided copying around things a lot. We can achieve the same niceness by making it a CoW type, rather than just a struct. It's a bit of work but worth it in the long run as we get the best of both worlds.

@ktoso ktoso changed the title [RFC] Adopts *Writable* BaggageContext [RFC] Adopts *Writable* BaggageContext (Better) Sep 16, 2020
@ktoso
Copy link
Contributor Author

ktoso commented Sep 16, 2020

Whoop I PRed from master based branc... fixing

Copy link
Member

@fabianfett fabianfett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this change – and it is totally in line with what I expected the Baggage change to look like! There are two questions about the use of var _logger?

var baggage: Baggage

let invokedFunctionARN: String
let deadline: DispatchWallTime
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadlines we'll eventually be able to handle in a distributed way -- so this will eventually move into the baggage.

See: Future: Deadlines and Cancelation patterns slashmo/gsoc-swift-baggage-context#24

@tomerd tomerd marked this pull request as draft September 16, 2020 20:54
// utility
_logger: logger,
eventLoop: eventLoop,
allocator: allocator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc evenloop and allocator are immutable and dont need to be at the storage level imo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is this or allow them to be as part of the by-ref trick?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is to keep the outer Context type small -- just a single pointer, otherwise we'd grow the struct and each time it gets let or passed somewhere those references would be copied -- this way, everything in storage, nothing gets copied other than the single pointer just when passing the context around.

cognitoIdentity: cognitoIdentity,
clientContext: clientContext,
// utility
_logger: logger,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we need underscore for logger or Storage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For logger because it's "underlying logger, don't use this, use logger on context" for Storage happy to remove the underscore (done)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can relate to doing so for the member, but not for the ctor argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see, got it will do

///
/// - note: The `LogLevel` can be configured using the `LOG_LEVEL` environment variable.
public let logger: Logger
public var logger: Logger {
self.storage._logger.with(self.baggage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember what with() does... does it mean every logger access is a new instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, every context.logger is a "new logger" but a logger is a cheap value type so that's not terrible.

This has to be like that because we need the logger to point to the "latest" baggage values. As everything here has value semantics, we can't just pass a reference to the context or baggage into the handler, and have to pass it when we're about to use it.

The with created LogHandler is lazy though -- it does not create Logger.Metadata yet unless a log statement is actually triggered.

Alternatively, this could be done on a baggage { didSet { ... if you'd prefer that but I think that's actually hard to implement correctly without growing memory forever... we'd need to notice "aha, this logger already has a baggage logger, so we need to unwrap it and then wrap again with a baggage aware one" to prevent it growing infinitely with Logger(BaggageLogHandler(BaggageLogHandler(BaggageLogHandler(BaggageLogHandler(BaggageLogHandler(BaggageLogHandler(... if you know what I mean... This is hard to implement because handler is private in logging, so we can't check for it... Implementing it that way may require making the handler public in SwiftLog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this could actually add overhead for cases that log allot. I think keeping state on when baggage changed otherwise returning the existing logger could be a good optimization. maybe even coded into BaggageContext so users dont have to reinvent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revisited this and will try to implement via an "materialize once" into logging metadata. You're right that this is painful if there's a lot of logging, and we should not make that painful :\

It won't be doable as a log handler I guess, so that goes out the window; really I guess what would solve the needless copying is to make Logging depend on Baggage, and then we can remove its storage for metadata and replace it with baggage, avoiding the copying alltogether.

Not sure when to pitch this, it'd be a logging 2.0 I guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this -- metadata is now set eagerly and we won't get into crazy copying anymore.

Thanks for reminding me about this, it's kind of obvious in hindsight but I missed it.

Some numbers how the copying "eagerly" amortizes (that's for 10k logs); it's just bad for "never logs" cases but one can assume those don't really happen in a large "actual" system (vs a library) tbh.

The logger.with lazy-pattern remains available when one wants to use it instead.

Screen Shot 2020-09-18 at 18 17 27

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(orange/red are the logger.with(baggage))

deadline: DispatchWallTime,
cognitoIdentity: String?,
clientContext: String?,
_logger: Logger,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove underscore from here

self.storage.baggage
}
set {
if isKnownUniquelyReferenced(&self.storage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please correct me if i am wrong, as far as I understand it:

  1. everything (new storage object) is copied every time baggage is updated, which will happen every time new span is created = often

  2. this does implement copy on write however changing of the storage is not thread safe which I think is not a nice API for user who would need to sync the changes himself

see discussion here https://gist.github.com/drewmccormack/b1c4487935cf3c3e0a5feaf488a95ebd#gistcomment-2826755

Copy link
Contributor Author

@ktoso ktoso Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm aware of that rule, it could be that it's misapplied here after all... I'll dig into it again.

With the latest changes in context perhaps actually we can get away without the Storage completely after all actually. Meh we'd want to keep CoW here to keep the lambda context small. I'll look into this shortly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pokryfka, so no. This is implementation correct. (You had me worried there for a second! 😉)

This is as thread safe as an Int or any other value semantics type is.

isKnownUniquelyReferenced is commonly misunderstood, so I understand where your confusion is coming from. However such implementation is safe by itself. Now how someone uses this type may or may not be thread-safe -- and that is what the example you're linking is doing wrong. The same kind of race can be exhibited on even a plain integer value. This type provides value semantics guarantees, and no more.


everything (new storage object) is copied every time baggage is updated, which will happen every time new span is created = often

I disagree about the "often" quantification used here. It is by far not as often as a raw struct would be copying -- each passing around and reassigning to a new variable even would technically then be copying.

We're getting the best of both worlds; and simply following the known CoW semantics that are frequent in Swift.

this does implement copy on write however changing of the storage is not thread safe which I think is not a nice API for user who would need to sync the changes himself

The operations in this piece of code are safe. If they weren't then Array and all of Swift's collections are screwed as well 😉 We are providing the same semantics as those types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything (new storage object) is copied every time baggage is updated, which will happen every time new span is created = often

I disagree about the "often" quantification used here. It is by far not as often as a raw struct would be copying -- each passing around and reassigning to a new variable even would technically then be copying.

We're getting the best of both worlds; and simply following the known CoW semantics that are frequent in Swift.

it used to be a "const reference" and so only the reference (pointer) was copied.

I dont know how expensive is copying of a small struct, and how much more expensive is allocating the space for it on heap rather than on stack; perhaps its not a real issue at all.

if the goal was to avoid copying of the lambda context, perhaps Context should have a reference to "static storage" and a reference to CoW "mutable storage" (the baggage). That way 2 references would always be copied but the content of the "static storage" would not need to be copied every time the baggage changes.

this does implement copy on write however changing of the storage is not thread safe which I think is not a nice API for user who would need to sync the changes himself

The operations in this piece of code are safe. If they weren't then Array and all of Swift's collections are screwed as well 😉 We are providing the same semantics as those types.

Thank you for rechecking it.
I do find it a bit confusing...

if I understand that it will be safe because set is mutating.

struct Context {

func notSafeUpdateBaggage(_ baggage: Baggage) {
 // update storage
}

// need to make a copy which will increase the storage reference counter
mutating func updateBaggage(_ baggage: Baggage) {
 // update storage
}

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the goal was to avoid copying of the lambda context, perhaps Context should have a reference to "static storage" and a reference to CoW "mutable storage" (the baggage). That way 2 references would always be copied but the content of the "static storage" would not need to be copied every time the baggage changes.

Yeah if it becomes a problem I think we could do that 🤔 👍

I do find it a bit confusing...

if I understand that it will be safe because set is mutating.

Maybe this example can illustrate: Because this is a struct, in order to change it the function must be mutating (or set {} which implicitly is).

Then if anyone has:

let context = ...

they can't mutate it unless a second reference is created:

let context = ...
var changeMe = context

which ensures that we'll have the +1 count there and will copy on mutation.

It still is possible to make bugs like:

class Me { 
var onlyReference: Lambda.Context = ... 
for t in threads { 
  t.run { onlyReference.mutate() }
}

that remains illegal as usual, but not because of the CoW or Context type being wrong but how the user uses it being unsafe. The same example is true for e.g. var int: Int which you'd try to mutate +=1 from many threads. There's nothing wrong about the Int type, but about the unsafe modifications to where we store it.

@tomerd
Copy link
Contributor

tomerd commented Jan 22, 2024

closing in inactive PRs, feel free to re-open if still relevant

@tomerd tomerd closed this Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants