Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inputcapture: Reference-count Call objects #191

Merged
merged 10 commits into from
Feb 12, 2025

Conversation

smcv
Copy link
Contributor

@smcv smcv commented Feb 10, 2025

Best reviewed commit-by-commit.

  • glib-backports: Add a backport of g_steal_handle_id()

    This is useful when writing idempotent code to unsubscribe from GDBus
    signals.

  • inputcapture: Factor out call_dispose() from call_free()

    This makes call_dispose() idempotent, so that it can be called whenever a
    Call has become irrelevant, but without freeing the memory used for the
    Call structure itself, which would leave it as a dangling pointer if it
    is still in use as the user-data of an async call.

    This will be used in a subsequent commit to keep the Call alive via a
    circular reference between the GTask and Call while we are still waiting
    for a result, and then break the circular reference with call_dispose()
    when the Call has reached a resolution, either success or failure.

  • inputcapture: Reference a pre-existing session where necessary

    We're passing the session object to g_task_return_pointer() with
    g_object_unref as the free-function, and that's only correct if we
    previously owned a reference to it. In the case where we created a fresh
    session object, we did own a reference, but in the case where we merely
    retrieved it from call->session, we did not.

    This would have led to a use-after-free if it wasn't for the fact that
    we are leaking the Call object (inputcapture: Call objects are leaked on successful code paths #190), which leaks a reference to the
    session, avoiding the use-after-free via compensating errors. Before we
    can safely fix the leak, we have to reference-count the session correctly.

  • inputcapture: Reference-count Call instances

    To avoid a use-after-free (intermittent segfault during automated tests #169), we need to make sure that there is
    at least one reference held to the memory used to store the Call struct
    for the duration of each D-Bus method call, so consistently take a new
    ref to the user-data of g_dbus_connection_call(), and release it in
    call_returned().

    However, we also need to keep the Call alive for as long as it is
    listening to any D-Bus signals, without introducing a long-term
    circular reference that would make it be leaked indefinitely.

    To achieve this, create a temporary circular reference between the Call
    and the GTask, but break it via call_dispose() every time we reach a
    resolution to the task, whether that's success or a failure.

    Resolves: intermittent segfault during automated tests #169
    Resolves: inputcapture: Call objects are leaked on successful code paths #190

  • inputcapture: Assert that we don't prep_call() twice in parallel

    If we did, then the second signal subscription would overwrite the first,
    and call_dispose() would no longer unsubscribe the first signal
    subscription, breaking our memory-management assumptions.

  • inputcapture: Drop unused userdata parameter to prep_call

    Despite its name, this was really a free-function for the GDBus
    signal subscription, which in practice was always NULL.

  • inputcapture: Guard against unknown Response codes

    If the response code was somehow neither 0, 1 nor 2, we would previously
    have freed the GTask without ever returning a result. Return a failed
    result in this case, mechanically equivalent to response code 2
    ("The user interaction was ended in some other way") but with a
    different error message.

  • inputcapture: Consistently end the Call iff the GTask has completed

    After the previous commit ensured that the GTask is always given a
    result at all appropriate times, this should not make any practical
    difference.

    However, it hopefully makes the intention clearer: after the GTask has
    completed, we always dispose the Call, and if the GTask has not yet
    completed, we never do.

  • inputcapture: Assert that we don't receive Response signals after dispose

    When the Call object is disposed, it unsubscribes from the Response
    signal before releasing its GTask reference, so it should be impossible
    to get into the Response handler after dispose. Clarify that.

@smcv
Copy link
Contributor Author

smcv commented Feb 10, 2025

Marked as draft because I'm still testing this (100 runs in a loop) on a cloud VM instance that, for whatever reason, reproduces #169 more easily than any other environment I have access to. But I think it's ready for review.

It certainly does resolve #191. When I add this:

diff --git a/libportal/inputcapture.c b/libportal/inputcapture.c
index 0736a0e..1504cf1 100644
--- a/libportal/inputcapture.c
+++ b/libportal/inputcapture.c
@@ -237,6 +237,7 @@ typedef struct {
 
 static void create_session (Call *call);
 static void get_zones (Call *call);
+static size_t call_instances = 0;
 
 static void
 call_dispose (Call *call)
@@ -277,6 +278,8 @@ call_last_unref (void *call)
    * wants a GDestroyNotify, and because this is a convenient place to put
    * life-cycle debugging */
   call_dispose (call);
+  call_instances--;
+  g_debug ("Call instances: %zu", call_instances);
 }
 
 static inline void
@@ -297,6 +300,9 @@ call_new (XdpPortal *portal,
 {
   g_autoptr(Call) call = g_rc_box_new0 (Call);
 
+  call_instances++;
+  g_debug ("Call instances: %zu", call_instances);
+
   call->portal = g_object_ref (portal);
 
   if (session != NULL)
diff --git a/tests/meson.build b/tests/meson.build
index d126a06..600fef0 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -17,7 +17,7 @@ if meson.version().version_compare('>= 0.56.0')
 
     test('pytest',
       pytest,
-      args: ['--verbose', '--verbose', '--log-level=DEBUG'],
+      args: ['--verbose', '--verbose', '--log-level=DEBUG', '-s'],
       env: test_env,
       workdir: meson.current_source_dir(),
       timeout: 180,

I can still see the number of instances increase, but only gradually (the highest it reaches is 6), and the number drops back down to 0 by the time testing finishes.

@smcv smcv force-pushed the inputcapture-call-refcount branch from 686ee22 to f648496 Compare February 10, 2025 13:03
@smcv smcv marked this pull request as ready for review February 10, 2025 13:04
@smcv
Copy link
Contributor Author

smcv commented Feb 10, 2025

cc @whot

@smcv smcv force-pushed the inputcapture-call-refcount branch from f648496 to 54bd6aa Compare February 10, 2025 13:06
smcv added a commit that referenced this pull request Feb 10, 2025
It's easier to deal with header files when it's unconditionally safe to
include the same header more than once.

In particular the absence of `#pragma once` in `glib-backports.h` caused
a build failure in #191, but only for older GLib releases.

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv smcv mentioned this pull request Feb 10, 2025
smcv added a commit that referenced this pull request Feb 10, 2025
It's easier to deal with header files when it's unconditionally safe to
include the same header more than once.

In particular the absence of `#pragma once` in `glib-backports.h` caused
a build failure in #191, but only for older GLib releases.

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv smcv force-pushed the inputcapture-call-refcount branch from 54bd6aa to 0253b7e Compare February 10, 2025 13:20
@smcv
Copy link
Contributor Author

smcv commented Feb 10, 2025

Marked as draft because I'm still testing this (100 runs in a loop) on a cloud VM instance that, for whatever reason, reproduces #169 more easily than any other environment I have access to

It was successful. I'm continuing to test in the same environment with the whole test suite.

smcv added a commit that referenced this pull request Feb 10, 2025
It's easier to deal with header files when it's unconditionally safe to
include the same header more than once.

In particular the absence of `#pragma once` in `glib-backports.h` caused
a build failure in #191, but only for older GLib releases.

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv smcv added the bug Something isn't working label Feb 10, 2025
smcv added 9 commits February 10, 2025 20:36
This is useful when writing idempotent code to unsubscribe from GDBus
signals.

Signed-off-by: Simon McVittie <smcv@debian.org>
This makes call_dispose() idempotent, so that it can be called whenever a
Call has become irrelevant, but without freeing the memory used for the
Call structure itself, which would leave it as a dangling pointer if it
is still in use as the user-data of an async call.

This will be used in a subsequent commit to keep the Call alive via a
circular reference between the GTask and Call while we are still waiting
for a result, and then break the circular reference with call_dispose()
when the Call has reached a resolution, either success or failure.

Signed-off-by: Simon McVittie <smcv@debian.org>
We're passing the session object to g_task_return_pointer() with
g_object_unref as the free-function, and that's only correct if we
previously owned a reference to it. In the case where we created a fresh
session object, we did own a reference, but in the case where we merely
retrieved it from call->session, we did not.

This would have led to a use-after-free if it wasn't for the fact that
we are leaking the Call object (flatpak#190), which leaks a reference to the
session, avoiding the use-after-free via compensating errors. Before we
can safely fix the leak, we have to reference-count the session correctly.

Signed-off-by: Simon McVittie <smcv@debian.org>
To avoid a use-after-free (flatpak#169), we need to make sure that there is
at least one reference held to the memory used to store the Call struct
for the duration of each D-Bus method call, so consistently take a new
ref to the user-data of g_dbus_connection_call(), and release it in
call_returned().

However, we also need to keep the Call alive for as long as it is
listening to any D-Bus signals, without introducing a long-term
circular reference that would make it be leaked indefinitely.

To achieve this, create a temporary circular reference between the Call
and the GTask, but break it via call_dispose() every time we reach a
resolution to the task, whether that's success or a failure.

Resolves: flatpak#169
Resolves: flatpak#190
Signed-off-by: Simon McVittie <smcv@debian.org>
If we did, then the second signal subscription would overwrite the first,
and call_dispose() would no longer unsubscribe the first signal
subscription, breaking our memory-management assumptions.

Signed-off-by: Simon McVittie <smcv@debian.org>
Despite its name, this was really a free-function for the GDBus
signal subscription, which in practice was always NULL.

Signed-off-by: Simon McVittie <smcv@debian.org>
If the response code was somehow neither 0, 1 nor 2, we would previously
have freed the GTask without ever returning a result. Return a failed
result in this case, mechanically equivalent to response code 2
("The user interaction was ended in some other way") but with a
different error message.

Signed-off-by: Simon McVittie <smcv@debian.org>
After the previous commit ensured that the GTask is always given a
result at all appropriate times, this should not make any practical
difference.

However, it hopefully makes the intention clearer: after the GTask has
completed, we always dispose the Call, and if the GTask has not yet
completed, we never do.

Signed-off-by: Simon McVittie <smcv@debian.org>
…pose

When the Call object is disposed, it unsubscribes from the Response
signal before releasing its GTask reference, so it should be impossible
to get into the Response handler after dispose. Clarify that.

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv smcv force-pushed the inputcapture-call-refcount branch from 0253b7e to 9ec5fcc Compare February 10, 2025 20:36
@smcv
Copy link
Contributor Author

smcv commented Feb 10, 2025

Rebased after @TingPing merged prerequisites #184 and #189 (thanks!)

A previous version survived 100 iterations of repeated testing on a cloud VM where 0.9.0 hits #169 80% of the time. Testing the rebased version now, it will likely take a while but is looking good so far (on its 10th run).

@smcv
Copy link
Contributor Author

smcv commented Feb 11, 2025

Testing the rebased version now, it will likely take a while but is looking good so far (on its 10th run)

This also survived 100 test iterations, on a cloud VM where 0.9.0 had approximately an 80% failure rate, so I'm now quite confident that this PR fixes the race condition.

Copy link
Member

@TingPing TingPing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputcapture: Consistently end the Call iff the GTask has completed

Typo: "iff"

Everything looks good to me.

@smcv
Copy link
Contributor Author

smcv commented Feb 11, 2025

Typo: "iff"

No, that was intentional: it's mathematical jargon for "if and only if", which would have made the subject line of the commit too long if I wrote it out in full.

If you don't like that, perhaps inputcapture: Consistently end Call if and only if GTask completed would be better?

@TingPing
Copy link
Member

Its fine, just new to me.

Copy link
Contributor

@whot whot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks heaps for finding all these... quite a ride just reading the commit messages :)

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv smcv merged commit ee20ea8 into flatpak:main Feb 12, 2025
11 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inputcapture: Call objects are leaked on successful code paths intermittent segfault during automated tests
3 participants