Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in string_map.c in rcutils #411

Merged
merged 4 commits into from
Mar 2, 2023

Conversation

nnmm
Copy link
Contributor

@nnmm nnmm commented Feb 27, 2023

If realloc for new_values failed, string_map->impl->keys would have a size that is inconsistent with string_map->impl->capacity, leading to a memory leak in the best case, and invalid memory access in the worst case.

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem you are pointing out, but I don't necessarily agree that we can't use reallocate here.

Instead of this change, if we just move the assignment of string_map->impl->keys until after we know that the new_values is not NULL, that would accomplish the same thing, correct? That is (pseudo-code):

new_keys = reallocate();
if (new_keys == NULL) {
    return RCUTILS_RET_BAD_ALLOC;
}
new_values = reallocate();
if (new_values == NULL) {
    free(new_keys);
    return RCUTILS_RET_BAD_ALLOC;
}
string->map->impl->keys = new_keys;
string->map->impl->values = new_values;

(reallocate has performance benefits, so I don't want to move away from it if we don't have to)

@nnmm
Copy link
Contributor Author

nnmm commented Feb 27, 2023

@clalancette I thought of that too, but decided against it because reallocate is allowed to return the same pointer. If new_keys == string->map->impl->keys, you've freed string->map->impl->keys. Edit: And checking for that and not freeing is also not an option, because then string->map->impl->keys would have a different size from the capacity and the values.

@clalancette
Copy link
Contributor

@clalancette I thought of that too, but decided against it because reallocate is allowed to return the same pointer. If new_keys == string->map->impl->keys, you've freed string->map->impl->keys. Edit: And checking for that and not freeing is also not an option, because then string->map->impl->keys would have a different size from the capacity and the values.

Good point.

That said, looking at it again I'm not entirely sure where the problem is. It is true that we may succeed in reallocating the keys, but not the values. But we don't update the capacity or the size in that case. That means that all other operations should still work fine on the old data, and we should never run into a memory leak or invalid memory accesses.

Can you explain a bit more about the problem that caused you to look at this?

@nnmm
Copy link
Contributor Author

nnmm commented Feb 27, 2023

That said, looking at it again I'm not entirely sure where the problem is. It is true that we may succeed in reallocating the keys, but not the values. But we don't update the capacity or the size in that case. That means that all other operations should still work fine on the old data, and we should never run into a memory leak or invalid memory accesses.

If the new capacity is smaller than the old one, and the value reallocation fails, then the map will still report the old capacity even though string->map->impl->keys is now smaller than that. Then e.g. rcutils_string_map_clear() will access invalid memory.

  for (; i < string_map->impl->capacity; ++i) {
    if (string_map->impl->keys[i] != NULL) {
      …
    }
  }

Can you explain a bit more about the problem that caused you to look at this?

Apex.AI used the time bomb allocator on the string map to increase coverage, and I noticed that this test leaked memory.

@clalancette
Copy link
Contributor

If the new capacity is smaller than the old one, and the value reallocation fails, then the map will still report the old capacity even though string->map->impl->keys is now smaller than that. Then e.g. rcutils_string_map_clear() will access invalid memory.

Ah! Thanks for the explanation, now I see the issue.

That said, I'm still not that willing to give up the realloc if we don't have to. I know that this is only for the pointer tracking, but I want to give the malloc implementation as much chance as we can for performance.

Thinking about this more, fundamentally I think the problem is that we are tracking the keys and values separately. That is, we really should be allocating and reallocating the keys and values as pairs. That way, they always succeed or fail as a pair. Luckily, we are using the PIMPL pattern here, so we are free to make a change like this. Towards that end, please see ApexAI#1, where I've done exactly this. I'm interested to hear your thoughts about it. If you agree with the direction, feel free to merge it into this PR and we can continue from there.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementation looks good to me, either @clalancette proposal or this, i am okay to go.

@nnmm
Copy link
Contributor Author

nnmm commented Feb 28, 2023

Thinking about this more, fundamentally I think the problem is that we are tracking the keys and values separately. That is, we really should be allocating and reallocating the keys and values as pairs. That way, they always succeed or fail as a pair. Luckily, we are using the PIMPL pattern here, so we are free to make a change like this. Towards that end, please see ApexAI#1, where I've done exactly this. I'm interested to hear your thoughts about it. If you agree with the direction, feel free to merge it into this PR and we can continue from there.

I think that's a brilliant idea, it makes things much simpler. I merged it, but now I see that your commit wasn't signed off, how do we best solve that?

@clalancette
Copy link
Contributor

I think that's a brilliant idea, it makes things much simpler. I merged it, but now I see that your commit wasn't signed off, how do we best solve that?

It's upset because of the merge commit, which doesn't have a sign-off. The easiest thing to do here is to rebase and squash all of the commits together with proper Signed-off-by lines from both you and me. I'd do it myself, but I don't have permissions to push to the ApexAI account, so you'll have to do it.

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left one comment that I think we need to replace the __clear_string_map call. Other than that, this looks good to me so I'll run CI once we come to consensus on that.

src/string_map.c Show resolved Hide resolved
clalancette and others added 3 commits February 28, 2023 16:47
Signed-off-by: Chris Lalancette <clalancette@gmail.com>
This makes it easier for us to recover from errors.

Signed-off-by: Chris Lalancette <clalancette@gmail.com>
…tring_map call

Signed-off-by: Nikolai Morin <nikolai.morin@apex.ai>
@clalancette
Copy link
Contributor

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Copy link
Collaborator

@iuhilnehc-ynos iuhilnehc-ynos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two minor suggestions.

src/string_map.c Outdated Show resolved Hide resolved
src/string_map.c Show resolved Hide resolved
Signed-off-by: Nikolai Morin <nikolai.morin@apex.ai>
@clalancette
Copy link
Contributor

clalancette commented Mar 1, 2023

Here's another CI. Assuming this comes back clean, we just need to wait for @fujitatomoya to resolve the last outstanding conversation:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator

lgtm with green CI.

@nnmm
Copy link
Contributor Author

nnmm commented Mar 2, 2023

It's green with a tinge of yellow, is that okay?

@clalancette
Copy link
Contributor

It's green with a tinge of yellow, is that okay?

The yellow on aarch64 is OK, that is a known flake. Windows being red is not good, but we were having some Windows infrastructure issues yesterday. I'll see what is going on there.

@clalancette
Copy link
Contributor

All right, the yellow on Windows is in some other builds, so not caused by this. I'll go ahead and merge, thanks for the contribution.

@clalancette clalancette merged commit 16e6bfd into ros2:rolling Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants