Skip to content

Deleting unused variables after Response return #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Deleting unused variables after Response return #154

wants to merge 2 commits into from

Conversation

mmoreram
Copy link

Rationale

HttpKernel handle method is injected as an anonymous function, and because of that, internal memory positions used inside this function is only removed when the GC is applied.
In order to avoid memory incremental, we can set to null these variables, and this used memory space will automaticaly be released.

How to check the impact

  • Create an application using HttpKernel as the bridge
  • Add this line before the response is returned in the method handle. - syslog(1, memory_get_usage(false));
  • Use the application and tailf the syslog.
  • You will find that the memory increases after each call
  • By adding these lines, the memory will be stable (you may not see each line the, because syslog group them when are equal)

- This is part of this discussion php-pm/php-pm#448

# Rationale
HttpKernel handle method is injected as an anonymous function, and because of that, internal memory positions used inside this function is only removed when the GC is applied.
In order to avoid memory incremental, we can set to null these variables, and this used memory space will automaticaly be released.

# How to check the impact
- Create an application using HttpKernel as the bridge
- Add this line before the response is returned in the method `handle`. - `syslog(1, memory_get_usage(false));`
- Use the application and tailf the syslog.
- You will find that the memory increases after each call
- By adding these lines, the memory will be stable (you may not see each line the, because syslog group them when are equal)
@andig
Copy link
Contributor

andig commented Feb 20, 2019

Is there any proof that this works?

Imho releasing a variable does not return any memory unless gc actually runs. When the anonymous function returns, it will automatically release all local variables. That is- unless there are any other references in the system to those.

So imho this is not a cure but treating the symptoms.

If this PR really changes the behaviour we should analyse the released variable one by one and check which one triggers the growing memory consumption and then figure out where that is being referenced outside of the closure.

@mmoreram
Copy link
Author

mmoreram commented Feb 20, 2019

Setting a variable to NULL it releases the memory at the moment. Difference between =null and unset() as nicely @acasademont pointed to me https://stackoverflow.com/a/50931077

@mmoreram
Copy link
Author

mmoreram commented Feb 20, 2019

@andig it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions. You can proof that it works by just following the instructions I pointed to you in the PR description.

@andig
Copy link
Contributor

andig commented Feb 20, 2019

Is there any proof that this works?

Could you provide before/after test case and measurements? Is this caused by any specific variable?

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

I would appreciate a link to the docs if possible.

@mmoreram
Copy link
Author

@andig https://stackoverflow.com/a/50931077

I will post the measurements later

@andig
Copy link
Contributor

andig commented Mar 6, 2019

ping @mmoreram did you get any proof that this changes the behaviour? Pleas note that the SO link is ages old and php has matured a lot since that time.

@mmoreram
Copy link
Author

@andig You can check a comparision between having this improvements inside the kernel, and without it.

https://gist.github.com/mmoreram/d10b40e7745e80617dfc6753ef117016

The test is simple. An active active and a tests battery working against it. In each request / response iteration, a simple syslog print of the current used memory.

As you can see, without improvements, the memory grows. With improvements, the memory stays much more stable over the time :)

@marcj
Copy link
Member

marcj commented Mar 11, 2019

We should add a comment above those lines explaining that circumstances, so our future self will know what the f was/is wrong with PHP.

@andig
Copy link
Contributor

andig commented Mar 11, 2019

I realize I'm a sceptic here as this is beyond what I've experienced with php. Are these tests comparable, i.e. why do they already start at different base memory consumption after the first request?

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions

What behavior is that? If nothing holds on to a reference it should be garbage-collected.

update are you potentially talking about https://bugs.php.net/bug.php?id=69639? Then gc_collect_cycles() should also help, though I don't see where we have a closure here that falls into that category. It still feels a little like voodoo for me.

@mmoreram
Copy link
Author

mmoreram commented Mar 12, 2019

@andig OK.

IMHO, forcing gc_collect_cycles() in each request iteration is not what I would say good practices, but is OK for me. You can ask me what behavior is a specific PHP one, and I would say to you that I don't really know. My expertise in PHP internals is not that good, but what I proposed to you guys is a demostration that by adding some small lines, the memory stays stable over the request iterations.

Now is up to you to accept it or decline, but I'm not going to enter a discussion like this.

Good luck.

update BTW. Both tests are exactly the same. Same tests battery, same code, just changing the lines I exposed.

@andig
Copy link
Contributor

andig commented Mar 19, 2019

IMHO, forcing gc_collect_cycles() in each request iteration is not what I would say good practices, but is OK for me.

It isn't, full ack!

...but I'm not going to enter a discussion like this.

Not sure what you mean? You've specifically mentioned

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

and I'd appreciate any reference you have for this claim.

Update for reference: there is also a discussion in reactphp/http#289

@dnna
Copy link
Contributor

dnna commented Mar 23, 2019

I know this PR isn't a cure and probably just treats the symptoms of a deeper issue (maybe in reactphp? maybe in PHP itself?), but if it does work and reduces long-term memory consumption shouldn't we merge it anyway? The changes don't seem very risky and we can always clean it up later if the root cause is discovered and it's no longer needed.

@mmoreram
Copy link
Author

@andig when I say that I don't want to enter a discussion like this, I mean opinions discussions. I exposed some examples and demonstrations that these changes improve the behavior of the package. I really LOVE this project, and improving it is one of my project Apisearch goals.

Said that, in Apisearch we have implemented our own Bridge and Bootstrap. This is why I want the maximum performance and avoiding if/else blocks is a must for me. In my Bridge I added these "=null" lines, and my containers are now pretty stable in AWS, instead of being killed each 16 minutes.

@dnna I think that there's not a root cause. The cause itself is how PHP and GC works. Deleting variables inside anonymous functions in PHP have always been a MUST if these variables can have a big size (like a Request, or a Response objects). Deleting them is a good practice (has been since there isn't an Apache behing the project killing the PHP thread after each response return).

BTW, @andig, I can set my own code in the Bridge and this would improve the memory allocated in each of the slaves, but if I want to change the behavior in the server itself, I should fork the project in order to add my gc_collect_cycles, and this is something I want to completely avoid.

Thanks for the responses :)

@andig
Copy link
Contributor

andig commented Mar 24, 2019

This is not about opinions. My question is about a claim you've made:

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

I'd appreciate any reference you have for this claim.

I'd simply appreciate an answer in order to better understand the root cause.

If there isn't one then fine, too, and let merge as-is.

@mmoreram
Copy link
Author

@andig ups, I was not talking about this.

For example - https://bugs.php.net/bug.php?id=75914

Everyone tell that calling gc_collect_cycles fixes the problem, but for me, this is not a solution.

@marcj
Copy link
Member

marcj commented Mar 24, 2019

So, @mmoreram does alternatively calling gc_collect_cycle() fixes your issue as well? I'd think it should. Can you please test it in your use-case? I'd prefer that way over setting variables to null as it's very clear what that function call does. Also, I'd like to mention that we really can't talk about 'best practice' in any way here, since there's not much experiences in running PHP in such a way, so I don't see any argument that supports that claim that it isn't best practice. I mean, it's not best practice in a shared-nothing architecture, right, but that doesn't mean it's automatically the same in a completely different architecture/way of running an PHP application. Back in the days when I was coding a lot of PHP with long running processes like migration scripts, gc_collect_cycle() was actually a must to avoid tragic memory leaks.

@mmoreram
Copy link
Author

@marcj never mind. My project is running as I was expecting now with my own Bridge, but seems that in PHP gc_collect_cycle() is the final solution for everything. Is OK for me :)

@mmoreram mmoreram closed this Mar 24, 2019
@mmoreram mmoreram deleted the patch-1 branch March 24, 2019 15:18
@andig
Copy link
Contributor

andig commented Mar 24, 2019

@marcj must say that I prefer nulling over gc_collect_cycle as the latter has unpredictable runtime behaviour time-wise (assumption). I'd still merge this one but would prefer if we found where we are subject to https://bugs.php.net/bug.php?id=75914.

@marcj
Copy link
Member

marcj commented Mar 24, 2019

@andig, you're probably right 👍

@dnna
Copy link
Contributor

dnna commented Mar 24, 2019

@andig @marcj I did some investigation after reading a bit more about this bug, and the leak seems to be in the block:
$mapFiles = function (&$files) use (&$mapFiles) { ... }

Commenting out this part causes the memory to be stable in the same way like in @mmoreram 's example.

I'll investigate further to see what exactly is causing this leak and make a PR for it.

@marcj
Copy link
Member

marcj commented Mar 24, 2019

@dnna, very good find! 🤩

@dnna dnna mentioned this pull request Mar 24, 2019
@andig
Copy link
Contributor

andig commented Mar 24, 2019

GREAT! This looks exactly like the referenced bug. Closure is created like

$mapFiles = function (&$files) use (&$mapFiles) {...}

which will have access to $this and make it linger around. Would it help to create as static function instead since its not using $this?

@dnna
Copy link
Contributor

dnna commented Mar 24, 2019

I tried the static part but it still leaked. At the end I just declared it as a normal private function and that seems to solve the issue.

@andig
Copy link
Contributor

andig commented Mar 24, 2019

Actually, its not because it isn't a classvar. Mhhhm...

@andig
Copy link
Contributor

andig commented Mar 24, 2019

I tried the static part but it still leaked. At the end I just declared it as a normal private function and that seems to solve the issue.

Funny :O

/cc @kelunik maybe you could shed some light on this?

@dnna
Copy link
Contributor

dnna commented Mar 24, 2019

Could it be because it uses $mapFiles internally, so it created a new circular reference on each request as the function is redeclared every time?

@kelunik
Copy link

kelunik commented Mar 24, 2019

$mapFiles has a circular reference to itself here. Why isn't that a private method?

@andig
Copy link
Contributor

andig commented Mar 24, 2019

Big thank you- finally an explanation. My world has just shifted back into order where laws of physics apply. 🌎

It is a private method thanks to @dnna now. No idea why the ever was different...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants