Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG Replace phpdotenv with thread-safe replacement #7484

Merged

Conversation

tractorcow
Copy link
Contributor

@tractorcow tractorcow commented Oct 16, 2017

Fixes #7478

Not strictly breaking, but any code which uses getenv php method will remain vulnerable until upgrading to the new Environment::getenv method.

In order to be as unobtrusive as possible the getenv and putenv methods on Environment class are designed as drop-in replacements for the global php methods.

Thanks to @NightJar for beginning this solution (see NightJar@e614ede).

Other PRs for various modules incoming to replace getenv usages.

Other prs to merge after this:

References to this issue (notably laravel based)

@tractorcow tractorcow force-pushed the pulls/4.0/environment-getenv branch from ed6976f to 301a09b Compare October 16, 2017 04:44
Copy link
Contributor

@zanderwar zanderwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@dhensby
Copy link
Contributor

dhensby commented Oct 16, 2017

I'm not really behind this fix as the whole point in using an external library is so that we aren't maintaining this kind of code ourselves.

from my understanding putenv and getenv are not threadsafe methods and so this error is only happens on multi-threaded environments (IIS, HHVM and multithreaded apache on windows).

Have we investigated using Dotenv\Loader::setEnvironmentVariable and Dotenv\Loader::getEnvironmentVariable instead as they rely on the super globals first...

@zanderwar
Copy link
Contributor

zanderwar commented Oct 16, 2017

The external library is not intended for production, so using it in the first place only satisfies the environment when in development albeit is not in require-dev and as such is adding unnecessary bloat to production.

IMHO the dotenv dependency should be dropped entirely for methodology like this. Even if I am being biased here because this resolves my issue of wanting to use the .env file like laravel/symfony allow flawlessly

@tractorcow
Copy link
Contributor Author

Our external library is parse_ini_string :)

Copy link
Contributor

@flamerohr flamerohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor request, it's not blocking for this to be merged

$variables = parse_ini_string($contents) ?: [];
foreach ($variables as $name => $value) {
// Don't overload vars
if (static::getenv($name) === false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have overloading vars as a second param for this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a good idea thank you. :)

Copy link
Member

@sminnee sminnee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed, few stylistic comments. Hand-rolling our own dotenv implementation surprised me a bit – we're trying to reduce wheel reinvention. Sure it's not lengthy code, but it's the kind of regex soup that can end up having bugs.

*
* @param string $string Setting to assign in KEY=VALUE or KEY="VALUE" syntax
*/
public static function putenv($string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this to be our API rather than setEnv($k, $v)? Who would you recommend calls putenv?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having looked at the other code that calls it, it seems making the setEnv API public and removing putenv would be more appropriate to developers.

Copy link
Contributor Author

@tractorcow tractorcow Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ensure that putenv() promises to be a drop-in replacement for the core putenv. Mirroring the API means that users don't have to do any string-manipulation in their usercode. As a developer I love the peace of mind of an api promising to work the same way; Less for me to think about and re-test. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value in that. People will still have to change their code, and if they're going to change their code we may as well give them the API they're more likely to want, which is setEnv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at all our putenv calls we're doing string concatenation in them now, so providing setEnv() isn't going to force people to do string manipulation.

If we had 5 years of code built around putenv there might be a stronger argument for it, but we don't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fair enough.

Shall I make putEnv and setEnv both public?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable compromise.

* @param string $name
* @return mixed Value of the environment variable, or null if not set
*/
public static function getenv($name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this camel case, so getEnv()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This api mirrors the old global getenv, so I've opted to maintain the semantics as close as possible, including case name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it getEnv

* @param string $path Path to the file
* @return array List of values parsed as an associative array
*/
public static function putenvFromFile($path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we reimplemented dotenv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much!

Copy link
Member

@sminnee sminnee Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to retain a dotenv dependency (either vlucas or symfony) and make reference to its loader within Environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look into symfony.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symfony seems to do exactly the same thing; symfony dotenv simply uses core-php getenv. Given our implementation is small and works for us, I suggest to leave it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -58,39 +57,39 @@
}

// Allow a first class env var to be set that disables .env file loading
if (!getenv('SS_IGNORE_DOT_ENV')) {
if (!Environment::getenv('SS_IGNORE_DOT_ENV')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably still be getenv, otherwise it's a bit of a circular dependency, and this var can't be set by .env

Copy link
Contributor Author

@tractorcow tractorcow Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behaviour is required in case user-specific bootstrap files set environment vars.

Note that Environment::getenv doesn't actually require .env files to be loaded; It also acts as an accessor to the underlying system environment. This just maintains consistency between all getenv accessors.

The risk of using old getenv is that if a user does custom loading of env vars in user-code or custom index.php, then that call is going to be vulnerable to the same race condition we are trying to address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

@tractorcow tractorcow force-pushed the pulls/4.0/environment-getenv branch from 301a09b to 8e6bb43 Compare October 16, 2017 23:59
@tractorcow
Copy link
Contributor Author

Ok, all feedback has been implemented and also on all other repos.

@@ -32,13 +32,18 @@ You can set "real" environment variables using Apache. Please

## How to access the environment variables

Accessing the environment varaibles is easy and can be done using the `getenv` method or in the `$_ENV` and `$_SERVER`
super-globals:
Accessing the environment varaibles should be done via the `Environment::getenv()` method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

referencing old API, maybe double check all docs :)

@@ -486,6 +486,8 @@ SS_BASE_URL="//localhost/"
The global values `$database` and `$databaseConfig` have been deprecated, as has `ConfigureFromEnv.php`
which is no longer necessary.

To access environment variables you can use the `SilverStripe\Core\Environment::getenv()` method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same in changelogs

@@ -58,39 +57,39 @@
}

// Allow a first class env var to be set that disables .env file loading
if (!getenv('SS_IGNORE_DOT_ENV')) {
if (!Environment::getEnv('SS_IGNORE_DOT_ENV')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revisiting what @sminnee said - this should only ever be set by a global bonefido environment variable that won't change from execution to execution. As the comment says "Allow a first class env var to be set that disables .env file loading" but this allows second-class / psuedo environment variables to manipulate/unset this value which is the opposite of what's intended here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this allows second-class / psuedo environment variables to manipulate/unset this value which is the opposite of what's intended here.

getenv allows exactly the same if you use putenv, so what's the introduced risk that didn't exist before?

// If PHP is running as an Apache module and an existing
// Apache environment variable exists, overwrite it
if (function_exists('apache_getenv') && function_exists('apache_setenv') && apache_getenv($name)) {
apache_setenv($name, $value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the preference for handrolling our own dotenv work when this is practically identical to using Dotenv\Loader::setEnvironmentVariable and Dotenv\Loader::getEnvironmentVariable and means we don't have to maintain this code :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the setEnvironmentVariable is in the library: https://github.com/vlucas/phpdotenv/blob/828d19e597052ddeee65890bb2b1a0912d79fea8/src/Loader.php#L353-L375

It's super close to what we have just we also have an extra layer of the local static $env too.

Copy link
Contributor Author

@tractorcow tractorcow Oct 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just we also have an extra layer of the local static $env too.

That is the critical point of this PR. That's not simply a cache, that's the key feature that solves the process isolation issue.

@@ -0,0 +1,13 @@
# Test file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests for nested variables, eg:

TEST="value"
TEST_AGAIN="$TEST"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write any code to support it either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are nested variables a thing? :o

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't let this be a thing... at least not here :P

Copy link
Contributor

@NightJar NightJar Oct 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts exactly. We're not implementing a POSIX shell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a feature of the dotenv lib so if we're replacing it like-for-like it needs to be here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored this. ;)

@dhensby
Copy link
Contributor

dhensby commented Oct 17, 2017

@zanderwar

The external library is not intended for production, so using it in the first place only satisfies the environment when in development albeit is not in require-dev and as such is adding unnecessary bloat to production.

I don't see how lifting the code from the library and putting it in our codebase makes this any more suitable for production. If we feel that this PR solves the problem, then using the built in methods (instead of php's getenv and putenv) will solve the problem too.

IMHO the dotenv dependency should be dropped entirely for methodology like this. Even if I am being biased here because this resolves my issue of wanting to use the .env file like laravel/symfony allow flawlessly

Laravel uses the same library for their .env so if you're happy with how laravel works then there should be no complaints. We could move to symfony's if it's more robust/resilient.

@tractorcow
Copy link
Contributor Author

tractorcow commented Oct 17, 2017

I don't see how lifting the code from the library and putting it in our codebase makes this any more suitable for production.

We aren't lifting it, we have made a completely unique implementation. We store the vars in a static $env to protected it from cross-process pollution (which apparently $_ENV and both $_SERVER were subject to in testing).

so if you're happy with how laravel works

I'm not happy, as even laravel / symfony suffer from the risks documented at https://gist.github.com/progmars/1e545d96dd48676a2aa7

@tractorcow
Copy link
Contributor Author

Maybe someone who has an environment which can reproduce this issue (e.g. @zanderwar ) could take over development of this fix and see if they can minimise the fix to one that still solves the issue, but maybe without removal of the existing backend dotenv library?

@tractorcow
Copy link
Contributor Author

Maybe we should request feedback from one of the laravel devs; Maybe they can provide some insight that I haven't seen.

@tractorcow
Copy link
Contributor Author

tractorcow commented Oct 17, 2017

Well, I've tracked down the appropriate laravel discussion.

laravel/framework#8191 (comment)

You should avoid calling env()in your application.

The way this is addressed in laravel is that the result of the combined config is cached, and thus there is no race condition when setting / loading env() vars, because getenv() is rarely called, and loading .env files is only done to populate this cache.

In the case of SilverStripe, what we could do is use .env to populate (and cache) config, and we use the config API.

We already use the back-tick operator to interpolate env / constants into config. What we could do is actually cache these within the config manifest (rather than lazy-evaluating on each request).

For example, this is how the default cache factory sets the temp path (cache.yml)

---
Name: corecache
---
SilverStripe\Core\Injector\Injector:
  SilverStripe\Core\Cache\CacheFactory:
    class: 'SilverStripe\Core\Cache\DefaultCacheFactory'
    constructor:
      args:
        directory: '`TEMP_PATH`'

This could be solved in a number of ways:

Option 1

  • Move all used env vars to a config var on some relevant class
  • Update config module to interpolate these values during flush / cache generation
  • Replace all getenv() calls to the appropriate config calls

Option 2

Alternatively, we could not interpolate these variables, but instead create a separate config transformer which populates config directly from the environment into a holder class (e.g. SilverStripe\Core\Environment). Thus getenv('SS_DATABASE_CLASS') becomes Environment::config()->get('SS_DATABASE_CLASS').

@tractorcow
Copy link
Contributor Author

@dhensby what do you think about these other options?

@sminnee
Copy link
Member

sminnee commented Oct 17, 2017

We're close to an RC1 now is not the time to be radically re-thinking APIs.

Are we all 👍 on Environment::setEnv($key, $value) and Environment::getEnv($key) as the public API for this?

I love the idea of relying on the config's backtick system as much as possible, but that's a styleguide question, we'd still need the public API. I think that making more use of the backtick system is something that we can look at incrementing towards in 4.x.

I, like Dan, questioned why we rewrote the dotenv package instead of just including it as a dependency, but I don't think that an argument over that should have anything to do with the public API we expose. And, because of that, we could change this in 4.1 if we want.

I don't think that we should expose Dotenv\Loader::setEnvironmentVariable and Dotenv\Loader::getEnvironmentVariable as our public API as it ties us to that specific implementation, and if we realise (like we did a few days ago) that we have a problem with their API then we're stuffed until SS5. However, I think that Environment::setEnv() calling these methods behind the scenes would make a lot of sense.

Re-reading, I can see that Damian has found specific issues with the 3rd party implementations. I would say that we bless our current public API, go with Damian's implementation for now, and if we want, we can look at:

  • an upstream PR to one of the dotenv implementations that improves it in the way that Damian has done here
  • a 4.x change to SilverStripe to remove our internal implementation and switch back to a 3rd party dependency.

@tractorcow
Copy link
Contributor Author

@sminnee pro-tip; If we switch to Environment::getEnv() now, we can still switch the back-end cache at some later point (e.g. 4.1).

$_ENV['SS_DATABASE_CLASS'];
$_SERVER['SS_DATABASE_CLASS'];
use SilverStripe\Core\Environment;
Environment::getenv('SS_DATABASE_CLASS');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnv


```php
use SilverStripe\Core\Environment;
Environment::putenv('API_KEY="AABBCCDDEEFF012345"');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setEnv('API_KEY', 'AABBCCDDEEFF012345')

}
use SilverStripe\Core\Environment;
$env = BASE_PATH . '/mysite/.env';
Environment::putenvFromFile($env);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putEnvFromFile

@tractorcow
Copy link
Contributor Author

I'll fix up the docs now. :)

@@ -100,8 +100,7 @@ public function testModuleRules()

public function testEnvVarSetRules()
{
$loader = new Loader(null);
$loader->setEnvironmentVariable('ENVVARSET_FOO', 1);
Environment::putEnv('ENVVARSET_FOO=1');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setEnv('ENVVARSET_FOO', 1)

@tractorcow
Copy link
Contributor Author

I'll defer to your better experience in this case. :)

@sminnee
Copy link
Member

sminnee commented Oct 18, 2017

So in summary:

  • Environment::getEnv() uses getenv() as its starting point
  • .env files are loaded into static properties, and don't rely on putenv()
  • We recommend system-defined environment variables and not .env in prod environments

Right?

Perhaps we ditch the putenv() call to make it obvious that you should use Environment::getEnv() in your application code and not getenv(). In particular, if you're running with a .env file that means it will fail consistently, instead of only failing in a race condition, which seems like better developer experience.

@dhensby
Copy link
Contributor

dhensby commented Oct 18, 2017

I don't think that we should expose Dotenv\Loader::setEnvironmentVariable and Dotenv\Loader::getEnvironmentVariable as our public API as it ties us to that specific implementation, and if we realise (like we did a few days ago) that we have a problem with their API then we're stuffed until SS5. However, I think that Environment::setEnv() calling these methods behind the scenes would make a lot of sense.

This is my view. We can use our own API which wraps the DotEnv library and even adds the local caching layer, but we shouldn't be removing it as a dependency at this time.

It also seems that this fix doesn't actually solve the problem anyway as we still rely on putenv (and getenv to an extent) which aren't thread-safe...

@tractorcow
Copy link
Contributor Author

I may update so that Environment::setEnv() simply sets the $env static, and we don't modify the environment at all.

@tractorcow tractorcow force-pushed the pulls/4.0/environment-getenv branch from 008bb3a to 883dda6 Compare October 18, 2017 21:37
@tractorcow
Copy link
Contributor Author

One more iteration, with these changes:

  • Move env file loading into EnvironmentLoader. It's <120 lines so I don't feel too bad about replicating dotenv library logic.
  • Implement variable interpolation at @dhensby's request.
  • No longer uses putenv(). Environment::setEnv() now exclusively sets a protected static var, which is used as the authoritative source for vars loaded from .env.
  • getenv() no longer loads variables loaded in this request; Environment::getEnv() must be used by ss applications.

@tractorcow tractorcow force-pushed the pulls/4.0/environment-getenv branch 2 times, most recently from 4dcfc5b to be85c99 Compare October 18, 2017 21:49
@sminnee
Copy link
Member

sminnee commented Oct 20, 2017

Final version looks good to me; let's merge this, assuming that it works for @zanderwar and @NightJar

@NightJar
Copy link
Contributor

NightJar commented Oct 20, 2017

MY_SECRET="has a \" in it; because #edge-cases dominate my life."
This solution is well good enough for now, but I think we've lost sight of the goal a fair bit.
The goal of a .env file is to shift named strings into memory. No more no less. The same as getenv() would do. Is there any particular reason we need type checking & parsing, nested variables or any of the other features an external library such as dotenv provides us, and we've gone for feature parity with?

@sminnee
Copy link
Member

sminnee commented Oct 20, 2017

@NightJar does the current solution fix the bugs that you experienced?

Also: in your comment above you didn't say whether or not that edge-case worked.

@tractorcow
Copy link
Contributor Author

I think supporting escaped quotes can be a later release fix, if necessary.

@tractorcow
Copy link
Contributor Author

tractorcow commented Oct 20, 2017

Shall we escew dotenv / parse_ini_string() for say, one of the below:

Unlike the prior library, these support the ability for us to parse and extract, but not populate putenv.

@dhensby @sminnee opinions?

I'm not really interested in complicating my regexp any more.

@tractorcow tractorcow force-pushed the pulls/4.0/environment-getenv branch from be85c99 to a5d8fcb Compare October 20, 2017 03:21
@tractorcow
Copy link
Contributor Author

Switched out env parser for m1/env. @dhensby see, no internal-env-parser implementation anymore. :) And feature parity.

@NightJar
Copy link
Contributor

Attempting to test.

@NightJar
Copy link
Contributor

This patch; good news.
Can replicate without.
Cannot replicate with.

Copy link
Contributor

@zanderwar zanderwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working for me 100%

@sminnee sminnee merged commit b9cb1e6 into silverstripe:4.0 Oct 20, 2017
@tractorcow tractorcow deleted the pulls/4.0/environment-getenv branch October 20, 2017 05:43
@sminnee
Copy link
Member

sminnee commented Oct 20, 2017

Merged, and merged 4.0 -> 4 as well on cms and framework

@tractorcow
Copy link
Contributor Author

Yay thank you. :)

@dhensby
Copy link
Contributor

dhensby commented Oct 20, 2017

Yay

@GrahamCampbell
Copy link

The current version of phpdotenv is robust and thread-safe. The following example will work in a multithreaded environment. This avoids using the adapter that would have called putenv and getenv, which are not threadsafe.

<?php

use Dotenv\Environment\Adapter\EnvConstAdapter;
use Dotenv\Environment\Adapter\ServerConstAdapter;
use Dotenv\Environment\DotenvFactory;
use Dotenv\Dotenv;

$factory = new DotenvFactory([new EnvConstAdapter(), new ServerConstAdapter()]);

Dotenv::create($path, null, $factory)->load();

@tractorcow
Copy link
Contributor Author

That's excellent, thank you @GrahamCampbell. I think the approach we've gone with is the best for us, however, but we appreciate your getting back to us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants