Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split off metadata #21

Merged
merged 18 commits into from
Jan 10, 2015
Merged

Conversation

jrfnl
Copy link
Contributor

@jrfnl jrfnl commented Sep 28, 2014

As - unlike with the Headers class -, the static methods in the Package class can not easily be overloaded, I'd suggest splitting them off into a separate Metadata class which can be overloaded.

This will make it a lot easier to add additional information to the metadata for a package.
All you need to do is extend the Metadata class and overload the $cacheTime $headerMap, $readmeMap and/or constructor to add additional data.
You will of course also need to overload the Package::fromArchive() method to use your extended class rather than the original, but as it's now a small method, it doesn't break the upgrade path as much as before.

Simple example code for how this could now be overloaded:

class My_Metadata extends Wpup_Metadata {

    protected $headerMapExtra = array(
        'Text Domain' => 'textdomain',
    );

    protected $readmeMapExtra = array(
        'short_description',
    );

    public function __construct( $slug, $filename, Wpup_Cache $cache = null ) {
        $this->headerMap = array_merge( $this->headerMap, $this->headerMapExtra );
        $this->readmeMap = array_merge( $this->readmeMap, $this->readmeMapExtra );

        parent::__construct( $slug, $filename, $cache );
    }

    protected function extractMetadata() {
        $metadata = parent::extractMetadata();
        if ( ! is_array( $metadata ) ) {
            return null;
        }

        // Add additional metadata
        $metadata['type'] = $this->packageInfo['type'];

        return $metadata;
    }
}

This will make it a lot easier to add additional information to the metadata for a package.
All you need to do is extend the Metadata class and overload the $cacheTime $headerMap, $readmeMap and/or constructor to add additional data.
You will of course also need to overload the Package::fromArchive() method to use your extended class rather than the original, but as it's now a small method, it doesn't break the upgrade path as much as before.
… split-off-metadata

Conflicts:
	includes/Wpup/Package.php
@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 25, 2014

@YahnisElsts Just wondering whether you'd had a chance to have a look at this one yet. Hope all is well ;-)

@YahnisElsts
Copy link
Owner

Thanks for the reminder!

This will make it a lot easier to add additional information to the metadata for a package.

That's a worthy goal, but I'm not convinced that introducing another class is the best way to achieve it. Some thoughts:

  • You could make Wpup_Package easier to overload by swapping a couple of self keywords with static.
  • Right now, there's one class that handles metadata parsing, mapping, caching, and minor utility stuff (file size, modification time, etc). With this change, there would be two classes: one that handles metadata parsing, mapping, caching and some file related things, and one that handles minor utility stuff (file size, etc). It makes Wpup_Package such a thin wrapper around Wpup_Metadata that you might as well merge them back into one class.
  • Extracting the header and readme field mappings as separate properties is a good idea, but you don't need a new class for that.
  • In addition to overloading Wpup_Package::fromArchive(), you would also need to overload Wpup_UpdateServer::findPackage() to use your package subclass.

I would propose something like this instead:

  • Make it easier to extend Wpup_Package. This way we can cover a wider range of possible use cases, like loading metadata from the database, parsing files in SVN, automatic packaging, and so on.
    • Replace self with static.
    • Extract metadata mapping as a separate static method (e.g. mapMetadata($packageInfo)). This both simplifies fromArchive and gives subclasses an easy way to change the mappings in arbitrary ways.
    • Extract header and readme field maps as separate properties, as in your current implementation.
  • You would still need to overload Wpup_UpdateServer::findPackage(). If that's too much, add some kind of a protected $packageFileLoader = array('PackageClass', 'fromArchive') property and change findPackage() to call it with call_user_func().

What do you think?

@jrfnl
Copy link
Contributor Author

jrfnl commented Dec 4, 2014

Hi @YahnisElsts, Sorry for my slow reply, I was away at a conference.

I had to dig back into my memory why I choose this implementation.

From looking at the code, I believe I choose this implementation for three reasons:

  • Separation of concerns
  • Maintaining backward compatibility
  • Ease and elegance of overloading

The main thing is that the Package object in this implementation stays the same as before (methods, properties), just how it retrieves the metadata changes.

In addition to overloading Wpup_Package::fromArchive(), you would also need to overload Wpup_UpdateServer::findPackage() to use your package subclass.

True, forgot to mention that. We're overloading that already as we do version negotiation there ;-)

You could make Wpup_Package easier to overload by swapping a couple of self keywords with static.

True, but the methods would still be static, i.e. the properties would also all need to be static for the static method to have access to them and then you'd get in quite a complicated mess of static vs self in accessing those properties.

The other alternative would of course be to make the relevant methods non-static within the Wpup_Package class.
If I would want to do this in a clean way it would break backward compatibility as the constructor would change. Alternatively it would become a bit messy with methods and arguments becoming deprecated.

What do you think ? Shall I code out the alternative and send it in as a separate PR for review ?

@jrfnl
Copy link
Contributor Author

jrfnl commented Dec 14, 2014

@YahnisElsts So... what do you think ?

@YahnisElsts
Copy link
Owner

Huh, I could swear I replied to this thread a week ago. Fortunately, I still remember most of what I had (or thought I had) written.

Separation of concerns

There is that. I guess this particular case just doesn't hit my personal threshold for "needs better separation of concern".

Maintaining backward compatibility
[...]
The main thing is that the Package object in this implementation stays the same as before (methods, properties), just how it retrieves the metadata changes.

The alternative I mentioned would provide a similar level of backwards compatibility. The Wpup_Package class would get a couple of new methods and fields, but it would also keep its existing methods and their behaviour. The interface would be preserved.

Ease and elegance of overloading
[...]
True, but the methods would still be static, i.e. the properties would also all need to be static for the static method to have access to them and then you'd get in quite a complicated mess of static vs self in accessing those properties.

It seems I'm missing something here. What is the problem with overloading static methods? The only scenario that I can think of where it would complicate things is if you were doing some kind of dependency injection, and even then you could probably inject a class instead of an instance.

As for static vs self and making a mess, the current implementation uses self in only three places. Are there any downsides to just replacing them all with static?

What do you think ? Shall I code out the alternative and send it in as a separate PR for review ?

Sure.

@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 6, 2015

Sorry about not doing anything with this, but I keep struggling with what you propose.

The choice for sticking with static methods and even adding static properties in this case is IMHO just wrong.

The split into two classes is justified as - as documented in the original class - metadata can be retrieved from other sources (db/config file). The implementation I propose(d) in this PR makes it simple to either swap out the Wpup_Metadata class for a completely custom implementation for retrieving the metadata or to overload it like shown in my original post - all while maintaining full backward compatibility.

Your suggested implementation on the other hand would make it more complicated rather than simpler.

Sorry, but I stand by my original PR.

@YahnisElsts
Copy link
Owner

Nitpick: Still, what's the problem with static methods and static properties? Is it a style issue? Why is it better to use instance methods in this case?

@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 6, 2015

Static methods and properties have their place in programming, just not in this situation.
@mathiasverraes has explained this much better than I could in this blogpost.
What it comes down to is that the methods in the PR could have widely different implementations (taking the data from a database, config files, which metadata will be available) and are therefore not stateless nor entirely predictable.

Here are some more articles which argue in a reasonably balanced manner when to use static and when not (even though the titles are misleading):
http://simpleprogrammer.com/2010/01/29/static-methods-will-shock-you/
http://www.phparch.com/2010/03/static-methods-vs-singletons-choose-neither/
http://kore-nordmann.de/blog/0103_static_considered_harmful.html

@YahnisElsts
Copy link
Owner

Note: This ended up being pretty long.

Regarding static methods

Thanks for the links. Those articles make some good points.

It looks like the first blogpost says that the problem with static methods is shared global state and side effects. The author mentions that static methods are appropriate for implementing stateless and/or purely functional algorithms. I can agree with that.

Personally, I'd even go as far as to argue that any stateless task should be implemented statically because it makes code easier to read. Given a well-written static method, you can determine its inputs and outputs just by looking at the method signature. You don't have to worry that its behaviour might change in some unexpected way based on some hidden variable. For an instance method, you need to actually read the entire function body to see how it interacts with instance properties.

Both "Static Methods Will Shock You" and "static considered harmful" mention another case where using static methods is valid: factory methods. I can agree with that, too. This is one of the reasons why Wpup_Package::fromArchive is static - it's essentially a factory method.

"static considered harmful" also spends a lot of time explaining how static methods introduce dependencies that are hard to change without changing the original source code. It suggests dependency injection as a better alternative. I'd say this is generally a good idea, though one should be careful to avoid overengineering. Not every project needs an AbstractFactoryFactory.

Then there's the testing/mocking argument, which is basically a special case of "dependency injection is better".

How that applies to metadata

I think my main cause for disagreement on this PR could be summarized like this: extracting ZIP metadata is

  1. Mostly a stateless process (aside from caching). Given the same file contents you'll always get the same result. That's why static methods seem appropriate.
  2. A reasonably simple task that's highly relevant to the only currently supported package type (ZIP files). So refactoring it as a separate class looks like an unnecessary complication. It trades two methods with a combined ~100 LOC for a class with 200+ LOC and a more complex execution flow.

On the other hand, it's true that the above points might not apply to other implementations (loading metadata from the database, etc). And maybe some users that need to make complex changes to the metadata would benefit from a non-static implementation.

All right, lets make a separate metadata class.

Practical suggestions

Should Wpup_Metadata be a highly general class than can be used/subclassed by most developers, or a more focused utility class that people can inherit from when they want to customize how the server parses ZIP files? Right now it's looking more like the latter. Someone who needs to load metadata from a different source probably wouldn't gain much by resuing it, and would instead either write a completely independent implementation or subclass Wpup_Package.

Ideas/options:

  1. Rename Wpup_Metadata to something like Wpup_ZipMetadataParser to better represent what it does. Skip adding a general metadata class, at least until there's another metadata source. I'd prefer this option.
  2. Make a general Wpup_Metadata class that only contains stuff that would apply to all packages. Make a separate subclass that deals with ZIP files specifically. Downside: I suspect the base class would end up being very bare and not that useful.

Regarding the current implementation:

  • Why is setMetadataFromArchive() public? It's not called from anywhere outside the class.

  • This bit looks suspicious to me:

    $meta = array();
    $meta = $this->getInfoFromHeader($meta);
    $meta = $this->getInfoFromReadme($meta);

    Someone reading the code would have to look at the implementation of each method to see what exactly they're doing with $meta and what they return (maybe they transform it into an object? maybe the call sequence matters?). In fact, passing $meta as an argument is unnecessary since the methods are mutually independent (AFAICT) and don't use its contents. This fragment could be rewritten like this:

    $meta = array_merge(
        array(),
        $this->getInfoFromHeader(),
        $this->getInfoFromReadme()
    );
    /* Also modify both methods to use a temp array. */

    Now it's obvious that $meta is always an array, and that the methods add something to that array.

@jrfnl jrfnl mentioned this pull request Jan 7, 2015
@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 7, 2015

@YahnisElsts Thank you for taking the time and your detailed feedback.

Someone who needs to load metadata from a different source probably wouldn't gain much by resuing it, and would instead either write a completely independent implementation or subclass Wpup_Package.

In which case that custom implementation is now easier to call what with the $packageFileLoader property in the UpdateServer class which I implemented based on your suggestion.

  1. Rename Wpup_Metadata to something like Wpup_ZipMetadataParser

Done.

  1. Make a general Wpup_Metadata class that only contains stuff that would apply to all packages.

I considered it, but as you say it would be extremely bare, so I left that out for now until there turns out to be a call for it. Would be easy enough to implement anyway.

Why is setMetadataFromArchive() public? It's not called from anywhere outside the class.

Artefact from the original static implementation. Fixed.

This bit looks suspicious to me: ... ... Now it's obvious that $meta is always an array, and that the methods add something to that array.

As it's now a class and the data gets send to the $metadata property anyway, I've now refactored the code to use that property everywhere rather than passing the data around all the time.

I've also abstracted out the code a bit more. The original implementation stayed very close to the original methods. I've now refactored it to be more like a proper class ;-)

Hope you approve of the changes.

@YahnisElsts
Copy link
Owner

Some thoughts:

Since this is now a class and not a public method, there's no need for extractMetadata() to report an error by setting $this->metadata to null. Just let it throw new Wpup_InvalidPackageException directly and get rid of the null check in createMetadataCache.

On a related note, I disagree with the idea of splitting the cache-related stuff into setMetadataFromCache and createMetadataCache. Caching is an optional optimization layer. I think it should be as unobtrusive as possible and most code shouldn't know or care about the cache. Calling the main entry point to the parser logic setMetadataFromCache gives caching far too much weight.

Also, in my experience, this is a fairly common pattern when working with cache:

  • Try to retrieve data from the cache.
    • On success, return it and exit.
  • Generate data the slow way.
  • Store it in the cache.

These are all closely related operations. Why split off the last two?

Another thing I noticed is that setHeaderFields and setReadmeFields are very similar. Both of them copy values from one array to another based on a list of keys. It would be possible write a more general method that maps specific keys from an input array to $this->metadata. Something like setMetadataFields($sourceArray, $keyMap) (names need work).

setReadmeSections is also very similar, except it transforms key names and has no filter, accepting all keys/sections. A more complicated version of setMetadataFields could handle that as well.

Finally, a minor style issue: in extractMetadata, that else should look like this: } else {.

@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 8, 2015

Thanks for your feedback again. I agree with most of your points and probably would have done those things already if I wasn't working on ten things at the same time ;-)
I've updated the PR incorporating most of your points.

setReadmeSections is also very similar, except it transforms key names and has no filter, accepting all keys/sections. A more complicated version of setMetadataFields could handle that as well.

I've looked at combining that into the new method as well, but with there being three differences between how things would need to be handled, it would actually result in a lot more code, rather than less.
The three differences being:

  • no map, everything is accepted
  • the str_replace() for the section name
  • the sections going into a 'sections' subarray within $metadata

Hope we got everything now.

* @param array $map The key mapping for that sub-array where the key is the key as used in the
* input array and the value is the key to use for the output array
*/
protected function setMappedFields( $input, $map ){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No spaces around method parameters, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, sorry, too used to using the WP coding standards.

@YahnisElsts
Copy link
Owner

I've looked at combining that into the new method as well, but with there being three differences between how things would need to be handled, it would actually result in a lot more code, rather than less.

In theory, you could make the mapping function handle the differences by making it more general and extracting filtering/key mapping logic to a separate method or closure:

protected function mapFields($input, $map, $transform){
    $output = array();
    foreach($input as $fieldKey => $value) {
        $metaKey = call_user_func($transform, $fieldKey, $map);
        if ( $metaKey !== null ) {
            $output[$metaKey] = $value;
        }
    }
    return $output;
}

/* This is the default $transform */
protected function getMappedKey($fieldKey, $map) {
    return isset($map[$fieldKey]) ? $map[$fieldKey] : null;
}

/*...*/

$this->metadata['sections'] = $this->mapFields(
    $this->packageInfo['readme']['sections'],
    array(),
    function($sectionName, $dummy) {
        return str_replace(' ', '_', strtolower($sectionName));
    }
);

... but it is indeed more code.

@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 9, 2015

@YahnisElsts So are we good to go now ?

YahnisElsts added a commit that referenced this pull request Jan 10, 2015
Split off metadata parsing as a separate class
@YahnisElsts YahnisElsts merged commit f25c8cc into YahnisElsts:master Jan 10, 2015
@jrfnl
Copy link
Contributor Author

jrfnl commented Jan 10, 2015

Thank you!

@jrfnl jrfnl deleted the split-off-metadata branch January 10, 2015 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants