Abstract object storage interaction #35

Open
misterbisson opened this issue May 24, 2016 · 17 comments

@misterbisson
Contributor

Given our goal of enabling local development and portability, we might consider abstracting the object storage interaction out of manage.py inside the MySQL container.

In an earlier offline conversation, I'd proposed writing the MySQL backups to a container serving WebDAV, accessed via https://github.com/amnong/easywebdav (or some other non-filesystem client library). The WebDAV container could then own the responsibility of interacting with the object store.

This would work on a laptop without any internet connection, and in private clouds where there's no intention of sending the backups off-site or of setting up a local object store.
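
To make this concrete, here's a rough sketch of what the client side might look like with easywebdav (the hostname, credentials, and paths are made up for illustration, and none of this is wired into manage.py):

    import easywebdav

    # connect to the WebDAV sidekick; host and credentials here are placeholders
    dav = easywebdav.connect('webdav.svc.example', port=80,
                             username='backup', password='example-secret')

    # make sure a directory exists for this database's snapshots
    dav.mkdir('mysql-backups')

    # push a freshly written snapshot, then pull it back down on a replica
    dav.upload('/tmp/backup/snapshot.tar.gz', 'mysql-backups/snapshot.tar.gz')
    dav.download('mysql-backups/snapshot.tar.gz', '/tmp/restore/snapshot.tar.gz')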

@tgross
Contributor

tgross commented Aug 9, 2016

Could we use the NFS container that you have in autopilotpattern/wordpress instead? There's an existing Manta-NFS that would be the production-ready drop-in replacement.

@misterbisson
Contributor Author

misterbisson commented Aug 9, 2016

> Could we use the NFS container

autopilotpattern/nfsserver depends on --privileged in Docker, an issue that raised enough eyebrows among everybody involved that I added the following note:

> This is not recommended for production use
>
> Server and client containers need privileged on Linux hosts (though not on Triton, which supports this securely). This may not be a solvable problem. Docker volume drivers are probably the best recommended work around. On Triton, RFD26 will provide network shared filesystems to Docker containers using Docker volume syntax.

We might be able to replace this with an RFD26 volume, but we should have a solid answer for how to use it in non-Triton environments.

@tgross
Contributor

tgross commented Feb 9, 2017

In autopilotpattern/nginx#42 we've moved the deployment of the application into an examples repo that can target different environments. When we implement that for this repo we'll definitely want this storage abstraction as well.

@misterbisson
Contributor Author

misterbisson commented Mar 6, 2017

This ticket includes mention of WebDAV and NFS. Between the two, NFS is easier in some ways, but WebDAV might be the better choice.

  1. It's easier to use WebDAV without mounting it as a filesystem, making it more portable. Indeed, many of the semantics match object storage usage.
  2. WebDAV over SSL on the public internet is sane, but NFS over the public internet isn't.

Thoughts?

In terms of goals, I realize there are a few things we might want to achieve here, and they might go well together:

  1. Abstract the object storage component as described in the OP
  2. Put the backups on a local network for faster PUTs and GETs of them
  3. Though if we can sanely do this in a way that also supports access over the public internet, that might be good too

@tianon

tianon commented Mar 13, 2017

(Hope you don't mind me quoting this publicly so we can discuss it openly @misterbisson ❤️)

> Also, a question I want to pose to the group of us: which part of the interface is pluggable? It sounds like you're suggesting there's replaceable code in the MySQL container, but what if we moved that out to the WebDAV container? That would dramatically reduce the complexity for testing each of those apps and expand the number of different persistence stores that can be supported, but require an extra container (which might then synchronize back to an object store, or something) for every use case.

We've been looking at refactoring the current manta interactions that the MySQL pattern is using, and there is a fairly standard (tiny) set of operations that need to be performed to do a safe, reasonably-atomic backup in the context of MySQL (put and get, with associated Consul interaction, including/especially locking and staleness checking). The idea is to create a simple libbackup.py which can be integrated into any Autopilot implementation, and then they can trivially share backup backend implementations (which should be fairly trivial to write as well -- the Manta integration should be ~10 lines, as an example).
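
To sketch the shape I have in mind (the names here are illustrative, not the actual libbackup.py code):

    class BackupBackend(object):
        """Hypothetical interface a libbackup.py storage backend would implement."""

        def put_backup(self, infile, backup_id):
            """Upload a finished backup file under the given id."""
            raise NotImplementedError

        def get_backup(self, backup_id, workspace):
            """Download the backup with the given id into a local workspace."""
            raise NotImplementedError

    # Manta, S3, WebDAV, and direct-filesystem backends would each subclass this,
    # while the Consul locking and staleness checks live once in the shared library.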

I think adding WebDAV support makes a ton of sense, and should be similarly trivial to include. We were planning on a direct-filesystem implementation as well so that folks running Docker proper can use volumes, NFS, or bind-mounts to store/manage their backups however makes sense for their environment. If we then implement an S3 backend, I think we'll have pretty well covered the majority of the big "content storage" platforms (especially since most of them have either a direct filesystem interface or an S3 knock-off interface).

So to respond directly to your question about creating a separate WebDAV server container which interacts with the storage platforms, I think that's an interesting way to try and force our implementations to stay DRY (and help us not duplicate effort). On the other hand, it will add an extra level of complexity and one more place that things could break down (not to mention that we'll be streaming all our backups over the network twice).

Also, that would mean each autopilot pattern implements the Consul updating/locking in its own unique way, which we've found isn't done 100% consistently even within MySQL today: there are small differences in how different parts of the MySQL implementation update Consul that could lead to races in certain edge cases. A standardized libbackup.py would help mitigate that by forcing a common implementation of the backup patterns.

The main downside we see with what we've come up with so far (putting the logic in MySQL itself) is that keeping the implementations in sync then becomes an irritating problem. One solution we've considered is making a PyPI module for some of this "common" autopilot pattern code (or at least having a common repository that everything syncs to/from, which implements a simple dummy application for testing/verification).

@misterbisson
Contributor Author

@tianon thanks for moving the conversation here.

I think you raise a number of important questions. One of the first I want to address is this:

> We were planning on a direct-filesystem implementation as well so that folks running Docker proper can use volumes, NFS, or bind-mounts to store/manage their backups however makes sense for their environment

This is definitely easier from an implementation perspective, but it's also something I'm increasingly identifying as an antipattern that we need to avoid in our cloud applications. Assuming the filesystem is reliable is the cause of a lot of breakage, and once you make that assumption it's really hard to work around it to recover your apps when failures happen.

> we'll be streaming all our backups over the network twice

I agree, that's definitely a cost. The recent S3 failure has also reminded me how important it is to design for failures. Doubling the network transfers also doubles the opportunity for recovery. That is, systems that depended on S3 failed, but systems that included some diversity didn't. Having a local copy of that data has huge value in those failure situations. (Yes, we are depending on network access to the local cache, but if that network is down we're also assuming the cloud availability zone is down.) Two network transfers gives us three copies of the data:

  1. In the MySQL primary
  2. On the filesystem of the WebDAV container
  3. In whatever storage solution the WebDAV container is backed up to

So, all-in-all, perhaps doubling the network interaction adds a lot of value?

> there are small differences in how different parts of the MySQL implementation update Consul that could lead to races in certain edge cases. A standardized libbackup.py would help mitigate that by forcing a common implementation of the backup patterns

You make a really good point about the challenges of locking and coordination, and that's fair criticism. I get the sense you've spent more time in the messy bits of this code than I have recently, so I definitely don't want to argue with you.

That feels like a problem that needs a solution, but I'm not sure how directly that solution is connected to making backups in the DB container pluggable. In fact, it feels as though pluggability there increases the complexity and surface area. Your suggestion to create a library is a good one, but is that enough?

@tianon

tianon commented Mar 14, 2017

Good points -- yeah, I can see the value of having a separate sidekick WebDAV container as an additional point of backup.

Honestly, I see a lot of value in implementing both the pluggability in MySQL and implementing the WebDAV sidekick, since I see valuable use cases for each (and either way, we need some way for MySQL to dump into WebDAV).

When deploying to Triton specifically, dumping backups directly into Manta is going to be more stable/reliable than using a proxy WebDAV container, isn't it? It seems to me like it'd be a regression (from the perspective of a user of this MySQL implementation) to stray from that by essentially re-implementing a smaller version of Manta itself which does automatic backups to Manta as well.

Would this WebDAV sidekick be responsible for pulling down the backups at launch too? Would it support multi-instance clustering similar to other autopilot implementations? Or would the backups on Manta at that point simply be a backup of the backup, and need to be restored into WebDAV manually to "reseed" a cluster, and WebDAV itself is the "source of truth" for which backups are available for cluster restoration/seeding?

@yosifkit

The backup files in the container in /tmp/backup or wherever are considered part of the backup strategy? I was assuming it was a bug that there was no control over them and that they would grow indefinitely. 😕

Here is part of the idea we had so far. I can push it up to a branch if you'd like a full diff of other changes for more concrete talking points. Part of the idea was to ensure that the download of a backup (and its extraction) or the creation of a backup gets cleaned up after it is used. This could probably be extended to keep X number of backups locally if that is desired.

    from datetime import datetime

    def put_backup(self, backup_func):
        # TODO get lock
        # self.consul...
        # get any previous backup info from consul
        previous_data = self.consul.get_snapshot_data() or {}

        # get the time now so it is consistent throughout the backup process
        backup_time = datetime.utcnow()
        # generate the backup_id from the format string using that time
        backup_id = backup_time.strftime(self.backup_id_fmt)

        # make a working space that the db can use
        # we'll clean it up at the end
        workspace = self.__make_workspace()

        # have the db make a backup
        infile, extra_data = backup_func(workspace, backup_time, previous_data.get('extra'))

        # the database function can return None for the file if it doesn't need to do a backup right now,
        # i.e. it will compare timestamps or binlog position or whatever the db uses to decide whether a backup is needed;
        # anything like a binlog name can be stored in the "extra_data" dict/map return value and it gets saved to consul
        # (extra_data should probably be kept to a minimum)

        if infile:
            # this is one of the functions that would then proxy to whichever backup solution
            self._put_backup(infile, backup_id)
            # store successful backup info in consul
            extra_data = extra_data or {}
            self.consul.record_snapshot({ 'id': backup_id, 'extra': extra_data})

        # TODO release lock
        # self.consul...

        # clean up
        self.__clean_workspace(workspace)

    def get_backup(self, restore_func):
        # TODO locking needed for restore?
        current_data = self.consul.get_snapshot_data() or {}
        backup_id = current_data.get('id')

        if backup_id:
            # make a space for backup storage to download the backup
            workspace = self.__make_workspace()
            # download the backup file
            # this is one of the functions that would then proxy to whichever backup solution
            datafile = self._get_backup(backup_id, workspace)

            # make a new space for db so that it can extract if needed
            db_workspace = self.__make_workspace()
            # have the db restore from the given file
            restore_func(datafile, db_workspace)

            # clean up the workspaces
            self.__clean_workspace(db_workspace)
            self.__clean_workspace(workspace)
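
And just to illustrate the backend side (this isn't in the branch; the class name and paths are made up), the _put_backup/_get_backup hooks could be as small as a local-disk implementation like this:

    import os
    import shutil

    class LocalDiskBackup(object):
        """Illustrative backend: keep backups on a mounted volume or bind-mount."""

        def __init__(self, backup_root='/var/backups'):
            self.backup_root = backup_root
            if not os.path.isdir(self.backup_root):
                os.makedirs(self.backup_root)

        def _put_backup(self, infile, backup_id):
            # copy the finished backup file into the storage root under its id
            shutil.copyfile(infile, os.path.join(self.backup_root, backup_id))

        def _get_backup(self, backup_id, workspace):
            # copy the stored backup into the caller's workspace and return its path
            datafile = os.path.join(workspace, backup_id)
            shutil.copyfile(os.path.join(self.backup_root, backup_id), datafile)
            return datafile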

@tgross
Contributor

tgross commented Mar 15, 2017

@tianon wrote:

> When deploying to Triton specifically, dumping backups directly into Manta is going to be more stable/reliable than using a proxy WebDAV container, isn't it? It seems to me like it'd be a regression (from the perspective of a user of this MySQL implementation) to stray from that by essentially re-implementing a smaller version of Manta itself which does automatic backups to Manta as well.

I agree that's the case, but it's a question of portability across deployment platforms and across blueprints.

@yosifkit wrote:

> The backup files in the container in /tmp/backup or wherever are considered part of the backup strategy? I was assuming it was a bug that there was no control over them and that they would grow indefinitely

They should be removed from the container after uploading. If not, yes that's a bug.

@misterbisson
Contributor Author

@tianon wrote:

> Would this WebDAV sidekick be responsible for pulling down the backups at launch too? Would it support multi-instance clustering similar to other autopilot implementations? Or would the backups on Manta at that point simply be a backup of the backup, and need to be restored into WebDAV manually to "reseed" a cluster, and WebDAV itself is the "source of truth" for which backups are available for cluster restoration/seeding?

These are good questions, and I think you're slyly drawing out the complexity of attempting to run your own object store.

Stale reads of the backup files are clearly a problem we'll need to avoid. My "simple" answer for that is to make the WebDAV container the source of truth. And that also means we'll probably need to run just one sidekick instance per thing-being-backed-up.

For those WebDAV sidekicks that back up to Manta, I'd want to use https://github.com/bahamas10/node-manta-sync and create startup options that support ingesting the contents of a directory as a preStart action. I might even want the ability to ingest the contents of one directory but back up to a different directory, or to do no backups at all after the download, as that would give me significant flexibility in bringing up dev/test environments. The same is obviously also possible with aws s3 sync.

For production use on AWS, I'd probably back the sidekick container with an EBS volume and sync the contents off to S3 in a different region. On Triton I'd do much the same, but use an RFD26 NFS volume (not yet actually available) and Manta. I can imagine myself choosing to back up to object storage solutions from multiple providers, honestly, since storage is cheap and downtime is so expensive. /end uptime paranoia

@tianon

tianon commented Mar 31, 2017

Ok, that's fair (big +1 to letting the WebDAV container be the source of truth for simplicity's sake). I can definitely see the appeal of a backup-storing sidekick, and agree that it's probably a good plan to move forward on that (so MySQL needs a way to get/put backups to a WebDAV endpoint for sure).

I think the question that still remains is whether we want MySQL to still be able to back up directly into Manta (especially since it currently works that way). What's the backwards-compatibility story for users of this image? 😄 @yosifkit has a sane (IMO) design for trivially supporting both Manta and WebDAV directly, and I think it'll help with clarity to abstract some of the "interact with a backup service" code from the "interact with MySQL" code (and, more importantly, it makes sure that existing users are still covered as-is), but we're happy to go either way.

(For my own personal deployments, especially smaller ones, I'd really rather avoid the extra complexity of having a WebDAV container too, so I'd honestly like this whole backup bit to be completely optional as well, but I digress. 😇)

@misterbisson
Contributor Author

> I think the question that still remains is whether we want MySQL to still be able to back up directly into Manta

I'm ready and willing to break backwards compatibility, but...

> For my own personal deployments, especially smaller ones, I'd really rather avoid the extra complexity of having a WebDAV container too, so I'd honestly like this whole backup bit to be completely optional as well

Is the complexity you're hoping to avoid having the WebDAV container at all, rather than whether or not the WebDAV container then backs up elsewhere?

Is this in MySQL or Mongo? The difference is that MySQL won't work without a way to bootstrap replicas with a backup from somewhere, but if your interest is primarily in Mongo, then perhaps we can make the backup behavior optional?

You can probably see that I'm trying to avoid having extra code paths, but I also need to check my assumptions about complexity here. Well, I need to avoid being too dogmatic, anyway.

@misterbisson
Contributor Author

misterbisson commented Mar 31, 2017

One thing that I've been really hand-wavey about is the behaviors of the WebDAV container.

  • Normal runtime behavior
    • Does it have to back up files to an object store at all?
    • Can it simultaneously back up files to multiple object stores?
    • How are the object store credentials configured?
    • How does it handle errors?
    • What's the retention policy for old backups?
  • Bootstrap behavior (corresponding to ContainerPilot's pre-3.0 preStart)
    • Does it/can it fetch files from an object store?
    • If so, does it have to be the same object store or path as used during normal runtime?
    • If so, do we have to assume it doesn't become healthy until the bootstrap is complete?
  • DB host behavior
    • Depending on the bootstrap behavior above, the DB host bootstrap might have to block on the availability of the WebDAV host, at least to maintain compatibility with the current feature in AP MySQL that allows us to import an existing database backup at startup

There are probably other questions as well.

@yosifkit

Let me make sure I understand the two approaches I think we are discussing:

option 1: (mysql -> webdav -> manta/S3/etc)

  • mysql containers talk to a single WebDAV container to get backups for bootstrapping
  • WebDAV container (new) may or may not be configured to also push data to an external storage like manta or S3
  • the backup process would be initiated from one of the mysql servers via the current Consul locking and pushed to WebDAV
  • optionally, keep the existing config to push to Manta without needing the WebDAV container

option 2: (mysql -> manta/S3/WebDAV/etc)

  • mysql containers talk to any configured (via ENV) backup service to get or push backups
    • start with local disk, manta, and WebDAV, with simple interface for extending to other storage services
  • would require its own custom periodic task to sync to another service

While writing this out, I can see that the two are not mutually exclusive; we can use option 2 to abstract away the tight manta integration while adding WebDAV support and also create a WebDAV container that can sync its local storage to another service.
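
For what it's worth, the backend selection in option 2 could be a tiny factory keyed off an environment variable; a rough sketch (the variable name and classes here are placeholders, not what's in our branch):

    import os

    class MantaBackend(object):      # placeholder stubs; the real classes would
        pass                         # implement the shared put/get backup interface

    class WebDAVBackend(object):
        pass

    class LocalDiskBackend(object):
        pass

    def get_backup_backend():
        """Pick a storage backend based on an env var set on the mysql container."""
        service = os.environ.get('BACKUP_SERVICE', 'local')
        backends = {
            'manta': MantaBackend,
            'webdav': WebDAVBackend,
            'local': LocalDiskBackend,
        }
        if service not in backends:
            raise ValueError('unknown BACKUP_SERVICE: {}'.format(service))
        return backends[service]()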

@misterbisson
Contributor Author

One of the most salient concerns I have is for being able to test the MySQL (or other DB) container thoroughly. We're doing a lot of work on that point now, and it's really highlighted how much more complex it would be if we add new code paths inside the DB container.

Option 1 definitely satisfies that while also making it possible to have a separately tested container that can back up to many different locations. Option 2 is what I'm afraid of.

I'll acknowledge that some of the complexity is just traded between the two options, but I believe that narrowly defining the interface between the DB and the backup storage as WebDAV does eliminate some complexity: it lets us trust that the DB will behave the same way regardless of what happens on the other side of the WebDAV container.

@yosifkit

yosifkit commented May 8, 2017

The current work-in-progress is over in https://github.com/infosiftr/autopilotpattern-mysql/tree/generalize-backup. Hopefully we can have it running soon (@moghedrin has started helping as well), and we should be able to reuse much of the additions for a WebDAV service container that backs up to a configured service like Manta or even just local disk.

The flow will be something like this: the MySQL container (using the new backup lib) sends a tarball to the WebDAV container, which saves the file locally and then sends it on to Manta or another service. The plan is also to keep the ability to use Manta (with the possibility of adding other services) directly from the MySQL container, but that path will be provided without full integration testing.
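
To give a feel for the MySQL side of that flow, the backup_func handed to the lib might look roughly like this (illustrative only; the real code in the branch uses the existing snapshot tooling, and backup_manager is a made-up name):

    import gzip
    import os
    import shutil
    import subprocess

    def mysql_backup_func(workspace, backup_time, previous_extra):
        """Dump the database into the workspace; return (infile, extra_data)."""
        # hypothetical example: a plain mysqldump, gzipped into the workspace
        infile = os.path.join(workspace, 'dump.sql.gz')
        dump = subprocess.Popen(['mysqldump', '--all-databases'],
                                stdout=subprocess.PIPE)
        with gzip.open(infile, 'wb') as out:
            shutil.copyfileobj(dump.stdout, out)
        dump.wait()
        return infile, {'method': 'mysqldump'}

    # the lib handles the Consul locking and hands the file to whichever backend
    # is configured, e.g. backup_manager.put_backup(mysql_backup_func)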

@misterbisson and @tgross, once we have a complete solution, I think it would make sense to move the new libbackup stuff into its own lib-autopilot-common-py git repo so that it can be used easily in the mongo and postgresql autopilots, as well as the WebDAV server. I hope in the future we can also include containerpilot.py and other functions that are not specific to a particular database.

Does that sound amenable? Anything you want to change at this beginning stage? Any deadlines that you want/need to achieve?

Note: lib-autopilot-common-py is just a random suggestion, bike-shedding welcome 😉

@tgross
Contributor

tgross commented Jun 5, 2017

> The plan is also to keep the ability to use Manta (with the possibility of adding other services) directly from the MySQL container, but that path will be provided without full integration testing.

This seems like it doesn't get us much; the primary reason to abstract the storage component was to isolate the testing. If we keep the ability to use Manta directly from the MySQL container we still have to have all the full integration testing -- having untested components isn't going to fly.

> I think it would make sense to move the new libbackup stuff into its own lib-autopilot-common-py git repo so that it can be used easily in the mongo and postgresql autopilots, as well as the WebDAV server.

The WebDAV server should certainly be in its own repo. I'm not so sure about making the backup component a library, though; once you've pushed it into the WebDAV server it's a pretty simple client, and having it as a library makes it harder to iterate on the design across multiple blueprints.

> I hope in the future we can also include containerpilot.py and other functions that are not specific to a particular database.

I find it slightly terrifying that we've been copying these components into databases that have completely different semantics for clustering and replication. 😀 Maybe we get the individual blueprints into a mature and production-ready shape before we try factoring out abstractions that might just prove to be the wrong abstractions?
