storage/cloud: create user scoped "from client" file upload for easy imports (fka make 'nodelocal' safe for multi-user clusters) #47211
Notes after talking with @dt and @miretskiy and exploring relevant code. Using

To address these concerns we feel that instead of incrementally making

The new command will reuse much of the logic to transfer a local file over pgwire, as well as the existing inter-node blob communication service. The main differences are highlighted below:

Instead of using

Another difference between

While

The eventual goal of developing
Outstanding questions:
…ageBuilder

Previously, the ExternalStorage factory methods were initialized on creation of a NewServer(). These closures were then plumbed through various components such as sqlServer, storeConfig, etc. The closures were invoked whenever an ExternalStorage object needed creation. This refactor maintains the semantics of the ExternalStorage factory methods, but introduces an ExternalStorageBuilder. This object implements the factory methods and stores the relevant configuration parameters. The main goal of this refactor was to separate the "creation" and "initialization" stages of the ExternalStorageBuilder. While the object is created and plumbed on creation of a NewServer(), certain config params (e.g. storage.Engine) are only initialized on server.Start(). Plumbing an object pointer, as compared to closures, allows us to cleanly configure the builder in server.Start() and propagate this information for future use in the factory methods. Using and plumbing a dedicated builder also seems like a design improvement over bookkeeping closure variables. This lays the groundwork for the user scoped storage which is being discussed in cockroachdb#47211.

Release note: None
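To make the creation/initialization split concrete, here is a minimal Go sketch of a builder along the lines described above; the type, field, and method names are illustrative assumptions, not the exact ones in the CockroachDB code.

```go
package server

import "errors"

// externalStorageBuilder is an illustrative stand-in for the builder
// described above: it is created (mostly empty) when the server object is
// constructed, and only configured once Start() runs and late-bound
// dependencies such as the storage engine exist.
type externalStorageBuilder struct {
	initialized bool
	engine      interface{} // placeholder for storage.Engine and friends
}

// init is called from server.Start() once the late-bound parameters exist.
// Because the builder was plumbed around by pointer at construction time,
// every holder of the pointer observes this configuration.
func (b *externalStorageBuilder) init(engine interface{}) {
	b.engine = engine
	b.initialized = true
}

// makeExternalStorage stands in for the factory methods that previously
// were closures: they now read their configuration from the builder.
func (b *externalStorageBuilder) makeExternalStorage(uri string) (interface{}, error) {
	if !b.initialized {
		return nil, errors.New("external storage builder not initialized")
	}
	// Dispatch on the URI scheme and construct the concrete implementation.
	return nil, nil
}
```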
…r struct

Previously, we initialized the ExternalStorage factory methods on creation of a NewServer(), as all the required config params were ready to use. With future work related to user scoped storage requiring access to the underlying storage.Engine, this change introduces a wrapper around these factory methods. Using a builder struct allows us to split the "creation" and "initialization" of the builder between the NewServer() and Start() methods respectively. This allows params which are only initialized on server.Start() to be propagated to the builder for future use. This is part of a gradual refactor of the ExternalStorage factory interface and is primarily to unblock development of cockroachdb#47211.

Release note: None
Notes after UX discussion with @mwang1026 and @dt. I believe we have a consensus on the importance of developing a new user scoped, encrypted storage, and so I will give my 2¢ on some of the pending technical/UX questions below:
Taking a leaf from the
Broadly I see three sources of files in this staging area:
In the first two cases, I do not believe we should proactively delete the files, as the user has initiated the process of getting them onto the node. In the third case, the files should be deleted on import completion (both success and failure) because the goal of the CLI import command is to get the data into the specified table, not onto disk. Some of the suggestions for keeping track of this metadata were to create a system table, or to simply write a txt file with the relevant metadata to the user-specific directory (this is what
Additional conversations will be required to narrow down the URI we require users to specify when uploading, downloading, backing up, or importing to their directories, but at a high level, username directories under the global

Roughly, the development process should be very similar to the one mentioned in the
I'd love to get comments on these broad design areas (and ones that I have missed)!
More notes after a discussion with @bdarnell and @dt. Adding a new user storage into CRDB comes with a host of additional semantics we have to design solutions for, such as:
The surface area of this change is very large for the target use case, which is top-of-the-funnel, non-production users trying to import local files/dumps into a cluster. With the hypothesis that we are not designing for large-scale imports, another solution is to upload files to user-specific tables. More specifically, when a user triggers an upload/import from the CLI we would create a table tied to that user with primary key

It is important to understand the downsides of this approach:
Now for the upsides:
As an aside, we already allow users to create a table and insert a column of bytes into it. The new solution is more in line with this; we are telling the user, "we could do this for you, and use it for an import". There are still some product-related questions which @miretskiy raised:
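As a concrete illustration of the "bytes in a table" idea above, the following Go snippet does nothing CockroachDB does not already allow today: it creates an ordinary table with a BYTES column and inserts a local file's contents into it. The connection string, table name, and file name are assumptions for the example.

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Connection string, table name, and file name are illustrative assumptions.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A plain table with a BYTES column is already enough to hold a file.
	if _, err := db.Exec(
		`CREATE TABLE IF NOT EXISTS uploaded_files (filename STRING PRIMARY KEY, contents BYTES)`,
	); err != nil {
		log.Fatal(err)
	}

	data, err := os.ReadFile("dump.csv")
	if err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(
		`UPSERT INTO uploaded_files (filename, contents) VALUES ($1, $2)`,
		"dump.csv", data,
	); err != nil {
		log.Fatal(err)
	}
}
```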
I'm a little concerned that the complexity of these solutions is >> the value it'll bring. A "can't upload files bigger than X" criterion seems like poor UX as well--it all but tells the user that it's a "for testing only" feature. It feels to me that we're overthinking this a bit. If the file is big and they're worried about timeouts, there IS a workaround that is very straightforward--upload the file to something that won't go to sleep! And if they're looking for something quick and dirty for small-ish files on their laptop, maybe streaming is sufficient and they're OK with having to retry if the connection fails or the computer goes to sleep. For the Postgres and MySQL parallels, do they also falter if the connection fails? If not, what is the behavior for them?
We already have a pretty poor UX if a user attempts to upload a large single file;
Ben had voiced concerns about exposing a gRPC connection to the client. It is common for DBs to only have SQL access. It would add additional security concerns about ensuring that only an authorized user can read files off their client. David gave a worst-case scenario where a malicious client A could pull (import) private docs over an existing connection to the cluster with client B, via a crafted URI. Streaming is also technically more complex than the table-based solution.
(Obviously we'd code up checks in the client on the paths the server requests it send, to avoid that case, but it's an example of the added security surface area we'd need to consider.) A nice thing about just stuffing bytes in a table is that it also means we get to use our existing security controls "for free" -- since it is just a table, existing grants and access control just work, existing space accounting / monitoring / visibility just works (i.e. the space used by these files would be reflected on the dbs/tables pages), and durability/replication/availability all work. It certainly has some appeal.
There are clear benefits to the table-based approach, no doubt. However, if we are going to impose fairly low limits on the size of those files, then my question is: can we take this small file (say ~4MB) and put it directly inside the import job specification? There are, of course, negatives with this (namely, the file size limit has to be fairly small, and the protobuf in the jobs table grows); but there are also clear benefits (namely, we don't really have to develop much of anything, and, upon job completion, we can even erase those bytes from the payload).
50116: kv: decompose roles of Gossip dependency in DistSender r=nvanbenschoten a=nvanbenschoten

This change decomposes the roles of Gossip as a dependency in DistSender into that of a NodeDescStore, a source of node descriptors, and that of a FirstRangeProvider, a provider of information on the first range in a cluster. This decomposition will be used to address #47909 by:
1. replacing Gossip with a TenantService as a NodeDescStore
2. providing a custom RangeDescriptorDB (also backed by a TenantService) instead of a FirstRangeProvider.

Together, these changes will allow us to remove DistSender's dependency on Gossip for SQL-only tenant processes. The next step after this will be to introduce a TenantService that can satisfy these two dependencies (NodeDescStore and RangeDescriptorDB) and also use the new NodeDescStore-to-AddressResolver binding to replace the use of Gossip with the TenantService in nodedialer instances.

50293: server: Wrap ExternalStorage factory methods in externalStorageBuilder struct r=adityamaru a=adityamaru

Previously, we initialized the ExternalStorage factory methods on creation of a NewServer(), as all the required config params were ready to use. With future work related to user scoped storage requiring access to the underlying storage.Engine, this change introduces a wrapper around these factory methods. Using a builder struct allows us to split the "creation" and "initialization" of the builder between the NewServer() and Start() methods respectively. This allows params which are only initialized on server.Start() to be propagated to the builder for future use. This is part of a gradual refactor of the ExternalStorage factory interface and is primarily to unblock development of #47211.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
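For the Gossip decomposition in #50116 above, a rough Go sketch of the two roles as separate interfaces might look like the following; the type and method signatures are simplified assumptions, not the actual DistSender API.

```go
package kvcoord

// NodeDescriptor is a simplified placeholder for the node metadata that
// DistSender needs (address, locality, etc.).
type NodeDescriptor struct {
	NodeID  int32
	Address string
}

// NodeDescStore captures the "source of node descriptors" role that Gossip
// used to play; a tenant process could satisfy it via a TenantService.
type NodeDescStore interface {
	GetNodeDescriptor(nodeID int32) (*NodeDescriptor, error)
}

// RangeDescriptor is a simplified placeholder for range addressing info.
type RangeDescriptor struct {
	StartKey, EndKey []byte
}

// FirstRangeProvider captures the "information on the first range" role;
// a tenant process would instead use a custom RangeDescriptorDB.
type FirstRangeProvider interface {
	GetFirstRangeDescriptor() (*RangeDescriptor, error)
}
```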
Makes sense re: complexity of solutions and gRPC concerns with streaming. No objections from me on an implementation with the "intermediate" table of byte blobs. What's a high-level estimate of the effort to accomplish this feature? Re: the downsides Ben mentioned, are they mitigated by a lower replication factor on this special table?
Agreed. The point I'm trying to make is that if someone is trying to migrate from a production DB, they're probably OK using cloud storage, an http sink, etc. to host the files to import. Re: the small use case, it's definitely valuable to have a simpler option--the question in my mind is whether we have explicit size limitations (not my favorite) or just stern warnings that this is best used for files smaller than X, for testing. To help make a decision here, I need a better understanding of what happens if the file IS big. So a few open questions still:
Indeed, we'd love to do this (and could see it being useful in IMPORT more broadly too), but right now loss of quorum is a very, very bad thing -- unavailable ranges will cause various systems to get stuck forever and make a cluster sad. We need KV to build more robust recovery-from-loss-of-quorum tools before we can safely run at lower replication factors in subsets of the keyspace.
This is a building block for supporting user scoped file upload and download to and from a cluster, details of which are being discussed in cockroachdb#47211. FileTableReadWriter can be used to store, retrieve, and delete the blobs and metadata of files from user scoped tables. Access to these tables is restricted to the root/admin user and the user responsible for triggering table creation in the first place. Under the hood, FileTableReadWriter operates on user-specific File and Payload tables. The File table stores the metadata, while the Payload table stores the contents of the file as chunks of a configurable size.

Release note: None
50493: cloud: Add FileTableReadWriter to interface with user scoped blob tables r=miretskiy a=adityamaru

This is a building block for supporting user scoped file upload and download to and from a cluster, details of which are being discussed in #47211. FileTableReadWriter can be used to store, retrieve, and delete the blobs and metadata of files from user scoped tables. Access to these tables is restricted to the root/admin user and the user responsible for triggering table creation in the first place. Under the hood, FileTableReadWriter operates on user-specific File and Payload tables. The File table stores the metadata, while the Payload table stores the contents of the file as chunks of a configurable size.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
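A minimal Go sketch of the File/Payload two-table layout and chunked writes described above, assuming hypothetical table and column names (the real FileTableReadWriter schema and API differ):

```go
package userfile

import "database/sql"

// chunkSize stands in for the configurable chunk size mentioned above.
const chunkSize = 1 << 20 // 1 MiB

// chunkedUpload writes one metadata row to a "file" table and the contents,
// split into fixed-size chunks, to a "payload" table.
func chunkedUpload(db *sql.DB, filename string, contents []byte) error {
	if _, err := db.Exec(
		`INSERT INTO user_file_metadata (filename, size_bytes) VALUES ($1, $2)`,
		filename, len(contents),
	); err != nil {
		return err
	}
	for idx, off := 0, 0; off < len(contents); idx, off = idx+1, off+chunkSize {
		end := off + chunkSize
		if end > len(contents) {
			end = len(contents)
		}
		if _, err := db.Exec(
			`INSERT INTO user_file_payload (filename, chunk_idx, chunk) VALUES ($1, $2, $3)`,
			filename, idx, contents[off:end],
		); err != nil {
			return err
		}
	}
	return nil
}
```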
50644: storage: Split pkg cloud into cloud interface and cloud impl r=miretskiy a=adityamaru

Previously, both the ExternalStorage interface and all the specialized implementations were in `pkg/storage/cloud`. This PR keeps the interface and factory method signatures in `pkg/storage/cloud` and moves all the implementations to `pkg/storage/cloudimpl`. The motivation behind this is to ensure that other packages such as `pkg/sql` and `pkg/kv` depend on just the interface/factory signatures (not the concrete impl) to prevent dependency cycles. More concretely, we need to plumb an InternalExecutor and kv.DB for our new user scoped file-table storage. This was previously not possible because of a `pkg/sql -> pkg/storage/cloud -> pkg/sql` cycle. Now, since the executor will be used in `cloudimpl`, this cycle will not occur.

Informs: #47211

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
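The dependency-cycle fix can be pictured with a small Go sketch: the interface-only package carries just the types and factory signature, so `pkg/sql` can import it without pulling in the implementations. The names here are simplified assumptions rather than the exact CockroachDB definitions.

```go
// Package cloud (interface-only): safe for pkg/sql and pkg/kv to import.
package cloud

import (
	"context"
	"io"
)

// ExternalStorage is a trimmed-down placeholder for the real interface.
type ExternalStorage interface {
	ReadFile(ctx context.Context, basename string) (io.ReadCloser, error)
	WriteFile(ctx context.Context, basename string, content io.Reader) error
}

// Factory is the signature that gets plumbed around; the concrete
// constructors (S3, GCS, nodelocal, file-table, ...) live in a separate
// implementation package (cloudimpl), which is free to import pkg/sql.
type Factory func(ctx context.Context, uri string) (ExternalStorage, error)
```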
50755: storage,ccl: Plumbs session user information to the ExternalStorage r=miretskiy a=adityamaru

The new UserFileTableSystem (#50493) needs knowledge of the user interacting with it to check access privileges on the underlying user scoped tables. These interactions are most notably via IMPORT, BACKUP and CHANGEFEED jobs and processors. This change plumbs the session user string from these various entities via the factory methods used to create the ExternalStorage objects. It also passes an internal executor and kv.DB from server.Start(), which are required by the FileTableSystem to execute its user scoped SQL queries. This PR will be followed up by the implementation of the FileTable ExternalStorage, which will supply the user string, internal executor and kv.DB to the UserFileTableSystem.

Informs: #47211

Release note: None

50782: geo: minor fix ups to ST_Summary r=sumeerbhola a=otan

Minor fixes to ST_Summary to make it more PostGIS-like:
* Correct ordering of the symbols.
* Fix bounding boxes not being reported in sub-structures.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
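A hedged sketch of what "plumbing the session user" amounts to: the factory captures the user (plus the internal executor and kv.DB handed over at server start) so the file-table storage can check privileges on the backing tables before any read or write. The struct and method names below are hypothetical.

```go
package cloudimpl

import (
	"context"
	"errors"
)

// userScopedFactory bundles the per-session user with the long-lived
// dependencies supplied at server start.
type userScopedFactory struct {
	user     string      // session user on whose behalf storage is created
	executor interface{} // placeholder for an internal SQL executor
	kvDB     interface{} // placeholder for a *kv.DB handle
}

// makeStorage would construct a file-table ExternalStorage; the user string
// lets it verify grants on the user scoped tables before serving requests.
func (f userScopedFactory) makeStorage(ctx context.Context, uri string) (interface{}, error) {
	if f.user == "" {
		return nil, errors.New("no session user provided to storage factory")
	}
	return nil, nil
}
```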
50837: cloudimpl: add FileTableStorage as an ExternalStorage option r=miretskiy a=adityamaru

There are two commits in this PR:
commit 1 - teaches the FileTableSystem about the new URI and moves tests to break a dependency cycle.
commit 2 - adds the concrete FileTableStorage implementation and tests.

Informs: #47211

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Future work:
50981: cli: add support for userfile upload CLI command r=adityamaru a=adityamaru

This change adds a CLI command allowing users to upload local files to the user scoped file table ExternalStorage. The command `userfile upload` uses the existing COPY protocol (similar to `nodelocal upload`) to upload files and write them to the UserFileTableSystem. The UserFileTableSystem is backed by two SQL tables which are currently always created with `defaultdb.public.user` as the prefix of the qualified name. In the future we may allow users to specify the `db.schema` they wish to store their tables in. The command takes a source and a destination path. The former is used to find the file contents locally; the latter is used to reference the file and its related metadata/payload in the SQL tables.

Known limitations:
- All destination paths must start with `/`; this is to help us disambiguate the filepath from the `db.schema` name when we allow users to specify that in the future.
- Destination paths must not have a `..` in them. Since the UserFileTableSystem is not a "real" file system, storing SQL rows with filenames such as /test/../test.csv seems strange. We will work on enforcing a better naming scheme.

Informs: #47211

Release note (cli change): Adds a userfile upload command that can be used to upload a file to the user scoped blob storage: `userfile upload source/file /destination/of/file`

51392: build/deploy: add GEOS libraries to CRDB Docker builds r=jlinder a=otan

Now that we have the GEOS libraries being built, it's time we copy them into the right place in the Docker container such that users can import geospatial features out of the box. The bless release script will also copy these files over.

Release note (general change): The Docker container that ships with CockroachDB now includes the GEOS library needed for geospatial functionality in `/usr/local/lib/cockroach` (which is the default location where the cockroach binary looks for the GEOS libraries).

51443: opt: improve geo func costing r=otan a=mjibson

Release note: None

51444: builtins: implement ST_Disjoint r=rytaft a=otan

Resolves #48919.

Release note (sql change): Implements the ST_Disjoint builtin for geometry types.

51471: cloud: Respect Kubernetes resource limits r=bobvawter a=bobvawter

Detecting the number of available CPU cores and memory limits from within a container can be inaccurate. This change passes the CPU and memory limits used by the Kubernetes scheduler into the cockroach process. In the absence of a limits block (e.g. when using BestEffort QoS in a dev/staging environment), the scheduler will substitute the maximum value that is appropriate for the node on which the container is scheduled. This change can be applied retroactively to existing deployments and is not specific to a particular version of CockroachDB. The cockroach start command is broken onto multiple lines to improve readability.

See also: #34988

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Co-authored-by: Bob Vawter <bob@vawter.org>
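A small Go sketch of the destination-path restrictions listed under "Known limitations" above (leading `/`, no `..`); the function name and error messages are illustrative, not the actual CLI code.

```go
package userfilecli

import (
	"errors"
	"strings"
)

// validateDestination enforces the two documented restrictions on
// `userfile upload` destination paths.
func validateDestination(dest string) error {
	if !strings.HasPrefix(dest, "/") {
		return errors.New("destination path must start with /")
	}
	if strings.Contains(dest, "..") {
		return errors.New("destination path must not contain ..")
	}
	return nil
}
```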
Closing per #51222
Right now `nodelocal` simply maps to the external IO dir, but since all access is done as the server process, this means there is no user-level access control the way there is with, say, S3, where the user can bring their own tokens. The current system is useful in some cases, but to be more generally useful in multi-user deployments we need to make it user-aware: it should be a grantable privilege to interact with nodelocal files at all, and users should only be able to interact with their own files. We could potentially also want quotas on nodelocal usage, though until table usage has quotas it is a moot point (you could just store your files in blob columns).