split repo and archive name into separate args? #948
Comments
Yeah, please implement it the classical way, as you suggested. I hate the rather strange |
Thanks for the feedback. As a help for you until this is implemented: "A::B" is usually used to say "B" is in scope of "A", so A is always the "container / namespace". In Python, one would say "A.B". Of course, one always begins with the toplevel "container / namespace". |
i would suggest with support for getting both variables from the env |
@RonnyPfannschmidt sure, makes sense to move this to global options so we do not duplicate it in every command description. |
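To make the "global option with environment fallback" idea concrete, here is a minimal argparse sketch. It is not Borg's parser: the program name `toy-borg`, the `build_parser` helper and the two toy subcommands are made up for illustration. Only `-r REPO` and `BORG_REPO` come from this thread.

```python
import argparse
import os


def build_parser(env=None):
    """Build a toy parser (hypothetical, not Borg's real one)."""
    env = os.environ if env is None else env

    # The repository option is defined once on a parent parser and
    # inherited by every subcommand, so it is not described per command.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "-r", dest="repo", metavar="REPO",
        default=env.get("BORG_REPO"),  # fall back to the environment
        help="repository location (default: $BORG_REPO)",
    )

    parser = argparse.ArgumentParser(prog="toy-borg")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", parents=[common])
    create.add_argument("name")              # archive name, positional
    create.add_argument("paths", nargs="+")  # what to back up

    sub.add_parser("list", parents=[common])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout an explicit `-r` wins over the environment, and `args.repo` is `None` if neither is given, so the caller still has to reject that case.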
@ThomasWaldmann it would also lend itself to formulate a click command group |
Shouldn't required parameters be positional arguments instead of --options? In the context of "borg create", not sure how well it would work for everything else:
That would simplify usage to:
|
I like the added flexibility of pattern matching for --archive-match (but don't need it myself). |
That doesn't look too bad to me! And keeping this is good for backwards compatibility. |
How about this:
|
Tried keeping repository as a positional arg and adding --name option for the archive name. #6766 Due to the argparse limitation (see "order matters" in the docs), this leads to strange command lines like:
This reads best, but does not work:
To solve, we could really consider
In fact, the repository can be optional on the command line if |
I had a look how
|
Current state of this in PR #6766:
|
Hmm, guess i don't really like these
OTOH, most other commands working with archives require one or two archive names, so they could be positional args also, like But, there are some commands where not giving the archive name switches the command to another mode, e.g. Shall we just make separate commands for these modes? Like |
ideas:
all commands below given without -r REPO (assume BORG_REPO=... is in the environment) for brevity.
|
@RonnyPfannschmidt @elho @textshell @enkore @rumpelsepp @pepa65 any comments? |
My take on archive metadata:
|
Like tags? I've recently looked at how restic handles this. Their archives (called snapshots there) do not have a name, just a hash. They automatically save hostname, user, timestamp and source paths into metadata (and they also support tags). Found that an interesting approach, but with some issues:
|
No. One clear identifier that tells you at which set of archives you usually would apply your purge. If you just use random tags it again opens the opportunity for confusion in configs. In a GUI you don't have one clear identifier that you can generate and expose to the user. Pika has a feature to set up backup configurations based on existing archives in the repo, but you can't guess what should be used for I think it should be one defined identifier that replaces the current use of prefixes. |
OK, so it is a groupid, sequenceid, datasetid, ... (just searching for a good name). BTW, there is another place where such an id would be useful: to identify a specific (partial) files cache (in that case, datasetid would make sense, because the files cache depends on the specific set of input data). |
Have to agree with @sophie-h here. When looking at a random list of archives in Vorta, it basically just shows the date:
This sounds sensible. Duration and change size (or similar) could be regarded as metadata too. Allowing just one tag would keep it simple.
Need not be one, as people have different workflows. Some use hostname (with prefixes currently), others just the time. So I think this is worth considering:
Playing with possible commands:
|
Just to be clear: I don't want to remove the other filter features from |
When pruning with a hostname/username/tag based subset of all archives, there is some risk that it matches more than one sequence of that host/user/tag (similar issue like forgetting to give the correct --prefix), leading to unwanted deletion of the wrong archives. We could change The generated archive name would then be The user would be required to define distinct datasetids for each different way they invoke
Better name than datasetid? |
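The risk described above (a name-based match catching more than one sequence) can be sketched like this. Everything here is hypothetical: the `Archive` class, the `group` field standing in for a datasetid, and the example names are assumptions for illustration, not Borg's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    name: str
    group: str  # hypothetical "datasetid" stored as metadata


ARCHIVES = [
    Archive("myhost-home-2022-06-01", "myhost-home"),
    Archive("myhost-home-2022-06-02", "myhost-home"),
    Archive("myhost-etc-2022-06-01", "myhost-etc"),
]


def select_by_prefix(archives, prefix):
    """Name-based selection: a too-short prefix silently widens the set."""
    return [a for a in archives if a.name.startswith(prefix)]


def select_by_group(archives, group):
    """Exact comparison on the id: no pattern matching involved."""
    return [a for a in archives if a.group == group]


if __name__ == "__main__":
    # Prefix "myhost-" mixes the home and etc sequences.
    print([a.name for a in select_by_prefix(ARCHIVES, "myhost-")])
    # The exact group id selects one sequence only.
    print([a.name for a in select_by_group(ARCHIVES, "myhost-home")])
```

A prune run working on the first result would thin out two unrelated sequences as if they were one, which is the trap an exact id avoids.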
So the benefit would be to always use the same datasetid/name and get the date appended automatically?
instead of
Pretty small benefit at the cost of explaining a new term and making it harder to understand. And the same behavior is already possible with placeholders in the archive name. I even imagine people would want to customize the timestamp to be appended or turn it off. So even more options and complexity. Given all that, I find the current behavior preferable. Or anything I missed? |
Thinking further: Let's say the current archive name becomes a dataset ID or archive group. Then users would need to refer to an individual archive by some hash (which Borg already generates) or look up This is similar to how Restic and Kopia do things, except that they use a shorter ID. Also similar to Git commit IDs. So the real question is: Should the user give the unique identifier when creating an archive or something else? (like the dataset/archive group). Using a generated identifier may be cleaner than something user-provided, like Here an example for illustration and brainstorming: Create, list, extract, delete
Prune
Summarizing suggested changes, if the dataset-ID suggestion moves forward:
Benefits over current way of doing things:
|
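If archives were addressed by generated IDs, a Git-like convenience would be to accept an abbreviated ID as long as it is unique. The sketch below shows that idea only; `resolve_archive_id` and the example IDs are hypothetical, and nothing in this thread says Borg resolves IDs this way.

```python
def resolve_archive_id(prefix, known_ids):
    """Return the single full ID starting with `prefix`, or raise."""
    prefix = prefix.lower()
    matches = [full for full in known_ids if full.startswith(prefix)]
    if not matches:
        raise KeyError(f"no archive ID starts with {prefix!r}")
    if len(matches) > 1:
        # Refuse to guess: the user has to type more characters.
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} archives match")
    return matches[0]


if __name__ == "__main__":
    ids = ["a3f9c2d4e1", "a3b07715aa", "9c41d0e2f7"]  # made-up IDs
    print(resolve_archive_id("9c", ids))
    print(resolve_archive_id("a3f", ids))
```

Failing on ambiguity rather than picking the first match matters most for destructive commands such as delete.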
Many years ago there was the idea of tags where iirc there were two proposals, one just plain tags, and the other being essentially key-value pairs. This sounds like a specialization of the latter, where Borg defines the available keys and values ( The main advantage of defining this metadata through Borg, instead of creative archive names (which became more powerful over time with archive name globbing and so on), is that frontends should have a much easier time working with this. I don't think it meaningfully improves or detracts from the backup UX of people using Borg directly, because before Borg was conceptually very simple ("A repository is a bunch of tars in a box"), and with this Borg gains the conceptual complexities of traditional backup tools (rsnapshot, bacula etc.) where there's datasets, groups, schedules and so on. To me it seems to be net-zero in this area. This would also mean Borg becoming more narrow in purpose and usage, and more specialized to "the typical backup workflow" (as defined here) - which is good for those using it that way, and not so good if not. I've used Borg for archiving purposes (and continue to do so), where it is a decent solution because there still is no portable, checksumming FS. (In fact I still have repositories formatted with the Borg patch I made ages ago that allows hierarchical archives - I can tell you from long-term usage that the concept works very well). |
Agree that changing prefixes to groups/datasets doesn’t improve the experience for those used to building complex prefixes. It may make it easier to get started for new users and those without much need for archive names. Adding the “free” metadata, like hostname, user and paths (in addition to date) is a smaller change and may enable new features later. This also doesn’t interfere with other uses. Using some internal ID as primary key needs more consideration. Just suggesting it here. If we want to keep archive names and prefixes as they are, here a minimal non-breaking change, which would enable richer UIs:
Let’s see what @ThomasWaldmann and @sophie-h think. This is all building on their suggestions. |
hostname: iirc, we already store that into metadata, i just see some formatting issue when trying to output that into a table (short names no problem, but for uniqueness we rather want the fqdn and that tends to be rather long). also there is the problem that uniqueness is not guaranteed here (not at all for the short name and in the worst case not even for the fqdn).
paths: same table formatting issue. works nice with a few paths (as shown above), ugly with many paths and impossible when feeding individual paths (as I pointed out above).
the main reason (and a definite advantage) for a datasetid (archive group id) is to have a value that can be used without pattern matching and also to remove a dangerous usability trap we have in current versions:
For very simple use cases, users could always give datasetid == "all" or "mymachine" and it would behave the same as now. About hex ids vs archive names:
|
@enkore do we have an issue here about that idea / patch?
|
Yeah, no matter whether it's
Generally, making things consistent is good, but making things more complicated and counter-intuitive just for that reason makes no sense, IMNSHO. Similar with list, In case of
This is not at all similar or desirable (or usable, I would personally argue) for anyone who did not name his archives "something-{now}". Even when ending the archive name with a timestamp, a different format could be desired.
The latter is a specialization of the former, which still allows anyone to use tags like
Having hostname, user, time etc. as borg sees it in the metadata for someone to use is what we have and what should not be taken away when adding tags, but the request we saw here was to add a special metadata item for one frontend, that no one else may use. And that is where tags shine, that frontend could just set a |
Maybe "series" / "series name"? (although "data set" does seem a nice alternative name for the "series" concept, below) I've been using Borg mostly via a Bash script (soon to be rewritten in Python and made public); one of my main motivations was to conveniently handle what I called "archive series" within repositories, using a configuration file where I specify
I can then run commands like
and
From
|
The parser to split up repo and archive name into all needed parts is rather complex.
Also, some commands (prune) have a separate --prefix argument, which is kind of archivename*.
The repo part can also come from the BORG_REPO env var.
Native windows support (see "windows branch") might even make it more complex, due to different matching patterns needed for it.
So, if we refactor this (which is a major CLI API change, thus the 2.0 milestone), it could look like:
ARCHIVE_PATTERN would support glob patterns on the archive name.
In addition to --archive-match, we could support an --index [from:to] option that just returns that part of the match result list.
Support getting REPO and ARCHIVE from the environment.
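How glob matching plus an index range could combine is sketched below. This is an assumption-laden illustration, not the proposed implementation: the `match_archives` helper is hypothetical, the match list is simply sorted by name, and the `from:to` value is interpreted like a Python slice (either side may be empty or negative), which the proposal above does not specify.

```python
from fnmatch import fnmatchcase


def match_archives(names, pattern, index=None):
    """Glob-match archive names, then optionally keep a from:to slice."""
    # Sorting by name is an assumption; a real tool might sort by timestamp.
    matched = sorted(n for n in names if fnmatchcase(n, pattern))
    if index is None:
        return matched
    if ":" not in index:
        raise ValueError("index must look like 'from:to'")
    start, _, stop = index.partition(":")
    lo = int(start) if start else None
    hi = int(stop) if stop else None
    return matched[lo:hi]


if __name__ == "__main__":
    names = ["web-2022-01-01", "web-2022-01-02", "web-2022-01-03", "db-2022-01-01"]
    print(match_archives(names, "web-*"))         # all web archives
    print(match_archives(names, "web-*", "-1:"))  # only the last match
```

Applying the slice after matching means the index always refers to positions within the match result list, as described above, not within the whole repository.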