YAML file for options (--opts my_opts.yaml) #622

stared · 2022-03-27T18:36:18Z

I added an option to load options form a YAML file. So, we can run

python -m bdfr download ./output --opts my_opts.yaml

A file looks like

skip: [mp4, avi, mov]
file_scheme: "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}"
limit: 10
sort: top
time: all
no_dupes: true
subreddit:
  - EarthPorn
  - CityPorn

General rationale:

We don't need to write long commands if we use the same options.
If we want to run the same (or a similar) query many times, we don't need to memorize it, a YAML file suffices.
It supports multiple subreddits. Right now it was clunky (i.e. only through bash pipes).
And well - we don't lose anything. It is fully backwards compatible!

If there are more options provided from the CLI (e.g. -L 20), they take priority over the file.

YAML file for options (--opts my_opts.yaml)
file example
README update
Messages for wrong options

sinclairkosh · 2022-03-27T19:21:52Z

if that's going to be a thing, might it be worthwhile having a check for a standardised filename (probably hidden, say something like .bdfr.yml ) in the current dir or base/destination dir and use those options?

stared · 2022-03-27T19:27:56Z

@sinclairkosh Not sure if I understand the rationale. This file (unlike config.cfg) is supposed to be explicit, and possibly different for various queries (e.g. top.yaml, new.yaml, my_friends.yaml). A hidden file goes pretty much against this philosophy.

While there might be a default name (e.g. bdfr.yaml), not sure if it adds much (and at the same time, may cause confusion).

sinclairkosh · 2022-03-27T19:44:22Z

the rationale would be the same or similar to the first two points of your general rationale. Repeat downloads in specific locations with the same set of options. It was a suggestion, nothing more, nothing less. More user QOL than anything.

Serene-Arc · 2022-03-28T00:28:29Z

Question from me (the developer): what are the advantages of this over a bash script? That is the method that I have been recommending (and personally use) for repetitive, identical commands. What are the advantages of this method over simple scripting?

To me, it seems to add more points of possible failure and redundancy, as well as introduce more complexity, albeit minor, in the form of the YAML file and format.

sinclairkosh · 2022-03-28T04:18:11Z

@Serene-Arc The answer to that depends on who your audience is.

Who do you, the developers, see as the users of this program?

If, as I seem to recall somewhere in the text, you're talking about using it for research (amongst other things). In that case, someone using it for NLP or ML research might have no issues at all with a shellscript, but another researcher using it for political/sociological etc research might not be anywhere near as computer savvy and familiar with things such as shell scripts.

Then it's a matter of if you want to "support" your users with QOL features that makes life easier for them or if it's a case of, well it works for us, sort your own crap out. This is open-source, so both paths are reasonable and generally acceptable, if not necessarily enjoyable if you're on the end of the "sort your own crap out" if you don't know how :)

There are, to be honest, likely no real technical advantages to this approach, although I'm sure if you thought about it hard enough you might come up with some. In the end, what going down this path will do is make life easier for a subset of those using the program, especially those with less programming related skill.

aliparlakci · 2022-03-28T08:22:29Z

First of all, hello Piotr, one of the first contributors to the project :) I hope you are well.

The biggest concern of ours about new features and additions is that it might bloat the program. We want to keep it as simple as possible. There were many other things which would make the life of the users "easier" but we had to refuse them solely because there were other ways to achieve it rather than implementing it in the source code.

As far as I see, this yaml option feature can be achieved as such:

custom_downloader.sh

# custom_downloader.sh
python -m bdfr downloader ./output \
  --skip mp4, avi, mov \
  --file_scheme "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}" \
  --limit 10 \
  --sort top \
  --time all \
  --no_dupes \
  --subreddit EarthPorn+CityPorn

Syntax and options might be a little different but to run this program, you only need to execute:

> ./custom_downlader.sh

Instead of

python -m bdfr download ./output --opts my_opts.yaml

Please correct me if I am missing something.

However, I do think that not everyone is as computer literate as us. So, there should be a guide/explanation to use bdfr in this kind of workflow. So, this PR might be documentation contribution instead.

stared · 2022-03-28T08:25:36Z

@sinclairkosh It is up to someone's workflow. If bash for everything is one that works for you the best, awesome.

For me (well, I came from the ML background), while I can certainly do bash, too long bash commands are neither pretty nor convenient. In fact, most ML and DS frameworks I know provide YAML (or JSON, TOML) configs.

Just compare and contrast the very example I provided.
Or: How would you run a query "same, but for year not all). \
All in all, in provides an additional way, it does not hurt any CLI functionalities.
Moreover, there is a lot of room for very productive mixing: e.g. run a bash script process a sequence of YAML-specified downloads.

@aliparlakci I understand your point. Still, I wrote the YAML part for myself. As:

--subreddit EarthPorn+CityPorn is fine; how about if it is 20?
This "same but with one alternated parameter changed".
While you can generate any query with any other language, it is much easier to use YAML as the exchange format.

aliparlakci · 2022-03-28T09:25:46Z

Just to answer your questions:

Or: How would you run a query "same, but for year not all

I don't understand this part. You would still have different top.sh, new.sh, my_friends.sh files. Afaik, this is also the case for .yaml files.

Moreover, there is a lot of room for very productive mixing: e.g. run a bash script process a sequence of YAML-specified downloads.

This is also the same for .sh files compared to .yaml. You can create .sh scripts running other, small .sh scripts. .yaml does not introduce any advantage over .sh files.

--subreddit EarthPorn+CityPorn is fine; how about if it is 20?

Below syntax is also valid:

# custom_downloader.sh
python -m bdfr downloader ./output \
  --subreddit EarthPorn \
  --subreddit CityPorn \

This "same but with one alternated parameter changed".

Just have some .sh file without the parameter and use it as such

~$ base_bdf_options.sh --subreddit EarthPorn --subreddit CityPorn

However, I don't think this will make too much clutter in the codebase and it is not so strange that a CLI tool accepts a .yaml config and it does not seem to be against our design choices. Let us discuss it if you still think that the feature is needed.

Serene-Arc · 2022-03-28T12:10:59Z

My position is that this feature is not needed. I don't see any difference in skill level between writing a bash script and writing a YAML file, and there is little difference in the actual format.

That being said, I did specifically design the BDFR to accommodate things like this, which is why the Configuration class is how it is. I don't see any use for this but there's no reason for it not to be implemented as it should be a very low-maintainence module.

However I would request that any changes, and any code related to YAML processing, be moved to a separate module. I can do this refactoring myself if you prefer.

stared · 2022-03-28T15:01:22Z

I understand that you @Serene-Arc and @aliparlakci skeptical if it offers any improvement.

I wrote it for my personal use, as it fits my workflow/preferences better. I shared it as a PR as I believe some others would prefer this way. Since it does not change the CLI style, for anyone not intending to use YAML, it would make no difference at all. And you @Serene-Arc noticed, it is designed to be a low-maintenance module.

There is a general trend of moving in that direction of exchangeable data formats used as configs. See e.g.:

New Feature: YAML Config File on Gradient
Ditch the command-line arguments from your machine learning code: use a config file
Many configs everywhere could have been bash scripts (GitHub Actions).

Of course, bash scripts can do more. But following the rule of least power, YAML configs offer a more predictable way for many protocols than any executable (or potentially executable) files.

However I would request that any changes, and any code related to YAML processing, be moved to a separate module. I can do this refactoring myself if you prefer.

Not sure how do you like to refactor it, so it would be wonderful if you could take it from here.

One note, which is not related to YAML per see. I used in a few places setattr, which seems more idiomatic that way with vars (one can argue that self.__setattr__ would be even better).

stared changed the title ~~yaml for options~~ YAML file for options (--opts my_opts.yaml) Mar 27, 2022

stared marked this pull request as ready for review March 27, 2022 19:16

aliparlakci closed this Mar 28, 2022

aliparlakci reopened this Mar 28, 2022

stared force-pushed the yaml-options branch from 92a8337 to 871d975 Compare March 28, 2022 19:33

stared changed the base branch from master to development March 28, 2022 19:33

Serene-Arc self-assigned this Mar 28, 2022

stared added 5 commits July 22, 2022 15:38

yaml for options

798ed72

underscores in YAML

ef82387

explicit warnings for non-exisitng args

395bf91

instructions for YAML options

0731de7

a better check for opts

5f443fd

Serene-Arc force-pushed the yaml-options branch from 871d975 to 5f443fd Compare July 22, 2022 05:39

Serene-Arc added 4 commits July 22, 2022 15:44

Extract YAML function

cb3415c

Rename variable

23e20e6

Change logger message level

af3f98f

Add simple test

27ca92e

Serene-Arc merged commit 7ae318f into Serene-Arc:development Jul 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

YAML file for options (--opts my_opts.yaml) #622

YAML file for options (--opts my_opts.yaml) #622

stared commented Mar 27, 2022 •

edited

Loading

sinclairkosh commented Mar 27, 2022

stared commented Mar 27, 2022 •

edited

Loading

sinclairkosh commented Mar 27, 2022

Serene-Arc commented Mar 28, 2022

sinclairkosh commented Mar 28, 2022

aliparlakci commented Mar 28, 2022

stared commented Mar 28, 2022 •

edited

Loading

aliparlakci commented Mar 28, 2022 •

edited

Loading

Serene-Arc commented Mar 28, 2022

stared commented Mar 28, 2022 •

edited

Loading

YAML file for options (--opts my_opts.yaml) #622

YAML file for options (--opts my_opts.yaml) #622

Conversation

stared commented Mar 27, 2022 • edited Loading

sinclairkosh commented Mar 27, 2022

stared commented Mar 27, 2022 • edited Loading

sinclairkosh commented Mar 27, 2022

Serene-Arc commented Mar 28, 2022

sinclairkosh commented Mar 28, 2022

aliparlakci commented Mar 28, 2022

stared commented Mar 28, 2022 • edited Loading

aliparlakci commented Mar 28, 2022 • edited Loading

Serene-Arc commented Mar 28, 2022

stared commented Mar 28, 2022 • edited Loading

stared commented Mar 27, 2022 •

edited

Loading

stared commented Mar 27, 2022 •

edited

Loading

stared commented Mar 28, 2022 •

edited

Loading

aliparlakci commented Mar 28, 2022 •

edited

Loading

stared commented Mar 28, 2022 •

edited

Loading