Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML file for options (--opts my_opts.yaml) #622

Merged
merged 9 commits into from
Jul 22, 2022

Conversation

stared
Copy link
Contributor

@stared stared commented Mar 27, 2022

I added an option to load options form a YAML file. So, we can run

python -m bdfr download ./output --opts my_opts.yaml

A file looks like

skip: [mp4, avi, mov]
file_scheme: "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}"
limit: 10
sort: top
time: all
no_dupes: true
subreddit:
  - EarthPorn
  - CityPorn

General rationale:

  • We don't need to write long commands if we use the same options.
  • If we want to run the same (or a similar) query many times, we don't need to memorize it, a YAML file suffices.
  • It supports multiple subreddits. Right now it was clunky (i.e. only through bash pipes).
  • And well - we don't lose anything. It is fully backwards compatible!

If there are more options provided from the CLI (e.g. -L 20), they take priority over the file.

  • YAML file for options (--opts my_opts.yaml)
  • file example
  • README update
  • Messages for wrong options

@stared stared changed the title yaml for options YAML file for options (--opts my_opts.yaml) Mar 27, 2022
@stared stared marked this pull request as ready for review March 27, 2022 19:16
@sinclairkosh
Copy link
Contributor

if that's going to be a thing, might it be worthwhile having a check for a standardised filename (probably hidden, say something like .bdfr.yml ) in the current dir or base/destination dir and use those options?

@stared
Copy link
Contributor Author

stared commented Mar 27, 2022

@sinclairkosh Not sure if I understand the rationale. This file (unlike config.cfg) is supposed to be explicit, and possibly different for various queries (e.g. top.yaml, new.yaml, my_friends.yaml). A hidden file goes pretty much against this philosophy.

While there might be a default name (e.g. bdfr.yaml), not sure if it adds much (and at the same time, may cause confusion).

@sinclairkosh
Copy link
Contributor

the rationale would be the same or similar to the first two points of your general rationale. Repeat downloads in specific locations with the same set of options. It was a suggestion, nothing more, nothing less. More user QOL than anything.

@Serene-Arc
Copy link
Owner

Question from me (the developer): what are the advantages of this over a bash script? That is the method that I have been recommending (and personally use) for repetitive, identical commands. What are the advantages of this method over simple scripting?

To me, it seems to add more points of possible failure and redundancy, as well as introduce more complexity, albeit minor, in the form of the YAML file and format.

@sinclairkosh
Copy link
Contributor

@Serene-Arc The answer to that depends on who your audience is.

Who do you, the developers, see as the users of this program?

If, as I seem to recall somewhere in the text, you're talking about using it for research (amongst other things). In that case, someone using it for NLP or ML research might have no issues at all with a shellscript, but another researcher using it for political/sociological etc research might not be anywhere near as computer savvy and familiar with things such as shell scripts.

Then it's a matter of if you want to "support" your users with QOL features that makes life easier for them or if it's a case of, well it works for us, sort your own crap out. This is open-source, so both paths are reasonable and generally acceptable, if not necessarily enjoyable if you're on the end of the "sort your own crap out" if you don't know how :)

There are, to be honest, likely no real technical advantages to this approach, although I'm sure if you thought about it hard enough you might come up with some. In the end, what going down this path will do is make life easier for a subset of those using the program, especially those with less programming related skill.

@aliparlakci
Copy link
Collaborator

First of all, hello Piotr, one of the first contributors to the project :) I hope you are well.

The biggest concern of ours about new features and additions is that it might bloat the program. We want to keep it as simple as possible. There were many other things which would make the life of the users "easier" but we had to refuse them solely because there were other ways to achieve it rather than implementing it in the source code.

As far as I see, this yaml option feature can be achieved as such:

custom_downloader.sh

# custom_downloader.sh
python -m bdfr downloader ./output \
  --skip mp4, avi, mov \
  --file_scheme "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}" \
  --limit 10 \
  --sort top \
  --time all \
  --no_dupes \
  --subreddit EarthPorn+CityPorn

Syntax and options might be a little different but to run this program, you only need to execute:

> ./custom_downlader.sh

Instead of

python -m bdfr download ./output --opts my_opts.yaml

Please correct me if I am missing something.

However, I do think that not everyone is as computer literate as us. So, there should be a guide/explanation to use bdfr in this kind of workflow. So, this PR might be documentation contribution instead.

@aliparlakci aliparlakci reopened this Mar 28, 2022
@stared
Copy link
Contributor Author

stared commented Mar 28, 2022

@sinclairkosh It is up to someone's workflow. If bash for everything is one that works for you the best, awesome.

For me (well, I came from the ML background), while I can certainly do bash, too long bash commands are neither pretty nor convenient. In fact, most ML and DS frameworks I know provide YAML (or JSON, TOML) configs.

  • Just compare and contrast the very example I provided.
  • Or: How would you run a query "same, but for year not all). \
  • All in all, in provides an additional way, it does not hurt any CLI functionalities.
  • Moreover, there is a lot of room for very productive mixing: e.g. run a bash script process a sequence of YAML-specified downloads.

@aliparlakci I understand your point. Still, I wrote the YAML part for myself. As:

  • --subreddit EarthPorn+CityPorn is fine; how about if it is 20?
  • This "same but with one alternated parameter changed".
  • While you can generate any query with any other language, it is much easier to use YAML as the exchange format.

@aliparlakci
Copy link
Collaborator

aliparlakci commented Mar 28, 2022

Just to answer your questions:

Or: How would you run a query "same, but for year not all

I don't understand this part. You would still have different top.sh, new.sh, my_friends.sh files. Afaik, this is also the case for .yaml files.

Moreover, there is a lot of room for very productive mixing: e.g. run a bash script process a sequence of YAML-specified downloads.

This is also the same for .sh files compared to .yaml. You can create .sh scripts running other, small .sh scripts. .yaml does not introduce any advantage over .sh files.

--subreddit EarthPorn+CityPorn is fine; how about if it is 20?

Below syntax is also valid:

# custom_downloader.sh
python -m bdfr downloader ./output \
  --subreddit EarthPorn \
  --subreddit CityPorn \

This "same but with one alternated parameter changed".

Just have some .sh file without the parameter and use it as such

~$ base_bdf_options.sh --subreddit EarthPorn --subreddit CityPorn

However, I don't think this will make too much clutter in the codebase and it is not so strange that a CLI tool accepts a .yaml config and it does not seem to be against our design choices. Let us discuss it if you still think that the feature is needed.

@Serene-Arc
Copy link
Owner

My position is that this feature is not needed. I don't see any difference in skill level between writing a bash script and writing a YAML file, and there is little difference in the actual format.

That being said, I did specifically design the BDFR to accommodate things like this, which is why the Configuration class is how it is. I don't see any use for this but there's no reason for it not to be implemented as it should be a very low-maintainence module.

However I would request that any changes, and any code related to YAML processing, be moved to a separate module. I can do this refactoring myself if you prefer.

@stared
Copy link
Contributor Author

stared commented Mar 28, 2022

I understand that you @Serene-Arc and @aliparlakci skeptical if it offers any improvement.

I wrote it for my personal use, as it fits my workflow/preferences better. I shared it as a PR as I believe some others would prefer this way. Since it does not change the CLI style, for anyone not intending to use YAML, it would make no difference at all. And you @Serene-Arc noticed, it is designed to be a low-maintenance module.

There is a general trend of moving in that direction of exchangeable data formats used as configs. See e.g.:

Of course, bash scripts can do more. But following the rule of least power, YAML configs offer a more predictable way for many protocols than any executable (or potentially executable) files.

However I would request that any changes, and any code related to YAML processing, be moved to a separate module. I can do this refactoring myself if you prefer.

Not sure how do you like to refactor it, so it would be wonderful if you could take it from here.

One note, which is not related to YAML per see. I used in a few places setattr, which seems more idiomatic that way with vars (one can argue that self.__setattr__ would be even better).

@stared stared changed the base branch from master to development March 28, 2022 19:33
@Serene-Arc Serene-Arc self-assigned this Mar 28, 2022
@Serene-Arc Serene-Arc merged commit 7ae318f into Serene-Arc:development Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants