Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method for setting parameter options #74

Closed
chuckwondo opened this issue Aug 14, 2024 · 15 comments
Closed

Add method for setting parameter options #74

chuckwondo opened this issue Aug 14, 2024 · 15 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@chuckwondo
Copy link
Collaborator

Parameter options can currently be set directly on a query instance, like so:

query.options[parameter] = {option_name: True}  # or False

However, this is undocumented, unintuitive, and error-prone.

Given that there are only specific parameter options available, I suggest providing a method (on the Query class) per option. For example, for the ignore_case parameter option:

def option_ignore_case(self, parameter: str, value: bool = True) -> Self:
    options[parameter] = {"ignore_case": value}
    return self

This would allow calls like query.option_ignore_case("entry_title") to ignore case for search by entry title, for example.

Implement methods option_pattern, option_and, and option_or similarly.

@chuckwondo chuckwondo added enhancement New feature or request good first issue Good for newcomers labels Aug 14, 2024
@miabobia
Copy link
Contributor

Should I be checking to see if the parameter url is valid in these functions? Reading through the documentation and just wanted to get some clarification

The behavior of search with respect to each parameter can be modified using the options parameter. The options parameter takes the following form:

options[parameter][option_key]=value

where parameter is the **URL parameter** whose behavior is to be affected, value is either true or false, and option_key is one of the following

@chuckwondo
Copy link
Collaborator Author

Should I be checking to see if the parameter url is valid in these functions? Reading through the documentation and just wanted to get some clarification

Excellent question. For the initial implementation, I suggest not performing such a check.

If the user specifies something that's invalid, they'll get a query response error indicating the problem.

The purpose of providing some client-side validation is to avoid an unnecessary request, but it's not a terrible thing to make such a request. On the contrary, one could argue that such client-side validation can become out of sync with what the server allows, thus potentially preventing the user from performing an action that is actually valid.

Long story short: skip client-side validation. We can always open a new issue to add such validation at a later point.

@miabobia
Copy link
Contributor

Thank you for the quick response! I will work on implementing this first version with no validation checking this evening

@miabobia
Copy link
Contributor

I had another question. Should I make sure the user explicitly adds a parameter before they set an option for said parameter? Functionally I think either approach would work, but I wanted to clarify

@chuckwondo
Copy link
Collaborator Author

chuckwondo commented Aug 15, 2024

I had another question. Should I make sure the user explicitly adds a parameter before they set an option for said parameter? Functionally I think either approach would work, but I wanted to clarify

Another good question. Since setting an option for a parameter without setting the parameter itself is a valid CMR query, I'd say we can ignore this for now. Again, we can open a new issue to add that as an enhancement, if someone wants such client-side validation.

I would equate that to gmail's feature that blocks you from sending an email containing the word "attachment" when there is nothing attached. In that case, you perhaps made a mistake by forgetting to attach something. Similarly, here that would catch a user's "accidental" failure to set a parameter when an option for that parameter is set.

As a note for the future, this is likely something we would want to implement in the _valid_state method. That would allow the user to specify an option before specifying a value for the associated parameter. If the user forgot to set the parameter, an exception would be raised when the URL is built internally because it would then trigger a call to _valid_state, which would be where we would check that all parameters for which options were specified also have a parameter value specified.

@miabobia
Copy link
Contributor

I'm pretty new to contributing so I have another validation question haha. I am unsure if I should be raising errors if an invalid parameter is passed. My first thought is to do something similar to the parameters function.

        methods = dict(getmembers(self, predicate=ismethod))

        for key, val in kwargs.items():
            # verify the key matches one of our methods
            if key not in methods:
                raise ValueError(f"Unknown key {key}")
    def option_or(self, parameter: str, value: bool = True) -> Self:
        """
        Set this Query's or option for the specified parameter
        :param parameter: search parameter to specify or option for
        :param value: value for or option
        :returns: self
        :raises: Will raise if invalid parameter provided
        """

        if not parameter:
            return self
        
        valid_parameters = dict(getmembers(self, predicate=ismethod)).keys()

        if parameter not in valid_parameters:
            raise ValueError(f"Unknown parameter {parameter}")
        
        self.options[parameter] = {"or": value}

        return self

From testing I can see that this won't throw an error if an option for an invalid parameter is set. I just am unsure if I should be blocking the user like the gmail attachment example. eg: the user passes in a parameter spelled slightly wrong. If I should be raising errors I think I may have to refactor test_queries.py with a unittest class, so we can have access to the assertRaises function

@chuckwondo
Copy link
Collaborator Author

Your code looks good. I have only 1 very minor suggestion for clarifying the error message, which is to put a colon (:) preceding the parameter name: "Unknown parameter: {parameter}", to aid in readability.

Regarding unit tests, add the following test functions to test_queries.py:

def test_option_or_valid_parameter():
    query = CollectionQuery()
    query.option_or("revision_date")


def test_option_or_invalid_parameter():
    query = CollectionQuery()

    with pytest.raises(ValueError):
        query.option_or("no_such_parameter")

This will also require adjusting the imports at the top of that file, to the following:

import pytest
from cmr import CollectionQuery, Query

@chuckwondo
Copy link
Collaborator Author

Oh, just a bit of tidying to the docstring too:

        """
        Set this query's `or` parameter option for the specified parameter.

        See `CMR Search API Search Options`_.

        .. _CMR Search API Search Options:
           https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#parameter-options

        :param parameter: search parameter to specify `or` option for
        :param value: specify `True` (default) to retrieve results matching _any_ of the
            values set for the parameter; `False` to retrieve _only_ results matching
            _all_ values set for the parameter
        :returns: self
        :raises: `ValueError` if invalid parameter provided
        """

@miabobia
Copy link
Contributor

Thanks for the feedback! I was considering putting 'or' and 'and' into quotes in the docstrings but I wasn't sure if it matched the styling of the others well enough.

@chuckwondo
Copy link
Collaborator Author

Oops! I forgot the assertions in the new test functions, plus testing turning the option off. Should be these test functions:

def test_option_or_valid_parameter_on():
    query = CollectionQuery()
    query.option_or("revision_date")

    assert query.options["revision_date"]["or"] is True


def test_option_or_valid_parameter_off():
    query = CollectionQuery()
    query.option_or("revision_date", False)

    assert query.options["revision_date"]["or"] is False


def test_option_or_invalid_parameter():
    query = CollectionQuery()

    with pytest.raises(ValueError):
        query.option_or("no_such_parameter")

    assert "no_such_parameter" not in query.options

@chuckwondo
Copy link
Collaborator Author

Thanks for the feedback! I was considering putting 'or' and 'and' into quotes in the docstrings but I wasn't sure if it matched the styling of the others well enough.

Yep, some sort of quoting is necessary so they are not mistaken for regular words in the prose. In this case, I think the backticks are a bit more appropriate than single- or double-quotes since the or and the and are literally made part of the query URLs.

@chuckwondo
Copy link
Collaborator Author

@mamamia96, as I mentioned in #75, upon further reading of the CMR Search API documentation, I see that there are other possibilities we need to support, so a single method will allow us to provide better support and coverage. For example, if you search for option[ (including the square bracket) on the documentation page, you will find a number of other options (in addition to or, and, ignore_case, and pattern) and other possible value types (not only boolean).

Below, I've included what I believe to be a more comprehensive implementation for this issue.

In addition, I've added inline doctest examples, so that we can not only show users how to use the method (and how it behaves), but also get "free" tests, so we don't need to write separate tests in test_queries.py. I also discovered a bug in your initial implementation in #75, which prevents setting multiple options for the same parameter, so my suggested code fixes that as well.

Finally, I removed the logic that checks for a "valid" parameter for 2 reasons:

  1. There are many parameters that are not (yet) directly supported in this library through dedicated methods. For example, the CMR Search API supports an include_highlights parameter, but we don't have an include_highlights method. In these cases, it is possible for the user to directly set such a parameter (e.g., query.params["include_highlights"] = "true"), so raising an exception because we do not have a specific include_highlights method needlessly prevents the user from setting a parameter option for it. That is, to avoid getting an "invalid option" error, a user has to resort to directly setting the query parameter, which defeats the purpose of providing a "convenience" function for the user.
  2. The parameter name is not always the same as the parameter option name. For the include_highlights parameter, the parameter option name is highlights. Thus, we do not want to raise an exception when setting an option parameter for "highlights" because we not only have no method named highlights (see point above), but we might never have such a method because the corresponding parameter method would be called include_highlights, not highlights. In other words, we might eventually want to be able to do this:
    results = (
        query
        .include_highlights()
        .option("highlights", "begin_tag", "<b>")
        .option("highlights", "end_tag", "</b>")
        .get()
    )

Here are the specific code changes I suggest.

First, in pyproject.toml, add the following block at the end of the file, which will run all doctest comments (see code below) when pytest runs:

[tool.pytest.ini_options]
addopts = ["--doctest-modules"]

In cmr/queries.py, immediately below the from abc import ... line, add the following line:

from collections import defaultdict

Then, in the same file, add this option (singular) method:

    def option(
        self, parameter: str, key: str, value: Union[str, bool, int, float, None]
    ) -> Self:
        """
        Set or remove a search parameter option.

        If either an empty parameter name or option key is given, do nothing.
        Otherwise, if a non-`None` option value is given, set the specified parameter
        option to the value; else _remove_ the parameter option, if previously given.

        In all cases, return self to support method chaining.

        See `CMR Search API Parameter Options`_ as well as other sections of the
        documentation that describe other available parameter options.

        .. _CMR Search API Parameter Options:
           https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#parameter-options

        Example:

        .. code:: python

           >>> query = CollectionQuery()
           >>> query.option("short_name", "ignore_case", True)  # doctest: +ELLIPSIS
           <cmr.queries.CollectionQuery ...>
           >>> query.options  # doctest: +ELLIPSIS
           defaultdict(..., {'short_name': {'ignore_case': True}})
           >>> query.option("short_name", "ignore_case", False)  # doctest: +ELLIPSIS
           <cmr.queries.CollectionQuery ...>
           >>> query.options  # doctest: +ELLIPSIS
           defaultdict(..., {'short_name': {'ignore_case': False}})
           >>> (query
           ... .option("short_name", "ignore_case", None)  # remove an option
           ... .option("short_name", "or", True)
           ... .option("highlights", "begin_tag", "<b>")
           ... .option("highlights", "end_tag", "</b>")
           ... )
           <cmr.queries.CollectionQuery ...>
           >>> query.options  # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
           defaultdict(..., {'short_name': {'or': True},
           'highlights': {'begin_tag': '<b>', 'end_tag': '</b>'}})

        :param parameter: search parameter to set an option for
        :param key: option key to set a value for
        :param value: value to set for the option, or `None` to remove the option
        :returns: self
        """

        if parameter and key:
            if value is None:
                del self.options[parameter][key]
            else:
                self.options[parameter][key] = value

        return self

Further, in the same file, in the Query._build_url method, we have the following lines:

                # all CMR options must be booleans
                if not isinstance(val, bool):
                    raise TypeError(
                        f"parameter '{param_key}' with option '{option_key}' must be a boolean"
                    )

As I described above, parameter options are not restricted to boolean values, so the block above should be removed. For example, the values of options[highlights][begin_tag] and options[highlights][end_tag] are strings, and the values of options[highlights][snippet_length] and options[highlights][num_snippets] are positive integers.

At a later point, we might want to add logic that knows about the expected data types for specific parameter options, but for now, let's simply let the user specify any type of value for any parameter option. If the CMR doesn't like something, it will respond with an error.

@miabobia
Copy link
Contributor

miabobia commented Aug 17, 2024

Excellent thank you for the detailed feedback I appreciate it. One question I have is do you want self.options to be a defaultdict now? Currently the doc string tests get key errors the way we are trying to access them.

update: I think I may have answered my own question after reading the doc string tests a little closer

@chuckwondo
Copy link
Collaborator Author

Excellent thank you for the detailed feedback I appreciate it. One question I have is do you want self.options to be a defaultdict now? Currently the doc string tests get key errors the way we are trying to access them.

update: I think I may have answered my own question after reading the doc string tests a little closer

Ah, yes, sorry, I forgot to include that change. It should now be:

        self.options: MutableMapping[str, MutableMapping[str, Any]] = defaultdict(dict)

miabobia added a commit to miabobia/python_cmr that referenced this issue Aug 17, 2024
frankinspace added a commit that referenced this issue Aug 19, 2024
* Added functions for setting parameter options in Query class.

* Revised doc strings on option functions. Added unit tests for each of the functions in test_queries.py.

* Reverted changes to implement a more generic singular option function. Enabled doctests. Query's options member is now a defaultdict

* removed unused import from test_queries

* Added entry to CHANGELOG.md for issue #74

---------

Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com>
@chuckwondo
Copy link
Collaborator Author

Fixed by #75

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants