DM-47375: Run query_all_datasets in a single request for RemoteButler #1114

Merged: 12 commits, Nov 14, 2024

Commits on Nov 14, 2024

  1. Use advanced query system in query_all_datasets

    Switch from the query_datasets convenience method to the advanced query system in query_all_datasets.
    
    This lets us get the results one page at a time, which will be needed to prevent memory exhaustion when running these queries on the server.
    dhirving committed Nov 14, 2024
    39e9d02
  2. Remove with_dimension_records from query-datasets

    The query-datasets CLI was not actually using dimension records, so drop support for them to simplify the implementation.
    dhirving committed Nov 14, 2024
    6b10874
  3. Restrict --order-by in query-datasets to single type

    The backend for querying multiple dataset types will not support "order by", so restrict the CLI to match the implementation.
    dhirving committed Nov 14, 2024
    8da61ab
  4. Remove order_by from query_all_datasets

    The upcoming implementation of query_all_datasets will not support order_by, so remove it. This requires modifying the query-datasets CLI to fall back to the single-dataset-type query_datasets method whenever ordering is requested.
    dhirving committed Nov 14, 2024
    0a92c87
  5. Make streaming query logic reusable

    In preparation for implementing query_all_datasets on the server, make the streaming response and timeout logic from the existing query handler reusable.
    dhirving committed Nov 14, 2024
    f3e2e9d
  6. Move query streaming logic to its own file

    After the refactor in the previous commit, the streaming logic is largely independent of the query routes.
    dhirving committed Nov 14, 2024
    7aaee6c
  7. Move query streaming client code to its own file

    This will be shared by the RemoteButler query_all_datasets implementation in an upcoming commit.
    dhirving committed Nov 14, 2024
    5c4d54d
  8. Define a dataclass for query_all_datasets args

    This will be used in an upcoming commit to prevent excessive duplication of function parameters between implementations of query_all_datasets.
    dhirving committed Nov 14, 2024
    f5aa116
  9. Add server-side implementation of query_all_datasets

    query_all_datasets can potentially involve hundreds or thousands of separate dataset queries. We don't want clients slamming the server with that many HTTP requests, so add a server-side endpoint that can handle these queries in a single request.
    dhirving committed Nov 14, 2024
    ad26503
  10. Add back dimension records to QueryDatasets

    It turns out the QueryDatasets class is shared by multiple CLI scripts, some of which need dimension records included, so add back `with_dimension_records` to the internal implementation of query_all_datasets.
    dhirving committed Nov 14, 2024
    c9fbdb9
  11. d40596b
  12. Clean up imports

    dhirving committed Nov 14, 2024
    3ffbb4b
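
For readers skimming the commit list, here is a rough sketch of how commits 1, 8, and 9 fit together: one argument object describes the whole search, and results come back one page at a time so that neither client nor server holds everything in memory. This is not the code from the PR. `AllDatasetsQueryArgs`, `query_all_datasets_paged`, the field names, and the dictionary-based "registry" are all hypothetical stand-ins, and the real implementation runs against the Butler query system rather than an in-memory mapping.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping, Sequence
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AllDatasetsQueryArgs:
    """Hypothetical argument bundle shared by every implementation.

    Bundling the parameters avoids repeating a long signature in each
    implementation (the motivation given in commit 8).
    """

    name: Sequence[str] = ("*",)  # glob patterns for dataset type names
    limit: int | None = None  # maximum number of results overall


def query_all_datasets_paged(
    registry: Mapping[str, Sequence[str]],
    args: AllDatasetsQueryArgs,
    page_size: int = 2,
) -> Iterator[tuple[str, list[str]]]:
    """Yield (dataset_type, page) pairs, one page at a time.

    ``registry`` is a toy stand-in mapping dataset type name to dataset IDs.
    Results are grouped by dataset type with no ordering guarantee inside a
    type, mirroring the lack of order_by support for multi-type queries.
    """
    remaining = args.limit
    for dataset_type in sorted(registry):
        # Skip dataset types that match none of the requested patterns.
        if not any(fnmatchcase(dataset_type, p) for p in args.name):
            continue
        refs = registry[dataset_type]
        for start in range(0, len(refs), page_size):
            page = list(refs[start : start + page_size])
            if remaining is not None:
                # Trim the page so the overall limit is never exceeded.
                page = page[:remaining]
                remaining -= len(page)
            if page:
                yield dataset_type, page
            if remaining == 0:
                return


if __name__ == "__main__":
    toy_registry = {"raw": ["r1", "r2", "r3"], "calexp": ["c1"], "bias": ["b1", "b2"]}
    query = AllDatasetsQueryArgs(name=("raw", "c*"), limit=3)
    for dataset_type, page in query_all_datasets_paged(toy_registry, query):
        print(dataset_type, page)
```

Because the function is a generator, a server handler could forward each page to the client as soon as it is produced, which is the property the streaming refactor in commits 5 to 7 is meant to preserve. The whole search then costs the client a single call instead of one request per dataset type.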