Improve client benchmark identification #3496

Merged: 1 commit into distributed-system-analysis:main from bmark, Jul 17, 2023

Conversation

dbutenhof
Member

PBENCH-1186
PBENCH-1187

A bit of a catch-all. First, this simplifies and standardizes the way the dashboard identifies a benchmark. Eventually we'll be "smart enough" to go beyond the Pbench benchmark wrappers (as we intend to move away from relying on them), and `dataset.metalog.pbench.script` will become unreliable. This adds a new `server.benchmark` metadata key, which for now carries the same value but can be extended beyond that in the future.

The dashboard would also like to be able to identify, in advance, which benchmark types the current server supports, and we'd like to avoid hardcoding that knowledge in the dashboard. For this purpose, I've added a "visualization" key to the endpoints response indicating the benchmark types supported by the current server's `datasets_visualize` and `datasets_compare` APIs.

For some reason, a minor consolidation of metadata-setting code in the intake API base resulted in odd unit test and functional test failures. The strange part is that these should have been failing earlier, since they're cases where a dataset doesn't have `server.index-map` but the API requires it (for example, `datasets_detail` and `datasets_contents`). I reworked the code a bit to fail with a `CONFLICT` error (already used elsewhere) rather than an internal server error when the dataset is `server.archiveonly`, and adjusted the tests to match.
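
As a rough sketch of how a dashboard client might consume these two additions (the URL paths, query parameter, dataset ID, and response shapes below are illustrative assumptions, not details taken from this PR):

    # Sketch only: endpoint paths, parameter names, and response shapes are assumed.
    import requests

    SERVER = "https://pbench.example.com"   # placeholder server
    DATASET = "<dataset-resource-id>"       # placeholder dataset ID

    # The endpoints response now advertises which benchmark types this server's
    # datasets_visualize and datasets_compare APIs can handle.
    endpoints = requests.get(f"{SERVER}/api/v1/endpoints").json()
    supported = endpoints.get("visualization", {}).get("benchmarks", [])

    # Each dataset carries a server.benchmark metadata value identifying its
    # native benchmark (currently derived from dataset.metalog.pbench.script).
    meta = requests.get(
        f"{SERVER}/api/v1/datasets/{DATASET}/metadata",
        params={"metadata": "server.benchmark"},
    ).json()
    benchmark = meta.get("server.benchmark", "unknown")

    if benchmark in supported:
        print(f"{DATASET} ({benchmark}) can be visualized on this server")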

@dbutenhof added the Server and API labels Jul 13, 2023
@dbutenhof self-assigned this Jul 13, 2023
Member

@siddardh-ra left a comment

👍

@dbutenhof merged commit 5329c0f into distributed-system-analysis:main Jul 17, 2023
@dbutenhof deleted the bmark branch July 17, 2023 12:01
Member

@webbnh left a comment

I'm late to this game, but I think this change may have introduced a bug into one of the tests (and I think it may have removed some checks for stuff that we do want to test for)...and, as long as I'm here, I have a few other comments.

Comment on lines +124 to +125
if Metadata.getvalue(dataset, Metadata.SERVER_ARCHIVE):
raise APIAbort(HTTPStatus.CONFLICT, "Dataset is not indexed")
Member

This message is insufficient: in 90% of the cases when users receive this, they are going to ask, "why isn't this dataset indexed?" And, it's unlikely that they are going to realize that it is because indexing was disabled on it when it was uploaded.

So, this message really should say something more like, "Indexing of this dataset was disabled by the owner".

Member Author

It's not necessarily by deliberate action of the owner: if a tarball is uploaded without a metadata.log in the right place, we'll accept it but mark it "archive only". I suppose I could have said that it's marked for archive only, but at the time that seemed more complicated and more obscure than just saying it wasn't indexed. Maybe "indexing was disabled" would be OK, but "by the owner" implies deliberate intent.

Member

if a tarball is uploaded without a metadata.log in the right place, we'll accept it but mark it "archive only".

Ah, yes, I'd neglected that detail. But, yes, "marked for archive only" would be much better than "not indexed".
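
A minimal sketch of what the check on lines +124 to +125 might look like with the wording this thread converged on (the exact message in any follow-up change may differ):

    if Metadata.getvalue(dataset, Metadata.SERVER_ARCHIVE):
        raise APIAbort(
            HTTPStatus.CONFLICT,
            "Dataset is marked for archive only and has not been indexed",
        )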

Comment on lines +615 to +620
# BENCHMARK provides a simple identification of the native benchmark where
# one can be identified. At the simplest for Pbench Agent tarballs this is
# just "dataset.metalog.pbench.script" but will be expanded in the future
# for example to identify a "pbench-user-benchmark -- fio" as "fio".
SERVER_BENCHMARK = "server.benchmark"
SERVER_BENCHMARK_UNKNOWN = "unknown"
Member

It probably would be good to have a reference here to possible values for this key (i.e., to BenchmarkName or endpoints["visualization"]["benchmarks"]?).

Member Author

Those would constitute examples, but I already added fio as an example. Maybe eventually we'll support visualization of all possible benchmarks, but right now either of those references would be incomplete.

Member

So, the value associated with "server.benchmark" can be anything the user wants it to be...but, if it's not one of the BenchmarkName values, the Server won't be able to recognize it, and if it's not one of endpoints["visualization"]["benchmarks"] it can't be visualized by the Dashboard, right? I vote that we say that, here....
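
To make the distinction concrete, a hypothetical illustration (the BenchmarkName values and the stub endpoints data below are placeholders, not the server's actual lists):

    from enum import Enum

    class BenchmarkName(Enum):   # stand-in for the server's real BenchmarkName
        FIO = "fio"
        UPERF = "uperf"

    # Stub of the endpoints response described above.
    endpoints = {"visualization": {"benchmarks": ["fio", "uperf"]}}

    recognized = {b.value for b in BenchmarkName}
    visualizable = set(endpoints["visualization"]["benchmarks"])

    benchmark = "linpack"        # e.g., some stored server.benchmark value
    if benchmark not in recognized:
        print(f"{benchmark!r} is stored, but the server can't interpret it")
    elif benchmark not in visualizable:
        print(f"{benchmark!r} is recognized, but can't be visualized by the dashboard")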

Comment on lines 427 to +437
     if not metalog:
-        metalog = {"pbench": {"script": "Foreign"}}
+        benchmark = Metadata.SERVER_BENCHMARK_UNKNOWN
+        metalog = {"pbench": {"name": dataset.name, "script": benchmark}}
         metadata[Metadata.SERVER_ARCHIVE] = True
         attributes["missing_metadata"] = True
+    else:
+        p = metalog.get("pbench")
+        if p:
+            benchmark = p.get("script", Metadata.SERVER_BENCHMARK_UNKNOWN)
+        else:
+            benchmark = Metadata.SERVER_BENCHMARK_UNKNOWN
Member

Alternatively,

                benchmark = Metadata.SERVER_BENCHMARK_UNKNOWN
                if not metalog:
                    metalog = {"pbench": {"name": dataset.name, "script": benchmark}}
                    metadata[Metadata.SERVER_ARCHIVE] = True
                    attributes["missing_metadata"] = True
                else:
                    p = metalog.get("pbench", {"script": benchmark})
                    benchmark = p.get("script", benchmark)

Comment on lines +100 to +103
assert (
dataset.metadata["dataset.metalog.pbench.script"]
== dataset.metadata["server.benchmark"]
)
Member

While this assertion (generally) holds at the moment, this isn't really what we want going forward, right? That is, we're hoping that at some point we'll have pbench-user-benchmark as the script and fio as the benchmark, right? So, it would be better if we could structure this assertion in a way which doesn't break when we get where we want to be going. (I.e., "test as specification".)

(Also, you didn't take this shortcut at lines 190 and 192.)

Member Author

For the archiveonly test, I know exactly what values to expect; this is an iteration, and all we can really know at this point is that they ought to match.

When and if we work out a more general "base benchmark" recognition that doesn't rely on dataset.metalog.pbench.script, we'll have to figure out how to mimic that on the functional test side.

Member

all we can really know at this point is that they ought to match.

They should only match currently because of an implementation detail, not because that's specifically the way we want these fields to behave. It would be best if the tests were constructed in a way that doesn't require them to change when details of the implementation change.... (Ideally, we would not change the tests nearly as often as we seem to do.)
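
Purely as an illustration of the "test as specification" idea (the expected-value mapping below is invented for the example, not something proposed in this PR):

    # Hypothetical: assert against what the test knows it uploaded, rather than
    # against another piece of server-derived metadata.
    EXPECTED_BENCHMARK = {"fio_rw_2018.02.01T22.40.57": "fio"}   # invented mapping
    expected = EXPECTED_BENCHMARK.get(dataset.name, "unknown")
    assert dataset.metadata["server.benchmark"] == expected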

Comment on lines 730 to +735
     for dataset in datasets:
-        server_client.update(dataset.resource_id, access=expected)
-        print(f"\t ... updating {dataset.name} to {access!r}")
-        meta = server_client.get_metadata(
-            dataset.resource_id, metadata="dataset.access"
-        )
-        assert meta["dataset.access"] == expected
+        response = server_client.update(
+            dataset.resource_id, access=expected, raise_error=False
+        )
+        print(f"\t ... updating {dataset.name} to {access!r}")
+        if response.ok:
Member

Shouldn't the print() at line 734 precede the operation at line 731? (And, shouldn't the print() be referencing expected rather than access, since that's what line 732 does?)

Member Author

This shows up as a diff because I rearranged code, but I didn't change the line.

Member

True. But, I still think the lines are in the wrong order, etc. 🙂
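
For reference, the ordering being suggested would look roughly like this (a sketch of the intent, not the merged code):

    for dataset in datasets:
        # Report the intended change before attempting it, and report `expected`,
        # which is what the update actually requests.
        print(f"\t ... updating {dataset.name} to {expected!r}")
        response = server_client.update(
            dataset.resource_id, access=expected, raise_error=False
        )
        if response.ok:
            ...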

dbutenhof added a commit to dbutenhof/pbench that referenced this pull request Jul 20, 2023
PBENCH-1215

Webb pointed out a broken functional test in distributed-system-analysis#3496. Once it was testing what I
had wanted it to test, I realized that it wasn't quite working. This contains
some server and functional test changes to make archive-only dataset queries
work correctly and to be sure that's adequately tested in the functional test
cases.
dbutenhof added a commit that referenced this pull request Jul 26, 2023
* Fix some problems with archive-only behavior

PBENCH-1215

Webb pointed out a broken functional test in #3496. Once it was testing what I
had wanted it to test, I realized that it wasn't quite working. This contains
some server and functional test changes to make archive-only dataset queries
work correctly and to be sure that's adequately tested in the functional test
cases.