Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/disk: some follow ups #4519

Conversation

srenatus
Copy link
Contributor

@srenatus srenatus commented Mar 30, 2022

  • manage /system partitions on our own
  • document bundle limitations
  • check backwards compatibility wrt wildcards
  • check prefix when adding pattern partitions

@srenatus srenatus force-pushed the sr/storage-disk/partitioning-follow-ups branch 3 times, most recently from 9c10b21 to 26d7e40 Compare March 30, 2022 12:30
@srenatus srenatus marked this pull request as ready for review March 30, 2022 12:32
}

for _, path := range addedPartitions {
for _, path := range addedPartitions.Diff(replacements) {
for i := len(path); i > 0; i-- {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 I'll double check tomorrow if this logic (not the change here, but what's in main) needs adjustments for patterned partitions. I think we are open to ignoring existing data that we should care about? Because the len(path) logic followed by a lookup will not take * into account at all.

The impact is when adding wildcard partitions to a database with existing data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed this in c8f9305. I'll add a test case tomorrow.

@srenatus srenatus force-pushed the sr/storage-disk/partitioning-follow-ups branch from 26d7e40 to c8f9305 Compare March 30, 2022 12:46
@srenatus
Copy link
Contributor Author

Got some tests to fix there...

@srenatus srenatus force-pushed the sr/storage-disk/partitioning-follow-ups branch 2 times, most recently from 63058fd to 6e27125 Compare March 30, 2022 16:07
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
If these are found in the user-provided partitions, we'll error out.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
@srenatus srenatus force-pushed the sr/storage-disk/partitioning-follow-ups branch from 6e27125 to 8a9546c Compare March 31, 2022 07:29
Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@srenatus srenatus merged commit 51181a8 into open-policy-agent:main Mar 31, 2022
@srenatus srenatus deleted the sr/storage-disk/partitioning-follow-ups branch March 31, 2022 07:52
damienjburks added a commit to damienjburks/opa that referenced this pull request Apr 1, 2022
Signed-off-by: Damien Burks <damien@damienjburks.com>

removing space from testcase

Signed-off-by: Damien Burks <damien@damienjburks.com>

website: make local dev and PR preview not build everything (open-policy-agent#4474)

With this change, the work done for local development, and the per-PR netlify preview changes:

It will no longer include the website stuff for versions other than edge, the current working tree.

We thus save us the time, and the flakiness, involved with

- checking if github has release binaries for all the versions
- checking out their sources
- fetching the release binaries to pre-hydrate old versions' live-blocks.

The previously-used, documented make target should still be intact.

Fixes open-policy-agent#4379 to some extent, I hope.

* docs/website: remove "latest" binary from opa versions cache

Having a stale binary here -- one called "latest" but not actually
being "latest" -- causes issues like this: when building the website
content for the (real) latest version, the script would take the
old (previous-latest) binary, and fail because that binary didn't
know the latest future keywords.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix download link redirect (open-policy-agent#4482)

Fixes the bug introduced in open-policy-agent#4474 causing the setup-opa action to fail
to retrieve the "latest" opa binary.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump github.com/yashtewari/glob-intersection to v0.1.0 (open-policy-agent#4481)

https://github.com/yashtewari/glob-intersection/releases/tag/v0.1.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/website: fix /docs redirect (open-policy-agent#4483)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix template to produce proper redirects (open-policy-agent#4484)

Also redirect /docs -> /docs/edge in preview

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/envoy: include new configurable gRPC msg sizes (open-policy-agent#4459)

Docs for open-policy-agent/opa-envoy-plugin#323

Signed-off-by: Elliot Maincourt <e.maincourt@gmail.com>

website/live-blocks: update caniuse/browserlist

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/live-blocks: don't call the github api to determine release asset urls

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: don't run generate twice (open-policy-agent#4488)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ast: Fix print call rewriting in else rules

The compiler was accidentally checking/rewriting print calls in the
body of else rules before the implicit args of the else rule were
processed. As a result, the compiler was generating false-positive
errors for refs to undeclared args. The root of the problem was the
usage of WalkBodies on the top-level rule (which implicitly walks the
bodies of else rules under the top-level rule). With this change, the
compiler will call WalkBodies on the head and body of each rule rather
than the entire rule itself (which includes the else chain).

Fixes open-policy-agent#4489

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

Update maintainer status, add Will Beason (open-policy-agent#4465)

Signed-off-by: Max Smythe <smythe@google.com>

Co-authored-by: Will Beason <willbeason@google.com>

Make maintainers list alphabetical (open-policy-agent#4491)

Signed-off-by: Max Smythe <smythe@google.com>

plugins/logs: Fix broken retry logic

We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: open-policy-agent#4486

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

logging: mask authorization header value in debug logs (open-policy-agent#4496)

Fixes open-policy-agent#4495

Signed-off-by: Anders Eknert <anders@eknert.com>

Fixes open-policy-agent#4376 (open-policy-agent#4494)

Do note though that this does not change how multiple
with statements are grouped. Although I agree with that,
it's IMHO a separate feature request, while the spacing
issue is a bug.

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime: improve log output for binary response (open-policy-agent#4498)

This change omits the response body when using compression on metrics endpoint and when pprof is enabled.

Signed-off-by: Christian Altamirano <christian.altamirano.ayala@gmail.com>

build: bump golang: 1.17 -> 1.18

No change to go.mod's `go` stanza, so no changes in code compatibility.

However, it's used for building our docker images and release
binaries, and for fuzz testing in our nightly workflow.

Some test-related changes with the dns lookup built-in function's
error handling; and the hardcoded signature. Running

    go test ./topdown -run TestTopdownJWTEncodeSignECWithSeedReturnsSameSignature -count 10000

makes me believe that for whatever reason the signature changed,
it's at least stable.

topdown/http_test: Test-only change to accomodate this change in Go (https://go.dev/doc/go1.18):

    Certificate.Verify now uses platform APIs to verify certificate
    validity on macOS and iOS when it is called with a nil
    VerifyOpts.Roots or when using the root pool returned from
    SystemCertPool.

We're keeping the old message for go <= 1.17; in a silly-simple way.

Also:

* ci: build and test two old golang version on macos|linux

  We'll drop golang 1.15, keep one unsupported version (1.16).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ci: remove go-fuzz, use native go 1.18 fuzzer

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(dev-deps): update live-scripts deps: mocha, nanoid (open-policy-agent#4500)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump nanoid in /docs/website/scripts/live-blocks (open-policy-agent#4501)

update bob_token and alice_token in markdown (open-policy-agent#4504)

Signed-off-by: yongen.pan <yongen.pan@daocloud.io>

topdown/eval: copy without modifying expr, update test/e2e/concurrency (open-policy-agent#4503)

What we previously did turned into a race condition with multiple
concurrent calls to /v1/compile.

With a change introduced with 0.38.0 (the `every` keyword), the
`nil` Terms of an `ast.Expr` node was surfaced: previously, it would
go unnoticed, but could potentially have yielded bad results.

The effect of this change is proven using a new e2e test that would
fail on the code we had previous.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

uploading latest changes

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing test cases

Signed-off-by: Damien Burks <damien@damienjburks.com>

sdk: avoid using different state (open-policy-agent#4505)

I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that open-policy-agent#4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>

build(deps): update opentelemetry-go: 1.5.0 -> 1.6.1 (metrics 0.28.0) (open-policy-agent#4467)

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.0
https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1
https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.6.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): open opentelemetry-go 1.6.0 -> 1.6.1 (open-policy-agent#4512)

This is a left-over from the previous bump, 0856120.

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

sdk/opa_test: increase max delta (0.5s) (open-policy-agent#4513)

When running on GHA, we've found this test to often fail on macos-latest:
It would not functionally be wrong, but it also wasn't able to finish in
the alloted time. Now, the maximum delta has been increased a lot (10ms to
500ms). It's much, but it's still good enough to ensure that the context
passed to Stop() is the one that matters for shutdown.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add clarifications for docs describing bundle signing and delta features

This commit adds some clarification to the bundle doc regarding
the format of bundle signatures and manifest behavior for delta
bundles.

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

runtime: change processed file event log level to info (open-policy-agent#4514)

"Processed file watch event." will now be logged at level "info" (was "warn").

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime/logging: only suppress payloads for handlers that compress responses (open-policy-agent#4502)

* runtime/logging: only suppress payloads for handlers that compress responses

To get compressed responses, two things need to be true:

1. The client must accept compressed responses
2. The handler must reply with a compressed response

For general API requests, (1) holds most of the time. (2) is only true
for the metrics endpoint at them moment, since the 3rd party library we
use for serving the prometheus endpoint will do compression.

* runtime/logging: remove dead code

The http.Hijack stuff was related to a watch feature removed in
open-policy-agent@186ef99

dropInputParam was only used by its tests.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: code cosmetics

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: allow implementations to override MakeDir

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

runtime+storage: integrate disk storage

With this change, the disk backend (badger) becomes available for
use with the OPA runtime properly:

It can be configured using the `storage.disk` key in OPA's config
(see included documentation).

When enabled,
- any data or policies stored with OPA will persist over restarts
- per-query metrics related to disk usage are reported
- Prometheus metrics per storage operation are exported

The main intention behind this feature is to optimize memory usage:
OPA can now operate on more data than fits into the allotted memory
resources. It is NOT meant to be used as a primary source of truth:
there are no backup/restore or desaster recovery procedures -- you
MUST secure the means to restore the data stored with OPA's disk
storage by yourself.

See also open-policy-agent#4014. Future improvements around bundle loading are
planned.

Some notes on details:

storage/disk: impose same locking regime used with inmem

With this setup, we'll ensure:

- there is only one open write txn at a time
- there are any number of open read txns at a time
- writes are blocked when reads are inflight
- during a commit (and triggers being run), no read txns can be created

This is to ensure the same atomic policy update semantics when using
'disk" as we have with "inmem". We're basically opting out of badger's
currency control and transactionality guarantees. This is because we
cannot piggy back on that to ensure the atomic update we want.

There might be other ways -- using subscribers, and blocking in some
other place -- but this one seems preferrable since it mirrors inmem.

Part of the problem is ErrTxnTooLarge, and committing and renewing
txns when it occurs: that, which is the prescribed solution to txns
growing too big, also means that reads can see half of the "logical"
transaction having been committed, while the rest is still getting
processed.

Another approach would have been using `WriteBatch`, but that won't
let us read from the batch, only apply Set and Delete operations.
We currently need to read (via an iterator) to figure out if we
need to delete keys to replace something in the store.  There is
no DropPrefix operation on the badger txn, or the WriteBatch API.

storage/disk: remove commit-and-renew-txn code for txn-too-big errors

This would break transactional guarantees we care about: while there
can be only one write transaction at a time, read transactions may
happen while a write txn is underway -- with this commit-and-reset
logic, those would read partial data.

Now, the error will be returned to the caller. The maximum txn size
depends on the size of memtables, and could be tweaked manually.
In general, the caller should try to push multiple smaller increments
of the data.

storage/disk: implement noop MakeDir

The MakeDir operation as implemented in the backend-agnostic storage
code has become an issue with the disk store: to write /foo/bar/baz,
we'd have to read /foo (among other subdirs), and that can be _much_
work for the disk backend. With inmem, it's cheap, so this wasn't
problematic before.

Some of the storage/disk/txn.go logic had to be adjusted to properly
do the MakeDir steps implicitly.

The index argument addition to patch() in storage/disk/txn.go was
necessary to keep the error messages conforming to the previous
code path: previously, conflicts (arrays indexed as objects) would
be surfaced in the MakeDir step, now it's entangled with the patch
calculation.

storage/disk: check ctx.Err() in List/Get operations

This won't abort reading a single key, but it will abort iterations.

storage/disk: support patterns in partitions

There is a potential clash here: "*", the path wildcard, is
a valid path section. However, it only affects the case when
a user would want to have a partition at

    /foo/*/bar

and would really mean "*", and not the wildcard.

Storing data at /foo/*/bar with a literal "*" won't be treated
differently than storing something at /fo/xyz/bar.

storage/disk: keep per-txn-type histograms of stats

This is done by reading off the metrics on commit, and shovelling
their numbers into the prometheus collector.

NOTE: if you were to share a metrics object among multiple transactions,
the results would be skewed, as it's not reset. However, our server
handlers don't do that.

storage/disk: opt out of badger's conflict detection

With only one write transaction in flight at any time, the situation
that badger guards against cannot happen:

A transaction has written to a key after the current, to-be-committed
transaction has last read that key from the store.

Since it can't happen, we can ignore the bookkeeping involved. This
improves the time it takes to overwrite existing keys.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/management-bundles: remove quotations with info boxes (open-policy-agent#4517)

Small follow-up to open-policy-agent#4515, using info boxes instead of generic markup.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix community site (open-policy-agent#4522)

* remove hugo templating

This page doesn't go through hugo at all, so these things would
end up on the website.

A quick fix is reverting the bits that added the templating. The
downside is that local dev and PR previews will link to
/docs/latest/ecosystem, which doesn't exist there.

* link Documentation and Download to /docs

For /docs/ (not /docs/latest) we have a redirect in local dev and
PR preview mode:

    /docs -> /docs/edge

so this will be a little less broken: only ecosystem is a 404 link
on the preview.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

integrations: Add Emissary-Ingress (open-policy-agent#4523)

This PR adds the Emissary ingress to the Ecosystem page.
The blog section contains link to an article which guides on Emissary ingress and OPA integrations.

Signed-off-by: Tayyab J <tayyab.jamadar@styra.com>

storage/disk: wildcard partition validation, docs caveat (open-policy-agent#4519)

A bunch of smaller follow-up tasks to open-policy-agent#4381.

* storage/disk_test: check invalid patches with wildcard partition, too
* docs/disk: add caveat re: bundles loaded into memory
* storage/disk: auto-manage /system partitions

If these are found in the user-provided partitions, we'll error out.

* storage/disk: pretty-print partitions with "*" instead of %2A
* storage/disk: respect wildcard-replacement in partition validation

It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

* storage/disk: check prefix when adding wildcard partitions

The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/style: fix code block bottom margin (open-policy-agent#4526)

The text following regular code blocks was too close. Now, the bottom
margin is the same was used with live-blocks, 1.5rem (24px).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add CloudFormation Hook tutorial (open-policy-agent#4525)

Obviously missing is deployment of OPA to AWS. This will come in the next iteration.

Signed-off-by: Anders Eknert <anders@eknert.com>

Prepare v0.39.0 release (open-policy-agent#4524)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

Prepare v0.40.0 development (open-policy-agent#4527)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix styra support link

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

adopters: Add Wealthsimple (open-policy-agent#4530)

Signed-off-by: hannahyeates <hannahgrey06@gmail.com>

ci: fix rego check (open-policy-agent#4532)

* build/policies: format using 0.39.0
* workflow/pull-request: use edge opa for rego PR checks

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: remove right margin on sidebar (open-policy-agent#4529)

* remove right margin on sidebar
* replace the removed margin with padding

Signed-off-by: orweis <orweis@gmail.com>

website: Remove unused variables to avoid error in strict mode(open-policy-agent#4534)

To Fix:
```
2 errors occurred:
policy.rego:13: rego_compile_error: assigned var header unused
policy.rego:13: rego_compile_error: assigned var signature unused
```

Signed-off-by: panpan0000 <panpan0000@msn.com>

website: show yellow banner for old version (open-policy-agent#4533)

Hopefully this is gets more attention than the yellow "older" button
in the version picker.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

fixing disk diagnostic test case
anderseknert pushed a commit to damienjburks/opa that referenced this pull request Apr 5, 2022
Signed-off-by: Damien Burks <damien@damienjburks.com>

removing space from testcase

Signed-off-by: Damien Burks <damien@damienjburks.com>

website: make local dev and PR preview not build everything (open-policy-agent#4474)

With this change, the work done for local development, and the per-PR netlify preview changes:

It will no longer include the website stuff for versions other than edge, the current working tree.

We thus save us the time, and the flakiness, involved with

- checking if github has release binaries for all the versions
- checking out their sources
- fetching the release binaries to pre-hydrate old versions' live-blocks.

The previously-used, documented make target should still be intact.

Fixes open-policy-agent#4379 to some extent, I hope.

* docs/website: remove "latest" binary from opa versions cache

Having a stale binary here -- one called "latest" but not actually
being "latest" -- causes issues like this: when building the website
content for the (real) latest version, the script would take the
old (previous-latest) binary, and fail because that binary didn't
know the latest future keywords.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix download link redirect (open-policy-agent#4482)

Fixes the bug introduced in open-policy-agent#4474 causing the setup-opa action to fail
to retrieve the "latest" opa binary.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump github.com/yashtewari/glob-intersection to v0.1.0 (open-policy-agent#4481)

https://github.com/yashtewari/glob-intersection/releases/tag/v0.1.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/website: fix /docs redirect (open-policy-agent#4483)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix template to produce proper redirects (open-policy-agent#4484)

Also redirect /docs -> /docs/edge in preview

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/envoy: include new configurable gRPC msg sizes (open-policy-agent#4459)

Docs for open-policy-agent/opa-envoy-plugin#323

Signed-off-by: Elliot Maincourt <e.maincourt@gmail.com>

website/live-blocks: update caniuse/browserlist

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/live-blocks: don't call the github api to determine release asset urls

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: don't run generate twice (open-policy-agent#4488)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ast: Fix print call rewriting in else rules

The compiler was accidentally checking/rewriting print calls in the
body of else rules before the implicit args of the else rule were
processed. As a result, the compiler was generating false-positive
errors for refs to undeclared args. The root of the problem was the
usage of WalkBodies on the top-level rule (which implicitly walks the
bodies of else rules under the top-level rule). With this change, the
compiler will call WalkBodies on the head and body of each rule rather
than the entire rule itself (which includes the else chain).

Fixes open-policy-agent#4489

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

Update maintainer status, add Will Beason (open-policy-agent#4465)

Signed-off-by: Max Smythe <smythe@google.com>

Co-authored-by: Will Beason <willbeason@google.com>

Make maintainers list alphabetical (open-policy-agent#4491)

Signed-off-by: Max Smythe <smythe@google.com>

plugins/logs: Fix broken retry logic

We were incorrectly resetting the retry counter after
every error condition instead of using the incremented
value. As a result, retry delay would always be 0s.
This meant that if OPA encountered an error while
uploading decision logs it would immediately retry
instead of doing an exponential backoff.

Fixes: open-policy-agent#4486

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

logging: mask authorization header value in debug logs (open-policy-agent#4496)

Fixes open-policy-agent#4495

Signed-off-by: Anders Eknert <anders@eknert.com>

Fixes open-policy-agent#4376 (open-policy-agent#4494)

Do note though that this does not change how multiple
with statements are grouped. Although I agree with that,
it's IMHO a separate feature request, while the spacing
issue is a bug.

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime: improve log output for binary response (open-policy-agent#4498)

This change omits the response body when using compression on metrics endpoint and when pprof is enabled.

Signed-off-by: Christian Altamirano <christian.altamirano.ayala@gmail.com>

build: bump golang: 1.17 -> 1.18

No change to go.mod's `go` stanza, so no changes in code compatibility.

However, it's used for building our docker images and release
binaries, and for fuzz testing in our nightly workflow.

Some test-related changes with the dns lookup built-in function's
error handling; and the hardcoded signature. Running

    go test ./topdown -run TestTopdownJWTEncodeSignECWithSeedReturnsSameSignature -count 10000

makes me believe that for whatever reason the signature changed,
it's at least stable.

topdown/http_test: Test-only change to accomodate this change in Go (https://go.dev/doc/go1.18):

    Certificate.Verify now uses platform APIs to verify certificate
    validity on macOS and iOS when it is called with a nil
    VerifyOpts.Roots or when using the root pool returned from
    SystemCertPool.

We're keeping the old message for go <= 1.17; in a silly-simple way.

Also:

* ci: build and test two old golang version on macos|linux

  We'll drop golang 1.15, keep one unsupported version (1.16).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

ci: remove go-fuzz, use native go 1.18 fuzzer

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(dev-deps): update live-scripts deps: mocha, nanoid (open-policy-agent#4500)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): bump nanoid in /docs/website/scripts/live-blocks (open-policy-agent#4501)

update bob_token and alice_token in markdown (open-policy-agent#4504)

Signed-off-by: yongen.pan <yongen.pan@daocloud.io>

topdown/eval: copy without modifying expr, update test/e2e/concurrency (open-policy-agent#4503)

What we previously did turned into a race condition with multiple
concurrent calls to /v1/compile.

With a change introduced with 0.38.0 (the `every` keyword), the
`nil` Terms of an `ast.Expr` node was surfaced: previously, it would
go unnoticed, but could potentially have yielded bad results.

The effect of this change is proven using a new e2e test that would
fail on the code we had previous.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

uploading latest changes

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing test cases

Signed-off-by: Damien Burks <damien@damienjburks.com>

sdk: avoid using different state (open-policy-agent#4505)

I noticed that when operating on opa.state, locking is usually done to avoid
a race, whereas here opa.state is used directly. By comparing the previous
changes, I found that open-policy-agent#4240 changed the previous behaviour.

This change adjusts that: we ensure that we work on the state as read via
the mutex-protected `s := *opa.state`.

Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>

build(deps): update opentelemetry-go: 1.5.0 -> 1.6.1 (metrics 0.28.0) (open-policy-agent#4467)

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.0
https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1
https://github.com/open-telemetry/opentelemetry-go-contrib/releases/tag/v1.6.0

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

build(deps): open opentelemetry-go 1.6.0 -> 1.6.1 (open-policy-agent#4512)

This is a left-over from the previous bump, 0856120.

https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.6.1

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

sdk/opa_test: increase max delta (0.5s) (open-policy-agent#4513)

When running on GHA, we've found this test to often fail on macos-latest:
It would not functionally be wrong, but it also wasn't able to finish in
the alloted time. Now, the maximum delta has been increased a lot (10ms to
500ms). It's much, but it's still good enough to ensure that the context
passed to Stop() is the one that matters for shutdown.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add clarifications for docs describing bundle signing and delta features

This commit adds some clarification to the bundle doc regarding
the format of bundle signatures and manifest behavior for delta
bundles.

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>

runtime: change processed file event log level to info (open-policy-agent#4514)

"Processed file watch event." will now be logged at level "info" (was "warn").

Signed-off-by: Anders Eknert <anders@eknert.com>

runtime/logging: only suppress payloads for handlers that compress responses (open-policy-agent#4502)

* runtime/logging: only suppress payloads for handlers that compress responses

To get compressed responses, two things need to be true:

1. The client must accept compressed responses
2. The handler must reply with a compressed response

For general API requests, (1) holds most of the time. (2) is only true
for the metrics endpoint at them moment, since the 3rd party library we
use for serving the prometheus endpoint will do compression.

* runtime/logging: remove dead code

The http.Hijack stuff was related to a watch feature removed in
open-policy-agent@186ef99

dropInputParam was only used by its tests.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: code cosmetics

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

storage: allow implementations to override MakeDir

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

runtime+storage: integrate disk storage

With this change, the disk backend (badger) becomes available for
use with the OPA runtime properly:

It can be configured using the `storage.disk` key in OPA's config
(see included documentation).

When enabled,
- any data or policies stored with OPA will persist over restarts
- per-query metrics related to disk usage are reported
- Prometheus metrics per storage operation are exported

The main intention behind this feature is to optimize memory usage:
OPA can now operate on more data than fits into the allotted memory
resources. It is NOT meant to be used as a primary source of truth:
there are no backup/restore or desaster recovery procedures -- you
MUST secure the means to restore the data stored with OPA's disk
storage by yourself.

See also open-policy-agent#4014. Future improvements around bundle loading are
planned.

Some notes on details:

storage/disk: impose same locking regime used with inmem

With this setup, we'll ensure:

- there is only one open write txn at a time
- there are any number of open read txns at a time
- writes are blocked when reads are inflight
- during a commit (and triggers being run), no read txns can be created

This is to ensure the same atomic policy update semantics when using
'disk" as we have with "inmem". We're basically opting out of badger's
currency control and transactionality guarantees. This is because we
cannot piggy back on that to ensure the atomic update we want.

There might be other ways -- using subscribers, and blocking in some
other place -- but this one seems preferrable since it mirrors inmem.

Part of the problem is ErrTxnTooLarge, and committing and renewing
txns when it occurs: that, which is the prescribed solution to txns
growing too big, also means that reads can see half of the "logical"
transaction having been committed, while the rest is still getting
processed.

Another approach would have been using `WriteBatch`, but that won't
let us read from the batch, only apply Set and Delete operations.
We currently need to read (via an iterator) to figure out if we
need to delete keys to replace something in the store.  There is
no DropPrefix operation on the badger txn, or the WriteBatch API.

storage/disk: remove commit-and-renew-txn code for txn-too-big errors

This would break transactional guarantees we care about: while there
can be only one write transaction at a time, read transactions may
happen while a write txn is underway -- with this commit-and-reset
logic, those would read partial data.

Now, the error will be returned to the caller. The maximum txn size
depends on the size of memtables, and could be tweaked manually.
In general, the caller should try to push multiple smaller increments
of the data.

storage/disk: implement noop MakeDir

The MakeDir operation as implemented in the backend-agnostic storage
code has become an issue with the disk store: to write /foo/bar/baz,
we'd have to read /foo (among other subdirs), and that can be _much_
work for the disk backend. With inmem, it's cheap, so this wasn't
problematic before.

Some of the storage/disk/txn.go logic had to be adjusted to properly
do the MakeDir steps implicitly.

The index argument addition to patch() in storage/disk/txn.go was
necessary to keep the error messages conforming to the previous
code path: previously, conflicts (arrays indexed as objects) would
be surfaced in the MakeDir step, now it's entangled with the patch
calculation.

storage/disk: check ctx.Err() in List/Get operations

This won't abort reading a single key, but it will abort iterations.

storage/disk: support patterns in partitions

There is a potential clash here: "*", the path wildcard, is
a valid path section. However, it only affects the case when
a user would want to have a partition at

    /foo/*/bar

and would really mean "*", and not the wildcard.

Storing data at /foo/*/bar with a literal "*" won't be treated
differently than storing something at /fo/xyz/bar.

storage/disk: keep per-txn-type histograms of stats

This is done by reading off the metrics on commit, and shovelling
their numbers into the prometheus collector.

NOTE: if you were to share a metrics object among multiple transactions,
the results would be skewed, as it's not reset. However, our server
handlers don't do that.

storage/disk: opt out of badger's conflict detection

With only one write transaction in flight at any time, the situation
that badger guards against cannot happen:

A transaction has written to a key after the current, to-be-committed
transaction has last read that key from the store.

Since it can't happen, we can ignore the bookkeeping involved. This
improves the time it takes to overwrite existing keys.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs/management-bundles: remove quotations with info boxes (open-policy-agent#4517)

Small follow-up to open-policy-agent#4515, using info boxes instead of generic markup.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix community site (open-policy-agent#4522)

* remove hugo templating

This page doesn't go through hugo at all, so these things would
end up on the website.

A quick fix is reverting the bits that added the templating. The
downside is that local dev and PR previews will link to
/docs/latest/ecosystem, which doesn't exist there.

* link Documentation and Download to /docs

For /docs/ (not /docs/latest) we have a redirect in local dev and
PR preview mode:

    /docs -> /docs/edge

so this will be a little less broken: only ecosystem is a 404 link
on the preview.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

integrations: Add Emissary-Ingress (open-policy-agent#4523)

This PR adds the Emissary ingress to the Ecosystem page.
The blog section contains link to an article which guides on Emissary ingress and OPA integrations.

Signed-off-by: Tayyab J <tayyab.jamadar@styra.com>

storage/disk: wildcard partition validation, docs caveat (open-policy-agent#4519)

A bunch of smaller follow-up tasks to open-policy-agent#4381.

* storage/disk_test: check invalid patches with wildcard partition, too
* docs/disk: add caveat re: bundles loaded into memory
* storage/disk: auto-manage /system partitions

If these are found in the user-provided partitions, we'll error out.

* storage/disk: pretty-print partitions with "*" instead of %2A
* storage/disk: respect wildcard-replacement in partition validation

It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

* storage/disk: check prefix when adding wildcard partitions

The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website/style: fix code block bottom margin (open-policy-agent#4526)

The text following regular code blocks was too close. Now, the bottom
margin is the same was used with live-blocks, 1.5rem (24px).

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

docs: Add CloudFormation Hook tutorial (open-policy-agent#4525)

Obviously missing is deployment of OPA to AWS. This will come in the next iteration.

Signed-off-by: Anders Eknert <anders@eknert.com>

Prepare v0.39.0 release (open-policy-agent#4524)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

Prepare v0.40.0 development (open-policy-agent#4527)

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: fix styra support link

Signed-off-by: Torin Sandall <torinsandall@gmail.com>

adopters: Add Wealthsimple (open-policy-agent#4530)

Signed-off-by: hannahyeates <hannahgrey06@gmail.com>

ci: fix rego check (open-policy-agent#4532)

* build/policies: format using 0.39.0
* workflow/pull-request: use edge opa for rego PR checks

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

website: remove right margin on sidebar (open-policy-agent#4529)

* remove right margin on sidebar
* replace the removed margin with padding

Signed-off-by: orweis <orweis@gmail.com>

website: Remove unused variables to avoid error in strict mode(open-policy-agent#4534)

To Fix:
```
2 errors occurred:
policy.rego:13: rego_compile_error: assigned var header unused
policy.rego:13: rego_compile_error: assigned var signature unused
```

Signed-off-by: panpan0000 <panpan0000@msn.com>

website: show yellow banner for old version (open-policy-agent#4533)

Hopefully this is gets more attention than the yellow "older" button
in the version picker.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>

fixing disk diagnostic test case

removing additional space

Signed-off-by: Damien Burks <damien@damienjburks.com>

removing test files

Signed-off-by: Damien Burks <damien@damienjburks.com>

fixing linting issue

Signed-off-by: Damien Burks <damien@damienjburks.com>

adding test case for logging

Signed-off-by: Damien Burks <damien@damienjburks.com>

ecosystem: adding permit.io (open-policy-agent#4531)

Signed-off-by: Oz Radiano <oz@permit.io>

docs: CloudFormation fixes (open-policy-agent#4541)

* Fix step headers to use h3
* Add warning about boolean values conversion

Signed-off-by: Anders Eknert <anders@eknert.com>
rokkiter pushed a commit to rokkiter/opa that referenced this pull request Apr 18, 2022
…-agent#4519)

A bunch of smaller follow-up tasks to open-policy-agent#4381.

* storage/disk_test: check invalid patches with wildcard partition, too
* docs/disk: add caveat re: bundles loaded into memory
* storage/disk: auto-manage /system partitions

If these are found in the user-provided partitions, we'll error out.

* storage/disk: pretty-print partitions with "*" instead of %2A
* storage/disk: respect wildcard-replacement in partition validation

It is now allowed to replace a partition like

    /foo/bar

by

    /foo/*

also if multiple wildcards are used.

Caveats:

You cannot add a wildcard partition like /*/*, since it would overlap
the managed "/system/*" partition.

When attempting to go back from /foo/* to /foo/bar, an error is
raised _unconditionally_ -- we could check the existing data, but
currently don't.

* storage/disk: check prefix when adding wildcard partitions

The previously done check would have falsely returned that there is no problem
when adding a wildcard partition: lookup of "/foo/*" with '*' not interpreted
as a wildcard, but as a string, would yield a not-found, even if there was any
data under /foo/.

Now, we'll check the prefix-until-wildcard. It's more cautious than
theoretically necessary, but safe.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants