Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of azure obj store #3957

Closed
wiardvanrij opened this issue Mar 23, 2021 · 19 comments
Closed

Refactoring of azure obj store #3957

wiardvanrij opened this issue Mar 23, 2021 · 19 comments

Comments

@wiardvanrij
Copy link
Member

I wanted to add some extra configuration options, but then kinda hit a wall with the current testing method. As it's quite hard to extend those. So I started refactoring the azure part and I think I can squash a few issues;

#3933
#2839
#3952

Also adding;

  • Implementing our own HTTP transport method (and therefore configuration options). Just like S3.
  • Adding configuration for PipelineRetry method
  • Adding configuration for ReaderRetry method
  • Refactoring tests

As I'm quite far, I wanted to make this issue to check if no one else was on it. Also open for further feedback :)
If possible add, me as assignee to this issue?

@bwplotka bwplotka changed the title refactoring azure obj store Refactoring of azure obj store Mar 24, 2021
@bwplotka
Copy link
Member

Thanks!

I think no one is on it, and we saw some issues on GitHub e.g #3952 soi I think it's healthy to do!

@wiardvanrij
Copy link
Member Author

Updated the PR.

Absolutely not tested yet, but the image is: wiardvanrij/thanos:azure-test -> https://hub.docker.com/repository/docker/wiardvanrij/thanos

Going to test this myself later on, but would like more feedback from others..

@airkewld
Copy link

level=error ts=2021-04-29T21:49:30.0974553Z caller=main.go:130 err="yaml: unmarshal errors:\n line 3: field max_retries not found in type azure.Config\ncreate AZURE client\ngithub.com/thanos-io/thanos/pkg/objstore/client.NewBucket\n\t/go/src/github.com/thanos-io/thanos/pkg/objstore/client/factory.go:77\nmain.runCompact\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:198\nmain.registerCompact.func1\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:89\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:128\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\npreparing compact command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"

Below are the contents of the obj secret

type: AZURE
config:
  storage_account: "storage_account"
  storage_account_key: "storage_account_key"
  container: "container_name"
  endpoint: "azure_endpoint"
  max_retries: 10

@wiardvanrij
Copy link
Member Author

@airkewld that's correct. Please check the changed config here: https://github.com/thanos-io/thanos/blob/2c435ade8d73af7ff9901a31b98a3920bfe1ef16/docs/storage.md#azure

@airkewld
Copy link

Perfect. I’ll give that a try.

Thank you

@airkewld
Copy link

@wiardvanrij ran your image all night locally with no failures.

@wiardvanrij
Copy link
Member Author

@wiardvanrij ran your image all night locally with no failures.

Awesome, what config did you use? Did you increase timeouts and/or retries?
Also, did you by any chance tested the msi_resource? I've added it but never tested it out :P

@airkewld
Copy link

airkewld commented Apr 30, 2021

This is what i had in the config

  msi_resource: ""
  pipeline_config:
    max_tries: 5
    try_timeout: 10s
    retry_delay: 0s
    max_retry_delay: 0s
  reader_config:
    max_retry_requests: 3
  http_config:
    idle_conn_timeout: 5s
    response_header_timeout: 5s
    insecure_skip_verify: false
    tls_handshake_timeout: 5s
    expect_continue_timeout: 5s
    max_idle_conns: 0
    max_idle_conns_per_host: 0
    max_conns_per_host: 0
    disable_compression: false

I'll be changing these up a bit while testing and let you know what worked best for us.

@wiardvanrij
Copy link
Member Author

I'm probably going to change the config part.

I will make it non-breaking by not removing the max_retries
This value will then be used on the max_tries in the pipeline_config and the max_retry_requests in the reader_config. I will keep the other config options, so that it can be overwritten specifically if desired.

This removes the use of having to fork Cortex and allows for some backwards compatibility.

WDYT?

@phillebaba
Copy link
Contributor

phillebaba commented May 10, 2021

@wiardvanrij I figured out what the issue was after testing a couple of things. These issues are not always clear from the documentation so the only way to solving them is trial and error.

TLDR; The solution is to change the docs to use the correct msi resource.

So I was getting a AuthorizationPermissionMismatch when Thanos was going to upload a blob, but the container was created without issues. This made me guess that the issue was caused by the audience and not that the token was invalid. The domain used to upload blobs was https://<storage-account-name>.blob.core.windows.net so I tried changing the msi resource to that instead, and then it worked.

If you look at Azure docs it states the difference between the two resource IDs.
https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app?tabs=dotnet#azure-storage-resource-id

https://.blob.core.windows.net The service endpoint for a given storage account. Use this value to acquire a token for authorizing requests to that specific Azure Storage account and service only. Replace the value in brackets with the name of your storage account.

https://storage.azure.com/ Use to acquire a token for authorizing requests to any Azure Storage account.

@phillebaba
Copy link
Contributor

Another point is that it was not enough for the SP to be Contributor for the Storage Account, it also needs to be Storage Blob Data Contributor.

@wiardvanrij
Copy link
Member Author

@phillebaba thanks for the info.
The problem here is that I've made a feature to fix some things but I don't have explicit Azure knowledge. Not ideal but Azure really needed some 'upgrades'.

Therefore your (and others) input on the usage + documentation is 100% viable to make everything neat.

Could you perhaps propose some documentation addition/change so I can include that in the PR? Thanks so much.

@phillebaba
Copy link
Contributor

phillebaba commented May 10, 2021

I get that it can be issue developing without a "real" testing environment. I am just happy that you took the time to make the changes so the least I can do is test them 😄

Added some suggested changes which would make the docs correct.

@phillebaba
Copy link
Contributor

@wiardvanrij is there anything I can help with to move this issue forward?

@wiardvanrij
Copy link
Member Author

@wiardvanrij is there anything I can help with to move this issue forward?

I'm waiting for an other PR as this includes something I really want to use. This makes everything a bit better code-wise :)
Would expect this feature to be included in new release, at least that is my intend.

@phillebaba
Copy link
Contributor

Great, in that case I will hang back :)

bwplotka pushed a commit that referenced this issue Jul 5, 2021
* Rebased feature - initial commit of azure obj store extend

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* use custom cortex to fix config change

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* modules acting up

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* remove sprint

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* adds dot

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* removed need for changes on cortex side

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* adds changelog

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* Update docs/storage.md

Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>

* Update pkg/objstore/azure/azure_test.go

Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>

* Update pkg/objstore/azure/azure_test.go

Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>

* update a few cleanups

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* fixes go.mod and go.sum?

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* fixes whitespace

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* fixes space

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* updates azure Go-autorest

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* fixes readme

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* fixes go.sum.. (╯°□°)╯︵ ┻━┻

Signed-off-by: Wiard van Rij <wiard@outlook.com>

* e2e retest

Signed-off-by: Wiard van Rij <wiard@outlook.com>

Co-authored-by: Philip Laine <philip.laine@gmail.com>
@stale
Copy link

stale bot commented Jul 17, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 17, 2021
@wiardvanrij
Copy link
Member Author

This is actually implemented.

@airkewld
Copy link

Thank you so much for the hard work on this issue @wiardvanrij

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants