
[BUG] Getting the client-id and client-secret from stdout #37

Closed
AlexMercedCoder opened this issue Jul 31, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@AlexMercedCoder

A couple of issues I hit running through the quickstart, which may just be documentation issues.

  1. I deployed using Docker Compose

Looking at the stdout output I did not find what the docs asked me to look for:

**Bootstrapped with credentials: {"client-id": "XXXX", "client-secret": "YYYY"}**

But I did find this which I assumed was what I was looking for:

  realm: default-realm root principal credentials: fa44645a04410a0e:f1b82a42de2295da466682d3cfdbb0f1

I assume the format in this case was id:secret
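Assuming that id:secret reading is right (the rest of this thread suggests it is), the pair can be split out of the log line with plain shell parameter expansion — a sketch using the example line above:

```shell
# Sketch, assuming the id:secret reading is correct: split the credentials
# out of the bootstrap log line with shell parameter expansion.
line='realm: default-realm root principal credentials: fa44645a04410a0e:f1b82a42de2295da466682d3cfdbb0f1'
creds="${line##* }"           # last whitespace-separated field
CLIENT_ID="${creds%%:*}"      # part before the colon
CLIENT_SECRET="${creds#*:}"   # part after the colon
echo "$CLIENT_ID"
echo "$CLIENT_SECRET"
```

In real use you would feed in the line from `docker compose logs` rather than a hard-coded string.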

So I then ran the CLI command:

./polaris \
  --client-id fa44645a04410a0e \
  --client-secret f1b82a42de2295da466682d3cfdbb0f1 \
  catalogs \
  create \
  --storage-type s3 \
  --default-base-location s3://some-example-bucket/folder/ \
  --role-arn arn:aws:iam::###########:role/polaris-storage \
  first-polaris-catalog

I ran into a few issues:

  1. The Polaris CLI startup script still needed Pydantic and dateutil to be installed, so I had to install those manually (probably via Poetry). I do see them in pyproject.toml, so I'm assuming this is more of a "my system" problem; I installed them in the virtual environment that was created and everything seems to work fine now.

  2. I ran the command above and got a JSON parsing error

Traceback (most recent call last):
  File "/home/alexmerced/development/developmentwork/dremio/repos/polaris/regtests/client/python/cli/polaris_cli.py", line 73, in <module>
    PolarisCli.execute()
  File "/home/alexmerced/development/developmentwork/dremio/repos/polaris/regtests/client/python/cli/polaris_cli.py", line 45, in execute
    error = json.loads(e.body)['error']
            ^^^^^^^^^^^^^^^^^^
  File "/home/alexmerced/.pyenv/versions/3.11.2/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexmerced/.pyenv/versions/3.11.2/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexmerced/.pyenv/versions/3.11.2/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  3. This wasn't super helpful, so I modified polaris_cli.py to get more detail:
from cli.options.parser import Parser
from polaris.management import ApiClient, Configuration, ApiException
from polaris.management import PolarisDefaultApi

class PolarisCli:
    """
    Implements a basic Command-Line Interface (CLI) for interacting with a Polaris service. The CLI can be used to
    manage entities like catalogs, principals, and grants within Polaris and can perform most operations that are
    available in the Python client API.

    Example usage:
    * ./polaris --client-id ${id} --client-secret ${secret} --host ${hostname} principals create example_user
    * ./polaris --client-id ${id} --client-secret ${secret} --host ${hostname} principal-roles create example_role
    * ./polaris --client-id ${id} --client-secret ${secret} --host ${hostname} catalog-roles list
    """

    @staticmethod
    def execute():
        options = Parser.parse()
        client_builder = PolarisCli._get_client_builder(options)
        with client_builder() as api_client:
            try:
                from cli.command import Command
                admin_api = PolarisDefaultApi(api_client)
                command = Command.from_options(options)
                command.execute(admin_api)
            except ApiException as e:
                import json
                print(f'Exception when communicating with the Polaris server. Status code: {e.status}')
                print(f'Response headers: {e.headers}')
                print(f'Response body: {e.body}')
                try:
                    error = json.loads(e.body)['error']
                    print(f'{error["type"]}: {error["message"]}')
                except (json.JSONDecodeError, KeyError):
                    print('Error parsing response as JSON. Raw response body:')
                    print(e.body)

    @staticmethod
    def _get_client_builder(options):
        # Validate
        has_access_token = options.access_token is not None
        has_client_secret = options.client_id is not None and options.client_secret is not None
        if has_access_token and has_client_secret:
            raise Exception("Please provide credentials via either --client-id / --client-secret or "
                            "--access-token, but not both")

        # Authenticate accordingly
        polaris_catalog_url = f'http://{options.host}:{options.port}/api/management/v1'
        if has_access_token:
            return lambda: ApiClient(
                Configuration(host=polaris_catalog_url, access_token=options.access_token),
            )
        elif has_client_secret:
            return lambda: ApiClient(
                Configuration(host=polaris_catalog_url, username=options.client_id, password=options.client_secret),
            )
        else:
            raise Exception("Please provide credentials via --client-id & --client-secret or via --access-token")


if __name__ == '__main__':
    PolarisCli.execute()
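The fallback path added in the `except` block above can be exercised in isolation. A minimal sketch — `describe_error` is a hypothetical helper extracted here for illustration, not part of the Polaris CLI — showing both a well-formed JSON error payload and the kind of plain-text body the server actually returned:

```python
import json

def describe_error(body):
    # Mirrors the fallback above: try to parse a JSON error payload,
    # otherwise fall back to printing the raw response body.
    try:
        error = json.loads(body)['error']
        return f'{error["type"]}: {error["message"]}'
    except (json.JSONDecodeError, KeyError, TypeError):
        return f'Error parsing response as JSON. Raw response body:\n{body}'

json_case = describe_error('{"error": {"type": "NotAuthorizedException", "message": "Credentials required"}}')
text_case = describe_error('Credentials are required to access this resource.')
print(json_case)
print(text_case)
```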
  4. Running this version of polaris_cli.py, I got the following output:
Exception when communicating with the Polaris server. Status code: 401
Response headers: HTTPHeaderDict({'Date': 'Wed, 31 Jul 2024 15:16:14 GMT', 'Vary': 'Origin', 'WWW-Authenticate': 'Bearer realm="realm"', 'Content-Type': 'text/plain', 'Content-Length': '49'})
Response body: Credentials are required to access this resource.
Error parsing response as JSON. Raw response body:
Credentials are required to access this resource.

So I'm assuming the client-id and client-secret aren't what I thought they were, or I passed them incorrectly; any guidance on this would be helpful.
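For reference, here is a sketch of what an HTTP Basic Authorization header built from these credentials would look like (example id/secret taken from the bootstrap log above). The CLI's `Configuration(username=..., password=...)` path would be expected to produce a header like this; the 401 suggests no such header made it onto the request:

```python
import base64

# Example values from the bootstrap log earlier in this thread.
client_id = "fa44645a04410a0e"
client_secret = "f1b82a42de2295da466682d3cfdbb0f1"

# Basic auth is base64("id:secret") prefixed with "Basic ".
token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
auth_header = f"Basic {token}"
print(auth_header)
```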

@AlexMercedCoder AlexMercedCoder added the bug Something isn't working label Jul 31, 2024
kevinjqliu pushed a commit to kevinjqliu/polaris-catalog that referenced this issue Jul 31, 2024
…pache#46)

* Squashed commit of the following:

Co-authored-by: Evgeny Zubatov <evgeny.zubatov@snowflake.com>
Co-authored-by: Michael Collado <collado.mike@gmail.com>
Co-authored-by: Shannon Chen <shannon.chen@snowflake.com>
Co-authored-by: Eric Maynard <eric.maynard@snowflake.com>
Co-authored-by: Alvin Chen <alvin.chen@snowflake.com>

commit de0b4ee768a62221a480dce7da935a27a206d076
Merge: 1c19fc8 85e69a3
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 16:36:25 2024 -0700

    Merge commit '3e6e01aae203356ed972502dfa596d04ec5a8ca5' into mcollado-merge-oss

commit 1c19fc877231e34d5e8baa4a05902d13f6120050
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 16:25:05 2024 -0700

    Merge polaris-dev OSS contributions

commit a3fbf4ce4bc6c629bef308349b7c7a64c8335ac9
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 15:43:23 2024 -0700

    Fix token refresh in oauth service to work with client credentials (apache#37)

    The Iceberg REST client _does_ retry refreshing the auth token with
    client credentials, but it submits them in Basic Auth form rather than
    as form parameters. We need to base64/url-decode them in order to
    validate the credentials are correct. We also need to return an accepted
    tokenType during refresh.

    Tested with
    ```java
            RESTSessionCatalog sessionCatalog =
                    new RESTSessionCatalog(config -> HTTPClient.builder(config).uri(config.get(CatalogProperties.URI)).build(), null);
            sessionCatalog.initialize("demo", Map.of(
                    "uri", "http://localhost:8181/api/catalog",
                    "prefix", "catalog",
                    "credential", "$URLENCODED_CLIENTID:$URLENCODED_CLIENTSECRET",
                    "scope", "PRINCIPAL_ROLE:ALL",
                    "token-refresh-enabled", "true",
                    "warehouse", "martins_demo1"
            ));
            Field catalogAuth = RESTSessionCatalog.class.getDeclaredField("catalogAuth");
            catalogAuth.setAccessible(true);
            OAuth2Util.AuthSession authSession = (OAuth2Util.AuthSession) catalogAuth.get(sessionCatalog);

            Field client = RESTSessionCatalog.class.getDeclaredField("client");
            client.setAccessible(true);
            RESTClient restClient = (RESTClient) client.get(sessionCatalog);
            for (int i = 0; i < 10; i++) {
                System.out.println(authSession.refresh(restClient));
                Thread.sleep(10000);
            }
    ```

commit 517cb6231d424fac59ceecb1845bdb0a3e065265
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 10:47:32 2024 -0700

    Changed reg test docker image to stop exposing aws credentials as env variables (apache#36)

    In the reg tests, when S3 credentials aren't present in the FileIO, the
    S3 client is falling back to the credentials in the environment
    variables, which have access to everything. This caused a previous bug
    to go uncaught.

    I verified that if I don't update the FileIO for a table, these tests
    fail now.

commit e418aefe8964c7c67b509f8eec43055f1c17a742
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 08:40:58 2024 -0700

    Fix error with catalog FileIO credentials and path construction (apache#35)

    Namespace directories were being constructed backwards. Unsure why the
    tests didn't catch this

    FileIO for table creation and update was also not updating credentials
    correctly due to several `fileIO` variables in scope and updating the
    wrong one. I've renamed the variables to be more clear what each fileIO
    is scoped to.

    Future change upcoming to improve reg tests to catch issues - fileIO is
    falling back to environment variables in the docker image

commit 7f37bb252cec7df07db3994bf981efe76a52639c
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 02:05:14 2024 -0700

    Fix validation to read table metadata file after fileio initialization with credentials (apache#34)

    TableMetadataParser was reading metadata file before FileIO was
    initialized with credentials. This was uncaught in the tests because the
    FileIO reads the test image's environment variables.

commit 0866f3ad54295a6d7822b9645f59996986d23acd
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Sun Jul 28 22:10:22 2024 -0700

    Fixed issue when creating table under namespace with custom location (apache#33)

    Tables were still being created with the default directory structure
    when their parent namespace had a custom location. This fixes the issue
    and adds a test proving the table is successfully created and that its
    location is under the custom namespace directory

commit ee701ff99120b948b5ed3120461a9aaf0842f944
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Sun Jul 28 20:53:52 2024 -0700

    Disallow overlapping base locations for table and namespaces and prevent table location from escaping via metadata file (apache#32)

    Two major changes included here:
    * Disables table locations and namespace locations from overlapping.
    Since namespaces can't overlap and tables can't overlap sibling
    namespaces or sibling tables, this prevents all table locations from
    overlapping within a catalog
    * Prevents metadata files from pointing at table locations outside of
    the table's base location

    Both of these features are controllable by feature flags. Because of
    existing unit and integration tests (notably the ones we import from
    Iceberg), I also made the TableMetadata location check and the namespace
    directory checking configurable at the catalog level. However, the
    overlap checking within a namespace is not configurable at the catalog
    level (this means there's a gap so that if a catalog allows metadata
    files to point to locations outside of the namespace directory, a
    table's location could overlap with a table in another directory).

    It is possible for a table or a namespace to _set_ its base-directory to
    something other than the default. However, tables and namespaces with
    overridden base locations still cannot overlap with sibling tables or
    namespaces.

commit 51ac320f60c103c2b10cd8d2910217010f38afdd
Author: Shannon Chen <shannon.chen@snowflake.com>
Date:   Sun Jul 28 23:13:59 2024 +0000

    loadTable is currently scoped to the table location; this PR makes:
    1. loadTable scoped only to table location + `metadata/` and `data/`.
    2. refreshTable scoped only to table location + `metadata/`.
    3. A user error thrown when the user specifies `write.metadata.path` or
    `write.data.path`.

commit 838ba65849e7f4f0dd5dfcac0093262a165e52e4
Author: Eric Maynard <eric.maynard@snowflake.com>
Date:   Fri Jul 26 22:04:41 2024 -0700

    CREATE_TABLE shouldn't return credentials to read the table if the user doesn't have that privilege (apache#29)

    Before CREATE_TABLE returns credentials that can be used to read a
    table, Polaris should validate that the user has TABLE_READ_DATA on that
    table.

commit 96257f4fa54372fb565954024cbfe256de5d6f20
Author: Alvin Chen <alvin.chen@snowflake.com>
Date:   Fri Jul 26 15:43:35 2024 -0700

    Call metric init on IcebergRestConfigurationApi and IcebergRestOAuth2Api class (apache#30)

commit d6e057778811f20a462ad2746722e1a1427197cf
Author: Alvin Chen <alvin.chen@snowflake.com>
Date:   Fri Jul 26 10:22:34 2024 -0700

    Switch Metrics Emission from Ad-Hoc to Initialization-Based (apache#28)

    <!-- Please describe your change here and remove this comment -->
    Metrics are currently emitted in an ad-hoc fashion, meaning they will
    only be queriable in Prometheus if the corresponding API endpoint is
    invoked. This makes plotting difficult in Grafana, especially in the
    case where we need to aggregate across multiple metrics to reflect, for
    instance, the overall error rate across all endpoints in the
    application.
    Say we have metrics a, b, c. If b is null since the corresponding API
    has not yet been invoked, a+b+c would result in the value null, instead
    of the sum of a and c. This can be fixed in Grafana by "filling in" the
    metrics with a metric that is guaranteed to be present, say `up`. The
    promql query will then become:
    `(a or (up * 0)) + (b or (up * 0)) + (c or (up * 0))`
    However, the query becomes extremely large and slow. Thus to avoid this,
    we'll make sure the metrics are always emitted regardless of the
    corresponding API being called.

    We also add a counter metric to each endpoint to track the total number
    of invocations. Previously we had timers, which have a `count` attribute;
    however, that count is only incremented on success (since we only record
    the timer on success), so it did not incorporate failures.

commit 20bf59b9b5e8dba9a35b932f036114b85375829b
Author: Eric Maynard <eric.maynard@snowflake.com>
Date:   Fri Jul 26 10:09:50 2024 -0700

    When a catalog is created or updated, we should check that the catalog
    does not have a location which overlaps with the location of an existing
    catalog.

* Squashed commit of the following:

commit b65dbc68c43e7c3ff0e1901e516c9749fda58ced
Author: Michael Collado <michael.collado@snowflake.com>
Date:   Mon Jul 29 17:24:10 2024 -0700

    Fix Gradle formatting error in merge

* Update aws config to check for blank instead of null to address reg tests when aws keys are not available
kevinjqliu pushed a commit to kevinjqliu/polaris-catalog that referenced this issue Jul 31, 2024
... to avoid repeating dependency version.

Also update all strings in `.gradle` files to use double quotes instead of single quotes and use bracketed syntax.

A follow-up's going to introduce Gradle version catalogs.
@AlexMercedCoder
Author

Upon deeper exploration of the code, I am correct that the values I found are the client-id and client-secret. But when I started printing the headers/body/params of the request in the CLI, it seemed the client_id and client_secret values weren't being passed anywhere in the request. When I printed options.client_id and options.client_secret in polaris_cli.py, they did log the right values; they just don't seem to be making it into the API call when I log the request details from rest.py.

Still working on it, if I can figure out the fix I'll make a PR.

@AlexMercedCoder
Author

Ok, so in rest.py I've logged:

  • params
  • headers
  • body
  • the request object being sent

Here is what I get:
##Params
{}
##Header
{'Content-Type': 'application/json', 'User-Agent': 'OpenAPI-Generator/1.0.0/python'}
## Body
{"catalog": {"type": "INTERNAL", "name": "first-polaris-catalog", "properties": {"default-base-location": "s3://somebucket/folder/"}, "storageConfigInfo": {"storageType": "S3", "roleArn": "arn:aws:iam::xxxxxxxxxx:role/polaris-storage"}}}
##Request Object
r1 {'headers': HTTPHeaderDict({'Date': 'Wed, 31 Jul 2024 19:26:42 GMT', 'Vary': 'Origin', 'WWW-Authenticate': 'Bearer realm="realm"', 'Content-Type': 'text/plain', 'Content-Length': '49'}), 'status': 401, 'version': 11, 'reason': 'Unauthorized', 'strict': 0, 'decode_content': True, 'retries': Retry(total=3, connect=None, read=None, redirect=None, status=None), 'enforce_content_length': False, 'auto_close': True, '_decoder': None, '_body': None, '_fp': <http.client.HTTPResponse object at 0x75ec46e41840>, '_original_response': <http.client.HTTPResponse object at 0x75ec46e41840>, '_fp_bytes_read': 0, 'msg': None, '_request_url': 'http://localhost:8181/api/management/v1/catalogs', '_pool': <urllib3.connectionpool.HTTPConnectionPool object at 0x75ec46e785d0>, '_connection': <urllib3.connection.HTTPConnection object at 0x75ec47525750>, 'chunked': False, 'chunk_left': None, 'length_remaining': 49}

The r1 was a print I added to identify which request block in rest.py this was coming from, which means it ran through this:

# no content type provided or payload is json
content_type = headers.get('Content-Type')
if (
    not content_type
    or re.search('json', content_type, re.IGNORECASE)
):
    request_body = None
    if body is not None:
        request_body = json.dumps(body)
        print(request_body)
    r = self.pool_manager.request(
        method,
        url,
        body=request_body,
        timeout=timeout,
        headers=headers,
        preload_content=False
    )
    print("r1", r.__dict__)

Weird that the body was labeled as None, because when I logged request_body I got the right value and it's clearly assigned to body, but when I log the request, the body is not there.
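One likely explanation for `'_body': None` in that dump: the request is made with `preload_content=False`, so what's printed is the response object before its body has been read — `_body` there is the unread *response* body, not the request body (note the dump also shows `'length_remaining': 49`, matching the 49-byte 401 message). A stdlib sketch of the same lazy-read behavior, using `http.client` against a throwaway local server (all names here are illustrative, not Polaris code):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakePolaris(BaseHTTPRequestHandler):
    # Hypothetical stand-in server: always answers 401 like the real one did.
    def do_POST(self):
        body = b"Credentials are required to access this resource."
        self.send_response(401)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output clean

server = HTTPServer(("127.0.0.1", 0), FakePolaris)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/api/management/v1/catalogs", body=b"{}")
resp = conn.getresponse()
status = resp.status          # status/headers are available immediately
text = resp.read().decode()   # the body only materializes once it is read
server.shutdown()
print(status)
print(text)
```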

@cgpoh

cgpoh commented Aug 1, 2024

Hi @AlexMercedCoder, I had the same error when trying to create a catalog using the CLI, and following the README didn't help for me either. After digging further, I was able to create my catalog using the following curl command:

curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs \
  -d '{"name": "polaris", "type": "INTERNAL", "properties": {
        "default-base-location": "abfss://mycontainer@storageaccount.dfs.core.windows.net/test/"
    },"storageConfigInfo": {
        "tenantId": "long-tenant-id",
        "storageType": "AZURE",
        "allowedLocations": [
            "abfss://mycontainer@storageaccount.dfs.core.windows.net/test/"
        ]
    } }'

@AlexMercedCoder
Author

@cgpoh is the principal token just id:secret, or do I have to make another curl request for the token? I was working on trying to get the token endpoint to return me a token and hadn't successfully done that yet.

But this is super helpful, thank you!

@cgpoh

cgpoh commented Aug 1, 2024

@AlexMercedCoder, I have to make a curl request to get the token:

curl -i -X POST \
  http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=client_id_obtained_when_boot_up&client_secret=client_secret_obtained_when_boot_up&scope=PRINCIPAL_ROLE:ALL'

After getting the token:

{
    "access_token": "principal:root;password:af88b69c4f1215a1e62202ceb45f5a92;realm:default-realm;role:ALL",
    "scope": "PRINCIPAL_ROLE:ALL",
    "token_type": "bearer",
    "expires_in": 3600
}

I export the access_token:

export PRINCIPAL_TOKEN="principal:root;password:af88b69c4f1215a1e62202ceb45f5a92;realm:default-realm;role:ALL"
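The two steps above can be combined so the access_token never has to be copied by hand. A sketch against the sample JSON response from this thread (in real use you would pipe the output of the /oauth/tokens curl call in directly; the password value below is just the example from above):

```shell
# Extract access_token from the token response without hand-copying.
# The JSON literal here stands in for the output of the curl call above.
response='{"access_token": "principal:root;password:af88b69c4f1215a1e62202ceb45f5a92;realm:default-realm;role:ALL", "scope": "PRINCIPAL_ROLE:ALL", "token_type": "bearer", "expires_in": 3600}'
PRINCIPAL_TOKEN=$(printf '%s' "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["access_token"])')
export PRINCIPAL_TOKEN
echo "$PRINCIPAL_TOKEN"
```

`jq -r .access_token` works just as well if jq is installed; stdlib python3 is used here to avoid the extra dependency.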

@eric-maynard
Contributor

I think many of these issues should be fixed by:
#30

@AlexMercedCoder
Author

Just in case anyone wants to go through the steps I took to get it all going, here is a write-up:
https://www.dremio.com/blog/getting-hands-on-with-polaris-oss-apache-iceberg-and-apache-spark/

@collado-mike
Contributor

@AlexMercedCoder can you confirm the linked PR did solve your issue? Can we close this?

@collado-mike
Contributor

I'll close for now. Feel free to reopen if this is still an issue
