@github-actions

Cherry-picked from #43445

…erg Catalog Operations. (#43445)

### What problem does this PR solve?

Support Pre-Execution Authentication for HMS Type Iceberg Catalog
Operations

### Summary
This PR introduces a new utility class, PreExecutionAuthenticator, which
performs pre-execution authentication for operations on HMS (Hive
Metastore) type Iceberg catalogs. This is especially useful in
environments where secure access is required, such as Kerberos-based
Hadoop ecosystems. With PreExecutionAuthenticator integrated, each
relevant operation goes through an authentication step before it
executes, maintaining security compliance.

### Motivation
In environments utilizing an Iceberg catalog with an HMS backend, many
operations may require authentication to access secure data or perform
privileged tasks. Given that operations on HMS-type catalogs typically
run within a Hadoop environment secured by Kerberos, ensuring each
operation is executed within an authenticated context is essential.
Previously, there was no standardized mechanism to enforce pre-execution
authentication, which led to potential security gaps. This PR aims to
address this issue by introducing an extensible authentication utility.

### Key Changes
Addition of PreExecutionAuthenticator Utility Class

- Provides a standard way to perform pre-execution authentication for
  tasks.
- Leverages HadoopAuthenticator (when available) to execute tasks
  within a privileged context using doAs.
- Supports execution with or without authentication, enabling
  flexibility for both secure and non-secure environments.

Integration with Iceberg Catalog Operations

- All relevant HMS-type catalog operations will now use
  PreExecutionAuthenticator to perform pre-execution authentication.
- Ensures that operations like createDb, dropDb, and other privileged
  tasks are executed only after authentication.

Extensible Design

- PreExecutionAuthenticator is adaptable to other future authentication
  methods, if needed, beyond Hadoop and Kerberos.
- The CallableToPrivilegedExceptionActionAdapter class allows any
  Callable task to be executed within a PrivilegedExceptionAction,
  making it versatile for various task types.
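
The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `AuthenticatorSketch` is a hypothetical stand-in for HadoopAuthenticator, and `PreExecutionAuthenticatorSketch`, `RecordingAuthenticator`, and the `execute` method name are assumptions made for illustration. Only `java.security.PrivilegedExceptionAction` and `java.util.concurrent.Callable` are real JDK types.

```java
import java.security.PrivilegedExceptionAction;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for HadoopAuthenticator: runs an action in an authenticated context. */
interface AuthenticatorSketch {
    <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception;
}

/** Adapts any Callable to a PrivilegedExceptionAction so it can be handed to doAs. */
final class CallableToPrivilegedExceptionActionAdapter<T> implements PrivilegedExceptionAction<T> {
    private final Callable<T> callable;

    CallableToPrivilegedExceptionActionAdapter(Callable<T> callable) {
        this.callable = callable;
    }

    @Override
    public T run() throws Exception {
        return callable.call();
    }
}

/** Assumed shape of the utility: authenticate first when an authenticator is configured. */
final class PreExecutionAuthenticatorSketch {
    // null models a non-secure environment with no authenticator configured
    private final AuthenticatorSketch authenticator;

    PreExecutionAuthenticatorSketch(AuthenticatorSketch authenticator) {
        this.authenticator = authenticator;
    }

    <T> T execute(Callable<T> task) throws Exception {
        if (authenticator == null) {
            return task.call(); // run directly, no authentication step
        }
        return authenticator.doAs(new CallableToPrivilegedExceptionActionAdapter<>(task));
    }
}

/** Toy authenticator that records each time a task enters its context. */
final class RecordingAuthenticator implements AuthenticatorSketch {
    final List<String> events = new ArrayList<>();

    @Override
    public <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
        events.add("authenticated");
        return action.run();
    }
}

final class PreExecutionAuthDemo {
    public static void main(String[] args) throws Exception {
        RecordingAuthenticator auth = new RecordingAuthenticator();
        PreExecutionAuthenticatorSketch secure = new PreExecutionAuthenticatorSketch(auth);
        // A catalog operation such as createDb would be wrapped in a Callable like this.
        String result = secure.execute(() -> "createDb(demo_db)");
        System.out.println(auth.events + " -> " + result);
    }
}
```

In this sketch, passing `null` as the authenticator models the non-secure case, so the same `execute` call site works in both environments. A real Kerberos setup would delegate `doAs` to the Hadoop login user instead of the toy recorder.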


### Check List (For Author)

- Test <!-- At least one of them must be included. -->

    - [x] Manual test (add detailed scripts or steps below)
```
mysql> CREATE TABLE ha
    ->        (
    ->            vendor_id BIGINT,
    ->            trip_id BIGINT,
    ->            trip_distance FLOAT,
    ->            fare_amount DOUBLE,
    ->            store_and_fwd_flag STRING,
    ->            ts DATETIME
    ->        );
Query OK, 0 rows affected (2.08 sec)

mysql> show create table ha;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                              |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ha    | CREATE TABLE `ha` (
  `vendor_id` bigint NULL,
  `trip_id` bigint NULL,
  `trip_distance` float NULL,
  `fare_amount` double NULL,
  `store_and_fwd_flag` text NULL,
  `ts` datetimev2(6) NULL
) ENGINE=ICEBERG_EXTERNAL_TABLE
LOCATION 'xxxxx'
PROPERTIES (
  "doris.version" = "doris-2.1.6-rc04-67ee7f53e6",
  "write.parquet.compression-codec" = "zstd"
);

mysql>        INSERT INTO iceberg.ck_iceberg.ha
    ->        VALUES
    ->         (1, 1000371, 1.8, 15.32, 'N', '2024-01-01 9:15:23'),
    ->         (2, 1000372, 2.5, 22.15, 'N', '2024-01-02 12:10:11'),
    ->         (2, 1000373, 0.9, 9.01, 'N', '2024-01-01 3:25:15'),
    ->         (1, 1000374, 8.4, 42.13, 'Y', '2024-01-03 7:12:33');  
Query OK, 4 rows affected (5.10 sec)
{'status':'COMMITTED', 'txnId':'35030'}

mysql> select * from ha;
+-----------+---------+---------------+-------------+--------------------+----------------------------+
| vendor_id | trip_id | trip_distance | fare_amount | store_and_fwd_flag | ts                         |
+-----------+---------+---------------+-------------+--------------------+----------------------------+
|         1 | 1000371 |           1.8 |       15.32 | N                  | 2024-01-01 09:15:23.000000 |
|         2 | 1000372 |           2.5 |       22.15 | N                  | 2024-01-02 12:10:11.000000 |
|         2 | 1000373 |           0.9 |        9.01 | N                  | 2024-01-01 03:25:15.000000 |
|         1 | 1000374 |           8.4 |       42.13 | Y                  | 2024-01-03 07:12:33.000000 |
+-----------+---------+---------------+-------------+--------------------+----------------------------+
4 rows in set (1.20 sec)
```
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

run buildall

@CalvinKirs CalvinKirs closed this Nov 18, 2024
@CalvinKirs CalvinKirs reopened this Nov 18, 2024
@CalvinKirs CalvinKirs merged commit 8da1e8c into branch-2.1 Nov 18, 2024
@CalvinKirs CalvinKirs deleted the auto-pick-43445-branch-2.1 branch November 18, 2024 06:28