Setting use_beeline by default for hive cli connection #38763

Merged (11 commits) on Apr 9, 2024

amoghrajesh
Contributor

Hive connections from Airflow are made through beeline in most cases, so the beeline checkbox should be enabled by default. This makes it harder to forget to tick it in the UI, or to leave the flag out when deploying connections through the CLI or API.

The hook code is also built around an "if beeline, do this" branch:

    def _prepare_cli_cmd(self) -> list[Any]:
        """Create the command list from available information."""
        conn = self.conn
        hive_bin = "hive"
        cmd_extra = []

        if self.use_beeline:
            hive_bin = "beeline"
            self._validate_beeline_parameters(conn)
            if self.high_availability:
                jdbc_url = f"jdbc:hive2://{conn.host}/{conn.schema}"
                self.log.info("High Availability set, setting JDBC url as %s", jdbc_url)
            else:
                jdbc_url = f"jdbc:hive2://{conn.host}:{conn.port}/{conn.schema}"
                self.log.info("High Availability not set, setting JDBC url as %s", jdbc_url)
            if conf.get("core", "security") == "kerberos":
                template = conn.extra_dejson.get("principal", "hive/_HOST@EXAMPLE.COM")
                if "_HOST" in template:
                    template = utils.replace_hostname_pattern(utils.get_components(template))
                proxy_user = self._get_proxy_user()
                if ";" in template:
                    raise RuntimeError("The principal should not contain the ';' character")
                if ";" in proxy_user:
                    raise RuntimeError("The proxy_user should not contain the ';' character")
                jdbc_url += f";principal={template};{proxy_user}"
                if self.high_availability:
                    if not jdbc_url.endswith(";"):
                        jdbc_url += ";"
                    jdbc_url += "serviceDiscoveryMode=zooKeeper;ssl=true;zooKeeperNamespace=hiveserver2"
            elif self.auth:
                jdbc_url += ";auth=" + self.auth

            jdbc_url = f'"{jdbc_url}"'

            cmd_extra += ["-u", jdbc_url]
            if conn.login:
                cmd_extra += ["-n", conn.login]
            if conn.password:
                cmd_extra += ["-p", conn.password]

        hive_params_list = self.hive_cli_params.split()

        return [hive_bin, *cmd_extra, *hive_params_list]
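To make the beeline branch easier to follow, here is a minimal sketch of how the non-Kerberos part of the method above assembles the JDBC URL and command list. It is a simplified stand-in, not the provider's code: `build_jdbc_url` and `build_beeline_cmd` are hypothetical names, the Kerberos and ZooKeeper handling is left out, and the host `hs2.example.com` is a placeholder.

```python
from typing import List, Optional


def build_jdbc_url(
    host: str,
    port: Optional[int],
    schema: str,
    high_availability: bool = False,
    auth: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the non-Kerberos branch quoted above."""
    if high_availability:
        # With high availability set, the host string is used as-is (no port).
        jdbc_url = f"jdbc:hive2://{host}/{schema}"
    else:
        jdbc_url = f"jdbc:hive2://{host}:{port}/{schema}"
    if auth:
        jdbc_url += ";auth=" + auth
    # The quoted method wraps the URL in double quotes before passing it to -u.
    return f'"{jdbc_url}"'


def build_beeline_cmd(
    jdbc_url: str,
    login: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: beeline binary, URL, then optional credentials."""
    cmd = ["beeline", "-u", jdbc_url]
    if login:
        cmd += ["-n", login]
    if password:
        cmd += ["-p", password]
    return cmd


if __name__ == "__main__":
    url = build_jdbc_url("hs2.example.com", 10000, "default")
    print(build_beeline_cmd(url, login="airflow"))
```

When `use_beeline` is off, none of this runs and the command starts with plain `hive`, which is why the default value of the flag matters for most deployments.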

Also added a unit test for the default values.
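A test of the default could look roughly like the sketch below. It does not use the provider's real hook: `resolve_use_beeline` is a hypothetical stand-in, and reading the flag from a `use_beeline` key in the connection's extras JSON is an assumption made for illustration.

```python
import json
from typing import Optional


def resolve_use_beeline(extra_json: Optional[str], default: bool = True) -> bool:
    """Hypothetical stand-in: read the flag from extras, falling back to the default."""
    extras = json.loads(extra_json) if extra_json else {}
    return bool(extras.get("use_beeline", default))


def test_use_beeline_defaults() -> None:
    # No extras at all: the default applies.
    assert resolve_use_beeline(None) is True
    # Extras present but the key is missing: the default still applies.
    assert resolve_use_beeline('{"auth": "noSasl"}') is True
    # An explicit opt-out is respected.
    assert resolve_use_beeline('{"use_beeline": false}') is False


if __name__ == "__main__":
    test_use_beeline_defaults()
```

The point of the third case is that a default of true should only change behavior for connections that never set the flag, not for those that explicitly disabled it.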



@amoghrajesh amoghrajesh requested a review from eladkal April 5, 2024 15:34
@amoghrajesh amoghrajesh requested a review from potiuk April 8, 2024 04:05
@potiuk
Member

potiuk commented Apr 8, 2024

Approved with nits.

@amoghrajesh
Contributor Author

@eladkal I fixed the spelling and spacing, so we should get a green run now.

amoghrajesh and others added 2 commits April 9, 2024 12:19
@amoghrajesh amoghrajesh merged commit 6e0ac39 into apache:main Apr 9, 2024
40 checks passed