Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement using Postgres instead of MariaDB for hive-metastore #982

Closed
wg102 opened this issue Apr 5, 2022 · 12 comments
Closed

Implement using Postgres instead of MariaDB for hive-metastore #982

wg102 opened this issue Apr 5, 2022 · 12 comments
Assignees
Labels
size/M 2-3 days

Comments

@wg102
Copy link
Contributor

wg102 commented Apr 5, 2022

To discuss with Cloud. See the repercussions.

@wg102 wg102 mentioned this issue Apr 5, 2022
54 tasks
@rohank07 rohank07 self-assigned this Apr 6, 2022
@rohank07 rohank07 added size/M 2-3 days size/S ~1 day and removed size/M 2-3 days labels Apr 6, 2022
@rohank07 rohank07 assigned wg102 and unassigned rohank07 Apr 20, 2022
@wg102
Copy link
Contributor Author

wg102 commented Apr 25, 2022

Cloud answer: You would have to either use PostgreSQL or MySQL managed as currently the only ATO'd. I think AAW has a preference for PostgreSQL ^^_

Therefore we will go with PostgreSQL

@wg102
Copy link
Contributor Author

wg102 commented May 6, 2022

It was chosen that since Minio will soon be remove, this task will be implemented with Azure blob storage account instead of s3.
It might therefore be a little longer. Also, this makes it not going to work

@wg102
Copy link
Contributor Author

wg102 commented May 12, 2022

Following a discussion, it was decided to use the binami image.

To test, we need the same trino steps as local development.
trino --server https://trino.example.com --insecure --access-token [access-token] --user default

Note: how to get the access-token:
kubectl get secrets (once the makefile was run)
kubectl describe secret [name of secret]
Copy the token paste it.

Note 2: to get the trino.example.com, need to add it to /etc/hosts, have the metallb stuff running in your cluster
docker network inspect -f '{{.IPAM.Config}}' kind to have the ip

@wg102
Copy link
Contributor Author

wg102 commented May 17, 2022

Currently implemented using the image from bitnami but that does not include a hive-metastore component.

@wg102 wg102 removed the size/S ~1 day label May 18, 2022
@wg102 wg102 changed the title Investigate/Implement using Postgres instead of MariaDB for hive-metastore Implement using Postgres instead of MariaDB for hive-metastore May 18, 2022
@wg102 wg102 added the size/M 2-3 days label May 18, 2022
@wg102
Copy link
Contributor Author

wg102 commented May 25, 2022

The idea I am currently following is to replace all mentions of mysql by postgresql.
But it also need to change the image that we use. I am using the bitnami image, and trying to modify it so it also has hive-metastore standalone.

Some of the sites I looked at to follow :

@wg102
Copy link
Contributor Author

wg102 commented May 25, 2022

Currently, the postgresql is there, but it does not connect with the metastore. From using k9s, I cannot see the metastore in either the svc, statefulset, pods or helm chart. All of which i expected to see an instance of hive-metastore in addition to hive-metastore-postgres

Also, when using the trino --server command, And trying to create a schema to minio.people.
"Query failed: hive-metastore-postgresql.hive-system:9083: java.net.SocketTimeoutException: Read timed out"

Despite port 9083 being associated with the service

@wg102
Copy link
Contributor Author

wg102 commented May 30, 2022

We seem to have hive-metastore now, but it doesnt seem to really start.
The docker image https://github.com/wg102/bitnami-docker-postgresql/tree/hive-post
and the code of : https://github.com/StatCan/aaw-trino/commits/feat-db-postgresql.
When doing local trino test, i get this error:

create schema if not exists minio.people with (location = 's3a://people/');
Query 20220530_171938_00001_5j2g4 failed: hive-metastore.hive-system:9083: java.net.SocketException: Connection reset

Similarly, when shelling into the hive pod and trying to curl the trino-coordinator service, :

I have no name!@hive-metastore-0:/$ curl 10.96.170.22:9083
curl: (56) Recv failure: Connection reset by peer

@wg102
Copy link
Contributor Author

wg102 commented May 30, 2022

I had to add jdk 11 aka Java to the docker image because the schematool -initSchema -dbType postgresand the startmetastore could be executed.

RUN apt-get update && \
    apt-get install -y openjdk-11-jdk ca-certificates-java && \
    apt-get clean && \
    update-ca-certificates -f
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME

Unfortunately there seems to be multiple issues with this.
When trying /opt/apache-hive-metastore-3.0.0-bin/bin$ ./schematool -initSchema -dbType postgres --verbose

19:14:47.163 [main] ERROR org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool - Failed to load driver
Failed to load driver
19:14:47.163 [main] ERROR org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool - Underlying cause: java.lang.ClassNotFoundException : org.postgresql.Driver
Underlying cause: java.lang.ClassNotFoundException : org.postgresql.Driver

Similarly but with different outcome, the other one:
I have no name!@hive-metastore-0:/opt/apache-hive-metastore-3.0.0-bin/bin$ ./start-metastore --verbose

id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
2022-05-30 19:16:43: Starting Metastore Server
id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

@wg102
Copy link
Contributor Author

wg102 commented May 31, 2022

Fixed the driver, with the correct version.
This has changed one error, but it has not fixed the problem, just changed the error.

It seems when the

schematool -initSchema -dbType postgres
start-metastore

are executed, it immediately tries to create a database, which makes sense. But it tries to do it with data i am not sure where it came. 'metastore-db' database, as well as something called Derby.

I have no name!@hive-metastore-0:/opt/apache-hive-metastore-3.0.0-bin/bin$ ./start-metastore 
id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
2022-05-31 18:36:11: Starting Metastore Server
id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger File not found in file system or classpath: metastore-log4j2.properties
ERROR StatusLogger Reconfiguration failed: No configuration found for '14dad5dc' at 'null' in 'null'

And the second one:

I have no name!@hive-metastore-0:/opt/apache-hive-metastore-3.0.0-bin/bin$ ./schematool -initSchema -dbType postgres --verbose
id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger File not found in file system or classpath: metastore-log4j2.properties
ERROR StatusLogger Reconfiguration failed: No configuration found for '14dad5dc' at 'null' in 'null'
Metastore connection URL:	 jdbc:postgresql://hive-metastore-postgresql/metastore
Metastore Connection Driver :	 org.postgresql.Driver
Metastore connection User:	 hive
Starting metastore schema initialization to 3.0.0
Initialization script hive-schema-3.0.0.postgres.sql
1/1206       --
2/1206       -- PostgreSQL database dump
3/1206       --
4/1206       
5/1206       SET statement_timeout = 0;
No rows affected (0.002 seconds)
6/1206       SET client_encoding = 'UTF8';
No rows affected (0.001 seconds)
7/1206       SET standard_conforming_strings = off;
No rows affected (0.001 seconds)
8/1206       SET check_function_bodies = false;
No rows affected (0.001 seconds)
9/1206       SET client_min_messages = warning;
No rows affected (0.001 seconds)
10/1206      SET escape_string_warning = off;
No rows affected (0.001 seconds)
11/1206      
12/1206      SET search_path = public, pg_catalog;
No rows affected (0.001 seconds)
13/1206      
14/1206      SET default_tablespace = '';
No rows affected (0.002 seconds)
15/1206      
16/1206      SET default_with_oids = false;
No rows affected (0.002 seconds)
17/1206      
18/1206      --
19/1206      -- Name: BUCKETING_COLS; Type: TABLE; Schema: public; Owner: hiveuser; Tablespace:
20/1206      --
21/1206      
22/1206      CREATE TABLE "BUCKETING_COLS" ( 
"SD_ID" bigint NOT NULL, 
"BUCKET_COL_NAME" character varying(256) DEFAULT NULL::character varying, 
"INTEGER_IDX" bigint NOT NULL 
);
Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
org.postgresql.util.PSQLException: ERROR: relation "BUCKETING_COLS" already exists
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
	at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:286)
	at sqlline.Commands.execute(Commands.java:823)
	at sqlline.Commands.sql(Commands.java:733)
	at sqlline.SqlLine.dispatch(SqlLine.java:795)
	at sqlline.SqlLine.runCommands(SqlLine.java:1706)
	at sqlline.Commands.run(Commands.java:1317)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
	at sqlline.SqlLine.dispatch(SqlLine.java:791)
	at sqlline.SqlLine.initArgs(SqlLine.java:595)
	at sqlline.SqlLine.begin(SqlLine.java:643)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.runSqlLine(MetastoreSchemaTool.java:1034)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.runSqlLine(MetastoreSchemaTool.java:1007)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.doInit(MetastoreSchemaTool.java:596)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.doInit(MetastoreSchemaTool.java:574)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.run(MetastoreSchemaTool.java:1273)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.main(MetastoreSchemaTool.java:1178)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Aborting command set because "force" is false and command failed: "CREATE TABLE "BUCKETING_COLS" ( 
"SD_ID" bigint NOT NULL, 
"BUCKET_COL_NAME" character varying(256) DEFAULT NULL::character varying, 
"INTEGER_IDX" bigint NOT NULL 
);"
Closing: org.postgresql.jdbc.PgConnection
18:45:02.144 [main] ERROR org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool - Schema initialization FAILED! Metastore state would be inconsistent !!
Schema initialization FAILED! Metastore state would be inconsistent !!
18:45:02.148 [main] ERROR org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool - Underlying cause: java.io.IOException : Schema script failed, errorcode OTHER
Underlying cause: java.io.IOException : Schema script failed, errorcode OTHER
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.doInit(MetastoreSchemaTool.java:600)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.doInit(MetastoreSchemaTool.java:574)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.run(MetastoreSchemaTool.java:1273)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.main(MetastoreSchemaTool.java:1178)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Schema script failed, errorcode OTHER
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.runSqlLine(MetastoreSchemaTool.java:1040)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.runSqlLine(MetastoreSchemaTool.java:1007)
	at org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool.doInit(MetastoreSchemaTool.java:596)
	... 9 more
18:45:02.149 [main] ERROR org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool - *** schemaTool failed ***
*** schemaTool failed ***


Which might just be because the init script was already run. instead i ran this:

I have no name!@hive-metastore-0:/opt/apache-hive-metastore-3.0.0-bin/bin$ ./schematool -upgradeSchema  -dbType postgres --verbose
id: cannot find name for user ID 1001
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-metastore-3.0.0-bin/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger File not found in file system or classpath: metastore-log4j2.properties
ERROR StatusLogger Reconfiguration failed: No configuration found for '14dad5dc' at 'null' in 'null'
No schema upgrade required from version 3.0.0
schemaTool completed

It complains about a missing file.

@wg102
Copy link
Contributor Author

wg102 commented Jun 3, 2022

On June 4th 2022, there was an update of the bitnami repo.

Error: can't get a valid version for repositories postgresql. Try changing the version constraint in Chart.yaml
make: *** [Makefile:30: helm-hive] Error 1

When using the Chart 10.2.1 it gives this error.

I tried updating for a newer version, be it 10.21.0 (which was updated yesterday) but it gives the same error.
Or 11.6.0 which complains about a missing secret key in the key.

RESOLVED: Delete the cluster (make kind-delete) and create a new one (make helm-all(

@wg102
Copy link
Contributor Author

wg102 commented Jun 8, 2022

Yesterday we managed to get the docker image built in a way that worked.
A few changes were needed in both repos.

  1. The trino repo, I needed to edit the value file to have the right structure with the new version.
postgresql:
  auth:
    postgresPassword: hive
    username: hive
    password: hive
    database: metastore
    existingSecret: ""

of course the configmap had to be updated for this change as well.

Which will need to be done in a way to not be hardcoded code.

  1. In the configmap, these properties needed to be added
<property>
     <name>metastore.task.threads.always</name>  
     <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
 </property>
 <property>
     <name>metastore.expression.proxy</name>
     <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
</property>	

@wg102
Copy link
Contributor Author

wg102 commented Jun 8, 2022

The changes now work. I will link the repo in which our current docker image for bitnami with hive-standalone can be found.
And create the PR for this to be in the trino repo. Once this done, I will create another ticket to get postgresql into DEV, which will need to talk about images. And locations.

@wg102 wg102 closed this as completed Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M 2-3 days
Projects
None yet
Development

No branches or pull requests

2 participants