Command to migrate MariaDB database #61

Merged 54 commits on Sep 4, 2024
Commits
5a469e2
restoring maria-migrate branch
amyfromandi Jul 17, 2024
677cc4a
modified .gitignore to ignore specific files
amyfromandi Jul 17, 2024
871128b
Refactored some code
amyfromandi Jul 22, 2024
66c161a
updated utils.py
amyfromandi Jul 25, 2024
b2700bd
added liths prescript query
amyfromandi Jul 25, 2024
ce87076
Merge branch 'main' into maria-migrate
davenquinn Jul 25, 2024
b4da967
Moved entire mariadb_migration system to subdirectory
davenquinn Jul 25, 2024
28c062a
Incorporated legacy migration command into new directory
davenquinn Jul 25, 2024
00c1a02
Created basic command to run MariaDB CLI
davenquinn Jul 26, 2024
1f234f9
Remove need to connect with --net=host
davenquinn Jul 26, 2024
7f9193c
Add command that restores a database onto the MariaDB server
davenquinn Jul 26, 2024
93c7fb6
Removed overly complex logging approach
davenquinn Jul 26, 2024
225e016
find_row_variances() compares mariadb data to macrostrat data to ensur…
amyfromandi Jul 26, 2024
12f5509
Refactor - move stream reader to a utils file
davenquinn Jul 26, 2024
76b1b2b
Fix reference to decoder
davenquinn Jul 26, 2024
4c58f5e
Rearrange CLI utils
davenquinn Jul 26, 2024
8d7eccd
Moved pre- and post- scripts to external SQL files
davenquinn Jul 26, 2024
9b8ba34
Removed extra files created by IntelliJ
davenquinn Jul 26, 2024
6245e56
Set up namespaced packages correctly in intellij
davenquinn Jul 26, 2024
8161b71
Fix a bit more sql
davenquinn Jul 26, 2024
985bab4
added find_col_variance() method
amyfromandi Jul 26, 2024
884814b
Merge branch 'maria-migrate' of https://github.com/UW-Macrostrat/macr…
amyfromandi Jul 26, 2024
acf35b9
Move change assessment methods to separate file
davenquinn Jul 26, 2024
aa4c84d
Updated MariaDB dump/restore functions
davenquinn Jul 27, 2024
1262ea3
PostgreSQL migration now mostly works
davenquinn Jul 29, 2024
de6470e
Get reporting to work
davenquinn Jul 29, 2024
e239b9b
Somewhat improved column and row count functions
davenquinn Jul 29, 2024
e7650e6
Updated mariadb migration functions
davenquinn Jul 29, 2024
0ab3678
Fixed MYSQLDump command and deadlock in dump/restore
davenquinn Jul 29, 2024
1a9fbf3
added find_col_variances() function
amyfromandi Jul 29, 2024
5fc2636
Updated some formatting
davenquinn Jul 29, 2024
4fdc534
Streamlined migration scripts and renamed files
davenquinn Jul 29, 2024
795395c
Added the ability to run different steps of the migration process
davenquinn Jul 29, 2024
5064029
accommodated code for port missing in macrostrat.toml. also added ssl…
Jul 31, 2024
5152d49
updated pgloader code to use new pg_temp_engine url and creds
amyfromandi Jul 31, 2024
fb9e628
Merged maria-migrate into this branch
amyfromandi Jul 31, 2024
02d7112
Added most up-to-date find_row_variances() and find_col_variances
amyfromandi Aug 1, 2024
6f3774f
Got find_row_variances() and find_col_variances to function!
amyfromandi Aug 1, 2024
15d7a90
Created utility function we might use to run PGLoader
davenquinn Aug 1, 2024
88c0191
Updated utility functions for creating temporary database users
davenquinn Aug 2, 2024
0c6dda9
Updated function names somewhat
davenquinn Aug 2, 2024
cb633f6
fixed pgloader issue by adding mariadb_migrator as superuser
amyfromandi Aug 6, 2024
761903c
Fixed pgloader and check-data stats
amyfromandi Aug 6, 2024
22214cf
Merge pull request #74 from UW-Macrostrat/maria-migrate-cli
davenquinn Aug 6, 2024
daa13ad
repointed migration scripts to point to macrostrat.macrostrat_temp sc…
amyfromandi Aug 7, 2024
454129f
added preserve_macrostrat_data() function for the final steps in the …
amyfromandi Aug 8, 2024
1f1cf02
Merge branch 'maria-migrate-cli', remote-tracking branch 'origin' int…
amyfromandi Aug 8, 2024
fd039a5
Added code to resolve table and column variances across macrostrat an…
amyfromandi Aug 9, 2024
4c729c3
post script and preserve-macrostrat-data script is finalized!
amyfromandi Aug 13, 2024
f03941c
refactored a little bit of data
amyfromandi Aug 13, 2024
265feaf
removing unnecessary schlep scripts. index files in schlep-index.sql …
amyfromandi Aug 30, 2024
8f8ab02
Delete cli/macrostrat/cli/commands/schlep.py
amyfromandi Sep 3, 2024
78067e6
Delete cli/macrostrat/cli/commands/table_meta.py
amyfromandi Sep 3, 2024
57a2ccf
Modified output to pass data variance test with known 'issues'
amyfromandi Sep 4, 2024
Added most up-to-date find_row_variances() and find_col_variances
amyfromandi committed Aug 1, 2024
commit 02d71128c7cfeb3e3f030fe311ecff2a41fb9c5f
@@ -179,30 +179,61 @@ def success(message):


 def find_row_variances(
-    database_name_one,
-    schema_one,
-    database_name_two,
-    schema_two,
-    username,
-    password,
-    table,
+    database_name_one,
+    schema_one,
+    schema_two,
+    username,
+    password,
+    tables
 ):
     SQLALCHEMY_DATABASE_URI = (
         f"postgresql://{username}:{password}@{pg_server}/{database_name_one}"
     )
     engine = create_engine(SQLALCHEMY_DATABASE_URI)
     insp = inspect(engine)
-    with engine.connect() as conn:
-        query = text(f"SELECT * FROM {schema_one}.{table}")
-        result = conn.execute(query)
-        df = pd.DataFrame(result)
+    for table in tables:
+        # Get the actual first column name for each table
+        columns = insp.get_columns(table, schema=schema_one)
+        first_column_name = columns[0]['name']
+        query = f"""
+            SELECT m.{first_column_name}
+            FROM macrostrat.macrostrat.{table} m
+            RIGHT JOIN macrostrat.macrostrat_temp.{table} t ON m.{first_column_name} = t.{first_column_name}
+            WHERE t.{first_column_name} IS NULL;
+        """
+        result_df = pd.read_sql_query(query, engine)
+        print(f"Macrostrat rows not in Macrostrat_two rows for table {table}:")
+        print(result_df)
+    engine.dispose()
+    return
+
+def find_col_variances(
+    database_name_one,
+    schema_one,
+    schema_two,
+    username,
+    password,
+    tables
+):
     SQLALCHEMY_DATABASE_URI = (
-        f"postgresql://{username}:{password}@{pg_server}/{database_name_two}"
+        f"postgresql://{username}:{password}@{pg_server}/{database_name_one}"
     )
     engine = create_engine(SQLALCHEMY_DATABASE_URI)
-    with engine.connect() as conn:
-        query = text(f"SELECT * FROM {schema_two}.{table}")
-        result = conn.execute(query)
-        df_two = pd.DataFrame(result)
+    insp = inspect(engine)
+    for table in tables:
+        columns_one = insp.get_columns(table, schema=schema_one)
+        columns_two = insp.get_columns(table, schema=schema_two)
+
+        col_names_one = {col['name'] for col in columns_one}
+        col_names_two = {col['name'] for col in columns_two}
+
+        col_not_in_schema_two = col_names_one - col_names_two
+
+        if col_not_in_schema_two:
+            print(f"Columns that exist in {schema_one} but NOT in {schema_two} for {table}: {col_not_in_schema_two}")
+        else:
+            print(f"All columns in {schema_one} exist in {schema_two} for {table}")
+
     engine.dispose()
-    return df, df_two
+    return
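
Note on the row check in this commit: a RIGHT JOIN keeps every row of macrostrat_temp, so filtering on the temp table's own key being NULL will normally return nothing. If the intent matches the printed message (rows in macrostrat that are missing from macrostrat_temp), the usual pattern is a LEFT JOIN from the source table with the NULL test on the target side. The sketch below illustrates that anti-join and the column set difference side by side. It is not the code from this PR: it uses Python's built-in sqlite3 instead of PostgreSQL and SQLAlchemy so it runs anywhere, and the helper names (rows_missing_from, columns_missing_from) and the liths / liths_temp tables with their columns are hypothetical.

```python
import sqlite3


def rows_missing_from(conn, source, target, key):
    """Return key values present in `source` but absent from `target`.

    Anti-join: LEFT JOIN keeps every source row, and the NULL test on the
    target key selects the rows that found no match.
    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    query = f"""
        SELECT s.{key}
        FROM {source} s
        LEFT JOIN {target} t ON s.{key} = t.{key}
        WHERE t.{key} IS NULL
        ORDER BY s.{key}
    """
    return [row[0] for row in conn.execute(query)]


def columns_missing_from(conn, source, target):
    """Return column names defined on `source` but not on `target`."""

    def names(table):
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

    return names(source) - names(target)


# Hypothetical sample data standing in for the two schemas.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE liths (id INTEGER PRIMARY KEY, lith TEXT, lith_color TEXT);
    CREATE TABLE liths_temp (id INTEGER PRIMARY KEY, lith TEXT);
    INSERT INTO liths VALUES
        (1, 'sandstone', '#ff0'), (2, 'shale', '#888'), (3, 'basalt', '#333');
    INSERT INTO liths_temp VALUES (1, 'sandstone'), (3, 'basalt');
    """
)

missing_rows = rows_missing_from(conn, "liths", "liths_temp", "id")
missing_cols = columns_missing_from(conn, "liths", "liths_temp")
print(missing_rows)  # [2]
print(missing_cols)  # {'lith_color'}
```

Returning the differences instead of printing them would also let a caller such as a data-variance check assert on the result.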