
Backup issue when continuously inserting rows. #256

Closed
goelsatyam opened this issue Mar 15, 2019 · 15 comments · Fixed by #260


goelsatyam commented Mar 15, 2019

While running insert queries against the MySQL cluster, I tried to take a backup in parallel. For 2 GB of data the backup size was 43 MB, and for 200 GB of data the backup size was 33 MB. When I tried to restore the cluster from the backup file, the pod got stuck with STATUS: Init:1/2.


AMecea commented Mar 15, 2019

Hi @goelsatyam, thanks for reporting this!

Can you please give me the logs of init container by running:

kubectl logs <restore-cluster-name>-mysql-0 -c clone-mysql

Also, the logs of the xtrabackup tool from the pod the backup was taken from will be useful:

kubectl logs <original-cluster-name>-mysql-<slave index> -c sidecar

Note: make sure to strip any sensitive data from the logs if needed.

Thanks in advance!


goelsatyam commented Mar 15, 2019

Logs of the init container:

Create rclone.conf file.
Create google-credentials.json file.
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_REPLICATION_USER"}
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_REPLICATION_PASSWORD"}
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_METRICS_EXPORTER_USER"}
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_METRICS_EXPORTER_PASSWORD"}
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_ORC_TOPOLOGY_USER"}
2019-03-15T09:34:48.604Z	INFO	sidecar	environment is not set	{"key": "MYSQL_ORC_TOPOLOGY_PASSWORD"}
2019-03-15T09:34:48.604Z	INFO	sidecar	clonning command	{"host": "cluster-test-b-2-mysql-0"}
2019-03-15T09:34:48.614Z	WARN	sidecar	failed to obtain master from orchestrator, go for default master	{"master": "cluster-test-b-2-mysql-0.cluster-test-b-2-mysql-nodes.default"}
2019-03-15T09:34:48.614Z	INFO	sidecar	cloning from bucket	{"bucket": "s3:dataplatform-sql-backup/backup/cluster-test-2-2019-03-15T09:29:32.xbackup.gz"}
encryption: using gcrypt 1.7.6-beta
2019/03/15 09:34:48 DEBUG : rclone: Version "v1.46" starting with parameters ["rclone" "-vv" "--config=/etc/rclone.conf" "cat" "s3:dataplatform-sql-backup/backup/cluster-test-2-2019-03-15T09:29:32.xbackup.gz"]
2019/03/15 09:34:48 DEBUG : Using config file from "/etc/rclone.conf"
2019/03/15 09:34:49 DEBUG : pacer: Reducing sleep to 0s
xbstream: Can't create/write to file './xtrabackup_logfile' (Errcode: 17 - File exists)
xbstream: failed to create file.


AMecea commented Mar 15, 2019

I think those logs are not from the first failure; they are from a later attempt, after an earlier failure left some files behind, so they don't reflect the real issue. Maybe the operator should run a cleanup before doing a backup restore (but that is a different issue).

@goelsatyam

I think the problem is in taking the backup. When I took the backup without running any queries in parallel, the backup size was 0.35 GB for 2 GB of data and 36 GB for 200 GB of data, and in both cases I was able to successfully restore the cluster from the backup file.


AMecea commented Mar 15, 2019

That may be a problem with xtrabackup. I will try to reproduce it.

Do you still have the logs of the container the backup was taken from, so I can examine them?

@goelsatyam

Logs of the container where the backup was taken:

Create rclone.conf file.
Create google-credentials.json file.
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MY_CLUSTER_NAME"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MY_NAMESPACE"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MY_SERVICE_NAME"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "INIT_BUCKET_URI"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "ORCHESTRATOR_URI"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_REPLICATION_USER"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_REPLICATION_PASSWORD"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_METRICS_EXPORTER_USER"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_METRICS_EXPORTER_PASSWORD"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_ORC_TOPOLOGY_USER"}
2019-03-15T11:49:01.566Z INFO sidecar environment is not set {"key": "MYSQL_ORC_TOPOLOGY_PASSWORD"}
2019-03-15T11:49:01.566Z INFO sidecar take a backup {"host": "cluster-test-2-mysql-1.cluster-test-2-mysql-nodes.default", "bucket": "s3://bucket-url/cluster-test-2-2019-03-15T11:48:50.xbackup.gz"}
2019-03-15T11:49:01.566Z INFO sidecar initialize a backup {"host": "cluster-test-2-mysql-1.cluster-test-2-mysql-nodes.default", "endpoint": "/xbackup"}

AMecea added this to the 0.2.x milestone Mar 18, 2019
@goelsatyam

@AMecea The table in which I was inserting the rows didn't have a primary key. Can this cause the problem?

@goelsatyam

@AMecea I'm getting the same error when I try it with a table that has a primary key.


AMecea commented Mar 18, 2019

Do you have a script to reproduce this? I need to reproduce it so I can inspect the logs closely and tweak some parameters to find a solution. I also want to write an e2e test for this case.


goelsatyam commented Mar 19, 2019

You can use this Python script:

import time
import sys
import mysql.connector

def insert():
    # Build a batch of 100,000 rows and insert them in a single transaction.
    sql = "INSERT INTO user (firstname, lastname, address) VALUES (%s, %s, %s)"
    val = []
    for i in range(100000):
        val.append(("l_" + str(i), "f_" + str(i), "a_" + str(i)))

    mydb = mysql.connector.connect(
        host="master_cluster_IP",
        user="root",
        passwd="root_password",
        database="test_data"
    )
    mycursor = mydb.cursor()
    mycursor.executemany(sql, val)
    mydb.commit()
    mydb.close()

total_time = time.time()
for i in range(int(sys.argv[1])):
    start_time = time.time()
    insert()
    print("time taken for " + str(i * 100000) + "-" + str((i + 1) * 100000) + ": " + str(time.time() - start_time))
print("total time: " + str(time.time() - total_time))

Run the script with: python app.py 100

I ran this script from two different shells at the same time.
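
For reference, the script assumes a test_data.user table already exists. The exact schema wasn't posted, so the column types (and the optional primary key; per the comments above the error reproduces with or without one) below are assumptions. A minimal sketch to create it with the same connector:

import mysql.connector

# Connection parameters are the same placeholders used in the script above.
mydb = mysql.connector.connect(
    host="master_cluster_IP",
    user="root",
    passwd="root_password",
    database="test_data"
)
mycursor = mydb.cursor()
mycursor.execute(
    "CREATE TABLE IF NOT EXISTS user ("
    "  id INT AUTO_INCREMENT PRIMARY KEY,"   # optional; the issue reproduces without it too
    "  firstname VARCHAR(64),"
    "  lastname VARCHAR(64),"
    "  address VARCHAR(64)"
    ")"
)
mydb.close()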

calind modified the milestones: 0.2.x, 0.2.7 Mar 19, 2019

AMecea commented Mar 19, 2019

Thanks for the script; I ran it and was able to reproduce the error.

The logs that I was looking for are those:

xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 567753, but got no. 1092033 from the log file.
xtrabackup: error: it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or  log files being too small.
xtrabackup: Error: xtrabackup_copy_logfile() failed.
2019-03-19T14:27:18.199Z	ERROR	sidecar	failed waiting for xtrabackup to finish	{"error": "exit status 1"}

Because there are too many writes and the InnoDB log files are too small, the log data is overwritten before xtrabackup finishes copying it. What you can do is increase innodb_log_file_size (by default it is computed from the requested memory, which defaults to 1G).
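
To see what value the cluster is currently running with, here is a quick sketch reusing the connector and placeholder credentials from the reproduction script above (raising the value itself is done through the cluster's MySQL configuration; see the operator docs for where custom my.cnf settings go):

import mysql.connector

mydb = mysql.connector.connect(
    host="master_cluster_IP",
    user="root",
    passwd="root_password"
)
mycursor = mydb.cursor()
# innodb_log_file_size is reported in bytes
mycursor.execute("SHOW VARIABLES LIKE 'innodb_log_file_size'")
name, value = mycursor.fetchone()
print(name + " = " + str(int(value) // (1024 * 1024)) + " MB")
mydb.close()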

This is a valid bug: the backup should fail, but instead it succeeds.

AMecea added the bug label Mar 19, 2019
@goelsatyam

@AMecea Where can I see the xtrabackup logs?


AMecea commented Mar 20, 2019

The xtrabackup logs are in the sidecar container of the node the backup was taken from; you can see them by running:

kubectl logs <original-cluster-name>-mysql-0 -c sidecar

@goelsatyam

If I get an unexpectedly high number of insert requests and innodb_log_file_size is too small for the load the server is getting, then my backup will fail. Can we lock the slave pod so that we can at least take a backup of the data already present at that time?


AMecea commented Mar 21, 2019

Yes, we can do that: we can issue a STOP SLAVE SQL_THREAD query before doing a backup. This logic could live behind a feature flag on the MysqlCluster resource (like a backupMode field). But this is a different issue, so please open a new one. I will be happy to review and merge a PR that does that 😄
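
To illustrate, this is roughly what the suggested pause would look like at the SQL level, sketched with mysql.connector against the replica being backed up (host and credentials are placeholders; the operator would run the equivalent statements internally):

import mysql.connector

replica = mysql.connector.connect(
    host="replica_pod_IP",      # placeholder: the slave the backup is taken from
    user="root",
    passwd="root_password"
)
cursor = replica.cursor()

# Pause the SQL thread so no new writes are applied while the backup runs;
# the IO thread keeps pulling the relay log, so no events are lost.
cursor.execute("STOP SLAVE SQL_THREAD")
try:
    pass  # trigger / wait for the backup here
finally:
    cursor.execute("START SLAVE SQL_THREAD")
replica.close()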

PR #260 will fix the issue of corrupt backups being reported with a success status.
