Yesterday I had the fun opportunity of restoring roughly 912 million rows to a database. 902 million of them belonged to a single table (902,966,645 rows, to be exact).

Problem

Our current backup system uses mysqldump. It produces a 25 GB SQL dump file, which compresses to about 2.5 GB with gzip. The last time we needed to restore a backup, the dump was only about 9 GB, and the restore took several hours.
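For reference, the backup itself is nothing fancy, just mysqldump written to a file and then compressed. Something along these lines (the user name and paths here are placeholders, not our exact setup):

# dump the database to a plain SQL file, then write a compressed copy
mysqldump -u user_name -p database_name > /path/to/backup/database_dump.sql
gzip -c /path/to/backup/database_dump.sql > /path/to/backup/database_dump.sql.gz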

This time, I created the database, and from the mysql prompt I issued the following command:

\. /path/to/backup/database_dump.sql

The restore ran fine until it got about 10% of the way through the scores table, and then it started to slow down. Because the rows in our scores table were so small, each INSERT statement contained about 45,000-50,000 records, so each line of the dump held roughly 1 MB of data. At one point I also hit this error: ERROR 1153 (08S01) at line 96: Got a packet bigger than 'max_allowed_packet' bytes
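As an aside, that error just means one of those giant INSERT lines is bigger than the server's max_allowed_packet. If you hit it, the usual fix is to raise the limit on both the server and the client. A rough sketch (the 1 GB value is only an example, pick whatever fits your dump):

# bump the server-side limit to 1 GB for new connections (needs admin privileges)
mysql -u root -p -e "SET GLOBAL max_allowed_packet = 1073741824"

# and allow the restoring client to send packets that large too
mysql --max_allowed_packet=1073741824 -u user_name -p database_name < /path/to/backup/database_dump.sql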

At first it would insert a set of 50,000 rows in half a second or so. After a few million records, though, it slowed to three seconds per INSERT, and eventually to about 10 seconds per INSERT statement. This was a huge problem: I had roughly 18,000 INSERT statements, and at 10 seconds each, the restore would take 50 hours. Our website was down during this restore, since it was our primary database, so being down for over two days was not an option.
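The back-of-the-envelope math, if you want to check it:

# 18,000 INSERT statements at roughly 10 seconds each
echo $(( 18000 * 10 / 3600 )) hours    # prints: 50 hours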

While trying to diagnose the problem I noticed something. Watching the MySQL command SHOW PROCESSLIST, I saw the thread for the restore sitting in the Sleep state for 9-10 seconds, and then the query itself would execute in under 0.2 seconds. So it wasn't a problem with MySQL storing the data; it was a problem with reading the data out of such a large dump file.

So I tried the restore from the server's command line instead, with mysql -u user_name -p database_name < /path/to/backup/database_dump.sql, and got the same result: the further into the file it got, the longer MySQL took to read each query.

Solution

So, after drinking lots of coffee, I came up with an idea: why not split the SQL dump up into multiple files? I used the Linux split command like this:

cd /path/to/backup/
mkdir splits
split -l 200 database_dump.sql splits/sql_

This produced several dozen files, in order, and it took about 10 minutes. The -l option told split to break the dump up into chunks of 200 lines each, so the files were named sql_aa, sql_ab, sql_ac, all the way to sql_fg. Then I ran the following, using cat to pipe the files into mysql:

cd splits
cat sql_* | mysql -u root -p database_name
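One sanity check worth doing before kicking this off (I'll be honest, I skipped it during the outage) is making sure the pieces concatenate back into a byte-identical copy of the original dump:

# the two checksums should match if split lost nothing
md5sum /path/to/backup/database_dump.sql
cat /path/to/backup/splits/sql_* | md5sum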

The only problem with this method is that you don't see a status report for each query executed; it just runs until it hits an error, and then displays the error. If no errors occur, it simply returns you to the prompt. So to monitor progress, I would run SHOW PROCESSLIST from a mysql prompt to see how far along we were.
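If you'd rather not keep re-typing that, something like this in a second terminal does the same job (assuming your credentials are in ~/.my.cnf so the client doesn't prompt for a password on every refresh):

# re-run SHOW PROCESSLIST every 10 seconds
watch -n 10 "mysql -e 'SHOW FULL PROCESSLIST'"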

In the end, the entire database was restored. One thing to note: I didn't try just piping the original, unsplit file through cat to see if it would read the file differently than the way mysql was reading it. But the important thing is that I got the database restored in a relatively timely manner.

Hopefully, in the very near future, we will have moved to a new score system that doesn’t have almost a billion rows in it.

P.S. There are other ways to do this, such as using mydumper, which a friend suggested and which AWS also recommends.
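I haven't run it in production myself, but the general shape of a mydumper/myloader backup and restore looks roughly like this (flags from memory, so check mydumper --help on your version; the user, password, and paths are placeholders):

# dump the database in parallel into a directory of per-table files
mydumper -u user_name -p 'password' -B database_name -o /path/to/backup/mydumper_out -t 4

# restore that directory, again with several threads
myloader -u user_name -p 'password' -B database_name -d /path/to/backup/mydumper_out -t 4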