Tips for handling big SQL dumps

Recently I had to work with SQL dumps to recover a database server and to update a MySQL installation. Because complete SQL dumps are too big to handle with diff and vi (with enough memory vim actually works on large files, but it is really slow), I needed more traditional tools to compare them and to extract only the parts I needed.

To extract single DB dumps from a complete dump, first find the line numbers where the individual DB dumps start. Then use sed to print only the lines from the start of the DB you are looking for up to the start of the next one. mysqldump seems to use separate passes for data, stored routines, and triggers, so one DB may occupy multiple ranges of lines in a complete dump:

fgrep -n 'Current Database: ' dump_all.sql
sed -n -e '86226,93282p' -e '123249,123254p' \
  dump_all.sql > db_test.sql
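
If you would rather not look up the line numbers by hand, an awk range check can pull out all chunks belonging to one DB by name (a sketch, reusing the db_test name from above; like the sed version it does not copy the global dump header):

awk '/^-- Current Database: /{p = /`db_test`/} p' dump_all.sql > db_test.sql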

diff loads the files it works on into memory, so it cannot be used for large files (>= 1 GB). To extract multiple DBs or to compare dumps it is useful to split the SQL dump at DB boundaries:

split -p 'Current Database: ' dump_all.sql
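
Note that -p is the pattern option of BSD split(1); GNU coreutils split does not have it, but csplit can do the same job (a sketch; it names its chunks xx00, xx01, ..., so adjust the x?? glob below accordingly):

csplit -z dump_all.sql '/^-- Current Database: /' '{*}'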

The resulting partial files can be renamed/concatenated into one file per DB (using tcsh syntax here):

foreach f ( x?? )
  set dbname=`head -1 $f \
  | sed -e 's/^-- Current Database: .\(.*\).$/\1/' \
    -e 's/^-- MySQL dump.*$/dumpheader/'`
  test $dbname && cat $f >> part-${dbname} && rm $f
end
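
For completeness, a rough POSIX sh equivalent of that loop (a sketch, assuming the same xaa, xab, ... chunk names from split):

for f in x??; do
  dbname=`head -1 $f | sed -e 's/^-- Current Database: .\(.*\).$/\1/' \
                           -e 's/^-- MySQL dump.*$/dumpheader/'`
  test -n "$dbname" && cat $f >> part-$dbname && rm $f
done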

If we did this with two SQL dumps in $dir1 and $dir2, then we can compare them chunk by chunk (assuming each DB by itself is small enough to fit into memory for diff):

foreach f ($dir1/part-*)
  diff --ignore-case --ignore-space-change \
    --ignore-blank-lines -u \
    $f $dir2/$f:t >> /tmp/diff.out
end
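
That loop only looks at DBs present in $dir1; a quick comm over the chunk names lists DBs that exist in only one of the two dumps (a sketch):

( cd $dir1 && ls part-* | sort ) > /tmp/dbs1
( cd $dir2 && ls part-* | sort ) > /tmp/dbs2
comm -3 /tmp/dbs1 /tmp/dbs2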
