Rescuing files: the last of them

A friend encountered a silly bug which wiped an entire folder of data (which bug?). Said folder was in fact his home directory, so he basically lost everything. He had a backup of his many years of accumulated data, but he still lost what is often the most important part: everything created since the last backup. In his case, that was roughly two months of work.

PhotoRec can recover files (not only photos). The problem in his case is that it would recover not just the last two months of un-backed-up data but years of it, with no names or timestamps, only the extensions. It would be impossible to go through hundreds of thousands of files to find the missing two months.

Here we provide a simple solution to this problem: checksum all files from the backup and all files from the recovery, and keep only those of the latter set whose checksums are not in the former. Then go through them individually to do a manual recovery (restoring their names and locations in the filesystem).

First, compile a list of checksums of the files you already have safely backed up:

find directory-to-backup -type f -exec md5sum {} + > cksum-backup.txt

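Then do the same for the recovered files, this time running the command from the root of the recovered files (so "." is the starting point of the search). The output file is excluded from the search, since it is created inside the directory being scanned:

find . -type f ! -name cksum-recovered.txt -exec md5sum {} + > cksum-recovered.txt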

Run the following Mathematica code on the two files you have generated:

This creates a list, called "new.txt", of the recovered files whose checksums are not found in your backup. These are the candidates for the work done since the last backup.
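
The same comparison can also be sketched directly in the shell. This is not the Mathematica code referred to above, only a minimal stand-in built on awk. It assumes both checksum files use md5sum's default format (a 32-character hash, two separator characters, then the path) and that no file name contains a backslash or a newline. The function name new_files is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the paths of recovered files whose MD5 hash does not
# appear anywhere in the backup checksum list.
# Assumption: each line is "<32 hex chars><2 separator chars><path>".
new_files() {
    local backup_list=$1 recovered_list=$2
    awk '
        # Lines from the first file: remember every hash in the backup.
        FILENAME == ARGV[1] { seen[substr($0, 1, 32)] = 1; next }
        # Lines from the second file: print the path if the hash is unknown.
        !(substr($0, 1, 32) in seen) { print substr($0, 35) }
    ' "$backup_list" "$recovered_list"
}

# Usage, from the root of the recovered files:
#   new_files cksum-backup.txt cksum-recovered.txt > new.txt
```

Comparing hashes rather than names is what makes this work here, since the recovered files have lost their original names.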

Create a directory, e.g., NEW, in which to gather the new files (still inside the recovered files directory), then run:

while IFS= read -r line ; do cp -- "$line" NEW/ ; done < new.txt
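
Copying everything into one flat directory would silently overwrite a file if two entries in new.txt happened to share the same base name in different subdirectories. Whether that can occur depends on how the recovery tool names its output, so the following is only a precautionary sketch: it prefixes each copy with a running counter. The function name copy_listed and the six-digit prefix are choices made for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy every file listed (one path per line) into a destination
# directory, prefixing each copy with a counter so that identical base
# names from different subdirectories cannot overwrite one another.
copy_listed() {
    local list=$1 dest=$2 n=0 path
    mkdir -p -- "$dest"
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -f "$path" ] || continue   # skip entries that are not regular files
        n=$((n + 1))
        cp -- "$path" "$dest/$(printf '%06d' "$n")_$(basename -- "$path")"
    done < "$list"
}

# Usage, from the root of the recovered files:
#   copy_listed new.txt NEW
```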

What remains is to inspect the files in NEW manually. Hopefully there should now be a fairly manageable number of them.