Latest revision as of 15:18, 26 February 2022

Backup

Backup is an important business. There are five crucial issues to address:

  1. Size
  2. Speed
  3. Quality of data
  4. Reliability of support
  5. Redundancy and synchronization

First, some definitions:

Size

How much data you need to back up. In the digital world, this is measured in bits or, more practically, in larger units (most likely MiB or GiB). A backup for us takes some fraction of a TiB.
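
As a rough illustration, the size of a directory tree can be measured with du. This is only a sketch: the function names tree_size_kib and kib_to_gib are hypothetical helpers introduced here, not part of any tool.

```shell
# Disk usage of a directory tree in KiB (du -k is POSIX).
tree_size_kib() {
    du -sk -- "$1" | awk '{ print $1 }'
}

# Convert a KiB count to GiB, with two decimals.
kib_to_gib() {
    LC_ALL=C awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib / (1024 * 1024) }'
}
```

For instance, kib_to_gib "$(tree_size_kib /home/laussy)" would report the size of the home directory used in the Techniques section.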

Speed

How much time it takes to copy the whole backup. If your data sits on some support that takes 2 years to transfer, that's tantamount to having lost it.
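
To get a feeling for the orders of magnitude, the copy time can be estimated from the size and a sustained transfer rate. The sketch below assumes the rate is known and constant, which is rarely quite true in practice; transfer_hours is a hypothetical helper name and the figures in the example are made up.

```shell
# Estimated hours to copy SIZE_GIB gibibytes at a sustained RATE_MIB_S MiB/s.
# usage: transfer_hours SIZE_GIB RATE_MIB_S
transfer_hours() {
    LC_ALL=C awk -v gib="$1" -v rate="$2" \
        'BEGIN { printf "%.2f\n", gib * 1024 / rate / 3600 }'
}
```

With these illustrative numbers, transfer_hours 500 100 gives 1.42, that is, about an hour and a half for 500 GiB at 100 MiB/s.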

Quality of data

It doesn't matter that your data is backed up somewhere if you don't know where it is, or that it even exists. The data should be well organized and classified.

Reliability of support

The data is stored on a physical support. This can get lost, break or malfunction. Even if it's in the cloud, and ensuring the integrity of the support is not your business, that still translates into similar problems: price, continuity of service, confidentiality, etc.

We mainly use hard drives to back up data, with the stuff we currently work on also being on Dropbox.

Redundancy and synchronization

To overcome the fact that any support will eventually break down, one has to make several copies of the backup. This also requires some effort to make sure that the copies agree with each other and to monitor which ones may have become faulty.
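
One possible way to check that copies agree is to record checksums once and verify each copy against them later. The following is a minimal sketch, assuming GNU coreutils and findutils (sha256sum, sort -z, xargs -0 -r); make_manifest and verify_copy are hypothetical names, and this is not necessarily how our own backups are checked.

```shell
# Write a SHA-256 manifest of every regular file under DIR.
# Keep MANIFEST outside DIR so that it does not list itself.
# usage: make_manifest DIR MANIFEST
make_manifest() {
    ( cd -- "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Check the copy in DIR against MANIFEST (read on standard input).
# Exits non-zero if any file is missing or differs; failures are listed.
# usage: verify_copy DIR MANIFEST
verify_copy() {
    ( cd -- "$1" && sha256sum --quiet -c - ) < "$2"
}
```

A copy that fails verify_copy while the others pass is the likely faulty one and can be rebuilt from a good copy.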

Techniques

This works well:

rsync -axHAWXS --numeric-ids --info=progress2 /home/laussy /media/laussy/backups/x--date-hostname
  1. -a  : archive mode: recurse and preserve permissions, times, owner, group, symlinks, etc.
  2. -v  : verbose, lists the files being transferred (not used in the command above)
  3. -x  : stay on one file system
  4. -H  : preserve hard links (not included with -a)
  5. -A  : preserve ACLs (not included with -a)
  6. -X  : preserve extended attributes (not included with -a)
  7. -W  : copy whole files, without the delta-transfer algorithm
  8. -S  : handle sparse files efficiently
  9. --info=progress2 instead of --progress is useful for large transfers, as it gives overall progress instead of (millions of) lines for individual files.
  10. --numeric-ids : transfer numeric uid/gid values instead of mapping them by user/group name
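
It can be convenient to wrap the command so that the same options are always used and a run can be previewed first. This is only a sketch: run_backup and the RSYNC override variable are hypothetical names introduced here, while --dry-run is a standard rsync option that shows what would be transferred without copying anything.

```shell
# Run rsync with the options discussed above.
# Extra rsync options (e.g. --dry-run) may follow SRC and DEST.
# RSYNC can be set to use another rsync binary (hypothetical convenience).
# usage: run_backup SRC DEST [extra rsync options...]
run_backup() {
    src=$1
    dest=$2
    shift 2
    "${RSYNC:-rsync}" -axHAWXS --numeric-ids --info=progress2 "$@" "$src" "$dest"
}
```

For example, run_backup /home/laussy /media/laussy/backups/x--date-hostname --dry-run previews the transfer; dropping --dry-run performs it.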