Backups: What matters and how to do it right
Backup types and strategies, storage solutions, and the requirements analysis you need to do
Let's be honest: Everybody hates this topic. It's not sexy. You just want a solution that works and that you can trust. You might not even have created backups yet, and I mean real backups, not randomly stored copies.
I had the weirdest kinds of backups in the past. I remember that I was super nervous about my bachelor's thesis. I stored it in a private repository on GitHub and sent snapshots to my father... just to be sure I wouldn't lose it. You know, if the computer breaks.
In this article, you will learn where you can store your backups and how you can create periodic backups automatically. Let's start!
Requirement Analysis
Before we dive into solutions, we need to analyze what we want to protect and against what we need to protect.
Which data is important to you?
Businesses care about their databases. They might contain all kinds of things, but especially customer data. Code is another big one. E-mails. And contracts, of course.
In your private life, you might care most about photos and documents. Maybe your whole system setup as well.
What might cause data loss?
Against which possible causes of data loss do you want to protect with your backup strategy? I notice five big data loss scenarios:
Disk failure is for sure the most common one. It hasn't happened to me so far, but the longer you use your device and the more read/write operations you have, the more likely it gets that the device just breaks at some point. There are companies that can recover data from broken disks, but it's unclear to me how expensive that might get, how long it would take, and how much can be recovered.
Accidental deletions are another common one. You just hit the wrong button or execute the wrong command in the terminal and your data is gone.
Malware, and especially ransomware, is another reason why you want to have backups. Notice that this is different from the rest. Here somebody is actively trying to lock you out from accessing your data.
Theft is likely more relevant to your private life than to a business that might have better security on-site... but things might change with people working remotely.
Natural catastrophes are the last big reason for data loss. It could be simply your house or data center being on fire, an earthquake, a tsunami, or a flood. In the case of data centers, a catastrophe might not even destroy the data permanently, but "just" lock you out if network cables or power lines of the data center are destroyed.
Solutions
Let's look at what you can do to protect against those issues.
Having more than one disk in your computer and letting it automatically write the data to both places helps against disk failure. This mirroring is one of the setups known as RAID (redundant array of independent disks). It does not help against accidental deletions, as the files would be deleted on the other disk as well. Malware likely also affects both disks at the same time, just like theft and catastrophes.
You can have a disk in a different location at home: just a minimal "computer" that basically only makes your disk available in your local network. This is called network-attached storage (short: NAS). It might help against theft a bit better, depending on how well you hide it. Now you need a backup strategy and software that actually creates and stores your backups on the NAS. We will talk about that later. How well a NAS is protected against malware depends on the system. It does not help against a natural catastrophe... except if the NAS is located somewhere truly remote and you do the backup over the internet. But that is not the case most of the time for people who use a NAS.
A cloud provider is for sure an option that can fulfill all your requirements. In this case, you want to consider different issues such as who can access the backup (privacy) and ensure that the network traffic is encrypted. Also, the pricing model just changed from a one-time investment to a subscription model.
Backup Strategies
The "accidental deletion" and "malware" data loss scenarios are tricky because there might be a serious time delay between the incident time and the detection time.
Assume you make one backup at night. The new backup always overwrites the old backup. Now you delete the photos from your kid's 8th birthday, but you only notice that when your wife wants to create a collage for the 10th birthday. As you've overwritten the backup, there is no chance of recovery.
Instead of overwriting the backup, you might just create copies. But that grows really quickly. So you mix those two approaches by introducing the idea of backup generations. This is also known as the generation principle or the grandfather-father-son principle. It's a rotation scheme for backups. The idea is the following:
You have for example 5 grandfathers, 12 fathers, and 7 sons. The 7 sons might refer to days of the week. So you have a Monday / Tuesday / Wednesday / Thursday / Friday / Saturday / Sunday backup. The 12 fathers could refer to the 1st of the month, so you have a January, February, ... backup. And the grandfathers are the 1st of January of the past 5 years.
That means if you delete the photos of your kid on 2022-12-03 and you notice on 2024-10-20, all sons and fathers you still have were created after the deletion, so you would go back to the grandfather backup from 2022-01-01. Yes, you would lose everything that happened between 2022-01-01 and 2022-12-03. Almost a year of data. But not everything.
The described generations would also mean that you need to have 5+12+7 = 24 backups at all times. If you have 100 GB of data that would mean you need to store 2400 GB just for the backups.
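To make the rotation concrete, here is a minimal Python sketch of the example above. The function names `gfs_keep` and `newest_backup_before` are made up for illustration, and it assumes one backup per day that is taken before anything gets deleted on that day. Real backup tools implement retention differently in the details.

```python
from datetime import date, timedelta
from typing import Optional, Set


def gfs_keep(today: date, sons: int = 7, fathers: int = 12,
             grandfathers: int = 5) -> Set[date]:
    """Return the backup dates a grandfather-father-son rotation retains."""
    keep = set()
    # Sons: one backup for each of the last `sons` days.
    for i in range(sons):
        keep.add(today - timedelta(days=i))
    # Fathers: the 1st of each of the last `fathers` months.
    year, month = today.year, today.month
    for _ in range(fathers):
        keep.add(date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # Grandfathers: January 1st of each of the last `grandfathers` years.
    for i in range(grandfathers):
        keep.add(date(today.year - i, 1, 1))
    return keep


def newest_backup_before(keep: Set[date], incident: date) -> Optional[date]:
    """Newest retained backup that was taken before the incident date."""
    candidates = [d for d in keep if d < incident]
    return max(candidates) if candidates else None


kept = gfs_keep(date(2024, 10, 20))
print(newest_backup_before(kept, date(2022, 12, 3)))  # 2022-01-01
```

Note that dates can coincide: in this run 2024-01-01 is both a father and a grandfather, so the number of distinct backups can be slightly lower than 24.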
Reducing Storage: Different types of backups
We rotate backups to reduce the required storage space, but there are other options to do that. One is to use different types of backups.
A full backup is conceptually the simplest. You just make a copy of what you want to back up. If you create a daily backup, you just have one copy for every day. You don't depend on any other backups.
A differential backup just stores the difference (sometimes called the delta) since the last full backup.
Incremental backups are similar to differential backups, but they store the difference from the last backup. That could also be another incremental backup.
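The practical difference shows up when you restore. The following Python sketch computes which backups you need in order to restore a given point in time. The labels "full", "diff", and "incr" and the function name `restore_chain` are assumptions for this illustration and do not belong to any particular tool.

```python
from typing import List


def restore_chain(backups: List[str], target: int) -> List[int]:
    """Indices of the backups needed to restore the state at `target`.

    `backups` is a chronological list of kinds: "full", "diff" or "incr".
    """
    kind = backups[target]
    if kind == "full":
        # A full backup stands on its own.
        return [target]
    if target == 0:
        raise ValueError("the first backup must be a full backup")
    if kind == "diff":
        # A differential only needs the most recent full backup before it.
        last_full = max(i for i in range(target) if backups[i] == "full")
        return [last_full, target]
    # An incremental needs the previous backup, whatever its type,
    # and everything that previous backup needs.
    return restore_chain(backups, target - 1) + [target]


print(restore_chain(["full", "diff", "diff", "diff"], 3))  # [0, 3]
print(restore_chain(["full", "incr", "incr", "incr"], 3))  # [0, 1, 2, 3]
```

Differential backups keep the restore chain short, but each one grows until the next full backup. Incremental backups stay small, but losing one of them breaks every later restore point in the chain.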
Reducing Storage: Compression
Another trade-off you can make is storage vs. computational power. Compression algorithms try to find repetitions within your data and represent them more compactly.
Assume you want to store a sequence of numbers:
[1, 1, 1, 1, 4, 5, 5, 5, 8, 8, 8, 1, 1, 1, 1, 1, 1]
You might notice that this sequence has several duplicates. Instead of storing that, you could store:
[(4x, 1), (1x, 4), (3x, 5), (3x, 8), (6x, 1)]
And instead of just looking for the next digit, you could also look for the next two:
[(2x, 11), (1x, 4), (3x, 5), (3x, 8), (3x, 11)]
There are lots of clever ways to combine ideas to make it shorter. But that shorter representation has to be computed. Applying compression takes time. Also, when you want to read your original data, you need to decompress it.
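The first idea is known as run-length encoding. Here is a minimal Python sketch of it, with `rle_encode` and `rle_decode` as illustrative names. Real compression tools use far more sophisticated algorithms, so treat this only as a demonstration of the principle.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal values into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def rle_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]


data = [1, 1, 1, 1, 4, 5, 5, 5, 8, 8, 8, 1, 1, 1, 1, 1, 1]
encoded = rle_encode(data)
print(encoded)  # [(4, 1), (1, 4), (3, 5), (3, 8), (6, 1)]
assert rle_decode(encoded) == data
```

Both directions cost computation: encoding when the backup is written, decoding when it is restored.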
Confidentiality: Encrypt your backup!
Having a copy of your important data makes it more likely that somebody else can access it. It could be during transit (when you transfer the backup to the storage) or at rest (while it is stored).
Just as you want full disk encryption for your laptop, you want to make sure your backup stays private. Make sure it is encrypted. And make really sure you don't lose the key to decrypt it!
Other things you might care about
It's pretty clear that everybody cares about the price, which is usually quoted per GB and month but might also include additional fees for restoring or data transfer. You also know that you should care about encryption. Besides that, there are a few topics you might not have initially thought of:
The first thing you should check is whether the tool you intend to use supports the operating system you're working with (Windows, Mac, or Linux).
There might be a limit on the size of a single file, possibly something like 2 GB. If you have videos, you might care about this a lot.
Backups are something that should just work in the background. Hence you really want scheduled automatic backups.
And, of course, you should know how you can use your backup for recovery. The user experience while recovering / restoring the backup is important. When you need it, you're likely stressed. No time to watch many semi-professional YouTube videos that explain the tool you rely on.
Storage
There are two very different storage solutions: A network-attached storage (NAS) which you can operate yourself or cloud storage. The NAS has high upfront cost, but except for electricity no operating cost. Cloud storage pricing models are monthly subscriptions where you pay for storage and sometimes also for the amount of data you transfer.
NAS
The cheapest NAS I could find is "WD My Cloud Home" with 2 TB for 135 EUR (I have a pretty old one from WD which isn't sold anymore).
While writing this article I found several people using Synology products. A key point for them is that the Synology devices are only bays for the disks. You can buy the disks independently.
Cloud Storage
I use Google Drive for a lot of different things. It's 2 EUR per 100 GB and month. It gets cheaper the more you need (see pricing plans).
Dropbox is another popular choice. Their cheapest plan is 10 EUR per 2 TB and month. Dropbox offers a lot of different features around its storage. For example, the "rewind" feature allows you to undo any changes of the past 30 days. That might make it a proper solution for accidental file deletions or ransomware. See their pricing plans for details about those features.
A friend of mine (also a developer) uses a Hetzner Storage Box for his backups. It's 3.81 EUR per 1 TB and month (see pricing plans). I also know Hetzner as a reliable and trustworthy provider.
A storage solution that is only suited for developers is AWS S3. The pricing model of AWS S3 is a bit complicated, but if I got it right, you can store 100 GB for only about 0.10 EUR per month in the cheapest archival storage class. The standard storage class costs considerably more.
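To compare a one-time purchase with a subscription, you can estimate after how many months the subscription has cost as much as the device. The sketch below uses the NAS and Hetzner prices quoted in this article. It assumes that you need at most 1 TB and ignores electricity, replacement disks, and price changes, so it is only a rough orientation. The function name `break_even_months` is made up for this example.

```python
import math


def break_even_months(upfront_eur: float, monthly_eur: float) -> int:
    """First month in which the summed monthly fees reach the one-time cost."""
    return math.ceil(upfront_eur / monthly_eur)


# 135 EUR for the NAS vs. 3.81 EUR per month for 1 TB of cloud storage
print(break_even_months(135, 3.81))  # 36
```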
Software Solutions
Let's dive into a few concrete solutions that you might want to use!
Pure folder synchronization
A couple of backup solutions are essentially just synchronizing two directories (folders): The local one and a remote one.
In the Linux world, this is typically done with rsync (potentially triggered by inotifywait or a cron job), and on Windows there is PureSync.
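To show what pure synchronization means, here is a minimal one-way mirror in Python. It is a sketch and not a replacement for rsync: the function name `mirror` is made up, it only handles plain files and directories (no symlinks, no permissions), and it decides what to copy by modification time and size. It behaves roughly like a sync that also deletes on the target side, comparable to `rsync --delete`.

```python
import shutil
import tempfile
from pathlib import Path


def mirror(source: Path, target: Path) -> None:
    """One-way sync: make `target` look exactly like `source`."""
    target.mkdir(parents=True, exist_ok=True)
    source_names = {entry.name for entry in source.iterdir()}
    # Remove everything in the target that no longer exists in the source.
    for entry in list(target.iterdir()):
        if entry.name not in source_names:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    # Copy new or changed files and recurse into directories.
    for entry in source.iterdir():
        dest = target / entry.name
        if entry.is_dir():
            mirror(entry, dest)
        elif (not dest.exists()
              or entry.stat().st_mtime > dest.stat().st_mtime
              or entry.stat().st_size != dest.stat().st_size):
            shutil.copy2(entry, dest)  # copy2 also preserves the timestamp


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    (src / "thesis.txt").write_text("draft")
    mirror(src, dst)
    (src / "thesis.txt").unlink()  # accidental deletion in the source
    mirror(src, dst)
    print((dst / "thesis.txt").exists())  # False: the copy is gone, too
```

The demo shows the limitation of this approach: a mirror that propagates deletions repeats an accidental deletion on the next run, so on its own it does not cover that data loss scenario.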
BorgBackup (short: Borg)
BorgBackup is free software that allows you to create, compress, encrypt, and manage your backups. It is written mostly in Python and runs on Linux, macOS, and BSD. On Windows it is typically used via the Windows Subsystem for Linux (installation notes).
It is a command line application and here are the basic commands (docs):
# Create a borg repo: You need to enter
# a passphrase you have to remember
$ borg init --encryption=repokey ~/borg_repo
# Borg created a key: Print that key and store it somewhere safe
$ borg key export ~/borg_repo
# Create a backup archive
$ borg create ~/borg_repo::Saturday1 ~/Documents
# Inspect the borg repo:
$ borg list ~/borg_repo
Saturday1 Sun, 2022-09-04 12:03:59 [07752098e880ddffcc470c9a45382c8285c6dc9500fc8a8d4e4b279e0802086e]
# Inspect a backup archive:
$ borg list ~/borg_repo::Saturday1
drwxr-xr-x moose moose 0 Sat, 2022-08-27 18:07:24 home/moose/Documents
-rw-rw-r-- moose moose 99420 Sat, 2021-12-18 15:23:23 home/moose/Documents/Finanzbildung-Version-2.pdf
-rw-rw-r-- moose moose 466391 Sat, 2022-04-09 23:22:05 home/moose/Documents/out.pdf
...
# Restore a backup (extracts into the current directory)
$ borg extract ~/borg_repo::Saturday1
Borg de-duplicates files between backups. That means if you run two backups of exactly the same content, it will only store it once. As Borg takes care of all your backups in one repository, you don't have to think about the different backup types (full/differential/incremental). Borg manages it for you.
You can automate backups via simple scripts. The borg prune command helps to implement the generation principle.
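Such a script could look like the following Python sketch. It only wraps two borg calls: `borg create` with an archive named after the current date and `borg prune` with the `--keep-daily`, `--keep-monthly`, and `--keep-yearly` options. The function names and the retention numbers are assumptions for this example. Also note that borg's retention rules only roughly match the 7 sons / 12 fathers / 5 grandfathers scheme, and that for unattended runs borg needs the passphrase from somewhere, for example the `BORG_PASSPHRASE` environment variable.

```python
import os
import subprocess
from datetime import date
from typing import List


def borg_commands(repo: str, source: str, today: date) -> List[List[str]]:
    """Build the borg calls for one backup run (nothing is executed here)."""
    repo = os.path.expanduser(repo)
    source = os.path.expanduser(source)
    # One archive per day, named after the date, e.g. <repo>::2022-09-04
    create = ["borg", "create", f"{repo}::{today.isoformat()}", source]
    # Thin out old archives: roughly 7 sons, 12 fathers, 5 grandfathers.
    prune = [
        "borg", "prune",
        "--keep-daily", "7",
        "--keep-monthly", "12",
        "--keep-yearly", "5",
        repo,
    ]
    return [create, prune]


def run_backup(repo: str, source: str) -> None:
    """Run create and prune; raises CalledProcessError if borg fails."""
    for command in borg_commands(repo, source, date.today()):
        subprocess.run(command, check=True)


# Example (requires borg and an already initialized repository):
# run_backup("~/borg_repo", "~/Documents")
```

You could then call this script once per day from a cron job or a similar scheduler.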
Vorta is a graphical interface to Borg. It also makes scheduled backups easy, and even the pruning options are there.
Any Last Words?
The 3-2-1 rule is mentioned a lot when you're looking for backup advice. People recommend having 3 copies on 2 different storage media / storage technologies and 1 off-site backup. That would cover all five data loss scenarios to some degree.
For a software developer, a Hetzner Storage Box + BorgBackup + Vorta is a great solution.
There are also many other backup solutions such as Backblaze, Duplicity, or Apple Time Machine. They might fit your needs just as well or even better than what I've described. You now know what to consider when you build your backup system.
What's next?
In this series about application security (AppSec), we already explained some of the techniques of the attackers and also techniques of the defenders:
- Part 1: SQL Injections
- Part 2: Don't leak Secrets
- Part 3: Cross-Site Scripting (XSS)
- Part 4: Password Hashing
- Part 5: ZIP Bombs
- Part 6: CAPTCHA
- Part 7: Email Spoofing
- Part 8: Software Composition Analysis (SCA)
- Part 9: XXE attacks
- Part 10: Effective Access Control
- Part 11: DOS via a Billion Laughs
- Part 12: Full Disk Encryption
- Part 13: Insecure Deserialization
- Part 14: Docker Security
- Part 15: Credential Stuffing
- Part 16: Multi-Factor Authentication (MFA/2FA)
- Part 17: ReDoS
- Part 18: Secure and Private Instant Messaging
- Part 19: Cryptojacking
- Part 20: Backups
- Part 21: CSRF
- Part 22: Single-Sign-On
- Part 23: Clipboard Hijacking
- Part 24: Certificates
- Part 25: Race Condition Attacks in Blockchains
- Part 26: Mobile Device Management (MDM)
- Part 27: Server-Side Request Forgery (SSRF)
- Part 28: Network Separation
- Part 29: Social Engineering (including Phishing)
- Part 30: Virtual Private Networks (VPNs)
Let me know if you are interested in more articles around AppSec / InfoSec!