13 March 2026

Back up with rsync

Gemini, when I back up my dev/ (development) directory (there are some Python scripts), may I exclude all the venv/ (virtual environment) directories?


The short answer is yes, absolutely. In fact, most developers would argue that you should exclude them.

Think of a virtual environment like a specialized toolkit you assembled for a specific project. While the tools inside are necessary to get the job done, you don't need to pack the entire heavy toolbox when you move house—you just need the shopping list so you can buy those same tools again at your new place.

Why you should exclude venv/

  • Size: Virtual environments can contain thousands of files and take up hundreds of megabytes. Backing them up makes your backups slow and bloated.

  • Portability: Virtual environments are "brittle." They often contain hardcoded paths to your specific machine's Python interpreter. If you restore a venv/ folder to a different computer (or even a different folder on the same computer), it will likely break.

  • Redundancy: All the libraries inside (like Pandas, Flask, or Requests) already exist on the internet (PyPI). There's no need to pay for storage to keep a second copy of them.
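
To check how much space these directories take in your own dev/ tree before excluding them, a small shell sketch like the one below may help. It assumes the environments are named venv or env (the names used in this post); list_venvs and venv_sizes are hypothetical helper names, not standard tools.

```shell
# List every directory named venv or env below a root directory.
# -prune stops find from descending into a directory once it has matched,
# so environments nested inside other environments are not reported twice.
list_venvs() {
    find "${1:-.}" -type d \( -name venv -o -name env \) -prune -print
}

# Print the disk usage of each directory found by list_venvs.
venv_sizes() {
    list_venvs "${1:-.}" | while IFS= read -r dir; do
        du -sh "$dir"
    done
}
```

Calling venv_sizes ~/dev would print one size per environment, which gives a rough idea of how much the backup shrinks once they are skipped.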


The "Shopping List" Method

Instead of backing up the folder, you should back up the metadata required to recreate it. This keeps your backup light and functional.

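  1. Generate the list: Before backing up, activate the project's virtual environment and run this command inside your project (without an active environment, pip freeze lists the system-wide packages instead):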

    pip freeze > requirements.txt

  2. Back up the text file: Ensure requirements.txt is included in your backup.

  3. Restore when needed: If you ever need to restore your project, simply create a fresh environment and run:

    pip install -r requirements.txt

What to skip vs. what to keep

To make your backup strategy efficient, here is a quick guide:

Category          Include in Backup?   Examples
Source Code       Yes                  .py files, README, config files
Dependencies      No                   venv/, env/, node_modules/
Metadata          Yes                  requirements.txt, pyproject.toml
Build Artifacts   No                   __pycache__/, .pyc files, dist/

Pro-tip: If you use Git for your development, make sure the .gitignore file in your project's root directory lists venv/. That way you don't accidentally push those bulky files to GitHub or GitLab either.

Would you like me to generate a standard .gitignore template for your Python projects?

 

I back up my dev/ directory with rsync. Could you show me an rsync command that excludes dependencies and build artifacts, please?

 

Since you are using rsync, you can use the --exclude flag to skip those bulky, non-essential directories. This keeps your backup slim and avoids the "broken path" issues that come with moving virtual environments.

The Recommended rsync Command

You can run a single command that targets the most common Python clutter:

Bash
rsync -avz --exclude='venv/' --exclude='env/' --exclude='__pycache__/' --exclude='.pytest_cache/' --exclude='*.pyc' /path/to/source/dev/ /path/to/destination/backup/

Breakdown of the flags:

  • -a (archive): This is a "super-flag" that preserves permissions, symlinks, and timestamps. It also turns on recursion.

  • -v (verbose): Shows you exactly which files are being copied in real-time.

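  • -z (compress): Compresses file data during the transfer. It helps only when rsync talks to a remote host (over SSH or an rsync daemon); for a local disk or a network share mounted as a local path it just costs CPU time, and you can drop it.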

  • --exclude: Tells rsync to ignore paths matching a pattern. The trailing slash in venv/ makes the pattern match directories only, and because the pattern has no leading slash it matches a venv/ directory at any depth under dev/.
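
Before trusting a set of excludes, it can be worth previewing what rsync would copy. The sketch below wraps the same exclude patterns in a dry run; preview_backup is a hypothetical helper name, and -z is left out on the assumption that the destination is a local path.

```shell
# preview_backup SRC DEST
# Dry run: prints what rsync would transfer without writing anything.
preview_backup() {
    src=${1%/}/    # force exactly one trailing slash: copy the contents of SRC
    dest=$2
    # --dry-run changes nothing on disk; --itemize-changes prints one line
    # per file or directory that would be created or updated.
    rsync -a --dry-run --itemize-changes \
        --exclude='venv/' --exclude='env/' \
        --exclude='__pycache__/' --exclude='.pytest_cache/' \
        --exclude='*.pyc' \
        "$src" "$dest"
}
```

For example, preview_backup /path/to/source/dev /path/to/destination/backup lists the files that would be copied; if a venv/ path appears in the output, the exclude patterns need another look.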


Pro-Tip: Using an Exclude File

If your dev/ directory has a lot of different project types (e.g., some Python, some Node.js, some Rust), the command line can get very messy. Instead, you can create a text file named exclude-list.txt and list everything you want to skip:

exclude-list.txt

Plaintext
venv/
env/
__pycache__/
.pytest_cache/
.git/
node_modules/
dist/
build/
*.pyc
.DS_Store

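Note that this list also skips .git/, which leaves each repository's local history out of the backup; delete that line unless every repository is also pushed to a remote you trust.

Then run the simplified command: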

Bash
rsync -avz --exclude-from='exclude-list.txt' /path/to/source/dev/ /path/to/destination/backup/
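
An exclude file can also carry comments, because rsync ignores blank lines and lines starting with # or ; in a file passed to --exclude-from. The sketch below writes a commented list and uses it for a backup. The function names are hypothetical, and this variant deliberately leaves .git/ out of the list so that repository history is kept, which is a choice you may want to reverse.

```shell
# write_exclude_list FILE: create a commented exclude list for rsync.
write_exclude_list() {
    cat > "$1" <<'EOF'
# Python environments and caches
venv/
env/
__pycache__/
.pytest_cache/
*.pyc

# Node.js dependencies
node_modules/

# Build output
dist/
build/

# macOS Finder metadata
.DS_Store
EOF
}

# backup_with_list SRC DEST EXCLUDE_FILE: copy SRC's contents into DEST,
# skipping everything that matches a pattern in EXCLUDE_FILE.
backup_with_list() {
    rsync -a --exclude-from="$3" "${1%/}/" "$2"
}
```

A typical call would be write_exclude_list exclude-list.txt followed by backup_with_list /path/to/source/dev /path/to/destination/backup exclude-list.txt.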

A Quick Warning on Trailing Slashes

In rsync, the trailing slash on the source directory matters significantly:

  • dev/ (with slash): Copies the contents of the directory into the destination.

  • dev (no slash): Copies the directory itself into the destination.
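
The difference is easiest to see on a throwaway directory. The following sketch builds a tiny dev/ tree in a temporary location and copies it both ways; demo_trailing_slash and the file names are made up for the illustration.

```shell
# Copy the same source with and without a trailing slash and list the results.
demo_trailing_slash() {
    work=$(mktemp -d)
    mkdir -p "$work/dev/project"
    echo 'print("hi")' > "$work/dev/project/main.py"

    rsync -a "$work/dev/" "$work/with_slash/"      # contents of dev/
    rsync -a "$work/dev"  "$work/without_slash/"   # the dev directory itself

    # Show where main.py ended up in each destination.
    (cd "$work" && find with_slash without_slash -type f)
    rm -rf "$work"
}
```

Running demo_trailing_slash prints with_slash/project/main.py followed by without_slash/dev/project/main.py: only the second copy has the extra dev/ level.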

Would you like me to help you write a small bash script that automates this backup and adds a timestamp to each folder?

 =============================================== 

The Best Way to Back up Files with rsync

 

The rsync command below creates incremental, space-efficient backups. Each run produces what looks like a full backup, but only files that have changed since the previous backup (/backups/current) consume new disk space.
Here is the detailed breakdown:
Command Breakdown
rsync -a --link-dest=/backups/current /source/data /backups/new_backup
  • rsync: The command-line utility used for efficient file synchronization and backups.
  • -a (Archive Mode): This is crucial for backups. It is a shorthand flag (-rlptgoD) that tells rsync to act recursively, copy symlinks as symlinks, and preserve permissions, modification times, group, owner, and device files.
  • --link-dest=/backups/current: This is the "secret sauce" for incremental backups.
    • It tells rsync to look at the /backups/current directory.
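    • If a file in /source/data is unchanged compared to the file in /backups/current (by default rsync compares size and modification time, and preserved attributes such as permissions and ownership must match too), rsync will create a hard link to the file in the current backup instead of copying it again.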
    • If the file has changed, it copies the new version.
    • Result: The new backup looks like a full backup, but shared files don't occupy extra space.
  • /source/data: The source directory you want to back up.
  • /backups/new_backup: The target directory where the new backup will be created.

Detailed Step-by-Step Execution
  1. Preparation: rsync starts scanning /source/data and creates the directory structure for /backups/new_backup.
  2. Comparison: rsync compares files in /source/data with files in /backups/current (the link-dest directory).
  3. Hard Linking (Unchanged Files): For files that are identical, rsync creates a hard link in /backups/new_backup pointing to the exact same data blocks on the disk as the file in /backups/current.
  4. Copying (Changed/New Files): If a file in /source/data is new or changed, it is copied directly into /backups/new_backup.
  5. Completion: You now have a "full" backup in /backups/new_backup that takes up very little extra disk space.
Key Considerations and Tips
  • Same File System: The --link-dest directory (/backups/current) and the new_backup directory must be on the same file system for hard linking to work.
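  • Trailing Slashes: The trailing slash on the source determines whether the folder itself is copied or just its contents. As written, /source/data (no slash) creates /backups/new_backup/data, whereas /source/data/ copies the contents straight into /backups/new_backup. Usually, for backups, you want the contents, so /source/data/ is preferred. Whichever form you choose, use it on every run, because --link-dest matches files by their path relative to the backup directory.
  • Rotational Strategy: After running this command, make the new backup the reference for the next run. A plain mv /backups/new_backup /backups/current does not do this while /backups/current still exists as a directory: mv would place new_backup inside it. Either rename the old /backups/current out of the way first (for example to a dated name), or keep every backup under a dated name and make /backups/current a symbolic link to the newest one.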
  • Safety: Before running this in a script, add -n (dry-run) together with -v to see what would happen; without -v a dry run prints nothing:
    rsync -anv --link-dest=/backups/current /source/data /backups/new_backup
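The rotation described above can be sketched as a small shell function. This is one possible arrangement, not the only one: it assumes each backup gets its own named directory under a backup root and that "current" is a symbolic link to the newest one. The function name snapshot and its arguments are hypothetical.

```shell
# snapshot SRC BACKUP_ROOT [NAME]
# Create BACKUP_ROOT/NAME as a full-looking backup of SRC, hard-linking
# unchanged files against the previous snapshot (BACKUP_ROOT/current).
snapshot() {
    src=${1%/}/                                # copy the contents of SRC
    mkdir -p "$2" || return 1
    root=$(cd "$2" && pwd)                     # absolute path for --link-dest
    name=${3:-$(date +%Y-%m-%d_%H%M%S)}        # default: timestamped name

    if [ -d "$root/current" ]; then
        # A previous snapshot exists: link unchanged files to it.
        rsync -a --link-dest="$root/current/" "$src" "$root/$name/"
    else
        # First run: nothing to link against, so make a plain copy.
        rsync -a "$src" "$root/$name/"
    fi &&
    # Repoint "current" at the new snapshot only if rsync succeeded.
    ln -sfn "$name" "$root/current"
}
```

Called as snapshot /source/data /backups from a daily job, this leaves one dated directory per run, and old snapshots can be deleted individually because a hard-linked file's data stays on disk until its last link is removed.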
Summary
This command is efficient because it allows you to have a "full" backup structure for every day (making restoring files simple), while only using storage space equivalent to an incremental backup.

 

