Chapter 10. Data management

Table of Contents

10.1. Sharing, copying, and archiving
10.1.1. Archive and compression tools
10.1.2. Copy and synchronization tools
10.1.3. Idioms for the archive
10.1.4. Idioms for the copy
10.1.5. Idioms for the selection of files
10.1.6. Archive media
10.1.7. Removable storage device
10.1.8. Filesystem choice for sharing data
10.1.9. Sharing data via network
10.2. Backup and recovery
10.2.1. Backup utility suites
10.2.2. An example script for the system backup
10.2.3. A copy script for the data backup
10.3. Data security infrastructure
10.3.1. Key management for GnuPG
10.3.2. Using GnuPG on files
10.3.3. Using GnuPG with Mutt
10.3.4. Using GnuPG with Vim
10.3.5. The MD5 sum
10.4. Source code merge tools
10.4.1. Extracting differences for source files
10.4.2. Merging updates for source files
10.4.3. Updating via 3-way-merge
10.5. Version control systems
10.5.1. Comparison of VCS commands
10.6. Git
10.6.1. Configuration of Git client
10.6.2. Git references
10.6.3. Git commands
10.6.4. Git for the Subversion repository
10.6.5. Git for recording configuration history
10.7. CVS
10.7.1. Configuration of CVS repository
10.7.2. Local access to CVS
10.7.3. Remote access to CVS with pserver
10.7.4. Remote access to CVS with ssh
10.7.5. Importing a new source to CVS
10.7.6. File permissions in CVS repository
10.7.7. Work flow of CVS
10.7.8. Latest files from CVS
10.7.9. Administration of CVS
10.7.10. Execution bit for CVS checkout
10.8. Subversion
10.8.1. Configuration of Subversion repository
10.8.2. Access to Subversion via Apache2 server
10.8.3. Local access to Subversion by group
10.8.4. Remote access to Subversion via SSH
10.8.5. Subversion directory structure
10.8.6. Importing a new source to Subversion
10.8.7. Work flow of Subversion

Tools and tips for managing binary and text data on the Debian system are described.

[Warning] Warning

Uncoordinated write access from multiple processes to actively used devices and files must be avoided to prevent race conditions. File locking mechanisms using flock(1) may be used to avoid them.

The security of the data and its controlled sharing have several aspects.

  • The creation of data archive

  • The remote storage access

  • The duplication

  • The tracking of the modification history

  • The facilitation of data sharing

  • The prevention of unauthorized file access

  • The detection of unauthorized file modification

These can be realized by using some combination of tools.

  • Archive and compression tools

  • Copy and synchronization tools

  • Network filesystems

  • Removable storage media

  • The secure shell

  • The authentication system

  • Version control system tools

  • Hash and cryptographic encryption tools

Here is a summary of archive and compression tools available on the Debian system.

Table 10.1. List of archive and compression tools

package popcon size extension command comment
tar V:916, I:999 2880 .tar tar(1) the standard archiver (de facto standard)
cpio V:464, I:999 989 .cpio cpio(1) Unix System V style archiver, use with find(1)
binutils V:186, I:694 93 .ar ar(1) archiver for the creation of static libraries
fastjar V:4, I:45 172 .jar fastjar(1) archiver for Java (zip like)
pax V:14, I:36 164 .pax pax(1) new POSIX standard archiver, compromise between tar and cpio
gzip V:888, I:999 243 .gz gzip(1), zcat(1), … GNU LZ77 compression utility (de facto standard)
bzip2 V:178, I:953 196 .bz2 bzip2(1), bzcat(1), … Burrows-Wheeler block-sorting compression utility with higher compression ratio than gzip(1) (slower than gzip with similar syntax)
lzma V:3, I:39 141 .lzma lzma(1) LZMA compression utility with higher compression ratio than gzip(1) (deprecated)
xz-utils V:434, I:964 442 .xz xz(1), xzdec(1), … XZ compression utility with higher compression ratio than bzip2(1) (slower than gzip but faster than bzip2; replacement for LZMA compression utility)
p7zip V:88, I:439 986 .7z 7zr(1), p7zip(1) 7-Zip file archiver with high compression ratio (LZMA compression)
p7zip-full V:131, I:521 4659 .7z 7z(1), 7za(1) 7-Zip file archiver with high compression ratio (LZMA compression and others)
lzop V:6, I:51 97 .lzo lzop(1) LZO compression utility with higher compression and decompression speed than gzip(1) (lower compression ratio than gzip with similar syntax)
zip V:50, I:442 608 .zip zip(1) InfoZIP: DOS archive and compression tool
unzip V:250, I:804 554 .zip unzip(1) InfoZIP: DOS unarchive and decompression tool

[Warning] Warning

Do not set the "$TAPE" variable unless you know what to expect. It changes tar(1) behavior.

[Note] Note

The gzipped tar(1) archive uses the file extension ".tgz" or ".tar.gz".

[Note] Note

The xz-compressed tar(1) archive uses the file extension ".txz" or ".tar.xz".

[Note] Note

The popular compression method in FOSS tools such as tar(1) has been moving as follows: gzip → bzip2 → xz
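
For example, here is a hedged sketch of creating and extracting such compressed archives with GNU tar(1); the names "archive" and "./source" are placeholders.

$ tar -czf archive.tar.gz ./source # gzip-compressed archive (".tar.gz"/".tgz")
$ tar -cJf archive.tar.xz ./source # xz-compressed archive (".tar.xz"/".txz")
$ tar -xf archive.tar.xz           # extraction (compression is auto-detected)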

[Note] Note

cp(1), scp(1) and tar(1) may have some limitations for special files. cpio(1) is the most versatile.

[Note] Note

cpio(1) is designed to be used with find(1) and other commands and is suitable for creating backup scripts since the file selection part of the script can be tested independently.

[Note] Note

The internal structure of LibreOffice data files is a zip archive like the ".jar" file, which can also be opened by unzip.
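
For example, you can list the internal members of such a file with unzip(1); the filename "report.odt" is a placeholder.

$ unzip -l report.odt # list the zip members without extracting them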

[Note] Note

The de facto cross-platform archive tool is zip. Use it as "zip -rX" to attain the maximum compatibility. Also use the "-s" option if the maximum file size matters.
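
For example, here is a hedged sketch; the archive name and the 2 GiB split size are placeholders.

$ zip -rX -s 2g archive.zip ./source # recursive, no extra file attributes, 2 GiB splits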

Here are several ways to copy the entire content of the directory "./source" using different tools.

  • Local copy: "./source" directory → "/dest" directory

  • Remote copy: "./source" directory at local host → "/dest" directory at "user@host.dom" host

rsync(8):

# cd ./source; rsync -aHAXSv . /dest
# cd ./source; rsync -aHAXSv . user@host.dom:/dest

You can alternatively use the "trailing slash on the source directory" syntax.

# rsync -aHAXSv ./source/ /dest
# rsync -aHAXSv ./source/ user@host.dom:/dest

Alternatively, by the following.

# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . /dest
# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . user@host.dom:/dest

GNU cp(1) and openSSH scp(1):

# cd ./source; cp -a . /dest
# cd ./source; scp -pr . user@host.dom:/dest

GNU tar(1):

# (cd ./source && tar cf - . ) | (cd /dest && tar xvfp - )
# (cd ./source && tar cf - . ) | ssh user@host.dom '(cd /dest && tar xvfp - )'

cpio(1):

# cd ./source; find . -print0 | cpio -pvdm --null --sparse /dest

You can substitute "." with "foo" for all examples containing "." to copy files from "./source/foo" directory to "/dest/foo" directory.

You can substitute "." with the absolute path "/path/to/source/foo" for all examples containing "." to drop "cd ./source;". These copy files to different locations depending on tools used as follows.

  • "/dest/foo": rsync(8), GNU cp(1), and scp(1)

  • "/dest/path/to/source/foo": GNU tar(1), and cpio(1)

[Tip] Tip

rsync(8) and GNU cp(1) have option "-u" to skip files that are newer on the receiver.
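
For example, hedged variants of the earlier idioms which skip files that are already up-to-date on the destination:

# cd ./source; rsync -aHAXSvu . /dest
# cd ./source; cp -au . /dest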

find(1) is used to select files for archive and copy commands (see Section 10.1.3, “Idioms for the archive” and Section 10.1.4, “Idioms for the copy”) or for xargs(1) (see Section 9.3.9, “Repeating a command looping over files”). This can be enhanced by using its command arguments.

The basic syntax of find(1) can be summarized as follows.

  • Its conditional arguments are evaluated from left to right.

  • This evaluation stops once its outcome is determined.

  • "Logical OR" (specified by "-o" between conditionals) has lower precedence than "logical AND" (specified by "-a" or nothing between conditionals).

  • "Logical NOT" (specified by "!" before a conditional) has higher precedence than "logical AND".

  • "-prune" always returns logical TRUE and, if it is a directory, searching of file is stopped beyond this point.

  • "-name" matches the base of the filename with shell glob (see Section 1.5.6, “Shell glob”) but it also matches its initial "." with metacharacters such as "*" and "?". (New POSIX feature)

  • "-regex" matches the full path with emacs style BRE (see Section 1.6.2, “Regular expressions”) as default.

  • "-size" matches the file based on the file size (value precedented with "+" for larger, precedented with "-" for smaller)

  • "-newer" matches the file newer than the one specified in its argument.

  • "-print0" always returns logical TRUE and print the full filename (null terminated) on the standard output.

find(1) is often used in an idiomatic style as follows.

# find /path/to \
    -xdev -regextype posix-extended \
    -type f -regex ".*\.cpio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -size +99M -prune -o \
    -type f -newer /path/to/timestamp -print0

This command performs the following actions.

  1. Search all files starting from "/path/to"

  2. Globally limit the search to within the starting filesystem and use ERE (see Section 1.6.2, “Regular expressions”) instead

  3. Exclude files matching the regex ".*\.cpio" or ".*~" from the search by stopping processing

  4. Exclude directories matching the regex ".*/\.git" from the search by stopping processing

  5. Exclude files larger than 99 Megabytes (units of 1048576 bytes) from the search by stopping processing

  6. Print filenames which satisfy the above search conditions and are newer than "/path/to/timestamp"

Please note the idiomatic use of "-prune -o" to exclude files in the above example.

[Note] Note

For non-Debian Unix-like systems, some options may not be supported by find(1). In such a case, please consider adjusting matching methods and replacing "-print0" with "-print". You may need to adjust related commands too.

When choosing computer data storage media for important data archives, you should be careful about their limitations. For small personal data backups, I use CD-R and DVD-R media from brand-name companies and store them in a cool, shaded, dry, clean environment. (Tape archive media seem to be popular for professional use.)

[Note] Note

A fire-resistant safe is meant for paper documents. Most computer data storage media have less temperature tolerance than paper. I usually rely on multiple secure encrypted copies stored in multiple secure locations.

Optimistic storage life of archive media seen on the net (mostly from vendor info).

  • 100+ years : Acid free paper with ink

  • 100 years : Optical storage (CD/DVD, CD/DVD-R)

  • 30 years : Magnetic storage (tape, floppy)

  • 20 years : Phase change optical storage (CD-RW)

These figures do not account for mechanical failures due to handling etc.

Optimistic write cycle of archive media seen on the net (mostly from vendor info).

  • 250,000+ cycles : Harddisk drive

  • 10,000+ cycles : Flash memory

  • 1,000 cycles : CD/DVD-RW

  • 1 cycle : CD/DVD-R, paper

[Caution] Caution

Figures of storage life and write cycles here should not be used for decisions on any critical data storage. Please consult the specific product information provided by the manufacturer.

[Tip] Tip

Since CD/DVD-R and paper have only 1 write cycle, they inherently prevent accidental data loss by overwriting. This is an advantage!

[Tip] Tip

If you need fast and frequent backups of a large amount of data, a hard disk on a remote host linked by a fast network connection may be the only realistic option.

Removable storage devices may be any one of the following.

They may be connected via any one of the following.

Modern desktop environments such as GNOME and KDE can mount these removable devices automatically without a matching "/etc/fstab" entry.

  • udisks package provides a daemon and associated utilities to mount and unmount these devices.

  • D-bus creates events to initiate automatic processes.

  • PolicyKit provides required privileges.

[Tip] Tip

Automounted devices may have the "uhelper=" mount option which is used by umount(8).

[Tip] Tip

Automounting under modern desktop environments happens only when those removable media devices are not listed in "/etc/fstab".

The mount point under modern desktop environments is chosen as "/media/<disk_label>", which can be customized by the following (see the example after this list).

  • mlabel(1) for FAT filesystem

  • genisoimage(1) with "-V" option for ISO9660 filesystem

  • tune2fs(1) with "-L" option for ext2/ext3/ext4 filesystem
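
For example, the following hedged commands set the volume label to "MYDISK"; the device name "/dev/sdb1" and the label are placeholders for your system.

# mlabel -i /dev/sdb1 ::MYDISK                # relabel an existing FAT filesystem
$ genisoimage -V MYDISK -r -o cd.iso ./source # set the label when creating an ISO9660 image
# tune2fs -L MYDISK /dev/sdb1                 # relabel an existing ext2/ext3/ext4 filesystem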

[Tip] Tip

The choice of encoding may need to be provided as mount option (see Section 8.4.6, “Filename encoding”).

[Tip] Tip

The use of the GUI menu to unmount a filesystem may remove its dynamically generated device node such as "/dev/sdc". If you wish to keep its device node, unmount it with the umount(8) command from the shell prompt.

When sharing data with other systems via removable storage devices, you should format them with a common filesystem supported by both systems. Here is a list of filesystem choices.


[Tip] Tip

See Section 9.8.1, “Removable disk encryption with dm-crypt/LUKS” for cross platform sharing of data using device level encryption.

The FAT filesystem is supported by almost all modern operating systems and is quite useful for data exchange via removable hard-disk-like media.

When formatting removable hard-disk-like devices for cross-platform sharing of data with the FAT filesystem, the following should be safe choices (see the sketch below).
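
A minimal sketch of such formatting; the device name "/dev/sdb1" is a placeholder, and you must verify it carefully since mkfs.vfat(8) destroys all existing data on it.

# mkfs.vfat -F 32 /dev/sdb1 # format the first partition as FAT32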

When using the FAT or ISO9660 filesystems for sharing data, the following are safe considerations (a combined example follows this list).

  • Archiving files into an archive file first using tar(1) or cpio(1) to retain long filenames, symbolic links, original Unix file permissions, and owner information.

  • Splitting the archive file into less than 2 GiB chunks with the split(1) command to protect it from the file size limitation.

  • Encrypting the archive file to secure its contents from the unauthorized access.
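
Here is a hedged sketch combining these steps; all filenames are placeholders and "gpg -c" uses symmetric encryption (see Section 10.3.2, “Using GnuPG on files”).

$ tar -czf data.tar.gz ./source             # archive keeping permissions etc.
$ gpg -c data.tar.gz                        # encrypt to "data.tar.gz.gpg"
$ split -b 1000m data.tar.gz.gpg data.part- # split into chunks under 2 GiB
$ cat data.part-* | gpg -d > data.tar.gz    # reassemble and decrypt later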

[Note] Note

By design, the maximum file size for the FAT filesystem is (2^32 - 1) bytes = (4 GiB - 1 byte). For some applications on older 32-bit OSes, the maximum file size was even smaller: (2^31 - 1) bytes = (2 GiB - 1 byte). Debian does not suffer from the latter problem.

[Note] Note

Microsoft itself does not recommend using FAT for drives or partitions over 200 MB. Microsoft highlights its shortcomings such as inefficient disk space usage in their "Overview of FAT, HPFS, and NTFS File Systems". Of course, we should normally use the ext4 filesystem for Linux.

[Tip] Tip

For more on filesystems and accessing filesystems, please read "Filesystems HOWTO".

We all know that computers fail sometimes and human errors cause system and data damage. Backup and recovery operations are an essential part of successful system administration. All possible failure modes will hit you some day.

[Tip] Tip

Keep your backup system simple and back up your system often. Having backup data is more important than how technically good your backup method is.

There are 3 key factors which determine actual backup and recovery policy.

  1. Knowing what to back up and recover.

    • Data files directly created by you: data in "~/"

    • Data files created by applications used by you: data in "/var/" (except "/var/cache/", "/var/run/", and "/var/tmp/")

    • System configuration files: data in "/etc/"

    • Local software: data in "/usr/local/" or "/opt/"

    • System installation information: a memo in plain text on key steps (partition, …)

    • Proven set of data: confirmed by experimental recovery operations in advance

  2. Knowing how to back up and recover.

    • Secure storage of data: protection from overwrite and system failure

    • Frequent backup: scheduled backup

    • Redundant backup: data mirroring

    • Foolproof process: easy single-command backup

  3. Assessing risks and costs involved.

    • Value of data when lost

    • Required resources for backup: human, hardware, software, …

    • Failure modes and their probability

[Note] Note

Do not back up the pseudo-filesystem contents found in /proc, /sys, /tmp, and /run (see Section 1.2.12, “procfs and sysfs” and Section 1.2.13, “tmpfs”). Unless you know exactly what you are doing, they are huge and useless data.

As for secure storage of data, data should be stored at least on different disk partitions, and preferably on different disks and machines, to withstand filesystem corruption. Important data are best stored on write-once media such as CD/DVD-R to prevent overwrite accidents. (See Section 9.7, “The binary data” for how to write to the storage media from the shell commandline. The GNOME desktop GUI environment gives you easy access via the menu: "Places→CD/DVD Creator".)

[Note] Note

You may wish to stop some application daemons such as MTA (see Section 6.3, “Mail transport agent (MTA)”) while backing up data.

[Note] Note

You should take extra care with the backup and restoration of identity-related data files such as "/etc/ssh/ssh_host_dsa_key", "/etc/ssh/ssh_host_rsa_key", "~/.gnupg/*", "~/.ssh/*", "/etc/passwd", "/etc/shadow", "/etc/fetchmailrc", "popularity-contest.conf", "/etc/ppp/pap-secrets", and "/etc/exim4/passwd.client". Some of these data cannot be regenerated by entering the same input string to the system.

[Note] Note

If you run a cron job as a user process, you must restore files in "/var/spool/cron/crontabs" directory and restart cron(8). See Section 9.3.14, “Scheduling tasks regularly” for cron(8) and crontab(1).

Here is a select list of notable backup utility suites available on the Debian system.

Table 10.5. List of backup suite utilities

package popcon size description
dump V:1, I:6 340 4.4 BSD dump(8) and restore(8) for ext2/ext3/ext4 filesystems
xfsdump V:0, I:10 834 dump and restore with xfsdump(8) and xfsrestore(8) for XFS filesystem on GNU/Linux and IRIX
backupninja V:4, I:4 355 lightweight, extensible meta-backup system
bacula-common V:10, I:17 2369 Bacula: network backup, recovery and verification - common support files
bacula-client I:4 175 Bacula: network backup, recovery and verification - client meta-package
bacula-console V:1, I:6 75 Bacula: network backup, recovery and verification - text console
bacula-server I:1 175 Bacula: network backup, recovery and verification - server meta-package
amanda-common V:1, I:2 9890 Amanda: Advanced Maryland Automatic Network Disk Archiver (Libs)
amanda-client V:1, I:2 1133 Amanda: Advanced Maryland Automatic Network Disk Archiver (Client)
amanda-server V:0, I:0 1089 Amanda: Advanced Maryland Automatic Network Disk Archiver (Server)
backup-manager V:1, I:2 571 command-line backup tool
backup2l V:0, I:1 114 low-maintenance backup/restore tool for mountable media (disk based)
backuppc V:4, I:4 2284 BackupPC is a high-performance, enterprise-grade system for backing up PCs (disk based)
duplicity V:8, I:15 1609 (remote) incremental backup
flexbackup V:0, I:0 243 (remote) incremental backup
rdiff-backup V:8, I:16 704 (remote) incremental backup
restic V:0, I:1 22182 (remote) incremental backup
rsnapshot V:6, I:11 452 (remote) incremental backup
slbackup V:0, I:0 152 (remote) incremental backup

Backup tools have their specialized focuses.

Basic tools described in Section 10.1.1, “Archive and compression tools” and Section 10.1.2, “Copy and synchronization tools” can be used to facilitate system backup via custom scripts. Such scripts can be enhanced by the following (see the sketch after this list).

  • The restic package enables incremental (remote) backups.

  • The rdiff-backup package enables incremental (remote) backups.

  • The dump package helps to archive and restore the whole filesystem incrementally and efficiently.
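
For example, here is a hedged sketch of incremental backups with rdiff-backup(1); the paths are placeholders.

# rdiff-backup /home/osamu /mnt/backup/osamu                     # create/update the mirror plus increments
# rdiff-backup -r 3D /mnt/backup/osamu/Mail /home/osamu/Mail.old # restore the 3-day-old state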

[Tip] Tip

See files in "/usr/share/doc/dump/" and "Is dump really deprecated?" to learn about the dump package.

For a personal Debian desktop system running the unstable suite, I only need to protect personal and critical data. I reinstall the system once a year anyway. Thus I see no reason to back up the whole system or to install a full-featured backup utility.

I use a simple script to make a backup archive and burn it to CD/DVD using a GUI. Here is an example script for this.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
BUUID=1000; USER=osamu # UID and name of a user who accesses backup files
BUDIR="/var/backups"
XDIR0=".+/Mail|.+/Desktop"
XDIR1=".+/\.thumbnails|.+/\.?Trash|.+/\.?[cC]ache|.+/\.gvfs|.+/sessions"
XDIR2=".+/CVS|.+/\.git|.+/\.svn|.+/Downloads|.+/Archive|.+/Checkout|.+/tmp"
XSFX=".+\.iso|.+\.tgz|.+\.tar\.gz|.+\.tar\.bz2|.+\.cpio|.+\.tmp|.+\.swp|.+~"
SIZE="+99M"
DATE=$(date --utc +"%Y%m%d-%H%M")
[ -d "$BUDIR" ] || mkdir -p "BUDIR"
umask 077
dpkg --get-selections \* > /var/lib/dpkg/dpkg-selections.list
debconf-get-selections > /var/cache/debconf/debconf-selections

{
find /etc /usr/local /opt /var/lib/dpkg/dpkg-selections.list \
     /var/cache/debconf/debconf-selections -xdev -print0
find /home/$USER /root -xdev -regextype posix-extended \
  -type d -regex "$XDIR0|$XDIR1" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
find /home/$USER/Mail/Inbox /home/$USER/Mail/Outbox -print0
find /home/$USER/Desktop  -xdev -regextype posix-extended \
  -type d -regex "$XDIR2" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
} | cpio -ov --null -O $BUDIR/BU$DATE.cpio
chown $BUUID $BUDIR/BU$DATE.cpio
touch $BUDIR/backup.stamp

This is meant to be a script example executed as root.

I expect you to change and execute this as follows.

Keep it simple!

[Tip] Tip

You can recover debconf configuration data with "debconf-set-selections debconf-selections" and dpkg selection data with "dpkg --set-selections <dpkg-selections.list" (see the sketch below).
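
Here is a hedged sketch of the recovery steps, assuming the two selection files have been restored from the backup archive into the current directory.

# debconf-set-selections debconf-selections   # restore debconf answers
# dpkg --set-selections <dpkg-selections.list # restore package selections
# apt-get dselect-upgrade                     # install packages per the restored selections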

For the set of data under a directory tree, the copy with "cp -a" provides the normal backup.

For the set of large non-overwritten static data under a directory tree such as the one under the "/var/cache/apt/packages/" directory, hardlinks with "cp -al" provide an alternative to the normal backup with efficient use of disk space.
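
For example, the following hedged commands make both kinds of copies; the paths are placeholders.

# cp -a  ./data ../backup/data          # normal copy
# cp -al ./data ../backup/data.hardlink # hardlinked copy, almost no extra disk space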

Here is a copy script, which I named bkup, for the data backup. This script copies all (non-VCS) files under the current directory to the dated directory on the parent directory or on a remote host.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
fdot(){ find . -type d \( -iname ".?*" -o -iname "CVS" \) -prune -o -print0;}
fall(){ find . -print0;}
mkdircd(){ mkdir -p "$1";chmod 700 "$1";cd "$1">/dev/null;}
FIND="fdot";OPT="-a";MODE="CPIOP";HOST="localhost";EXTP="$(hostname -f)"
BKUP="$(basename $(pwd)).bkup";TIME="$(date  +%Y%m%d-%H%M%S)";BU="$BKUP/$TIME"
while getopts gcCsStrlLaAxe:h:T f; do case $f in
g)  MODE="GNUCP";; # cp (GNU)
c)  MODE="CPIOP";; # cpio -p
C)  MODE="CPIOI";; # cpio -i
s)  MODE="CPIOSSH";; # cpio/ssh
t)  MODE="TARSSH";; # tar/ssh
r)  MODE="RSYNCSSH";; # rsync/ssh
l)  OPT="-alv";; # hardlink (GNU cp)
L)  OPT="-av";;  # copy (GNU cp)
a)  FIND="fall";; # find all
A)  FIND="fdot";; # find non CVS/ .???/
x)  set -x;; # trace
e)  EXTP="${OPTARG}";; # hostname -f
h)  HOST="${OPTARG}";; # user@remotehost.example.com
T)  MODE="TEST";; # test find mode
\?) echo "use -x for trace."
esac; done
shift $(expr $OPTIND - 1)
if [ $# -gt 0 ]; then
  for x in $@; do cp $OPT $x $x.$TIME; done
elif [ $MODE = GNUCP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU";cp $OPT . "../$BU/"
elif [ $MODE = CPIOP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU"
  $FIND|cpio --null --sparse -pvd ../$BU
elif [ $MODE = CPIOI ]; then
  $FIND|cpio -ov --null | ( mkdircd "../$BU"&&cpio -i )
elif [ $MODE = CPIOSSH ]; then
  $FIND|cpio -ov --null|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&&cpio -i )"
elif [ $MODE = TARSSH ]; then
  (tar cvf - . )|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&& tar xvfp - )"
elif [ $MODE = RSYNCSSH ]; then
  rsync -aHAXSv ./ "${HOST}:${EXTP}-${BKUP}-${TIME}"
else
  echo "Any other idea to backup?"
  $FIND |xargs -0 -n 1 echo
fi

These are meant to be command examples. Please read the script and edit it yourself before using it.

[Tip] Tip

I keep this bkup in my "/usr/local/bin/" directory. I issue this bkup command without any option in the working directory whenever I need a temporary snapshot backup.

[Tip] Tip

For making a snapshot history of a source file tree or a configuration file tree, it is easier and more space-efficient to use git(7) (see Section 10.6.5, “Git for recording configuration history”).

The data security infrastructure is provided by the combination of a data encryption tool, a message digest tool, and a signature tool.


See Section 9.8, “Data encryption tips” on dm-crypt and eCryptfs, which implement automatic data encryption infrastructure via Linux kernel modules.

Here are GNU Privacy Guard commands for the basic key management.
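
For example, here is a hedged sketch of typical key management commands; the address "user@example.com" is a placeholder.

$ gpg --gen-key                      # generate a new key pair
$ gpg --list-keys                    # list keys in the public keyring
$ gpg --fingerprint user@example.com # show the key fingerprint
$ gpg --edit-key user@example.com    # sign, trust, … interactively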


Here is the meaning of the trust code.


The following uploads my key "1DD8D791" to the popular keyserver "hkp://keys.gnupg.net".

$ gpg --keyserver hkp://keys.gnupg.net --send-keys 1DD8D791

A good default keyserver set up in "~/.gnupg/gpg.conf" (or old location "~/.gnupg/options") contains the following.

keyserver hkp://keys.gnupg.net

The following obtains unknown keys from the keyserver.

$ gpg --list-sigs --with-colons | grep '^sig.*\[User ID not found\]' |\
  cut -d ':' -f 5| sort | uniq | xargs gpg --recv-keys

There was a bug in the OpenPGP Public Key Server (pre version 0.9.6) which corrupted keys with more than 2 sub-keys. The newer gnupg (>1.2.1-2) package can handle these corrupted sub-keys. See gpg(1) under the "--repair-pks-subkey-bug" option.

md5sum(1) provides a utility to make a digest file using the method in RFC 1321 and to verify each file with it.

$ md5sum foo bar >baz.md5
$ cat baz.md5
d3b07384d113edec49eaa6238ad5ff00  foo
c157a79031e1c40f85931829bc5fc552  bar
$ md5sum -c baz.md5
foo: OK
bar: OK
[Note] Note

The computation for the MD5 sum is less CPU intensive than the one for the cryptographic signature by GNU Privacy Guard (GnuPG). Usually, only the top level digest file is cryptographically signed to ensure data integrity.
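
For example, a hedged sketch of signing the top level digest file created above:

$ gpg --clearsign baz.md5  # create "baz.md5.asc" with an inline signature
$ gpg --verify baz.md5.asc # verify the signature later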

There are many merge tools for the source code. The following commands caught my eye.

Table 10.10. List of source code merge tools

package popcon size command description
diffutils V:874, I:987 1574 diff(1) compare files line by line
diffutils V:874, I:987 1574 diff3(1) compare and merge three files line by line
vim V:119, I:395 2799 vimdiff(1) compare 2 files side by side in vim
patch V:115, I:779 243 patch(1) apply a diff file to an original
dpatch V:1, I:13 191 dpatch(1) manage series of patches for Debian package
diffstat V:17, I:179 69 diffstat(1) produce a histogram of changes by the diff
patchutils V:19, I:173 223 combinediff(1) create a cumulative patch from two incremental patches
patchutils V:19, I:173 223 dehtmldiff(1) extract a diff from an HTML page
patchutils V:19, I:173 223 filterdiff(1) extract or exclude diffs from a diff file
patchutils V:19, I:173 223 fixcvsdiff(1) fix diff files created by CVS that patch(1) mis-interprets
patchutils V:19, I:173 223 flipdiff(1) exchange the order of two patches
patchutils V:19, I:173 223 grepdiff(1) show which files are modified by a patch matching a regex
patchutils V:19, I:173 223 interdiff(1) show differences between two unified diff files
patchutils V:19, I:173 223 lsdiff(1) show which files are modified by a patch
patchutils V:19, I:173 223 recountdiff(1) recompute counts and offsets in unified context diffs
patchutils V:19, I:173 223 rediff(1) fix offsets and counts of a hand-edited diff
patchutils V:19, I:173 223 splitdiff(1) separate out incremental patches
patchutils V:19, I:173 223 unwrapdiff(1) demangle patches that have been word-wrapped
wiggle V:0, I:0 174 wiggle(1) apply rejected patches
quilt V:3, I:38 785 quilt(1) manage series of patches
meld V:17, I:42 2942 meld(1) compare and merge files (GTK)
dirdiff V:0, I:2 161 dirdiff(1) display differences and merge changes between directory trees
docdiff V:0, I:0 573 docdiff(1) compare two files word by word / char by char
imediff V:0, I:0 157 imediff(1) interactive full screen 2/3-way merge tool
makepatch V:0, I:0 102 makepatch(1) generate extended patch files
makepatch V:0, I:0 102 applypatch(1) apply extended patch files
wdiff V:8, I:77 644 wdiff(1) display word differences between text files

Here is a summary of the version control systems (VCS) on the Debian system.

[Note] Note

If you are new to VCS systems, you should start learning with Git, which is growing fast in popularity.


VCS is sometimes known as revision control system (RCS), or software configuration management (SCM).

Distributed VCS such as Git is the tool of choice these days. CVS and Subversion may still be useful to join some existing open source program activities.

Debian provides free Git services via the Debian Salsa service. Its documentation can be found at https://wiki.debian.org/Salsa .

[Caution] Caution

Debian has closed its old alioth services and the old alioth service data are available at alioth-archive as tarballs.

There are few basics for creating a shared access VCS archive.

Here is an oversimplified comparison of native VCS commands to provide the big picture. The typical command sequence may require options and arguments.


[Caution] Caution

Invoking a git subcommand directly as "git-xyz" from the command line has been deprecated since early 2006.

[Tip] Tip

If there is an executable file git-foo in the path specified by "$PATH", entering "git foo" without the hyphen on the command line invokes this git-foo. This is a feature of the git command.

[Tip] Tip

GUI tools such as tkcvs(1) and gitk(1) really help you with tracking the revision history of files. The web interface provided by many public archives for browsing their repositories is also quite useful.

[Tip] Tip

Git can work directly with different VCS repositories such as ones provided by CVS and Subversion, and provides the local repository for local changes with git-cvs and git-svn packages. See git for CVS users, and Section 10.6.4, “Git for the Subversion repository”.

[Tip] Tip

Git has commands which have no equivalents in CVS and Subversion: "fetch", "rebase", "cherry-pick", …

Git can do everything for both local and remote source code management. This means that you can record the source code changes without needing network connectivity to the remote repository.
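
For example, here is a hedged sketch of the basic cycle; the repository URL and filename are placeholders.

$ git clone https://example.org/project.git
$ cd project
$ git add changed_file
$ git commit -m "describe change" # recorded locally, even offline
$ git push                        # publish when the network is available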

See the following.

The git-gui(1) and gitk(1) commands make using Git very easy.

[Warning] Warning

Do not use a tag string with spaces in it even if some tools such as gitk(1) allow you to use it. It may choke some other git commands.

Even if your upstream uses a different VCS, it may be a good idea to use git(1) for local activity, since you can manage your local copy of the source tree without a network connection to the upstream. Here are some packages and commands used with git(1).


[Tip] Tip

With git(1), you work on a local branch with many commits and use something like "git rebase -i master" to reorganize the change history later. This enables you to make a clean change history. See git-rebase(1) and git-cherry-pick(1).
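
A minimal sketch of this flow; the branch name "topic" is a placeholder.

$ git checkout -b topic master # work on a local topic branch
...
$ git rebase -i master         # squash and reorder the commits into a clean history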

[Tip] Tip

When you want to go back to a clean working directory without losing the current state of the working directory, you can use "git stash". See git-stash(1).
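
For example:

$ git stash     # save the current state and clean the working directory
...
$ git stash pop # restore the saved state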

You can manually record chronological history of configuration using Git tools. Here is a simple example for your practice to record "/etc/apt/" contents.

$ cd /etc/apt/
$ sudo git init
$ sudo chmod 700 .git
$ sudo git add .
$ sudo git commit -a

Commit the configuration with a description.

Make modifications to the configuration files.

$ cd /etc/apt/
$ sudo git commit -a

Commit the configuration with a description and continue your life.

$ cd /etc/apt/
$ sudo gitk --all

You have full configuration history with you.

[Note] Note

sudo(8) is needed to work with any file permissions of configuration data. For user configuration data, you may skip sudo.

[Note] Note

The "chmod 700 .git" command in the above example is needed to protect archive data from unauthorized read access.

[Tip] Tip

For more complete setup for recording configuration history, please look for the etckeeper package: Section 9.2.10, “Recording changes in configuration files”.

CVS is an older version control system, predating Subversion and Git.

[Caution] Caution

Many URLs found in the CVS examples below don't exist any more.

See the following.

  • cvs(1)

  • "/usr/share/doc/cvs/html-cvsclient"

  • "/usr/share/doc/cvs/html-info"

  • "/usr/share/doc/cvsbook"

  • "info cvs"

Many public CVS servers provide read-only remote access with the account name "anonymous" via the pserver service. For example, the Debian web site contents were maintained by the webwml project via CVS at the Debian alioth service. The following was used to set up "$CVSROOT" for the remote access to this old CVS repository.

$ export CVSROOT=:pserver:anonymous@anonscm.debian.org:/cvs/webwml
$ cvs login
[Note] Note

Since pserver is prone to eavesdropping attacks and insecure, write access is usually disabled by server administrators.

The following was used to set up "$CVS_RSH" and "$CVSROOT" for the remote access to the old CVS repository by webwml project with SSH.

$ export CVS_RSH=ssh
$ export CVSROOT=:ext:account@cvs.alioth.debian.org:/cvs/webwml

You can also use public key authentication for SSH which eliminates the remote password prompt.

Here is an example of a typical work flow using CVS.

Check all available modules from the CVS project pointed to by "$CVSROOT" by the following.

$ cvs rls
CVSROOT
module1
module2
...

Checkout "module1" to its default directory "./module1" by the following.

$ cd ~/path/to
$ cvs co module1
$ cd module1

Make changes to the content as needed.

Check changes with the equivalent of "diff -u [repository] [local]" by the following.

$ cvs diff -u

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from CVS by the following.

$ cvs up -C file_to_undo

Save the updated local source tree to CVS by the following.

$ cvs ci -m "Describe change"

Create and add "file_to_add" file to CVS by the following.

$ vi file_to_add
$ cvs add file_to_add
$ cvs ci -m "Added file_to_add"

Merge the latest version from CVS by the following.

$ cvs up -d

Watch out for lines starting with "C filename" which indicate conflicting changes.

Look for unmodified code in ".#filename.version".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.

Edit files to fix conflicts as needed.

Add a release tag "Release-1" by the following.

$ cvs ci -m "last commit for Release-1"
$ cvs tag Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ cvs tag -d Release-1

Check in changes to CVS by the following.

$ cvs ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" to updated CVS HEAD of main by the following.

$ cvs tag Release-1

Create a branch with a sticky branch tag "Release-initial-bugfixes" from the original version pointed to by the tag "Release-initial" and check it out to the "~/path/to/old" directory by the following.

$ cvs rtag -b -r Release-initial Release-initial-bugfixes module1
$ cd ~/path/to
$ cvs co -r Release-initial-bugfixes -d old module1
$ cd old
[Tip] Tip

Use "-D 2005-12-20" (ISO 8601 date format) instead of "-r Release-initial" to specify particular date as the branch point.

Work on this local source tree having the sticky tag "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch while creating new directories as needed by the following.

$ cvs up -d

Edit files to fix conflicts as needed.

Check in changes to CVS by the following.

$ cvs ci -m "checked into this branch"

Update the local tree to HEAD of main while removing the sticky tag ("-A") and without keyword expansion ("-kk") by the following.

$ cvs up -d -kk -A

Update the local tree (content = HEAD of main) by merging from the "Release-initial-bugfixes" branch and without keyword expansion by the following.

$ cvs up -d -kk -j Release-initial-bugfixes

Fix conflicts with editor.

Check in changes to CVS by the following.

$ cvs ci -m "merged Release-initial-bugfixes"

Make archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes
[Tip] Tip

"cvs up" command can take "-d" option to create new directories and "-P" option to prune empty directories.

[Tip] Tip

You can check out only a subdirectory of "module1" by providing its name as "cvs co module1/subdir".


Subversion is an older version control system, newer than CVS but older than Git. Unlike CVS and Git, it has no dedicated tag and branch objects; tags and branches are emulated by copying directories within the repository (see the work flow below).

You need to install the subversion, libapache2-mod-svn, and subversion-tools packages to set up a Subversion server.
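
A hedged sketch of creating the local repository used in the examples below; adjust ownership and permissions for your access policy.

$ sudo mkdir -p /srv/svn
$ sudo svnadmin create /srv/svn/project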

Here is an example of a typical work flow using Subversion with its native client.

[Tip] Tip

Client commands offered by the git-svn package may offer an alternative work flow for Subversion using the git command. See Section 10.6.4, “Git for the Subversion repository”.

Check all available modules from the Subversion project pointed to by the URL "file:///srv/svn/project" by the following.

$ svn list file:///srv/svn/project
module1
module2
...

Checkout "module1/trunk" to a directory "module1" by the following.

$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/trunk module1
$ cd module1

Make changes to the content as needed.

Check changes with the equivalent of "diff -u [repository] [local]" by the following.

$ svn diff

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from Subversion by the following.

$ svn revert file_to_undo

Save the updated local source tree to Subversion by the following.

$ svn ci -m "Describe change"

Create and add "file_to_add" file to Subversion by the following.

$ vi file_to_add
$ svn add file_to_add
$ svn ci -m "Added file_to_add"

Merge the latest version from Subversion by the following.

$ svn up

Watch out for lines starting with "C filename" which indicate conflicting changes.

Look for unmodified code in, e.g., "filename.r6", "filename.r9", and "filename.mine".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.

Edit files to fix conflicts as needed.

Add a release tag "Release-1" by the following.

$ svn ci -m "last commit for Release-1"
$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ svn rm file:///srv/svn/project/module1/tags/Release-1

Check in changes to Subversion by the following.

$ svn ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" from updated Subversion HEAD of trunk by the following.

$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Create a branch with a path "module1/branches/Release-initial-bugfixes" from the original version pointed to by the path "module1/tags/Release-initial" and check it out to the "~/path/to/old" directory by the following.

$ svn cp file:///srv/svn/project/module1/tags/Release-initial file:///srv/svn/project/module1/branches/Release-initial-bugfixes
$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/branches/Release-initial-bugfixes old
$ cd old
[Tip] Tip

Use "module1/trunk@{2005-12-20}" (ISO 8601 date format) instead of "module1/tags/Release-initial" to specify particular date as the branch point.

Work on this local source tree pointing to branch "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch by the following.

$ svn up

Edit files to fix conflicts as needed.

Check in changes to Subversion by the following.

$ svn ci -m "checked into this branch"

Update the local tree with HEAD of trunk by the following.

$ svn switch file:///srv/svn/project/module1/trunk

Update the local tree (content = HEAD of trunk) by merging from the "Release-initial-bugfixes" branch by the following.

$ svn merge file:///srv/svn/project/module1/branches/Release-initial-bugfixes

Fix conflicts with editor.

Check in changes to Subversion by the following.

$ svn ci -m "merged Release-initial-bugfixes"

Make archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes
[Tip] Tip

You can replace URLs such as "file:///…" by any other URL formats such as "http://…" and "svn+ssh://…".

[Tip] Tip

You can check out only a subdirectory of "module1" by providing its name as "svn co file:///srv/svn/project/module1/trunk/subdir module1/subdir", etc.