Repository Maintenance


Maintaining a Subversion repository can be daunting, mostly due to the complexities inherent in systems which have a database backend. Doing the task well is all about knowing the tools—what they are, when to use them, and how to use them. This section will introduce you to the repository administration tools provided by Subversion, and how to wield them to accomplish tasks such as repository data migration, upgrades, backups and cleanups.

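An Administrator's Toolkit

Subversion provides a handful of utilities useful for creating, inspecting, modifying, and repairing your repository. Let's look more closely at each of those tools. Afterward, we'll briefly examine some of the utilities included in the Berkeley DB distribution, which provide functionality specific to your repository's database backend that Subversion's own tools do not offer.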

svnadmin

The svnadmin program is the repository administrator's best friend. Besides providing the ability to create Subversion repositories, this program allows you to perform several maintenance operations on those repositories. The syntax of svnadmin is similar to that of other Subversion command-line programs:

$ svnadmin help
general usage: svnadmin SUBCOMMAND REPOS_PATH  [ARGS & OPTIONS ...]
Type 'svnadmin help <subcommand>' for help on a specific subcommand.
Type 'svnadmin --version' to see the program version and FS modules.

Available subcommands:
   crashtest
   create
   deltify
…

We've already mentioned svnadmin's create subcommand (see the section called "Creating the Repository"). Most of the others we will cover later in this chapter. And you can consult the section called "svnadmin" for a full rundown of subcommands and what each of them offers.

svnlook

svnlook is a tool provided by Subversion for examining the various revisions and transactions (which are revisions in-the-making) in a repository. No part of this program attempts to change the repository. svnlook is typically used by the repository hooks for reporting the changes that are about to be committed (in the case of the pre-commit hook) or that were just committed (in the case of the post-commit hook) to the repository. A repository administrator may use this tool for diagnostic purposes.

svnlook has a straightforward syntax:

$ svnlook help
general usage: svnlook SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...]
Note: any subcommand which takes the '--revision' and '--transaction'
      options will, if invoked without one of those options, act on
      the repository's youngest revision.
Type 'svnlook help <subcommand>' for help on a specific subcommand.
Type 'svnlook --version' to see the program version and FS modules.
…

Nearly every one of svnlook's subcommands can operate on either a revision or a transaction tree, printing information about the tree itself, or how it differs from the previous revision of the repository. You use the --revision (-r) and --transaction (-t) options to specify which revision or transaction, respectively, to examine. In the absence of both the --revision (-r) and --transaction (-t) options, svnlook will examine the youngest (or "HEAD") revision in the repository. So the following two commands do exactly the same thing when 19 is the youngest revision in the repository located at /path/to/repos:

$ svnlook info /path/to/repos
$ svnlook info /path/to/repos -r 19

The only exception to these rules about subcommands is the svnlook youngest subcommand, which takes no options, and simply prints out the repository's youngest revision number.

$ svnlook youngest /path/to/repos
19

Note

Keep in mind that the only transactions you can browse are uncommitted ones. Most repositories will have no such transactions, because transactions are usually either committed (in which case, you should access them as revisions with the --revision (-r) option) or aborted and removed.

The output of svnlook is designed to be both human- and machine-parsable. Take, as an example, the output of the info subcommand:

$ svnlook info /path/to/repos
sally
2002-11-04 09:29:13 -0600 (Mon, 04 Nov 2002)
27
Added the usual
Greek tree.

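The output of the info subcommand is defined as:

The author, followed by a newline
The date, followed by a newline
The length of the log message, followed by a newline
The log message itself, followed by a newline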

This output is human-readable, meaning items like the datestamp are displayed using a textual representation instead of something more obscure (such as the number of nanoseconds since the Tasty Freeze guy drove by). But the output is also machine-parsable—because the log message can contain multiple lines and be unbounded in length, svnlook provides the length of that message before the message itself. This allows scripts and other wrappers around this command to make intelligent decisions about the log message, such as how much memory to allocate for the message, or at least how many bytes to skip in the event that this output is not the last bit of data in the stream.
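The length-prefixed layout described above can be consumed from a script roughly as follows. This is a minimal sketch, not part of Subversion: the function name parse_svnlook_info is hypothetical, and it assumes the length line counts bytes and that the shell's read does not consume input past the end of each line.

```shell
#!/bin/sh
# Hypothetical helper: parse `svnlook info` output arriving on stdin.
# Assumed layout: author, datestamp, log message length, log message.
parse_svnlook_info() {
  IFS= read -r author
  IFS= read -r datestamp
  IFS= read -r length
  # Refuse anything that is not a plain non-negative integer.
  case "$length" in
    ''|*[!0-9]*) echo "unexpected length line: $length" >&2; return 1 ;;
  esac
  # Read exactly $length bytes, so a multi-line message stays intact
  # and any data following it in the stream is left untouched.
  message=$(dd bs=1 count="$length" 2>/dev/null)
  printf 'author=%s\ndate=%s\nlength=%s\n' "$author" "$datestamp" "$length"
  printf 'message=%s\n' "$message"
}
```

It would be used as svnlook info /path/to/repos | parse_svnlook_info. Reading a fixed number of bytes, rather than reading until end of input, is what lets a wrapper skip cleanly over the message when more data follows it.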

svnlook can perform a variety of other queries: displaying subsets of bits of information we've mentioned previously, recursively listing versioned directory trees, reporting which paths were modified in a given revision or transaction, showing textual and property differences made to files and directories, and so on. See the section called "svnlook" for a full reference of svnlook's features.

svndumpfilter

While it won't be the most commonly used tool at the administrator's disposal, svndumpfilter provides a very particular brand of useful functionality—the ability to quickly and easily modify streams of Subversion repository history data by acting as a path-based filter.

The syntax of svndumpfilter is as follows:

$ svndumpfilter help
general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...]
Type "svndumpfilter help <subcommand>" for help on a specific subcommand.
Type 'svndumpfilter --version' to see the program version.
  
Available subcommands:
   exclude
   include
   help (?, h)

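There are only two interesting subcommands. They let you choose whether paths are included in the stream explicitly or implicitly:

exclude: Filter out a set of paths from the dump data stream.
include: Allow only the requested set of paths to pass through the dump data stream.

You can learn more about these subcommands and svndumpfilter's unique purpose in the section called "Filtering Repository History".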

svnsync

The svnsync program, which is new to the 1.4 release of Subversion, provides all the functionality required for maintaining a read-only mirror of a Subversion repository. The program really has one job: to transfer one repository's versioned history into another repository. And while there are few ways to do that, its primary strength is that it can operate remotely: the "source" and "sink" ^[32] repositories may be on different computers from each other and from svnsync itself.

As you might expect, svnsync has a syntax that looks very much like every other program we've mentioned in this chapter:

$ svnsync help
general usage: svnsync SUBCOMMAND DEST_URL  [ARGS & OPTIONS ...]
Type 'svnsync help <subcommand>' for help on a specific subcommand.
Type 'svnsync --version' to see the program version and RA modules.

Available subcommands:
   initialize (init)
   synchronize (sync)
   copy-revprops
   help (?, h)
$

We talk more about replicating repositories with svnsync in the section called "Repository Replication".

Berkeley DB Utilities

If you're using a Berkeley DB repository, then all of your versioned filesystem's structure and data live in a set of database tables within the db/ subdirectory of your repository. This subdirectory is a regular Berkeley DB environment directory, and can therefore be used in conjunction with any of the Berkeley database tools, typically provided as part of the Berkeley DB distribution.

For day-to-day Subversion use, these tools are unnecessary. Most of the functionality typically needed for Subversion repositories is available through svnadmin. For example, svnadmin list-unused-dblogs and svnadmin list-dblogs perform a subset of what is provided by the Berkeley db_archive command, and svnadmin recover reflects the common use cases of the db_recover utility.

However, there are still a few Berkeley DB utilities that you might find useful. The db_dump and db_load programs write and read, respectively, a custom file format which describes the keys and values in a Berkeley DB database. Since Berkeley databases are not portable across machine architectures, this format is a useful way to transfer those databases from machine to machine, irrespective of architecture or operating system. As we describe later in this chapter, you can also use svnadmin dump and svnadmin load for similar purposes, but db_dump and db_load can do certain jobs just as well and much faster. They can also be useful if the experienced Berkeley DB hacker needs to do in-place tweaking of the data in a BDB-backed repository for some reason, which is something Subversion's utilities won't allow. Also, the db_stat utility can provide useful information about the status of your Berkeley DB environment, including detailed statistics about the locking and storage subsystems.

For more information on the Berkeley DB tool chain, visit the documentation section of the Berkeley DB section of Oracle's website, located at http://www.oracle.com/technology/documentation/berkeley-db/db/.

Commit Log Message Correction

Sometimes a user will have an error in her log message (a misspelling or some misinformation, perhaps). If the repository is configured (using the pre-revprop-change hook; see the section called "Implementing Repository Hooks") to accept changes to this log message after the commit is finished, then the user can "fix" her log message remotely using the svn program's propset command (see svn propset). However, because of the potential to lose information forever, Subversion repositories are not, by default, configured to allow changes to unversioned properties, except by an administrator.

If a log message needs to be changed by an administrator, this can be done using svnadmin setlog. This command changes the log message (the svn:log property) on a given revision of a repository, reading the new value from a provided file.
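A pre-revprop-change hook that permits only log message edits might look like the sketch below. It assumes the hook receives the repository path, revision, user, property name, and an action code (A, M, or D) as its five arguments; the function name allow_revprop_change is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-revprop-change policy: allow modifying svn:log only.
# Assumed argument order: REPOS-PATH REVISION USER PROPNAME ACTION
allow_revprop_change() {
  propname="$4"
  action="$5"   # A = added, M = modified, D = deleted
  if [ "$action" = "M" ] && [ "$propname" = "svn:log" ]; then
    return 0
  fi
  echo "Only modifications to svn:log are permitted" >&2
  return 1
}
```

In an installed hook script, the file would end by calling allow_revprop_change "$@" and exiting with its status. Because the old property value is not versioned, a stricter policy than this one may be appropriate for some sites.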

$ echo "Here is the new, correct log message" > newlog.txt
$ svnadmin setlog myrepos newlog.txt -r 388

The svnadmin setlog command, by default, is still bound by the same protections against modifying unversioned properties as a remote client is—the pre- and post-revprop-change hooks are still triggered, and therefore must be set up to accept changes of this nature. But an administrator can get around these protections by passing the --bypass-hooks option to the svnadmin setlog command.

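Warning

Remember, though, that by bypassing the hooks, you are likely also bypassing things such as email notifications of property changes, backup systems that track unversioned property changes, and so on. In other words, be very careful about what you are changing, and how you change it.

Managing Disk Space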

While the cost of storage has dropped incredibly in the past few years, disk usage is still a valid concern for administrators seeking to version large amounts of data. Every bit of version history information stored in the live repository needs to be backed up elsewhere, perhaps multiple times as part of rotating backup schedules. It is useful to know what pieces of Subversion's repository data need to remain on the live site, which need to be backed up, and which can be safely removed.

How Subversion saves disk space

To keep the repository small, Subversion uses deltification (or, "deltified storage") within the repository itself. Deltification involves encoding the representation of a chunk of data as a collection of differences against some other chunk of data. If the two pieces of data are very similar, this deltification results in storage savings for the deltified chunk: rather than taking up space equal to the size of the original data, it takes up only enough space to say, "I look just like this other piece of data over here, except for the following couple of changes". The result is that most of the repository data that tends to be bulky (namely, the contents of versioned files) is stored at a much smaller size than the original "fulltext" representation of that data. And for repositories created with Subversion 1.4 or later, the space savings are even better, because those fulltext representations of file contents are themselves compressed.

Note
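The idea behind deltification can be illustrated outside of Subversion with ordinary tools. The sketch below uses diff(1) as a stand-in for a delta encoder and compares the size of a full text with the size of its difference from a near-identical predecessor. It is an illustration only: Subversion uses its own binary differencing, and the function name delta_demo is hypothetical.

```shell
#!/bin/sh
# Illustration only: diff(1) stands in for a delta encoder here.
delta_demo() {
  workdir=$(mktemp -d) || return 1
  # Version 1: 200 short lines. Version 2: the same, with line 100 edited.
  awk 'BEGIN { for (i = 1; i <= 200; i++) print "line", i }' > "$workdir/v1"
  sed '100s/$/ changed/' "$workdir/v1" > "$workdir/v2"
  # diff exits with status 1 when the files differ, which is expected.
  diff "$workdir/v1" "$workdir/v2" > "$workdir/delta" || true
  full=$(( $(wc -c < "$workdir/v2") ))
  delta=$(( $(wc -c < "$workdir/delta") ))
  echo "fulltext=$full delta=$delta"
  rm -rf "$workdir"
}
```

Running delta_demo prints the two byte counts; the delta is a small fraction of the full text because only one line differs between the versions.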

Because all of the data that is subject to deltification in a BDB-backed repository is stored in a single Berkeley DB database file, reducing the size of the stored values will not immediately reduce the size of the database file itself. Berkeley DB will, however, keep internal records of unused areas of the database file, and consume those areas first before growing the size of the database file. So while deltification doesn't produce immediate space savings, it can drastically slow future growth of the database.

Removing dead transactions

Though they are uncommon, there are circumstances in which a Subversion commit process might fail, leaving behind in the repository the remnants of the revision-to-be that wasn't: an uncommitted transaction and all the file and directory changes associated with it. This could happen for several reasons: perhaps the client operation was inelegantly terminated by the user, or a network failure occurred in the middle of an operation. Regardless of the reason, dead transactions can happen. They don't do any real harm, other than consuming disk space. A fastidious administrator may nonetheless wish to remove them.

You can use svnadmin's lstxns command to list the names of the currently outstanding transactions.

$ svnadmin lstxns myrepos
19
3a1
a45
$

Each item in the resultant output can then be used with svnlook (and its --transaction (-t) option) to determine who created the transaction, when it was created, what types of changes were made in the transaction—information that is helpful in determining whether or not the transaction is a safe candidate for removal! If you do indeed want to remove a transaction, its name can be passed to svnadmin rmtxns, which will perform the cleanup of the transaction. In fact, the rmtxns subcommand can take its input directly from the output of lstxns!

$ svnadmin rmtxns myrepos `svnadmin lstxns myrepos`
$

If you use these two subcommands like this, you should consider making your repository temporarily inaccessible to clients. That way, no one can begin a legitimate transaction before you start your cleanup. Example 5.1, "txn-info.sh (Reporting Outstanding Transactions)" contains a bit of shell-scripting that can quickly generate information about each outstanding transaction in your repository.

Example 5.1. txn-info.sh (Reporting Outstanding Transactions)

#!/bin/sh

### Generate informational output for all outstanding transactions in
### a Subversion repository.

REPOS="${1}"
if [ -z "${REPOS}" ] ; then
  echo "usage: $0 REPOS_PATH" >&2
  exit 1
fi

for TXN in `svnadmin lstxns "${REPOS}"`; do
  echo "---[ Transaction ${TXN} ]-------------------------------------------"
  svnlook info "${REPOS}" -t "${TXN}"
done

The output of the script is basically a concatenation of several chunks of svnlook info output (see the section called "svnlook"), and will look something like:

$ txn-info.sh myrepos
---[ Transaction 19 ]-------------------------------------------
sally
2001-09-04 11:57:19 -0500 (Tue, 04 Sep 2001)
0
---[ Transaction 3a1 ]-------------------------------------------
harry
2001-09-10 16:50:30 -0500 (Mon, 10 Sep 2001)
39
Trying to commit over a faulty network.
---[ Transaction a45 ]-------------------------------------------
sally
2001-09-12 11:09:28 -0500 (Wed, 12 Sep 2001)
0
$

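A long-abandoned transaction usually represents some sort of failed or interrupted commit. A transaction's datestamp can provide useful information: for example, how likely is it that an operation begun nine months ago is still active?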

In short, transaction cleanup decisions need not be made unwisely. Various sources of information—including Apache's error and access logs, Subversion's operational logs, Subversion revision history, and so on—can be employed in the decision-making process. And of course, an administrator can often simply communicate with a seemingly dead transaction's owner (via email, for example) to verify that the transaction is, in fact, in a zombie state.
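The datestamp check can be scripted as a first pass before any human review. The sketch below only lists candidates and removes nothing. It assumes the second line of svnlook info output begins with a YYYY-MM-DD date, compares dates as plain strings, and ignores time zones; both function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the datestamp (line 2 of `svnlook info` output on stdin)
# is earlier than the cutoff date given as YYYY-MM-DD.
txn_older_than() {
  awk -v cutoff="$1" '
    NR == 2 { older = ((substr($0, 1, 10) "") < (cutoff "")) }
    END     { exit !older }'
}

# Lists outstanding transactions in repository $1 that started before
# the cutoff date $2. Candidates are printed only, never removed.
list_old_txns() {
  for txn in $(svnadmin lstxns "$1"); do
    if svnlook info "$1" -t "$txn" | txn_older_than "$2"; then
      echo "$txn"
    fi
  done
}
```

For example, list_old_txns myrepos 2002-01-01 would print the names of transactions begun before that date, which can then be checked against logs or with their owners before anything is passed to svnadmin rmtxns.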

Purging unused Berkeley DB logfiles

Until recently, the largest offender of disk space usage with respect to BDB-backed Subversion repositories was the log files in which Berkeley DB performs its pre-writes before modifying the actual database files. These files capture all the actions taken along the route of changing the database from one state to another—while the database files, at any given time, reflect a particular state, the log files contain all the many changes along the way between states. Thus, they can grow and accumulate quite rapidly.

Fortunately, beginning with the 4.2 release of Berkeley DB, the database environment has the ability to remove its own unused log files automatically. Any repositories created using an svnadmin which is compiled against Berkeley DB version 4.2 or greater will be configured for this automatic log file removal. If you don't want this feature enabled, simply pass the --bdb-log-keep option to the svnadmin create command. If you forget to do this, or change your mind at a later time, simply edit the DB_CONFIG file found in your repository's db directory, comment out the line which contains the set_flags DB_LOG_AUTOREMOVE directive, and then run svnadmin recover on your repository to force the configuration changes to take effect. See the section called "Berkeley DB Configuration" for more information about database configuration.

Without some sort of automatic log file removal in place, log files will accumulate as you use your repository. This is actually something of a feature of the database system: you should be able to recreate your entire database using nothing but the log files, so these files can be useful for catastrophic database recovery. But typically, you'll want to archive the log files that are no longer in use by Berkeley DB, and then remove them from disk to conserve space. Use the svnadmin list-unused-dblogs command to list the unused log files:

$ svnadmin list-unused-dblogs /path/to/repos
/path/to/repos/log.0000000031
/path/to/repos/log.0000000032
/path/to/repos/log.0000000033
…
$ rm `svnadmin list-unused-dblogs /path/to/repos`
## disk space reclaimed!

Warning

BDB-backed repositories whose log files are used as part of a backup or disaster recovery plan should not make use of the log file autoremoval feature. Reconstruction of a repository's data from log files can only be accomplished when all the log files are available. If some of the log files are removed from disk before the backup system has a chance to copy them elsewhere, the incomplete set of backed-up log files is essentially useless.
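When the log files are part of a recovery plan, archiving and removal can be combined so that a file is deleted only after its copy succeeds. The following is a sketch under stated assumptions: the function name archive_unused_logs and the archive path in the usage note are hypothetical, and paths are read one per line from standard input.

```shell
#!/bin/sh
# Reads log file paths on stdin (one per line), copies each into the
# archive directory named by $1, and removes the original only if the
# copy succeeded. Stops at the first failed copy.
archive_unused_logs() {
  archive_dir="$1"
  mkdir -p "$archive_dir" || return 1
  while IFS= read -r logfile; do
    [ -n "$logfile" ] || continue
    if cp -p "$logfile" "$archive_dir/"; then
      rm -f "$logfile"
    else
      echo "could not archive $logfile; leaving it in place" >&2
      return 1
    fi
  done
}
```

It would be fed from the listing command, for instance svnadmin list-unused-dblogs /path/to/repos | archive_unused_logs /var/backups/repos-logs.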

Berkeley DB Recovery

As mentioned in the section called "Berkeley DB", a Berkeley DB repository can sometimes be left in a frozen state if not closed properly. When this happens, an administrator needs to rewind the database back into a consistent state. This is unique to BDB-backed repositories, though; if you are using FSFS-backed ones instead, this won't apply to you. And for those of you using Subversion 1.4 with Berkeley DB 4.4 or better, you should find that Subversion has become much more resilient in these types of situations. Still, wedged Berkeley DB repositories do occur, and an administrator needs to know how to safely deal with this circumstance.

In order to protect the data in your repository, Berkeley DB uses a locking mechanism. This mechanism ensures that portions of the database are not simultaneously modified by multiple database accessors, and that each process sees the data in the correct state when that data is being read from the database. When a process needs to change something in the database, it first checks for the existence of a lock on the target data. If the data is not locked, the process locks the data, makes the change it wants to make, and then unlocks the data. Other processes are forced to wait until that lock is removed before they are permitted to continue accessing that section of the database. (This has nothing to do with the locks that you, as a user, can apply to versioned files within the repository; we try to clear up the confusion caused by this terminology collision in The three meanings of "lock".)

In the course of using your Subversion repository, fatal errors or interruptions can prevent a process from having the chance to remove the locks it has placed in the database. The result is that the back-end database system gets "wedged". When this happens, any attempts to access the repository hang indefinitely (since each new accessor is waiting for a lock to go away, which isn't going to happen).

If this happens to your repository, don't panic. The Berkeley DB filesystem takes advantage of database transactions and checkpoints and pre-write journaling to ensure that only the most catastrophic of events ^[33] can permanently destroy a database environment. A sufficiently paranoid repository administrator will have made off-site backups of the repository data in some fashion, but don't head off to the tape backup storage closet just yet.

Instead, use the following recipe to attempt to "unwedge" your repository:

Make sure that there are no processes accessing (or attempting to access) the repository. For networked repositories, this means shutting down the Apache HTTP Server or svnserve daemon, too.
Become the user who owns and manages the repository. This is important, because recovering a repository as the wrong user (just like running one as the wrong user) can alter the permissions of the repository's files, leaving the repository inaccessible even after it has been "unwedged".
Run the command svnadmin recover /path/to/repos. You should see output like this:
```
Repository lock acquired.
Please wait; recovering the repository may take some time...

Recovery completed.
The latest repos revision is 19.
```
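This command may take several minutes to complete.
Restart the server process.

This procedure fixes almost every case of repository lock-up. Make sure that you run this command as the user that owns and manages the database, not just as root. Part of the recovery process might involve recreating from scratch various database files (shared memory regions, for example). Recovering as root will create those files such that they are owned by root, which means that even after you restore connectivity to your repository, regular users will be unable to access it.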
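The ownership precaution in the recipe above can be enforced by a small wrapper that refuses to run recovery as the wrong user. This is a sketch, not a Subversion feature: the function names owns_repos_db and recover_repos are hypothetical, and it assumes the owner of the db subdirectory is the user who should perform the recovery.

```shell
#!/bin/sh
# Succeeds only if the current user owns the db directory of the
# repository at $1. Numeric user IDs are compared, so the check does
# not depend on account names being resolvable.
owns_repos_db() {
  owner_uid=$(ls -ldn "$1/db" | awk '{ print $3 }')
  [ -n "$owner_uid" ] && [ "$owner_uid" = "$(id -u)" ]
}

# Runs recovery only after the ownership check passes.
recover_repos() {
  if ! owns_repos_db "$1"; then
    echo "refusing to recover $1: current user does not own $1/db" >&2
    return 1
  fi
  svnadmin recover "$1"
}
```

Calling recover_repos /path/to/repos as root on a repository owned by another account would then stop with an error instead of leaving root-owned files behind.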

If the previous procedure, for some reason, does not successfully unwedge your repository, you should do two things. First, move your broken repository directory aside (perhaps by renaming it to something like repos.BROKEN) and then restore your latest backup of it. Then, send an email to the Subversion user list (at <users@subversion.tigris.org>) describing your problem in detail. Data integrity is an extremely high priority to the Subversion developers.

Migrating Repository Data Elsewhere

A Subversion filesystem has its data spread throughout files in the repository, in a fashion generally understood by (and of interest to) only the Subversion developers themselves. However, circumstances may arise that call for all, or some subset, of that data to be copied or moved into another repository.

Subversion provides such functionality by way of repository dump streams. A repository dump stream (often referred to as a "dumpfile" when stored as a file on disk) is a portable, flat file format that describes the various revisions in your repository: what was changed, by whom, when, and so on. This dump stream is the primary mechanism used to marshal versioned history (in whole or in part, with or without modification) between repositories. And Subversion provides the tools necessary for creating and loading these dump streams: the svnadmin dump and svnadmin load subcommands, respectively.

Warning

While the Subversion repository dump format contains human-readable portions and a familiar structure (it resembles an RFC-822 format, the same type of format used for most email), it is not a plaintext file format. It is a binary file format, highly sensitive to meddling. For example, many text editors will corrupt the file by automatically converting line endings.

There are many reasons for dumping and loading Subversion repository data. Early in Subversion's life, the most common reason was due to the evolution of Subversion itself. As Subversion matured, there were times when changes made to the back-end database schema caused compatibility issues with previous versions of the repository, so users had to dump their repository data using the previous version of Subversion, and load it into a freshly created repository with the new version of Subversion. Now, these types of schema changes haven't occurred since Subversion's 1.0 release, and the Subversion developers promise not to force users to dump and load their repositories when upgrading between minor versions (such as from 1.3 to 1.4) of Subversion. But there are still other reasons for dumping and loading, including re-deploying a Berkeley DB repository on a new OS or CPU architecture, switching between the Berkeley DB and FSFS back-ends, or (as we'll cover in the section called "Filtering Repository History") purging versioned data from repository history.

Whatever your reason for migrating repository history, using the svnadmin dump and svnadmin load subcommands is straightforward. svnadmin dump will output a range of repository revisions that are formatted using Subversion's custom filesystem dump format. The dump format is printed to the standard output stream, while informative messages are printed to the standard error stream. This allows you to redirect the output stream to a file while watching the status output in your terminal window. For example:

$ svnlook youngest myrepos
26
$ svnadmin dump myrepos > dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
…
* Dumped revision 25.
* Dumped revision 26.

At the end of the process, you will have a single file (dumpfile in the previous example) that contains all the data stored in your repository in the requested range of revisions. Note that svnadmin dump is reading revision trees from the repository just like any other "reader" process would (svn checkout, for example), so it's safe to run this command at any time.

The other subcommand in the pair, svnadmin load, parses the standard input stream as a Subversion repository dump file, and effectively replays those dumped revisions into the target repository. It also gives informative feedback, this time using the standard output stream:

$ svnadmin load newrepos < dumpfile
<<< Started new txn, based on original revision 1
     * adding path : A ... done.
     * adding path : A/B ... done.
     …
------- Committed new rev 1 (loaded from original rev 1) >>>

<<< Started new txn, based on original revision 2
     * editing path : A/mu ... done.
     * editing path : A/D/G/rho ... done.

------- Committed new rev 2 (loaded from original rev 2) >>>

…

<<< Started new txn, based on original revision 25
     * editing path : A/D/gamma ... done.

------- Committed new rev 25 (loaded from original rev 25) >>>

<<< Started new txn, based on original revision 26
     * adding path : A/Z/zeta ... done.
     * editing path : A/mu ... done.

------- Committed new rev 26 (loaded from original rev 26) >>>

The result of a load is new revisions added to a repository, the same thing you get by making commits against that repository from a regular Subversion client. And just as in a commit, you can use hook programs to perform actions before and after each of the commits made during a load process. By passing the --use-pre-commit-hook and --use-post-commit-hook options to svnadmin load, you can instruct Subversion to execute the pre-commit and post-commit hook programs, respectively, for each loaded revision. You might use these, for example, to ensure that loaded revisions pass through the same validation steps that regular commits pass through. Of course, you should use these options with care: if your post-commit hook sends emails to a mailing list for each new commit, you might not want to spew hundreds or thousands of commit emails in rapid succession at that list! You can read more about the use of hook scripts in the section called "Implementing Repository Hooks".

Because svnadmin uses the standard input and output streams for the repository dump and load process, the adventurous can try things like this (perhaps even using different versions of svnadmin on each side of the pipe):

$ svnadmin create newrepos
$ svnadmin dump oldrepos | svnadmin load newrepos

By default, the dump file will be quite large—much larger than the repository itself. That's because by default every version of every file is expressed as a full text in the dump file. This is the fastest and simplest behavior, and nice if you're piping the dump data directly into some other process (such as a compression program, filtering program, or into a loading process). But if you're creating a dump file for longer-term storage, you'll likely want to save disk space by using the --deltas option. With this option, successive revisions of files will be output as compressed, binary differences—just as file revisions are stored in a repository. This option is slower, but results in a dump file much closer in size to the original repository.

We mentioned previously that svnadmin dump outputs a range of revisions. Use the --revision (-r) option to specify a single revision to dump, or a range of revisions. If you omit this option, all the existing repository revisions will be dumped.

$ svnadmin dump myrepos -r 23 > rev-23.dumpfile
$ svnadmin dump myrepos -r 100:200 > revs-100-200.dumpfile

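As Subversion dumps each new revision, it outputs only enough information to allow a future loader to re-create that revision based on the previous one. In other words, for any given revision in the dump file, only the items that were changed in that revision will appear in the dump. The only exception to this rule is the first revision that is dumped with the current svnadmin dump command.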

By default, Subversion will not express the first dumped revision as merely differences to be applied to the previous revision. For one thing, there is no previous revision in the dump file! And secondly, Subversion cannot know the state of the repository into which the dump data will be loaded (if it ever is). To ensure that the output of each execution of svnadmin dump is self-sufficient, the first dumped revision is by default a full representation of every directory, file, and property in that revision of the repository.

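However, you can change this default behavior. If you add the --incremental option when you dump your repository, svnadmin will compare the first dumped revision against the previous revision in the repository, the same way it treats every other revision that gets dumped. It will then output the first revision exactly as it does the rest of the revisions in the dump range, mentioning only the changes that occurred in that revision. The benefit of this is that you can create several small dump files that can be loaded in succession, instead of one large one, like so: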

$ svnadmin dump myrepos -r 0:1000 > dumpfile1
$ svnadmin dump myrepos -r 1001:2000 --incremental > dumpfile2
$ svnadmin dump myrepos -r 2001:3000 --incremental > dumpfile3

These dump files could be loaded into a new repository with the following command sequence:

$ svnadmin load newrepos < dumpfile1
$ svnadmin load newrepos < dumpfile2
$ svnadmin load newrepos < dumpfile3

Another neat trick you can perform with this --incremental option involves appending to an existing dump file a new range of dumped revisions. For example, you might have a post-commit hook that simply appends the repository dump of the single revision that triggered the hook. Or you might have a script that runs nightly to append dump file data for all the revisions that were added to the repository since the last time the script ran. Used like this, svnadmin dump can be one way to back up changes to your repository over time in case of a system crash or some other catastrophic event.
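The nightly-append approach described above could be sketched as follows. The sketch assumes a small state file that records the last revision already dumped; the function names next_dump_range and append_new_revisions, and the state-file convention itself, are hypothetical rather than anything Subversion provides.

```shell
#!/bin/sh
# Prints the revision range still to be dumped, as LOWER:UPPER, given a
# state file ($1) holding the last revision already dumped and the
# youngest revision now in the repository ($2). Prints nothing when
# the dump file is already up to date.
next_dump_range() {
  if [ -f "$1" ]; then
    last=$(cat "$1")
  else
    last=-1   # nothing dumped yet, so start from revision 0
  fi
  if [ "$2" -gt "$last" ]; then
    echo "$((last + 1)):$2"
  fi
}

# Appends any new revisions of repository $1 to dump file $2, tracking
# progress in state file $3. The state file is updated only after the
# dump succeeds.
append_new_revisions() {
  youngest=$(svnlook youngest "$1") || return 1
  range=$(next_dump_range "$3" "$youngest")
  [ -n "$range" ] || return 0
  svnadmin dump "$1" -r "$range" --incremental >> "$2" || return 1
  echo "$youngest" > "$3"
}
```

A nightly job could then call append_new_revisions myrepos backup.dumpfile backup.state. Keeping the range calculation in its own function makes it easy to check by hand before trusting the job with real backups.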

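The dump format can also be used to merge the contents of several different repositories into a single repository. By using the --parent-dir option of svnadmin load, you can specify a new virtual root directory for the load process. That means if you have dump files for three repositories, say calc-dumpfile, cal-dumpfile, and ss-dumpfile, you can first create a new repository to hold them all: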

$ svnadmin create /path/to/projects
$

Then, make new directories in the repository which will hold the contents of each of the three previous repositories:

$ svn mkdir -m "Initial project roots" \
      file:///path/to/projects/calc \
      file:///path/to/projects/calendar \
      file:///path/to/projects/spreadsheet
Committed revision 1.
$

Lastly, load the individual dump files into their respective locations in the new repository:

$ svnadmin load /path/to/projects --parent-dir calc < calc-dumpfile
…
$ svnadmin load /path/to/projects --parent-dir calendar < cal-dumpfile
…
$ svnadmin load /path/to/projects --parent-dir spreadsheet < ss-dumpfile
…
$

We'll mention one final way to use the Subversion repository dump format: conversion from a different storage mechanism or version control system altogether. Because the dump file format is, for the most part, human-readable, it should be relatively easy to describe generic sets of changes (each of which should be treated as a new revision) using this file format. In fact, the cvs2svn utility (see the section called "Converting a Repository from CVS to Subversion") uses the dump format to represent the contents of a CVS repository so that those contents can be copied into a Subversion repository.

Filtering Repository History

Since Subversion stores your versioned history using, at the very least, binary differencing algorithms and data compression (optionally in a completely opaque database system), attempting manual tweaks is unwise, if not quite difficult, and at any rate strongly discouraged. And once data has been stored in your repository, Subversion generally doesn't provide an easy way to remove that data. ^[34] But inevitably, there will be times when you would like to manipulate the history of your repository. You might need to strip out all instances of a file that was accidentally added to the repository (and shouldn't be there for whatever reason). ^[35] Or, perhaps you have multiple projects sharing a single repository, and you decide to split them up into their own repositories. To accomplish tasks like this, administrators need a more manageable and malleable representation of the data in their repositories—the Subversion repository dump format.

As we described in the section called "Migrating Repository Data Elsewhere", the Subversion repository dump format is a human-readable representation of the changes that you've made to your versioned data over time. You use the svnadmin dump command to generate the dump data, and svnadmin load to populate a new repository with it. The great thing about the human-readability aspect of the dump format is that, if you are careful about it, you can manually inspect and modify it. Of course, the downside is that if you have three years' worth of repository activity encapsulated in what is likely to be a very large dump file, it could take you a long, long time to manually inspect and modify it.

That's where svndumpfilter becomes useful. This program acts as a path-based filter for repository dump streams. Simply give it either a list of paths you wish to keep, or a list of paths you wish to not keep, then pipe your repository dump data through this filter. The result will be a modified stream of dump data that contains only the versioned paths you (explicitly or implicitly) requested.

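Let's look at a realistic example of how you might use this program. We discuss elsewhere (see the section called "Planning Your Repository Organization") the process of deciding how to choose a layout for the data in your repositories: using one repository per project or combining them, arranging stuff within your repository, and so on. But sometimes, after new revisions start flying in, you rethink your layout and would like to make some changes. A common change is the decision to move multiple projects which are sharing a single repository into separate repositories for each project.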

Our imaginary repository contains three projects: calc, calendar, and spreadsheet. They have been living side-by-side in a layout like this:

/
   calc/
      trunk/
      branches/
      tags/
   calendar/
      trunk/
      branches/
      tags/
   spreadsheet/
      trunk/
      branches/
      tags/

To get these three projects into their own repositories, we first dump the whole repository:

$ svnadmin dump /path/to/repos > repos-dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
…
$

Next, run that dump file through the filter, each time including only one of our top-level directories, which results in three new dump files:

$ svndumpfilter include calc < repos-dumpfile > calc-dumpfile
…
$ svndumpfilter include calendar < repos-dumpfile > cal-dumpfile
…
$ svndumpfilter include spreadsheet < repos-dumpfile > ss-dumpfile
…
$

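At this point, you have to make a decision. Each of your dump files will create a valid repository, but will preserve the paths exactly as they were in the original repository. This means that even though you would have a repository solely for your calc project, that repository would still have a top-level directory named calc. If you want your trunk, tags, and branches directories to live in the root of your repository, you might wish to edit your dump files, tweaking the Node-path and Node-copyfrom-path headers so that they no longer have that first calc/ path component. Also, you'll want to remove the section of dump data that creates the calc directory. It will look something like this: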

Node-path: calc
Node-action: add
Node-kind: dir
Content-length: 0

Warning

If you do plan on manually editing the dump file to remove a top-level directory, make sure that your editor is not set to automatically convert end-of-line characters to the native format (e.g. \r\n to \n), as the content will then not agree with the metadata. This will render the dump file useless.
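
If hand-editing feels too error-prone, the path rewrite can be scripted. What follows is a minimal sketch, not a Subversion tool: strip_top_dir is a name of our own invention, and it assumes that the prefix contains no characters special to sed. It also assumes that no versioned file content contains a line beginning with Node-path: or Node-copyfrom-path: followed by that prefix. sed cannot tell headers from file content, so such a line would be rewritten too, and the content lengths recorded in the dump would no longer match. The sketch does not remove the section that adds the top-level directory itself; that still has to be deleted separately.

```shell
#!/bin/sh
# strip_top_dir PREFIX
# Reads a dump stream on stdin and writes it to stdout with the leading
# "PREFIX/" removed from Node-path and Node-copyfrom-path headers.
# Assumption: PREFIX has no characters that are special to sed.
strip_top_dir() {
    prefix="$1"
    # LC_ALL=C makes sed treat the stream as plain bytes, so binary file
    # content in the dump does not trip over locale handling.
    LC_ALL=C sed \
        -e "s|^Node-path: ${prefix}/|Node-path: |" \
        -e "s|^Node-copyfrom-path: ${prefix}/|Node-copyfrom-path: |"
}
```

A run such as strip_top_dir calc < calc-dumpfile > calc-dumpfile.stripped leaves the original file untouched, so the two can be compared before anything is loaded.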

All that remains now is to create your three new repositories, and load each dump file into the right repository:

$ svnadmin create calc; svnadmin load calc < calc-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : button.c ... done.
…
$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : cal.c ... done.
…
$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
<<< Started new transaction, based on original revision 1
     * adding path : Makefile ... done.
     * adding path : ss.c ... done.
…
$

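Both of svndumpfilter's subcommands accept options for deciding how to deal with "empty" revisions. If a given revision contained only changes to paths that were filtered out, that now-empty revision could be considered uninteresting or even unwanted. So to give the user control over what to do with those revisions, svndumpfilter provides the following command-line options: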

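--drop-empty-revs: Do not generate empty revisions at all; just omit them.
--renumber-revs: If empty revisions are dropped (using the --drop-empty-revs option), change the revision numbers of the remaining revisions so that there are no gaps in the numeric sequence.
--preserve-revprops: If empty revisions are not dropped, preserve the revision properties (log message, author, date, custom properties, and so on) for those empty revisions. Otherwise, empty revisions will contain only the original datestamp and a generated log message indicating that the revision was emptied by svndumpfilter.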

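While svndumpfilter can be very useful, and a huge timesaver, there are unfortunately a couple of gotchas. First, this utility is overly sensitive to path semantics. Pay attention to whether paths in your dump file are specified with or without leading slashes. You'll want to look at the Node-path and Node-copyfrom-path headers.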

…
Node-path: spreadsheet/Makefile
…

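If the paths have leading slashes, you should include leading slashes in the paths you pass to svndumpfilter include and svndumpfilter exclude (and if they don't, you shouldn't). Further, if your dump file has an inconsistent usage of leading slashes for some reason, ^[36] you should probably normalize those paths so they all have, or all lack, leading slashes.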
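
A quick way to find out which style a dump file uses is to count the path headers of each kind. The helper below is a sketch under our own naming (report_slash_style is not part of Subversion). It looks only at lines that begin with Node-path: or Node-copyfrom-path:, so file content that happens to look like such a header would be counted as well.

```shell
#!/bin/sh
# report_slash_style DUMPFILE
# Prints how many Node-path / Node-copyfrom-path headers start with "/"
# and how many do not. A dump with both counts above zero is inconsistent.
report_slash_style() {
    LC_ALL=C awk '
        $0 ~ "^Node-(copyfrom-)?path: " {
            if ($0 ~ "^Node-(copyfrom-)?path: /") { slashed++ } else { bare++ }
        }
        END { printf "with-slash=%d without-slash=%d\n", slashed, bare }
    ' "$1"
}
```

If the first count is zero, pass paths to svndumpfilter without a leading slash; if the second is zero, pass them with one.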

Also, copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. In order to make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path—including the contents of any files created by the copy—and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format only shows what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths, perhaps including the paths that served as sources of your troublesome copy operations, too.

Finally, svndumpfilter takes path filtering quite literally. If you are trying to copy the history of a project rooted at trunk/my-project and move it into a repository of its own, you would, of course, use the svndumpfilter include command to keep all the changes in and under trunk/my-project. But the resulting dump file makes no assumptions about the repository into which you plan to load this data. Specifically, the dump data might begin with the revision which added the trunk/my-project directory, but it will not contain directives which would create the trunk directory itself (because trunk doesn't match the include filter). You'll need to make sure that any directories which the new dump stream expects to exist actually do exist in the target repository before trying to load the stream into that repository.

Repository Replication

There are several scenarios in which it is quite handy to have a Subversion repository whose version history is exactly the same as some other repository's. Perhaps the most obvious one is the maintenance of a simple backup repository, used when the primary repository has become inaccessible due to a hardware failure, network outage, or other such annoyance. Other scenarios include deploying mirror repositories to distribute heavy Subversion load across multiple servers, use as a soft-upgrade mechanism, and so on.

As of version 1.4, Subversion provides a program for managing scenarios like these: svnsync. svnsync works by essentially asking the Subversion server to "replay" revisions, one at a time. It then uses that revision information to mimic a commit of the same to another repository. Neither repository needs to be locally accessible to the machine on which svnsync is running; its parameters are repository URLs, and it does all its work through Subversion's repository access (RA) interfaces. All it requires is read access to the source repository and read/write access to the destination repository.

Note

When using svnsync against a remote source repository, the Subversion server for that repository must be running Subversion version 1.4 or better.

Assuming you already have a source repository that you'd like to mirror, the next thing you need is an empty target repository which will actually serve as that mirror. This target repository can use either of the available filesystem data-store back-ends (see the section called "Choosing a Data Store"), but it must not yet have any version history in it. The protocol via which svnsync communicates revision information is highly sensitive to mismatches between the versioned histories contained in the source and target repositories. For this reason, while svnsync cannot demand that the target repository be read-only, ^[37] allowing the revision history in the target repository to change by any mechanism other than the mirroring process is a recipe for disaster.

Warning

Do not modify a mirror repository in such a way as to cause its version history to deviate from that of the repository it mirrors. The only commits and revision property modifications that ever occur on that mirror repository should be those performed by the svnsync tool.

Another requirement of the target repository is that the svnsync process be allowed to modify certain revision properties. svnsync stores its bookkeeping information in special revision properties on revision 0 of the destination repository. Because svnsync works within the framework of that repository's hook system, the default state of the repository (which is to disallow revision property changes; see pre-revprop-change) is insufficient. You'll need to explicitly implement the pre-revprop-change hook, and your script must allow svnsync to set and change its special properties. With those provisions in place, you are ready to start mirroring repository revisions.

Tip

It's a good idea to implement authorization measures which allow your repository replication process to perform its tasks while preventing other users from modifying the contents of your mirror repository at all.

Let's walk through the use of svnsync in a somewhat typical mirroring scenario. We'll pepper this discourse with practical recommendations which you are free to disregard if they aren't required by or suitable for your environment.

As a service to the fine developers of our favorite version control system, we will be mirroring the public Subversion source code repository and exposing that mirror publicly on the Internet, hosted on a different machine than the one on which the original Subversion source code repository lives. This remote host has a global configuration which permits anonymous users to read the contents of repositories on the host, but requires users to authenticate in order to modify those repositories. (Please forgive us for glossing over the details of Subversion server configuration for the moment; those are covered thoroughly in Chapter 6.) And for no other reason than that it makes for a more interesting example, we'll be driving the replication process from a third machine, the one which we currently find ourselves using.

First, we'll create the repository which will be our mirror. This and the next couple of steps do require shell access to the machine on which the mirror repository will live. Once the repository is all configured, though, we shouldn't need to touch it directly again.

$ ssh admin@svn.example.com \
      "svnadmin create /path/to/repositories/svn-mirror"
admin@svn.example.com's password: ********
$

At this point, we have our repository, and due to our server's configuration, that repository is now "live" on the Internet. Now, because we don't want anything modifying the repository except our replication process, we need a way to distinguish that process from other would-be committers. To do so, we use a dedicated username for our process. Only commits and revision property modifications performed by the special username syncuser will be allowed.

We'll use the repository's hook system both to allow the replication process to do what it needs to do, and to enforce that only it is doing those things. We accomplish this by implementing two of the repository event hooks: pre-revprop-change and start-commit. Our pre-revprop-change hook script is found in Example 5.2, "Mirror repository's pre-revprop-change hook script", and basically verifies that the user attempting the property changes is our syncuser user. If so, the change is allowed; otherwise, it is denied.

Example 5.2. Mirror repository's pre-revprop-change hook script

#!/bin/sh 

USER="$3"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may change revision properties" >&2
exit 1

That covers revision property changes. Now we need to ensure that only the syncuser user is permitted to commit new revisions to the repository. We do this using a start-commit hook script like the one in Example 5.3, "Mirror repository's start-commit hook script".

Example 5.3. Mirror repository's start-commit hook script

#!/bin/sh 

USER="$2"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may commit new revisions" >&2
exit 1

After installing our hook scripts and ensuring that they are executable by the Subversion server, we're finished with the setup of the mirror repository. Now, we get to actually do the mirroring.
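
On a Unix-like server, installing a hook amounts to placing the script in the repository's hooks directory under the hook's exact name and marking it executable. The sketch below wraps those two steps in a hypothetical helper, install_hook; the helper name and the source file names in the usage note are assumptions of ours, not anything Subversion provides.

```shell
#!/bin/sh
# install_hook REPOS_PATH HOOK_NAME SCRIPT_FILE
# Copies SCRIPT_FILE to REPOS_PATH/hooks/HOOK_NAME and makes it executable.
install_hook() {
    repos="$1"; name="$2"; src="$3"
    if [ ! -d "$repos/hooks" ]; then
        echo "install_hook: no hooks directory in $repos" >&2
        return 1
    fi
    cp "$src" "$repos/hooks/$name" && chmod 755 "$repos/hooks/$name"
}
```

For the mirror in our example, that might look like install_hook /path/to/repositories/svn-mirror start-commit ./start-commit.sh, run on the machine that hosts the mirror. A hook can then be exercised by hand before trusting it: start-commit receives the repository path and the username as its first two arguments, so running the installed script with syncuser as the second argument should exit with status 0, and any other name should exit with status 1.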

The first thing we need to do with svnsync is to register in our target repository the fact that it will be a mirror of the source repository. We do this using the svnsync initialize subcommand. Note that the various svnsync subcommands provide several of the same authentication-related options that svn does: --username, --password, --non-interactive, --config-dir, and --no-auth-cache.

$ svnsync help init
initialize (init): usage: svnsync initialize DEST_URL SOURCE_URL

Initialize a destination repository for synchronization from
another repository.

The destination URL must point to the root of a repository with
no committed revisions.  The destination repository must allow
revision property changes.

You should not commit to, or make revision property changes in,
the destination repository by any method other than 'svnsync'.
In other words, the destination repository should be a read-only
mirror of the source repository.

Valid options:
  --non-interactive        : do no interactive prompting
  --no-auth-cache          : do not cache authentication tokens
  --username arg           : specify a username ARG
  --password arg           : specify a password ARG
  --config-dir arg         : read user configuration files from directory ARG

$ svnsync initialize http://svn.example.com/svn-mirror \
                     http://svn.collab.net/repos/svn \
                     --username syncuser --password syncpass
Copied properties for revision 0.
$

Our target repository will now remember that it is a mirror of the public Subversion source code repository. Notice that we provided a username and password as arguments to svnsync—that was required by the pre-revprop-change hook on our mirror repository.

Note

The URLs provided to svnsync must point to the root directories of the target and source repositories, respectively. The tool does not handle mirroring of repository subtrees.

Note

The initial release of svnsync (in Subversion 1.4) has a small shortcoming—the values given to the --username and --password command-line options get used for authentication against both the source and destination repositories. Obviously, there's no guarantee that the synchronizing user's credentials are the same in both places. In the event that they are not the same, users trying to run svnsync in non-interactive mode (with the --non-interactive option) might experience problems.

And now comes the fun part. With a single subcommand, we can tell svnsync to copy all the as-yet-unmirrored revisions from the source repository to the target. ^[38] The svnsync synchronize subcommand will peek into the special revision properties previously stored on the target repository, and determine what repository it is mirroring and that the most recently mirrored revision was revision 0. Then it will query the source repository and determine what the latest revision in that repository is. Finally, it asks the source repository's server to start replaying all the revisions between 0 and that latest revision. As svnsync gets the resulting response from the source repository's server, it begins forwarding those revisions to the target repository's server as new commits.

$ svnsync help synchronize
synchronize (sync): usage: svnsync synchronize DEST_URL

Transfer all pending revisions from source to destination.
…
$ svnsync synchronize http://svn.example.com/svn-mirror \
                      --username syncuser --password syncpass
Committed revision 1.
Copied properties for revision 1.
Committed revision 2.
Copied properties for revision 2.
Committed revision 3.
Copied properties for revision 3.
…
Committed revision 23406.
Copied properties for revision 23406.
Committed revision 23407.
Copied properties for revision 23407.
Committed revision 23408.
Copied properties for revision 23408.

Of particular interest here is that for each mirrored revision, there is first a commit of that revision to the target repository, and then property changes follow. This is because the initial commit is performed by (and attributed to) the user syncuser, and datestamped with the time as of that revision's creation. Also, Subversion's underlying repository access interfaces don't provide a mechanism for setting arbitrary revision properties as part of a commit. So svnsync follows up with an immediate series of property modifications which copy all the revision properties found for that revision in the source repository into the target repository. This also has the effect of fixing the author and datestamp of the revision to match that of the source repository.

Also noteworthy is that svnsync performs careful bookkeeping that allows it to be safely interrupted and restarted without ruining the integrity of the mirrored data. If a network glitch occurs while mirroring a repository, simply repeat the svnsync synchronize command and it will happily pick up right where it left off. In fact, as new revisions appear in the source repository, this is exactly what you do in order to keep your mirror up-to-date.

There is, however, one bit of inelegance in the process. Because Subversion revision properties can be changed at any time throughout the lifetime of the repository, and don't leave an audit trail that indicates when they were changed, replication processes have to pay special attention to them. If you've already mirrored the first 15 revisions of a repository and someone then changes a revision property on revision 12, svnsync won't know to go back and patch up its copy of revision 12. You'll need to tell it to do so manually by using (or with some additional tooling around) the svnsync copy-revprops subcommand, which simply re-replicates all the revision properties for a particular revision.

$ svnsync help copy-revprops
copy-revprops: usage: svnsync copy-revprops DEST_URL REV

Copy all revision properties for revision REV from source to destination.
…
$ svnsync copy-revprops http://svn.example.com/svn-mirror 12 \
                        --username syncuser --password syncpass
Copied properties for revision 12.
$
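
If revision properties were edited across a whole range of already-mirrored revisions, the subcommand can be driven from a small loop. This is a sketch only: resync_revprops is a hypothetical helper of ours, and any trailing arguments (such as the authentication options shown above) are simply passed through to svnsync unchanged.

```shell
#!/bin/sh
# resync_revprops DEST_URL FIRST_REV LAST_REV [svnsync options...]
# Re-copies the revision properties of every revision in the inclusive
# range FIRST_REV..LAST_REV, stopping at the first failure.
resync_revprops() {
    dest_url="$1"; rev="$2"; last="$3"
    shift 3
    while [ "$rev" -le "$last" ]; do
        svnsync copy-revprops "$dest_url" "$rev" "$@" || return 1
        rev=$((rev + 1))
    done
}
```

For instance, resync_revprops http://svn.example.com/svn-mirror 10 15 --username syncuser --password syncpass would refresh six revisions, one svnsync invocation at a time.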

That's repository replication in a nutshell. You'll likely want some automation around such a process. For example, while our example was a pull-and-push setup, you might wish to have your primary repository push changes to one or more blessed mirrors as part of its post-commit and post-revprop-change hook implementations. This would enable the mirror to be up-to-date in as near to realtime as is likely possible.
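
As a rough illustration of that push arrangement, a post-commit hook on the primary repository could call a helper along these lines. Everything here is an assumption on our part: the function name push_to_mirrors, the list of mirror URLs, and the premise that the account running the hook can authenticate to each mirror without prompting (hence --non-interactive).

```shell
#!/bin/sh
# push_to_mirrors MIRROR_URL...
# Asks svnsync to bring each listed mirror up to date. A failure on one
# mirror is reported on stderr and does not stop the remaining ones.
push_to_mirrors() {
    for url in "$@"; do
        svnsync synchronize "$url" --non-interactive ||
            echo "push_to_mirrors: synchronization of $url failed" >&2
    done
}
```

Inside a real hook you would probably run the call in the background with its output redirected, for example push_to_mirrors http://svn.example.com/svn-mirror >/dev/null 2>&1 &, so that the committing client is not kept waiting while the mirror catches up.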

Also, while it isn't very commonplace to do so, svnsync does gracefully mirror repositories in which the user as whom it authenticates only has partial read access. It simply copies only the bits of the repository that it is permitted to see. Obviously such a mirror is not useful as a backup solution.

As far as user interaction with repositories and mirrors goes, it is possible to have a single working copy that interacts with both, but you'll have to jump through some hoops to make it happen. First, you need to ensure that both the primary and mirror repositories have the same repository UUID (which is not the case by default). You can set the mirror repository's UUID by loading a dump file stub into it which contains the UUID of the primary repository, like so:

$ cat - <<EOF | svnadmin load --force-uuid dest
SVN-fs-dump-format-version: 2

UUID: 65390229-12b7-0310-b90b-f21a5aa7ec8e
EOF
$
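
Rather than pasting the UUID by hand, the stub can be generated. The function below is a small sketch (uuid_stub is our own name); it prints exactly the three-line stub shown above for whatever UUID it is given, and does no validation of that value.

```shell
#!/bin/sh
# uuid_stub UUID
# Prints a minimal dump stream that carries only a repository UUID,
# suitable for piping into "svnadmin load --force-uuid".
uuid_stub() {
    printf 'SVN-fs-dump-format-version: 2\n\nUUID: %s\n' "$1"
}
```

Assuming you have local access to the primary repository, svnlook uuid reports its UUID, so the two steps can be chained: uuid_stub "$(svnlook uuid /path/to/primary)" | svnadmin load --force-uuid dest. The path /path/to/primary is a placeholder.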

Now that the two repositories have the same UUID, you can use svn switch --relocate to point your working copy to whichever of the repositories you wish to operate against, a process which is described in svn switch. There is a possible danger here, though, in that if the primary and mirror repositories aren't in close synchronization, a working copy up-to-date with, and pointing to, the primary repository will, if relocated to point to an out-of-date mirror, become confused about the apparent sudden loss of revisions it fully expects to be present, and throw errors to that effect. If this occurs, you can relocate your working copy back to the primary repository and then either wait until the mirror repository is up-to-date, or backdate your working copy to a revision you know is present in the mirror repository and then retry the relocation.

Finally, be aware that the revision-based replication provided by svnsync is only that—replication of revisions. It does not include such things as the hook implementations, repository or server configuration data, uncommitted transactions, or information about user locks on repository paths. Only information carried by the Subversion repository dump file format is available for replication.

Repository Backup

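Despite numerous advances in technology since the birth of the modern computer, one thing unfortunately rings true with crystalline clarity: sometimes, things go very, very awry. Power outages, network connectivity dropouts, corrupt RAM, and crashed hard drives are but a taste of the evil that Fate is poised to unleash on even the most conscientious administrator. And so we arrive at a very important topic: how to make backup copies of your repository data.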

There are two types of backup methods available for Subversion repository administrators—full and incremental. A full backup of the repository involves squirreling away in one sweeping action all the information required to fully reconstruct that repository in the event of a catastrophe. Usually, it means, quite literally, the duplication of the entire repository directory (which includes either a Berkeley DB or FSFS environment). Incremental backups are lesser things, backups of only the portion of the repository data that has changed since the previous backup.

As far as full backups go, the naive approach might seem like a sane one, but unless you temporarily disable all other access to your repository, simply doing a recursive directory copy runs the risk of generating a faulty backup. In the case of Berkeley DB, the documentation describes a certain order in which database files can be copied that will guarantee a valid backup copy. A similar ordering exists for FSFS data. But you don't have to implement these algorithms yourself, because the Subversion development team has already done so. The svnadmin hotcopy command takes care of the minutiae involved in making a hot backup of your repository. And its invocation is as trivial as Unix's cp or Windows' copy operations:

$ svnadmin hotcopy /path/to/repos /path/to/repos-backup

The resulting backup is a fully functional Subversion repository, able to be dropped in as a replacement for your live repository should something go horribly wrong.

When making copies of a Berkeley DB repository, you can even instruct svnadmin hotcopy to purge any unused Berkeley DB logfiles (see the section called "Purging unused Berkeley DB logfiles") from the original repository upon completion of the copy. Simply provide the --clean-logs option on the command line.

$ svnadmin hotcopy --clean-logs /path/to/bdb-repos /path/to/bdb-repos-backup

Additional tooling around this command is available, too. The tools/backup/ directory of the Subversion source distribution holds the hot-backup.py script. This script adds a bit of backup management atop svnadmin hotcopy, allowing you to keep only the most recent configured number of backups of each repository. It will automatically manage the names of the backed-up repository directories to avoid collisions with previous backups, and will "rotate off" older backups, deleting them so only the most recent ones remain. Even if you also have an incremental backup, you might want to run this program on a regular basis. For example, you might consider using hot-backup.py from a program scheduler (such as cron on Unix systems) which will cause it to run nightly (or at whatever granularity of time you deem safe).
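
To make the idea of rotation concrete, here is a minimal sketch in shell. It is not how hot-backup.py is implemented, and the naming scheme is an assumption of ours: backups are taken to be directories named PREFIX-YYYYMMDD inside one parent directory, so that sorting the names also sorts them by age.

```shell
#!/bin/sh
# rotate_backups DIR PREFIX KEEP
# Deletes the oldest directories matching DIR/PREFIX-* until at most KEEP
# remain. Assumes the suffix sorts chronologically (e.g. YYYYMMDD).
rotate_backups() {
    dir="$1"; prefix="$2"; keep="$3"
    total=0
    for d in "$dir/$prefix"-*; do
        if [ -d "$d" ]; then total=$((total + 1)); fi
    done
    excess=$((total - keep))
    # The shell expands the glob in sorted order, so the oldest come first.
    for d in "$dir/$prefix"-*; do
        if [ "$excess" -le 0 ]; then break; fi
        if [ -d "$d" ]; then
            rm -rf "$d"
            excess=$((excess - 1))
        fi
    done
}
```

A nightly job might first run svnadmin hotcopy /path/to/repos "/path/to/backups/repos-$(date +%Y%m%d)" and then rotate_backups /path/to/backups repos 7 to keep a week's worth; both paths are placeholders.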

Some administrators use a different backup mechanism built around generating and storing repository dump data. We described in the section called "Migrating Repository Data Elsewhere" how to use svnadmin dump --incremental to perform an incremental backup of a given revision or range of revisions. And of course, there is a full backup variation of this, achieved by omitting the --incremental option to that command. There is some value in these methods, in that the format of your backed-up information is flexible: it's not tied to a particular platform, versioned filesystem type, or release of Subversion or Berkeley DB. But that flexibility comes at a cost, namely that restoring that data can take a long time, and longer with each new revision committed to your repository. Also, as is the case with so many of the various backup methods, revision property changes made to already-backed-up revisions won't get picked up by a non-overlapping, incremental dump generation. For these reasons, we recommend against relying solely on dump-based backup approaches.
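
For administrators who do want incremental dumps as one layer of their strategy, the bookkeeping can be sketched as follows. The helper name dump_new_revisions, the state file that remembers the last dumped revision, and the dump-FIRST-LAST file naming are all our own assumptions. As noted above, this approach will not notice revision property changes made to revisions that have already been dumped.

```shell
#!/bin/sh
# dump_new_revisions REPOS_PATH STATE_FILE OUT_DIR
# Dumps every revision newer than the one recorded in STATE_FILE into
# OUT_DIR/dump-FIRST-LAST, then records the new youngest revision.
dump_new_revisions() {
    repos="$1"; state="$2"; outdir="$3"
    youngest=$(svnlook youngest "$repos") || return 1
    if [ -f "$state" ]; then
        first=$(( $(cat "$state") + 1 ))
    else
        first=0                      # nothing dumped yet: start at revision 0
    fi
    if [ "$first" -gt "$youngest" ]; then
        return 0                     # no new revisions since the last run
    fi
    svnadmin dump "$repos" -r "$first:$youngest" --incremental \
        > "$outdir/dump-$first-$youngest" || return 1
    echo "$youngest" > "$state"
}
```

Restoring means loading the resulting files into a fresh repository with svnadmin load, in revision order.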

As you can see, each of the various backup types and methods has its advantages and disadvantages. The easiest is by far the full hot backup, which will always result in a perfect working replica of your repository. Should something bad happen to your live repository, you can restore from the backup with a simple recursive directory copy. Unfortunately, if you are maintaining multiple backups of your repository, these full copies will each eat up just as much disk space as your live repository. Incremental backups, by contrast, tend to be quicker to generate and smaller to store. But the restoration process can be a pain, often involving applying multiple incremental backups. And other methods have their own peculiarities. Administrators need to find the balance between the cost of making the backup and the cost of restoring it.

The svnsync program (see the section called "Repository Replication") actually provides a rather handy middle-ground approach. If you are regularly synchronizing a read-only mirror with your main repository, then in a pinch, your read-only mirror is probably a good candidate for replacing that main repository if it falls over. The primary disadvantage of this method is that only the versioned repository data gets synchronized: repository configuration files, user-specified repository path locks, and other items which might live in the physical repository directory but not inside the repository's virtual versioned filesystem are not handled by svnsync.

In any backup scenario, repository administrators need to be aware of how modifications to unversioned revision properties affect their backups. Since these changes do not themselves generate new revisions, they will not trigger post-commit hooks, and may not even trigger the pre-revprop-change and post-revprop-change hooks. ^[39] And since you can change revision properties without respect to chronological order—you can change any revision's properties at any time—an incremental backup of the latest few revisions might not catch a property modification to a revision that was included as part of a previous backup.

Generally speaking, only the truly paranoid would need to back up their entire repository, say, every time a commit occurred. However, assuming that a given repository has some other redundancy mechanism in place with relatively fine granularity (like per-commit emails or incremental dumps), a hot backup of the database might be something that a repository administrator would want to include as part of a system-wide nightly backup. It's your data; protect it as much as you'd like.

Often, the best approach to repository backups is a diversified one which leverages combinations of the methods described here. The Subversion developers, for example, back up the Subversion source code repository nightly using hot-backup.py and an offsite rsync of those full backups; keep multiple archives of all the commit and property change notification emails; and have repository mirrors maintained by various volunteers using svnsync. Your solution might be similar, but should be tailored to your needs and that delicate balance of convenience with paranoia. And whatever you do, validate your backups from time to time. What good is a spare tire that has a hole in it? While all of this might not save your hardware from the iron fist of Fate, ^[40] it should certainly help you recover from those trying times.
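
One inexpensive way to validate hot backups is to run svnadmin verify against each backup copy. The loop below is a sketch with a made-up name, verify_backups; it reports one line per repository and returns a non-zero status if any of them fails verification, which makes it easy to call from a scheduled job.

```shell
#!/bin/sh
# verify_backups REPOS_PATH...
# Runs "svnadmin verify" on each path, prints OK or FAILED for each one,
# and returns 1 if at least one repository did not verify.
verify_backups() {
    status=0
    for repos in "$@"; do
        if svnadmin verify "$repos" >/dev/null 2>&1; then
            echo "OK      $repos"
        else
            echo "FAILED  $repos"
            status=1
        fi
    done
    return "$status"
}
```

A clean result shows only that the backup is internally readable; it says nothing about how recent the backup is.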

^[32]Or is that, the "sync"?

^[33]E.g.: hard drive + huge electromagnet = disaster.

^[34]That's rather the reason you use version control at all, right?

^[35]Conscious, cautious removal of certain bits of versioned data is actually supported by real use-cases. That's why an "obliterate" feature has been one of the most highly requested Subversion features, and one which the Subversion developers hope to soon provide.

^[36]While svnadmin dump has a consistent leading slash policy (it never includes them), other programs which generate dump data might not be so consistent.

^[37]In fact, it can't truly be read-only, or svnsync itself would have a tough time copying revision history into it.

^[38]Be forewarned that while it will take only a few seconds for the average reader to parse this paragraph and the sample output which follows it, the actual time required to complete such a mirroring operation is, shall we say, quite a bit longer.

^[39]svnadmin setlog can be called in a way that bypasses the hook interface altogether.

^[40]You know, the collective term for all of her "fickle fingers".
