MDEV-14992 BACKUP SERVER #4817

Open: dr-m wants to merge 3 commits into 13.0 from MDEV-14992

@dr-m dr-m commented Mar 17, 2026

Contributor

The following SQL statements will be introduced:

BACKUP SERVER TO '/path/to/directory';
BACKUP SERVER TO '/path/to/directory' 1 CONCURRENT;
BACKUP SERVER WITH 'command';
BACKUP SERVER WITH 1 CONCURRENT 'command';

In place of the 1, any positive number of threads may be specified. For the first variant, '/path/to' must exist and '/path/to/directory' must not exist; that is where the backup will be written to.
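
The directory rule for the first variant can be sketched as a pre-flight check. This is only an illustration, assuming that the check runs on the same host and file system as the server; the function name `target_is_acceptable` is hypothetical and not part of the patch, and the server performs its own validation.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

// Hypothetical pre-flight check (not part of the patch):
// the parent of the target must be an existing directory,
// and the target itself must not exist yet.
bool target_is_acceptable(const std::filesystem::path &target)
{
  namespace fs= std::filesystem;
  assert(!target.empty());
  std::error_code ec;
  const fs::path parent= target.parent_path();
  return !parent.empty() && fs::is_directory(parent, ec) &&
    !fs::exists(target, ec);
}
```

For '/path/to/directory', this checks that '/path/to' is a directory and that '/path/to/directory' is absent.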

For the second variant, 'command' must be the name of a script or command that will be executed in a child process. The standard input of that command will be in a format that is compatible with GNU tar --format=oldgnu (and with the BSD tar variants that are part of Microsoft Windows and Apple macOS). The command is expected to optionally compress and encrypt the stream and redirect it to a file on a local or a remote server. BACKUP SERVER WITH will append an additional argument, a positive base-ten number in ASCII, starting with 1, to identify the current thread. In this way, each concurrent stream can be written to a separate file.
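
As a rough illustration of this calling convention, the following C++ sketch shows how a backup command might turn the appended thread number into a per-stream file name. The function name `stream_file_name` and the `prefix-N.tar` naming scheme are assumptions made for this example; neither is defined by the patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: map the last argument that BACKUP SERVER WITH
// appends (a positive base-ten number in ASCII, starting with 1)
// to an output file name. Returns an empty string if the argument
// is not a positive decimal number without sign or leading zeros.
std::string stream_file_name(const std::string &prefix, const char *arg)
{
  assert(arg != nullptr);
  if (*arg < '1' || *arg > '9')
    return {};
  char *end= nullptr;
  const unsigned long n= std::strtoul(arg, &end, 10);
  if (*end != '\0')
    return {};
  return prefix + '-' + std::to_string(n) + ".tar";
}
```

A real command would then copy its standard input to that file, possibly through a compressor or an encryption tool.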

The backup or the first stream will contain a file backup.cnf, which includes parameters needed for restoring the backup. Currently, these are innodb_log_recovery_start and innodb_log_recovery_target. If innodb_log_recovery_target>0, InnoDB will be in read-only mode, not allowing any writes to persistent files other than via the log application.

To restore a streaming backup made with BACKUP SERVER WITH, an empty directory needs to be created and all streams need to be extracted there using the standard tar utility of the operating system, optionally after undoing any encryption or compression that had been added by the backup command. Then, the backup is prepared, or the MariaDB server is started up on the extracted directory, just as if the BACKUP SERVER TO statement had been used.

Note: The parameter innodb_log_recovery_start in backup.cnf is STRICTLY NECESSARY TO AVOID CORRUPTION! By default, InnoDB crash recovery starts from the latest available log checkpoint. However, for restoring a backup, recovery must start from the checkpoint that was the latest when the backup was started. Starting recovery from a possible later checkpoint will result in a corrupted database!

The following will be implemented separately:

MDEV-39061 mariadb-backup compatible wrapper script for BACKUP SERVER
MDEV-40163 Partial backup and restore
MDEV-39091 Back up ENGINE=RocksDB
MDEV-39092 Less blocking backup of ENGINE=Aria

The implementation introduces a basic driver Sql_cmd_backup, storage engine interfaces, and basic copying of the storage engines InnoDB, Aria, MyISAM, MERGE (MyISAM), Archive, CSV.

backup_target: A structured data type to represent a target directory. On Microsoft Windows, we must use directory paths because there is no variant of CopyFileEx() that would work on file handles.

backup_sink: Wraps a per-thread output stream as well as storage engine specific context.

handlerton::backup_start(), handlerton::backup_end(): Invoked at the start or end of a backup phase, in the thread that executes a BACKUP SERVER statement.

handlerton::backup_step(): A backup step that can be invoked from multiple threads concurrently, between the execution of the corresponding handlerton::backup_start() and handlerton::backup_end() of the same phase.
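
The calling order of these three hooks can be sketched as follows. This is a simplified model, not the actual interface: `engine_backup`, `run_phase` and `counting_engine` are hypothetical names, and the real handlerton callbacks take different arguments and return different types.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the storage engine hooks.
struct engine_backup
{
  virtual ~engine_backup()= default;
  virtual void start()= 0; // once, in the thread executing the statement
  virtual bool step()= 0;  // concurrently; returns false when no work is left
  virtual void end()= 0;   // once, after all steps have finished
};

// Run one backup phase with n_threads concurrent workers.
inline void run_phase(engine_backup &e, unsigned n_threads)
{
  assert(n_threads > 0);
  e.start();
  std::vector<std::thread> workers;
  for (unsigned i= 0; i < n_threads; i++)
    workers.emplace_back([&e] { while (e.step()) {} });
  for (std::thread &t : workers)
    t.join();
  e.end();
}

// Example engine that hands out a fixed number of work items.
struct counting_engine final : engine_backup
{
  std::atomic<int> remaining{0}, done{0};
  bool started= false, ended= false;
  void start() override { started= true; remaining= 100; }
  bool step() override
  {
    if (remaining.fetch_sub(1) <= 0)
      return false;
    done++;
    return true;
  }
  void end() override { ended= true; }
};
```

The point of the sketch is the ordering guarantee: every step() call happens after start() has returned and before end() is invoked.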

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A partial or sparse file-copying service for all systems.

backup_stream_append(): Equivalent to copy_file(), but appending to a stream. On Linux, this uses sendfile(2), which assumes that the source data will not be changed before the data has been consumed from the pipe.

backup_stream_append_async(): A variant of backup_stream_append() where the source file region is guaranteed to be immutable after the call returns. We must not use Linux sendfile(2) for copying data files that may be modified in place, because it could introduce a race condition between a page write and a child process that is concurrently reading the data from the pipe.

InnoDB_backup::context: Backup context, attached to backup_sink so that context can continue to exist between the time a BACKUP SERVER releases all locks and another BACKUP SERVER starts executing, with innodb_backup pointing to the new backup, while the old backup is still being finished.

fil_space_t::write_or_backup: Keep track of in-flight page writes and pending backup operations. We must not allow them to run concurrently, because that could lead to torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up (by default 0, to indicate that no backup is in progress).

fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be covered by fil_space_t::backup_end. This is the unit of "page range locking" during InnoDB backup.
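
One possible reading of how backup_end and BACKUP_BATCH_SIZE define the locked page range is sketched below. The value 64 is an assumption for illustration only (the patch defines its own constant), and `page_in_backup_batch` is a hypothetical helper rather than a function from the patch.

```cpp
#include <cstdint>

// Assumed value for illustration; the real constant may differ.
constexpr uint32_t BACKUP_BATCH_SIZE= 64;

// backup_end is the first page number that is not being backed up;
// 0 means that no backup is in progress. The batch covers the
// BACKUP_BATCH_SIZE pages that precede backup_end.
constexpr bool page_in_backup_batch(uint32_t backup_end, uint32_t page_no)
{
  return backup_end != 0 && page_no < backup_end &&
    page_no >= (backup_end > BACKUP_BATCH_SIZE
                ? backup_end - BACKUP_BATCH_SIZE : 0);
}

static_assert(!page_in_backup_batch(0, 5), "no backup in progress");
static_assert(page_in_backup_batch(128, 64), "first page of the batch");
static_assert(!page_in_backup_batch(128, 128), "backup_end is exclusive");
```

Under this reading, a page write would have to wait only if its page number falls inside the batch that is currently being copied.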

log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this is to make BACKUP SERVER prevent the concurrent execution of SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available checkpoint, corresponding to log_sys.archived_lsn. This reflects SET GLOBAL innodb_log_recovery_start (which is settable now), for incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each page. This is inefficient, but this function may be invoked from multiple threads concurrently, and it cannot be changed easily, especially for fil_crypt_thread().

fil_system.have_all_spaces: Whether all tablespace metadata is guaranteed to be known. To speed up startup, InnoDB does not normally open all tablespace files.

@dr-m dr-m self-assigned this Mar 17, 2026
@dr-m dr-m force-pushed the MDEV-14992 branch 2 times, most recently from 2723322 to 1703796 Compare March 18, 2026 11:01
Comment thread sql/sql_backup.cc
@dr-m dr-m force-pushed the MDEV-14992 branch 2 times, most recently from 9a529de to 857edeb Compare March 23, 2026 08:28
@dr-m dr-m changed the base branch from 11.4 to 12.3 March 24, 2026 11:51
@dr-m dr-m force-pushed the MDEV-14992 branch 3 times, most recently from 8149b3d to c08d121 Compare March 27, 2026 09:48
Comment thread storage/innobase/handler/backup_innodb.cc Outdated
Comment thread mysql-test/suite/backup/backup_innodb.test
@dr-m dr-m changed the base branch from 12.3 to main May 5, 2026 10:49
Comment thread sql/sql_backup.cc Outdated
Comment thread storage/innobase/handler/backup_innodb.cc Outdated
Comment thread storage/innobase/buf/buf0flu.cc
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup_server.cc Outdated
Comment thread storage/innobase/handler/backup_innodb.cc
@dr-m dr-m changed the base branch from main to 13.0 June 26, 2026 12:54
@dr-m

dr-m commented Jun 26, 2026

Contributor Author

I plan to rebase this once #5070 has been merged up to the 13.0 branch. @grooverdan pushed a merge to 10.11 today, and I pushed to 11.4 and 11.8. I hope that the conflicts for 12.3 and potentially 13.0 will have been resolved by Monday.

The ultimate merge target is main. For testing, it is better to be based on the oldest maintained branch that includes #4405 (innodb_log_archive=ON), which forms the foundation for this.

While rebasing, I will write a description based on the commit message of 4769a43, but mentioning actual MDEVs for the outstanding work. Soon after the rebase, we can include #5140 so that this can be tested more conveniently.

@dr-m dr-m requested a review from Thirunarayanan June 29, 2026 08:34
@dr-m dr-m changed the title from "MDEV-14992 BACKUP SERVER to mounted file system" to "MDEV-14992 BACKUP SERVER" Jun 29, 2026
@dr-m dr-m marked this pull request as ready for review June 29, 2026 08:35
dr-m added 2 commits June 29, 2026 13:51
Observe aria_log_dir_path

Patch based on code by Thirunarayanan Balathandayuthapani
Comment on lines +595 to +600
const uint32_t end{start + fil_space_t::BACKUP_BATCH_SIZE};
backup_batch_start(node->space, end);
/* TODO: avoid copying freed page ranges */
err= copy_file(node->handle, f, start * uint64_t{page_size},
std::min(end, file_size) * uint64_t{page_size});
backup_batch_stop(node->space);

If this is a ROW_FORMAT=COMPRESSED table, then the file may be 1024, 2048, or 3072 bytes shorter than calculated, and the copying could fail. This API as well as the one in stream() must be refactored so that we will know how much was actually copied. The reason for this short file is that fil_space_extend_must_retry() will only extend files to integer multiples of 4096 bytes.

In stream() we must pad with field_ref_zero so that the file size will match what was written to the header. The last page will be recovered from the redo log.

Note: We don’t currently keep track of the file size or the allocated file size as of the checkpoint when the backup started. If we did that, we could copy even less. That could be an even more elegant fix of this. I think we would create sparse files that match the current file size.
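
The possible shortfalls can be worked out with a small calculation. The sketch below assumes that the file on disk is the logical size (number of pages times page size) rounded down to a multiple of 4096 bytes; that assumption, and the names `EXTEND_UNIT` and `shortfall`, are mine and not taken from the patch.

```cpp
#include <cstdint>

// Assumption: files are only ever extended to multiples of 4096 bytes.
constexpr uint64_t EXTEND_UNIT= 4096;

// Number of bytes by which the file on disk may be shorter than
// n_pages * page_size under the assumption above.
constexpr uint64_t shortfall(uint64_t n_pages, uint64_t page_size)
{
  return n_pages * page_size % EXTEND_UNIT;
}

static_assert(shortfall(5, 1024) == 1024, "1 KiB pages, 5 pages");
static_assert(shortfall(6, 1024) == 2048, "1 KiB pages, 6 pages");
static_assert(shortfall(7, 1024) == 3072, "1 KiB pages, 7 pages");
static_assert(shortfall(3, 2048) == 2048, "2 KiB pages, 3 pages");
static_assert(shortfall(3, 4096) == 0, "4 KiB pages never fall short");
```

With page sizes of 4096 bytes or more, the logical size is always a multiple of 4096, so only the 1 KiB and 2 KiB compressed page sizes are affected.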
