MySQL
Replication is based on a master-slave architecture, where updates on
the master are propagated
to the slave through binary log events via a communication channel
(aka a network connection).
A slave would have been able to connect to only a single master at
a time. If
a slave wanted to receive updates from several masters, there were a
few choices possible:
- Time sharing replication, where a slave would connect to a master for a particular time slice. (See, Mats blog here )
- Have a hierarchical replication, where a slave that is to receive updates from the several masters is at the end of replication hierarchy. ( Like, M1->M2->M3->Slave)
- Using several instances of mysqlbinlog + GTIDs. (See Luis blog here)
There
was no way for a slave to connect to multiple masters and receive
updates from these masters simultaneously. The Multisource
Replication feature aims precisely to do that. To define clearly,
Multisource replication feature for a slave server aims to:
- Receive transactions (in the form of binary log events) from several MySQL servers simultaneously via several communication channels.
- Apply these received transactions whilst not doing any conflict detection or resolution
One
can call it Nx1 replication where N masters (sources) data is
replicated to a
single server.
Note:
There is a difference between Multimaster Replication and Multisource
Replication, to be considered which is really important to take note
of, based on the definitions floating around in the MySQL
world.
Multimaster
Replication:
Implicitly, this means Multi Master Update Everywhere replication
setup i.e In this setup all the N servers form a
cluster ( i.e each server is connected to other N-1 servers). Updates could be run on any
of the servers and these are replicated to the remaining servers. This setup should
maintain ordering of the transactions, do transaction conflict detection, transaction
conflict resolution (or rollback). One can call this set up as NxN setup, where each
server is a peer of all others.
4x4 multimaster |
In contrast, the Multisource Replication feature, does
not concern itself with conflict resolution at all. This
leads us to say that, an application has to take care of data
partitions on these masters so that there
are no conflicts at the slave.
Use
cases of Multisource replication
The
main use cases of Multisource replication are related to data
aggregation. For example:
- Backup of several servers to a single place.
- Merging of table shards: Multiple shards of tables can be jointly replicated to form a single table in a slave. If all the shards are replicated to form a single global table, it is easier for the application to replace the complex cross shard joins, since all the data is located in one place only. Mainly used for informational and analytic purposes
Also, note that this feature would make any
replication topology possible. (For example, the classic Diamond Topology).
Release
MySQL
Multisource Replication feature preview was released on MySQL Labs
today (21st September, 2013). It is based on MySQL 5.7.2 DMR (which was also released today). You can download the packages from
labs.mysql.com and play with it. NOTE:
It is only a feature preview. There are things that need
improvement and things might break easily. The idea is to expose the
interface and receive feedback from the community. It is not
production ready. So, do not use it in production systems.
MySQL Labs
Multisource Replication
Goal:
A slave server should be able to replicate from several masters.
This
feature should enable a user to control/manage the transactions being
received from
each
master. How to identify each master? Instead of identifying a
master, we identify a channel that connects a master and a slave. A
'channel' encompasses the path of binary log events from the master
to the IO thread to the Relay log files to the applier threads (SQL
thread or coordinator & worker threads). This definition makes
us to assosiate replication commands without any ambiguity.
How to start a
Mulitsourced slave?
As
you might be aware, MySQL 5.6 introduced crash safe replication.
That feature makes sure that recovery happens automatically when the
slave is restarted after a crash. Essentially that feature, stores
the slave replication state in tables. (Traditionally, these were
stored in files called master.info and slave.info). Instead of
storing the replication state of several channels of a multisourced
slave, in
several *.info files. It is best to store the state in tables. These
are namely mysql.slave_master_info, mysql.slave_relay_log_info and
mysql.slave_worker. So,
to start a multisourced slave, the slave repositories should be of
type TABLE. When
repositories are of type FILE, a multisourced slave cannot be set up.
This is settable by the following options:
master_info_repository=TABLE,
relay_log_info_repository=TABLE in the cnf file.
Threads and Files
For
each channel, there is a corresponding IO thread and a set of relay
log files and applier threads. These
applier thread(s) is just a SQL thread when slave_parallel_workers==0
and is an MTS instance
when slave_parallel_workers>0. Currently, number of channels is unlimited but having a configurable parameter
would be a best thing to do.
The relaylog files and index files for each channel are named
”base-name-relay-bin-channel_name.0000x” where base-name is
generally a host name if
not specified through a user option.
Commands
We
introduced FOR CHANNEL=”<channel_name>" as an extension to
the present replication commands to control/manage the updates
received from each master.
If
FOR CHANNEL=”<channel_name>" is not given by the user, the
command acts on the default channel.
The default channel is always an “empty" string and currently,
not settable. This
is introduced mainly for the backward compatibility for a user who
doesn't find a use case
for this feature.
Following
are the commands for individual channels with a short notes. Other
commands
are
self explanatory.
- CHANGE MASTER TO master_def ….. FOR CHANNEL=”<channel_name>”Create slave info objects if not already present and set the user parameters forthe new channel.
- START SLAVE [thread_types] [until_option] [connection_options]FOR CHANNEL=”<channel_name>"
- STOP SLAVE [thread_types] FOR CHANNEL=”<channel_name>”
- RESET SLAVE [ALL] FOR CHANNEL=”<channel_name>”
- FLUSH RELAY LOGS FOR CHANNEL=”channel_name>”
- SHOW RELAY LOG EVENTS FOR CHANNEL=”<channel_name>”
- SELECT MASTER_POS_WAIT(“master_log_file' master_log_pews [, timeout][, channel_name])
Commands
to control multiple channels.
START
SLAVE [thread_type] FOR ALL CHANNELS
STOP
SLAVE [thread_type] FOR ALL CHANNELS
Status monitoring of a Multi sourced slave
How
to check the status of each channels and all the other channels?
MySQL
5.7.2 introduced Performance schema tables for replication. We use
those tables to look at the overall status of a multi sourced slave.
For introduction on Performance schema for replication tables, see Shiv's blog post here. Every
other status variable, currently supports only a default channel
(which is of
course an “empty” string channel, as mentioned above.)
Multi threaded slaves
and Multi source replication
There
are two types of multi threaded slaves in MySQL 5.7.2
1. Database partition (Inter schema MTS) and
2. Parent commit sequencing (See, Rohit's blog post here )
With
Multi source replication, there will be an instance of a multi
threaded applier for each channel. Both
forms of MTS can be used in conjunction with Multi Source
replication.
When
slave_parallel_workers > 0, an MTS instance for each channel is
automatically started.
For
example if slave_parallel_workers= k, and then there are M channels,
then for each channel there
will be a coordinator and k worker threads. In this case there will
be (1+k)*M applier threads running.
GTIDs
and Multisource replication
Multisource replication supports GTIDs and the AUTO_POSITION protocol.
A requirement as is the case with (normal master-slave
replication), is that all the masters and the slave should have
GTID_MODE=ON.
Semisynchnoous
Replication and Multisource replication
When
all the masters and the slave are in semisync mode, multi source
replication can be used with
semisynchronous replication. Currently there is no way to specify a
single channel to be in
semisync mode.
Filters
and Multisource replication
Currently,
the replication filters are global in the labs release, ie. The
applier threads of all channels will rely on a single global filter.
However, there are interesting use cases where these should
be settable per channel. Therefore the designs are likely to change
here in the area.
Summary:
MySQL
Labs contain a feature preview of Multisource replication based on
MySQL. 5.7.2.
Try
it out and if you have feature requests and comments about the interface or about the big
picture, please note them down in the comments section.
“The views
expressed on this blog are my own and do not necessarily reflect the views of Oracle."
No comments:
Post a Comment