Saturday, September 21, 2013

Feature Preview: MySQL Multi Source Replication


MySQL Replication is based on a master-slave architecture, where updates on the master are propagated to the slave through binary log events via a communication channel (aka a network connection). A slave would have been able to connect to only a single master at a time. If a slave wanted to receive updates from several masters, there were a few choices possible:
  1. Time sharing replication, where a slave would connect to a master for a particular time slice.   (See, Mats blog here )
  2. Have a hierarchical replication, where a slave that is to receive updates from the several masters is at the end of replication hierarchy. ( Like, M1->M2->M3->Slave)
  3. Using several instances of mysqlbinlog + GTIDs. (See Luis blog  here)
There was no way for a slave to connect to multiple masters and receive updates from these masters simultaneously. The Multisource Replication feature aims precisely to do that. To define clearly, Multisource replication feature for a slave server aims to:
  1. Receive transactions (in the form of binary log events) from several MySQL servers simultaneously via several communication channels.
  2. Apply these received transactions whilst not doing any conflict detection or resolution
One can call it Nx1 replication where N masters (sources) data is replicated to  a single server.


















Note: There is a difference between Multimaster Replication and Multisource Replication, to be considered which is really important to take note of, based on the definitions floating around in the MySQL world.
Multimaster Replication: Implicitly, this means Multi Master Update Everywhere replication setup i.e In this setup all the N servers form a cluster ( i.e each server is connected to other N-1 servers). Updates could be run on any of the servers and these are replicated to the remaining servers. This setup should maintain ordering of the transactions, do transaction conflict detection, transaction conflict resolution (or rollback). One can call this set up as NxN setup, where each server is a peer of all others.


4x4 multimaster




















In contrast, the Multisource Replication feature, does not concern itself with conflict resolution at all. This leads us to say that, an application has to take care of data partitions on these masters so that there are no conflicts at the slave.

Use cases of Multisource replication
The main use cases of Multisource replication are related to data aggregation. For example:
  1. Backup of several servers to a single place.
  2. Merging of table shards: Multiple shards of tables can be jointly replicated to form a single table in a slave. If all the shards are replicated to form a single global table, it is easier for the application to replace the complex cross shard joins, since all the data is located in one place only. Mainly used for informational and analytic purposes
Also, note that this feature would make any replication topology possible. (For example, the classic Diamond Topology).

Release
MySQL Multisource Replication feature preview was released on MySQL Labs today (21st September, 2013). It is based on MySQL 5.7.2 DMR (which was also released today). You can download the packages from labs.mysql.com and play with it. NOTE: It is only a feature preview. There are things that need improvement and things might break easily. The idea is to expose the interface and receive feedback from the community. It is not production ready. So, do not use it in production systems.


MySQL Labs Multisource Replication

Goal: A slave server should be able to replicate from several masters.
This feature should enable a user to control/manage the transactions being received from
each master. How to identify each master? Instead of identifying a master, we identify a channel that connects a master and a slave. A 'channel' encompasses the path of binary log events from the master to the IO thread to the Relay log files to the applier threads (SQL thread or coordinator & worker threads). This definition makes us to assosiate replication commands without any ambiguity.













How to start a Mulitsourced slave?

As you might be aware, MySQL 5.6 introduced crash safe replication. That feature makes sure that recovery happens automatically when the slave is restarted after a crash. Essentially that feature, stores the slave replication state in tables. (Traditionally, these were stored in files called master.info and slave.info). Instead of storing the replication state of several channels of a multisourced slave, in several *.info files. It is best to store the state in tables. These are namely mysql.slave_master_info, mysql.slave_relay_log_info and mysql.slave_worker. So, to start a multisourced slave, the slave repositories should be of type TABLE. When repositories are of type FILE, a multisourced slave cannot be set up.
This is settable by the following options:
 master_info_repository=TABLE, relay_log_info_repository=TABLE in the cnf file.

Threads and Files
For each channel, there is a corresponding IO thread and a set of relay log files and applier threads. These applier thread(s) is just a SQL thread when slave_parallel_workers==0 and is an MTS instance when slave_parallel_workers>0. Currently, number of channels is unlimited but having a configurable parameter would be a best thing to do.
The relaylog files and index files for each channel are named
 ”base-name-relay-bin-channel_name.0000x” where base-name is generally a host name if not specified through a user option.

Commands

We introduced FOR CHANNEL=”<channel_name>"  as an extension to the present replication commands to control/manage the updates received from each master.
If FOR CHANNEL=”<channel_name>" is not given by the user, the command acts on the default channel. The default channel is always an “empty" string and currently, not settable. This is introduced mainly for the backward compatibility for a user who doesn't find a use case for this feature.

Following are the commands for individual channels with a short notes. Other commands
are self explanatory.
  • CHANGE MASTER TO master_def ….. FOR CHANNEL=”<channel_name>”
      Create slave info objects if not already present and set the user parameters for 
       the new channel.
  • START SLAVE [thread_types] [until_option] [connection_options]
                FOR CHANNEL=”<channel_name>"
  • STOP SLAVE [thread_types] FOR CHANNEL=”<channel_name>”
  • RESET SLAVE [ALL] FOR CHANNEL=”<channel_name>”
  • FLUSH RELAY LOGS FOR CHANNEL=”channel_name>”
  • SHOW RELAY LOG EVENTS FOR CHANNEL=”<channel_name>”
  • SELECT MASTER_POS_WAIT(“master_log_file' master_log_pews [, timeout][, channel_name])
    Commands to control multiple channels.
          START SLAVE [thread_type] FOR ALL CHANNELS
          STOP SLAVE [thread_type] FOR ALL CHANNELS

Status monitoring of a Multi sourced slave
How to check the status of each channels and all the other channels?
MySQL 5.7.2 introduced Performance schema tables for replication. We use those tables to look at the overall status of a multi sourced slave. For introduction on Performance schema for replication tables, see Shiv's blog post hereEvery other status variable, currently supports only a default channel (which is of course an “empty” string channel, as mentioned above.)

Multi threaded slaves and Multi source replication
There are two types  of multi threaded slaves in MySQL 5.7.2
  1. Database partition (Inter schema MTS) and
  2. Parent commit sequencing (See, Rohit's blog post here )
 With Multi source replication, there will be an instance of a multi threaded applier for each channel. Both forms of MTS can be used in conjunction with Multi Source replication.

When slave_parallel_workers > 0, an MTS instance for each channel is automatically started.
For example if slave_parallel_workers= k, and then there are M channels, then for each channel there will be a coordinator and k worker threads. In this case there will be (1+k)*M applier threads running.

GTIDs and Multisource replication
Multisource replication supports GTIDs and the AUTO_POSITION protocol. A requirement as is the case with (normal master-slave replication), is that all the masters and the slave should have GTID_MODE=ON.

Semisynchnoous Replication and Multisource replication
When all the masters and the slave are in semisync mode, multi source replication can be used with semisynchronous replication. Currently there is no way to specify a single channel to be in semisync mode.

Filters and Multisource replication
Currently, the replication filters are global in the labs release, ie. The applier threads of all channels will rely on a single global filter. However, there are interesting use cases where these should be settable per channel. Therefore the designs are likely to change here in the area.

Summary:
MySQL Labs contain a feature preview of Multisource replication based on MySQL. 5.7.2.
Try it out and if you have feature requests and comments about the interface or  about the big picture, please note them down in the comments section.


The views expressed on this blog are my own and do not  necessarily reflect the views of Oracle." 


No comments:

Post a Comment