Friday, November 15, 2013

Parallel replication: off by one

One of the most common errors in development is the off-by-one: a loop or a retrieval by index falls short or long by one unit, usually because of an oversight or a logic error in the code.

Of the following snippets, which one will run 10 times?

/* #1 */    for (N = 0 ; N < 10; N++) printf("%d\n", N);

/* #2 */    for (N = 0 ; N <= 10; N++) printf("%d\n", N); 

/* #3 */    for (N = 1 ; N <= 10; N++) printf("%d\n", N);

/* #4 */    for (N = 1 ; N < 10; N++) printf("%d\n", N);

The question is deceptive, as there are two snippets that will run 10 times (#1 and #3). But they will print different numbers: #1 prints 0 to 9, while #3 prints 1 to 10. If you are aiming for numbers from 1 to 10, only #3 is good.
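If you would rather count than reason, here is a minimal sketch that tallies the iterations for each of the four loop shapes. The helper name `count_iterations` and its `inclusive` flag are made up for this illustration; they are not part of the snippets above.

```c
/* Count how many times a loop of the form
 *     for (n = start; n < limit; n++)    when inclusive == 0
 *     for (n = start; n <= limit; n++)   when inclusive != 0
 * executes its body. Hypothetical helper, for illustration only. */
int count_iterations(int start, int limit, int inclusive)
{
    int runs = 0;
    for (int n = start; inclusive ? (n <= limit) : (n < limit); n++)
        runs++;
    return runs;
}
```

Calling `count_iterations(0, 10, 0)` models snippet #1, `count_iterations(0, 10, 1)` models #2, `count_iterations(1, 10, 1)` models #3, and `count_iterations(1, 10, 0)` models #4.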

After many years of programming, I rarely make off-by-one errors, and I can usually spot or prevent them at first sight. That’s why I feel uneasy when I look at the way parallel replication is enabled in MySQL 5.6 and 5.7 and in MariaDB 10.0.5. In both implementations, there is a variable that sets the number of replication threads:

set global slave_parallel_workers=5 in MySQL
set global slave_parallel_threads=5 in MariaDB

Yet, for both implementations, you can set the number of threads to 1, and it is not the same as disabling parallel replication.

set global slave_parallel_workers=1 in MySQL
set global slave_parallel_threads=1 in MariaDB

It will run parallel replication with one thread, meaning that you will have all the overhead of parallel replication with none of the benefits. Not only that, but replication actually slows down. The extra channel reduces performance by 7% in MariaDB and 10% in MySQL.

Now for the punch line. In Tungsten-Replicator, to disable parallel replication you set the number of channels to 1 (the intuitive value). If you set it to 0, the setup fails, as it should, since there would be no replication without channels. The reason the intuitive value works is that in Tungsten, parallel replication was designed around the core functionality, while in MySQL and MariaDB it is an added feature that struggles to be integrated.
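The difference between the two conventions can be modeled in a few lines. This is only a sketch of the semantics described in this post, not code from any of the three products: the enum and the function names are hypothetical, and it assumes that 0 is the value that turns parallel replication off in MySQL and MariaDB.

```c
/* Hypothetical model of what a given thread/channel count means. */
enum repl_mode { REPL_INVALID, REPL_SERIAL, REPL_PARALLEL };

/* MySQL / MariaDB convention (assumed): 0 means parallel replication
 * is off, and any positive value, including 1, goes through the
 * parallel machinery. */
enum repl_mode mode_zero_disables(int workers)
{
    if (workers < 0)
        return REPL_INVALID;
    return workers == 0 ? REPL_SERIAL : REPL_PARALLEL;
}

/* Tungsten convention: 0 channels is rejected, 1 channel is plain
 * serial replication, and more than 1 is parallel. */
enum repl_mode mode_one_disables(int channels)
{
    if (channels < 1)
        return REPL_INVALID;
    return channels == 1 ? REPL_SERIAL : REPL_PARALLEL;
}
```

With the value 1, `mode_zero_disables` reports the parallel path while `mode_one_disables` reports serial replication, which is exactly the one-unit shift this post complains about.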

2 comments:

Jan Steinman said...

Queries that use INT columns in CONCAT together with GROUP BY... WITH ROLLUP appear to have a problem of this sort.

When I use CONCAT to produce a link from the GROUPed BY column, it produces a link to the next sequential row instead!

Jean-François Gagné said...

I like the fact that MySQL and MariaDB allow us to set the number of worker threads to 1: it allows me to easily test the overhead of parallel replication. For me, this looks very transparent.