Motr  M0
SNS copy machine DLD

Overview

This module implements the SNS copy machine using the generic copy machine infrastructure. The SNS copy machine is built upon the request handler service. The same SNS copy machine can be configured to perform multiple tasks, viz. repair and rebalance, using the parity de-clustering layout. The SNS copy machine is typically started during Motr process startup, although it can also be started later.


Definitions

Please refer to the "Definitions" section in "HLD of copy machine and agents" and "HLD of SNS Repair" listed under References.


Requirements

  • r.sns.cm.buffer.acquire The implementation should efficiently provide buffers for the repair as well as re-balance operation without any deadlock.
  • r.sns.cm.sliding.window The implementation should use a sliding window to make efficient use of copy machine resources, e.g. memory and CPU, during copy machine operation.
  • r.sns.cm.sliding.window.init The implementation should efficiently communicate the initial sliding window to other replicas in the cluster.
  • r.sns.cm.sliding.window.update The implementation should efficiently update the sliding window to other replicas during repair.
  • r.sns.cm.data.next The implementation should efficiently select the next data to be processed without causing any deadlock or bottleneck.
  • r.sns.cm.report.progress The implementation should efficiently report overall progress of data restructuring and update corresponding layout information for restructured objects.
  • r.sns.cm.repair.trigger For repair, SNS copy machine should respond to triggers caused by various kinds of failures as mentioned in the HLD of SNS Repair.
  • r.sns.cm.repair.iter For repair, the SNS copy machine iterator should iterate over parity group units on the surviving COBs and accordingly calculate and write the lost data to spare units of the corresponding parity group.
  • r.sns.cm.rebalance.iter For rebalance, SNS copy machine iterator should iterate over the spare units of the repaired parity groups and copy the data from corresponding spare units to the target unit on the new device.

Dependencies

  • r.sns.cm.resources.manage It must be possible to efficiently manage and throttle resources.

    Please refer to the "Dependencies" section in "HLD of copy machine and agents" and "HLD of SNS Repair" listed under References.


Design Highlights

  • SNS copy machine uses request handler service infrastructure.
  • The SNS copy machine specific data structure embeds the generic copy machine and other SNS repair specific objects.
  • SNS copy machine defines its specific aggregation group data structure which embeds generic aggregation group.
  • Once initialised, the SNS copy machine remains idle until a failure is reported.
  • SNS buffer pool provisioning is done when operation starts.
  • SNS copy machine creates copy packets only if free buffers are available in the outgoing buffer pool.
  • Failure triggers SNS copy machine to start repair operation.
  • For multiple nodes, SNS copy machine maintains a local proxy of every other remote replica in the cluster.
  • For multiple nodes, SNS copy machine calculates its initial sliding window and communicates it to other replicas identified by the local proxies through READY FOPs.
  • During the operation, sliding window updates are piggybacked on the outgoing copy packets and their replies.
  • Once the repair operation is complete, the rebalance operation can start if a new device exists corresponding to the lost device. The same copy machine is then configured to perform the re-balance operation.
  • For rebalance, each used spare unit corresponds to exactly one (data or parity) unit on the lost device. The SNS copy machine uses the same layout as used during SNS repair to map a spare unit to the target unit on the new device. The newly added device may have a new UUID, but it has the same index in the pool, and the COB identifiers of the failed device and the replacement device are also the same. Thus for re-balance, the indices of the lost data/parity units on the lost device are reused to write to the newly added device, under the same COB identifier as the failed device.

Logical specification

Component overview

The focus of the SNS copy machine is to efficiently restructure (repair or re-balance) data in case of failures, viz. device, node, etc. The restructuring operation is split into various copy packet phases.

Copy machine setup

SNS copy machine service allocates and initialises the corresponding copy machine.

See also
SNS Repair service for details.

Once the copy machine is initialised, as part of copy machine setup, SNS copy machine specific resources are initialised, viz. the incoming and outgoing buffer pools (m0_sns_cm::sc_ibp and ::sc_obp). Both buffer pools are initialised with a number of colours equal to the total number of localities in the request handler. After cm_setup() is successfully called, the copy machine transitions to the M0_CMS_IDLE state and waits until a failure happens. As mentioned in the HLD, failure information is broadcast to all the replicas in the cluster using a TRIGGER FOP. The FOM corresponding to the TRIGGER FOP activates the SNS copy machine to start the repair operation by invoking m0_cm_start(). This in turn invokes the SNS copy machine specific start routine, which initialises SNS specific data structures.

Once the repair operation is complete, the same copy machine is used to perform the re-balance operation, but only if one or more new devices exist corresponding to the lost device(s). In the re-balance operation, the data from the spare units of the repaired parity groups is copied to the new device using the layout.

Copy machine ready

In this phase, buffers are allocated for the incoming and outgoing SNS copy machine buffer pools.
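The provisioning rule can be pictured with a small model. The following is a minimal sketch, not the Motr buffer pool API: it assumes one free-buffer counter per colour (one colour per request handler locality) and shows that a buffer, and hence a copy packet, is obtained only when the pool has a free buffer of that colour. All sketch_* names and SKETCH_MAX_COLOURS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_MAX_COLOURS = 16 };

/* Hypothetical pool model: one free-buffer counter per colour. */
struct sketch_buf_pool {
    size_t bp_nr_colours;
    size_t bp_free[SKETCH_MAX_COLOURS];
};

/* Setup: create an empty pool with one colour per locality. */
void sketch_pool_init(struct sketch_buf_pool *bp, size_t nr_localities)
{
    size_t i;

    assert(bp != NULL);
    assert(nr_localities > 0 && nr_localities <= SKETCH_MAX_COLOURS);
    bp->bp_nr_colours = nr_localities;
    for (i = 0; i < nr_localities; ++i)
        bp->bp_free[i] = 0;
}

/* Ready: add per_colour buffers to every colour of the pool. */
void sketch_pool_provision(struct sketch_buf_pool *bp, size_t per_colour)
{
    size_t i;

    assert(bp != NULL);
    for (i = 0; i < bp->bp_nr_colours; ++i)
        bp->bp_free[i] += per_colour;
}

/* Take one buffer of the given colour; false means "no copy packet now". */
bool sketch_pool_get(struct sketch_buf_pool *bp, size_t colour)
{
    assert(bp != NULL && colour < bp->bp_nr_colours);
    if (bp->bp_free[colour] == 0)
        return false;
    --bp->bp_free[colour];
    return true;
}

/* Return a buffer of the given colour to the pool. */
void sketch_pool_put(struct sketch_buf_pool *bp, size_t colour)
{
    assert(bp != NULL && colour < bp->bp_nr_colours);
    ++bp->bp_free[colour];
}
```

In this model a caller that gets false from sketch_pool_get() simply retries after a buffer has been returned with sketch_pool_put(), which is one way to avoid blocking while holding other resources.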

Copy machine startup

Initialises and starts the SNS copy machine data iterator.

See also
m0_sns_cm_iter_start()
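The setup, ready and startup steps can be read as a small state machine. The sketch below is an illustration under assumptions, not the generic copy machine implementation: the SKETCH_CMS_* states loosely mirror M0_CMS_IDLE and M0_CMS_READY named in this document, the ACTIVE state and every sketch_* function are hypothetical, and returning to idle on stop is assumed so that the same machine can later run re-balance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; the document itself names only IDLE and READY. */
enum sketch_cm_state {
    SKETCH_CMS_IDLE,
    SKETCH_CMS_READY,
    SKETCH_CMS_ACTIVE
};

enum sketch_cm_op {
    SKETCH_OP_REPAIR,
    SKETCH_OP_REBALANCE
};

struct sketch_cm {
    enum sketch_cm_state sc_state;
    enum sketch_cm_op    sc_op;
};

/* Setup: resources are initialised, then the machine waits in IDLE. */
void sketch_cm_setup(struct sketch_cm *cm)
{
    assert(cm != NULL);
    cm->sc_state = SKETCH_CMS_IDLE;
    cm->sc_op    = SKETCH_OP_REPAIR;
}

/* Ready: a trigger selects the operation; buffers would be provisioned here. */
bool sketch_cm_ready(struct sketch_cm *cm, enum sketch_cm_op op)
{
    assert(cm != NULL);
    if (cm->sc_state != SKETCH_CMS_IDLE)
        return false;
    cm->sc_op    = op;
    cm->sc_state = SKETCH_CMS_READY;
    return true;
}

/* Start: the data iterator would be initialised and started here. */
bool sketch_cm_start(struct sketch_cm *cm)
{
    assert(cm != NULL);
    if (cm->sc_state != SKETCH_CMS_READY)
        return false;
    cm->sc_state = SKETCH_CMS_ACTIVE;
    return true;
}

/* Stop: go back to IDLE so the same machine can be reused (assumption). */
bool sketch_cm_stop(struct sketch_cm *cm)
{
    assert(cm != NULL);
    if (cm->sc_state != SKETCH_CMS_ACTIVE)
        return false;
    cm->sc_state = SKETCH_CMS_IDLE;
    return true;
}
```

A repair followed by a re-balance would then be two passes through ready, start and stop on the same struct sketch_cm, differing only in the operation passed to sketch_cm_ready().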

Copy machine data iterator

The SNS copy machine implements an iterator to efficiently select the next data to process. This is done by implementing the copy machine specific operation m0_cm_ops::cmo_data_next(). The following pseudo code illustrates the SNS data iterator for the repair as well as the re-balance operation:

- for each GOB G in aux-db (in global fid order)
  - fetch layout L for G
  // proceed in parity group order.
  - for each parity group S, until eof of G
    - map group S to COB list
    // determine whether group S needs reconstruction.
    - if no COB.containerid is in the failure set, continue to the next group
    // group has to be reconstructed, create copy packets for all local units
    - if REPAIR
      - for each data and parity unit U in S (0 <= U < N + K)
    - if RE-BALANCE
      - for each spare unit U in S (N + K <= U < N + 2K)
    - for each unit U selected above
      - map (S, U) -> (COB, F) by L
      - if COB is local and COB.containerid does not belong to the failure set
        - fetch frame F of COB

The above iterator iterates through each GOB in aggregation group (parity group) order, so that the copy packet transformation doesn't block. Thus, for the SNS repair operation, only the data and parity units of parity groups affected by the lost device are iterated, whereas for the SNS re-balance operation only the spare units of the repaired parity groups are iterated.
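The unit selection in the pseudo code can be sketched for a single parity group as follows. This is a simplified model under assumptions: units are indexed 0 to N + 2K - 1 (N data, K parity, K spare), the failed[] and local[] arrays stand in for the layout mapping and failure set lookups, and all sketch_* names are hypothetical rather than part of the SNS iterator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_op { SKETCH_REPAIR, SKETCH_REBALANCE };

/* Half-open range [ur_first, ur_end) of unit indices in a parity group. */
struct sketch_unit_range {
    uint32_t ur_first;
    uint32_t ur_end;
};

/* Repair walks data and parity units; re-balance walks spare units. */
struct sketch_unit_range sketch_iter_range(enum sketch_op op,
                                           uint32_t N, uint32_t K)
{
    struct sketch_unit_range r;

    assert(N > 0 && K > 0);
    if (op == SKETCH_REPAIR) {
        r.ur_first = 0;
        r.ur_end   = N + K;
    } else {
        r.ur_first = N + K;
        r.ur_end   = N + 2 * K;
    }
    return r;
}

/*
 * Count the units of one group that the local iterator would fetch.
 * failed[u] and local[u] describe the COB holding unit u, 0 <= u < N + 2K.
 */
uint32_t sketch_units_to_fetch(enum sketch_op op, uint32_t N, uint32_t K,
                               const bool *failed, const bool *local)
{
    struct sketch_unit_range r = sketch_iter_range(op, N, K);
    uint32_t total = N + 2 * K;
    uint32_t nr = 0;
    uint32_t u;
    bool     needs_reconstruction = false;

    assert(failed != NULL && local != NULL);
    /* A group with no unit in the failure set is skipped entirely. */
    for (u = 0; u < total; ++u)
        if (failed[u])
            needs_reconstruction = true;
    if (!needs_reconstruction)
        return 0;
    /* Fetch only local units whose COB is not in the failure set. */
    for (u = r.ur_first; u < r.ur_end; ++u)
        if (local[u] && !failed[u])
            ++nr;
    return nr;
}
```

For example, with N = 2, K = 1, all units local and unit 1 on the failed device, the repair pass would fetch units 0 and 2, and the re-balance pass would fetch the single spare unit 3.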

Copy machine sliding window

The SNS copy machine implements the sliding window using the struct m0_cm::cm_aggr_grps_in list for aggregation groups having incoming copy packets. The SNS copy machine implements the copy machine specific m0_cm_ops::cmo_ag_next() operation to calculate the next relevant aggregation group identifier. The following algorithm illustrates the implementation of m0_cm_ops::cmo_ag_next():

1) extract GOB (file identifier) G from the given aggregation group identifier A
2) extract parity group identifier P from A
3) increment P to process the next group
4) if G is valid (i.e. G is not any of the reserved file identifiers, e.g. M0_COB_ROOT_FID)

  • fetch layout and file size for G
  • calculate total number of parity groups Sn for G
  • for each parity group P' until eof of G (P <= P' < Sn)
    • set up aggregation group identifier A' using G and P'
    • if P' is a relevant aggregation group (has a spare unit on any of the local COBs)
      • if the copy machine has space (has enough buffers for all the incoming copy packets)
        • return A'

5) otherwise (G is not valid, or no relevant group remains in G), reset P to 0, fetch the next G from aux-db and repeat from step 4

m0_cm_ops::cmo_ag_next() is invoked from m0_cm_ag_advance() in a loop until it returns a valid next relevant aggregation group identifier.
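The numbered steps above can be sketched over an in-memory model. This is an illustration under assumptions, not the m0_cm_ops::cmo_ag_next() implementation: files are held in an array in fid order instead of being fetched from aux-db, f_relevant[] stands in for the "has a spare unit on a local COB" check, has_space stands in for the buffer availability check, and all sketch_* names and return codes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_ag_rc { SKETCH_AG_OK, SKETCH_AG_NOSPACE, SKETCH_AG_DONE };

/* Hypothetical stand-in for a GOB and its layout information. */
struct sketch_file {
    bool        f_reserved;  /* reserved fid, never restructured */
    uint64_t    f_nr_groups; /* Sn: total number of parity groups */
    const bool *f_relevant;  /* f_relevant[p]: group p is relevant */
};

/* Aggregation group identifier: (file index, parity group number). */
struct sketch_ag_id {
    size_t   ai_file;
    uint64_t ai_group;
};

enum sketch_ag_rc sketch_ag_next(const struct sketch_file *files,
                                 size_t nr_files, struct sketch_ag_id cur,
                                 bool has_space, struct sketch_ag_id *next)
{
    size_t   g = cur.ai_file;      /* step 1: extract G */
    uint64_t p = cur.ai_group + 1; /* steps 2 and 3: extract and bump P */

    assert(files != NULL && next != NULL);
    /* Step 5 is the loop increment: move to the next file, reset P. */
    for (; g < nr_files; ++g, p = 0) {
        const struct sketch_file *f = &files[g];

        if (f->f_reserved) /* step 4: skip reserved identifiers */
            continue;
        for (; p < f->f_nr_groups; ++p) {
            if (!f->f_relevant[p])
                continue;
            if (!has_space)
                return SKETCH_AG_NOSPACE;
            next->ai_file  = g;
            next->ai_group = p;
            return SKETCH_AG_OK;
        }
    }
    return SKETCH_AG_DONE;
}
```

A caller in the role of m0_cm_ag_advance() would invoke sketch_ag_next() repeatedly, feeding each returned identifier back in as cur, and would stop advancing the window when SKETCH_AG_NOSPACE or SKETCH_AG_DONE is returned.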

Copy machine stop

Once all the COBs (i.e. component objects) corresponding to the GOBs (i.e. global file objects) belonging to the failure set have been successfully restructured (repaired or re-balanced) by every replica in the cluster, the restructuring operation is marked complete.

Threading and Concurrency Model

The SNS copy machine is implemented as a request handler service, thus it shares the request handler threading model and does not create its own threads. All the copy machine operations are performed in the context of request handler threads.

The SNS copy machine uses the generic copy machine infrastructure, which implements the copy machine state machine using the generic Motr state machine infrastructure.

See also
State machine

Locking

All the updates to members of the copy machine are done with m0_cm_lock() held.

NUMA optimizations

N/A


Conformance

i.sns.cm.buffer.acquire SNS copy machine implements its incoming and outgoing buffer pools. The outgoing buffer pool is used to create copy packets. The respective buffer pools are provisioned during the start of the copy machine operation.

i.sns.cm.sliding.window SNS copy machine implements the sliding window using the struct m0_cm::cm_aggr_grps_in list for aggregation groups having incoming copy packets.

i.sns.cm.sliding.window.init SNS copy machine calculates and communicates the initial sliding window in M0_CMS_READY phase through READY FOPs.

i.sns.cm.sliding.window.update SNS copy machine piggybacks the sliding window on every outgoing copy packet during the operation.

i.sns.cm.data.next SNS copy machine implements a next function using cob name space iterator and pdclust layout infrastructure to select the next data to be repaired from the failure set. This is done in GOB fid and parity group order.

i.sns.cm.report.progress Progress is reported using sliding window and layout updates.

i.sns.cm.repair.trigger Various failures are reported through TRIGGER FOPs, which create corresponding FOMs. The FOMs invoke SNS specific copy machine operations through generic copy machine interfaces, which cause copy machine state transitions.

i.sns.cm.repair.iter For repair, the SNS copy machine iterator iterates over parity group units on the surviving COBs and accordingly calculates and writes the lost data to spare units of the corresponding parity group using the layout.

i.sns.cm.rebalance.iter For rebalance, the SNS copy machine iterator iterates over only the spare units of the repaired parity groups and copies the data to the corresponding target units on the new device.


Unit tests

Copy packet specific tests

Test01: If an aggregation group has a single copy packet, then the transformation function should be a NO-OP.
Test02: Test if all copy packets of an aggregation group get collected.
Test03: Test the transformation function. Input: two bufvecs, src and dest, to be XORed. Output: XORed output stored in the dest bufvec.
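The behaviour checked by Test01 and Test03 can be illustrated with plain byte arrays. This is a minimal sketch, assuming flat uint8_t buffers of equal length in place of Motr bufvecs; sketch_xor() and sketch_transform() are hypothetical names, not the copy packet transformation functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of src into dest: dest[i] ^= src[i]. */
void sketch_xor(uint8_t *dest, const uint8_t *src, size_t len)
{
    size_t i;

    assert(dest != NULL && src != NULL);
    for (i = 0; i < len; ++i)
        dest[i] ^= src[i];
}

/*
 * Fold nr equally sized buffers into bufs[0].
 * With nr == 1 the loop body never runs, so the transformation is a no-op.
 */
void sketch_transform(uint8_t *const *bufs, size_t nr, size_t len)
{
    size_t i;

    assert(bufs != NULL && nr > 0);
    for (i = 1; i < nr; ++i)
        sketch_xor(bufs[0], bufs[i], len);
}
```

With dest = {0x0f, 0xf0, 0xaa} and src = {0xff, 0xff, 0x55}, folding the two buffers leaves dest = {0xf0, 0x0f, 0xff} and src unchanged, while folding a single buffer leaves it as it was.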

System tests

N/A


Analysis

N/A


References

The following are the documents from which this design is derived. For documentation links, please refer to doc/motr-design-doc-list.rst.

  • Copy Machine redesign
  • HLD of copy machine and agents
  • HLD of SNS Repair