|
M0_INTERNAL int | m0_fd_tolerance_check (struct m0_conf_pver *pv, uint32_t *failure_level) |
|
M0_INTERNAL int | m0_fd_tile_build (const struct m0_conf_pver *pv, struct m0_pool_version *pool_ver, uint32_t *failure_level) |
|
M0_INTERNAL void | m0_fd_tile_destroy (struct m0_fd_tile *tile) |
|
M0_INTERNAL void | m0_fd_src_to_tgt (const struct m0_fd_tile *tile, const struct m0_pdclust_src_addr *src, struct m0_pdclust_tgt_addr *tgt) |
|
M0_INTERNAL void | m0_fd_tgt_to_src (const struct m0_fd_tile *tile, const struct m0_pdclust_tgt_addr *tgt, struct m0_pdclust_src_addr *src) |
|
M0_INTERNAL int | m0_fd_tree_build (const struct m0_conf_pver *pv, struct m0_fd_tree *tree) |
|
M0_INTERNAL void | m0_fd_tree_destroy (struct m0_fd_tree *tree) |
|
M0_INTERNAL void | m0_fd_fwd_map (struct m0_pdclust_instance *pi, const struct m0_pdclust_src_addr *src, struct m0_pdclust_tgt_addr *tgt) |
|
M0_INTERNAL void | m0_fd_bwd_map (struct m0_pdclust_instance *pi, const struct m0_pdclust_tgt_addr *tgt, struct m0_pdclust_src_addr *src) |
|
M0_INTERNAL int | m0_fd_perm_cache_init (struct m0_fd_perm_cache *cache, uint64_t len) |
|
M0_INTERNAL void | m0_fd_perm_cache_fini (struct m0_fd_perm_cache *cache) |
|
Overview
A parity declustered layout divides a file into a collection of parity groups. Each parity group has N data units, K parity units, and S spare units, where S >= K. The layout declusters these parity groups across the available pool of hardware resources in such a way that:
- the load per hardware resource is minimal when any one hardware resource fails.
- the load across all hardware resources is as uniform as possible.
A pool of hardware resources includes (but is not restricted to) racks, enclosures, controllers, and disks. These resources are arranged hierarchically as a tree. We call each of these resources a failure domain.
Definitions
- Failure domain: Any hardware resource whose failure can cause loss of file data.
- Failure domain tree: The hierarchical topology in which failure domains are arranged.
- Tolerance constraint: A vector whose length equals the height of the failure domain tree; its i-th member is the number of failures expected to be tolerated at level i of the tree.
- Base tile: The arrangement obtained when the parity groups of a tile are laid down sequentially over the available pool of targets.
- Fault tolerant tile: A fault-tolerant permutation of the base tile that applies to all tiles across all files.
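The base tile definition can be illustrated with a small sketch. It assumes a row-major fill: units are numbered group by group and placed on consecutive targets, wrapping to the next frame after the last target. The types `src_addr`, `tgt_addr`, and `base_tile` and both functions are hypothetical stand-ins, not the Motr structures or the fault tolerant mapping itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the source and target address types. */
struct src_addr { uint64_t group; uint64_t unit; };
struct tgt_addr { uint64_t frame; uint64_t target; };

/* Hypothetical geometry: P targets, G = N + K + S units per parity group. */
struct base_tile { uint32_t P; uint32_t G; };

/* Place unit `unit` of group `group` by filling targets sequentially. */
static void base_src_to_tgt(const struct base_tile *t,
                            const struct src_addr *src,
                            struct tgt_addr *tgt)
{
    assert(t->P > 0 && src->unit < t->G);
    uint64_t pos = src->group * t->G + src->unit; /* linear unit index */
    tgt->frame  = pos / t->P;
    tgt->target = pos % t->P;
}

/* Inverse of base_src_to_tgt(): recover group and unit from a position. */
static void base_tgt_to_src(const struct base_tile *t,
                            const struct tgt_addr *tgt,
                            struct src_addr *src)
{
    assert(t->G > 0 && tgt->target < t->P);
    uint64_t pos = tgt->frame * t->P + tgt->target;
    src->group = pos / t->G;
    src->unit  = pos % t->G;
}
```

With P = 5 and G = 4, unit 2 of group 1 has linear index 6 and therefore lands on target 1 in frame 1 under this assumed layout.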
Design Highlights
We aim to address two key issues:
- ensure that parity groups are declustered across the pool of resources such that the user-provided tolerance constraint is supported.
- distribute data so that load is balanced during IO and repair.
The failure domains algorithm achieves the first goal by creating a fault tolerant tile whose mapping is common to all tiles across all files. This tile is created only once, when the pool version is built. The second goal is attained by applying a sequence of cascaded permutations to units from the fault tolerant tile, one at each level of the failure domains tree. These permutations are applied when IO or SNS repair needs to map a parity group and unit to a target and frame (and vice versa).
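The idea of one permutation per tree level can be sketched as follows. This is a simplified illustration, not the algorithm in fd.c: it assumes a symmetric tree in which every node at level i has `fanout[i]` children, and it uses one fixed permutation per level, whereas a real implementation would be expected to vary the permutations. All names (`perm_tree`, `cascade_fwd`, `cascade_bwd`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_DEPTH = 4, MAX_FANOUT = 8 };

/* Hypothetical symmetric tree with one permutation of child indices
 * per level; perm[i] must be a permutation of 0..fanout[i]-1. */
struct perm_tree {
    uint32_t depth;
    uint32_t fanout[MAX_DEPTH];
    uint32_t perm[MAX_DEPTH][MAX_FANOUT];
};

/* Split a leaf index into one child index per level (mixed radix). */
static void split_levels(const struct perm_tree *t, uint32_t leaf,
                         uint32_t *digit)
{
    assert(t->depth > 0 && t->depth <= MAX_DEPTH);
    for (uint32_t i = t->depth; i-- > 0; ) {
        assert(t->fanout[i] > 0 && t->fanout[i] <= MAX_FANOUT);
        digit[i] = leaf % t->fanout[i];
        leaf /= t->fanout[i];
    }
    assert(leaf == 0); /* the index must address a leaf of the tree */
}

/* Forward map: permute the child index at every level, top down. */
static uint32_t cascade_fwd(const struct perm_tree *t, uint32_t leaf)
{
    uint32_t digit[MAX_DEPTH];
    uint32_t out = 0;

    split_levels(t, leaf, digit);
    for (uint32_t i = 0; i < t->depth; i++)
        out = out * t->fanout[i] + t->perm[i][digit[i]];
    return out;
}

/* Backward map: apply the inverse permutation at every level. */
static uint32_t cascade_bwd(const struct perm_tree *t, uint32_t leaf)
{
    uint32_t digit[MAX_DEPTH];
    uint32_t out = 0;

    split_levels(t, leaf, digit);
    for (uint32_t i = 0; i < t->depth; i++) {
        uint32_t j = 0;
        while (t->perm[i][j] != digit[i]) {
            j++;
            assert(j < t->fanout[i]); /* perm[i] must be a permutation */
        }
        out = out * t->fanout[i] + j;
    }
    return out;
}
```

Because each level is permuted independently, the backward map only has to invert each level's permutation, which is why a forward and a backward mapping can share the same per-level data.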
HLD: For documentation links, please refer to doc/motr-design-doc-list.rst.
Requirements
r.conf.pool.pool_version.fault_tolerant_permutation: The implementation must generate a fault tolerant permutation of the base tile that guarantees the required tolerance constraint for failure domains.
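To make the notion of a tolerance constraint concrete, here is a sketch of a feasibility check under a deliberately simplified model. It assumes that the G = N + K + S units of a parity group are spread as evenly as possible over the failure domains of each level, so one domain holds at most ceil(G / domains) units, and that a level is feasible when losing the tolerated number of domains destroys at most K units. This model and the function `tolerance_feasible` are assumptions for illustration; the check performed by m0_fd_tolerance_check() may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not the check in fd.c):
 *   domains[i]   - number of failure domains at level i
 *   tolerance[i] - failures to be tolerated at level i
 *   G            - units per parity group, N + K + S
 *   K            - parity units per group
 * Returns 0 if every level is feasible, or -EINVAL with *failure_level
 * set to the first level whose tolerance cannot be met.
 */
static int tolerance_feasible(const uint32_t *domains,
                              const uint32_t *tolerance,
                              uint32_t depth, uint32_t G, uint32_t K,
                              uint32_t *failure_level)
{
    for (uint32_t i = 0; i < depth; i++) {
        assert(domains[i] > 0);
        /* Most units of one group that a single domain can hold. */
        uint32_t per_domain = (G + domains[i] - 1) / domains[i];

        if ((uint64_t)tolerance[i] * per_domain > K) {
            *failure_level = i;
            return -EINVAL;
        }
    }
    return 0;
}
```

For example, with 2 racks, 4 enclosures, and 8 disks, G = 8 and K = 2, the vector {0, 1, 2} is feasible in this model, while {1, 1, 2} fails at level 0 because a single rack would hold 4 units of a group.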
Dependencies
Pools in confc: Pools in confc are required in order to create a fault-tolerant tile.
◆ m0_fd_bwd_map()
Maps a target and frame from the pool version to the appropriate parity group and unit.
- Parameters
-
[in] | pi | Parity declustered layout instance for a particular file. |
[in] | tgt | Target and frame to be mapped. |
[out] | src | Parity group and unit to which tgt maps. |
Definition at line 959 of file fd.c.
◆ m0_fd_fwd_map()
Maps the source parity group and unit to the appropriate target and frame from the pool version.
- Parameters
-
[in] | pi | Parity declustered layout instance for a particular file. |
[in] | src | Parity group and unit to be mapped. |
[out] | tgt | Target and frame to which src maps. |
Definition at line 838 of file fd.c.
◆ m0_fd_perm_cache_fini()
Frees the internal arrays from the permutation cache.
Definition at line 825 of file fd.c.
◆ m0_fd_perm_cache_init()
M0_INTERNAL int m0_fd_perm_cache_init (struct m0_fd_perm_cache *cache, uint64_t len)
Initializes the permutation cache to consistent values.
- Parameters
-
[in] | len | Total elements present in the cache. |
Definition at line 741 of file fd.c.
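The init/fini pairing can be sketched as below. The structure `perm_cache` and its fields are hypothetical; the real layout of struct m0_fd_perm_cache is not shown in this section. The sketch also assumes that "consistent values" means starting from the identity permutation, which is an illustration rather than a statement about fd.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache: a permutation of len elements and its inverse. */
struct perm_cache {
    uint64_t  len;
    uint32_t *permute;
    uint32_t *inverse;
};

/* Allocate both arrays and start from the identity permutation. */
static int perm_cache_init(struct perm_cache *c, uint64_t len)
{
    assert(c != NULL && len > 0);
    c->permute = malloc(len * sizeof *c->permute);
    c->inverse = malloc(len * sizeof *c->inverse);
    if (c->permute == NULL || c->inverse == NULL) {
        free(c->permute);
        free(c->inverse);
        c->permute = c->inverse = NULL;
        c->len = 0;
        return -ENOMEM;
    }
    c->len = len;
    for (uint64_t i = 0; i < len; i++)
        c->permute[i] = c->inverse[i] = (uint32_t)i;
    return 0;
}

/* Free the internal arrays and leave the cache in an empty state. */
static void perm_cache_fini(struct perm_cache *c)
{
    free(c->permute);
    free(c->inverse);
    c->permute = c->inverse = NULL;
    c->len = 0;
}
```

Resetting the pointers in the fini routine keeps an accidental second call harmless, since free(NULL) is a no-op.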
◆ m0_fd_src_to_tgt()
Returns the target and frame at which the input unit from a given parity group is located.
- Parameters
-
[in] | tile | The failure permutation. |
[in] | src | Parity group and a unit within it to be located in the tile. |
[out] | tgt | Target and frame associated with the input src. |
Definition at line 547 of file fd.c.
◆ m0_fd_tgt_to_src()
Returns the parity group and unit located at given target and frame.
- Parameters
-
[in] | tile | The failure permutation. |
[in] | tgt | The target and frame co-ordinates for the required parity group and unit. |
[out] | src | Parity group and unit located at given target and frame. |
Definition at line 571 of file fd.c.
◆ m0_fd_tile_build()
Allocates and prepares a fault tolerant tile for the input pool version. Reports infeasibility if the tolerance constraint for the pool version cannot be met.
- Parameters
-
[in] | pv | The pool version present in configuration. |
[in] | pool_ver | In-memory representation of pv. |
[out] | failure_level | Holds the level for which the required tolerance cannot be met. On success, holds the depth of the symmetric tree formed from the input tree. |
- Return values
-
0 | On success. |
-EINVAL | When required tolerance can not be met. |
-ENOMEM | When a tile can not be allocated. |
Definition at line 260 of file fd.c.
◆ m0_fd_tile_destroy()
M0_INTERNAL void m0_fd_tile_destroy (struct m0_fd_tile *tile)
Frees the memory allocated for the fault tolerant tile.
Definition at line 590 of file fd.c.
◆ m0_fd_tolerance_check()
M0_INTERNAL int m0_fd_tolerance_check (struct m0_conf_pver *pv, uint32_t *failure_level)
Checks the feasibility of required tolerance for various failure domains.
- Parameters
-
[in] | pv | The pool version in the configuration for which the tolerance of failure domains is to be checked. m0_conf_pver::pv_u.subtree.pvs_tolerance holds the required tolerances. |
[out] | failure_level | Indicates the level for which the input tolerance cannot be supported when the function returns -EINVAL. In all other cases its value is undefined. |
- Return values
-
0 | On success. |
-EINVAL | When tolerance can not be met. |
-ENOMEM | When system is out of memory. |
- Precondition
- The configuration cache should be locked.
Definition at line 250 of file fd.c.
◆ m0_fd_tree_build()
Builds a failure domains tree using the input pool version object from the configuration schema. Levels for which no failure can be tolerated are not present in this tree.
- Parameters
-
[in] | pv | Configuration object corresponding to the pool version. |
[out] | tree | Holds the failure domains tree created using the pool version. |
Definition at line 598 of file fd.c.
◆ m0_fd_tree_destroy()
M0_INTERNAL void m0_fd_tree_destroy (struct m0_fd_tree *tree)
Deallocates all the nodes from the tree.
Definition at line 785 of file fd.c.