This document explains the detailed level design for the generic part of the copy machine module.
Please refer to the "Definitions" section in "HLD of copy machine and agents" in the References.
The requirements below are grouped by various milestones.
The complete data restructuring process of the copy machine follows the non-blocking processing model of the Motr design.
The copy machine maintains a list of the aggregation groups being processed and implements a sliding window over this list to track restructuring progress and manage resources efficiently.
Please refer to the "Logical Specification" section in "HLD of copy machine and agents" in the References.
After the copy machine is successfully initialised (m0_cm_init()), it is configured as part of the copy machine service startup by invoking m0_cm_setup(). This performs copy machine specific setup by invoking m0_cm_ops::cmo_setup(). Once setup completes successfully, the copy machine transitions into the M0_CMS_IDLE state. In case of setup failure, the copy machine transitions to the M0_CMS_FAIL state; this also fails the copy machine service startup, and the copy machine is thus finalised during copy machine service finalisation.
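The setup sequence above can be sketched as follows. This is a minimal, hypothetical model, not Motr code: cm_setup_sketch(), struct cm and the stub operations are invented for illustration, and only the two states relevant to setup are modelled (the real states and operations live in m0_cm and m0_cm_ops).

```c
#include <assert.h>

/* Hypothetical, simplified model of copy machine setup.  Only the
 * states reachable from setup are represented here. */
enum cm_state { CMS_INIT, CMS_IDLE, CMS_FAIL };

struct cm;
struct cm_ops {
	/* Stands in for m0_cm_ops::cmo_setup(). */
	int (*cmo_setup)(struct cm *cm);
};

struct cm {
	enum cm_state        c_state;
	const struct cm_ops *c_ops;
};

/* Mirrors the flow described above: on successful type-specific
 * setup the machine goes to IDLE, otherwise to FAIL (which in turn
 * fails the service startup). */
static int cm_setup_sketch(struct cm *cm)
{
	int rc = cm->c_ops->cmo_setup(cm);

	cm->c_state = rc == 0 ? CMS_IDLE : CMS_FAIL;
	return rc;
}

/* Illustrative stub implementations of cmo_setup(). */
static int setup_ok(struct cm *cm)   { (void)cm; return 0; }
static int setup_fail(struct cm *cm) { (void)cm; return -1; }

static const struct cm_ops ok_ops   = { .cmo_setup = setup_ok };
static const struct cm_ops fail_ops = { .cmo_setup = setup_fail };
```

The point of the sketch is the single decision: the result of the type-specific cmo_setup() callback alone selects between IDLE and FAIL.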
Initialise the local sliding window and the sliding window persistent store. Persist the sliding window after initialisation and proceed to the READY phase.
In the case of multiple nodes, every copy machine replica allocates an instance of struct m0_cm_proxy representing a particular remote replica and establishes an rpc connection and session with it. See Copy machine proxy for more details. After successfully establishing the rpc connections, the copy machine specific m0_cm_ops::cmo_ready() operation is invoked to further set up the specific copy machine data structures. After creating the proxies representing the remote replicas, a READY FOP is allocated for each remote replica and initialised with the calculated local sliding window. The copy machine then broadcasts these READY FOPs to every remote replica using the rpc connection in the corresponding m0_cm_proxy. A READY FOP is a one-way fop and thus does not have a reply associated with it. Once every replica has received READY FOPs from all the corresponding remote replicas, the copy machine proceeds to the START phase.
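Since READY FOPs are one-way and have no replies, each replica must itself count the READY FOPs arriving from its peers to decide when to enter the START phase. The following sketch illustrates that bookkeeping; struct ready_tracker and ready_fop_received() are hypothetical names, not part of the Motr API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical READY-phase bookkeeping: a replica broadcasts one
 * READY FOP per remote replica (one per m0_cm_proxy) and counts the
 * READY FOPs it receives in return.  When one READY has arrived from
 * every remote replica, the machine may proceed to START. */
struct ready_tracker {
	unsigned rt_nr_proxies;     /* number of remote replicas */
	unsigned rt_nr_ready_rcvd;  /* READY FOPs received so far */
};

/* Invoked for every incoming READY FOP; returns true when READY FOPs
 * from all remote replicas have been seen. */
static bool ready_fop_received(struct ready_tracker *rt)
{
	++rt->rt_nr_ready_rcvd;
	return rt->rt_nr_ready_rcvd == rt->rt_nr_proxies;
}
```

Counting, rather than waiting for replies, is what makes the one-way FOP design work: the phase transition is driven purely by received broadcasts.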
After the copy machine service is successfully started, it is ready to perform its respective tasks (e.g. SNS repair). On receiving a trigger event (i.e. a failure, in the case of SNS repair), the copy machine transitions into the M0_CMS_ACTIVE state once the copy machine specific startup tasks are complete (m0_cm_start()). In case of copy machine startup failure, the copy machine transitions into the M0_CMS_FAIL state; once the failure is handled, the copy machine transitions back into the M0_CMS_IDLE state and waits for further events.
Copy machine implements a special FOM type, viz. the copy packet pump FOM (m0_cm::cm_cp_pump), to create copy packets.
After creating the initial number of copy packets, the copy machine broadcasts READY FOPs with its corresponding sliding window information to all its replicas in the pool. Every copy machine replica, after receiving READY FOPs from all its replicas in the pool, transitions into the M0_CMS_ACTIVE state.
The generic copy machine infrastructure provides the data structures and interfaces used to implement the sliding window. The copy machine sliding window is based on aggregation group identifiers. The copy machine maintains two lists of aggregation groups: i) aggregation groups having only outgoing copy packets, viz. m0_cm::cm_aggr_grps_out; ii) aggregation groups having only incoming copy packets, viz. m0_cm::cm_aggr_grps_in.
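A window over aggregation group identifiers reduces to tracking its low and high endpoints as groups are admitted. The sketch below illustrates this with a single-integer identifier; the real identifier is a larger structured id, and struct ag_id, struct sw and sw_update() here are simplifications invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified aggregation group identifier.  A single
 * integer is enough to show how the window endpoints are maintained
 * over the aggregation group lists (cf. m0_cm::cm_aggr_grps_in and
 * m0_cm::cm_aggr_grps_out). */
struct ag_id { uint64_t ai_id; };

/* Total order on identifiers: <0, 0, >0 like strcmp(). */
static int ag_id_cmp(const struct ag_id *a, const struct ag_id *b)
{
	return (a->ai_id > b->ai_id) - (a->ai_id < b->ai_id);
}

/* The sliding window [lo, hi]: lo is the smallest and hi the largest
 * aggregation group id admitted so far. */
struct sw { struct ag_id sw_lo, sw_hi; };

/* Widen the window to cover a newly admitted aggregation group. */
static void sw_update(struct sw *sw, const struct ag_id *id)
{
	if (ag_id_cmp(id, &sw->sw_hi) > 0)
		sw->sw_hi = *id;
	if (ag_id_cmp(id, &sw->sw_lo) < 0)
		sw->sw_lo = *id;
}
```

Keeping only the endpoints is what makes the window cheap to exchange in READY FOPs and to persist.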
Updating the local sliding window and saving it to the persistent store are performed asynchronously by the sliding window update FOM. See Sliding window update fom.
Sliding window persistence

The copy machine sliding window is an in-memory data structure used to keep track of the progress of an operation. When a failure happens, e.g. a software or node crash, this in-memory sliding window information is lost, and the copy machine has no way to resume the operation at the point of failure. To solve this problem, the copy machine transactionally stores some information about the completed operations on persistent storage. When the node and/or copy machine restarts after a failure, it reads this information from persistent storage and resumes its operations.
The following information is stored on persistent storage: i) the copy machine id, i.e. struct m0_cm::cm_id; ii) the id of the last completed aggregation group.
This information is stored in BE. It is also inserted into the BE dictionary with the key "CM ${ID}". The copy machine can find a pointer to this information in the BE dictionary using this key.
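The "CM ${ID}" key can be formed as below. This is a trivial sketch: cm_be_key() is a hypothetical helper, and a plain integer stands in for the copy machine id; only the key format itself comes from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper forming the BE dictionary key under which the
 * persistent copy machine information is inserted, following the
 * "CM ${ID}" format described above.  cm_id stands in for
 * struct m0_cm::cm_id. */
static void cm_be_key(char *buf, size_t len, uint64_t cm_id)
{
	snprintf(buf, len, "CM %llu", (unsigned long long)cm_id);
}
```

Keying the record by copy machine id lets each copy machine instance on a node locate its own persistent state independently.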
The following interfaces are provided to manage this information:
These interfaces are used in various copy machine operations to manage the persistent information. For example, m0_cm_sw_store_load() is used in the copy machine start routine to check whether a previous unfinished operation is in progress. m0_cm_sw_store_update() is called when the sliding window advances. m0_cm_sw_store_complete() is called when a copy machine operation completes; in this case, the stored AG id is deleted from storage, to indicate that the operation has completed successfully. If a node failure happens after this point and the node restarts, the load from storage returns -ENOENT, indicating that no copy machine operation is pending.
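The decision made in the start routine can be sketched as follows. The stub below only models the two documented outcomes of m0_cm_sw_store_load() (0 with a stored sliding window, or -ENOENT when none exists); struct sw_stub, sw_store_load_stub() and cm_start_pos() are invented names for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the persistent sliding window store. */
struct sw_stub {
	bool     present;   /* is a sliding window stored? */
	uint64_t last_ag;   /* last completed aggregation group id */
};

/* Models m0_cm_sw_store_load(): returns 0 and the stored window, or
 * -ENOENT when no pending operation was recorded. */
static int sw_store_load_stub(const struct sw_stub *store, uint64_t *out)
{
	if (!store->present)
		return -ENOENT;
	*out = store->last_ag;
	return 0;
}

/* Start-routine decision: resume from the stored aggregation group
 * after a failure, or start a fresh operation from scratch (modelled
 * here as aggregation group 0).  Other errors are propagated. */
static int cm_start_pos(const struct sw_stub *store, uint64_t *start_ag)
{
	int rc = sw_store_load_stub(store, start_ag);

	if (rc == -ENOENT) {       /* no pending operation: start anew */
		*start_ag = 0;
		rc = 0;
	}
	return rc;
}
```

Note how -ENOENT is not an error here: it is the normal signal that the previous operation, if any, ran m0_cm_sw_store_complete() and finished cleanly.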
The call sequence of interface and sliding window update FOM execution is as below:
                  |
                  |
                  V
     ------------------------------------
     | m0_cm_sw_store_load():           |
     | Read sliding window from         |
     | persistent store to continue     |
     | from any previously pending      |
     | repair operation. Also start     |
     | sliding window update FOM.       |
     ------------------------------------
               /            \
     ret == 0 /              \ ret == -ENOENT
    A restart from failure.   A fresh new operation.
    A valid sw is returned.   No sw info on storage.
             /                  \
            V                    V
    -----------------------------   ----------------------------
    | setup sliding window with |   | m0_cm_sw_store_init():   |
    | the returned sw.          |   | Allocate a persistent sw |
    | CM operation will start   |   | and init it to zero. CM  |
    | from this sw.             |   | starts from scratch.     |
    -----------------------------   ----------------------------
             \                        /
              \                      /
               V     SWU_STORE      V
         ---------------------------    operation completed
     --> | m0_cm_sw_store_update() | ------------------------>
     |   ---------------------------                         |
     |               |                                       |
     <---------------                                        |
     operation continues                        SWU_COMPLETE V
                               -----------------------------
                               | m0_cm_sw_store_complete():|
                               | delete sw info from       |
                               | persistent storage.       |
                               | m0_cm_sw_store_load()     |
                               | returns -ENOENT after     |
                               | this call.                |
                               -----------------------------
Once the operation completes successfully, the copy machine performs the required tasks (e.g. updating layouts) by invoking m0_cm_stop(); this transitions the copy machine back to the M0_CMS_IDLE state. The copy machine also invokes m0_cm_stop() in case of an operational failure, to broadcast STOP FOPs to the other replicas in the pool, indicating the failure. This is handled specifically by each copy machine type.
As the copy machine is implemented as an m0_reqh_service, the copy machine finalisation path is m0_reqh_service_stop() -> rso_stop() -> m0_cm_fini(). Before invoking m0_reqh_service_stop(), m0_reqh_shutdown_wait() is called; this returns when all the FOMs in the given reqh are finalised. Although a copy machine operation may still be in progress while the reqh is being shut down, this situation is taken care of by the m0_reqh_shutdown_wait() mechanism mentioned above: the copy machine pump FOM (m0_cm::cm_cp_pump) is created when the copy machine operation starts and destroyed when the operation stops, and until then it is alive within the reqh. Thus, using the m0_reqh_shutdown_wait() mechanism, we are sure that the copy machine is IDLE and the operation is complete before m0_cm_fini() is invoked.
This section briefly describes interfaces and structures conforming to above mentioned copy machine requirements.
NA
NA
Following are the references to the documents from which the design is derived. For documentation links, please refer to this file : doc/motr-design-doc-list.rst