Motr
M0
Color legend:
green - States during startup or reelection
pink - Reelection-only states
dark grey - Stopping states
After a successful start, rconfc is in the M0_RCS_IDLE state, waiting for one of two events: a read lock conflict or a user request to stop. These two events are handled only while rconfc is in M0_RCS_IDLE. If either event arrives while rconfc is in another state, the event is recorded, but its handling is deferred until rconfc returns to M0_RCS_IDLE.
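The deferral rule can be illustrated with a toy model. This is a minimal sketch, not the rconfc implementation: every `toy_` name is hypothetical, the state set is reduced to four states, and handling a pending stop before a pending conflict is an assumption made only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced state set; the real rconfc SM has more states. */
enum toy_state { TOY_IDLE, TOY_BUSY, TOY_STOPPING, TOY_DRAINING };

struct toy_rconfc {
	enum toy_state state;
	bool           stop_pending;     /* user requested stopping */
	bool           conflict_pending; /* read lock conflict was seen */
};

/* Act on recorded events, but only when the SM is idle. */
static void toy_handle_pending(struct toy_rconfc *r)
{
	assert(r != NULL);
	if (r->state != TOY_IDLE)
		return; /* event stays recorded, handling is deferred */
	if (r->stop_pending) {          /* assumed priority: stop first */
		r->stop_pending = false;
		r->state = TOY_STOPPING;
	} else if (r->conflict_pending) {
		r->conflict_pending = false;
		r->state = TOY_DRAINING;
	}
}

void toy_on_stop_request(struct toy_rconfc *r)
{
	r->stop_pending = true;
	toy_handle_pending(r);
}

void toy_on_lock_conflict(struct toy_rconfc *r)
{
	r->conflict_pending = true;
	toy_handle_pending(r);
}

/* Called whenever the SM returns to the idle state. */
void toy_enter_idle(struct toy_rconfc *r)
{
	r->state = TOY_IDLE;
	toy_handle_pending(r);
}
```

A conflict reported while the model is in `TOY_BUSY` only sets the flag; the transition to `TOY_DRAINING` happens on the next `toy_enter_idle()` call.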
If a failure occurs that prevents rconfc from functioning properly, rconfc goes to the M0_RCS_FAILURE state. In this state the SM does nothing until the user requests stopping.
Rconfc internal state is protected by the SM group lock. The SM group is provided by the user at rconfc initialisation.
The first stage of rconfc startup is determining the entry point of the Motr cluster whose configuration is to be accessed. The entry point consists of several components, all of which can change during the cluster lifetime.
Cluster entry point includes:
The HA subsystem is responsible for serving queries for the current cluster entry point. Rconfc queries the HA subsystem through a local HA agent.
Rconfc may fail to complete version election for some reason (e.g. a connection to the active RM cannot be established, or the current set of confds reported by HA does not yield a quorum). In this case rconfc repeats the entry point request to HA and attempts to elect a version with the most recent entry point data set. There is no limit on the number of attempts.
During m0_rconfc_start() execution rconfc requests a read lock from the Resource Manager (RM) by calling rconfc_read_lock_get(). On request completion rconfc_read_lock_complete() is called. Successful lock acquisition indicates that no configuration change is in progress and that configuration reading is allowed.
The read lock is retained by the rconfc instance until finalisation. However, RM can revoke the lock when a conflicting lock is requested. On lock revocation rconfc_read_lock_conflict() is called. The call installs the m0_confc_gate_ops::go_drain() callback to be notified when the last reading context is detached from the m0_rconfc::rc_confc instance. The callback ends by calling rconfc_gate_drain(), where rconfc starts the conductor cache drain. In rconfc_conductor_drained() rconfc eventually puts the read lock back to RM.
Once informed about the conflict, rconfc disallows configuration reading via m0_rconfc::rc_confc until the next read lock acquisition is complete. In addition, in rconfc_conductor_drain() the cache of that confc is drained to prevent consumers from reading cached but outdated configuration values. However, the cache data remains untouched and readable until no cache object is pinned anymore and the last reading context has detached from the confc in use.
When done with the cache, m0_rconfc::rc_confc is disconnected from the confd server to prevent unauthorized read operations. Then the read lock is returned to RM in compliance with the conflict request.
Immediately after revocation rconfc attempts to acquire the read lock again. The lock is granted once the conflicting lock is released.
In rconfc_read_lock_complete(), if the read lock was acquired successfully, rconfc transitions to the M0_RCS_VERSION_ELECT state. It initialises every confc instance in the m0_rconfc::rc_herd list, attaches rconfc__cb_quorum_test() to its context and initiates asynchronous reading from the corresponding confd server. When version quorum is either reached or found impossible, rconfc_version_elected() is called.
On every reading event rconfc__cb_quorum_test() is called. If the reading context has not completed, the function returns zero, indicating that the process should go on. Otherwise rconfc_quorum_test() is called to see whether quorum is reached with the last reply. If quorum is reached or impossible, rconfc_version_elected() is called.
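The three possible outcomes of a quorum test (keep waiting, reached, impossible) can be sketched as follows. This is an illustrative model only: `toy_reply`, `toy_outcome` and `toy_quorum_test()` are hypothetical names, and the "impossible" criterion (the best-supported version cannot reach the quorum even if every pending reply agrees with it) is an assumption about how impossibility could be detected, not a description of rconfc_quorum_test().

```c
#include <stddef.h>
#include <stdint.h>

enum toy_outcome { TOY_Q_PENDING, TOY_Q_REACHED, TOY_Q_IMPOSSIBLE };

/* Hypothetical record of one confd reply. */
struct toy_reply {
	int      done;    /* reading context has completed */
	int      ok;      /* reply carried a usable version */
	uint64_t version; /* reported configuration version */
};

enum toy_outcome toy_quorum_test(const struct toy_reply *r, size_t n,
				 size_t quorum)
{
	size_t pending = 0;
	size_t best = 0; /* largest number of replies agreeing on a version */
	size_t i, j;

	for (i = 0; i < n; i++) {
		size_t same = 0;

		if (!r[i].done) {
			pending++;
			continue;
		}
		if (!r[i].ok)
			continue;
		/* Count completed replies that report the same version. */
		for (j = 0; j < n; j++)
			if (r[j].done && r[j].ok &&
			    r[j].version == r[i].version)
				same++;
		if (same > best)
			best = same;
	}
	if (best >= quorum)
		return TOY_Q_REACHED;
	if (best + pending < quorum)
		return TOY_Q_IMPOSSIBLE; /* no version can still win */
	return TOY_Q_PENDING;
}
```

The quadratic counting loop keeps the sketch short; with a handful of confd servers that cost is negligible.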
Quorum is considered reached when the number of confd servers that reported the same version number is greater than or equal to the value provided to m0_rconfc_init(). If zero was provided, the required quorum is calculated automatically as half of the confd server count plus one.
If quorum is reached, rconfc_conductor_engage() is called, connecting m0_rconfc::rc_confc to a confd server from the active list. From this moment configuration reading is allowed until the read lock is revoked.
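The threshold rule is small enough to write out. In this sketch `toy_quorum_required()` and `toy_quorum_reached()` are hypothetical helpers, and "half" is assumed to mean integer division, so 3 servers need 2 matching replies and 4 servers need 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: `requested` models the quorum value passed at
 * initialisation; zero means "derive a majority from the server count".
 */
uint32_t toy_quorum_required(uint32_t requested, uint32_t confd_count)
{
	assert(confd_count > 0);
	return requested != 0 ? requested : confd_count / 2 + 1;
}

/* True when enough confd servers reported the same version. */
bool toy_quorum_reached(uint32_t same_version_count, uint32_t requested,
			uint32_t confd_count)
{
	return same_version_count >=
	       toy_quorum_required(requested, confd_count);
}
```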
If quorum is not reached, rconfc repeats the entry point request to HA and starts a new version election with the most recent entry point data set.
Rconfc is interested in the following notifications from HA:
To receive these notifications, rconfc creates a phony confc (m0_rconfc::rc_phony) and adds fake objects for the RM creditor service and the confd services upon receiving the cluster entry point. Using the general non-phony confc instance is not possible, because configuration version election has not been done by that moment.
Actions performed on RM creditor death:
Actions performed on death of confd server from herd:
Death notification is basically handled by rconfc_link::rl_fom that is queued from rconfc_herd_link__on_death_cb(). The FOM is intended to safely disconnect herd link from problematic confd when session and connection termination may be timed out. The FOM prevents client's locality from being blocked for a noticeably long time.
                                  |
                                  | m0_fom_init()
    !m0_confc_is_inited() ||      | m0_fom_queue()
    !m0_confc_is_online()         V
  +--------------------------- M0_RLF_INIT
  |                               |
  |                               | wait for M0_RPC_SESSION_IDLE
  |                               V
  +---------------------- M0_RLF_SESS_WAIT_IDLE
  |                               |
  |                               | m0_rpc_session_terminate()
  |                               V
  +---------------------- M0_RLF_SESS_TERMINATING
  |                               |
  |                               | m0_rpc_session_fini()
  |                               | m0_rpc_conn_terminate()
  |                               V
  +---------------------- M0_RLF_CONN_TERMINATING
  |                               |
  |                               | m0_rpc_conn_fini()
  |                               V
  +--------------------------->M0_RLF_FINI
                                  |
                                  | m0_fom_fini()
                                  | rconfc_herd_link_fini()
                                  V
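The phase sequence of the diagram can be expressed as a small transition function. This is a sketch under assumptions: the `TOY_RLF_*` enum only mirrors the phase names and is not the Motr definition, and `bail_out` stands for any condition that takes the left-hand edge straight to the final phase. The diagram labels that condition only for the initial phase (confc not initialised or not online); the conditions for the other phases are not stated here.

```c
#include <stdbool.h>

/* Hypothetical mirror of the herd link FOM phases, in diagram order. */
enum toy_rlf_phase {
	TOY_RLF_INIT,
	TOY_RLF_SESS_WAIT_IDLE,
	TOY_RLF_SESS_TERMINATING,
	TOY_RLF_CONN_TERMINATING,
	TOY_RLF_FINI
};

/*
 * Return the phase that follows `p`. With `bail_out` set, the FOM
 * short-circuits to the final phase from wherever it is.
 */
enum toy_rlf_phase toy_rlf_next(enum toy_rlf_phase p, bool bail_out)
{
	if (bail_out || p == TOY_RLF_FINI)
		return TOY_RLF_FINI;
	return (enum toy_rlf_phase)(p + 1);
}
```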
Rconfc gates read operations conducted through the confc instance it governs, i.e. m0_rconfc::rc_confc. Reading is allowed while rconfc holds the read lock. Before reading can proceed, m0_confc_ctx_init() performs a check by calling the previously set callback m0_confc::cc_gops::go_check(), which is in fact rconfc_gate_check().
While the read lock is revoked, rconfc blocks inside rconfc_gate_check() any m0_confc_ctx_init() call made on this particular m0_rconfc::rc_confc. On the next successful read lock acquisition all previously blocked contexts are unblocked. Once allowed to read, a context can be used as many times as required.
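The gating behaviour can be modelled without threads by counting waiters instead of actually blocking them. This is a deliberately simplified, single-threaded sketch; `toy_gate` and its functions are hypothetical and say nothing about how rconfc_gate_check() really suspends the caller.

```c
#include <stdbool.h>

/* Hypothetical gate state. */
struct toy_gate {
	bool     lock_held; /* read lock currently owned */
	unsigned waiting;   /* contexts parked at the gate */
	unsigned admitted;  /* contexts allowed to read so far */
};

/* Models the check made at context initialisation. */
bool toy_gate_check(struct toy_gate *g)
{
	if (g->lock_held) {
		g->admitted++;
		return true;  /* reading may proceed */
	}
	g->waiting++;         /* the caller would block here */
	return false;
}

/* Models the read lock being revoked. */
void toy_gate_lock_revoked(struct toy_gate *g)
{
	g->lock_held = false;
}

/* Models the next successful acquisition; releases every waiter. */
unsigned toy_gate_lock_acquired(struct toy_gate *g)
{
	unsigned released = g->waiting;

	g->lock_held = true;
	g->admitted += released;
	g->waiting = 0;
	return released;
}
```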
When a new configuration change is in progress, and the read lock is therefore revoked, rconfc_read_lock_conflict() defers cache draining until no reading context is attached. It installs the m0_confc::cc_gops::go_drain() callback, which normally remains set to NULL and so does not affect the execution of m0_confc_ctx_fini() in any way. With the callback set, m0_confc_ctx_fini() calls m0_confc::cc_gops::go_drain() at the moment of the very last detach. The callback is in fact rconfc_gate_drain(), where cache cleanup is finally invoked by setting the M0_RCS_CONDUCTOR_DRAIN state. The rconfc SM remains in the M0_RCS_CONDUCTOR_DRAIN_CHECK state until all conf objects are unpinned. Once there are no pinned objects, rconfc cleans the cache, puts the read lock and starts the reelection process.
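The "callback on last detach" pattern described above looks roughly like this. The sketch is hypothetical throughout: `toy_confc` is not the Motr structure (only the `go_drain` field name is borrowed for readability), and draining immediately when no context is attached at conflict time is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_confc;
typedef void (*toy_drain_cb)(struct toy_confc *);

/* Hypothetical stand-in for a gated confc. */
struct toy_confc {
	unsigned     nr_ctx;   /* attached reading contexts */
	toy_drain_cb go_drain; /* NULL unless a drain is pending */
	bool         drained;  /* set once draining has started */
};

void toy_ctx_init(struct toy_confc *c)
{
	c->nr_ctx++;
}

/* Detach a context; the last detach fires the pending callback. */
void toy_ctx_fini(struct toy_confc *c)
{
	assert(c->nr_ctx > 0);
	if (--c->nr_ctx == 0 && c->go_drain != NULL)
		c->go_drain(c);
}

/* Models the drain callback: start the drain and disarm itself. */
void toy_gate_drain(struct toy_confc *c)
{
	c->go_drain = NULL;
	c->drained = true;
}

/* Models conflict handling: drain now if idle, otherwise defer. */
void toy_read_lock_conflict(struct toy_confc *c)
{
	if (c->nr_ctx == 0)
		toy_gate_drain(c); /* assumption: nothing to wait for */
	else
		c->go_drain = toy_gate_drain;
}
```

Because the callback pointer is NULL in the normal case, the detach path pays only one extra comparison when no drain is pending.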
If configuration reading fails because of a network error, the confc context requests the confc to skip its current connection to confd and switch to another confd server running the same version. This is done by the state machine in the S_SKIP_CONFD state, which calls the callback m0_confc::cc_gops::go_skip(), in fact rconfc_gate_skip(). The function iterates through the m0_rconfc::rc_active list and returns on the first successfully established connection. If no connection succeeds, the function returns -ENOENT, making the state machine end in the S_FAILURE state.
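The failover loop can be sketched as below. `toy_confd`, `toy_connect()` and `toy_gate_skip()` are hypothetical; the `reachable` flag replaces a real connection attempt, and only the -ENOENT result for an exhausted list is taken from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry of the active list. */
struct toy_confd {
	const char *addr;
	bool        reachable; /* stands in for a real connection attempt */
};

/* Pretend to connect; returns 0 on success, negative errno otherwise. */
static int toy_connect(const struct toy_confd *c)
{
	return c->reachable ? 0 : -EHOSTUNREACH;
}

/*
 * Walk the active list and stop at the first confd that accepts a
 * connection. On success stores its index in *connected and returns 0;
 * returns -ENOENT when every attempt fails.
 */
int toy_gate_skip(const struct toy_confd *active, size_t n,
		  size_t *connected)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_connect(&active[i]) == 0) {
			*connected = i;
			return 0;
		}
	}
	return -ENOENT;
}
```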
When rconfc is stopping, it scans the configuration for pinned objects (i.e. objects with m0_conf_obj::co_nrefs > 0). If such an object is found, rconfc waits until a configuration consumer unpins it. The consumer must be subscribed to the m0_reqh::rh_confc_cache_expired chan and put its pinned objects in the callback registered with this chan. When all configuration objects are unpinned, rconfc is able to clean the configuration cache and go to the M0_RCS_FINAL state.
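The pinned-object scan amounts to checking reference counts across the cache. In this sketch the cache is assumed to be a flat array and `toy_conf_obj` is a hypothetical stand-in that borrows only the `co_nrefs` field name; the real cache layout is not described here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached object; only the reference count matters here. */
struct toy_conf_obj {
	uint64_t co_nrefs; /* > 0 means some consumer pins the object */
};

/* Count the objects that are still pinned. */
size_t toy_cache_pinned_count(const struct toy_conf_obj *objs, size_t n)
{
	size_t pinned = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (objs[i].co_nrefs > 0)
			pinned++;
	return pinned;
}

/* Stopping may clean the cache only when nothing is pinned. */
bool toy_cache_can_clean(const struct toy_conf_obj *objs, size_t n)
{
	return toy_cache_pinned_count(objs, n) == 0;
}
```

A stopping instance would re-run the check each time a consumer reports that it has put an object, and proceed to the final state once the check passes.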