Motr  M0
confd Internals

XXX FIXME: confd documentation is outdated.

Dependencies

Confd depends on the following subsystems:

The most important functions that confd depends on are:


Design Highlights

  • User-space implementation.
  • Provides a "FOP-based" interface for confc to access configuration information.
  • Relies on request handler threading model and is driven by reqh. Request processing is based on FOM execution.
  • Maintains its own configuration cache, implementation of which is common to confd and confc.
  • Several confd state machines (FOMs) processing requests from configuration consumers can work with configuration cache concurrently.

Logical Specification

Confd service initialization is performed by the request handler. m0_confd_service_locate() is used to allocate the confd service and its internal structures in memory.

The confd service type is registered in the ‘subsystem’ data structure of "motr/init.c". The following lines are added:

...
...
};

The configuration cache pre-loading procedure traverses all tables of the configuration db. Since relations are possible only between neighbouring levels, tables of higher "levels of DAG" are processed first. The following pseudocode presents pre-loading in detail:

conf_cache_preload (...)
{
        for each record in the "profiles" table do
                ... allocate and fill struct m0_conf_profile from p_prof
        endfor
        for table in "file_systems", "services",
                     "nodes", "nics", "storage_devices",
                     in the order specified, do
                for each record in the table, do
                        ... allocate and fill struct m0_conf_obj ...
                        ... create DAG struct m0_conf_relation to appropriate conf object ...
                endfor
        endfor
}
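The level-by-level ordering above can be illustrated with a minimal, self-contained sketch. It is not the confd implementation: every name prefixed with demo_ is hypothetical, the "database" is just an array of records, and only three levels are modelled. The point it shows is that loading higher levels first guarantees that a record's parent object already exists when the relation is created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object kinds, listed from the top DAG level downwards. */
enum demo_level { DEMO_PROFILE, DEMO_FILESYSTEM, DEMO_SERVICE, DEMO_LEVEL_NR };

/* Hypothetical database record: its level, id and the id of its parent. */
struct demo_record {
	enum demo_level level;
	int             id;
	int             parent_id; /* ignored for top-level records */
};

/* Hypothetical in-memory object; `parent` stands in for the DAG relation. */
struct demo_obj {
	enum demo_level  level;
	int              id;
	struct demo_obj *parent;
};

struct demo_cache {
	struct demo_obj *objs;
	size_t           nr;
};

/* Returns the already loaded object with the given level and id, or NULL. */
static struct demo_obj *demo_find(const struct demo_cache *c,
				  enum demo_level level, int id)
{
	size_t i;

	for (i = 0; i < c->nr; ++i)
		if (c->objs[i].level == level && c->objs[i].id == id)
			return &c->objs[i];
	return NULL;
}

/*
 * Loads records level by level, highest level first, so that the parent of
 * every record is in the cache before the record itself is processed.
 * Returns 0 on success, -1 on allocation failure or a dangling parent id.
 */
static int demo_preload(struct demo_cache *c, const struct demo_record *recs,
			size_t nr)
{
	int    level;
	size_t i;

	assert(c != NULL);
	c->nr = 0;
	c->objs = calloc(nr > 0 ? nr : 1, sizeof c->objs[0]);
	if (c->objs == NULL)
		return -1;
	for (level = 0; level < DEMO_LEVEL_NR; ++level) {
		for (i = 0; i < nr; ++i) {
			struct demo_obj *obj;

			if ((int)recs[i].level != level)
				continue;
			obj = &c->objs[c->nr];
			obj->level = recs[i].level;
			obj->id = recs[i].id;
			obj->parent = NULL;
			if (level > 0) {
				obj->parent = demo_find(c,
						(enum demo_level)(level - 1),
						recs[i].parent_id);
				if (obj->parent == NULL) {
					free(c->objs);
					c->objs = NULL;
					c->nr = 0;
					return -1;
				}
			}
			++c->nr;
		}
	}
	return 0;
}
```

Records may be supplied in any order; the outer loop over levels is what enforces the "higher levels first" rule.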

FOP operation vector, FOP type, and RPC item type have to be defined for each FOP. The following structures are defined for m0_conf_fetch FOP:

  • struct m0_fop_type m0_conf_fetch_fopt — defines FOP type;
  • struct m0_fop_type_ops m0_conf_fetch_ops — defines FOP operation vector;
  • struct m0_rpc_item_type m0_rpc_item_type_fetch — defines RPC item type.

m0_fom_fetch_state() is called by reqh to handle incoming confc requests. Its implementation processes all FOP/FOM-specific and m0_conf_fetch_resp phases:

static int m0_fom_fetch_state(struct m0_fom *fom)
{
        /*
         * Check whether the FOM should transition into a generic/standard
         * phase or a FOP-specific phase.
         */
        if (fom->fo_phase < FOPH_NR) {
                result = m0_fom_state_generic(fom);
        } else {
                ... process m0_conf_fetch_resp phase transitions ...
        }
}
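The split between generic and FOP-specific phases can be shown with a small stand-alone model. This is only a sketch under assumptions: the phase names, struct demo_fom and the two step functions are hypothetical, and DEMO_PH_GENERIC_NR merely plays the role that FOPH_NR plays in the fragment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phase numbering: generic phases first, specific ones after. */
enum demo_phase {
	DEMO_PH_GENERIC_INIT,
	DEMO_PH_GENERIC_AUTH,
	DEMO_PH_GENERIC_NR,                 /* boundary, like FOPH_NR */
	DEMO_PH_INITIAL = DEMO_PH_GENERIC_NR,
	DEMO_PH_SERIALISE,
	DEMO_PH_TERMINATE
};

struct demo_fom {
	int phase;
	int generic_steps;  /* how many times the generic handler ran */
	int specific_steps; /* how many times the specific handler ran */
};

/* Stand-in for the generic phase handler: just advances the phase. */
static void demo_generic_step(struct demo_fom *fom)
{
	++fom->generic_steps;
	++fom->phase;
}

/* Stand-in for FOP-specific processing: advances until the last phase. */
static void demo_specific_step(struct demo_fom *fom)
{
	++fom->specific_steps;
	if (fom->phase < DEMO_PH_TERMINATE)
		++fom->phase;
}

/* One invocation of the state function: dispatch on the phase boundary. */
static void demo_fom_tick(struct demo_fom *fom)
{
	assert(fom != NULL);
	if (fom->phase < DEMO_PH_GENERIC_NR)
		demo_generic_step(fom);
	else
		demo_specific_step(fom);
}
```

Calling demo_fom_tick() four times on a FOM that starts in DEMO_PH_GENERIC_INIT runs the generic handler twice and the specific handler twice, leaving the FOM in DEMO_PH_TERMINATE.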

The request handler triggers user-defined functions to create FOMs for processed FOPs. The service has to register a FOM-initialization function for each FOP treated as a request.

To do so, the appropriate structures and functions have to be defined. For example, the following are used by the m0_conf_fetch FOP:

static const struct m0_fom_type_ops fom_fetch_type_ops = {
        .fto_create = fetch_fom_create
};
struct m0_fom_type m0_fom_fetch_mopt = {
        .ft_ops = &fom_fetch_type_ops
};
static int fetch_fom_create(struct m0_fop *fop, struct m0_fom **m,
                            struct m0_reqh *reqh)
{
        1) allocate fom;
        2) m0_fom_init(fom, &m0_fom_fetch_mopt, &fom_fetch_type_ops, ...);
        3) *m = fom;
}
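The registration pattern above is an operation vector: the request handler knows only the FOM type and calls its creation callback through a function pointer. The following sketch models that pattern in isolation. All demo_ names are hypothetical and the structures are reduced to the fields needed for the illustration; they are not the real m0_fom_type or m0_fom_type_ops definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_fom;
struct demo_fop { int opcode; };

/* Hypothetical operation vector with a single creation callback. */
struct demo_fom_type_ops {
	int (*fto_create)(struct demo_fop *fop, struct demo_fom **out);
};

struct demo_fom_type {
	const struct demo_fom_type_ops *ft_ops;
};

struct demo_fom {
	const struct demo_fom_type *type;
	struct demo_fop            *fop;
};

static int demo_fetch_fom_create(struct demo_fop *fop, struct demo_fom **out);

static const struct demo_fom_type_ops demo_fetch_type_ops = {
	.fto_create = demo_fetch_fom_create
};

static const struct demo_fom_type demo_fetch_fom_type = {
	.ft_ops = &demo_fetch_type_ops
};

/* Allocates a FOM, binds it to its type and FOP, and hands it back. */
static int demo_fetch_fom_create(struct demo_fop *fop, struct demo_fom **out)
{
	struct demo_fom *fom;

	assert(out != NULL);
	fom = malloc(sizeof *fom);
	if (fom == NULL)
		return -ENOMEM;
	fom->type = &demo_fetch_fom_type;
	fom->fop = fop;
	*out = fom;
	return 0;
}

/* What a request handler might do: create a FOM via the type's callback. */
static int demo_reqh_fom_create(const struct demo_fom_type *type,
				struct demo_fop *fop, struct demo_fom **out)
{
	return type->ft_ops->fto_create(fop, out);
}
```

The caller of demo_reqh_fom_create() owns the returned FOM and releases it with free() in this sketch.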

The implementation of m0_fom_fetch_state() needs the following functions to be defined:

  • fetch_check_request(), update_check_request() - check the incoming request and validate the requested path of configuration objects.
  • fetch_next_state(), update_next_state() - transition the FOM between phases depending on the current phase and on the state of configuration objects.
  • obj_serialize() - serializes given object to FOP.
  • fetch_failure_handle(), update_failure_handle() - handle occurred errors.

State Specification

Confd as a whole is not a state machine; phase processing is implemented on the basis of the FOMs of the m0_conf_fetch and m0_conf_update FOPs. After the corresponding FOM has gone through the list of FOM-specific phases, it transitions into the F_INITIAL phase.

The number of state machine instances corresponds to the number of FOPs being processed in confd.

m0_conf_fetch FOM state transition diagram:

[Figure: dot_inline_dotgraph_8.png]

  • F_INITIAL: In this phase, incoming FOM/FOP-related structures are initialized and FOP-processing preconditions are checked. Then, an attempt is made to obtain a read lock on m0_confd::d_cache::ca_rwlock. Once the lock is obtained, the m0_long_lock logic transitions the FOM into F_SERIALISE.
  • F_SERIALISE: The current design assumes that data is pre-loaded into the configuration cache. In the F_SERIALISE phase, m0_confd::d_cache::ca_rwlock has already been obtained as a read lock. The m0_conf_fetch_resp FOP is prepared for sending by looking up the requested path in the configuration cache and then unlocking m0_confd::d_cache::ca_rwlock. After that, the m0_conf_fetch_resp FOP is sent with m0_rpc_reply_post(), and fetch_next_state() transitions the FOM into F_TERMINATE. If the incoming request contains a path that is not in the configuration cache, the m0_conf_fetch FOM is transitioned to the F_FAILURE phase.
  • F_TERMINATE: In this phase, statistics values are updated in m0_confd::d_stat. m0_confd::d_cache::ca_rwlock has to be unlocked.
  • F_FAILURE: In this phase, statistics values are updated in m0_confd::d_stat. A m0_conf_fetch_resp FOP with an empty sequence of configuration objects and a negative error code is sent with m0_rpc_reply_post(). m0_confd::d_cache::ca_rwlock has to be unlocked.
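The fetch phases described above can be condensed into a pure transition function. The sketch below is an illustration only, not the actual fetch_next_state(): the enum, the function name and the two boolean inputs are hypothetical. It assumes that a FOM stays in the initial phase until the read lock is granted and that the two final phases have no outgoing transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the fetch FOM phases described in the text. */
enum demo_fetch_phase {
	DEMO_F_INITIAL,
	DEMO_F_SERIALISE,
	DEMO_F_TERMINATE,
	DEMO_F_FAILURE
};

/*
 * Returns the phase that follows `cur`.
 *
 * lock_granted: the read lock on the cache has been obtained.
 * path_found:   the requested path exists in the configuration cache.
 */
static enum demo_fetch_phase
demo_fetch_next_state(enum demo_fetch_phase cur, bool lock_granted,
		      bool path_found)
{
	switch (cur) {
	case DEMO_F_INITIAL:
		/* Without the lock the FOM waits and retries later. */
		return lock_granted ? DEMO_F_SERIALISE : DEMO_F_INITIAL;
	case DEMO_F_SERIALISE:
		/* A path missing from the cache is reported as a failure. */
		return path_found ? DEMO_F_TERMINATE : DEMO_F_FAILURE;
	case DEMO_F_TERMINATE:
	case DEMO_F_FAILURE:
		/* Final phases: nothing follows them in this sketch. */
		return cur;
	}
	assert(!"unknown phase");
	return DEMO_F_FAILURE;
}
```

Keeping the transition logic free of side effects makes it straightforward to unit-test every edge of the diagram separately from locking and RPC code.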

    Note
    The m0_conf_stat FOM has a state diagram similar to that of the m0_conf_fetch FOM and hence is not illustrated here.

m0_conf_update FOM state transition diagram:

[Figure: dot_inline_dotgraph_9.png]

  • U_INITIAL: In this phase, incoming FOM/FOP-related structures are initialized and FOP-processing preconditions are checked. Then, an attempt is made to obtain a write lock on m0_confd::d_cache::ca_rwlock. Once the lock is obtained, the m0_long_lock logic transitions the FOM into U_UPDATE.
  • U_UPDATE: In this phase, m0_confd::d_cache::ca_rwlock has already been obtained as a write lock. The configuration cache is updated and m0_confd::d_cache::ca_rwlock is then unlocked. After that, the m0_conf_update_resp FOP is sent with m0_rpc_reply_post(), and update_next_state() transitions the FOM into U_TERMINATE. If the incoming request contains a path that is not in the configuration cache, the m0_conf_update FOM is transitioned to the U_FAILURE phase.
  • U_TERMINATE: In this phase, statistics values are updated in m0_confd::d_stat. m0_confd::d_cache::ca_rwlock has to be unlocked.
  • U_FAILURE: In this phase, statistics values are updated in m0_confd::d_stat. A m0_conf_update_resp FOP with an empty sequence of configuration objects and a negative error code is sent with m0_rpc_reply_post(). m0_confd::d_cache::ca_rwlock has to be unlocked.

Locking model

Confd relies on a locking primitive integrated with FOM signaling mechanism. The following interfaces are used:

bool m0_long_{read,write}_lock(struct m0_longlock *lock,
                               struct m0_fom *fom, int next_phase);
void m0_long_{read,write}_unlock(struct m0_longlock *lock);
bool m0_long_is_{read,write}_locked(struct m0_longlock *lock);

m0_long_{read,write}_lock() returns true iff the lock is obtained. If the lock is not obtained (i.e. the return value is false), the subroutine would have arranged to awaken the FOM at the appropriate time to retry the acquisition of the lock. It is expected that the invoker will return M0_FSO_AGAIN from the state function in this case.

m0_long_is_{read,write}_locked() returns true iff the lock has been obtained.
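The contract described above (return true when the lock is obtained, otherwise park the FOM and wake it later in the requested phase) can be modelled without any threads. The following is a simplified, single-threaded sketch and not the m0_long_lock implementation: all demo_ names are hypothetical, the lock remembers at most one waiting FOM, and a parked FOM is granted the lock directly at unlock time rather than retrying the acquisition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_fom {
	int  phase;
	bool waiting; /* true while parked on the lock */
};

struct demo_long_lock {
	int              readers;            /* current read owners */
	bool             writer;             /* true if write-locked */
	struct demo_fom *waiter;             /* single parked FOM, or NULL */
	int              waiter_phase;       /* phase to enter once woken */
	bool             waiter_wants_write;
};

/* Parks the FOM; this sketch supports only one waiter at a time. */
static void demo_park(struct demo_long_lock *l, struct demo_fom *fom,
		      int next_phase, bool write)
{
	assert(l->waiter == NULL);
	l->waiter = fom;
	l->waiter_phase = next_phase;
	l->waiter_wants_write = write;
	fom->waiting = true;
}

/* Hands the lock to the parked FOM if its request can now be satisfied. */
static void demo_grant_waiter(struct demo_long_lock *l)
{
	struct demo_fom *fom = l->waiter;

	if (fom == NULL || l->writer ||
	    (l->waiter_wants_write && l->readers > 0))
		return;
	if (l->waiter_wants_write)
		l->writer = true;
	else
		++l->readers;
	fom->phase = l->waiter_phase;
	fom->waiting = false;
	l->waiter = NULL;
}

/* Returns true iff the read lock is obtained immediately. */
static bool demo_long_read_lock(struct demo_long_lock *l,
				struct demo_fom *fom, int next_phase)
{
	if (l->writer) {
		demo_park(l, fom, next_phase, false);
		return false;
	}
	++l->readers;
	fom->phase = next_phase;
	return true;
}

/* Returns true iff the write lock is obtained immediately. */
static bool demo_long_write_lock(struct demo_long_lock *l,
				 struct demo_fom *fom, int next_phase)
{
	if (l->writer || l->readers > 0) {
		demo_park(l, fom, next_phase, true);
		return false;
	}
	l->writer = true;
	fom->phase = next_phase;
	return true;
}

static void demo_long_read_unlock(struct demo_long_lock *l)
{
	assert(l->readers > 0);
	--l->readers;
	demo_grant_waiter(l);
}

static void demo_long_write_unlock(struct demo_long_lock *l)
{
	assert(l->writer);
	l->writer = false;
	demo_grant_waiter(l);
}
```

For example, if one FOM holds the read lock and a second FOM requests the write lock, the second call returns false and the second FOM is moved to its next phase only when the reader unlocks. A production lock would also need a queue of waiters and a fairness policy, both of which this sketch omits.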

The following code example shows how to perform a transition from F_INITIAL to F_SERIALISE and obtain a lock:

static int fom_fetch_state(struct m0_fom *fom)
{
        //...
        struct m0_long_lock_link *link;
        if (fom->fo_phase == F_INITIAL) {
                // Initialise things.
                // ...
                // Retrieve long lock link from derived FOM object: link = ...;
                // and acquire the lock
                return M0_FOM_LONG_LOCK_RETURN(m0_long_read_lock(lock,
                                                                 link,
                                                                 F_SERIALISE));
        }
        //...
}
See also
fom-longlock

Threading and Concurrency Model

Confd creates no threads of its own but is instead driven by the request handler. All threading and concurrency is handled on the side of the request handler registered in the system. Handling of incoming FOPs, phase transitions, serialization of outgoing FOPs and error handling are done in callbacks called by the reqh component.

The configuration service relies on the reqh component threading model and should not acquire any locks or enter any waiting states, except those listed below. Request processing should be performed in an asynchronous manner. Only synchronous calls to the configuration DB are allowed, and these should be bracketed with m0_fom_block_{enter,leave}().

Multiple concurrently executing FOMs share the same configuration cache and db environment of confd, so access to them is synchronized with the specialized m0_longlock read/write lock designed for use in FOMs: the FOM does not busy-wait, but gets blocked until lock acquisition can be retried. Simplistic synchronization of the database and in-memory cache by means of this read/write lock (m0_confd::d_lock) is sufficient, as the workload of confd is predominantly read-only.

NUMA Optimizations

Multiple confd instances can run in the system, but no more than one per request handler. Each confd has its own data-base back-end and its own pre-loaded copy of data-base in memory.


Conformance

  • i.conf.confd.user Confd is implemented in user space.
  • i.conf.cache.data-model Configuration information is organized as outlined in section 4.1 of the HLD. The same data structures are used for confc and confd. Configuration structures are kept in memory.
  • i.conf.cache.unique-objects A registry of cached objects (m0_conf_cache::ca_registry) is used to achieve uniqueness of configuration object identities.
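How a registry can enforce unique object identities is sketched below. This is an assumption-laden illustration, not the m0_conf_cache code: the registry is a plain singly linked list, identities are short strings, and every demo_ name is hypothetical. The essential idea is that callers never allocate objects directly; they always go through a find-or-add operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached object identified by a short string. */
struct demo_conf_obj {
	char                  id[32];
	struct demo_conf_obj *next;
};

/* Hypothetical registry: a list of all objects known to the cache. */
struct demo_registry {
	struct demo_conf_obj *head;
	size_t                nr;
};

/* Returns the registered object with the given identity, or NULL. */
static struct demo_conf_obj *
demo_registry_lookup(const struct demo_registry *reg, const char *id)
{
	struct demo_conf_obj *obj;

	for (obj = reg->head; obj != NULL; obj = obj->next)
		if (strcmp(obj->id, id) == 0)
			return obj;
	return NULL;
}

/*
 * Returns the unique object for `id`, creating it on first use.
 * Returns NULL if the identity is too long or allocation fails.
 */
static struct demo_conf_obj *
demo_registry_find_or_add(struct demo_registry *reg, const char *id)
{
	struct demo_conf_obj *obj;

	assert(id != NULL);
	obj = demo_registry_lookup(reg, id);
	if (obj != NULL)
		return obj;
	if (strlen(id) >= sizeof obj->id)
		return NULL;
	obj = calloc(1, sizeof *obj);
	if (obj == NULL)
		return NULL;
	strcpy(obj->id, id);
	obj->next = reg->head;
	reg->head = obj;
	++reg->nr;
	return obj;
}

/* Releases every registered object and empties the registry. */
static void demo_registry_fini(struct demo_registry *reg)
{
	while (reg->head != NULL) {
		struct demo_conf_obj *obj = reg->head;

		reg->head = obj->next;
		free(obj);
	}
	reg->nr = 0;
}
```

Two requests for the same identity return the same pointer, so any relation that refers to an object by identity resolves to a single in-memory instance.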

Unit Tests

Test:

obj_serialize() will be tested.

{fetch,update}_next_state() will be tested.

Test:
Load predefined configuration object from configuration db. Check its predefined value.
Test:
Load predefined configuration directory from db. Check their predefined values.
Test:
Fetch non-existent configuration object from configuration db.
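
The last test case, fetching a non-existent object, could be exercised against a path lookup like the one sketched here. The sketch is hypothetical throughout: struct demo_node, demo_path_lookup() and the use of -ENOENT as the negative error code are assumptions made for illustration, and the "cache" is a small static tree rather than the real configuration cache.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache node: a name and an array of child nodes. */
struct demo_node {
	const char             *name;
	const struct demo_node *children;
	size_t                  nr_children;
};

/*
 * Walks `path` (an array of `len` component names) starting at `root`.
 * On success stores the final node in *out and returns 0.
 * Returns -ENOENT if any component is missing; *out is left untouched.
 */
static int demo_path_lookup(const struct demo_node *root,
			    const char *const *path, size_t len,
			    const struct demo_node **out)
{
	const struct demo_node *cur = root;
	size_t                  i;
	size_t                  j;

	assert(out != NULL);
	for (i = 0; i < len; ++i) {
		const struct demo_node *next = NULL;

		for (j = 0; j < cur->nr_children; ++j) {
			if (strcmp(cur->children[j].name, path[i]) == 0) {
				next = &cur->children[j];
				break;
			}
		}
		if (next == NULL)
			return -ENOENT;
		cur = next;
	}
	*out = cur;
	return 0;
}
```

A unit test would look up one path that exists and one that does not, and check that the second lookup returns a negative error code without modifying the output pointer.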

Analysis

The size of the configuration cache can be estimated from the number of configuration objects in the configuration db and is proportional to the size of the database file.

A configuration request FOP (m0_conf_fetch) is executed in approximately constant time (measured in disk I/O), because the entire configuration db is cached in memory and a request would rarely be blocked by an update.

See also
confd Internals