Motr
M0
#include <rm.h>
Data Fields | |
enum m0_rm_incoming_type | rin_type |
struct m0_sm | rin_sm |
int32_t | rin_rc |
enum m0_rm_incoming_policy | rin_policy |
uint64_t | rin_flags |
struct m0_rm_credit | rin_want |
struct m0_tl | rin_pins |
int | rin_priority |
const struct m0_rm_incoming_ops * | rin_ops |
m0_time_t | rin_req_time |
struct m0_rm_reserve_prio | rin_reserve |
struct m0_rm_remote * | rin_remote |
uint64_t | rin_magix |
Resource usage credit request.
The same m0_rm_incoming structure is used to track the state of incoming requests, both "local" (from the same domain where the owner resides) and "remote".
An incoming request is created for:
- local credit request, when some user wants to use the resource;
- remote credit request from a "downward" owner which asks to sub-let some credits;
- remote credit request from an "upward" owner which wants to revoke some credits.
These usages are differentiated by m0_rm_incoming::rin_type.
An incoming request is a state machine, going through the following stages:
- [CHECK] This stage determines whether the request can be fulfilled immediately. A local request can be fulfilled immediately if the wanted credit is possessed by the owner, that is, if in->rin_want is implied by a join of owner->ro_owned[]. A non-local (loan or revoke) request can be fulfilled immediately if the wanted credit is implied by a join of owner->ro_owned[OWOS_CACHED], that is, if the owner has enough credits to grant the loan and the wanted credit does not conflict with locally held credits.
- [POLICY] If the request can be fulfilled immediately, the "policy" is invoked, which decides which credit should actually be granted, sublet or revoked. That credit can be larger than requested. A policy is, generally, resource type dependent, with a few universal policies defined by enum m0_rm_incoming_policy.
- [SUCCESS] Finally, the fulfilled request succeeds.
- [ISSUE] Otherwise, if the request cannot be fulfilled immediately, "pins" (m0_rm_pin) are added which will notify the request when the fulfillment check might succeed. Pins are added to:
  - every conflicting credit held by this owner (when the RIF_LOCAL_WAIT flag is set on the request, and always for a remote request);
  - outgoing requests to revoke conflicting credits sub-let to remote owners (when the RIF_MAY_REVOKE flag is set);
  - outgoing requests to borrow missing credits from remote owners (when the RIF_MAY_BORROW flag is set);
  - reserved credits, if the current request has a smaller reserve priority.
  Outgoing requests mentioned above are created as necessary in the ISSUE stage.
- [CYCLE] When all the pins stuck in the ISSUE state are released (either when a local credit is released, or when an outgoing request completes, or when reserved credits are granted), go back to the CHECK state.
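The CHECK, ISSUE and CYCLE stages can be illustrated with a deliberately simplified sketch. It assumes a credit is a bit mask and a join is a bitwise OR. All names in it (credit_t, toy_owner, toy_incoming, incoming_check, incoming_pin_released) are hypothetical and are not part of the Motr API. The POLICY stage and the distinction between local and remote requests are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): a credit is a bit mask, a join is bitwise OR. */
typedef uint64_t credit_t;

enum toy_stage { ST_CHECK, ST_ISSUE, ST_SUCCESS };

struct toy_owner {
	credit_t held;          /* credits currently possessed by the owner */
};

struct toy_incoming {
	credit_t       want;    /* plays the role of rin_want */
	unsigned       pins;    /* number of outstanding pins */
	enum toy_stage stage;
};

/* "have implies want": every wanted bit is present in have. */
static bool credit_implies(credit_t have, credit_t want)
{
	return (have & want) == want;
}

/*
 * CHECK: succeed if the held credits cover the wanted one.
 * ISSUE: otherwise place one pin per missing bit.
 */
static void incoming_check(struct toy_incoming *in, const struct toy_owner *o)
{
	credit_t missing = in->want & ~o->held;

	assert(in->pins == 0);
	if (credit_implies(o->held, in->want)) {
		in->stage = ST_SUCCESS;
		return;
	}
	in->stage = ST_ISSUE;
	for (; missing != 0; missing &= missing - 1)
		in->pins++;
}

/* CYCLE: a pin is released; when the last one goes, run CHECK again. */
static void incoming_pin_released(struct toy_incoming *in,
				  struct toy_owner *o, credit_t arrived)
{
	assert(in->stage == ST_ISSUE && in->pins > 0);
	o->held |= arrived;
	if (--in->pins == 0)
		incoming_check(in, o);
}
```

In this sketch a request wanting 0x7 from an owner holding 0x1 gets two pins, and it reaches ST_SUCCESS only after both are released and CHECK is repeated. If the owner lost credits in the meantime, the repeated CHECK would place new pins instead.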
Looping back to the CHECK state is necessary because possessed non-reserved credits are not "pinned" during the wait and can go away (be revoked or sub-let). The credits are not pinned in order to avoid dependencies between credits that can lead to dead-locks and "cascading evictions". The cost is that a live-lock becomes possible.
The alternative is to use the RIF_RESERVE flag, which leads to pinning credits with M0_RPF_BARRIER. Dead-locks are avoided by establishing a global strict ordering between such requests using a "reserve priority" (m0_rm_reserve_prio). A reserve priority is assigned once to the local incoming request and is then inherited by all remote requests created for the sake of fulfilling that local request. Reserve priorities are therefore handled consistently across the whole cluster.
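One way to obtain a global strict ordering is to compare priorities lexicographically over a tuple that is unique per originating request. The sketch below assumes such a tuple (creation time, originating owner id, per-owner sequence number) and assumes that the older request wins. The struct layout, the field names and the "older wins" rule are illustrative assumptions, not the actual definition of m0_rm_reserve_prio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; not the actual m0_rm_reserve_prio definition. */
struct toy_reserve_prio {
	uint64_t time;      /* when the original local request was made  */
	uint64_t owner_id;  /* tie-breaker: id of the originating owner  */
	uint64_t seq;       /* tie-breaker: per-owner sequence number    */
};

static int u64_cmp(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/*
 * Returns a negative value if a has the higher priority (assumed here to
 * be the older request), a positive value if b has, and 0 only when both
 * describe the same originating request.
 */
static int toy_reserve_prio_cmp(const struct toy_reserve_prio *a,
				const struct toy_reserve_prio *b)
{
	int rc = u64_cmp(a->time, b->time);

	if (rc == 0)
		rc = u64_cmp(a->owner_id, b->owner_id);
	if (rc == 0)
		rc = u64_cmp(a->seq, b->seq);
	return rc;
}

/* A remote request inherits the priority of the local request unchanged. */
static struct toy_reserve_prio
toy_reserve_prio_inherit(const struct toy_reserve_prio *local)
{
	return *local;
}
```

Because the inherited value is copied unchanged, every owner that compares two requests reaches the same verdict, which is what makes the ordering consistent across the cluster.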
Reserve priorities are used only to avoid possible dead-locks and live-locks. There is no guarantee that a request with a higher reserve priority will be completely fulfilled before a request with a lower reserve priority.
If the probability of a live-lock is low enough, using incoming requests without the RIF_RESERVE flag is preferable.
How many outgoing requests are sent out in the ISSUE state is a matter of policy. The fewer requests are sent, the more CHECK-ISSUE-WAIT loop iterations would typically happen. An extreme case of sending no more than a single request is also possible and has some advantages: the outgoing request can be allocated as part of the incoming request, simplifying memory management.
It is also a matter of policy how exactly the request is satisfied after a successful CHECK state. Suppose, for example, that the owner possesses credits C0 and C1 such that the wanted credit W is implied by join(C0, C1), but neither C0 nor C1 alone implies W. Some possible CHECK outcomes are:
- increase user counts in both C0 and C1;
- insert a new credit equal to W into owner->ro_owned[];
- insert a new credit equal to join(C0, C1) into owner->ro_owned[].
All have their advantages and drawbacks:
- elevating C0 and C1 user counts keeps owner->ro_owned[] smaller, but pins more credits than strictly necessary;
- inserting W behaves badly in a standard use case where a thread doing sequential IO requests a credit on each iteration;
- inserting the join pins more credits than strictly necessary.
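The C0, C1 and W example can be made concrete with a small sketch that again treats a credit as a bit mask, which is an assumption made only for illustration. The helpers credit_join, credit_implies and credit_excess are hypothetical names; credit_excess reports the part of a pinned credit that the request did not actually ask for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): one bit per unit of the resource. */
typedef uint64_t credit_t;

/* Join of two credits: everything covered by either of them. */
static credit_t credit_join(credit_t a, credit_t b)
{
	return a | b;
}

/* "have implies want": every wanted bit is present in have. */
static bool credit_implies(credit_t have, credit_t want)
{
	return (have & want) == want;
}

/* Bits that are pinned although the request did not ask for them. */
static credit_t credit_excess(credit_t pinned, credit_t want)
{
	assert(credit_implies(pinned, want));
	return pinned & ~want;
}
```

With C0 = 0x0F, C1 = 0xF0 and W = 0x3C, neither C0 nor C1 implies W, but their join 0xFF does. Pinning the join (or both C0 and C1) leaves an excess of 0xC3, whereas inserting a credit equal to W leaves an excess of 0 at the cost of one more entry in the owned list.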
All policy questions are settled by per-request flags and owner settings, based on access pattern analysis.
Following is a state diagram, where stages that are performed without blocking (for network communication) are lumped into a single state:
                         SUCCESS-----------------------+
                            ^                          |
     too many iterations    |                          |
          live-lock         |    last completion       |
        +-----------------CHECK<-----------------+     |
        |                   |                    |     |
        |                   |                    |     |
        V                   |                    |     |
+----FAILURE                | pins placed        |     |
|       ^                   |                    |     |
|       |                   |                    |     |
|       |                   V                    |     |
|       +----------------WAITING-----------------+     |
|            timeout      ^   |                        |
|                         |   | completion             |
|                         |   |                        |
|                         +---+                        |
|                                                      |
|                         RELEASED<--------------------+
|                            |
|                            |
|                            V
+------------------------->FINAL
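The transitions drawn in the diagram can be written down as a table and checked mechanically. The sketch below is one possible encoding. The enum toy_state, the toy_allowed table and toy_transition_ok are hypothetical names chosen for this example, and the table contains only the arrows visible in the diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* States as they appear in the diagram (names are illustrative). */
enum toy_state {
	TS_CHECK,
	TS_SUCCESS,
	TS_FAILURE,
	TS_WAITING,
	TS_RELEASED,
	TS_FINAL,
	TS_NR
};

/* toy_allowed[from] is a bit mask of states reachable from "from". */
static const unsigned toy_allowed[TS_NR] = {
	/* fulfilled, live-lock, or pins placed */
	[TS_CHECK]    = (1u << TS_SUCCESS) | (1u << TS_FAILURE) |
			(1u << TS_WAITING),
	/* last completion, timeout, or one more completion */
	[TS_WAITING]  = (1u << TS_CHECK) | (1u << TS_FAILURE) |
			(1u << TS_WAITING),
	[TS_SUCCESS]  = (1u << TS_RELEASED),
	[TS_RELEASED] = (1u << TS_FINAL),
	[TS_FAILURE]  = (1u << TS_FINAL),
	[TS_FINAL]    = 0,
};

/* True if the diagram has an arrow from "from" to "to". */
static bool toy_transition_ok(enum toy_state from, enum toy_state to)
{
	assert(from < TS_NR && to < TS_NR);
	return (toy_allowed[from] & (1u << to)) != 0;
}
```

For example, toy_transition_ok(TS_WAITING, TS_CHECK) holds (the "last completion" arrow), while toy_transition_ok(TS_SUCCESS, TS_CHECK) does not, since a successful request can only move on to RELEASED.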
m0_rm_incoming fields and state transitions are protected by the owner's mutex.
An incoming request is placed by m0_rm_credit_get() on one of the owner's m0_rm_owner::ro_incoming[] lists, depending on its priority. It remains on this list until request processing fails or m0_rm_credit_put() is called.
const struct m0_rm_incoming_ops* rin_ops |
struct m0_tl rin_pins |
List of pins, linked through m0_rm_pin::rp_incoming_linkage, for all credits held to satisfy this request.
enum m0_rm_incoming_policy rin_policy |
int rin_priority |
int32_t rin_rc |
Stores the error code for the incoming request. A separate field is needed because rin_sm.sm_rc is associated with an error of a state.
For an incoming request it is possible that an error is set in RI_WAIT and the request then has to be put back into the RI_CHECK state before it can be put into RI_FAILURE. The state-machine model does not handle this well.
struct m0_rm_remote* rin_remote |
struct m0_rm_reserve_prio rin_reserve |
enum m0_rm_incoming_type rin_type |
struct m0_rm_credit rin_want |