The resource manager maintains a number of interrelated data-structures in memory. Invariant checking functions, defined in this section, assert the internal consistency of these structures.
A resource is an entity in Motr for which a notion of ownership can be well-defined. See the HLD referenced below for more details.
In Motr almost everything is a resource, except for the low-level types that are used to implement the resource framework.
Resource management is split into two parts:
generic functionality, implemented by the code in the rm/ directory, and
resource type specific functionality.
These parts interact through the operation vectors (m0_rm_resource_ops, m0_rm_resource_type_ops and m0_rm_credit_ops) provided by a resource type and called by the generic code. Type specific code, in turn, calls generic entry-points described in the Resource type interface section.
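The split between generic and type-specific code can be pictured as a table of function pointers that the resource type supplies and the generic code calls without knowing how credits are represented. The following is a minimal, self-contained sketch, not the Motr definitions: `struct credit`, `struct credit_ops`, `co_implies`, `co_join` and the bitmask representation are hypothetical stand-ins for m0_rm_credit and m0_rm_credit_ops.

```c
#include <stdbool.h>
#include <stdint.h>

struct credit_ops;

/* Hypothetical credit: a bitmask of access rights (e.g. read = 1, write = 2). */
struct credit {
	const struct credit_ops *c_ops;  /* supplied by the resource type */
	uint64_t                 c_bits;
};

/* Hypothetical type-specific operation vector (stand-in for m0_rm_credit_ops). */
struct credit_ops {
	/* true iff credit 'a' is at least as strong as credit 'b' */
	bool (*co_implies)(const struct credit *a, const struct credit *b);
	/* accumulate 'src' into 'dst' */
	void (*co_join)(struct credit *dst, const struct credit *src);
};

/* Type-specific implementation for a bitmask credit type. */
static bool mask_implies(const struct credit *a, const struct credit *b)
{
	return (b->c_bits & ~a->c_bits) == 0;
}

static void mask_join(struct credit *dst, const struct credit *src)
{
	dst->c_bits |= src->c_bits;
}

static const struct credit_ops mask_credit_ops = {
	.co_implies = mask_implies,
	.co_join    = mask_join,
};

/*
 * Generic code: do the owned credits together imply the wanted one?
 * Only the operation vector is used to combine and compare credits.
 */
static bool generic_owned_imply(const struct credit *owned, int nr,
				const struct credit *wanted)
{
	/* An empty credit; in this sketch zero bits means "no rights". */
	struct credit sum = { .c_ops = wanted->c_ops };
	int i;

	for (i = 0; i < nr; ++i)
		wanted->c_ops->co_join(&sum, &owned[i]);
	return wanted->c_ops->co_implies(&sum, wanted);
}
```

A different resource type (extents, byte reservations) would plug in its own vector while `generic_owned_imply()` stays unchanged.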
In the documentation below, responsibilities of generic and type specific parts of the resource manager are delineated.
Overview
A resource (m0_rm_resource) is associated with various file system entities:
file meta-data. Credits to use this resource can be thought of as locks on file attributes that allow them to be cached or modified locally;
file data. Credits to use this resource are extents in the file plus access mode bits (read, write);
free storage space on a server (a "grant" in Lustre terminology). Credit to use this resource is a reservation of a given number of bytes;
quota;
many more, see the HLD for examples.
A resource owner (m0_rm_owner) represents a collection of credits to use a particular resource.
To use a resource, a user of the resource manager creates an incoming resource request (m0_rm_incoming) that describes the wanted usage credit, and passes it to m0_rm_credit_get(). Sometimes the request can be fulfilled immediately; sometimes it requires changes in credit ownership. In the latter case, outgoing requests are directed to remote resource owners (which typically means network communication) to collect the wanted usage credit at the owner. When an outgoing request reaches its target remote domain, an incoming request is created there and processed (which in turn might result in sending further outgoing requests). Eventually, a reply is received for the outgoing request. When incoming request processing is complete, the request "pins" the wanted credit. This credit can be used until the incoming request structure is destroyed (m0_rm_credit_put()) and the pin is released.
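The request life cycle above can be modelled in a few lines. This is a hedged sketch under a strong simplification: a credit is just a number of units, and `struct owner`, `struct incoming`, `credit_get()`, `borrow_reply()` and `credit_put()` are hypothetical names that only mirror the roles of m0_rm_owner, m0_rm_incoming, m0_rm_credit_get() and m0_rm_credit_put().

```c
#include <stdbool.h>

/* Hypothetical, simplified model: a credit is a number of units. */
struct owner {
	int o_cached;   /* owned, not pinned */
	int o_held;     /* pinned by incoming requests */
	int o_borrowed; /* total units obtained from the creditor */
};

enum in_state { IN_WAIT, IN_SUCCESS };

struct incoming {
	struct owner *in_owner;
	int           in_want;
	enum in_state in_state;
};

/* Try to pin the wanted credit from the cache. */
static bool in_try_pin(struct incoming *in)
{
	struct owner *o = in->in_owner;

	if (o->o_cached < in->in_want)
		return false;
	o->o_cached -= in->in_want;
	o->o_held   += in->in_want;
	in->in_state = IN_SUCCESS;
	return true;
}

/* Analogue of m0_rm_credit_get(): asynchronous, may leave 'in' waiting. */
static void credit_get(struct incoming *in)
{
	in->in_state = IN_WAIT;
	if (!in_try_pin(in)) {
		/* A real owner would now send an outgoing borrow request. */
	}
}

/* Called when the reply to the outgoing borrow request arrives. */
static void borrow_reply(struct incoming *in, int granted)
{
	in->in_owner->o_cached   += granted;
	in->in_owner->o_borrowed += granted;
	if (in->in_state == IN_WAIT)
		(void)in_try_pin(in);
}

/* Analogue of m0_rm_credit_put(): releases the pin; the credit becomes cached. */
static void credit_put(struct incoming *in)
{
	in->in_owner->o_held   -= in->in_want;
	in->in_owner->o_cached += in->in_want;
}
```

Note that after `credit_put()` the units stay with the owner as cached credit; they are only given back upward when a revoke or cancel happens.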
See the documentation for individual resource management data-types and interfaces for more detailed description of their behaviour.
Terminology.
Various terms are used to describe the credit flow of resources in a cluster.
Owners of credits for a particular resource are arranged in a cluster-wide hierarchy. This hierarchical arrangement depends on system structure (e.g., where devices are connected, what the network topology looks like) and dynamic system behaviour (how accesses to a resource are distributed).
Originally, all credits on the resource belong to a single owner or a set of owners, residing on some well-known servers. Proxy servers request and cache credits from there. Lower level proxies and clients request credits in turn. According to the order in this hierarchy, one distinguishes "upward" and "downward" owners relative to a given one.
In a given ownership transfer operation, a downward owner is "debtor" and upward owner is "creditor". The credit being transferred is called a "loan" (note that this word is used only as a noun). When a credit is transferred from a creditor to a debtor, the latter "borrows" and the former "sub-lets" the loan. When a credit is transferred in the other direction, the creditor "revokes" and debtor "returns" the loan.
A debtor can voluntarily return a loan. This is called a "cancel" operation.
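The vocabulary maps onto simple bookkeeping between one creditor and one debtor. The sketch below is illustrative only: credits are unit counts, and `struct owner`, `loan_borrow()` and `loan_return()` are hypothetical names, not resource manager functions.

```c
/* Hypothetical model of one creditor/debtor pair; credits are unit counts. */
struct owner {
	int cached;   /* owned and available */
	int sublet;   /* lent downward (this owner is the creditor) */
	int borrowed; /* obtained upward (this owner is the debtor) */
};

/* The debtor "borrows" and the creditor "sub-lets" the loan. */
static int loan_borrow(struct owner *creditor, struct owner *debtor, int n)
{
	if (creditor->cached < n)
		return -1; /* the creditor would first have to collect credits */
	creditor->cached -= n;
	creditor->sublet += n;
	debtor->borrowed += n;
	debtor->cached   += n;
	return 0;
}

/*
 * Moving the loan back up: the creditor "revokes" and the debtor "returns".
 * A voluntary return initiated by the debtor is a "cancel"; the bookkeeping
 * is the same.
 */
static int loan_return(struct owner *creditor, struct owner *debtor, int n)
{
	if (debtor->cached < n || debtor->borrowed < n || creditor->sublet < n)
		return -1; /* part of the loan is held or sub-let further down */
	debtor->cached   -= n;
	debtor->borrowed -= n;
	creditor->sublet -= n;
	creditor->cached += n;
	return 0;
}
```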
Concurrency control.
Generic resource manager makes no assumptions about threading model used by its callers. Generic resource data-structures and code are thread safe.
Three types of locks protect all generic resource manager state:
per domain m0_rm_domain::rd_lock. This lock serialises addition and removal of resource types. Typically, it won't be contended much after the system start-up;
per resource type m0_rm_resource_type::rt_lock. This lock is taken whenever a resource or a resource owner is created or destroyed. Typically, that would be when a file system object is accessed which is not in the cache;
per resource owner m0_rm_owner::ro_lock. These locks protect a bulk of generic resource management state:
lists of possessed, borrowed and sub-let usage credits;
The owner lock is taken and released at least once while an incoming request is processed. The main owner state machine logic (owner_balance()) is structured so that it can easily be adapted to finer-grained locking.
None of these locks are ever held while waiting for a network communication to complete.
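The rule that no lock is held across network communication usually takes the shape "decide under the lock, send after unlocking, handle the reply under the lock again". A minimal pthread-based sketch, assuming a hypothetical `struct owner` with an `ro_lock` field and a hypothetical `net_send_borrow()` transport stub:

```c
#include <pthread.h>

/* Hypothetical owner with a lock protecting its credit state. */
struct owner {
	pthread_mutex_t ro_lock;
	int             cached;
	int             pending_borrow; /* units requested, reply not yet seen */
};

/* Hypothetical transport stub: a real one would issue an RPC. */
static int sent_units;

static void net_send_borrow(int units)
{
	sent_units += units; /* called without any resource manager lock held */
}

/*
 * Record the intent under the owner lock, then drop the lock before the
 * (potentially slow) network operation.
 */
static void owner_request_more(struct owner *o, int units)
{
	pthread_mutex_lock(&o->ro_lock);
	o->pending_borrow += units;
	pthread_mutex_unlock(&o->ro_lock);

	net_send_borrow(units);
}

/* The reply is processed later, again under the owner lock. */
static void borrow_reply(struct owner *o, int units)
{
	pthread_mutex_lock(&o->ro_lock);
	o->pending_borrow -= units;
	o->cached         += units;
	pthread_mutex_unlock(&o->ro_lock);
}
```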
The owners in a group coordinate their activities internally (by means outside of resource manager control) as far as resource management is concerned.
Resource manager assumes that credits granted to the owners from the same group never conflict.
Typical usage is to assign all owners from the same distributed transaction (or from the same network client) to a group. The decision about a group scope has concurrency related implications, because the owners within a group must coordinate access between themselves to maintain whatever scheduling properties are desired, like serialisability.
Liveness.
None of the resource manager structures, except for m0_rm_resource, require reference counting, because their liveness is strictly determined by the liveness of an "owning" structure into which they are logically embedded.
As in many other places in Motr, liveness of "global" long-living structures (m0_rm_domain, m0_rm_resource_type) is managed by the upper layers which are responsible for determining when it is safe to finalise the structures. Typically, an upper layer would achieve this by first stopping and finalising all possible resource manager users.
Similarly, a resource owner (m0_rm_owner) liveness is not explicitly determined by the resource manager. It is up to the user to determine when an owner (which can be associated with a file, or a client, or a similar entity) is safe to be finalised.
When a resource owner is finalised (ROS_FINALISING), it tears down the credit network by revoking the loans it sub-let and by returning the loans it borrowed from other owners.
M0_RM_REMOTE_PUT() decrements the remote's reference counter. rm_remote_free() is called if the counter reaches zero, which is why resource->r_mutex is taken here.
Incoming requests are assigned a priority (a greater numerical value means a higher priority). When multiple requests are ready to be fulfilled, higher priority ones are preferred.
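The pattern of dropping a reference and freeing the object under the mutex of its containing structure can be sketched as follows. The names `struct resource`, `struct remote`, `remote_free()` and `remote_put()` are hypothetical analogues of m0_rm_resource, m0_rm_remote, rm_remote_free() and M0_RM_REMOTE_PUT(), and the sketch assumes that lookups of remotes also take `r_mutex`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for m0_rm_resource and m0_rm_remote. */
struct resource {
	pthread_mutex_t r_mutex;   /* protects r_remotes and remote refcounts */
	int             r_remotes; /* number of live remote structures */
};

struct remote {
	struct resource *rem_resource;
	int              rem_ref;
};

/* Hypothetical analogue of rm_remote_free(): unlinks and frees the remote. */
static void remote_free(struct remote *rem)
{
	rem->rem_resource->r_remotes--;
	free(rem);
}

/*
 * Analogue of M0_RM_REMOTE_PUT(): the decrement and the possible free happen
 * under the resource mutex, so (assuming lookups also take r_mutex) no other
 * thread can find the remote and take a new reference in between.
 */
static void remote_put(struct remote *rem)
{
	struct resource *res = rem->rem_resource;

	pthread_mutex_lock(&res->r_mutex);
	if (--rem->rem_ref == 0)
		remote_free(rem);
	pthread_mutex_unlock(&res->r_mutex);
}
```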
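Selecting among ready requests by priority amounts to a maximum search. A minimal sketch with a hypothetical `struct request`; how ties are broken is not specified above, so keeping the earliest request is this sketch's own choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ready-request descriptor. */
struct request {
	int  rq_prio;  /* greater numerical value means higher priority */
	bool rq_ready; /* ready to be fulfilled */
};

/*
 * Returns the ready request with the highest priority, or NULL if none.
 * Ties go to the earliest request in the array (an assumption).
 */
static struct request *pick_next(struct request *rq, size_t nr)
{
	struct request *best = NULL;
	size_t i;

	for (i = 0; i < nr; ++i) {
		if (!rq[i].rq_ready)
			continue;
		if (best == NULL || rq[i].rq_prio > best->rq_prio)
			best = &rq[i];
	}
	return best;
}
```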
RIF_MAY_REVOKE
Previously sub-let credits may be revoked, if necessary, to fulfill this request.
RIF_MAY_BORROW
More credits may be borrowed, if necessary, to fulfill this request.
RIF_LOCAL_WAIT
The interaction between the request and locally possessed credits is the following:
by default, locally possessed credits are ignored. This scenario is typical for a local request (M0_RIT_LOCAL), because local users resolve conflicts by some other means (usually some form of concurrency control, like locking);
if RIF_LOCAL_WAIT is set, the request will wait until there are no locally possessed credits conflicting with the wanted credit. This is typical for a remote request (M0_RIT_BORROW or M0_RIT_REVOKE);
if RIF_LOCAL_TRY is set, the request will be denied immediately if there are conflicting local credits. This allows implementing "try-lock"-like functionality.
RIF_LOCAL_WAIT and RIF_LOCAL_TRY flags are mutually exclusive.
RIF_LOCAL_TRY
Fail the request if it cannot be fulfilled because of the local conflicts.
RIF_RESERVE
Reserve the credits that fulfill the incoming request by putting M0_RPF_BARRIER pins on them. A reserved credit can't be granted to another incoming request until the request that made the reservation is granted. The only exception is an incoming request that also has the RIF_RESERVE flag and a bigger reserve priority (see m0_rm_incoming documentation).
M0_RIT_LOCAL
A request for a usage credit from a local user. When the request succeeds, the credit is held by the owner.
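The effect of the two local-conflict flags described above can be condensed into one decision function. This is a hedged sketch: the flag names come from the text, but their bit values, the `local_action` enum and `local_conflict_action()` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values are assumptions for this sketch; the names are from the text. */
enum {
	RIF_LOCAL_WAIT = 1 << 2,
	RIF_LOCAL_TRY  = 1 << 3,
};

/* Hypothetical outcome of checking locally possessed conflicting credits. */
enum local_action {
	LOCAL_PROCEED, /* no conflict, or conflicts ignored (the default) */
	LOCAL_WAIT,    /* RIF_LOCAL_WAIT: wait until the conflict disappears */
	LOCAL_FAIL,    /* RIF_LOCAL_TRY: deny immediately ("try-lock") */
	LOCAL_INVALID  /* both flags set: they are mutually exclusive */
};

static enum local_action local_conflict_action(uint64_t flags, bool conflicts)
{
	const uint64_t both = RIF_LOCAL_WAIT | RIF_LOCAL_TRY;

	if ((flags & both) == both)
		return LOCAL_INVALID;
	if (!conflicts)
		return LOCAL_PROCEED;
	if (flags & RIF_LOCAL_WAIT)
		return LOCAL_WAIT;
	if (flags & RIF_LOCAL_TRY)
		return LOCAL_FAIL;
	return LOCAL_PROCEED; /* default: local users resolve conflicts themselves */
}
```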
M0_RIT_BORROW
A request to loan a usage credit to a remote owner. Fulfillment of this request might cause further outgoing requests to be sent, e.g., to revoke credits sub-let to remote owners.
M0_RIT_REVOKE
A request to return a usage credit previously sub-let to this owner.
Not-pinned credit is "cached". Such credit can be returned to an upward owner from which it was previously borrowed (i.e., credit can be "cancelled") or sub-let to downward owners.
Lists of incoming and outgoing requests are subdivided into sub-lists.
Enumerator
OQS_GROUND
"Ground" request is not excited.
OQS_EXCITED
Excited requests are those for which something has to be done. An outgoing request is excited when it completes (or times out). An incoming request is excited when it's ready to go from RI_WAIT to RI_CHECK state.
Resource owner state machine goes through lists of excited requests processing them. This processing can result in more excitement somewhere, but eventually terminates.
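The "process excited requests until nothing is excited" loop can be illustrated with a hypothetical `struct req` whose processing may excite one other request. `balance_sketch()` is not owner_balance(); it only shows why the sweep repeats and why it terminates when chains of excitement are finite.

```c
#include <stddef.h>

/* Hypothetical request with an "excited" flag (OQS_EXCITED vs OQS_GROUND). */
struct req {
	int         excited;
	struct req *wakes;     /* request excited as a side effect, or NULL */
	int         processed;
};

/*
 * Keep sweeping until no request is excited. Processing moves a request back
 * to the ground state but may excite another one; since each 'wakes' link is
 * consumed once, the loop terminates. Returns the number of sweeps.
 */
static int balance_sketch(struct req *rq, size_t nr)
{
	int    passes = 0;
	int    todo;
	size_t i;

	do {
		todo = 0;
		for (i = 0; i < nr; ++i) {
			if (!rq[i].excited)
				continue;
			rq[i].excited = 0; /* back to the ground list */
			rq[i].processed++;
			if (rq[i].wakes != NULL) {
				rq[i].wakes->excited = 1; /* "more excitement" */
				rq[i].wakes = NULL;
				todo = 1;
			}
		}
		passes++;
	} while (todo);
	return passes;
}
```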
In this state owner credits lists are empty (including incoming and outgoing request lists).
ROS_INITIALISING
Initial network setup state:
registering with the resource data-base;
&c.
ROS_ACTIVE
Active request processing state. Once an owner has reached this state, it must pass through the finalising state.
ROS_QUIESCE
No new requests are allowed in this state. Existing incoming requests are drained in this state.
ROS_FINALISING
Flushes all the loans. The owner collects from debtors and repays creditors.
ROS_DEAD_CREDITOR
Failure state.
The creditor was considered dead by HA. The owner has cleaned up its credits and is not able to satisfy any new incoming requests. The owner can recover from this state back to ROS_ACTIVE after an HA notification saying that the RM creditor is online again, or if the user provides another creditor via m0_rm_owner_creditor_reset().
ROS_INSOLVENT
Final state.
During finalisation, if the owner fails to clear the loans, it enters the ROS_INSOLVENT state.
ROS_FINAL
Final state.
In this state owner credits lists are empty (including incoming and outgoing request lists).
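The owner states can be summarised as a transition table. The state names below are from the descriptions above, but the enum ordering and the exact transition set are this sketch's own reading of them, not the resource manager's code; in particular, routing the self-windup from ROS_DEAD_CREDITOR through ROS_QUIESCE is an assumption.

```c
#include <stdbool.h>

/* Names from the text; ordering and transitions are this sketch's reading. */
enum owner_state {
	ROS_INITIALISING,
	ROS_ACTIVE,
	ROS_QUIESCE,
	ROS_FINALISING,
	ROS_DEAD_CREDITOR,
	ROS_INSOLVENT,
	ROS_FINAL,
	ROS_NR
};

static bool owner_transition_ok(enum owner_state from, enum owner_state to)
{
	static const bool allowed[ROS_NR][ROS_NR] = {
		[ROS_INITIALISING]  = { [ROS_ACTIVE] = true },
		/* an active owner must pass through ROS_FINALISING */
		[ROS_ACTIVE]        = { [ROS_QUIESCE]       = true,
					[ROS_FINALISING]    = true,
					[ROS_DEAD_CREDITOR] = true },
		[ROS_QUIESCE]       = { [ROS_FINALISING] = true },
		/* recovery, or self-windup (assumed to drain via ROS_QUIESCE) */
		[ROS_DEAD_CREDITOR] = { [ROS_ACTIVE]  = true,
					[ROS_QUIESCE] = true },
		[ROS_FINALISING]    = { [ROS_FINAL]     = true,
					[ROS_INSOLVENT] = true },
	};

	return from < ROS_NR && to < ROS_NR && allowed[from][to];
}
```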
Deletes all M0_RPF_BARRIER pins set by a given incoming request.
In other words, the function cancels all credit reservations made by the incoming request. It also deletes the pins set to track reserved credits. It is guaranteed that if an M0_RPF_TRACK pin exists for a reserved credit, then it was placed there to track the cancellation of the reservation, because the reserved credit is cached. In some rare cases the credit can also be held (if RIF_LOCAL_WAIT was not set for 'in'), but the logic works in this case too.
This function handles a request to borrow a credit for a resource from a server ("creditor").
Before a credit is borrowed, the remote object (debtor) has to be subscribed to HA notifications about conf object status changes, so that debtor death is handled properly. This is done in the FOPH_RM_REQ_DEBTOR_SUBSCRIBE phase.
Parameters
fom
-> fom processing the CREDIT_BORROW request on the server
Calls m0_rm_incoming_ops::rio_conflict() for all incoming requests that pinned the given credit. The function is called when a request arrives that conflicts with a held credit.
Checks whether the barrier currently set on the credit (if any) is overcome by the given incoming request. If so, the barrier is replaced; otherwise, a tracking pin is added.
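The barrier-or-track decision can be sketched as below. The real comparison of reserve priorities is described in the m0_rm_incoming documentation; here it is assumed to be a plain integer where greater wins, and `struct credit`, `struct incoming` and `barrier_or_track()` are hypothetical. What happens to the displaced barrier owner is not modelled.

```c
#include <stddef.h>

/* Hypothetical model: a credit carries at most one barrier plus tracking pins. */
struct incoming {
	int in_reserve_prio; /* assumption: a plain integer, greater wins */
};

struct credit {
	struct incoming *cr_barrier;  /* owner of the M0_RPF_BARRIER pin, or NULL */
	int              cr_nr_track; /* number of M0_RPF_TRACK pins */
};

enum pin_kind { PIN_BARRIER, PIN_TRACK };

/*
 * If 'in' overcomes the current barrier (or there is none), it becomes the
 * new barrier; otherwise it only tracks the credit so that it can be woken
 * up when the reservation is cancelled.
 */
static enum pin_kind barrier_or_track(struct credit *cr, struct incoming *in)
{
	if (cr->cr_barrier == NULL ||
	    in->in_reserve_prio > cr->cr_barrier->in_reserve_prio) {
		cr->cr_barrier = in;
		return PIN_BARRIER;
	}
	cr->cr_nr_track++;
	return PIN_TRACK;
}
```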
Checks whether a credit that was granted by a remote owner should be reserved by some incoming request that is waiting for outgoing request completion. If so, reserves the 'to_cache' credit and forces other requests to wait for the reservation to be cancelled.
This introduces a "thundering herd" problem, potentially waking up all requests waiting for the reserved credit. It is necessary because rio_conflict() won't be called for 'in' if the waiting requests are not woken up.
Ignore borrow requests for held non-conflicting credits. If it is the only credit that can satisfy incoming request, then eventually creditor will revoke it.
Ideally, non-conflicting credits should be borrowed unconditionally. However, that would mean borrowing a copy of the credit: the same credit would be held by two RM owners, and the total number of credits in the cluster would increase. Currently, there is no way in the RM framework to link a credit to its borrowed copy.
Main helper function for incoming_check(). It starts with "rest" set to the wanted credit and goes through a sequence of checks, reducing "rest".
CHECK logic can be described by means of "wait conditions". A wait condition is something that prevents immediate fulfillment of the request.
- A request with RIF_LOCAL_WAIT bit set can be fulfilled iff the credits
on ->ro_owned[OWOS_CACHED] list together imply the wanted credit;
- a request without RIF_LOCAL_WAIT bit can be fulfilled iff the credits
on all ->ro_owned[] lists together imply the wanted credit.
If there are not enough credits on the ->ro_owned[] lists, an incoming request has to wait until additional credits are borrowed from the upward creditor or revoked from downward debtors.
A RIF_LOCAL_WAIT request, in addition, can wait until a credit moves from ->ro_owned[OWOS_HELD] to ->ro_owned[OWOS_CACHED].
This function performs no state transitions by itself. Instead its return value indicates the target state:
- 0: the request is fulfilled, the target state is RI_SUCCESS,
- +ve: more waiting is needed, the target state is RI_WAIT,
- -ve: error, the target state is RI_FAILURE.
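The wait conditions and the three-way return convention can be modelled with credits as unit counts on the held and cached lists. This is a hedged sketch: `struct owner`, `check_sketch()` and the `may_obtain` parameter (standing for "more credits may be borrowed or revoked") are hypothetical, and the specific error code is an arbitrary choice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical owner: credits modelled as unit counts on two lists. */
struct owner {
	int held;   /* ->ro_owned[OWOS_HELD] */
	int cached; /* ->ro_owned[OWOS_CACHED] */
};

/*
 * Returns 0 (RI_SUCCESS), +1 (RI_WAIT) or a negative error (RI_FAILURE).
 * -EBUSY is an arbitrary choice for this sketch.
 */
static int check_sketch(const struct owner *o, int want, bool local_wait,
			bool may_obtain)
{
	int rest   = want;
	int usable = local_wait ? o->cached : o->cached + o->held;

	rest -= usable < rest ? usable : rest; /* reduce "rest" */
	if (rest == 0)
		return 0;
	/* A RIF_LOCAL_WAIT request may wait for held credits to become cached. */
	if (local_wait && o->cached + o->held >= want)
		return +1;
	/* Otherwise wait for borrowing/revoking, if that is allowed at all. */
	return may_obtain ? +1 : -EBUSY;
}
```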
External resource manager entry point: requests a credit from the resource owner.
Starts a state machine for a resource usage credit request and adds pins for this request. This is an asynchronous operation: the credit will generally not be held on return.
Initialises generic fields in struct m0_rm_credit.
This is called by generic RM code to initialise an empty credit of any resource type and by resource type specific code to initialise generic fields of a struct m0_rm_credit.
Waits for the owner to reach a particular state. Once the winding-up process of an owner has started, it can take a while. This function will typically be used to check whether the owner has reached the ROS_FINAL state, after which the user can safely call m0_rm_owner_fini(). Calling m0_rm_owner_fini() immediately after m0_rm_owner_windup() may cause unexpected behaviour.
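Waiting for a state is the classic condition-variable loop. The sketch below uses POSIX threads and a hypothetical `struct owner` with `owner_set_state()` and `owner_wait_state()`; it does not reproduce the resource manager's own waiting primitive, only the idea that finalisation must wait until the final state is observed.

```c
#include <pthread.h>

/* Hypothetical owner whose state changes are broadcast on a condition variable. */
struct owner {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             state;
};

/* Analogue of a state change made by the windup machinery. */
static void owner_set_state(struct owner *o, int state)
{
	pthread_mutex_lock(&o->lock);
	o->state = state;
	pthread_cond_broadcast(&o->cond);
	pthread_mutex_unlock(&o->lock);
}

/* Block until the owner reaches 'state'; only then is finalisation safe. */
static void owner_wait_state(struct owner *o, int state)
{
	pthread_mutex_lock(&o->lock);
	while (o->state != state) /* loop guards against spurious wakeups */
		pthread_cond_wait(&o->cond, &o->lock);
	pthread_mutex_unlock(&o->lock);
}
```

The intended order is: start the windup, wait for the final state, and only then finalise the owner.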
Allocates m0_rm_outgoing, adds a pin from "in" to the outgoing request. Constructs and sends an outgoing request fop. Arranges m0_rm_outgoing_complete() to be called on fop reply or timeout.
Sends a resource management fop to the service. The service responds with the remote owner identifier (m0_rm_remote::rem_id) used for further communications.
This function handles a request to revoke a credit for a resource on a server ("debtor"). REVOKE is typically issued to a client. However, resources in Motr are arranged in a hierarchy (chain), so a server can also receive REVOKE from another server.
Parameters
fom
-> fom processing the CREDIT_REVOKE request on the server
Sends an outgoing revoke request to the remote owner specified by the "loan". The request revokes the credit "credit", which might be a part of the original loan.
Asynchronously starts subscribing the debtor to HA notifications. This call makes the remote request FOM state machine take a different route: it first locates the conf object corresponding to the remote object and installs a clink into the object's channel, and only then continues with credit processing.
Callback associated with a state change of the remote RM service configuration object. If an HA notification about service failure is accepted, an AST is posted to the resource type sm group so that the remote failure is processed under the group lock.
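Revoking only part of a loan leaves up to two remaining pieces when credits are ranges, such as the file data extents mentioned in the overview. A hedged sketch with a hypothetical half-open range type `struct loan` and a hypothetical `loan_revoke_part()`:

```c
/* Hypothetical loan over a half-open range [l_start, l_end), e.g. a file extent. */
struct loan {
	unsigned long l_start;
	unsigned long l_end;
};

/*
 * Revoking 'credit' (a non-empty part of the loan) leaves up to two pieces.
 * Returns the number of pieces written to 'rest', or -1 if 'credit' is not a
 * non-empty part of 'l'.
 */
static int loan_revoke_part(const struct loan *l, const struct loan *credit,
			    struct loan rest[2])
{
	int nr = 0;

	if (credit->l_start < l->l_start || credit->l_end > l->l_end ||
	    credit->l_start >= credit->l_end)
		return -1;
	if (l->l_start < credit->l_start)
		rest[nr++] = (struct loan){ l->l_start, credit->l_start };
	if (credit->l_end < l->l_end)
		rest[nr++] = (struct loan){ credit->l_end, l->l_end };
	return nr;
}
```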
Note
An RM service can recover from failure in the future. HA will then send a notification with the M0_NC_ONLINE HA state, so the RM service state keeps being tracked.
Processes a remote RM service failure. The corresponding remote instance of type m0_rm_remote played one of two mutually exclusive roles for local owners: debtor or creditor. Both roles are handled in this function to keep it generic.
If the remote instance was created to represent the debtor in loans, these loans are instantly settled to the cached list. If there are outgoing revoke requests to this remote in progress, they will eventually complete with a successful return code (see also revoke_ast()).
If the remote instance was created to represent the creditor of some local owner, further functioning of this owner is impossible, because all borrowed credits have to be dropped. The potential problem is that borrowed credits could be sub-let to other owners or be in active use ("held"). To drop these credits gracefully, the owner performs a "self-windup". Eventually this owner transitions to the ROS_FINAL state.
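The two roles lead to very different reactions, which a small model makes explicit. Everything here is hypothetical (`struct owner`, `enum remote_role`, `remote_failed()`); the actual self-windup is only flagged, not performed.

```c
#include <stdbool.h>

/* Hypothetical role the failed remote played for a local owner. */
enum remote_role { REMOTE_DEBTOR, REMOTE_CREDITOR };

struct owner {
	int  cached;        /* credits owned and available */
	int  sublet_to_rem; /* credits sub-let to the failed remote (debtor role) */
	int  borrowed;      /* credits borrowed from the failed remote (creditor role) */
	bool winding_up;    /* "self-windup" started */
};

static void remote_failed(struct owner *o, enum remote_role role)
{
	switch (role) {
	case REMOTE_DEBTOR:
		/* Loans to a dead debtor are settled: credits return to the cache. */
		o->cached       += o->sublet_to_rem;
		o->sublet_to_rem = 0;
		break;
	case REMOTE_CREDITOR:
		/*
		 * Borrowed credits must be dropped, but they may be held or
		 * sub-let further, so only a graceful self-windup is started.
		 */
		o->winding_up = true;
		break;
	}
}
```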
Note
Remote structures (m0_rm_remote) are not destroyed until rm resource finalisation, because there can be outgoing requests to them in progress. Also remote can recover from failed state and become online again.
If rc != 0, the credits remain in the sub-let list and can no longer be revoked. Also, the error code can't be returned to the user. Maybe HA should be notified about the error?
Processes an HA notification saying that the remote is ONLINE. Only the case in which the remote recovered from a failure is of interest. If the remote is a debtor, the local owner will accept requests from it again. If the remote is a creditor, the local owner regains the ability to satisfy incoming requests.
Note
It is assumed that all RPC sessions to remote are still valid and operational.