This document describes the design of a feature that automatically provisions network buffers to the receive message queue of a transfer machine.
Related material: the HLD of Motr LNet Transport and Network Buffer Pool. For documentation links, please refer to this file: doc/motr-design-doc-list.rst
The following new APIs are introduced:
The transfer machine data structure is extended as follows:
The network buffer data structure is extended as follows:
The following enumeration is defined:
Automatic provisioning of network buffers to the receive queue takes place only when a buffer pool is "attached" to a transfer machine using the m0_net_tm_pool_attach() subroutine. The subroutine can only be called prior to starting the transfer machine.
The subroutine validates that the specified pool is in the same network domain. It then saves the pool pointer in the m0_net_transfer_mc::ntm_recv_pool field, and the callback pointer in the m0_net_transfer_mc::ntm_recv_pool_callbacks field.
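The attach step described above can be sketched as follows. This is a minimal illustration only: the `toy_` types, the error codes, and the function name are assumptions made for the example and are not the Motr API, which may well express these checks as preconditions rather than return values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; only the fields that the
 * text mentions are modelled. */
struct toy_domain { int id; };
struct toy_pool { struct toy_domain *dom; };
struct toy_callbacks { void (*msg_recv)(void *buf); };

enum toy_tm_state { TOY_TM_INITIALIZED, TOY_TM_STARTED };

struct toy_tm {
	struct toy_domain *dom;
	enum toy_tm_state state;
	struct toy_pool *recv_pool;                      /* cf. ntm_recv_pool */
	const struct toy_callbacks *recv_pool_callbacks; /* cf. ntm_recv_pool_callbacks */
};

/* Attach a pool: allowed only before start and only within one domain.
 * The error codes are an assumption of this sketch. */
int toy_tm_pool_attach(struct toy_tm *tm, struct toy_pool *pool,
		       const struct toy_callbacks *cbs)
{
	assert(tm != NULL && pool != NULL && cbs != NULL);
	if (tm->state != TOY_TM_INITIALIZED)
		return -EPERM;   /* too late: the machine was already started */
	if (pool->dom != tm->dom)
		return -EINVAL;  /* pool belongs to another network domain */
	tm->recv_pool = pool;
	tm->recv_pool_callbacks = cbs;
	return 0;
}
```

With this shape, a transfer machine that never attaches a pool keeps a NULL pool pointer, which is one simple way to express that automatic provisioning is disabled for it.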
The first attempt to provision the transfer machine is made on start up, just after the transfer machine state change event is delivered. This ensures that there is no race condition between the state change event and the buffer completion callback that notifies receipt of the first incoming unsolicited message.
The receive message queue is provisioned with M0_NET_TM_RECV_QUEUE_DEF_LEN (nominally 2) network buffers if possible.
Whenever a network buffer is dequeued from the receive message queue, an attempt to re-provision the queue is made prior to delivering the buffer completion event. This ensures that the queue is replenished as soon as possible. Note that not every receive message queue buffer completion event will trigger re-provisioning if multiple message delivery is enabled in the buffer.
When re-provisioning, as many buffers are fetched from the pool as are needed to bring the length of the receive message queue up to the minimum desired value. Changing the minimum receive queue length with the m0_net_tm_pool_length_set() subroutine always triggers an attempt to re-provision. No attempt is ever made, however, to return buffers to the pool if the length of the queue is greater than the minimum.
New network buffers are obtained by invoking the m0_net_buffer_pool_get() subroutine. The buffer obtained from this subroutine is expected to have its m0_net_buffer::nb_pool variable set to the pool pointer, to enable the application to easily return it to the pool it came from, without having to explicitly track the pool. This requires a modification to the m0_net_buffer_pool_get() subroutine.
Actual provisioning is done by invoking m0_net_buffer_add() or its internal equivalent, depending on the locking model used.
It is possible that the buffer pool gets exhausted and re-provisioning fails, partially or entirely. In such cases, the transfer machine maintains a count of the number of additional buffers it requires in the m0_net_transfer_mc::ntm_recv_queue_deficit atomic variable. This is to facilitate later re-provisioning without unnecessary locking and loss of locality.
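The top-up and deficit bookkeeping described above might look like the sketch below. The pool is reduced to a counter of free buffers, the queue to a length, and all `toy_` names are hypothetical; only the idea of recording the shortfall in an atomic variable comes from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_pool { unsigned free; };     /* buffers left in the pool */

struct toy_tm {
	struct toy_pool *recv_pool;
	unsigned recv_queue_len;        /* buffers on the receive queue */
	unsigned recv_queue_min;        /* minimum desired queue length */
	atomic_uint recv_queue_deficit; /* cf. ntm_recv_queue_deficit */
};

void toy_tm_init(struct toy_tm *tm, struct toy_pool *pool, unsigned min)
{
	tm->recv_pool = pool;
	tm->recv_queue_len = 0;
	tm->recv_queue_min = min;
	atomic_init(&tm->recv_queue_deficit, 0);
}

/* Stand-in for m0_net_buffer_pool_get(): 1 on success, 0 if exhausted. */
int toy_pool_get(struct toy_pool *pool)
{
	if (pool->free == 0)
		return 0;
	pool->free--;
	return 1;
}

/* Top the queue up to the minimum and record any shortfall as the
 * deficit. Buffers above the minimum are never handed back. */
void toy_tm_provision(struct toy_tm *tm)
{
	assert(tm != NULL && tm->recv_pool != NULL);
	while (tm->recv_queue_len < tm->recv_queue_min &&
	       toy_pool_get(tm->recv_pool))
		tm->recv_queue_len++;   /* stands in for m0_net_buffer_add() */
	atomic_store(&tm->recv_queue_deficit,
		     tm->recv_queue_len < tm->recv_queue_min ?
		     tm->recv_queue_min - tm->recv_queue_len : 0);
}
```

Because the deficit is stored atomically, other code can test it without taking the transfer machine lock, which is the property the text relies on for later re-provisioning.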
Re-provisioning a transfer machine after pool exhaustion requires a triggering event:
The first two cases result in the same behavior as normal provisioning.
The m0_net_domain_buffer_pool_not_empty() subroutine initiates the replenishment of all depleted transfer machines in the network domain that are provisioned from the specified buffer pool. The order in which the transfer machines are processed is arbitrary, but this poses no particular problem: pool exhaustion is assumed to be very rare, and a system in which it happens is already in deep trouble.
The following pseudo-code illustrates the subroutine algorithm:
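A minimal sketch of such a domain-wide pass, assuming the behavior described in this document: the caller holds the pool lock, each transfer machine's atomic deficit is tested first, and the transfer machine lock is taken only when the deficit is non-zero. All `toy_` names are hypothetical, locks are modelled by a counter, and the check for an active transfer machine is omitted for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_pool { unsigned free; };     /* buffers available in the pool */

struct toy_tm {
	struct toy_pool *recv_pool;     /* cf. ntm_recv_pool */
	unsigned recv_queue_len;
	unsigned recv_queue_min;
	atomic_uint recv_queue_deficit; /* cf. ntm_recv_queue_deficit */
	unsigned lock_count;            /* times the TM lock was taken */
	struct toy_tm *next;            /* next TM in the same domain */
};

struct toy_domain { struct toy_tm *tms; };

void toy_tm_init(struct toy_tm *tm, struct toy_pool *pool,
		 unsigned len, unsigned min, struct toy_tm *next)
{
	tm->recv_pool = pool;
	tm->recv_queue_len = len;
	tm->recv_queue_min = min;
	atomic_init(&tm->recv_queue_deficit, len < min ? min - len : 0);
	tm->lock_count = 0;
	tm->next = next;
}

/* Assumed to be called with the pool lock held; the domain lock would
 * be taken around the loop in a real implementation. */
void toy_domain_pool_not_empty(struct toy_domain *dom, struct toy_pool *pool)
{
	struct toy_tm *tm;

	assert(dom != NULL && pool != NULL);
	for (tm = dom->tms; tm != NULL; tm = tm->next) {
		if (tm->recv_pool != pool)
			continue;       /* provisioned from another pool */
		if (atomic_load(&tm->recv_queue_deficit) == 0)
			continue;       /* not depleted: leave its lock alone */
		tm->lock_count++;       /* stands in for locking the TM */
		while (tm->recv_queue_len < tm->recv_queue_min &&
		       pool->free > 0) {
			pool->free--;
			tm->recv_queue_len++;
		}
		atomic_store(&tm->recv_queue_deficit,
			     tm->recv_queue_len < tm->recv_queue_min ?
			     tm->recv_queue_min - tm->recv_queue_len : 0);
	}
}
```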
Note the following:
Note that the network layer has no control over the pool operations, so it is up to the application to supply a not-empty pool callback subroutine and make the call to the m0_net_domain_buffer_pool_not_empty() subroutine from there.
Automatic provisioning only takes place in an active transfer machine (state is M0_NET_TM_STARTED).
Automatic provisioning of a transfer machine is in one of two (informal) states: Provisioned or Depleted.
In the Provisioned state there are sufficient network buffers enqueued on the receive message queue. The algorithms do not care whether these buffers were obtained from the buffer pool or not, just that the count is right.
When there are insufficient network buffers in the receive message queue, the provisioning state is said to be Depleted. The provisioning algorithms work to change the state back to Provisioned by obtaining additional buffers from the buffer pool. It is expected that this usually gets done before the application can sense the transition out of the Provisioned state (prior to the buffer completion event callback), but there is a possibility that the pool gets exhausted before this is accomplished.
A non-zero value in m0_net_transfer_mc::ntm_recv_queue_deficit indicates that automatic provisioning is in the Depleted state. Otherwise, it is in the Provisioned state.
Every time the minimum required network buffer count is modified it is possible that the transfer machine's automatic provisioning state transitions to Depleted, so an attempt is made to re-provision to restore the state.
Automatic provisioning raises more reentrancy issues than concurrency issues, which in some sense makes it more complicated.
Applications return receive message buffers to the buffer pool of their own accord, possibly, but not always, in the buffer completion callback itself. The transfer machine lock is not held by the application at this time; instead, the application has to obtain the pool lock to return the buffer. It is possible that this operation triggers a domain-wide re-provisioning if the pool was exhausted. The re-provisioning operation, as explained above, will obtain the domain lock and internal transfer machine locks, but assumes that the buffer pool lock is held.
Normal provisioning usually takes place in the context of normal transfer machine operations, protected by the transfer machine mutex. The provisioning steps require that the pool lock be obtained while that mutex is held, which is exactly the opposite of the locking order used by application-triggered re-provisioning and can therefore result in a deadlock. Since application behavior cannot be dictated, normal provisioning must be made to use the same locking order as the re-provisioning case. This requires that the transfer machine lock be released, the pool lock obtained, and then the transfer machine lock re-obtained.
This is not a new situation; the transfer machine is already handling cases where it has to temporarily give up and re-obtain its own mutex. To avoid getting destroyed while operating out of its mutex, the transfer machine uses the m0_net_transfer_mc::ntm_callback_counter to indicate that it is operating in such a mode. When it re-obtains the mutex and decrements the counter, it signals on the m0_net_transfer_mc::ntm_chan channel. This is illustrated in the following pseudo-code:
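A minimal, single-threaded sketch of this sequence is given here. It models each lock as a flag whose misuse trips an assertion, and counts signals on the channel instead of waking waiters. The `toy_` names are hypothetical, and the exact placement of the counter decrement relative to the pool unlock is an assumption.

```c
#include <assert.h>

struct toy_lock { int held; };

struct toy_pool {
	struct toy_lock lock;
	unsigned free;                  /* buffers available in the pool */
};

struct toy_tm {
	struct toy_lock mutex;
	struct toy_pool *recv_pool;
	unsigned recv_queue_len;
	unsigned recv_queue_min;
	unsigned callback_counter;      /* cf. ntm_callback_counter */
	unsigned chan_signals;          /* signals sent on ntm_chan */
};

void toy_lock_take(struct toy_lock *l)    { assert(!l->held); l->held = 1; }
void toy_lock_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

/* Called with the TM mutex held; returns with it held again. */
void toy_tm_provision_locked(struct toy_tm *tm)
{
	struct toy_pool *pool = tm->recv_pool;

	assert(tm->mutex.held);
	tm->callback_counter++;         /* keeps finalization waiting */
	toy_lock_release(&tm->mutex);   /* give up the TM mutex ... */

	assert(!tm->mutex.held);
	toy_lock_take(&pool->lock);     /* ... take the pool lock first ... */
	toy_lock_take(&tm->mutex);      /* ... then re-obtain the TM mutex */

	while (tm->recv_queue_len < tm->recv_queue_min && pool->free > 0) {
		pool->free--;
		tm->recv_queue_len++;
	}
	toy_lock_release(&pool->lock);

	tm->callback_counter--;         /* back under the mutex */
	tm->chan_signals++;             /* wake anyone waiting on the channel */
}
```

The pool lock is always acquired while the transfer machine mutex is not held, so this path uses the same pool-then-transfer-machine order as application-triggered re-provisioning.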
The callback counter logic is already used to synchronize buffer completion events with the concurrent finalization of the transfer machine. The new addition is to obtain the pool and transfer machine locks if provisioning is needed; this can be done on a demand basis only for the most frequent re-provisioning case, so the overhead can be held to a minimum.
Transfer machine finalization must be slightly tweaked to continue waiting on the counter and channel as long as the transfer machine state is active.
The Network Buffer Pool module provides support for "colored" operations to maximize the locality of reference between a network buffer and a transfer machine. All m0_net_buffer_pool_get() calls will use the m0_net_transfer_mc::ntm_pool_colour field value as the color. This value is initialized to M0_BUFFER_ANY_COLOR, and it is up to the higher level application to assign a color to the transfer machine with the m0_net_tm_colour_set() subroutine. The higher level application is also responsible for creating the buffer pool with sufficient colors in the first place.
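To make the idea of a colored get concrete, the sketch below models a tiny pool in which each buffer remembers the color it was last returned with. A get for a specific color prefers a matching buffer and otherwise falls back to any free one. This is an illustrative toy under those assumptions, not the Network Buffer Pool implementation; `TOY_ANY_COLOUR` and the other `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ANY_COLOUR (~0u)
#define TOY_POOL_SIZE  4

struct toy_buf {
	unsigned colour;   /* colour the buffer was last put with */
	int in_pool;       /* non-zero while the buffer sits in the pool */
};

struct toy_pool { struct toy_buf bufs[TOY_POOL_SIZE]; };

/* Return a buffer to the pool, tagging it with the caller's colour. */
void toy_pool_put(struct toy_buf *buf, unsigned colour)
{
	assert(buf != NULL && !buf->in_pool);
	buf->colour = colour;
	buf->in_pool = 1;
}

/* Prefer a buffer of the requested colour; fall back to the first free
 * buffer of any colour; return NULL when the pool is exhausted. */
struct toy_buf *toy_pool_get(struct toy_pool *pool, unsigned colour)
{
	struct toy_buf *any = NULL;
	size_t i;

	for (i = 0; i < TOY_POOL_SIZE; i++) {
		struct toy_buf *b = &pool->bufs[i];

		if (!b->in_pool)
			continue;
		if (colour != TOY_ANY_COLOUR && b->colour == colour) {
			b->in_pool = 0;
			return b;
		}
		if (any == NULL)
			any = b;
	}
	if (any != NULL)
		any->in_pool = 0;
	return any;
}
```

In this model a transfer machine that keeps its default color simply takes the first free buffer, while one with an assigned color tends to get back the buffers it used before.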
Special care is taken during domain-wide re-provisioning after the buffer pool recovers from an exhausted state, so as not to lose the locality of reference of the various transfer machine locks with respect to their CPUs. An atomic variable is used to track whether a transfer machine needs re-provisioning, and its lock is obtained only if this is the case.
All tests are done with a fake transport and a real buffer pool.
No system testing is planned, though the multiple transfer machine unit tests have some system testing flavor.
For documentation links, please refer to this file : doc/motr-design-doc-list.rst