The LNet Transport is built over an address space agnostic "core" I/O interface. This document describes the user space implementation of this interface, which interacts with Kernel Core by use of the LNet Transport Device.
The m0_net_lnet_ifaces_get() and m0_net_lnet_ifaces_put() APIs must be changed to require a m0_net_domain parameter, because user space must interact with the kernel to get the interfaces, and a per-domain file descriptor is required for this interaction.

The nlx_core_ep_addr_encode() and nlx_core_nidstr_decode() functions are added to the core interface, allowing endpoint address encoding and decoding to be implemented in an address space independent manner.

Each m0_net_domain corresponds to a separately opened file descriptor.

The relationship between the various objects in the components of the LNet transport and the networking layer is illustrated in the following UML diagram.
The Core layer in user space has no sub-components but interfaces with the kernel core layer via the device driver layer.
Refer specifically to the Design Highlights component diagram.
The LNet driver layer requires that each shared object fit within a single page. Assertions about the structures in question ensure they are smaller than a page in size. However, to guarantee that instances of these structures do not cross page boundaries, all allocations of these structures must be performed using m0_alloc_aligned(). The shift parameter for each allocation must be chosen such that 1<<shift is at least the size of the structure. Build-time assertions about these shifts ensure the correct shift is used.
The user space core interacts with the kernel through ioctl requests on a file descriptor opened on the "/dev/m0lnet" device. There is a 1:1 correspondence between m0_net_domain (nlx_core_domain) objects and file descriptors, so each time a domain is initialized, a new file descriptor is obtained. After the file descriptor is obtained, further interaction is in the form of ioctl requests. When the m0_net_domain is finalized, the file descriptor is closed. The specific interactions are detailed in the following sections.
In the case of domain initialization, nlx_core_dom_init(), the following sequence of tasks is performed by the user space core. This is the first interaction between the user space core and the kernel core.

- Initializes the nlx_core_domain object.
- Allocates and initializes the nlx_ucore_domain object and sets the nlx_core_domain::cd_upvt field.
- Opens the device using the open() system call. The device is named "/dev/m0lnet" and is opened with the O_RDWR|O_CLOEXEC flags. The file descriptor is saved in the nlx_ucore_domain::ud_fd field.
- Declares a m0_lnet_dev_dom_init_params object, setting the m0_lnet_dev_dom_init_params::ddi_cd field.
- Shares the nlx_core_domain object with the kernel via the M0_LNET_DOM_INIT ioctl request. Note that a side effect of this request is that the nlx_core_domain::cd_kpvt field is set.
- Completes initialization of the nlx_core_domain object and the nlx_ucore_domain object, including caching the three buffer maximum size values returned in the m0_lnet_dev_dom_init_params.

During domain finalization, nlx_core_dom_fini(), the user space core performs the following steps.

- Finalizes the nlx_ucore_domain and nlx_core_domain objects.
- Uses close() to release the file descriptor. This will typically cause the kernel to immediately finalize its private data and release resources (unless there is a duplicate file descriptor, in which case the kernel will delay finalization until the final duplicate is closed; this is unlikely because the file descriptor is not exposed and the file is opened using O_CLOEXEC).
- Frees the nlx_ucore_domain object.

The user space core implementations of nlx_core_get_max_buffer_size(), nlx_core_get_max_buffer_segment_size() and nlx_core_get_max_buffer_segments() each return the corresponding value cached in the nlx_ucore_domain object.
The user space core completes the following tasks to perform buffer registration.

- Initializes the nlx_core_buffer object. This includes allocating and initializing the nlx_ucore_buffer object and setting the nlx_core_buffer::cb_upvt field.
- Declares a m0_lnet_dev_buf_register_params object, setting the parameter fields from the nlx_core_buf_register() parameters.
- Performs the M0_LNET_BUF_REGISTER ioctl request to share the buffer with the kernel and complete the kernel part of buffer registration.
- Completes initialization of the nlx_ucore_buffer object.

The user space core completes the following tasks to perform buffer de-registration.

- Performs the M0_LNET_BUF_DEREGISTER ioctl request on the nlx_core_buffer object, causing the kernel to complete the kernel part of buffer de-registration.
- Finalizes the nlx_core_buffer and nlx_ucore_buffer objects; the nlx_ucore_buffer object is freed and the nlx_core_buffer::cb_upvt field is reset to NULL.

The nlx_core_new_blessed_bev() helper allocates and blesses buffer event objects. In user space, blessing an object requires interacting with the kernel. After the object is blessed by the kernel, the user space core can add it to the buffer event queue directly, without further kernel interaction. The following steps are taken by the user space core.

- Allocates the nlx_core_buffer_event object.
- Declares a m0_lnet_dev_bev_bless_params object and sets its fields.
- Performs the M0_LNET_BEV_BLESS ioctl request to share the nlx_core_buffer_event object with the kernel and complete the kernel part of blessing the object.

Buffer event objects are never removed from the buffer event queue until the transfer machine is stopped.
The user space core nlx_core_tm_start() subroutine completes the following tasks to start a transfer machine. Recall that there is no core API corresponding to the nlx_xo_tm_init() function.

- Initializes the nlx_core_transfer_mc object. This includes allocating and initializing the nlx_ucore_transfer_mc object and setting the nlx_core_transfer_mc::ctm_upvt field.
- Performs the M0_LNET_TM_START ioctl request to share the nlx_core_transfer_mc object with the kernel and complete the kernel part of starting the transfer machine.
- Allocates and blesses the initial nlx_core_buffer_event objects, using the user space nlx_core_new_blessed_bev() helper.
- Completes initialization of the nlx_core_buffer and nlx_ucore_transfer_mc objects. This includes initializing the buffer event circular queue using the bev_cqueue_init() function.

The user space core nlx_core_tm_stop() subroutine completes the following tasks to stop a transfer machine. Recall that there is no core API corresponding to the nlx_xo_tm_fini() function.

- Finalizes the nlx_core_transfer_mc object.
- Performs the M0_LNET_TM_STOP ioctl request, causing the kernel to complete the kernel part of stopping the transfer machine.
- Finalizes the buffer event queue using the bev_cqueue_fini() function.
- Frees the nlx_ucore_transfer_mc object and resets the nlx_core_transfer_mc::ctm_upvt field to NULL.

Several LNet transport core subroutines operate on buffers and transfer machine queues:

- nlx_core_buf_msg_recv()
- nlx_core_buf_msg_send()
- nlx_core_buf_active_recv()
- nlx_core_buf_active_send()
- nlx_core_buf_passive_recv()
- nlx_core_buf_passive_send()
- nlx_core_buf_del()

In all user space core cases, the shared objects, nlx_core_buffer and nlx_core_transfer_mc, must have been previously shared with the kernel, through use of the M0_LNET_BUF_REGISTER and M0_LNET_TM_START ioctl requests, respectively.
The ioctl requests available to the user space core for managing buffers and transfer machine buffer queues are as follows.

- M0_LNET_BUF_MSG_RECV
- M0_LNET_BUF_MSG_SEND
- M0_LNET_BUF_ACTIVE_RECV
- M0_LNET_BUF_ACTIVE_SEND
- M0_LNET_BUF_PASSIVE_RECV
- M0_LNET_BUF_PASSIVE_SEND
- M0_LNET_BUF_DEL

In each case, the user space core performs the following steps.

- Declares a m0_lnet_dev_buf_queue_params object and sets the two fields. In this case, both fields are set to the kernel private pointers of the shared objects.
- Performs the corresponding ioctl request from the list above.

The user space core nlx_core_buf_event_wait() subroutine completes the following tasks to wait for buffer events.

- Declares a m0_lnet_dev_buf_event_wait_params object and sets the fields.
- Performs the M0_LNET_BUF_EVENT_WAIT ioctl request to wait for the kernel to generate additional buffer events.

Operations involving NID strings require ioctl requests to access kernel-only functions.
Most of the nlx_core_ep_addr_decode() and nlx_core_ep_addr_encode() functions can be implemented in code common to user and kernel space. However, converting a NID to a string or vice versa requires access to functions which exist only in the kernel. The nlx_core_nidstr_decode() and nlx_core_nidstr_encode() functions provide separate user and kernel implementations of this conversion code.
To convert a NID string to a NID, the user space core performs the following tasks.

- Declares a m0_lnet_dev_nid_encdec_params object and sets the dn_buf field to the string to be decoded.
- Performs the M0_LNET_NIDSTR_DECODE ioctl request to cause the kernel to decode the string. On successful return, the dn_nid field will be set to the corresponding NID.

To convert a NID into a NID string, the user space core performs the following tasks.

- Declares a m0_lnet_dev_nid_encdec_params object and sets the dn_nid field to the value to be converted.
- Performs the M0_LNET_NIDSTR_ENCODE ioctl request to cause the kernel to encode the string. On successful return, the dn_buf field will be set to the corresponding NID string.

The final operations involving NID strings are the nlx_core_nidstrs_get() and nlx_core_nidstrs_put() operations. The user space core obtains the strings from the kernel using the M0_LNET_NIDSTRS_GET ioctl request. This ioctl request returns a copy of the strings, rather than sharing a reference to them. As such, there is no ioctl request to "put" the strings. To get the list of strings, the user space core performs the following tasks.

- Declares a m0_lnet_dev_nidstrs_get_params object and sets the fields based on the allocated buffer and its size.
- Performs the M0_LNET_NIDSTRS_GET ioctl request to populate the buffer with the NID strings, which returns the number of NID strings (not 0) on success.
- Allocates a char** array corresponding to the number of NID strings (plus 1 for the required terminating NULL pointer).

The User Space Core implementation does not introduce its own state model, but operates within the frameworks defined by the Motr Networking Module and the Kernel device driver interface.
Use of the driver requires a file descriptor. This file descriptor is obtained as part of nlx_core_dom_init() and closed as part of nlx_core_dom_fini().
The user space threading and concurrency model works in conjunction with the kernel core model. No additional behavior is added in user space.
The user space core does not allocate threads. The user space application can control processor affiliation for the threads it uses via m0_thread_confine().
Unit tests already exist for testing the core API. These tests have been used previously for the kernel core implementation. Since the user space must implement the same behavior, the unit tests will be reused.
System testing will be performed as part of the transport operation system test.
The overall design of the LNet transport already addresses the need to minimize data copying between the kernel and user space, and the need to minimize context switching. This is accomplished by use of shared memory and a circular buffer event queue maintained in shared memory. For more information, refer to the HLD. For documentation links, please refer to this file: doc/motr-design-doc-list.rst
In general, the User Core layer simply routes parameters to and from the Kernel Core via the LNet driver. The complexity of this routing is analyzed in LNet Driver Analysis.
The user core requires a small structure for each shared core structure. These user core private structures, e.g. nlx_ucore_domain, are of fixed size, and their number is directly proportional to the number of core objects allocated by the transport layer.