Motr M0
LNet Transport User Space Core DLD

Overview

The LNet Transport is built over an address space agnostic "core" I/O interface. This document describes the user space implementation of this interface, which interacts with Kernel Core by use of the LNet Transport Device.


Definitions

  • HLD of Motr LNet Transport: for documentation links, refer to doc/motr-design-doc-list.rst

Requirements

  • r.m0.net.xprt.lnet.ioctl The user space core interacts with the kernel core through ioctl requests.
  • r.m0.net.xprt.lnet.aligned-objects The implementation must ensure that shared objects do not cross page boundaries.

Dependencies


Design Highlights

  • The Core API is an address space agnostic I/O interface intended for use by the Motr Networking LNet transport operation layer in either user space or kernel space.
  • The user space implementation interacts with the kernel core implementation via a device driver.
  • Each user space m0_net_domain corresponds to opening a separate file descriptor.
  • Shared memory objects allow indirect interaction with the kernel and reduce the number of required context switches.
  • Those core operations that require direct kernel interaction do so via ioctl requests.

Logical Specification

Component Overview

The relationship between the various objects in the components of the LNet transport and the networking layer is illustrated in the following UML diagram.

[Figure lnet_xo.png: LNet Transport Objects]

The Core layer in user space has no sub-components but interfaces with the kernel core layer via the device driver layer.


Refer specifically to the Design Highlights component diagram.

See also
Kernel Support for User Space Transports.

Memory Allocation Strategy

The LNet driver layer requires that each shared object fit within a single page. Assertions about the structures in question ensure they are smaller than a page in size. However, to guarantee that instances of these structures do not cross page boundaries, all allocations of these structures must be performed using m0_alloc_aligned(). The shift parameter for each allocation must be chosen such that 1<<shift is at least the size of the structure. Build-time assertions about these shifts can ensure the correct shift is used.

Strategy for Kernel Interaction

The user space core interacts with the kernel through ioctl requests on a file descriptor opened on the "/dev/m0lnet" device. There is a 1:1 correspondence between m0_net_domain (nlx_core_domain) objects and file descriptors. So, each time a domain is initialized, a new file descriptor is obtained. After the file descriptor is obtained, further interaction is in the form of ioctl requests. When the m0_net_domain is finalized, the file descriptor is closed. The specific interactions are detailed in the following sections.

See also
LNet Transport Device DLD

Domain Initialization

In the case of domain initialization, nlx_core_dom_init(), the following sequence of tasks is performed by the user space core. This is the first interaction between the user space core and the kernel core.

See also
Corresponding device layer behavior

Domain Finalization

During domain finalization, nlx_core_dom_fini(), the user space core performs the following steps.

  • It completes pre-checks of the nlx_ucore_domain and nlx_core_domain objects.
  • It calls close() to release the file descriptor. This will typically cause the kernel to immediately finalize its private data and release resources (unless there is a duplicate file descriptor, in which case the kernel will delay finalization until the last duplicate is closed; this is unlikely because the file descriptor is not exposed and the file is opened using O_CLOEXEC).
  • It completes any post-finalization steps, such as freeing its nlx_ucore_domain object.
See also
Corresponding device layer behavior

Buffer Registration and De-registration

The user space core implementations of nlx_core_get_max_buffer_size(), nlx_core_get_max_buffer_segment_size() and nlx_core_get_max_buffer_segments() each return the corresponding value cached in the nlx_ucore_domain object.

The user space core completes the following tasks to perform buffer registration.

The user space core completes the following tasks to perform buffer de-registration.

See also
Corresponding device layer behavior

Managing the Buffer Event Queue

The nlx_core_new_blessed_bev() helper allocates and blesses buffer event objects. In user space, blessing the object requires interacting with the kernel. After the object is blessed by the kernel, the user space core can add it to the buffer event queue directly, without further kernel interaction. The following steps are taken by the user space core.

Buffer event objects are never removed from the buffer event queue until the transfer machine is stopped.

See also
Corresponding device layer behavior

Starting a Transfer Machine

The user space core nlx_core_tm_start() subroutine completes the following tasks to start a transfer machine. Recall that there is no core API corresponding to the nlx_xo_tm_init() function.

See also
Corresponding device layer behavior

Stopping a Transfer Machine

The user space core nlx_core_tm_stop() subroutine completes the following tasks to stop a transfer machine. Recall that there is no core API corresponding to the nlx_xo_tm_fini() function.

See also
Corresponding device layer behavior

Transfer Machine Buffer Queue Operations

Several LNet transport core subroutines operate on buffers and transfer machine queues. In all user space core cases, the shared objects, nlx_core_buffer and nlx_core_transfer_mc, must have been previously shared with the kernel, through use of the M0_LNET_BUF_REGISTER and M0_LNET_TM_START ioctl requests, respectively.

The ioctl requests available to the user space core for managing buffers and transfer machine buffer queues are as follows.

In each case, the user space core performs the following steps.

  • Validates the parameters.
  • Declares a m0_lnet_dev_buf_queue_params object and sets the two fields. In this case, both fields are set to the kernel private pointers of the shared objects.
  • Performs the appropriate ioctl request from the list above.
See also
Corresponding device layer behavior

Waiting for Buffer Events

The user space core nlx_core_buf_event_wait() subroutine completes the following tasks to wait for buffer events.

See also
Corresponding device layer behavior

Node Identifier Support

Operations involving NID strings require ioctl requests to access kernel-only functions.

Most of the nlx_core_ep_addr_decode() and nlx_core_ep_addr_encode() functions can be implemented in code common to both user and kernel space. However, converting a NID to a string or vice versa requires access to functions that exist only in the kernel. The nlx_core_nidstr_decode() and nlx_core_nidstr_encode() functions provide separate user and kernel implementations of this conversion code.

To convert a NID string to a NID, the user space core performs the following tasks.

  • It declares a m0_lnet_dev_nid_encdec_params and sets the dn_buf to the string to be decoded.
  • It calls the M0_LNET_NIDSTR_DECODE ioctl request to cause the kernel to decode the string. On successful return, the dn_nid field will be set to the corresponding NID.

To convert a NID into a NID string, the user space core performs the following tasks.

  • It declares a m0_lnet_dev_nid_encdec_params and sets the dn_nid to the value to be converted.
  • It calls the M0_LNET_NIDSTR_ENCODE ioctl request to cause the kernel to encode the NID. On successful return, the dn_buf field will be set to the corresponding NID string.

The final operations involving NID strings are the nlx_core_nidstrs_get() and nlx_core_nidstrs_put() operations. The user space core obtains the strings from the kernel using the M0_LNET_NIDSTRS_GET ioctl request. This ioctl request returns a copy of the strings, rather than sharing a reference to them. As such, there is no ioctl request to "put" the strings. To get the list of strings, the user space core performs the following tasks.

  • It allocates a buffer in which the NID strings are to be stored.
  • It declares a m0_lnet_dev_nidstrs_get_params object and sets the fields based on the allocated buffer and its size.
  • It performs a M0_LNET_NIDSTRS_GET ioctl request to populate the buffer with the NID strings, which returns the number of NID strings (not 0) on success.
  • If the ioctl request returns -EFBIG, the buffer should be freed, a larger buffer allocated, and the ioctl request re-attempted.
  • It allocates a char** array corresponding to the number of NID strings (plus 1 for the required terminating NULL pointer).
  • It populates this array by iterating over the now-populated buffer, adding a pointer to each nul-terminated NID string, until the number of strings returned by the ioctl request has been populated.
See also
Corresponding device layer behavior

State Specification

The User Space Core implementation does not introduce its own state model, but operates within the frameworks defined by the Motr Networking Module and the Kernel device driver interface.

Use of the driver requires a file descriptor. This file descriptor is obtained as part of nlx_core_dom_init() and closed as part of nlx_core_dom_fini().

See also
Corresponding device layer behavior

Threading and Concurrency Model

The user space threading and concurrency model works in conjunction with the kernel core model. No additional behavior is added in user space.

See also
Kernel Core Threading and Concurrency Model

NUMA optimizations

The user space core does not allocate threads. The user space application can control thread processor affinity by confining the threads it uses via m0_thread_confine().


Conformance

  • i.m0.net.xprt.lnet.ioctl The Logical Specification covers how each LNet Core operation in user space is implemented using the driver ioctl requests.
  • i.m0.net.xprt.lnet.aligned-objects The Memory Allocation Strategy section discusses how shared objects can be allocated as required.

Unit Tests

Unit tests already exist for testing the core API. These tests have been used previously for the kernel core implementation. Since the user space must implement the same behavior, the unit tests will be reused.

See also
LNet Transport Unit Tests

System Tests

System testing will be performed as part of the transport operation system test.


Analysis

The overall design of the LNet transport already addresses the need to minimize data copying between the kernel and user space, and the need to minimize context switching. This is accomplished by use of shared memory and a circular buffer event queue maintained in shared memory. For more information, refer to the HLD. For documentation links, please refer to this file : doc/motr-design-doc-list.rst

In general, the User Core layer simply routes parameters to and from the Kernel Core via the LNet driver. The complexity of this routing is analyzed in LNet Driver Analysis.

The user core requires a small structure for each shared core structure. These user core private structures, e.g. nlx_ucore_domain, are of fixed size, and their number is directly proportional to the number of core objects allocated by the transport layer.


References