Motr  M0
FDMI FOL source

Data Structures

struct  ffs_fol_frag_handler
 
struct  m0_fol_fdmi_src_ctx
 

Macros

#define M0_FOL_FRAG_DATA_HANDLER_DECLARE(_opecode, _get_val_func)
 

Functions

 M0_TL_DESCR_DEFINE (ffs_tx, "fdmi fol src tx list", M0_INTERNAL, struct m0_be_tx, t_fdmi_linkage, t_magic, M0_BE_TX_MAGIC, M0_BE_TX_ENGINE_MAGIC)
 
 M0_TL_DEFINE (ffs_tx, M0_INTERNAL, struct m0_be_tx)
 
static struct m0_dtx * ffs_get_dtx (struct m0_fdmi_src_rec *src_rec)
 
static void be_tx_put_ast_cb (struct m0_sm_group *grp, struct m0_sm_ast *ast)
 
static void ffs_tx_inc_refc (struct m0_be_tx *be_tx, int64_t *counter)
 
static void ffs_tx_dec_refc (struct m0_be_tx *be_tx, int64_t *counter)
 
static int64_t ffs_rec_get (struct m0_fdmi_src_rec *src_rec)
 
static int64_t ffs_rec_put (struct m0_fdmi_src_rec *src_rec)
 
static int ffs_op_node_eval (struct m0_fdmi_src_rec *src_rec, struct m0_fdmi_flt_var_node *value_desc, struct m0_fdmi_flt_operand *value)
 
static void ffs_op_get (struct m0_fdmi_src_rec *src_rec)
 
static void ffs_op_put (struct m0_fdmi_src_rec *src_rec)
 
static int ffs_op_encode (struct m0_fdmi_src_rec *src_rec, struct m0_buf *buf)
 
static int ffs_op_decode (struct m0_buf *buf, void **handle)
 
static void ffs_op_begin (struct m0_fdmi_src_rec *src_rec)
 
static void ffs_op_end (struct m0_fdmi_src_rec *src_rec)
 
M0_INTERNAL int m0_fol_fdmi_src_init (void)
 
M0_INTERNAL void m0_fol_fdmi_src_fini (void)
 
M0_INTERNAL int m0_fol_fdmi_src_deinit (void)
 
M0_INTERNAL void m0_fol_fdmi_post_record (struct m0_fom *fom)
 
M0_INTERNAL bool m0_fol_fdmi__filter_kv_substring_match (struct m0_buf *value, const char **substrings)
 
M0_INTERNAL int m0_fol_fdmi_filter_kv_substring (struct m0_fdmi_eval_ctx *ctx, struct m0_conf_fdmi_filter *filter, struct m0_fdmi_eval_var_info *var_info)
 

Variables

static struct ffs_fol_frag_handler ffs_frag_handler_array []
 

Detailed Description

See also
FDMI Functional Specification

Implementation notes.

FDMI needs an in-memory representation of a FOL record to operate on. The FOL source therefore increases the backend transaction reference counter (fom->fo_tx.be_tx, using m0_be_tx_get()) to make sure the transaction is not destroyed, and passes m0_fom::fo_tx as a handle to FDMI. The reference counter is decremented again once FDMI has completed its processing and all plugins have confirmed they are done with the record.

FDMI reference increments and decrements are kept in a separate counter inside m0_be_tx. This helps prevent and debug cases where the number of FDMI decref calls does not match the number of incref calls. Initially the transaction lock protected modifications of this counter, but that caused deadlocks, so the counter is now an m0_atomic64. This is safe because inc and dec operations are never interleaved: N inc operations are always followed by N dec operations. There is therefore no race in which the counter drops to zero, the tx release is initiated, and then something increments the counter again.
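The counter discipline described above can be illustrated with a minimal sketch. It uses C11 atomics rather than m0_atomic64, and the names tx_sketch, tx_sketch_get() and tx_sketch_put() are hypothetical stand-ins, not Motr API. It only shows the "N increments, then N decrements, release on zero" pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FDMI counter kept inside m0_be_tx. */
struct tx_sketch {
	atomic_int_fast64_t fdmi_ref;  /* FDMI-only reference counter */
	bool                released;  /* set once the last reference is dropped */
};

static void tx_sketch_init(struct tx_sketch *tx)
{
	atomic_init(&tx->fdmi_ref, 0);
	tx->released = false;
}

/* Takes one FDMI reference; returns the counter value after the increment. */
static int64_t tx_sketch_get(struct tx_sketch *tx)
{
	/* Increments never follow the final decrement (see text above). */
	assert(!tx->released);
	return atomic_fetch_add(&tx->fdmi_ref, 1) + 1;
}

/* Drops one FDMI reference; returns the counter value after the decrement. */
static int64_t tx_sketch_put(struct tx_sketch *tx)
{
	int64_t left = atomic_fetch_sub(&tx->fdmi_ref, 1) - 1;

	assert(left >= 0);
	if (left == 0)
		tx->released = true;  /* assumed point of the tx release */
	return left;
}
```

Because all increments happen before the first decrement, the transition to zero happens exactly once, which is why no lock is needed around the release step in this sketch.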

This is the implementation of Phase 1, which does not need transaction support (that is, FDMI records never need to be re-sent, as would normally happen when, for example, a plugin crashes and re-requests FDMI records starting from some point X in the past). This assumption results in the following implementation approach.

FOL records pruning on plugin side

FDMI FOL records may be persisted on the FDMI plugin side. In this case the FDMI plugin must handle duplicates, because the process with the FDMI source may restart before it receives the "message has been received" confirmation from the FDMI plugin. If the source of FDMI records is CAS, there may also be duplicates of another kind: FDMI FOL records about the same KV operation sent by different CASes, due to N-way replication of KV pairs in DIX. These must be deduplicated as well.

One of the deduplication approaches is to have some kind of persistence on FDMI plugin side. Every record is looked up in this persistence and if there is a match then it's a duplicate.

The persistence cannot grow indefinitely, so there must be a way to prune it. One obvious option is to prune records after some timeout, but delayed DTM recovery may make this timeout very long (days, weeks or more). Another option is to prune records once it is known for sure that they will never be resent.

The current implementation uses m0_fol_rec_header::rh_lsn to send an lsn with each FDMI FOL record to the FDMI plugin. The lsn (log sequence number) is a monotonically non-decreasing number that represents the position of a BE transaction in the BE log. A FOL record is stored along with the other BE tx data in the BE log and therefore has the same lsn as the corresponding BE tx. Several transactions may share an lsn in the current implementation, but the number of transactions with the same lsn is bounded. m0_fol_rec_header::rh_lsn_discarded is an lsn such that no transaction with an lsn less than rh_lsn_discarded will ever be sent again from the same BE domain (in the configurations currently in use, a BE domain is equivalent to the Motr process that uses it). Consequently, an FDMI FOL record whose rh_lsn values are all less than the corresponding rh_lsn_discarded of their BE domains can be dropped from the deduplication persistence, because nothing in the cluster will send an FDMI FOL record about this operation again.
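As an illustration of the pruning rule above, here is a minimal sketch of a plugin-side deduplication table. The struct dedup_entry, its fields and dedup_prune() are hypothetical; a real plugin would use a persistent store. The sketch also simplifies by keeping one entry per (BE domain, lsn) pair, so an operation replicated across several domains is fully forgotten only after each of its entries has been pruned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of the plugin-side deduplication store. */
struct dedup_entry {
	uint64_t de_domain;  /* id of the BE domain the record came from */
	uint64_t de_lsn;     /* rh_lsn carried by the record */
};

/*
 * Drops every entry of `domain` whose lsn is below `lsn_discarded`,
 * compacting the array in place. Returns the new number of entries.
 */
static size_t dedup_prune(struct dedup_entry *tbl, size_t nr,
			  uint64_t domain, uint64_t lsn_discarded)
{
	size_t i;
	size_t kept = 0;

	assert(tbl != NULL || nr == 0);
	for (i = 0; i < nr; ++i) {
		if (tbl[i].de_domain == domain &&
		    tbl[i].de_lsn < lsn_discarded)
			continue;  /* will never be resent: forget it */
		tbl[kept++] = tbl[i];
	}
	return kept;
}
```

The plugin would call such a function each time it sees a larger rh_lsn_discarded for a given BE domain. Entries from other domains are left untouched because each domain advances its own discarded lsn independently.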

FOL records resend

There are 2 major cases here:

  1. FDMI plugin restarts, FDMI source does not. In this case the FDMI source needs to resend everything for which the FDMI plugin has not confirmed consumption. The FDMI source dock fom can do this by resending FDMI records indefinitely until either the FDMI plugin confirms consumption or the FDMI plugin process fails permanently.
  2. FDMI source restarts. The following description is about this case.

The FDMI source may restart unexpectedly (crash/restart) or gracefully (graceful shutdown/startup). In either case there may be FDMI FOL records that have no consumption confirmation from the FDMI plugin. The current implementation holds a BE tx reference until such a confirmation arrives from the FDMI plugin. While the reference is held, the BE tx is not discarded from the BE log. BE recovery recovers all transactions that were not discarded, which is very useful for the FDMI FOL record resend case: if all FOL records for such recovered transactions are resent as FDMI FOL records, then no FDMI FOL record will be missing on the FDMI plugin side, regardless of whether the FDMI source restarts, how it restarts, or how many times.

It leads to an obvious solution: just send all FOL records as FDMI FOL records during BE recovery.

There are several ways this task could be done.

FOM for each FOL record

For each FOL record, a fom of the original fom type is created. A special flag indicates that the fom was created during BE recovery. The flag lets the fom skip its usual actions and close the BE tx immediately. Then, after the BE tx reaches the M0_BTS_LOGGED state, a generic fom phase calls m0_fom_fdmi_record_post(), which sends the FDMI record to the FDMI plugin as usual.

FOM for each BE tx

A special FOM would be created for each recovered BE tx. The purpose of the fom would be to post an FDMI record with the FOL record for this BE tx and then wait until consumption of the FDMI record is acknowledged.

FDMI records for every BE tx

FDMI FOL record for every recovered BE tx is posted as usual. FDMI source dock fom would make a queue of all the records and it would send them and wait for consumption acknowledgement as usual.

Special BE recovery FOM phase

A special FOM phase is added to every fom that stores something in BE. If the fom is created during BE recovery, then initial phase of the fom is this special phase. This allows each fom to handle BE recovery as it sees fit. Default generic phase sequence should also include this special fom phase and by default it would post FDMI FOL record in the same way it's done currently for normal FOM phase sequence.

details

Macro Definition Documentation

◆ M0_FOL_FRAG_DATA_HANDLER_DECLARE

#define M0_FOL_FRAG_DATA_HANDLER_DECLARE(_opecode, _get_val_func)
Value:
{ \
.ffh_opecode = (_opecode), \
.ffh_fol_frag_get_val = (_get_val_func) }

Definition at line 210 of file fol_fdmi_src.c.
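A short sketch of how such a designated-initializer macro might populate a handler table. Only the field names ffh_opecode and ffh_fol_frag_get_val come from the macro above. The field types, the demo_* names, the opcode 42 and the terminating entry are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the handler; only the two field names are from the macro. */
struct ffs_fol_frag_handler {
	uint32_t ffh_opecode;
	int    (*ffh_fol_frag_get_val)(void *fol_frag, uint64_t *out);
};

#define M0_FOL_FRAG_DATA_HANDLER_DECLARE(_opecode, _get_val_func) { \
	.ffh_opecode          = (_opecode),                          \
	.ffh_fol_frag_get_val = (_get_val_func) }

/* Hypothetical value getter for a made-up opcode 42. */
static int demo_get_val(void *fol_frag, uint64_t *out)
{
	(void)fol_frag;
	assert(out != NULL);
	*out = 7;
	return 0;
}

static const struct ffs_fol_frag_handler demo_handlers[] = {
	M0_FOL_FRAG_DATA_HANDLER_DECLARE(42, demo_get_val),
	M0_FOL_FRAG_DATA_HANDLER_DECLARE(0, NULL)  /* terminator (assumption) */
};

/* Linear lookup by opcode; returns NULL when no handler is registered. */
static const struct ffs_fol_frag_handler *demo_find(uint32_t opecode)
{
	size_t i;

	for (i = 0; demo_handlers[i].ffh_fol_frag_get_val != NULL; ++i)
		if (demo_handlers[i].ffh_opecode == opecode)
			return &demo_handlers[i];
	return NULL;
}
```

Wrapping the initializer in a macro keeps each table row to one line and makes it harder to assign a getter to the wrong field.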

Function Documentation

◆ be_tx_put_ast_cb()

static void be_tx_put_ast_cb (struct m0_sm_group *grp, struct m0_sm_ast *ast)

Definition at line 243 of file fol_fdmi_src.c.


◆ ffs_get_dtx()

static struct m0_dtx * ffs_get_dtx (struct m0_fdmi_src_rec *src_rec)

Definition at line 232 of file fol_fdmi_src.c.


◆ ffs_op_begin()

static void ffs_op_begin (struct m0_fdmi_src_rec *src_rec)

The FOL source does not need to do anything on this event. The call to ffs_rec_get() made in m0_fol_fdmi_post_record() below ensures that the data is already in memory and available for fast access at the moment of this call.

Definition at line 520 of file fol_fdmi_src.c.


◆ ffs_op_decode()

static int ffs_op_decode (struct m0_buf *buf, void **handle)

Definition at line 482 of file fol_fdmi_src.c.


◆ ffs_op_encode()

static int ffs_op_encode (struct m0_fdmi_src_rec *src_rec, struct m0_buf *buf)
Todo:
Q: (for FOL owners) The FOL record does not provide an API call to calculate the size of the encoded record. For now this function does a double allocation: allocate an internal buffer of the maximum size, encode into it, allocate a buffer of the correct size, copy the data, then deallocate the internal buffer. This can be done properly once the FOL record owner exports the needed API call.

Definition at line 419 of file fol_fdmi_src.c.
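The double-allocation workaround described in the Todo can be sketched as follows. This is not the Motr code: ENC_MAX, the enc_fn callback type, encode_exact() and demo_enc() are hypothetical, and plain malloc()/free() stand in for the Motr allocation helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum { ENC_MAX = 4096 };  /* assumed upper bound on the encoded size */

/* Hypothetical encoder: writes at most `cap` bytes, returns bytes used. */
typedef size_t (*enc_fn)(const void *rec, unsigned char *dst, size_t cap);

/*
 * Encodes `rec` into a freshly allocated buffer of exactly the encoded
 * size. On success stores the buffer and its length and returns 0.
 */
static int encode_exact(const void *rec, enc_fn enc,
			unsigned char **out, size_t *out_len)
{
	unsigned char *tmp;
	unsigned char *res;
	size_t         len;

	tmp = malloc(ENC_MAX);              /* 1. scratch buffer, max size */
	if (tmp == NULL)
		return -ENOMEM;
	len = enc(rec, tmp, ENC_MAX);       /* 2. encode, learn real size */
	assert(len <= ENC_MAX);
	res = malloc(len > 0 ? len : 1);    /* 3. buffer of the real size */
	if (res == NULL) {
		free(tmp);
		return -ENOMEM;
	}
	memcpy(res, tmp, len);              /* 4. copy */
	free(tmp);                          /* 5. drop the scratch buffer */
	*out = res;
	*out_len = len;
	return 0;
}

/* Toy encoder for demonstration: copies a C string without its NUL. */
static size_t demo_enc(const void *rec, unsigned char *dst, size_t cap)
{
	size_t len = strlen(rec);

	if (len > cap)
		len = cap;
	memcpy(dst, rec, len);
	return len;
}
```

With a size-query call exported by the encoder, steps 1, 4 and 5 would disappear and the result buffer could be allocated directly.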


◆ ffs_op_end()

static void ffs_op_end (struct m0_fdmi_src_rec *src_rec)

Definition at line 538 of file fol_fdmi_src.c.


◆ ffs_op_get()

static void ffs_op_get (struct m0_fdmi_src_rec *src_rec)

Definition at line 396 of file fol_fdmi_src.c.


◆ ffs_op_node_eval()

static int ffs_op_node_eval (struct m0_fdmi_src_rec *src_rec, struct m0_fdmi_flt_var_node *value_desc, struct m0_fdmi_flt_operand *value)
Todo:
Phase 2: STUB: For now, we will not analyze filter, we just return FOL op code – always.

TODO: Q: (question to FOP/FOL owners) I could not find a better way to assert that this frag is of m0_fop_fol_frag_type, than to use this workaround (referencing internal _ops structure). Looks like they are ALWAYS of this type?... Now that there is NO indication of frag type whatsoever?...

Definition at line 351 of file fol_fdmi_src.c.


◆ ffs_op_put()

static void ffs_op_put (struct m0_fdmi_src_rec *src_rec)

Definition at line 408 of file fol_fdmi_src.c.


◆ ffs_rec_get()

static int64_t ffs_rec_get (struct m0_fdmi_src_rec *src_rec)

Definition at line 314 of file fol_fdmi_src.c.


◆ ffs_rec_put()

static int64_t ffs_rec_put (struct m0_fdmi_src_rec *src_rec)

Definition at line 332 of file fol_fdmi_src.c.


◆ ffs_tx_dec_refc()

static void ffs_tx_dec_refc (struct m0_be_tx *be_tx, int64_t *counter)

Definition at line 285 of file fol_fdmi_src.c.


◆ ffs_tx_inc_refc()

static void ffs_tx_inc_refc (struct m0_be_tx *be_tx, int64_t *counter)

Value = 0 means this call happened during record posting. Execution context is well-defined, all locks already acquired, no need to use AST.

Definition at line 254 of file fol_fdmi_src.c.


◆ m0_fol_fdmi__filter_kv_substring_match()

M0_INTERNAL bool m0_fol_fdmi__filter_kv_substring_match (struct m0_buf *value, const char **substrings)

Internal function used to match the strings. Exported for UTs.

Definition at line 710 of file fol_fdmi_src.c.
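A rough sketch of what substring matching over a length-delimited buffer can look like. The buffer is not assumed to be NUL-terminated, so the search uses memcmp() rather than strstr(). The struct buf_sketch type and both function names are hypothetical stand-ins. This page does not state whether the real function requires all substrings or any one of them to occur, so the "all must occur" rule below is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct m0_buf: length plus untyped data. */
struct buf_sketch {
	size_t      b_nob;
	const void *b_addr;
};

/* True if the `len`-byte needle occurs somewhere inside the buffer. */
static bool buf_contains(const struct buf_sketch *b, const char *needle,
			 size_t len)
{
	const char *p = b->b_addr;
	size_t      i;

	if (len > b->b_nob)
		return false;
	for (i = 0; i + len <= b->b_nob; ++i)
		if (memcmp(p + i, needle, len) == 0)
			return true;
	return false;
}

/* Assumption: match only if every entry of the NULL-terminated list occurs. */
static bool substrings_match_all(const struct buf_sketch *value,
				 const char **substrings)
{
	size_t i;

	assert(value != NULL && substrings != NULL);
	for (i = 0; substrings[i] != NULL; ++i)
		if (!buf_contains(value, substrings[i],
				  strlen(substrings[i])))
			return false;
	return true;
}
```

Under this assumption an empty substring list matches every value, which a caller may or may not want. A unit test for the real function would settle the intended behaviour.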


◆ m0_fol_fdmi_filter_kv_substring()

M0_INTERNAL int m0_fol_fdmi_filter_kv_substring (struct m0_fdmi_eval_ctx *ctx, struct m0_conf_fdmi_filter *filter, struct m0_fdmi_eval_var_info *var_info)

Implements M0_FDMI_FILTER_TYPE_KV_SUBSTRING filter.

Definition at line 738 of file fol_fdmi_src.c.


◆ m0_fol_fdmi_post_record()

M0_INTERNAL void m0_fol_fdmi_post_record (struct m0_fom *fom)

Submit new FOL entry to FDMI.

There is no "unpost record" method, so we have to prepare everything that may fail – before calling to post method.

NOTE: IMPORTANT! Do not call anything that may fail here! It is not possible to un-post the record; anything that may fail, must be done before the M0_FDMI_SOURCE_POST_RECORD call above.

Definition at line 646 of file fol_fdmi_src.c.


◆ m0_fol_fdmi_src_deinit()

M0_INTERNAL int m0_fol_fdmi_src_deinit ( void  )

Deinitializes FOL FDMI source.

The deregister below does not call fs_put/fs_end, so m0_be_tx_put has to be called explicitly here for all transactions that have been locked.

Note that t_fdmi_ref is not reset here; it is a flag indicating that the record has not yet been released by plugins.

Definition at line 601 of file fol_fdmi_src.c.


◆ m0_fol_fdmi_src_fini()

M0_INTERNAL void m0_fol_fdmi_src_fini ( void  )

Deinitializes FOL FDMI source.

Same as m0_fol_fdmi_src_deinit, but suppresses retcode. Needed for motr/init.c table.

Definition at line 594 of file fol_fdmi_src.c.


◆ m0_fol_fdmi_src_init()

M0_INTERNAL int m0_fol_fdmi_src_init ( void  )

Initializes/registers FOL FDMI source.

Definition at line 551 of file fol_fdmi_src.c.


◆ M0_TL_DEFINE()

M0_TL_DEFINE (ffs_tx, M0_INTERNAL, struct m0_be_tx)

◆ M0_TL_DESCR_DEFINE()

M0_TL_DESCR_DEFINE (ffs_tx, "fdmi fol src tx list", M0_INTERNAL, struct m0_be_tx, t_fdmi_linkage, t_magic, M0_BE_TX_MAGIC, M0_BE_TX_ENGINE_MAGIC)

Variable Documentation

◆ ffs_frag_handler_array

static struct ffs_fol_frag_handler ffs_frag_handler_array []
Initial value:
= {
}

Definition at line 214 of file fol_fdmi_src.c.