Motr
M0
Data Structures | |
struct | ffs_fol_frag_handler |
struct | m0_fol_fdmi_src_ctx |
Macros | |
#define | M0_FOL_FRAG_DATA_HANDLER_DECLARE(_opecode, _get_val_func) |
Functions | |
M0_TL_DESCR_DEFINE (ffs_tx, "fdmi fol src tx list", M0_INTERNAL, struct m0_be_tx, t_fdmi_linkage, t_magic, M0_BE_TX_MAGIC, M0_BE_TX_ENGINE_MAGIC) | |
M0_TL_DEFINE (ffs_tx, M0_INTERNAL, struct m0_be_tx) | |
static struct m0_dtx * | ffs_get_dtx (struct m0_fdmi_src_rec *src_rec) |
static void | be_tx_put_ast_cb (struct m0_sm_group *grp, struct m0_sm_ast *ast) |
static void | ffs_tx_inc_refc (struct m0_be_tx *be_tx, int64_t *counter) |
static void | ffs_tx_dec_refc (struct m0_be_tx *be_tx, int64_t *counter) |
static int64_t | ffs_rec_get (struct m0_fdmi_src_rec *src_rec) |
static int64_t | ffs_rec_put (struct m0_fdmi_src_rec *src_rec) |
static int | ffs_op_node_eval (struct m0_fdmi_src_rec *src_rec, struct m0_fdmi_flt_var_node *value_desc, struct m0_fdmi_flt_operand *value) |
static void | ffs_op_get (struct m0_fdmi_src_rec *src_rec) |
static void | ffs_op_put (struct m0_fdmi_src_rec *src_rec) |
static int | ffs_op_encode (struct m0_fdmi_src_rec *src_rec, struct m0_buf *buf) |
static int | ffs_op_decode (struct m0_buf *buf, void **handle) |
static void | ffs_op_begin (struct m0_fdmi_src_rec *src_rec) |
static void | ffs_op_end (struct m0_fdmi_src_rec *src_rec) |
M0_INTERNAL int | m0_fol_fdmi_src_init (void) |
M0_INTERNAL void | m0_fol_fdmi_src_fini (void) |
M0_INTERNAL int | m0_fol_fdmi_src_deinit (void) |
M0_INTERNAL void | m0_fol_fdmi_post_record (struct m0_fom *fom) |
M0_INTERNAL bool | m0_fol_fdmi__filter_kv_substring_match (struct m0_buf *value, const char **substrings) |
M0_INTERNAL int | m0_fol_fdmi_filter_kv_substring (struct m0_fdmi_eval_ctx *ctx, struct m0_conf_fdmi_filter *filter, struct m0_fdmi_eval_var_info *var_info) |
Variables | |
static struct ffs_fol_frag_handler | ffs_frag_handler_array [] |
Implementation notes.
FDMI needs an in-memory representation of a FOL record to operate on. The FOL source therefore increments the backend transaction reference counter (fom->fo_tx.be_tx, using m0_be_tx_get()) to make sure the transaction is not destroyed, and passes m0_fom::fo_tx to FDMI as a handle. The reference counter is decremented again once FDMI has completed its processing and all plugins have confirmed that they are done with the record.
FDMI reference increments and decrements are kept in a separate counter inside m0_be_tx. This helps prevent and debug cases where the number of FDMI decref calls does not match the number of incref calls. At first the transaction lock was used to protect modification of this counter, but that caused deadlocks, so the counter was switched to m0_atomic64. This is OK because inc and dec operations are never mixed; they are always "in line": N inc operations followed by N dec operations. There is therefore no chance of a race in which the counter drops to zero, the tx release operation is initiated, and then "someone" decides to increase the counter again.
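The counter discipline described above can be illustrated with a minimal sketch. It uses C11 `<stdatomic.h>` rather than Motr's m0_atomic64, and `struct demo_tx`, `demo_fdmi_ref_inc()` and `demo_fdmi_ref_dec()` are hypothetical names invented for this illustration; none of it is taken from fol_fdmi_src.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FDMI-specific counter kept inside m0_be_tx. */
struct demo_tx {
	atomic_int_fast64_t fdmi_ref; /* counts FDMI references only */
	bool                released; /* set when the last FDMI reference is dropped */
};

static void demo_tx_init(struct demo_tx *tx)
{
	atomic_init(&tx->fdmi_ref, 0);
	tx->released = false;
}

/* Takes one FDMI reference and returns the counter value after the increment. */
static int64_t demo_fdmi_ref_inc(struct demo_tx *tx)
{
	return (int64_t)atomic_fetch_add(&tx->fdmi_ref, 1) + 1;
}

/*
 * Drops one FDMI reference and returns the counter value after the decrement.
 * Because all increments happen before the first decrement, reaching zero
 * here means no later increment can arrive, so release can start safely.
 */
static int64_t demo_fdmi_ref_dec(struct demo_tx *tx)
{
	int64_t now = (int64_t)atomic_fetch_sub(&tx->fdmi_ref, 1) - 1;

	assert(now >= 0); /* a negative value means a dec without a matching inc */
	if (now == 0)
		tx->released = true; /* stands in for initiating the tx release */
	return now;
}
```

Keeping the FDMI count separate from any other reference count is what makes the `assert` useful: an unbalanced decrement is detected at the point where it happens instead of surfacing later as a premature release.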
This is the implementation of Phase 1, which does not need transaction support (that is, we don't need to re-send FDMI records, which would normally happen when, for example, a plugin crashes and re-requests FDMI records starting from X in the past). This assumption results in the following implementation approach.
FDMI FOL records may be persisted on the FDMI plugin side. In this case the FDMI plugin must handle duplicates, because the process with the FDMI source may restart before receiving the "message has been received" confirmation from the FDMI plugin. If the source of FDMI records is CAS, there might also be duplicates (FDMI FOL records about the same KV operation, but from different CASes due to N-way replication of KV pairs in DIX), and they must also be deduplicated.
One of the deduplication approaches is to have some kind of persistence on FDMI plugin side. Every record is looked up in this persistence and if there is a match then it's a duplicate.
The persistence cannot grow indefinitely, so there should be a way to prune it. One obvious option would be to prune records after some timeout, but delayed DTM recovery may make this timeout very high (days, weeks or more). Another approach is to prune the records once it is known for sure that they are not going to be resent.
The current implementation uses m0_fol_rec_header::rh_lsn to send an lsn with each FDMI FOL record to the FDMI plugin. This lsn (log sequence number) is a monotonically non-decreasing number that represents the position of the BE transaction in the BE log. A FOL record is stored along with the other BE tx data in the BE log and therefore has the same lsn as the corresponding BE tx. Several transactions may have the same lsn in the current implementation, but there is a limit on the number of transactions with the same lsn. m0_fol_rec_header::rh_lsn_discarded is an lsn such that no transaction with an lsn less than rh_lsn_discarded is ever going to be sent again from the same BE domain (in the configurations currently in use, this is equivalent to the Motr process that uses this BE domain). It means that every FDMI FOL record for which all rh_lsn values are less than the corresponding rh_lsn_discarded for their BE domain can be discarded from the deduplication persistence, because nothing in the cluster is going to send an FDMI FOL record about this operation again.
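As a rough illustration of lsn-based pruning on the plugin side, the sketch below keeps a small in-memory deduplication table. Everything in it is an assumption made for the example: the `demo_dedup` names, the fixed-size table, the `op_id` key that identifies an operation, and the simplification that each stored entry tracks a single BE domain (with N-way replication an entry would need one lsn per domain and would be pruned only when all of them fall below the corresponding rh_lsn_discarded).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_DEDUP_MAX = 64 };

/* One remembered FDMI FOL record (hypothetical plugin-side structure). */
struct demo_dedup_entry {
	uint64_t op_id;  /* hypothetical key identifying the operation */
	uint64_t domain; /* BE domain the record was received from */
	uint64_t lsn;    /* rh_lsn carried by the record */
	bool     used;
};

struct demo_dedup {
	struct demo_dedup_entry e[DEMO_DEDUP_MAX];
};

/* Returns true for a duplicate; otherwise remembers the record, returns false. */
static bool demo_dedup_check_and_add(struct demo_dedup *d, uint64_t op_id,
				     uint64_t domain, uint64_t lsn)
{
	size_t free_slot = DEMO_DEDUP_MAX;
	size_t i;

	for (i = 0; i < DEMO_DEDUP_MAX; ++i) {
		if (d->e[i].used && d->e[i].op_id == op_id)
			return true;
		if (!d->e[i].used && free_slot == DEMO_DEDUP_MAX)
			free_slot = i;
	}
	assert(free_slot < DEMO_DEDUP_MAX); /* sketch: no overflow handling */
	d->e[free_slot] = (struct demo_dedup_entry){
		.op_id = op_id, .domain = domain, .lsn = lsn, .used = true
	};
	return false;
}

/* Forgets entries of `domain` whose lsn is below lsn_discarded; returns count. */
static size_t demo_dedup_prune(struct demo_dedup *d, uint64_t domain,
			       uint64_t lsn_discarded)
{
	size_t pruned = 0;
	size_t i;

	for (i = 0; i < DEMO_DEDUP_MAX; ++i) {
		if (d->e[i].used && d->e[i].domain == domain &&
		    d->e[i].lsn < lsn_discarded) {
			d->e[i].used = false;
			++pruned;
		}
	}
	return pruned;
}
```

The comparison is strict (`lsn < lsn_discarded`), matching the "less than" wording above, so a record whose lsn equals the discarded lsn stays in the table.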
There are 2 major cases here:
The FDMI source may restart unexpectedly (crash/restart) or gracefully (graceful shutdown/startup). In either case there may be FDMI FOL records that don't have a consumption confirmation from the FDMI plugin. The current implementation holds a BE tx reference until such a confirmation arrives from the FDMI plugin. Holding the BE tx reference means that the BE tx is not discarded from the BE log until the reference is put. BE recovery recovers all transactions that were not discarded, which is very useful for the FDMI FOL record resend case: if all FOL records for such recovered transactions are resent as FDMI FOL records, then no FDMI FOL record will be missing on the FDMI plugin side, regardless of whether the FDMI source restarts, how it restarts, and how many times.
It leads to an obvious solution: just send all FOL records as FDMI FOL records during BE recovery.
There are several ways this task could be done.
For each FOL record a fom with the original fom type is created. A special flag is added to indicate that the fom was created during BE recovery. It allows the fom to skip its usual actions and close the BE tx immediately. Then, after the BE tx goes to the M0_BTS_LOGGED phase, a generic fom phase calls m0_fom_fdmi_record_post(), which sends the FDMI record to the FDMI plugin as usual.
A special FOM would be created for each recovered BE tx. The purpose of the fom would be to post an FDMI record with the FOL record for this BE tx and then wait until consumption of the FDMI record is acknowledged.
An FDMI FOL record for every recovered BE tx is posted as usual. The FDMI source dock fom would make a queue of all the records, send them, and wait for consumption acknowledgement as usual.
A special FOM phase is added to every fom that stores something in BE. If the fom is created during BE recovery, then the initial phase of the fom is this special phase. This allows each fom to handle BE recovery as it sees fit. The default generic phase sequence should also include this special fom phase, and by default it would post the FDMI FOL record in the same way as is currently done for the normal FOM phase sequence.
#define M0_FOL_FRAG_DATA_HANDLER_DECLARE(_opecode, _get_val_func)
Definition at line 210 of file fol_fdmi_src.c.
|
static |
Definition at line 243 of file fol_fdmi_src.c.
|
static |
Definition at line 232 of file fol_fdmi_src.c.
|
static |
No need to do anything on this event for FOL Source. Call to ffs_rec_get() done in m0_fol_fdmi_post_record below will make sure the data is already in memory and available for fast access at the moment of this call.
Definition at line 520 of file fol_fdmi_src.c.
|
static |
Definition at line 482 of file fol_fdmi_src.c.
|
static |
Definition at line 419 of file fol_fdmi_src.c.
|
static |
Definition at line 538 of file fol_fdmi_src.c.
|
static |
Definition at line 396 of file fol_fdmi_src.c.
|
static |
TODO: Q: (question to FOP/FOL owners) I could not find a better way to assert that this frag is of m0_fop_fol_frag_type, than to use this workaround (referencing internal _ops structure). Looks like they are ALWAYS of this type?... Now that there is NO indication of frag type whatsoever?...
Definition at line 351 of file fol_fdmi_src.c.
|
static |
Definition at line 408 of file fol_fdmi_src.c.
|
static |
Definition at line 314 of file fol_fdmi_src.c.
|
static |
Definition at line 332 of file fol_fdmi_src.c.
|
static |
Definition at line 285 of file fol_fdmi_src.c.
|
static |
Value = 0 means this call happened during record posting. Execution context is well-defined, all locks already acquired, no need to use AST.
Definition at line 254 of file fol_fdmi_src.c.
M0_INTERNAL bool m0_fol_fdmi__filter_kv_substring_match (struct m0_buf *value, const char **substrings)
Internal function used to match the strings. Exported for UTs.
Definition at line 710 of file fol_fdmi_src.c.
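The following is a minimal sketch of what such a matcher could look like; it is not the code from fol_fdmi_src.c. It assumes that `substrings` is a NULL-terminated array and that a value matches only when every listed substring occurs somewhere in it; both are assumptions made for this example. `struct demo_buf` is a hypothetical stand-in for struct m0_buf so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct m0_buf: an address plus a byte count. */
struct demo_buf {
	const void *addr;
	size_t      nob;
};

/* Returns true if `needle` occurs in the (not NUL-terminated) buffer. */
static bool demo_buf_contains(const struct demo_buf *value, const char *needle)
{
	size_t len = strlen(needle);
	size_t i;

	if (len > value->nob)
		return false;
	for (i = 0; i + len <= value->nob; ++i) {
		if (memcmp((const char *)value->addr + i, needle, len) == 0)
			return true;
	}
	return false;
}

/*
 * Assumed semantics: `substrings` is NULL-terminated and ALL entries must
 * be present in `value` for the match to succeed.
 */
static bool demo_kv_substring_match(const struct demo_buf *value,
				    const char **substrings)
{
	size_t i;

	for (i = 0; substrings[i] != NULL; ++i) {
		if (!demo_buf_contains(value, substrings[i]))
			return false;
	}
	return true;
}
```

The buffer is compared with `memcmp()` rather than `strstr()` because a value buffer with an explicit length is not guaranteed to be NUL-terminated.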
M0_INTERNAL int m0_fol_fdmi_filter_kv_substring (struct m0_fdmi_eval_ctx *ctx, struct m0_conf_fdmi_filter *filter, struct m0_fdmi_eval_var_info *var_info)
Implements the M0_FDMI_FILTER_TYPE_KV_SUBSTRING filter.
Definition at line 738 of file fol_fdmi_src.c.
M0_INTERNAL void m0_fol_fdmi_post_record (struct m0_fom *fom)
Submits a new FOL entry to FDMI.
There is no "unpost record" method, so everything that may fail has to be prepared before calling the post method.
NOTE: IMPORTANT! Do not call anything that may fail here! It is not possible to un-post the record; anything that may fail must be done before the M0_FDMI_SOURCE_POST_RECORD call above.
Definition at line 646 of file fol_fdmi_src.c.
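The "prepare everything fallible first, then post" ordering can be sketched in isolation as follows. This is an illustration only, not the body of m0_fol_fdmi_post_record(): `struct demo_rec` and the `demo_rec_*()` functions are hypothetical, and a zero `size` is treated as a failure purely to make the error path easy to exercise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record: one resource that may fail to allocate, one flag. */
struct demo_rec {
	void *payload;
	bool  posted;
};

/* Fallible step: acquires everything the record needs. */
static int demo_rec_prepare(struct demo_rec *rec, size_t size)
{
	rec->posted  = false;
	rec->payload = size > 0 ? malloc(size) : NULL;
	return rec->payload != NULL ? 0 : -ENOMEM;
}

/* Infallible step: stands in for the post call, which cannot be undone. */
static void demo_rec_post(struct demo_rec *rec)
{
	rec->posted = true;
}

/* All fallible work happens before the post; nothing after it can fail. */
static int demo_rec_submit(struct demo_rec *rec, size_t size)
{
	int rc = demo_rec_prepare(rec, size);

	if (rc != 0)
		return rc; /* nothing was posted, so nothing needs undoing */
	demo_rec_post(rec);
	return 0;
}
```

With this ordering a failure always leaves the record unposted, which is the only state that can be cleaned up when no "unpost" operation exists.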
M0_INTERNAL int m0_fol_fdmi_src_deinit (void)
Deinitializes the FOL FDMI source.
The deregister below does not call fs_put/fs_end, so m0_be_tx_put has to be called explicitly here for every transaction we have locked.
Note that t_fdmi_ref is not reset here; it is a flag indicating that the record has not yet been released by plugins.
Definition at line 601 of file fol_fdmi_src.c.
M0_INTERNAL void m0_fol_fdmi_src_fini (void)
Deinitializes the FOL FDMI source.
Same as m0_fol_fdmi_src_deinit, but suppresses the return code. Needed for the motr/init.c table.
Definition at line 594 of file fol_fdmi_src.c.
M0_INTERNAL int m0_fol_fdmi_src_init (void)
Initializes/registers the FOL FDMI source.
Definition at line 551 of file fol_fdmi_src.c.
M0_TL_DEFINE (ffs_tx, M0_INTERNAL, struct m0_be_tx)
M0_TL_DESCR_DEFINE (ffs_tx, "fdmi fol src tx list", M0_INTERNAL, struct m0_be_tx, t_fdmi_linkage, t_magic, M0_BE_TX_MAGIC, M0_BE_TX_ENGINE_MAGIC)
|
static |
Definition at line 214 of file fol_fdmi_src.c.