barry: Your go-to motif accountant
0.0-1
Full enumeration of sample space and fast count of sufficient statistics for binary arrays
Classes | |
class | NetworkData |
Data class for Networks. More... | |
class | Counter< Array_Type, Data_Type > |
A counter function based on change statistics. More... | |
Macros | |
#define | MAKE_DEFM_HASHER(hasher, a, cov) |
Data for the counters. More... | |
#define | MAKE_DUPL_VARS() |
#define | IS_EITHER() (DATA_AT == Geese::etype_either) |
#define | IS_DUPLICATION() ((DATA_AT == Geese::etype_duplication) & (DPL)) |
#define | IS_SPECIATION() ((DATA_AT == Geese::etype_speciation) & (!DPL)) |
#define | IF_MATCHES() |
#define | IF_NOTMATCHES() |
#define | PHYLO_RULE_LAMBDA(a) |
Extension of a simple counter. More... | |
#define | PHYLO_COUNTER_LAMBDA(a) |
#define | PHYLO_RULE_DYN_LAMBDA(a) |
#define | PHYLO_CHECK_MISSING() |
Functions | |
std::string | get_last_name (size_t d) |
void | counter_overall_gains (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Overall functional gains. More... | |
void | counter_gains (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |
Functional gains for a specific function (nfun ). More... | |
void | counter_gains_k_offspring (PhyloCounters *counters, std::vector< size_t > nfun, size_t k=1u, size_t duplication=Geese::etype_default) |
k genes gain function nfun More... | |
void | counter_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Keeps track of how many genes are changing (0, 1, or 2 when dealing with regular trees). More... | |
void | counter_preserve_pseudogene (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Keeps track of how many pairs of genes preserve pseudostate. More... | |
void | counter_prop_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Keeps track of how many genes are changing (0, 1, or 2 when dealing with regular trees). More... | |
void | counter_overall_loss (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Overall functional loss. More... | |
void | counter_maxfuns (PhyloCounters *counters, size_t lb, size_t ub, size_t duplication=Geese::etype_default) |
Cap the number of functions per gene. More... | |
void | counter_loss (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |
Total count of losses for a specific function. More... | |
void | counter_overall_changes (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Total number of changes. Use this statistic to account for "preservation". More... | |
void | counter_subfun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Total count of Sub-functionalization events. More... | |
void | counter_cogain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Co-evolution (joint gain or loss) More... | |
void | counter_longest (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Longest branch mutates (either by gain or by loss) More... | |
void | counter_neofun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Total number of neofunctionalization events. More... | |
void | counter_pairwise_neofun_singlefun (PhyloCounters *counters, size_t nfunA, size_t duplication=Geese::etype_default) |
Total number of neofunctionalization events, \(\sum_u \sum_{w < u} [x(u,a)(1 - x(w,a)) + (1 - x(u,a))x(w,a)]\); change statistic: \(\delta\{x(u,a): 0 \to 1\} = 1 - 2x(w,a)\). More... | |
void | counter_neofun_a2b (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Total number of neofunctionalization events. More... | |
void | counter_co_opt (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Function co-opting. More... | |
void | counter_k_genes_changing (PhyloCounters *counters, size_t k, size_t duplication=Geese::etype_default) |
Indicator function. Equal to one if \(k\) genes changed and zero otherwise. More... | |
void | counter_less_than_p_prop_genes_changing (PhyloCounters *counters, double p, size_t duplication=Geese::etype_default) |
Indicator function. Equal to one if \(k\) genes changed and zero otherwise. More... | |
void | counter_gains_from_0 (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |
Used when all the functions are in 0 (like the root node prob.) More... | |
void | counter_overall_gains_from_0 (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Used when all the functions are in 0 (like the root node prob.) More... | |
void | counter_pairwise_overall_change (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
Used when all the functions are in 0 (like the root node prob.) More... | |
void | counter_pairwise_preserving (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Used when all the functions are in 0 (like the root node prob.) More... | |
void | counter_pairwise_first_gain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |
Used when all the functions are in 0 (like the root node prob.) More... | |
barry includes a flexible way to generate counters based on change statistics. Since most of the time we are counting many motifs in a graph, change statistics are a reasonable (and efficient) way to make such counts.
In particular, let the motif be defined as \(s(y)\), with \(y\) as the binary array. The change statistic when adding cell \(y_{ij}\), i.e. when the cell moves from being empty to holding a one, is defined as
\[ \delta(y_{ij}) = s^+_{ij}(y) - s^-_{ij}(y), \]
where \(s^+_{ij}(y)\) and \(s^-_{ij}(y)\) represent the motif statistic with and without the ij-cell. For example, in the case of networks, the change statistic for the number of edges is always 1.
To count statistics in an array, the [Counter] class will empty the array, initialize the counters, and then start counting while adding at each step a single cell, until matching the original array.
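The procedure above can be sketched in a few lines of C++. This is only an illustration of counting by change statistics, not barry's actual Counter implementation: the names BinaryArray, delta_edges, delta_mutual, and count_by_change_stats are hypothetical, and the array is assumed to be a square adjacency matrix stored as nested vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense representation of a square binary array.
using BinaryArray = std::vector<std::vector<int>>;

// Change statistic for the edge count: adding any cell adds exactly one edge.
inline double delta_edges(const BinaryArray &, std::size_t, std::size_t) {
    return 1.0;
}

// Change statistic for mutual ties: adding (i, j) completes a mutual pair
// only when (j, i) is already present in the partially filled array.
inline double delta_mutual(const BinaryArray & y, std::size_t i, std::size_t j) {
    return (i != j && y[j][i] == 1) ? 1.0 : 0.0;
}

// Start from an empty array, add the cells of `target` one at a time, and
// accumulate the change statistic at each step.
template <typename Delta>
double count_by_change_stats(const BinaryArray & target, Delta delta) {
    const std::size_t n = target.size();
    BinaryArray y(n, std::vector<int>(n, 0)); // the emptied array
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(target[i].size() == n); // square array assumed
        for (std::size_t j = 0; j < n; ++j) {
            if (target[i][j] == 1) {
                total += delta(y, i, j); // evaluated before the cell is added
                y[i][j] = 1;
            }
        }
    }
    return total;
}
```

Because each delta only inspects cells near (i, j), the total cost grows with the number of non-empty cells rather than with a full recount of the motif after every insertion.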
#define IF_MATCHES | ( | ) |
Definition at line 20 of file counters.hpp.
#define IF_NOTMATCHES | ( | ) |
Definition at line 22 of file counters.hpp.
#define IS_DUPLICATION | ( | ) | ((DATA_AT == Geese::etype_duplication) & (DPL)) |
Definition at line 17 of file counters.hpp.
#define IS_EITHER | ( | ) | (DATA_AT == Geese::etype_either) |
Definition at line 16 of file counters.hpp.
#define IS_SPECIATION | ( | ) | ((DATA_AT == Geese::etype_speciation) & (!DPL)) |
Definition at line 18 of file counters.hpp.
#define MAKE_DEFM_HASHER | ( | hasher, | |
a, | |||
cov | |||
) |
Data for the counters.
Details on the available counters for DEFMworkData can be found in the Network counters section.
This class is used to store the data for the counters. It is used by the Counters class.
Definition at line 27 of file counters.hpp.
#define MAKE_DUPL_VARS | ( | ) |
Details about the available counters for PhyloArray objects can be found in the Phylo counters section.
Definition at line 12 of file counters.hpp.
#define PHYLO_CHECK_MISSING | ( | ) |
Definition at line 45 of file counters.hpp.
#define PHYLO_COUNTER_LAMBDA | ( | a | ) |
Definition at line 39 of file counters.hpp.
#define PHYLO_RULE_DYN_LAMBDA | ( | a | ) |
Definition at line 42 of file counters.hpp.
#define PHYLO_RULE_LAMBDA | ( | a | ) |
Extension of a simple counter.
It allows specifying extra arguments, in particular, the corresponding sets of rows to which this statistic may be relevant. This could be important in the case of, for example, counting correlation type statistics between function 1 and 2, and between function 1 and 3.
Definition at line 36 of file counters.hpp.
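void counter_co_opt (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |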
inline |
Function co-opting.
Function co-opting of functions A and B happens when, for example, function B is gained as a new feature that leverages what function A already does, without losing function A. The sufficient statistic is defined as follows:
\[ x_{pa}(1 - x_{pb})\sum_{i<j}\left[x_{ia}^p(1 - x_{ib}^p)x_{ja}^px_{jb}^p + x_{ja}^p(1 - x_{jb}^p)x_{ia}^px_{ib}^p\right] \]
This algorithm implements the change statistic.
Definition at line 1299 of file counters.hpp.
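To make the formula concrete, here is a minimal sketch that evaluates the sufficient statistic directly rather than through its change statistic. It is not the code in counters.hpp. It assumes that \(x_{pa}\) is the parent's state for function a, that the superscript-p terms are the states of that parent's offspring, and that states are stored as 0/1 integers; the names States and co_opt_statistic are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one row per offspring gene, one column per function.
using States = std::vector<std::vector<int>>;

// Direct evaluation of the co-opting sufficient statistic for functions a, b.
int co_opt_statistic(
    const std::vector<int> & parent,
    const States & offspring,
    std::size_t a,
    std::size_t b
) {
    assert(a < parent.size() && b < parent.size());
    for (const auto & gene : offspring)
        assert(gene.size() == parent.size());

    // Leading factor x_{pa}(1 - x_{pb}): the parent has a but not b.
    if (parent[a] != 1 || parent[b] != 0)
        return 0;

    int total = 0;
    for (std::size_t j = 1; j < offspring.size(); ++j) {
        for (std::size_t i = 0; i < j; ++i) {
            const auto & xi = offspring[i];
            const auto & xj = offspring[j];
            // One sibling keeps only a while the other holds both a and b.
            total += xi[a] * (1 - xi[b]) * xj[a] * xj[b]
                   + xj[a] * (1 - xj[b]) * xi[a] * xi[b];
        }
    }
    return total;
}
```

With two offspring the sum has a single pair, so the statistic is either 0 or 1.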
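void counter_cogain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |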
inline |
Co-evolution (joint gain or loss)
Requires specifying a pair of functions (nfunA, nfunB).
Definition at line 794 of file counters.hpp.
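void counter_gains (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |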
inline |
Functional gains for a specific function (nfun).
Definition at line 99 of file counters.hpp.
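void counter_gains_from_0 (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |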
inline |
Used when all the functions are in 0 (like the root node prob.)
Needs to specify function a.
Definition at line 1633 of file counters.hpp.
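void counter_gains_k_offspring (PhyloCounters *counters, std::vector< size_t > nfun, size_t k=1u, size_t duplication=Geese::etype_default) |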
inline |
k genes gain function nfun
Definition at line 159 of file counters.hpp.
void counter_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
inline |
Keeps track of how many genes are changing (0, 1, or 2 when dealing with regular trees).
Definition at line 231 of file counters.hpp.
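void counter_k_genes_changing (PhyloCounters *counters, size_t k, size_t duplication=Geese::etype_default) |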
inline |
Indicator function. Equal to one if \(k\) genes changed and zero otherwise.
Definition at line 1397 of file counters.hpp.
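The indicator can be illustrated with a short sketch. It assumes that a gene counts as "changed" when its state differs from the parent's in at least one function; that reading is an assumption, not something confirmed by the text above, and the names States, genes_changing, and k_genes_changing are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one row per offspring gene, one column per function.
using States = std::vector<std::vector<int>>;

// Number of offspring whose state differs from the parent's in any function.
std::size_t genes_changing(
    const std::vector<int> & parent,
    const States & offspring
) {
    std::size_t changed = 0;
    for (const auto & gene : offspring) {
        assert(gene.size() == parent.size());
        if (gene != parent)
            ++changed;
    }
    return changed;
}

// Indicator: one if exactly k genes changed, zero otherwise.
int k_genes_changing(
    const std::vector<int> & parent,
    const States & offspring,
    std::size_t k
) {
    return genes_changing(parent, offspring) == k ? 1 : 0;
}
```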
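void counter_less_than_p_prop_genes_changing (PhyloCounters *counters, double p, size_t duplication=Geese::etype_default) |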
inline |
Indicator function. Equal to one if \(k\) genes changed and zero otherwise.
Definition at line 1517 of file counters.hpp.
void counter_longest (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
inline |
Longest branch mutates (either by gain or by loss)
Definition at line 851 of file counters.hpp.
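void counter_loss (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default) |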
inline |
Total count of losses for a specific function.
Definition at line 594 of file counters.hpp.
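void counter_maxfuns (PhyloCounters *counters, size_t lb, size_t ub, size_t duplication=Geese::etype_default) |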
inline |
Cap the number of functions per gene.
Definition at line 532 of file counters.hpp.
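void counter_neofun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |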
inline |
Total number of neofunctionalization events.
Requires specifying a pair of functions.
Definition at line 1021 of file counters.hpp.
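void counter_neofun_a2b (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |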
inline |
Total number of neofunctionalization events.
Requires specifying a pair of functions.
Definition at line 1166 of file counters.hpp.
void counter_overall_changes (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
inline |
Total number of changes. Use this statistic to account for "preservation".
Definition at line 646 of file counters.hpp.
void counter_overall_gains (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
inline |
Overall functional gains.
Total number of gains (irrespective of the function).
Definition at line 61 of file counters.hpp.
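As a rough illustration of what "total number of gains" means, the sketch below counts every (gene, function) pair in which the parent lacks a function that the offspring has. It computes the statistic directly instead of through change statistics, and it assumes 0/1 states; the names States and overall_gains are hypothetical and do not come from counters.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one row per offspring gene, one column per function.
using States = std::vector<std::vector<int>>;

// Count gains over all offspring and all functions: parent 0, offspring 1.
std::size_t overall_gains(
    const std::vector<int> & parent,
    const States & offspring
) {
    std::size_t gains = 0;
    for (const auto & gene : offspring) {
        assert(gene.size() == parent.size());
        for (std::size_t f = 0; f < parent.size(); ++f) {
            if (parent[f] == 0 && gene[f] == 1)
                ++gains; // losses (1 -> 0) are deliberately ignored
        }
    }
    return gains;
}
```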
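void counter_overall_gains_from_0 (PhyloCounters *counters, size_t duplication=Geese::etype_default) |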
inline |
Used when all the functions are in 0 (like the root node prob.)
Needs to specify function a.
Definition at line 1699 of file counters.hpp.
void counter_overall_loss (PhyloCounters *counters, size_t duplication=Geese::etype_default) |
inline |
Overall functional loss.
Definition at line 484 of file counters.hpp.
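void counter_pairwise_first_gain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |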
inline |
Used when all the functions are in 0 (like the root node prob.)
Needs to specify function a.
\[ \sum x(a)^3(1-x(b))^3 + x(b)^3(1-x(a))^3 + x(a)^3 x(b)^3 + (1 - x(a))^3(1-x(b))^3 \]
Definition at line 1951 of file counters.hpp.
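void counter_pairwise_neofun_singlefun (PhyloCounters *counters, size_t nfunA, size_t duplication=Geese::etype_default) |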
inline |
Total number of neofunctionalization events:
\[ \sum_u \sum_{w < u} \left[x(u,a)(1 - x(w,a)) + (1 - x(u,a))x(w,a)\right] \]
Change statistic: \(\delta\{x(u,a): 0 \to 1\} = 1 - 2x(w,a)\)
Definition at line 1102 of file counters.hpp.
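The relation between this statistic and its change statistic can be checked numerically. The sketch below is not taken from counters.hpp: pairwise_differences and delta_gain are hypothetical names, x[u] stands for \(x(u,a)\), and for more than two offspring the change statistic is taken to be the sum of \(1 - 2x(w,a)\) over all siblings \(w \neq u\), which is an assumption that reduces to the stated expression when there are exactly two offspring.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sufficient statistic: number of sibling pairs that differ in function a.
int pairwise_differences(const std::vector<int> & x) {
    int total = 0;
    for (std::size_t u = 1; u < x.size(); ++u)
        for (std::size_t w = 0; w < u; ++w)
            total += x[u] * (1 - x[w]) + (1 - x[u]) * x[w];
    return total;
}

// Change statistic for switching x[u] from 0 to 1: each sibling w
// contributes 1 - 2 * x[w].
int delta_gain(const std::vector<int> & x, std::size_t u) {
    assert(u < x.size() && x[u] == 0);
    int delta = 0;
    for (std::size_t w = 0; w < x.size(); ++w)
        if (w != u)
            delta += 1 - 2 * x[w];
    return delta;
}
```

Adding delta_gain(x, u) to the statistic computed before the switch gives the statistic after the switch, which is what lets a counter update incrementally instead of recounting all pairs.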
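void counter_pairwise_overall_change (PhyloCounters *counters, size_t duplication=Geese::etype_default) |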
inline |
Used when all the functions are in 0 (like the root node prob.)
Needs to specify function a.
Definition at line 1747 of file counters.hpp.
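void counter_pairwise_preserving (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |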
inline |
Used when all the functions are in 0 (like the root node prob.)
Needs to specify function a.
\[ \sum x(a)^3(1-x(b))^3 + x(b)^3(1-x(a))^3 + x(a)^3 x(b)^3 + (1 - x(a))^3(1-x(b))^3 \]
Definition at line 1812 of file counters.hpp.
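void counter_preserve_pseudogene (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |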
inline |
Keeps track of how many pairs of genes preserve pseudostate.
Definition at line 300 of file counters.hpp.
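void counter_prop_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default) |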
inline |
Keeps track of how many genes are changing (0, 1, or 2 when dealing with regular trees).
Definition at line 382 of file counters.hpp.
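void counter_subfun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default) |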
inline |
Total count of Sub-functionalization events.
It requires specifying data = {funA, funB}.
Definition at line 705 of file counters.hpp.
std::string get_last_name (size_t d) |
inline |
Definition at line 48 of file counters.hpp.