barry: Your go-to motif accountant  0.0-1
Full enumeration of sample space and fast count of sufficient statistics for binary arrays
Counting

Classes

class  NetworkData
 Data class for Networks. More...
 
class  Counter< Array_Type, Data_Type >
 A counter function based on change statistics. More...
 

Macros

#define MAKE_DEFM_HASHER(hasher, a, cov)
 Data for the counters. More...
 
#define MAKE_DUPL_VARS()
 
#define IS_EITHER()   (DATA_AT == Geese::etype_either)
 
#define IS_DUPLICATION()   ((DATA_AT == Geese::etype_duplication) & (DPL))
 
#define IS_SPECIATION()   ((DATA_AT == Geese::etype_speciation) & (!DPL))
 
#define IF_MATCHES()
 
#define IF_NOTMATCHES()
 
#define PHYLO_RULE_LAMBDA(a)
 Extension of a simple counter. More...
 
#define PHYLO_COUNTER_LAMBDA(a)
 
#define PHYLO_RULE_DYN_LAMBDA(a)
 
#define PHYLO_CHECK_MISSING()
 
std::string get_last_name (size_t d)
 
void counter_overall_gains (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Overall functional gains. More...
 
void counter_gains (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default)
 Functional gains for a specific function (nfun). More...
 
void counter_gains_k_offspring (PhyloCounters *counters, std::vector< size_t > nfun, size_t k=1u, size_t duplication=Geese::etype_default)
 k genes gain function nfun More...
 
void counter_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Keeps track of how many genes are changing (either 0, 1, or 2 if dealing with regular trees.) More...
 
void counter_preserve_pseudogene (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Keeps track of how many pairs of genes preserve pseudostate. More...
 
void counter_prop_genes_changing (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Keeps track of how many genes are changing (either 0, 1, or 2 if dealing with regular trees.) More...
 
void counter_overall_loss (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Overall functional loss. More...
 
void counter_maxfuns (PhyloCounters *counters, size_t lb, size_t ub, size_t duplication=Geese::etype_default)
 Cap the number of functions per gene. More...
 
void counter_loss (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default)
 Total count of losses for a specific function. More...
 
void counter_overall_changes (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Total number of changes. Use this statistic to account for "preservation". More...
 
void counter_subfun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Total count of Sub-functionalization events. More...
 
void counter_cogain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Co-evolution (joint gain or loss) More...
 
void counter_longest (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Longest branch mutates (either by gain or by loss) More...
 
void counter_neofun (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Total number of neofunctionalization events. More...
 
void counter_pairwise_neofun_singlefun (PhyloCounters *counters, size_t nfunA, size_t duplication=Geese::etype_default)
 Total number of neofunctionalization events, \(\sum_u \sum_{w < u} \left[x_{ua}(1 - x_{wa}) + (1 - x_{ua})x_{wa}\right]\), with change statistic \(\delta(x_{ua}: 0 \to 1) = 1 - 2x_{wa}\). More...
 
void counter_neofun_a2b (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Total number of neofunctionalization events. More...
 
void counter_co_opt (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Function co-opting. More...
 
void counter_k_genes_changing (PhyloCounters *counters, size_t k, size_t duplication=Geese::etype_default)
 Indicator function. Equals one if \(k\) genes changed and zero otherwise. More...
 
void counter_less_than_p_prop_genes_changing (PhyloCounters *counters, double p, size_t duplication=Geese::etype_default)
 Indicator function. Equals one if less than a proportion \(p\) of the genes changed and zero otherwise. More...
 
void counter_gains_from_0 (PhyloCounters *counters, std::vector< size_t > nfun, size_t duplication=Geese::etype_default)
 Used when all the functions are in 0 (like the root node prob.) More...
 
void counter_overall_gains_from_0 (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Used when all the functions are in 0 (like the root node prob.) More...
 
void counter_pairwise_overall_change (PhyloCounters *counters, size_t duplication=Geese::etype_default)
 Used when all the functions are in 0 (like the root node prob.) More...
 
void counter_pairwise_preserving (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Used when all the functions are in 0 (like the root node prob.) More...
 
void counter_pairwise_first_gain (PhyloCounters *counters, size_t nfunA, size_t nfunB, size_t duplication=Geese::etype_default)
 Used when all the functions are in 0 (like the root node prob.) More...
 

Detailed Description

barry includes a flexible way to generate counters based on change statistics. Since most of the time we are counting many motifs in a graph, change statistics are a reasonable (and efficient) way to compute such counts.

In particular, let the motif be defined as \(s(y)\), with \(y\) as the binary array. The change statistic when adding cell \(y_{ij}\), i.e. when the cell moves from being empty to holding a one, is defined as

\[ \delta(y_{ij}) = s^+_{ij}(y) - s^-_{ij}(y), \]

where \(s^+_{ij}(y)\) and \(s^-_{ij}(y)\) represent the motif statistic with and without the ij-cell. For example, in the case of networks, the change statistic for the number of edges is always 1.

To count statistics in an array, the Counter class empties the array, initializes the counters, and then re-adds the original cells one at a time, updating the counts with the change statistic at each step until the array matches the original.
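The counting procedure described above can be sketched without barry itself. The following is a minimal illustration, assuming a square dense binary array stored as nested vectors; `BinaryArray`, `delta_edges`, `delta_mutual`, and `count_by_change_stat` are hypothetical names chosen for this example and are not part of barry's API.

```cpp
#include <cstddef>
#include <vector>

// A square dense binary array (illustrative stand-in, not barry's array class).
using BinaryArray = std::vector<std::vector<int>>;

// Change statistic for the edge count: adding any cell adds exactly one edge.
inline double delta_edges(const BinaryArray &, std::size_t, std::size_t) {
    return 1.0;
}

// Change statistic for mutual ties: adding (i, j) completes a mutual pair
// only if (j, i) is already present.
inline double delta_mutual(const BinaryArray & y, std::size_t i, std::size_t j) {
    return (i != j && y[j][i] == 1) ? 1.0 : 0.0;
}

// Counts a statistic by starting from an empty array and re-adding the cells
// of `target` one at a time, accumulating the change statistic at each step.
// The statistic of the empty array is taken to be zero.
template <typename Delta>
double count_by_change_stat(const BinaryArray & target, Delta delta) {
    const std::size_t n = target.size();
    BinaryArray y(n, std::vector<int>(n, 0));
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (target[i][j] == 1) {
                y[i][j] = 1;              // add the cell
                total += delta(y, i, j);  // accumulate its change statistic
            }
        }
    }
    return total;
}
```

Calling `count_by_change_stat(y, delta_edges)` returns the number of ones in `y`, in line with the remark that the change statistic for the number of edges is always 1.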

Macro Definition Documentation

◆ IF_MATCHES

#define IF_MATCHES ( )
Value: expands in terms of MAKE_DUPL_VARS() (counters.hpp:12), IS_EITHER() (counters.hpp:16), IS_DUPLICATION() (counters.hpp:17), and IS_SPECIATION() (counters.hpp:18).

Definition at line 20 of file counters.hpp.

◆ IF_NOTMATCHES

#define IF_NOTMATCHES ( )
Value:

Definition at line 22 of file counters.hpp.

◆ IS_DUPLICATION

#define IS_DUPLICATION ( )    ((DATA_AT == Geese::etype_duplication) & (DPL))

Definition at line 17 of file counters.hpp.

◆ IS_EITHER

#define IS_EITHER ( )    (DATA_AT == Geese::etype_either)

Definition at line 16 of file counters.hpp.

◆ IS_SPECIATION

#define IS_SPECIATION ( )    ((DATA_AT == Geese::etype_speciation) & (!DPL))

Definition at line 18 of file counters.hpp.
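As a reading aid, the three event-type predicates can be written as plain functions. This is a sketch under stated assumptions: the constants in `sketch` are hypothetical stand-ins for the `Geese::etype_*` values (their real values are not shown on this page), `data_at` mirrors `DATA_AT`, `dpl` mirrors `DPL`, and the `matches` helper that ORs the predicates together is an assumed combination, not a documented macro body.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical stand-ins for Geese::etype_*; real values are not shown here.
constexpr std::size_t etype_speciation  = 0u;
constexpr std::size_t etype_duplication = 1u;
constexpr std::size_t etype_either      = 2u;

// Mirrors IS_EITHER(): the counter applies regardless of the node's event.
inline bool is_either(std::size_t data_at) {
    return data_at == etype_either;
}

// Mirrors IS_DUPLICATION(): requested type is duplication and the node is one.
inline bool is_duplication(std::size_t data_at, bool dpl) {
    return (data_at == etype_duplication) && dpl;
}

// Mirrors IS_SPECIATION(): requested type is speciation and the node is not
// a duplication.
inline bool is_speciation(std::size_t data_at, bool dpl) {
    return (data_at == etype_speciation) && !dpl;
}

// Assumed combination: true when any of the three predicates holds.
inline bool matches(std::size_t data_at, bool dpl) {
    return is_either(data_at) || is_duplication(data_at, dpl) ||
           is_speciation(data_at, dpl);
}

} // namespace sketch
```

The macros use bitwise `&` on boolean operands; the sketch uses `&&`, which yields the same truth value for `bool` inputs.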

◆ MAKE_DEFM_HASHER

#define MAKE_DEFM_HASHER (   hasher,
  a,
  cov 
)
Value:
barry::Hasher_fun_type<DEFMArray, DEFMCounterData> \
hasher = [cov](const DEFMArray & array, DEFMCounterData * d) -> \
    std::vector< double > { \
        std::vector< double > res; \
        /* Adding the column feature */ \
        for (size_t i = 0u; i < array.nrow(); ++i) \
            res.push_back(array.D()(i, cov)); \
        /* Adding the fixed dims */ \
        for (size_t i = 0u; i < (array.nrow() - 1); ++i) \
            for (size_t j = 0u; j < array.ncol(); ++j) \
                res.push_back(array(i, j)); \
        return res;\
    };

Data for the counters.

Details on the available counters for DEFMworkData can be found in the Network counters section.

This class is used to store the data for the counters. It is used by the Counters class.

Definition at line 27 of file counters.hpp.
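To make the shape of the hasher's output concrete, the sketch below reproduces the two loops of the macro body on plain nested vectors. It assumes a `cells` matrix standing in for the array and a `covariates` matrix standing in for `array.D()`; both, along with the name `defm_hash_sketch`, are hypothetical and not barry types.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for both the array and its covariate data.
using Matrix = std::vector<std::vector<double>>;

// Builds the hash vector: column `cov` of the covariates (one value per row),
// followed by every cell of the array except those in the last row.
inline std::vector<double> defm_hash_sketch(
    const Matrix & cells, const Matrix & covariates, std::size_t cov
) {
    std::vector<double> res;
    const std::size_t nrow = cells.size();

    // Adding the column feature
    for (std::size_t i = 0; i < nrow; ++i)
        res.push_back(covariates[i][cov]);

    // Adding the fixed dims; `i + 1 < nrow` avoids unsigned underflow when
    // nrow is zero (the macro itself writes `array.nrow() - 1`).
    for (std::size_t i = 0; i + 1 < nrow; ++i)
        for (std::size_t j = 0; j < cells[i].size(); ++j)
            res.push_back(cells[i][j]);

    return res;
}
```

For an array with `nrow` rows and `ncol` columns, the resulting vector has `nrow + (nrow - 1) * ncol` entries.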

◆ MAKE_DUPL_VARS

#define MAKE_DUPL_VARS ( )
Value:
bool DPL = Array.D_ptr()->duplication; \
size_t DATA_AT = data[0u];

Details about the available counters for PhyloArray objects can be found in the Phylo counters section.

Definition at line 12 of file counters.hpp.

◆ PHYLO_CHECK_MISSING

#define PHYLO_CHECK_MISSING ( )
Value:
if (Array.D_ptr() == nullptr) \
throw std::logic_error("The array data is nullptr."); \

Definition at line 45 of file counters.hpp.

◆ PHYLO_COUNTER_LAMBDA

#define PHYLO_COUNTER_LAMBDA (   a)
Value:
barry::Counter_fun_type<PhyloArray, PhyloCounterData> a = \
[](const PhyloArray & Array, size_t i, size_t j, PhyloCounterData & data)

Definition at line 39 of file counters.hpp.

◆ PHYLO_RULE_DYN_LAMBDA

#define PHYLO_RULE_DYN_LAMBDA (   a)
Value:
barry::Rule_fun_type<PhyloArray, PhyloRuleDynData> a = \
[](const PhyloArray & Array, size_t i, size_t j, PhyloRuleDynData & data)

Definition at line 42 of file counters.hpp.

◆ PHYLO_RULE_LAMBDA

#define PHYLO_RULE_LAMBDA (   a)
Value:
barry::Rule_fun_type<PhyloArray, PhyloRuleData> a = \
[](const PhyloArray & Array, size_t i, size_t j, PhyloRuleData & data)

Extension of a simple counter.

It allows specifying extra arguments, in particular the sets of rows to which this statistic is relevant. This matters, for example, when counting correlation-type statistics between functions 1 and 2, and between functions 1 and 3.

Definition at line 36 of file counters.hpp.

Function Documentation

◆ counter_co_opt()

void counter_co_opt ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Function co-opting.

Function co-opting of functions A and B happens when, for example, function B is gained as a new feature that leverages what function A already does, without losing function A. The sufficient statistic is defined as follows:

\[ x_{pa}(1 - x_{pb})\sum_{i<j}\left[x_{ia}^p(1 - x_{ib}^p)x_{ja}^px_{jb}^p + x_{ja}^p(1 - x_{jb}^p)x_{ia}^px_{ib}^p\right] \]

This algorithm implements the change statistic.

Definition at line 1299 of file counters.hpp.
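The sufficient statistic above can also be evaluated directly, which is useful for checking a change-statistic implementation on small cases. The sketch below is a direct transcription of the formula, assuming the parent and offspring states are stored as 0/1 vectors indexed by function; `co_opt_stat` and its argument layout are hypothetical and do not reflect how barry stores a PhyloArray.

```cpp
#include <cstddef>
#include <vector>

// parent[f]       : 1 if the parent has function f, 0 otherwise.
// offspring[i][f] : 1 if offspring i has function f, 0 otherwise.
// Returns x_pa (1 - x_pb) * sum_{i<j} [ x_ia (1 - x_ib) x_ja x_jb
//                                     + x_ja (1 - x_jb) x_ia x_ib ].
inline int co_opt_stat(
    const std::vector<int> & parent,
    const std::vector<std::vector<int>> & offspring,
    std::size_t a, std::size_t b
) {
    // The parent must have function a and lack function b.
    const int parent_term = parent[a] * (1 - parent[b]);
    if (parent_term == 0)
        return 0;

    int total = 0;
    for (std::size_t j = 0; j < offspring.size(); ++j) {
        for (std::size_t i = 0; i < j; ++i) {
            const std::vector<int> & xi = offspring[i];
            const std::vector<int> & xj = offspring[j];
            // One sibling keeps only a; the other keeps a and gains b.
            total += xi[a] * (1 - xi[b]) * xj[a] * xj[b]
                   + xj[a] * (1 - xj[b]) * xi[a] * xi[b];
        }
    }
    return parent_term * total;
}
```

With a parent holding only function A and two offspring where one holds A alone and the other holds both A and B, the statistic equals 1.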

◆ counter_cogain()

void counter_cogain ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Co-evolution (joint gain or loss)

Needs to specify pairs of functions (nfunA, nfunB).

Definition at line 794 of file counters.hpp.

◆ counter_gains()

void counter_gains ( PhyloCounters * counters,
std::vector< size_t >  nfun,
size_t  duplication = Geese::etype_default 
)
inline

Functional gains for a specific function (nfun).

Definition at line 99 of file counters.hpp.

◆ counter_gains_from_0()

void counter_gains_from_0 ( PhyloCounters * counters,
std::vector< size_t >  nfun,
size_t  duplication = Geese::etype_default 
)
inline

Used when all the functions are in 0 (like the root node prob.)

Needs to specify function a.

Definition at line 1633 of file counters.hpp.

◆ counter_gains_k_offspring()

void counter_gains_k_offspring ( PhyloCounters * counters,
std::vector< size_t >  nfun,
size_t  k = 1u,
size_t  duplication = Geese::etype_default 
)
inline

k genes gain function nfun

Definition at line 159 of file counters.hpp.

◆ counter_genes_changing()

void counter_genes_changing ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Keeps track of how many genes are changing (either 0, 1, or 2 if dealing with regular trees.)

Definition at line 231 of file counters.hpp.

◆ counter_k_genes_changing()

void counter_k_genes_changing ( PhyloCounters * counters,
size_t  k,
size_t  duplication = Geese::etype_default 
)
inline

Indicator function. Equals one if \(k\) genes changed and zero otherwise.

Definition at line 1397 of file counters.hpp.

◆ counter_less_than_p_prop_genes_changing()

void counter_less_than_p_prop_genes_changing ( PhyloCounters * counters,
double  p,
size_t  duplication = Geese::etype_default 
)
inline

Indicator function. Equals one if less than a proportion \(p\) of the genes changed and zero otherwise.

How many genes diverge from the parent.

Definition at line 1517 of file counters.hpp.

◆ counter_longest()

void counter_longest ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Longest branch mutates (either by gain or by loss)

Definition at line 851 of file counters.hpp.

◆ counter_loss()

void counter_loss ( PhyloCounters * counters,
std::vector< size_t >  nfun,
size_t  duplication = Geese::etype_default 
)
inline

Total count of losses for a specific function.

Definition at line 594 of file counters.hpp.

◆ counter_maxfuns()

void counter_maxfuns ( PhyloCounters * counters,
size_t  lb,
size_t  ub,
size_t  duplication = Geese::etype_default 
)
inline

Cap the number of functions per gene.

Definition at line 532 of file counters.hpp.

◆ counter_neofun()

void counter_neofun ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Total number of neofunctionalization events.

Requires specifying a pair of functions (nfunA, nfunB).

Definition at line 1021 of file counters.hpp.

◆ counter_neofun_a2b()

void counter_neofun_a2b ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Total number of neofunctionalization events.

Requires specifying a pair of functions (nfunA, nfunB).

Definition at line 1166 of file counters.hpp.

◆ counter_overall_changes()

void counter_overall_changes ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Total number of changes. Use this statistic to account for "preservation".

Definition at line 646 of file counters.hpp.

◆ counter_overall_gains()

void counter_overall_gains ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Overall functional gains.

Total number of gains (irrespective of the function).

Definition at line 61 of file counters.hpp.

◆ counter_overall_gains_from_0()

void counter_overall_gains_from_0 ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Used when all the functions are in 0 (like the root node prob.)

Needs to specify function a.

Definition at line 1699 of file counters.hpp.

◆ counter_overall_loss()

void counter_overall_loss ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Overall functional loss.

Definition at line 484 of file counters.hpp.

◆ counter_pairwise_first_gain()

void counter_pairwise_first_gain ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Used when all the functions are in 0 (like the root node prob.)

Needs to specify function a.

\[ \sum x(a)^3(1 - x(b))^3 + x(b)^3(1 - x(a))^3 + x(a)^3 x(b)^3 + (1 - x(a))^3(1 - x(b))^3 \]

Definition at line 1951 of file counters.hpp.

◆ counter_pairwise_neofun_singlefun()

void counter_pairwise_neofun_singlefun ( PhyloCounters * counters,
size_t  nfunA,
size_t  duplication = Geese::etype_default 
)
inline

Total number of neofunctionalization events:

\[ \sum_u \sum_{w < u} \left[x_{ua}(1 - x_{wa}) + (1 - x_{ua})x_{wa}\right] \]

The change statistic is

\[ \delta(x_{ua}: 0 \to 1) = 1 - 2x_{wa}. \]

Definition at line 1102 of file counters.hpp.
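The relationship between the pairwise statistic and its change statistic stated above can be checked numerically. The sketch below assumes the states of function a across offspring are stored in a 0/1 vector `x`; `pairwise_neofun` and `delta_gain` are hypothetical helper names. With more than two offspring, the per-pair change \(1 - 2x_{wa}\) is summed over every other offspring \(w\).

```cpp
#include <cstddef>
#include <vector>

// x[u] is 1 if offspring u has function a, 0 otherwise.
// Counts the pairs (u, w), w < u, whose states for function a differ.
inline int pairwise_neofun(const std::vector<int> & x) {
    int total = 0;
    for (std::size_t u = 0; u < x.size(); ++u)
        for (std::size_t w = 0; w < u; ++w)
            total += x[u] * (1 - x[w]) + (1 - x[u]) * x[w];
    return total;
}

// Change in the statistic when x[u] moves from 0 to 1: every other
// offspring w contributes 1 - 2 * x[w].
inline int delta_gain(const std::vector<int> & x, std::size_t u) {
    int delta = 0;
    for (std::size_t w = 0; w < x.size(); ++w)
        if (w != u)
            delta += 1 - 2 * x[w];
    return delta;
}
```

For any `x` with `x[u] == 0`, the statistic after setting `x[u] = 1` equals `pairwise_neofun(x) + delta_gain(x, u)` evaluated before the change.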

◆ counter_pairwise_overall_change()

void counter_pairwise_overall_change ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Used when all the functions are in 0 (like the root node prob.)

Needs to specify function a.

Definition at line 1747 of file counters.hpp.

◆ counter_pairwise_preserving()

void counter_pairwise_preserving ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Used when all the functions are in 0 (like the root node prob.)

Needs to specify function a.

\[ \sum x(a)^3(1 - x(b))^3 + x(b)^3(1 - x(a))^3 + x(a)^3 x(b)^3 + (1 - x(a))^3(1 - x(b))^3 \]

Definition at line 1812 of file counters.hpp.

◆ counter_preserve_pseudogene()

void counter_preserve_pseudogene ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Keeps track of how many pairs of genes preserve pseudostate.

Definition at line 300 of file counters.hpp.

◆ counter_prop_genes_changing()

void counter_prop_genes_changing ( PhyloCounters * counters,
size_t  duplication = Geese::etype_default 
)
inline

Keeps track of how many genes are changing (either 0, 1, or 2 if dealing with regular trees.)

Definition at line 382 of file counters.hpp.

◆ counter_subfun()

void counter_subfun ( PhyloCounters * counters,
size_t  nfunA,
size_t  nfunB,
size_t  duplication = Geese::etype_default 
)
inline

Total count of Sub-functionalization events.

Requires specifying data = {funA, funB}.

Definition at line 705 of file counters.hpp.

◆ get_last_name()

std::string get_last_name ( size_t  d)
inline

Definition at line 48 of file counters.hpp.