Simulate correlated blocks of variables — simulate_block

simulate_block_data() creates a dataset made up of blocks of variables, where the variables within each block are correlated with one another. The correlation for each pair of variables within a block is sampled uniformly between lower_corr and upper_corr, and the observations are then drawn using MASS::mvrnorm().
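
The procedure described above can be sketched for a single block as follows. This is only an illustration and not necessarily how the package implements it: the helper name simulate_one_block() is hypothetical, and the zero means and unit variances are assumptions that the description does not state.

```r
# A minimal sketch of simulating one correlated block.
# simulate_one_block() is a hypothetical helper, not part of the package.
library(MASS)

simulate_one_block <- function(size, lower_corr, upper_corr, n) {
  # Start from the identity matrix: unit variances (an assumption)
  sigma <- diag(size)

  # Sample one correlation per pair of variables, uniformly within the bounds
  n_pairs <- size * (size - 1) / 2
  sigma[upper.tri(sigma)] <- runif(n_pairs, min = lower_corr, max = upper_corr)

  # Mirror the upper triangle into the lower triangle so sigma is symmetric
  sigma[lower.tri(sigma)] <- t(sigma)[lower.tri(sigma)]

  # Draw n observations; zero means are an assumption
  MASS::mvrnorm(n = n, mu = rep(0, size), Sigma = sigma)
}

set.seed(1)
x <- simulate_one_block(size = 5, lower_corr = 0.4, upper_corr = 0.6, n = 100)
dim(x)
round(cor(x), 2)
```

Note that MASS::mvrnorm() stops with an error if Sigma is not positive definite, which can happen when pairwise correlations are sampled independently over a wide range. With five variables and bounds of 0.4 and 0.6, as in this sketch, the matrix is always positive definite.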

simulate_block_data(
  block_sizes,
  lower_corr,
  upper_corr,
  n,
  block_name = "block",
  sep = "_",
  var_name = "x"
)

Arguments

block_sizes: a vector of block sizes. The size of each block is the number of variables within it.
lower_corr: the lower bound of the correlation within each block
upper_corr: the upper bound of the correlation within each block
n: the number of observations or rows
block_name: prefix prepended to each variable name to indicate the block it belongs to
sep: a character string used to separate the block part of a variable name from the variable part
var_name: the base name given to each variable within a block
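
Judging from the column names in the example output (block1_x1, block1_x2, ...), the three naming arguments appear to combine as block_name, block index, sep, var_name, variable index. The sketch below reproduces that pattern in base R; it is an inference from the printed output, not the package's own code.

```r
# Rebuild the column-name pattern seen in the example output.
# This mirrors the observed names; the package may construct them differently.
block_sizes <- c(2, 3)
block_name <- "block"
sep <- "_"
var_name <- "x"

# Block index repeated once per variable in that block: 1 1 2 2 2
block_id <- rep(seq_along(block_sizes), times = block_sizes)

# Variable index restarting at 1 within each block: 1 2 1 2 3
var_id <- sequence(block_sizes)

var_names <- paste0(block_name, block_id, sep, var_name, var_id)
var_names
#> [1] "block1_x1" "block1_x2" "block2_x1" "block2_x2" "block2_x3"
```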

Value

a tibble with sum(block_sizes) columns and n rows.

Examples

# create a 100 x 15 data set with 3 blocks of 5 variables each
simulate_block_data(
  block_sizes = rep(5, 3),
  lower_corr = .4,
  upper_corr = .6,
  n = 100
)
#> # A tibble: 100 × 15
#>    block1_x1 block1_x2 block1_x3 block1_x4 block1_x5 block2_x1 block2_x2
#>        <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1    -0.675    -0.959     0.256  -0.0978     1.61      0.877     -0.629
#>  2     1.08      1.55      0.209   0.490      0.246    -0.583     -0.958
#>  3    -0.311     0.476     1.11   -0.383      0.877    -0.568      0.971
#>  4    -0.387     0.474     0.927   0.966     -0.0765    0.244      0.483
#>  5     1.03      0.919     0.992  -0.0421    -0.446     0.357     -0.386
#>  6     0.434     1.11      1.75    0.227      0.415    -1.45      -1.09 
#>  7     0.434     0.848    -1.19    0.0387    -1.42     -0.266      1.04 
#>  8    -0.257     0.601     0.660  -1.16      -2.00     -0.0244     0.656
#>  9    -0.618     0.863     1.04    0.00510    1.13      0.194      0.815
#> 10    -1.17     -0.628    -1.42   -0.225     -1.78      0.795     -1.46 
#> # ℹ 90 more rows
#> # ℹ 8 more variables: block2_x3 <dbl>, block2_x4 <dbl>, block2_x5 <dbl>,
#> #   block3_x1 <dbl>, block3_x2 <dbl>, block3_x3 <dbl>, block3_x4 <dbl>,
#> #   block3_x5 <dbl>