simulate_block_data() creates a dataset made up of blocks of variables, where the variables within each block are correlated. The correlation for each pair of variables within a block is sampled uniformly between lower_corr and upper_corr, and the values are then drawn using MASS::mvrnorm().
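The sampling scheme described above can be sketched for a single block as follows. This is a minimal illustration, not the package's source: the helper name sample_block_corr() is hypothetical, and the zero mean vector is an assumption, since the description does not state the means used.

```r
# Hypothetical helper (not part of the package): build a correlation matrix
# for one block, with each pairwise correlation drawn uniformly from
# [lower_corr, upper_corr].
sample_block_corr <- function(size, lower_corr, upper_corr) {
  sigma <- diag(size)                      # ones on the diagonal
  n_pairs <- size * (size - 1) / 2         # number of distinct variable pairs
  sigma[upper.tri(sigma)] <- runif(n_pairs, min = lower_corr, max = upper_corr)
  # Mirror the upper triangle into the lower triangle so sigma is symmetric.
  sigma[lower.tri(sigma)] <- t(sigma)[lower.tri(sigma)]
  sigma
}

set.seed(1)
sigma <- sample_block_corr(size = 5, lower_corr = 0.4, upper_corr = 0.6)

# Assumption: zero means. Draw 100 observations for this block.
x <- MASS::mvrnorm(n = 100, mu = rep(0, 5), Sigma = sigma)
dim(x)
#> [1] 100   5
```

A matrix filled this way is not guaranteed to be positive definite for wide correlation ranges or large blocks, in which case MASS::mvrnorm() stops with an error. For five variables and correlations between 0.4 and 0.6, as in this sketch, the matrix is always positive definite.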
simulate_block_data(
  block_sizes,
  lower_corr,
  upper_corr,
  n,
  block_name = "block",
  sep = "_",
  var_name = "x"
)
block_sizes: a vector of block sizes. The size of each block is the number of variables within it.
lower_corr: the lower bound of the correlation within each block
upper_corr: the upper bound of the correlation within each block
n: the number of observations (rows)
block_name: the prefix prepended to each variable name to indicate the block it belongs to
sep: a character string used to separate the block prefix from the variable name
var_name: the name given to the variables within each block
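The way block_name, sep, and var_name combine can be read off the column names in the example output (block1_x1, block1_x2, ...). The snippet below is a sketch that reproduces that naming pattern in base R. It is inferred from the output, not taken from the package's source, and the block sizes used are illustrative.

```r
# Illustrative values; block_name, sep, and var_name use the documented defaults.
block_sizes <- c(2, 3)
block_name <- "block"
sep <- "_"
var_name <- "x"

# For block i, number the variables 1..block_sizes[i] and prefix each
# with the block label, e.g. "block1_x1".
col_names <- unlist(lapply(seq_along(block_sizes), function(i) {
  paste0(block_name, i, sep, var_name, seq_len(block_sizes[i]))
}))
col_names
#> [1] "block1_x1" "block1_x2" "block2_x1" "block2_x2" "block2_x3"
```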
a tibble with sum(block_sizes) columns and n rows.
# create a 100 x 15 data set with 3 blocks
simulate_block_data(
  block_sizes = rep(5, 3),
  lower_corr = .4,
  upper_corr = .6,
  n = 100
)
#> # A tibble: 100 × 15
#> block1_x1 block1_x2 block1_x3 block1_x4 block1_x5 block2_x1 block2_x2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.675 -0.959 0.256 -0.0978 1.61 0.877 -0.629
#> 2 1.08 1.55 0.209 0.490 0.246 -0.583 -0.958
#> 3 -0.311 0.476 1.11 -0.383 0.877 -0.568 0.971
#> 4 -0.387 0.474 0.927 0.966 -0.0765 0.244 0.483
#> 5 1.03 0.919 0.992 -0.0421 -0.446 0.357 -0.386
#> 6 0.434 1.11 1.75 0.227 0.415 -1.45 -1.09
#> 7 0.434 0.848 -1.19 0.0387 -1.42 -0.266 1.04
#> 8 -0.257 0.601 0.660 -1.16 -2.00 -0.0244 0.656
#> 9 -0.618 0.863 1.04 0.00510 1.13 0.194 0.815
#> 10 -1.17 -0.628 -1.42 -0.225 -1.78 0.795 -1.46
#> # ℹ 90 more rows
#> # ℹ 8 more variables: block2_x3 <dbl>, block2_x4 <dbl>, block2_x5 <dbl>,
#> # block3_x1 <dbl>, block3_x2 <dbl>, block3_x3 <dbl>, block3_x4 <dbl>,
#> # block3_x5 <dbl>