The create_sim_samples function generates random samples of scores and labels for one or more performance levels.

create_sim_samples(n_repeat, np, nn, score_names = "random")

Arguments

n_repeat

The number of sample sets to generate.

np

The number of positives in a sample.

nn

The number of negatives in a sample.

score_names

A character vector of performance level names. The following levels are available.

"random"

Random

"poor_er"

Poor early retrieval

"good_er"

Good early retrieval

"excel"

Excellent

"perf"

Perfect

"all"

All of the above

Value

The create_sim_samples function returns a list with the following items.

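  • scores: a list with one element per sample set; each element is a list of numeric score vectors, one per performance level

  • labels: a numeric vector of labels (1 for positives, 0 for negatives)

  • modnames: a character vector of the model names

  • dsids: an integer vector of the dataset IDs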
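
The shape of the returned list can be checked directly. The following is a minimal sketch, assuming the precrec package is installed and that the structure matches the str() output shown in the Examples section; the variable names (samps, n_sets, n_levels, n_obs) are illustrative only.

```r
library(precrec)

# Two sample sets, three performance levels, 10 positives and 20 negatives
samps <- create_sim_samples(2, 10, 20, c("random", "poor_er", "good_er"))

# One element of scores per sample set (n_repeat)
n_sets <- length(samps[["scores"]])

# One score vector per performance level within each sample set
n_levels <- length(samps[["scores"]][[1]])

# A single label vector of length np + nn
n_obs <- length(samps[["labels"]])

# modnames and dsids are flattened across sample sets and levels
length(samps[["modnames"]])
length(samps[["dsids"]])
```

Because modnames and dsids are flattened, their length is expected to equal the number of sample sets multiplied by the number of performance levels.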

See also

mmdata for formatting input data. evalmod for calculating evaluation measures.
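
The two functions above are typically applied to the output of create_sim_samples in sequence. The following is a hedged sketch of that workflow, assuming the precrec package is installed; the sample sizes and the variable names (samps, mdat, curves) are arbitrary choices for illustration.

```r
library(precrec)

# Generate 4 sample sets with 50 positives and 50 negatives
# for every performance level
samps <- create_sim_samples(4, 50, 50, "all")

# Format the simulated scores and labels for evaluation,
# passing the model names and dataset IDs through unchanged
mdat <- mmdata(samps[["scores"]], samps[["labels"]],
               modnames = samps[["modnames"]],
               dsids = samps[["dsids"]])

# Calculate the evaluation measures for the formatted data
curves <- evalmod(mdat)
```

Passing modnames and dsids keeps the association between each score vector and its performance level and sample set.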

Examples


##################################################
### Create a set of samples with 10 positives and 10 negatives
### for the random performance level
###
samps1 <- create_sim_samples(1, 10, 10, "random")

## Show the list structure
str(samps1)
#> List of 4
#>  $ scores  :List of 1
#>   ..$ :List of 1
#>   .. ..$ : num [1:20] 0.586 -0.149 1.448 -0.317 0.367 ...
#>  $ labels  : num [1:20] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ modnames: chr "random"
#>  $ dsids   : int 1


##################################################
### Create two sets of samples with 10 positives and 20 negatives
### for the random and the poor early retrieval performance levels
###
samps2 <- create_sim_samples(2, 10, 20, c("random", "poor_er"))

## Show the list structure
str(samps2)
#> List of 4
#>  $ scores  :List of 2
#>   ..$ :List of 2
#>   .. ..$ : num [1:30] -0.358 0.058 0.176 2.657 0.433 ...
#>   .. ..$ : num [1:30] 0.927 0.843 0.418 0.837 0.709 ...
#>   ..$ :List of 2
#>   .. ..$ : num [1:30] -0.563 -0.426 -0.649 0.435 0.817 ...
#>   .. ..$ : num [1:30] 0.972 0.97 0.966 0.915 0.577 ...
#>  $ labels  : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ modnames: chr [1:4] "random" "poor_er" "random" "poor_er"
#>  $ dsids   : int [1:4] 1 1 2 2


##################################################
### Create 3 sets of samples with 5 positives and 5 negatives
### for all 5 levels
###
samps3 <- create_sim_samples(3, 5, 5, "all")

## Show the list structure
str(samps3)
#> List of 4
#>  $ scores  :List of 3
#>   ..$ :List of 5
#>   .. ..$ : num [1:10] -2.363 0.611 -0.831 1.356 0.846 ...
#>   .. ..$ : num [1:10] 0.691 0.84 0.957 0.876 0.98 ...
#>   .. ..$ : num [1:10] 0.328 0.257 0.998 0.632 0.657 ...
#>   .. ..$ : num [1:10] 0.747 3.795 3.74 2.82 2.22 ...
#>   .. ..$ : num [1:10] 1 1 1 1 1 0 0 0 0 0
#>   ..$ :List of 5
#>   .. ..$ : num [1:10] -1.344 -0.256 1.381 1.227 -0.523 ...
#>   .. ..$ : num [1:10] 0.856 0.696 0.644 0.969 0.915 ...
#>   .. ..$ : num [1:10] 0.218 0.342 0.802 0.711 0.89 ...
#>   .. ..$ : num [1:10] 2.62 2.45 2.48 2.85 2.76 ...
#>   .. ..$ : num [1:10] 1 1 1 1 1 0 0 0 0 0
#>   ..$ :List of 5
#>   .. ..$ : num [1:10] 0.2686 -0.0428 -1.621 -0.7818 0.2408 ...
#>   .. ..$ : num [1:10] 0.636 0.657 0.918 0.538 0.611 ...
#>   .. ..$ : num [1:10] 0.292 0.441 0.132 0.717 0.257 ...
#>   .. ..$ : num [1:10] 3.1 2.6 3.41 2.14 3.98 ...
#>   .. ..$ : num [1:10] 1 1 1 1 1 0 0 0 0 0
#>  $ labels  : num [1:10] 1 1 1 1 1 0 0 0 0 0
#>  $ modnames: chr [1:15] "random" "poor_er" "good_er" "excel" ...
#>  $ dsids   : int [1:15] 1 1 1 1 1 2 2 2 2 2 ...