The create_testset function creates test datasets either for benchmarking or curve evaluation.

create_testset(test_type, set_names = NULL)

Arguments

test_type

A single string to specify the type of dataset generated by this function.

"bench"

Create test datasets for benchmarking

"curve"

Create test datasets for curve evaluation

set_names

A character vector to specify the names of test datasets.

  1. For benchmarking (test_type = "bench")

    This function uses a naming convention for randomly generated benchmarking data. The format is a prefix ('b' or 'i') followed by the total number of data points. The prefix 'b' indicates a balanced dataset, whereas 'i' indicates an imbalanced dataset. The number may carry a suffix 'k' or 'm', meaning thousand or million, respectively.

    Below are some examples.

    "b100"

    A balanced data set with 50 positives and 50 negatives.

    "b10k"

    A balanced data set with 5000 positives and 5000 negatives.

    "b1m"

    A balanced data set with 500,000 positives and 500,000 negatives.

    "i100"

    An imbalanced data set with 25 positives and 75 negatives.

    The function returns a list of TestDataB objects.

  2. For curve evaluation (test_type = "curve")

    The following four predefined datasets can be specified for curve evaluation.

    set name    S3 object    data source
    c1 or C1    TestDataC    C1DATA
    c2 or C2    TestDataC    C2DATA
    c3 or C3    TestDataC    C3DATA
    c4 or C4    TestDataC    C4DATA

    The function returns a list of TestDataC objects.
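The benchmarking naming convention from item 1 can be illustrated with a small decoder. This is only a sketch in base R: `parse_bench_name` is a hypothetical helper, not part of prcbench, and it assumes lowercase names of the form prefix, digits, optional suffix. It reports the total size only, since the positive/negative split for imbalanced sets is documented here just for "i100".

```r
# Hypothetical helper (not part of prcbench): decode a benchmark set name
# such as "b10k" or "i1m" into its balance flag and total number of points.
parse_bench_name <- function(set_name) {
  # Groups: 1 = prefix (b/i), 2 = digits, 3 = optional suffix (k/m)
  m <- regmatches(set_name, regexec("^([bi])([0-9]+)([km]?)$", set_name))[[1]]
  if (length(m) == 0) {
    stop("Unrecognised set name: ", set_name)
  }
  multiplier <- if (m[4] == "k") 1e3 else if (m[4] == "m") 1e6 else 1
  list(
    balanced = (m[2] == "b"),          # 'b' = balanced, 'i' = imbalanced
    n        = as.numeric(m[3]) * multiplier
  )
}

str(parse_bench_name("b10k"))  # balanced = TRUE,  n = 10000
str(parse_bench_name("i100"))  # balanced = FALSE, n = 100
```

A decoder like this is useful mainly for checking a name before passing it to create_testset; the function itself performs the real parsing.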

Value

A list of R6 test dataset objects.
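As a rough illustration of working with the returned list, the sketch below requests two benchmark sets and picks one out by name. It assumes prcbench is installed (the block is skipped otherwise) and that list elements are named after the requested set names, as the printed output in the Examples suggests.

```r
# Runs only if prcbench is installed; otherwise nothing is executed.
if (requireNamespace("prcbench", quietly = TRUE)) {
  tsets <- prcbench::create_testset("bench", c("b100", "i100"))

  # Elements are assumed to be named after the requested set names.
  print(names(tsets))
  print(length(tsets))

  # Extract a single test dataset object by name.
  b100 <- tsets[["b100"]]
  print(inherits(b100, "TestDataB"))
}
```

Accessing elements by name rather than by position keeps the code readable when several sets are created in one call.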

See also

run_benchmark and run_evalcurve require the list of the datasets generated by this function. TestDataB for benchmarking test data. TestDataC, C1DATA, C2DATA, C3DATA, and C4DATA for curve evaluation test data. create_usrdata for creating a user-defined test set.

Examples

## Create a balanced data set with 50 positives and 50 negatives
tset1 <- create_testset("bench", "b100")
tset1
#> $b100
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: b100
#> # of positives: 50
#> # of negatives: 50
#> Scores: 0.0009157793 (min)
#>         0.3417239 (mean)
#>         0.9926006 (max)
#> Labels: 0 (neg), 1 (pos)
#>
#>

## Create an imbalanced data set with 25 positives and 75 negatives
tset2 <- create_testset("bench", "i100")
tset2
#> $i100
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: i100
#> # of positives: 25
#> # of negatives: 75
#> Scores: 0.001296925 (min)
#>         0.2468037 (mean)
#>         0.9040735 (max)
#> Labels: 0 (neg), 1 (pos)
#>
#>

## Create the C1 dataset
tset3 <- create_testset("curve", "c1")
tset3
#> $c1
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c1
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#>         2 (mean)
#>         3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.85, 0.9)
#> Text position2: (0.9, 0.9)
#>
#>

## Create the C1 and C2 datasets
tset4 <- create_testset("curve", c("c1", "c2"))
tset4
#> $c1
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c1
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#>         2 (mean)
#>         3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.85, 0.9)
#> Text position2: (0.9, 0.9)
#>
#>
#> $c2
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c2
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#>         2.25 (mean)
#>         3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.2, 0.65)
#> Text position2: (0.2, 0.75)
#>
#>