The create_testset function creates test datasets for either benchmarking or curve evaluation.

create_testset(test_type, set_names = NULL)

Arguments

test_type

A single string to specify the type of dataset generated by this function.

"bench"

Create test datasets for benchmarking

"curve"

Create test datasets for curve evaluation

set_names

A character vector to specify the names of test datasets.

  1. For benchmarking (test_type = "bench")

    This function uses a naming convention for randomly generated benchmarking data. A name consists of a prefix ('b' or 'i') followed by the total number of observations in the dataset. The prefix 'b' indicates a balanced dataset, whereas 'i' indicates an imbalanced dataset. The number can take a suffix 'k' or 'm', which multiplies it by 1000 or 1 million, respectively.

    Below are some examples.

    "b100"

    A balanced data set with 50 positives and 50 negatives.

    "b10k"

    A balanced data set with 5000 positives and 5000 negatives.

    "b1m"

    A balanced data set with 500,000 positives and 500,000 negatives.

    "i100"

    An imbalanced data set with 25 positives and 75 negatives.

    The function returns a list of TestDataB objects.
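The naming convention above can be illustrated with a small decoder. This is only a sketch: `parse_bench_name` is a hypothetical helper, not part of prcbench, and the 25% positive rate for imbalanced sets is inferred from the "i100" example alone, so it is an assumption that the ratio holds at other sizes.

```r
# Hypothetical helper (not part of prcbench): decode a benchmark set name
# such as "b100", "b10k", or "i100" into the class counts it implies.
parse_bench_name <- function(set_name) {
  # Prefix 'b' or 'i', then digits, then an optional 'k' or 'm' suffix
  m <- regmatches(set_name, regexec("^([bi])([0-9]+)([km]?)$", set_name))[[1]]
  if (length(m) == 0) {
    stop("Unrecognized set name: ", set_name)
  }
  prefix <- m[2]
  suffix <- m[4]

  # 'k' multiplies by 1000, 'm' by 1 million, no suffix leaves the number as is
  multiplier <- if (suffix == "k") 1e3 else if (suffix == "m") 1e6 else 1
  total <- as.numeric(m[3]) * multiplier

  # Assumption: balanced sets are 50% positive, imbalanced sets 25% positive,
  # as in the "b100" and "i100" examples
  pos_frac <- if (prefix == "b") 0.5 else 0.25
  n_pos <- total * pos_frac

  list(balanced = (prefix == "b"), n_pos = n_pos, n_neg = total - n_pos)
}

str(parse_bench_name("b10k"))
str(parse_bench_name("i100"))
```

A decoder like this is only useful for checking a name before calling create_testset; the function generates the actual scores and labels itself.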

  2. For curve evaluation (test_type = "curve")

    The following four predefined datasets can be specified for curve evaluation.

    set name    S3 object    data source
    c1 or C1    TestDataC    C1DATA
    c2 or C2    TestDataC    C2DATA
    c3 or C3    TestDataC    C3DATA
    c4 or C4    TestDataC    C4DATA

    The function returns a list of TestDataC objects.

Value

A list of R6 test dataset objects.

See also

run_benchmark and run_evalcurve require the list of the datasets generated by this function. TestDataB for benchmarking test data. TestDataC, C1DATA, C2DATA, C3DATA, and C4DATA for curve evaluation test data. create_usrdata for creating a user-defined test set.

Examples

## Create a balanced data set with 50 positives and 50 negatives
tset1 <- create_testset("bench", "b100")
tset1
#> $b100
#> 
#>     === Test dataset for prcbench functions ===
#> 
#>     Testset name:     b100 
#>     # of positives:   50 
#>     # of negatives:   50 
#>     Scores:           0.01636688 (min) 
#>                       0.365029 (mean) 
#>                       0.9849833 (max) 
#>     Labels:           0 (neg), 1 (pos)
#> 
#> 

## Create an imbalanced data set with 25 positives and 75 negatives
tset2 <- create_testset("bench", "i100")
tset2
#> $i100
#> 
#>     === Test dataset for prcbench functions ===
#> 
#>     Testset name:     i100 
#>     # of positives:   25 
#>     # of negatives:   75 
#>     Scores:           0.002824673 (min) 
#>                       0.2560934 (mean) 
#>                       0.9989533 (max) 
#>     Labels:           0 (neg), 1 (pos)
#> 
#> 

## Create C1 dataset
tset3 <- create_testset("curve", "c1")
tset3
#> $c1
#> 
#>     === Test dataset for prcbench functions ===
#> 
#>     Testset name:     c1 
#>     # of positives:   2 
#>     # of negatives:   2 
#>     Scores:           1 (min) 
#>                       2 (mean) 
#>                       3 (max) 
#>     Labels:           0 (neg), 1 (pos)
#>     Pre-calculated:   Yes
#>     # of base points: 6 
#>     Text position:    (0.85, 0.9)
#>     Text position2:   (0.9, 0.9)
#> 
#> 

## Create C1 and C2 datasets
tset4 <- create_testset("curve", c("c1", "c2"))
tset4
#> $c1
#> 
#>     === Test dataset for prcbench functions ===
#> 
#>     Testset name:     c1 
#>     # of positives:   2 
#>     # of negatives:   2 
#>     Scores:           1 (min) 
#>                       2 (mean) 
#>                       3 (max) 
#>     Labels:           0 (neg), 1 (pos)
#>     Pre-calculated:   Yes
#>     # of base points: 6 
#>     Text position:    (0.85, 0.9)
#>     Text position2:   (0.9, 0.9)
#> 
#> 
#> $c2
#> 
#>     === Test dataset for prcbench functions ===
#> 
#>     Testset name:     c2 
#>     # of positives:   2 
#>     # of negatives:   2 
#>     Scores:           1 (min) 
#>                       2.25 (mean) 
#>                       3 (max) 
#>     Labels:           0 (neg), 1 (pos)
#>     Pre-calculated:   Yes
#>     # of base points: 6 
#>     Text position:    (0.2, 0.65)
#>     Text position2:   (0.2, 0.75)
#> 
#>