The create_testset function creates test datasets for either benchmarking or curve evaluation.
create_testset(test_type, set_names = NULL)
test_type: A single string that specifies the type of dataset to generate.
"bench": Create test datasets for benchmarking.
"curve": Create test datasets for curve evaluation.
set_names: A character vector that specifies the names of the test datasets.
For benchmarking (test_type = "bench")
This function uses a naming convention for randomly generated benchmarking data. A name consists of a prefix ('b' or 'i') followed by the size of the dataset. The prefix 'b' indicates a balanced dataset, whereas 'i' indicates an imbalanced dataset. The number can take a suffix 'k' or 'm', indicating 1000 or 1 million, respectively.
Below are some examples.
b100: A balanced dataset with 50 positives and 50 negatives.
b10k: A balanced dataset with 5000 positives and 5000 negatives.
b1m: A balanced dataset with 500,000 positives and 500,000 negatives.
i100: An imbalanced dataset with 25 positives and 75 negatives.
The function returns a list of TestDataB objects.
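The naming convention above can be sketched in base R. This is a minimal illustration, not prcbench's own code: parse_bench_name is a hypothetical helper, and the 1:3 positive-to-negative ratio for imbalanced sets is an assumption taken from the i100 example.

```r
# Hypothetical helper (not part of prcbench): decode a benchmark set name
# such as "b100", "b10k", or "i100" according to the convention above.
parse_bench_name <- function(set_name) {
  name <- tolower(set_name)
  parts <- regmatches(name, regexec("^([bi])([0-9]+)([km]?)$", name))[[1]]
  if (length(parts) == 0) {
    stop("Invalid benchmark set name: ", set_name)
  }

  # parts: full match, prefix, number, optional suffix
  multipliers <- c(k = 1e3, m = 1e6)
  multiplier <- if (nzchar(parts[4])) multipliers[[parts[4]]] else 1
  total <- as.numeric(parts[3]) * multiplier

  balanced <- parts[2] == "b"
  # Assumption: imbalanced sets hold 25% positives, as in the i100 example.
  positives <- if (balanced) total / 2 else total / 4

  list(
    balanced = balanced,
    total = total,
    positives = positives,
    negatives = total - positives
  )
}

str(parse_bench_name("b10k"))
```

The sketch only explains what a name means; the datasets themselves come from create_testset("bench", set_names).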
For curve evaluation (test_type = "curve")
The following four predefined datasets can be specified for curve evaluation.
set name | R6 object | data source |
c1 or C1 | TestDataC | C1DATA |
c2 or C2 | TestDataC | C2DATA |
c3 or C3 | TestDataC | C3DATA |
c4 or C4 | TestDataC | C4DATA |
The function returns a list of TestDataC objects.
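The table above implies a case-insensitive mapping from set names to data sources. A minimal base R sketch of that lookup follows; curve_data_source is a hypothetical helper written for illustration and is not how prcbench necessarily resolves names internally.

```r
# Hypothetical helper (not part of prcbench): map curve set names to the
# data source names listed in the table, accepting either letter case.
curve_data_source <- function(set_names) {
  keys <- toupper(set_names)
  valid <- c("C1", "C2", "C3", "C4")

  unknown <- set_names[!keys %in% valid]
  if (length(unknown) > 0) {
    stop("Unknown curve set name(s): ", paste(unknown, collapse = ", "))
  }

  # "C1" -> "C1DATA", "C2" -> "C2DATA", and so on
  stats::setNames(paste0(keys, "DATA"), set_names)
}

curve_data_source(c("c1", "C2"))
```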
A list of R6 test dataset objects.
run_benchmark and run_evalcurve require the list of datasets generated by this function.
TestDataB for benchmarking test data.
TestDataC, C1DATA, C2DATA, C3DATA, and C4DATA for curve evaluation test data.
create_usrdata for creating a user-defined test set.
## Create a balanced data set with 50 positives and 50 negatives
tset1 <- create_testset("bench", "b100")
tset1
#> $b100
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: b100
#> # of positives: 50
#> # of negatives: 50
#> Scores: 0.01636688 (min)
#> 0.365029 (mean)
#> 0.9849833 (max)
#> Labels: 0 (neg), 1 (pos)
#>
#>
## Create an imbalanced data set with 25 positives and 75 negatives
tset2 <- create_testset("bench", "i100")
tset2
#> $i100
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: i100
#> # of positives: 25
#> # of negatives: 75
#> Scores: 0.002824673 (min)
#> 0.2560934 (mean)
#> 0.9989533 (max)
#> Labels: 0 (neg), 1 (pos)
#>
#>
## Create the C1 dataset
tset3 <- create_testset("curve", "c1")
tset3
#> $c1
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c1
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#> 2 (mean)
#> 3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.85, 0.9)
#> Text position2: (0.9, 0.9)
#>
#>
## Create the C1 and C2 datasets
tset4 <- create_testset("curve", c("c1", "c2"))
tset4
#> $c1
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c1
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#> 2 (mean)
#> 3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.85, 0.9)
#> Text position2: (0.9, 0.9)
#>
#>
#> $c2
#>
#> === Test dataset for prcbench functions ===
#>
#> Testset name: c2
#> # of positives: 2
#> # of negatives: 2
#> Scores: 1 (min)
#> 2.25 (mean)
#> 3 (max)
#> Labels: 0 (neg), 1 (pos)
#> Pre-calculated: Yes
#> # of base points: 6
#> Text position: (0.2, 0.65)
#> Text position2: (0.2, 0.75)
#>
#>