Leaderboard Guidelines
Every dataset in TDC is a benchmark: for each one, we provide training, validation, and test splits together with performance evaluation metrics.
To participate in the leaderboard for a specific benchmark, follow these steps:
Use the TDC benchmark data loader to retrieve the benchmark.
Use training and/or validation set to train your model.
Use the TDC model evaluator to calculate the performance of your model on the test set (a minimal sketch follows this list).
Submit the test set performance to a TDC leaderboard.
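As an isolated illustration of step 3, here is a minimal sketch of the standalone TDC model evaluator; the benchmark-group workflow described below wraps this step for you. The metric name and the y_true/y_pred arrays are placeholders for illustration only.
from tdc import Evaluator

# Placeholder test-set labels and model predictions (illustrative values)
y_true = [0.5, 1.2, 0.8]
y_pred = [0.6, 1.0, 0.9]

# Instantiate the TDC model evaluator with the metric used by the benchmark
evaluator = Evaluator(name = 'MAE')
score = evaluator(y_true, y_pred)
print(score)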
Below we provide detailed instructions on how to participate in TDC leaderboards.
As many datasets share a therapeutics theme, we organize benchmarks into meaningfully defined groups, which we refer to as benchmark groups. Datasets and tasks within a benchmark group are carefully curated and centered around a theme (for example, TDC contains a benchmark group to support ML predictions of ADMET properties). While every benchmark group consists of multiple benchmarks, results can be submitted separately for each benchmark in the group.
Step-by-step Instructions
TDC provides a programmatic framework to access the benchmarks and use them for model evaluation.
Step 1: Train your model using a TDC BenchmarkGroup
Suppose you want to evaluate your model on the Caco2_Wang benchmark, which belongs to the ADMET_Group benchmark group. Take the following code and replace the commented block with the code that trains your model. The train, valid, and test variables contain the splits of the benchmark dataset.
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang')
    # all benchmark names in a benchmark group are stored in group.dataset_names
    predictions = {}
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)

    # ----------------------------------------------- #
    #  Train your model using train and valid, then   #
    #  save the test-set predictions in y_pred_test   #
    # ----------------------------------------------- #

    predictions[name] = y_pred_test
    predictions_list.append(predictions)

results = group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101]}
The output results is a dictionary storing the average value and standard deviation of each performance metric achieved by your model on the Caco2_Wang benchmark.
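If you want to report these numbers, for example in your submission summary, you can unpack the dictionary directly. A small sketch, assuming the output format shown above (Caco2_Wang is evaluated with MAE):
# results follows the format shown above: {benchmark_name: [mean, std]}
mean_mae, std_mae = results['caco2_wang']
print(f'Caco2_Wang MAE: {mean_mae} +/- {std_mae}')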
Step 2: Submit results of your model to a TDC Leaderboard
We invite submissions to any one or multiple benchmarks in a group. To be included in the leaderboard, please follow the leaderboard entry process: include the results of your model and provide a brief summary of the model (e.g., the number of parameters and hardware details).
Further Details about Benchmark Groups in TDC
The BenchmarkGroup class is a wrapper class that provides utility functions for benchmarking. For each benchmark, TDC provides a separate test set and a train_val set.
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
benchmark = group.get('Caco2_Wang')
predictions = {}
name = benchmark['name']
train_val, test = benchmark['train_val'], benchmark['test']
## --- train your model --- ##
predictions[name] = y_pred
group.evaluate(predictions)
# {'caco2_wang': {'mae': 0.234}}
You can use train_val to construct training and validation sets as you see best fit. For example, you can (1) construct a customized training and validation split from train_val yourself (see the sketch after the code below) or (2) use a TDC utility function to get data splits for different random seeds:
train, valid = group.get_train_valid_split(benchmark = 'Caco2_Wang', split_type = 'default', seed = 42)
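For option (1), the splitting strategy is up to you. A minimal sketch, assuming train_val is a pandas DataFrame as returned by group.get, and using scikit-learn's train_test_split for a simple random 80/20 split (both the import and the split ratio are illustrative, not part of TDC):
from sklearn.model_selection import train_test_split

# A simple random 80/20 split of train_val; any custom strategy
# (e.g., scaffold-based) can be substituted here.
train, valid = train_test_split(train_val, test_size = 0.2, random_state = 42)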
Importantly, you must evaluate your model on the test set as specified by TDC to ensure fair comparison of models. To promote robust measurement of model performance, TDC requires at minimum five independent runs of the model to calculate average performance and standard deviation. Following is an example showing how to obtain five different train and validation splits:
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang')
    predictions = {}
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)

    ## --- train your model --- ##

    predictions[name] = y_pred_test
    predictions_list.append(predictions)

group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101]}
You can get a list of benchmarks included in the benchmark group as follows:
from tdc import utils
names = utils.retrieve_benchmark_names('ADMET_Group')
# ['caco2_wang', 'hia_hou', ....]
Alternatively, the same can be achieved via:
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
group.dataset_names
# ['caco2_wang', 'hia_hou', ....]
For every benchmark group, we provide multiple benchmarks that all instantiate the same learning task. We encourage you to submit results for the entire benchmark group; however, we also accept submissions reporting performance on just one benchmark in the group. To access all benchmarks in a group, you can iterate directly over the group object:
from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    predictions = {}
    for benchmark in group:
        name = benchmark['name']
        train_val, test = benchmark['train_val'], benchmark['test']
        train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)

        ## --- train your model --- ##

        predictions[name] = y_pred_test
    predictions_list.append(predictions)

group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101], 'hia_hou': [0.5, 0.01], ...}
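For a full-group submission it can be convenient to tabulate the returned means and standard deviations. A short sketch, assuming the dictionary format shown above and using pandas (the column names are illustrative):
import pandas as pd

results = group.evaluate_many(predictions_list)

# One row per benchmark with the metric's mean and standard deviation
summary = pd.DataFrame(
    [(name, mean, std) for name, (mean, std) in results.items()],
    columns = ['benchmark', 'metric_mean', 'metric_std'],
)
print(summary)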
The FAIR Guiding Principles
ML tools have become essential for research. TDC leaderboards keep track of ML tools across the entire range of therapeutics. To improve the findability, accessibility, interoperability, and reuse of ML tools, we apply FAIR4RS principles and implementation guidelines to all software and ML tools included in TDC leaderboards.
We strongly believe that software and ML tools should be open and adhere to FAIR principles to encourage repeatability, reproducibility, and reuse. TDC follows the FAIR guidelines for datasets as well as for ML tools and data functions.
Start Exploring Groups of Leaderboards in TDC