Leaderboard Guidelines

Every dataset in TDC is a benchmark: for each one, we provide training, validation, and test sets, defined by a data split, together with performance evaluation metrics.
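
For reference, the splits behind an individual dataset can also be inspected with TDC's generic data loaders. Below is a minimal sketch using the ADME loader for Caco2_Wang; the scaffold split shown here is an illustrative choice, and leaderboard submissions should use the benchmark group loader described below.

from tdc.single_pred import ADME

# Load the Caco2_Wang dataset and retrieve a train/valid/test split.
data = ADME(name = 'Caco2_Wang')
split = data.get_split(method = 'scaffold', seed = 1, frac = [0.7, 0.1, 0.2])
train, valid, test = split['train'], split['valid'], split['test']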

To participate in the leaderboard for a specific benchmark, follow these steps:

  1. Use the TDC benchmark data loader to retrieve the benchmark.

  2. Use the training and/or validation set to train your model.

  3. Use the TDC model evaluator to calculate the performance of your model on the test set (a minimal example follows this list).

  4. Submit the test set performance to a TDC leaderboard.
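
For step 3, the TDC model evaluator can also be used on its own. Below is a minimal sketch assuming the metric name 'MAE' (the metric used by the Caco2_Wang benchmark) and small placeholder arrays in place of real labels and predictions.

from tdc import Evaluator

# Instantiate an evaluator for the benchmark's metric (MAE for Caco2_Wang).
evaluator = Evaluator(name = 'MAE')

# y_true: ground-truth test labels; y_pred: your model's test predictions.
y_true, y_pred = [0.5, -1.2, 3.3], [0.7, -1.0, 2.9]
print(evaluator(y_true, y_pred))  # mean absolute error, about 0.267 for these placeholders

For leaderboard submissions, the benchmark group functions shown below wrap this evaluation step for you.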

Below we provide detailed instructions on how to participate in TDC leaderboards.

As many datasets share a therapeutics theme, we organize benchmarks into meaningfully defined groups, which we refer to as benchmark groups. Datasets and tasks within a benchmark group are carefully curated and centered around a theme (for example, TDC contains a benchmark group to support ML predictions of ADMET properties). While every benchmark group consists of multiple benchmarks, results can be submitted separately for each benchmark in the group.


Step-by-step Instructions

TDC provides a programmatic framework to access the benchmarks and use them for model evaluation.

Step 1: Train your model using a TDC BenchmarkGroup

Suppose you want to evaluate your model on the Caco2_Wang benchmark, which belongs to the ADMET_Group benchmark group. Take the following code and replace the commented block with your model-training code. The train, valid, and test variables contain the splits of the benchmark dataset.

from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang') 
    # all benchmark names in a benchmark group are stored in group.dataset_names
    predictions = {}
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)
    
    # --------------------------------------------- #
    #  Train your model using train, valid, test    #
    #  Save test prediction in y_pred_test variable #
    # --------------------------------------------- #

    predictions[name] = y_pred_test
    predictions_list.append(predictions)

results = group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101]}

The output variable results is a dictionary storing the average value and standard deviation of each performance metric achieved by your model on the Caco2_Wang benchmark (in the example above, a mean of 6.328 and a standard deviation of 0.101).
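
For illustration only, here is one way the commented block could be filled in, using a trivial baseline that predicts the training-set mean for every test molecule. This sketch assumes the standard TDC DataFrame format with a 'Y' label column; a real submission would replace the baseline with an actual model.

from tdc.benchmark_group import admet_group

group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang')
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)

    # Baseline: predict the mean training label for every molecule in the test set.
    y_pred_test = [train['Y'].mean()] * len(test)

    predictions_list.append({name: y_pred_test})

results = group.evaluate_many(predictions_list)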

Step 2: Submit results of your model to a TDC Leaderboard

We invite submissions to one or more benchmarks in a group. To be included in the leaderboard, please follow the leaderboard entry process: include the results of your model and provide a brief summary of the model (e.g., the number of parameters and hardware details).
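
As one example of the requested summary details, if your model is a PyTorch module, the parameter count can be computed as follows. This is a sketch assuming PyTorch; the small Sequential network stands in for your model.

import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Sum the sizes of all trainable parameter tensors.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder network used only to demonstrate the helper.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))
print(count_parameters(model))  # 262657 for this placeholder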


Further Details about Benchmark Groups in TDC

The BenchmarkGroup class is a wrapper that provides utility functions for benchmarking. For each benchmark, TDC provides a separate test set and a train_val set.

from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
benchmark = group.get('Caco2_Wang')

predictions = {}
name = benchmark['name']
train_val, test = benchmark['train_val'], benchmark['test']

## --- train your model; store test-set predictions in y_pred --- ##

predictions[name] = y_pred
group.evaluate(predictions)
# {'caco2_wang': {'mae': 0.234}}

You can use train_val to construct training and validation sets as you see fit. For example, you can (1) construct a customized training and validation set from train_val (a sketch follows the snippet below) or (2) use a TDC utility function to get data splits for different random seeds:

train, valid = group.get_train_valid_split(benchmark = 'Caco2_Wang', split_type = 'default', seed = 42)
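
For option (1), a customized split can be as simple as a random hold-out carved from train_val. Below is a minimal sketch assuming scikit-learn is available and that train_val is the pandas DataFrame returned by the benchmark group loader.

from sklearn.model_selection import train_test_split
from tdc.benchmark_group import admet_group

group = admet_group(path = 'data/')
train_val = group.get('Caco2_Wang')['train_val']

# Hold out 20% of train_val as a custom validation set.
train, valid = train_test_split(train_val, test_size = 0.2, random_state = 42)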

Importantly, you must evaluate your model on the test set as specified by TDC to ensure a fair comparison of models. To promote robust measurement of model performance, TDC requires a minimum of five independent runs of the model to calculate the average performance and standard deviation. The following example shows how to obtain five different train and validation splits:

from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang')

    predictions = {}
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']
    train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)

    ## --- train your model; store test-set predictions in y_pred_test --- ##

    predictions[name] = y_pred_test
    predictions_list.append(predictions)

group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101]}

You can get a list of benchmarks included in the benchmark group as follows:

from tdc import utils
names = utils.retrieve_benchmark_names('ADMET_Group')
# ['caco2_wang', 'hia_hou', ....]

Alternatively, the same can be achieved via:

from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
group.dataset_names
# ['caco2_wang', 'hia_hou', ....]

For every benchmark group, we provide multiple benchmarks that all instantiate the same learning task. We encourage you to submit results for the entire benchmark group; however, we also accept submissions reporting performance on just one benchmark in the group. The benchmark group object is iterable, so you can loop over all of its benchmarks as follows:

from tdc.benchmark_group import admet_group
group = admet_group(path = 'data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    predictions = {}
    for benchmark in group:
        name = benchmark['name']
        train_val, test = benchmark['train_val'], benchmark['test']
        train, valid = group.get_train_valid_split(benchmark = name, split_type = 'default', seed = seed)
        ## --- train your model; store test-set predictions in y_pred_test --- ##
        predictions[name] = y_pred_test
    predictions_list.append(predictions)

group.evaluate_many(predictions_list)
# {'caco2_wang': [6.328, 0.101], 'hia_hou': [0.5, 0.01], ...}

The FAIR Guiding Principles

ML tools have become essential for research. TDC leaderboards keep track of ML tools across the entire range of therapeutics. To improve the findability, accessibility, interoperability, and reuse of ML tools, we apply FAIR4RS principles and implementation guidelines to all software and ML tools included in TDC leaderboards.

We strongly believe that software and ML tools should be open and adhere to FAIR principles to encourage repeatability, reproducibility, and reuse. TDC follows the FAIR guidelines for datasets as well as for ML tools and data functions.


Start Exploring Groups of Leaderboards in TDC