Construction of the PyTorch Lightning module and the hyperparameter search for the SemEval-2019 Task 3 dataset (contextual emotion detection in text)

Lightning Module

Defining the Lightning module is now straightforward; see also the documentation. The default hyperparameter choices were motivated by this paper.

Further references for PyTorch Lightning and its use for multi-GPU training and hyperparameter search can be found in blog posts by William Falcon.

class EmotionModel[source]

EmotionModel(hparams) :: LightningModule

PyTorch Lightning module for the Contextual Emotion Detection in Text Challenge
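
For orientation, here is a minimal, hedged sketch of what such a LightningModule could look like. It is not the actual implementation: the backbone, the classification head, and the metric handling are illustrative assumptions, and features suggested by the hyperparameters below (frozen epochs, layerwise learning-rate decay) are omitted.

import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from transformers import AutoModel  # assumed backbone; the real model may use a different one

class EmotionModelSketch(pl.LightningModule):
    """Illustrative skeleton only -- see emotion_transformer.lightning for the real EmotionModel."""

    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.backbone = AutoModel.from_pretrained('distilbert-base-uncased')  # assumption
        self.head = torch.nn.Sequential(
            torch.nn.Dropout(hparams.dropout),
            torch.nn.Linear(self.backbone.config.hidden_size, hparams.projection_size),
            torch.nn.ReLU(),
            torch.nn.Linear(hparams.projection_size, 4),  # 4 classes: happy, sad, angry, others
        )

    def forward(self, input_ids, attention_mask):
        hidden_states = self.backbone(input_ids, attention_mask=attention_mask)[0]
        return self.head(hidden_states[:, 0])  # classify from the first token's representation

    def training_step(self, batch, batch_nb):
        input_ids, attention_mask, labels = batch
        return {'loss': F.cross_entropy(self(input_ids, attention_mask), labels)}

    def validation_step(self, batch, batch_nb):
        input_ids, attention_mask, labels = batch
        logits = self(input_ids, attention_mask)
        return {'val_loss': F.cross_entropy(logits, labels),
                'val_acc': (logits.argmax(dim=-1) == labels).float().mean()}

    def validation_end(self, outputs):
        avg = {key: torch.stack([out[key] for out in outputs]).mean() for key in outputs[0]}
        return {'val_loss': avg['val_loss'], 'progress_bar': avg, 'log': avg}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)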

Hyperparameter Search Argument Parser

Next we define the HyperOptArgumentParser, which includes options for distributed training (see also the documentation) and for debugging.

get_args[source]

get_args(model)

returns the HyperOptArgumentParser
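
As a rough illustration, such a parser could be set up along the following lines; the concrete options, defaults, and search ranges shown here are assumptions and cover only a subset of what get_args actually registers (compare the hparams printout below).

from test_tube import HyperOptArgumentParser

def get_args_sketch(model):
    # illustrative sketch: the real get_args also adds data paths, distributed-training
    # and debugging options, and may use the model argument for model-specific flags
    parser = HyperOptArgumentParser(strategy='random_search')
    parser.add_argument('--mode', default='default', type=str)
    parser.add_argument('--gpus', default=None, type=str)
    parser.add_argument('--epochs', default=10, type=int)
    # opt_list marks a hyperparameter as tunable and defines the values to sample from
    parser.opt_list('--bs', default=64, type=int, options=[16, 32, 64], tunable=True)
    parser.opt_list('--lr', default=2e-5, type=float, options=[1e-5, 2e-5, 5e-5], tunable=True)
    parser.opt_list('--dropout', default=0.1, type=float, options=[0.1, 0.2, 0.3], tunable=True)
    return parser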

Let us take a look at the different attributes of hparams.

hparams = get_args(EmotionModel)
hparams = hparams.parse_args(args=[])
vars(hparams)
{'mode': 'default',
 'save_path': '/home/julius/Documents/nbdev_venv/emotion_transformer/logs',
 'gpus': None,
 'distributed_backend': None,
 'use_16bit': False,
 'fast_dev_run': False,
 'track_grad_norm': False,
 'bs': 64,
 'projection_size': 256,
 'n_layers': 1,
 'frozen_epochs': 2,
 'lr': 2e-05,
 'layerwise_decay': 0.95,
 'max_seq_len': 32,
 'dropout': 0.1,
 'train_file': '/home/julius/Documents/nbdev_venv/emotion_transformer/data/clean_train.txt',
 'val_file': '/home/julius/Documents/nbdev_venv/emotion_transformer/data/clean_val.txt',
 'test_file': '/home/julius/Documents/nbdev_venv/emotion_transformer/data/clean_test.txt',
 'epochs': 10,
 'seed': None,
 'hpc_exp_number': None,
 'trials': <bound method HyperOptArgumentParser.opt_trials of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'optimize_parallel': <bound method HyperOptArgumentParser.optimize_parallel of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'optimize_parallel_gpu': <bound method HyperOptArgumentParser.optimize_parallel_gpu of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'optimize_parallel_cpu': <bound method HyperOptArgumentParser.optimize_parallel_cpu of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'generate_trials': <bound method HyperOptArgumentParser.generate_trials of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'optimize_trials_parallel_gpu': <bound method HyperOptArgumentParser.optimize_trials_parallel_gpu of HyperOptArgumentParser(prog='ipykernel_launcher.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>}

Trainer

Next we define a function that calls the Lightning Trainer using the settings specified in hparams.

main[source]

main(hparams, gpus=None)

Trains the Lightning model as specified in hparams
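
Schematically, such a function could look roughly like the sketch below; the Trainer arguments shown are a hedged subset, and the real main may additionally configure logging, checkpointing, early stopping, and 16-bit training.

from pytorch_lightning import Trainer

def main_sketch(hparams, gpus=None):
    # illustrative sketch, not the actual main(); the gpus argument allows
    # optimize_parallel_gpu to pass a set of GPU ids to each trial
    model = EmotionModel(hparams)  # EmotionModel as defined above
    trainer = Trainer(
        gpus=gpus if gpus is not None else hparams.gpus,
        distributed_backend=hparams.distributed_backend,
        fast_dev_run=hparams.fast_dev_run,
        max_epochs=hparams.epochs,  # named max_nb_epochs in older Lightning releases
    )
    trainer.fit(model)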

Let us check the model by running a quick development run.

hparams.fast_dev_run = True
main(hparams)
Epoch 1:  50%|█████     | 1/2 [00:04<00:04,  4.33s/batch, batch_nb=0, loss=2.083, v_nb=11]
Validating:   0%|          | 0/1 [00:00<?, ?batch/s]
Epoch 1: 100%|██████████| 2/2 [00:08<00:00,  4.38s/batch, batch_nb=0, f1_score=nan, fn=9, fp=0, loss=2.083, precision=nan, recall=0, tp=0, v_nb=11, val_acc=0.859, val_loss=2.01]
Epoch 1: 100%|██████████| 2/2 [00:09<00:00,  4.68s/batch, batch_nb=0, f1_score=nan, fn=9, fp=0, loss=2.083, precision=nan, recall=0, tp=0, v_nb=11, val_acc=0.859, val_loss=2.01]

We also create a Python file for automatic hyperparameter optimization across different GPUs or CPUs:

%%writefile main.py

from emotion_transformer.lightning import EmotionModel, get_args, main

if __name__ == '__main__':
    hparams = get_args(EmotionModel)
    hparams = hparams.parse_args()

    if hparams.mode in ['test','default']:
        main(hparams)
    elif hparams.mode == 'hparams_search':
        if hparams.gpus:
            hparams.optimize_parallel_gpu(main, max_nb_trials=20,
                                          gpu_ids=hparams.gpus.split(' '))
        else:
            hparams.optimize_parallel_cpu(main, nb_trials=20, nb_workers=4)
Overwriting main.py

Background Information

For the interested reader we provide some background information on the (distributed) training loop:

  • one epoch consists of m = ceil(30160/batchsize) batches for the training and an additional n = ceil(2755/batchsize) batches for the validation (a small worked example follows below)
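
For instance, with the default batch size of 64 this amounts to:

from math import ceil

bs = 64               # default batch size from hparams
m = ceil(30160 / bs)  # 472 training batches per epoch
n = ceil(2755 / bs)   # 44 validation batches per epoch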

dp case:

  • the batch size will be split and each GPU receives (up to rounding) a batch of size batchsize/num_gpus

  • in the validation steps each GPU computes its own scores for each of the n batches (of size batchsize/num_gpus), i.e. each GPU calls the validation_step method

  • the output which is passed to the validation_end method consists of a list of dictionaries (containing the concatenated scores from the different GPUs); a sketch of how such a nested output can be aggregated is given after the following example:

output = [ {first_metric: [first_gpu_batch_1,...,last_gpu_batch_1],..., last_metric: [first_gpu_batch_1,...,last_gpu_batch_1]},..., {first_metric: [first_gpu_batch_n,...,last_gpu_batch_n],..., last_metric: [first_gpu_batch_n,...,last_gpu_batch_n]} ]
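
A hedged sketch of how a validation_end method could reduce an output of this shape to scalar metrics (assuming the individual entries are scalar tensors and that a val_loss entry is among the metrics, as in the development run above):

import torch

def validation_end_sketch(outputs):
    # illustrative aggregation only, not the method used by EmotionModel:
    # average every metric over all validation batches and, in the dp case,
    # over the per-GPU entries contained in each batch dictionary
    metrics = {}
    for key in outputs[0]:
        values = []
        for batch in outputs:
            value = batch[key]
            if isinstance(value, (list, tuple)):  # dp: one entry per GPU
                value = torch.stack([torch.as_tensor(v, dtype=torch.float) for v in value])
            values.append(torch.as_tensor(value, dtype=torch.float).mean())
        metrics[key] = torch.stack(values).mean()
    return {'val_loss': metrics['val_loss'], 'progress_bar': metrics, 'log': metrics}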

ddp case: (does not work from Jupyter notebooks)

  • the GPUs receive (disjoint) samples of size batchsize and train in their own processes but communicate and average their gradients (thus the resulting models on each GPU have the same weights)

  • each GPU calls its own validation_end method on its own list of dictionaries:

output_first_gpu = [ {first_metric: batch_1,...,last_metric: batch_1},..., {first_metric: batch_n,...,last_metric: batch_n} ]

output_last_gpu = [ {first_metric: batch_1,...,last_metric: batch_1},..., {first_metric: batch_n,...,last_metric: batch_n} ]

ddp2 case: (does not work from Jupyter notebooks)

  • on each node we have the dp case, but the nodes communicate analogously to the ddp case