Lightning Module¶
Defining the Lightning module is now straightforward (see also the documentation). The default hyperparameter choices were motivated by this paper.
Further references for PyTorch Lightning, including its use for multi-GPU training and hyperparameter search, can be found in the blog posts by William Falcon.
Hyperparameter Search Argument Parser¶
Next we define the HyperOptArgumentParser, including options for distributed training (see also the documentation) and debugging.
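The actual parser is implemented in emotion_transformer.lightning; the following is only a minimal sketch (an assumption about its contents) of how such a get_args function could combine plain arguments with tunable ones via test-tube's HyperOptArgumentParser. The flags --mode, --gpus and --fast_dev_run mirror the hparams attributes used later in this notebook, while the learning-rate option is purely illustrative.
from test_tube import HyperOptArgumentParser

def get_args(model_class):
    # minimal sketch, not the actual get_args from emotion_transformer.lightning
    parser = HyperOptArgumentParser(strategy='random_search')
    # plain options (names mirror the hparams attributes used in this notebook)
    parser.add_argument('--mode', default='default', type=str,
                        help="one of 'default', 'test' or 'hparams_search'")
    parser.add_argument('--gpus', default=None, type=str,
                        help="space-separated GPU ids, e.g. '0 1'")
    parser.add_argument('--fast_dev_run', action='store_true',
                        help='run only a single batch for debugging')
    # an illustrative tunable hyperparameter for the search
    parser.opt_list('--lr', default=2e-5, type=float,
                    options=[1e-5, 2e-5, 5e-5], tunable=True)
    # model_class could additionally register model-specific arguments here
    return parser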
Let us take a look at the different attributes of hparams.
hparams = get_args(EmotionModel)       # build the HyperOptArgumentParser
hparams = hparams.parse_args(args=[])  # parse with defaults (no CLI arguments inside the notebook)
vars(hparams)                          # show all hyperparameters as a dictionary
Trainer¶
Next we define a function calling the Lightning trainer using the settings specified in hparams.
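The implementation of main also lives in emotion_transformer.lightning; the following is only a rough sketch of such a function, assuming an older pytorch-lightning Trainer API and that hparams carries the gpus and fast_dev_run attributes from the parser above.
import pytorch_lightning as pl
from emotion_transformer.lightning import EmotionModel

def main(hparams):
    # rough sketch, not the actual main from emotion_transformer.lightning
    model = EmotionModel(hparams)
    # hparams.gpus is assumed to be a space-separated string such as '0 1'
    gpus = [int(i) for i in hparams.gpus.split(' ')] if hparams.gpus else None
    trainer = pl.Trainer(gpus=gpus,
                         distributed_backend='dp' if gpus and len(gpus) > 1 else None,
                         fast_dev_run=hparams.fast_dev_run)
    trainer.fit(model)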
Let us check the model by running a quick development run.
hparams.fast_dev_run = True  # run only a single batch through training and validation
main(hparams)
We also create a Python file for automatic hyperparameter optimization across multiple GPUs or CPUs:
%%writefile main.py
from emotion_transformer.lightning import EmotionModel, get_args, main

if __name__ == '__main__':
    hparams = get_args(EmotionModel)
    hparams = hparams.parse_args()

    if hparams.mode in ['test', 'default']:
        main(hparams)
    elif hparams.mode == 'hparams_search':
        if hparams.gpus:
            hparams.optimize_parallel_gpu(main, max_nb_trials=20,
                                          gpu_ids=hparams.gpus.split(' '))
        else:
            hparams.optimize_parallel_cpu(main, nb_trials=20, nb_workers=4)
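Assuming get_args exposes the mode and gpus attributes as command-line flags of the same name, the script can then be launched from a terminal, e.g. as python main.py --mode hparams_search, together with a --gpus argument for GPU training.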
Background Information¶
For the interested reader we provide some background information on the (distributed) training loop:
- one epoch consists of m = ceil(30160/batchsize) batches for training and an additional n = ceil(2755/batchsize) batches for validation.
- dp case:
  - the batch of size batchsize is split and each GPU receives (up to rounding) a batch of size batchsize/num_gpus
  - in the validation steps each GPU computes its own scores for each of the n batches (of size batchsize/num_gpus), i.e. each GPU calls the validation_step method
  - the output which is passed to the validation_end method consists of a list of dictionaries containing the concatenated scores from the different GPUs (see also the aggregation sketch after this list), i.e.

    output = [ {first_metric: [first_gpu_batch_1, ..., last_gpu_batch_1], ...,
                last_metric: [first_gpu_batch_1, ..., last_gpu_batch_1]}, ...,
               {first_metric: [first_gpu_batch_n, ..., last_gpu_batch_n], ...,
                last_metric: [first_gpu_batch_n, ..., last_gpu_batch_n]} ]
- ddp case (does not work from Jupyter notebooks):
  - each GPU receives its own (disjoint) samples of size batchsize and trains in its own process, but the processes communicate and average their gradients (thus the resulting models on each GPU have the same weights)
  - each GPU calls its own validation_end method with its own list of dictionaries, i.e.

    output_first_gpu = [ {first_metric: batch_1, ..., last_metric: batch_1}, ...,
                         {first_metric: batch_n, ..., last_metric: batch_n} ]

    output_last_gpu = [ {first_metric: batch_1, ..., last_metric: batch_1}, ...,
                        {first_metric: batch_n, ..., last_metric: batch_n} ]
- ddp2 case (does not work from Jupyter notebooks):
  - on each node we have the dp case, but the nodes communicate analogously to the ddp case
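To make the dp output structure above more concrete, the following hypothetical helper (not the validation_end method of EmotionModel) shows one way to reduce such a list of dictionaries to a single value per metric, first averaging over the per-GPU entries of each batch and then over the n validation batches.
import torch

def aggregate_dp_output(outputs):
    # hypothetical illustration, not the actual validation_end of EmotionModel;
    # outputs: one dict per validation batch, where in the dp case each value
    # holds the scores of all GPUs for that batch
    return {metric: torch.stack([torch.as_tensor(batch[metric]).float().mean()
                                 for batch in outputs]).mean()
            for metric in outputs[0]}
Applied to the output list sketched above, aggregate_dp_output(output) would return one averaged score per metric.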