Similarly to last year’s Challenge, teams must submit both the code for training their models and for running their trained models. To help, we have implemented example entries in both MATLAB and Python. We encourage teams to use these examples as templates for their code.
physionetchallengeshelper user.
AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system will be unable to read your README file to change how we run your code.
train_model.m: Do not change this script. It calls your team_training_code.m script. We will not use the train_model.m script from your repository, so any change made to this code will not be included.
team_training_code.m: Update this script to create and save your model.
run_model.m: Do not change this script. It loads your model by calling load_model and runs your model by calling your team_testing_code function for each patient ID. We will not use the run_model.m script from your repository, so any change made to this code will not be included.
team_testing_code.m: Update this script to load and run your model weights and any parameters from files in your submission.
mcc -m train_model.m -a . and mcc -m run_model.m -a .), and run it on our machines or Google Cloud.
Dockerfile: Update to specify the version of Python that you are using on your machine. Add any additional packages that you need. Do not change the name or location of this file. The structure of this file is important, especially the three lines that are marked as “DO NOT EDIT”.
requirements.txt: Add Python packages to be installed with pip. Specify the versions of these packages that you are using on your machine. Remove unnecessary packages, such as Matplotlib, that your code does not need.
AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system will be unable to read your README file to change how we run your code.
team_code.py: Update this script to load and run your trained model.
train_model.py: Do not change this script. It calls functions from the team_code script to train your model on the training data.
helper_code.py: Do not change this script. It is a script with helper variables and functions used for our code. You are welcome to use them in your code. We will not use the helper_code.py script from your repository, so any change made to this code will not be included.
run_model.py: Do not change this script. It calls your functions from the team_code script to load your trained models and run them on the test data. We will not use the run_model.py script from your repository, so any change made to this code will not be included.
Why containers?
Containers allow you to define the environment that you think is best suited for your algorithm. You can choose a specific version of a Linux distribution, install dependencies, and choose specific versions of your favorite libraries and frameworks.
Quickly, how can I test my submission locally?
Install Docker. Clone your repository. Build an image. Run it on the data, or at least a small subset of the data.
Less quickly, how can I test my submission locally? Please give me commands that I can copy and paste.
To guarantee that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a subset of the training data.
If you have trouble running your code, then please try the following steps to run the example code.
Create a folder example in your home directory with several subfolders.
user@computer:~$ cd ~/
user@computer:~$ mkdir example
user@computer:~$ cd example
user@computer:~/example$ mkdir training_data test_data model test_outputs
Download the training data from the Challenge website. Put some of the training data in training_data and test_data. You can use some of the training data to check your code (and should perform cross-validation on the training data to evaluate your algorithm).
Download or clone this repository in your terminal.
user@computer:~/example$ git clone https://github.com/physionetchallenges/python-example-2023.git
Build a Docker image and run the example code in your terminal.
user@computer:~/example$ ls
model python-example-2023 test_data test_outputs training_data
user@computer:~/example$ cd python-example-2023/
user@computer:~/example/python-example-2023$ docker build -t image .
Sending build context to Docker daemon [...]kB
[...]
Successfully tagged image:latest
user@computer:~/example/python-example-2023$ docker run -it -v ~/example/model:/physionet/model -v ~/example/test_data:/physionet/test_data -v ~/example/test_outputs:/physionet/test_outputs -v ~/example/training_data:/physionet/training_data image bash
root@[...]:/physionet# ls
Dockerfile README.md test_outputs
evaluate_model.py requirements.txt training_data
helper_code.py team_code.py train_model.py
LICENSE run_model.py
root@[...]:/physionet# python train_model.py training_data model
root@[...]:/physionet# python run_model.py model test_data test_outputs
root@[...]:/physionet# python evaluate_model.py test_data test_outputs
[...]
root@[...]:/physionet# exit
Exit
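The step above that puts some of the training data in training_data and test_data can be sketched in Python. This is a minimal sketch, not part of the example repository: the function name copy_subset and its parameters are hypothetical, and it assumes that the downloaded data contains one subfolder per patient.

```python
import random
import shutil
from pathlib import Path


def copy_subset(source, training_dir, test_dir, n_train=10, n_test=5, seed=0):
    """Copy disjoint random subsets of patient folders into two folders.

    Assumption: `source` contains one subfolder per patient.
    Returns the names of the folders copied to each destination.
    """
    source = Path(source)
    patients = sorted(p for p in source.iterdir() if p.is_dir())
    if n_train + n_test > len(patients):
        raise ValueError("Not enough patient folders for the requested subset.")

    # Shuffle with a fixed seed so that the subset is reproducible.
    random.Random(seed).shuffle(patients)
    chosen_train = patients[:n_train]
    chosen_test = patients[n_train:n_train + n_test]

    for destination, chosen in ((training_dir, chosen_train), (test_dir, chosen_test)):
        destination = Path(destination)
        destination.mkdir(parents=True, exist_ok=True)
        for patient in chosen:
            # dirs_exist_ok requires Python 3.8 or later.
            shutil.copytree(patient, destination / patient.name, dirs_exist_ok=True)

    return [p.name for p in chosen_train], [p.name for p in chosen_test]


# Example call (the source path is a placeholder for wherever you saved the data):
# example = Path.home() / "example"
# copy_subset("/path/to/downloaded/training", example / "training_data", example / "test_data")
```

Keeping the two subsets disjoint means that the scores from evaluate_model.py are not computed on patients that the model was trained on.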
How are the validation and test sets different from the training set?
We do not include labels with the validation and test sets, so your code should not try to load them.
The training set has up to 72 hours of recording data, but we will run your trained model on the validation and test sets with only 12, 24, 48, and 72 hours of recording data, so your code should be able to run on them.
You can use these scripts from the Python example code to generate data without labels and only 12, 24, 48, or 72 hours of data from the publicly available training set:
remove_data.py: Remove the binary signal data, i.e., the EEG recordings. Usage: run python remove_data.py -i input_folder -o output_folder to copy the labels and metadata from input_folder to output_folder.
remove_labels.py: Remove the labels. Usage: run python remove_labels.py -i input_folder -o output_folder to copy the data and metadata from input_folder to output_folder.
truncate_data.py: Truncate the recordings. Usage: run python truncate_data.py -i input_folder -o output_folder -t 12 to truncate the recordings to 12 hours. We will run your trained models on data with 12, 24, 48, and 72 hours of data.
What computational resources will my entry have?
We are using a g4dn.4xlarge instance on AWS to run your code. It has 16 vCPUs, 64 GB RAM (60 GB available to your code), 300 GB of local storage (in addition to the data), and an optional NVIDIA T4 GPU.
For training your model on the training data, we impose a 48 hour time limit for submissions that request a GPU and a 72 hour time limit for submissions that do not request a GPU. For running your trained model on the validation or test data, we impose a 24 hour time limit whether or not a submission requests a GPU.
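To check locally whether your code is likely to finish within these limits, you could wrap a command in a timeout. The following is a minimal sketch: the names time_limit_hours and run_with_limit are hypothetical and are not part of the example code, and your machine will differ from the submission system, so a local timing is only a rough guide.

```python
import subprocess


def time_limit_hours(stage, uses_gpu):
    """Return the time limit in hours stated above for a stage.

    stage is "train" (training on the training data) or
    "run" (running the trained model on the validation or test data).
    """
    if stage == "train":
        return 48 if uses_gpu else 72
    if stage == "run":
        return 24
    raise ValueError(f"Unknown stage: {stage!r}")


def run_with_limit(command, stage, uses_gpu):
    """Run a command and raise subprocess.TimeoutExpired if it exceeds the limit."""
    limit_seconds = time_limit_hours(stage, uses_gpu) * 3600
    return subprocess.run(command, timeout=limit_seconds, check=True)


# Example call, using the training command from the example code:
# run_with_limit(["python", "train_model.py", "training_data", "model"], "train", uses_gpu=False)
```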
How do I install Docker?
Go to https://docs.docker.com/install/ and install the Docker Community Edition. For troubleshooting, see https://docs.docker.com/config/daemon/
Do I have to use your Dockerfile?
No. The only parts of the Dockerfile that we care about are the three lines marked as “DO NOT EDIT”. These three lines help ensure that, during the build process of the container, your code is copied into a folder called physionet so that our processing pipeline can find your code and run it. Please do not change those three lines. You are free to change your base image, and at times you should (see the next question).
What’s the base image in Docker?
Think of Docker as a series of images, or snapshots of a virtual machine, that are layered on top of each other. For example, your image may be built on top of a very lightweight Ubuntu operating system with Python 3.8.6 from the official Docker Hub (think of it as a GitHub for Docker). You can then install NumPy, SciPy, and other libraries on it. If you need the latest version of TensorFlow, then search for it on hub.docker.com and edit your Dockerfile so that its first line reads: FROM tensorflow. For a specific version, say 1.11, look up the tags and change the line accordingly to FROM tensorflow:1.11.0. We recommend using specific versions for reproducibility.
sklearn or scikit-learn?
For Python, if your entry uses scikit-learn, then you need to install it via pip using the package name scikit-learn instead of sklearn in your requirements.txt file.
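A quick way to catch the sklearn package name before submitting is to scan your requirements.txt for it. The following is a minimal sketch; the function name find_sklearn_entries is hypothetical, and the parsing only handles simple lines such as name==version, not every form that pip accepts.

```python
import re


def find_sklearn_entries(requirements_text):
    """Return (line_number, line) pairs whose package name is 'sklearn'."""
    hits = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        # Ignore comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        # The package name is everything before a version specifier,
        # an extras marker, an environment marker, or a space.
        name = re.split(r"[<>=!~\[; ]", stripped, maxsplit=1)[0]
        if name.lower() == "sklearn":
            hits.append((number, line))
    return hits


# Example call:
# with open("requirements.txt") as f:
#     for number, line in find_sklearn_entries(f.read()):
#         print(f"Line {number}: replace 'sklearn' with 'scikit-learn': {line}")
```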
Why can’t I install a common Python or R package using Python or R’s package manager?
Some packages have dependencies, such as GCC, that need to be installed. Try python:3.8.9-buster, which includes more packages by default, or install the dependencies yourself. If the first line of your Dockerfile is FROM python:3.8.6-slim, then you are building a Docker image with the Debian Linux distribution, so you can install GCC and other related libraries that many Python and R packages use by adding the line RUN apt-get update && apt-get install -y build-essential to your Dockerfile before installing these packages. The -y flag matters because the build cannot answer an interactive confirmation prompt.
How do I build my image?
git clone <<your repository URL that ends in .git>>
cd <<your repository name>>
ls
You should see a Dockerfile and other relevant files here.
docker build -t <<some image name that must be in lowercase letters>> .
docker images
docker run -it <<image name from above>> bash
This will take you into your container and you should see your code.
Please see Docker-specific FAQs for more information and description.
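If you prefer to script the build and run steps above, the commands can be assembled in Python and passed to subprocess. The following is a minimal sketch: the function name docker_commands and the mounts parameter are hypothetical, and it only checks the lowercase requirement mentioned above rather than every rule that Docker applies to image names.

```python
def docker_commands(image_name, mounts=None):
    """Build the argument lists for `docker build` and `docker run`.

    mounts is an optional dict that maps host folders to container folders;
    each pair becomes a `-v host:container` option.
    """
    if image_name != image_name.lower():
        raise ValueError("The image name must be in lowercase letters.")

    build = ["docker", "build", "-t", image_name, "."]

    run = ["docker", "run", "-it"]
    for host_folder, container_folder in (mounts or {}).items():
        run += ["-v", f"{host_folder}:{container_folder}"]
    run += [image_name, "bash"]

    return build, run


# Example call (run from inside your cloned repository):
# import subprocess
# build, run = docker_commands("image", {"/home/user/example/model": "/physionet/model"})
# subprocess.run(build, check=True)
# subprocess.run(run, check=True)
```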
What can I do to make sure that my submission is successful?
You can avoid most submission errors with the following steps:
Do not change the train_model, run_model, or helper_code scripts. We will only use the versions of these scripts in the MATLAB and Python example repositories (https://github.com/physionetchallenges), so any changes that you make will not be used.
Why is my entry unsuccessful on your submission system? It works on my computer.
There are several common reasons for unexpected errors:
You changed the train_model, run_model, or helper_code script. For consistency across submissions from different participants, we will use the scripts available on https://github.com/physionetchallenges/.
Please see the George B. Moody PhysioNet Challenge 2023 webpage for more details. Please post questions and concerns on the Challenge discussion forum.
Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.
© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.