Similarly to last year’s Challenge, teams must submit both the code for their models and the code for training their models. To help, we have implemented example entries in both MATLAB and Python, and we encourage teams to use these example entries as templates for their entries.
Add physionetchallengeshelper as a collaborator to your repository. Provide the URL for your repository that ends in .git. On GitHub, you can get this URL by clicking on "Clone or download" and copying and pasting the URL, e.g., https://github.com/physionetchallenges/python-classifier-2021.git. Please see here for an example.

The MATLAB example code includes the following files:

AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system is unable to read your README file to change how we run your code.

train_model.m: Do not edit this script. It calls your team_training_code.m script. We will not use the train_model.m script from your repository, so any changes made to this code will not be included.

team_training_code.m: Update this script to create and save your model. It loads the header with the data and demographic information for a recording, extracts features from the data using the get_features.m function (which you can update and edit), and outputs and saves your model (weights and any needed parameters). You can edit this script and the get_features.m function as much as you need.

test_model.m: Do not change this script. It loads your models by calling the load_ECG_*leads_model functions (* = 2, 3, 6, or 12 for the four lead sets: 2-lead, 3-lead, 6-lead, and 12-lead models). It then calls your team_testing_code function for each recording and performs all file input and output. We will not use the test_model.m script from your repository, so any changes made to this code will not be included.

team_testing_code.m: Update this script to load and run your model weights and any parameters from files in your submission. It takes the input test data, header files, and the loaded models (outputs of your train_model.m) and returns a probability or confidence score and a binary classification for each class as output.

get_features.m: Update this script to extract your choice of features from the ECG recordings.

get_leads.m: Do not edit this script. It extracts the four lead sets (2-lead, 3-lead, 6-lead, and 12-lead) from the ECG recordings.

extract_data_from_header.m: Do not edit this script. It extracts the data information from the header files.

We compile your code for training (mcc -m train_model.m -a .) and running (mcc -m test_model.m -a .) your classifier, and run them on Google Cloud.

The Python example code includes the following files:

Dockerfile: Update to specify the version of Python that you are using on your machine. Add any additional packages that you need. Do not change the name or location of this file. The structure of this file is important, especially the three lines marked as "DO NOT EDIT".

requirements.txt: Add Python packages to be installed with pip. Specify the versions of these packages that you are using on your machine. Remove unnecessary packages, such as Matplotlib, that your classification code does not need.

AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system is unable to read your README file to change how we run your code.

team_code.py: Update this script to load and run your trained model.

train_model.py: Do not change this script. It calls functions from the team_code script to run your training code on the training data.

helper_code.py: Do not change this script. It contains helper variables and functions used by our code. You are welcome to use them in your code.

test_model.py: Do not change this script. It calls your trained models to run on the test data. We will not use the test_model.py script from your repository, so any changes made to this code will not be included.

Why containers?
Containers allow you to define the environment that you think is best suited for your algorithm. For example, if you think your algorithm needs a specific version of a Linux distribution or a certain version of a library or framework, then you can use the containers to specify the environment. Here are two links with data science-centric introductions to Docker: https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 https://link.medium.com/G87RxYuQIV
Quickly, how can I test my submission locally?
Install Docker. Clone your repository. Build an image. Run it on a single recording.
Less quickly, how can I test my submission locally? Please give me commands that I can copy and paste.
To guarantee that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a subset of the training data.
If you have trouble running your code, then please try the following steps to run the example code, which is known to work.
Create a folder example
in your home directory with several subfolders.
user@computer:~$ cd ~/
user@computer:~$ mkdir example
user@computer:~$ cd example
user@computer:~/example$ mkdir training_data test_data model test_outputs
Download the training data from the Challenge website. Put some of the training data in training_data
and test_data
. You can use some of the training data to check your code (and should perform cross-validation on the training data to evaluate your algorithm).
Download or clone this repository in your terminal.
user@computer:~/example$ git clone https://github.com/physionetchallenges/python-classifier-2021.git
Build a Docker image and run the example code in your terminal.
user@computer:~/example$ ls
model python-classifier-2021 test_data test_outputs training_data
user@computer:~/example$ ls training_data/
A0001.hea A0001.mat A0002.hea A0002.mat A0003.hea ...
user@computer:~/example$ cd python-classifier-2021/
user@computer:~/example/python-classifier-2021$ docker build -t image .
Sending build context to Docker daemon 30.21kB
[...]
Successfully tagged image:latest
user@computer:~/example/python-classifier-2021$ docker run -it -v ~/example/model:/physionet/model -v ~/example/test_data:/physionet/test_data -v ~/example/test_outputs:/physionet/test_outputs -v ~/example/training_data:/physionet/training_data image bash
root@[...]:/physionet# ls
Dockerfile model test_data train_model.py
extract_leads_wfdb.py README.md test_model.py
helper_code.py requirements.txt test_outputs
LICENSE team_code.py training_data
root@[...]:/physionet# python train_model.py training_data model
root@[...]:/physionet# python test_model.py model test_data test_outputs
root@[...]:/physionet# exit
user@computer:~/example/python-classifier-2021$ cd ..
user@computer:~/example$ ls test_outputs/
A0006.csv A0007.csv A0008.csv A0009.csv A0010.csv ...
What computational resources will my entry have?
We will run your training code on Google Cloud using 10 vCPUs, 65 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your training code has a 72 hour time limit without a GPU and a 48 hour time limit with a GPU.
We will run your trained model on Google Cloud using 6 vCPUs, 39 GB RAM, 100 GB disk space, and an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. Your trained model has a 24 hour time limit on each of the validation and test sets.
We are using an N1 custom machine type to run submissions on GCP. If you would like to use a predefined machine type, then the n1-highmem-8
is the closest predefined machine type, but with 2 fewer vCPUs and 13 GB less RAM. For GPU submissions, we use the 418.40.04 driver version.
How do I install Docker?
Go to https://docs.docker.com/install/ and install the Docker Community Edition. For troubleshooting, see https://docs.docker.com/config/daemon/
Do I have to use your Dockerfile?
No. The only part of the Dockerfile that we care about is the three lines marked as "DO NOT EDIT". These three lines help ensure that, during the build process of the container, your code is copied into a folder called physionet so that our cloud-based pipelines can find and run your code. Please do not change those three lines. You are free to change your base image, and at times you should (see the next question).
What’s the base image in Docker?
Think of Docker as a series of images, or snapshots of a virtual machine, that are layered on top of each other. For example, your image may be built on top of a very lightweight Ubuntu operating system with Python 3.8.6 that we get from the official Docker Hub (think of it as a GitHub for Docker). We can then install our requirements (NumPy and SciPy) on it. If you need the latest version of TensorFlow, then search for it on hub.docker.com and edit your Dockerfile so that its first line reads FROM tensorflow
. For a specific version, say 1.11, look up the tags and change it accordingly to FROM tensorflow:1.11.0
. We recommend using specific versions for reproducibility.
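One way to find the exact versions to pin, whether in your base image tag or in requirements.txt, is to read them from the environment where your code already works. A standard-library sketch (Python 3.8+; `pinned` is a hypothetical helper of ours):

```python
from importlib import metadata

def pinned(packages):
    """Return requirements.txt-style lines pinning the installed versions."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name} is not installed locally")
    return lines

# Example: print("\n".join(pinned(["numpy", "scipy"])))
```

Running this on your machine and pasting the output into requirements.txt ensures that the container uses the same versions that you tested with.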
sklearn or scikit-learn?
For Python, if your entry uses scikit-learn, then you need to install it via pip
using the package name scikit-learn
instead of sklearn
in your requirements.txt
file. See here.
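For example, a requirements.txt for an entry that uses scikit-learn might read as follows (the version numbers here are illustrative; pin the versions that you actually use on your machine):

```
numpy==1.20.2
scipy==1.6.3
scikit-learn==0.24.2
```

Your code can still `import sklearn`; only the name that pip installs differs.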
xgboost?
For Python, try python:3.8.9-buster
in the first line of your Dockerfile. This image includes additional packages, such as GCC, that xgboost needs. Also include xgboost in your requirements.txt file and specify the version of xgboost that you are using.
For R, add RUN R -e 'install.packages("xgboost")' to your Dockerfile.
Pandas?
For Python, try python:3.8.9-buster
in the first line of your Dockerfile if you experience errors.
GPUs?
We provide an optional NVIDIA T4 Tensor Core GPU with 16 GB VRAM. We use the NVIDIA 418.40.04
driver for the GPU. The latest supported version of CUDA is 10.1, and the latest supported version of PyTorch is therefore 1.7.1.
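Once inside your container, a quick way to confirm that your framework can actually see the GPU is a guarded import. A minimal sketch for PyTorch (assumes torch is listed in your requirements.txt; the helper name is ours, and the function returns None when PyTorch is not installed):

```python
def cuda_status():
    """Report whether PyTorch can see a CUDA device, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()
```

On a correctly configured GPU submission, `cuda_status()` should report True; False inside the container usually indicates a CUDA/driver version mismatch.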
Why can’t I install a common Python or R package using Python or R’s package manager?
Some packages have dependencies, such as GCC, that need to be installed. Try python:3.8.9-buster
, which includes more packages by default, or installing the dependencies. If the first line of your Dockerfile is FROM python:3.8.6-slim
, then you are building a Docker image with the Debian Linux distribution, so you can install GCC and other related libraries that many Python and R packages use by adding the line RUN apt-get update && apt-get install -y build-essential
to your Dockerfile before installing these packages.
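Putting these pieces together, a Dockerfile along these lines installs the build tools before the Python packages. This is a sketch only: keep the three "DO NOT EDIT" lines from the example Dockerfile, which are omitted here, and they are what copy your repository (including requirements.txt) into the image.

```dockerfile
FROM python:3.8.9-buster

## The three "DO NOT EDIT" lines from the example Dockerfile go here (omitted).

## Install compilers and headers that packages such as xgboost need,
## before pip tries to build them.
RUN apt-get update && apt-get install -y build-essential

## Install your pinned Python dependencies.
RUN pip install -r requirements.txt
```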
How do I build my image?
git clone <<your repository URL that ends in .git>>
cd <<your repository name>>
ls
You should see a Dockerfile and other relevant files here.
docker build -t <<some image name that must be in lowercase letters>> .
docker images
docker run -it <<image name from above>> bash
This will take you into your container and you should see your code.
Please see the Docker-specific FAQs for more information.
What can I do to make sure that my submission is successful?
You can avoid most submission errors by building a Docker image from your code and running it locally on the training data, as described above.
Why is my entry unsuccessful on your submission system? It works on my computer.
Unexpected errors usually come from differences between your local environment and our submission system; building and running your entry in Docker locally, as described above, reproduces most of them.
Please see the PhysioNet/CinC Challenge 2021 webpage for more details. Please post questions and concerns on the Challenge discussion forum.
Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.
© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.