As in last year’s Challenge, teams must submit the code for both training their models and running their trained models. To help, we have implemented example entries in both MATLAB and Python. We encourage teams to use these examples as templates for their code.
physionetchallengeshelper user.
AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system will be unable to read your README file to change how we run your code.
train_model.m: Do not change this script. It calls your team_training_code.m script. We will not use the train_model.m script from your repository, so any change made to this code will not be included.
team_training_code.m: Update this script to create and save your model.
run_model.m: Do not change this script. It loads your model by calling load_model and runs your model by calling your team_testing_code function for each patient ID. We will not use the run_model.m script from your repository, so any change made to this code will not be included.
team_testing_code.m: Update this script to load and run your model weights and any parameters from files in your submission.
mcc -m train_model.m -a . and mcc -m run_model.m -a .), and run it on our machines or the cloud.
Dockerfile: Update to specify the version of Python that you are using on your machine. Add any additional packages that you need. Do not change the name or location of this file. The structure of this file is important, especially the 3 lines that are marked as “DO NOT EDIT”.
requirements.txt: Add Python packages to be installed with pip. Specify the versions of these packages that you are using on your machine. Remove unnecessary packages that your code does not need.
AUTHORS.txt, LICENSE.txt, README.md: Update as appropriate. Please include your authors. Unfortunately, our submission system will be unable to read your README file to change how we run your code.
team_code.py: Update this script to load and run your trained model(s).
train_model.py: Do not change this script. It calls functions from the team_code.py script to train your model on the training data.
helper_code.py: Do not change this script. It is a script with helper functions for our code. You are welcome to use them in your code. We will not use the helper_code.py script from your repository, so any change made to this code will not be included.
run_model.py: Do not change this script. It calls your functions from the team_code.py script to load and run your trained models to run on the test data. We will not use the run_model.py script from your repository, so any change made to this code will not be included.
Why containers?
Containers allow you to define the environment that you think is best suited for your algorithm. You can choose a specific version of a Linux distribution, install dependencies, and choose specific versions of your favorite libraries and frameworks. Here are two links with data science-centric introductions to Docker.
Quickly, how can I test my submission locally?
Install Docker. Clone your repository. Build an image. Run it on the data, or at least a small subset of the data.
Less quickly, how can I test my submission locally? Please give me commands that I can copy and paste.
To guarantee that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a subset of the training data.
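One way to build such a subset is sketched below in Python. It is only an illustration under stated assumptions: the function name copy_record_subset is hypothetical, the training data is assumed to sit in a single flat folder, and all files that share a base name (the file name without its extension) are assumed to belong to the same record. Adjust the grouping rule if your data is organized differently.

```python
import random
import shutil
from pathlib import Path


def copy_record_subset(source_folder, target_folder, num_records, seed=0):
    """Copy a random sample of records from one folder to another.

    Assumption (not from the Challenge code): files that share a base name,
    i.e. the file name without its extension, form one record.
    """
    source = Path(source_folder)
    target = Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)

    # Group the files in the source folder by base name.
    records = {}
    for path in sorted(source.iterdir()):
        if path.is_file():
            records.setdefault(path.stem, []).append(path)

    # Sample with a fixed seed so that repeated checks use the same subset.
    rng = random.Random(seed)
    chosen = rng.sample(sorted(records), min(num_records, len(records)))

    # Copy every file that belongs to a chosen record.
    for record in chosen:
        for path in records[record]:
            shutil.copy2(path, target / path.name)
    return sorted(chosen)
```

For example, copy_record_subset("training_data", "training_subset", 100) would copy the files of up to 100 records into a new training_subset folder that you can mount in place of the full training data.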
If you have trouble running your code, then please try the following steps to run the example code.
Create a folder example in your home directory with several subfolders.
user@computer:~$ cd ~/
user@computer:~$ mkdir example
user@computer:~$ cd example
user@computer:~/example$ mkdir training_data test_data model test_outputs
Download the training data from the Challenge website. Put some of the training data in training_data and test_data. You can use some of the training data to check your code (and should perform cross-validation on the training data to evaluate your algorithm).
Download or clone this repository in your terminal.
user@computer:~/example$ git clone https://github.com/physionetchallenges/python-example-2024.git
Build a Docker image and run the example code in your terminal.
user@computer:~/example$ ls
model python-example-2024 test_data test_outputs training_data
user@computer:~/example$ cd python-example-2024/
user@computer:~/example/python-example-2024$ docker build -t image .
Sending build context to Docker daemon [...]kB
[...]
Successfully tagged image:latest
user@computer:~/example/python-example-2024$ docker run -it -v ~/example/model:/challenge/model -v ~/example/test_data:/challenge/test_data -v ~/example/test_outputs:/challenge/test_outputs -v ~/example/training_data:/challenge/training_data image bash
root@[...]:/challenge# ls
Dockerfile README.md test_outputs
evaluate_model.py requirements.txt training_data
helper_code.py team_code.py train_model.py
LICENSE run_model.py
root@[...]:/challenge# python train_model.py -d training_data -m model
root@[...]:/challenge# python run_model.py -d test_data -m model -o test_outputs
root@[...]:/challenge# python evaluate_model.py -d test_data -o test_outputs
[...]
root@[...]:/challenge# exit
Exit
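The steps above recommend cross-validation on the training data. A minimal sketch of a k-fold split over record identifiers follows. It is not part of the example repository: the function name k_fold_splits and the record identifiers are hypothetical, and you would still need to place each fold's records in the folders that train_model.py and run_model.py read.

```python
import random


def k_fold_splits(record_ids, num_folds=5, seed=0):
    """Yield (training_ids, held_out_ids) pairs for k-fold cross-validation."""
    ids = sorted(record_ids)
    # Shuffle with a fixed seed so that the folds are reproducible.
    random.Random(seed).shuffle(ids)

    # Deal the shuffled records into num_folds folds of nearly equal size.
    folds = [ids[i::num_folds] for i in range(num_folds)]

    for i in range(num_folds):
        held_out = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, held_out


# Hypothetical record identifiers, used only to show the shape of the output.
for training, held_out in k_fold_splits(["r01", "r02", "r03", "r04"], num_folds=2):
    print(len(training), len(held_out))  # prints "2 2" for each of the two folds
```

Every record is held out exactly once, so averaging the evaluation scores over the folds uses all of the training data for both training and evaluation.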
What computational resources will my entry have?
We are using a g4dn.4xlarge instance on AWS or a comparable environment to run your code. It has 16 vCPUs, 64 GB RAM (60 GB available to your code), 300 GB of local storage (in addition to the data), and an optional NVIDIA T4 GPU.
For training your model on the training set, we impose a 24 hour time limit on a subset (~1000 records) of the training set and a 168 hour limit on the entirety (~22,000 records) of the training set. For running your trained model on the validation set (~1000 records), we impose a 24 hour time limit. For running your trained model on the test set, we impose a similar limit as the validation set, but with more time for more records, e.g., 48 hours if the test set is twice the size of the validation set.
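To turn these limits into a per-record budget, the short sketch below does the arithmetic. The linear scaling of the test-set limit is an assumption taken from the "48 hours if the test set is twice the size" example above, not a published rule, and both function names are hypothetical.

```python
def scaled_time_limit_hours(validation_limit_hours, num_validation_records, num_test_records):
    """Scale the validation time limit linearly with the number of test records.

    Assumption: the limit grows in proportion to the number of records, as in
    the 48-hour example in the text.
    """
    return validation_limit_hours * num_test_records / num_validation_records


def seconds_per_record(limit_hours, num_records):
    """Average time available per record if the whole limit is used."""
    return limit_hours * 3600 / num_records


print(scaled_time_limit_hours(24, 1000, 2000))  # 48.0
print(seconds_per_record(24, 1000))             # 86.4
```

The record counts in the text are approximate, so treat the results as rough budgets and leave a margin rather than planning to use the full limit.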
How do I install Docker?
Go to https://docs.docker.com/install/ and install the Docker Community Edition. For troubleshooting, see https://docs.docker.com/config/daemon/
Do I have to use your Dockerfile?
No. The only parts of the Dockerfile that we care about are the three lines marked as “DO NOT EDIT”. These three lines help ensure that, during the build process of the container, your code is copied into a folder called physionet so that our processing pipeline can find your code and run it. Please do not change those three lines. You are free to change your base image, and at times you should (see the next question).
What’s the base image in Docker?
Think of Docker as a series of images, or snapshots of a virtual machine, that are layered on top of each other. For example, your image may be built on top of a very lightweight Ubuntu operating system with Python 3.8.6 from the official Docker Hub (think of it as a GitHub for Docker). You can then install NumPy, SciPy, and other libraries on it. If you need the latest version of TensorFlow, then search for it on hub.docker.com and edit your Dockerfile so that its first line reads: FROM tensorflow. For a specific version, say 1.11, look up the tags and change the line to that version, such as FROM tensorflow:1.11.0. We recommend using specific versions for reproducibility.
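As a rough aid for that recommendation, the following sketch scans the text of a Dockerfile and reports base images that have no tag or use the latest tag. It is not part of the Challenge tooling; the function name unpinned_base_images is hypothetical, and the check is deliberately simple (for example, it does not recognize stage names in multi-stage builds).

```python
def unpinned_base_images(dockerfile_text):
    """Return base images from FROM lines that lack a specific version tag."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip options such as --platform=... that may precede the image name.
        names = [part for part in parts[1:] if not part.startswith("--")]
        if not names:
            continue
        image = names[0]
        if "@" in image:
            continue  # The image is pinned by digest.
        # Look only after the last slash so a registry port is not read as a tag.
        last_component = image.rsplit("/", 1)[-1]
        if ":" not in last_component or last_component.endswith(":latest"):
            unpinned.append(image)
    return unpinned


print(unpinned_base_images("FROM tensorflow\n"))         # ['tensorflow']
print(unpinned_base_images("FROM tensorflow:1.11.0\n"))  # []
```

To check your own file, read it with open("Dockerfile").read() and pass the text to the function.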
sklearn or scikit-learn?
For Python, if your entry uses scikit-learn, then you need to install it via pip using the package name scikit-learn instead of sklearn in your requirements.txt file.
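A quick way to catch this mistake before submitting is sketched below. The function name find_sklearn_lines is hypothetical, and the parsing is a simplification that only handles plain "name", "name==version", and similar lines, not every form that pip accepts.

```python
import re


def find_sklearn_lines(requirements_text):
    """Return the 1-based line numbers that request the package name sklearn."""
    flagged = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        requirement = line.split("#", 1)[0].strip()
        # Keep only the package name, before any version specifier or extras.
        name = re.split(r"[<>=!~\[; ]", requirement, maxsplit=1)[0]
        if name.lower() == "sklearn":
            flagged.append(number)
    return flagged


example = "numpy==1.26.4\nsklearn\nscikit-learn==1.4.2\n"
print(find_sklearn_lines(example))  # [2]
```

The version numbers in the example text are placeholders; use the versions that you have installed on your machine.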
Why can’t I install a common Python or R package using Python or R’s package manager?
Some packages have dependencies, such as GCC, that need to be installed first. Try python:3.8.9-buster, which includes more packages by default, or install the dependencies yourself. If the first line of your Dockerfile is FROM python:3.8.6-slim, then you are building a Docker image with the Debian Linux distribution, so you can install GCC and other related libraries that many Python and R packages use by adding the line RUN apt-get update && apt-get install -y build-essential to your Dockerfile before installing these packages.
How do I build my image?
git clone <<your repository URL that ends in .git>>
cd <<your repository name>>
ls
You should see a Dockerfile and other relevant files here.
docker build -t <<some image name that must be in lowercase letters>> .
docker images
docker run -it <<image name from above>> bash
This will take you into your container and you should see your code.
Please see Docker-specific FAQs for more information and description.
What can I do to make sure that my submission is successful?
You can avoid most submission errors with the following steps:
Do not change the train_model, run_model, or helper_code scripts. We will only use the versions of these scripts in the MATLAB and Python example repositories (https://github.com/physionetchallenges), so any changes that you make will not be used.
Why is my entry unsuccessful on your submission system? It works on my computer.
There are several common reasons for unexpected errors:
You may have changed the train_model, run_model, or helper_code script. For consistency across submissions from different participants, we will use the scripts available on https://github.com/physionetchallenges/.
Please see the George B. Moody PhysioNet Challenge 2024 webpage for more details. Please post questions and concerns on the Challenge discussion forum.
Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.
© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.