For the first time in a public competition, teams must submit both the code for their models and the code for training their models. To help, we have shared simple baseline models in Python and MATLAB, and we encourage teams to use our Python and MATLAB code as templates for their entries. To add the code for training your model to your entry, please edit the train_12ECG_classifier script, and to add the code for running your model to your entry, please edit the run_12ECG_classifier script. Please see the following sections for more detailed, language-specific instructions.
Add physionetchallengeshelper as a collaborator to your repository, and submit the URL of your repository, which ends in .git. On GitHub, you can get this URL by clicking on “Clone or download” and copying and pasting the URL, e.g., https://github.com/physionetchallenges/python-classifier-2020.git. Please see here for an example.
For MATLAB entries, we use the MATLAB Compiler to create executables for training (mcc -m train_model.m -a .) and running (mcc -m driver.m -a .) your classifier, and run them on Google Cloud.
Why containers?
Containers allow you to define the environment that you think is best suited for your algorithm. For example, if you think your algorithm needs a specific version of CentOS, a certain version of a library, and specific frameworks, then you can use the containers to specify this. Here are two links with good, data science-centric introductions to Docker: https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 https://link.medium.com/G87RxYuQIV
Quickly, how can I test my submission locally?
Install Docker. Clone your repository. Build an image. Run it on a single recording.
Less quickly, how can I test my submission locally? Please give me commands that I can copy and paste.
Here are instructions for testing the Python example code on Linux. You can test the non-Python example code, or test on another operating system such as macOS, in a similar way. If you have trouble testing your code, then make sure that you can test the example code, which is known to work.
First, create a folder, docker_test, in your home directory. Then, put the example code from GitHub in docker_test/python-classifier-2020-master, some of the training data in docker_test/input_directory and docker_test/input_training_directory, an empty folder for the output of the training code in docker_test/output_training_directory, and an empty folder for the classifications in docker_test/output_directory.
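If you set up this folder layout often, it can be scripted. The sketch below creates docker_test and its subfolders with Python's pathlib; the function name make_docker_test_layout is hypothetical, and you still need to download the example code and copy some training data into the input folders yourself.

```python
from pathlib import Path

# Folder names taken from the walkthrough; the example code itself
# (python-classifier-2020-master) still has to be downloaded from GitHub,
# and the input folders still need to be filled with training data.
SUBFOLDERS = [
    "python-classifier-2020-master",
    "input_training_directory",
    "input_directory",
    "output_training_directory",
    "output_directory",
]


def make_docker_test_layout(root: Path) -> Path:
    """Create root/docker_test and its (initially empty) subfolders."""
    base = root / "docker_test"
    for name in SUBFOLDERS:
        # parents=True also creates docker_test; exist_ok=True makes re-runs safe.
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```

For the layout used in the commands that follow, call make_docker_test_layout(Path.home()).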
Finally, build a Docker image and run the example code using the following steps:
user@computer:~/docker_test$ ls
input_directory output_directory python-classifier-2020-master
user@computer:~/docker_test$ ls input_directory/
A0001.hea A0001.mat A0002.hea A0002.mat A0003.hea ...
user@computer:~/docker_test$ cd python-classifier-2020-master/
user@computer:~/docker_test/python-classifier-2020-master$ docker build -t image .
Sending build context to Docker daemon 30.21kB
[...]
Successfully tagged image:latest
user@computer:~/docker_test/python-classifier-2020-master$ docker run -it \
    -v ~/docker_test/input_training_directory:/physionet/input_training_directory \
    -v ~/docker_test/output_training_directory:/physionet/output_training_directory \
    -v ~/docker_test/input_directory:/physionet/input_directory \
    -v ~/docker_test/output_directory:/physionet/output_directory \
    image bash
root@[...]:/physionet# ls
AUTHORS.txt Dockerfile LICENSE.txt README.md driver.py run_12ECG_classifier.py get_12ECG_features.py input_directory output_directory requirements.txt
root@[...]:/physionet# python train_model.py input_training_directory/ output_training_directory/
root@[...]:/physionet# python driver.py output_training_directory/ input_directory/ output_directory/
root@[...]:/physionet# exit
exit
user@computer:~/docker_test/python-classifier-2020-master$ cd ..
user@computer:~/docker_test$ ls output_directory/
A0001.csv A0002.csv A0003.csv A0004.csv A0005.csv
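To confirm that every recording was classified, you can compare the input and output folders. This is a minimal sketch that assumes, as in the transcript above, that recording A0001 is stored as A0001.hea/A0001.mat and that its classification is written to A0001.csv; the function name missing_outputs is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_outputs(input_dir: Path, output_dir: Path) -> List[str]:
    """Return recordings that have a header file in input_dir but no CSV in output_dir."""
    # Each recording is identified by its header file name without the extension.
    recordings = sorted(p.stem for p in input_dir.glob("*.hea"))
    return [name for name in recordings if not (output_dir / f"{name}.csv").is_file()]
```

For example, missing_outputs(Path.home() / "docker_test/input_directory", Path.home() / "docker_test/output_directory") returns an empty list when every recording has an output file.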
How do I install Docker?
Go to https://docs.docker.com/install/ and install the Docker Community Edition. For troubleshooting, see https://docs.docker.com/config/daemon/
Do I have to use your Dockerfile?
No. The only parts of the Dockerfile that we care about are the three lines marked "DO NOT EDIT". These three lines help ensure that, during the build process of the container, your code is copied into a folder called physionet so that our cloud-based pipelines can find your code and run it. Please do not change those three lines. You are free to change your base image, and at times you should (see next question).
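Before submitting an edited Dockerfile, you could check that the protected lines survived your changes. The sketch below assumes that the marker is a comment line containing "DO NOT EDIT" in the template Dockerfile and that the three protected instructions follow it; the function names are hypothetical, and it checks only that each line is still present verbatim, not its position.

```python
from typing import List


def protected_lines(template: str, marker: str = "DO NOT EDIT", count: int = 3) -> List[str]:
    """Collect the first `count` instructions that follow the marker comment (assumed format)."""
    lines = [line.strip() for line in template.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("#") and marker in line:
            # Skip blank lines and further comments after the marker.
            following = [ln for ln in lines[i + 1:] if ln and not ln.startswith("#")]
            return following[:count]
    raise ValueError(f"no comment containing {marker!r} found in template")


def missing_protected_lines(template: str, edited: str) -> List[str]:
    """Return protected template lines that no longer appear verbatim in the edited Dockerfile."""
    edited_lines = {line.strip() for line in edited.splitlines()}
    return [line for line in protected_lines(template) if line not in edited_lines]
```

Pass the text of the template's Dockerfile as template and your own Dockerfile as edited; an empty result means all protected lines are still there.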
What’s the base image in Docker?
Think of Docker as a series of images, or snapshots of a virtual machine, that are layered on top of each other. For example, our image might be built on top of a very lightweight Debian Linux distribution with Python 3.7.3 that we get from the official Docker Hub (think of it as a GitHub for Docker). We can then install our requirements (NumPy and SciPy) on it. If you need the latest version of TensorFlow, then search for it on hub.docker.com and edit your Dockerfile so that its first line reads FROM tensorflow/tensorflow. For a specific version, say 1.11, look up the available tags and change the line accordingly, e.g., FROM tensorflow/tensorflow:1.11.0. We recommend using specific versions for reproducibility.
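To follow the recommendation to pin versions, you can check whether your base image names an explicit tag. In Docker, an image reference without a tag (for example, tensorflow/tensorflow) means the tag latest, which can change over time. This sketch reads only the first FROM instruction and ignores build arguments; the function names are hypothetical.

```python
from typing import Optional


def base_image(dockerfile: str) -> Optional[str]:
    """Return the image reference from the first FROM instruction, if any."""
    for line in dockerfile.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip flags such as --platform=... that may precede the image name.
            refs = [p for p in parts[1:] if not p.startswith("--")]
            return refs[0] if refs else None
    return None


def is_pinned(image: str) -> bool:
    """True if the reference has a digest, or an explicit tag other than 'latest'."""
    if "@" in image:  # e.g. python@sha256:... pins an exact image
        return True
    last_component = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_component:
        return False  # no tag means Docker uses 'latest'
    return last_component.split(":", 1)[1] != "latest"
```

For example, is_pinned(base_image(open("Dockerfile").read())) is True for FROM python:3.7.3-slim and False for FROM tensorflow/tensorflow.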
sklearn or scikit-learn?
The single most common error we noticed in the requirements.txt file for Python submissions was the sklearn package. If your entry uses scikit-learn, then list it in your requirements.txt file under its pip package name, scikit-learn, not sklearn. See here.
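A quick way to catch this mistake is to scan requirements.txt for the name sklearn. This is a minimal sketch; the function name find_sklearn_lines is hypothetical, and it only handles simple requirement lines (a name optionally followed by a version specifier, extras, environment marker, or comment).

```python
import re
from typing import List


def find_sklearn_lines(requirements: str) -> List[str]:
    """Return requirement lines that name 'sklearn' instead of 'scikit-learn'."""
    flagged = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # The package name ends where a version specifier, extra, or marker begins.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name.lower() == "sklearn":
            flagged.append(raw.strip())
    return flagged
```

Any line returned should be rewritten to use scikit-learn, keeping the same version specifier if you had one.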
xgboost?
For Python, replace python:3.7.3-slim with python:3.7.3-stretch in the first line of your Dockerfile. This image includes additional packages, such as GCC, that xgboost needs. Also add xgboost to your requirements.txt file, specifying the version that you are using.
For R, add RUN R -e 'install.packages("xgboost")' to your Dockerfile.
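To make sure that every package in requirements.txt, not just xgboost, has a specific version, you can list the unpinned entries. In pip's requirements format, == pins an exact version (for example, xgboost==1.0.2, where the number is only illustrative; pip show xgboost reports the version you have installed). This sketch skips blank lines, comments, and pip options such as -r; the function name is hypothetical.

```python
import re
from typing import List


def unpinned_requirements(requirements: str) -> List[str]:
    """Return package names that are not pinned to an exact version with '=='."""
    unpinned = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # The package name ends where a version specifier, extra, or marker begins.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if "==" not in line:
            unpinned.append(name)
    return unpinned
```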
Pandas?
Replace python:3.7.3-slim with python:3.7.3-stretch in the first line of your Dockerfile.
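This is a one-line change that you can make by hand; if you maintain several Dockerfiles, a small script can apply it. The sketch below assumes your base image appears in the first FROM instruction and leaves every other line untouched; the function name use_stretch_base is hypothetical.

```python
def use_stretch_base(dockerfile: str,
                     old: str = "python:3.7.3-slim",
                     new: str = "python:3.7.3-stretch") -> str:
    """Swap the base image in the first FROM line; other lines are left as they are."""
    lines = dockerfile.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().upper().startswith("FROM "):
            lines[i] = line.replace(old, new)
            break  # only the first FROM line sets the base image in a single-stage build
    return "".join(lines)
```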
Why can’t I install a common Python or R package using Python or R’s package manager?
Some packages have dependencies, such as GCC, that need to be installed. Try replacing python:3.7.3-slim with python:3.7.3-stretch, which includes more packages by default, or installing the dependencies yourself.
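If the first line of your Dockerfile is FROM python:3.7.3-slim, then you are building a Docker image with the Debian Linux distribution, so you can install GCC and other related libraries that many Python and R packages use by adding the line RUN apt-get update && apt-get install -y build-essential to your Dockerfile before installing these packages. The update step downloads the package lists, which slim images do not include, and -y answers the installation prompt, which nobody can answer during a build.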
How do I build my image?
git clone <<your repository URL that ends in .git>>
cd <<your repository name>>
ls
You should see a Dockerfile and other relevant files here.
docker build -t <<some image name that must be in lowercase letters>> .
docker images
docker run -it <<image name from above>> bash
This will take you into your container and you should see your code.
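If you prefer to drive these commands from a script, the sketch below assembles the docker build and docker run arguments used on this page, including the four folder mounts from the local test above. The function names are hypothetical, the image-name check is a simplified version of Docker's rule (a single lowercase name with no registry or tag), and the host folder must be an absolute path because docker run -v treats relative names as named volumes.

```python
import re
from pathlib import Path
from typing import List

# Simplified Docker name rule: lowercase letters and digits, optionally
# separated by '.', '_' or '-'. Registries and tags are not accepted here.
_NAME_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")

# Folders mounted under /physionet in the local test above.
MOUNTS = [
    "input_training_directory",
    "output_training_directory",
    "input_directory",
    "output_directory",
]


def build_command(image: str, context: str = ".") -> List[str]:
    """argv for 'docker build -t <image> <context>', rejecting names Docker would refuse."""
    if not _NAME_RE.fullmatch(image):
        raise ValueError(f"image name {image!r} must use lowercase letters and digits")
    return ["docker", "build", "-t", image, context]


def run_command(image: str, host_root: Path) -> List[str]:
    """argv for an interactive bash shell with the four test folders mounted."""
    cmd = ["docker", "run", "-it"]
    for name in MOUNTS:
        cmd += ["-v", f"{host_root / name}:/physionet/{name}"]
    return cmd + [image, "bash"]
```

From inside your cloned repository, subprocess.run(build_command("image"), check=True) builds the image, and subprocess.run(run_command("image", Path.home() / "docker_test")) opens a shell in the container.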
What can I do to make sure that my submission is successful?
You can avoid most submission errors with the following steps:
Why is my entry unsuccessful on your submission system? It works on my computer.
There are several common reasons for unexpected errors:
The submission form can be found here: https://forms.gle/PWu87SqN8frh6aKS7
Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.