This page provides general FAQs for the Challenges. Please see the current Challenge FAQs for more specific information about the current Challenge.
For information about the history of the Challenges, please see here.
No, we prohibit access to both the test data and test labels to prevent significant information leakage from the out-of-sample test data to the training process. This is true for both current and past Challenges. Having access to the test labels gives the researcher too much of an opportunity to look at the test data and perform multiple re-tests. These re-tests constitute an outer training loop that leads to overfitting and an overly optimistic value for the performance metric. Access to the test data, even without labels, provides the opportunity to employ techniques that extract information about the test data and that are not representative of the ‘future’ use of an algorithm. These include extracting population statistics (means, distributions), unsupervised clustering, and even hand-labelling of the data.
Reducing information leakage from test to training is also the reason why you will only be allowed one shot at the full test data (or perhaps two, if you publish in the follow-up journal focus issue). We understand that providing teams up to 15 entries on a subset of the test data causes a small leak, but whenever possible we keep one database separate until the final run. One final reason for not sharing the test data and labels publicly is that we sometimes use data from previous Challenges in future Challenges.
Moreover, we try to source diverse datasets for the Challenges, and we often use datasets in the Challenges that we are unable to release as test data.
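To illustrate why repeated re-tests inflate the performance metric, here is a minimal simulation sketch. It is not part of the Challenge code, and every name in it (`simulate_retests`, `n_retests`, and so on) is hypothetical. It assumes a binary task and "models" that guess at random, so their true accuracy is 0.5; choosing the model that scores best on a small test set still reports a higher number.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def simulate_retests(n_retests, n_test=200, n_fresh=10000, seed=0):
    """Keep the best of n_retests random-guess models, selected on the test set.

    Returns (test accuracy of the selected model, its accuracy on fresh data).
    All sizes and names are illustrative assumptions, not Challenge settings.
    """
    rng = random.Random(seed)
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]

    best_test_acc = -1.0
    fresh_acc_of_best = None
    for _ in range(n_retests):
        # A "model" with no skill: one coin flip per test record.
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        test_acc = accuracy(test_preds, test_labels)
        if test_acc > best_test_acc:
            # Selecting on the test score is the outer training loop.
            best_test_acc = test_acc
            # The same model on unseen data is still just guessing.
            fresh_preds = [rng.randint(0, 1) for _ in range(n_fresh)]
            fresh_acc_of_best = accuracy(fresh_preds, fresh_labels)
    return best_test_acc, fresh_acc_of_best


if __name__ == "__main__":
    for n in (1, 15, 200):
        test_acc, fresh_acc = simulate_retests(n)
        print(f"re-tests={n:3d}  test={test_acc:.3f}  fresh={fresh_acc:.3f}")
```

In this sketch the selected model's test accuracy rises as the number of re-tests grows, while its accuracy on fresh data stays close to chance. That gap is the optimistic bias that limiting entries on the test data is meant to contain.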
Yes, most certainly. We encourage you to do this. You do not need to include your data in the code stack for training the algorithm, but you do need to include the pre-trained model in the code and provide code to retrain (continue training) on the training data we provide. The pre-trained network must have a license that is compatible with the rest of your code. You must also thoroughly document the content of the database you used to pre-train your network. If you are able to provide access to the data, or it is already public, please include links in both your README and the article documenting your entry. If you would like to contribute data to the Challenge for others to use (or as test data), please contact us directly. We’d be delighted to add you to the team/authorship of the resulting articles if the data adds value.
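As a rough sketch of what "include the pre-trained model and provide code to continue training" can look like, the example below stores a tiny logistic-regression model as JSON, loads it, and continues training on new data. It is an illustration only: the file layout and the names `save_model`, `load_pretrained`, and `continue_training` are assumptions, not the interface the Challenge requires, and a real entry would follow the current Challenge's example code.

```python
import json
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def save_model(path, weights, bias):
    """Write model parameters to a JSON file (hypothetical format)."""
    with open(path, "w") as f:
        json.dump({"weights": list(weights), "bias": bias}, f)


def load_pretrained(path):
    """Load parameters shipped alongside the code (hypothetical format)."""
    with open(path) as f:
        model = json.load(f)
    return model["weights"], model["bias"]


def log_loss(weights, bias, features, labels):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    total = 0.0
    for x, y in zip(features, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def continue_training(weights, bias, features, labels,
                      learning_rate=0.1, epochs=50):
    """Start from the pre-trained parameters and run further SGD epochs."""
    weights = list(weights)  # do not modify the caller's list
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias
```

A training script built this way would call `load_pretrained` on the bundled file, pass the result to `continue_training` together with the provided training data, and save the updated parameters with `save_model`, so that the whole pipeline can be rerun without access to the original pre-training database.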
Yes, under certain conditions. First, check with us that we are able to resource your request. We are really busy and have more work than we are funded to do. If you are able to provide any donations to the resource to fund an engineer’s time, your request will be prioritized. Second, you must provide a 90% complete draft of the article you are writing to describe the method, pointing out where it differs from other known approaches, particularly in the Challenge. We prioritize novelty. We won’t judge your training and validation statistics too heavily, since we are more interested in adding to the discussion around methods than in adding a few percent to the top score. We also prioritize open source approaches, although if you do wish to keep your method secret, we may be able to offer a sponsorship plan. Finally, you need to package the code exactly as in the Challenges, and ensure it works in the containerized environment provided (not just on your personal computer or an arbitrarily configured cluster). Please note that bugs in the code will increase the likelihood that you do not receive a score, as we cannot invest large amounts of time into providing a single group with support.
Yes, we are happy to support ongoing research with past Challenges, subject to available resources (the Challenges are largely run by volunteers) and provided that you are able to do the following:
If you agree to the above conditions, then please contact us at firstname.lastname@example.org to submit your entry. Even so, we cannot guarantee that we will be able to run your code.
If you are interested in contributing to, or posing a Challenge, please feel free to contact us with details of the databases you can provide, the nature of the problem you wish to solve, and some demo code which makes a basic attempt to solve the problem. We strongly recommend having at least three independent databases, two to become public, and one to remain private/hidden. For more information on the general aims and framework of the Challenge, and the criteria for a successful event, please see here and here.
For the current Challenge FAQs, please visit here.
Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.
© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.