Frequently Asked Questions (FAQ) - General

This page provides general FAQs for the Challenges. Please see the current Challenge FAQs for more specific information about the current Challenge.

Challenge History

Data

Scoring

Contribution

Challenge History

What is the history of the Challenges?

For information about the history of the Challenges, please see here.

Data

I want to evaluate my code on the test data. Can you provide either the test data or labels for the current Challenge or a previous Challenge?

No, we prohibit access to both the test data and test labels to prevent significant information leakage from the out-of-sample test data to the training process. This is true for both current and past Challenges. Access to the test labels gives the researcher too much of an opportunity to look at the test data and perform multiple re-tests. These re-tests constitute an outer training loop that leads to overfitting and an overly optimistic value for the performance metric. Access to the test data, even without labels, provides the opportunity to employ techniques to extract information about the test data that are not representative of the ‘future’ use of an algorithm. These include extracting population statistics (mean, distributions), unsupervised clustering, and even hand-labelling of the data.
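The effect of repeated re-tests can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the Challenge code: the function name `simulate_retests`, the test-set size, and the number of re-tests are all hypothetical. Every "model" here guesses at random, so its true accuracy is 50%, yet picking the best of many re-tests on the same labelled test set reports a higher number.

```python
import random


def simulate_retests(n_test=200, n_retests=50, seed=0):
    """Return (accuracy of the first test, best accuracy over all re-tests).

    Hypothetical illustration: each re-test uses a model that guesses
    labels at random, so any apparent gain comes only from selecting
    the luckiest result on the same test set.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_test)]

    accuracies = []
    for _ in range(n_retests):
        # A "new model" that has learned nothing: random predictions.
        predictions = [rng.randint(0, 1) for _ in range(n_test)]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracies.append(correct / n_test)

    return accuracies[0], max(accuracies)


first_accuracy, best_accuracy = simulate_retests()
print(f"single test: {first_accuracy:.3f}, best of re-tests: {best_accuracy:.3f}")
```

The gap between the two numbers is the optimistic bias introduced by the outer selection loop; it grows with the number of re-tests and shrinks with the size of the test set.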

Reducing information leakage from test to training is also the reason why you will only be allowed one shot at the full test data (or perhaps two, if you publish in the follow-up journal focus issue). We understand that providing teams up to 15 entries on a subset of the test data causes a small leak, but whenever possible we keep one database separate until the final run. One final reason for not sharing the test data and labels publicly is that we sometimes use data from previous Challenges in future Challenges.

Moreover, we try to source diverse datasets for the Challenges, and we often use datasets in the Challenges that we are unable to release as test data.

Are we allowed to use external public or private data? Can we use transfer learning with pre-trained networks?

Yes, most certainly. We encourage you to do this. You do not need to include your data in the code stack for training the algorithm, but you do need to include the pre-trained model in the code and provide code to retrain (continue training) on the training data we provide. The pre-trained network must have a license that is compatible with the rest of your code. You must also thoroughly document the content of the database you used to pre-train your network. If you are able to provide access to the data, or it is already public, please include links in both your README and the article documenting your entry. If you would like to contribute data to the Challenge for others to use (or as test data), please contact us directly. We’d be delighted to add you to the team/authorship of the resulting articles if the data adds value.
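As a rough illustration of what "include the pre-trained model and provide code to retrain" can look like, here is a minimal sketch. Everything in it is an assumption made for the example: the JSON weight format, the function names `load_pretrained`, `continue_training`, and `predict`, and the use of a tiny logistic regression in place of a real network. Actual entries must follow the submission instructions for the relevant Challenge.

```python
import json
import math


def load_pretrained(path):
    """Load pre-trained parameters shipped with the code.

    Assumed (hypothetical) format: {"weights": [...], "bias": 0.0}.
    """
    with open(path) as f:
        return json.load(f)


def predict(model, x):
    """Return the predicted probability of the positive class."""
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1.0 / (1.0 + math.exp(-z))


def continue_training(model, features, labels, lr=0.1, epochs=100):
    """Continue training from the pre-trained parameters on new data.

    Plain stochastic gradient descent on the logistic loss. The input
    model is left unchanged; an updated copy is returned.
    """
    weights = list(model["weights"])
    bias = model["bias"]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = predict({"weights": weights, "bias": bias}, x)
            gradient = p - y
            weights = [w - lr * gradient * xi for w, xi in zip(weights, x)]
            bias -= lr * gradient
    return {"weights": weights, "bias": bias}
```

In this sketch the external pre-training data never appears in the code stack; only the resulting parameters file and the retraining routine do.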

Scoring

I missed the Challenge, but I still want to run my code on the test data. If you aren’t providing test data or labels, then can you run my code for me?

Yes, under certain conditions. First, check with us that we are able to resource your request. We are really busy and have more work than we are funded to do. If you are able to provide any donations to the resource to fund an engineer’s time, your request will be prioritized. Second, you must provide a 90% complete draft of the article you are writing to describe the method, pointing out where it differs from other known approaches, particularly in the Challenge. We prioritize novelty. We won’t judge your training and validation statistics too heavily, since we are more interested in adding to the discussion around methods than in adding a few percent to the top score. We also prioritize open source approaches, although if you do wish to keep your method secret, we may be able to offer a sponsorship plan. Finally, you need to package the code exactly as in the Challenges, and ensure it works in the containerized environment provided (not just on your personal computer or an arbitrarily configured cluster). Please note that bugs in the code will increase the likelihood that you do not receive a score, as we cannot invest large amounts of time into providing a single group with support.

Can you score my algorithm for one of the previous Challenges?

Yes, we are happy to support ongoing research with past Challenges subject to available resources (the Challenges are largely run by volunteers) and whether you are able to do the following:

  1. You must share your code in a GitHub or GitLab repository.
  2. You must include all of your code, including your training code and forward model.
  3. You must include an open-source LICENSE file, an AUTHORS file, and a README file that describes the results on the training set.
  4. You must include a detailed draft article describing your method, including the results on the training set (matching the results in your README), with the target journal for your submission.
  5. Your article must describe how your technique differs from all other methods from the Challenge and the subsequent focus issue, especially any previous methods that you may have developed.
  6. You must include a statement that no one from your team will attempt to submit another entry. Each team receives at most one follow-up shot at the data.
  7. We must be able to run your code. For recent Challenges, we expect you to follow the submission instructions and format your submission in the same way.

If you agree to the above conditions, then please contact us at challenge@physionet.org to submit your entry. Even so, we cannot guarantee that we will be able to run your code.
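Condition 3 in the list above is easy to check before contacting us. The following is a small sketch, not an official validation tool: the function name `missing_required_files` and the rule of matching file names case-insensitively while ignoring extensions (so that `README.md` counts as a README) are assumptions made for this example.

```python
import os

# File names required by condition 3, compared without extension.
REQUIRED_STEMS = ("LICENSE", "AUTHORS", "README")


def missing_required_files(filenames):
    """Return the required files that are absent from a list of file names.

    Hypothetical helper: names are matched case-insensitively and
    extensions are ignored, e.g. "readme.md" satisfies "README".
    """
    present = {os.path.splitext(name)[0].upper() for name in filenames}
    return [stem for stem in REQUIRED_STEMS if stem not in present]
```

Calling `missing_required_files(os.listdir("."))` from the top level of a repository lists whatever still needs to be added; an empty list means all three files are present. It does not check the contents of those files, such as whether the README reports the training-set results.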

Contribution

I would like to suggest/help organize/contribute software or data to a Challenge - how can I do this?

If you are interested in contributing to or posing a Challenge, please feel free to contact us with details of the databases you can provide, the nature of the problem you wish to solve, and some demo code that makes a basic attempt to solve the problem. We strongly recommend having at least three independent databases: two to become public and one to remain private/hidden. For more information on the general aims and framework of the Challenge, and the criteria for a successful event, please see here and here.

For the current Challenge FAQs, please visit here.


Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.

© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.
