Detecting and Quantifying Apnea Based on the ECG: The PhysioNet/Computing in Cardiology Challenge 2000

George Moody

Announcements

Revisiting the PhysioNet/CinC Challenge 2000 (March 14, 2003, midnight)

Several of the participants in the first PhysioNet/Computers in Cardiology Challenge, together with the organizers, have published a joint paper that compares the methods used in the challenge and investigates how to combine several of the most successful strategies for detecting and quantifying sleep apnea based on the ECG. This paper appeared last year in Medical & Biological Engineering & Computing, and it can now be read on-line [external link, PDF].

Challenge featured in New Scientist (Dec. 15, 2001, midnight)

PhysioNet’s Challenges are the subject of a feature article in New Scientist (“Off-Beat”, 15 December 2001, pp. 33-35). 

Results from the Computers in Cardiology Challenge 2000 (Sept. 22, 2000, midnight)

Final scores for Computers in Cardiology Challenge 2000 are now posted. Thanks to all who participated in this outstandingly successful event!

Challenge entry deadline extended (April 26, 2000, 1 a.m.)

The deadline for entries in the Computers in Cardiology Challenge 2000 has been extended. (The deadline for submitting abstracts for the conference has not changed, however.)

Computers in Cardiology Challenge 2000 open for scoring (April 12, 2000, midnight)

Entrants in the Computers in Cardiology Challenge 2000 can now submit their results for scoring.

Computers in Cardiology Challenge 2000 (Feb. 10, 2000, midnight)

Can sleep apnea be detected using the ECG only? PhysioNet and Computers in Cardiology 2000 challenge you to develop and evaluate a method for doing so, in CinC Challenge 2000, an open contest aimed at catalyzing research, friendly competition, and wide-ranging collaboration around this clinically important problem. Prizes will be awarded to the most successful participants.

Citations

When using this resource, please cite the following publications:

Introduction

Obstructive sleep apnea (intermittent cessation of breathing) is a common problem with major health implications, ranging from excessive daytime drowsiness to serious cardiac arrhythmias. Obstructive sleep apnea is associated with increased risks of high blood pressure, myocardial infarction, and stroke, and with increased mortality rates. Standard methods for detecting and quantifying sleep apnea are based on respiration monitoring, which often disturbs or interferes with sleep and is generally expensive. A number of studies during the past 15 years have hinted at the possibility of detecting sleep apnea using features of the electrocardiogram. Such approaches are minimally intrusive, inexpensive, and may be particularly well-suited for screening. The major obstacle to use of such methods is that careful quantitative comparisons of their accuracy against that of conventional techniques for apnea detection have not been published.

We therefore offer a challenge to the biomedical research community: demonstrate the efficacy of ECG-based methods for apnea detection using a large, well-characterized, and representative set of data. The goal of the contest is to stimulate effort and advance the state of the art in this clinically significant problem, and to foster both friendly competition and wide-ranging collaborations. We will award prizes of US$500 to the most successful entrant in each of two events (see note 1 under Acknowledgements).

Data for development and evaluation

Data for this contest have kindly been provided by Dr. Thomas Penzel of Philipps-University, Marburg, Germany, and are available here.

The data to be used in the contest are divided into a learning set and a test set of equal size. Each set consists of 35 recordings, each containing a single ECG signal digitized continuously at 100 Hz with 12-bit resolution for approximately 8 hours (individual recordings vary in length from slightly less than 7 hours to nearly 10 hours). Each recording includes a set of reference annotations, one for each minute of the recording, that indicate the presence or absence of apnea during that minute. These reference annotations were made by human experts on the basis of simultaneously recorded respiration signals. Note that the reference annotations for the test set will not be made available until the conclusion of the contest. Eight of the recordings in the learning set include three respiration signals (oronasal airflow measured using nasal thermistors, and chest and abdominal respiratory effort measured using inductive plethysmography), each digitized at 20 Hz, and an oxygen saturation signal digitized at 1 Hz. These additional signals can be used as reference material to understand how the apnea annotations were made, and to study the relationships between the respiration and ECG signals.

The database does not contain episodes of pure central apnea or of Cheyne-Stokes respiration; all apneas in these recordings are either obstructive or mixed. Minutes containing hypopneas (defined as intermittent drops in respiratory flow below 50%, accompanied by drops in oxygen saturation of at least 4%, and followed by compensating hyperventilation) are also scored as minutes containing apnea. Additional information about the recordings was posted here after the conclusion of the competition, including (for all recordings) age, gender, height, weight, AI (apnea index), HI (hypopnea index), and AHI (apnea-hypopnea index). The subjects of these recordings are men and women between 27 and 63 years of age, with weights between 53 and 135 kg (BMI between 20.3 and 42.1); AHI ranges from 0 to 93.5 in these recordings.

Sleep apnea definitions

Several definitions for clinically significant sleep apnea have been in clinical use since 1978, when Guilleminault defined “sleep apnea syndrome” as more than 30 apneas per night. In 1981, Lavie proposed a more selective criterion of 100 apneas per night. Later criteria were based on an “apnea index” (the number of apneas per hour, or the number of minutes containing apnea per hour). Most clinicians regard an apnea index below 5 as normal, and an apnea index of 10 or more as pathologic. In 1988, He et al. found increased mortality in untreated patients with apnea indices of 20 or more, and such patients are now recognized as in need of treatment. Criteria used in current practice rely not only on an apnea index, but also on symptoms and cardiovascular sequelae (see note 2 under Acknowledgements).
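As a rough illustration of the minute-based apnea index mentioned above, the following Python sketch counts the minutes containing apnea per hour of recording and applies the two thresholds quoted in the text (below 5 regarded as normal, 10 or more as pathologic). The function names, the per-minute 'A'/'N' labels as input, and the "indeterminate" label for values from 5 up to 10 are assumptions made for this example only; they are not part of any clinical definition.

```python
def apnea_index(minute_labels):
    """Return the number of minutes containing apnea per hour of recording.

    minute_labels is a list with one entry per minute: 'A' (apnea) or 'N'.
    """
    if not minute_labels:
        raise ValueError("at least one minute label is required")
    apnea_minutes = sum(1 for label in minute_labels if label == "A")
    hours = len(minute_labels) / 60.0
    return apnea_minutes / hours


def interpret_index(index):
    """Map an apnea index to the thresholds quoted in the text.

    The 'indeterminate' label for 5 <= index < 10 is an assumption.
    """
    if index < 5:
        return "normal"
    if index >= 10:
        return "pathologic"
    return "indeterminate"


# Hypothetical 8-hour recording with 120 minutes containing apnea.
labels = ["A"] * 120 + ["N"] * 360
index = apnea_index(labels)
print(index, interpret_index(index))
```

Note that this index is computed over the whole recording, since the sketch has no information about which minutes the subject actually spent asleep.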

Data classes

For the purposes of this challenge, based on these varied criteria, we have defined three classes of recordings: class A (apnea), class B (borderline), and class C (control or normal).

Events and scoring

Each entrant may compete in one or both of the following events:

1. Apnea screening

In this event, your task is to design software that can classify the 35 test set recordings into class A (apnea) and class C (control or normal) groups, using the ECG signal to determine if significant sleep apnea is present. Your classifications for the 5 class B (borderline) recordings will not influence your score in this event (but you must classify them into either class A or class C, since you will not know which records belong to class B until the correct classifications of the 35 test set records are disclosed after the end of the contest). Your score for this event is simply the number of correct classifications; thus the maximum score possible is 30.

An example may help to clarify the scoring: A contestant submits her results, classifying 22 recordings in class A and 13 in class C (for a total of 35). Of the 22 recordings that her software has identified as class A, 16 are actually class A, 3 are class B, and 3 are class C. Of the 13 recordings that her software identified as class C, 7 have been correctly identified, and the other 6 include 4 class A and 2 class B. The score in this case is 23 (16 correct class A identifications, plus 7 in class C). Class B cases do not contribute to the final score; rather, they provide a buffer zone between classes A and C.
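The worked example above can be reproduced with a short Python sketch. It assumes that classifications are held in dictionaries mapping a record name to 'A', 'B', or 'C'; the function name and the record names are hypothetical, and this is not the scoring program used by the organizers.

```python
def screening_score(predicted, reference):
    """Count correct class A / class C calls; class B records are skipped."""
    score = 0
    for record, true_class in reference.items():
        if true_class == "B":
            continue  # borderline records never contribute to the score
        if predicted.get(record) == true_class:
            score += 1
    return score


# Rebuild the example from the text as (true class, predicted class, count).
groups = [
    ("A", "A", 16), ("B", "A", 3), ("C", "A", 3),  # 22 called class A
    ("C", "C", 7), ("A", "C", 4), ("B", "C", 2),   # 13 called class C
]
reference, predicted = {}, {}
n = 0
for true_class, predicted_class, count in groups:
    for _ in range(count):
        n += 1
        record = f"rec{n:02d}"  # hypothetical record name
        reference[record] = true_class
        predicted[record] = predicted_class

example_score = screening_score(predicted, reference)
print(example_score)  # 16 correct class A + 7 correct class C = 23
```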

We have chosen to exclude the class B recordings from the calculation of the scores because the utility of a screening test depends primarily on the accuracy with which it classifies the unambiguous cases, both positive and negative (classes A and C respectively in this instance). If you wish to attempt to classify recordings into all three groups, you may submit a second set of classifications, and we will calculate your score in the same way (but the maximum possible score in this case will be 35). The highest scores obtained in this way will be published, but will not be the basis for an award.

2. Quantitative assessment of apnea

In this event, your software must generate a minute-by-minute annotation file for each recording, in the same format as those provided with the learning set, using the ECG signal to determine when sleep apnea occurs. Your annotations will be compared with a set of reference annotations to determine your score. Each annotation that matches a reference annotation earns one point; thus the highest possible score for this event will be approximately 16800 (480 annotations in each of 35 records). It is important to understand that scores approaching the maximum are very unlikely, since apnea assessment can be very difficult even for human experts. Nevertheless, the scores can be expected to provide a reasonable ranking of the ability of the respective algorithms to mimic the decisions made by human experts.
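The per-minute scoring described above might be sketched as follows in Python. The sketch assumes that each record's annotations have already been read into a list with one 'A' (apnea) or 'N' (no apnea) label per minute; the function name and record names are hypothetical, and reading the actual annotation files is outside the scope of this example.

```python
def quantification_score(predicted, reference):
    """Return the number of per-minute labels that match the reference.

    Both arguments map a record name to a list of 'A'/'N' labels, one per
    minute. Minutes missing from a prediction earn no points.
    """
    total = 0
    for record, ref_minutes in reference.items():
        pred_minutes = predicted.get(record, [])
        total += sum(1 for p, r in zip(pred_minutes, ref_minutes) if p == r)
    return total


# Two very short hypothetical records.
reference = {"x01": ["N", "A", "A", "N"], "x02": ["A", "N"]}
predicted = {"x01": ["N", "A", "N", "N"], "x02": ["A", "A"]}
print(quantification_score(predicted, reference))  # 3 + 1 = 4

# Approximate maximum from the text: 480 annotations in each of 35 records.
approx_max = 480 * 35
print(approx_max)  # 16800
```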

Obtaining scores

A form that will permit you to submit your classifications and/or annotations for scoring is now available. You will receive a reference number and your score(s) by return e-mail. You may revise your submissions and try again if you wish, but attempts to exploit this service in order to discover the correct classifications are contrary to the spirit of the contest and will result in disqualification.

How to enter

To enter the competition, submit an abstract with a concise description of your approach to the problem to Computers in Cardiology 2000 no later than Wednesday, 3 May 2000. Your abstract must include your reference number and score(s); for this reason, do not wait until the last minute to submit your classifications and/or annotations for scoring. If your abstract is accepted, you will be expected to prepare a four-page paper for presentation during the conference and publication in the conference proceedings. We welcome and encourage contributions to PhysioNet of software developed during this competition.

Awards

The author(s) of the top-scoring eligible entry in each event will receive an award of US$500 in recognition of their achievement. In the event of a tie, the date of the author’s abstract submission will be the tie-breaker. This rule favors early submission of abstracts, but permits authors to improve their results if they can after submitting their abstracts. Classifications or annotations received for scoring after noon GMT on Friday, 22 September 2000 will not be eligible for awards. Submissions from members and affiliates of our research groups at MIT, Boston University, Harvard Medical School, Beth Israel Deaconess Medical Center, McGill University, and Philipps-University are not eligible for awards, although all are welcome to participate.
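The tie-breaking rule above amounts to ordering entries by score (highest first) and then by submission date (earliest first). A minimal Python sketch follows; the entry names, scores, and dates are invented for illustration, and the dictionary layout is an assumption rather than the format used by the organizers.

```python
from datetime import date


def rank_entries(entries):
    """Sort by score (descending), then by submission date (ascending)."""
    return sorted(entries, key=lambda e: (-e["score"], e["submitted"]))


# Hypothetical entries; two are tied at the top score.
entries = [
    {"name": "entrant X", "score": 30, "submitted": date(2000, 5, 10)},
    {"name": "entrant Y", "score": 29, "submitted": date(2000, 4, 27)},
    {"name": "entrant Z", "score": 30, "submitted": date(2000, 5, 3)},
]
ranking = rank_entries(entries)
print([e["name"] for e in ranking])  # entrant Z, entrant X, entrant Y
```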

Workshop/Panel discussion

All entrants are invited to describe their methods during a panel discussion at Computers in Cardiology in Boston on Sunday, 24 September 2000, when the awards will be given. Individual presentations of accepted papers will be scheduled for one or more sessions of the conference during the following days (25-27 September).

Acknowledgements

  1. Funding for the awards has been contributed by the Margret and H.A. Rey Laboratory for Nonlinear Dynamics in Medicine at Boston’s Beth Israel Deaconess Medical Center.

  2. We thank Thomas Penzel for the discussion of diagnostic criteria for sleep apnea syndrome, as well as for making this event possible by his generous contribution of data.

Challenge Results

Event 1 (Apnea Screening)

In this event, the five recordings in class B are not counted. The score is the total number of correct classifications of the 20 class A (apnea) and 10 class C (control/no apnea) recordings, so that the maximum possible score is 30. Since four entrants achieved a perfect score, the date of the top-scoring entrant’s submission is the tiebreaker.

The top scores in event 1 are:

Score | Entrant | Date | Entries
30/30 | MR Jarvis and PP Mitra; Caltech, Pasadena, CA, USA | 3 May | 3
30/30 | B Raymond, R Cayton, R Bates, and M Chappell; Birmingham Heartlands Hospital, Birmingham, UK | 10 May | 3
30/30 | P de Chazal, C Henehan, E Sheridan, R Reilly, P Nolan, and M O’Malley; University College - Dublin, Ireland | 17 July | 1
30/30 | J McNames, A Fraser, and A Rechtsteiner; Portland State University, Portland, OR, USA | 12 September | 3
29/30 | PK Stein and PP Domitrovich; Washington University School of Medicine, St. Louis, MO, USA | 12 September | 2
28/30 | JE Mietus, C-K Peng, PCh Ivanov, and AL Goldberger; Beth Israel Deaconess Medical Center, Boston, MA, USA (unofficial entry) | 27 April | 2
28/30 | Z Shinar, A Baharav, and S Akselrod; Tel-Aviv University, Ramat-Aviv, Israel | 29 April | 1
28/30 | MJ Drinnan, J Allen, P Langley, and A Murray; Freeman Hospital, Newcastle upon Tyne, UK | 3 May | 1
28/30 | C Maier, M Bauch, and H Dickhaus; University of Heidelberg, Heilbronn, Germany | 3 May | 2
28/30 | M Schrader, C Zywietz, V von Einem, B Widiger, G Joseph; Medical School Hannover, Hannover, Germany | 7 August | 8
27/30 | C Marchesi, M Paoletti, S Di Gaetano; University of Firenze, Firenze, Italy | 28 April | 1
27/30 | M Ballora, L Glass, B Pennycook, PCh Ivanov, and AL Goldberger; McGill University, Montreal, Quebec, Canada (unofficial entry) | 3 May | 1

Each entrant’s best score is shown, along with the date when they achieved that score. Many entrants submitted multiple entries, and the ‘Entries’ shown indicate how many entries were submitted by each entrant up to and including the one that scored highest (later entries, and entries that did not receive scores because of formatting errors, were not counted); this gives some sense of how much ‘tuning’ may have taken place. Entries noted as ‘unofficial’ came from two of the PhysioNet core research groups, and were therefore not eligible for awards, although they followed all of the rules of the competition.

Only one entry, from PK Stein, was submitted for an unofficial ‘three-way score’ (the total of correct classifications of all 35 records, including the 5 class B (borderline) records). This is a significantly more difficult task, since the amount of apnea in the class B records must be accurately determined in order to classify them correctly, and recognizing apnea in recordings that have only small amounts of apnea is more difficult than in recordings with frequent apneas. The single entry received a score of 33/35, an excellent result; the only errors occurred when a class A record was put in class B, and a class B record was put in class A.

Event 2 (Apnea Quantification)

In this event, each minute of each of the 35 recordings in the test set must be classified as containing apnea (A) or not (N). The maximum possible score is 17268 (the total number of minutes in the 35 recordings for which reference classifications are available).

The top scores in event 2 are:

Score | Entrant | Date | Entries
15994/17268 (92.62%) | J McNames, A Fraser, and A Rechtsteiner; Portland State University, Portland, OR, USA | 21 September | 4
15939/17268 (92.30%) | B Raymond, R Cayton, R Bates, and M Chappell; Birmingham Heartlands Hospital, Birmingham, UK | 22 September | 8
15432/17268 (89.36%) | P de Chazal, C Henehan, E Sheridan, R Reilly, P Nolan, and M O’Malley; University College - Dublin, Ireland | 22 September | 15
15120/17268 (87.56%) | M Schrader, C Zywietz, V von Einem, B Widiger, G Joseph; Medical School Hannover, Hannover, Germany | 12 September | 9
15075/17268 (87.30%) | MR Jarvis and PP Mitra; Caltech, Pasadena, CA, USA | 21 September | 3
14788/17268 (85.63%) | Z Shinar, A Baharav, and S Akselrod; Tel-Aviv University, Ramat-Aviv, Israel | 11 May | 1
14772/17268 (85.54%) | C Maier, M Bauch, and H Dickhaus; University of Heidelberg, Heilbronn, Germany | 20 September | 5
14591/17268 (84.49%) | JE Mietus, C-K Peng, and AL Goldberger; Beth Israel Deaconess Medical Center, Boston, MA, USA (unofficial entry) | 19 May | 3

As in event 1, each entrant’s best score is shown above, along with the date it was achieved and the number of entries submitted (excluding any entries submitted after the one that received the best score, and any that were not scored because of formatting errors). Notably, four of the top five finishers in this event also achieved perfect scores in event 1. The classification accuracy achieved by the top finishers is comparable to the roughly 90% concurrence of human experts in classification of the original polysomnograms with reference to the full set of signals, including nasal airflow, respiratory effort, and oxygen saturation.

Papers

These papers were presented at Computers in Cardiology 2000.

Files

Access Policy

Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files)

Open Data Commons Attribution License v1.0


Supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.

© PhysioNet Challenges. Website content licensed under the Creative Commons Attribution 4.0 International Public License.
