Estimating Sleep Apnea Cost by Open Data Science Methods

Sleep apnea is a disorder in which breathing repeatedly stops and starts during sleep. It is generally divided into two types: obstructive sleep apnea (OSA) and central sleep apnea (CSA). Sleep apnea is frequently observed in patients with heart failure (Kasai 2012). Other medical conditions related to sleep apnea include high blood pressure, type 2 diabetes and Parkinson’s disease. Sleep apnea can remain undiagnosed for many years, during which patients use health care services for various other reasons. Symptoms of sleep apnea include loud snoring, morning headache, excessive daytime sleepiness, difficulty paying attention while awake, and irritability.

Methods

There is no single established way to calculate the costs of sleep apnea. Patients can go undetected for a long time before they receive a sleep apnea diagnosis. While undiagnosed, they may use healthcare services for many other reasons before they are diagnosed correctly and get help for sleep apnea. This makes it hard to estimate the real cost of sleep apnea.

One method for estimating the cost of obstructive sleep apnea is presented by Armeni et al. (2019) in the article Cost-of-illness study of Obstructive Sleep Apnea Syndrome (OSAS) in Italy. The method estimates costs top-down: conditions related to sleep apnea are used to estimate the costs that sleep apnea patients generate. Costs are divided into three groups: direct healthcare costs, direct non-healthcare costs and productivity losses. Direct healthcare costs include hospitalizations, consultations, laboratory testing, drug or medical device consumption, etc. Direct non-healthcare costs include transportation costs and informal care (i.e. care provided by family). Productivity costs include losses related to illness or death, and time off work for patients and informal caregivers.

Data

Discuss about

[] dataset
[] modification and edit
[] precalculated data
[] PAF formula for OR and RR
[] ready data and results

The Global Health Data Exchange (GHDx) data catalog provides health-related open data. The Institute for Health Metrics and Evaluation (IHME) provides open disease prevalence and population data for different countries through the catalog. In this project we collected data from 2019 for 42 countries. For base prevalences of diseases that are not available in the IHME data, we used the values reported by Armeni et al. (2019). Obstructive sleep apnea prevalences were taken from Benjafield et al. (2019).

The data sets are publicly available and can be downloaded from the GHDx website. They consist of population and disease prevalence information. The data can be downloaded as several CSV files to a local computer. We decided to use the duckdb database package for R to store the data. duckdb creates a local SQL database, but for longer-term storage we ended up saving the data sets in Parquet format, because databases created with an older duckdb version might not work in future versions. duckdb can be pointed at Parquet files, and queries work as in any SQL database.
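The storage workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table name `population`, the column names `location_name` and `val`, and the file path are assumptions, and the two rows only reuse total population figures shown later in this article.

```r
library(DBI)
library(duckdb)

# Toy stand-in for a downloaded CSV extract (column names are assumed).
population <- data.frame(
  location_name = c("Albania", "Austria"),
  val = c(2720353, 8916185)
)

# In-memory duckdb database; pass dbdir = "file.duckdb" for an on-disk one.
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "population", population)

# Export the table to Parquet for longer-term storage.
parquet_path <- file.path(tempdir(), "population.parquet")
dbExecute(con, sprintf("COPY population TO '%s' (FORMAT PARQUET)", parquet_path))

# Query the Parquet file directly with ordinary SQL.
res <- dbGetQuery(con, sprintf(
  "SELECT location_name, val FROM read_parquet('%s') ORDER BY val DESC",
  parquet_path
))
print(res)

dbDisconnect(con, shutdown = TRUE)
```

Keeping the long-term copy in Parquet means the data can be read back by any duckdb version, or by other tools, without depending on the database file format.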

After downloading the population and prevalence data sets, we did some data cleaning and stored the files in the duckdb database for further analysis.

Population

[] describe nations and population
[] describe age population

We collected data for 42 countries, mostly European Union countries. Their populations differ considerably, both in total size and in age distribution.

Population by country in 2019 (first six rows shown): ages 15 to 74, ages 30 to 69, and all ages.

## # A tibble: 6 × 4
## # Groups:   location_name [6]
##   location_name pop_1574_both pop_1574_female pop_1574_male
##   <chr>                 <dbl>           <dbl>         <dbl>
## 1 Albania            2085677.        1042814.      1042863.
## 2 Armenia            2267463.        1187751.      1079711.
## 3 Austria            6785345.        3390812.      3394534.
## 4 Azerbaijan         7752896.        3933115.      3819781.
## 5 Belarus            7249252.        3828210.      3421042.
## 6 Belgium            8459697.        4236476.      4223220.

## # A tibble: 6 × 4
## # Groups:   location_name [6]
##   location_name pop_3069_both pop_3069_female pop_3069_male
##   <chr>                 <dbl>           <dbl>         <dbl>
## 1 Albania            1359142.         692488.       666654.
## 2 Armenia            1577767.         841499.       736268.
## 3 Austria            4794931.        2408749.      2386182.
## 4 Azerbaijan         5171784.        2660620.      2511164.
## 5 Belarus            5391554.        2875716.      2515838.
## 6 Belgium            5901464.        2953915.      2947550.

## # A tibble: 6 × 4
## # Groups:   location_name [6]
##   location_name  pop_both pop_female pop_male
##   <chr>             <dbl>      <dbl>    <dbl>
## 1 Albania        2720353.   1357945. 1362408.
## 2 Armenia        3019674.   1562196. 1457478.
## 3 Austria        8916185.   4522218. 4393968.
## 4 Azerbaijan    10278674.   5136802. 5141872.
## 5 Belarus        9500785.   5072640. 4428145.
## 6 Belgium       11419166.   5800262. 5618903.

Prevalence information

[] matching diseases

Armeni et al. (2019) specified xxx diseases that are connected to the cost of sleep apnea. The problem was to find the corresponding causes in the IHME dataset. We managed to match xxx of the diseases to data in the IHME dataset.

Comparing the prevalence values for Italy in Armeni et al. (2019) with those in the IHME dataset reveals some differences, which will affect the results.

PAF calculation

check formulas and consistency

The Population Attributable Fraction (PAF) describes the share of disease occurrence in the overall population that can be attributed to the exposure. It can be calculated from the risk ratio (\(RR\)) or the odds ratio (\(OR\)) of the disease. Levin (1953) introduced a formula for PAF based on the risk ratio and the prevalence of the risk factor (\(P_e\)). (check source)

\(PAF(\%)=\frac{P_e (RR - 1)}{P_e (RR - 1) + 1} \times 100\)
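As a small worked example of the risk ratio formula above, the following sketch assumes that \(P_e\) is given as a proportion between 0 and 1. The function name `paf_from_rr` and the input values are hypothetical and chosen only for illustration.

```r
# PAF (in percent) from a risk ratio and exposure prevalence.
# Assumption: p_e is a proportion in [0, 1], not a percentage.
paf_from_rr <- function(rr, p_e) {
  stopifnot(rr > 0, p_e >= 0, p_e <= 1)
  excess <- p_e * (rr - 1)
  100 * excess / (excess + 1)
}

# Illustrative inputs: RR = 2 and 20 % of the population exposed.
paf_example <- paf_from_rr(rr = 2, p_e = 0.2)
print(paf_example)  # 100 * 0.2 / 1.2, about 16.7
```

With \(RR = 1\) the exposure carries no excess risk and the function returns 0.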

Calculating PAF from the odds ratio is more complicated. Eide and Heuch (2001) presented a solution for calculating PAF from a known odds ratio. First we need to solve the equation

\(P(D|\sim E)=\frac{P(D)(1-OR)+P(\sim E)+OR \times P(E) \pm \sqrt{\left(P(D)(1-OR)+P(\sim E)+OR \times P(E)\right)^2 - 4P(\sim E)(1-OR)P(D)}}{2P(\sim E)(1-OR)}\)

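where we retain the solution that satisfies \(0 \le P(D|\sim E) \le 1\). Here \(P(D)\), \(P(E)\) and \(P(\sim E)\) are prevalences in percent, while the conditional probability \(P(D|\sim E)\) is a proportion. Now we can calculate PAF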

\(PAF=\frac{(P(D|E)-P(D|\sim E))P(E)}{P(D)} = ... = 1 - \frac{100 \times P(D|\sim E)}{P(D)}\)
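The intermediate steps can be sketched as follows, assuming that \(P(D)\), \(P(E)\) and \(P(\sim E)\) are given in percent (so \(P(E)+P(\sim E)=100\)) and that the conditional probabilities are proportions. By the law of total probability,

\(P(D)=P(D|E)P(E)+P(D|\sim E)P(\sim E)\)

so the numerator becomes

\((P(D|E)-P(D|\sim E))P(E)=P(D)-P(D|\sim E)P(\sim E)-P(D|\sim E)P(E)=P(D)-100 \times P(D|\sim E)\)

and dividing by \(P(D)\) gives \(1-\frac{100 \times P(D|\sim E)}{P(D)}\).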

To calculate PAF from the odds ratio, we created a function that takes the odds ratio, the disease prevalence and the sleep apnea prevalence as input.
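A function of this kind could look roughly like the sketch below. It is not the project's actual implementation: the name `paf_from_or` and its arguments are hypothetical, and for simplicity it assumes that all prevalences are proportions between 0 and 1, so the factor 100 in the formula above disappears and the result is a fraction.

```r
# PAF (as a fraction) from an odds ratio.
# Assumptions: p_d = disease prevalence, p_e = exposure (sleep apnea)
# prevalence, both proportions strictly between 0 and 1.
paf_from_or <- function(or, p_d, p_e) {
  stopifnot(or > 0, p_d > 0, p_d < 1, p_e > 0, p_e < 1)
  if (isTRUE(all.equal(or, 1))) {
    return(0)  # no association, nothing is attributable
  }
  p_ne <- 1 - p_e
  a <- p_ne * (1 - or)
  b <- p_d * (1 - or) + p_ne + or * p_e
  disc <- b^2 - 4 * a * p_d
  # Both roots of the quadratic for P(D | not E).
  roots <- (b + c(-1, 1) * sqrt(disc)) / (2 * a)
  # Retain the root that is a valid probability.
  p_d_ne <- roots[roots >= 0 & roots <= 1]
  stopifnot(length(p_d_ne) == 1)
  1 - p_d_ne / p_d
}

# Illustrative inputs only: OR = 2, disease prevalence 10 %, exposure 20 %.
paf <- paf_from_or(or = 2, p_d = 0.1, p_e = 0.2)
print(paf)  # roughly 0.144
```

The case \(OR = 1\) is handled separately because the denominator of the quadratic solution is then zero.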

Visualizing the sleep apnea costs

[] still picture
[] leaflet for web article?

Building the calculator

We wanted to create a calculator in which the user can change attributes such as sleep apnea prevalence by gender, and the annual costs and prevalences of diseases. The user selects a country, and the application loads country-specific base data for population and prevalences. With this information, the application visualizes the total costs and the annual cost per patient.
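The core calculation behind such a calculator might be sketched as follows. This assumes the common top-down form, in which attributable cost is PAF times the number of disease cases times the annual cost per case; the function name, its arguments and all input values are hypothetical and do not come from the project or from any data set.

```r
# Attributable cost for one condition in one country.
# Assumptions: prevalences are proportions, paf is a fraction,
# annual_cost is the yearly cost per disease case.
attributable_cost <- function(population, disease_prev, paf,
                              annual_cost, osa_prev) {
  cases <- population * disease_prev      # people with the condition
  attributable_cases <- cases * paf       # share attributed to sleep apnea
  total <- attributable_cases * annual_cost
  osa_patients <- population * osa_prev   # people with sleep apnea
  list(total = total, per_patient = total / osa_patients)
}

# Made-up inputs, for illustration only.
res <- attributable_cost(
  population = 1e6, disease_prev = 0.1, paf = 0.15,
  annual_cost = 1000, osa_prev = 0.2
)
print(res)
```

Summing `total` over all matched conditions would give the total annual cost for the country, and user inputs in the application would simply replace the default arguments.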

Database for the Shiny Calculator

For the application to run fast and efficiently, we needed to precalculate as much of the data as we could, so that only the user's inputs affect the calculation.

Final Thoughts

Open data serves the purpose of estimating the costs of sleep apnea. Even though the calculation can be done in many different ways, this is a good start for recognizing the costs generated by sleep apnea patients. The calculation can be made more specific if more detailed data is available. Here we had to use some prevalence values from Italy, because not all disease prevalences were available in the IHME dataset.

The next question is how the costs of sleep apnea would be affected if the diagnosis could be made at an earlier phase and patients could get the help they need. Answering this kind of question obviously requires more detailed and specific data.

References

Armeni et al. (2019) Cost-of-illness study of Obstructive Sleep Apnea Syndrome (OSAS) in Italy

Benjafield et al. (2019) Estimation of the global prevalence and burden of obstructive sleep apnoea: a literature-based analysis

[] more references!