Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks

Peter Klimek, Silke Aichberger, Stefan Thurner (stefan.thurner@meduniwien.ac.at)
Section for Science of Complex Systems; Medical University of Vienna; Spitalgasse 23; A-1090; Austria
Santa Fe Institute; 1399 Hyde Park Road; Santa Fe; NM 87501; USA
IIASA, Schlossplatz 1, A 2361 Laxenburg; Austria

Most disorders are caused by a combination of multiple genetic and/or environmental factors. If two diseases are caused by the same molecular mechanism, they tend to co-occur in patients. Here we provide a quantitative method to disentangle how much genetic and environmental risk factors, respectively, contribute to the pathogenesis of 358 individual diseases. We pool data on genetic, pathway-based, and toxicogenomic disease-causing mechanisms with disease co-occurrence data obtained from almost two million patients. From these data we construct a multilayer network in which nodes represent disorders and links represent either phenotypic comorbidity in patients or the involvement of a certain molecular mechanism. From the similarity of the phenotypic and mechanism-based networks for each disorder we derive a measure that quantifies the relative importance of various molecular mechanisms for a given disease. We find that most diseases are dominated by genetic risk factors, while environmental influences prevail for disorders such as depression, cancers, or dermatitis. We almost never find that more than one type of mechanism is involved in the pathogenesis of a disease.

I Introduction

Multifactorial diseases are disorders that involve multiple disease-causing mechanisms, such as genes acting in concert with environmental factors. They represent one of the most significant challenges that medical research faces today Lim12 (). Disease-causing mechanisms may be (and typically are) involved in more than one disorder Barabasi11 (). If two diseases are related to the same mechanism (say, a single point mutation, SNP, or an altered metabolic pathway), they have a tendency to co-occur in the same patients Rzhetsky07 (); Lee08 (). Here we develop a novel network-medicine approach to quantify the relative contributions of genetic and environmental risk factors for diseases. The central idea of the approach is illustrated in figure 1. We consider a focal disease and two further diseases (circles) and assume that the focal disease co-occurs very frequently in patients with the first of them (thick line), whereas it rarely coincides with the second (thin line). Assume further that the focal disease can arise through two different disease-causing mechanisms, where one mechanism is also responsible for (or involved in) the frequently co-occurring disease and the other mechanism for the rarely co-occurring disease. The first mechanism then explains the observed disease phenotype (the frequent co-occurrence) much better than the second and is therefore a more probable cause of the focal disease. Using this idea we identify the most likely causes and disentangle genetic and environmental disease-causing mechanisms for 358 different disease phenotypes.

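Figure 1: Consider a focal disease and two further diseases (blue circles) and assume that the focal disease co-occurs very frequently with the first of them (thick line) but only in rare cases with the second (thin line). Further, assume that there are two different disease-causing mechanisms for the focal disease, one of which is also known to be involved in the frequently co-occurring disease and the other in the rarely co-occurring disease. Since the focal disease is very often observed together with the first disease, but not with the second, the mechanism shared with the first disease explains the disease phenotype much better than the other.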

We consider the three most important classes of disease-causing mechanisms. (i) Genetic mechanisms relate a disease to a specific defect or alteration in the genome. If one such defect is related to two or more pathologies, then those diseases share a genetic comorbidity. For example, it was shown that the phenotypic comorbidity between schizophrenia and Parkinson’s disease is almost entirely accounted for by SNPs in loci near NT5C2 and HLA-DRA Nalls14 (). (ii) Pathway-based mechanisms are given by a defective pathway (e.g. a metabolic or signal transduction pathway) that is involved in the etiology of the disease. Pathway-based comorbidities indicate that two diseases are related to different defects in the same pathway. For instance, it is known that the PI3K/AKT pathway up-regulates anti-inflammatory cytokines and inhibits proinflammatory cytokines such as IL-1β, IL-6, TNF-α, and IFN-γ that show increased levels in patients with major depressive disorder Kitagishi12 (). Also, inactivation of the PI3K/AKT pathway through the suppression of insulin receptor substrates (IRS) may act as the underlying mechanism for the metabolic syndrome (i.e. the frequent concurrence of metabolic disorders such as hypertension, obesity, or diabetes) Guo14 (). Indeed, depression has been identified as an important comorbidity of the metabolic syndrome in various cross-sectional surveys Dunbar08 (); Klimek15 (). Finally, (iii) toxicogenomic mechanisms characterize diseases caused by exposure to chemical substances that change the activity of certain genes. Two diseases share a toxicogenomic comorbidity if they are related to different genes that interact with the same toxic substance. For example, the immunosuppressive chemical methoxychlor is used as a pesticide and can cause atopic dermatitis, possibly by expressing IL-13 in the skin Zhu11 (). Methoxychlor also promotes the epigenetic transgenerational inheritance of kidney disease. Upon prenatal exposure to methoxychlor during fetal gonadal development, offspring show an increased incidence of adult-onset kidney disease that was related to differentially DNA methylated regions Manikkam14 (). Atopic dermatitis is indeed associated with the nephritic syndrome Darlenski14 (). There may be cases where the same diseases share genetic and environmental risk factors. We regard such cases as genetic comorbidities, because the genetic link represents a direct mechanism that explains a corresponding phenotypic comorbidity without the need for additional, environmental influences.

The construction and analysis of networks of diseases that are connected by different comorbidity relations has recently led to substantial progress in our understanding of the etiologies of various diseases Barabasi11 (); Pawson08 (); Zanzoni09 (). For instance, gene-disease associations collected in the Online Mendelian Inheritance in Man (OMIM) database OMIM () can be used to construct a network where diseases are linked if they are related to the same mutations in one or several genes Goh07 (). This network allowed for the identification of clusters of diseases, such as cancers, which are held together by a small number of genes Feldman08 (). Another approach is to connect diseases if they are both associated with enzymes that catalyze reactions in the same pathway Lee08 (). Protein-protein interaction data can be integrated with toxicogenomics data to construct a network where two diseases are linked if they are both caused by exposure to the same chemical, which has led to the successful identification of novel chemical-protein associations Audouze10 (). It has recently been shown that diseases that are comorbid in the population tend to be related to clusters of proteins that are close to each other in the human protein-protein interaction network Menche15 (). Different types of genomic, metabolomic, and proteomic disease-disease relations have also been combined to form an “integrated disease network” Sun14a (); Sun14b (). In phenotypic comorbidity networks, nodes correspond to disease phenotypes that are linked if the two diseases tend to co-occur in the same patients Hidalgo09 (). Chronic, multifactorial disorders often assume the role of hubs in such networks (i.e. nodes that are strongly connected with a large number of other diseases) Chmiel14 ().

Here we construct a generalized network that combines phenotypic comorbidity networks with those given by different types of shared disease-causing mechanisms (genes, pathways, or exposure to chemicals), the human disease multiplex network (HDMN) (see figure 2). Multiplex networks are given by a set of nodes connected by multiple sets of links Boccaletti14 (); Kivela14 (). One set of links in the HDMN corresponds to phenotypic comorbidity relations, whereas the other sets of links represent different classes of genetic or environmental mechanisms. We quantify how similar the phenotypic links of a particular disease are to its links in other layers in the HDMN. This allows us to derive scores for each disease of how well its phenotypic comorbidities can be explained by genetic, pathway-based, or toxicogenomic mechanisms. In this sense the derived scores quantify “how genetic” or how strong environmental influences are for a given disease.

II Data and Methods

Figure 2: Illustration of the HDMN for a given disease. In the HDMN, nodes correspond to disease phenotypes that are connected by four different types of links, which can be visualized as network layers. The first layer encodes phenotypic comorbidity relations. The link weights in this layer are given by the comorbidity strengths that measure how often two diseases co-occur within the same patients, i.e. the numbers of patients with either the one disease (red individuals) or the other (blue) are compared to the numbers of patients with both diseases (green). The second layer contains genetic comorbidities (blue links), where two different phenotypes (illustrated as blue and red individuals) are related to the same genetic defect or alteration. The third layer contains pathway-based comorbidities (green links). Here, two different alterations occur in a pathway that is involved in two or more different diseases. Finally, the fourth layer is given by toxicogenomic comorbidities (red links), where a chemical substance is known to trigger different disease-causing mechanisms. The given disease is shown as a red node in the HDMN, together with other phenotypes (blue nodes) that are in its neighborhood in at least one of the layers. The relative comorbidity risks measure to which extent shared disease-causing mechanisms between two diseases lead to their phenotypic comorbidity. The relative comorbidity risk of a disease in a molecular layer is the average comorbidity strength of all its neighbors in that layer, normalized to the average comorbidity strength over all phenotypes that share no disease-causing mechanism of any type with it.

II.1 Data

Phenotypic disease-disease associations were obtained from a database of the Main Association of Austrian Social Security Institutions that contains pseudonymized claims data of all persons receiving inpatient care in Austria between January 1st, 2006 and December 31st, 2007 Chmiel14 (); Thurner13 (). The data contains age, sex, main- and side-diagnoses (ICD10 codes) icd () for each hospital stay from patients. Not all ICD codes represent disorders, they may also indicate general examinations, injuries, collections of unspecific symptoms or disorders that are not classified elsewhere. Unspecific codes are excluded and we work with the remaining diagnoses on the three-digit ICD levels in chapters (i.e. first-digit-levels) -, labeled by the capital index . We use the words disease, disorder and diagnosis interchangeably whenever referring to an ICD entry.

Molecular disease-disease associations were obtained from molecular data of three types, namely purely genetic associations and two different types of environmental associations. (i) Genetic disease associations were extracted from the OMIM dataset OMIM (), which provides a collection of gene-phenotype relationships. For instance, it currently contains more than 30 genes that are known to play a role in type 2 diabetes, e.g. the aforementioned IRS 2 gene. (ii) Pathway-based disease associations were taken from the UniProtKB database uniprot (); Croft14 (). The UniProtKB database contains protein sequence and functional information that is cross-referenced with the pathways in which the proteins play a role and with the proteins’ involvement in diseases. For instance, a UniProt entry for the PI3-kinase protein cross-references about 40 different pathways, including the PI3K/AKT activation pathway, in addition to three different disease phenotypes from the OMIM dataset. (iii) Toxicogenomic disease associations were obtained from the Comparative Toxicogenomics Database (CTD) Davis14 (). Entries in the CTD correspond to chemicals that are linked to diseases caused by exposure to the substance and to disease genes that are differentially expressed under exposure to it. For instance, according to these data the chemical methoxychlor is involved in more than ten different diseases, including atopic dermatitis, where its influence is mediated by eight different genes, including IL-13. To link the molecular to the phenotypic data, a mapping between ICD10 and OMIM disease identifiers had to be established. To obtain such mappings we compiled three different data sources, namely the Human Disease Ontology database Osborne09 (), OrphaNet Ayme07 (), and Wikipedia (https://en.wikipedia.org/wiki/ICD-10, retrieved 04/30/2015). For more information on data extraction and the construction of the ICD10-OMIM mappings see SI, Text S1.
Each of the three molecular datasets can be represented by a bipartite network, one for each class of mechanisms (genetic, pathway-based, or toxicogenomic). Its first index labels disorders (ICD10 codes) and its second index j labels unique genes, pathways, or chemicals, depending on the class. An entry of the bipartite network is set to one if there exists at least one relation between the disease and gene/pathway/chemical j, and to zero otherwise.

Heritability and drug approvals. Information on the broad-sense heritability (see SI, Text S2) of individual diseases was taken from the SNPedia database SNPedia (). As a source for drug approvals we used the Drugs@FDA database (http://www.fda.gov/drugsatfda, retrieved 01/07/2016), from which we obtained FDA-approved brand names and approval dates for all drug products approved since 1939. These drugs were mapped via known molecular targets to diseases Yildirim07 () to obtain, for each disease, the number of newly approved drug products of the last twenty years.

II.2 Construction of the HDMN

We constructed a multilayer network, the HDMN, that encodes disease-disease associations of four different types. This network contains one phenotypic layer and three layers that encode molecular disease-disease associations. For each pair of diseases we counted the number of patients that have both diseases, only one of the two diseases, or neither disease (a bar over a disease index denotes “not”). Entries in the phenotypic layer are then given by the contingency coefficient computed from these four patient counts.


The contingency coefficient measures the phenotypic comorbidity strength between two diseases. The higher (lower) its value, the higher (lower) the probability that a patient with one of the diseases also suffers from the other. A value of zero indicates that occurrences of the two diseases are independent of each other. We set the coefficient to zero whenever the patient numbers are too low to allow for a reliable estimate, i.e. whenever one of the four patient counts for the pair (both diseases, only one of the two, or neither) was below 5. An age-dependent version of the phenotypic disease network is computed separately for each age interval. Patients fall within one of 11 age groups, 0y-7y, 8y-15y, …, 80y-87y.
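As an illustration, the comorbidity strength of a single disease pair can be sketched in a few lines of Python. The sketch assumes that the contingency coefficient takes the standard phi form for a 2x2 table and that unreliable estimates are set to zero; the function name `comorbidity_strength` and its argument names are hypothetical.

```python
import math


def comorbidity_strength(n_both, n_only_i, n_only_j, n_neither, min_count=5):
    """Phi-type contingency coefficient from the 2x2 patient table of a disease pair.

    Assumption: the phi form is used, and the value is set to 0.0 when any
    of the four cell counts is below `min_count`.
    """
    if min(n_both, n_only_i, n_only_j, n_neither) < min_count:
        return 0.0
    n_i = n_both + n_only_i          # patients with disease i
    n_not_i = n_only_j + n_neither   # patients without disease i
    n_j = n_both + n_only_j          # patients with disease j
    n_not_j = n_only_i + n_neither   # patients without disease j
    numerator = n_both * n_neither - n_only_i * n_only_j
    denominator = math.sqrt(n_i * n_not_i * n_j * n_not_j)
    return numerator / denominator
```

With 100 patients having both diseases, 10 having only the one, 10 having only the other, and 100 having neither, the sketch returns 9/11 (about 0.82), whereas a table with any cell count below 5 yields zero.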

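Layers 2, 3, and 4 of the HDMN encode the three different types of molecular associations. Each of these layers is obtained from the corresponding bipartite network: two diseases are linked in a layer if they are related to at least one common gene, pathway, or chemical, and, for the pathway-based and toxicogenomic layers, if they are in addition not linked in the genetic layer.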


Note that this definition ensures that associations between two pathologies in the pathway-based and toxicogenomic layers are indeed due to shared pathways or exposure to the same chemical and cannot be explained by direct genetic causes (i.e. the two pathologies are not linked in the genetic layer).
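A minimal sketch of how such layers could be obtained from the bipartite data is given below. It assumes that two diseases are linked whenever they share at least one gene, pathway, or chemical, and that pairs already linked in the genetic layer are removed from the other two layers. The helper names `project_layer` and `build_molecular_layers` are hypothetical, and the inputs are plain dictionaries that map a disease label to its set of associated items.

```python
def project_layer(disease_to_items):
    """Link two diseases if they share at least one item (gene, pathway, or chemical)."""
    diseases = sorted(disease_to_items)
    links = set()
    for pos, first in enumerate(diseases):
        for second in diseases[pos + 1:]:
            if disease_to_items[first] & disease_to_items[second]:
                links.add((first, second))
    return links


def build_molecular_layers(genes, pathways, chemicals):
    """Return the three molecular layers as sets of undirected (sorted) disease pairs."""
    genetic = project_layer(genes)
    # Assumption: pairs with a direct genetic link are excluded from the other layers.
    pathway = project_layer(pathways) - genetic
    toxicogenomic = project_layer(chemicals) - genetic
    return {"genetic": genetic, "pathway": pathway, "toxicogenomic": toxicogenomic}
```

Dropping the two set differences in `build_molecular_layers` corresponds to allowing comorbidities that are genetic and pathway-based or toxicogenomic at the same time.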

The numbers of non-isolated nodes and links for each layer are shown in the SI, table S1. Diseases are not included in the HDMN if they are isolated in every molecular layer. Links in the phenotypic layer are weighted and typically close to zero Hidalgo09 (); Chmiel14 (). The numbers of non-isolated nodes are between 200 and 300 for the molecular layers, whereas there are more than 350 nodes in the phenotypic layer.

II.3 Disease risks from shared pathophysiological mechanisms

We introduce a relative risk indicator that measures how similar the phenotypic comorbidities of a disease are to its genetic, pathway-based, or toxicogenomic comorbidities. In this sense it quantifies how much a specific class of disease-causing mechanisms contributes to the phenotype. The relative comorbidity risk is the quotient of the average comorbidity strength of all diseases that are linked to the disease in the given layer and the average comorbidity strength of those diseases that are linked to it in none of the pathophysiological layers.


Here the degree of a disease in a layer is its number of links in that layer, and the control set of links for a disease contains all of its links to diseases with which it is connected in none of the molecular layers. For convenience we also defined the logarithmic relative comorbidity risk. A value close to zero indicates that the presence of pathophysiological comorbidities of the given type has no relation whatsoever to the actual, phenotypic comorbidities of the disease. With increasingly positive values, the probability increases that the pathophysiological comorbidities of the disease are indeed observed in the population.
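The quotient described above can be sketched as follows. This is an illustration under stated assumptions, not the exact implementation: comorbidity strengths are stored in a dictionary keyed by unordered disease pairs, layers are sets of disease pairs, and the natural logarithm is used because the base is not fixed by the text. All function names are hypothetical.

```python
import math


def neighbors(disease, links):
    """Diseases connected to `disease` by an undirected link in `links`."""
    return {b if a == disease else a for a, b in links if disease in (a, b)}


def relative_comorbidity_risk(disease, strengths, layers, layer_name):
    """Mean strength over layer neighbors divided by mean strength over controls.

    strengths: dict mapping frozenset({d1, d2}) -> phenotypic comorbidity strength
    layers:    dict mapping layer name -> set of (d1, d2) links
    Returns None if either group is empty or the control mean is zero.
    """
    in_layer = neighbors(disease, layers[layer_name])
    in_any_layer = set()
    for links in layers.values():
        in_any_layer |= neighbors(disease, links)
    all_diseases = {d for pair in strengths for d in pair}
    control = all_diseases - in_any_layer - {disease}

    def mean_strength(group):
        total = sum(strengths.get(frozenset((disease, d)), 0.0) for d in group)
        return total / len(group)

    if not in_layer or not control:
        return None
    control_mean = mean_strength(control)
    if control_mean == 0:
        return None
    return mean_strength(in_layer) / control_mean


def log_risk(risk):
    """Natural logarithm of the risk (the base is an assumption); None if undefined."""
    return math.log(risk) if risk is not None and risk > 0 else None
```

A disease whose layer neighbors have an average strength three times that of its control diseases obtains a relative risk of 3 and a positive logarithmic risk; equal averages give a logarithmic risk of zero.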

Note that the relative comorbidity risk can be large due to a single comorbidity of the given type with a very high phenotypic comorbidity strength, or because there is a large number of comorbidities with only moderately increased comorbidity strengths. In particular, it might favor diseases that have a large number of connections of the given type to diseases that are physiologically very similar and that have similar ICD10 diagnosis codes, see Text S1. To adjust for these biases we rescaled the comorbidity risk by the node degree to obtain a re-scaled comorbidity risk that favors diseases with a smaller number of highly relevant disease-causing mechanisms.
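The effect of the rescaling can be illustrated with a short sketch. It assumes that the rescaling divides the logarithmic comorbidity risk by the degree of the disease in the respective layer; this form, like the function name `rescaled_risk`, is an assumption made for illustration.

```python
def rescaled_risk(log_risk, degree):
    """Per-link comorbidity risk: logarithmic risk divided by the layer degree.

    Assumption: rescaling is a plain division by the degree. Returns None
    for isolated nodes, for which no per-link value is defined.
    """
    if degree == 0:
        return None
    return log_risk / degree
```

Under this assumption a disease with a single link and a logarithmic risk of 1.0 keeps a per-link value of 1.0, whereas a disease with ten links and a logarithmic risk of 2.0 drops to 0.2, so the measure favors few, highly relevant mechanisms.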

We performed two different statistical tests to evaluate whether the comorbidity risk of a disease is significantly greater than zero. First, a Wilcoxon rank sum test for equal medians of two samples was performed. The samples were given by the set of comorbidity strengths of all diseases that share a link of the given type with the disease under consideration, and by the control set. The p-value was obtained from the one-sided Wilcoxon rank sum test against the alternative hypothesis that the median of the control set is smaller than the median of the set of linked diseases. A Benjamini-Hochberg multiple hypothesis testing correction was applied on each layer using an exploratory threshold for the false discovery rate (which corresponds to thresholds for the adjusted p-values in the range between 0.1 and 0.05). Second, we performed a randomization test in which each molecular layer is replaced by a random permutation of its elements. The randomized risk was computed from equation 3 with the molecular layer replaced by its randomized counterpart. For a given class of mechanisms, the randomized layer has the same number of nodes and links as the original layer, but is otherwise completely randomized.
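The multiple hypothesis testing correction can be sketched in pure Python as follows. This is the standard Benjamini-Hochberg step-up adjustment; the function names are hypothetical, and the false discovery rate is left as a free parameter.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(p_values, fdr):
    """Flag the tests whose adjusted p-value does not exceed the chosen FDR."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]
```

In this setting the function would be applied once per layer to the p-values of all diseases in that layer, so that each layer is corrected separately.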

III Results and Discussion

The estimates of the most probable disease causes can be visualized in a three-dimensional representation where the axes show the genetic, pathway-based, and toxicogenomic comorbidity risks. Each disease corresponds to a point whose coordinates are its three comorbidity risks, see figure 3(a) and its projections onto the three coordinate planes in panels (b)-(d). The size of each marker is proportional to the frequency of the disease. We set the comorbidity risk to zero for all diseases where it is not significantly different from zero after the multiple hypothesis testing correction. The majority of disorders are clearly dominated by genetic risk factors (many points are close to the genetic axis). Some disorders cluster around the pathway-based and toxicogenomic axes, indicating purely pathway-based and toxicogenomic origins. Intriguingly, there is not a single disease that has a significant pathway-based and toxicogenomic comorbidity risk at the same time, see figure 3(d). However, a number of disorders with significant pathway-based or toxicogenomic risks also have significant genetic contributions, see figures 3(b) and (c). This can also be seen in table 1, where for instance the chronic nephritic syndrome ranks high in genetic and toxicogenomic comorbidity risks.

The per-link contributions of the three types of pathophysiological mechanisms, i.e. the re-scaled comorbidity risks, are shown in figures 3(e)-(h). Almost all disorders show one dominant comorbidity risk contribution, i.e. they cluster around a single axis. Again, most diseases show large genetic risks, while some cluster around the pathway-based and toxicogenomic axes. In the supporting information, SI Figure 2, we show results for the case where we allow comorbidities that are at the same time genetic and pathway-based/toxicogenomic (i.e. we drop the second condition for the pathway-based and toxicogenomic layers in equation 2). There are now disorders with both significant pathway-based and significant toxicogenomic comorbidity risks. For these comorbidities, however, there also exists a direct genetic mechanism that may account for the phenotypic comorbidities.

Figure 3: Classification of diseases (circles) according to the dominant causes of their phenotypic comorbidities. Results are shown for (a-d) the relative comorbidity risks and (e-h) their re-scaled versions. Circle size is proportional to the number of disease occurrences. Re-scaling the risks by the degrees leads to almost perfect clustering of the diseases around one of the axes. The per-link contribution to the relative comorbidity risk is always dominated by one specific mechanism. Only a comparably small number of diseases cluster around the toxicogenomic axis. The comorbidity risks for most pathologies are dominated by genetic disease-causing mechanisms.

Table 1 shows the diseases with the largest genetic, pathway-based, or toxicogenomic comorbidity risks, ranked by statistical significance. The top genetic diseases include schizo-affective and delusional disorders, as well as schizophrenia. Different forms of osteoarthritis and chronic bronchitis, as well as nephrotic and nephritic syndromes, also show high genetic comorbidity risks. The top pathway-based diseases are major depressive disorders, endocrine disorders such as obesity and amyloidosis, diseases of the nervous system including epilepsy and extrapyramidal and movement disorders, as well as disorders of bone density and multiple myeloma. The top toxicogenomic diseases include various forms of dermatitis and other skin diseases such as lichen simplex chronicus and prurigo, but also aortic aneurysms and the chronic nephritic syndrome.

Schizophrenia is indeed a highly heritable disorder that is associated with more than one hundred gene loci Ripke14 (). The large pathway-based risk for depressions is corroborated by strong and supposedly bi-directional associations between the metabolic syndrome and depression, which have been a long-standing puzzle in epidemiological studies Pan12 (). Depressions also exhibit strongly significant genetic comorbidity risks, consistent with the finding of a gene-by-environment interaction where individuals with a functional polymorphism in the promoter region of the serotonin transporter (5-HTT) gene exhibited more depressive symptoms in relation to stressful life events Caspi03 (). The high toxicogenomic risks for aortic aneurysms are in line with the effects of chemicals such as nicotine and prostaglandin on related disease-genes Sakalihasan05 ().

rank | genetic | relative comorbidity risk
1 | F25, Schizo-affective disorders | 2.4
2 | F20, Schizophrenia | 2.4
3 | M19, Osteoarthritis (unspecified) | 2.9
4 | N04, Nephrotic syndrome | 2.2
5 | J41, Simple, mucopurulent chronic bronchitis | 2.1
6 | J42, Chronic bronchitis (unspecified) | 2.0
7 | M15, Polyosteoarthritis | 2.6
8 | N03, Chronic nephritic syndrome | 2.3
9 | F22, Delusional disorders | 2.6
10 | M18, Osteoarthritis (first carpometacarpal joint) | 2.5

rank | pathway-based | relative comorbidity risk
1 | F32, Major depressive disorder, single episode | 1.1
2 | F33, Major depressive disorder, recurrent | 0.81
3 | M85, Disorders of bone density and structure | 1.8
4 | G40, Epilepsy and recurrent seizures | 0.65
5 | E66, Overweight and obesity | 0.83
6 | E85, Amyloidosis | 0.58
7 | G25, Other extrapyramidal and movement disorders | 0.66
8 | H90, Conductive and sensorineural hearing loss | 0.56
9 | M21, Other acquired deformities of limbs | 1.3
10 | C90, Multiple myeloma, plasma cell neoplasms | 0.90

rank | toxicogenomic | relative comorbidity risk
1 | I71, Aortic aneurysm and dissection | 0.75
2 | L21, Seborrheic dermatitis | 0.65
3 | L24, Irritant contact dermatitis | 0.99
4 | K52, Gastroenteritis and colitis | 0.64
5 | N03, Chronic nephritic syndrome | 1.7
6 | L20, Atopic dermatitis | 1.2
7 | L28, Lichen simplex chronicus and prurigo | 0.69
8 | L30, Unspecified dermatitis | 0.58
9 | I89, Noninfective disorders of lymphatic vessels and nodes | 0.84
10 | G91, Hydrocephalus | 0.96

Table 1: Top 10 diseases in every class of disease-causing mechanisms and their relative comorbidity risks, ranked by the significance of their overlap with the phenotypic disease layer.

Since phenotypic disease networks are known to undergo large changes in their topology as a function of the age of the underlying patient cohorts Chmiel14 (), we first clarified how the relative comorbidity risks depend on patient age. The age-dependent relative risks were computed using equation 3, replacing the phenotypic layer with its age-dependent counterpart. Results for the relative comorbidity risks averaged over all diseases are shown in figure 4(a). Note that this average is also taken over diseases with comorbidity risks that are not significantly different from zero. The average genetic comorbidity risk is substantially higher than the pathway-based or toxicogenomic risks and assumes values above 1 for ages between 30 and 90. Effects are considerably smaller for the average pathway-based (toxicogenomic) comorbidity risks, which reach values around 0.5 at ages around 30 (50). These age differences in the peaks of the environmental comorbidity risks are driven by the age dependence of the prevalences of the diseases that provide the most dominant contributions to the respective averages. In all cases, the results clearly exceed the expectation values of the randomized risks obtained from the randomization test. Note that we have confirmed that the dominance of genetic disorders cannot be a simple consequence of the exclusion of genetic comorbidities in the other molecular layers in equation 2. Removing this constraint would increase the average environmental contributions by a factor of about 1.5, while the genetic comorbidity risks exceed them by a factor between four and five. From now on we consider only the age-independent HDMN.

Figure 4(b) shows how much genetic, pathway-based, and toxicogenomic risks contribute to the observed comorbidities for subgroups of diseases that are given by the chapters of the ICD10 classification. Clear differences between groups of diseases are revealed. Genetically caused comorbidities include mental disorders, disorders of the digestive system, but also susceptibility to infections. Genetic mechanisms are least relevant for disorders of the eye, ear, skin, and for cancers. Pathway-based comorbidity risks are largest for, again, mental disorders and diseases of the genitourinary system. This shows that the group of mental disorders comprises heterogeneous phenotypes that have either genetically caused or pathway-based comorbidities. Toxicogenomic comorbidity risks are largest for diseases of the skin, the genitourinary and the respiratory system, as well as for congenital malformations.

Figure 4: Contributions of genetic, pathway-based, and toxicogenomic comorbidity risks. (a) The genetic risks clearly exceed the pathway-based and toxicogenomic risks across all ages of patients. The results for all three types of mechanisms exceed their expectations from the randomization test (markers connected by dotted lines, error bars show the standard deviation over 5,000 randomizations). (b) Averages of the relative risks are shown for the chapters of the ICD10 classification; the solid vertical lines show the values of genetic (blue), pathway-based (green), and toxicogenomic (red) risks averaged over all diseases. Diseases of the digestive system, mental disorders, and infections show the highest genetically caused comorbidity risk, whereas cancers and diseases of the skin, eye, and ear show the lowest genetic risks. Pathway-based contributions are also highest for mental disorders, and toxicogenomic contributions assume their maximum for diseases of the genitourinary system.

The “nurture index” quantifies to which extent the comorbidities of a phenotype are caused by environmental, i.e. pathway-based or toxicogenomic, mechanisms.


Figure 5 shows results for (a) the heritability and (b) the number of new drug approvals as a function of the nurture index. Each circle in figure 5 corresponds to a disease phenotype, labeled by its ICD10 code. The colors of the circles refer to their chapter in the ICD classification. The highest values of the nurture index are found for diseases of the genitourinary system (N03 and N05 nephritic syndrome, N02 hematuria, N08 glomerular disorders), depressions (F32, F33), several cancers (C84 T/NK-cell lymphoma, C74 adrenal gland, C61 prostate), as well as bronchiectasis (J47). Figure 5(a) shows that there is a significant negative correlation between the nurture index and the broad-sense heritability of a disorder. This corroborates that the nurture index is indeed related to the plasticity of a phenotype, i.e. it increases with the influence of environmental risk factors. There is also a strong, significant negative correlation between the logarithms of the nurture index and of the number of newly approved drug products, shown in figure 5(b). We found this result to be very robust for a large variety of choices of the time span over which drug approvals are counted, ranging from five years upwards. Note that the heritability and the number of drug approvals show no significant correlation with each other. This indicates a significant bias in pharmaceutical R&D that favors market placements of drugs that target disorders with low environmental risk factors. It has indeed been shown that the success rates for drug development vary dramatically among disease areas Nelson15 (). These rates have been found to increase with the existence of direct genetic evidence, which in particular applies to diseases of the musculoskeletal system and infections, which we also identified as predominantly genetic in figure 4(b).

Figure 5: Heritability (a) and the number of newly developed drugs (b) are negatively correlated with the relevance of environmental risk factors for diseases. Each circle corresponds to one disease phenotype, labeled by its three-digit ICD10 code. Both quantities are shown as a function of the nurture index. Colors indicate the main ICD chapter to which the diseases belong. We observe particularly high values of the nurture index for diseases of the genitourinary system, various cancers, depression, and bronchiectasis.

IV Conclusions

We developed a novel approach to quantitatively disentangle the most relevant genetic or environmental disease-causing mechanisms for a large number of particular disorders. This has become possible through recent advances in observing networks of phenotypic comorbidity relations with unprecedented precision Hidalgo09 (); Chmiel14 (). We considered three different classes of mechanisms that can be at the core of these observed comorbidities, namely genetic, pathway-based, and toxicogenomic mechanisms that cause more than one disorder. By constructing the HDMN we have been able to identify the most probable causes for 358 different phenotypes by measuring the overlap between phenotypic and pathophysiological comorbidities, i.e. the relative comorbidity risks. We find that the different environmental disease-causing mechanisms do not mix; we found no pathologies that have significant pathway-based and toxicogenomic comorbidity risk contributions at the same time. While genetic risk factors dominate for most of the studied diseases, we identify a number of disorders with significant environmental contributions, which typically coincide with low heritability and lower rates of successful market placements of drugs.

Our approach cross-validates pathophysiological mechanisms by testing whether their predicted comorbidities are indeed directly observed in the population. Moreover, we can rule out certain types of disease-causing mechanisms when the comorbidities that they predict are not observed. The methodology developed here can be extended to decide on a quantitative basis whether the comorbidities predicted by a particular individual pathophysiological mechanism are also phenotypically relevant. It can thus be used as a novel, data-driven way to validate potential drug targets.

IV.1 Acknowledgments

We are very grateful to Jörg Menche for stimulating discussions and acknowledge financial support from the European Commission, FP7 project MULTIPLEX No. 317532.


  • (1) Lim SS, Vos T, Flaxman AD, Danaei G, Shibuya K, et al, A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 2012; 380(9859): 2224–60.
  • (2) Barabási A-L, Gulbahce N, Loscalzo J, Network medicine: A network-based approach to human disease. Nat Rev Genet 2011; 12(1): 56–68.
  • (3) Rzhetsky A, Wajngurt D, Park N, Zheng T, Probing genetic overlap among complex human phenotypes. PNAS 2007; 104: 11694–9.
  • (4) Lee D-S, Park J, Kay KA, Christakis NA, Oltvai ZN, et al, The implications of human metabolic network topology for disease comorbidity. PNAS 2008; 105: 9880–5.
  • (5) Nalls MA, Saad M, Noyce AJ, Keller MF, Schrag A, et al, Genetic comorbidities in Parkinson’s disease. Hum Mol Genet 2014; 23(3): 831-41.
  • (6) Kitagishi Y, Kobayashi M, Kikuta K, Matsuda S, Roles of PI3K/AKT/mTOR pathway in cell signaling of mental illnesses. Depression Research and Treatment 2012; 2012: Article ID 752563, 8p.
  • (7) Guo S, Insulin signaling, resistance, and metabolic syndrome: insights from mouse models into disease mechanisms. J Endocrinol 2014; 220(2): T1-23.
  • (8) Dunbar JA, Reddy P, Davis-Lameloise N, Philpot B, Laatikainen T, et al, Depression: an important comorbidity with metabolic syndrome in a general population. Diabetes Care 2008; 31(12): 2368-73.
  • (9) Klimek P, Kautzky-Willer A, Chmiel A, Schiller-Frühwirt I, Thurner S, Quantification of diabetes comorbidity risks across life using nation-wide big claims data. PLoS Computational Biology 2015; 11(4): e1004125.
  • (10) Zhu Z, Oh MH, Yu J, Liu YJ, Zheng T, The role of TSLP in IL-13-induced atopic march. Sci Rep 2011; 1: 23.
  • (11) Manikkam M, Haque M, Guerrero-Bosagna C, Nilsson EE, Skinner MK, Pesticide methoxychlor promotes the epigenetic transgenerational inheritance of adult-onset disease through the female germline. PLoS ONE 2014; 9(7): e102091.
  • (12) Darlenski R, Kazandjieva J, Hristakieva E, Fluhr J, Atopic dermatitis as a systemic disease. Clinics in dermatology 2014; 32(3): 409-13.
  • (13) Pawson T, Linding R, Network medicine. FEBS Lett 2008; 582: 1266-70.
  • (14) Zanzoni A, Soler-López M, Aloy P, A network medicine approach to human disease. FEBS Lett 2009; 583: 1759–65.
  • (15) Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD).
  • (16) Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, et al, The human disease network. PNAS 2007; 104: 8685–90.
  • (17) Feldman I, Rzhetsky A, Vitkup D, Network properties of genes harboring inherited disease mutations. PNAS 2008; 105: 4323–8.
  • (18) Audouze K, Sierakowska Juncker A, Roque FJSSA, Krysiak-Baltyn K, Weinhold N, et al, Deciphering diseases and biological targets for environmental chemicals using toxicogenomics networks. PLoS Comput Biol 2010; 6(5): e1000788.
  • (19) Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, Barabási A-L, Uncovering disease-disease relationships through the incomplete interactome. Science 2015; 347(6224): 1257601.
  • (20) Sun K, Buchan N, Larminie C, Pržulj N, The integrated disease network. Integr Biol 2014; 6: 1069–79.
  • (21) Sun K, Goncalves JP, Larminie C, Pržulj N, Predicting disease associations via biological network analysis. BMC Bioinformatics 2014; 15: 304–316.
  • (22) Hidalgo CA, Blumm N, Barabási A-L, Christakis NA, A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 2009; 5(4): e1000353.
  • (23) Chmiel A, Klimek P, Thurner S, Spreading of diseases through comorbidity networks across life and gender. New Journal of Physics 2014; 16: 115013.
  • (24) Boccaletti S, Bianconi G, Criado R, del Genio C, Gómez-Gardeñes J, et al, The structure and dynamics of multilayer networks. Physics Reports 2014; 544: 1–122.
  • (25) Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA, Multilayer networks. Journal of Complex Networks 2014; 2(3): 203–271.
  • (26) Thurner S, Klimek P, Szell M, Duftschmid G, Endel G, et al, Quantification of excess-risk for diabetes when born in times of hunger, in an entire population of a nation, across a century. PNAS 2013; 110(12): 4703–7.
  • (27) http://apps.who.int/classifications/icd10/browse/2010/en, retrieved 01/18/2016.
  • (28) The UniProt Consortium, Activities at the Universal Protein Resource. Nucleic Acids Research 2014; 42: D191–8.
  • (29) Croft D, Mundo AF, Haw R, Milacic M, Weiser J, et al, The reactome pathway knowledgebase. Nucleic Acids Research 2014; 42: D472–7.
  • (30) Davis AP, Grondin CJ, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, et al. The comparative toxicogenomics database’s 10th year anniversary: update 2015. Nucleic Acids Research 2014 Oct 17.
  • (31) Osborne JD, Flatow J, Holko M, Lin SM, Kibbe WA, et al, Annotating the human genome with disease ontology. BMC Genomics 2009; 10(Suppl1): S6.
  • (32) Aymé S, Schmidtke J, Networking for rare diseases: a necessity for Europe. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2007; 50(12): 1477–83.
  • (33) Cariaso M, Lennon G, SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Res 2012; 40: D1308–12.
  • (34) Yildirim MA, Goh KI, Cusick ME, Barabási AL, Vidal M, Drug-target network. Nat Biotechnol 2007; 25(10): 1119–26.
  • (35) Ripke S, Neale BM, Corvin A, Walters JTR, Farh K-H, et al, Biological insights from 108 schizophrenia-associated genetic loci. Nature 2014; 511(7510): 421–7.
  • (36) Pan A, Keum N, Okereke OI, Sun Q, Kivimaki M, et al, Bidirectional association between depression and metabolic syndrome. Diabetes Care 2012; 35(5): 1171–80.
  • (37) Caspi A, Sugden K, Moffitt TE, Taylor A, Craig IW, et al, Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 2003; 301(5631): 386–9.
  • (38) Sakalihasan N, Limet R, Defawe OD, Abdominal aortic aneurysm. The Lancet 2005; 365(9470): 1577–89.
  • (39) Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, et al, The support of human genetic evidence for approved drug indications. Nature Genetics 2015; 47: 856-60.

Appendix A Supplementary Information

A.1 Text S1: Further details on data extraction and MIM-ICD10 mappings

Genetic disease associations are extracted from the Online Mendelian Inheritance in Man (OMIM) dataset, from which we obtained a list of 4,847 associations between disorders (phenotype MIM numbers) and genes OMIM (). We included only those phenotype-gene associations for which the molecular basis of the disorder is known (i.e. a phenotype mapping key with value 3 in the OMIM dataset). Metabolic disease associations stem from the UniProtKB database, which provides a list of 3,020 proteins that are known to be involved in disorders in humans (given by phenotype MIM numbers) uniprot (). The REACTOME database cross-references these proteins with the pathways in which they occur Croft14 (). Toxicogenomic disease associations are obtained from the Comparative Toxicogenomics Database as a list of 4,925 associations between disorders (phenotype MIM numbers or MeSH ID) and chemicals Davis14 (). Here we only use curated disease-chemical associations, i.e. those for which direct, literature-curated evidence exists.

To obtain mappings from MIM phenotype numbers and MeSH codes to the ICD10 classification we combine three different data sources. In addition to the Human Disease Ontology database Osborne09 () and mappings provided by OrphaNet Ayme07 (), we extracted mappings by crawling a disease index page from Wikipedia (https://en.wikipedia.org/wiki/ICD-10, retrieved 04/30/2015). These three sources yield 85,303 MeSH-ICD10 and 5,498 MIM-ICD10 associations. Aggregated to the three-digit ICD10 level, of the ICD10 codes can be mapped to MeSH codes and to MIM numbers. This lower number of successfully translated MIM numbers is partly due to the fact that a molecular basis is not known, or not even relevant, for all diseases.
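The merging and aggregation step described above can be sketched as follows. This is an illustrative sketch only: the function names, the input format (lists of identifier/ICD10 pairs), and the toy data are assumptions, and the actual extraction pipeline is not specified in the text. Only MIM number 114500 and the codes C18-C21 are taken from this document.

```python
def three_digit(icd10_code):
    """Truncate an ICD10 code such as 'C18.9' to its three-digit level 'C18'."""
    return icd10_code.strip().upper().split(".")[0][:3]


def merge_mappings(*sources):
    """Union several mapping sources into a set of (identifier, ICD10) pairs.

    Each source is an iterable of (identifier, icd10_code) tuples, where the
    identifier is a MIM number or MeSH code. Codes are aggregated to the
    three-digit level, so duplicates across sources collapse automatically.
    """
    merged = set()
    for source in sources:
        for identifier, icd10_code in source:
            merged.add((identifier, three_digit(icd10_code)))
    return merged


def coverage(mapped_pairs, all_icd10_codes):
    """Fraction of three-digit ICD10 codes that have at least one mapping."""
    universe = {three_digit(code) for code in all_icd10_codes}
    if not universe:
        raise ValueError("the list of ICD10 codes is empty")
    mapped = {icd for _, icd in mapped_pairs}
    return len(mapped & universe) / len(universe)


# Toy input: two hypothetical sources that partly overlap.
source_a = [("114500", "C18.9")]
source_b = [("114500", "C18.0"), ("114500", "C19")]
pairs = merge_mappings(source_a, source_b)
print(sorted(pairs))
print(coverage(pairs, ["C18", "C19", "C20", "C21"]))
```

Storing the pairs in a set means that an association reported by more than one source is counted only once after truncation to three digits.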

While the ICD10 codes are primarily used for billing and clinical purposes, the OMIM classification focuses on descriptive phenotypes of inherited conditions. It follows that some very specific OMIM codes might link to highly unspecific ICD10 codes and vice versa. For instance, colorectal cancer has one MIM number (114500) but four different ICD10 codes on the three-digit level, C18-C21. These four phenotypes are not only connected to each other; each disease that is genetically linked to colorectal cancer is also linked to all four of these ICD10 codes. Consequently, the diagnoses C18-C21 have the highest degrees in the genetic comorbidity network. Similarly, the ICD10 codes for essential hypertension (I10-I13) all map to a single MIM number and have the highest degrees in the toxicogenomic disease network. We therefore adjusted for such biases by re-scaling the relative comorbidity risk by the node degree.
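The degree inflation caused by one MIM number fanning out to several ICD10 codes can be made concrete with a small sketch. It assumes that two diseases are linked in the genetic layer whenever they share a gene; the gene name `GENE_A`, the MIM label `MIM_X`, and the code `ICD_X` are hypothetical placeholders, and the sketch does not reproduce the re-scaling itself, whose exact form is not given here.

```python
from collections import defaultdict
from itertools import combinations


def genetic_layer_degrees(gene_to_mims, mim_to_icds):
    """Node degrees of a disease network in which two ICD10 codes are
    linked if at least one gene is associated with both of them."""
    neighbours = defaultdict(set)
    for mims in gene_to_mims.values():
        # Collect every ICD10 code reachable from this gene.
        icds = set()
        for mim in mims:
            icds.update(mim_to_icds.get(mim, ()))
        # Link all pairs of codes that share the gene.
        for a, b in combinations(sorted(icds), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {icd: len(nbrs) for icd, nbrs in neighbours.items()}


# One hypothetical gene associated with colorectal cancer (MIM 114500)
# and with one other, made-up phenotype.
gene_to_mims = {"GENE_A": ["114500", "MIM_X"]}

# Mapping with fan-out: one MIM number, four three-digit ICD10 codes.
fan_out = {"114500": ["C18", "C19", "C20", "C21"], "MIM_X": ["ICD_X"]}
# Counterfactual mapping in which the MIM number maps to a single code.
collapsed = {"114500": ["C18"], "MIM_X": ["ICD_X"]}

print(genetic_layer_degrees(gene_to_mims, fan_out))
print(genetic_layer_degrees(gene_to_mims, collapsed))
```

With the fan-out mapping every node has degree 4, whereas the counterfactual single-code mapping gives degree 1. The single shared gene is the same in both cases, which is why some correction for the node degree is needed.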

A.2 Text S2: Broad-sense heritability

Heritability is a measure that quantifies how much variation in a phenotypic trait (such as a disease) in a population is due to genetic variation among individuals in the population. More specifically, if $\mathrm{Var}(G)$ is the genetic variation and $\mathrm{Var}(P)$ the variation in the population, the broad-sense heritability, $H^2$, is defined as $H^2 = \mathrm{Var}(G)/\mathrm{Var}(P)$. Loosely speaking, heritability measures the proportion of (disease) risk that is due to the genetic background of an individual, as opposed to environmental factors. However, high values of heritability do not necessarily imply a high disease risk, as it may be relatively easy to prevent certain genetic diseases by certain interventions.
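As a small numerical illustration, the sketch below computes the broad-sense heritability as the ratio of genetic variance to total phenotypic variance, which is the standard textbook definition. The function name and the toy values are hypothetical, and the additive toy model (phenotype = genetic value + environmental value, with the two uncorrelated) is an assumption made only for this example.

```python
from statistics import pvariance


def broad_sense_heritability(genetic_values, phenotypic_values):
    """Ratio of genetic variance to phenotypic variance in a population."""
    var_g = pvariance(genetic_values)
    var_p = pvariance(phenotypic_values)
    if var_p == 0:
        raise ValueError("phenotypic variance is zero; ratio is undefined")
    return var_g / var_p


# Toy population of four individuals (made-up numbers).
genetic = [1.0, 1.0, -1.0, -1.0]
environment = [1.0, -1.0, 1.0, -1.0]  # uncorrelated with `genetic`
phenotype = [g + e for g, e in zip(genetic, environment)]

h2 = broad_sense_heritability(genetic, phenotype)
print(f"broad-sense heritability: {h2:.2f}")
```

In this toy population the genetic and environmental components have equal variance, so half of the phenotypic variation is attributed to genetic variation.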

Table 1: Overview of characteristics of the HDMN layers. The numbers of non-isolated nodes and links are given for four different layers.
Figure 1: Classification of diseases (circles) according to the dominant contributions to their phenotypic comorbidities for an alternative definition of the re-scaled relative comorbidity risks, where two diseases can at the same time be comorbid in a genetic or pathway-based / toxicogenomic way. Again, most diseases cluster around one of the axes, with a clear dominance of genetic comorbidity risks.