119
저작자표시-비영리-변경금지 2.0 대한민국 이용자는 아래의 조건을 따르는 경우에 한하여 자유롭게 l 이 저작물을 복제, 배포, 전송, 전시, 공연 및 방송할 수 있습니다. 다음과 같은 조건을 따라야 합니다: l 귀하는, 이 저작물의 재이용이나 배포의 경우, 이 저작물에 적용된 이용허락조건 을 명확하게 나타내어야 합니다. l 저작권자로부터 별도의 허가를 받으면 이러한 조건들은 적용되지 않습니다. 저작권법에 따른 이용자의 권리는 위의 내용에 의하여 영향을 받지 않습니다. 이것은 이용허락규약 ( Legal Code) 을 이해하기 쉽게 요약한 것입니다. Disclaimer 저작자표시. 귀하는 원저작자를 표시하여야 합니다. 비영리. 귀하는 이 저작물을 영리 목적으로 이용할 수 없습니다. 변경금지. 귀하는 이 저작물을 개작, 변형 또는 가공할 수 없습니다.

Genetic Characteristics of Korean Patients with Autosomal … · 2019. 11. 14. · multicenter Korean ADPKD cohort constructed to investigate the natural course of the disease and

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • 저작자표시-비영리-변경금지 2.0 대한민국

    이용자는 아래의 조건을 따르는 경우에 한하여 자유롭게

    l 이 저작물을 복제, 배포, 전송, 전시, 공연 및 방송할 수 있습니다.

    다음과 같은 조건을 따라야 합니다:

    l 귀하는, 이 저작물의 재이용이나 배포의 경우, 이 저작물에 적용된 이용허락조건을 명확하게 나타내어야 합니다.

    l 저작권자로부터 별도의 허가를 받으면 이러한 조건들은 적용되지 않습니다.

    저작권법에 따른 이용자의 권리는 위의 내용에 의하여 영향을 받지 않습니다.

    이것은 이용허락규약(Legal Code)을 이해하기 쉽게 요약한 것입니다.

    Disclaimer

    저작자표시. 귀하는 원저작자를 표시하여야 합니다.

    비영리. 귀하는 이 저작물을 영리 목적으로 이용할 수 없습니다.

    변경금지. 귀하는 이 저작물을 개작, 변형 또는 가공할 수 없습니다.

    http://creativecommons.org/licenses/by-nc-nd/2.0/kr/legalcodehttp://creativecommons.org/licenses/by-nc-nd/2.0/kr/

  • 의학박사 학위논문

    Genetic Characteristics of Korean

    Patients with Autosomal Dominant

    Polycystic Kidney Disease by

    Targeted Exome Sequencing

    2019년 4월

    서울대학교 대학원

    임상의과학과 임상약리학

    김 현 숙

  • Genetic Characteristics of Korean

    Patients with Autosomal Dominant

    Polycystic Kidney Disease by

    Targeted Exome Sequencing

    지도교수 장 인 진

    이 논문을 김현숙 박사 학위논문으로 제출함

    2019년 4월

    서울대학교 대학원

    임상의과학과 임상약리학

    김 현 숙

    김현숙의 박사 학위논문을 인준함

    2019년 월

    위 원 장 (인)

    부 위 원 장 (인)

    위 원 (인)

    위 원 (인)

    위 원 (인)

  • - I -

    Abstract

    Patients with Autosomal Dominant

    Polycystic Kidney Disease by

    Targeted Exome Sequencing

    Hyunsuk Kim

    Department of Clinical Medical Sciences

    The Graduate School

    Seoul National University

    Autosomal dominant polycystic kidney disease (ADPKD) is a monogenic

    disorder that can affect the kidneys as well as the liver and one of the main

    causes of end-stage renal disease (ESRD). Genetic information is of the utmost

    importance in understanding pathogenesis of ADPKD. Therefore, this study

    aimed to demonstrate the genetic characteristics of ADPKD and their effects on

    kidneys and liver in 749 Korean ADPKD subjects from 524 unrelated families.

    Genetic studies of PKD1/2 were performed using targeted exome sequencing

    combined with Sanger sequencing in exon 1 of the PKD1 gene and a multiple

    ligation probe assay. Total kidney volume was calculated by ellipsoid method,

    total liver volume was measured by stereology and adjusted for height (htTKV,

    htTLV). Using htTLV, disease severity was classified as no or mild (htTLV

    2,400 mL/m). Associations between the severity of PKD or PLD and the type

    and position of genetic mutations were analyzed using a multilevel model

    controlling for family-level clustering (within-family correlations). The mutation

  • - II -

    detection rate was 80.7% (423/524 families, 331 mutations) and 70.7% was novel.

    PKD1 protein-truncating (PKD1-PT) genotype was associated with younger

    age at diagnosis, larger htTKV, lower renal function compared to

    PKD1-non-truncating and PKD2 genotypes. The PKD1 genotype showed

    earlier onset of ESRD compared to PKD2 genotype (53.3[46.7, 58.7] vs. 63.1[54.0,

    67.5] years old, P=0.003). In frailty model controlled for age, gender, and familial

    clustering effect, PKD2 genotype had 0.2 times lower risk for reaching ESRD

    than PKD1-PT genotype (p=0.037).

    As for liver phenotype, severe PLD showed more rapid progression to

    end-stage renal disease and higher mortality. Sex, age, multiple pregnancies, and

    female sex hormone use were associated with moderate to severe PLD.

    Family-level clustering effects existed in moderate to severe PLD implying

    genetic link to PLD phenotype, however, unlike kidney, PLD severity did not

    correlate with PKD1/2 genotype but the frequency of No Mutation was

    increased only in mild PLD group. Additionally, the C-type lectin domain might

    be a potential warm spot for moderate to severe PLD in both PT and

    non-truncating PKD1 mutations.

    In conclusion, the results suggest that genotyping can contribute to selecting

    rapid progressors of kidneys for new emerging therapeutic interventions among

    Koreans. However, additional genetic effects on PLD have to be investigated.

    ……………………………………………………………………………………………

    keywords : Autosomal dominant polycystic kidney disease,

    targeted exome sequencing, rapid progressor, moderate to severe

    polycystic liver disease, Genetic variation, phenotype

    Student Number : 2015-30809

  • - III -

    CONTENTS

    Abstract·…………………………………………………………………Ⅰ

    Contents…………………………………………………………………III

    List of tables and figures……………………………………………IV

    1st part: Kidney

    Introduction………………………………………………………………1

    Results……………………………………………………………………3

    Discussione………………………………………………………………7

    Methods…………………………………………………………………12

    Figure Legends and Tables·…………………………………………20

    Supplementary information……………………………………………27

    2nd part: Liver

    Introduction………………………………………………………………67

    Results……………………………………………………………………68

    Discussion·………………………………………………………………74

    Methods·…………………………………………………………………79

    Figure Legends and Tables·…………………………………………82

    Supplementary information……………………………………………90

    References··……………………………………………………………99

    Abstract(Korean)····…………………………………………………109

  • - IV -

    LIST OF TABLES AND FIGURES

    1st part: Kidney

    Table 1. Frequency of each class and grade in 524 families………………22

    Table 2. Baseline Clinical Characteristics According to PKD Genotypes…………23

    Table 3. A. Multilevel multivariate linear regression of log(htTKV) or eGFR

    according to genotype B. Multivariate linear regression of log(htTKV) or eGFR

    for comparing each slope according to genotype……………………………………24

    Supplementary Table S1. Demographic Findings of 749 Patients (524 Families)…27

    Supplementary Table S2. Mutations found in this study……………………………28

    Supplementary Table S2A. PKD1 protein-truncating mutations in study families·…28

    Supplementary Table S2B. PKD1 in-frame insertion/deletions in study families……41

    Supplementary Table S2C. PKD1 non-truncating mutations in study families………42

    Supplementary Table S2D. PKD2 protein-truncating mutations in study families·…47

    Supplementary Table S2E. PKD2 non-truncating mutations in study families………50

    Supplementary Table S3. Design of Custom Bait·……………………………………51

    Supplementary Table S4. Quality of Targeted Exome Sequencing·………………51

    Supplementary Table S5. Definition of Grades··………………………………………52

    Supplementary Table S6. Results from Validation Study(n=80)……………………53

    Supplementary Table S7. MLPA validation using Long Range PCR sequencing…56

    Figure 1. Summary of identified mutations……………………………………………25

    Figure 2. Cross-sectional analysis between age and log(htTKV) or eGFR A.

    Scatter plot and trend line of log (htTKV) according to genotype. B. Scatter

    plot and trend line of log (htTKV) according to genotype…………………………26

  • - V -

    Figure 3. Renal outcomes and death according to type of PKD1/2 mutation. A.

    Survival analysis of ESRD progression. B. Survival analysis of all-cause

    mortality………………………………………………………………………………………26

    Supplementary Figure S1·…………………………………………………………………58

    Supplementary Figure S2·…………………………………………………………………59

    Supplementary Figure S3·…………………………………………………………………60

    Supplementary Figure S4·…………………………………………………………………61

    Supplementary Figure S5·…………………………………………………………………62

    Supplementary Figure S6·…………………………………………………………………63

    Supplementary Figure S7·…………………………………………………………………64

    Supplementary Figure S8 Inverse relationship of GC content and the depth of

    coverage for PKD1…………………………………………………………………………·65

    Supplementary Figure S9. Normalized read depth in PKD1 and PKD2 of

    mutations that detected via MLPA………………………………………………………66

    2st part: Liver

    Table 1. Frequency of each class and grade in 524 families………………84

    Table S1. Distribution of PKD genotype according to liver cysts·…………………93

    Table S2. Mutations of moderate to severe PLD families·…………………………95

    Table S3. Mutations of moderate to severe PLD males·……………………………95

    Figure 1. A. ESRD progression according to PLD severity B. All cause mortality

    according to PLD severity…………………………………………………………………86

    Figure 2. Impact of female sex hormone on severity of PLD··……………………86

  • - VI -

    Figure 3. A. Distrubution of htTLV among siblings B. htTLV of siblings

    excluding the subjects having highest htTLV C. Distrubution of htTLV among

    siblings excluding famales over 45 years D. htTLV of siblings excluding the

    subjects of highest htTLV and famales over 45 years·…………………………… 87

    Figure 4. A. Location of mutation of PKD1 in moderate to severe PLD B.

    Location of mutation of PKD1 in moderate to severe PLD…………………………89

    Figure S1. Distribution of htTLV according to age and sex·………………………90

    Figure S2. Distribution patterns of PKD genotypes according to severity………91

    Figure S3. A. htTLV according to sex and class of PKD gene B. htTLV

    according to sex and class of PKD gene………………………………………………92

    Figure S4. Distribution patterns of PKD genotypes according to no liver cyst

    and liver cyst groups·………………………………………………………………………94

    Figure S5. Mutations of moderate to severe PLD males……………………………97

    Figure S6. Diagram of study patient selection in HOPE PKD cohort……………98

  • 1

    1stpart: Kidney

    INTRODUCTION

    Autosomal dominant polycystic kidney disease (ADPKD) is a potentially lethal

    monogenic disorder characterized by numerous cysts in the kidneys and

    progressive renal failure1. Since the disease is caused by a mutation either in

    PKD1 or PKD2, genetic information is of the utmost importance in

    understanding the pathogenesis of ADPKD.

    The ADPKD mutation database (PKDB, http://pkdb.mayo.edu/) is the largest

    international repository for variants found in PKD genes. By September 2018,

    1895 pathogenic mutations in the PKD1 gene and 438 in the PKD2 gene were

    registered in the PKDB. However, variants registered in the PKDB mainly came

    from the results of genetic studies in Western population. There have been only

    a few reports that presented genetic profiles on Asian ADPKD patients, since

    analyzing PKD1/2 mutations has been difficult in clinical practice. First,

    two-thirds of the 5’ region of PKD1 are duplicated by gene conversion (GC) on

    chromosome 16 within six pseudogenes (PKD1P1-P6)2. Because these

    pseudogenes have a sequence that is 97.7% identical to the original PKD1 gene

    sequence, cloning and sequencing PKD1 has been difficult with direct

    sequencing. Moreover, high allelic heterogeneity of both PKD1 and PKD2

    makes molecular diagnosis of ADPKD more challenging. Previously, indirect

    methods such as two-dimensional gene scanning, denaturing gradient gel

    electrophoresis, and single-strand conformation polymorphism analysis have been

    applied to find pathogenic mutations3-6. However, these indirect methods

    demonstrated false-positive results with a low detection rate compared to direct

    sequencing. Therefore, a long-range polymerase chain reaction (PCR) with

    subsequent nested PCR was developed to isolate the entire PKD1 gene without

  • 2

    interference from the six pseudogenes (PKD1P1-P6)7. However, direct Sanger

    sequencing is time-consuming and costly. Therefore, there has been an urgent

    need for faster and more efficient way to detect mutations in ADPKD.

    Targeted exome sequencing (TES) using next-generation sequencing (NGS) is

    one of the most suitable methods to study monogenic disorders such as

    ADPKD. It has advantages over Sanger sequencing including that a large

    amount of data can be analyzed automatically in a short period of time and can

    be done cheaply. In addition, researchers can focus on a few genes and exons to

    discover mutation spots. Therefore, TES, rather than whole exome sequencing

    or whole genome sequencing, has been used more popularly as a genetic

    screening tool to detect mutations in ADPKD. Since TES became available in

    clinical practice for genetic diseases, the detection rate gradually improved 7.

    The importance of genotype on clinical outcome has been demonstrated from

    many previous studies. Cornec-Le Gall et al.8 reported the influence of PKD1

    mutation type on renal survival. HALT-PKD and CRISP Investigators showed

    the difference of renal survival in PKD1 NT carriers according to their

    conservation and chemical change9. Previously, anecdotal report from Asian

    countries reported novel mutations in Asian ADPKD populations 10-12. However,

    they were either case series or included small samples. No well-designed cohort

    studies have been conducted to evaluate genetic profiles and genotype-

    phenotype associations in Asian populations. The coHOrt for genotype-

    PhenotypE correlation in ADPKD (HOPE-PKD) is a large prospective,

    multicenter Korean ADPKD cohort constructed to investigate the natural course

    of the disease and genotype-phenotype associations. Demographic and clinical

    information of the subjects, as well as family trees, is collected, and specimens

    including DNA, serum, and urine sample are collected and stored from the

    subjects once a year13. To describe the genetic characteristics of ADPKD in

  • 3

    Koreans, I performed a cohort-based genetic analysis using the HOPE-PKD

    cohort.

    RESULTS

    Demographic Characteristics of 749 Subjects of the Korean HOPE-PKD

    Cohort

    Among 749 subjects, 360 (48.1%) were males and 389 (51.9%) were females.

    The mean age was 46.4 ± 13.3 years (male, 44.6 ± 14.4; female, 48.0 ± 12.0; P

    value for gender difference,

  • 4

    determined to have HLP mutations, and 2 families were proven to have LN

    mutations. Thus, the mutation detection rate was 80.7% (423/524 families) by

    TES, PKD1 exon 1 Sanger sequencing, MLPA, and familial segregation analysis

    (see Supplementary Fig. S1 online).

    Baseline Clinical Characteristics According to PKD Genotypes and

    Summary of Identified Mutations

    A total of 364 variants were found in 476 of the 524 families, and of these,

    331 mutations were detected among 423 families in the HOPE-PKD cohort

    (Figure 1). Among the 749 subjects, 68% had PKD1 variants, 16% had PKD2

    variants, and 16% were classified as having no mutations (NM) (Figure 1A).

    Excluding NM, 81% of the 749 subjects and 82% of the 524 families had PKD1

    mutations (Figure 1B). Figure 1C shows the distribution of mutations by class

    among the 630 subjects with mutations. The frameshift mutation (39%) was the

    most common mutation class in the PKD1 genotype, while nonsense mutations

    (52%) were most common in the PKD2 genotype. The prevalence of classes of

    variants (number of families that shared the same variants) among the 524

    families was as follows: PKD1: nonsense, 86 (19); frameshift 136 (24), missense,

    90 (17); PKD2: nonsense, 39 (20); frameshift, 11 (0); missense, 12 (4) (Figure

    1D, Left, Table 1). The prevalence of the grades of the variants (number of

    families that shared the same variants) in the 524 families was as follows:

    PKD1: DP, 248 (46); HLP, 61 (20); and LP (0), 39; PKD2, DP, 63 (22); HLP, 10

    (4); and LP, 2 (0) (Figure 1D, Right, Table 1).

    Comparison of Mutation Frequency in Affected Polycystin Domains

    Between HOPE-PKD and other Cohorts

  • 5

    Among the 331 mutations, 97 were registered in either the PKDB or Toronto

    Genetic Epidemiology Study of PKD14. Therefore 70.7% (234/331) of the

    mutations were novel (see Supplementary Table S2 online). Mutations were

    found more frequently in the C-type lectin and G protein-coupled receptor

    proteolytic site (GPS) domains of polycystin-1 in the Korean HOPE-PKD cohort

    than in those previously reported from PKDB. Likewise, mutations were detected

    more frequently in the linker domain of polycystin-2 in my patients compared to

    those in the PKDB. The standardized residual of the Chi-square test (cut-off of

    significance, 2.58) was 2.8 in the C-type lectin of polycystin-1 and the linker

    domain of polycystin-2, and 2.6 in the GPS domain of polycystin-1 (see

    Supplementary Fig. S2 online).

    Baseline Clinical Characteristics According to PKD Genotypes

    For genotype-phenotype analysis, PKD1 or PKD2 mutations were further

    classified into four PKD genotypes according to the previously described manner

    14; PKD1 protein-truncating (PKD1-PT), PKD1 small (fewer than five amino

    acids) in-frameshift indels (PKD1-ID), PKD1 non-truncating (PKD1-NT), and

    PKD2 genotypes. A total of 630 patients in 423 families after excluding NM

    were re-classified into four genotype groups: PKD1-PT, 371 (59%); PKD1-ID,

    21 (3%); PKD1-NT, 119 (19%); PKD2, 119 (19%). PKD1-PT genotype was

    associated with younger age at the first visit (40.2 ± 12.9), earlier onset of

    hypertension (37.3 ± 10.1), younger ESRD age (52.5 ± 9.5), larger

    height-adjusted total kidney volume (htTKV) (898 [476, 1546], mL/m), and lower

    eGFR (63.5 ± 42.2, mL/min/1.73 m2) compared to other genotypes. The median

    follow-up duration was 77.3 months (Table 2).

  • 6

    As the Mayo imaging classification changed from 1A to 1E, the proportion of

    PKD1-PT genotype increased. Conversely, as the genotype changed from PKD2

    to PKD1-PT, the proportion of Mayo imaging classification 1C-E increased (P

  • 7

    genotypes (Figure 2B and Table 3A). The family-level clustering effect was

    also significant (P=0.029).

    Incidence of ESRD and Death According to PKD Genotypes

    The PKD1 genotype showed an earlier onset of ESRD than PKD2 genotype

    (53.3[46.7, 58.7] vs. 63.1[54.0, 67.5], P=0.003) (see Supplementary Fig. S4 online).

    Among the 630 patients, with the exclusion of the NM subjects (n=119) from

    the total of 749 subjects, 122 ESRD events were observed during follow-up

    (Table 2). The ESRD incidence was 21.7% (110/508) in PKD1 and 10% (12/119)

    in PKD2 (P=0.003). The incidence of ESRD was lower in the PKD2 genotype

    than for other genotypes (PKD2, 12 (10.1%) vs. PKD1-PT, 84 (22.6%);

    PKD1-ID, 4 (19.0%); PKD1-NT, 22 (18.5%), P for trend = 0.004). A multi-level

    multivariate Cox proportional hazard (frailty) model was used to compare the

    effect of genotype on ESRD. When PKD1-PT genotype was used as a

    reference, PKD2 genotype showed a 0.22 times lower hazard ratio than

    PKD1-PT genotype. The familial clustering effect was significant (P=0.037)

    (Figure 3A).

    There were 23 death events during follow up. Either incidence of death or age

    of death did not show a significant difference among genotypes (Table 2). In

    the frailty model, the incidence was too small to find any significant difference

    according to genotypes (Figure 3B).

    DISCUSSION

    As new therapeutic agents became available as disease-course modifying

    interventions, understanding the natural course of ADPKD and identifying

  • 8

    high-risk candidates for new clinical interventions have become important.

    Although defining genetic variations and their associations with phenotypes is

    the first step to understand the natural course of genetic diseases, there have

    been only a few reports on genetic profiles of Asian ADPKD populations4,10,11,15.

    In this study, I investigated the genetic characteristics of PKD1/2 mutations

    using a large prospective Korean ADPKD cohort. In brief, 331 mutations were

    detected by TES using NGS combined with PKD1 exon 1 Sanger sequencing

    and MLPA. The detection rate of mutations was 80.7% (423/524 families).

    Among them, 70.7% of mutations were novel. The prevalence of PKD1 and

    PKD2 among 524 families was 84% and 16%, respectively. Mutations were

    more frequently found in the C-type of lectin and GPS domains of polycystin-1

    and the linker domain of polycystin-2 among the Korean ADPKD population

    than among Western populations.

    TES using NGS was applied in this study. There was a concern about the

    decreased efficacy of TES in mutation detection because of large amount of

    pseudogenic lesion in PKD1. However, demonstrated from the milestone study

    by Trujillano and colleagues16, NGS was not inferior in mutation detection and

    could be a possible substitute for Sanger sequencing because Sanger sequencing

    itself is expensive and time consuming. NGS is economically feasible,

    convenient, and has the possibility of automation. However, low or high GC

    content can reduce the mutation detection rate by NGS by affecting the probe

    hybridization and PCR amplification steps. The false negative rate is expected to

    be elevated for PKD1 exon 1, which has many GC-rich regions. Moreover,

    large deletions can easily be missed by NGS. In this study, I tried to

    compensate for these limitations by exon-specific Sanger sequencing of PKD1

    exon 1 and MLPA. However, the detection rate of in my study was lower than

    that with Sanger sequencing with long-range PCR11. One possible reason for the

  • 9

    low detection rate is that I performed target enrichment with whole genomic

    DNA instead of pure PKD1 and PKD2 genomic regions. A previous study

    suggested using PCR to generate locus-specific amplicons of PKD1 before

    sequencing. Therefore, refining the NGS pipeline to increase sequencing depth

    through locus-specific PCR amplicon enrichment should be considered to improve

    the detection rate. Another possible explanation is that my population may be

    enriched with novel Asian-specific missense variants with unknown significance.

    Therefore, further functional analysis and Sanger sequencing of PKD1/2 should

    be considered in the NM cases to further improve the detection rate.

    The proportions of PKD1 and PKD2 were similar to those from previous

    data7. However, the mean age of ESRD in Korean patients with ADPKD was

    52.6 years for the PKD1 genotype and 60.5 for the PKD2 genotype, which

    seemed to be younger than that of Westerners, especially for PKD2 subjects.

    This discrepancy may be related to selection bias, because my study only

    included ADPKD patients from tertiary hospitals. This means that milder cases

    who tended to visit primary clinics could have been excluded from this analysis.

    Nonetheless, further study is needed to evaluate factors causing faster renal

    progression in my ADPKD patients compared to those from other ethnic

    populations.

    Excluding NM, 331 mutations were ultimately detected in the current study.

    When compared with those reported in other large cohort studies, the

    distribution of the mutation classes was similar to those presented in previous

    studies. Interestingly, my study revealed a high proportion of novel variants.

    This can be explained by the founder effect or repetitive mutations, as a recent

    Chinese study also showed similar findings (personal communication).

    As part of that effort, I compared the frequency of mutations in certain

    domains of polycystin-1 or polycystin-2 with the results from the PKDB for the

  • 10

    C-type lectin and GPS domains of polycystin-1 and the linker domain of

    polycystin-2. The C-type lectin domain of polycystin-1 (protein product of

    PKD1) binds carbohydrate matrices in vitro and may be involved in protein

    carbohydrate interactions in vivo. Polycystin-1 undergoes cleavage at the

    juxtamembrane GPS, a process likely to be essential for its biological activity.

    This GPS is essential for the intracellular transport. Finally, the interaction of

    polycystin-2 with polycystin-1 is mediated by the C-terminal coiled-coil domain

    of polycystin-2, and the linker region between the EF-hand and coiled-coil

    domains is not sufficient to mediate the association by itself. The presence of

    the N-terminal linker to the polycystin-2 coiled-coil domain allows a tighter

    association. The implications of these findings for the prognosis for Asian

    patients with ADPKD should be clarified in the future.

    This study enrolled patients with typical ADPKD imaging findings if the

    subjects had no familial history. Therefore, the NM group seems to be an entity

    that shares other clinico-genetic characteristics. Cases of NM were not directly

    included in the analysis, but most of the patients with NM showed similar or

    milder clinical features compared to PKD2 genotype, suggesting that they are

    groups with mutations in regions such as introns or mosaicism, which are

    difficult to detect using TES. Additionally, Other genes (i.e. GANAB, DNAJB11)

    involved in atypical presentations of ADPKD could have been part of the

    targeted NGS approach. Further studies to clarify the meaning of NM are

    warranted.

    I found families that had two or more novel missense variants. In the absence

    of a functional assay, assessing the putative pathogenicity of missense variants

    is not reliable. To validate the clinical pathogenicity of missense variants, I

    analyzed family-based segregation studies including at least one affected and/or

    one unaffected member in each family. In most of the cases, family-based

  • 11

    segregation analysis was very helpful to discriminate the pathogenicity of

    missense variants. However, for a complete segregation study, it is necessary to

    investigate all household members accessible through the family tree. Complete

    segregation can be determined especially clearly when the affected subject is

    examined together with both parents, which is not always possible. When the

    affected subject is examined with only his/her siblings or children, the power to

    completely discriminate the pathogenicity of the variant may be lowered.

    However, I made every effort to overcome these limitations by enrolling

    additional families until the pathogenicity of the variants was clearly determined.

    My study showed that PKD1-PT genotype was the most common genotype

    (59%). The distribution of each genotype was similar to what has been reported

    in previous studies7. Subjects with the PKD1-PT genotype had an early onset

    of hypertension, larger htTKV, lower eGFR, and more frequent ESRD incidence

    than other genotypes. The family-level clustering effect for log(htTKV) and

    ESRD onset was significant. However, it was also noteworthy that there existed

    discrepancies between genotypes and phenotypes. For example, the proportions of

    mild phenotypes of Mayo class 1A and 1B were 7% and 25% in individuals with

    the PKD1-PT genotype, which suggests that other modifying factors may exist.

    A strength of the current study is that it is unique as it represents the first

    large cohort analysis of an Asian population. This study reveals many novel

    mutations to the PKD1 and PKD2 genes as well as novel mutation enrichment

    sites within the ADPKD proteins. My study made efforts to overcome the

    shortcomings of NGS by performing PKD1 exon-specific Sanger sequencing,

    MLPA, and a familial segregation study. In addition, the HOPE-PKD cohort is

    fairly well designed and has a substantial number of subjects. Nonetheless, there

    are some limitations. Since this study was conducted on patients in tertiary

    hospitals in South Korea, there is a possibility of selection bias that only severe

  • 12

    patients come to the hospital. Thorough evaluation and collection of family

    history was difficult since not all the members were registered at the hospital.

    Therefore, it was difficult to investigate the exact family-level clustering effect.

    Moreover, I could not find familial clustering effects for ESRD or mortality,

    because of the small number of events. Due to the lost to follow-up of the

    patient and the absence of DNA samples, I were unable to perform PKD1/PKD2

    LR-PCR/Sanger sequencing for the subjects with NM. Therefore, I could not

    present the exact specificity parameter of NGS. In addition, I used the ellipsoid

    equation to measure TKV instead of stereology, which is less accurate.

    In conclusion, I performed a comprehensive genotype study using one of the

    largest Asian ADPKD cohorts, HOPE-PKD, and demonstrated the characteristics

    of variants and mutations in Korean patients. The results also strongly suggest

    that an analysis of the ADPKD genotype can contribute to the selection of

    rapidly progressing patients for new emerging therapeutic interventions. In

    regard to the prognostic score combining genetic and clinical information, the

    PRO-PKD score has been developed earlier for the European. Likewise, the

    development of a score suitable for Koreans is deemed necessary in future

    studies.

    METHODS

    Subjects

    A total of 866 subjects from 641 unrelated families registered in the

    HOPE-PKD cohort between 2009 and 2016 were screened. All methods were

    carried out in accordance with relevant guidelines and regulations. The study

    protocol was approved by the Institutional Review Board of Seoul National

  • 13

    University Hospital (IRB No; 1205-112-411). The informed consent was obtained

    from all subjects before performing study.

    Subjects aged over 18 who had compatible imaging findings using the Unified

    criteria17 with or without family history were enrolled in HOPE-PKD cohort.

    Among the screened patients, 117 patients were excluded from the analysis

    including those who had active cancer (n=4) or liver cirrhosis (n = 4) and

    chronic HBV or HCV hepatitis (n = 18) that may change liver volume or affect

    renal function and those who had no kidney or liver volume data available

    (n=91, see Supplementary Fig. S5 online). A total of 749 subjects from 524

    unrelated families were included in the analysis.

    Data Collection

    Baseline demographic profiles were collected including age, gender, height,

    comorbidities (hypertension, diabetes, chronic liver disease, and cancer), familial

    history of ESRD and their age at the diagnosis of ESRD. Hypertension was

    considered present when the subject had either systolic blood pressure > 140

    mmHg or diastolic blood pressure > 90 mmHg or was receiving treatment with

    antihypertensive medications.

    Serum creatinine was measured using the Jaffe method and traced using

    isotope dilution mass spectrometry. The Chronic Kidney Disease Epidemiology

    Collaboration formula was used to calculate eGFR18. ESRD was defined as either

    an eGFR < 15 mL/min/1.73m2 or initiation of renal replacement therapy.

    Participants underwent non-contrast computed tomography (CT) exams every 2

    to 3 years using multi-detector CT scanners. TKV was measured by the

    ellipsoid method an dadjusted to height19 by a single skilled technician.

  • 14

    Variant Screening of PKD1 and PKD2 by TES

    Genetic analysis of the PKD1 and PKD2 genes was performed using TES.

    For cases with inadequate results, additional Sanger sequencing for exon 1 of

    PKD1 gene and MLPA were done. For determining pathogenicity, familial

    segregation analysis was performed.

    For TES, a customized 120-mer RNA bait was designed to capture all exons

    of the PKD1 and PKD2 genes and their flanking intronic sequences16 (see

    Supplementary Table S3 and S4 online). The bait was tiled 3 times to increase

    the coverage of poorly covered regions. For the generation of standard target-all

    exon capture libraries, the Agilent SureSelect™ target enrichment protocol for

    the Illumina paired-end sequencing library (version B.3, June 2015) was used

    together with an input of 500 ng of genomic DNA and sequenced using the

    HiSeq™ 2500 platform (Illumina, San Diego, CA, USA). In all cases, the

    SureSelectXT Custom probe set was used.

    Sequencing reads were first aligned with the human genome reference

    sequence (hg19) using BWA version 0.7.5a with the MEM algorithm (default

    options). To minimize false-positive and false-negative calls, I only used

    uniquely mapped reads that were properly paired to avoid mapping reads that

    aligned to both target regions and pseudogenes. In addition, only selected reads

    with a high mapping quality (>30) were used. SAMTOOLS version 1.2 and

    Picard version 1.127 (http://picard.sourceforge.net) were used to process

    SAM/BAM files to duplicate the marking. Specifically, local realignment to

    reduce any misalignment in the duplicated regions was performed.

    I used RealignerTargetCreator and IndelRealigner from GATK version 3.3-1

    with known single-nucleotide polymorphisms (SNPs) and indels from dbSNP142,

    Mills and 1000G gold-standard indels at hg19 sites, and 1000G phase 1 indels at

  • 15

    hg19 sites. Known SNPs and indels were also used to perform a base

    calibration. For calling variants, Unified Genotyper was used, and the called

    variants were recalibrated by GATK based on dbSNP142, Mills indels, HapMap,

    and Omni. ANNOVAR was used to annotate the variants. The depth of

    coverage approach was applied to detect any large rearrangement or copy

    number variations. The CalculateHsMetrics module in PICARD was used to

    calculate the statistics of each parameter of the samples and depth of each exon,

    and the copy number ratio was characterized by the normalized depth of the

    exons from all the patients (see Supplementary Fig. S6 online).

    Prioritization of Detected Variants

    To identify causal variants, I first selected exonic and splicing variants

    including nonsynonymous variants and small indels. Variants with an allele

    frequency over 1% were discarded based on NHLBI-ESP 6500, the 1000 Genome

    Project, and an in-house database consisting of exomes of 192 (Samsung

    Medical Center, Korea) and 397 (Korean Bioinformation Center, Korea) normal

    Korean individuals. Variants that were not reported in dbSNP142 were included,

    and those with low-quality reads (

  • 16

    variants or small in-frame indels that were previously established as HLP in

    PKDB or previous articles. The LP variants were defined as missense variants

    or small in-frame indels that were previously established as LP in PKDB or

    previous articles or variants that were predicted to be damaging using in-silico

    analysis and segregated within a family. Variants were thought to be damaging

    if the results of in-silico analysis satisfied two of the following conditions: 1)

    damaging as predicted by a SIFT score ≤0.05, 2) damaging as predicted by

    Polyphen-2 (HumDiv), and 3) GERP ++ score ≥4. The definitions of grades are

    described in Supplementary Table S5 and Supplementary Fig. S7 online.

    For genotype-phenotype analysis, PKD1 or PKD2 mutations were further

    classified into four PKD genotypes according to the previously described manner

    14;PKD1-PT, PKD1-ID, PKD1-NT, and PKD2 genotypes.

    Validation of variants detected by TES using LR-PCR/Sanger sequencing

    The validation set (n=80) was developed to confirm the variants that was

    discovered by NGS approach by LR-PCR/Sanger sequencing. The pathogenic

    variants detected by TES were confirmed by direct sequencing of the

    corresponding gene regions using the ABI3730xl Genetic Analyzer (Applied

    Biosystems, Foster City, CA). For the duplicated part of PKD1, long-range PCR

    followed by nested-PCR was performed using PKD1-specific primers. The

    results are summarized in the Supplementary Table S6 online.

    Among 80 variants, 74 were PKD1 variants and 6 were PKD2. Among 74

    PKD1 variants, 70 variants were detected in duplicated region (exon 1-33 in

    PKD1 gene). I performed target specific Sanger sequencing for all 80 variants

    found by NGS approach. Among them, 79 variants were confirmed by Sanger

    sequencing (98.8%). The unconfirmed variant was a missense variant c.C2081T

  • 17

    (p.Pro694Leu) in exon 10 of PKD1. Although I did not perform LR-PCR/Sanger

    sequencing for whole PKD1/PKD2 regions, I believe TES is efficient in variant

    detection even in the duplicated region.

    Nevertheless, there were two challenges using NGS approach. The first

    problem was low read depth or coverage of exon 1 of PKD1 gene. This resulted

    from high GC contents of exon 1 compared to other exons (See Supplementary

    Figure S8). To overcome the problem, I performed additional exon 1 Sanger

    sequencing of PKD1 gene and found seven additional variants. The other

    problem was pseudogenic issue. In addition to validation study, I discarded

    variants with high allele frequency > 1%. The average allele frequency of the

    variants detected in the pseudogenic region was 3.7 * 10-3. Since I performed

    variant calling only for the mapping quality > 20, the mapping quality of the

    reads simultaneously aligned to PKD1 and PKD1P1 to 6 was filtered out to 0,

    and variant calling was performed only using PKD1 unique mapped read.

    Therefore, it is difficult to know pseudogene variants with my analysis method.

    I have examined the low VAF mutations in the NGS data, but it is difficult to

    know and verify whether variants are also in PKD1P1 to 6.

    Improving Mutation Detection Rate by Exon 1-Specific Sanger Sequencing

    and MLPA

    The subjects with LN or I variants or those in whom NV was detected

    underwent Sanger sequencing of PKD1 exon 1 and MLPA. Firstly, Sanger

    sequencing of PKD1 exon 1 was performed because variants might not be

    detected by NGS owing to low coverage of exon 1. For the duplicated part of

    PKD1, long-range PCR followed by nested PCR was performed using

    PKD1-specific primers as described before with minor modifications 20,21. In

  • 18

    brief, 300-400 ng of genomic DNA was amplified in a final volume of 50 μL,

    containing, 0.2 μmol/L of each primer (Cosmogentech, Korea), 0.5 mol/L betaine

    (Sigma, USA), 5% dimethyl sulfoxide (Sigma, USA) in the AccuPower ProFi

    Taq PCR Premix (Bioneer, Korea) as a Touchdown protocol. The long-range

    PCR products were purified with the Qiaquick PCR purification kit and were

    further amplified using a nested PCR. The long-range PCR for the PKD1 exon

    was done using the AccuPower HotStart PCR Premix (Bioneer, Korea). Sanger

    sequencing was performed using the ABI3730xl Genetic Analyzer (Applied

    Biosystems, Foster City, CA, USA). In addition, MLPA was performed to detect

    large deletion cases if both TES and PKD1 exon 1 sequencing revealed NM.

    MLPA analysis was performed with a SALSA MLPA KIT P351-B1/P352-B1

    PKD1-PKD2 kit (MRC-Holland, Amsterdam, the Netherlands) according to the

    manufacturer’s instructions 12 and single probe exon deletions via MLPA was

    confirmed by direct sequencing that they are not due to SNP under the probe

    (for the primer, see Supplementary Table S7 online). Finally, I looked into 7

    variants found through MLPA. The read depth did not show any particular

    problems for the cases. However, only 3 out of 6 variants were detected. The

    Normalized read depth in PKD1 and PKD2 of mutations that detected via MLPA

    was shown in Supplementary Figure S9 online. It was not a matter of read

    depth but a technical limitation of NGS approach in the case of large deletion.

    Familial Segregation Analysis

    When a novel LP variant appeared after variant calling, familial segregation

    analysis was performed to confirm pathogenicity of the variant. Family members

    of the subject with LP variant were additionally enrolled for familial segregation

    study. The parents of the subjects were recruited first. If the parents were

    absent, siblings with and without ADPKD were recruited for segregation

  • 19

    analysis. The presence or absence of ADPKD was screened with portable

    sonography (SONON 300C, Healcerion, Korea), and additional CT scans were

    performed in non-diagnostic cases. For familial segregation analysis, I utilized

    both direct Sanger sequencing and TES.

    Statistical Analysis

    Statistical analyses were performed using SPSS version 23.0 (IBM Corp.,

    Armonk, NY, USA). Analysis of covariance, the Mann-Whitney test, and the

    chi-square test were performed to compare variables between two groups. P

    values

  • 20

    FIGURE LEGENDS and TABLES

    Figure 1. Summary of identified mutations

    A. Prevalence of PKD1 vs. PKD2 mutations and cases of no mutation

    among 749 subjects (524 families); B. Prevalence of PKD1 vs. PKD2

    mutations among 630 subjects (423 families without NM families [n=101]);

    C. Frequency of each class in 423 families according to PKD1/2 status;

    D. Frequency of each mutation class and grade in 524 families.

    Of the 749 subjects, 68% were classified as PKD1, 16% as PKD2, and 16%

    as no mutation (NM). Excluding NM subjects, 81% of the 630 subjects and 82%

    of the 423 families had PKD1 mutations. The mutation classes in the 423

    families that were obtained by excluding the 101 NM families from the original

    524 families are shown in Fig. 1C. In PKD1, frameshift mutations were the

    most common (39%), and in PKD2, nonsense mutations were the most common

    (52%). The prevalence of classes of variants (number of families that shared the

    same variants) among the 524 families was shown in Fig 1D.

    Abbreviations. DP, definitively pathogenic; HLP, highly likely pathogenic; LP,

    likely pathogenic; LN, likely neutral; I, indeterminate; NV, no variant.

    Figure 2. Cross-sectional analysis between age and log(htTKV) or eGFR

    A. Scatter plot and trend line of log (htTKV) according to genotype. B.

    Scatter plot and trend line of log (htTKV) according to genotype.

    PKD1-PT genotype had a larger htTKV or lower eGFR than the PKD1-NT

    or PKD2 genotypes. log(htTKV) increased rapidly with age in the PKD1-PT

    and PKD2 genotypes, but it did so less intensely in the PKD1-NT genotype

  • 21

    and those with NM. The PKD1-PT genotype also showed a steeper slope for

    eGFR than the other genotypes except PKD1-ID genotype.

    Abbreviations. htTKV, height-adjusted total kidney volume; eGFR, estimated

    glomerular filtration rate; PKD1-PT, PKD1 protein-truncating; PKD1-NT,

    PKD1 non-truncating; PKD1-ID, PKD1 small in-frameshift indels.

    Figure 3. Renal outcomes and death according to type of PKD1/2

    mutation. A. Survival analysis of ESRD progression. B. Survival analysis

    of all-cause mortality

    When PKD1-PT genotype was used as a reference, the PKD2 genotype

    showed a 0.22 times lower HR of ESRD. The familial clustering effect was

    significant (P=0.037). The incidence of death did not have difference according to

    genotypes.

    Abbreviations. ESRD, end-stage renal disease; PKD1-PT, PKD1

    protein-truncating; PKD1-NT, PKD1 non-truncating; PKD1-ID, PKD1 small

    in-frameshift indels; HR, hazard ratio; CI, confidence interval.

  • 22

    Table 1. Frequency of each class and grade in 524 families

    A. Class Nonsense

    Frameshift

    Typ icalsplicing

    L a r g edel/dup

    Missense

    In-frameindel NV

    ■ PKD1 86 (19) 136 (24) 16 (3) 8 (0) 90 (17) 12 (3)

    ■ PKD2 39 (20) 11 (0) 8 (1) 5 (1) 12 (4) 0 (0)

    ■ NM 53 (20) 48

    Total 125 (39) 147 (24) 24 (4) 13 (1) 155 (41) 12 (3) 524 (112)

    B. Grade DP HLP LP LN I NV

    ■ PKD1 284 (46) 61 (20) 39 (0)

    ■ PKD2 63 (22) 10 (4) 2 (0)

    ■ NM 39 (19) 14 (1) 48

    Total 311 (68) 71 (24) 41 (0) 39 (19) 14 (1) 524 (112)

    The number in () means the number of families that shared the same variants

    Abbreviation. DP, definitively pathogenic; HLP, highly likely pathogenic; LP, likely

    pathogenic; LN, likely neutral; I, indeterminate; NV, no variant

  • 23

    Table 2. Baseline Clinical Characteristics According to PKD Genotypes

    Variables Total PKD1-PT PKD1-ID PKD1-NT PKD 2 P value

    N (%) 630 371 (59.0) 21 (3.0) 119 (19.0) 119 (19.0)

    Male, n (%) 304 (48.3) 175 (47.2) 12 (57.1) 68 (57.1) 49 (41.2) 0.975

    Age, [mean±SD] 45.6±12.9 44.2±12.9 45.0±10.6 43.3±12.6 48.9±13.0 0.001

    1st visit age, 569,

    [mean±SD]41.9±12.9 40.2±12.9 39.8±10.0 43.3±12.6 45.9±13.0

  • 24

    Table 3. A. Multilevel multivariate linear regression of log(htTKV) or

    eGFR according to genotype B. Multivariate linear regression of

    log(htTKV) or eGFR for comparing each slope according to genotype

    A. Log(htTKV) eGFR

    *Variable Coefficient [95% CI] P value Coefficient [95% CI] P value

    Age 0.012 [0.010, 0.0.013]

  • 25

    Figure 1

  • 26

    Figure 2

    Figure 3

  • 27

    SUPPLEMENTARY INFORMATION

    Supplementary Table S1. Demographic Findings of 749 Patients (524

    Families)

    Variable Total Male Female P value

    N (%) 749 360 (48.1) 389 (51.9)

    Age [mean±SD] 46.4±13.3 44.6±14.4 48.0±12.0

  • 28

    Family ID Exon Codon cDNA change Protein change Predictedeffect

    Domain PKDB CanadaDB

    Koreanindividuals

    Koreanfamilies

    716 1 5 c.13_25del p.Ala5TrpfsX53 Frameshift LRR 1 0 1 1

    61 1 9 c.25_26dup p.Ala9TrpfsX64 FrameshiftExtracellularsequence

    0 0 1 1

    126 1 15 c.43_62del p.Leu15AlafsX92 Frameshift Extracellularsequence 0 0 3 1

    57 1 33 c.78_96dup p.Cys33ArgfsX87 Frameshift LRR 0 0 1 1

    473, 886 1 44 c.129_135delCGGCGCC p.Gly44ProfsX27 Frameshift LRR 0 0 2 2

    904 1 48 c.142del p.Arg48AlafsX25 Frameshift LRRCT 0 0 1 1

    260 1 56 c.165_171delGCTGCGG p.Leu56ArgfsX15 Frameshift LRR 1 0 1 1

    185 4 123 c.369del p.Ser123ArgfsX167 Frameshift Extracellularsequence 0 0 1 1

    224 4 142 c.423_424del p.Glu142AlafsX36 Frameshift LRRCT 0 0 2 1

    415 5 275 c.822_832del p.Ala275GlyfsX92 Frameshift PKD 1 0 0 1 1

    181, 478,657 5 286 c.856_862del p.Gly286X Frameshift PKD 1 1 3x 3 3

    570 6 442 c.1324del p.Ala442ProfsX23 Frameshift C-type lectin 0 0 1 1

    Supplementary Table S2. Mutations found in this study

    Supplementary Table S2A. PKD1 protein-truncating mutations in study families

  • 29

    16 7 499 c.1495del p.Glu499SerfsX59 Frameshift C-type lectin 0 0 3 1

    67 8 557 c.1669_1670del p.Leu557AspfsX29 FrameshiftExtracellularsequence

    0 0 1 1

    649 9 592 c.1776del p.Glu592AsnfsX192 Frameshift Extracellularsequence

    0 0 1 1

    599 10 630 c.1889dup p.Asp630GlyfsX83 FrameshiftExtracellularsequence

    0 0 1 1

    229 10 668 c.2001dup p.Gly668TrpfsX46 FrameshiftLDL-receptor

    class A;aTypical

    0 0 2 1

    363 10 681 c.2040dup p.Ala681CysfsX33 FrameshiftExtracellularsequence 0 0 1 1

    302, 414 10 696 c.2085dup p.Ala696ArgfsX18 Frameshift Extracellularsequence 0 0 2 2

    793 11 797 c.2390dup p.Val797GlyfsX19 Frameshift PKD 2 0 0 1 1

    611 11 824 c.2470del p.Leu824CysfsX74 Frameshift Extracellularsequence 0 0 2 1

    79, 204 11 832 c.2494dup p.Arg832ProfsX40 Frameshift Extracellularsequence 0 0 2 2

    252 11 873 c.2618_2621del p.Val873AlafsX24 Frameshift PKD 3 1 1x 5 1

    20 11 887 c.2659del p.Trp887GlyfsX11 Frameshift PKD 3 0 0 12 1

    73 11 906 c.2716del p.Glu906SerfsX21 Frameshift PKD 3 0 0 2 1

    110 11 928 c.2784_2796del p.Glu928ValfsX18 Frameshift PKD 3 0 0 1 1

    541 12 955 c.2865del p.Val955TrpfsX20 Frameshift PKD 4 1 0 1 1

    318 12 966 c.2896del p.Arg966GlyfsX10 Frameshift PKD 4 0 0 1 1

  • 30

    197 13 1029 c.3085_3088del p.Ala1029CysfsX8 Frameshift PKD 5 0 0 1 1

    146 14 1075 c.3223_3226del p.Phe1075ArgfsX28 Frameshift PKD 5 0 0 1 1

    54 15 1108 c.3323del p.Ser1108LeufsX7 Frameshift PKD 5 0 0 5 1

    4 15 1117 c.3349dup p.Gln1117ProfsX19 Frameshift PKD 5 0 0 3 1

    748 15 1168 c.3503_3504del p.Pro1168ArgfsX42 Frameshift PKD 6 1 0 1 1

    900 15 1174 c.3520_3521del p.Gln1174AlafsX36 Frameshift PKD 6 0 0 1 1

    245 15 1205 c.3613_3622del p.Asp1205SerfsX12 Frameshift PKD 6 0 0 3 1

    160 15 1228 c.3684del p.Val1228TrpfsX44 Frameshift PKD 7 1 0 2 1

    47 15 1248 c.3744_3754del p.Asp1248AlafsX48 Frameshift PKD 7 0 0 2 1

    265 15 1303 c.3906dup p.Ala1303ArgfsX8 Frameshift PKD 8 0 0 1 1

    209 15 1347 c.4041_4042del p.His1347GlnfsX83 Frameshift PKD 8 1 X1 1 1

    365, 887 15 1357 c.4069del p.Leu1357TrpfsX9 Frameshift PKD 8 1 0 2 2

    76 15 1357 c.4070del p.Leu1357ArgfsX9 Frameshift PKD 8 1 0 5 1

    15 15 1439 c.4315_4324delinsCAACTGTTG p.Gly1439GlnfsX5 Frameshift PKD 9 0 0 3 1

    279 15 1460 c.4379_4380del p.Val1460GlyfsX62 Frameshift PKD 9 0 0 1 1

    368 15 1467 c.4401_4407del p.Val1467AlafsX64 Frameshift PKD 9 0 0 1 1

    369 15 1478 c.4434del p.Leu1478TrpfsX55 Frameshift PKD 8 0 0 1 1

    888 15 1529 c.4586del p.Gly1529AlafsX5 Frameshift PKD 10 0 0 1 1

  • 31

    732 15 1551 c.4650dup p.Leu1551AlafsX27 Frameshift PKD 11 0 0 1 1

    21 15 1560 c.4679_4691del p.Val1560GlyfsX3 Frameshift PKD 11 0 0 4 1

    199 15 1599 c.4797del p.Tyr1599X Frameshift PKD 11 0 novel 1 1

    500 15 1642 c.4924del p.Arg1642AlafsX80 Frameshift REJ 0 0 1 1

    1, 33, 336,457, 542,562, 575,635, 666

    15 1672 c.5014_5015del p.Arg1672GlyfsX98 Frameshift PKD 12 1 29x 17 9

    147, 553 15 1841 c.5517_5521dup p.Val1841GlyfsX110 Frameshift PKD 14 0 0 2 2

    901 15 1877 c.5629del p.Ala1877ProfsX72 Frameshift PKD14 0 0 1 1

    582 15 2075 c.6224_6225insGTTG p.Ser2075LeufsX6 Frameshift PKD 11 0 0 1 1

    364 15 2114 c.6341_6344del p.Tyr2114X Frameshift PKD 17 0 0 1 1

    238 15 2183 c.6549_6550del p.Glu2183ValfsX77 Frameshift REJ 0 0 3 1

    215 15 2257 c.6770dup p.Pro2257AlafsX4 Frameshift REJ 0 0 1 1

    84 15 2270 c.6808_6811del p.Asp2270HisfsX43 Frameshift REJ 1 0 1 1

    376 16 2332 c.6994_7000del p.Ala2332TrpfsX7 Frameshift REJ 1 0 1 1

    17 16 2334 c.6994_7000dup p.Val2334GlyfsX88 Frameshift REJ 1 0 6 1

    136 16 2335 c.7003dup p.Glu2335GlyfsX85 Frameshift REJ 0 0 2 1

    232 18 2449 c.7345dup p.Thr2449AsnfsX52 Frameshift REJ 0 0 1 1

    288 18 2470 c.7408_7414del p.Pro2470TrpfsX148 Frameshift REJ 0 0 1 1

  • 32

    60 19 2527 c.7579_7580del p.Val2527LeufsX67 Frameshift REJ 0 0 1 1

    183 19 2542 c.7625del p.Gly2542ValfsX78 Frameshift REJ 0 0 1 1

    285 20 2581 c.7741_7742dup p.Thr2581GlnfsX39 Frameshift REJ 0 0 1 1

    157, 655 20 2606 c.7816del p.Gln2606SerfsX14 Frameshift REJ 0 0 4 2

    559 21 2648 c.7942del p.Glu2648ArgfsX6 Frameshift REJ 0 0 1 1

    85 21 2658 c.7973_7974del p.Val2658GlyfsX2 Frameshift REJ 0 0 2 1

    277, 539 22 2674 c.8019dup p.Pro2674AlafsX148 Frameshift REJ 0 0 2 2

    78, 128 23 2776 c.8327_8330del p.Leu2776ArgfsX98 Frameshift REJ 0 0 4 2

    551 23 2800 c.8399del p.Pro2800GlnfsX75 Frameshift REJ 0 0 1 1

    315 23 2859 c.8570_8574dup p.Ala2859ProfsX18 Frameshift Extracellularsequence 0 0 1 1

    592 23 2861 c.8581dup p.Ile2861AsnfsX76 Frameshift Extracellularsequence 0 0 1 1

    586 23 2881 c.8642_8655del p.Asp2881GlyfsX51 Frameshift Extracellularsequence 0 0 1 1

    556, 589,746 24 2951 c.8851dup p.Arg2951ProfsX4 Frameshift

    Extracellularsequence 0 0 3 3

    49, 789 25 3028 c.9083_9084del p.Glu3028GlyfsX40 Frameshift GPS 1 0 2 2

    754 26 3080 c.9240_9241del p.Ala3080CysfsX96 Frameshift Transmembrane 0 0 1 1

    397 28 3228 c.9683dup p.Leu3228ProfsX24 Frameshift PLAT 0 0 1 1

    425 29 3262 c.9780_9784dup p.Ile3262SerfsX56 Frameshift Cytoplasmic 0 0 1 1

  • 33

    sequence

    881 29 3281 c.9841del p.Ala3281ProfsX35 FrameshiftCytoplasmicsequence

    0 0 1 1

    627 29 3281 c.9843del p.Thr3281ProfsX34 FrameshiftCytoplasmicsequence

    0 0 1 1

    407 31 3382 c.10144dup p.Thr3382AsnfsX8 FrameshiftCytoplasmicsequence

    0 0 1 1

    766 31 3388 c.10162_10163dup p.Glu3388LeufsX8 FrameshiftCytoplasmicsequence 0 0 1 1

    118 33 3419 c.10255dup p.Trp3419LeufsX7 Frameshift Cytoplasmicsequence

    0 0 2 1

    107 35 3529 c.10585_10586dup p.Gln3529HisfsX56 Frameshift Cytoplasmicsequence 0 0 1 1

    39 36 3569 c.10706_10707del p.Val3569GlyfsX56 Frameshift Transmembrane 0 0 2 1

    119 36 3581 c.10742del p.Pro3581ArgfsX3 Frameshift Extracellularsequence 1 0 2 1

    161 37 3613 c.10836_10837del p.Tyr3613LeufsX12 Frameshift Cytoplasmicsequence 0 0 1 1

    96 37 3655 c.10963del p.Leu3655TrpfsX28 Frameshift Cytoplasmicsequence 1 0 1 1

    42 40 3757 c.11270_11274del p.Leu3757ProfsX56 Frameshift Extracellularsequence 0 0 2 1

    188 40 3789 c.11365_11366del p.Asn3789TrpfsX25 Frameshift Extracellularsequence 0 0 2 1

    14 40 3793 c.11376dup p.Thr3793AspfsX22 Frameshift Extracellularsequence 0 0 2 1

    786 40 3793 c.11378del p.Thr3793SerfsX32 Frameshift Extracellularsequence 0 0 1 1

  • 34

    676 41 3817 c.11450dup p.Tyr3817LeufsX142 FrameshiftExtracellularsequence

    1 0 1 1

    141 41 3843 c.11527del p.Asp3843ThrfsX101 FrameshiftExtracellularsequence

    0 0 1 1

    600 42 3851 c.11551del p.Leu3851TrpfsX93 FrameshiftExtracellularsequence

    1 0 1 1

    9 42 3888 c.11661dup p.Ala3888CysfsX72 Frameshift Cytoplasmicsequence

    0 0 3 1

    396 42 3895 c.11684del p.Gly3895AlafsX49 FrameshiftExtracellularsequence

    0 0 1 1

    572 43 3994 c.11973_11980dup p.Leu3994ProfsX47 FrameshiftTransmem

    brane0 0 1 1

    812 43 3995 c.11984_11996del p.Phe3995SerfsX39 Frameshift Transmembrane 0 0 1 1

    298 44 4014 c.12042del p.Phe4014LeufsX24 Frameshift Cytoplasmicsequence 0 0 2 1

    322 44 4033 c.12097del p.Val4033TrpfsX5 Frameshift Transmembrane 0 0 1 1

    36, 159 44 4034 c.12100del p.Val4034CysfsX4 Frameshift Transmembrane 0 0 6 2

    540 45 4059 c.12175dup p.Gln4059ProfsX97 Frameshift Extracellularsequence 1 0 1 1

    583 45 4066 c.12197del p.Pro4066LeufsX131 Frameshift Extracellularsequence 0 0 1 1

    293 45 4073 c.12217_12218del p.Leu4073ValfsX82 Frameshift Extracellularsequence 0 0 1 1

    77 45 4083 c.12247_12259del p.Pro4083TrpfsX110 Frameshift Extracellularsequence 0 0 2 1

    94, 501 45 4102 c.12305del p.Ala4102ValfsX95 Frameshift Transmembrane 0 0 3 2

  • 35

    522 45 4103 c.12307_12310del p.Val4103PhefsX93 FrameshiftTransmem

    brane0 0 1 1

    38, 88 46 4202 c.12605_12632del p.Arg4202ProfsX146 FrameshiftCytoplasmicsequence

    1 0 2 2

    28 46 4224 c.12669_12670del p.Gln4224ValfsX2 FrameshiftCytoplasmicsequence

    0 0 2 1

    167, 382 3 117 c.350T>G p.Leu117X Nonsense Extracellularsequence

    0 0 2 2

    105, 860 4 135 c.405G>A p.Trp135X Nonsense LRRCT 0 0 4 2

    89 5 232 c.696T>A p.Cys232X Nonsense WSC 0 0 1 1

    100, 535 5 236 c.706C>T p.Gln236X Nonsense WSC 1 0 3 2

    102 5 400 c.1198C>T p.Arg400X Nonsense Extracellularsequence 1 0 1 1

    526 6 437 c.1309C>T p.Gln437X Nonsense C-type lectin 0 0 1 1

    132 6 439 c.1316G>A p.Trp439X Nonsense C-type lectin 0 0 3 1

    243 7 523 c.1568C>A p.Ser523X Nonsense C-type lectin 0 0 2 1

    6 9 597 c.1789C>T p.Gln597X Nonsense Extracellularsequence 0 0 2 1

    576 10 680 c.2040T>G p.Tyr680X Nonsense Extracellularsequence 0 0 1 1

    619 11 718 c.2152C>T p.Gln718X Nonsense Extracellularsequence 1 0 1 1

    557 11 901 c.2703G>A p.Trp901X Nonsense PKD 3 0 0 1 1

    385 12 987 c.2959C>T p.Gln987X Nonsense PKD4 1 0 1 1

  • 36

    269 13 1023 c.3067C>T p.Gln1023X Nonsense PKD 5 0 0 1 1

    2 15 1112 c.3334G>T p.Glu1112X Nonsense PKD 5 0 0 2 1

    273 15 1116 c.3346C>T p.Gln1116X Nonsense PKD 5 0 0 1 1

    784 15 1172 c.3514C>T p.Gln1172X Nonsense PKD 6 1 0 1 1

    631 15 1174 c.3520C>T p.Gln1174X Nonsense PKD 6 0 0 1 1

    153 15 1203 c.3607C>T p.Gln1203X Nonsense PKD 6 0 0 2 1

    469 15 1436 c.4306C>T p.Arg1436X Nonsense PKD 9 1 1x 1 1

    24, 59, 91,135, 354,

    67415 1483 c.4447C>T p.Gln1483X Nonsense PKD 10 1 0 8 6

    403 15 1599 c.4797C>A p.Tyr1599X Nonsense PKD 11 0 novel 2 1

    150 15 1621 c.4861C>T p.Gln1621X Nonsense PKD 11 1 0 2 1

    66, 127,439, 552 15 1826 c.5477G>A p.Trp1826X Nonsense PKD 14 0 0 5 4

    864 15 1837 c.5510G>A p.Trp1837X Nonsense PKD 14 0 0 1 1

    207 15 1874 c.5621G>A p.Trp1874X Nonsense PKD 14 0 0 1 1

    543 15 1903 c.5707C>T p.Gln1903X Nonsense PKD 15 1 0 1 1

    227 15 2039 c.6115C>T p.Gln2039X Nonsense PKD 16 1 1x 2 1

    462, 493 15 2067 c.6199C>T p.Gln2067X Nonsense PKD 17 1 0 2 2

    667 15 2164 c.6491C>G p.Ser2164X Nonsense REJ 1 0 1 1

  • 37

    29, 137,892

    15 2184 c.6549dup p.Glu2184X Nonsense REJ 1 0 5 3

    156 15 2187 c.6561G>A p.Trp2187X Nonsense REJ 0 0 1 1

    81 15 2246 c.6736C>T p.Gln2246X Nonsense REJ 1 0 1 1

    454 15 2305 c.6913C>T p.Gln2305X Nonsense REJ 0 0 1 1

    35, 43 17 2388 c.7164C>G p.Tyr2388X Nonsense REJ 0 0 3 2

    734 18 2405 c.7214G>A p.Trp2405X Nonsense REJ 0 0 1 1

    345 18 2430 c.7288C>T p.Arg2430X Nonsense REJ 1 7x 1 1

    206 19 2519 c.7555C>T p.Gln2519X Nonsense REJ 1 0 1 1

    70, 267 20 2606 c.7816C>T p.Gln2606X Nonsense REJ 1 0 2 2

    324, 883 21 2639 c.7915C>T p.Arg2639X Nonsense REJ 1 5x 3 2

    903 22 2699 c.8095C>T p.Gln2699X Nonsense REJ 1 0 1 1

    629 23 2738 c.8212G>T p.Glu2738X Nonsense REJ 0 0 1 1

    452 23 2742 c.8224G>T p.Glu2742X Nonsense REJ 0 0 1 1

    797 23 2810 c.8428G>T p.Glu2810X Nonsense REJ 1 3x 1 1

    564 23 2824 c.8470C>T p.Gln2824X Nonsense REJ 0 0 1 1

    761, 810 27 3183 c.9547C>T p.Arg3183X Nonsense PLAT 1 0 2 2

    588 28 3195 c.9585G>A p.Trp3195X Nonsense PLAT 0 0 2 1

    388 28 3206 c.9616C>T p.Gln3206X Nonsense PLAT 1 0 1 1

  • 38

    890 30 3350 c.10048A>T p.Lys3350X NonsenseCytoplasmicsequence

    0 0 1 1

    145 32 3395 c.10180C>T p.Gln3394X NonsenseCytoplasmicsequence

    1 0 1 1

    95 34 3475 c.10420C>T p.Gln3474X NonsenseCytoplasmicsequence

    1 0 2 1

    104 34 3478 c.10432G>T p.Glu3478X Nonsense Cytoplasmicsequence

    0 0 1 1

    45 34 3488 c.10459C>T p.Gln3487X NonsenseCytoplasmicsequence

    1 0 1 1

    148 36 3603 c.10806G>A p.Trp3602X NonsenseTransmem

    brane0 0 2 1

    7 37 3620 c.10855A>T p.Lys3619X Nonsense Cytoplasmicsequence 0 0 2 1

    113 38 3702 c.11101C>T p.Gln3701X Nonsense Extracellularsequence 1 0 2 1

    567 39 3755 c.11263G>T p.Glu3755X Nonsense Extracellularsequence 0 0 1 1

    607 40 3796 c.11388T>A p.Tyr3796X Nonsense Extracellularsequence 0 0 1 1

    239 41 3807 c.11420G>A p.Trp3807X Nonsense Extracellularsequence 1 0 5 1

    5, 117 41 3808 c.11423G>A p.Gly3808X Nonsense Extracellularsequence 1 0 5 2

    75 42 3871 c.11611G>T p.Glu3871X Nonsense Extracellularsequence 1 0 2 1

    74 43 3921 c.11763G>A p.Trp3921X Nonsense Cytoplasmicsequence 1 3x 1 1

    93 44 4003 c.12007C>T p.Gln4003X Nonsense Cytoplasmicsequence 1 4x 1 1

  • 39

    22 44 4041 c.12121C>T p.Gln4041X Nonsense PKD 3 1 8x 6 1

    109 45 4065 c.12195C>A p.Cys4065X NonsenseExtracellularsequence

    0 0 1 1

    64 45 4126 c.12377dup p.Tyr4126X Nonsense Cytoplasmicsequence

    0 0 2 1

    210 46 4246 c.12736C>T p.Gln4246X NonsenseCytoplasmicsequence

    0 0 1 1

    625 12 951 c.2853+1G>ATypicalsplicing

    PKD 4 1 0 1 1

    549 15 1098 c.3295+1G>TTypicalsplicing

    PKD 5 0 0 1 1

    274 20 2568 c.7703+1G>C Typicalsplicing REJ 0 0 1 1

    246, 538,656, 884 21 2673

    c.8017-2_8017-1del

    Typicalsplicing REJ 1 0 5 4

    129 23 2672 c.8017-2A>G Typicalsplicing REJ 1 0 2 1

    171 26 3067 c.9201+1G>C TypicalsplicingTransmembra

    ne 1 0 1 1

    638 33 3406 c.10217+1G>A TypicalsplicingCytoplasmicsequence 0 0 1 1

    213 32 3476 c.10429-2_10429-1delCATypicalsplicing

    Cytoplasmicsequence 0 0 1 1

    595 39 3671 c.11014-1del TypicalsplicingCytoplasmicsequence 0 0 2 1

    34 41 3757 c.11270-2A>C TypicalsplicingExtracellularsequence 0 0 3 1

    608 44 3903 c.11710-1G>T TypicalsplicingTransmembra

    ne 0 0 1 1

  • 40

    194 44 3904 c.11713-1G>ATypicalsplicing

    Transmembrane

    0 0 1 1

    312 45 4000 c.12001-2A>GTypicalsplicing

    Transmembrane

    0 0 1 1

    299 4 exon 4 exon 4 Large deletion 0 0 1 1

    590 6 exon 6 exon 6 Large deletion 0 0 1 1

    316 15 exon 15 exon 15 Large deletion 0 0 1 1

    628 15 2219 c.6657_6671del p.Arg2219_Pro2224del

    Large deletion REJ 1 0 1 1

    162 19 exon19 dup exon19 dup Largeduplication 0 0 2 1

    878 21 exon 21 exon 21 Large deletion 0 0 1 1

    23 46 3791 c.11373_11390del p.Gly3791_Ser3797del Large deletionExtracellularsequence 0 0 3 1

    643 46 4201 c.12601_12628del p.Gly4201SerfsX147 Large deletion Cytoplasmicsequence 0 0 1 1

    Supplementary Table S2B. PKD1 in-frame insertion/deletions in study families

  • 41

    FamilyID

    Exon Codon cDNA change Protein change Domain PROVEANscore

    PKDB

    CanadaDB

    Koreanindividu

    als

    Koreanfamilies

    Segregation

    patients/control

    99 36 3565 c.10694_10696del p.Val3565delTransmem

    brane-7.49 0 0 2 1

    26, 27,679

    10 689 c.2065_2067del p.Ser689delExtracellularsequence

    -5.37 0 0 6 3 5,0

    133 15 2088 c.6258_6263dup p.Pro2087_Arg2088dup PKD 17 -5.26 0 0 1 1 1,1

    216 15 2217 c.6650_6664del p.Val2217_Leu2221del REJ -16.42 1 0 1 1

    482 15 2251 c.6752_6754del p.Val2251del REJ -7.19 0 0 1 1

    87 20 2613 c.7837_7839del p.Leu2613del REJ -9.98 1 0 3 1 2,0

    130 23 2770 c.8308_8310del p.Asn2770del REJ -11.28 0 0 3 1 3,0

    124,208 24 2979 c.8935_8937del p.Phe2979del

    Extracellularsequence -10.76 1 0 2 2

    37 25 3031 c.9092_9094del p.Leu3031del GPS -14.07 0 0 2 1 2,0

    Supplementary Table S2C. PKD1 non-truncating mutations in study families

  • 42

    FamilyID

    Exon Codon cDNAchange

    Proteinchange

    Domain SIFT polyphen

    GERP++

    PKDB

    CanadaDB

    Koreanindividual

    Koreanfamily

    Segregation

    patients/control

    334 2 75 c.224C>T p.Ser75Phe LRR 1 0 1 4.27 1 0 4 1

    355 2 93 c.278T>C p.Leu93Pro C-typelectin

    0 1 4.27 0 0 1 1

    898 3 120 c.359T>C p.Ile120ThrExtracellularsequence

    00.873

    4.26 1 0 1 1

    304 5 381 c.1141G>A p.Gly381ArgExtracellularsequence

    0.090.999

    5.05 0 0 1 1

    264 6 419 c.1256G>A p.Cys419Tyr C-typelectin 0 1 5.05 0 0 1 1

    186 7 466 c.1396G>A p.Val466Met C-typelectin 0.01 1 4.82 1 2x 2 1

    281, 297,465 7 515 c.1543G>T p.Gly515Trp

    C-typelectin 0 1 4.85 1 0 4 3

    41, 306 7 464 c.1391T>C p.Leu464Pro C-typelectin 0.370.999 4.82 0 0 3 2 3,0

    233 10 693 c.2078G>A p.Gly693Glu Extracellularsequence 0.01 1 5.17 0 0 3 1 3,4

    519 10 699 c.2095T>C p.Ser699Pro Extracellularsequence 0.210.943 4.01 0 0 1 1 2,1

    342 11 791 c.2371C>T p.Arg791Trp PKD 4 0.08 0.999 4.43 0 0 1 1 2,0

    616 11 869 c.2605C>T p.Arg869Cys PKD 12 0.04 0.992 3.48 0 0 1 1 0,1

    248 11 944 c.2830C>T p.Arg944Cys PKD 4 0 1 4.96 1 0 1 1

  • 43

    241 11 845 c.2534T>C p.Leu845SerExtracellularsequence

    0 1 5.09 1 0 2 1

    405 12 960 c.2878G>A p.Gly960Ser PKD 4 0 1 4.79 1 0 1 1 0,1

    444 14 1077 c.3229G>A p.Val1077Ile PKD 5 0.230.982 5.33 0 0 1 1

    614 14 1055 c.3163T>G p.Trp1055Gly PKD 5 0.09 1 5.29 0 0 1 1

    745 15 1587 c.4759C>T p.Arg1587Cys PKD 11 0.02 1 4.14 0 0 1 1

    485 15 2085 c.6254C>G p.Pro2085Arg PKD 17 0.04 1 5.49 0 0 1 1

    591 15 2215 c.6643C>T p.Arg2215Trp REJ 0.010.999 5.49 1 1x 1 1

    530 15 2235 c.6704C>T p.Ser2235Leu REJ 0 0.999 5.35 0 0 2 1

    624, 848 15 1414 c.4241G>C p.Trp1414Ser PKD 9 0.01 1 5.71 0 0 2 2 2,0

    332 15 1544 c.4630G>A p.Val1544Met PKD 10 0 1 5.36 0 0 2 1 2,0

    189 15 1547 c.4640G>A p.Arg1547His PKD 10 0.14 1 4.35 0 0 1 1

    219 15 1666 c.4997G>C p.Trp1666Ser PKD 12 0 1 5.41 0 0 1 1

    865 15 1832 c.5495G>T p.Gly1832Val PKD14 0.01 1 4.67 0 0 1 1

    98 15 2159 c.6475G>T p.Val2159Leu REJ 0.74 1 5.49 0 0 1 1

    561 15 1109 c.3327T>A p.Asn1109Lys PKD 5 0.01 0.998 -4.05 0 0 1 1

    83 15 1652 c.4955T>C p.Leu1652Pro PKD 12 0 1 5.30 0 0 2 1 2,0

    261 15 2006 c.6016T>G p.Trp2006Gly PKD 16 0 1 5.59 0 0 1 1

  • 44

    528 15 2132 c.6395T>G p.Phe2132Cys PKD 17 0.18 1 4.38 1 0 1 1

    328 18 2445 c.7333A>C p.Thr2445Pro REJ 0.130.999

    4.81 0 0 2 1

    555 18 2434 c.7300C>T p.Arg2434Trp REJ 0 1 3.78 1 0 1 1

    348 18 2436 c.7307G>T p.Gly2436Val REJ 0.04 1 4.81 0 0 1 1

    569 18 2476 c.7427G>A p.Cys2476Tyr REJ 0 1 4.77 0 0 1 1

    513 19 2557 c.7670A>G p.Asp2557Gly REJ 0 1 5.14 1 0 1 1

    668 19 2516 c.7546C>T p.Arg2516Cys REJ 0.01 1 4.96 1 8x 1 1

    476 19 2497 c.7490G>T p.Gly2497Val PKD 16 0 1 4.96 0 0 1 1

    773 19 2526 c.7576T>C p.Cys2526Arg REJ 0.24 0.999 4.96 0 0 1 1

    644, 893 23 2863 c.8587A>G p.Ile2863Val Extracellularsequence 0.330.874 4.89 0 0 2 2 2,0

    108, 169,176, 211,214, 253,471, 777,

    854

    23 2771 c.8311G>A p.Glu2771Lys REJ 0.01 1 4.89 0 0 12 9 10,1

    231 23 2783 c.8347G>C p.Ala2783Pro REJ 0.04 1 4.62 0 0 1 1

    713 24 2963 c.8888T>C p.Ile2963Thr Extracellularsequence 0 1 4.55 0 0 2 1 2,0

    201, 594 25 3012 c.9034A>C p.Thr3012Pro GPS 0.02 1 3.45 0 0 4 2 3,0

    677 25 3046 c.9136C>T p.Arg3046Cys GPS 0 0.994 4.38 0 0 1 1

  • 45

    143 25 3043 c.9128G>A p.Cys3043Tyr GPS 0 1 4.38 0 0 1 1

    642 25 3014 c.9041T>C p.Leu3014Pro GPS 0.12 1 4.55 0 0 1 1

    429 26 3132 c.9395C>T p.Ser3132Leu PLAT 0 1 4.51 1 0 1 1

    837 26 3113 c.9338G>A p.Gly3113GluCytoplasmicsequence

    0.02 1 4.51 0 0 1 1

    158, 203 27 3135 c.9404C>T p.Thr3135Met PLAT 0 1 4.49 1 0 2 2

    350, 615 28 3194 c.9581C>T p.Ala3194Val PLAT 0.01 1 4.84 0 0 2 2 4,0

    69 29 3253 c.9758T>C p.Leu3253ProCytoplasmicsequence

    0.01 1 4.69 0 0 2 1 2,0

    50 33 3468 c.10402G>A p.Asp3468AsnCytoplasmicsequence 0.01

    0.999 4.45 1 0 1 1

    25 36 3591 c.10769C>T p.Ser3590Phe Transmembrane 0 1 3.98 0 0 2 1 2,0

    192 36 3603 c.10807G>A p.Glu3603Lys Cytoplasmicsequence 0.01 1 3.98 1 0 2 1

    585 37 3612 c.10835T>C p.Leu3612Pro Cytoplasmicsequence 0.01 1 4.81 0 0 1 1

    236 38 3673 c.11018T>C p.Leu3673Pro Extracellularsequence 0.020.997 3.46 0 0 2 1

    164 39 3749 c.11246G>A p.Arg3749Gln Extracellularsequence 0.07 1 4.04 1 0 1 1

    894 39 3752 c.11255G>A p.Arg3752Gln Extracellularsequence 0.08 1 4.04 1 0 1 1

    90 41 3822 c.11465T>A p.Leu3822Gln Extracellularsequence 0 1 4.49 0 0 1 1

    212 42 3846 c.11536A>G p.Ser3846Gly Extracellular 0.02 0.95 3.88 0 0 1 1

  • 46

    sequence 8

    296 42 3846 c.11538C>G p.Ser3846ArgExtracellularsequence

    0.010.996

    3.88 0 0 1 1

    30 42 3853 c.11558T>C p.Leu3853ProExtracellularsequence

    0 1 3.97 0 0 4 1 2,0

    527 37 3634 c.10901C>A p.Ala3634AspCytoplasmicsequence

    0.570.999

    3.91 0 0 1 1

    242 4 154 c.461C>T p.Thr154Met LRRCT 00.547 -3.45 0 0 2 1

    249 15 1768 c.5303C>A p.Thr1768Asn

    PKD 13 0.05 0.984

    2.6 0 0 2 1

    496 15 2095 c.6285C>A p.Asp2095Glu PKD 17 0.13 0.976 3.51 0 0 1 1

    578 25 2997 c.8991C>G p.Ser2997Arg Extracellularsequence 0.070.999 3.57 0 0 1 1

    320, 510 37 3650 c.10948G>A p.Gly3650Ser Cytoplasmicsequence 0.09 1 3.71 1 0 3 2

    421 15 1706 c.5116G>A p.Ala1706Thr PKD 12 0.42 0.506 4.44 0 0 1 1

    466 15 1951 c.5852G>A p.Arg1951Gln PKD 15 0.82 0.504 4.24 0 0 1 1

    646 23 2765 c.8294G>A p.Arg2765His REJ 0.36 0.543 3.87 0 0 1 1

    289 43 3949 c.11846T>C p.Leu3949Pro Transmembrane 0.17

    0.999 3.12 0 0 1 1

  • 47

    Family ID Exon Codon cDNA change Protein change PredictedEffect

    Domain PKDB

    CanadaDB

    Koreanindividuals

    Koreanfamilies

    101 1 63 c.187_188insAG p.Ala63GlufsX55 FrameshiftCytoplasmicsequence

    0 0 1 1

    902 1 158 c.471dup p.Glu158ArgfsX55 Frameshift Poly-Arg 0 0 1 1

    618 1 197 c.588_595del p.Leu197SerfsX13 Frameshift Cytoplasmicsequence

    0 0 1 1

    134 5 381 c.1142del p.Gly381GlufsX71 Frameshift Extracellularsequence

    0 0 2 1

    329 7 569 c.1704dup p.Val569CysfsX4 Frameshift Transmembrane 0 0 4 1

    72 11 710 c.2127dup p.Lys710X Frameshift EF-handdomain 0 0 1 1

    152 11 714 c.2142del p.Lys714AsnfsX2 Frameshift EF-handdomain 0 0 1 1

    97 11 720 c.2159dup p.Asn720LysfsX5 Frameshift EF-handdomain 0 0 2 1

    346 11 722 c.2164del p.Val722TrpfsX15 Frameshift EF-handdomain 0 0 1 1

    220 11 738 c.2213_2214del p.Phe738X Frameshift EF-handdomain 0 0 1 1

    366 11 747 c.2240del p.Lys747ArgfsX23 Frameshift EF-handdomain 0 0 1 1

    80 1 80 c.239C>A p.Ser80X Nonsense Cytoplasmicsequence 0 0 1 1

    596, 641 1 87 c.260G>A p.Trp87X Nonsense Poly-Glu 0 0 2 2

    32, 154, 437 1 183 c.547C>T p.Gln183X Nonsense Cytoplasmicsequence 0 0 5 3

    474 2 201 c.603G>A p.Trp201X Nonsense Transmembrane 1 0 1 1

    Supplementary Table S2D. PKD2 protein-truncating mutations in study families

  • 48

    727 2 223 c.667G>T p.Glu223X NonsenseCytoplasmicsequence

    0 0 1 1

    10, 654 3 266 c.796G>T p.Glu266X NonsenseExtracellularsequence

    1 0 3 2

    58, 222, 565 4 306 c.916C>T p.Arg306X NonsenseExtracellularsequence

    1 0 3 3

    389 4 320 c.958C>T p.Arg320X Nonsense Polycystinmotif

    1 7x 1 1

    142, 237 4 361 c.1081C>T p.Arg361X Nonsense Extracellularsequence

    1 0 4 2

    40 5 417 c.1249C>T p.Arg417X NonsenseExtracellularsequence 1 10x 2 1

    31, 163, 836 6 455 c.1365G>A p.Trp455X NonsenseExtracellularsequence 1 0 4 3

    651, 660 6 464 c.1390C>T p.Arg464X Nonsense Extracellularsequence 1 0 3 2

    270 9 654 c.1960C>T p.Arg654X Nonsense Extracellularsequence 1 5x 1 1

    275 10 684 c.2052C>A p.Tyr684X Nonsense Cytoplasmicsequence 0 0 2 1

    106, 198,479, 626,

    83810 701 c.2102C>G p.Ser701X Nonsense Cytoplasmic

    sequence 0 0 5 5

    18, 205,546, 573 11 742 c.2224C>T p.Arg742X Nonsense

    EF-handdomain 1 0 6 4

    12 12 776 c.2326C>T p.Gln776X Nonsense EF-handdomain 0 0 2 1

    123, 155,558, 636 13 803 c.2407C>T p.Arg803X Nonsense Linker 1 6x 10 4

    140 14 874 c.2620A>T p.Lys874X Nonsense Coiled coil 0 0 1 1

    86 1 198 c.595+1G>C TypicalsplicingCytoplasmicsequence 0 0 1 1

    372, 633 4 365 c.1094+1G>T TypicalsplicingExtracellularsequence 0 0 2 2

  • 49

    44 5 365 c.1095-2A>GTypicalsplicing

    Extracellularsequence

    1 0 4 1

    460 5 365 c.1095-2A>TTypicalsplicing

    Extracellularsequence

    0 0 1 1

    319 6 516 c.1548+1G>ATypicalsplicing

    Transmembrane

    0 0 1 1

    401 8 572 c.1717-2A>C Typicalsplicing

    Cytoplasmicsequence

    0 0 1 1

    3 11 747 c.2240+1G>T Typicalsplicing

    EF-handdomain

    1 0 2 1

    349 1 exon 1 exon 1Largedeletion 0 0 1 1

    11 2 Intron2-exon5 p.M1fsXLargedeletion

    Cytoplasmicsequence 0 0 8 1

    574, 609 3 Exon 3-5 Exon 3-5 Largedeletion 0 0 2 2

    897 8 exon 8 exon 8 Largedeletion 0 0 1 1

  • 50

    FamilyID

    Exon Codon cDNAchange

    Proteinchange

    Domain SIFT Polyphen

    GERP++

    PKDB

    CanadaDB

    Koreanindividuals

    Koreanfamilies

    Segregation

    patients/control

    435 1 179 c.536C>A p.Pro179His Cytoplasmicsequence

    0.01 0.838 3.98 0 0 1 1

    13 4 290 c.869G>T p.Gly290Val Extracellularsequence

    0 0.99 5.07 0 0 4 1 2,0

    335, 340 4 322 c.965G>A p.Arg322GlnPolycystin

    motif 0 0.999 5.62 1 0 5 2

    82, 899 4 325 c.974G>A p.Arg325GlnPolycystin

    motif 0.01 0.988 -3.84 1 3x 4 2

    240 5 389 c.1166C>T p.Ala389Val Extracellularsequence 0 0.907 4.9 0 0 2 1 3,1

    394 5 374 c.1121T>G p.Leu374Trp Extracellularsequence 0 0.892 4.9 0 0 1 1

    122, 292 5 381 c.1141G>A p.Gly381Arg Extracellularsequence 0 0.966 4.9 0 0 3 2 3,1

    282, 472 14 890 c.2668G>A p.Glu890Lys Coiled coil 0.21 1 5.01 0 0 2 2 2,1

    Supplementary Table S2E. PKD2 non-truncating mutations in study families

  • 51

    Supplementary Table S3. Design of Custom Bait

    Category Description

    Sequencing technology Illumina

    Sequencing protocol Paired-End Short Read (100bp)

    Tiling frequency 3x

    Bait length 120

    Avoid standard repeat masked regions Yes

    Avoid overlap 20

    Layout strategy Centered

    Strand Sense

    Total input targets 1018

    Total valid targets 256

    Average number of baits per target 5.88

    Total targets with bait coverage 2038

    Total number of baits 1498

    Baits removed due to avoid overlap 46

    Total baits covered by baits 59913

    Abbreviations. bp, basepair

    Supplementary Table S4. Quality of Targeted Exome Sequencing

    Cohort (N=749) Average S.D.

    QC passed reads 9141842.96 6008387.66

    Unique reads 7826567.35 5238200.17

    Aligned unique reads 7815828.12 5227207.04

    PKD1 target mean coverage (X) 1523.12 973.96

    % of PKD1 target bases ≥1X 0.9931 0.0076

    % of PKD1 target bases ≥10X 0.9700 0.0073

    % of PKD1 target bases ≥20X 0.9650 0.0061

    % of PKD1 target bases ≥50X 0.9587 0.0084

    % of PKD1 target bases ≥100X 0.9478 0.0207

    PKD2 target mean coverage (X) 1422.98 903.62

    % of PKD2 target bases ≥1X 0.9999 0.0009

    % of PKD2 target bases ≥10X 0.9960 0.0074

    % of PKD2 target bases ≥20X 0.9879 0.0158

    % of PKD2 target bases ≥50X 0.9594 0.0297

    % of PKD2 target bases ≥100X 0.9226 0.0447

    Abbreviations. QC, quality control; S.D, standard deviation

  • 52

    Supplementary Table S5. Definition of Grades

    Grade of Mutations

    Definitely

    pathogenic

    (DP)

    • Stop-gain single-nucleotide variants (SNVs)

    • Frameshift indels

    • In-frameshift indels with ≥5 amino acids

    • Typical splice site mutations

    • Either newly discovered from the samples or previously

    registered in the Mayo DB or TGESP(1) data

    Highly likely

    pathogenic

    (HLP)

    • Nonsynonymous SNVs or in-frameshift indels of

  • 53

    Gene Exonic Func Exon cDNA change Protein change DupR Sanger

    PKD1 frameshiftdeletion

    exon15 c.5014_5015del

    p.Arg1672GlyfsX98 Y Confirmed

    PKD1 frameshiftinsertion

    exon16 c.6994_7000dup

    p.Val2334GlyfsX88 Y Confirmed

    PKD1 stopgain SNV exon15 c.4447C>T p.Gln1483X Y Confirmed

    PKD1 stopgain SNV exon15 c.6549dup p.Glu2184X Y Confirmed

    PKD1 frameshiftdeletion

    exon25 c.9083_9084del

    p.Glu3028GlyfsX40 Y Confirmed

    PKD1 stopgain SNV exon15 c.4447C>T p.Gln1483X Y Confirmed

    PKD1 stopgain SNV exon20 c.7816C>T p.Gln2606X Y Confirmed

    PKD1 frameshiftdeletionexon15 c.5014_5015del

    p.Arg1672GlyfsX98 Y Confirmed

    PKD1 frameshiftdeletionexon15 c.4070del

    p.Leu1357ArgfsX9 Y Confirmed

    PKD1 frameshiftinsertionexon11 c.2494dupC

    p.Arg832ProfsX40 Y Confirmed

    PKD1 stopgain SNV exon15 c.4447C>T p.Gln1483X Y Confirmed

    PKD1 stopgain SNV exon5 c.1198C>T p.Arg400X Y Confirmed

    PKD1 frameshiftdeletionexon11 c.2618_2621del

    p.Val873AlafsX24 Y Confirmed

    PKD1frameshiftdeletion

    exon15 c.5014_5015del

    p.Arg1672GlyfsX98 Y Confirmed

    PKD1 NA exon23c.8017-2A>G   Y Confirmed

    PKD1 stopgain SNV exon15

    c.4447C>T p.Gln1483X Y Confirmed

    PKD1 stopgain SNV exon15

    c.6549dup p.Glu2184X Y Confirmed

    PKD2 stopgain SNV exon13 c.2407C>T p.Arg803X N Confirmed

    PKD1 frameshiftdeletionexon15 c.3684del

    p.Val1229TrpfsX44 Y Confirmed

    PKD1 stopgain SNV exon15 c.6549dup p.Glu2184X Y Confirmed

    PKD1 nonsynonymous SNVexon23 c.8311G>A p.Glu2771Lys Y Confirmed

    PKD1nonframeshift

    deletionexon24 c.8935_8937del p.Phe2979del Y Confirmed

    Supplementary Table S6. Results from Validation Study(n=80)

  • 54

    PKD1nonsynonymous SNV

    exon27 c.9404C>T p.Thr3135Met Y Confirmed

    nonsynonymous SNV

    exon15 c.4051C>T p.Arg1351Trp Y ConfirmedPKD1

    stopgain SNV exon15 c.3334G>T p.Glu1112X Y ConfirmedPKD1

    frameshiftinsertion

    exon15 c.3349dup

    p.Gln1117ProfsX19 Y ConfirmedPKD1

    stopgain SNV exon9 c.1789C>T p.Gln597X Y ConfirmedPKD1

    frameshiftinsertion

    exon15

    c.4315_4324delinsCAACTGTTG

    p.Gly1439GlnfsX5 Y ConfirmedPKD1

    frameshiftdeletion

    exon7 c.1495del

    p.Glu499SerfsX59 Y ConfirmedPKD1

    frameshiftdeletion

    exon11 c.2659del

    p.Trp887GlyfsX11 Y ConfirmedPKD1

    frameshiftdeletion

    exon15 c.4679_4691del

    p.Val1560GlyfsX3 Y ConfirmedPKD1

    frameshiftdeletion

    exon46

    c.12672_12673del

    p.Gln4224HisfsX2 N ConfirmedPKD1

    frameshiftdeletion

    exon15 c.3744_3754del

    p.Asp1249AlafsX48 Y ConfirmedPKD1

    frameshiftdeletion

    exon19 c.7579_7580del

    p.Val2527LeufsX67 Y ConfirmedPKD1

    frameshiftdeletion

    exon11 c.2659del

    p.Trp887GlyfsX11 Y ConfirmedPKD1

    stopgain SNV exon15 c.5477G>A p.Trp1826X Y ConfirmedPKD1

    frameshiftdeletion

    exon8 c.1669_1670del

    p.Leu557AspfsX29 Y ConfirmedPKD1

    frameshiftdeletion

    exon11 c.2716del

    p.Glu906SerfsX21

    Y ConfirmedPKD1

    frameshiftdeletion

    exon23

    c.8327_8330delp.Leu2776Argfs

    X98 Y ConfirmedPKD1

    stopgain SNVexon17 c.7164C>G p.Tyr2388X Y ConfirmedPKD1

    stopgain SNV exon17 c.7164C>G p.Tyr2388X Y ConfirmedPKD1

    frameshiftdeletion

    exon15 c.6808_6811del

    p.Asp2270HisfsX43 Y ConfirmedPKD1

    frameshiftdeletion

    exon21 c.7973_7974del

    p.Val2658GlyfsX2 Y ConfirmedPKD1

    nonframeshiftdeletion

    exon20

    c.7837_7839del p.Leu2613del Y ConfirmedPKD1

    stopgain SNV exon5

    c.696T>A p.Cys232X Y ConfirmedPKD1

    stopgain SNV exon3 c.350T>G p.Leu117X Y ConfirmedPKD1

  • 55

    stopgain SNVexon5

    c.706C>T p.Gln236X Y ConfirmedPKD1

    ConfirmedYp.Leu557AspfsX29c.1669_1670del

    exon8

    frameshiftdeletionPKD1

    ConfirmedYp.Tyr1599Xc.4797C>Aexon15stopgain SNVPKD1

    ConfirmedYp.Trp1826Xc.5477G>Aexon15stopgain SNVPKD1

    ConfirmedYp.Leu2776ArgfsX98c.8327_8330del

    exon23

    frameshiftdeletionPKD1

    ConfirmedYp.Asn2770delc.8308_8310delexon23

    nonframeshiftdeletionPKD1

    ConfirmedYp.Trp439Xc.1316G>Aexon6stopgain SNVPKD1

    ConfirmedYp.Glu2335GlyfsX85c.7003dupG

    exon16

    frameshiftinsertionPKD1

    ConfirmedYp.Phe1075ArgfsX28c.3223_3226del

    exon14

    frameshiftdeletionPKD1

    ConfirmedYp.Val1841GlyfsX110c.5517_5521dup

    exon15

    frameshiftinsertionPKD1

    ConfirmedNp.Trp3603Xc.10809G>Aexon36stopgain SNVPKD1

    ConfirmedYp.Gln1621Xc.4861C>Texon15stopgain SNVPKD1

    ConfirmedNp.Gln183Xc.547C>Texon1stopgain SNVPKD2

    ConfirmedNp.Lys714AsnfsX2c.2140delA

    exon11

    frameshiftdeletionPKD2

    ConfirmedYp.Gln1203Xc.3607C>Texon15stopgain SNVPKD1

    ConfirmedNp.Gln183Xc.547C>Texon1stopgain SNVPKD2

    ConfirmedYp.Trp2187Xc.6561G>Aexon15stopgain SNVPKD1

    ConfirmedYp.Gln2606SerfsX14c.7816del

    exon20

    frameshiftdeletionPKD1

    ConfirmedNp.Val4034CysfsX4c.12100del

    exon44

    frameshiftdeletionPKD1

    ConfirmedNp.Phe3614LeufsX11

    c.10839_10840del

    exon37

    frameshiftdeletionPKD1

    ConfirmedNp.Ser3591Phec.10772C>Texon36

    nonsynonymous SNVPKD1

    ConfirmedYp.Ser689delc.2065_2067delexon10

    nonframeshiftdeletionPKD1

    ConfirmedYp.Ser689delc.2065_2067delexon10

    nonframeshiftdeletionPKD1

    ConfirmedYp.Ala81Valc.242C>Texon2

    nonsynonymous SNVPKD1

  • 56

    ConfirmedYp.Val908Metc.2722G>Aexon11

    nonsynonymous SNVPKD1

    NotdetectedYp.Pro694Leuc.2081C>Texon

    10nonsynonymous SNVPKD1

    ConfirmedYp.Ser689delc.2065_2067delexon10

    nonframeshiftdeletionPKD1

    ConfirmedYp.Leu1652Proc.4955T>Cexon15

    nonsynonymous SNVPKD1

    ConfirmedYp.Leu3031delc.9092_9094delexon25

    nonframeshiftdeletionPKD1

    ConfirmedYp.Ala81Valc.242C>Texon2

    nonsynonymous SNVPKD1

    ConfirmedYp.Pro2087_Arg2088dupc.6258_6263dup

    exon15

    nonframeshiftinsertionPKD1

    ConfirmedYp.Leu3031delc.9092_9094delexon25

    nonframeshiftdeletionPKD1

    ConfirmedYp.Cys3043Tyrc.9128G>Aexon25

    nonsynonymous SNVPKD1

    ConfirmedNp.Ile4105Leuc.12313A>Cexon45

    nonsynonymous SNVPKD1

    1. Long Range PCR information

    Exon Forward primer Reverse primer Tm

    ('C)

    Product

    size(bp)

    2-75'-CCCCGAGTAGCTGGAAC

    TACAGTTACACACT-3'

    5'-CGTCCTGCTGTGCCAGA

    GGCG-3'70 4041

    13-155'-TGGAGGGAGGGACGCC

    AATC-3'

    5'-GTCAACGTGGGCCTCCA

    AGT-3'65 4391

    15-215'-ATCCCTGGGGGTCCTAC

    CATCTCTTA-3'

    5'-ACACAGGACAGAACGG

    CTGAGGCTA-3'68 5253

    2. Sequencing primers information

    Exon primer

    name

    Sequence Genomic

    position(of5'firstbp;

    hg19)

    Distance to

    exon-intron

    junction

    2-3PKD1_2-3-F 5'-GGGATGCTGGCAATGTGTGGGAT-3' 2,169,468 -89 exon 2

    PKD1_2-3-R 5'-GGACCAACTGGGAGGGCAGAA-3' 2,169,038 77 exon 2

    4 PKD1_4-F 5'-GGCGGTGCTGTCAGGGTG-3' 2,168,948 -102 exon 4

    Supplementary Table S7. MLPA validation using Long Range PCR

    sequencing

  • 57

    PKD1_4-R 5'-CCAGAGAGGCCTTCCTGAGC-3' 2,168,582 95 exon 4

    5(A)PKD1_5A-F 5'-GAACAGCATGGGAGCCTGTGAGT-3' 2,168,561 -95 exon 5

    PKD1_5A-R 5'-AGCCGGCCCAGCGGCATC-3' 2,168,033Within an

    exon 5

    5(B)PKD1_5B-F 5'-GCCTGTCCCTCTGCTCCG-3' 2,168,262

    Within an

    exon 5

    PKD1_5B-R 5'-GTGTCAACGGTCAGTGTGGGC-3' 2,167,696 96 exon 5

    6PKD1_6-F 5'-GTGTCTGCTGCCCACTCCC-3' 2,167,787 -124 exon 6

    PKD1_6-RF 5'-CTCCTTCCTCCTGAGACTCCC-3' 2,167,397 93 exon 6

    15(A)PKD1_15A-F 5'-TTCTGCCGAGCGGGTGGG-3' 2,161,938 -64 exon 15

    PKD1_15A-R 5'-CATGTCGAAGGTCCACGTGATGT-3' 2,161,427Within an

    exon 15

    15(B)PKD1_15B-F 5'-GACATGAGCCTGGCCGTGG-3' 2,161,516

    Within an

    exon 15

    PKD1_15B-R 5'-CCACCTCTGGCTCCACGCA-3' 2,161,027Within an

    exon 15

    15(C)PKD1_15C-F 5'-CACGCGGAGCGGCACGTT-3' 2,161,121

    Within an

    exon 15

    PKD1_15C-R 5'-GGTGACCTCCGGACCCTC-3' 2,160,626Within an

    exon 15

    15(D)PKD1_15D-F 5'-TCTGCTGTGGGCCGTGGG-3' 2,160,706

    Within an

    exon 15

    PKD1_15D-R 5'-CTGTACCGTGTGGTTGGTGGG-3' 2,160,215Within an

    exon 15

    15(E)PKD1_15E-F 5'-ACAGCATCTTCGTCTATGTCCTG-3' 2,160,303

    Within an

    exon 15

    PKD1_15E-R 5'-GGTTCCCTGCCGTCATGGTG-3' 2,159,812Within an

    exon 15

    15(F)PKD1_15F-F 5'-GGGCTGAGCTGGGAGACCT-3' 2,159,899

    Within an

    exon 15

    PKD1_15F-R 5'-GACAGCTGAGCCGGCAGC-3' 2,159,417Within an

    exon 15

    15(G)PKD1_15G-F 5'-CTGTGGGCCAGCAGCAAGGT-3' 2,159,494

    Within an

    exon 15

    PKD1_15G-R 5'-CGTGCGGTTCTCACTGCCCA-3' 2,159,012Within an

    exon 15

    15(H)PKD1_15H-F 5'-GACGTCACCTACACGCCCG-3' 2,159,095

    Within an

    exon 15

    PKD1_15H-R 5'-CCTCCCAGCGGTACTCAGTCT-3' 2,158,603Within an

    exon 15

    15(I)PKD1_15I-F 5'-GATGCGGCGATCACAGCGCA-3' 2,158,688

    Within an

    exon 15

    PKD1_15I-R 5'-GGCCAGCCCTGGTGGCAA-3' 2,158,164Within an

    exon 15

    21PKD1_21-F 5'-AGTCGTGGGCATCTGCTGGC-3' 2,155,550 -75 exon 21

    PKD1_21-R 5'-CAAGCTGCCCGTCTGCCCT-3' 2,155,240 83 exon 21

    Reference. Ying-Cai Tan et al., J Mol Diagn. 2012 Jul;14(4):305-13. doi: 10.1016/ j.

    jmoldx. 2012. 02. 007

  • 58

    Supplementary Figure S1

    Of the entire sample of 524 families, variants were found by targeted exome

    sequencing (TES) in 471 families (89.9%). Among them, mutations were found

    in 409 families. In 62 families with likely neutral (LN) or indeterminate (I)

    variants and 53 families with no variants (NV), subsequent PKD1 exon 1

    Sanger sequencing (SS) and/or multiple ligation probe assay (MLPA) revealed

    10 additional mutations. An overall mutation detection rate was 80.7% (423/524

    families) by TES, PKD1 exon 1 Sanger sequencing, and MLPA.

  • 59

    Supplementary Figure S2

    Mutations were found more frequently in the C-type lectin and G

    protein-coupled receptor proteolytic site (GPS) domains of polycystin-1 in the

    cohort compared to those reported from the ADPKD mutation database (PKDB).

    Likewise, mutations were detected more frequently in the linker domain of