
Genome and Epigenome

The Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

Zhoufeng Wang1, Kailing Tu2, Lin Xia2, Kai Luo2, Wenxin Luo1, Jie Tang2, Keying Lu2, Xinlei Hu2, Yijing He2, Wenliang Qiao3, Yongzhao Zhou1, Jun Zhang2, Feng Cao2, Shuiping Dai1, Panwen Tian1, Ye Wang1, Lunxu Liu4, Guowei Che4, Qinghua Zhou3, Dan Xie2, and Weimin Li1

Abstract

Non–small cell lung carcinoma (NSCLC) is a major cancer type whose epigenetic alterations remain unclear. We analyzed open chromatin data with matched whole-genome sequencing and RNA-seq data from 50 primary NSCLC cases. We observed high interpatient heterogeneity of open chromatin profiles, and the degree of heterogeneity correlated with several clinical parameters. Lung adenocarcinoma and lung squamous cell carcinoma (LUSC) exhibited distinct open chromatin patterns. Beyond this, we found that the broadest open chromatin peaks marked key NSCLC genes and led to less stable expression. Furthermore, we found that open chromatin peaks were gained or lost together with somatic copy number alterations and affected the expression of important NSCLC genes. In addition, we identified 21 joint quantitative trait loci (joint-QTLs) that correlated with both assay for transposase-accessible chromatin sequencing (ATAC-seq) peak intensity and gene expression levels. Finally, we identified 87 regulatory risk loci associated with lung cancer–related phenotypes by intersecting the QTLs with genome-wide association study (GWAS) significant loci. In summary, this compendium of multiomics data provides valuable insights and a resource for understanding the landscape of open chromatin features and regulatory networks in NSCLC.

Significance: This study utilizes state-of-the-art genomic methods to differentiate lung cancer subtypes.

See related commentary by Bowcock, p. 4808

Introduction

Lung cancer is one of the leading causes of cancer-related death worldwide (1). Non–small cell lung carcinoma (NSCLC) accounts for approximately 85% of lung cancer cases, with lung adenocarcinoma and lung squamous cell carcinoma (LUSC) being the two major histologic types (2). Recent genome sequencing efforts have identified millions of somatic mutations in NSCLC (3), which further led to the discovery of "driver mutations" in key oncogenes. While genetic factors account for only part of the interindividual variability in NSCLC risk, the epigenetic contributions to this disease are increasingly recognized as important (4). Epigenetic profiles such as DNA methylation (5), histone modifications (6), and noncoding RNA (7) have been characterized in NSCLC. However, the open chromatin landscape of NSCLC has remained largely undetermined.

Recently, the highly efficient assay for transposase-accessible chromatin sequencing (ATAC-seq) approach (8) has successfully mapped genome-wide open chromatin patterns in multiple human cell types and provided valuable insights into the underlying regulatory mechanisms (9). Several studies have profiled the open chromatin state in lymphocytic leukemia (10, 11). To date, only a few studies have characterized the open chromatin state in primary NSCLC samples. A recent work published by Corces and colleagues cataloged the open chromatin states of 23 cancer types (12), including NSCLC. However, none has associated open chromatin variations with genomic alterations among patients with NSCLC. Profiling open chromatin in primary NSCLC samples is particularly difficult, partly because tumor tissues are confounded by cell-type heterogeneity (13). Recent work has shown that negative cell isolation to deplete immune cells and fibroblasts can significantly increase the purity of cancer cells from primary tumor samples (14), enabling meaningful epigenomic analysis.

Integrative analysis of whole-genome sequencing (WGS) and RNA-seq data with open chromatin states from the same patients could potentially delineate the effects of genomic alterations on the gene regulatory network in NSCLC. Notably, it allows us to explore the effects of somatic mutations and structural variations on open regulatory elements and how these effects are associated with the NSCLC transcriptome. In this study, we generated matched ATAC-seq, WGS, and RNA-seq data from 50 primary NSCLC cases. The comprehensive open chromatin landscape of NSCLC provided an important resource that allowed us to identify key regulatory elements in this disease. The integration of multiple-omics datasets revealed novel insights into the gene regulatory mechanisms of NSCLC.

1Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 2National Frontier Center of Disease Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 3Lung Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China. 4Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Z. Wang, K. Tu, L. Xia, K. Luo, and W. Luo contributed equally to this article.

Corresponding Authors: Dan Xie, West China Hospital of Sichuan University, Chengdu, Sichuan 610000, China. Phone: 136-9346-2346; Fax: 028-85164165; E-mail: [email protected]; and Weimin Li, Phone: 189-8060-1009; E-mail: [email protected]

Cancer Res 2019;79:4840–54

doi: 10.1158/0008-5472.CAN-18-3663

©2019 American Association for Cancer Research.

Cancer Research

Cancer Res; 79(19) October 1, 2019 4840

on September 19, 2020. © 2019 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Materials and Methods

Patients and clinical information

Patients with NSCLC were staged according to the American Joint Committee on Cancer version 6 and were initially diagnosed with lung cancer at West China Hospital of Sichuan University (Chengdu, China) between September 2016 and December 2018. Information including patients' age, gender, ethnicity, pathology, and tumor stage was collected for these 51 patients (Supplementary Table S1). All of them received surgical treatment, and none underwent neoadjuvant therapy before surgery. Tumors and matched distal normal lung tissues were obtained during surgery. All samples were evaluated by two pathologists to determine the pathologic diagnosis and tumor cellularity. Only tumor tissues containing at least 80% tumor cells were included. This study was approved by the Institutional Review Board of West China Hospital of Sichuan University (Chengdu, China; project identification code: 2017.114), and all patients provided written informed consent.

ATAC-seq library preparation and sequencing

To profile open chromatin, ATAC-seq was performed as described previously (8). A total of 1 × 10⁵ cells were washed once with PBS and pelleted by centrifugation using the previous settings. Cell pellets were resuspended in 50 µL of lysis buffer, and nuclei were pelleted by centrifugation for 10 minutes at 500 × g, 4°C. The supernatant was discarded, and nuclei were resuspended in 50 µL of reaction buffer containing 2.5 µL of Tn5 transposase and 22.5 µL of TD buffer (Nextera, Illumina). The reaction was incubated at 37°C for 30 minutes. Tagmented DNA was isolated with the MinElute PCR Purification Kit (Qiagen). Libraries were amplified for 10 cycles and purified using a 2× SPRI cleanup (Agencourt). Library sizes were determined using a LabChip GXII Touch HT (PerkinElmer), and 2 × 75 bp paired-end sequencing was performed on a NextSeq 500 (Illumina), yielding on average 50 M reads per sample.

DNA-seq library preparation and sequencing

WGS was performed with DNA extracted from fresh-frozen tumor and normal material. DNA was extracted from fresh-frozen tissues using the Gentra Puregene DNA Extraction Kit (Qiagen) following the manufacturer's protocol. DNA isolates were hydrated in TE buffer and confirmed to be of high molecular weight (10 kb) by agarose gel electrophoresis. The concentration was measured using a Qubit 2.0 Fluorometer (Life Technologies). Short-insert DNA libraries were prepared with the TruSeq Nano DNA HT Sample Prep Kit (Illumina), the libraries were sequenced on the Illumina HiSeq X platform, and 150 bp paired-end reads were generated. Human DNA libraries were sequenced to a minimum coverage of 30× or 50× for both tumor and matched normal samples.

RNA-seq library preparation and sequencing

For RNA extraction, tissue sections were first lysed and homogenized with the TissueLyser (Qiagen). Subsequent RNA extractions were performed with the Qiagen RNeasy Mini Kit according to the manufacturer's instructions. RNA quality was assessed on the Bioanalyzer 2100 DNA Chip 7500 (Agilent Technologies), and samples with an RNA integrity number over 7 were further analyzed by RNA-seq. RNA sequencing libraries were generated from rRNA-depleted RNA using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina following the manufacturer's recommendations. The products were purified (AMPure XP system), and library quality was assessed on the Agilent Bioanalyzer 2100 system. The libraries were sequenced on an Illumina HiSeq 4000 platform, and 150 bp paired-end reads were generated.

WGS data processing

Raw paired-end WGS reads were subjected to adapter and low-quality sequence trimming using Trimmomatic (version 0.36; ref. 15) with default parameters. We mapped trimmed paired-end reads to human reference build hg19 using BWA-MEM (version 0.7.13-r1126; ref. 16). BAMs were sorted and indexed using SAMtools (version 1.3; ref. 17), and duplicates were marked using Picard (version 2.2.1; http://broadinstitute.github.io/picard). The Genome Analysis Toolkit (GATK, version 3.6; ref. 18) was used for local realignment and base quality recalibration, processing tumor/normal pairs independently.
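The alignment, sorting, duplicate-marking, and indexing steps above can be sketched as a small Python helper that assembles the command lines. This is a hypothetical illustration, not the authors' script: the function names, output file names, thread count, and `picard.jar` path are assumptions, and the Trimmomatic and GATK realignment/recalibration steps are left out because their exact arguments are not reported. BWA-MEM writes SAM to standard output, so the runner redirects that stream to a file.

```python
import subprocess
from typing import List, Optional, Tuple

# One pipeline step: (argv, stdout_path). stdout_path is None when the tool writes its own output file.
Step = Tuple[List[str], Optional[str]]


def wgs_alignment_steps(sample: str, ref: str, r1: str, r2: str,
                        threads: int = 8, picard_jar: str = "picard.jar") -> List[Step]:
    """Hypothetical helper: align -> sort -> mark duplicates -> index for one sample."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        # BWA-MEM aligns the trimmed read pairs against the indexed reference; SAM goes to stdout.
        (["bwa", "mem", "-t", str(threads), ref, r1, r2], sam),
        # Coordinate-sort the alignments and write BAM.
        (["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam], None),
        # Mark (not remove) duplicate reads; M= receives the duplication metrics.
        (["java", "-jar", picard_jar, "MarkDuplicates",
          f"I={sorted_bam}", f"O={dedup_bam}", f"M={sample}.dup_metrics.txt"], None),
        # Index the final BAM for downstream GATK steps.
        (["samtools", "index", dedup_bam], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order; check=True raises CalledProcessError on the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

For example, `run_steps(wgs_alignment_steps("T01", "hg19.fa", "T01_R1.trimmed.fq.gz", "T01_R2.trimmed.fq.gz"))` would process one sample; the tumor and its matched normal would each get their own call, in line with processing the pairs independently. Keeping commands as argument lists rather than shell strings avoids quoting problems.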

Germline mutation detection

Using default parameters, GATK HaplotypeCaller (18) was used to detect germline single-nucleotide variants (SNV) and indels. The known-sites files used for germline SNV and indel calling were downloaded from ftp://[email protected]/bundle/.

Somatic mutation detection

Somatic SNVs and indels were predicted using MuTect2 (19) and VarScan2 (20) with default parameters. Somatic SNVs and indels identified by VarScan2 (20) were retained if all of the following criteria were met: (i) P value of the reported somatic SNV ≤ 0.05; (ii) frequency of the reported somatic SNV in the normal sample ≤ 5%; (iii) tumor frequency of the reported somatic SNV ≥ 10%; and (iv) number of reads supporting the reported somatic SNV ≥ 2. Only somatic SNVs and indels identified by both MuTect2 (19) and VarScan2 (20) were retained for further analysis. All somatic SNVs and indels were functionally annotated by ANNOVAR (version 20160201; ref. 21).
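The VarScan2 retention rules and the two-caller consensus can be sketched in Python. This is a minimal illustration under stated assumptions: the record fields (`somatic_p_value`, `normal_var_freq`, `tumor_var_freq`, `tumor_reads2`) loosely follow VarScan2's somatic output columns, frequencies are plain numbers in percent (VarScan2 writes them with a `%` sign, which would need stripping first), the thresholds are read as P ≤ 0.05, normal frequency ≤ 5%, tumor frequency ≥ 10%, and at least 2 supporting tumor reads, and calls are matched between MuTect2 and VarScan2 by (chromosome, position, ref, alt).

```python
from typing import Dict, Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def passes_varscan_filters(rec: Dict) -> bool:
    """The four retention criteria; field names loosely follow VarScan2 somatic output (assumed)."""
    return (
        rec["somatic_p_value"] <= 0.05        # (i) somatic P value
        and rec["normal_var_freq"] <= 5.0     # (ii) variant allele frequency in the normal sample, %
        and rec["tumor_var_freq"] >= 10.0     # (iii) variant allele frequency in the tumor, %
        and rec["tumor_reads2"] >= 2          # (iv) tumor reads supporting the variant allele
    )


def consensus_calls(varscan: Iterable[Dict], mutect_keys: Set[VariantKey]) -> List[VariantKey]:
    """Keep VarScan2 calls that pass the filters and were also reported by MuTect2."""
    kept: List[VariantKey] = []
    for rec in varscan:
        key = (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
        if passes_varscan_filters(rec) and key in mutect_keys:
            kept.append(key)
    return kept
```

Matching on the full (chrom, pos, ref, alt) tuple, rather than on position alone, avoids treating two different alternate alleles at the same site as agreement between callers.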

Somatic copy-number variation detection

Somatic copy-number variations (CNV) were called for 42 tumor samples with paired adjacent tissue samples using Control-FREEC (version 9.5; parameters: ploidy = 2, breakPointThreshold = 0.8, breakPointType = 0.8, coefficientOfVariation = 0.05, contamination = 0, contaminationAdjustment = FALSE, minMappabilityPerWindow = 0.85, window = 5000, numberOfProcesses = 10, mateOrientation = FR; ref. 22). GISTIC2 (version 2.0.22; ref. 23) was used to identify regions of the genome that are significantly amplified or deleted across 37 samples (5 samples with abnormal somatic mutation counts were removed; parameters: -genegistic 1 -smallmem 1 -broad 1 -brlen 0.98 -conf 0.90 -savegene 1).

ATAC-seq data processing

Paired-end ATAC-seq fragments of human samples were aligned to human reference build hg19 using the BWA mem command with the -M parameter. PCR duplicates were removed by Picard (version 1.119). Reads mapped to the mitochondrial genome were then discarded. Only uniquely mapped paired-end reads with fragment length less than 2,000 bp and mapping quality > 30 were kept for further analysis. ATAC-seq peak regions of each sample were called using MACS2 (version 2.1.0; ref. 24) with parameters: --nomodel --nolambda --call-summits -q 0.001. Blacklisted regions (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/) were excluded from called peaks. Open chromatin regions from The Cancer Genome Atlas (TCGA) were converted to the hg19 genome using liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). To generate a consensus set of unique peaks, we scanned the genome using 200 bp sliding windows with a 100 bp step size. Adjacent open windows that appeared in at least one sample were combined into a consensus open region list. The Jaccard index, which captures the proportion of open chromatin regions shared by two samples, was used to measure the similarity between samples. Using all ATAC-seq data from NSCLC samples, we constructed a symmetric similarity matrix based on the presence of the open regions. This matrix was clustered using the hierarchical clustering algorithm from the "pheatmap" R package. Cluster-specific accessible peaks were identified with the Dunn (1964) Kruskal–Wallis test. All ATAC-seq peaks no more than 2.5 kb away from an annotated gene TSS (GENCODE v19) were selected as promoter ATAC-seq peaks. GREAT functional enrichment analysis (version 3.0.0; ref. 25) was used to catalog the functions of peak sets with default parameters.
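The consensus-region construction and the Jaccard similarity described above can be sketched in plain Python. This is an illustrative reading, not the authors' code, and it assumes: peaks are half-open (start, end) intervals on a single chromosome; a 200 bp window starting every 100 bp is "open" in a sample if it overlaps any of that sample's peaks; overlapping or touching open windows from any sample are merged into one consensus region; and the Jaccard index is computed on the sets of consensus regions each sample occupies.

```python
import bisect
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome
WINDOW = 200  # window width, bp
STEP = 100    # distance between consecutive window starts, bp


def open_windows(peaks: List[Interval]) -> Set[int]:
    """Indices i of windows [i*STEP, i*STEP + WINDOW) that overlap at least one peak."""
    hits: Set[int] = set()
    for start, end in peaks:
        # Window i overlaps [start, end) iff i*STEP < end and i*STEP + WINDOW > start.
        first = max(0, (start - WINDOW) // STEP + 1)
        last = (end - 1) // STEP
        hits.update(range(first, last + 1))
    return hits


def consensus_regions(per_sample: Dict[str, Set[int]]) -> List[Interval]:
    """Merge open windows from any sample into regions; overlapping or touching windows are joined."""
    regions: List[Interval] = []
    for i in sorted(set().union(*per_sample.values())):
        start, end = i * STEP, i * STEP + WINDOW
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def presence(regions: List[Interval], windows: Set[int]) -> Set[int]:
    """Indices of consensus regions containing at least one of a sample's open windows."""
    starts = [s for s, _ in regions]
    # Each window lies inside exactly one region: the last region starting at or before it.
    return {bisect.bisect_right(starts, w * STEP) - 1 for w in windows}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty (a convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Computing `jaccard` for every pair of samples gives the symmetric similarity matrix; hierarchical clustering would then run on it (for example on 1 − Jaccard as a distance), a step the text says was done with the R package pheatmap.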

RNA-seq processing

Raw RNA-seq reads were aligned to the hg19 genome assembly using HISAT2, and gene expression levels (FPKM) were quantified with StringTie. Genes with mean FPKM greater than 1 in either the lung adenocarcinoma group or the LUSC group were retained. In addition, only genes annotated as protein_coding, lincRNA, snRNA, miRNA, or snoRNA were kept for xQTL association analysis, leaving 12,776 genes for expression quantitative trait locus (eQTL) analysis.
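The two expression filters can be sketched in Python. The input layout is an assumption made for illustration: a gene → {sample: FPKM} mapping, a gene → biotype mapping using the GENCODE labels quoted above, and a sample → group mapping with the hypothetical labels "LUAD" and "LUSC". "Mean FPKM greater than 1 in either group" is implemented as a strict inequality on each group's mean.

```python
from statistics import mean
from typing import Dict, List

KEPT_BIOTYPES = {"protein_coding", "lincRNA", "snRNA", "miRNA", "snoRNA"}


def filter_genes(fpkm: Dict[str, Dict[str, float]],
                 biotype: Dict[str, str],
                 group: Dict[str, str]) -> List[str]:
    """Keep genes of an allowed biotype whose mean FPKM exceeds 1 in LUAD or in LUSC."""
    kept = []
    for gene, by_sample in fpkm.items():
        if biotype.get(gene) not in KEPT_BIOTYPES:
            continue  # biotype filter
        group_means = []
        for g in ("LUAD", "LUSC"):
            values = [v for s, v in by_sample.items() if group.get(s) == g]
            if values:
                group_means.append(mean(values))
        if any(m > 1.0 for m in group_means):  # expression filter, either group suffices
            kept.append(gene)
    return kept
```

Testing each group separately, rather than the pooled mean, keeps genes that are expressed in only one histologic type.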

Data access

Processed data of NSCLC samples in this study, including ATAC-seq peaks, gene expression matrix, and genomic alterations, are available via the data center of precision medicine in West China Hospital (https://pms.cd120.com/download.html; Sichuan, China).

Results

Generation of matched ATAC-seq, genome, and transcriptome data in NSCLC samples

To reveal the landscape of open chromatin in NSCLC and elucidate the association between open chromatin, genomic variations, and gene expression profiles in NSCLC, we collected primary tumors from 51 patients, including 34 patients with lung adenocarcinoma, 13 patients with LUSC, and 4 patients with benign solitary pulmonary nodules (BSPN; Supplementary Table S1; see Materials and Methods). We also collected paired adjacent tissues from 40 patients with NSCLC and 2 patients with nonmalignant nodules for somatic mutation calling (see Materials and Methods). For each sample, we performed ATAC-seq, WGS, and RNA-seq (Fig. 1A; see Materials and Methods). The ATAC-seq libraries yielded 61.72–243.83 M reads each (median = 103.47 M); the WGS depth exceeded 30×; and the RNA-seq libraries yielded 80.6–165.6 M reads each (median = 95.63 M; Supplementary Table S2).

We characterized the genomic variations, including somatic SNVs, indels, and CNVs, for each tumor sample (see Materials and Methods). The majority of somatic SNVs and indels were located in intergenic regions and introns, whereas only a small proportion of somatic SNVs were located in exons (Fig. 1B). Consistent with a previous report (3), we identified somatic SNVs in known NSCLC driver genes, including TP53 in 26.92% of patients with lung adenocarcinoma and 33.33% of patients with LUSC, EGFR in 19.23% of patients with lung adenocarcinoma, CSMD3 in 33.3% of patients with LUSC, and NFE2L2 in 33.3% of patients with LUSC (Fig. 1C). The transition to transversion ratio of somatic SNVs ranged from 0.22 to 2.1 in lung adenocarcinoma and from 0.04 to 0.69 in LUSC (Supplementary Fig. S1A). The proportion of C>A transversions, which have been reported to be associated with smoking and lung cancer, was significantly higher in NSCLC samples than in benign samples (P = 0.04878; Mann–Whitney U test; Supplementary Fig. S1B). We further performed mutation signature analysis on the somatic SNVs (26) and identified four enriched mutational signatures across all NSCLC samples (S1–S4, Supplementary Fig. S1C; Supplementary Materials and Methods). Three of the four enriched signatures (S2–S4) showed cosine similarities of 0.57 to 0.73 to cancer-related signatures cataloged in the COSMIC database (Supplementary Materials and Methods; ref. 27). Signature S2 was analogous to COSMIC signature 9 (cosine similarity: 0.57). COSMIC signature 9 has been found in chronic lymphocytic leukemia as well as malignant B-cell lymphomas and is characterized by a pattern of mutations attributed to polymerase η. Signature S3 resembled COSMIC signature 5 (cosine similarity: 0.65), which is found in all cancer types and most cancer samples. Signature S4 matched COSMIC signature 3 (cosine similarity: 0.73). This COSMIC signature has been observed in breast, ovarian, and pancreatic cancers and is associated with failure of DNA double-strand break repair by homologous recombination. Interestingly, all LUSC samples were strongly associated with S3, whereas most lung adenocarcinoma samples were associated with S2 (Fig. 1D). We clustered all samples based on the contributions of these four signatures. The three pathologic types of samples (lung adenocarcinoma, LUSC, and BSPN) were well separated, with only one lung adenocarcinoma sample misclassified into the LUSC cluster (Fig. 1E).
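Two quantities used above, the transition/transversion ratio and the cosine similarity between a de novo signature and a COSMIC reference signature, can be sketched in Python. As assumptions for illustration: SNVs are given as (ref, alt) base pairs on one strand, and signatures are vectors over the same ordered set of mutation contexts (96 trinucleotide contexts in the COSMIC catalog; toy vectors are shorter). Transitions are purine↔purine or pyrimidine↔pyrimidine changes, so C>A counts as a transversion.

```python
import math
from typing import Iterable, Sequence, Tuple

# A<->G and C<->T are transitions; every other base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions (inf if there are none)."""
    ti = tv = 0
    for ref, alt in snvs:
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else math.inf


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a·b / (|a| |b|) for two signatures over the same mutation contexts."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same contexts")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Cosine similarity ignores overall scale, so a signature and any positive multiple of it score 1.0; values such as 0.57 to 0.73 indicate partial rather than close agreement in context profile.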

Consistent with a previous TCGA study (28), many known NSCLC driver genes showed significant amplification, such as EGFR and NKX2-1 (FDR < 10⁻³), or deletion, such as CDKN2A (FDR < 10⁻⁵), in lung adenocarcinoma samples. We also observed amplification of other known lung cancer–related genes (Supplementary Table S3), including DDR2, TPM3, TPR, and LRIG3 (FDR < 0.1). Furthermore, significant deletions of other known cancer-related genes were identified (see Materials and Methods), such as PIK3R1, IL6ST, and MAP3K1 (FDR < 0.1; Fig. 1F; Supplementary Table S3). In terms of somatic CNVs in LUSC, we found significant amplification of SOX2, BCL11A, REL, WHSC1L1, and FGFR1, and significant deletion of CDKN2A. These results were consistent with the TCGA study (29). We also found significant deletions of other known genes that are frequently mutated in lung cancer, such as MYCL, ROS1, and EZR (FDR < 10⁻²; Fig. 1F; Supplementary Table S3).


[Figure 1 image. A, study design: tumor and adjacent tissue profiled by WGS, ATAC-seq, and RNA-seq; mutations (SNV, CNV), open chromatin peaks, and gene expression integrated into a cancer regulatory network. B, somatic SNV count per sample (BSPN, LUAD, LUSC) by genomic feature. C, counts of mutated cases per gene by mutation type (frameshift deletion, nonsynonymous SNV, stopgain, synonymous SNV); LUSC: TP53, CSMD3, NFE2L2, FAM135B, EPHA3, EML4, DROSHA, HIP1, EZR; LUAD: TP53, EGFR, CSMD3, CUL3, FAM135B, FAM47C, NRG1, RFWD3, RBM10, SMARCA4, STRN. D, signature contribution (S1–S4) per sample. E, dendrogram of samples. F, GISTIC q-values across chromosomes 1–22; labeled genes in LUAD: EGFR, LRIG3, NKX2-1, DDR2/TPM3/TPR, CDKN2A, PIK3R1/IL6ST/MAP3K1, PCM1; in LUSC: SOX2, BCL11A/REL, WHSC1L1/FGFR1, CDKN2A, ROS1/EZR, MYCL.]

Figure 1.

Overview of study and genetic landscape of patients with NSCLC. A, Overview of study design. B, Barplot shows the number of somatic SNVs located in different genomic features in each sample. C, Barplot shows the counts of samples harboring somatic SNVs and indels in known cancer-related genes. D, Barplot shows the signature contribution in each sample. E, Dendrogram represents the results of hierarchical clustering of samples according to four de novo mutational signatures. F, Line chart shows significant copy number gains (orange) and losses (blue; FDR < 0.1). LUAD, lung adenocarcinoma.


Open chromatin signature of NSCLC

To identify open chromatin regions, we applied ATAC-seq to each tumor sample (see Materials and Methods). The number of open chromatin peaks per sample ranged from 3,989 to 211,042, showing high variance (Supplementary Fig. S2A). We then filtered out six samples with fewer than 20,000 ATAC-seq peaks, leaving 40 ATAC-seq datasets for the following analysis. Because tumor samples are known to be heterogeneous (13), we evaluated intersample heterogeneity at the open chromatin level using the Jaccard index (30). Compared with ATAC-seq peaks in stem cells and sorted blood lineage cells (9), which were considered homogeneous, the ATAC-seq peaks from NSCLC samples had significantly smaller Jaccard index scores (Fig. 2A), indicating that open chromatin regions were more heterogeneous across NSCLC samples. Unsurprisingly, the open chromatin peaks shared across NSCLC samples were mainly associated with housekeeping functions, such as gene regulation, translation, and the DNA damage checkpoint (Supplementary Table S4). To explore whether sample-specific peaks could reveal functions specific to the corresponding context, we then sorted the samples by their percentage of minority peaks (peaks present in less than 20% of samples). Interestingly, the percentage of minority peaks was significantly correlated with pathologic type (P = 0.0003693, Mann–Whitney U test), with LUSC samples tending to contain more minority peaks than lung adenocarcinoma samples. In addition, the percentage of minority peaks was also significantly correlated with higher tumor stage (P = 0.01338, Spearman correlation test), history of smoking (P = 0.006324, Mann–Whitney U test), and female sex (P = 0.0108, Mann–Whitney U test), but not with other factors such as peak number, presence of metastasis, and age (Fig. 2B).
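The sample filter and the minority-peak percentage can be sketched in Python. This is a hypothetical illustration under stated assumptions: each sample's peaks are represented as identifiers of consensus open regions, the 20,000-peak cutoff is applied first, and "less than 20% of samples" is a strict inequality on the fraction of retained samples containing a region.

```python
from collections import Counter
from typing import Dict, Set

MIN_PEAKS = 20_000       # samples with fewer peaks are dropped
MINORITY_CUTOFF = 0.20   # a region is "minority" if present in < 20% of retained samples


def minority_peak_percentage(peaks: Dict[str, Set[str]],
                             min_peaks: int = MIN_PEAKS) -> Dict[str, float]:
    """Percentage of each retained sample's peaks that occur in < 20% of retained samples."""
    kept = {s: p for s, p in peaks.items() if len(p) >= min_peaks}
    if not kept:
        return {}
    n = len(kept)
    # How many retained samples contain each region.
    counts = Counter(region for p in kept.values() for region in p)
    result: Dict[str, float] = {}
    for sample, p in kept.items():
        minority = sum(1 for region in p if counts[region] / n < MINORITY_CUTOFF)
        result[sample] = 100.0 * minority / len(p) if p else 0.0
    return result
```

Because the frequency denominator is the number of retained samples, dropping low-quality samples before counting changes which regions qualify as minority peaks; the order of the two steps is an assumption here.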

To examine the potential of the global open chromatin distribution as a marker for classifying NSCLC samples, we clustered the samples based on ATAC-seq signals (see Materials and Methods). The NSCLC samples aggregated into three clusters (Fig. 2C). Cluster I consisted of four LUSC and four lung adenocarcinoma samples. Samples in cluster I contained significantly fewer open chromatin peaks [q-value < 2.930761e-04; Dunn (1964) Kruskal–Wallis test] and had significantly higher intersample heterogeneity [Fig. 2D; q-value < 3.517155e-10; Dunn (1964) Kruskal–Wallis test] than the other two clusters, exhibiting patient-specific features rather than cancer type features. Cluster II mainly consisted of LUSC samples (five LUSC samples and one lung adenocarcinoma sample). We identified 310 genes that had cluster II–specific open chromatin within 2.5 kb upstream or downstream of their TSS and showed upregulated expression (Supplementary Fig. S2B; Supplementary Table S5; see Materials and Methods). These genes were enriched for the keratinization process (Fig. 2E; Supplementary Fig. S2C), a phenomenon specific to LUSC and absent from lung adenocarcinoma (31). Cluster III mainly consisted of lung adenocarcinoma samples (25 lung adenocarcinoma samples and one LUSC sample), which could be further classified into three subclusters (Fig. 2C). The samples from the three subclusters contained distinct open chromatin patterns and different stages of tumor development (Fig. 2C; Supplementary Table S6). In particular, one of the subclusters, which contained fewer peaks than the other two, was significantly associated with early tumor stage [q-value < 0.02, Dunn (1964) Kruskal–Wallis test; Fig. 2C; Supplementary Table S6]. We identified 279 open chromatin peaks specific to the "early stage"–related subcluster (see Materials and Methods). The genes associated with these peaks were enriched in the integrin signaling pathway, which plays important roles in the attachment of cells to the extracellular matrix (Supplementary Table S7). For instance, the genes with "early stage"–specific open chromatin included ITGAV and ITGA6, which interact with vitronectin and laminin, respectively. These two genes have been reported as prognostic indicators in multiple cancers, including NSCLC, in which their high expression correlated with longer survival (32). On the other hand, we identified 6,664 chromatin regions that were specifically closed in the "early stage" cluster. Their related genes were enriched in the "spongiotrophoblast layer development" and "glomerular visceral epithelial cell differentiation" pathways (Supplementary Table S7). Overexpression of genes in these pathways has been reported to lead to elevated cell mobility, cancer cell invasion, and malignant tumor progression, which are considered later-stage cancer phenotypes (33). To further confirm that the open chromatin distribution is a potential marker for classifying lung cancer samples, we combined our data with 76 QC-passed ATAC-seq datasets of NSCLC samples from 38 TCGA patients (12) and 5 ATAC-seq datasets of small cell lung cancer (SCLC) cell lines (34). Consistent with the previous analysis, most NSCLC samples could be classified into three groups (Supplementary Fig. S2D). Interestingly, the major groups of LUSC (cluster II, including 27 samples from 16 patients) and lung adenocarcinoma (cluster III, including 50 samples from 33 patients) were separated more clearly (Supplementary Fig. S2D; Supplementary Table S6). Cluster I consisted of nine LUSC samples from 5 patients and 20 lung adenocarcinoma samples from 15 patients.
These samples had higher intersample heterogeneity and more variable peak numbers than the samples in clusters II and III. Patients in this group had a distinct open chromatin pattern that differed from typical LUSC or lung adenocarcinoma. A small number of NSCLC samples (six LUSC and four lung adenocarcinoma) clustered together with the SCLC samples in cluster IV. These samples were more likely to have lower data quality, given that they contained the lowest open chromatin peak numbers and high intersample heterogeneity; we therefore excluded them from further analysis. Interestingly, compared with SCLC, we identified 6,965 NSCLC-specific open peaks enriched around genes associated with cell adhesion functions, such as focal adhesion and cell-substrate adherens junction (Supplementary Fig. S2E).
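The step of linking cluster-specific peaks to genes by TSS proximity and then keeping only upregulated genes can be sketched as follows. It assumes half-open peak intervals and gene records already reduced to hypothetical `(name, chrom, tss)` tuples. The 2.5 kb window is taken from the text, while the brute-force search and the function names are illustrative only; a genome-scale analysis would use an interval index or a tool such as bedtools.

```python
def distance_to_point(start, end, pos):
    """Distance from a half-open interval [start, end) to a single base position."""
    if pos < start:
        return start - pos
    if pos >= end:
        return pos - (end - 1)
    return 0


def genes_near_peaks(peaks, genes, window=2500):
    """Names of genes whose TSS lies within `window` bp of any peak."""
    hits = set()
    for name, chrom, tss in genes:
        for p_chrom, start, end in peaks:
            if p_chrom == chrom and distance_to_point(start, end, tss) <= window:
                hits.add(name)
                break  # one nearby peak is enough to associate the gene
    return hits


def candidate_genes(specific_peaks, genes, upregulated, window=2500):
    """Genes near cluster-specific peaks that are also upregulated in that cluster."""
    return genes_near_peaks(specific_peaks, genes, window) & set(upregulated)
```

For example, with a single peak at `("chr1", 10000, 10500)`, a gene whose TSS lies at chr1:12000 (1,501 bp away) would be associated with it, whereas one at chr1:20000 would not.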

Wang et al.

Cancer Res; 79(19) October 1, 2019 Cancer Research 4844

on September 19, 2020. © 2019 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Page 6: The Open Chromatin Landscape of Non–Small Cell Lung Carcinoma ATAC-seq data processing Paired-end ATAC-seq fragments of human samples were aligned to human reference build hg19 using BWA

To better understand intratumor heterogeneity, we performed single-cell ATAC-seq on 1 patient with LUSC (Supplementary Table S1). In total, we detected 50,486 peaks in this patient (Supplementary Materials and Methods). After removing low-quality nuclei, we subjected 1,651 cells to t-distributed stochastic neighbor embedding (Supplementary Fig. S3A; Supplementary Table S2) and identified seven major clusters of cells (Supplementary Materials and Methods). We characterized each cluster (Supplementary Fig. S3B and S3C) and found that the cells in cluster III were likely to be tumor cells (Supplementary Materials and Methods). As expected, the open chromatin profile of cluster III showed high concordance with the bulk NSCLC samples: 97% of peaks in cluster III appeared in bulk samples (Fig. 3; Supplementary Fig. S3D). At the single-cell level, we observed a high degree of heterogeneity at some open chromatin regions (Fig. 3). Interestingly, intersample and intratumor heterogeneity of the open chromatin profile were highly correlated (r = 0.74, P < 2.2e-16, Spearman correlation test; Supplementary Fig. S3E), suggesting that highly heterogeneous open chromatin regions are potentially important to the evolution of tumor cells.
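The heterogeneity comparison relies on a Spearman rank correlation. A minimal pure-Python version (the Pearson correlation of average ranks, so ties are handled) is sketched below. The two input lists stand for hypothetical per-region intersample and intratumor heterogeneity scores, whose exact definitions are in the Supplementary Materials and are not reproduced here. No P value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```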

[Figure 2 image. Panels: A, density of Jaccard index scores (blood lineage cells: CD4T, CD8T, CMP, GMP, HSC, LMPP, MPP; NSCLC: LUAD, LUSC); B, percentage of peaks by sample frequency (0%–20% to 80%–100%), with annotation bars for peak number, tumor type, metastasis, staging (IA–IIIB), smoking, age, and sex; C, hierarchical clustering heatmap (clusters I–III; subclusters S1–S3); D, Jaccard score by cluster; E, browser tracks at LCE3D, SPRR3, and LCE1E]

Figure 2.

Integration of chromatin landscapes in NSCLC. A, Density plots show the Jaccard index distribution for blood lineage cells and NSCLC samples. CD4T (n = 5); CD8T (n = 5); CMP, common myeloid progenitor (n = 8); GMP, granulocyte–macrophage progenitor cells (n = 7); HSC, hematopoietic stem cells (n = 7); LMPP, lymphoid-primed multipotent progenitor cells (n = 3); MPP, multipotent progenitor cells (n = 6); LUAD, lung adenocarcinoma (n = 30); and LUSC (n = 10). B, Barplot shows the percentage of peaks with different sample frequencies in NSCLC samples. Bottom bars indicate peak number; tumor type (purple, lung adenocarcinoma; light blue, LUSC); remote metastasis (orange) or none (blue); tumor staging; smokers (black) or nonsmokers (green); patient age; and female (orange) or male (blue). **, P < 0.01. C, Heatmap shows hierarchical clustering of open chromatin regions of NSCLC samples. D, Boxplot shows sample similarity (Jaccard index) of open chromatin regions within the three NSCLC clusters. ns, not significant; ***, P < 0.005 [Dunn (1964) Kruskal–Wallis test]. E, Browser plot shows normalized ATAC-seq profiles at keratinization process–related genes (orange, cluster II NSCLC samples; blue, cluster III NSCLC samples).

Open Chromatin Landscape of Non–Small Cell Lung Carcinoma


Broad open chromatin peaks associated with NSCLC key genes

A previous study showed that the width of epigenetic marker peaks is associated with cell type–specific key genes (30). We next focused on broad open chromatin peaks and their associated genes by incorporating published leukemia ATAC-seq datasets (Supplementary Fig. S4A and S4B; Supplementary Materials and Methods; ref. 9). Among the broad open chromatin peak–associated genes, 310 were NSCLC specific, 368 were leukemia specific, and 337 were shared between the two cancer types (Fig. 4A). About 10% of the genes in each of the three categories were COSMIC-annotated genes related to at least one type of cancer. This was not surprising because both broad open chromatin peak–associated genes and COSMIC genes tend to be involved in critical pathways. Interestingly, however, the proportion of NSCLC-related COSMIC genes was significantly higher among NSCLC-specific broad open chromatin peak–associated genes than in the other two categories (P < 3.44e-30, χ2 test; Supplementary Fig. S4C), suggesting that broad open chromatin peaks might regulate genes essential to NSCLC. We further noticed that the NSCLC-specific broad open chromatin peak–associated genes were enriched in pathways previously reported in NSCLC, such as ECM receptor interaction and proteoglycans in cancer, whereas the leukemia-specific broad open chromatin peak–associated genes were enriched in pathways such as platelet activation (Supplementary Fig. S4D; see Materials and Methods).
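The χ2 comparison of COSMIC-gene proportions can be illustrated with a Pearson χ2 test on a 2 × 2 table (gene category × NSCLC-related COSMIC annotation). The sketch below assumes a 2 × 2 layout without continuity correction, which may differ from the exact setup behind Supplementary Fig. S4C, where three categories are compared. The counts in the usage line are toy numbers, not data from this study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: gene categories (e.g., NSCLC-specific vs. all other broad-peak genes).
    Columns: NSCLC-related COSMIC gene vs. not.
    Returns (statistic, upper-tail P) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Toy counts (not from the study): 20 of 100 vs. 5 of 100 genes are NSCLC COSMIC genes.
stat, p = chi2_2x2(20, 80, 5, 95)
```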

Meanwhile, many known NSCLC driver genes were covered by broad open chromatin peaks. Strikingly, in three of the samples, the open chromatin regions were so broad that they covered the whole EGFR gene body (Fig. 4B). Other NSCLC driver genes covered by broad open chromatin peaks included jun proto-oncogene (JUN), Erb-B2 receptor tyrosine kinase 3 (ERBB3), Wnt family member 9A (WNT9A), and FAT atypical cadherin 1 (FAT1). We ranked genes by the width of their broad open peaks in descending order and by the carrier frequency of the associated broad open regions in ascending order; the most frequently affected genes included EGFR and these other driver genes. As an extension of these studies, we noted that the other NSCLC-specific genes were significantly enriched for biological functions in the cancer-associated pathway, which has been reported in the majority of solid malignant tumors (Supplementary Table S8; refs. 35, 36).
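The gene ranking described above amounts to a two-key sort. The sketch below assumes, based on the "Top 1% broad open chromatin peak" label in Fig. 4, that broad peaks are the widest 1% of a sample's peaks. The per-gene `(max_width, carrier_fraction)` summary and the function names are hypothetical.

```python
def broad_peaks(peaks, top_fraction=0.01):
    """Widest `top_fraction` of half-open (chrom, start, end) peaks, at least one."""
    k = max(1, int(len(peaks) * top_fraction))
    return sorted(peaks, key=lambda p: p[2] - p[1], reverse=True)[:k]


def rank_genes(gene_stats):
    """Order genes by broad-peak width (descending), then carrier frequency (ascending).

    gene_stats maps gene -> (max_broad_peak_width_bp, fraction_of_samples_carrying_it).
    """
    return sorted(gene_stats, key=lambda g: (-gene_stats[g][0], gene_stats[g][1]))
```

Using a tuple sort key keeps the two criteria explicit. Ties in width are broken by the lower carrier frequency, as the text describes.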

To study the effect of broad open chromatin peaks on gene expression in NSCLC, we compared the expression levels of 647 genes between samples in which the gene overlapped a broad open chromatin peak and samples in which it did not. For these genes, average expression levels were higher in samples with broad peaks than in samples without them (Fig. 4C). However, only 38 of the 647 genes showed significant differential expression (FDR < 0.05, Mann–Whitney U test; Fig. 4C). We then compared the variance of expression levels between broad peak–overlapping and nonoverlapping samples. On average, expression variance was higher in samples in which the gene overlapped a broad open chromatin peak than in samples in which it did not (Fig. 4D). Compared with random genes, the fold change of variance between broad peak–overlapping and nonoverlapping samples was significantly higher regardless of gene expression level (Mann–Whitney U test; P < 0.05; Fig. 4D), suggesting that broad open chromatin peaks are related to the misregulation of genes in NSCLC. We also found that wide open regions were more likely than random genomic regions to overlap with regions of copy-number gain/amplification (P < 0.05 in 31/33 samples, permutation test; Supplementary Fig. S4E). For example, in the three cases mentioned above (Fig. 4B), both the wide open chromatin regions and the CNV gain covered the whole EGFR gene body. This result suggested an association among somatic CNVs, wide open chromatin regions, and gene expression.
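The per-gene comparisons above use a Mann–Whitney U test and a variance fold change. A minimal stdlib sketch follows. It uses the large-sample normal approximation with a tie correction and no continuity correction, so its P values can differ slightly from library implementations, and an exact test would be preferable for small groups. The variance summary here is a log2 ratio of sample variances; the exact dispersion metric used for Fig. 4D may differ, so treat it as an illustration of the idea rather than the published measure.

```python
import math
import statistics
from collections import Counter


def average_ranks(values):
    """1-based ranks with ties assigned their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann–Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = list(x) + list(y)
    ranks = average_ranks(values)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(values).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def log2_variance_ratio(with_peak, without_peak):
    """log2(sample variance with broad peak / sample variance without)."""
    return math.log2(statistics.variance(with_peak) / statistics.variance(without_peak))
```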

[Figure 3 image: browser tracks (25 kb) at the CDH3 and CDH1 loci for LUSC bulk ATAC-seq (n = 10) and aggregate scATAC-seq of cluster III (283 single cells)]

Figure 3.

Overview of the open chromatin signal of tumor single cells. Browser plot displays the ATAC-seq signal around the CDH3 and CDH1 gene loci from the aggregate of all 10 LUSC samples and from 283 tumor single cells classified into cluster III.


Open chromatin on somatic CNV regions associated with gene expression

Somatic CNV of gene regulatory regions could potentially affect target gene expression. To study the relationship between somatic CNV and open chromatin peaks in NSCLC, we intersected somatic CNV regions with open chromatin peaks in each sample. Interestingly, the intensity of the ATAC-seq signal positively correlated with the copy number of somatic CNV regions (Fig. 5A), suggesting that regulatory elements were gained or lost together with the genomic regions on which they were located. Somatic CNV fragments that carried open chromatin peaks (sCOPs) were more likely to be located on gene promoters and gene bodies and less likely to be located in intergenic regions (Supplementary Fig. S5A). For each sample, considering the gene body plus 2.5 kb upstream and downstream, an average of 79.26% of the expressed genes in our dataset (13,257 genes) contained open chromatin peaks, an average of 11.80% contained sCOPs, and an average of 7.96% contained neither an open chromatin peak nor a somatic CNV region (Supplementary Fig. S5B).
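The per-gene categories above (open peak, sCOP, neither) can be computed with simple interval overlaps. In this sketch, an sCOP is taken to be an open chromatin peak that overlaps a somatic CNV segment, and each gene is examined over its body plus 2.5 kb on each side. The tuple layouts and function names are assumptions for illustration.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end, ...) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_gene(gene, peaks, cnv_segments, flank=2500):
    """Classify one gene in one sample.

    gene: (chrom, start, end); peaks: [(chrom, start, end)];
    cnv_segments: [(chrom, start, end, state)] with state "gain" or "loss".
    """
    chrom, start, end = gene
    window = (chrom, max(0, start - flank), end + flank)
    local_peaks = [p for p in peaks if overlaps(p, window)]
    local_cnv = [s for s in cnv_segments if overlaps(s, window)]
    # A peak near the gene that also sits on a CNV segment counts as an sCOP.
    has_scop = any(overlaps(p, s) for p in local_peaks for s in cnv_segments)
    return {
        "open_peak": bool(local_peaks),
        "sCOP": has_scop,
        "neither": not local_peaks and not local_cnv,
    }
```

Averaging these flags over genes, sample by sample, gives per-sample percentages of the kind reported above.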

[Figure 4 image. Panels: A, heatmap of broad peak–associated genes (NSCLC specific, leukemia specific, common) with pie charts of COSMIC gene constitution; B, EGFR browser tracks (50 kb), top 1% vs. non–top 1% broad open chromatin peaks; C, volcano plot of log fold change vs. −log10(FDR), labeled genes including EGFR, ERBB3, JUN, PIK3CA, PTPN14, EPHB3, FAM174B, and HECTD1; D, log fold change of variance, background vs. with broad peak, stratified by FPKM (<1, 1–3, 3–5, 5–10, 10–20, ≥20)]

Figure 4.

The broad open chromatin region in NSCLC. A, Heatmap (left) shows broad peak–associated genes in NSCLC and leukemia samples. Broad open region association is color coded as shown at the bottom. Pie charts (right) show the percentage of COSMIC-annotated genes associated with the broad open region. Different types of tumors are color coded on the right. B, Browser plot shows the ATAC-seq signal around EGFR in 33 NSCLC samples. The EGFR promoter with and without broad open region association is shown in a zoomed view at the bottom. Sample types are color coded at the bottom. C, Volcano plot shows mean gene expression of broad open samples and nonbroad open samples. Differentially expressed genes are coded in orange (FDR < 0.05, Mann–Whitney U test). D, Boxplots show the log2-fold change of the coefficient of variation in broad open genes (broad open gene against nonbroad open gene; orange) and nonbroad open genes (random gene expression comparison; blue).

[Figure 5 image. Panels: A, ATAC-seq peak intensity by copy number (1 to >10); B, volcano plots of log2(fold change) vs. −log10(P) for gain and loss sCOP–associated DEGs (PIK3CG labeled); C, EGFR locus (chr7, 55,200,000–55,400,000) tracks for somatic CNV gain frequency (0%–42%), somatic CNV gain, open frequency, DEG-related gain sCOPs, and ChromHMM states for GM12878, H1-hESC, K562, HepG2, HUVEC, HMEC, HSMM, NHEK, and NHLF; D, KEGG enrichment plot (−log10(P), gene number, fold enrichment); E, KEGG NSCLC pathway view (calcium, Ras, PI3K-Akt, and p53 signaling; cell cycle G1–S progression)]

Figure 5.

The association of somatic CNV, open chromatin peaks, and gene expression. A, Boxplot represents the correlation between the copy number of somatic CNV regions and the intensity of the ATAC-seq signal of open chromatin peaks that overlapped the corresponding somatic CNV regions. B, Volcano plots illustrate DEGs between samples with and without sCOPs (orange, DEGs; blue, non-DEGs). C, Aligned tracks show the somatic CNV gain frequency, somatic CNV gain, open frequency, and DEG-related gain sCOPs within 1,000 bp of EGFR (bright red, active promoter; light red, weak promoter; purple, inactive/poised promoter; orange, strong enhancer; yellow, weak/poised enhancer; blue, insulator; dark green, transcriptional transition; light green, weak transcribed; gray, polycomb repressed; light gray, heterochromatin, repetitive/copy number variation). D, Point plot shows the top 35 most significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of gain sCOP–associated DEGs, sorted by P value in reverse order. The color represents the fold enrichment, and the size of the point represents the number of DEGs. E, The KEGG pathway view represents the NSCLC pathway. DEGs are manually highlighted (orange boxes, gain sCOP–associated DEGs; blue boxes, loss sCOP–associated DEGs).

A previous study showed an association between somatic CNV and gene expression (37). To examine this relationship in our data, we identified 6,981 genes that overlapped with a somatic CNV gain fragment and 759 genes that overlapped with a somatic CNV loss fragment within 2.5 kb upstream or downstream of the gene body in at least three samples. For each gene, we then compared expression levels between samples in which the gene was associated with a somatic CNV fragment and samples in which it was not. Compared with samples lacking the CNV fragment, the majority of genes associated with somatic CNV gain showed higher expression levels, whereas the majority of genes associated with somatic CNV loss showed lower expression levels (Supplementary Fig. S5C). In particular, 21.7% of the genes associated with somatic CNV gain were differentially expressed genes (DEG), whereas 14.5% of the genes associated with somatic CNV loss were DEGs (P < 0.05, two-tailed Mann–Whitney U test; Supplementary Fig. S5C).
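The per-gene comparison between CNV carriers and noncarriers, including the filter requiring at least three carrier samples, can be sketched as below. The median difference is only a direction-of-effect summary chosen for illustration; the study's DEG calls used a two-tailed Mann–Whitney U test. The dictionaries and function names are hypothetical.

```python
from statistics import median


def cnv_expression_shift(expr, carriers, min_carriers=3):
    """Median expression in CNV-carrying samples minus median in the others.

    expr: {sample: expression}; carriers: samples in which the gene (±2.5 kb)
    overlaps the CNV fragment. Returns None if the gene fails the filter.
    """
    with_cnv = [v for s, v in expr.items() if s in carriers]
    without = [v for s, v in expr.items() if s not in carriers]
    if len(with_cnv) < min_carriers or not without:
        return None
    return median(with_cnv) - median(without)


def fraction_concordant(shifts, direction):
    """Share of tested genes shifted in the expected direction (+1 gain, -1 loss)."""
    tested = [s for s in shifts if s is not None]
    return sum(1 for s in tested if s * direction > 0) / len(tested) if tested else 0.0
```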

We next focused on the effect of sCOPs on gene expression. In total, we identified 229,554 sCOPs with somatic CNV gain and 5,443 sCOPs with somatic CNV loss in at least three samples. We then compared the expression levels of sCOP-associated genes between samples with and without sCOPs (Supplementary Materials and Methods). The majority of genes associated with gain of sCOP exhibited significantly higher expression levels, whereas the majority of genes associated with loss of sCOP exhibited lower expression levels, compared with the same genes associated with peaks that did not overlap with somatic CNV (Fig. 5B; Supplementary Table S10). Furthermore, 34.34% of genes associated with gain of sCOP were DEGs, whereas 16.89% of genes associated with loss of sCOP were DEGs (P < 0.05, Wilcoxon rank-sum test; Fig. 5B; Supplementary Table S9). DEG-related sCOPs were enriched in promoters, enhancers, and insulators, whereas distal DEG-related sCOPs (distance between DEG and sCOP ≥ 1,000 bp) were more significantly enriched in enhancers and insulators than local DEG-related sCOPs (distance between DEG and sCOP < 1,000 bp; Supplementary Fig. S5D). CNVs have been shown to affect gene expression by changing gene dosage (37) or the copy number of regulatory elements (38). We analyzed all 33,303 sCOPs (29,433 local sCOPs and 4,395 distant sCOPs) and their related DEG pairs. We found that 99.7% of the sCOP-DEG pairs were carried by the same somatic CNV fragment, indicating that in most cases the CNV affects the copy number of genes together with their regulatory elements.
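The local/distal split and the same-fragment check can be expressed as small interval functions. The 1,000 bp cutoff comes from the text. Two things are assumptions: the function names, and reading "carried by the same somatic CNV fragment" as the sCOP and the gene both overlapping one CNV segment.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end, ...) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def gap_bp(a, b):
    """Gap between two half-open intervals (0 if they touch or overlap; None across chromosomes)."""
    if a[0] != b[0]:
        return None
    return max(0, b[1] - a[2], a[1] - b[2])


def pair_class(scop, gene, cutoff=1000):
    """'local' if the sCOP lies within `cutoff` bp of the gene, else 'distal'."""
    gap = gap_bp(scop, gene)
    if gap is None:
        return None
    return "local" if gap < cutoff else "distal"


def same_fragment(scop, gene, cnv_segments):
    """True if one CNV segment overlaps both the sCOP and the gene."""
    return any(overlaps(scop, seg) and overlaps(gene, seg) for seg in cnv_segments)
```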

Many known NSCLC-related genes were affected by sCOPs in our data (Supplementary Table S10). For example, we identified 40 gain sCOPs around the EGFR gene, and EGFR expression was significantly higher in samples with gain sCOPs than in samples without sCOPs (P < 0.05, Wilcoxon rank-sum test; Fig. 5C). As another example, NFKBIA harbored the largest number of gain sCOPs around the gene (Supplementary Fig. S5E). The sCOP-associated DEGs were enriched in multiple cancer pathways and signaling pathways (Fig. 5D; Supplementary Materials and Methods). Among all the enriched cancer pathways, the NSCLC pathway ranked near the top. We annotated the sCOP-associated DEGs in the NSCLC pathway (Fig. 5E) and found that they tended to lie on the EGFR-SOS2-BRAF-MAPK1 cascade. sCOP-associated DEGs were also significantly enriched in the ErbB signaling pathway, which plays essential roles in cancer development and progression (Supplementary Fig. S5F). In the ErbB signaling pathway, multiple sCOP-associated DEGs fell on the PI3K/AKT cascade, which is known to be dysregulated in cancer (39).
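Pathway enrichment of the kind shown in Fig. 5D is commonly scored with a one-sided hypergeometric test and a fold-enrichment ratio. The sketch below assumes that test. The enrichment tool and gene universe actually used are described in the Supplementary Materials and may differ.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """One-sided hypergeometric enrichment of a gene list in a pathway.

    k: DEGs in the pathway; n: DEGs tested; K: pathway genes; N: gene universe.
    Returns (fold_enrichment, P(X >= k)).
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    p = tail / comb(N, n)
    fold = (k / n) / (K / N)
    return fold, p
```

Because `math.comb` returns exact integers, the tail sum is exact before the final division, which avoids underflow for moderately sized gene sets.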

Germline SNV on open chromatin associated with gene expression of NSCLC

[Figure 6 image. Panels: A, summary diagram of local and distal ATAC-QTLs, eQTLs, and their associated open regions and genes; B, Manhattan plot of −log10(P) by chromosome, with labeled genes including HORMAD1, GSTM1, GSTT1, and ROS1; C, density of eQTL distance to TSS (−1,000 to 1,000 kb) vs. background; D, proportion (%) of genomic features (TSS, TES, 3′UTR, 5′UTR, promoter, exon, intergenic, intron) for the genome, all tested germline SNPs, and eQTLs]

Figure 6.

Landscape of ATAC-QTLs and eQTLs in NSCLC. A, Graphical summary of ATAC-QTLs, eQTLs, and the association between ATAC-seq open regions and genes. B, Manhattan plot of eQTLs. Orange dots, eQTLs; orange triangles, joint-QTLs. C, Density plot shows the distance between eQTLs and the TSS of the associated gene (orange); distances between all pairs of SNVs and TSSs of all genes used for the eQTL test are drawn as background (blue). D, Barplot shows the genomic feature distribution of the human genome, all tested germline SNVs, and eQTLs.

SNVs could potentially affect epigenetic states on regulatory elements and alter the expression of target genes (40). To study how germline SNVs regulate gene expression through open chromatin regions in NSCLC, we first called germline SNVs from our samples and then identified cis-eQTLs (within 1,000 kb of the gene, considering only SNVs within ATAC-seq peaks) and cis-ATAC-QTLs (within 1,000 kb of ATAC-seq peaks, considering only SNVs within ATAC-seq peaks) from our data (Fig. 6A; Supplementary Materials and Methods). At a threshold of FDR < 0.1, we identified 76 eQTLs associated with 32 genes (Fig. 6B). These eQTLs did not overlap with reported eQTLs in normal tissues (41), suggesting that they were likely NSCLC-specific eQTLs. The majority (62.5%) of the genes were associated with only one eQTL; conversely, 12 genes (37.5%) were associated with multiple eQTLs (median = 3.5; Supplementary Fig. S6A). In agreement with previous reports (42), the eQTLs tended to fall close to their related genes (Fig. 6C), with about 69.23% of eQTLs located within 100 kb of their related genes. The eQTLs were highly enriched in promoter, exon, and intron regions (Fig. 6D). For example, we identified 17 eQTLs associated with the HORMAD1 gene, which plays a key role in cell-cycle regulation and has been reported as a potential marker in multiple cancers (43, 44). These 17 eQTLs were highly correlated and divided the samples into two haplotypes, and significantly higher HORMAD1 expression was found in samples with the mutant genotype (Fig. 7A).
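A bare-bones version of cis-eQTL testing pairs each SNV with genes whose TSS lies within 1,000 kb, relates expression to genotype dosage, and controls the false discovery rate with the Benjamini–Hochberg procedure. This sketch rests on several assumptions: the 0/1/2 alternate-allele dosage coding, a plain least-squares model without covariates, and the function names are all illustrative. The QTL-mapping software and covariates actually used are specified in the Supplementary Materials, and per-test P values are assumed to be computed elsewhere.

```python
import math


def in_cis(snv_pos, tss, window=1_000_000):
    """True if the SNV lies within `window` bp of the TSS (same chromosome assumed)."""
    return abs(snv_pos - tss) <= window


def dosage_regression(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage (0, 1, 2)."""
    n = len(dosages)
    mx, my = sum(dosages) / n, sum(expression) / n
    sxy = sum((g - mx) * (e - my) for g, e in zip(dosages, expression))
    sxx = sum((g - mx) ** 2 for g in dosages)
    syy = sum((e - my) ** 2 for e in expression)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

SNV-gene pairs whose adjusted value falls below 0.1 would then be reported as eQTLs, matching the FDR threshold in the text.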

Apart from eQTLs, we identified 11,265 ATAC-QTLs associated with 11,000 open chromatin regions (Fig. 6A; Supplementary Materials and Methods). As expected, the ATAC-QTLs tended to be located close to their related open chromatin regions (Supplementary Fig. S6B). ATAC-QTLs were enriched in the 3′UTR, 5′UTR, promoter, and exon regions (Supplementary Fig. S6C). A total of 41 ATAC-QTLs were located within their associated ATAC-seq peaks; we termed these local-ATAC-QTLs. The remaining 11,244 ATAC-QTLs were correlated with distal ATAC-seq peaks; we termed these distal-ATAC-QTLs (Fig. 6A). Interestingly, 20 of the 41 local-ATAC-QTLs also correlated with distal ATAC-seq peaks, suggesting likely long-range regulatory interactions (Fig. 6A). To establish the relationship between eQTLs and ATAC-QTLs in NSCLC, we intersected the two types of QTL and identified 21 joint quantitative trait loci (joint-QTL), accounting for 27% of all eQTLs (Supplementary Fig. S6D). eQTLs were enriched in ATAC-QTLs (P = 2.509e-13; fold enrichment = 10.12, Fisher exact test), suggesting a possible mediating role of open chromatin regions in gene regulation: the joint-QTLs might affect gene expression through their associated open chromatin regions. The 21 joint-QTLs were associated with 362 ATAC-seq peaks and 16 genes (Fig. 7B; Supplementary Table S11). The joint-QTLs are likely to be functionally relevant loci that were overlooked by previous studies. For example, GSTM1, one of the genes associated with a joint-QTL, is involved in the detoxification of many carcinogens (45), and multiple studies have reported a correlation between GSTM1 malfunction and increased risk of lung cancer (46). The SNP rs10857795, located in the intron of GSTM1, was previously annotated as having no effect (47). However, we found this SNP to be a joint-QTL that correlated with both the expression level of GSTM1 and the intensity of the ATAC-seq peak spanning the promoter of this gene in NSCLC (Fig. 7C). This joint-QTL could potentially disrupt the Meis1 (Homeobox) motif, suggesting that SNP rs10857795 might play a regulatory role for the GSTM1 gene and consequently affect the risk of NSCLC.
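The eQTL/ATAC-QTL enrichment is a one-sided Fisher exact test on a 2 × 2 table of tested SNVs. A stdlib sketch is shown below. Two choices are assumptions and may not match how the published fold enrichment of 10.12 was computed: the table orientation (ATAC-QTL or not × eQTL or not), and the fold-enrichment definition (eQTL share among ATAC-QTLs divided by the overall eQTL share).

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher exact test for enrichment in the 2 x 2 table [[a, b], [c, d]].

    Rows: ATAC-QTL yes / no; columns: eQTL yes / no (counts of tested SNVs).
    Returns (fold_enrichment, P(X >= a)) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    p = tail / comb(n, col1)
    fold = (a / row1) / (col1 / n)
    return fold, p
```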

We further intersected the identified QTLs with lung cancer–related genome-wide association study (GWAS) significant loci (Supplementary Materials and Methods). A total of seven eQTLs, 83 ATAC-QTLs, and three joint-QTLs were located within the linkage disequilibrium (LD) blocks of lung cancer–related GWAS significant loci (Supplementary Table S12). Compared with SNP-containing open regions, ATAC-QTL–containing open regions overlapped with a higher percentage of LD blocks associated with squamous cell lung carcinoma (P = 0.024, permutation test), urinary 1,3-butadiene metabolite levels in smokers (1,3-butadiene is an important carcinogen in tobacco smoke; P = 0.001, permutation test), response to taxane treatment (docetaxel; P = 0.002, permutation test), and multiple cancers (lung cancer, noncardiac gastric cancer, and esophageal squamous cell carcinoma; P = 0.021, permutation test; Supplementary Fig. S6E; Supplementary Materials and Methods). Similarly, eQTL-containing open regions overlapped with a higher percentage of LD blocks associated with lung cancer (P = 0.009, permutation test), lung carcinoma (P = 0.001, permutation test), cancer (pleiotropy; P = 0.001, permutation test), and urinary 1,3-butadiene metabolite levels in smokers (P = 0.001, permutation test; Supplementary Fig. S6F; Supplementary Materials and Methods). These GWAS-associated QTLs could potentially affect gene regulatory networks in NSCLC through genomic alterations, an effect that could be revealed only by combining GWAS with multiomics sequencing. A case in point is the joint-QTL rs140313, which was located within the LD block of the significant loci associated with the 1,3-butadiene metabolism and detoxification phenotype (48). This joint-QTL correlated with the intensity of four ATAC-seq peaks upstream of the GSTT1 gene and with the expression levels of GSTT1 (Fig. 7D).
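The permutation comparison (QTL-containing open regions vs. all SNP-containing open regions) can be sketched by repeatedly drawing random subsets of the background regions, each the same size as the test set, and counting how often their LD-block overlap fraction matches or exceeds the observed one. The following are assumptions for illustration: the sampling scheme, the add-one P-value estimator, the permutation count, and the function names. The published procedure is described in the Supplementary Materials.

```python
import random


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def ld_overlap_fraction(regions, ld_blocks):
    """Fraction of regions overlapping at least one LD block."""
    hits = sum(1 for r in regions if any(overlaps(r, blk) for blk in ld_blocks))
    return hits / len(regions)


def permutation_p(test_regions, background_regions, ld_blocks, n_perm=1000, seed=0):
    """One-sided empirical P for higher LD-block overlap than random background subsets."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = ld_overlap_fraction(test_regions, ld_blocks)
    k = len(test_regions)
    exceed = sum(
        ld_overlap_fraction(rng.sample(background_regions, k), ld_blocks) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)
```

Note that with 1,000 permutations and the add-one estimator, the smallest attainable P value is about 0.001.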

Figure 7.

Regulation network in patients with lung cancer. A, Browser plot shows overlaid ATAC-seq signals of mutation-harboring samples (Alt, blue) and mutation-free samples (Ref; orange). Red boxes, eQTLs associated with HORMAD1. Heatmap shows the genotype of the two groups. Boxplot shows gene expression of the two groups. B, Circos plot shows QTLs and associated genes. The innermost circle (track 1) shows linkage disequilibrium blocks of lung cancer–related traits. The middle two circles (tracks 2 and 3) are circular Manhattan plots of eQTLs and ATAC-QTLs; the y-axis denotes −log10(P) of QTLs. All three tracks are color coded as shown on the left. Orange shaded regions highlight joint-QTLs that are associated with genes marked in the center of the circos plot. The outermost circle shows an ideogram of the human chromosomes. C, Browser plot shows overlaid ATAC-seq signals of the two groups (GG, reference genotype; AA, mutant genotype). Red, open chromatin region containing the joint-QTL; green, joint-QTL–associated open chromatin regions. The position weight matrix for Meis1 (Homeobox) is shown in the middle; the 8th position corresponds to the location of the motif-altering SNV (rs10857795, joint-QTL). The proportion of samples with the two genotypes is shown in the pie chart, and gene expression of the two genotypes is shown in the boxplot. D, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr22 with a bin size of 700 kb. The regulatory network of this joint-QTL is shown at the top right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). Distances between the joint-QTL and its associated gene or open chromatin regions are marked near the edges. Individuals were separated according to two genotypes (reference genotype CC and mutant genotype TT). Boxplot in the middle shows the gene expression of the two groups. Browser plot at the bottom shows overlaid ATAC-seq signals of the two groups. E, The fan chart shows the zoomed-in eQTL and ATAC-QTL analysis of chr6 with a bin size of 220 kb. The regulatory network of the NSCLC-derived joint-QTL is shown at the bottom right (gray, GWAS lead SNV; orange, joint-QTL SNV; green, associated gene). Distances between the joint-QTLs and their associated genes or open chromatin regions are marked near the edges. Individuals were separated according to two genotypes (reference genotype AA and mutant genotype AG). Boxplot in the middle shows the gene expression of the two groups. Browser plot at the top shows overlaid ATAC-seq signals of the two groups.

www.aacrjournals.org Cancer Res; 79(19) October 1, 2019 4851

Open Chromatin Landscape of Non–Small Cell Lung Carcinoma

on September 19, 2020. © 2019 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst June 17, 2019; DOI: 10.1158/0008-5472.CAN-18-3663

Although the GSTT1 gene has been reported to be associated with lung cancer risk (48), and previous studies have reported correlations between GSTT1 deletion and both carcinogen detoxification and response to antineoplastic drug therapy (45, 47), our analysis suggests a regulatory risk locus linked to this gene. To validate the predicted joint-QTL-to-gene regulatory links, we applied the CRISPR/Cas9 cassette method (Supplementary Materials and Methods; ref. 49) to replace the sequence of the QTL with the alternative genotype (Supplementary Fig. S7A). Consistent with the analysis, CRISPR/Cas9 cassette treatment at the joint-QTL rs10857795, in which cells gained a G>A mutation (Supplementary Fig. S7B), led to significant upregulation of GSTM1 expression in the H1299 NSCLC cell line (Supplementary Fig. S7C). Similarly, CRISPR/Cas9 cassette treatment at the joint-QTL rs140313, in which cells gained a C>T mutation (Supplementary Fig. S7D), led to significant upregulation of GSTT1 expression in H1299 cells (Supplementary Fig. S7E). In another example, we identified two joint-QTLs (rs17079286 and rs73766221) located within the LD blocks of GWAS significant loci associated with lung cancer (rs9387478; ref. 50) and lung adenocarcinoma (rs9387479; ref. 51); these joint-QTLs correlated with the intensity of an ATAC-seq peak upstream of the ROS1 gene and with ROS1 expression levels (Fig. 7E). ROS1 is a known NSCLC driver gene whose mutations, rearrangements, and fusions have been reported to be associated with NSCLC (50). Our analysis suggests that genomic mutations in regulatory elements that affect ROS1 expression could also be an important factor in the malfunction of the NSCLC gene regulatory network.

Discussion

Previous studies on NSCLC have extensively characterized genomic features such as sequence mutations, gene expression, and epigenetic markers. However, the chromatin accessibility of NSCLC has not yet been clearly characterized. In this study, we analyzed ATAC-seq profiles, together with WGS and transcriptome sequencing, of primary tumor tissue from 50 patients. We found high interpatient heterogeneity of chromatin accessibility, which reflected the high interpatient heterogeneity of the NSCLC genome sequence. Despite this heterogeneity, ATAC-seq profiles could, to some extent, serve as molecular markers to classify NSCLC samples. On the basis of global open chromatin peaks, we identified lung adenocarcinoma- and LUSC-specific clusters and further stratified lung adenocarcinoma samples into three subclusters. Although we found correlations between the clusters and clinical parameters such as tumor stage and smoking, firm conclusions are difficult to draw because of the small sample size and potential confounding effects. Beyond that, we performed single-cell ATAC-seq on a tumor sample from 1 patient, which identified different cell types in the tumor microenvironment. We discovered a positive correlation between intra- and intertumor heterogeneity, suggesting that open chromatin signatures could be used to explore both intra- and interpatient tumor heterogeneity.

Beyond the positions of open chromatin peaks in the genome, the widths of open chromatin peaks could also indicate key genes in NSCLC. A previous study showed that broad peaks of active regulatory histone modifications are associated with cell type–specific active genes.

It is reasonable to hypothesize that broad open chromatin peaks in NSCLC could be associated with critical genes, as they may reflect more active gene regulatory relationships. We therefore identified genes associated with broad ATAC-seq peaks at high frequency across our samples. These genes were indeed enriched in known NSCLC-related genes, suggesting that broad ATAC-seq peaks could potentially serve as markers to identify key genes in NSCLC. On the other hand, we observed some extremely wide open chromatin regions, such as regions spanning the whole gene body of EGFR. Such regions could result from large genomic rearrangements (i.e., CNVs) and cause severe misregulation of target genes in NSCLC. These extremely abnormal open chromatin signals could also indicate malfunctioning components of the NSCLC gene regulatory network.
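
This section does not define how "broad" peaks or "high frequency" were thresholded, so the following is only a sketch under stated assumptions. It assumes each peak has already been assigned to a nearby gene, calls a peak broad if its width falls within the top `top_fraction` of peak widths in its own sample, and keeps genes that carry a broad peak in at least `min_sample_fraction` of samples. Both thresholds, the input layout, and the function name are hypothetical.

```python
from collections import Counter


def broad_peak_genes(samples, top_fraction=0.05, min_sample_fraction=0.5):
    """Return genes that carry a 'broad' ATAC-seq peak in many samples.

    samples: dict mapping sample ID -> list of (gene, peak_width_bp), where each
             peak has already been assigned to a nearby gene (hypothetical input).
    A peak counts as broad if its width is within the top `top_fraction` of
    peak widths in its own sample (peaks tied at the cutoff are included).
    Returns a dict gene -> fraction of samples with a broad peak for that gene.
    """
    counts = Counter()
    for peaks in samples.values():
        widths = sorted((w for _, w in peaks), reverse=True)
        n_top = max(1, int(len(widths) * top_fraction))
        cutoff = widths[n_top - 1]
        # Use a set so each gene is counted at most once per sample.
        counts.update({gene for gene, w in peaks if w >= cutoff})
    n = len(samples)
    return {g: c / n for g, c in counts.items() if c / n >= min_sample_fraction}


if __name__ == "__main__":
    # Hypothetical toy data: 3 samples with 20 peaks each.
    base = [(f"GENE{i}", 500) for i in range(18)]
    samples = {
        "P01": base + [("GENE18", 500), ("EGFR", 5_000)],
        "P02": base + [("GENE18", 500), ("EGFR", 5_000)],
        "P03": base + [("EGFR", 5_000), ("ERBB3", 6_000)],
    }
    print(broad_peak_genes(samples))
```

Ranking widths within each sample, rather than using one global width cutoff, is a deliberate choice in this sketch: it keeps a sample with generally wider peaks (for example, from lower sequencing depth or different tumor purity) from dominating the frequency count.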

Importantly, by integrating genome sequence, ATAC-seq, and transcriptome data, we could directly study how alterations of genome sequence in regulatory elements affect the NSCLC gene regulatory network. To this end, we analyzed two types of genomic variation: somatic CNVs and germline SNVs. Both ATAC-seq peak intensity and gene expression levels positively correlated with the copy number of the CNV region in which they were located, indicating that active regulatory elements might be gained or lost together with CNVs. Whereas previous studies proposed that CNVs might alter gene expression by affecting either gene dosage or gene regulatory elements, we found that in 99.7% of cases, CNV fragments carried gene bodies together with their regulatory elements. In terms of germline SNVs, we identified 76 eQTLs, about 30% of which were also ATAC-QTLs. These expression and ATAC joint-QTLs were more likely to be causal SNVs that affect gene expression through regulatory elements. We further intersected the QTLs with known lung cancer–related GWAS significant loci and identified three likely causal SNVs that were previously challenging to pinpoint.

To sum up, we generated multiomics data from the tumor tissue of an NSCLC population and showed the potential of this type of data for elucidating gene regulatory networks in NSCLC. The sample size still limited the power of our analysis. In the future, the study could be improved with a larger population or by incorporating more types of omics data.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: W. Luo, S. Dai, Q. Zhou, D. Xie, W. Li

Development of methodology: S. Dai, D. Xie

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Z. Wang, W. Luo, J. Tang, Y. Zhou, J. Zhang, F. Cao, S. Dai, P. Tian, Y. Wang, L. Liu, G. Che, Q. Zhou

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K. Tu, L. Xia, K. Luo, K. Lu, X. Hu, Y. He, W. Qiao, S. Dai, D. Xie

Writing, review, and/or revision of the manuscript: Z. Wang, K. Tu, L. Xia, K. Luo, S. Dai, Y. Wang, Q. Zhou, D. Xie

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W. Luo, S. Dai, P. Tian, L. Liu, G. Che

Study supervision: S. Dai, Q. Zhou, D. Xie, W. Li

Acknowledgments

We thank Dr. Xin He from the University of Chicago for the valuable discussion. This work was supported by grant nos. 91631111, 31571327, and 31771426 from the Chinese National Natural Science Foundation to D. Xie; the National Natural Science Foundation of China (81871890) and Transformation Projects of Sci-Tech Achievements of Sichuan Province (2016CZYD0001) to W. Li; China Postdoctoral Science Foundation (2017M623043) to Z. Wang; and Sci-Tech Support Program of Science and Technology Department of Sichuan Province (2016SZ0073) to P. Tian.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received December 1, 2018; revised April 12, 2019; accepted June 10, 2019; published first June 17, 2019.

References

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin 2018;68:7–30.

2. Sakashita S, Sakashita M, Sound Tsao M. Genes and pathology of non-small cell lung carcinoma. Semin Oncol 2014;41:28–39.

3. Berger AH, Brooks AN, Wu X, Shrestha Y, Chouinard C, Piccioni F, et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 2017;32:884.

4. Widschwendter M, Jones A, Evans I, Reisel D, Dillner J, Sundstrom K, et al. Epigenome-based cancer risk prediction: rationale, opportunities and challenges. Nat Rev Clin Oncol 2018;15:292–309.

5. Brzezianska E, Dutkowska A, Antczak A. The significance of epigenetic alterations in lung carcinogenesis. Mol Biol Rep 2013;40:309–25.

6. Park SM, Choi EY, Bae M, Kim S, Park JB, Yoo H, et al. Histone variant H3F3A promotes lung cancer cell migration through intronic regulation. Nat Commun 2016;7:12914.

7. Anastasiadou E, Jacob LS, Slack FJ. Non-coding RNA networks in cancer. Nat Rev Cancer 2018;18:5–18.

8. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 2013;10:1213–8.

9. Corces MR, Buenrostro JD, Wu B, Greenside PG, Chan SM, Koenig JL, et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet 2016;48:1193–203.

10. Beekman R, Chapaprieta V, Russinol N, Vilarrasa-Blasi R, Verdaguer-Dot N, Martens JHA, et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat Med 2018;24:868–80.

11. Rendeiro AF, Schmidl C, Strefford JC, Walewska R, Davis Z, Farlik M, et al. Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat Commun 2016;7:11938.

12. Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, et al. The chromatin accessibility landscape of primary human cancers. Science 2018;362:eaav1898. doi: 10.1126/science.aav1898.

13. Lambrechts D, Wauters E, Boeckx B, Aibar S, Nittner D, Burton O, et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med 2018;24:1277–89.

14. Puram SV, Tirosh I, Parikh AS, Patel AP, Yizhak K, Gillespie S, et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 2017;171:1611–24.

15. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20.

16. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–60.

17. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.

18. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–303.

19. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213–9.

20. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 2012;22:568–76.

21. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.

22. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 2012;28:423–5.

23. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011;12:R41.

24. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 2008;9:R137.

25. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 2010;28:495–501.

26. Gehring JS, Fischer B, Lawrence M, Huber W. SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Bioinformatics 2015;31:3673–5.

27. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet 2015;47:1402–7.

28. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 2014;511:543–50.

29. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012;489:519–25.

30. Benayoun BA, Pollina EA, Ucar D, Mahmoudi S, Karra K, Wong ED, et al. H3K4me3 breadth is linked to cell identity and transcriptional consistency. Cell 2014;158:673–88.

31. Travis WD, Brambilla E, Nicholson AG, Yatabe Y, Austin JHM, Beasley MB, et al. The 2015 World Health Organization classification of lung tumors: impact of genetic, clinical and radiologic advances since the 2004 classification. J Thorac Oncol 2015;10:1243–60.

32. Linhares MM, Affonso RJ Jr, Viana Lde S, Silva SR, Denadai MV, de Toledo SR, et al. Genetic and immunohistochemical expression of integrins ITGAV, ITGA6, and ITGA3 as prognostic factor for colorectal cancer: models for global and disease-free survival. PLoS One 2015;10:e0144333.

33. Jia Y, Ying X, Zhou J, Chen Y, Luo X, Xie S, et al. The novel KLF4/PLAC8 signaling pathway regulates lung cancer growth. Cell Death Dis 2018;9:603.

34. Park JW, Lee JK, Sheu KM, Wang L, Balanis NG, Nguyen K, et al. Reprogramming normal human epithelial tissues to a common, lethal neuroendocrine cancer lineage. Science 2018;362:91–5.

35. Ciardiello F, Tortora G. EGFR antagonists in cancer treatment. N Engl J Med 2008;358:1160–74.

36. Jaiswal BS, Kljavin NM, Stawiski EW, Chan E, Parikh C, Durinck S, et al. Oncogenic ERBB3 mutations in human cancers. Cancer Cell 2013;23:603–17.

37. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 2007;315:848–53.

38. DeBoever C, Li H, Jakubosky D, Benaglio P, Reyna J, Olson KM, et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 2017;20:533–46.

39. Yuan TL, Cantley LC. PI3K pathway alterations in cancer: variations on a theme. Oncogene 2008;27:5497–510.

40. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012;337:1190–5.

41. GTEx Consortium. Human genomics. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648–60.

42. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 2010;464:768–72.

43. Shahzad MM, Shin YH, Matsuo K, Lu C, Nishimura M, Shen DY, et al. Biological significance of HORMA domain containing protein 1 (HORMAD1) in epithelial ovarian carcinoma. Cancer Lett 2013;330:123–9.

44. Watkins J, Weekes D, Shah V, Gazinska P, Joshi S, Sidhu B, et al. Genomic complexity profiling reveals that HORMAD1 overexpression contributes to homologous recombination deficiency in triple-negative breast cancers. Cancer Discov 2015;5:488–505.

45. McIlwain CC, Townsend DM, Tew KD. Glutathione S-transferase polymorphisms: cancer incidence and therapy. Oncogene 2006;25:1639–48.

46. Ford JG, Li Y, O'Sullivan MM, Demopoulos R, Garte S, Taioli E, et al. Glutathione S-transferase M1 polymorphism and lung cancer risk in African-Americans. Carcinogenesis 2000;21:1971–5.

47. Moyer AM, Salavaggione OE, Hebbring SJ, Moon I, Hildebrandt MA, Eckloff BW, et al. Glutathione S-transferase T1 and M1: gene sequence variation and functional genomics. Clin Cancer Res 2007;13:7207–16.

48. Boldry EJ, Patel YM, Kotapati S, Esades A, Park SL, Tiirikainen M, et al. Genetic determinants of 1,3-butadiene metabolism and detoxification in three populations of smokers with different risks of lung cancer. Cancer Epidemiol Biomarkers Prev 2017;26:1034–42.

49. Hsu PY, Hsu HK, Hsiao TH, Ye Z, Wang E, Profit AL, et al. Spatiotemporal control of estrogen-responsive transcription in ERα-positive breast cancer cells. Oncogene 2016;35:2379–89.

50. Lan Q, Hsiung CA, Matsuo K, Hong YC, Seow A, Wang Z, et al. Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nat Genet 2012;44:1330–5.

51. McKay JD, Hung RJ, Han Y, Zong X, Carreras-Torres R, Christiani DC, et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat Genet 2017;49:1126–32.

The Open Chromatin Landscape of Non–Small Cell Lung Carcinoma. Zhoufeng Wang, Kailing Tu, Lin Xia, et al. Cancer Res 2019;79:4840–4854. Published OnlineFirst June 17, 2019.

Updated version: doi: 10.1158/0008-5472.CAN-18-3663

Supplementary material: http://cancerres.aacrjournals.org/content/suppl/2020/06/18/0008-5472.CAN-18-3663.DC1
