MesureHD
Development of new protocols for measurement, analysis, and data processing adapted to high-resolution, high-throughput measurements by biophysical methods.
Marc-André DELSUC, Christian ROLANDO
Colloque Mastodons - Masse de Données Scientifiques
CNRS, 22-23 January 2015
The MesureHD consortium
Patrick Combettes, Laboratoire Jacques-Louis Lions, Paris (INSMI)
Emilie Chouzenoux, Jean-Christophe Pesquet, Laboratoire d'Informatique Gaspard Monge, Marne-la-Vallée (INS2I)
Pierre Collet, I-Cube, Strasbourg
Marc-André Delsuc, Bruno Kieffer, IGBMC, Strasbourg (INSB)
Julia Chamot-Rooke, Institut Pasteur, Paris (INSB)
Pascale Roy, Synchrotron Soleil
Christian Rolando, MSAP, Lille (INC)
MS/MS Sequencing
1: MS spectrum
2: precursor selection
3: fragmentation/sequencing
[Figure: three spectra, relative intensity versus m/z, one per step]
As many MS/MS spectra as ions present in the MS spectrum
Need to select precursors: data-dependent analysis
Exact mass and number of unique peptides in a genome
Theoretical prediction of the percentage of identified peptides as a function of mass (m/z) at different accuracies.
Yeast
C. elegans
Liu, T., Belov, M. E., Jaitly, N., Qian, W. J., & Smith, R. D. (2007). Accurate mass measurements in proteomics. Chemical Reviews, 107(8), 3621-3653.
Why 2D FT-ICR mass spectrometry
• MS: information on all the compounds at the same time, parallel acquisition.
• MS/MS: 1 compound at a time, serial acquisition. One fragmentation mode at a time, either fragments or precursors.
• Two-dimensional FT-ICR MS (2D FT-ICR): all compounds at once with both correlations (fragments and precursors), independently of the complexity, but requires recording a full set of data.
A truly data independent acquisition.
van Agthoven, M. A., Delsuc, M. A., Bodenhausen, G., & Rolando, C. (2013). Towards analytically useful two-dimensional Fourier transform ion cyclotron resonance mass spectrometry. Analytical and Bioanalytical Chemistry, 405(1), 51-61.
Principle of 2D NMR and 2D FT-ICR MS
2D NMR NOESY
Müller, L., Kumar, A., & Ernst, R. R. (1975). Two-dimensional carbon-13 NMR spectroscopy. The Journal of Chemical Physics, 63(12), 5490-5491.
P. Pfändler, G. Bodenhausen, J. Rapin, M.-E. Walser, T. Gäumann, J. Am. Chem. Soc. 110 (1988) 5625-5628.
2D NMR versus 2D FT-ICR MS
[Figure: number of papers per year, one panel for 2D NMR and one for 2D FT-ICR MS]
Problems to be solved for analytically useful 2D FT-ICR
• Preserve FT-ICR resolution during in-cell FT-MS: early experiments were performed using Collision Induced Fragmentation (CID) with a gas, which induces resolution loss because resolution is inversely proportional to pressure.
• Data handling: the size of a 1D FT-ICR spectrum at full resolution is typically 1 megapoint (4 megabytes). The theoretical size of a 2D FT-ICR spectrum is 16 petabytes… In comparison, 2D NMR is performed with 2048 × 2048 points (32 megabytes).
• Scintillation noise removal: for each t1 (delay) a new bunch of ions must be introduced, because MS/MS is a destructive process, unlike NMR, which reuses the same sample after spin relaxation.
van Agthoven, M. A., Delsuc, M. A., Bodenhausen, G., & Rolando, C. (2013). Towards analytically useful two-dimensional Fourier transform ion cyclotron resonance mass spectrometry. Analytical and Bioanalytical Chemistry, 405(1), 51-61.
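The data-handling point can be made concrete with a rough storage estimate. The sketch below is only an order-of-magnitude illustration: the word lengths (4 bytes per point for FT-ICR, 8 bytes per point for 2D NMR) and the square 1 Mi × 1 Mi grid are assumptions, not values from the slides, and the result for the full 2D grid depends strongly on them, so it need not match the figure quoted above.

```python
# Rough storage estimates. Bytes-per-point values and grid sizes are
# assumptions made for illustration, not figures taken from the slides.
MEBI = 2**20

def storage_bytes(n_points: int, bytes_per_point: int) -> int:
    """Raw storage needed for n_points samples."""
    return n_points * bytes_per_point

size_1d = storage_bytes(MEBI, 4)              # 1 Mi points, 4-byte words
size_nmr_2d = storage_bytes(2048 * 2048, 8)   # 2048 x 2048 grid, 8 bytes/point
size_full_2d = storage_bytes(MEBI * MEBI, 4)  # hypothetical 1 Mi x 1 Mi grid

print(f"1D FT-ICR: {size_1d / MEBI:.0f} MiB")
print(f"2D NMR:    {size_nmr_2d / MEBI:.0f} MiB")
print(f"2D FT-ICR: {size_full_2d / 2**40:.0f} TiB at 4 bytes per point")
```

Whatever the exact convention, the full-resolution 2D grid is many orders of magnitude larger than a 2D NMR data set, which is the point of the slide.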
Creating focus from noise
Van Putten, E. G., Akbulut, D., Bertolotti, J., Vos, W. L., Lagendijk, A., & Mosk, A. P. (2011). Scattering lens resolves sub-100 nm structures with visible light. Physical Review Letters, 106(19), 193905.
Basis of signal treatment
Signal time series: P frequencies
Uniform sampling, L points
Hankel matrix H: M × N, with L = M + N - 1 and M < N
● Hankel matrix: same terms on antidiagonals
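As an illustration of the construction above, here is a minimal NumPy sketch that builds the M × N Hankel matrix from a uniformly sampled signal. The function name `hankel_from_signal` and the test signal are hypothetical, chosen for this example only.

```python
import numpy as np

def hankel_from_signal(x, M):
    """Build the M x N Hankel matrix H[i, j] = x[i + j], with N = L - M + 1."""
    x = np.asarray(x)
    L = x.shape[0]
    N = L - M + 1
    if not (1 <= M <= N):
        raise ValueError("need 1 <= M <= N = L - M + 1")
    idx = np.arange(M)[:, None] + np.arange(N)[None, :]
    return x[idx]

# Noise-free test signal with P = 2 frequencies (arbitrary values).
t = np.arange(16)
x = np.exp(2j * np.pi * 0.10 * t) + 0.5 * np.exp(2j * np.pi * 0.27 * t)

H = hankel_from_signal(x, M=6)                 # shape (6, 11)
rank = np.linalg.matrix_rank(H, tol=1e-8)      # equals P for a noise-free signal
print(H.shape, rank)
```

Every antidiagonal of `H` holds a single sample of `x`, and for a noise-free sum of P complex exponentials the matrix has rank P, which is what the rank-reduction methods that follow exploit.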
Cadzow procedure
The idea is to decompose H using Singular Value Decomposition (SVD):
‣ compute the singular values
‣ keep only the k largest singular values
‣ reconstruct a denoised signal from the rank-reduced H matrix
‣ projection of H on a subspace
‣ then averaging on H antidiagonals
Cadzow, J. A. (1988). IEEE Trans. Acoust. Speech Signal Process., 36, 49-62.
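The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the test signal, the noise level, and the choice of three iterations are all hypothetical.

```python
import numpy as np

def hankel(x, M):
    """M x N Hankel matrix with H[i, j] = x[i + j]."""
    N = len(x) - M + 1
    return x[np.arange(M)[:, None] + np.arange(N)[None, :]]

def average_antidiagonals(H):
    """Map a matrix back to a signal by averaging each antidiagonal."""
    M, N = H.shape
    idx = (np.arange(M)[:, None] + np.arange(N)[None, :]).ravel()
    sums = np.zeros(M + N - 1, dtype=H.dtype)
    np.add.at(sums, idx, H.ravel())
    counts = np.bincount(idx, minlength=M + N - 1)
    return sums / counts

def cadzow(x, M, k, n_iter=1):
    """Rank-k truncation of the Hankel matrix, then antidiagonal averaging."""
    y = np.asarray(x, dtype=complex)
    for _ in range(n_iter):
        U, s, Vh = np.linalg.svd(hankel(y, M), full_matrices=False)
        Hk = (U[:, :k] * s[:k]) @ Vh[:k]       # keep the k largest singular values
        y = average_antidiagonals(Hk)
    return y

# Two-frequency test signal plus complex Gaussian noise (arbitrary values).
rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.exp(2j * np.pi * 0.11 * t) + 0.5 * np.exp(2j * np.pi * 0.23 * t)
noisy = clean + 0.5 * (rng.standard_normal(256) + 1j * rng.standard_normal(256))

denoised = cadzow(noisy, M=64, k=2, n_iter=3)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)
```

The SVD is the expensive step here, which is the motivation for replacing it with a cheaper decomposition on large data sets.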
urQRd algorithm
Build H: M × N
Build a random matrix (K is ~ the number of signals; K << M < N)
Sample H with it: Y, smaller than H
Find the main axes of Y: QR decomposition, MUCH faster than SVD
Make a rank reduction of H using Q
Reconstruction, as with Cadzow
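A minimal sketch of these steps, assuming a real Gaussian random matrix of size N × K and an explicitly stored H. It is meant only to show the sequence of operations; the function names and test values are hypothetical, and it makes no attempt at the speed or memory savings of a production implementation.

```python
import numpy as np

def hankel(x, M):
    """M x N Hankel matrix with H[i, j] = x[i + j]."""
    N = len(x) - M + 1
    return x[np.arange(M)[:, None] + np.arange(N)[None, :]]

def average_antidiagonals(H):
    """Map a matrix back to a signal by averaging each antidiagonal."""
    M, N = H.shape
    idx = (np.arange(M)[:, None] + np.arange(N)[None, :]).ravel()
    sums = np.zeros(M + N - 1, dtype=H.dtype)
    np.add.at(sums, idx, H.ravel())
    return sums / np.bincount(idx, minlength=M + N - 1)

def urqrd_sketch(x, M, K, rng):
    """Random sampling of H, QR of the sample, projection, reconstruction."""
    H = hankel(np.asarray(x, dtype=complex), M)    # M x N
    omega = rng.standard_normal((H.shape[1], K))   # N x K random matrix (assumed Gaussian)
    Y = H @ omega                                  # M x K, much smaller than H
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis of the main axes of Y
    Hk = Q @ (Q.conj().T @ H)                      # rank-K reduction of H using Q
    return average_antidiagonals(Hk)               # reconstruction, as with Cadzow

# Two-frequency test signal plus complex Gaussian noise (arbitrary values).
rng = np.random.default_rng(1)
t = np.arange(1000)
clean = np.exp(2j * np.pi * 0.11 * t) + 0.5 * np.exp(2j * np.pi * 0.23 * t)
noisy = clean + 0.2 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))

denoised = urqrd_sketch(noisy, M=400, K=20, rng=rng)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)
```

In this toy example K is set well above the number of signals, because a random sample captures the signal subspace less precisely than a full SVD does.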
urQRd: an efficient denoising of Fourier Transform (ICR, Orbitrap) mass spectrometry data
Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C., & Delsuc, M. A. (2014). Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proceedings of the National Academy of Sciences, 111(4), 1385-1390.
2D FT-ICR: substance P, ESI, ECD, classical 2D FT spectrum
FT-ICR: 7 Tesla, Solarix, harmonized cell
Ionisation: ESI
Fragmentation: ECD
Analyte: Substance P, 1 pmol/µL
Acquisition: F1, 2 k; F2, 128 k; 45 minutes; 4 Gbyte file
Data treatment (open source): Spike
[Figure: 2D spectrum; annotations: scintillation noise, fragment line]
2D FT-ICR: substance P, ESI, ECD, urQRd denoised
Processing time proportional to file size.
Not limited by computer memory.
Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C., & Delsuc, M. A. (2014). Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proceedings of the National Academy of Sciences, 111, 1385-1390.
How to increase resolution in the first dimension
Make better use of the first data points: filter diagonalization method (FDM) algorithm
Hu, H., Van, Q. N., Mandelshtam, V. A., & Shaka, A. J. (1998). Reference deconvolution, phase correction, and line listing of NMR spectra by the 1D filter diagonalization method. Journal of Magnetic Resonance, 134(1), 76-87.
Aizikov, K., & O'Connor, P. B. (2006). Use of the filter diagonalization method in the study of space charge related frequency modulation in Fourier transform ion cyclotron resonance mass spectrometry. Journal of the American Society for Mass Spectrometry, 17(6), 836-843.
Kozhinov, A. N., & Tsybin, Y. O. (2012). Filter diagonalization method-based mass spectrometry for molecular and macromolecular structure analysis. Analytical Chemistry, 84(6), 2850-2856.
How to increase resolution in the first dimension
NMR solution: skip points! Non-uniform sampling (NUS) and maximum entropy algorithm (MaxEnt) for reconstruction
[Figure panels: FFT, 1024 points; FFT, first 512 pts; FFT, first 128 pts; MaxEnt, first 128 pts; MaxEnt, random 128 out of 512 pts]
Barna, J. C. J., Laue, E. D., Mayger, M. R., Skilling, J., & Worrall, S. J. P. (1987). Exponential sampling, an alternative method for sampling in two-dimensional NMR experiments. Journal of Magnetic Resonance (1969), 73(1), 69-77.
Delsuc, M. A. (1989). A new maximum entropy processing algorithm, with applications to nuclear magnetic resonance experiments. In Maximum Entropy and Bayesian Methods (pp. 285-290).
Hyberts, S. G., Arthanari, H., Robson, S. A., & Wagner, G. (2014). Perspectives in magnetic resonance: NMR in the post-FFT era. Journal of Magnetic Resonance, 241, 60-73.
RECITAL (derived from FISTA) on 1D
Myoglobin MS, res 15000; FFT: no isotopic resolution
FISTA 20 noise10
FISTA 20 noise1
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems (FISTA). SIAM Journal on Imaging Sciences, 2(1), 183-202.
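To make the reference concrete, here is a minimal sketch of plain FISTA applied to a toy non-uniform sampling problem: recovering a sparse spectrum from 128 randomly chosen time-domain points out of 512. It illustrates the generic algorithm of Beck and Teboulle, not RECITAL itself; the operators, the regularization weight `lam`, and the test spectrum are assumptions made for the example.

```python
import numpy as np

def forward(spectrum, schedule):
    """Spectrum -> time-domain signal, observed only at the sampled points."""
    return np.fft.ifft(spectrum, norm="ortho")[schedule]

def adjoint(samples, schedule, n):
    """Adjoint of `forward`: zero-fill the missing points, then FFT."""
    full = np.zeros(n, dtype=complex)
    full[schedule] = samples
    return np.fft.fft(full, norm="ortho")

def soft_threshold(v, lam):
    """Complex soft-thresholding, the proximal operator of lam * ||.||_1."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - lam / np.maximum(mag, 1e-30), 0.0)

def fista(samples, schedule, n, lam, n_iter=300):
    """Minimize 0.5 * ||forward(x) - samples||^2 + lam * ||x||_1."""
    x = np.zeros(n, dtype=complex)
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = adjoint(forward(z, schedule) - samples, schedule, n)
        x_new = soft_threshold(z - grad, lam)   # step size 1: operator norm is at most 1
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x

# Sparse test spectrum with three lines (arbitrary positions and amplitudes).
n = 512
truth = np.zeros(n, dtype=complex)
truth[[40, 150, 333]] = [1.0, 0.7, 0.5]

rng = np.random.default_rng(0)
schedule = np.sort(rng.choice(n, size=128, replace=False))   # random 128 out of 512
samples = forward(truth, schedule)

x_hat = fista(samples, schedule, n, lam=0.01)
recovered = {int(i) for i in np.argsort(np.abs(x_hat))[-3:]}
print(sorted(recovered))
```

The orthonormal FFT convention keeps the operator norm at or below 1, so a unit step size is admissible without a line search.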
RECITAL (derived from FISTA) on 1D MS
Cytochrome C, m/z 773(+15), res 7500, MS/MS, ETD 50 ms, m/z 200-2000
FFT, MS-Align on YADA-deconvoluted data: completely wrong identification at 10, 15 and 20 ppm error
FISTA 10, MS-Align on YADA-deconvoluted data: 8 ETD fragments at 20 ppm. Identification of the protein with a high probability
NUS & RECITAL: zoom on parent ion (doubly charged)
[Figure panels: Non NUS 2 k; NUS 4 k div 2; NUS 8 k div 4; NUS 16 k div 8; NUS 32 k div 16]
Precursor resolution increases with the undersampling ratio, as expected.
NUS & RECITAL: parent precursor profile (monoisotopic peak)
[Figure panels: Non NUS 2 k; NUS 4 k div 2; NUS 8 k div 4; NUS 16 k div 8; NUS 32 k div 16]
Precursor FWHM decreases in proportion to the undersampling ratio.
2D FT-ICR new processing: urQRd for NUS
Substance P, NUS 16 k div 8
Fragment ion spectrum
[Figure: annotated fragment ion spectrum. Peak labels: MH2(2+)-H2O, MH+ / MH2(.+), MH2-NH3(.+) / MH2-NH3(+), MH2-2NH3(.+), MH2-CH3NO(+), C4• / C4, C5• / C5, C6, C7, C8, C9, C10, Z9• / Z9, b10-NH3(2+), a2+, b2]
2D FT-ICR new processing: urQRd for NUS
Substance P, NUS 16 k div 8
[Figure panels: parent ion spectrum, zoom; fragment ion spectrum, zoom]
ANR Defi de tous les savoirs 2014, One Shot 2D FT-ICR
Block-coordinate algorithms (-> 2015)
P. L. Combettes and J.-C. Pesquet, Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping, SIAM Journal on Optimization, under revision.
This new generation of algorithms allows an additional level of decomposition of the variables: at each iteration, only some coordinates of the variables are processed, instead of all of them as in classical methods. This makes it possible to handle very large problems efficiently.
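The block-coordinate idea can be illustrated on a simple least-squares problem. The sketch below is a generic textbook randomized block-coordinate gradient descent, not the stochastic quasi-Fejér algorithm of Combettes and Pesquet; the function name, problem sizes, and iteration count are assumptions chosen for the example.

```python
import numpy as np

def random_block_cd(A, b, n_blocks, n_iter, rng):
    """Minimize 0.5 * ||A x - b||^2, updating one random block per iteration."""
    n = A.shape[1]
    blocks = np.array_split(np.arange(n), n_blocks)
    # Per-block step size: inverse squared spectral norm of the block's columns.
    steps = [1.0 / np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    x = np.zeros(n)
    residual = A @ x - b
    for _ in range(n_iter):
        i = int(rng.integers(n_blocks))         # pick one block at random
        blk = blocks[i]
        delta = -steps[i] * (A[:, blk].T @ residual)
        x[blk] += delta                         # only these coordinates change
        residual += A[:, blk] @ delta           # cheap incremental residual update
    return x

# Small synthetic problem (arbitrary sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 40))
x_true = rng.standard_normal(40)
b = A @ x_true + 0.01 * rng.standard_normal(200)

x_bcd = random_block_cd(A, b, n_blocks=8, n_iter=3000, rng=rng)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_bcd - x_ls)))
```

Each iteration touches only the columns of one block, so its cost and memory traffic scale with the block size rather than with the full number of variables.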
Proximal methods: tools for solving inverse problems on a large scale (-> 2015)
Combettes, P. L., & Pesquet, J. C. (2011). Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering (pp. 185-212). Springer.
Use of GPU to accelerate urQRd (-> 2015)
NVIDIA Titan Black card
336 GB/s memory bandwidth
5 TFlops single precision
1.7 TFlops double precision
250 W - 1000 €
2880 cores at 889 MHz, 6 GB
IR and THz synchrotron radiation for high resolution spectroscopy
[Figure: spectra of propane and methyl formate on a THz axis from 2 to 12; regions labelled pure rotations, torsions / rotations, vibrations / rotations]
Huge N (> 10^9) & big k (proportional to N)
Optimization and dynamical processes in learning and inverse problems (8-12 September 2014)
The aim of this conference is to stimulate discussion and foster new collaborations between Italian and French researchers on the following topics: algorithms for convex optimization and monotone inclusions, fixed point methods, game theory, interactions between discrete and continuous dynamics, statistical learning theory, processing of massive data, inverse problems.
Organizer: Patrick Combettes
https://www.ljll.math.upmc.fr/~plc/sestri/
Complexity in Chemistry & Biology
http://chemcomplex2015.sciencesconf.org/
Organizers: Marc-André Delsuc & Bruno Kieffer
Opening lecture: Jean-Marie Lehn, Nobel Prize in Chemistry
110 participants from the different communities (data treatment, bioinformatics, chemistry, biology)
Creation of a start-up CASC4DE
Company founders: Marc-André Delsuc, Bruno Kieffer, Julia Chamot-Rooke, Christian Rolando and private partners
Mesure HD: joint papers 2014
1 - Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C. & Delsuc, M.-A. Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proc Natl Acad Sci USA 111, 1385–1390 (2014).
2 - P. L. Combettes and J.-C. Pesquet, "Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping," submitted to SIAM Journal on Optimization, under revision.
3 - F., Wagner, D., & Collet, P. (2014, January). Massively Parallel Generational GA on GPGPU Applied to Power Load Profiles Determination. In Artificial Evolution (pp. 227-239). Springer International Publishing. http://www.springer.com/computer/ai/book/978-3-642-37958-1.
Mesure HD: organized or co-organized congress 2014
1 - Optimization and dynamical processes in learning and inverse problems, 8-12 September 2014, Sestri Levante, Italy (https://www.ljll.math.upmc.fr/~plc/sestri/)
Organized by P. L. Combettes
Presentations by P. L. Combettes, Q. Van Ngyen (PhD student of Combettes), Jean-Christophe Pesquet and Emilie Chouzenoux.
2 - Chemical Complexity & Biology, 19-20 January 2015, Strasbourg (http://chemcomplex2015.sciencesconf.org/)
Organized by M.-A. Delsuc and B. Kieffer, with the support of the Mastodons action
Presentations by Christian Rolando, Emilie Chouzenoux
3 - Complex Systems Digital Campus 2015 (CS-DC 2015), First World E-Conference, 30 September - 1 October 2015.
Co-organizer: Pierre Collet
Actors of the MesureHD consortium
[Diagram: collaboration links between P. Combettes, J.-C. Pesquet, E. Chouzenoux, P. Collet, M.-A. Delsuc, P. Roy, J. Chamot-Rooke and C. Rolando; legend: pre-existing, 2014, 2015]