name: title-slide
class: title-slide

# Proteomics

## A brief overview

### <span style="color:white;">Gabriel Mateus Bernardo Harrington</span>

### Research Associate<br>Sims Group

### 2022-06-22 (updated: 2022-06-22)

<!--Use the following to add further logos to the title/final slide-->
<!--Adjust sizing in the CSS file-->
<div class="title-logo-1"></div>
<div class="title-logo-2"></div>

---

# Two high-level approaches

### Top-down proteomics

- Uses intact whole proteins
- Limited to investigating a small number of predefined proteins
- Good for looking at protein modifications (e.g. phosphorylation)

### Bottom-up proteomics

- Proteins are digested into peptides
- Can be used to examine protein subsets, but also to characterise the whole proteome

Basic steps:

1. Enzymatically digest proteins into peptides
1. Separate the peptides using high-performance liquid chromatography
1. Feed these into a mass spectrometer, which fragments the peptides further and measures the mass-to-charge ratio (m/z) of the resulting fragments

---

.center[
<img src="images/bottom-up-proteomics.png" width="80%" />
]

???

- We're using the [S-TRAP](https://protifi.com/pages/s-trap) kit from a company called Protifi to prepare the samples
- It's a system that binds the proteins, allowing you to wash away contaminants, denature the proteins and digest them into peptides
- Trypsin cleaves after lysine and arginine

---

# Mass resolution matters

- Resolution in mass spectrometry is usually quoted as m/Δm, where Δm is the width of a peak at a given height (typically 50% of peak height, i.e. the full width at half maximum)

.footnote[Image taken from: ([Eidhammer, Flikka, Martens, et al., 2007](https://doi.org/10.1002/9780470724309))]

.center[
<img src="images/monoisotopic_peak.png" width="90%" />
]

???
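Below is a minimal Python sketch of the ideas on this slide and in these notes. It rests on three assumptions. Resolution is taken as m/Δm, with Δm the full width at half maximum. Neighbouring isotope peaks are spaced by the ¹³C–¹²C mass difference (about 1.00336 Da) divided by the charge. The ¹³C natural abundance is about 1.07%, and isotopes of other elements are ignored. The function names and example values are hypothetical.

```python
# 13C-12C mass difference (Da): neighbouring isotope peaks of an ion with
# charge z are spaced about C13_SPACING / z apart on the m/z axis.
C13_SPACING = 1.00336
# Approximate natural abundance of carbon-13 (assumed value, ~1.07%).
C13_ABUNDANCE = 0.0107


def resolution(mz: float, fwhm: float) -> float:
    """Resolution as m/dm, taking dm as the peak width at 50% height (FWHM)."""
    return mz / fwhm


def charge_from_spacing(spacing_mz: float) -> int:
    """Infer the charge state from the m/z gap between neighbouring isotope peaks."""
    return round(C13_SPACING / spacing_mz)


def monoisotopic_fraction(n_carbons: int) -> float:
    """Fraction of molecules containing no 13C atom, ignoring other elements."""
    return (1 - C13_ABUNDANCE) ** n_carbons


print(resolution(500.0, 0.01))      # peak at m/z 500 that is 0.01 wide: ~50,000
print(charge_from_spacing(1.0034))  # 1 -> singly charged
print(charge_from_spacing(0.5017))  # 2 -> doubly charged
print(monoisotopic_fraction(50))    # ~50 carbons: most molecules still have no 13C
print(monoisotopic_fraction(500))   # ~500 carbons: the monoisotopic peak is tiny
```

Rounding the ratio to the nearest integer tolerates small errors in the measured spacing. At higher charge states the gaps shrink, which is one reason resolution matters.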
- Use isotopes to figure out the charge
- Carbon-13: if the next isotope peak is +1 m/z away, the ion is singly charged (+1); if it is +0.5 m/z away, the ion is doubly charged (+2)
- This causes problems at larger masses, as it becomes less likely that a molecule contains no carbon-13, so the monoisotopic peak gets weaker

---

# Fragmentation ladders

- Peptides tend to break at the bonds between amino acids rather than at bonds within them, so the `\(b_i\)` (N-terminal) and `\(y_i\)` (C-terminal) fragments are of most interest

.center[
<img src="images/fragmentation_ladder.png" width="90%" />
]

---

# Interpretable ion ladder

- Ideally, this is the sort of ladder we'd get...

.footnote[Image taken from a presentation by Lennart Martens]

.center[
<img src="images/ideal_ion_ladder.png" width="65%" />
]

---

# Interpretable ion ladder

- Realistically, this is the sort of ladder we get...

.footnote[Image taken from a presentation by Lennart Martens]

.center[
<img src="images/realistic_ion_ladder.png" width="65%" />
]

???

- We get a range of potential sequences with different scores

---

# Peptide identification

- ***De novo* search**: sequence a peptide by building a graph from the mass differences between observed peaks that correspond to amino acid masses, then find a path through it that explains the peptide mass and, e.g., most of the observed peaks
- **Spectral library matching**: compare against previously observed or predicted spectra
- **Database search**: generate theoretical spectra and compare them against the observed ones

.footnote[Image taken from: ([Duncan, Aebersold, and Caprioli, 2010](https://doi.org/10.1038/nbt0710-659))]

.center[
<img src="images/database_search.jpg" width="65%" />
]

???
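To make the database-search idea concrete, here is a toy Python sketch. It generates the singly charged b and y ion ladders for a peptide from standard monoisotopic residue masses, then counts how many of those ions land near observed peaks. The shared-peak-count score, the 0.02 m/z tolerance and the example peaks are illustrative assumptions. Real search engines use more elaborate scoring functions.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of the 20 standard amino acids;
# cysteine is unmodified here (no alkylation assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def b_ions(peptide: str) -> list[float]:
    """Singly charged b ions (N-terminal fragments) b_1 .. b_(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    return [sum(masses[:i]) + PROTON for i in range(1, len(masses))]


def y_ions(peptide: str) -> list[float]:
    """Singly charged y ions (C-terminal fragments) y_1 .. y_(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    return [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]


def shared_peak_count(peptide: str, observed_mz: list[float], tol: float = 0.02) -> int:
    """Toy score: how many theoretical b/y ions have an observed peak within tol."""
    theoretical = b_ions(peptide) + y_ions(peptide)
    return sum(any(abs(t - o) <= tol for o in observed_mz) for t in theoretical)


# Hypothetical observed fragment peaks (m/z)
observed = [98.06, 227.10, 148.06, 263.09, 500.00]
print(shared_peak_count("PEPTIDE", observed))  # 4 (b1, b2, y1, y2)
print(shared_peak_count("EDITPEP", observed))  # 0 for the reversed sequence
```

The reversed sequence has the same peptide mass but a different fragment ladder, so it scores poorly against these peaks.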
- Multiple peptide-spectrum match (PSM) scores are returned for the different candidate matches, not just the best score
- Intensities aren't typically computed for the *in silico* spectra, but some modern methods attempt to predict them
- There are also methods that try to predict retention time

---

# False discovery rates

- Problem: there's uncertainty in the matches, so we need a means to quantify and correct for incorrect matches
- Solution: create decoy sequences to match against, and use the decoy hits above a given score threshold to estimate the proportion of false target hits

`$$\text{FDR} = \frac{\#\,\text{decoys}}{\#\,\text{targets}}$$`

- Reversing the sequences in the database is a common way to make the decoys

FDR from posterior error probabilities (PEP):

- e.g. [PeptideProphet](http://proteinprophet.sourceforge.net/)
- Fits a mixture model (true hit / false hit) to the score distributions

Post-processing:

- e.g. [Percolator](http://percolator.ms/): uses machine learning to discriminate between correct and incorrect PSMs
- Can improve results if the scores of individual spectra are not easily comparable (i.e. not well calibrated)

---

# Protein identification

.center[
<img src="images/protein_identification.png" width="65%" />
]

.pull-right[Image taken from: ([Martens and Hermjakob, 2007](https://doi.org/10.1039/B705178F))]

---

# Complexity is linked to database

.center[
<img src="images/protein_id_proportions.png" width="65%" />
]

.pull-right[Image taken from: ([Barsnes and Martens, 2013](https://doi.org/10.1007/s00726-012-1455-z))]

---

# Data acquisition techniques

- Parallel Reaction Monitoring (PRM): good for quantitation of pre-selected protein/s (successor to multiple reaction monitoring, MRM)
- Data Dependent Acquisition (DDA): good for comprehensive detection across the whole proteome
- **Data Independent Acquisition (DIA): a hybrid that aims to do both**

.center[
<img src="images/data_acquisition_techniques.gif" width="65%" />
]

???
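Here is a toy Python sketch contrasting how DDA and DIA choose what to fragment. DDA picks the N most intense precursors from a survey (MS1) scan, optionally restricted to an inclusion list as in targeted mode. DIA steps fixed isolation windows across the m/z range. The window width, m/z range, N, tolerance and example precursors are hypothetical values. Real instruments add features that are not modelled here, such as dynamic exclusion and variable window widths.

```python
from __future__ import annotations


def dda_top_n(precursors: list[tuple[float, float]], n: int = 3,
              inclusion: list[float] | None = None, tol: float = 0.01) -> list[float]:
    """DDA: choose the n most intense precursors (m/z, intensity) from an MS1 scan.

    If an inclusion list is given (targeted mode), only precursors within
    tol m/z of a listed mass are eligible.
    """
    eligible = [
        (mz, intensity) for mz, intensity in precursors
        if inclusion is None or any(abs(mz - target) <= tol for target in inclusion)
    ]
    ranked = sorted(eligible, key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


def dia_windows(start: float = 400.0, stop: float = 1000.0,
                width: float = 25.0) -> list[tuple[float, float]]:
    """DIA: fixed isolation windows stepped across the m/z range; all precursors
    inside a window are fragmented together, regardless of intensity."""
    windows = []
    low = start
    while low < stop:
        windows.append((low, min(low + width, stop)))
        low += width
    return windows


# Hypothetical MS1 survey scan: (m/z, intensity)
ms1 = [(445.12, 2e6), (512.30, 5e5), (623.85, 8e6), (701.40, 1e5), (889.02, 3e6)]
print(dda_top_n(ms1))                      # [623.85, 889.02, 445.12]
print(dda_top_n(ms1, inclusion=[512.30]))  # [512.3]
print(len(dia_windows()))                  # 24 windows of 25 m/z
```

The weak precursor at m/z 701.40 is never picked by `dda_top_n`. It does fall inside one of the DIA windows, so DIA would fragment it.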
- Can have targeted or untargeted DDA/DIA
- In targeted mode, an inclusion list of the peptide masses of interest is used

---

# Window width varies in DDA vs DIA

.center[
<img src="images/dia_vs_dda_windows.png" width="75%" />
]

---

# Nice example

.center[
<img src="images/dia_peptide_clean.png" width="55%" />
]

---

# Messy example

.center[
<img src="images/dia_peptide_messy.png" width="55%" />
]

---

.center[
<img src="images/spectral_library.png" width="65%" />
]

---

class: final-slide

# Thanks for listening

<br>

.left[
<div style="color:skyblue;">
<span style="color:white;"><b>Gabriel Mateus Bernardo Harrington</b></span><br>
<b>Research Associate</b><br>
</div>
]

These slides can be found via the QR code below, or at this address:<br>[https://h-mateus.github.io/presentations/ipmar_proteomics_2022-06-01/index.html#1](https://h-mateus.github.io/presentations/ipmar_proteomics_2022-06-01/index.html#1)

.pull-left[
<img src="images/slide_qr.png" width="50%" />
]

---

## References

<a name=bib-barsnes_crowdsourcing_2013></a>[Barsnes, H. and L. Martens](#cite-barsnes_crowdsourcing_2013) (2013). "Crowdsourcing in Proteomics: Public Resources Lead to Better Experiments". In: _Amino Acids_ 44.4, pp. 1129-1137. ISSN: 1438-2199. DOI: [10.1007/s00726-012-1455-z](https://doi.org/10.1007%2Fs00726-012-1455-z).

<a name=bib-duncan_pros_2010></a>[Duncan, M. W., R. Aebersold, and R. M. Caprioli](#cite-duncan_pros_2010) (2010). "The Pros and Cons of Peptide-Centric Proteomics". In: _Nature Biotechnology_ 28.7, pp. 659-664. ISSN: 1546-1696. DOI: [10.1038/nbt0710-659](https://doi.org/10.1038%2Fnbt0710-659).

<a name=bib-eidhammer_computational_2007></a>[Eidhammer, I., K. Flikka, L. Martens, et al.](#cite-eidhammer_computational_2007) (2007). _Computational Methods for Mass Spectrometry Proteomics_. Chichester, UK: John Wiley & Sons, Ltd. ISBN: 978-0-470-72430-9, 978-0-470-51297-5. DOI: [10.1002/9780470724309](https://doi.org/10.1002%2F9780470724309).

<a name=bib-martens_proteomics_2007></a>[Martens, L. and H. Hermjakob](#cite-martens_proteomics_2007) (2007). "Proteomics Data Validation: Why All Must Provide Data". In: _Molecular BioSystems_ 3.8, pp. 518-522. ISSN: 1742-2051. DOI: [10.1039/B705178F](https://doi.org/10.1039%2FB705178F).
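---

# Target-decoy FDR: a sketch

This is a minimal Python sketch of the estimate on the "False discovery rates" slide. It assumes one score per PSM, where higher is better, plus a flag marking decoy hits. It computes #decoys / #targets at or above each score threshold. It then keeps the lowest threshold whose estimate stays within the chosen FDR. The scores are made up for illustration, and the function names are hypothetical.

```python
from __future__ import annotations


def fdr_at_threshold(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as #decoys / #targets among PSMs scoring at or above threshold."""
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    return decoys / targets if targets else float("inf")


def score_cutoff(psms: list[tuple[float, bool]], max_fdr: float = 0.01) -> float | None:
    """Lowest score threshold whose estimated FDR is at most max_fdr (None if none is)."""
    cutoff = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            cutoff = threshold  # remember the lowest passing threshold seen so far
    return cutoff


# Hypothetical PSMs: (score, is_decoy), higher scores are better
psms = [(95, False), (90, False), (88, False), (85, True),
        (80, False), (75, False), (70, True), (65, True)]
print(fdr_at_threshold(psms, 80))  # 0.25 (1 decoy / 4 targets)
print(score_cutoff(psms, 0.01))    # 88
print(score_cutoff(psms, 0.25))    # 75
```

The sketch accepts everything at or above the lowest passing threshold, rather than stopping at the first threshold that fails. This matches the usual way q-values are thresholded.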