Proteomics

name: title-slide
class: title-slide

# Proteomics
## A brief overview

### <span style="color:white;">Gabriel Mateus Bernardo Harrington</span>
### Research Associate<br>Sims Group
### 2022-06-22 (updated: 2022-06-22)

---

# Two high-level approaches

### Top-down proteomics

- Uses intact whole proteins
- Limited to investigating a small number of predefined proteins
- Good for looking at protein modifications (e.g. phosphorylations)

### Bottom-up proteomics

- Proteins are digested into peptides
- Can be used to examine protein subsets, but also to characterise the whole proteome

Basic steps:
 1. Enzymatically digest proteins into peptides
 1. Sort the peptides using high-performance liquid chromatography
 1. Feed this into a mass spectrometer which fragments the peptides further and measures the mass/charge of the resulting fragments

---

.center[
<img src="images/bottom-up-proteomics.png" width="80%" />
]

???

- We're using the [S-TRAP](https://protifi.com/pages/s-trap) kit from a company called Protifi to prep the samples
- It's a system for binding the proteins, allowing you to clean away any contaminants, denature the proteins and digest them into peptides
- Trypsin cuts on the C-terminal side of lysine and arginine
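
A minimal sketch of an *in-silico* tryptic digest, assuming the simplest possible rule: cleave after every lysine (K) or arginine (R). The function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative, and the commonly applied exception for a following proline is deliberately ignored here.

```python
import re


def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from cleaving after every K or R (simplified rule)."""
    # Zero-width split after each K or R; drop the empty trailing piece.
    pieces = [p for p in re.split(r"(?<=[KR])", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` neighbouring pieces to mimic
        # sites that the enzyme failed to cut.
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKRPAGKLLR"))
    print(tryptic_digest("MKRPAGKLLR", missed_cleavages=1))
```

Allowing missed cleavages enlarges the list of candidate peptides, which is one reason search spaces grow quickly.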

---

# Mass resolution matters

- Resolution in mass spectrometry relates to the width of a peak at a given height (typically 50% of peak height) .footnote[Image taken from: ([Eidhammer, Flikka, Martens, et al., 2007](https://doi.org/10.1002/9780470724309))]

.center[
<img src="images/monoisotopic_peak.png" width="90%" />
]
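
As a worked definition, assuming the usual full-width-at-half-maximum convention, where `$m$` is the peak position and `$\Delta m_{50\%}$` is the peak width at 50% of its height:

`$$R = \frac{m}{\Delta m_{50\%}}$$`

For example, a peak at m/z 1000 that is 0.01 wide at half height has `$R = 1000 / 0.01 = 100\,000$`.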

???

- Isotopes (carbon-13) let us work out the charge: if the next isotope peak is +1 m/z away, the ion is singly charged; for a doubly charged ion the next peak is +0.5 m/z away
- This can cause problems at larger masses, as it becomes less likely that there is a peak containing no carbon-13
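
The charge rule above can be sketched in a few lines: the spacing between neighbouring isotope peaks is roughly 1.00335 / z, so the charge is the rounded inverse of that spacing. The function name and the example peak lists are illustrative, not taken from real data.

```python
C13_C12_DELTA = 1.00335  # mass difference between 13C and 12C, in Da


def charge_from_isotope_spacing(mz_peaks):
    """Estimate charge from the mean spacing of consecutive isotope peaks."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    # Spacing is ~1.00335 / z, so z is the rounded inverse ratio.
    return round(C13_C12_DELTA / mean_spacing)


if __name__ == "__main__":
    print(charge_from_isotope_spacing([400.0, 401.0034]))            # spacing ~1.0
    print(charge_from_isotope_spacing([500.0, 500.5017, 501.0034]))  # spacing ~0.5
```

This only works when the instrument resolves the isotope peaks, which is why resolution matters for charge assignment.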

---

# Fragmentation ladders

- Bonds between amino acids are weaker than bonds within them, so the `$b_i$` and `$y_i$` fragments are of most interest

.center[
<img src="images/fragmentation_ladder.png" width="90%" />
]
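
A minimal sketch of how a theoretical ladder could be computed, assuming singly charged fragments, unmodified residues and standard monoisotopic masses. Under those assumptions `$b_i$` is the sum of the first `$i$` residue masses plus a proton, and `$y_i$` is the sum of the last `$i$` residue masses plus water plus a proton. The function name `fragment_ladder` is illustrative.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ladder(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal piece
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal piece
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ladder("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

The difference between neighbouring rungs of either ladder is one residue mass, which is what makes the sequence readable from the spectrum.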

---

# Interpretable ion ladder

- Ideally, this is the sort of ladder we'd get... .footnote[Image taken from Lennart Martens' presentation]

.center[
<img src="images/ideal_ion_ladder.png" width="65%" />
]

---

# Interpretable ion ladder

- Realistically, this is the sort of ladder we get... .footnote[Image taken from Lennart Martens' presentation]

.center[
<img src="images/realistic_ion_ladder.png" width="65%" />
]

???

- we get a range of potential sequences with different scores

---

# Peptide identification

- ***De Novo* search**: sequence a peptide by building a graph of observed amino acid distances in a spectrum, then find a path that explains the peptide mass and, ideally, most of the observed peaks
- **Spectral library matching**: compare against previously observed or predicted spectra
- **Database search**: Generate theoretical spectra and compare against observed ones .footnote[Image taken from: ([Duncan, Aebersold, and Caprioli, 2010](https://doi.org/10.1038/nbt0710-659))]

.center[
<img src="images/database_search.jpg" width="65%" />
]
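
A toy sketch of the database-search idea, assuming the simplest possible score: count how many theoretical fragment m/z values have an observed peak within a tolerance. Real search engines use far more elaborate scoring; the function names, the tolerance and the example values are all illustrative.

```python
def shared_peak_count(observed_mz, theoretical_mz, tol=0.02):
    """Count theoretical peaks with at least one observed peak within tol."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in theoretical_mz
    )


def rank_candidates(observed_mz, candidates, tol=0.02):
    """Score every candidate peptide and return (score, name), best first.

    `candidates` maps a peptide name to its list of theoretical m/z values.
    """
    scored = [
        (shared_peak_count(observed_mz, theo, tol), name)
        for name, theo in candidates.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    observed = [100.0, 200.0, 300.0]
    candidates = {
        "candidate_A": [100.01, 200.0, 350.0],
        "candidate_B": [100.0, 200.01, 299.99],
    }
    for score, name in rank_candidates(observed, candidates):
        print(name, score)
```

Every candidate gets a score, so the output is a ranked list of peptide-spectrum matches rather than a single answer.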

???

- you get multiple PSM scores for different matches returned, not just the best score
- intensities aren't typically computed in the *in-silico* spectra, but there are modern methods that attempt this
- there are also methods that try to predict retention time

---

# False discovery rates

- Problem: matches are uncertain, so we need a means to quantify and correct for incorrect matches
- Solution: create decoy sequences to search against, and use the decoy hits to estimate the proportion of false target matches

`$$FDR = \frac{\#decoys}{\#targets}$$`

- Reversing the sequences in the database is a common way to make the decoys
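
A minimal sketch of the target-decoy estimate, assuming each PSM is represented as a hypothetical `(score, is_decoy)` pair and that higher scores are better. It applies the ratio above to all PSMs that pass a score threshold.

```python
def reverse_decoy(sequence):
    """Make a decoy by reversing a target protein sequence."""
    return sequence[::-1]


def fdr_at_threshold(psms, threshold):
    """Estimate FDR as #decoys / #targets among PSMs scoring >= threshold.

    `psms` is a list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    # With no accepted targets there is nothing to report, so return 0.0.
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    psms = [(10, False), (9, False), (8, True),
            (7, False), (6, False), (5, True)]
    print(reverse_decoy("PEPTIDEK"))
    print(fdr_at_threshold(psms, threshold=6))
```

Lowering the threshold accepts more PSMs but usually lets in more decoys, so the threshold is typically chosen to hit a target FDR.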

FDR from posterior error probabilities (PEP):
- e.g. [PeptideProphet](http://proteinprophet.sourceforge.net/)
- fit a mixture model (true hit / false hit) to score distributions

Post-processing:
- e.g. [Percolator](http://percolator.ms/): machine learning to discriminate between correct and incorrect PSMs
- can improve results if the scores of individual spectra are not easily comparable (not well calibrated)

---

# Protein identification

.center[
<img src="images/protein_identification.png" width="65%" />
]

.pull-right[Image taken from: ([Martens and Hermjakob, 2007](https://doi.org/10.1039/B705178F))]

---

# Complexity is linked to database

.center[
<img src="images/protein_id_proportions.png" width="65%" />
]

.pull-right[Image taken from: ([Barsnes and Martens, 2013](https://doi.org/10.1007/s00726-012-1455-z))]

---

# Data acquisition techniques

- Parallel Reaction Monitoring (PRM): good for quantitation of pre-selected protein(s) (the successor to MRM)
- Data Dependent Acquisition (DDA): good for comprehensive detection of whole proteome
- **Data Independent Acquisition (DIA): hybrid that aims to do both**

.center[
<img src="images/data_acquisition_techniques.gif" width="65%" />
]

???

- Can have targeted or untargeted DDA/DIA
  - In targeted, use an inclusion list of peptide masses of interest

---

# Window width varies in DDA vs DIA

.center[
<img src="images/dia_vs_dda_windows.png" width="75%" />
]
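
A rough sketch of why the windows differ, under two stated assumptions: DDA isolates a narrow window around each of the top-N most intense precursors, while DIA steps through wide, fixed windows that tile the whole m/z range. The widths (2 and 25 m/z), the m/z range and the function names are hypothetical values chosen for illustration.

```python
def dda_windows(precursors, top_n=3, width=2.0):
    """Narrow windows centred on the top-N most intense precursors.

    `precursors` is a list of (mz, intensity) tuples from a survey scan.
    """
    top = sorted(precursors, key=lambda p: p[1], reverse=True)[:top_n]
    return sorted((mz - width / 2, mz + width / 2) for mz, _ in top)


def dia_windows(mz_start=400.0, mz_end=1000.0, width=25.0):
    """Wide, fixed windows tiling the range, independent of the data."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


if __name__ == "__main__":
    survey = [(500.0, 10), (600.0, 50), (700.0, 5), (800.0, 30)]
    print(dda_windows(survey, top_n=2))
    print(len(dia_windows()), "DIA windows of 25 m/z")
```

In this sketch the DDA windows depend on what was seen in the survey scan, whereas the DIA windows are the same for every cycle.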

---

# Nice example

.center[
<img src="images/dia_peptide_clean.png" width="55%" />
]

---

# Messy example

.center[
<img src="images/dia_peptide_messy.png" width="55%" />
]

---

.center[
<img src="images/spectral_library.png" width="65%" />
]

---

class: final-slide

# Thanks for listening

<br>
.left[
<div style="color:skyblue;">
<span style="color:white;"><b>Gabriel Mateus Bernardo Harrington</b></span><br>
<b>Research Associate</b><br>
<b></b>
</div>

]

These slides can be found from the QR code below, or at this address:<br>[https://h-mateus.github.io/presentations/ipmar_proteomics_2022-06-01/index.html#1](https://h-mateus.github.io/presentations/ipmar_proteomics_2022-06-01/index.html#1)

.pull-left[
<img src="images/slide_qr.png" width="50%" />
]

---

## References

<a name=bib-barsnes_crowdsourcing_2013></a>[Barsnes, H. and L.
Martens](#cite-barsnes_crowdsourcing_2013) (2013). "Crowdsourcing in
Proteomics: Public Resources Lead to Better Experiments". In: _Amino
Acids_ 44.4, pp. 1129-1137. ISSN: 1438-2199. DOI:
[10.1007/s00726-012-1455-z](https://doi.org/10.1007%2Fs00726-012-1455-z).

<a name=bib-duncan_pros_2010></a>[Duncan, M. W., R. Aebersold, and R.
M. Caprioli](#cite-duncan_pros_2010) (2010). "The Pros and Cons of
Peptide-Centric Proteomics". In: _Nature Biotechnology_ 28.7, pp.
659-664. ISSN: 1546-1696. DOI:
[10.1038/nbt0710-659](https://doi.org/10.1038%2Fnbt0710-659).

<a name=bib-eidhammer_computational_2007></a>[Eidhammer, I., K. Flikka,
L. Martens, et al.](#cite-eidhammer_computational_2007) (2007).
_Computational Methods for Mass Spectrometry Proteomics_. Chichester,
UK: John Wiley & Sons, Ltd. ISBN: 978-0-470-72430-9 978-0-470-51297-5.
DOI: [10.1002/9780470724309](https://doi.org/10.1002%2F9780470724309).

<a name=bib-martens_proteomics_2007></a>[Martens, L. and H.
Hermjakob](#cite-martens_proteomics_2007) (2007). "Proteomics Data
Validation: Why All Must Provide Data". In: _Molecular BioSystems_ 3.8,
pp. 518-522. ISSN: 1742-2051. DOI:
[10.1039/B705178F](https://doi.org/10.1039%2FB705178F).