1. First, install Preview and register to activate it.
2. Provide (1) the spectrum file in mzML, mzXML, Thermo RAW, or .mgf format, and (2) the database in FASTA format. Then simply press the Run button (3). When the run finishes, a browser window opens to display the results. Note that you will be asked to select a folder for the result files (you may also do this later).
3. You may optionally provide (a) fixed modification information and (b) options such as digestion specificity, fragmentation type, and whether to produce a recalibrated spectrum file (currently written in .mgf format to the results folder). To learn about these options in detail, read below, hover the mouse cursor over the text in the application’s UI to see its tool tips, or see the help page.
We will guide you through a sample run and how to use the results. When you launch Preview, you will see the user interface.
There are only two required inputs: (1) a set of MS/MS spectra in mzML, mzXML, Thermo RAW, or .mgf format, and (2) a protein database in FASTA format. Press the “Select MS/MS data file” and “Select Protein database file” buttons to browse for your files.
You can define fixed modifications for your sample. Common modification presets can be found by clicking on the pulldown menu. In this case, the user specified a fixed modification of +57.0215 Da on cysteine. For standard cysteine treatments (+0, +46, +57, +58, and +71), this input is not strictly necessary, because Preview can usually deduce the cysteine treatment from the data.
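Preview’s method for deducing the cysteine treatment is not described here. As a rough illustration of the idea only, the hypothetical sketch below takes mass shifts observed on cysteine in trial identifications, rounds them to nominal values, and votes among the five standard treatments listed above. The function name, input, and chemistry labels are assumptions for illustration.

```python
from collections import Counter

# Standard cysteine treatments named in the text (nominal shift in Da -> usual chemistry).
STANDARD_CYS_SHIFTS = {
    0: "unmodified",
    46: "methylthio (MMTS)",
    57: "carbamidomethyl (iodoacetamide)",
    58: "carboxymethyl (iodoacetic acid)",
    71: "propionamide (acrylamide)",
}

def guess_cysteine_treatment(observed_shifts):
    """Return the standard nominal shift seen most often on C, or None if none match.

    observed_shifts: mass shifts (Da) on cysteine residues from trial identifications
    (a hypothetical input; this is not Preview's actual algorithm).
    """
    votes = Counter(round(s) for s in observed_shifts)
    candidates = [(votes[s], s) for s in STANDARD_CYS_SHIFTS if votes[s] > 0]
    if not candidates:
        return None
    return max(candidates)[1]  # highest vote count wins

shift = guess_cysteine_treatment([57.02, 57.03, 71.04, 57.01, 0.0])
print(shift, STANDARD_CYS_SHIFTS[shift])
```

Nominal rounding is enough to tell these five treatments apart, which is one reason such a vote is plausible for standard treatments but not for arbitrary modifications.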
Preview does not attempt to deduce other fixed modifications, so the user must specify lysine and N-terminal modifications (such as TMT and iTRAQ), C-terminal modifications (such as 18O labeling), and metabolic labels (such as SILAC). Note that you can type modification masses that are not already in the pull-down menu to customize the search to your needs.
In this example, the user left the settings of digestion cleavages and initial search specificity at their defaults: RK (meaning trypsin digestion) and fully specific initial search. We recommend fully specific initial search for all digested samples; nonspecific initial search may perform better for undigested (peptidomic) samples.
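To make the “RK” setting concrete, here is a minimal sketch of a fully specific in-silico digestion that cuts C-terminal to every residue in the cleavage string. It is a simplification assumed for illustration: it ignores missed cleavages and the proline rule, and the function name `digest` is hypothetical rather than part of Preview.

```python
def digest(protein, cleave_after="RK", min_length=1):
    """Fully specific digestion: cut after every residue listed in cleave_after.

    Simplifications: no missed cleavages, no proline rule.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])  # include the cleavage residue
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder of the protein
    return [p for p in peptides if len(p) >= min_length]

print(digest("PEPTIDEKAAGRSSK"))
```

A nonspecific search, by contrast, would consider every substring of the protein as a candidate peptide, which is why it suits undigested peptidomic samples but is far more expensive.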
The “Phospho enriched” option is intended only for samples composed predominantly of phosphopeptides; it optimizes Preview for this situation. The “Wildcard search” option directs the program to perform a blind modification search that tries each integer mass shift from -50 to +150 Da on each residue. The “Try all charge assignments” option directs Preview to ignore the charge assignments in the spectrum file and instead try z = +1, +2, and +3 for each CID spectrum and z = +2, +3, and +4 for each ETD spectrum. You can run Preview twice, once with this box checked and once without, to test the reliability of the charge assignments. For the “Fragmentation type,” you can choose between CID/HCD and ETD/ECD.
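Each trial charge implies a different precursor mass, so “Try all charge assignments” multiplies the number of candidate masses per spectrum. The sketch below shows the standard conversion from precursor m/z to neutral mass for the trial charges listed above. The dictionary and function names are hypothetical; how Preview scores the alternatives internally is not described here.

```python
PROTON_MASS = 1.007276  # Da

# Trial charges per fragmentation type, as listed for "Try all charge assignments".
TRIAL_CHARGES = {"CID": (1, 2, 3), "ETD": (2, 3, 4)}

def candidate_neutral_masses(precursor_mz, fragmentation):
    """Return (charge, neutral mass) pairs for every trial charge.

    Uses M = z * (m/z) - z * proton mass for [M + zH]^z+ ions.
    """
    return [(z, z * precursor_mz - z * PROTON_MASS)
            for z in TRIAL_CHARGES[fragmentation]]

for z, mass in candidate_neutral_masses(500.0, "CID"):
    print(f"z=+{z}: {mass:.6f} Da")
```

Higher charges are favored for ETD because ETD requires multiply charged precursors, which is why its trial set starts at +2.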
Recalibration: In the above example, the user has chosen the default option, which is to recalibrate both the precursor and the fragment masses. This will generate a new file with “.recal.” inserted in the .mgf file name. The original file is not altered. The recalibrated file is located in the results output folder, which will be inside the folder set from the menu: Edit > Preferences.
Preview runs quickly (in this case, about 15 seconds) and then produces its output. The primary output is two html pages: the Summary and Details pages. In addition, Preview writes a Byonic parameter file, with the extension .byparms, to the output folder for a subsequent search. Note that these are suggested search parameters. In general, the parameters should be reviewed and, if necessary, modified to fit the particular experiment. Preview is good at finding modifications that occur consistently sample-wide (for example, in vitro modifications from sample handling and processing). However, Preview may miss post-translational modifications that are found only on a few proteins, and Preview does not look for glycopeptides. If you are interested in finding specific post-translational modifications or glycopeptides, you should carefully review and adjust the Byonic parameters suggested by Preview.
The first output html page is a Summary that presents Preview’s most important findings: the top proteins (up to 10), mass measurement errors, and the most prevalent modifications. From this page, the user can click through to the Details page and to the results folder, which contains the recalibrated spectrum file (if that option was chosen) and spreadsheets of peptide identifications. Note that there will be fewer identifications than a standard search engine would obtain: Preview samples the data; it does not perform a complete search.
The top protein list shows that only Bruton’s tyrosine kinase was found in quantity, and indeed this sample is nominally a one-protein (therapeutic protein) sample, purified by gel electrophoresis. Preview ranks proteins using a rather complicated function of the number of unique peptides, the peptide matching scores, and so forth. In this case, only the top protein is surely in the sample, because it is the only protein with more than one distinct peptide. A few extra proteins do not hurt Preview’s assays, because they will provide few high-scoring identifications. Preview automatically adds matched decoy peptides for all searches, and it uses the decoys to estimate and correct for false discoveries and to set the score thresholds for accepting identifications. There is no need to add decoys to the input database.
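Preview’s exact decoy correction is not documented here, but the general target-decoy idea can be sketched: rank identifications by score and pick the lowest score threshold at which the estimated false discovery rate (decoy hits divided by target hits above the threshold) stays within a chosen limit. The function below is a hypothetical, generic illustration of that approach, not Preview’s implementation.

```python
from itertools import groupby

def score_threshold(psms, max_fdr=0.01):
    """Lowest score threshold whose decoy/target ratio is <= max_fdr, or None.

    psms: list of (score, is_decoy) tuples. Tied scores are counted together,
    because a threshold admits every identification with that score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, group in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score  # keep lowering while the FDR estimate is acceptable
    return threshold

hits = [(50, False), (45, False), (40, False), (38, True), (35, False), (30, True)]
print(score_threshold(hits, max_fdr=0.01))
print(score_threshold(hits, max_fdr=0.34))
```

Because decoys are generated automatically, the ratio is always computed against a decoy set of matching size, which is why no decoys need to be added to the input database.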
The plots of precursor and fragment measurement errors reveal something interesting in this example: the measurements could benefit from recalibration. The precursor measurements are running about 15 ppm too high, which is fairly large for Orbitrap measurements, and the fragment measurements are running about 0.3 Da too low, which is fairly large for LTQ measurements. Preview includes built-in recalibration, which generates a new spectrum file in the specified output folder. On this data set, recalibration improves the precursor errors by about 7x, to a median accuracy (absolute value) of about 2 ppm, and the fragment errors by about 3x, to a median accuracy of about 0.1 Da.
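Preview’s recalibration algorithm is not described here. The simplest version of the idea, assumed purely for illustration, removes a single constant offset: a relative (ppm) offset for Orbitrap precursors and an absolute (Da) offset for LTQ fragments, each estimated as the median error of confident identifications. Real recalibration may also model dependence on m/z or retention time; all function names below are hypothetical.

```python
import statistics

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def estimate_offset_ppm(pairs):
    """Median ppm error over (measured, theoretical) pairs from confident IDs."""
    return statistics.median(ppm_error(m, t) for m, t in pairs)

def recalibrate_precursor(mz, offset_ppm):
    """Undo a systematic relative error, assuming measured = true * (1 + offset_ppm * 1e-6)."""
    return mz / (1 + offset_ppm * 1e-6)

def recalibrate_fragment(mz, offset_da):
    """Undo a systematic absolute error, assuming measured = true + offset_da."""
    return mz - offset_da

# Precursors reading about 15 ppm high; fragments reading about 0.3 Da low.
offset = estimate_offset_ppm([(1000.015, 1000.0), (500.0075, 500.0), (2000.03, 2000.0)])
print(round(offset, 3), recalibrate_precursor(1000.015, offset))
print(recalibrate_fragment(499.7, -0.3))
```

A ppm offset suits precursors because high-resolution mass errors tend to scale with m/z, while a fixed Da offset is a reasonable first approximation for low-resolution ion trap fragments.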
Note that a median precursor accuracy of 2.4 ppm does not mean that the user should specify 2.4 ppm tolerance in a search engine such as Mascot, SEQUEST, or X!Tandem. The median error is the typical error of an abundant ion, and the mass tolerance should be set to at least three times the typical error in order to catch all the valid identifications. In the case of BTK.recal.dta, we would choose a 10 ppm precursor tolerance and 0.4 Da fragment tolerance, about 4x these median errors.
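The tolerance rule above is simple arithmetic: multiply the median absolute error by a factor of at least 3 (the example uses about 4) and round to a convenient value. The helper below is a hypothetical sketch of that rule; rounding up to a multiple of `step` is an assumed convenience, not something Preview prescribes.

```python
import math

def suggest_tolerance(median_abs_error, factor=4.0, step=1.0):
    """Suggest a search tolerance from a median absolute mass error.

    factor: multiple of the typical error (the text recommends at least 3).
    step: round the result up to a multiple of this value.
    """
    if factor < 3:
        raise ValueError("factor should be at least 3x the typical error")
    return math.ceil(factor * median_abs_error / step) * step

print(suggest_tolerance(2.4))            # precursor tolerance in ppm
print(suggest_tolerance(0.1, step=0.1))  # fragment tolerance in Da
```

With the values from this example, 4 x 2.4 ppm = 9.6 ppm rounds up to 10 ppm, and 4 x 0.1 Da gives 0.4 Da, matching the tolerances chosen above.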
The Summary page also gives statistics on digestion specificity and fixed and variable modifications. In this case, we see nonspecific digestion at the N-terminus, oxidized methionine, cysteine propionamide (not surprising in a gel sample), acetylated protein N-terminus, and phosphorylated serine, threonine, and tyrosine (known PTMs in Bruton’s tyrosine kinase).
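A digestion-specificity statistic such as “nonspecific digestion at the N-terminus” amounts to checking each identified peptide’s termini against the cleavage rule. The sketch below, with a hypothetical function name and the same simplified RK rule as before, classifies one peptide; Preview’s own bookkeeping may differ.

```python
def terminus_specificity(protein, peptide, cleave_after="RK"):
    """Return (n_term_specific, c_term_specific) for the first match of peptide.

    N-terminus is specific if the peptide starts the protein or follows a cleavage
    residue; C-terminus is specific if it ends the protein or ends in one.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)  # index just past the peptide
    n_specific = start == 0 or protein[start - 1] in cleave_after
    c_specific = end == len(protein) or peptide[-1] in cleave_after
    return n_specific, c_specific

print(terminus_specificity("PEPTIDEKAAGRSSK", "AAGR"))  # both termini tryptic
print(terminus_specificity("PEPTIDEKAAGRSSK", "AGR"))   # nonspecific N-terminus
```

Tallying these flags over all identifications gives the fraction of nonspecific termini at each end, which is the kind of statistic reported on the Summary page.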
The list of the most common variable modifications (above image, bottom) helps the user choose the modifications to enable for a full search, based on prevalence, biological importance, search time, and so forth. The most common variable modifications in this sample are oxidized methionine, protein (and hence peptide) N-terminal acetylation, and cysteine propionamide; the user would probably want to enable all three of these. The user may want to enable sodiation; enabling this modification does not usually increase the amount of biological information, but a search with sodiation at D, E, and C-terminus would be reasonable. The user would definitely want to enable phosphorylation in studying a kinase that is itself phosphorylated or any sample where phosphorylation is suspected to be of importance. The user may or may not want to enable deamidation and N-terminal methylation and dimethylation. Some samples have considerably more deamidation than in this example.
The Details page lists all of Preview’s assays, including both positive and negative results. It also lists the size of the sample for each assay. For example, Preview made 73 identifications (including duplicate identifications) to peptides containing M, and 24 of these 73 included at least one M [+16]. The sample sizes let the user judge the statistical significance of the assay result.
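One way to use these sample sizes is to attach a confidence interval to each rate. The sketch below computes the observed proportion and a Wilson score interval for 24 oxidized out of 73 M-containing identifications. The choice of the Wilson interval is ours, not something Preview reports, and duplicate identifications are not independent, so the interval is only a rough guide.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Observed proportion and Wilson score interval (z=1.96 for about 95%)."""
    if n == 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, center - half, center + half

p, lo, hi = wilson_interval(24, 73)
print(f"M[+16] rate {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With 73 observations the interval spans roughly 20 percentage points, which illustrates why small assay sample sizes deserve caution.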
From the Summary page, the user can click to a Details page that gives the full account of Preview’s “assays” and their results. The above image shows that Preview detected no oxidations besides methionine sulfoxide. It also reveals some amount of pyro-glu cyclization on peptides with N-terminal Q, E, and C[+57], and some beta-elimination, which in this case is probably caused by neutral loss of phosphoric acid from phosphoserine and phosphothreonine. Pyro-glu does not increase the size of the search much, because it applies only to peptides with certain N-terminal residues, and beta-elimination provides additional information about phosphorylation sites, so in this study we would enable these modifications, along with oxidized methionine, protein N-terminal acetylation, cysteine propionamide, and all three phosphorylations (S, T, Y).
Note: The Details page reports the rates of modification on eligible peptides, whereas the Summary page reports potential gains in the total number of identifications. Denominators in the percentages may also vary from search to search due to “second-order” effects such as multiply modified peptides and corrections for hits to decoys.