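OpenSwathWorkflow is the integrated version of the OpenSWATH data analysis workflow. Once the library is prepared and the raw SWATH/DIA data has been converted to mzML, running OpenSwathWorkflow performs the library search of the mass spectrometry data. PyProphet is then used downstream for FDR scoring and identification at the peptide and protein level.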
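
As a rough illustration of the step described above, the sketch below assembles a minimal OpenSwathWorkflow call from flags that appear in the help output reproduced further down (`-in`, `-tr`, `-tr_irt`, `-out_osw`, `-threads`). All file names are hypothetical placeholders, and the thread count is an arbitrary choice. The command is only executed when the tool is installed and the input file exists; otherwise it is just printed.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own data.
RUN="sample01.mzML"          # SWATH/DIA run converted to mzML (-in)
LIBRARY="assay_library.pqp"  # transition library (-tr)
IRT="irt_peptides.TraML"     # iRT transitions (-tr_irt)
OUT="sample01.osw"           # PyProphet-compatible SQLite output (-out_osw)

# Assemble the command as positional parameters so each argument stays intact.
set -- OpenSwathWorkflow \
  -in "$RUN" \
  -tr "$LIBRARY" \
  -tr_irt "$IRT" \
  -out_osw "$OUT" \
  -threads 4

CMD="$*"
echo "$CMD"

# Run only if the tool is on PATH and the input file is present.
if command -v OpenSwathWorkflow >/dev/null 2>&1 && [ -f "$RUN" ]; then
  "$@"
fi
```

The resulting `.osw` file is the output that the help text describes as PyProphet compatible, so it would be the natural input for the subsequent FDR scoring.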

Usually, simply running OpenSwathWorkflow is enough to display the basic help. To see the detailed documentation, including the advanced options, enter:

OpenSwathWorkflow --helphelp


Options (mandatory options marked with '*'):
  -in <files>*:Input files separated by blank (valid formats: 'mzML', 'mzXML', 'sqMass')
  -tr <file>*:Transition file ('TraML','tsv','pqp') (valid formats: 'traML', 'tsv', 'pqp')
  -tr_type <type>:Input file type -- default: determined from file extension or content (valid: 'traML', 'tsv', 'pqp')
  -tr_irt <file>:Transition file ('TraML') (valid formats: 'traML')
  -rt_norm <file>:RT normalization file (how to map the RTs of this run to the ones stored in the library). If set, tr_irt may be omitted. (valid formats: 'trafoXML')
  -swath_windows_file <file>:Optional, tab separated file containing the SWATH windows for extraction: lower_offset upper_offset \newline 400 425 \newline ... Note that the first line is a header and will be skipped.
  -sort_swath_maps:Sort input SWATH files when matching to SWATH windows from swath_windows_file
  -use_ms1_traces:Extract the precursor ion trace(s) and use for scoring
  -enable_uis_scoring:Enable additional scoring of identification assays
  -out_features <file>:Output file (valid formats: 'featureXML')
  -out_tsv <file>:TSV output file (mProphet compatible TSV file) (valid formats: 'tsv')
  -out_osw <file>:OSW output file (PyProphet compatible SQLite file) (valid formats: 'osw')
  -out_chrom <file>:Also output all computed chromatograms output in mzML (chrom.mzML) or sqMass (SQLite format) (valid formats: 'mzML', 'sqMass')
  -min_upper_edge_dist <double>:Minimal distance to the edge to still consider a precursor, in Thomson (default: '0')
  -rt_extraction_window <double>:Only extract RT around this value (-1 means extract over the whole range, a value of 600 means to extract around +/- 300 s of the expected elution). (default: '600')
  -extra_rt_extraction_window <double>:Output an XIC with a RT-window that by this much larger (e.g. to visually inspect a larger area of the chromatogram) (default: '0' min: '0')
  -ion_mobility_window <double>:Extraction window in ion mobility dimension (in milliseconds). This is the full window size, e.g. a value of 10 milliseconds would extract 5 milliseconds on either side. (default: '-1')
  -mz_extraction_window <double>:Extraction window used (in Thomson, to use ppm see -ppm flag) (default: '0.05' min: '0')
  -ppm:M/z extraction_window is in ppm
  -sonar:Data is scanning SWATH data
  -min_rsq <double>:Minimum r-squared of RT peptides regression (default: '0.95')
  -min_coverage <double>:Minimum relative amount of RT peptides to keep (default: '0.6')
  -split_file_input:The input files each contain one single SWATH (alternatively: all SWATH are in separate files)
  -use_elution_model_score:Turn on elution model score (EMG fit to peak)
  -readOptions <name>:Whether to run OpenSWATH directly on the input data, cache data to disk first or to perform a datareduction step first. If you choose cache, make sure to also set tempDirectory (default: 'normal' valid: 'normal', 'cache', 'cacheWorkingInMemory', 'workingInMemory')
  -mz_correction_function <name>:Use the retention time normalization peptide MS2 masses to perform a mass correction (linear, weighted by intensity linear or quadratic) of all spectra. (default: 'none' valid: 'none', 'unweighted_regression', 'weighted_regression', 'quadratic_regression', 'weighted_quadratic_regression', 'weighted_quadratic_regression_delta_ppm', 'quadratic_regression_delta_ppm')
  -irt_mz_extraction_window <double>:Extraction window used for iRT and m/z correction (in Thomson, use ppm use -ppm flag) (default: '0.05')
  -ppm_irtwindow:IRT m/z extraction_window is in ppm
  -tempDirectory <tmp>:Temporary directory to store cached files for example (default: '/tmp/')
  -extraction_function <name>:Function used to extract the signal (default: 'tophat' valid: 'tophat', 'bartlett')
  -batchSize <number>:The batch size of chromatograms to process (0 means to only have one batch, sensible values are around 500-1000) (default: '0' min: '0')
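
The extraction options above can be illustrated with a short sketch. It writes a SWATH windows file in the layout the help text describes (tab separated, header line first), works out what `-rt_extraction_window 600` means in seconds on either side, and converts a ppm window to Thomson at one m/z value. The window boundaries, the 20 ppm setting and the m/z of 500 are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Hypothetical window boundaries; the first line is a header and is skipped by the tool.
WINDOWS_FILE="swath_windows.tsv"
{
  printf 'lower_offset\tupper_offset\n'
  printf '400\t425\n'
  printf '425\t450\n'
  printf '450\t475\n'
} > "$WINDOWS_FILE"

# -rt_extraction_window is the full width: 600 means +/- 300 s around the expected elution.
RT_WINDOW=600
RT_HALF=$((RT_WINDOW / 2))
echo "RT extraction: +/- ${RT_HALF} s"

# With -ppm, -mz_extraction_window is interpreted in ppm instead of Thomson.
# Size of 20 ppm at m/z 500, expressed in Thomson (illustrative numbers).
MZ_WIDTH_TH=$(awk 'BEGIN { printf "%.3f", 500 * 20 / 1000000 }')
echo "20 ppm at m/z 500 = ${MZ_WIDTH_TH} Th"

# Arguments that could be appended to an OpenSwathWorkflow call.
EXTRACTION_ARGS="-swath_windows_file $WINDOWS_FILE -rt_extraction_window $RT_WINDOW -mz_extraction_window 20 -ppm"
echo "$EXTRACTION_ARGS"
```

For comparison, the default `-mz_extraction_window` of 0.05 Th corresponds to 100 ppm at m/z 500, which is why a ppm-based window is often preferred on high-resolution instruments.
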
Common UTIL options:
  -ini <file>:Use the given TOPP INI file
  -log <file>:Name of log file (created only when specified)
  -instance <n>:Instance number for the TOPP INI file (default: '1')
  -debug <n>:Sets the debug level (default: '0')
  -threads <n>:Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>:Writes the default configuration file
  -write_ctd <out_dir>:Writes the common tool description file(s) (Toolname(s).ctd) to <out_dir>
  -no_progress:Disables progress logging to command line
  -force:Overwrite tool specific checks.
  -test:Enables the test mode (needed for internal use only)
  --help:Shows options
  --helphelp:Shows all options (including advanced)
  -Debugging:irt_mzml <text>:Chromatogram mzML containing the iRT peptides
  -Debugging:irt_trafo <text>:Transformation file for RT transform
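
With this many options, it can be convenient to keep the settings in a TOPP INI file instead of on the command line. The sketch below uses the `-write_ini` and `-ini` options listed above: the first command writes the default configuration, the second reuses it for a run. The file names are hypothetical, and the commands are only printed unless the tool is installed.

```shell
#!/bin/sh
INI="osw_defaults.ini"   # hypothetical file name

# Step 1: dump the default configuration to a file that can be edited by hand.
WRITE_CMD="OpenSwathWorkflow -write_ini $INI"

# Step 2: run with the edited INI file; input and output files are placeholders.
RUN_CMD="OpenSwathWorkflow -ini $INI -in sample01.mzML -tr assay_library.pqp -tr_irt irt_peptides.TraML -out_osw sample01.osw"

echo "$WRITE_CMD"
echo "$RUN_CMD"

# Only write the INI file when the tool is actually available.
if command -v OpenSwathWorkflow >/dev/null 2>&1; then
  $WRITE_CMD
fi
```

In TOPP tools, values given on the command line generally take precedence over those in the INI file, so per-run inputs such as `-in` and `-out_osw` can stay on the command line.
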
Library parameters section:
  -Library:retentionTimeInterpretation <choice>:How to interpret the provided retention time (the retention time column can either be interpreted to be in iRT, minutes or seconds) (default: 'iRT' valid: 'iRT', 'seconds', 'minutes')
  -Library:override_group_label_check:Override an internal check that assures that all members of the same PeptideGroupLabel have the same PeptideSequence (this ensures that only different isotopic forms of the same peptide can be grouped together in the same label group). Only turn this off if you know what you are doing.
  -Library:force_invalid_mods:Force reading even if invalid modifications are encountered (OpenMS may not recognize the modification)
Parameters for the RTNormalization for iRT peptides. This specifies how the RT alignment is performed and how outlier detection is applied. Outlier detection can be done iteratively (by default) which removes one outlier per iteration or using the RANSAC algorithm.:

  -RTNormalization:alignmentMethod <choice>:How to perform the alignment to the normalized RT space using anchor points. 'linear': perform linear regression (for few anchor points). 'interpolated': Interpolate between anchor points (for few, noise-free anchor points). 'lowess': Use local regression (for many, noisy anchor points). 'b_spline': use b splines for smoothing. (default: 'linear' valid: 'linear', 'interpolated', 'lowess', 'b_spline')
  -RTNormalization:outlierMethod <choice>:Which outlier detection method to use (valid: 'iter_residual', 'iter_jackknife', 'ransac', 'none'). Iterative methods remove one outlier at a time. Jackknife approach optimizes for maximum r-squared improvement while 'iter_residual' removes the datapoint with the largest residual error (removal by residual is computationally cheaper, use this with lots of peptides). (default: 'iter_residual' valid: 'iter_residual', 'iter_jackknife', 'ransac', 'none')
  -RTNormalization:useIterativeChauvenet:Whether to use Chauvenet's criterion when using iterative methods. This should be used if the algorithm removes too many datapoints but it may lead to true outliers being retained.
  -RTNormalization:RANSACMaxIterations <number>:Maximum iterations for the RANSAC outlier detection algorithm. (default: '1000')
  -RTNormalization:RANSACMaxPercentRTThreshold <number>:Maximum threshold in RT dimension for the RANSAC outlier detection algorithm (in percent of the total gradient). Default is set to 3% which is around +/- 4 minutes on a 120 gradient. (default: '3')
  -RTNormalization:RANSACSamplingSize <number>:Sampling size of data points per iteration for the RANSAC outlier detection algorithm. (default: '10')
  -RTNormalization:estimateBestPeptides:Whether the algorithms should try to choose the best peptides based on their peak shape for normalization. Use this option if you do not expect all your peptides to be detected in a sample and too many 'bad' peptides enter the outlier removal step (e.g. due to them being endogenous peptides or using a less curated list of peptides).
  -RTNormalization:InitialQualityCutoff <value>:The initial overall quality cutoff for a peak to be scored (range ca. -2 to 2) (default: '0.5')
  -RTNormalization:OverallQualityCutoff <value>:The overall quality cutoff for a peak to go into the retention time estimation (range ca. 0 to 10) (default: '5.5')
  -RTNormalization:NrRTBins <number>:Number of RT bins to use to compute coverage. This option should be used to ensure that there is a complete coverage of the RT space (this should detect cases where only a part of the RT gradient is actually covered by normalization peptides) (default: '10')
  -RTNormalization:MinPeptidesPerBin <number>:Minimal number of peptides that are required for a bin to counted as 'covered' (default: '1')
  -RTNormalization:MinBinsFilled <number>:Minimal number of bins required to be covered (default: '8')
  -RTNormalization:lowess:span <value>:Span parameter for lowess (default: '1/3' min: '0' max: '1')

  -RTNormalization:b_spline:num_nodes <number>:Number of nodes for b spline (default: '5' min: '0')
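
Two of the RT normalization settings above are easier to read with numbers filled in. The sketch below reproduces the arithmetic behind the RANSAC threshold (3% of a 120-minute gradient) and the default coverage requirement (8 of 10 bins), then builds an illustrative argument string. The combination of `lowess`, `ransac` and `estimateBestPeptides` is an assumption chosen for the example, not a recommendation from the tool.

```shell
#!/bin/sh
# RANSACMaxPercentRTThreshold is given in percent of the total gradient.
GRADIENT_MIN=120      # gradient length in minutes (example from the help text)
THRESHOLD_PERCENT=3   # tool default
THRESHOLD_MIN=$(awk -v g="$GRADIENT_MIN" -v p="$THRESHOLD_PERCENT" \
  'BEGIN { printf "%.1f", g * p / 100 }')
echo "RANSAC RT threshold: ${THRESHOLD_MIN} min"

# Coverage defaults: MinBinsFilled out of NrRTBins bins must hold
# at least MinPeptidesPerBin peptides each.
NR_RT_BINS=10
MIN_BINS_FILLED=8
COVERAGE_PERCENT=$((MIN_BINS_FILLED * 100 / NR_RT_BINS))
echo "Required RT coverage: ${COVERAGE_PERCENT}% of bins"

# Illustrative settings for a large, noisy set of anchor peptides.
RTNORM_ARGS="-RTNormalization:alignmentMethod lowess -RTNormalization:outlierMethod ransac -RTNormalization:RANSACMaxPercentRTThreshold $THRESHOLD_PERCENT -RTNormalization:estimateBestPeptides"
echo "$RTNORM_ARGS"
```

The computed 3.6 minutes matches the "around +/- 4 minutes" quoted in the option description.
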
Scoring parameters section:
  -Scoring:stop_report_after_feature <number>:Stop reporting after feature (ordered by quality; -1 means do not stop). (default: '-1')
  -Scoring:rt_normalization_factor <value>:The normalized RT is expected to be between 0 and 1. If your normalized RT has a different range, pass this here (e.g. it goes from 0 to 100, set this value to 100) (default: '100')
  -Scoring:quantification_cutoff <value>:Cutoff in m/z below which peaks should not be used for quantification any more (default: '0' min: '0')
  -Scoring:write_convex_hull:Whether to write out all points of all features into the featureXML
  -Scoring:uis_threshold_sn <number>:S/N threshold to consider identification transition (set to -1 to consider all) (default: '0')
  -Scoring:uis_threshold_peak_area <number>:Peak area threshold to consider identification transition (set to -1 to consider all) (default: '0')
  -Scoring:scoring_model <choice>:Scoring model to use (default: 'default' valid: 'default', 'single_transition')

  -Scoring:TransitionGroupPicker:stop_after_feature <number>:Stop finding after feature (ordered by intensity; -1 means do not stop). (default: '-1')
  -Scoring:TransitionGroupPicker:min_peak_width <value>:Minimal peak width (s), discard all peaks below this value (-1 means no action). (default: '14')
  -Scoring:TransitionGroupPicker:peak_integration <choice>:Calculate the peak area and height either the smoothed or the raw chromatogram data. (default: 'original' valid: 'original', 'smoothed')
  -Scoring:TransitionGroupPicker:background_subtraction <choice>:Remove background from peak signal using estimated noise levels. The 'original' method is only provided for historical purposes, please use the 'exact' method and set parameters using the PeakIntegrator: settings. The same original or smoothed chromatogram specified by peak_integration will be used for background estimation. (default: 'none' valid: 'none', 'original', 'exact')
  -Scoring:TransitionGroupPicker:recalculate_peaks <choice>:Tries to get better peak picking by looking at peak consistency of all picked peaks. Tries to use the consensus (median) peak border if the variation within the picked peaks is too large. (default: 'true' valid: 'true', 'false')
  -Scoring:TransitionGroupPicker:use_precursors:Use precursor chromatogram for peak picking (note that this may lead to precursor signal driving the peak picking)
  -Scoring:TransitionGroupPicker:use_consensus <choice>:Use consensus peak boundaries when computing transition group picking (if false, compute independent peak boundaries for each transition) (default: 'true' valid: 'true', 'false')
  -Scoring:TransitionGroupPicker:recalculate_peaks_max_z <value>:Determines the maximal Z-Score (difference measured in standard deviations) that is considered too large for peak boundaries. If the Z-Score is above this value, the median is used for peak boundaries (default value 1.0). (default: '0.75')
  -Scoring:TransitionGroupPicker:minimal_quality <value>:Only if compute_peak_quality is set, this parameter will not consider peaks below this quality threshold (default: '-1.5')
  -Scoring:TransitionGroupPicker:resample_boundary <value>:For computing peak quality, how many extra seconds should be sample left and right of the actual peak (default: '15')
  -Scoring:TransitionGroupPicker:compute_peak_quality <choice>:Tries to compute a quality value for each peakgroup and detect outlier transitions. The resulting score is centered around zero and values above 0 are generally good and below -1 or -2 are usually bad. (default: 'true' valid: 'true', 'false')
  -Scoring:TransitionGroupPicker:compute_peak_shape_metrics:Calculates various peak shape metrics (e.g., tailing) that can be used for downstream QC/QA.
  -Scoring:TransitionGroupPicker:compute_total_mi:Compute mutual information metrics for individual transitions that can be used for OpenSWATH/IPF scoring.
  -Scoring:TransitionGroupPicker:boundary_selection_method <choice>:Method to use when selecting the best boundaries for peaks. (default: 'largest' valid: 'largest', 'widest')

  -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length <number>:The number of subsequent data points used for smoothing. This number has to be uneven. If it is not, 1 will be added. (default: '11')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_polynomial_order <number>:Order of the polynomial that is fitted. (default: '3')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:gauss_width <value>:Gaussian width in seconds, estimated peak size. (default: '30')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:use_gauss <choice>:Use Gaussian filter for smoothing (alternative is Savitzky-Golay filter) (default: 'false' valid: 'false', 'true')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:peak_width <value>:Force a certain minimal peak_width on the data (e.g. extend the peak at least by this amount on both sides) in seconds. -1 turns this feature off. (default: '-1')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:signal_to_noise <value>:Signal-to-noise threshold at which a peak will not be extended any more. Note that setting this too high (e.g. 1.0) can lead to peaks whose flanks are not fully captured. (default: '0.1' min: '0')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:write_sn_log_messages:Write out log messages of the signal-to-noise estimator in case of sparse windows or median in rightmost histogram bin
  -Scoring:TransitionGroupPicker:PeakPickerMRM:remove_overlapping_peaks <choice>:Try to remove overlapping peaks during peak picking (default: 'true' valid: 'false', 'true')
  -Scoring:TransitionGroupPicker:PeakPickerMRM:method <choice>:Which method to choose for chromatographic peak-picking (OpenSWATH legacy on raw data, corrected picking on smoothed chromatogram or Crawdad on smoothed chromatogram). (default: 'corrected' valid: 'legacy', 'corrected', 'crawdad')

  -Scoring:TransitionGroupPicker:PeakIntegrator:integration_type <choice>:The integration technique to use in integratePeak() and estimateBackground() which uses either the summed intensity, integration by Simpson's rule or trapezoidal integration. (default: 'intensity_sum' valid: 'intensity_sum', 'simpson', 'trapezoid')
  -Scoring:TransitionGroupPicker:PeakIntegrator:baseline_type <choice>:The baseline type to use in estimateBackground() based on the peak boundaries. A rectangular baseline shape is computed based either on the minimal intensity of the peak boundaries, the maximum intensity or the average intensity (base_to_base). (default: 'base_to_base' valid: 'base_to_base', 'vertical_division', 'vertical_division_min', 'vertical_division_max')
  -Scoring:TransitionGroupPicker:PeakIntegrator:fit_EMG <choice>:Fit the chromatogram/spectrum to the EMG peak model. (default: 'false' valid: 'false', 'true')

  -Scoring:DIAScoring:dia_extraction_window <value>:DIA extraction window in Th or ppm. (default: '0.05' min: '0')
  -Scoring:DIAScoring:dia_extraction_unit <choice>:DIA extraction window unit (default: 'Th' valid: 'Th', 'ppm')
  -Scoring:DIAScoring:dia_centroided:Use centroided DIA data.
  -Scoring:DIAScoring:dia_byseries_intensity_min <value>:DIA b/y series minimum intensity to consider. (default: '300' min: '0')
  -Scoring:DIAScoring:dia_byseries_ppm_diff <value>:DIA b/y series minimal difference in ppm to consider. (default: '10' min: '0')
  -Scoring:DIAScoring:dia_nr_isotopes <number>:DIA number of isotopes to consider. (default: '4' min: '0')
  -Scoring:DIAScoring:dia_nr_charges <number>:DIA number of charges to consider. (default: '4' min: '0')
  -Scoring:DIAScoring:peak_before_mono_max_ppm_diff <value>:DIA maximal difference in ppm to count a peak at lower m/z when searching for evidence that a peak might not be monoisotopic. (default: '20' min: '0')

  -Scoring:EMGScoring:max_iteration <number>:Maximum number of iterations using by Levenberg-Marquardt algorithm. (default: '10')

  -Scoring:Scores:use_shape_score <choice>:Use the shape score (this score measures the similarity in shape of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_coelution_score <choice>:Use the coelution score (this score measures the similarity in coelution of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_rt_score <choice>:Use the retention time score (this score measure the difference in retention time) (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_library_score <choice>:Use the library score (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_intensity_score <choice>:Use the intensity score (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_nr_peaks_score <choice>:Use the number of peaks score (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_total_xic_score <choice>:Use the total XIC score (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_total_mi_score:Use the total MI score
  -Scoring:Scores:use_sn_score <choice>:Use the SN (signal to noise) score (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_mi_score:Use the MI (mutual information) score
  -Scoring:Scores:use_dia_scores <choice>:Use the DIA (SWATH) scores. If turned off, will not use fragment ion spectra for scoring. (default: 'true' valid: 'true', 'false')
  -Scoring:Scores:use_ms1_correlation:Use the correlation scores with the MS1 elution profiles
  -Scoring:Scores:use_sonar_scores:Use the scores for SONAR scans (scanning swath)
  -Scoring:Scores:use_ms1_fullscan:Use the full MS1 scan at the peak apex for scoring (ppm accuracy of precursor and isotopic pattern)
  -Scoring:Scores:use_ms1_mi:Use the MS1 MI score
  -Scoring:Scores:use_uis_scores:Use UIS scores for peptidoform identification 
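
The scoring options come in two forms: plain switches, which are enabled simply by naming them, and `<choice>` options, which take an explicit `true` or `false`. The sketch below shows one of each, and also mirrors the rule stated for `sgolay_frame_length` (an even value has 1 added). The particular scores toggled here and the requested frame length of 10 are arbitrary assumptions for illustration.

```shell
#!/bin/sh
# Switches (no <choice> in the help) are turned on by listing them;
# <choice> options need an explicit value.
SCORE_ARGS="-use_ms1_traces -Scoring:Scores:use_ms1_mi -Scoring:Scores:use_mi_score -Scoring:Scores:use_sn_score false"

# sgolay_frame_length has to be odd; per the help text, 1 is added otherwise.
# This only mirrors the documented rule, it is not the tool's own code.
REQUESTED_FRAME=10
if [ $((REQUESTED_FRAME % 2)) -eq 0 ]; then
  EFFECTIVE_FRAME=$((REQUESTED_FRAME + 1))
else
  EFFECTIVE_FRAME=$REQUESTED_FRAME
fi
echo "sgolay_frame_length: requested $REQUESTED_FRAME, effective $EFFECTIVE_FRAME"

PICKER_ARGS="-Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length $EFFECTIVE_FRAME"

# Arguments that could be appended to an OpenSwathWorkflow call.
echo "$SCORE_ARGS $PICKER_ARGS"
```

Passing the odd value directly avoids relying on the automatic adjustment and keeps the command line consistent with what the tool actually uses.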


Röst HL, Sachsenberg T, Aiche S, Bielow C, et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Methods. 2016;13(9):741-748. doi:10.1038/nmeth.3959.