Omics - Hunter

msconvert使用介绍

除了开放的mzML、mzXML和mzData格式之外,每个供应商通常都以特定于供应商的、专有的、封闭的格式对质谱数据进行编码。为了方便我们数据分析msconvert利用厂商提供的API将这些文件转换成开放的格式。

msconvert支持的格式转换:

VendorFormatsVendor Required Software
ABIT2DDataExplorer 4.0
AgilentMassHunter .ddistributed with ProteoWizard
BrukerCompass .d, YEP, BAF, FID, TDFdistributed with ProteoWizard
SciexWIFF / WIFF2distributed with ProteoWizard
ShimadzuLCD(未完全支持)distributed with ProteoWizard
Thermo ScientificRAWdistributed with ProteoWizard
WatersMassLynx .raw / UNIFIdistributed with ProteoWizard

msconvert可转换成以下格式:

mzML 1.1
mzML 1.0
mzXML
MGF
MS2/CMS2/BMS2
mzIdentML

其中mzXML和mzML是我们常用的格式。

下载地址

https://sourceforge.net/projects/proteowizard/

https://github.com/ProteoWizard/pwiz

备用下载

https://pan.baidu.com/s/1fOa8c-9syk0ZbBZMvaZOIw 提取码: tsw


也可以用docker:https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

安装注意事项:

Windows 用户:使用安装程序需要安装Microsoft .NET Framework 4.0或更高版本。还必须具有以下、Visual C++ redistributables组件(对于 x86 或 x64,取决于您下载的安装包版本):2008、2010、2012、2013、2015、2017。此页面链接到每个 VC 版本的最新可再发行组件,不同的供应商 DLL 依赖于不同版本的 Visual C++ redistributables组件 所以建议都安装。

示例:

1)msconvert
(输出帮助信息)


2)msconvert –help
(输出更详细的帮助信息文档)

3)msconvert data.RAW
(转换成 data.mzML到当前路径)


4)msconvert data.RAW –mzXML
(转换成 data.mzXML 到当前路径)


5)msconvert *.RAW -o my_output_dir
(转换所有以 *.RAW 为后缀的文件到 mzML并输出到 my_output_dir路径)


6)msconvert data.RAW –zlib –filter “peakPicking true [1,2]”
(用vendor方法对msLevels [1,2]进行中心化过滤,并用zlib对结果数据进行压缩,此命令比较常用)

或者可以使用 ProteoWizard 自带的GUI进行可视化操作:

注意:当我们使用PeakPicking时,需要让其保持在第一条,否则不会进行centroided!!!

https://ccms-ucsd.github.io/GNPSDocumentation/fileconversion

2022.04.13新增部分参数说明

最近很多来信询问设置的问题,官方文档已经有很详细的说明于是整理如下(https://proteowizard.sourceforge.io/tools/msconvert.html):

常规参数:

Options:
  -f [ –filelist ] arg : specify text file containing filenames
  -o [ –outdir ] arg (=.) : set output directory (‘-‘ for stdout) [.]
  -c [ –config ] arg : configuration file (optionName=value)
  –outfile arg : Override the name of output file.
  -e [ –ext ] arg : set extension for output files
  [mzML|mzXML|mgf|txt|mz5]
  –mzML : write mzML format [default]
  –mzXML : write mzXML format
  –mz5 : write mz5 format
  –mgf : write Mascot generic format
  –text : write ProteoWizard internal text format
  –ms1 : write MS1 format
  –cms1 : write CMS1 format
  –ms2 : write MS2 format
  –cms2 : write CMS2 format
  -v [ –verbose ] : display detailed progress information
  –64 : set default binary encoding to 64-bit precision
  [default]
  –32 : set default binary encoding to 32-bit precision
  –mz64 : encode m/z values in 64-bit precision [default]
  –mz32 : encode m/z values in 32-bit precision
  –inten64 : encode intensity values in 64-bit precision
  –inten32 : encode intensity values in 32-bit precision
  [default]
  –noindex : do not write index
  -i [ –contactInfo ] arg : filename for contact info
  -z [ –zlib ] : use zlib compression for binary data
  –numpressLinear [toler] : use numpress linear prediction lossy compression for binary mz and rt data (relative error guaranteed less than given tolerance, default is 2e-009)
  –numpressPic : use numpress positive integer lossy compression for binary intensities (maximum 0.5 absolute error guaranteed)
  –numpressSlof [toler] : use numpress short logged float lossy compression for binary intensities (relative error guaranteed less than given tolerance, default is 0.0002)
  -n [ –numpressAll] : same as –numpressLinear –numpressSlof (see https://github.com/fickludd/ms-numpress for more info)
  –numpressLinearAbsTol : desired absolute tolerance for linear numpress prediction (e.g. use 1e-4 for a mass accuracy of 0.2 ppm at 500 m/z, default uses -1.0 for maximal accuracy). Note: setting this value may substantially reduce file size, this overrides relative accuracy tolerance.
  Numpress may be used at the same time as zlib (-z) for best compression, though some older mzML parsers may not handle this properly.
  -g [ –gzip ] : gzip entire output file (adds .gz to filename)
  –filter arg : add a spectrum list filter
  –merge : create a single output file from multiple input
  files by merging file-level metadata and
  concatenating spectrum lists
  –simAsSpectra : write selected ion monitoring as spectra, not
  chromatograms
  –srmAsSpectra : write selected reaction monitoring as spectra, not
  chromatograms
  –combineIonMobilitySpectra : write all drift bins/scans in a frame/block as one spectrum instead of individual spectra
  –acceptZeroLengthSpectra : some vendor readers have an efficient way of filtering out empty spectra, but it takes more time to open the file
  –ignoreUnknownInstrumentError : if true, if an instrument cannot be determined from a vendor file, it will not be an error
  –help : show this message, with extra detail on filter options

这里面尤其要关注--filter参数:

index <index_value_set>
msLevel <mslevels>
chargeState <charge_states>
precursorRecalculation
mzRefiner input1.pepXML input2.mzid [msLevels=<1->] [thresholdScore=<CV_Score_Name>] [thresholdValue=<floatset>] [thresholdStep=<float>] [maxSteps=<count>]
lockmassRefiner mz=<real> mzNegIons=<real (mz)> tol=<real (1.0 Daltons)>
precursorRefine
peakPicking [<PickerType> [snr=<minimum signal-to-noise ratio>] [peakSpace=<minimum peak spacing>] [msLevel=<ms_levels>]]
scanNumber <scan_numbers>
scanEvent <scan_event_set>
scanTime <scan_time_range>
sortByScanTime
stripIT
metadataFixer
titleMaker <format_string>
threshold <type> <threshold> <orientation> [<mslevels>]
mzWindow <mzrange>
mzPrecursors <precursor_mz_list>
defaultArrayLength <peak_count_range>
zeroSamples <mode> [<MS_levels>]
mzPresent <tolerance> <type> <threshold> <orientation> <mz_list> [<include_or_exclude>]
scanSumming [precursorTol=<precursor tolerance>] [scanTimeTol=<scan time tolerance>]
MS2Denoise [<peaks_in_window> [<window_width_Da> [multicharge_fragment_relaxation]]]
MS2Deisotope [hi_res [mzTol=<mzTol>]] [Poisson [minCharge=<minCharge>] [maxCharge=<maxCharge>]]
ETDFilter [<removePrecursor> [<removeChargeReduced> [<removeNeutralLoss> [<blanketRemoval> [<matchingTolerance> ]]]]]
chargeStatePredictor [overrideExistingCharge=<true|false (false)>] [maxMultipleCharge=<int (3)>] [minMultipleCharge=<int (2)>] [singleChargeFractionTIC=<real (0.9)>] [maxKnownCharge=<int (0)>] [makeMS2=<true|false (false)>]
turbocharger [minCharge=<minCharge>] [maxCharge=<maxCharge>] [precursorsBefore=<before>] [precursorsAfter=<after>] [halfIsoWidth=<half-width of isolation window>] [defaultMinCharge=<defaultMinCharge>] [defaultMaxCharge=<defaultMaxCharge>] [useVendorPeaks=<useVendorPeaks>]
activation <precursor_activation_type>
analyzer <analyzer>
analyzerType <analyzer>
polarity <polarity>

示例如下:

# extract scan indices 5…10 and 20…25
msconvert data.RAW –filter “index [5,10] [20,25]”

# extract MS1 scans only
msconvert data.RAW –filter “msLevel 1”

# extract MS2 and MS3 scans only
msconvert data.RAW –filter “msLevel 2-3”

# extract MSn scans for n>1
msconvert data.RAW –filter “msLevel 2-“

# apply ETD precursor mass filter
msconvert data.RAW –filter ETDFilter

# remove non-flanking zero value samples
msconvert data.RAW –filter “zeroSamples removeExtra”

# remove non-flanking zero value samples in MS2 and MS3 only
msconvert data.RAW –filter “zeroSamples removeExtra 2 3”

# add missing zero value samples (with 5 flanking zeros) in MS2 and MS3 only
msconvert data.RAW –filter “zeroSamples addMissing=5 2 3”

# keep only HCD spectra from a decision tree data file
msconvert data.RAW –filter “activation HCD”

# keep the top 42 peaks or samples (depending on whether spectra are centroid or profile):
msconvert data.RAW –filter “threshold count 42 most-intense”

# multiple filters: select scan numbers and recalculate precursors
msconvert data.RAW –filter “scanNumber [500,1000]” –filter “precursorRecalculation”

# multiple filters: apply peak picking and then keep the bottom 100 peaks:
msconvert data.RAW –filter “peakPicking true 1-” –filter “threshold count 100 least-intense”

# multiple filters: apply peak picking and then keep all peaks that are at least 50% of the intensity of the base peak:
msconvert data.RAW –filter “peakPicking true 1-” –filter “threshold bpi-relative .5 most-intense

FILTER详细介绍见:https://proteowizard.sourceforge.io/tools/filters.html

参考资料:

1.http://proteowizard.sourceforge.net


作者:陈浩


版权:本文版权归作者所有


免责声明:本文中使用的部分图片来自于网络或者参考资料,如有侵权,请联系博主:chenhao__@__evvail.com(发件请删除下划线)进行删除


转载注意:除非特别声明,本站点内容均为作者原创文章,转载须以链接形式标明本文链接


本文链接:https://evvail.com/2019/12/19/293.html

4 评论

  1. 你好为什么我安装的里面没有titlemaker呢,我在官网下载的

  2. 您好,我想问一下,安捷伦.d文件使用MSconvert转化的MGF中MS2名称是Scan number,而直接用 MassHunter 转化的MGF中MS2是m/z。个人更倾向获得m/z,请问在MSconvert中如何设置呢?

    • 陈浩

      2022/10/17 在 18:29

      msconvert文档有说明,可以参考
      titleMaker This filter adds or replaces spectrum titles according to specified . You can use it, for example, to customize the TITLE line in MGF output in msconvert. The following keywords are recognized:
      – prints the spectrum’s Run id – for example, “Data.d” from “C:/Agilent/Data.d/AcqData/mspeak.bin”
      – prints the spectrum’s index
      – prints the spectrum’s nativeID– prints the path of the spectrum’s source data
      – if the nativeID can be represented as a single number, prints that number, else index+1
      – for the first precursor, prints the spectrum’s “dissociation method” value
      – for the first precursor, prints the the spectrum’s “isolation target m/z” value – prints the nativeID of the spectrum of the first precursor
      – prints the m/z value of the first selected ion of the first precursor
      – prints the charge state for the first selected ion of the first precursor
      – prints the spectrum type
      – prints the spectrum’s first scan’s start time, in seconds
      – prints the spectrum’s first scan’s start time, in minutes
      – prints the spectrum’s base peak m/z
      – prints the spectrum’s base peak intensity
      – prints the spectrum’s total ion current
      – prints the spectrum’s MS level
      For example, to create a TITLE line in msconvert MGF output with the “name.first_scan.last_scan.charge” style (eg. “mydata.145.145.2”), use –filter “titleMaker ...

发表回复

如果你有什么好的建议或者疑问请给我留言,谢谢!

Captcha Code