Introduction to msconvert

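Besides the open mzML, mzXML, and mzData formats, each vendor usually encodes mass spectrometry data in its own proprietary, closed format. To make data analysis easier, msconvert uses the vendor-provided APIs to convert these files into open formats.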

Input formats supported by msconvert:

Vendor	Formats	Vendor Required Software
ABI	T2D	DataExplorer 4.0
Agilent	MassHunter .d	distributed with ProteoWizard
Bruker	Compass .d, YEP, BAF, FID, TDF	distributed with ProteoWizard
Sciex	WIFF / WIFF2	distributed with ProteoWizard
Shimadzu	LCD (not fully supported)	distributed with ProteoWizard
Thermo Scientific	RAW	distributed with ProteoWizard
Waters	MassLynx .raw / UNIFI	distributed with ProteoWizard

msconvert can convert to the following formats:

mzML 1.1
mzML 1.0
mzXML
MGF
MS2/CMS2/BMS2
mzIdentML

Of these, mzXML and mzML are the formats we use most often.

Download:

https://sourceforge.net/projects/proteowizard/

https://github.com/ProteoWizard/pwiz

Alternative download:

https://pan.baidu.com/s/1fOa8c-9syk0ZbBZMvaZOIw (extraction code: tsw)

Docker is also an option: https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
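For Linux users, running the Docker image could look roughly like the sketch below. It assumes the image exposes msconvert through `wine msconvert`, as its Docker Hub description indicates; the `DOCKER` variable, the `convert_in_docker` function, and the `/data` mount point are illustrative names, not part of ProteoWizard.

```shell
#!/bin/sh
# Sketch: run msconvert inside the pwiz Docker image (the tools run under Wine).
# DOCKER, convert_in_docker and the /data mount point are illustrative choices.
DOCKER="${DOCKER:-docker}"
IMAGE="chambm/pwiz-skyline-i-agree-to-the-vendor-licenses"

convert_in_docker() {
    data_dir=$1   # absolute host directory that contains the raw file
    file=$2       # file name inside that directory
    # Mount the host directory at /data and write the output next to the input.
    "$DOCKER" run --rm -e WINEDEBUG=-all -v "$data_dir":/data "$IMAGE" \
        wine msconvert "/data/$file" -o /data
}

# Example call (uncomment to run):
# convert_in_docker "$PWD" data.RAW
```

Keeping the Docker binary in a variable makes it easy to swap in `podman` or to print the command before running it for real.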

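Installation notes:

Windows users: the installer requires Microsoft .NET Framework 4.0 or later. You also need the following Visual C++ redistributables (x86 or x64, depending on which installer you downloaded): 2008, 2010, 2012, 2013, 2015, 2017. The latest redistributable for each VC version is available from Microsoft. Different vendor DLLs depend on different versions of the Visual C++ redistributables, so installing all of them is recommended.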

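Examples:

1) msconvert
(prints the help message)

2) msconvert --help
(prints more detailed help, including the filter options)

3) msconvert data.RAW
(converts to data.mzML in the current directory)

4) msconvert data.RAW --mzXML
(converts to data.mzXML in the current directory)

5) msconvert *.RAW -o my_output_dir
(converts every file ending in .RAW to mzML and writes the results to my_output_dir)

6) msconvert data.RAW --zlib --filter "peakPicking true [1,2]"
(centroids MS levels 1 and 2 with the vendor algorithm and compresses the binary data with zlib; this is the command we use most often)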
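When a folder holds many files, converting them one at a time makes it easier to see which file failed. Below is a minimal sketch of that idea; `MSCONVERT` and `convert_dir` are hypothetical names, and the filter is the same one used in example 6, written with the `1-` range.

```shell
#!/bin/sh
# Sketch: convert every .RAW file in a directory one by one and report failures.
# MSCONVERT and convert_dir are illustrative names, not part of ProteoWizard.
MSCONVERT="${MSCONVERT:-msconvert}"

convert_dir() {
    in_dir=$1
    out_dir=$2
    failed=0
    for raw in "$in_dir"/*.RAW; do
        # Skip the unexpanded pattern when the directory has no .RAW files.
        [ -e "$raw" ] || continue
        if ! "$MSCONVERT" "$raw" --zlib --filter "peakPicking true 1-" -o "$out_dir"; then
            echo "conversion failed: $raw" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file converted.
    [ "$failed" -eq 0 ]
}

# Example call (uncomment to run):
# convert_dir ./raw_data ./mzml_out
```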

Alternatively, you can use the GUI bundled with ProteoWizard.

Note: when using peakPicking, keep it as the first filter in the list; otherwise the spectra will not be centroided!

https://ccms-ucsd.github.io/GNPSDocumentation/fileconversion
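To check whether centroiding actually happened, you can look at the converted mzML. The sketch below counts lines carrying the PSI-MS terms MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum). `count_spectrum_types` is a hypothetical helper, and it assumes an uncompressed-container mzML (not written with --gzip) with one cvParam per line. Treat the numbers as a rough check, because the file-level metadata may also mention these terms.

```shell
#!/bin/sh
# Sketch: rough check of centroid vs. profile spectra in an mzML file.
# A conversion with peakPicking as the FIRST filter could look like:
#   msconvert data.RAW --filter "peakPicking true 1-" --filter "zeroSamples removeExtra"
# count_spectrum_types is an illustrative helper, not an msconvert feature.
count_spectrum_types() {
    mzml=$1
    # grep -c counts matching lines; "|| true" keeps a zero count from
    # aborting scripts that run with "set -e".
    centroid=$(grep -c 'accession="MS:1000127"' "$mzml" || true)
    profile=$(grep -c 'accession="MS:1000128"' "$mzml" || true)
    printf 'centroid=%s profile=%s\n' "$centroid" "$profile"
}

# Example call (uncomment to run):
# count_spectrum_types data.mzML
```

If the profile count is high after a conversion that was meant to centroid, the filter order is the first thing to check.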

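Added 2022-04-13: notes on some parameters

Many readers have written in recently asking about settings. The official documentation already explains them in detail, so I have collected the relevant parts below (https://proteowizard.sourceforge.io/tools/msconvert.html):

General options: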

Options:
-f [ --filelist ] arg : specify text file containing filenames
-o [ --outdir ] arg (=.) : set output directory ('-' for stdout) [.]
-c [ --config ] arg : configuration file (optionName=value)
--outfile arg : Override the name of output file.
-e [ --ext ] arg : set extension for output files [mzML|mzXML|mgf|txt|mz5]
--mzML : write mzML format [default]
--mzXML : write mzXML format
--mz5 : write mz5 format
--mgf : write Mascot generic format
--text : write ProteoWizard internal text format
--ms1 : write MS1 format
--cms1 : write CMS1 format
--ms2 : write MS2 format
--cms2 : write CMS2 format
-v [ --verbose ] : display detailed progress information
--64 : set default binary encoding to 64-bit precision [default]
--32 : set default binary encoding to 32-bit precision
--mz64 : encode m/z values in 64-bit precision [default]
--mz32 : encode m/z values in 32-bit precision
--inten64 : encode intensity values in 64-bit precision
--inten32 : encode intensity values in 32-bit precision [default]
--noindex : do not write index
-i [ --contactInfo ] arg : filename for contact info
-z [ --zlib ] : use zlib compression for binary data
--numpressLinear [toler] : use numpress linear prediction lossy compression for binary mz and rt data (relative error guaranteed less than given tolerance, default is 2e-009)
--numpressPic : use numpress positive integer lossy compression for binary intensities (maximum 0.5 absolute error guaranteed)
--numpressSlof [toler] : use numpress short logged float lossy compression for binary intensities (relative error guaranteed less than given tolerance, default is 0.0002)
-n [ --numpressAll ] : same as --numpressLinear --numpressSlof (see https://github.com/fickludd/ms-numpress for more info)
--numpressLinearAbsTol : desired absolute tolerance for linear numpress prediction (e.g. use 1e-4 for a mass accuracy of 0.2 ppm at 500 m/z, default uses -1.0 for maximal accuracy). Note: setting this value may substantially reduce file size, this overrides relative accuracy tolerance.
Numpress may be used at the same time as zlib (-z) for best compression, though some older mzML parsers may not handle this properly.
-g [ --gzip ] : gzip entire output file (adds .gz to filename)
--filter arg : add a spectrum list filter
--merge : create a single output file from multiple input files by merging file-level metadata and concatenating spectrum lists
--simAsSpectra : write selected ion monitoring as spectra, not chromatograms
--srmAsSpectra : write selected reaction monitoring as spectra, not chromatograms
--combineIonMobilitySpectra : write all drift bins/scans in a frame/block as one spectrum instead of individual spectra
--acceptZeroLengthSpectra : some vendor readers have an efficient way of filtering out empty spectra, but it takes more time to open the file
--ignoreUnknownInstrumentError : if true, if an instrument cannot be determined from a vendor file, it will not be an error
--help : show this message, with extra detail on filter options
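The -c and -f options are handy when the same settings are reused across many runs. The sketch below writes a config file in the optionName=value form listed above, plus a file list with one input per line. The file names config.txt, files.txt, data1.RAW, and data2.RAW are made up for illustration, and the actual msconvert call is left commented out.

```shell
#!/bin/sh
# Sketch: drive msconvert from a config file (-c) and a file list (-f).
# All file names here are illustrative.

# One optionName=value pair per line; each "filter" line adds one more filter.
cat > config.txt <<'EOF'
mzXML=true
zlib=true
filter="index [3,7]"
filter="precursorRecalculation"
EOF

# One input file per line.
cat > files.txt <<'EOF'
data1.RAW
data2.RAW
EOF

# Then run (commented out so the sketch only creates the two text files):
# msconvert -f files.txt -c config.txt -o my_output_dir
```

Keeping the filters in a config file also sidesteps shell quoting problems with the brackets and spaces inside filter arguments.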

Pay particular attention to the --filter option, which accepts the following filters:

index <index_value_set>
msLevel <mslevels>
chargeState <charge_states>
precursorRecalculation
mzRefiner input1.pepXML input2.mzid [msLevels=<1->] [thresholdScore=<CV_Score_Name>] [thresholdValue=<floatset>] [thresholdStep=<float>] [maxSteps=<count>]
lockmassRefiner mz=<real> mzNegIons=<real (mz)> tol=<real (1.0 Daltons)>
precursorRefine
peakPicking [<PickerType> [snr=<minimum signal-to-noise ratio>] [peakSpace=<minimum peak spacing>] [msLevel=<ms_levels>]]
scanNumber <scan_numbers>
scanEvent <scan_event_set>
scanTime <scan_time_range>
sortByScanTime
stripIT
metadataFixer
titleMaker <format_string>
threshold <type> <threshold> <orientation> [<mslevels>]
mzWindow <mzrange>
mzPrecursors <precursor_mz_list>
defaultArrayLength <peak_count_range>
zeroSamples <mode> [<MS_levels>]
mzPresent <tolerance> <type> <threshold> <orientation> <mz_list> [<include_or_exclude>]
scanSumming [precursorTol=<precursor tolerance>] [scanTimeTol=<scan time tolerance>]
MS2Denoise [<peaks_in_window> [<window_width_Da> [multicharge_fragment_relaxation]]]
MS2Deisotope [hi_res [mzTol=<mzTol>]] [Poisson [minCharge=<minCharge>] [maxCharge=<maxCharge>]]
ETDFilter [<removePrecursor> [<removeChargeReduced> [<removeNeutralLoss> [<blanketRemoval> [<matchingTolerance> ]]]]]
chargeStatePredictor [overrideExistingCharge=<true|false (false)>] [maxMultipleCharge=<int (3)>] [minMultipleCharge=<int (2)>] [singleChargeFractionTIC=<real (0.9)>] [maxKnownCharge=<int (0)>] [makeMS2=<true|false (false)>]
turbocharger [minCharge=<minCharge>] [maxCharge=<maxCharge>] [precursorsBefore=<before>] [precursorsAfter=<after>] [halfIsoWidth=<half-width of isolation window>] [defaultMinCharge=<defaultMinCharge>] [defaultMaxCharge=<defaultMaxCharge>] [useVendorPeaks=<useVendorPeaks>]
activation <precursor_activation_type>
analyzer <analyzer>
analyzerType <analyzer>
polarity <polarity>

Examples:

# extract scan indices 5...10 and 20...25
msconvert data.RAW --filter "index [5,10] [20,25]"

# extract MS1 scans only
msconvert data.RAW --filter "msLevel 1"

# extract MS2 and MS3 scans only
msconvert data.RAW --filter "msLevel 2-3"

# extract MSn scans for n>1
msconvert data.RAW --filter "msLevel 2-"

# apply ETD precursor mass filter
msconvert data.RAW --filter ETDFilter

# remove non-flanking zero value samples
msconvert data.RAW --filter "zeroSamples removeExtra"

# remove non-flanking zero value samples in MS2 and MS3 only
msconvert data.RAW --filter "zeroSamples removeExtra 2 3"

# add missing zero value samples (with 5 flanking zeros) in MS2 and MS3 only
msconvert data.RAW --filter "zeroSamples addMissing=5 2 3"

# keep only HCD spectra from a decision tree data file
msconvert data.RAW --filter "activation HCD"

# keep the top 42 peaks or samples (depending on whether spectra are centroid or profile):
msconvert data.RAW --filter "threshold count 42 most-intense"

# multiple filters: select scan numbers and recalculate precursors
msconvert data.RAW --filter "scanNumber [500,1000]" --filter "precursorRecalculation"

# multiple filters: apply peak picking and then keep the bottom 100 peaks:
msconvert data.RAW --filter "peakPicking true 1-" --filter "threshold count 100 least-intense"

# multiple filters: apply peak picking and then keep all peaks that are at least 50% of the intensity of the base peak:
msconvert data.RAW --filter "peakPicking true 1-" --filter "threshold bpi-relative .5 most-intense"

For a detailed description of each filter, see: https://proteowizard.sourceforge.io/tools/filters.html



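4 comments

不将就
2022/11/9 at 13:42
Hello, why is there no titleMaker in my installation? I downloaded it from the official site.
- 陈浩
  2022/11/14 at 22:06
  • Post author
  It should always be there; it is set through the --filter option.
江上渔者
2022/10/14 at 17:20
Hello, I have a question. When I convert Agilent .d files to MGF with msconvert, the MS2 spectra are named by scan number, whereas MGF files exported directly from MassHunter name them by m/z. I would prefer to get m/z. How can I set this in msconvert?
- 陈浩
  2022/10/17 at 18:29
  • Post author
  The msconvert documentation covers this; see: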
  titleMaker: This filter adds or replaces spectrum titles according to the specified <format_string>. You can use it, for example, to customize the TITLE line in MGF output in msconvert. The following keywords are recognized:
  "<RunId>" – prints the spectrum's Run id, for example "Data.d" from "C:/Agilent/Data.d/AcqData/mspeak.bin"
  "<Index>" – prints the spectrum's index
  "<Id>" – prints the spectrum's nativeID
  "<SourcePath>" – prints the path of the spectrum's source data
  "<ScanNumber>" – if the nativeID can be represented as a single number, prints that number, else index+1
  "<ActivationType>" – for the first precursor, prints the spectrum's "dissociation method" value
  "<IsolationMz>" – for the first precursor, prints the spectrum's "isolation target m/z" value
  "<PrecursorSpectrumId>" – prints the nativeID of the spectrum of the first precursor
  "<SelectedIonMz>" – prints the m/z value of the first selected ion of the first precursor
  "<ChargeState>" – prints the charge state for the first selected ion of the first precursor
  "<SpectrumType>" – prints the spectrum type
  "<ScanStartTimeInSeconds>" – prints the spectrum's first scan's start time, in seconds
  "<ScanStartTimeInMinutes>" – prints the spectrum's first scan's start time, in minutes
  "<BasePeakMz>" – prints the spectrum's base peak m/z
  "<BasePeakIntensity>" – prints the spectrum's base peak intensity
  "<TotalIonCurrent>" – prints the spectrum's total ion current
  "<MsLevel>" – prints the spectrum's MS level
  For example, to create a TITLE line in msconvert MGF output with the "name.first_scan.last_scan.charge" style (e.g. "mydata.145.145.2"), use --filter "titleMaker ..."
