Omics - Hunter

An introduction to msconvert

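Besides the open mzML, mzXML, and mzData formats, each vendor usually encodes mass spectrometry data in its own proprietary, closed format. To make downstream analysis easier, msconvert uses the vendor-supplied APIs to convert these files into open formats.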

Input formats msconvert can convert:

Vendor | Formats | Vendor required software
ABI | T2D | DataExplorer 4.0
Agilent | MassHunter .d | distributed with ProteoWizard
Bruker | Compass .d, YEP, BAF, FID, TDF | distributed with ProteoWizard
Sciex | WIFF / WIFF2 | distributed with ProteoWizard
Shimadzu | LCD (not fully supported) | distributed with ProteoWizard
Thermo Scientific | RAW | distributed with ProteoWizard
Waters | MassLynx .raw / UNIFI | distributed with ProteoWizard

msconvert can write the following formats:

mzML 1.1
mzML 1.0
mzXML
MGF
MS2/CMS2/BMS2
mzIdentML

Of these, mzXML and mzML are the ones we use most often.
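Because mzML is plain XML, a converted file can be sanity-checked with nothing but the Python standard library. The sketch below counts spectra per MS level by looking for the PSI-MS term MS:1000511 ("ms level") directly under each spectrum element. The function name `count_ms_levels` and the stripped-down sample document are illustrative assumptions, and files that store the MS level in a referenced parameter group would need extra handling.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

MZML_NS = "{http://psi.hupso.org/ms/mzml}"   # XML namespace used by mzML
MS_LEVEL_ACCESSION = "MS:1000511"            # PSI-MS controlled-vocabulary term "ms level"


def count_ms_levels(source):
    """Count spectra per MS level in an mzML file (path or binary file object)."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MZML_NS + "spectrum":
            # Only look at cvParams that are direct children of <spectrum>.
            for cv in elem.findall(MZML_NS + "cvParam"):
                if cv.get("accession") == MS_LEVEL_ACCESSION:
                    counts[int(cv.get("value"))] += 1
                    break
            elem.clear()  # release the spectrum's children to keep memory low
    return counts


# A tiny, incomplete mzML fragment used only to demonstrate the function.
sample = b"""<mzML xmlns="http://psi.hupso.org/ms/mzml"><run><spectrumList>
<spectrum index="0" id="scan=1"><cvParam accession="MS:1000511" name="ms level" value="1"/></spectrum>
<spectrum index="1" id="scan=2"><cvParam accession="MS:1000511" name="ms level" value="2"/></spectrum>
<spectrum index="2" id="scan=3"><cvParam accession="MS:1000511" name="ms level" value="2"/></spectrum>
</spectrumList></run></mzML>"""

print(dict(count_ms_levels(io.BytesIO(sample))))  # {1: 1, 2: 2}
```

For a real file, pass its path instead, for example `count_ms_levels("data.mzML")`. A result with no entry for level 1 usually points at the filters used during conversion.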

Download

https://sourceforge.net/projects/proteowizard/

https://github.com/ProteoWizard/pwiz

Alternative download

https://pan.baidu.com/s/1fOa8c-9syk0ZbBZMvaZOIw (extraction code: tsw)

Docker is also an option: https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

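Installation notes:

Windows users: if you use the installer, the only requirement is Microsoft .NET Framework 4.0 or higher. To use the binary tarball distribution, you must also have the Visual C++ redistributables (x86 or x64, depending on the package you downloaded) for these VC versions: 2008, 2010, 2012, 2013, 2015, 2017. The official instructions link to the most recent redistributable for each VC version. Different vendor DLLs depend on different versions of the Visual C++ redistributables, so I recommend installing all of them.

Examples: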

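1) msconvert
(prints usage information)

2) msconvert --help
(prints the more detailed help text)

3) msconvert data.RAW
(converts data.RAW to data.mzML in the current directory)

4) msconvert data.RAW --mzXML
(converts data.RAW to data.mzXML in the current directory)

5) msconvert *.RAW -o my_output_dir
(converts every file ending in .RAW to mzML and writes the results to my_output_dir)

6) msconvert data.RAW --zlib --filter "peakPicking true [1,2]"
(centroids MS levels 1 and 2 with the vendor algorithm and compresses the binary data with zlib; this is a commonly used command)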
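When the same conversion has to be scripted, it can help to assemble the argument list in code rather than by string concatenation. The following is a minimal sketch that only builds and prints the command from example 6 combined with the `-o` option from example 5. The helper name `build_msconvert_cmd` and its parameters are hypothetical, and nothing is executed.

```python
import shlex


def build_msconvert_cmd(input_file, out_dir=".", fmt="mzML", zlib=False, filters=()):
    """Return the msconvert argument list for one input file (nothing is run)."""
    cmd = ["msconvert", str(input_file), "-o", str(out_dir), f"--{fmt}"]
    if zlib:
        cmd.append("--zlib")
    for spec in filters:
        # Each filter gets its own --filter flag; the given order is preserved.
        cmd += ["--filter", spec]
    return cmd


cmd = build_msconvert_cmd(
    "data.RAW",
    out_dir="my_output_dir",
    zlib=True,
    filters=["peakPicking true [1,2]"],
)

# shlex.join uses POSIX-style quoting and is meant for display only.
print(shlex.join(cmd))

# To actually run the conversion (requires msconvert on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` avoids shell quoting problems with filter strings that contain spaces or brackets.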

Alternatively, you can use the GUI that ships with ProteoWizard:

Note: when you use peakPicking, keep it as the first filter in the list; otherwise the spectra will not be centroided!

https://ccms-ucsd.github.io/GNPSDocumentation/fileconversion
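The ordering rule in the note above is easy to enforce in a script. This is a small sketch, with the hypothetical helper name `peak_picking_first`, that moves any peakPicking filter to the front of a filter list and leaves the remaining filters in their original order.

```python
def peak_picking_first(filters):
    """Return a new list with peakPicking filters first, other filters in original order."""
    def is_peak_picking(spec):
        # The filter name is the first whitespace-separated token of the spec.
        return spec.split(maxsplit=1)[:1] == ["peakPicking"]

    picking = [spec for spec in filters if is_peak_picking(spec)]
    others = [spec for spec in filters if not is_peak_picking(spec)]
    return picking + others


ordered = peak_picking_first(["msLevel 1-2", "peakPicking true 1-", "zeroSamples removeExtra"])
print(ordered)
# ['peakPicking true 1-', 'msLevel 1-2', 'zeroSamples removeExtra']
```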

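Added 2022-04-13: notes on selected parameters

I have recently received many messages asking about settings. The official documentation already explains them in detail, so I have collected the relevant parts below (https://proteowizard.sourceforge.io/tools/msconvert.html):

General options: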

Options:
  -f [ --filelist ] arg : specify text file containing filenames
  -o [ --outdir ] arg (=.) : set output directory ('-' for stdout) [.]
  -c [ --config ] arg : configuration file (optionName=value)
  --outfile arg : Override the name of output file.
  -e [ --ext ] arg : set extension for output files [mzML|mzXML|mgf|txt|mz5]
  --mzML : write mzML format [default]
  --mzXML : write mzXML format
  --mz5 : write mz5 format
  --mgf : write Mascot generic format
  --text : write ProteoWizard internal text format
  --ms1 : write MS1 format
  --cms1 : write CMS1 format
  --ms2 : write MS2 format
  --cms2 : write CMS2 format
  -v [ --verbose ] : display detailed progress information
  --64 : set default binary encoding to 64-bit precision [default]
  --32 : set default binary encoding to 32-bit precision
  --mz64 : encode m/z values in 64-bit precision [default]
  --mz32 : encode m/z values in 32-bit precision
  --inten64 : encode intensity values in 64-bit precision
  --inten32 : encode intensity values in 32-bit precision [default]
  --noindex : do not write index
  -i [ --contactInfo ] arg : filename for contact info
  -z [ --zlib ] : use zlib compression for binary data
  --numpressLinear [toler] : use numpress linear prediction lossy compression for binary mz and rt data (relative error guaranteed less than given tolerance, default is 2e-009)
  --numpressPic : use numpress positive integer lossy compression for binary intensities (maximum 0.5 absolute error guaranteed)
  --numpressSlof [toler] : use numpress short logged float lossy compression for binary intensities (relative error guaranteed less than given tolerance, default is 0.0002)
  -n [ --numpressAll ] : same as --numpressLinear --numpressSlof (see https://github.com/fickludd/ms-numpress for more info)
  --numpressLinearAbsTol : desired absolute tolerance for linear numpress prediction (e.g. use 1e-4 for a mass accuracy of 0.2 ppm at 500 m/z, default uses -1.0 for maximal accuracy). Note: setting this value may substantially reduce file size, this overrides relative accuracy tolerance.
  Numpress may be used at the same time as zlib (-z) for best compression, though some older mzML parsers may not handle this properly.
  -g [ --gzip ] : gzip entire output file (adds .gz to filename)
  --filter arg : add a spectrum list filter
  --merge : create a single output file from multiple input files by merging file-level metadata and concatenating spectrum lists
  --simAsSpectra : write selected ion monitoring as spectra, not chromatograms
  --srmAsSpectra : write selected reaction monitoring as spectra, not chromatograms
  --combineIonMobilitySpectra : write all drift bins/scans in a frame/block as one spectrum instead of individual spectra
  --acceptZeroLengthSpectra : some vendor readers have an efficient way of filtering out empty spectra, but it takes more time to open the file
  --ignoreUnknownInstrumentError : if true, if an instrument cannot be determined from a vendor file, it will not be an error
  --help : show this message, with extra detail on filter options
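The `-c` option above takes a configuration file made of `optionName=value` lines, which is convenient when the same settings are reused across many runs. The sketch below renders such a file from a dictionary. The helper name `render_config` is hypothetical, and wrapping each filter value in double quotes is an assumption about the expected syntax, so check the generated file against the official documentation before relying on it.

```python
def render_config(options, filters=()):
    """Render msconvert options as optionName=value lines, one per line."""
    lines = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Boolean switches are written as lowercase true/false.
            value = "true" if value else "false"
        lines.append(f"{name}={value}")
    for spec in filters:
        # Assumption: filter values are double-quoted, one filter per line.
        lines.append(f'filter="{spec}"')
    return "\n".join(lines) + "\n"


text = render_config(
    {"mzXML": True, "zlib": True},
    filters=["peakPicking true 1-", "msLevel 1-2"],
)
print(text, end="")

# To use it, save the text and pass the file with -c:
#   from pathlib import Path
#   Path("config.txt").write_text(text)
#   msconvert data.RAW -c config.txt
```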

Pay particular attention to the --filter option, which accepts the following filters:

index <index_value_set>
msLevel <mslevels>
chargeState <charge_states>
precursorRecalculation
mzRefiner input1.pepXML input2.mzid [msLevels=<1->] [thresholdScore=<CV_Score_Name>] [thresholdValue=<floatset>] [thresholdStep=<float>] [maxSteps=<count>]
lockmassRefiner mz=<real> mzNegIons=<real (mz)> tol=<real (1.0 Daltons)>
precursorRefine
peakPicking [<PickerType> [snr=<minimum signal-to-noise ratio>] [peakSpace=<minimum peak spacing>] [msLevel=<ms_levels>]]
scanNumber <scan_numbers>
scanEvent <scan_event_set>
scanTime <scan_time_range>
sortByScanTime
stripIT
metadataFixer
titleMaker <format_string>
threshold <type> <threshold> <orientation> [<mslevels>]
mzWindow <mzrange>
mzPrecursors <precursor_mz_list>
defaultArrayLength <peak_count_range>
zeroSamples <mode> [<MS_levels>]
mzPresent <tolerance> <type> <threshold> <orientation> <mz_list> [<include_or_exclude>]
scanSumming [precursorTol=<precursor tolerance>] [scanTimeTol=<scan time tolerance>]
MS2Denoise [<peaks_in_window> [<window_width_Da> [multicharge_fragment_relaxation]]]
MS2Deisotope [hi_res [mzTol=<mzTol>]] [Poisson [minCharge=<minCharge>] [maxCharge=<maxCharge>]]
ETDFilter [<removePrecursor> [<removeChargeReduced> [<removeNeutralLoss> [<blanketRemoval> [<matchingTolerance> ]]]]]
chargeStatePredictor [overrideExistingCharge=<true|false (false)>] [maxMultipleCharge=<int (3)>] [minMultipleCharge=<int (2)>] [singleChargeFractionTIC=<real (0.9)>] [maxKnownCharge=<int (0)>] [makeMS2=<true|false (false)>]
turbocharger [minCharge=<minCharge>] [maxCharge=<maxCharge>] [precursorsBefore=<before>] [precursorsAfter=<after>] [halfIsoWidth=<half-width of isolation window>] [defaultMinCharge=<defaultMinCharge>] [defaultMaxCharge=<defaultMaxCharge>] [useVendorPeaks=<useVendorPeaks>]
activation <precursor_activation_type>
analyzer <analyzer>
analyzerType <analyzer>
polarity <polarity>

Examples:

# extract scan indices 5...10 and 20...25
msconvert data.RAW --filter "index [5,10] [20,25]"

# extract MS1 scans only
msconvert data.RAW --filter "msLevel 1"

# extract MS2 and MS3 scans only
msconvert data.RAW --filter "msLevel 2-3"

# extract MSn scans for n>1
msconvert data.RAW --filter "msLevel 2-"

# apply ETD precursor mass filter
msconvert data.RAW --filter ETDFilter

# remove non-flanking zero value samples
msconvert data.RAW --filter "zeroSamples removeExtra"

# remove non-flanking zero value samples in MS2 and MS3 only
msconvert data.RAW --filter "zeroSamples removeExtra 2 3"

# add missing zero value samples (with 5 flanking zeros) in MS2 and MS3 only
msconvert data.RAW --filter "zeroSamples addMissing=5 2 3"

# keep only HCD spectra from a decision tree data file
msconvert data.RAW --filter "activation HCD"

# keep the top 42 peaks or samples (depending on whether spectra are centroid or profile):
msconvert data.RAW --filter "threshold count 42 most-intense"

# multiple filters: select scan numbers and recalculate precursors
msconvert data.RAW --filter "scanNumber [500,1000]" --filter "precursorRecalculation"

# multiple filters: apply peak picking and then keep the bottom 100 peaks:
msconvert data.RAW --filter "peakPicking true 1-" --filter "threshold count 100 least-intense"

# multiple filters: apply peak picking and then keep all peaks that are at least 50% of the intensity of the base peak:
msconvert data.RAW --filter "peakPicking true 1-" --filter "threshold bpi-relative .5 most-intense"
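To make the two threshold examples concrete, here is a simplified model of what "threshold count N most-intense" and "threshold bpi-relative F most-intense" select from a single spectrum. It is an illustration only and not ProteoWizard's implementation: the function names are hypothetical, ties are broken arbitrarily, and the bpi-relative cut-off is treated as inclusive.

```python
def threshold_count(peaks, n):
    """Keep the n most intense (mz, intensity) peaks, returned in m/z order."""
    top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:n]
    return sorted(top)


def threshold_bpi_relative(peaks, fraction):
    """Keep peaks whose intensity is at least `fraction` of the base peak intensity."""
    if not peaks:
        return []
    base_peak_intensity = max(intensity for _mz, intensity in peaks)
    cutoff = fraction * base_peak_intensity
    return [peak for peak in peaks if peak[1] >= cutoff]


# Made-up centroided spectrum: (m/z, intensity) pairs.
peaks = [(100.0, 10.0), (150.0, 80.0), (200.0, 40.0), (250.0, 5.0)]

print(threshold_count(peaks, 2))            # [(150.0, 80.0), (200.0, 40.0)]
print(threshold_bpi_relative(peaks, 0.5))   # [(150.0, 80.0), (200.0, 40.0)]
```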

For a detailed description of each filter, see: https://proteowizard.sourceforge.io/tools/filters.html

References:

1. http://proteowizard.sourceforge.net


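Author: 陈浩

Copyright: the copyright of this article belongs to the author.

Disclaimer: some images used in this article come from the internet or from the references. In case of infringement, please contact the author at chenhao__@__evvail.com (remove the underscores before sending) to have them removed.

Reposting: unless otherwise stated, all content on this site is the author's original work; reposts must cite this article with a link.

Permalink: https://evvail.com/2019/12/19/293.html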

26 comments

  1. Hello, can msconvert convert MGF to mzML, and which kind of MGF file does it support?

    • 陈浩

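      2022/9/20 at 11:42

      MGF is already an open format. You can use msconvert to convert raw mass spectrometry data to MGF, mzXML, and so on, but msconvert cannot convert MGF directly to mzXML.

  2. msconvert cannot convert Waters raw data to mzXML.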

    • 陈浩

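      2022/9/10 at 9:23

      What format is the data, exactly?
      msconvert supports the MassLynx .raw / UNIFI data produced by Waters instruments. Please check that your msconvert installation is complete and that Microsoft .NET Framework 4.0 is installed correctly.
      Official documentation: https://proteowizard.sourceforge.io/doc_users.html

  3. Hello, after converting an Agilent .d file to mzML with MSConvert, there are no MS1 spectra in the output. How can I fix this?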

    • 陈浩

      2022/4/13 at 13:23

      Hello, what conversion command or settings are you using?
      My guess is that the --filter option is set incorrectly.

    • 陈浩

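      2022/4/13 at 13:29

      For example:
      # extract MS1
      msconvert data.RAW --filter "msLevel 1"

      # extract MS2-3
      msconvert data.RAW --filter "msLevel 2-3"

  4. Converting a .lcd file exported from a Shimadzu instrument to .mzXML keeps failing. What should I do?

  5. Is there still an installation link for msconvert?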

    • 陈浩

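      2022/3/6 at 14:55

      You can download the latest installer from https://proteowizard.sourceforge.io/download.html.

  6. Hello, are the names correct in the MGF files you export? In my exported files the "title=…" entries all look wrong, and I cannot find the retention time of the peaks or the MS/MS information.

  7. Is there a Mac version of msconvert?

  8. Hello, apart from the software downloaded from the website, do I need to install anything else? After downloading it from the official site, I get an error when I open it.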

    • 陈浩

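      2021/2/28 at 23:14

      What does the error message say?
      This software requires .NET Framework 3.5 SP1 and .NET Framework 4.7.2 (see the .NET installation guide). You also need the Visual C++ redistributables (x86 and x64, available from Microsoft's official download page), versions: 2008, 2012, 2013, 2015, 2017.

    • Thanks for the reply! The error message is gone and the software opens, but during conversion it crashes before the file has finished converting.
      1. The .NET Framework 3.5 SP1 download did not bring up an installation prompt. Does that affect the software?
      2. After downloading .NET Framework 4.7.2, the computer reports that a newer version is already installed. Does that affect the software?
      3. I have not installed C++ yet; I will try installing it first.
      Thanks again for the reply.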

    • 陈浩

      2021/3/4 at 23:27

      That should not be a problem.

    • Hello, after installing C++ the software still crashes. I look forward to your reply, thanks.

    • 陈浩

      2022/3/11 at 22:39

      The C++ libraries are probably not installed correctly. This is what the official FAQ says:
      Windows Users: If you use the installer, then the only requirement is having Microsoft .NET Framework 4.0 or higher installed. To use the binary tarball distribution however, you must also have Visual C++ redistributables (for either x86 or x64 depending on which tarball you download) for these VC versions: 2008, 2010, 2012, 2013, 2015, 2017. This page links to the most recent redistributable for each VC version (NB: only get the redistributables, not the service packs). These are required because different vendor DLLs depend on different versions of VC.

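  9. Hello, what should I do if msconvert cannot be downloaded?

  10. Hello, the options section of the msconvert GUI is not displayed completely. What is going on, and how can I fix it? Your screenshot shows the same problem.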

    • 陈浩

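      2020/11/18 at 23:15

      Hello, on today's high-resolution screens this GUI does indeed fail to display completely; I recommend using the command line for analysis.
      (If you really need the GUI, try changing the display scaling in the Windows settings; this generally does not happen at lower resolutions.)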
