rMATS是一款对RNA-Seq数据进行差异可变剪切分析的软件。rMATS可识别的可变剪切事件有5种:
- skipped exon (SE)外显子跳跃
- alternative 5′ splice site (A5SS)第一个外显子可变剪切
- alternative 3′ splice site (A3SS)最后一个外显子可变剪切
- mutually exclusive exons (MXE)外显子选择性跳跃
- retained intron (RI)内含子滞留
目前最新版是rMATS 4.1.1,安装方式如下:
一些依赖环境:
- Python (3.6.12 or 2.7.15)
- Cython (0.29.21 or 0.29.15 for Python 2)
- numpy (1.16.6 or 1.16.5 for Python 2)
- BLAS, LAPACK
- GNU Scientific Library (GSL 2.5)
- GCC (>=5.4.0)
- gfortran (Fortran 77)
- CMake (3.15.4)
- PAIRADISE (可选)
- Samtools (建议安装,如果需要从原始fq文件开始分析)
- STAR (建议安装,如果需要从原始fq文件开始分析)
下载源代码(https://github.com/Xinglab/rmats-turbo/releases),然后进行编译:
./build_rmats
编译完成后可以用python rmats.py {arguments}
python rmats.py -h
usage: rmats.py [options]
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--gtf GTF An annotation of genes and transcripts in GTF format
--b1 B1 A text file containing a comma separated list of the
BAM files for sample_1. (Only if using BAM)
--b2 B2 A text file containing a comma separated list of the
BAM files for sample_2. (Only if using BAM)
--s1 S1 A text file containing a comma separated list of the
FASTQ files for sample_1. If using paired reads the
format is ":" to separate pairs and "," to separate
replicates. (Only if using fastq)
--s2 S2 A text file containing a comma separated list of the
FASTQ files for sample_2. If using paired reads the
format is ":" to separate pairs and "," to separate
replicates. (Only if using fastq)
--od OD The directory for final output from the post step
--tmp TMP The directory for intermediate output such as ".rmats"
files from the prep step
-t {paired,single} Type of read used in the analysis: either "paired" for
paired-end data or "single" for single-end data.
Default: paired
--libType {fr-unstranded,fr-firststrand,fr-secondstrand}
Library type. Use fr-firststrand or fr-secondstrand
for strand-specific data. Default: fr-unstranded
--readLength READLENGTH
The length of each read
--variable-read-length
Allow reads with lengths that differ from --readLength
to be processed. --readLength will still be used to
determine IncFormLen and SkipFormLen
--anchorLength ANCHORLENGTH
The anchor length. Default is 1
--tophatAnchor TOPHATANCHOR
The "anchor length" or "overhang length" used in the
aligner. At least "anchor length" NT must be mapped to
each end of a given junction. The default is 6. (Only
if using fastq)
--bi BINDEX The directory name of the STAR binary indices (name of
the directory that contains the SA file). (Only if
using fastq)
--nthread NTHREAD The number of threads. The optimal number of threads
should be equal to the number of CPU cores. Default: 1
--tstat TSTAT The number of threads for the statistical model. If
not set then the value of --nthread is used
--cstat CSTAT The cutoff splicing difference. The cutoff used in the
null hypothesis test for differential splicing. The
default is 0.0001 for 0.01% difference. Valid: 0 <=
cutoff < 1. Does not apply to the paired stats model
--task {prep,post,both,inte,stat}
Specify which step(s) of rMATS to run. Default: both.
prep: preprocess BAMs and generate a .rmats file.
post: load .rmats file(s) into memory, detect and
count alternative splicing events, and calculate P
value (if not --statoff). both: prep + post. inte
(integrity): check that the BAM filenames recorded by
the prep task(s) match the BAM filenames for the
current command line. stat: run statistical test on
existing output files
--statoff Skip the statistical analysis
--paired-stats Use the paired stats model
--novelSS Enable detection of novel splice sites (unannotated
splice sites). Default is no detection of novel splice
sites
--mil MIL Minimum Intron Length. Only impacts --novelSS
behavior. Default: 50
--mel MEL Maximum Exon Length. Only impacts --novelSS behavior.
Default: 500
--allow-clipping Allow alignments with soft or hard clipping to be used
--fixed-event-set FIXED_EVENT_SET
A directory containing fromGTF.[AS].txt files to be
used instead of detecting a new set of events
另外也可以用docker安装该应用:docker pull xinglab/rmats:v4.1.1
1)从fastq文件开始分析:
假设有如下数据:
- group 1 FASTQ文件
/path/to/1_1.R1.fastq
/path/to/1_1.R2.fastq
/path/to/1_2.R1.fastq
/path/to/1_2.R2.fastq
- group 2 FASTQ文件
/path/to/2_1.R1.fastq
/path/to/2_1.R2.fastq
/path/to/2_2.R1.fastq
/path/to/2_2.R2.fastq
将上述文件列表整理到文本中,用:
来分割双端测序文件, 用,
来分开生物/技术重复:
如:
/path/to/s1.txt
内容如下:
/path/to/1_1.R1.fastq:/path/to/1_1.R2.fastq,/path/to/1_2.R1.fastq:/path/to/1_2.R2.fastq
/path/to/s2.txt
内容如下:
/path/to/2_1.R1.fastq:/path/to/2_1.R2.fastq,/path/to/2_2.R1.fastq:/path/to/2_2.R2.fastq
然后执行:
# 执行命令
python rmats.py
--s1 /path/to/s1.txt # 输入sample1的txt格式的文件
--s2 /path/to/s2.txt
--gtf /path/to/the.gtf # gtf文件
--bi /path/to/STAR_binary_index # STAR索引文件
-t paired # 双端测序则paired或单端测序single
--readLength 50 # 测序读长
--nthread 4 # 分析线程数
--od /path/to/output # 输出文件的路径
--tmp /path/to/tmp_output # 缓存文件路径
2)从bam文件开始分析(推荐用STAR比对后的bam文件):
假设有如下数据:
- group 1 BAM文件
/path/to/1_1.bam
/path/to/1_2.bam
- group 2 BAM文件
/path/to/2_1.bam
/path/to/2_2.bam
将上述文件整理到文本中,用,
分割:
/path/to/b1.txt
/path/to/1_1.bam,/path/to/1_2.bam
/path/to/b2.txt
/path/to/2_1.bam,/path/to/2_2.bam
然后执行命令:
# 执行命令
python rmats.py
--s1 /path/to/s1.txt # 输入sample1的txt格式的文件
--s2 /path/to/s2.txt
--gtf /path/to/the.gtf # gtf文件
-t paired # 双端测序则paired或单端测序single
--readLength 50 # 测序读长
--nthread 4 # 分析线程数
--od /path/to/output # 输出文件的路径
--tmp /path/to/tmp_output # 缓存文件路径
rMATS的结果文件是以各个可变剪切事件分别输出的,主要由[AS_Event].MATS.JC.txt,[AS_Event].MATS.JCEC.txt,fromGTF.[AS_Event].txt,JC.raw.input.[AS_Event].txt,JCEC.raw.input.[AS_Event].txt这几类,详细介绍以及文件内容介绍见:https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/README.md#output
分析完成后,我们可以用rmats2sashimiplot(https://github.com/Xinglab/rmats2sashimiplot)进行数据的可视化,如我们对上述的结果文件进行可视化:
python rmats2sashimiplot.py
--s1 ./rmats2sashimiplot_test_data/sample_1_replicate_1.sam,./rmats2sashimiplot_test_data/sample_1_replicate_2.sam,./rmats2sashimiplot_test_data/sample_1_replicate_3.sam
--s2 ./rmats2sashimiplot_test_data/sample_2_replicate_1.sam,./rmats2sashimiplot_test_data/sample_2_replicate_2.sam,./rmats2sashimiplot_test_data/sample_2_replicate_3.sam
-t SE
-e ./rmats2sashimiplot_test_data/SE.MATS.JC.txt
--l1 SampleOne
--l2 SampleTwo
--exon_s 1
--intron_s 5
-o test_events_output
参考资料:
1.http://rnaseq-mats.sourceforge.net/
2.https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/README.md