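Galaxy Project is an open-source, web-based bioinformatics analysis platform. An earlier article, 《开源生信(云)平台部署教程-Galaxy Project》 ("Open-Source Bioinformatics (Cloud) Platform Deployment Tutorial: Galaxy Project"), covered the deployment process. This article briefly shows how to quickly develop a bioinformatics tool for the Galaxy platform.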
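Creating a tool takes two main parts:
- the script or command that performs the task
- a configuration file (XML syntax; defines the inputs, outputs, parameters, and so on)
The simple example below illustrates both: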
1) Write a sample script, demo.pl
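#!/usr/bin/perl
# Usage: perl demo.pl <FASTA file> <output file>
# Writes the GC fraction of each sequence, one value per line.
use strict;
use warnings;

# Open the input and output files
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(my $out, '>', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
my ($gc, $length) = (0, 0);
while (<$in>) {
    chomp;    # chomp removes only a trailing newline; chop could eat a base
    if (m/^>/) {
        # New record: report the previous sequence, then reset the counters
        if ($. > 1) {
            printf $out "%.3f\n", $length ? $gc / $length : 0;
        }
        ($gc, $length) = (0, 0);
    } else {
        ++$gc while m/[gc]/ig;
        $length += length $_;
    }
}
# Report the last sequence (guarding against an empty one)
printf $out "%.3f\n", $length ? $gc / $length : 0;
close($in);
close($out);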
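To produce an expected-output file for the tool's tests, or simply to sanity-check the script, it helps to have an independent reference for the same calculation. The following is a minimal Python sketch of the per-sequence GC fraction, not part of the Galaxy tool itself; the function name gc_fractions and the two-record sample input are made up for illustration.

```python
import io


def gc_fractions(handle):
    """Yield (name, GC fraction) for each record in a FASTA stream."""
    name, gc, length = None, 0, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # New record: emit the previous one, then reset the counters.
            if name is not None:
                yield name, (gc / length if length else 0.0)
            name, gc, length = line[1:], 0, 0
        elif name is not None:
            gc += sum(line.upper().count(base) for base in "GC")
            length += len(line)
    if name is not None:
        yield name, (gc / length if length else 0.0)


# Hypothetical two-record input, used only for illustration.
sample = io.StringIO(">seq1\nGGCC\nAATT\n>seq2\nATGCAT\n")
results = [(name, f"{frac:.3f}") for name, frac in gc_fractions(sample)]
for name, frac in results:
    print(name, frac)
```

Here seq1 has 4 G/C bases out of 8 and seq2 has 2 out of 6, so the values printed are 0.500 and 0.333, formatted to three decimals as in demo.pl.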
2) Copy the script into Galaxy's tools directory
Create a myTools subdirectory under tools and place the script in it:
cd tools
mkdir myTools
cd myTools
3) Write the configuration file
With the script in place, create a configuration file named demo.xml in the same directory. It looks like this:
<tool id="fa_gc_content_1" name="Compute GC content" version="0.1.0">
  <description>for each sequence in a file</description>
  <command>perl '${__tool_directory__}/demo.pl' '$input' '$output'</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Source file"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>
  <tests>
    <test>
      <param name="input" value="fa_gc_content_input.fa"/>
      <output name="output" file="fa_gc_content_output.txt"/>
    </test>
  </tests>
  <help>
This tool computes GC content from a FASTA file.
  </help>
</tool>
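Here ${__tool_directory__} is a Galaxy variable that expands to the directory containing the tool's XML file (tools/myTools in this example), so the command can locate demo.pl next to demo.xml.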
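A common slip in a wrapper like this is a test whose output name does not match any name declared under outputs. The following is a small Python sketch that checks for this using only the standard library. It is an illustrative helper, not a Galaxy API: the function name undeclared_test_outputs and the trimmed wrapper (with a deliberately wrong name) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def undeclared_test_outputs(tool_xml):
    """Return test <output> names that no <data> element in <outputs> declares."""
    root = ET.fromstring(tool_xml.strip())
    declared = {d.get("name") for d in root.findall("./outputs/data")}
    tested = [o.get("name") for o in root.findall("./tests/test/output")]
    return [name for name in tested if name not in declared]


# Trimmed, hypothetical wrapper whose test deliberately names the wrong output.
WRAPPER = """
<tool id="example" name="Example" version="0.1.0">
  <command>perl demo.pl '$input' '$output'</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Source file"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output"/>
  </outputs>
  <tests>
    <test>
      <param name="input" value="in.fa"/>
      <output name="wrong_name" file="out.txt"/>
    </test>
  </tests>
</tool>
"""

mismatches = undeclared_test_outputs(WRAPPER)
print("Undeclared test outputs:", mismatches)
```

An empty list means every tested output is declared. The check covers only data outputs, not collections.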
The elements defined by the tool XML schema are listed below (see the official documentation for details):
tool
tool > description
tool > macros
tool > edam_topics
tool > edam_operations
tool > xrefs
tool > xrefs > xref
tool > requirements
tool > requirements > requirement
tool > requirements > container
tool > code
tool > stdio
tool > stdio > exit_code
tool > stdio > regex
tool > version_command
tool > command
tool > environment_variables
tool > environment_variables > environment_variable
tool > configfiles
tool > configfiles > configfile
tool > configfiles > inputs
tool > inputs
tool > inputs > section
tool > inputs > repeat
tool > inputs > conditional
tool > inputs > conditional > when
tool > inputs > param
tool > inputs > param > validator
tool > inputs > param > option
tool > inputs > param > conversion
tool > inputs > param > options
tool > inputs > param > options > column
tool > inputs > param > options > filter
tool > inputs > param > sanitizer
tool > inputs > param > sanitizer > valid
tool > inputs > param > sanitizer > valid > add
tool > inputs > param > sanitizer > valid > remove
tool > inputs > param > sanitizer > mapping
tool > inputs > param > sanitizer > mapping > add
tool > inputs > param > sanitizer > mapping > remove
tool > request_param_translation
tool > request_param_translation > request_param
tool > request_param_translation > request_param > append_param
tool > request_param_translation > request_param > append_param > value
tool > request_param_translation > request_param > value_translation
tool > request_param_translation > request_param > value_translation > value
tool > outputs
tool > outputs > data
tool > outputs > data > filter
tool > outputs > data > change_format
tool > outputs > data > change_format > when
tool > outputs > data > actions
tool > outputs > data > actions > conditional
tool > outputs > data > actions > conditional > when
tool > outputs > data > actions > action
tool > outputs > data > discover_datasets
tool > outputs > collection
tool > outputs > collection > filter
tool > outputs > collection > discover_datasets
tool > tests
tool > tests > test
tool > tests > test > param
tool > tests > test > param > collection
tool > tests > test > repeat
tool > tests > test > section
tool > tests > test > conditional
tool > tests > test > output
tool > tests > test > output > discover_dataset
tool > tests > test > output > metadata
tool > tests > test > output > assert_contents
tool > tests > test > output_collection
tool > tests > test > assert_command
tool > tests > test > assert_stdout
tool > tests > test > assert_stderr
tool > help
tool > citations
tool > citations > citation
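4) Register the tool
Steps 1 to 3 gave us a simple tool. In step 4 we tell Galaxy about it by editing tool_conf.xml in the config directory (if that file does not exist yet, it can be created by copying tool_conf.xml.sample) and adding the following: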
<section name="MyTools" id="mTools">
<tool file="myTools/demo.xml" />
</section>
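If you register tools often, the same edit can be scripted. The following is a minimal Python sketch that adds a section and tool entry to a toolbox document in memory. The function name register_tool and the tiny stand-in toolbox are hypothetical, and the sketch assumes the root element of tool_conf.xml is toolbox.

```python
import xml.etree.ElementTree as ET


def register_tool(toolbox_xml, section_id, section_name, tool_file):
    """Return toolbox XML with tool_file listed under the given section."""
    root = ET.fromstring(toolbox_xml.strip())
    section = root.find(f"./section[@id='{section_id}']")
    if section is None:
        # Create the section if it is not there yet.
        section = ET.SubElement(
            root, "section", {"name": section_name, "id": section_id}
        )
    # Add the tool only once, so repeated runs do not duplicate the entry.
    if not any(t.get("file") == tool_file for t in section.findall("tool")):
        ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


# Illustrative stand-in for config/tool_conf.xml, not the real file.
MINIMAL_TOOLBOX = """
<toolbox>
  <section id="getext" name="Get Data">
    <tool file="data_source/upload.xml"/>
  </section>
</toolbox>
"""

updated = register_tool(MINIMAL_TOOLBOX, "mTools", "MyTools", "myTools/demo.xml")
print(updated)
```

Note that ElementTree's default parser drops XML comments, so for a heavily commented configuration file a manual edit may be the safer choice.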
5) Start or restart Galaxy; the custom tool is then loaded and appears in the tool panel
sh run.sh
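Overall, the barrier to developing for the Galaxy platform is low for bioinformaticians, and it is quick to pick up. As a well-established platform, it is a good one for newcomers to bioinformatics to study and learn from.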
References:
1. https://docs.galaxyproject.org/en/master/dev/schema.html
2. https://galaxyproject.org/admin/tools/add-tool-tutorial/
3. https://github.com/galaxyproject/galaxy