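Galaxy Project is an open-source, web-based bioinformatics analysis platform. A previous article, "Open-Source Bioinformatics (Cloud) Platform Deployment Tutorial: Galaxy Project", walked through deployment. This article briefly shows how to quickly develop bioinformatics tools for the Galaxy platform.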

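Creating a tool mainly requires two parts:

  • The script or command that performs the task
  • A configuration file (XML) that defines the inputs, outputs, parameters, and so on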

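The following simple example walks through the process:

1) Write an example script, demo.pl, which reports the GC content of each sequence in a FASTA file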


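#!/usr/bin/perl
# Usage: perl demo.pl <FASTA file> <output file>
# Writes the GC fraction of each sequence, one value per line.
use strict;
use warnings;

die "Usage: perl demo.pl <FASTA file> <output file>\n" unless @ARGV == 2;

# Open input and output, stopping with an error message if either fails
open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";
open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!\n";

my ($gc, $length, $seen_header) = (0, 0, 0);

# Print the GC fraction of the record just finished (0.000 for an empty sequence)
sub report {
    printf {$out} "%.3f\n", $length ? $gc / $length : 0;
}

while (my $line = <$in>) {
    chomp $line;                       # chop would also cut a base off an unterminated last line
    if ($line =~ /^>/) {               # header: a new record starts
        report() if $seen_header;      # flush the previous record
        $seen_header = 1;
        ($gc, $length) = (0, 0);
    } else {
        ++$gc while $line =~ /[gc]/ig; # count G and C, case-insensitively
        $length += length $line;
    }
}
report() if $seen_header;              # flush the last record

close($in);
close($out);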
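To check what the script should print (for example, when preparing the expected output file used by the tool test), the same per-record logic can be mirrored in Python. This is a minimal sketch, not part of Galaxy; the function names `gc_fractions` and `format_output` are hypothetical, and it assumes every record starts with a `>` header line.

```python
"""Python sketch of the GC-content logic in demo.pl (illustrative only)."""
import io
from typing import Iterable, List


def gc_fractions(lines: Iterable[str]) -> List[float]:
    """Return the G+C fraction of each FASTA record, in file order."""
    results: List[float] = []
    gc = length = 0
    seen_header = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Like demo.pl, a header line closes the previous record.
            if seen_header:
                results.append(gc / length if length else 0.0)
            seen_header = True
            gc = length = 0
        else:
            # Count G/C case-insensitively, like m/[gc]/ig in Perl.
            gc += sum(1 for base in line.upper() if base in "GC")
            length += len(line)
    if seen_header:
        # Flush the last record.
        results.append(gc / length if length else 0.0)
    return results


def format_output(fractions: List[float]) -> str:
    """One value per line with three decimals, like sprintf("%.3f")."""
    return "".join(f"{value:.3f}\n" for value in fractions)


if __name__ == "__main__":
    fasta = io.StringIO(">a\nGGCC\nAATT\n>b\nacgg\n")
    print(format_output(gc_fractions(fasta)), end="")
```

For the two records in the example, the first has 4 G/C bases out of 8 and the second 3 out of 4, so the sketch prints `0.500` and `0.750`.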

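2) Copy the script into Galaxy's tools directory

From the Galaxy root directory, create a myTools subdirectory under tools to hold the script, then copy demo.pl into it: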


cd tools
mkdir myTools
cd myTools
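The same step can be scripted. Below is a minimal Python sketch; the function `install_tool_script` and the paths in the usage note are hypothetical, and it assumes the Galaxy checkout has a top-level `tools/` directory as described above.

```python
"""Hypothetical helper: place a tool script under <galaxy_root>/tools/<subdir>."""
import shutil
from pathlib import Path


def install_tool_script(galaxy_root: Path, script: Path, subdir: str = "myTools") -> Path:
    """Create tools/<subdir> (like `mkdir myTools`) and copy the script into it."""
    target_dir = galaxy_root / "tools" / subdir
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    destination = target_dir / script.name
    shutil.copy2(script, destination)  # copy2 also keeps file permissions and timestamps
    return destination
```

For example, `install_tool_script(Path("/path/to/galaxy"), Path("demo.pl"))` would place the script at `/path/to/galaxy/tools/myTools/demo.pl`.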

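3) Write the configuration file

With the script in place, create a configuration file named demo.xml in the same directory, in the following format: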


<tool id="fa_gc_content_1" name="Compute GC content" version="0.1.0">
  <description>for each sequence in a file</description>
  <command>perl '${__tool_directory__}/demo.pl' '$input' '$output'</command>
  <inputs>
    <param format="fasta" name="input" type="data" label="Source file"/>
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>

  <tests>
    <test>
      <param name="input" value="fa_gc_content_input.fa"/>
      <output name="output" file="fa_gc_content_output.txt"/>
    </test>
  </tests>

  <help>
This tool computes GC content from a FASTA file.
  </help>
</tool>

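Here `${__tool_directory__}` is a variable that Galaxy expands to the directory containing the tool's XML file (tools/myTools in this example), which lets the command locate demo.pl; `$input` and `$output` are replaced with the file paths of the input and output datasets.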
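To make the substitution concrete, here is a rough Python approximation. Galaxy actually renders `<command>` with the Cheetah template engine; `string.Template` is used here only because it understands the same `$name` and `${name}` placeholders in this simple case. The function name and all paths are hypothetical.

```python
"""Rough approximation of how the <command> placeholders get filled in."""
from string import Template


def render_command(command: str, tool_xml_dir: str, input_path: str, output_path: str) -> str:
    values = {
        "__tool_directory__": tool_xml_dir,  # directory holding demo.xml
        "input": input_path,                  # file of the dataset chosen for param "input"
        "output": output_path,                # file Galaxy allocates for output "output"
    }
    return Template(command).substitute(values)
```

With the command from demo.xml and the hypothetical paths `/srv/galaxy/tools/myTools`, `/tmp/in.fa` and `/tmp/out.tabular`, this yields `perl '/srv/galaxy/tools/myTools/demo.pl' '/tmp/in.fa' '/tmp/out.tabular'`.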

The elements available in a tool XML file are listed below (see the official schema documentation for details):


tool
tool > description
tool > macros
tool > edam_topics
tool > edam_operations
tool > xrefs
tool > xrefs > xref
tool > requirements
tool > requirements > requirement
tool > requirements > container
tool > code
tool > stdio
tool > stdio > exit_code
tool > stdio > regex
tool > version_command
tool > command
tool > environment_variables
tool > environment_variables > environment_variable
tool > configfiles
tool > configfiles > configfile
tool > configfiles > inputs
tool > inputs
tool > inputs > section
tool > inputs > repeat
tool > inputs > conditional
tool > inputs > conditional > when
tool > inputs > param
tool > inputs > param > validator
tool > inputs > param > option
tool > inputs > param > conversion
tool > inputs > param > options
tool > inputs > param > options > column
tool > inputs > param > options > filter
tool > inputs > param > sanitizer
tool > inputs > param > sanitizer > valid
tool > inputs > param > sanitizer > valid > add
tool > inputs > param > sanitizer > valid > remove
tool > inputs > param > sanitizer > mapping
tool > inputs > param > sanitizer > mapping > add
tool > inputs > param > sanitizer > mapping > remove
tool > request_param_translation
tool > request_param_translation > request_param
tool > request_param_translation > request_param > append_param
tool > request_param_translation > request_param > append_param > value
tool > request_param_translation > request_param > value_translation
tool > request_param_translation > request_param > value_translation > value
tool > outputs
tool > outputs > data
tool > outputs > data > filter
tool > outputs > data > change_format
tool > outputs > data > change_format > when
tool > outputs > data > actions
tool > outputs > data > actions > conditional
tool > outputs > data > actions > conditional > when
tool > outputs > data > actions > action
tool > outputs > data > discover_datasets
tool > outputs > collection
tool > outputs > collection > filter
tool > outputs > collection > discover_datasets
tool > tests
tool > tests > test
tool > tests > test > param
tool > tests > test > param > collection
tool > tests > test > repeat
tool > tests > test > section
tool > tests > test > conditional
tool > tests > test > output
tool > tests > test > output > discover_dataset
tool > tests > test > output > metadata
tool > tests > test > output > assert_contents
tool > tests > test > output_collection
tool > tests > test > assert_command
tool > tests > test > assert_stdout
tool > tests > test > assert_stderr
tool > help
tool > citations
tool > citations > citation

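4) Register the tool

Steps 1 to 3 produced a simple tool. In step 4 we tell Galaxy that the tool exists by adding it to the tool configuration file config/tool_conf.xml (if that file does not exist yet, copy it from config/tool_conf.xml.sample). Add the following inside its root <toolbox> element; the file attribute is given relative to Galaxy's tools directory: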


<section name="MyTools" id="mTools">
    <tool file="myTools/demo.xml" />
</section>
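If you register tools often, the edit can be automated. This is a minimal Python sketch with a hypothetical function `register_tool`; it assumes the root element of tool_conf.xml is `<toolbox>`, and it is idempotent (running it twice does not duplicate the entry). Note that ElementTree drops comments and the XML declaration when re-serializing, so for a hand-maintained file, editing by hand may be preferable.

```python
"""Sketch: add a <tool file=...> entry under a <section> in tool_conf.xml content."""
import xml.etree.ElementTree as ET


def register_tool(conf_xml: str, section_id: str, section_name: str, tool_file: str) -> str:
    """Return the XML with tool_file listed in the given section (created if missing)."""
    root = ET.fromstring(conf_xml)  # the <toolbox> element
    section = root.find(f"./section[@id='{section_id}']")
    if section is None:
        section = ET.SubElement(root, "section", {"name": section_name, "id": section_id})
    # Only add the tool if this section does not list it already
    if not any(tool.get("file") == tool_file for tool in section.findall("tool")):
        ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")
```

For this article's tool the call would be `register_tool(text, "mTools", "MyTools", "myTools/demo.xml")`, where `text` is the current content of config/tool_conf.xml.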

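5) Restart Galaxy (or start it if it is not running); the custom tool should now be loaded and listed in the tool panel under the MyTools section:


sh run.sh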
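Besides looking at the tool panel, you can ask the running server whether the tool id from demo.xml is loaded. This Python sketch is hedged on two assumptions: that Galaxy is listening on `http://localhost:8080` (the usual local default; adjust for your setup), and that `GET /api/tools/<tool_id>` returns the tool's JSON description with an `id` field for loaded tools. The function names are hypothetical.

```python
"""Sketch: check via Galaxy's web API whether a tool id is loaded (assumptions noted above)."""
import json
import urllib.error
import urllib.parse
import urllib.request


def tool_url(base_url: str, tool_id: str) -> str:
    """Build the API URL for one tool, escaping the id."""
    return f"{base_url.rstrip('/')}/api/tools/{urllib.parse.quote(tool_id, safe='')}"


def tool_is_loaded(base_url: str, tool_id: str, timeout: float = 10.0) -> bool:
    """True if the server describes this tool id; False on an HTTP error such as 404."""
    try:
        with urllib.request.urlopen(tool_url(base_url, tool_id), timeout=timeout) as resp:
            info = json.load(resp)
    except urllib.error.HTTPError:
        return False
    return info.get("id") == tool_id
```

Usage: `tool_is_loaded("http://localhost:8080", "fa_gc_content_1")`. A connection error (server not running) is deliberately not caught, so it surfaces as an exception rather than a misleading `False`.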

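Overall, the barrier to developing tools for Galaxy is low for bioinformaticians, and it is quick to pick up. As a classic platform, it is well worth studying for newcomers to bioinformatics.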

References:

1. Galaxy tool XML schema: https://docs.galaxyproject.org/en/master/dev/schema.html

2. Adding a tool tutorial: https://galaxyproject.org/admin/tools/add-tool-tutorial/

3. Galaxy source code: https://github.com/galaxyproject/galaxy