Adding your own tool to BioExtract Server
A local tool is a program on your own computer. The tool itself is not
uploaded to BioExtract Server. Rather, the BioExtract Server uses the information given about
the tool to execute it on your own system. Once the tool has been added, it
can be used like any of the other tools in the BioExtract Server and can even be included
in workflows.
The tool must meet the following criteria:
- The tool can be executed from the command line (either a DOS prompt or
Linux shell). If the tool only has a graphical or window-like interface,
it cannot be used by BioExtract Server.
- If you wish to use the output of the local tool as input for another
tool on BioExtract Server, the name and location of any output files produced by
the tool must be known before the tool is executed.
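Before registering a tool, it can be useful to confirm that its path points to something the operating system will actually execute. The following Python sketch is only an illustration: the helper name is_command_line_tool is hypothetical and is not part of BioExtract Server. It checks only that the file is executable; it cannot tell whether the program has a command-line interface rather than a graphical one.

```python
import os
import shutil


def is_command_line_tool(path_or_name):
    """Return True if the path or command name resolves to an executable file.

    Hypothetical helper for illustration; not part of BioExtract Server.
    """
    # shutil.which accepts a bare command name (searched on PATH) or a
    # full path, and returns None when nothing executable is found.
    resolved = shutil.which(path_or_name)
    return resolved is not None and os.path.isfile(resolved)
```

For example, is_command_line_tool("/usr/local/bio/mytool") returns True only if that file exists and is marked executable.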
If the tool meets the criteria above, the following information is required
to add the tool:
- The full path to the location of the tool on your system. For example:
"C:\biotools\mytool.exe" on Windows
"/usr/local/bio/mytool" on Linux
- Any required command line arguments. These are pieces of information typed
after the program's name, usually specifying the name of the file it should
use as input, the file name its output should be written to, etc. For
example, let's say we have a tool that can convert sequences in one format
to another format. The documentation for the tool may read:
usage: seqconv -i filename -if formatName -of formatName
Where:
-i filename is the file to be converted
-if is the format of the input file
-of is the requested format of the output file
Valid format names: fasta, blast, genbank
We have a file in fasta format, located at "C:\temp\fasta.txt", that we
wish to convert to the genbank format. According to the documentation, we
could use the following command to accomplish this:
seqconv -i C:\temp\fasta.txt -if fasta -of genbank
- Any additional requirements or constraints the tool
may have with regard to input and output files. For example, some tools may
require that all files used for input be located in a particular directory,
while others may allow files in any location. Some tools may allow you to
specify the name and location for each output file, while others may automatically
give a name to the file and place it in a predetermined location.
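To make the relationship between a usage line and an actual command concrete, the sketch below builds the seqconv command from the example above in Python. seqconv is the fictional converter described in this section, and the function name seqconv_args is an assumption made for illustration.

```python
# Format names listed in the fictional seqconv usage text.
VALID_FORMATS = ("fasta", "blast", "genbank")


def seqconv_args(input_file, input_format, output_format):
    """Build the argument list for the fictional seqconv tool."""
    for name in (input_format, output_format):
        if name not in VALID_FORMATS:
            raise ValueError(f"unknown format name: {name!r}")
    return ["seqconv", "-i", input_file, "-if", input_format, "-of", output_format]


args = seqconv_args(r"C:\temp\fasta.txt", "fasta", "genbank")
print(" ".join(args))  # seqconv -i C:\temp\fasta.txt -if fasta -of genbank
```

Each switch and its value correspond to one piece of information you will later enter in the tool form.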
Example
Suppose we have a fictional tool called "mutate", which takes
a file containing one or more gene sequences as input, performs a
handful of random mutations on the sequences, and writes the mutated
sequences to an output file.
First, click the Tools tab, then select Add a New Tool from the menu
that appears on the left. Select Add a Local Tool, then click New Local Tool. A new tool form opens in the right panel. Click the Edit link next to New Tool.
- Logical Name: a name for the tool that will be used within the
BioExtract Server. It does not have to match the actual name of the
program. For our tool, names such as "Mutate Sequences" or even just
"mutate" will work.
- Description: Optional. A description of the tool.
- HelpURL: Optional. If there is a website associated with your tool, you
can enter a link to it here.
- Location: Not used for local tools. The field should be
left blank.
- Execution Name: Enter the full path of your tool here.
If your tool is a Java JAR file (e.g. MyTool.jar), this should be
"java -jar" followed by the full path of your JAR file. For example, on
a Linux system the Execution Name might be "java -jar /usr/local/tools/MyTool.jar",
and on a Windows system it might be "java -jar C:\biotools\MyTool.jar".
Depending on your system configuration, you may need to replace 'java'
with the full path to the java command (e.g. C:\Program Files\Java\jre6\bin\java.exe).
- Can Use Current Extract: Not used for local tools; leave it
unchecked. Local tools can still use the current data extract as input,
but this checkbox has no effect on that.
Once this information is entered, click the Save link near the top of the
form.
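The rule for the Execution Name field can be summarized in a few lines of Python. This is a sketch of the rule as described above, not code used by BioExtract Server; the function name execution_name and its java_command argument are hypothetical.

```python
def execution_name(tool_path, java_command="java"):
    """Return the text to enter in the Execution Name field for tool_path."""
    # JAR files are not run directly; they are passed to the java launcher.
    if tool_path.lower().endswith(".jar"):
        return f"{java_command} -jar {tool_path}"
    # Any other tool is referenced by its full path alone.
    return tool_path


print(execution_name("/usr/local/tools/MyTool.jar"))  # java -jar /usr/local/tools/MyTool.jar
print(execution_name("/usr/local/bio/mytool"))        # /usr/local/bio/mytool
```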
Inputs
Data from the BioExtract Server is given to the local tool by using one or
more input files, which are created on your computer when BioExtract Server runs the tool.
The data sent to each input file can come from one of four sources,
selected when the tool is used:
- The current data extract
- A previous tool's output
- An input file uploaded from your computer
- Text typed or pasted directly into a text box
Not all tools require input files, so you may not need to add one at all.
Our example tool "mutate" requires a single input file containing a gene
sequence. Let's go through the steps for adding it:
- Physical Name: We're going to skip this one for now. It will make
more sense later after going through the rest of the fields.
- Logical Name: As with the tool itself, the logical name is only used
within BioExtract Server. It may help to use the type of data as the logical name
(especially if the tool has more than one input). Since mutate expects this
input file to be a gene sequence, we'll use "sequence" for the logical name.
- Description: Any additional information or notes about this input
can be written here.
- Record Number Limit: If the current extract is being used as
the input, the number of records included will be truncated to this amount.
0 means "no limit".
- File Name: Enter the name (with full path) that BioExtract Server
should use to create this file. For mutate's input file, we'll
use "C:\biotools\mut_seq_input.txt".
- Data Types: Not used for local tools.
- Uses Current Extract: This control isn't used for local tools and
has no effect at all on whether the input can come from the current
extract. It's best to leave it unchecked.
- File Size Limit: If the size of the input file should be
restricted to a certain number of bytes, you can enter that number here.
This should only be needed if your tool has a limit on the size of the
input files given to it. Most tools can accept files of any size, so if
you're not sure, it's probably safe to leave this at the default value
of 0 (no limit).
- Include in Command Line: If your tool expects the name of the
input file to be given after its name in the command line, this box
should be checked. For example, if we were to run mutate from the command
line, using "mut_seq_input.txt" as the input file and "mut_out.txt" as
the output file, we would type:
mutate C:\biotools\mut_seq_input.txt C:\biotools\mut_out.txt
- If "Include in Command Line" is checked, BioExtract Server adds the file name
to the command used to run the tool, just as we would type it ourselves
as shown above.
- If "Include in Command Line" is left unchecked, the file will still
be created with the name given by "File Name", but its name will not
be added to the command line.
- Some tools require a "switch" (like -i or -f) before files given in
the command line. Let's say, for example, that mutate requires an "-i"
before the input file and an "-o" before the output file. The command
line would look like this:
mutate -i C:\biotools\mut_seq_input.txt -o C:\biotools\mut_out.txt
- Remember the Physical Name field we skipped earlier? If
the tool requires a switch before the input file, enter the switch in the
Physical Name field. (This is a temporary work-around for the problem of
adding switches before file names. We hope to provide a more elegant
interface in the future).
- Once all the information about this input has been entered, click the Save link at the top of the form. Any number of inputs can be added,
depending on the requirements or limitations of your tool.
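The way the File Name, Physical Name, and Include in Command Line fields combine can be modeled with a short Python sketch. This is a simplified illustration of the behavior described above, not BioExtract Server's actual implementation; the FileSpec class and build_command function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileSpec:
    """Form fields that affect the command line (hypothetical model)."""
    file_name: str                        # "File Name" field (full path)
    physical_name: str = ""               # "Physical Name" field: optional switch such as "-i"
    include_in_command_line: bool = True  # "Include in Command Line" checkbox


def build_command(execution_name, files):
    """Assemble a command line from the Execution Name and the tool's files."""
    parts = [execution_name]
    for spec in files:
        if not spec.include_in_command_line:
            continue  # the file is still created, just not named in the command
        if spec.physical_name:
            parts.append(spec.physical_name)  # switch goes before the file name
        parts.append(spec.file_name)
    return " ".join(parts)


files = [
    FileSpec(r"C:\biotools\mut_seq_input.txt", physical_name="-i"),
    FileSpec(r"C:\biotools\mut_out.txt", physical_name="-o"),
]
print(build_command("mutate", files))
# mutate -i C:\biotools\mut_seq_input.txt -o C:\biotools\mut_out.txt
```

Leaving physical_name empty for both files reproduces the first form of the command, without switches.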
Outputs
The interface for defining output files is very similar to the one used
to define input files. As with input files, output files are optional and
need only be defined if you would like to use the output of the tool
as input to other tools on BioExtract Server.
Physical Name, Logical Name, Description, File Name, and Include in Command Line behave exactly the same as they do
for Input files.
Record Number Limit, Description File Name,
and Modify Current Extract are not used for local tool output
files and can be ignored.
Command Line Parameters/Arguments
Most tools have a set of options whose values are given by arguments
on the command line. Specifying the input and output files for the program
is just one example of such arguments. For our purposes here, the terms
"argument" and "parameter" are interchangeable.
The BioExtract Server uses the information given about each parameter to add
an element in the completed tool's interface where the value for the
parameter can be given. This is the interface shown for mutate's
parameters:
Before adding a Parameter, you must first add a Parameter Group.
Multiple parameter groups are allowed and can be used to keep related
parameters together. For example, a tool may have a set of parameters that
affect the appearance of the tool's output. These parameters could be placed
within a group called "Output Options", and will be displayed together
in the menu used to run the tool from BioExtract Server.
Our example tool "mutate" has a set of basic mutation operations that
can be performed on points (single nucleotides), codons, or both. The
operations include insertion, deletion, duplication, etc. The severity level
of the mutations is given by a number between 1 and 10.
So, if we wanted to delete a few random codons in the input sequence,
the command would be:
mutate -i in_seq.txt -o out_seq.txt -codon deletions -s 2
If we wanted to generate severe mutations, we could ask mutate to delete
random codons and insert random nucleotides, specifying a severity level of 9:
mutate -i in_seq.txt -o out_seq.txt -codon deletions -point insertions -s 9
To add parameters, begin by clicking the Create New link next to
"Parameter Groupings". Then, click Edit next to "New Grouping".
Assign a name to this parameter grouping.
The mutate tool has only one group of parameters, all having to do with mutation
operations, so the name "Mutation Options" fits well here.
Now, click Create New next to Parameters, then click the Edit link next to "New Parameter". Below is a description of each field
required to define a parameter. After describing these fields in
general, we will demonstrate how they were used to add parameters for
the mutate tool.
- Logical Name: As with Inputs and Outputs before, the logical
name is the name used for this parameter within the BioExtract Server.
- Physical Name: This is the actual parameter name as it should
appear on the command line. Note that the BioExtract Server will not add a "-"
automatically, so please remember to do so if your tool requires a "-"
before the parameter name.
- Description: Optional. Additional information about the
parameter.
- Parameter Type: This specifies the way values are given
for the parameter. Depending on the type, they may be entered directly
by the user or chosen from a list of pre-defined values. Below are the
details for each type:
- text: Creates a field where the value can be typed in
directly. Also useful for numeric values.
- checkbox: Creates a checkbox. Useful for parameters that
don't have any additional values (for example, some programs will print extra
information if a -v is present on the command line). If checked, the parameter
(specifically, the Physical Name) will appear on the command
line.
- radio: Not available at this time.
- select: Creates a "drop-down" menu from which one of several
possible values can be chosen.
- When using the "select" type, the possible
values must be defined. This is not required for any of the other types.
To define a set of values, click the Create New link next to Values.
Then, click Edit next to the New Parameter Value. "Value" is the value
as it should appear on the command line. If "Is Default" is checked, this
value will be selected by default when the tool's interface is shown.
- textarea: Identical to the "text" type above.
- Tab Order: Optional. If you would like the parameters to be
displayed in a certain order, this number gives the rank of this
particular parameter. The ordering is from lowest-to-highest, so a
parameter with a tab order of 2 appears above one with an order of 3.
- Is Mandatory: If checked, the tool will not be able to
execute unless a value for this parameter has been entered.
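As a rough model of how these fields turn into command-line text, consider the Python sketch below. It is an illustration only: the function parameter_args is hypothetical, treating -codon and -point as "select" parameters is an assumption, and so is the choice to omit optional parameters that have no value.

```python
def parameter_args(physical_name, param_type, value, mandatory=False):
    """Return the command-line pieces contributed by one parameter."""
    if param_type == "checkbox":
        # A checked box contributes only the Physical Name (for example "-v").
        return [physical_name] if value else []
    if param_type in ("text", "textarea", "select"):
        if value in (None, ""):
            if mandatory:
                raise ValueError(f"{physical_name} is mandatory")
            return []  # assumption: unset optional parameters are omitted
        # The Physical Name is used exactly as entered; no "-" is added.
        return [physical_name, str(value)]
    raise ValueError(f"unsupported parameter type: {param_type}")


args = ["mutate", "-i", "in_seq.txt", "-o", "out_seq.txt"]
args += parameter_args("-codon", "select", "deletions")
args += parameter_args("-point", "select", "insertions")
args += parameter_args("-s", "text", 9, mandatory=True)
print(" ".join(args))
# mutate -i in_seq.txt -o out_seq.txt -codon deletions -point insertions -s 9
```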
This is the completed "-codon" parameter for mutate:
Saving
Once all of the necessary inputs, outputs, and parameters have been defined,
click the Save Tool button at the bottom of the form. Please note that
all of the "subforms" opened for each input, output and parameter must be
saved before clicking "Save Tool". If any of the subforms are still open,
the following error message will be displayed:
Running a Local Tool
Once the tool is saved, it will appear under the "My Tools" group in the
Available Tools list on the Tools page.
When you select the tool, the interface presented is the same one used
by all BioExtract Server tools. Please consult Executing Tools for additional information.
A few moments after clicking "Execute", a popup window should appear, followed by
another one that looks like this:
This is normal. A Java applet is used to execute the local tool, and applets cannot
execute programs or write files without permission, which can only be granted
if the applet is placed in a digitally signed .jar file. Since we have signed the file
ourselves without using a certificate from one of the third-party Certificate
Authorities, the browser reports that the signature cannot be verified.
Once you click "Run", the applet will download the input files from the server,
execute the tool, and upload the output files, displaying a short message
for each step. Once it is finished, you may close the applet window.
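The three steps performed by the applet can be pictured with the following Python sketch. It is not the applet's code (the applet is written in Java); the function run_local_tool is hypothetical, the download and upload steps are replaced by plain file reads and writes, and a Python one-liner that reverses its input stands in for a real tool. The sketch also shows why output file names must be known before the tool runs: they are needed to collect the results afterwards.

```python
import os
import subprocess
import sys
import tempfile


def run_local_tool(command, input_files, output_paths):
    """Write inputs to disk, run the tool, then read back its outputs.

    command      -- argument list for the tool
    input_files  -- dict mapping file path -> text to write there
    output_paths -- paths the tool is expected to produce
    """
    for path, text in input_files.items():   # step 1: create the input files
        with open(path, "w") as handle:
            handle.write(text)
    subprocess.run(command, check=True)      # step 2: execute the tool
    results = {}
    for path in output_paths:                # step 3: collect the output files
        with open(path) as handle:
            results[path] = handle.read()
    return results


# Stand-in "tool": reads the file named by its first argument and writes
# the reversed text to the file named by its second argument.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "mut_seq_input.txt")
out_path = os.path.join(workdir, "mut_out.txt")
script = "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read()[::-1])"
outputs = run_local_tool(
    [sys.executable, "-c", script, in_path, out_path],
    {in_path: "ACGT"},
    [out_path],
)
print(outputs[out_path])  # TGCA
```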