| 1 |
***MAKER Documentation*** |
|---|
| 2 |
|
|---|
| 3 |
#---------- |
|---|
| 4 |
INSTALLATION INSTRUCTIONS FOR MAKER |
|---|
| 5 |
|
|---|
| 6 |
*Step-by-step instructions are also available in the INSTALL text file. |
|---|
| 7 |
|
|---|
| 8 |
To install maker, you will first need to install the following external programs: |
|---|
| 9 |
|
|---|
| 10 |
*PERL 5.8.0 or higher |
|---|
| 11 |
*BioPerl 1.5 or higher (www.bioperl.org) |
|---|
| 12 |
*Wu-BLAST 2.0 or higher (blast.wustl.edu) |
|---|
| 13 |
*SNAP version 2006-07-28 or higher (homepage.mac.com/iankorf) |
|---|
| 14 |
*RepeatMasker 3.1.6 or higher (www.repeatmasker.org) |
|---|
| 15 |
*Exonerate 1.4 or higher (www.ebi.ac.uk/~guy/exonerate) |
|---|
| 16 |
|
|---|
| 17 |
|
|---|
| 18 |
You may also want to install these optional external programs: |
|---|
| 19 |
|
|---|
| 20 |
*Augustus 2.0 or higher (augustus.gobics.de) |
|---|
| 21 |
|
|---|
| 22 |
|
|---|
| 23 |
To install mpi_maker, you must have an MPI package installed. Try the following: |
|---|
| 24 |
|
|---|
| 25 |
*MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/) |
|---|
| 26 |
|
|---|
| 27 |
note: Remember to install MPICH2 with the --enable-sharedlibs flag set to the appropriate value (See MPICH2 Installer's Guide at http://www.mcs.anl.gov/research/projects/mpich2/documentation/index.php?s=docs). |
|---|
| 28 |
|
|---|
| 29 |
|
|---|
| 30 |
Notes: |
|---|
| 31 |
1) RepeatMasker requires Wu-BLAST and a single file executable called TRF (see RepeatMasker website for details), so please install these before installing RepeatMasker |
|---|
| 32 |
2) Exonerate binaries can be downloaded from the website. If you have Mac OSX, however, binaries are only available for version 1.0. This version will work too. If you would like to compile exonerate, it requires GLIB, a C library that is linked from the exonerate website. If you have Mac OSX, GLIB can be downloaded using FINK. |
|---|
| 33 |
3) RepeatMasker requires a repeat library file, which is downloaded from Repbase (http://www.girinst.org/); this is explained on the RepeatMasker website. |
|---|
| 34 |
4) Please note the location of all of the programs that you have installed. You will need this information in the maker_exe.ctl file, one of MAKER's 3 control files. |
|---|
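As a rough aid for note 4, the sketch below looks up each external program on the current PATH and prints it in the key:value form used by maker_exe.ctl. It is only an illustration: locate_programs is a hypothetical helper, not part of MAKER, and the program names are taken from the sample maker_exe.ctl shown later in this document. Programs installed outside PATH are reported as not found and have to be located by hand.

```shell
# Hypothetical helper (not part of MAKER): print "name:path" for each
# program found on PATH, and a comment line for each one that is missing.
locate_programs() {
    for prog in "$@"; do
        if path=$(command -v "$prog" 2>/dev/null); then
            printf '%s:%s\n' "$prog" "$path"
        else
            printf '# not found: %s\n' "$prog"
        fi
    done
}

# Program names as they appear in the sample maker_exe.ctl.
locate_programs xdformat blastn blastx snap augustus RepeatMasker exonerate
```

Always compare the printed paths with the versions you intended to install before copying them into maker_exe.ctl.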
| 35 |
|
|---|
| 36 |
|
|---|
| 37 |
Now that you have all the necessary programs installed, MAKER can be unpacked using: |
|---|
| 38 |
|
|---|
| 39 |
tar xvfz maker.tar.gz |
|---|
| 40 |
|
|---|
| 41 |
This will create a directory called maker with 5 subdirectories: |
|---|
| 42 |
|
|---|
| 43 |
bin - contains the maker code. |
|---|
| 44 |
lib - contains all the necessary perl libraries for MAKER. |
|---|
| 45 |
MPI - contains MPI specific data to configure maker to run on a cluster that supports MPI. |
|---|
| 46 |
Apollo - contains gff3.tiers file (See section titled APOLLO below) |
|---|
| 47 |
data - contains some sample data used to make sure everything works |
|---|
| 48 |
|
|---|
| 49 |
Maker uses control files to guide each run. Generic control files can be built using the -CTL flag in maker. These control files can then be edited by the user to identify the location of all required input data and statistics. Control files are run specific, and separate control files will need to be built for each genome given to maker. Maker looks for control files in the current working directory, so it is recommended that maker be run in a separate directory containing unique control files for each genome. |
|---|
| 50 |
|
|---|
| 51 |
Control files: |
|---|
| 52 |
|
|---|
| 53 |
1. maker_exe.ctl - contains the path information for needed executables |
|---|
| 54 |
2. maker_bopts.ctl - contains filtering statistics for BLAST and Exonerate |
|---|
| 55 |
3. maker_opts.ctl - contains all other information for MAKER, including the location of the input genome file. |
|---|
| 56 |
|
|---|
| 57 |
|
|---|
| 58 |
Always remember to examine the control files before each run of MAKER on your specific data. |
|---|
| 59 |
|
|---|
| 60 |
|
|---|
| 61 |
Programs required by maker rely on certain environment variables being set. If you have not set these variables per the installation instructions of the external programs, a reminder list is provided below: |
|---|
| 62 |
|
|---|
| 63 |
for tcsh: |
|---|
| 64 |
setenv PERL5LIB where_bioperl_is_installed |
|---|
| 65 |
setenv WUBLASTMAT where_wublast_is_installed/matrix |
|---|
| 66 |
setenv ZOE where_snap_is_installed |
|---|
| 67 |
setenv WUBLASTFILTER where_wublast_is_installed/filter |
|---|
| 68 |
setenv AUGUSTUS_CONFIG_PATH where_augustus_is_installed/config |
|---|
| 69 |
|
|---|
| 70 |
for bash: |
|---|
| 71 |
export PERL5LIB=where_bioperl_is_installed |
|---|
| 72 |
export WUBLASTMAT=where_wublast_is_installed/matrix |
|---|
| 73 |
export ZOE=where_snap_is_installed |
|---|
| 74 |
export WUBLASTFILTER=where_wublast_is_installed/filter |
|---|
| 75 |
export AUGUSTUS_CONFIG_PATH=where_augustus_is_installed/config |
|---|
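A quick way to confirm that the variables above are visible in the current shell is sketched below. check_maker_env is a hypothetical helper written for this document, not something shipped with MAKER. It only tests that each variable is non-empty; it does not verify that the paths are correct. AUGUSTUS_CONFIG_PATH matters only if the optional Augustus program is installed.

```shell
# Hypothetical helper (not part of MAKER): report which of the variables
# listed above are unset or empty. Returns non-zero if any are missing.
check_maker_env() {
    missing=0
    for name in PERL5LIB WUBLASTMAT ZOE WUBLASTFILTER AUGUSTUS_CONFIG_PATH; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Calling check_maker_env before a run prints one "not set" line per missing variable and prints nothing when all five are defined.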
| 76 |
|
|---|
| 77 |
|
|---|
| 78 |
#---------- |
|---|
| 79 |
MPI MAKER INSTALL |
|---|
| 80 |
|
|---|
| 81 |
If you are running maker on an MPI capable cluster, you can install an MPI version of maker by doing the following: |
|---|
| 82 |
|
|---|
| 83 |
1. Install standard maker and verify that it runs. |
|---|
| 84 |
2. Use cd to change to the MPI subdirectory in the maker installation folder (i.e. maker/MPI/) |
|---|
| 85 |
3. Run Install.PL by typing: perl Install.PL |
|---|
| 86 |
|
|---|
| 87 |
A new version of maker called mpi_maker should now be installed under maker/bin. |
|---|
| 88 |
|
|---|
| 89 |
To run mpi_maker, first verify that your MPI environment is initialized (e.g. using the mpdboot command). Now start mpi_maker via mpiexec. |
|---|
| 90 |
|
|---|
| 91 |
Example: |
|---|
| 92 |
|
|---|
| 93 |
mpiexec -n 3 perl maker_directory/maker/bin/mpi_maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl |
|---|
| 94 |
|
|---|
| 95 |
|
|---|
| 96 |
Please see the documentation for the MPI environment you use for how to initiate an MPI process. |
|---|
| 97 |
|
|---|
| 98 |
|
|---|
| 99 |
#---------- |
|---|
| 100 |
MAKER USAGE STATEMENT |
|---|
| 101 |
|
|---|
| 102 |
Usage: |
|---|
| 103 |
|
|---|
| 104 |
maker [options] <maker_opts.ctl> <maker_bopts.ctl> <maker_exe.ctl> |
|---|
| 105 |
|
|---|
| 106 |
The three input arguments are user control files that specify how maker should behave. |
|---|
| 107 |
All input files listed in the control options files must be in fasta format. Please |
|---|
| 108 |
see maker documentation to learn more about control file format. The program will |
|---|
| 109 |
automatically try to locate the user control files in the current working |
|---|
| 110 |
directory if these arguments are not supplied when initializing maker. |
|---|
| 111 |
|
|---|
| 112 |
It is important to note that maker does not try to recalculate data that it has |
|---|
| 113 |
already calculated. For example, if you run an analysis twice on the same fasta file |
|---|
| 114 |
you will notice that maker does not rerun any of the blast analyses but instead uses |
|---|
| 115 |
the blast analyses stored from the previous run. To force maker to rerun all |
|---|
| 116 |
analyses, use the -f flag. |
|---|
| 117 |
|
|---|
| 118 |
Options: |
|---|
| 119 |
|
|---|
| 120 |
-genome|g <file_name> Give MAKER a different genome file (this overrides the |
|---|
| 121 |
control file value) |
|---|
| 122 |
|
|---|
| 123 |
-predictor <snap> Selects the gene predictor to use when building annotations (Default |
|---|
| 124 |
<augustus> is 'snap'). The option 'est2genome' builds annotations directly |
|---|
| 125 |
<est2genome> from the EST evidence. |
|---|
| 126 |
|
|---|
| 127 |
-GFF Use an input gff3 format file of repeat elements for repeat masking. |
|---|
| 128 |
You must set rm_gff in maker_opts.ctl to the file's location. This |
|---|
| 129 |
option turns off all other repeat masking. |
|---|
| 130 |
|
|---|
| 131 |
-RM_off|R Turns repeat masking off (* See Warning) |
|---|
| 132 |
|
|---|
| 133 |
-force|f Forces maker to rerun all analyses (replaces all previous output). |
|---|
| 134 |
|
|---|
| 135 |
-datastore|d Causes output to be written using datastore. This option is |
|---|
| 136 |
automatically enabled if there are more than 1000 fasta entries |
|---|
| 137 |
in the input file. Output can then be accessed using the |
|---|
| 138 |
master_datastore_index file created by the program. |
|---|
| 139 |
|
|---|
| 140 |
-PREDS Outputs ab-initio predictions that do not overlap maker annotation |
|---|
| 141 |
as gene annotations in the final gff3 output file (based on the |
|---|
| 142 |
-predictor flag ). |
|---|
| 143 |
|
|---|
| 144 |
-CTL Generates generic control files in the current working directory. |
|---|
| 145 |
|
|---|
| 146 |
-retry <integer> Re-run failed contigs up to the specified number of re-tries. |
|---|
| 147 |
|
|---|
| 148 |
-cpus|c <integer> Tells how many cpus to use for Blast analysis (this overrides |
|---|
| 149 |
control file value). |
|---|
| 150 |
|
|---|
| 151 |
-help|? Prints this usage statement. |
|---|
| 152 |
|
|---|
| 153 |
|
|---|
| 154 |
Warning: |
|---|
| 155 |
|
|---|
| 156 |
*When using the -R flag, maker expects that the input genome file is already masked. |
|---|
| 157 |
Also if your genome file contains lower case characters, maker will consider those |
|---|
| 158 |
characters to be soft masked. |
|---|
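Because lower case characters are treated as soft masking, it can be worth checking a genome file for them before using the -R flag. The sketch below counts lower case characters on the sequence lines of a FASTA file. count_lowercase_bases is a hypothetical helper, not part of MAKER.

```shell
# Hypothetical helper (not part of MAKER): count lower case characters in
# the sequence lines of a FASTA file. Header lines (starting with ">")
# are ignored.
count_lowercase_bases() {
    grep -v '^>' "$1" | LC_ALL=C tr -cd 'a-z' | wc -c | tr -d ' '
}
```

A result of 0 means the file contains no lower case sequence characters, so nothing in it will be interpreted as soft masked.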
| 159 |
|
|---|
| 160 |
|
|---|
| 161 |
#---------- |
|---|
| 162 |
RUNNING MAKER WITH EXAMPLE DATA |
|---|
| 163 |
|
|---|
| 164 |
1) Copy the files in the data directory to a temporary directory where you will run the example. |
|---|
| 165 |
2) Type maker -CTL to generate generic maker control files |
|---|
| 166 |
3) Next you will need to edit the control files to include the path of the genome file, EST file, and protein file, as well as the paths to all required executables. See CONFIG FILE EDITING for more information. |
|---|
| 167 |
4) Then try the following command from your temporary directory: |
|---|
| 168 |
|
|---|
| 169 |
perl maker_directory/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl |
|---|
| 170 |
|
|---|
| 171 |
MAKER will create at least the following files/directory: |
|---|
| 172 |
|
|---|
| 173 |
seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo |
|---|
| 174 |
|
|---|
| 175 |
seq_name.maker.transcripts.fasta - a file of the maker transcript sequences |
|---|
| 176 |
seq_name.maker.snap.transcript.fasta - a file of ab-initio snap transcript sequences |
|---|
| 177 |
seq_name.maker.proteins.fasta - a file of the maker protein sequences |
|---|
| 178 |
seq_name.maker.snap.proteins.fasta - a file of ab-initio snap protein sequences |
|---|
| 179 |
|
|---|
| 180 |
theVoid.seq_name - a directory containing all of the results files produced by maker, including BLAST reports, SNAP output, exonerate output and the masked sequence |
|---|
| 181 |
|
|---|
| 182 |
WARNING: |
|---|
| 183 |
*The names of output files are based on sequence ids. If giving maker a multi-fasta file, it is important to verify that all sequence ids are unique, so files are not overwritten. |
|---|
| 184 |
*If there are more than 1,000 sequences in a multi-fasta file or you use the -d flag on the command line, a datastore structure will be used. See DATASTORE in this document. |
|---|
| 185 |
*If sequence ids contain characters that are illegal in file names, those characters will be replaced automatically before building output file names. |
|---|
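The first two warnings can be checked before a run with a few standard tools. The sketch below is an illustration only: both helper names are hypothetical, and it assumes that the sequence id is the first whitespace-delimited word of each fasta header, which may not match how maker derives ids in every case.

```shell
# Hypothetical helpers (not part of MAKER).

# List sequence ids that occur more than once in a multi-fasta file.
# Assumption: the id is the first word after ">" on each header line.
fasta_duplicate_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}

# Count the sequences in a fasta file (number of header lines).
fasta_count() {
    grep -c '^>' "$1"
}
```

If fasta_duplicate_ids prints nothing, no ids are repeated. A fasta_count result above 1000 indicates that the datastore structure will be used even without the -d flag.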
| 186 |
|
|---|
| 187 |
|
|---|
| 188 |
|
|---|
| 189 |
#---------- |
|---|
| 190 |
DATASTORE |
|---|
| 191 |
|
|---|
| 192 |
"Many filesystems have performance problems with large numbers of subdirectories and files within a single directory and even when the underlying filesystems handle things gracefully, access via network filesystems can be an issue. The Datastore modules create a hiearchy of subdirectory layers, starting from a 'base', and mapping end-user's identifiers to the corresponding subdirectory." - quote from http://www.yandell-lab.org/ (See site for more information on the Datastore module) |
|---|
| 193 |
|
|---|
| 194 |
Datastore will be used by maker if there are more than 1,000 sequences in a multi-fasta file or you use the -d flag on the command line. |
|---|
| 195 |
|
|---|
| 196 |
When datastore is used, the output files described above will not appear where you would normally expect them to be. Instead they will be located in a series of subdirectories under a new base directory whose name is determined from the input genome file name, e.g. current_working_directory/input_genome_datastore/EE/Af/seq_name/seq_name.gff. A master_datastore_index file will be made in the current working directory to help you find the output files from each sequence. |
|---|
| 197 |
|
|---|
| 198 |
The master_datastore_index file is created to allow the user to easily find the exact output directory corresponding to each contig from the input genome file. The master_datastore_index file contains two columns of text; the first column shows the sequence identifier from each fasta header, and the second column shows the location of the output files for that sequence. |
|---|
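Given the two-column layout described above, a single sequence can be looked up with awk. This is a sketch under assumptions: datastore_lookup is a hypothetical helper, the columns are assumed to be separated by whitespace (tab or spaces), and neither the ids nor the paths are assumed to contain spaces.

```shell
# Hypothetical helper (not part of MAKER): print the output location
# recorded for one sequence id in a master_datastore_index file.
# Exits non-zero if the id is not listed.
datastore_lookup() {
    # $1 = path to master_datastore_index, $2 = sequence id
    awk -v id="$2" '
        $1 == id { print $2; found = 1; exit }
        END      { exit !found }
    ' "$1"
}
```

The helper prints only the first matching entry, which is sufficient as long as sequence ids are unique.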
| 199 |
|
|---|
| 200 |
|
|---|
| 201 |
|
|---|
| 202 |
#---------- |
|---|
| 203 |
CONFIG FILE EDITING |
|---|
| 204 |
|
|---|
| 205 |
Lines in the maker control files have the format key:value with no spaces before or after the colon (:). If the value is a file name, you can use relative paths and environment variables, e.g. genome:$HOME/my_genome.fasta |
|---|
| 206 |
|
|---|
| 207 |
|
|---|
| 208 |
MAKER has 3 control files for configuration options. |
|---|
| 209 |
|
|---|
| 210 |
A. maker_exe.ctl - includes information about programs executed by MAKER. |
|---|
| 211 |
|
|---|
| 212 |
Here is what the standard maker_exe.ctl control file looks like: |
|---|
| 213 |
==================================== |
|---|
| 214 |
|
|---|
| 215 |
#-----Location of executables required by Maker |
|---|
| 216 |
xdformat:/usr/local/wublast/xdformat #location of xdformat executable |
|---|
| 217 |
blastn:/usr/local/wublast/blastn #location of blastn executable |
|---|
| 218 |
blastx:/usr/local/wublast/blastx #location of blastx executable |
|---|
| 219 |
snap:/usr/local/snap/snap #location of snap executable |
|---|
| 220 |
augustus:/usr/local/augustus/bin/augustus #location of augustus executable (optional) |
|---|
| 221 |
RepeatMasker:/usr/local/RepeatMasker/RepeatMasker #location of RepeatMasker executable |
|---|
| 222 |
exonerate:/usr/local/exonerate/bin/exonerate #location of exonerate executable |
|---|
| 223 |
|
|---|
| 224 |
==================================== |
|---|
| 225 |
|
|---|
| 226 |
Note that for all control files the comments written to help users begin with a pound sign (#). In addition, the option names before the colon (:) cannot be changed, nor should there be a space before or after the colon. |
|---|
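To double-check what a control file actually sets for a given option, the key:value format can be read back with awk. The sketch below is not how maker itself parses control files. ctl_get is a hypothetical helper, and it assumes that values never contain a pound sign, since everything from the first # onward is treated as a comment.

```shell
# Hypothetical helper (not part of MAKER): print the value stored under a
# key in a control file, with any trailing "#comment" removed.
ctl_get() {
    # $1 = control file, $2 = key
    awk -v key="$2" '
        /^#/ { next }                        # skip comment-only lines
        index($0, key ":") == 1 {            # line starts with "key:"
            value = substr($0, length(key) + 2)
            sub(/[ \t]*#.*$/, "", value)     # drop the trailing comment
            print value
            exit
        }
    ' "$1"
}
```

For example, ctl_get maker_exe.ctl snap run against the sample file above would print /usr/local/snap/snap. An option with no value prints an empty line.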
| 227 |
|
|---|
| 228 |
|
|---|
| 229 |
B. maker_bopts.ctl - contains statistics for filtering blast and exonerate data |
|---|
| 230 |
|
|---|
| 231 |
Here is an example maker_bopts.ctl: |
|---|
| 232 |
==================================== |
|---|
| 233 |
|
|---|
| 234 |
#-----BLAST and Exonerate statistics thresholds |
|---|
| 235 |
percov_blastn:0.80 #Blastn Percent Coverage Threshold EST-Genome Alignments |
|---|
| 236 |
percid_blastn:0.85 #Blastn Percent Identity Threshold EST-Genome Alignments |
|---|
| 237 |
eval_blastn:1e-10 #Blastn eval cutoff |
|---|
| 238 |
bit_blastn:40 #Blastn bit cutoff |
|---|
| 239 |
percov_blastx:0.50 #Blastx Percent Coverage Threshold Protein-Genome Alignments |
|---|
| 240 |
percid_blastx:0.40 #Blastx Percent Identity Threshold Protein-Genome Alignments |
|---|
| 241 |
eval_blastx:1e-6 #Blastx eval cutoff |
|---|
| 242 |
bit_blastx:30 #Blastx bit cutoff |
|---|
| 243 |
e_perc_cov:50 #Exonerate Percent Coverage Threshold EST-Genome Alignments |
|---|
| 244 |
ep_score_limit:20 #Report alignments scoring at least this percentage of the maximal score exonerate protein |
|---|
| 245 |
en_score_limit:20 #Report alignments scoring at least this percentage of the maximal score exonerate nucleotide |
|---|
| 246 |
|
|---|
| 247 |
==================================== |
|---|
| 248 |
|
|---|
| 249 |
|
|---|
| 250 |
C. maker_opts.ctl - contains options for maker and external programs used by maker |
|---|
| 251 |
|
|---|
| 252 |
Here is an example maker_opts.ctl: |
|---|
| 253 |
==================================== |
|---|
| 254 |
|
|---|
| 255 |
#-----sequence and library files |
|---|
| 256 |
genome:fly_assembly.fasta #genome sequence file (required) |
|---|
| 257 |
est:fly_est.fasta #EST sequence file (required) |
|---|
| 258 |
protein:uniprot.fasta #protein sequence file (required) |
|---|
| 259 |
repeat_protein:te_proteins.fasta #a database of transposable element proteins |
|---|
| 260 |
rmlib:fly_specific_repeats.fasta #an organism specific repeat library (optional) |
|---|
| 261 |
rm_gff: #a gff3 format file of repeat elements (only used with -GFF flag) |
|---|
| 262 |
|
|---|
| 263 |
#-----external application specific options |
|---|
| 264 |
snaphmm:fly #SNAP HMM model |
|---|
| 265 |
augustus_species:fly #Augustus gene prediction model |
|---|
| 266 |
model_org:all #RepeatMasker model organism |
|---|
| 267 |
alt_peptide:c #amino acid used to replace non standard amino acids in xdformat |
|---|
| 268 |
cpus:2 #max number of cpus to use in BLAST and RepeatMasker |
|---|
| 269 |
|
|---|
| 270 |
#-----Maker specific options |
|---|
| 271 |
predictor:snap #identifies which gene prediction program to use for annotations |
|---|
| 272 |
te_remove:1 #mask regions with excess similarity to transposable element proteins |
|---|
| 273 |
max_dna_len:100000 #length for dividing up contigs into chunks (larger values increase memory usage) |
|---|
| 274 |
split_hit:10000 #length of the splitting of hits (max intron size for EST and protein alignments) |
|---|
| 275 |
snap_flank:200 #length of sequence surrounding EST and protein evidence used to extend gene predictions |
|---|
| 276 |
single_exon:0 #consider EST hits aligning to single exons when generating annotations, 1 = yes, 0 = no |
|---|
| 277 |
use_seq_dir:1 #place output files in same directory as sequence file: 1 = yes, 0 = no |
|---|
| 278 |
clean_up:0 #remove theVoid directory: 1 = yes, 0 = no |
|---|
| 279 |
|
|---|
| 280 |
==================================== |
|---|
| 281 |
|
|---|
| 282 |
|
|---|
| 283 |
#---------- |
|---|
| 284 |
ADDING UTRs for GBROWSE |
|---|
| 285 |
|
|---|
| 286 |
* When using APOLLO to visualize gene annotations, UTRs are inferred based on exon and CDS locations. However, GMOD and GBROWSE do not infer UTRs, so to visualize them you will have to run add_utr_gff.pl with the following command: |
|---|
| 287 |
|
|---|
| 288 |
add_utr_gff.pl <directory> |
|---|
| 289 |
<directory> is the directory where all of your GFF files are located |
|---|
| 290 |
|
|---|
| 291 |
Each GFF file will then have a sister file called sequence.wutr.gff3 |
|---|
| 292 |
|
|---|
| 293 |
|
|---|
| 294 |
#---------- |
|---|
| 295 |
APOLLO |
|---|
| 296 |
|
|---|
| 297 |
Maker is bundled with a configuration file that improves the color and display of maker annotations and evidence in the Apollo genome browser. The configuration file is called "gff3.tiers" and is located in the maker/Apollo/ directory. The file should be copied to the conf/ subdirectory, which is located under the Apollo installation directory. In the Mac version of Apollo, the conf/ directory is located at /Applications/Apollo.app/Contents/Resources/app/conf/. |
|---|
| 298 |
|
|---|
| 299 |
|
|---|
| 300 |
#---------- |
|---|
| 301 |
HMM BUILDING (based on SNAP documentation) |
|---|
| 302 |
|
|---|
| 303 |
A. First you will need to determine a high quality gene set that will be used to model future genes (annotations for the high quality gene set should be in GFF3 format). The high quality gene set can then be converted into SNAP ZFF format using maker2zff.pl, found in maker/bin. |
|---|
| 304 |
|
|---|
| 305 |
This program is run with the following command: |
|---|
| 306 |
|
|---|
| 307 |
maker2zff.pl <directory> genome |
|---|
| 308 |
|
|---|
| 309 |
*<directory> is the directory where all of your GFF3 files are located |
|---|
| 310 |
*genome is the name for the outfile |
|---|
| 311 |
|
|---|
| 312 |
Files Created: |
|---|
| 313 |
|
|---|
| 314 |
genome.ann |
|---|
| 315 |
genome.dna |
|---|
| 316 |
|
|---|
| 317 |
Note: A convenient way to identify an initial high quality gene set for the HMM is to use the -predictor est2genome option in maker. This will produce gene annotations based solely on EST evidence. These annotations can then seed the first HMM. After running maker again using this new HMM and the -predictor snap option, you can use the second round of annotations as the seed for an even better HMM. In this way the HMM progressively improves with each run of maker. |
|---|
| 318 |
|
|---|
| 319 |
Another strategy for identifying an initial gene set to model the HMM is to use the program CEGMA (http://korflab.ucdavis.edu/software.html). CEGMA builds a highly reliable set of gene annotations in the absence of experimental data by identifying DNA regions with homology to a set of 458 proteins that are highly conserved among taxa. |
|---|
| 320 |
|
|---|
| 321 |
Combining both CEGMA and maker datasets to build the first HMM is also a good strategy. |
|---|
| 322 |
|
|---|
| 323 |
|
|---|
| 324 |
B. Next you will use the dna and zff files (genome.dna and genome.ann) to produce a SNAP HMM as described in the SNAP documentation (which we have provided below): |
|---|
| 325 |
|
|---|
| 326 |
The first step is to look at some features of the genes: |
|---|
| 327 |
|
|---|
| 328 |
fathom genome.ann genome.dna -gene-stats |
|---|
| 329 |
|
|---|
| 330 |
Next, you want to verify that the genes have no obvious errors: |
|---|
| 331 |
|
|---|
| 332 |
fathom genome.ann genome.dna -validate |
|---|
| 333 |
|
|---|
| 334 |
You may find some errors and warnings. Check these out in some kind of genome |
|---|
| 335 |
browser and remove those that are real errors. Next, break up the sequences into |
|---|
| 336 |
fragments with one gene per sequence with the following command: |
|---|
| 337 |
|
|---|
| 338 |
fathom genome.ann genome.dna -categorize 1000 |
|---|
| 339 |
|
|---|
| 340 |
There will be up to 1000 bp on either side of the genes. You will find |
|---|
| 341 |
several new files. |
|---|
| 342 |
|
|---|
| 343 |
alt.ann, alt.dna (genes with alternative splicing) |
|---|
| 344 |
err.ann, err.dna (genes that have errors) |
|---|
| 345 |
olp.ann, olp.dna (genes that overlap other genes) |
|---|
| 346 |
wrn.ann, wrn.dna (genes with warnings) |
|---|
| 347 |
uni.ann, uni.dna (single gene per sequence) |
|---|
| 348 |
|
|---|
| 349 |
Convert the uni genes to plus stranded with the command: |
|---|
| 350 |
|
|---|
| 351 |
fathom uni.ann uni.dna -export 1000 -plus |
|---|
| 352 |
|
|---|
| 353 |
You will find 4 new files: |
|---|
| 354 |
|
|---|
| 355 |
export.aa proteins corresponding to each gene |
|---|
| 356 |
export.ann gene structure on the plus strand |
|---|
| 357 |
export.dna DNA of the plus strand |
|---|
| 358 |
export.tx transcripts for each gene |
|---|
| 359 |
|
|---|
| 360 |
The parameter estimation program, forge, creates a lot of files. You probably |
|---|
| 361 |
want to create a directory to keep things tidy before you execute the program. |
|---|
| 362 |
|
|---|
| 363 |
mkdir params |
|---|
| 364 |
cd params |
|---|
| 365 |
forge ../export.ann ../export.dna |
|---|
| 366 |
cd .. |
|---|
| 367 |
|
|---|
| 368 |
The last step is to build an HMM. |
|---|
| 369 |
|
|---|
| 370 |
hmm-assembler.pl my-genome params > my-genome.hmm |
|---|
| 371 |
|
|---|
| 372 |
|
|---|
| 373 |
Finally, add the location of your HMM file to your maker_opts.ctl file (the snaphmm option). |
|---|
| 374 |
|
|---|
| 375 |
*For more information see SNAP documentation on how to build an HMM |
|---|