Changeset 173

Timestamp: 03/24/09 15:08:16
Author: cholt
Message: update README

Files:

  • MPI/mpi_maker

    r160 r173  
    114114     mpi_maker [options] <maker_opts> <maker_bopts> <maker_exe> <evaluator> 
    115115 
    116      Maker is a program that produces gene annotations in gff3 file format using 
     116     Maker is a program that produces gene annotations in GFF3 file format using 
    117117     evidence such as EST alignments and protein homology.  Maker can be used to 
    118118     produce gene annotations for new genomes as well as update annoations from 
     
    120120 
    121121     The four input arguments are user control files that specify how maker 
    122      should behave. The evaluator_opts file contains control options specific 
     122     should behave. The evaluator options file contains control options specific 
    123123     for the evaluation of gene annotations. All options for maker should be set 
    124124     in the control files, but a few can also be set on the command line. 
     
    150150                                 augustus 
    151151                                 fgenesh 
     152                                 genemark 
    152153                                 est2genome (Uses EST's directly) 
    153154                                 abinit (ab-initio predictions) 
    154                                  gff (Passes through gff3 file annotations) 
     155                                 model_gff (Passes through GFF3 annotations) 
    155156 
    156157     -RM_off|R           Turns all repeat masking off. 
     
    162163     -force|f            Forces maker to delete old files before running again. 
    163164                         This will require all blast analyses to be rerun. 
     165 
     166     -evaluate|e         Run Evaluator on final annotations (under development). 
    164167 
    165168     -quiet|q            Silences most of maker's status messages. 
     
    250253               "predictor=s" =>\$OPT{predictor}, 
    251254               "retry=i" =>\$OPT{retry}, 
    252                "clean_try" =>\$OPT{clean_try}, 
    253255               "evaluate" =>\$OPT{evaluate}, 
    254256               "quiet" =>\$main::quiet, 
  • README

    r96 r173  
    11***MAKER Documentation*** 
    22 
    3 #---------- 
     3#---------------------------------------------------- 
    44INSTALLATION INSTUCTIONS FOR MAKER 
    55 
    66*Step by step instructions are also available in the INSTALL text file. 
     7 
     8MAKER is an annotation pipeline.  In other words it links together many steps and programs to produce final annotations.  For this reason, you must first install a number of programs that MAKER depends on. 
     9 
    710 
    811To install maker, you will first need to install the following external programs: 
     
    1013     *PERL 5.8.0 or higher 
    1114     *BioPerl 1.5 or higher (www.bioperl.org) 
    12      *Wu-BLAST 2.0  or higher (blast.wustl.edu) 
    13      *SNAP version 2006-07-28  or higher (homepage.mac.com/iankorf) 
     15     *SNAP version 2009-02-03  or higher (homepage.mac.com/iankorf) 
    1416     *RepeatMasker 3.1.6  or higher (www.repeatmasker.org) 
    1517     *Exonerate 1.4  or higher (www.ebi.ac.uk/~guy/exonerate) 
    1618 
    17  
     19You must also install one of the following: 
     20 
     21     *Wu-BLAST 2.0  or higher (Wu-BLAST is becoming AB-BLAST which can not yet be downloaded) 
     22        or 
     23     *NCBI BLAST 2.2.X or higher (http://www.ncbi.nlm.nih.gov/BLAST/download.shtml) 
     24  
    1825You might want to also install these optional external programs: 
    1926 
    2027     *Augustus 2.0  or higher (augustus.gobics.de) 
    21  
     28     *GeneMark.hmm-E 3.9 or higher (exon.biology.gatech.edu) 
     29     *FgenesH (www.softberry.com/) - requires licence 
    2230 
    2331To install mpi_maker, you must have an mpi package installed, try the following: 
     
    2836 
    2937 
    30 Notes:  
    31 1) RepeatMasker requires Wu-BLAST and a single file executable called TRF (see RepeatMasker website for details), so please install these before installing RepeatMasker 
    32 2) Exonerate Binaries can be downloaded from the website.  If you have Mac OSX, however, binaries are only available for version 1.0.  This verion will work too.  If you would like to compile exonerate, it requires GLIB, a C-library, that has a link from the exonerate website.  If you have Mac OSX, this can downloaded using FINK. 
    33 3) RepeatMasker requires a repeat library file, which is downloaded from Repbase (http://www.girinst.org/), this is explained on the RepeatMasker website. 
    34 4) Please note the location of all of the programs that you have installed.  You will need this information in the maker.exe file, one of MAKER's 3 control files. 
     38Notes: 
     391) Wu-BLAST is becoming AB-BLAST.  Once AB-BLAST becomes available we will do some testing to see if it is compatible with MAKER.  Wu-BLAST is no longer available online, so if you don't already have it, you will have to use NCBI BLAST instead. 
     402) RepeatMasker requires Wu-BLAST or Cross_Match and a single file executable called TRF (see RepeatMasker website for details), so please install these before installing RepeatMasker 
     413) Exonerate Binaries can be downloaded from the website.  If you use Mac OSX, however, binaries are only available for version 1.0.  This verion will work too.  If you would like to compile exonerate, it requires GLIB, a C-library, that has a link from the exonerate website.  If you use Mac OSX, GLIB can downloaded using FINK. 
     424) RepeatMasker requires a repeat library file, which can be downloaded from Repbase upon registration (http://www.girinst.org/), this is explained on the RepeatMasker website. 
     435) Please note the location of all of the programs that you have installed, and add them to you $PATH variable in your .profile file.  You will need this information in the maker.exe file, one of MAKER's 3 control files. 
    3544 
    3645 
     
    4150This will create a directory called maker with 5 sub directories: 
    4251 
    43         bin - contains the maker code
     52        bin - contains the maker executables
    4453        lib - contains all the necessary perl libaries for MAKER. 
    45         MPI - contains MPI specific data to configure maker to run on a cluster that supports MPI. 
     54        MPI - contains MPI specific data to configure MAKER for a cluster that supports MPI. 
    4655        Apollo - contains gff3.tiers file (See section titled APOLLO below) 
    4756        data  - contains some sample data used to make sure everything works 
     
    7685 
    7786 
    78 #---------- 
     87#---------------------------------------------------- 
    7988MPI MAKER INSTALL 
    8089 
     
    8291 
    8392        1. Install standard maker and verify that it runs. 
    84         2. Use cd to change to the MPI subdirectory in the maker instalation folder (i.e. maker/MPI/) 
    85         3. Run Install.PL by typing:     perl Install.PL 
     93        2. Install MPICH2 with the --enable-sharedlibs flag set to the appropriate value (See MPICH2 documentation) 
     94        3. Use cd to change to the MPI subdirectory in the maker instalation folder (i.e. maker/MPI/) 
     95        4. Run Install.PL by typing:     perl Install.PL 
    8696 
    8797A new version of maker called mpi_maker should now be installed under maker/bin. 
    8898 
    89 To run mpi_maker, first verify that your mpi environment is initiated, (i.e. using the mpdboot command). Now start mpi_maker via mpiexec. 
    90  
    91 Example: 
     99To run mpi_maker, first verify that your mpi environment is initiated, (i.e. using the mpdboot or mpd command). Now start mpi_maker via mpiexec. 
     100 
     101Example: (This will run MAKER on 3 nodes or processors) 
    92102 
    93103        mpiexec -n 3 perl maker_directory/maker/bin/mpi_maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl 
    94104 
    95105 
    96 Please see the documentation for the MPI environment you use for how to initiate an MPI process. 
    97  
    98  
    99 #---------- 
     106 
     107Please see the documentation of the MPI environment you use for instructions on how to initiate an MPI process. 
     108 
     109 
     110#---------------------------------------------------- 
    100111MAKER USAGE STATEMENT 
    101112 
    102 Usage: 
    103  
    104         maker [options] <maker_opts.ctl> <maker_bopts.ctl> <maker_exe.ctl> 
    105  
    106         The three input arguments are user control files that specify how maker should behave. 
    107         All input files listed in the control options files must be in fasta format.  Please 
    108         see maker documentation to learn more about control file format.  The program will 
    109         automatically try and locate the user control files in the current working 
    110         directory if these arguments are not supplied when initializing maker. 
    111  
    112         It is important to note that maker does not try and recalculated data that it has 
    113         already calculated.  For example, if you run an analysis twice on the same fasta file 
    114         you will notice that maker does not rerun any of the blast analyses but instead uses 
    115         the blast analyses stored from the previous run.  To force maker to rerun all 
    116         analyses, use the -f flag. 
    117  
    118 Options: 
    119  
    120      -genome|g  <file_name>   Give MAKER a different genome file (this overrides the 
    121                               control file value) 
    122  
    123      -predictor <snap>        Selects the gene predictor to use when building annotations (Default 
    124                 <augustus>    is 'snap').  The option 'est2genome' builds annotations directly 
    125                 <est2genome>  from the EST evidence. 
    126  
    127      -GFF                     Use an input gff3 format file of repeat elements for repeat masking. 
    128                               You must set rm_gff in maker_opts.ctl to the files location.  This 
    129                               option turns off all other repeat masking. 
    130  
    131      -RM_off|R                Turns repeat masking off (* See Warning) 
    132  
    133      -force|f                 Forces maker to rerun all analyses (replaces all previous output). 
    134  
    135      -datastore|d             Causes output to be written using datastore.  This option is 
    136                               automatically enabled if there are more than 1000 fasta entries 
    137                               in the input file.  Output can then accessed using the 
    138                               master_datastore_index file created by the program. 
    139  
    140      -PREDS                   Outputs ab-initio predictions that do not overlap maker annotation 
    141                               as gene annotations in the final gff3 output file (based on the 
    142                               -predictor flag ). 
    143  
    144      -CTL                     Generates generic control files in the current working directory. 
    145  
    146      -retry     <integer>     Re-run failed contigs up to the specified number of re-tries. 
    147  
    148      -cpus|c    <integer>     Tells how many cpus to use for Blast analysis (this overrides 
    149                               contorol file value). 
    150  
    151      -help|?                  Prints this usage statement. 
    152  
    153  
    154 Warning: 
    155        
    156         *When using the -R flag, maker expects that the input genome file is already masked. 
    157          Also if your genome file contains lower case characters, maker will consider those 
    158          characers to be soft masked. 
    159  
    160  
    161 #---------- 
     113 
     114 
     115 
     116#---------------------------------------------------- 
    162117RUNNING MAKER WITH EXAMPLE DATA 
    163118 
    1641191) Copy the files in the data directories to a temporary directory where you will run an example file. 
    1651202) Type maker -CTL to generate generic maker control files 
    166 3) Next you will need to edit the control files to include the path of the genome file, EST file, and proitein file, as well as the paths to all required executables.  See CONFIG FILE EDITING for more information. 
     1213) Next you will need to edit the control files to include the path of the genome file, EST file, and protein file, as well as the paths to all required executables.  See CONFIG FILE EDITING for more information. 
    1671224) Then try the following command from your temporary directory: 
    168123 
    169124perl maker_directory/bin/maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl 
    170125 
    171 MAKER will create at least the following files/directory: 
    172  
    173 seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo 
    174  
    175 seq_name.maker.transcripts.fasta - a file of the maker transcript sequences 
    176 seq_name.maker.snap.transcript.fasta - a file of ab-inito snap transcript sequences 
    177 seq_name.maker.proteins.fasta - a file of the maker protein sequences 
    178 seq_name.maker.snap.proteins.fasta - a file of ab-inito snap protein sequences 
    179  
    180 theVoid.seq_name - a directory containing all of the results files produced by maker, including BLAST reports, SNAP output, exonnerate output and the masked sequence 
     126MAKER will create at least the following files/directories: 
     127 
     128XXX.maker.output/ - contains all output for a given run of make 
     129XXX.maker.output/XXX_master_datastore_index.log - log of MAKER run progress as well as an index for traversing XXX.maker.output/XXX_datastore/ 
     130XXX.maker.output/XXX_datastore/ - contains folders containing the output for each individual contig of the input fasta file 
     131*Within these folders  
     132        seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo 
     133        seq_name.maker.transcripts.fasta - a file of the maker transcript sequences 
     134        seq_name.maker.proteins.fasta - a file of the maker protein sequences 
     135        seq_name.maker.XXX.transcript.fasta - a file of ab-inito transcript sequences from program XXX 
     136        seq_name.maker.XXX.proteins.fasta - a file of ab-inito protein sequences from program XXX 
     137        seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a file of filtered ab-inito transcript sequences that don't overlap annotations 
     138        seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a file of filtered ab-inito protein sequences that don't overlap annotations 
     139        theVoid.seq_name/ - a directory containing all of the raw output files produced by maker, including BLAST reports, SNAP output, exonnerate output and the masked sequence 
    181140 
    182141WARNING: 
    183142*The names of output files are based on sequence ids.  If giving maker a multi-fasta file, it is important to verify that all sequence id are unique, so files are not overwritten. 
    184 *If there are more than 1,000 sequences in a multi-fasta file or you use the -d flag on the command line a datastore structure will be used. see DATASTORE in this document. 
     143*If there are more than 1,000 sequences in a multi-fasta file a deep datastore structure will be used. see DATASTORE in this document. 
    185144*If sequence ids contain characters that are illegal in file names, those characters will be replaced automatically before building output file names. 
    186145 
    187  
    188  
    189 #---------- 
     146#---------------------------------------------------- 
    190147DATASTORE 
    191148 
    192149"Many filesystems have performance problems with large numbers of subdirectories and files within a single directory and even when the underlying filesystems handle things gracefully, access via network filesystems can be an issue.  The Datastore modules create a hiearchy of subdirectory layers, starting from a 'base', and mapping end-user's identifiers to the corresponding subdirectory." - quote from http://www.yandell-lab.org/  (See site for more information on the Datastore module) 
    193150 
    194 Datastore will be used by maker if there are more than 1,000 sequences in a multi-fasta file or you use the -d flag on the command line. 
    195  
    196 When datastore is implemented, the output files described above will not appear where you would normally expect them to be.  Instead they will be located in a series of sub-directory under a new base-directory whose name is determined from the input genome file name, i.e. current_working_directory/input_genome_datastore/EE/Af/seq_name/seq_name.gff.  A master_datastore_index file will be made in the current working directory to help you find the output files from each sequence. 
    197  
    198 The master_datastore_index file is a file created to allow the user to easily find the exact output directory corresponding to contigs from the input genome file.  The The master_datastore_index file contains two columns of text; the first column shows the sequence identifier from each fasta header, and the second column shows the location of the output files for that sequence.  
    199  
    200  
    201  
    202 #---------- 
     151A deep datastore will be used by maker if there are more than 1,000 sequences in a multi-fasta file. 
     152 
     153When a datastore is implemented, the output files described above will not appear where you would normally expect them to be.  Instead they will be located in a series of sub-directory under a new base-directory whose name is determined from the input genome file name, i.e. current_working_directory/genome_datastore/EE/Af/Contig1/Contig1.gff.  A master_datastore_index file will be made in the current working directory to help you find the output files from each sequence. 
     154 
     155The master_datastore_index file is a file created to allow the user to easily find the exact output directory corresponding to contigs from the input genome file.  The The master_datastore_index file contains three columns of text; the first column shows the sequence identifier from each fasta header, and the second column shows the location of the output files for that sequence. The third column is for logging the status of data related to an individual contig. The values of the third column are as follows: 
     156        STARTED - Indicates that maker has started proccessing this contig. 
     157        FINISHED - Indicates that maker has finished processing this contig and all data is currently available in that subdirectory. 
     158        DIED - Indicates that maker failed. 
     159        DIED_SKIPPED_PERMANENT - Indicates that maker failed up to the specified number of retries and will not try again. 
     160        RETRY - Indicates that maker is retrying the contig after a failure. 
     161        SKIPPED_SMALL - Indicates that this contig was skipped because it is too short (based on control file values set by the user) 
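The three-column index described above lends itself to a quick status summary. What follows is a minimal Perl sketch, not MAKER code: it assumes the columns are tab-separated (the README only says "three columns of text") and that a contig may appear on several lines as its status changes. The helper name summarize_index and the sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally the third (status) column of a
# master_datastore_index listing and remember the last status per contig.
# Assumption: columns are separated by tabs.
sub summarize_index {
    my ($fh) = @_;
    my (%count, %latest);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        my ($id, $dir, $status) = split /\t/, $line;
        next unless defined $status;              # skip malformed rows
        $count{$status}++;
        $latest{$id} = $status;                   # later lines override earlier ones
    }
    return (\%count, \%latest);
}

# Made-up sample rows, for illustration only.
my @rows = (
    "Contig1\tgenome_datastore/EE/Af/Contig1/\tSTARTED",
    "Contig1\tgenome_datastore/EE/Af/Contig1/\tFINISHED",
    "Contig2\tgenome_datastore/1B/C4/Contig2/\tSTARTED",
    "Contig2\tgenome_datastore/1B/C4/Contig2/\tDIED",
);
my $sample = join("\n", @rows) . "\n";

# Read the sample through an in-memory filehandle.
open(my $fh, '<', \$sample) or die "cannot open sample: $!";
my ($count, $latest) = summarize_index($fh);
close $fh;

print "$_\t$latest->{$_}\n" for sort keys %$latest;
```

To use it on a real run, open the XXX_master_datastore_index.log file instead of the in-memory sample, after checking that the tab-separated assumption holds for your MAKER version.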
     162 
     163 
     164#---------------------------------------------------- 
    203165CONFIG FILE EDITING 
    204166 
     
    206168 
    207169 
    208 MAKER has 3 control files for configuration options. 
     170MAKER has 3 control files for configuration options. A fourth file evaluator.ctl is used to supply a MAKER related program EVALUATOR with options specific to that program (only important if 'evaluate' is set to 1 in maker_opts.ctl). 
     171 
     172Note that for all control files the comments written to help users begin with a pound sign(#).  In addition, options before the colon(:) can not be changed, nor should there be a space before or after the colon. 
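The key:value layout described in this note can be read with a few lines of Perl. This is a rough sketch under stated assumptions, not MAKER's own parser: it treats everything after the first pound sign as a comment (so a value containing '#' would be truncated) and splits on the first colon only. The function name parse_ctl is hypothetical; the sample lines are copied from the control file excerpts in this changeset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader for MAKER-style control file lines ("key:value #comment").
# Assumptions: '#' always starts a comment; the first ':' separates key and value.
sub parse_ctl {
    my (@lines) = @_;
    my %opt;
    for my $line (@lines) {
        $line =~ s/#.*$//;           # drop the comment, including "#-----" headers
        $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next if $line eq '';
        my ($key, $value) = split /:/, $line, 2;
        next unless defined $value;  # no colon: not an option line
        $opt{$key} = $value;
    }
    return %opt;
}

my %opt = parse_ctl(
    "#-----BLAST and Exonerate statistics thresholds",
    "blast_type:wublast    #set to 'wublast' or 'ncbi'",
    "eval_blastn:1e-10     #Blastn eval cutoff",
    "est_gff:                   #EST evidence from a seperate gff3 file",
);

printf "%s => '%s'\n", $_, $opt{$_} for sort keys %opt;
```

An option left blank in the file, such as est_gff here, comes back as an empty string rather than being dropped, which makes it easy to tell "unset" from "missing".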
    209173 
    210174A. maker_exe.ctl - includes information about programs executed by MAKER. 
    211  
    212 Here is what the standard maker_exe.ctl control file looks like: 
    213 ==================================== 
    214  
    215 #-----Location of executables required by Maker 
    216 xdformat:/usr/local/wublast/xdformat              #location of xdformat executable 
    217 blastn:/usr/local/wublast/blastn                  #location of blastn executable 
    218 blastx:/usr/local/wublast/blastx                  #location of blastn executable 
    219 snap:/usr/local/snap/snap                         #location of snap executable 
    220 augustus:/usr/local/augustus/bin/augustus         #location of augustus executable (optional) 
    221 RepeatMasker:/usr/local/RepeatMasker/RepeatMasker #location of RepeatMasker executable 
    222 exonerate:/usr/local/exonerate/bin/exonerate      #location of exonerate executable 
    223  
    224 ==================================== 
    225  
    226 Note that for all control files the comments written to help users begin with a pound sign(#).  In addition, options before the colon(:) can not be changed, nor should there be a space before or after the colon. 
     175Here an example of a section of the maker_exe.ctl file: 
     176==================================== 
     177#-----Location of Executables Used by Maker/Evaluator 
     178formatdb:/usr/local/bin/formatdb                              #location of NCBI formatdb executable 
     179blastall:/usr/local/bin/blastall                              #location of NCBI blastall executable 
     180xdformat:/usr/local/bin/xdformat                              #location of WUBLAST xdformat executable 
     181blastn:/usr/local/bin/blastn                                  #location of WUBLAST blastn executable 
     182blastx:/usr/local/bin/blastx                                  #location of WUBLAST blastx executable 
     183tblastx:/usr/local/bin/tblastx                                #location of WUBLAST tblastx executable 
     184RepeatMasker:/home/cholt/usr/local/RepeatMasker/RepeatMasker  #location of RepeatMasker executable 
     185exonerate:/home/cholt/usr/local/exonerate/bin/exonerate       #location of exonerate executable 
     186 
     187#-----Ab-initio Gene Prediction Algorithms 
     188snap:/home/cholt/usr/local/snap/snap                  #location of snap executable 
     189gmhmme3:/home/cholt/usr/local/gmes/gmhmme3            #location of eukaryotic genemark executable 
     190augustus:/home/cholt/usr/local/augustus/bin/augustus  #location of augustus executable 
     191fgenesh:/home/cholt/usr/local/fgenesh/fgenesh         #location of fgenesh executable 
     192 
     193==================================== 
    227194 
    228195 
    229196B. maker_bopts.ctl - contains statistics for fltering blast and exonerate data 
    230  
    231 Here an example maker_bopts.ctl: 
     197Here an example of a section of the maker_bopts.ctl file: 
    232198==================================== 
    233199 
    234200#-----BLAST and Exonerate statistics thresholds 
    235 percov_blastn:0.80 #Blastn Percent Coverage Threhold EST-Genome Alignments 
    236 percid_blastn:0.85 #Blastn Percent Identity Threshold EST-Genome Aligments 
    237 eval_blastn:1e-10  #Blastn eval cutoff 
    238 bit_blastn:40      #Blastn bit cutoff 
    239 percov_blastx:0.50 #Blastx Percent Coverage Threhold Protein-Genome Alignments 
    240 percid_blastx:0.40 #Blastx Percent Identity Threshold Protein-Genome Aligments 
    241 eval_blastx:1e-6   #Blastx eval cutoff 
    242 bit_blastx:30      #Blastx bit cutoff 
    243 e_perc_cov:50      #Exonerate Percent Coverage Thresshold EST_Genome Alignments 
    244 ep_score_limit:20  #Report  alignments scoring at least this percentage of the maximal score exonerate nucleotide 
    245 en_score_limit:20  #Report  alignments scoring at least this percentage of the maximal score exonerate protein 
    246  
     201blast_type:wublast    #set to 'wublast' or 'ncbi' 
     202 
     203pcov_blastn:0.8       #Blastn Percent Coverage Threhold EST-Genome Alignments 
     204pid_blastn:0.85       #Blastn Percent Identity Threshold EST-Genome Aligments 
     205eval_blastn:1e-10     #Blastn eval cutoff 
     206bit_blastn:40         #Blastn bit cutoff 
     207 
     208pcov_blastx:0.5       #Blastx Percent Coverage Threhold Protein-Genome Alignments 
     209pid_blastx:0.4        #Blastx Percent Identity Threshold Protein-Genome Aligments 
     210eval_blastx:1e-06     #Blastx eval cutoff 
     211bit_blastx:30         #Blastx bit cutoff 
     212 
     213pcov_rm_blastx:0.5    #Blastx Percent Coverage Threhold For Transposable Element Masking 
     214pid_rm_blastx:0.4     #Blastx Percent Identity Threshold For Transposbale Element Masking 
     215eval_rm_blastx:1e-06  #Blastx eval cutoff for transposable element masking 
     216bit_rm_blastx:30      #Blastx bit cutoff for transposable element masking 
    247217==================================== 
    248218 
    249219 
    250220C. maker_opts.ctl - contains options for maker and external programs used by maker 
    251  
    252 Here an example maker_opts.ctl: 
    253 ==================================== 
    254  
    255 #-----sequence and library files 
    256 genome:fly_assembly.fasta        #genome sequence file (required) 
    257 est:fly_est.fasta                #EST sequence file (required) 
    258 protein:uniprot.fasta            #protein sequence file (required) 
    259 repeat_protein:te_proteins.fasta #a database of transposable element proteins 
    260 rmlib:fly_specific_repeats.fasta #an organism specific repeat library (optional) 
    261 rm_gff:                          #a gff3 format file of repeat elements (only used with -GFF flag) 
    262  
    263 #-----external application specific options 
    264 snaphmm:fly          #SNAP HMM model 
    265 augustus_species:fly #Augustus gene prediction model 
    266 model_org:all        #RepeatMasker model organism 
    267 alt_peptide:c        #amino acid used to replace non standard amino acids in xdformat 
    268 cpus:2               #max number of cpus to use in BLAST and RepeatMasker 
    269  
    270 #-----Maker specific options 
    271 predictor:snap     #identifies which gene prediction program to use for annotations 
    272 te_remove:1        #mask regions with excess similarity to transposable element proteins 
    273 max_dna_len:100000 #length for dividing up contigs into chunks (larger values increase memory usage) 
    274 split_hit:10000    #length of the splitting of hits (max intron size for EST and protein alignments) 
    275 snap_flank:200     #length of sequence surrounding EST and protein evidence used to extend gene predictions 
    276 single_exon:0      #consider EST hits aligning to single exons when generating annotations, 1 = yes, 0 = no 
    277 use_seq_dir:1      #place output files in same directory as sequence file: 1 = yes, 0 = no 
    278 clean_up:0         #remove theVoid directory: 1 = yes, 0 = no 
    279  
    280 ==================================== 
    281  
    282  
    283 #---------- 
     221Here an example of a section of the maker_opts.ctl file: 
     222==================================== 
     223#-----Genome (Required for De-Novo Annotations) 
     224genome:input/genome.fasta  #genome sequence file in fasta format 
     225 
     226#-----Re-annotation Options 
     227genome_gff:     #re-annotate genome based on this gff3 file 
     228est_pass:0      #use ests in genome_gff: 1 = yes, 0 = no 
     229altest_pass:0   #use alternate organism ests in genome_gff: 1 = yes, 0 = no 
     230protein_pass:0  #use proteins in genome_gff: 1 = yes, 0 = no 
     231rm_pass:0       #use repeats in genome_gff: 1 = yes, 0 = no 
     232model_pass:0    #use gene models in genome_gff: 1 = yes, 0 = no 
     233pred_pass:0     #use ab-initio predictions in genome_gff: 1 = yes, 0 = no 
     234other_pass:0    #passthrough everything else in genome_gff: 1 = yes, 0 = no 
     235 
     236#-----EST Evidence (you must provide a value for at least one) 
     237est:input/est.fasta        #non-redundant set of assembled ESTs in fasta format (classic EST analysis) 
     238est_reads:                 #un-assembled EST reads in fasta format (for deep nextgen mRNASeq) 
     239altest:input/altest.fasta  #EST/cDNA sequence file in fasta format from an alternate organism 
     240est_gff:                   #EST evidence from a seperate gff3 file 
     241altest_gff:                #Alternate organism EST evidence from a seperate gff3 file 
     242 
     243#-----Protein Homology Evidence (you must provide a value for at least one) 
     244protein:input/protein.fasta  #protein sequence file in fasta format 
     245protein_gff:                 #protein homology evidence from a gff3 file 
     246==================================== 
     247 
     248#---------------------------------------------------- 
     249GFF3 Passthrough 
     250 
     251If you have data from a source that MAKER does not support, and you wish to use the data in annotating a genome, then you can pass the data to MAKER as an aligned GFF3 file.  This is done by supplying the files location to the appropriate value in the maker_opt.ctl file (i.e. est_gff:input\est.gff).  Note that MAKER expects all data sent to it to be of the type specified, so don't put mixed data in a file (i.e. don't mix EST and other data in the file pointed to by est_gff, otherwise it all gets used as EST data).  Also the genome_gff option is only for MAKER produced GFF3 files.  Other GFF3 files of mixed data must be split by type and identified by the appropriate control file option (i.e. rm_gff for repeat data, pred_gff for ab-initio prediction data, est_gff for EST data, etc.).  
     252 
     253#---------------------------------------------------- 
    284254ADDING UTRs for GBROWSE 
    285255 
     
    292262 
    293263 
    294 #---------- 
     264#---------------------------------------------------- 
    295265APOLLO 
    296266 
     
    298268 
    299269 
    300 #---------- 
    301 HMM BUILDING (based snap documentation) 
     270#---------------------------------------------------- 
     271HMM BUILDING (based on snap documentation) 
    302272 
    303273A.  First you will need to determine the genes used to model future genes, by determining a high quality gene set (annotations for the high quality gene should be in GFF3 format).  The high quality gene set can then be coverted into snap ZFF format using maker2zff.pl found in maker/bin. 
  • lib/Dumper/GFF/GFFV3.pm

    r172 r173  
    443443     
    444444    my ($class) = lc($h->algorithm); 
    445     $class =~ /^exonerate\:*\_*/; 
     445    $class =~ s/^exonerate\:*\_*//; 
    446446 
    447447    my $type; 
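The one-character change in this hunk (adding s and a replacement part) turns a pattern match, whose result was discarded, into a substitution that actually strips the prefix. The short Perl sketch below shows the difference; the algorithm string 'exonerate:est2genome' is an assumed example value, not taken from the source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed example value for $h->algorithm; the real values are not shown in the diff.
my $algorithm = 'Exonerate:est2genome';

# r172 behaviour: a plain match tests the string but leaves it unchanged.
my $before  = lc($algorithm);
my $matched = ($before =~ /^exonerate\:*\_*/) ? 1 : 0;

# r173 behaviour: a substitution removes the matched prefix.
my $after = lc($algorithm);
$after =~ s/^exonerate\:*\_*//;

print "matched: $matched\n";
print "r172:    $before\n";   # exonerate:est2genome
print "r173:    $after\n";    # est2genome
```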
  • lib/GFFDB.pm

    r172 r173  
    149149             my ($index) = $dbh->selectrow_array(qq{SELECT name FROM sqlite_master WHERE name = '$table\_inx'}); 
    150150             $dbh->do(qq{DROP TABLE $table}); 
    151              $dbh->do(qq{DROP INDEX $index}) if $index; 
    152151             $dbh->do(qq{CREATE TABLE $table (seqid TEXT, source TEXT, start INT, end INT, line TEXT)}); 
    153152             $dbh->do(qq{UPDATE sources SET source = '$source' WHERE name = '$table'}); 
     
    320319          my ($index) = $dbh->selectrow_array(qq{SELECT name FROM sqlite_master WHERE name = '$table\_inx'}); 
    321320          $dbh->do(qq{DROP TABLE $table}); 
    322           $dbh->do(qq{DROP INDEX $index}) if $index; 
    323321          $dbh->do(qq{CREATE TABLE $table (seqid TEXT, source TEXT, start INT, end INT, line TEXT)}); 
    324322          $dbh->do(qq{UPDATE sources SET source = '$source' WHERE name = '$table'}); 
  • lib/GI.pm

    r172 r173  
    504504   my $mpi_size = shift@_ || 1; 
    505505 
     506   #rebuild all fastas when specified 
     507   File::Path::rmtree($CTL_OPT->{out_base}."/mpi_blastdb") if($CTL_OPT->{force}); 
     508 
    506509   ($CTL_OPT->{_protein}, $CTL_OPT->{p_db}) = split_db($CTL_OPT, 'protein', $mpi_size); 
    507510   ($CTL_OPT->{_est}, $CTL_OPT->{e_db}) = split_db($CTL_OPT, 'est', $mpi_size); 
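The added lines move the forced cleanup up so that the whole mpi_blastdb directory is removed once, before any database is split. A minimal sketch of that pattern follows, using only core modules. The temporary directory and the file name protein.0.fasta are stand-ins invented for the example; only the out_base and force keys and the mpi_blastdb directory name come from the diff.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path ();
use File::Temp ();

# Stand-in for the run's output directory (removed automatically at exit).
my $out_base = File::Temp::tempdir(CLEANUP => 1);
my $db_dir   = "$out_base/mpi_blastdb";

# Simulate a database left over from an earlier run (file name is made up).
File::Path::mkpath($db_dir);
open(my $fh, '>', "$db_dir/protein.0.fasta") or die "cannot write: $!";
print {$fh} ">seq1\nMKT\n";
close $fh;

# Mirror of the r173 logic: wipe all split databases when force is set.
my $CTL_OPT = { out_base => $out_base, force => 1 };
File::Path::rmtree($CTL_OPT->{out_base} . "/mpi_blastdb") if $CTL_OPT->{force};

my $state = (-d $db_dir) ? 'kept' : 'removed';
print "mpi_blastdb $state\n";
```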
     
    539542   my $f_full = $f_dir.".fasta"; 
    540543                           
    541    #rebuild fastas on force 
    542    File::Path::rmtree($b_dir) if ($CTL_OPT->{force}); 
    543  
    544544   if(-e "$f_dir"){ 
    545545      my @t_db = <$f_dir/*$d_name\.*>; 
     
    22332233       else { 
    22342234           push(@{$CTL_OPT{_predictor}}, $p) unless($uniq{$p}); 
    2235            push(@{$CTL_OPT{_run}}, $p) unless($uniq{$p}); 
     2235           if($p =~ /^snap$|^augustus$|^fgenesh$|^genemark$|^jigsaw$/){ 
     2236               push(@{$CTL_OPT{_run}}, $p) unless($uniq{$p}); 
     2237           } 
    22362238           $uniq{$p}++; 
    22372239       } 
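After this change every requested predictor is still recorded, but only the names matching the ab-initio programs are queued to run. The sketch below reproduces just that filtering step on an assumed input list; it omits the preceding if branch, whose condition is not visible in this hunk, and uses plain arrays in place of the %CTL_OPT entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed example input: predictor names as a user might list them.
my @requested = qw(snap est2genome model_gff augustus snap genemark);

my (%uniq, @predictor, @run);
for my $p (@requested) {
    # Every distinct name is kept as a predictor source ...
    push @predictor, $p unless $uniq{$p};
    # ... but only programs that must be executed go on the run list.
    if ($p =~ /^snap$|^augustus$|^fgenesh$|^genemark$|^jigsaw$/) {
        push @run, $p unless $uniq{$p};
    }
    $uniq{$p}++;
}

print "predictor: @predictor\n";   # snap est2genome model_gff augustus genemark
print "run:       @run\n";         # snap augustus genemark
```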
     
    25162518   print OUT "#-----EST Evidence (you should provide a value for at least one)\n"; 
    25172519   print OUT "est:$O{est} #non-redundant set of assembled ESTs in fasta format (classic EST analysis)\n"; 
    2518    print OUT "est_reads:$O{est_reads} #un-assembled EST reads in fasta format (for deep nextgen mRNASeq)\n"; 
     2520   print OUT "est_reads:$O{est_reads} #unassembled nextgen mRNASeq in fasta format (not fully implemented)\n"; 
    25192521   print OUT "altest:$O{altest} #EST/cDNA sequence file in fasta format from an alternate organism\n"; 
    25202522   print OUT "est_gff:$O{est_gff} #EST evidence from an external gff3 file\n"; 
  • lib/Widget/RepeatMasker.pm

    r159 r173  
    189189 
    190190                push(@args, '-algorithm'); 
    191                 push(@args, 'repeat masker'); 
     191                push(@args, 'repeatmasker'); 
    192192 
    193193                push(@args, '-bits'); 
     
    252252                     Bio::Search::Hit::PhatHit::repeatmasker->new('-name' => $key, 
    253253                                                                  '-description'  => 'NA', 
    254                                                                   '-algorithm'    => 'repeat_masker', 
     254                                                                  '-algorithm'    => 'repeatmasker', 
    255255                                                                  '-length'       => $q_length, 
    256256                                                                  ); 
  • lib/Widget/fgenesh.pm

    r168 r173  
    371371                #added 3/19/2009 
    372372                #check for single and double base pair overhangs 
     373                @{$g->{$gene}} = grep {$_->{type} !~ /TSS|PolA/} @{$g->{$gene}}; 
    373374                @{$g->{$gene}} = sort {$a->{b} <=> $b->{b}} @{$g->{$gene}}; 
    374375                my $length = 0; 
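The new grep line drops TSS and PolA features before the remaining features are sorted by start coordinate, so the overhang check only sees coding segments. Here is a small sketch of the same filter-then-sort idiom on made-up records; the type and b keys follow the diff, and b is assumed to hold the feature's begin position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up feature records for one gene; 'b' is assumed to be the begin coordinate.
my @features = (
    { type => 'PolA', b => 900 },
    { type => 'CDSl', b => 700 },
    { type => 'TSS',  b => 100 },
    { type => 'CDSf', b => 150 },
);

# Same two steps as in the hunk: filter out TSS/PolA, then sort by begin.
@features = grep { $_->{type} !~ /TSS|PolA/ } @features;
@features = sort { $a->{b} <=> $b->{b} } @features;

my $summary = join(",", map { $_->{type} . ":" . $_->{b} } @features);
print "$summary\n";   # CDSf:150,CDSl:700
```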
  • lib/runlog.pm

    r172 r173  
    180180      else { 
    181181         $continue_flag = 0 if (-e $gff_file); #don't re-run finished 
    182        
     182      } 
     183       
     184      if($continue_flag == 0 || $continue_flag == -1 || $continue_flag == 3){ 
    183185         #CHECK CONTROL FILE OPTIONS FOR CHANGES 
    184186         my $cwd = Cwd::cwd();