Changeset 146
- Timestamp: 02/09/09 16:49:11 (10 months ago)
- Files:
- MPI/mpi_maker (modified) (1 diff)
- bin/maker (modified) (5 diffs)
- lib/Dumper/GFF/GFFV3.pm (modified) (2 diffs)
- lib/GI.pm (modified) (2 diffs)
- lib/Process/MpiChunk.pm (modified) (4 diffs)
- lib/evaluator/evaluate.pm (modified) (3 diffs)
- lib/maker/auto_annotator.pm (modified) (9 diffs)
- lib/shadow_AED.pm (added)
MPI/mpi_maker
--- r144
+++ r146
@@ -150,5 +150,4 @@
       augustus
       fgenesh
-      twinscan
       est2genome (Uses EST's directly)
       abinit (ab-initio predictions)

bin/maker
--- r144
+++ r146
@@ -71,70 +71,56 @@
      Maker is a program that produces gene annotations in gff3 file format using
      evidence such as EST alignments and protein homology.  Maker can be used to
-     produce gene annotations for new genome as well as update annoations from
+     produce gene annotations for new genomes as well as update annoations from
      existing genome databases.
 
      The four input arguments are user control files that specify how maker
-     should behave.  All options for maker should be set in the control files,
-     but a few can also be set on the command line. The evaluator_opts.ctl
-     file contains control options specific for the evaluation of gene
-     annotations.
+     should behave. The evaluator_opts file contains control options specific
+     for the evaluation of gene annotations. All options for maker should be set
+     in the control files, but a few can also be set on the command line.
+     Command line options provide a convenient machanism to override commonly
+     altered control file values.
 
      Input files listed in the control options files must be in fasta format.
-     Please see maker documentation to learn more about control file format.
-     The program will automatically try and locate the user control files in the
-     current working directory if these arguments are not supplied when
-     initializing maker.
+     Please see maker documentation to learn more about control file
+     configuration.  Maker will automatically try and locate the user control
+     files in the current working directory if these arguments are not supplied
+     when initializing maker.
 
      It is important to note that maker does not try and recalculated data that
      it has already calculated.  For example, if you run an analysis twice on
-     the same fasta file you will notice that maker does not rerun any of the
-     blast analyses but instead uses the blast analyses stored from the previous
-     run.  To force maker to rerun all analyses, use the -f flag.
+     the same dataset file you will notice that maker does not rerun any of the
+     blast analyses, but instead uses the blast analyses stored from the
+     previous run. To force maker to rerun all analyses, use the -f flag.
 
 
 Options:
 
-     -genome|g <filename> Give MAKER a different genome file (this overrides
-                          the control file value).
-
-     -predictor|p <type>  Selects the gene predictor/predictors to use when
-                          building annotations (this overrides the control
-                          file value).  Use a ',' to seperate types (no
-                          spaces), i.e. -predictor=snap,augustus,fgenesh
-
-                          Types: snap
-                                 augustus
-                                 fgenesh
-                                 twinscan
-                                 est2genome (Uses EST's directly)
-                                 gff (Passes through gff3 file annotations)
-
-     -RM_off|R            Turns all repeat masking off.  When using this
-                          flag, maker expects that the input genome file is
-                          already masked (This overrides all repeatmasking
-                          control file values).
-
-     -datastore|d         Causes output to be written using a datastore.  This
-                          option is automatically enabled if there are more
-                          than 1000 fasta entries in the input genome file.
-                          Output can then be accessed using the
-                          master_datastore_index file created by the maker
-                          (this overrides the control file value).
-
-     -retry   <integer>   Re-run failed contigs up to the specified number of
-                          re-tries (This overrides the control file value).
-
-     -cpus|c  <integer>   Tells how many cpus to use for BLAST analysis (this
-                          overrides contorol file value).
-
-     -force|f             Forces maker to rerun all analyses (erases all
-                          previous output).
-
-     -quiet|q             Silences most of the status messages.
-
-     -CTL                 Generates generic control files in the current
-                          working directory.
-
-     -help|?              Prints this usage statement.
+     -genome|g <filename> Specify the genome file.
+
+     -predictor|p <type>  Selects the predictor(s) to use when building
+                          annotations.  Use a ',' to seperate types (no spaces).
+                          i.e. -predictor=snap,augustus,fgenesh
+
+                          types: snap
+                                 augustus
+                                 fgenesh
+                                 est2genome (Uses EST's directly)
+                                 abinit (ab-initio predictions)
+                                 gff (Passes through gff3 file annotations)
+
+     -RM_off|R            Turns all repeat masking off.
+
+     -retry   <integer>   Rerun failed contigs up to the specified count.
+
+     -cpus|c  <integer>   Tells how many cpus to use for BLAST analysis.
+
+     -force|f             Forces maker to delete old files before running again.
+                          This will require all blast analyses to be rerun.
+
+     -quiet|q             Silences most of maker's status messages.
+
+     -CTL                 Generate empty control files in the current directory.
+
+     -help|?              Prints this usage statement.
 
 
@@ -321,5 +307,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_repeat_protein},
+                    $CTL_OPT{repeat_protein},
                     $CTL_OPT{_formater},
                     0,
@@ -407,5 +393,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_est},
+                    $CTL_OPT{est},
                     $CTL_OPT{_formater},
                     0,
@@ -440,5 +426,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_protein},
+                    $CTL_OPT{protein},
                     $CTL_OPT{_formater},
                     0,
@@ -473,5 +459,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_altest},
+                    $CTL_OPT{altest},
                     $CTL_OPT{_formater},
                     0,

lib/Dumper/GFF/GFFV3.pm
--- r127
+++ r146
@@ -674,4 +674,6 @@
     my $t_name = $t->{t_name};
     my $t_qi   = $t->{qi};
+    my $AED    = $t->{AED};
+    my $score  = $t->{score};
 
     my $t_s = $t_hit->strand('query') == 1 ? '+' : '-';
@@ -683,5 +685,6 @@
     my @data;
     push(@data, $seq_id, 'maker', 'mRNA', $t_b, $t_e, '.', $t_s, '.');
-    my $nine = 'ID='.$t_id.';Parent='.$g_id.';Name='.$t_name;;
+    my $nine = 'ID='.$t_id.';Parent='.$g_id.';Name='.$t_name;
+    $nine .= ';aed='.$AED.'eval_score='.$score;
     $nine .= ';'.$t_hit->{-attrib} if($t_hit->{-attrib});
     push(@data, $nine);

lib/GI.pm
--- r145
+++ r146
@@ -2158,5 +2158,5 @@
             $error .= "ERROR: Invalid predictor defined: $p\n".
                       "Valid entries are: est2genome, abinit, gff, snap, augustus,\n".
-                      "fgenesh, jigsaw, or twinscan\n\n";
+                      "or fgenesh\n\n";
         }
     }
@@ -2494,10 +2494,10 @@
     print OUT "augustus:$O{augustus} #location of augustus executable\n";
     print OUT "fgenesh:$O{fgenesh} #location of fgenesh executable\n";
-    print OUT "twinscan:$O{twinscan} #location of twinscan executable\n";
+#   print OUT "twinscan:$O{twinscan} #location of twinscan executable\n";
     print OUT "fathom:$O{fathom} #location of fathom executable\n";
     print OUT "\n";
     print OUT "#-----Other Algorithms\n";
-    print OUT "jigsaw:$O{jigsaw} #location of jigsaw executable\n";
-    print OUT "qrna:$O{qrna} #location of qrna executable\n";
+    print OUT "jigsaw:$O{jigsaw} #location of jigsaw executable (not yet implemented)\n";
+    print OUT "qrna:$O{qrna} #location of qrna executable (not yet implemented)\n";
     close(OUT);
 

lib/Process/MpiChunk.pm
--- r144
+++ r146
@@ -469,5 +469,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_repeat_protein},
+                    $CTL_OPT{repeat_protein},
                     $CTL_OPT{_formater},
                     $self->{RANK},
@@ -754,5 +754,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_est},
+                    $CTL_OPT{est},
                     $CTL_OPT{_formater},
                     $self->{RANK},
@@ -872,5 +872,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_protein},
+                    $CTL_OPT{protein},
                     $CTL_OPT{_formater},
                     $self->{RANK},
@@ -987,5 +987,5 @@
                     $CTL_OPT{split_hit},
                     $CTL_OPT{cpus},
-                    $CTL_OPT{_altest},
+                    $CTL_OPT{altest},
                     $CTL_OPT{_formater},
                     $self->{RANK},

lib/evaluator/evaluate.pm
--- r141
+++ r146
@@ -16,5 +16,4 @@
 use Fasta;
 
-
 use vars qw/$OPT_F $OPT_PREDS $OPT_PREDICTOR $LOG $CTL/;
 #------------------------------------------------------------------------------
@@ -142,12 +141,18 @@
                 $blastx_hits, $abinits_hits, $t_name);
 
-    my $txnAED = evaluator::AED::txnAED($box,{'start'=>1,'stop'=>1,'donor'=>1,'acceptor'=>1});
-    my $overallAED =evaluator::AED::txnAED($box, { 'start'=>100,
-                                                   'stop'=>100,
-                                                   'donor'=>100,
-                                                   'acceptor'=>100,
-                                                   'exon'=>1,
-                                                 } );
+
+    #-----temporary fix for oomycete
+    my $txnAED = '';#evaluator::AED::txnAED($box,{'start'=>1,'stop'=>1,'donor'=>1,'acceptor'=>1});
+    my $overallAED = '';#evaluator::AED::txnAED($box, { 'start'=>100,
+    #                                              'stop'=>100,
+    #                                              'donor'=>100,
+    #                                              'acceptor'=>100,
+    #                                              'exon'=>1,
+    #                                            } );
 
+    #-----
+
+
+
 
     my $snap_backwards;
@@ -177,7 +182,7 @@
     my $report = generate_report($eat, $box, $qi, $quality_seq, $splice_sites,
                                  $transcript_type, $completion, $alt, $score,
-                                 $so_code, $geneAED, $txnAED, $overallAED,
-                                 $solexa_for_splices, $gff3_identity,
-                                 $snap_backwards);
+                                 $so_code, $geneAED, $txnAED, $overallAED,
+                                 $solexa_for_splices, $gff3_identity,
+                                 $snap_backwards);
 
     print STDERR "Finished.\n\n" unless $main::quiet;

lib/maker/auto_annotator.pm
--- r145
+++ r146
@@ -23,4 +23,5 @@
 use evaluator::funs;
 use evaluator::AED;
+use shadow_AED;
 
 @ISA = qw(
@@ -665,5 +666,5 @@
     my $g = shift;
 
-    return $g->{eval};
+    return $g->{AED};
 }
 #------------------------------------------------------------------------
@@ -851,6 +852,5 @@
     $f->name($t_name);
 
-    my $qi =
-    maker::quality_index::get_transcript_qi($f,$evi,$offset,$len_3_utr,$l_trans);
+    #my $qi = maker::quality_index::get_transcript_qi($f,$evi,$offset,$len_3_utr,$l_trans);
 
     #----evaluator here
@@ -859,5 +859,12 @@
     my $blastx_hits = get_selected_types($evi->{gomiph},'blastx', 'protein_gff');
     my $abinits = $evi->{all_preds};
 
+    my @bag = (@$pol_p_hits,
+               @$pol_e_hits,
+               @$blastx_hits
+              );
+
+    my $shadowAED = shadow_AED::get_AED(\@bag, $f);
+
     #holds evalutor struct
     my $eva = evaluator::evaluate::power_evaluate($f,
@@ -875,10 +882,10 @@
                                                  );
 
+    my $AED = $shadowAED; #$eva->{score};
     my $score = $eva->{score};
-    my $eva_qi = $eva->{qi};
-    die "$qi\t$eva_qi\n" if ($qi ne $eva_qi);
-    #my $score = 0;
+    my $qi = $eva->{qi};
     #----
     $t_name .= " AED:";
+    $t_name .= sprintf '%.2f', $AED; # two decimal places
     $t_name .= " $qi";
 
@@ -890,5 +897,6 @@
                     't_name'   => $t_name,
                     't_qi'     => $qi,
-                    'eval'     => $score,
+                    'AED'      => $AED,
+                    'score'    => $score,
                     'report'   => $eva->{report}
                   };
@@ -1068,5 +1076,6 @@
     #----
 
-    my $eval = 0;
+    my $AED = 0;
+    my $score = 0;
     my $i = 1;
     foreach my $f (@{$c}) {
@@ -1077,5 +1086,6 @@
                                    $geneAED, $alt_spli_sup, $the_void, $CTL_OPTIONS);
         push(@t_structs, $t_struct);
-        $eval = $t_struct->{eval} if($t_struct->{eval} > $eval);
+        $score = $t_struct->{score} if($t_struct->{score} > $AED);
+        $AED = $t_struct->{AED} if($t_struct->{AED} > $AED);
         $i++;
     }
@@ -1088,5 +1098,6 @@
                          'g_end'     => $g_end,
                          'g_strand'  => $g_strand,
-                         'eval'      => $eval,
+                         'score'     => $score,
+                         'AED'       => $AED,
                          'predictor' => $predictor,
                          'so_code'   => $so_code
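
In the gene-level loop of lib/maker/auto_annotator.pm, the added `$score` line compares `$t_struct->{score}` against `$AED` rather than against `$score`. If the intent is a per-field running maximum over transcripts (an assumption; the changeset does not state it), each field would be compared with its own accumulator. A minimal sketch of that reading follows. The function name `max_aed_and_score` and the sample values are hypothetical; only the `AED` and `score` hash keys come from the changeset.

```perl
use strict;
use warnings;

# Hypothetical transcript structs. Only the 'AED' and 'score' keys mirror
# the changeset; the values are invented for illustration.
my @t_structs = (
    { AED => 0.12, score => 40 },
    { AED => 0.35, score => 25 },
    { AED => 0.05, score => 60 },
);

# Return the largest AED and the largest score seen across transcripts,
# comparing each field against its own running maximum.
sub max_aed_and_score {
    my ($structs) = @_;
    my ($AED, $score) = (0, 0);
    foreach my $t (@{$structs}) {
        $score = $t->{score} if ($t->{score} > $score);
        $AED   = $t->{AED}   if ($t->{AED}   > $AED);
    }
    return ($AED, $score);
}

my ($AED, $score) = max_aed_and_score(\@t_structs);
printf "AED=%.2f score=%d\n", $AED, $score;
```

With the sample data this prints `AED=0.35 score=60`. Comparing `score` against `$AED`, as in the committed line, would instead keep the score of the last transcript whose score exceeds the current AED value.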

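
In the lib/Dumper/GFF/GFFV3.pm hunk, the added line appends `eval_score=` directly after the AED value with no `;` between them, so the two attributes would run together in column 9 of the mRNA line. GFF3 separates `tag=value` attributes with `;`. One way to avoid missing separators is to build the column from a list of pairs and join them once. The sketch below assumes that approach; `build_attributes` is a hypothetical helper, not part of MAKER, and the IDs and values are invented.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the changeset): join tag=value pairs with ';'
# so that no separator can be forgotten between two attributes.
sub build_attributes {
    my @pairs = @_;    # list of [tag, value] array refs, in output order
    return join(';', map { $_->[0] . '=' . $_->[1] } @pairs);
}

# Invented example values; the 'aed' and 'eval_score' tags follow the changeset.
my $nine = build_attributes(
    ['ID',         'mRNA-1'],
    ['Parent',     'gene-1'],
    ['Name',       'mRNA-1'],
    ['aed',        sprintf('%.2f', 0.12)],
    ['eval_score', 40],
);
print "$nine\n";
```

This prints `ID=mRNA-1;Parent=gene-1;Name=mRNA-1;aed=0.12;eval_score=40`. The sketch does not escape reserved characters in values, which a real GFF3 writer would need to handle.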