This report describes the protocol used to align the Sorghum_EST features to Maize_BACs_20060126.

Mon Feb 13 15:01:33 2006

Source of Sorghum_EST: downloaded from GenBank with the query 'txid4557[orgn] AND gbdiv_est[PROP]'

Alignment procedure details
---------------------------

231225 Sorghum_EST features are aligned to Maize_BACs_20060126 using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The alignments are then filtered with the procedure described below, which is applied in general to 'CrossSpecies-Coding' data sets.

Initial summary
  # alignments                                          : 45512
  # unique features these alignments represent          : 35913
  % of total features (231225) these alignments represent : 15.53 %

The lengths of the matches are distributed as follows:

  Hit_Length (bp)   # alignments
  --------          --------
  100               7404
  150               5485
  200               4517
  250               3823
  300               3657
  350               3508
  400               3711
  450               3307
  500               3324
  550               2902
  600               1910
  650               976
  700               598
  750               267
  800               91
  10000             32

Alignments with matches shorter than 150 bp are deleted.
  # remaining alignments                                    : 32698
  # unique features these remaining alignments represent   : 25205
  % of total features these alignments represent            : 10.90 %

Frequency distribution of the remaining features:

  # hits   # features
  --------   --------
  1          21739
  2          2235
  3          220
  4          136
  5          319
  6          337
  8          219
  9          0
  10         0
  20         0
  30         0
  40         0
  50         0
  100        0

Features with more than three hits are deleted.
  # remaining alignments                                    : 26869
  # unique features these remaining alignments represent   : 24194
  % of total features these alignments represent            : 10.46 %

% Identity distribution of the remaining alignments:

  % Identity   # alignments
  --------     --------
  10           0
  20           0
  30           0
  40           7
  50           9
  60           82
  70           373
  80           1320
  90           7225
  95           14026
  100          3827

Distribution of gap sizes:

  Gaps (bp)    # alignments
  --------     --------
  1000         23311
  2000         1995
  3000         687
  4000         313
  5000         189
  6000         68
  7000         47
  8000         22
  9000         33
  10000        27

Final summary
  # alignments                                          : 26869
  # unique features these alignments represent          : 24194
  % of total features these alignments represent        : 10.46 %
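The two filtering steps in this report (first dropping alignments shorter than 150 bp, then dropping features with more than three remaining hits) can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the (feature_id, hit_length) record layout and the names filter_alignments and summarize are hypothetical, and the sketch assumes the hit-count filter is applied to the alignments left after the length filter, as the order of the report suggests. The last lines re-derive the post-filter counts from the frequency table of remaining features.

```python
from collections import Counter

MIN_HIT_LENGTH = 150  # alignments with matches shorter than this (bp) are deleted
MAX_HITS = 3          # features hitting more than this many times are deleted


def filter_alignments(alignments, min_len=MIN_HIT_LENGTH, max_hits=MAX_HITS):
    """Apply the length filter, then the hit-count filter.

    alignments: list of (feature_id, hit_length) tuples (hypothetical layout).
    """
    # Step 1: keep only alignments whose match is at least min_len bp long.
    kept = [a for a in alignments if a[1] >= min_len]
    # Step 2: count the remaining hits per feature and drop every alignment
    # belonging to a feature with more than max_hits hits.
    hits = Counter(fid for fid, _ in kept)
    return [a for a in kept if hits[a[0]] <= max_hits]


def summarize(alignments, total_features):
    """Return (# alignments, # unique features, % of total features)."""
    features = {fid for fid, _ in alignments}
    pct = round(100.0 * len(features) / total_features, 2)
    return len(alignments), len(features), pct


# Cross-check of the hit-count filter using the frequency table above:
# only features with 1, 2 or 3 hits survive.
freq_kept = {1: 21739, 2: 2235, 3: 220}
n_alignments = sum(h * n for h, n in freq_kept.items())  # 26869
n_features = sum(freq_kept.values())                      # 24194
pct = round(100.0 * n_features / 231225, 2)               # 10.46
```

For example, with the toy input [("a", 120), ("a", 300), ("b", 200), ("b", 210), ("b", 220), ("b", 230), ("c", 160)], the length filter removes ("a", 120) and the hit-count filter removes all four alignments of "b", leaving [("a", 300), ("c", 160)]. The cross-check reproduces the remaining alignment and feature counts reported after the hit-count filter.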