This report describes the protocol used to align the Sorghum_ESTcluster3_Pratt features to tigrv4-genome.
Fri Apr 14 11:57:18 2006


Source of Sorghum_ESTcluster3_Pratt: the Gramene markers database, originally from the Pratt lab

Alignment procedure details 
--------------------------- 

The 27436 Sorghum_ESTcluster3_Pratt features were aligned to tigrv4-genome using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The resulting alignments were then filtered with the procedure described below, which is applied in general to 'CrossSpecies-Coding' data sets.
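
The two alignment steps can be sketched as command lines assembled in Python. This is a minimal sketch, not the pipeline actually used: all file names are hypothetical, and only the two options named in this report (-minIdentity=50 and -singleHit) are passed.

```python
import subprocess

def build_commands(genome_fa, est_fa, raw_psl, best_psl, psr):
    """Return the blat and pslReps argument lists for the two-step alignment."""
    # blat usage: blat database query [options] output.psl
    blat_cmd = ["blat", genome_fa, est_fa, "-minIdentity=50", raw_psl]
    # pslReps usage: pslReps [options] in.psl out.psl out.psr
    reps_cmd = ["pslReps", "-singleHit", raw_psl, best_psl, psr]
    return blat_cmd, reps_cmd

def run_alignment(*paths):
    """Run both steps in order; requires blat and pslReps on PATH."""
    for cmd in build_commands(*paths):
        subprocess.run(cmd, check=True)

# Hypothetical file names, for illustration only.
blat_cmd, reps_cmd = build_commands(
    "genome.fa", "ests.fa", "raw.psl", "best.psl", "best.psr")
print(" ".join(blat_cmd))
print(" ".join(reps_cmd))
```

Building the argument lists separately from running them makes it possible to inspect the commands without having either tool installed.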

Initial summary
# alignments : 17721
# unique Features these alignments represent: 16594
% of total features these alignments represent : 60.48 %
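
The percentage above is the number of unique features with at least one alignment divided by the 27436 input features. A minimal Python check of that arithmetic (the function name is illustrative, not part of the pipeline):

```python
TOTAL_FEATURES = 27436  # input features, from the procedure details

def percent_of_total(unique_features, total=TOTAL_FEATURES):
    """Percentage of all input features, rounded to two decimals."""
    return round(100.0 * unique_features / total, 2)

print(percent_of_total(16594))  # unique features in the initial summary
```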

The match lengths are distributed as follows
Hit_Length	# alignments
--------	--------
100	 1918
150	 2045
200	 2352
250	 2332
300	 2268
350	 2078
400	 1742
450	 1240
500	 766
550	 446
600	 286
650	 132
700	 46
750	 25
800	 22
10000	 23

Alignments with matches shorter than 150 bp are deleted
# remaining Alignments : 13814
# unique Features these remaining alignments represent: 12946
% of total features these alignments represent : 47.19 %
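
The length filter can be sketched as follows, assuming the alignments are stored in PSL format (the default blat output), where the first tab-separated column is the number of matching bases and the tenth is the query name. Whether the pipeline measured hit length from this column is an assumption, and the sample rows are invented for illustration.

```python
MIN_MATCH_BP = 150  # threshold stated in this report

def filter_by_match_length(psl_lines, min_match=MIN_MATCH_BP):
    """Keep PSL rows whose 'matches' column (column 1) is at least min_match."""
    kept = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and PSL header lines, whose first field is not a number.
        if not fields[0].isdigit():
            continue
        if int(fields[0]) >= min_match:
            kept.append(fields)
    return kept

def unique_features(rows):
    """Count distinct query names (PSL column 10)."""
    return len({row[9] for row in rows})

def make_row(matches, q_name):
    # Hypothetical 21-column row: only columns 1 and 10 carry real values here.
    fields = ["0"] * 21
    fields[0] = str(matches)
    fields[9] = q_name
    return "\t".join(fields)

sample = [make_row(120, "est_A"), make_row(150, "est_B"),
          make_row(410, "est_B"), make_row(149, "est_C")]
kept = filter_by_match_length(sample)
print(len(kept), unique_features(kept))
```

In this invented sample, two alignments survive and both belong to the same feature, which is why the report tracks alignments and unique features separately.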

Frequency distribution of the remaining features
# hits	# features
--------	--------
1	 12543
2	 238
3	 41
4	 44
5	 53
6	 5
8	 8
9	 6
10	 4
20	 4
30	 0
40	 0
50	 0
100	 0

Features that hit more than three times are deleted.
# remaining Alignments : 13142
# unique Features these remaining alignments represent: 12822
% of total features these alignments represent : 46.73 %
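
A minimal sketch of this filter, assuming each alignment is represented by its feature (query) name; the function name and sample names are illustrative. The sketch also checks the counts above against the frequency table: keeping only features with 1, 2 or 3 hits leaves 12543 + 238 + 41 = 12822 features and 12543 + 2*238 + 3*41 = 13142 alignments.

```python
from collections import Counter

MAX_HITS = 3  # features with more than three alignments are dropped

def drop_multi_hit_features(feature_names, max_hits=MAX_HITS):
    """Keep alignments whose feature has at most max_hits alignments in total."""
    hits = Counter(feature_names)
    return [name for name in feature_names if hits[name] <= max_hits]

# Invented example: est_B has four alignments and is removed entirely.
names = ["est_A", "est_B", "est_B", "est_C", "est_B", "est_B", "est_C"]
kept = drop_multi_hit_features(names)
print(kept)

# Consistency check against the frequency table in this report.
features_by_hits = {1: 12543, 2: 238, 3: 41}
remaining_features = sum(features_by_hits.values())
remaining_alignments = sum(h * n for h, n in features_by_hits.items())
print(remaining_features, remaining_alignments)
```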

% Identity distribution of the remaining alignments
% Identity	# alignments
--------	--------
10	 0
20	 0
30	 0
40	 3
50	 5
60	 24
70	 151
80	 1157
90	 9068
95	 2589
100	 145

The gaps are distributed as follows
Gaps	# features
--------	--------
1000	 11199
2000	 1202
3000	 302
4000	 84
5000	 37
6000	 33
7000	 17
8000	 28
9000	 16
10000	 14

Final summary
# alignments : 13142
# unique Features these alignments represent: 12822
% of total features these alignments represent : 46.73 %