This report describes the protocol used to align the Rice_EST features to Maize_BACs_20060126.
Mon Feb 13 13:21:14 2006


Source of Rice_EST : Downloaded from GenBank with query 'txid4530[orgn] AND gbdiv_est[PROP]'

Alignment procedure details 
--------------------------- 

A total of 298857 Rice_EST sequences were aligned to Maize_BACs_20060126 using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The resulting alignments were then filtered as described below; the same filtering procedure is applied in general to 'CrossSpecies-Coding' data sets.
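
The two command-line steps above could be scripted roughly as follows. This is only a sketch: the file names are hypothetical placeholders, and the argument order follows the usual blat (database, query, output) and pslReps (input, output, .psr output) usage, which should be checked against the installed versions.

```python
import subprocess


def build_commands(database, query, raw_psl, best_psl, psr):
    """Return the blat and pslReps argument lists for one alignment run.

    All five paths are hypothetical; only the two option flags come from
    the report itself.
    """
    blat_cmd = ["blat", database, query, "-minIdentity=50", raw_psl]
    reps_cmd = ["pslReps", "-singleHit", raw_psl, best_psl, psr]
    return blat_cmd, reps_cmd


def run_alignment(database, query, raw_psl, best_psl, psr):
    """Run blat, then pslReps, stopping on the first failure."""
    for cmd in build_commands(database, query, raw_psl, best_psl, psr):
        subprocess.run(cmd, check=True)
```

Calling run_alignment("maize_bacs.fa", "rice_est.fa", "raw.psl", "best.psl", "best.psr") would require both tools on the PATH; build_commands alone can be used to inspect the commands without running them.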

Initial summary
# alignments : 39150
# unique Features these alignments represent: 27694
% of total features these alignments represent : 9.27 %
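
The percentage above is the number of unique features divided by the 298857 Rice_EST sequences submitted for alignment. A minimal sketch of that arithmetic (the function name is illustrative, not part of the pipeline):

```python
TOTAL_FEATURES = 298857  # Rice_EST sequences submitted to blat


def percent_of_total(unique_features, total=TOTAL_FEATURES):
    """Share of all features represented, rounded to two decimals."""
    return round(100.0 * unique_features / total, 2)


print(percent_of_total(27694))  # 9.27
```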

The lengths of the matches are distributed as follows
Hit_Length	# alignments
--------	--------
100	 7917
150	 5406
200	 5071
250	 4525
300	 3416
350	 3211
400	 2652
450	 2695
500	 1644
550	 960
600	 534
650	 343
700	 218
750	 182
800	 277
10000	 99
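
A histogram of this shape can be produced by assigning each match length to the first bin label that is greater than or equal to it. The sketch below assumes the labels are inclusive upper bounds; the report does not state this, so treat it as one plausible reading.

```python
from bisect import bisect_left
from collections import Counter

# Bin labels copied from the table; interpreted here as inclusive
# upper bounds (an assumption, not stated in the report).
EDGES = [100, 150, 200, 250, 300, 350, 400, 450,
         500, 550, 600, 650, 700, 750, 800, 10000]


def length_histogram(lengths):
    """Count match lengths per bin; returns (label, count) pairs."""
    counts = Counter()
    for n in lengths:
        i = bisect_left(EDGES, n)  # first edge >= n
        if i == len(EDGES):
            raise ValueError(f"length {n} exceeds the last bin")
        counts[EDGES[i]] += 1
    return [(edge, counts[edge]) for edge in EDGES]
```

As a consistency check on the table itself, the sixteen counts sum to 39150, the number of alignments in the initial summary.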

Alignments with matches shorter than 150 bp are deleted
# remaining Alignments : 25948
# unique Features these remaining alignments represent: 17089
% of total features these alignments represent : 5.72 %
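
The length filter can be sketched as below. The Alignment record and its field names are hypothetical, and which measure counts as the match length (for example the PSL matches column) is an assumption; the report gives only the 150 bp threshold.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    feature: str  # name of the aligned feature (the EST)
    matches: int  # match length in bp (assumed measure)


MIN_MATCH_BP = 150  # threshold from the report


def drop_short(alignments, min_bp=MIN_MATCH_BP):
    """Keep alignments whose match length is at least min_bp."""
    return [a for a in alignments if a.matches >= min_bp]


def summarize(alignments):
    """Return (# alignments, # unique features)."""
    return len(alignments), len({a.feature for a in alignments})
```

Running summarize before and after drop_short yields the two counts reported at each stage.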

Frequency distribution of the remaining features
# hits	# features
--------	--------
1	 13922
2	 1360
3	 166
4	 457
5	 597
6	 216
8	 371
9	 0
10	 0
20	 0
30	 0
40	 0
50	 0
100	 0

Features with more than three hits are deleted.
# remaining Alignments : 17140
# unique Features these remaining alignments represent: 15448
% of total features these alignments represent : 5.17 %
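
The hit-count filter can be sketched as follows, together with a check that the frequency table reproduces the counts above. The function name and input shape (one feature name per alignment) are illustrative assumptions.

```python
from collections import Counter

MAX_HITS = 3  # features with more hits than this are removed


def drop_multi_hit(features_per_alignment, max_hits=MAX_HITS):
    """Keep alignments whose feature has at most max_hits alignments."""
    hits = Counter(features_per_alignment)
    return [f for f in features_per_alignment if hits[f] <= max_hits]


# Consistency check using the first three rows of the frequency table.
kept_bins = {1: 13922, 2: 1360, 3: 166}  # hits -> number of features
features = sum(kept_bins.values())
alignments = sum(h * n for h, n in kept_bins.items())
print(features, alignments)  # 15448 17140
```

The retained features (1, 2, or 3 hits) account for 15448 features and 13922 + 2 x 1360 + 3 x 166 = 17140 alignments, matching the reported totals.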

% Identity distribution of the remaining alignments
% Identity	# alignments
--------	--------
10	 0
20	 0
30	 2
40	 10
50	 13
60	 58
70	 200
80	 1228
90	 9773
95	 3618
100	 2238

The distribution of gaps is as follows
Gaps	# features
--------	--------
1000	 14369
2000	 897
3000	 397
4000	 105
5000	 1108
6000	 95
7000	 25
8000	 11
9000	 26
10000	 19

Final summary
# alignments : 17140
# unique Features these alignments represent: 15448
% of total features these alignments represent : 5.17 %