This report describes the protocol used to align the Rice_CDS features to Maize_BACs_20060126.
Mon Feb 13 12:39:16 2006


Source of Rice_CDS : Downloaded from GenBank with the query '(txid4530[ORGN] AND complete[TITL] AND cds[TITL]) NOT (Mitochondrion[ALL] OR Chloroplast[ALL] OR Mitochondrial[ALL])'

Alignment procedure details 
--------------------------- 

2423 Rice_CDS were aligned to Maize_BACs_20060126 using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The resulting alignments were then filtered with the procedure described below, which is applied in general to 'CrossSpecies-Coding' data sets.
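
The two alignment steps can be sketched as command lines. This is a minimal sketch, not the pipeline's actual script: only the options -minIdentity=50 and -singleHit come from this report, while the function name and all file names are hypothetical. The commands are built but not executed. Note that pslReps expects its input grouped by query.

```python
from pathlib import Path


def build_alignment_commands(database_fa, query_fa, out_dir):
    """Return argv lists for the blat and pslReps steps (nothing is executed)."""
    out = Path(out_dir)
    # Hypothetical output file names; the report does not give the real ones.
    raw_psl = str(out / "rice_cds_vs_maize_bacs.psl")
    best_psl = str(out / "rice_cds_vs_maize_bacs.best.psl")
    psr = str(out / "rice_cds_vs_maize_bacs.psr")

    # blat usage: blat database query [options] output.psl
    blat_cmd = ["blat", str(database_fa), str(query_fa), "-minIdentity=50", raw_psl]
    # pslReps usage: pslReps [options] in.psl out.psl out.psr
    pslreps_cmd = ["pslReps", "-singleHit", raw_psl, best_psl, psr]
    return blat_cmd, pslreps_cmd


# Hypothetical FASTA names for the database (maize BACs) and query (rice CDS).
for cmd in build_alignment_commands("Maize_BACs_20060126.fa", "Rice_CDS.fa", "."):
    print(" ".join(cmd))
```

The lists can be passed to subprocess.run(cmd, check=True) once the real input paths are known.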

Initial summary
# alignments : 857
# unique Features these alignments represent: 780
% of total features these alignments represent : 32.19 %
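
The percentage in each summary is the number of unique features divided by the 2423 Rice_CDS that were aligned. A short check of the arithmetic, where the function name is only illustrative:

```python
def coverage_percent(n_features_hit, n_features_total):
    """Percentage of all query features represented by at least one alignment."""
    return 100.0 * n_features_hit / n_features_total


TOTAL_RICE_CDS = 2423  # number of Rice_CDS aligned, taken from this report

print(f"{coverage_percent(780, TOTAL_RICE_CDS):.2f} %")  # initial summary
print(f"{coverage_percent(466, TOTAL_RICE_CDS):.2f} %")  # after filtering
```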

The lengths of the matches are distributed as follows
Hit_Length	# alignments
--------	--------
100	 190
150	 168
200	 82
250	 65
300	 45
350	 37
400	 23
450	 23
500	 15
550	 19
600	 18
650	 18
700	 13
750	 9
800	 12
10000	 120

Alignments with matches shorter than 150 bp are deleted
# remaining Alignments : 500
# unique Features these remaining alignments represent: 466
% of total features these alignments represent : 19.23 %
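
The length filter can be sketched on PSL rows as follows. This is an illustration under stated assumptions, not the script that produced this report: the match length is ASSUMED to be the sum of the PSL 'matches' and 'repMatches' columns (the report does not define it), and all function names are hypothetical.

```python
def parse_psl(lines):
    """Yield (match_length, query_name) for each PSL data row.

    Header lines are skipped: a data row has at least 21 tab-separated
    fields and starts with an integer.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, rep_matches = int(fields[0]), int(fields[2])
        # ASSUMPTION: match length = matches + repMatches (PSL columns 1 and 3).
        yield matches + rep_matches, fields[9]  # column 10 is qName


def filter_by_length(rows, min_length=150):
    """Keep rows whose match length is at least min_length bp."""
    return [(length, name) for length, name in rows if length >= min_length]


def summarize(rows):
    """Return (# alignments, # unique features)."""
    return len(rows), len({name for _, name in rows})


def make_row(matches, q_name):
    """Build a toy 21-column PSL row for demonstration only."""
    fields = ["0"] * 21
    fields[0], fields[9] = str(matches), q_name
    return "\t".join(fields)


toy = [make_row(120, "cdsA"), make_row(150, "cdsA"), make_row(400, "cdsB")]
print(summarize(filter_by_length(list(parse_psl(toy)))))
```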

Frequency distribution of the remaining features
# hits	# features
--------	--------
1	 433
2	 32
3	 1
4	 0
5	 0
6	 0
8	 0
9	 0
10	 0
20	 0
30	 0
40	 0
50	 0
100	 0

Features with more than three hits are deleted.
# remaining Alignments : 500
# unique Features these remaining alignments represent: 466
% of total features these alignments represent : 19.23 %
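
A minimal sketch of the hit-count filter, assuming each alignment is identified by the name of its query feature. The function name and the toy data are hypothetical.

```python
from collections import Counter


def drop_multi_hit_features(query_names, max_hits=3):
    """Remove every alignment whose feature has more than max_hits alignments.

    Returns the kept alignments and the frequency distribution
    {number of hits: number of features} computed before filtering.
    """
    hits = Counter(query_names)
    kept = [name for name in query_names if hits[name] <= max_hits]
    return kept, Counter(hits.values())


# Toy input: one feature each with 1, 2, 3 and 4 hits.
names = ["a", "b", "b", "c", "c", "c", "d", "d", "d", "d"]
kept, distribution = drop_multi_hit_features(names)
print(len(kept), sorted(distribution.items()))
```

In this run no feature has more than three hits, so the filter removes nothing: 433 features with one hit, 32 with two and 1 with three account for 433 + 64 + 3 = 500 alignments.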

% Identity distribution of the remaining features
% Identity	# features
--------	--------
10	 0
20	 0
30	 0
40	 1
50	 3
60	 10
70	 19
80	 99
90	 302
95	 64
100	 2
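
The binning used in such a table can be sketched as below. Two points are ASSUMPTIONS not stated in this report: percent identity is taken as (matches + repMatches) / (matches + misMatches + repMatches) from the PSL columns, and each listed value is treated as the inclusive upper edge of its bin. All names are illustrative.

```python
from bisect import bisect_left
from collections import Counter

# Bin edges copied from the table above.
BIN_EDGES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]


def percent_identity(matches, mismatches, rep_matches=0):
    """ASSUMED definition: matching bases over all aligned bases, in percent."""
    aligned = matches + mismatches + rep_matches
    return 100.0 * (matches + rep_matches) / aligned


def bin_label(value, edges=BIN_EDGES):
    """Return the smallest edge that is >= value (inclusive upper edge)."""
    return edges[bisect_left(edges, value)]


def histogram(values, edges=BIN_EDGES):
    """Return [(edge, count), ...] in the order of the table."""
    counts = Counter(bin_label(v, edges) for v in values)
    return [(edge, counts.get(edge, 0)) for edge in edges]


toy_identities = [percent_identity(85, 15), percent_identity(90, 10),
                  percent_identity(93, 7), percent_identity(100, 0)]
for edge, count in histogram(toy_identities):
    print(f"{edge}\t {count}")
```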

The gaps are distributed as follows
Gaps	# features
--------	--------
1000	 335
2000	 67
3000	 35
4000	 21
5000	 10
6000	 6
7000	 3
8000	 7
9000	 2
10000	 1

Final summary
# alignments : 500
# unique Features these alignments represent: 466
% of total features these alignments represent : 19.23 %