www.cis.upenn.edu/~angelov/files/Monday_8_Angelov.ppt
Efficient Enumeration of Phylogenetically Informative Substrings
Stanislav Angelov
Joint work with B. Harb, S. Kannan, S. Khanna, J. Kim
University of Pennsylvania
Phylogenetically Informative Substrings
Given a set of genomes and their phylogeny, for a node of the phylogeny consider:
Substrings common to all genomes under the node (HCS)
Substrings common to all genomes under the node but not found in any other genome of the set
Substrings common to all genomes in the node's left clade but not found in any genome in its right clade: discriminating substrings
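All three definitions reduce to one query: find the substrings present in every genome of one set and absent from every genome of another. For HCS the second set is empty; for substrings unique to a subtree it is every genome outside the node; for discriminating substrings it is the right clade. The brute-force Python sketch below implements that query. It enumerates all O(n²) substrings of each genome, so it only illustrates the definitions and is not the efficient algorithm the title refers to. The names `substrings` and `tags`, the `min_len` cutoff for very short substrings, and the toy genomes are all hypothetical.

```python
from typing import Iterable


def substrings(s: str, min_len: int = 1) -> set[str]:
    """All distinct substrings of s whose length is at least min_len."""
    return {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)}


def tags(inside: Iterable[str], outside: Iterable[str], min_len: int = 1) -> set[str]:
    """Substrings (length >= min_len) found in every genome of `inside`
    and in no genome of `outside`. Brute force, for illustration only."""
    inside, outside = list(inside), list(outside)
    if not inside:
        return set()
    # Candidates: substrings shared by all genomes in the inside set.
    common = set.intersection(*(substrings(g, min_len) for g in inside))
    # Keep only candidates that do not occur in any outside genome.
    return {t for t in common if not any(t in g for g in outside)}


# Hypothetical node v: left clade {GATTACA, TATTACG}, right clade {GGCCGGA}.
left, right = ["GATTACA", "TATTACG"], ["GGCCGGA"]
print(sorted(tags(left, [], min_len=4)))     # common to the left clade: ['ATTA', 'ATTAC', 'TTAC']
print(sorted(tags(left, right, min_len=3)))  # left vs right: ['ATT', 'ATTA', 'ATTAC', 'TAC', 'TTA', 'TTAC']
```

An empty `outside` set gives the HCS case, because `any(...)` over an empty list is `False` and so every shared candidate is kept.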
Discriminating Substrings (Tags)
High-throughput techniques are needed for:
Rapid identification of organisms
Classification of new organisms
Long substrings shared between two genomes are indicative of a genealogical relationship.
Substring tests can be implemented efficiently using oligonucleotide hybridization arrays, with no need for explicit sequencing.
Example
ACATCAG
TCATAGT
CGTCGAC
Unclassified Genome
GCATAGG
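Below is a minimal Python sketch of how the example above could be worked. It rests on two assumptions that the slide does not state: ACATCAG and TCATAGT form one clade with CGTCGAC in the other, and tags are fixed-length k-mers with k = 3, in the spirit of fixed-length oligo probes. It computes each clade's tags and then checks which of them occur in the unclassified genome, as a software stand-in for reading a hybridization array. The names `kmers`, `kmer_tags`, and `classify` are hypothetical.

```python
def kmers(s: str, k: int) -> set[str]:
    """Distinct length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def kmer_tags(clade: list[str], others: list[str], k: int) -> set[str]:
    """k-mers present in every genome of `clade` and absent from all of `others`."""
    shared = set.intersection(*(kmers(g, k) for g in clade))
    elsewhere = set().union(*(kmers(g, k) for g in others))
    return shared - elsewhere


def classify(sample: str, probes: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each clade, the tag probes that occur in the sample
    (a software stand-in for the spots that light up on an array)."""
    return {name: {p for p in clade_probes if p in sample}
            for name, clade_probes in probes.items()}


# Assumed split of the example genomes into two clades.
left = ["ACATCAG", "TCATAGT"]
right = ["CGTCGAC"]
probes = {"left": kmer_tags(left, right, 3), "right": kmer_tags(right, left, 3)}
hits = classify("GCATAGG", probes)  # the unclassified genome
print({name: sorted(p) for name, p in probes.items()})
# {'left': ['CAT', 'TCA'], 'right': ['CGA', 'CGT', 'GAC', 'GTC', 'TCG']}
print({name: sorted(h) for name, h in hits.items()})
# {'left': ['CAT'], 'right': []}
```

Under these assumptions, the unclassified genome contains the left-clade tag CAT and no right-clade tag. That would place it with ACATCAG and TCATAGT.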
Example
ACATCAG (CAT highlighted)
T