www.cis.upenn.edu/~angelov/files/Monday_8_Angelov.ppt








Efficient Enumeration of Phylogenetically Informative Substrings


Stanislav Angelov

University of Pennsylvania

Joint work with B. Harb, S. Kannan, S. Khanna, J. Kim




Phylogenetically Informative Substrings



Set of genomes and their phylogeny

For a node of the phylogeny:

Substrings common to all genomes under the node (HCS)

Substrings common to all genomes under the node but not found in any other genome of the set

Substrings common to all genomes in the node's left clade but not found in any genome in the node's right clade: discriminating substrings
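
The definition of discriminating substrings can be illustrated with a small brute-force sketch. This is not the enumeration method of the talk; it only restates the definition for substrings of a fixed length k. The function names, the fixed length k, and the split of the toy genomes into a left and a right clade are assumptions made for illustration.

```python
def substrings(s, k):
    """Return the set of all length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def common_substrings(genomes, k):
    """Length-k substrings present in every genome of a non-empty list."""
    return set.intersection(*(substrings(g, k) for g in genomes))


def discriminating_substrings(left_clade, right_clade, k):
    """Length-k substrings common to all left-clade genomes
    and absent from every right-clade genome."""
    shared = common_substrings(left_clade, k)
    seen_right = set().union(*(substrings(g, k) for g in right_clade))
    return shared - seen_right


# Hypothetical clades built from toy strings (the grouping is an assumption).
left = ["ACATCAG", "TCATAGT"]
right = ["CGTCGAC"]
tags = discriminating_substrings(left, right, 3)
print(sorted(tags))  # prints ['CAT', 'TCA']
```

The brute-force version takes time proportional to the total genome length for each k, so it only suits toy inputs.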





Discriminating Substrings (Tags)



High-throughput techniques are needed for

Rapid identification of organisms
Classification of new organisms

Long shared substrings between two genomes are indicative of a genealogical relationship

Substring tests can be implemented efficiently using oligo hybridization arrays, with no need for explicit sequencing.




Example


ACATCAG


TCATAGT


CGTCGAC


Unclassified Genome


GCATAGG
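
One way to read this example is to compare the unclassified genome with each known genome by their longest shared substring. The sketch below uses a standard dynamic-programming routine for that. Using the length of the longest common substring as a similarity score is an assumption made for illustration, not the method of the talk, and the function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return one longest substring shared by a and b.

    Classic dynamic programming over suffix matches,
    O(len(a) * len(b)) time.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


known = ["ACATCAG", "TCATAGT", "CGTCGAC"]
unclassified = "GCATAGG"

# Rank the known genomes by the length of their longest shared substring.
closest = max(known, key=lambda g: len(longest_common_substring(unclassified, g)))
print(closest, longest_common_substring(unclassified, closest))
# prints: TCATAGT CATAG
```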





Example


ACATCAG (CAT highlighted)


T