Novel Opportunities and Challenges in the Human Proteome: A Bioinformatics
Strategy to Identify Splice Variants of Druggable Gene Targets

Chandra Ramanathan (1), Shuba Gopal (3), Bob Bruccoleri (1), John Feder (2),
Gabe Mintier (2) and Terry Gaasterland (3)

Bioinformatics (1), Applied Genomics (2), Bristol-Myers Squibb and The Rockefeller
University (3)

Email: Chandra.Ramanathan@bms.com

The human genome has been predicted to contain on the order of 40,000
genes. This is just a little over twice the number of genes predicted in C. elegans,
a nematode, and in D. melanogaster, the fruit fly. To account for the
biological complexity of mammals, and humans in particular, it is postulated that
somewhere between one third and one half of the genes in the human genome are
alternatively spliced. If each such gene yielded two or more functional
proteins, then this would produce a large increase in the number of functional
proteins, accounting for the complexity seen. The different protein forms of a
gene can have different pharmacological functions. For example, the gene for the
gonadotropin-releasing hormone (GnRH) receptor has recently been shown to
encode a novel splice variant, which acts as a repressor of the receptor itself.
The difference between the predicted number of genes and the corresponding
proteins provides an opportunity to mine for novel proteins even in instances
where one version of the protein is known. Unfortunately, bioinformatics
methods for predicting splice variant forms are not accurate, and such prediction
remains largely an unsolved problem. Here, we present a strategy to couple computer
prediction with experimental data to predict the potential variant forms of
druggable gene targets. We have developed three approaches to further explore
this opportunity in the human proteome, with a specific focus on the
G-protein-coupled receptors (GPCRs). The first is to evaluate EST contigs with
multiple sub-contigs that match GPCRs for possible splice variants. A second
approach uses Genscan-predicted fragments in the vicinity of predicted
sequences for which we have already filed provisional patent applications on the
protein sequence. A third approach is to analyze Ensembl-predicted proteins against
known GPCR proteins to look for possible 5' and 3' variants. Using these
methods, we have identified several novel splice variants of known GPCRs.
We have also developed a web interface to query the EST contigs for potential
splice variants and sort the results by criteria such as expression
profiling data, number of sequences, clone count, and open reading frame
information, among others. We are in the process of analyzing these predicted
variants via experimental methods (expression profiling, full-length cloning, etc.)
as a means of confirming that these putative variants are functional members of
the human proteome.
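
The first approach and the sortable query interface described above can be illustrated with a minimal Python sketch. It is not the implementation behind the work reported here. The record type `ContigHit`, its field names, the function `candidate_variants`, and the example values are all hypothetical, and "multiple sub-contigs" is assumed to mean more than one sub-contig per contig.

```python
from dataclasses import dataclass


@dataclass
class ContigHit:
    """One EST contig that matched a known GPCR (all fields are hypothetical)."""
    contig_id: str
    gpcr_name: str
    num_subcontigs: int  # sub-contigs assembled within this contig
    num_sequences: int   # ESTs supporting the contig
    clone_count: int     # distinct clones behind those ESTs
    orf_length: int      # longest open reading frame, in codons


# Fields a user may sort on; an assumed subset of the criteria named in the text.
SORTABLE_FIELDS = {"num_subcontigs", "num_sequences", "clone_count", "orf_length"}


def candidate_variants(hits, sort_key="num_sequences"):
    """Keep contigs with more than one sub-contig, sorted descending by sort_key."""
    if sort_key not in SORTABLE_FIELDS:
        raise ValueError(f"unknown sort key: {sort_key}")
    # Assumption: a contig with several sub-contigs is a possible splice variant.
    candidates = [h for h in hits if h.num_subcontigs > 1]
    return sorted(candidates, key=lambda h: getattr(h, sort_key), reverse=True)


# Made-up records, for illustration only.
hits = [
    ContigHit("contig_A", "GPCR_X", 3, 12, 5, 310),
    ContigHit("contig_B", "GPCR_Y", 1, 40, 9, 420),
    ContigHit("contig_C", "GPCR_X", 2, 25, 4, 150),
]

ranked = candidate_variants(hits, sort_key="clone_count")
print([h.contig_id for h in ranked])
```

In this sketch, contig_B is dropped because it has a single sub-contig, and the remaining contigs are ordered by whichever criterion is requested. Expression profiling data could be added as one more sortable field in the same way.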