A Three-Phase Algorithm for Computer Aided siRNA Design


Hong Zhou


Saint Joseph College, West Hartford, CT 06117, USA


hzhou@sjc.edu


Xiao Zeng


Superarray Bioscience Corporation, 7320 Executive Way, Frederick,
MD 21704, USA


xzeng@superarray.net


Yufang Wang and Benjamin Ray Seyfarth


University of Southern Mississippi, Hattiesburg, MS 39406, USA


Keywords: siRNA, RNA interference, three-phase,
Smith-Waterman, BLAST

Received: July 10, 2005


As our knowledge of RNA interference accumulates, it is desirable
to incorporate as many selection rules as possible into a computer-aided
siRNA-designing tool. This paper presents an algorithm for siRNA selection
in which nearly all published siRNA-designing rules are categorized
into three groups and applied in three phases according to their identified
impact on siRNA function. This tool provides users with maximum
flexibility to adjust each rule and to reorganize the rules across the
three phases based on their own preferences and/or empirical data. When
the generally accepted stringency was applied to select siRNAs for the
23,484 human genes represented
in the RefSeq Database (NCBI, human genome build 35.1), we found 1,915
protein-coding genes (8.2%) for which no suitable siRNA sequence
could be found. Curiously, among these 1,915 genes, two had validated
siRNA sequences published. After close examination of another 105 published
human siRNA sequences, we conclude that (A) many of the published siRNA
sequences may not be the best for their target genes; (B) some of the
published siRNA may risk off-target silencing; and (C) some published
rules have to be compromised in order to select a testable siRNA sequence
for the hard-to-design genes.


1 Introduction

Since the seminal paper published by Craig C. Mello's group in 1998
[1], RNA interference (RNAi) has emerged as a powerful technique to
knock out or knock down the expression of target genes for gene function
studies in various organisms [2,3,4]. What is truly remarkable about the
RNAi effect is that it is sequence-specific. This means that as long as we
know the sequence of the transcript to be targeted, we can design a
short double-stranded RNA (small interfering RNA, or siRNA) to knock
down, if not eliminate, the expression of the target gene without changing
the genetic make-up of the cells. Compared to the anti-sense oligonucleotide
technology developed earlier [5,6], RNAi is much more effective because
it is achieved by catalytic components within the cell [1,7,8,9].


Understandably, how to design the best siRNA has become a matter of
intense competition among academic research groups as well as commercial
providers of siRNA. The major published design rules are summarized
below.



The length of functional siRNAs: The length of siRNA ranges from 19 to
30 base pairs (bps) [2,10,11]. Double-stranded RNA longer than 30 bps is
likely to invoke an antiviral interferon response, a general shut-down
of cellular translation, instead of gene-specific RNAi [12,13,14].

The GC content of functional siRNA: The optimal GC content of siRNA
should be between 30% and 55% [10,14,15]. GC-rich sequences, in general,
tend to form quadruplex or hairpin structures [16]. Sequences with more
than 7 consecutive G/C bases may form duplexes too stable to be unwound
[16,17,18,19]. On the other hand, sequences with extremely low GC
content cannot form stable siRNA duplexes.

The thermo-stability bias at the 5' end of the antisense strand: Since
it is desirable to have only the antisense strand incorporated into the
RISC complex, lowering the thermo-stability at the 5' end of the
antisense strand can promote helicase unwinding of the siRNA duplex from
this end [17,20,21].

Concerning tandem repeats and palindromes: Since sequences containing
tandem repeats or palindromes may form internal fold-back structures, it
is best to avoid any internal repeats or palindromes in the designed
siRNA sequence [10]. For the same reason and other concerns [22,23],
long single-nucleotide repeats (such as AAAA, UUUU, CCCC or GGGG) should
also be avoided [19,24].
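
The sequence-composition rules above (length, GC content, GC stretches and single-nucleotide repeats) can be sketched as simple checks. This is a minimal illustration and not the implementation described in this paper: the function names and default thresholds are assumptions taken from the numbers quoted above, a run of four identical bases is assumed to count as a "long" repeat, and the palindrome, tandem-repeat and thermo-stability rules are left out.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_stretch(seq: str) -> int:
    """Length of the longest uninterrupted run of G/C bases."""
    runs = re.findall(r"[GC]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def has_single_base_run(seq: str, run: int = 4) -> bool:
    """True if one base occurs `run` or more times in a row (e.g. AAAA)."""
    pattern = r"([ACGU])\1{%d,}" % (run - 1)
    return re.search(pattern, seq.upper().replace("T", "U")) is not None


def passes_basic_rules(seq: str,
                       min_len: int = 19, max_len: int = 30,
                       gc_min: float = 0.30, gc_max: float = 0.55,
                       max_gc_stretch: int = 7, max_run: int = 4) -> bool:
    """Apply the length, GC-content, GC-stretch and repeat rules.

    All thresholds are adjustable assumptions, not fixed values.
    """
    return (min_len <= len(seq) <= max_len
            and gc_min <= gc_content(seq) <= gc_max
            and longest_gc_stretch(seq) <= max_gc_stretch
            and not has_single_base_run(seq, max_run))


print(passes_basic_rules("AACUGGAUCUACGAUCGUA"))  # True
print(passes_basic_rules("AAAAUUGGAUCUACGAUCG"))  # False (contains AAAA)
```

Keeping every threshold as a parameter mirrors the point made throughout this paper that users should be able to adjust each rule.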

Regarding specific nucleotide positions in siRNA, it has been proposed
that base U at position 10, base A at position 3, and a base other than
G at position 13 are preferred [10]. However, those experiments were
conducted with siRNAs 19 bps in length, and it is unknown whether the
same rules apply to longer siRNAs. While some siRNA design algorithms
prefer having the siRNA sequence start with AA [14,24,25], others have
pointed out that this rule may result in frequent misses of effective
siRNA sequences [17]. Besides, starting with AA may sometimes conflict
with the notion that the 5' antisense end should be thermodynamically
less stable than the 5' sense end [17,20,21].
It is not clear whether siRNA should be picked within the coding region
(CDS) only, though it has been suggested that the 5' and 3' untranslated
regions (UTRs) should be avoided [24,25]. However, a recent report showed
that targeting the 3' UTR was as efficient as targeting the CDS [26]. If
the siRNA (or shRNA, small hairpin RNA) is generated via T7 RNA polymerase,
additional rules may apply [27].
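
The position-specific preferences, and the possible conflict between an AA start and the end-stability bias, can be illustrated with a short sketch. Several things here are assumptions and not statements from the cited work: positions are taken as 1-based on the sense strand, and end stability is approximated crudely by counting G/C bases in a four-base window at each end of the sense strand (the 5' end of the antisense strand pairs with the 3' end of the sense strand). A real tool would more likely use nearest-neighbor thermodynamic parameters.

```python
def position_preferences(sense: str) -> dict:
    """Report which position-specific preferences a sense strand meets.

    Positions are 1-based on the sense strand (an assumption).
    """
    s = sense.upper().replace("T", "U")
    return {
        "A_at_3": s[2] == "A",
        "U_at_10": s[9] == "U",
        "not_G_at_13": s[12] != "G",
        "starts_with_AA": s.startswith("AA"),
    }


def antisense_5p_end_is_weaker(sense: str, window: int = 4) -> bool:
    """Crude proxy for the thermo-stability bias.

    Returns True when the duplex end holding the antisense 5' end
    (the 3' end of the sense strand) has fewer G/C bases than the
    opposite end. The window size is a hypothetical parameter.
    """
    s = sense.upper().replace("T", "U")

    def gc_count(part: str) -> int:
        return sum(base in "GC" for base in part)

    return gc_count(s[-window:]) < gc_count(s[:window])


candidate = "GCACGUGCAUCAAGAUUAA"
print(position_preferences(candidate))
print(antisense_5p_end_is_weaker(candidate))              # True
print(antisense_5p_end_is_weaker("AAUUGCAUCGAGCUGGCGC"))  # False
```

The second sequence starts with AA, so its sense 5' end is AU-rich, and under this proxy it fails the stability-bias check. That is the kind of conflict between rules described above.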


While it is desirable
to incorporate all of the selection rules into a computer-aided siRNA
design tool, the complication at the moment is how to rank the published
rules, especially when some of them contradict each other. Currently,
quite a few computer-aided siRNA design tools have been published [17,18,19,24,25,27,28,29]
and some of them have been made accessible through websites. However,
none of those tools has successfully incorporated all the rules above,
and most of them treat the rules they employ without much differentiation.
In general, the existing tools adopt a set of rules and assign each
rule an equal or different score; each siRNA sequence is scored
against every rule, and only those sequences scoring above a predefined
threshold are selected as valid siRNA sequences. Such a simple
selection procedure does not accommodate the possibility that some rules
are critical for the validity of an siRNA sequence (they must be met), while
other rules only affect the efficiency of the siRNA sequence.
Meanwhile, those web-based tools provide users with very limited flexibility,
and users cannot reorganize the selection rules based on their own preferences
or recent research data.
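
The limitation of flat scoring can be made concrete with a small sketch. It contrasts a single additive score with a two-tier selection in which critical rules must all pass before the remaining rules are scored. The rule grouping, weights and thresholds below are hypothetical and chosen only for illustration; this is not the three-phase algorithm of this paper.

```python
from typing import Callable, List, Tuple

# A rule is (name, predicate, weight); all weights are hypothetical.
Rule = Tuple[str, Callable[[str], bool], int]


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical grouping: one "must be met" rule, three efficiency rules.
CRITICAL: List[Rule] = [
    ("no 4-base run", lambda s: not any(b * 4 in s for b in "ACGU"), 3),
]
EFFICIENCY: List[Rule] = [
    ("GC 30-55%", lambda s: 0.30 <= gc_fraction(s) <= 0.55, 2),
    ("A at position 3", lambda s: s[2] == "A", 1),
    ("U at position 10", lambda s: s[9] == "U", 1),
]


def score(seq: str, rules: List[Rule]) -> int:
    """Sum the weights of all rules the sequence satisfies."""
    return sum(weight for _, ok, weight in rules if ok(seq))


def flat_select(seq: str, threshold: int = 4) -> bool:
    """Flat scoring: every rule just adds points toward one threshold."""
    return score(seq, CRITICAL + EFFICIENCY) >= threshold


def tiered_select(seq: str, threshold: int = 2) -> bool:
    """Critical rules act as a filter; only efficiency rules are scored."""
    if not all(ok(seq) for _, ok, _ in CRITICAL):
        return False
    return score(seq, EFFICIENCY) >= threshold


seq = "GCAAAAGCCUCGAUGCAUC"  # contains AAAA but meets the other rules
print(flat_select(seq))    # True
print(tiered_select(seq))  # False
```

Under flat scoring the sequence collects enough points from the efficiency rules to be selected even though it violates the rule treated here as critical, whereas the tiered version rejects it outright.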


Although its actual
mechanism is still unclear, the off-target effect [30] of siRNA
is largely attributed to partial sequence homology between an siRNA and
its unintended targets [31,32]. Most available siRNA design tools use
BLAST [33] to filter out siRNA candidates that may cause off-target
effects. However, BLAST may overlook significant sequence homologies
[17,34]. As an alternative, the Smith-Waterman search algorithm [35]
has been proposed to identify all possible off-target sequences [17].
Unfortunately, a Smith-Waterman search against the whole transcriptome
is very time-consuming.
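
To show why an exhaustive Smith-Waterman search is costly, the following is a minimal sketch of the local-alignment score computation with a linear gap penalty. The scoring values (match, mismatch, gap) are illustrative assumptions, not the parameters used in this paper or in the cited tools.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 1,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    Fills the dynamic-programming matrix one row at a time, so memory
    is proportional to len(b) while time is proportional to
    len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for base_a in a:
        curr = [0]
        for j, base_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if base_a == base_b else mismatch)
            cell = max(0,                   # start a new local alignment
                       diagonal,            # align base_a with base_b
                       prev[j] + gap,       # gap in b
                       curr[j - 1] + gap)   # gap in a
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("ACGUACGU", "ACGUUCGU"))  # 6
```

Every cell of the matrix is computed for every candidate-transcript pair, so the running time grows with the product of the siRNA length and the total transcriptome length, multiplied by the number of candidates. Heuristic tools such as BLAST avoid this cost by examining only regions around short seed matches, which is also why they can miss some homologies.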


This paper presents
a three-phase siRNA selection a