Type-Aware Hierarchical Storage
Abstract
We propose and evaluate a hierarchical storage system, DHIS, that is capable of discriminating between
data with different access characteristics and then customizing its layout and caching policies to
each type. DHIS uses two kinds of information to tune its decisions. First, it uses information about
pointers between blocks to understand the relationship between blocks and, consequently, their
importance. Second, DHIS defines a set of generic attributes that the higher layers can use to annotate
the data, conveying various properties such as importance, access pattern, etc. Based on these
attributes, DHIS dynamically decides to lay out the data in the format best suited to its requirements.
By doing so, DHIS solves a critical problem faced by storage vendors and developers of higher-level
storage software: choosing the right policy from among a slew of possible alternatives. We show via a
prototype implementation that customizing policies to specific data requirements has significant
performance benefits.
1 Introduction
Modern large storage systems are virtually supercomputers; a typical high-end storage system from EMC [5]
or NetApp [9] has hundreds of processors, tens of gigabytes of RAM and hundreds of disks. In tune with
the increasing processing power available at the storage systems, their functional sophistication has also
increased. Today, storage systems employ various forms of RAID for reliability and performance, use
non-volatile RAM to absorb write latency, perform dynamic block migration for load balancing, and so on [5, 15].
Although storage systems have evolved significantly in terms of the range of functionality they provide,
they are still constrained by one fundamental limitation: they have little or no information about the
system layers above that use the storage system, and thus view data simply as a flat stream of bytes. For
example, they do not know which pieces of data are more important than others, which pieces are likely to be
accessed randomly vs. sequentially, etc. Although many storage-level policies such as RAID level, caching
policy, etc. can be tuned for specific kinds of usage, a typical storage system cannot fully exploit this potential
because it deals with a myriad of interleaved types of data, each with different access characteristics, and has
very little information to separate these types from each other.
In this paper, we present DHIS (pronounced as "this"), a Discriminating Hierarchical Storage system
that uses various hints specified by the higher layers about the type of the data to select custom policies
for managing the data, such as the exact RAID level, cacheability of the data in NVRAM, etc. DHIS also
uses information on the logical relationship between blocks, conveyed in the form of logical pointers [11], to
extrapolate its type information from one identifying block to its descendants. By being able to discriminate
between data with varying requirements, DHIS is able to balance conflicting goals such as performance and
reliability much more efficiently than traditional storage systems.
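As a rough illustration of this extrapolation step, the sketch below labels an identifying block and
every block reachable from it through logical pointers. It is only a sketch under stated assumptions:
the pointer map, the block numbers, the `propagate_type` helper, and the breadth-first traversal are all
hypothetical and are not taken from the DHIS implementation.

```python
from collections import deque


def propagate_type(pointers, root, type_label):
    """Label `root` and every block reachable from it via logical pointers.

    `pointers` maps a block number to the block numbers it points to.
    All names here are hypothetical, chosen only for illustration.
    """
    labels = {}
    queue = deque([root])
    while queue:
        block = queue.popleft()
        if block in labels:
            continue  # already labeled; also guards against pointer cycles
        labels[block] = type_label
        queue.extend(pointers.get(block, ()))
    return labels


# Assumed layout: block 10 is an inode pointing to an indirect block (11)
# and a data block (12); block 11 points to data blocks 13 and 14.
# Block 20 belongs to an unrelated file and must stay unlabeled.
pointers = {10: [11, 12], 11: [13, 14], 20: [21]}
labels = propagate_type(pointers, 10, "small-random-write")
```

In this sketch a single annotation on the inode block is enough to classify the whole file, while blocks
of other files are left untouched.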
To make informed choices on the exact layout and caching policies to use for a specific piece of data,
DHIS enables the layers above to annotate logical chunks of data with attributes. For instance,
the file system can specify that a given file (identified by the top-level inode block for the file) will be mostly
subject to small random writes. Given this attribute associated with the file, DHIS would avoid
placing the file in RAID-5 format because of the small-write performance penalty incurred in RAID-5; instead,
it may choose to place it in RAID-1 (mirroring) format.
There are five attributes that DHIS supports: importance of the data (which determines how reliably
the data should be stored), the normal access pattern on the data (i.e., random or sequential), the expected
popularity of the data (i.e., hot or cold), whether the data is read-mostly or write-mostly, and finally, the
expected lifetime of the data (i.e., whether it corresponds to a temporary file). Based on these five attributes,
DHIS decides on the specific redundancy and reliability scheme to use for the data, and the various forms
of caching to use (e.g., whether to cache the data in NVRAM or perhaps a faster Flash storage layer) such
that the best performance/reliability trade-off is obtained. Specifically, the current implementation of DHIS
utilizes these attributes to automatically select the RAID level a piece of data goes to, and to decide which
pieces of data to cache in NVRAM.
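To make the mapping from attributes to policies concrete, the following sketch encodes the five
attributes and a toy decision function that returns a RAID level and an NVRAM caching decision. The
names (`Attributes`, `choose_policy`) and the specific rules are assumptions made for illustration. Only
the preference for RAID-1 over RAID-5 under small random writes is taken from the text; the use of
RAID-0 for unimportant temporary data, RAID-5 as the default, and the caching rule are hypothetical
and do not describe the policy DHIS actually implements.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    RANDOM = "random"
    SEQUENTIAL = "sequential"


@dataclass(frozen=True)
class Attributes:
    """The five per-data attributes, with hypothetical field names."""
    important: bool      # how reliably the data should be stored
    access: Access       # normal access pattern
    hot: bool            # expected popularity
    write_mostly: bool   # write-mostly (True) or read-mostly (False)
    temporary: bool      # expected lifetime


def choose_policy(attrs):
    """Return (raid_level, cache_in_nvram) using illustrative rules only."""
    random_writes = attrs.access is Access.RANDOM and attrs.write_mostly
    if attrs.temporary and not attrs.important:
        raid = "RAID-0"   # assumption: short-lived scratch data needs no redundancy
    elif random_writes:
        raid = "RAID-1"   # avoids the RAID-5 small-write penalty
    else:
        raid = "RAID-5"   # assumption: space-efficient default
    cache_in_nvram = random_writes and attrs.hot
    return raid, cache_in_nvram


metadata = Attributes(True, Access.RANDOM, True, True, False)
archive = Attributes(True, Access.SEQUENTIAL, False, False, False)
scratch = Attributes(False, Access.SEQUENTIAL, False, True, True)
```

Under these assumed rules, `choose_policy(metadata)` selects mirroring with NVRAM caching, while
`choose_policy(archive)` and `choose_policy(scratch)` select RAID-5 and RAID-0 respectively, both
without caching. Note that the caller states only properties of the data; the choice of mechanism
stays inside the storage system.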
We evaluate DHIS using a software prototype implementation in the Linux kernel. Our prototype operates
as a pseudo device driver that interposes between the file system and the software RAID layers. One key
challenge in this prototyping environment is to ensure there is no performance interference between the host
application and the processing at the pseudo driver layer. By careful use of kernel isolation techniques, we
isolate the CPU and memory usage of the software prototype from the host applications, thus providing a
very close approximation of an actual hardware prototype with its own processing and memory. We believe
that this prototyping environment is valuable more generally for evaluating other kinds of functionality in
the storage system.
Using this prototyping environment, we evaluate the various discriminating policies of DHIS and demonstrate
their effectiveness. We show that DHIS can achieve significant performance wins by exploiting higher-level
attributes. We show that the flexibility to choose RAID levels on a per-file basis provides significant
benefits in performance, compared to the one-size-fits-all solution normally employed in today's systems.
We also show that by intelligent caching of data that is subject to frequent random writes (e.g., meta-data
blocks in a file system) in NVRAM, DHIS greatly improves overall system performance.
Overall, we find that DHIS presents an interesting design choice for building storage systems that exploit
higher-level system information. By allowing the higher layers such as the operating system to express only
attributes inherent to the data and not what the storage system should do with it, we decouple the layers; in
other words, the file system need not understand the specifics of the wide variety of low-level mechanisms
and policies that today's storage systems use. Depending on the features available within a particular
storage system, that system can decide how to exploit this valuable extra information.
The rest of this paper is organized as follows. In the next section, we discuss the background of modern
storage systems and type-aware storage. In Section 3, we describe the design details of DHIS and show
the kind of optimizations that DHIS enables. Section 4 presents our disk prototyping framework and
our prototype implementation of DHIS. We evaluate our prototyping framework and our implementation of
DHIS in Section 5. We discuss related work in Section 6 and conclude in Section 7.
2 Background
In this section, we first describe the current state of the art in hierarchical storage and motivate the need for
fine-grained policies specific to data. We then briefly discuss the usage of pointer information in type-safe
storage, which our work builds upon.
2.1 The State of the Art in Large-Scale Storage Systems
Large-scale storage systems today comprise diverse resources that include high processing power, hundreds
of gigabytes of RAM, solid state storage media such as flash, and hundreds or even thousands of disks [5].
Modern storage systems run complex software to provide functionality such as reliability, fault-tolerance,
and high performance I/O. One of the challenges in such storage systems is to effectively manage the wide
range of resources to provide optimal performance and customizable features. However, despite the
advancement in storage hardware, the interface used for communicating with them is still simple and narrow
in most scenarios. For example, the SCSI interface supports just two main primitives, block read and
write, resulting in the storage system being mostly oblivious to higher-level information. This makes
efficient resource management within modern storage systems a difficult problem, as storage systems cannot
discriminate between different kinds of information they store.
Some existing systems try to work around this problem by exporting more information to higher-level
software [6, 7]. For example, certain enterprise-class storage systems allow higher-level software to choose
the RAID level to use for a new volume during creation [8]. However, this requires that the file system or
higher-level storage software be aware of the characteristics of each volume, which could be tightly tied to
the internal architecture of the specific storage system. For example, a storage system could contain several
fine-grained RAID levels, and devices such as NVRAM and solid state memory. Storage architectures
could also be different across vendors and models, and it may be cumbersome to customize file systems
for specific storage systems. Moreover, the abstraction of a vol