PAC267-C ESX Server Storage Internals

Satyam Vaghani
Staff Engineer, R&D
19 October 2005
This presentation may contain VMware confidential information.
Copyright © 2005 VMware, Inc. All rights reserved.
Protected by one or more of U.S. Patent Nos. 6,397,242, 6,496,847,
6,704,925, 6,711,672, 6,725,289, 6,735,601, 6,785,886, 6,789,156, and
6,795,966; patents pending.
VMware, the VMware "boxes" logo and design, Virtual SMP and VMotion are
registered trademarks or trademarks of VMware, Inc. in the United States
and/or other jurisdictions. Microsoft, Windows, and Windows NT are
registered trademarks of Microsoft Corporation. Linux is a registered
trademark of Linus Torvalds. All other marks and names mentioned herein
may be trademarks of their respective companies.

Outline
ESX Server storage architecture
Storage virtualization core
Virtual machine state management
Managing physical storage devices
Questions

Goals for ESX Server Storage
High speed, isolated access from virtual machine to its disk
Organize storage hardware into a distributed, structured resource pool
Hide physical storage complexity from virtual machines
High availability, scalability, and reliability
Lower cost of data protection for virtual machines

ESX Server Storage Architecture
Service console (SC)
  VMFS driver exports VMFS volumes to /vmfs, forwards VMFS file access requests to VMKernel FS core
  VMKDev driver exports all VMKernel storage devices to SC kernel, forwards requests to storage core
Virtual machine
  Uses file-level functions to set up and manage virtual disk files
  Uses SCSI commands to do IO from guest OS to virtual SCSI device
VMKernel
  Provides buffered IO interface for POSIX-style file requests
  Virtual disk IO goes directly to FS or storage core
[Figure: ESX Server storage architecture. Service Console: VMFS and VMKDev drivers, with vmkfstools, fdisk, and ls. Virtual Machine: virtual storage, virtual HBA, virtual disk setup. VMKernel: Virtual Machine File System, data cache, and storage core (physical device access), leading to the storage device. Legend: VMware storage drivers/apps in Service Console; core virtualization path (see next slide).]

ESX Server Storage Stack
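[Figure: ESX Server storage stack, top to bottom: Virtual Machine, SCSI Virtualization Engine, VMFS, Logical Volume Manager, Storage Core, Multipathing, Device Driver, then to the storage device. Requests between layers are labeled SCSI command, FS operation, block operation, and SCSI command. Layer groups include Physical Device Access and Virtual Machine State Management.]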
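
The stack can be read as a chain of translations: a guest SCSI command becomes an FS operation, then a block operation, then a SCSI command to the physical device. The following is a minimal Python sketch of that idea only. All class names, the 512-byte sector size, and the one-contiguous-extent-per-file layout are assumptions made for illustration; none of them come from ESX Server.

```python
from dataclasses import dataclass

SECTOR = 512  # assumed sector size in bytes


@dataclass
class ScsiRead:
    """A SCSI read request: logical block address plus sector count."""
    lba: int
    sectors: int


class Device:
    """Stands in for multipathing and the device driver; records commands."""

    def __init__(self):
        self.sent = []

    def submit(self, cmd):
        self.sent.append(cmd)


class VolumeManager:
    """Block operation -> SCSI command on the physical device."""

    def __init__(self, device):
        self.device = device

    def read_blocks(self, lba, sectors):
        self.device.submit(ScsiRead(lba, sectors))


class FileSystem:
    """FS operation -> block operation (hypothetical: one extent per file)."""

    def __init__(self, lvm, extents):
        self.lvm = lvm
        self.extents = extents  # file name -> starting LBA on the volume

    def read_file(self, name, offset, length):
        start = self.extents[name] + offset // SECTOR
        self.lvm.read_blocks(start, length // SECTOR)


class ScsiVirtualizationEngine:
    """Guest SCSI command -> FS operation on the backing virtual disk file."""

    def __init__(self, fs, disk_file):
        self.fs = fs
        self.disk_file = disk_file

    def handle(self, cmd):
        self.fs.read_file(self.disk_file, cmd.lba * SECTOR, cmd.sectors * SECTOR)


# The guest reads 16 sectors at LBA 8 of a virtual disk whose file
# starts at LBA 2048 of the volume.
device = Device()
engine = ScsiVirtualizationEngine(
    FileSystem(VolumeManager(device), {"1.vmdk": 2048}), "1.vmdk")
engine.handle(ScsiRead(lba=8, sectors=16))
```

After the call, `device.sent` holds a single read at LBA 2056 for 16 sectors: the guest address has been shifted by the file's position on the volume.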
Storage Virtualization Core

SCSI Virtualization Engine
Exports a file, RDM, LUN, or redo log as a virtual SCSI disk
Forwards, filters, or remaps commands (downstream) and IO completions (upstream)
New in ESX Server 3
  Hot-add virtual disks to virtual machines
  Layered apps inside virtual machines with RDMs
  Uniform virtual disk management across VMware products
NOTE: ESX Server 3 features are still under development and may be subject to change.

Virtual SCSI Devices
SCSI-2 compliant VMware virtual disk
Physical device access (pass-through)
[Figure: SCSI commands from the virtual machine, as handled by the SCSI Virtualization Engine. Labels in the diagram: Device State (Test Unit Ready, Inquiry); State Machine (Reserve, Release, Reset; Virtual Resv, Rel, Rst); Filter (Read, Write; File Read, File Write); Read Buffer, Write Buffer; Get Capacity, File Stat; IO Cmpl, No Connect, Busy; Filter (Report LUNs). Commands travel to/from the file system and to/from the storage core as SCSI commands.]
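
One way to picture the command handling in the diagram is as a routing table keyed on the SCSI command and on how the virtual device is backed. The sketch below is only an assumed reading of the diagram labels: the pairing of commands with handlers, the treatment of Report LUNs as a filtered command on pass-through devices, and every identifier are illustrative guesses, not the ESX Server implementation.

```python
from enum import Enum, auto


class Backing(Enum):
    VIRTUAL_DISK = auto()  # SCSI-2 compliant virtual disk backed by a file
    PASS_THROUGH = auto()  # physical device access


# Assumed reading of the diagram: how each guest command is served
# when the virtual disk is backed by a file on the file system.
FILE_BACKED_HANDLERS = {
    "TEST_UNIT_READY": "device state",
    "INQUIRY": "device state",
    "RESERVE": "virtual reservation state machine",
    "RELEASE": "virtual reservation state machine",
    "RESET": "virtual reservation state machine",
    "READ": "file read",
    "WRITE": "file write",
    "READ_CAPACITY": "file stat",
}

# Assumption: commands that are not forwarded to a physical device.
PASS_THROUGH_FILTERED = {"REPORT_LUNS"}


def route(command, backing):
    """Return a label for where a guest SCSI command is handled."""
    if backing is Backing.PASS_THROUGH:
        if command in PASS_THROUGH_FILTERED:
            return "filtered"
        return "storage core"  # forwarded unchanged as a SCSI command
    # File-backed disk: emulate known commands, filter everything else.
    return FILE_BACKED_HANDLERS.get(command, "filtered")
```

For example, `route("READ", Backing.VIRTUAL_DISK)` resolves to a file read, while the same command on a pass-through device is forwarded to the storage core.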

Virtual Disks
Type                  Format   Performance      Use case
Preallocated          Flat     High             ESX Server default
Allocate-on-demand*   Flat     Medium to high   Storage over-commitment
Delta (redo log)      Sparse   Medium           Virtual machine snapshots, backup, DR
RDM                   N/A      High             Layered apps, clustering
Raw/System LUN        N/A      High             Deprecated; exposes non-disk devices to virtual machine

* New in ESX Server 3
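
The delta (redo log) type pairs a sparse format with snapshot use. A common way to build such a layer is copy-on-write: writes land in the sparse delta, and reads fall through to the base disk for any block the delta does not hold. The sketch below shows that general technique only. It uses a Python dict as the sparse block map and hypothetical class names, and it does not describe the actual VMware redo log format.

```python
class BaseDisk:
    """Flat, preallocated disk: every block exists, zero-filled at start."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data


class RedoLog:
    """Sparse delta over a parent disk; holds only blocks written since the snapshot."""

    def __init__(self, parent):
        self.parent = parent
        self.delta = {}  # block number -> data

    def read(self, n):
        # Prefer the delta; fall through to the parent for untouched blocks.
        if n in self.delta:
            return self.delta[n]
        return self.parent.read(n)

    def write(self, n, data):
        # The parent is never modified while the redo log is active.
        self.delta[n] = data

    def commit(self):
        """Fold the delta back into the parent and empty the log."""
        for n, data in self.delta.items():
            self.parent.write(n, data)
        self.delta.clear()


base = BaseDisk(num_blocks=4)
base.write(1, b"a" * 512)
redo = RedoLog(base)
redo.write(1, b"b" * 512)  # base still holds the snapshot-time contents
```

Discarding the `RedoLog` object reverts the disk to its snapshot state, while `commit()` makes the changes permanent. Because a `RedoLog` only needs `read` and `write` on its parent, logs can also be stacked on one another.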

The ESX Server Virtual SAN
Virtual disks are LUNs on the VMFS Disk Array
[Figure: the ESX Server virtual SAN. Labels: Storage Virtualization Device (/vmfs); Virtual Machines 1 to 4; Virtual Disk Array 1 and 2; VMFS vol 1 and 2; files 1.vmdk, 1.vmdk.redo, 2.vmdk, 2.vmdk.redo, 3.vmdk, and 4.rdm; raw device vmhba0:0:0:0. Legend: VMFS-3 volume, base disk, redo log (snapshot), RDM, raw device.]

Virtual Machine File System (VMFS)
Optimized for accessing large files from VMM
Keeps virtual disk performance close to native
[Figure: two bar charts of throughput in MBps (axis 0 to 200) for seq read, seq write, rand read, and rand write, one at 64K block size and one at 128K block size. Series: physical machine, esx3-vmfs2, esx3-vmfs3, esx2-vmfs2.]

VMFS, cont.
Enhanced functionality on SANs
  Distributed access from ESX Server hosts
  No network lock manager, or knowledge of other hosts
  Auto-discover volumes, SAN volume manager, RDMs, snapshots
Virtual machine storage consistency
  VMFS partition protection, exclusive locks across hosts, crash-consistent virtual machine IO path
Special primitives for clustered virtual machines, raw LUNs
Easily manage virtual disks as files
  Enables elegant backup and DR solutions

VMFS-3: A New FS for ESX Server 3
Exclusive repository of virtual machines and virtual machine state
  Better organization through directories, small files
  Large number of files to host more virtual machines
  Enables DAS and DRS
Stronger consistency mechanisms
  Distributed journal for faster crash recovery
  Crash recovery and metadata update code is tested in normal IO paths

VMFS-3: Performance
Reduced IO-to-disk for metadata operations
Less contention on global resources
Less disruption due to SCSI reservations
Faster virtual machine management operations
[Figure: two charts. (1) IO-to-disk in bytes (log scale, 1 to 100000000) for the tests touch1, ls1, rm1, touch64, ls64, and rm64, comparing vmfs3 and vmfs2. (2) % difference (0 to -100) against # of files (0 to 140) for touch, ls, and rm.]

VMFS-3: Scalability
Large numbers of FS objects do not compromise performance
Greater connectivity (hosts or virtual machines per VMFS volume)
Fairness across multiple virtual machines hosted on the same volume
[Figure: two charts. (1) Time taken (sec, 0 to 50) for touch against # of files multiplier (1 = 32 files), comparing esx3-vmfs3, esx3-vmfs2, and esx2-vmfs2. (2) 64k block size, sequential write throughput on VMFS-3: KBps (0 to 100000) over time for 1 VM, 2 VM, 4 VM, and 8 VM.]

Logical Volume Manager
Consolidates multiple physical disks into a single logical device
New in ESX Server 3
  Volume availability not compromised due to missing disks
  Automatic resignaturing for volumes hosted on SAN snapshots
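
Consolidating several physical disks into one logical device comes down to an address translation from a logical block to a (disk, block) pair. The sketch below assumes plain concatenation of extents, which the slide does not confirm as the ESX Server layout. The class, method, and disk names are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class SpannedVolume:
    """Concatenates extents into one logical block address space."""

    def __init__(self, extents):
        # extents: list of (disk_name, size_in_blocks), in logical order
        self.names = [name for name, _ in extents]
        sizes = [size for _, size in extents]
        # starts[i] is the first logical block that lives on extent i
        self.starts = [0] + list(accumulate(sizes))[:-1]
        self.total = sum(sizes)

    def locate(self, logical_block):
        """Map a logical block to (disk_name, block_on_that_disk)."""
        if not 0 <= logical_block < self.total:
            raise IndexError("block outside the logical device")
        # Find the last extent whose start is <= logical_block.
        i = bisect_right(self.starts, logical_block) - 1
        return self.names[i], logical_block - self.starts[i]


# Two disks of 100 and 50 blocks form one 150-block logical device.
vol = SpannedVolume([("disk-a", 100), ("disk-b", 50)])
```

Here `vol.locate(99)` stays on `disk-a`, and `vol.locate(100)` is the first block of `disk-b`. A binary search over the extent start offsets keeps the lookup cheap even with many extents.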

Network File System (NFS)
New in ESX Server 3: NFS v3/TCP driver to mount NAS exports in VMKernel
Cheaper shared storage alternative to SAN
Easier to provision and set up
No need to carve out LUNs,