Proceedings of the Linux Symposium

Conference Organizers
Andrew J. Hutton, Steamballoon, Inc.
C. Craig Ross, Linux Symposium
Review Committee
Andrew J. Hutton, Steamballoon, Inc.
Dirk Hohndel, Intel
Martin Bligh, Google
Gerrit Huizenga, IBM
Dave Jones, Red Hat, Inc.
C. Craig Ross, Linux Symposium
Proceedings Formatting Team
John W. Lockhart, Red Hat, Inc.
Gurhan Ozen, Red Hat, Inc.
John Feeney, Red Hat, Inc.
Len DiMaggio, Red Hat, Inc.
John Poelstra, Red Hat, Inc.
Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights
to all as a condition of submission.

Internals of the RT Patch
Steven Rostedt
Red Hat, Inc.
srostedt@redhat.com
rostedt@goodmis.org
Darren V. Hart
IBM Linux Technology Center
dvhltc@us.ibm.com
Abstract
Over the past few years, Ingo Molnar and others have worked
diligently to turn the Linux kernel into a viable Real-Time
platform. This work is kept in a patch that is held on Ingo's page
of the Red Hat web site [7] and is referred to in this document as
the RT patch. As the RT patch is reaching maturity and slowly
slipping into the upstream kernel, this paper takes you into the
depths of the RT patch and explains exactly what is going on. It
explains Priority Inheritance, the conversion of Interrupt Service
Routines into threads, and the transformation of spin_locks into
mutexes, and why this all matters. This paper is directed toward
kernel developers who may eventually need to understand Real-Time
(RT) concepts, and aims to help them deal with design and
development changes as the mainline kernel heads toward a
full-fledged Real-Time Operating System (RTOS). It also offers some
advice to help them avoid pitfalls that may arise as the mainline
kernel comes closer to an actual RTOS.
The RT patch has not only been beneficial to those in the Real-Time
industry; many improvements to the mainline kernel have also come
out of it. These improvements range from race conditions that were
fixed to reimplementations of major infrastructures (such as
hrtimers and generic IRQs). The cleaner the mainline kernel is, the
easier it is to convert it to an RTOS. When a change made to the RT
patch is also beneficial to the mainline kernel, it is sent as a
patch to be incorporated into mainline.

1 The Purpose of a Real-Time Operating System

The goal of a Real-Time Operating System is to create a predictable
and deterministic environment. The primary purpose is not to
increase the speed of the system or to lower the latency between an
action and its response, although both of these increase the quality
of a Real-Time Operating System. The primary purpose is to eliminate
surprises. A Real-Time system gives control to the user such that
they can develop a system in which they can calculate the actions of
the system under any given load with deterministic results.
Increasing performance and lowering latencies help in this regard,
but they are secondary to deterministic behavior. A common
misconception is that an RTOS will improve throughput and overall
performance. A quality RTOS still maintains good performance, but an
RTOS will sacrifice throughput for predictability.

To illustrate this concept, let's take a look at a hypothetical
algorithm that, on a non-Real-Time Operating System, can complete
some calculation in 250 microseconds on average. An RTOS on the same
machine may take 300 microseconds for that same calculation. The
difference is that an RTOS can guarantee that the worst-case time to
complete the calculation is known in advance, and that the time to
complete the calculation will not go above that limit (when
performed by the highest priority thread). The non-RTOS cannot
guarantee a maximum upper limit on the time to complete that
algorithm. The non-RTOS may perform it in 250 microseconds 99.9% of
the time, but 0.1% of the time it might take 2 milliseconds to
complete. This is totally unacceptable for an RTOS, and may result
in system failure. For example, that calculation may determine if a
device driver needs to activate some trigger that must be set within
340 microseconds or the machine will lock up. So we see that a
non-RTOS may have a better average performance than an RTOS, but an
RTOS guarantees to meet its execution time deadlines.
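
The difference between average and worst-case behavior can be made concrete with a small user-space sketch. The function names below are hypothetical and chosen only for illustration. Note that measured samples can only show that a deadline was missed; they cannot prove a guaranteed bound, which has to come from the design of the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mean of n execution-time samples, in microseconds (n must be > 0). */
double average_us(const unsigned *samples, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Largest observed execution time, in microseconds. */
unsigned worst_case_us(const unsigned *samples, size_t n)
{
    unsigned worst = 0;

    for (size_t i = 0; i < n; i++)
        if (samples[i] > worst)
            worst = samples[i];
    return worst;
}

/* The deadline holds only if even the slowest observed run fits in it. */
bool meets_deadline(const unsigned *samples, size_t n, unsigned deadline_us)
{
    return worst_case_us(samples, n) <= deadline_us;
}
```

Fed with the numbers from the text (999 runs of 250 microseconds plus one run of 2 milliseconds, against a steady 300 microseconds), the first data set has the better average but fails the 340 microsecond deadline, while the second one meets it.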

The above demonstrates an upper bound requirement for completing a
calculation. An RTOS must also implement the requirement of response
time. For example, a system may have to react to an asynchronous
event. The event may be caused by an external stimulus (hitting a
big red button) or something that comes from inside the system (a
timer interrupt). An RTOS can guarantee a maximum response time from
the time the stimulus occurs to the time the reaction takes place.

1.1 Latencies

The time between when an event is expected to occur and when it
actually does is called latency. The event may be an external
stimulus that wants a response, or a thread that has just woken up
and needs to be scheduled. The following are the different kinds and
causes of latencies; these terms will be used later in this paper.

Interrupt Latency: The time between an interrupt triggering and when
it is actually serviced.
Wakeup Latency: The time between the highest priority task being
woken up and the time it actually starts to run. This can also be
called Scheduling Latency.
Priority Inversion: The time a high priority thread must wait for a
resource owned by a lower priority thread.
Interrupt Inversion: The time a high priority thread must wait for
an interrupt to perform a task that is of lower priority.

Interrupt latency is the easiest to measure since it corresponds
tightly to the time interrupts are disabled. Of course, there is
also the time that it takes to make it to the actual service
routine, but that is usually a constant value (except with the RT
kernel; see Section 2). The duration between the waking of a high
priority process and it actually running is also a latency. This
sometimes includes interrupt latency, since the waking of a process
is usually due to some external event.
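
As an illustration of "expected versus actual" time, the following user-space sketch sleeps until an absolute deadline and reports how late the wakeup was. It is only an approximation under stated assumptions: the helper names are hypothetical, and the measured value combines timer interrupt latency and wakeup latency rather than isolating either one. It relies on the POSIX calls clock_gettime() and clock_nanosleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for clock_nanosleep(); must precede includes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a single nanosecond count. */
int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until an absolute time interval_ns from now, then report how late
 * the wakeup was. Returns the latency in nanoseconds, or -1 on error.
 */
int64_t measure_wakeup_latency_ns(int64_t interval_ns)
{
    struct timespec now, target;
    int64_t expected;

    assert(interval_ns >= 0);

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;

    expected = timespec_to_ns(&now) + interval_ns;
    target.tv_sec = expected / NSEC_PER_SEC;
    target.tv_nsec = expected % NSEC_PER_SEC;

    /* TIMER_ABSTIME: wake at the absolute time, not after a relative delay. */
    if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL) != 0)
        return -1;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;

    /* Actual wakeup time minus the time we asked to be woken at. */
    return timespec_to_ns(&now) - expected;
}
```

Calling measure_wakeup_latency_ns(1000000) in a loop and recording the maximum, rather than the average, gives the figure that matters for the discussion above.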

Priority inversion is not itself a latency, but its effect causes
latency. The amount of time a thread must wait on a lower priority
thread is the latency due to priority inversion. Priority inversion
cannot be prevented, but an RTOS must prevent unbounded priority
inversion. There are several methods to address unbounded priority
inversion, and Section 6 explains the method used by the RT patch.
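
To show how an inversion can be kept bounded, here is a toy model of priority inheritance, the technique named in the abstract. It is a sketch under loud assumptions and not the RT patch's implementation: every name is hypothetical, a larger number means a higher priority, and waiter queues, nested locks, and actual blocking are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a larger number means a higher priority (an assumption). */
struct toy_task {
    int normal_prio;  /* priority the task was assigned */
    int prio;         /* effective priority, possibly boosted */
};

struct toy_lock {
    struct toy_task *owner;  /* NULL when the lock is free */
};

/*
 * Try to take the lock. On contention, lend the waiter's priority to the
 * owner so that a medium priority task cannot keep the owner off the CPU.
 * Returns 1 if the lock was acquired, 0 if the caller has to wait.
 */
int toy_lock_acquire(struct toy_lock *lock, struct toy_task *task)
{
    if (lock->owner == NULL) {
        lock->owner = task;
        return 1;
    }
    if (task->prio > lock->owner->prio)
        lock->owner->prio = task->prio;  /* priority inheritance boost */
    return 0;
}

/* Release the lock and drop any inherited priority. */
void toy_lock_release(struct toy_lock *lock)
{
    assert(lock->owner != NULL);
    lock->owner->prio = lock->owner->normal_prio;
    lock->owner = NULL;
}
```

While the low priority owner holds the lock and a high priority task waits for it, the owner runs at the waiter's priority, so the wait is limited to the length of the owner's critical section instead of depending on unrelated medium priority work.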

Interrupt inversion is a type of priority inversion where a thread
waits on an interrupt handler servicing a lower priority task. What
makes this unique is that a thread is waiting on an interrupt
context that cannot be preempted, as opposed to a thread that can be
preempted and scheduled out. Section 2 explains how threaded
interrupts address this issue.

2 Threaded Interrupts

As mentioned in Section 1.1, one of the causes of latency involves
interrupts servicing lower priority tasks. A high priority task
should not be greatly affected by a low priority task that is, for
example, doing heavy disk I/O. With the normal interrupt handling in
the mainline kernel, the servicing of devices such as hard drives
can cause large latencies for all tasks. The RT patch uses threaded
interrupt service routines to address this issue.

When a device driver requests an IRQ, a thread is created to service
this interrupt line (see do_irqd in kernel/irq/manage.c). Only one
thread can be created per interrupt line. Shared interrupts are
still handled by a single thread. The thread basically performs the
following:

    while (!kthread_should_stop()) {
            /* mark ourselves as sleeping before looking for work */
            set_current_state(TASK_INTERRUPTIBLE);
            /* service the interrupt line if it is pending */
            do_hardirq(desc);
            cond_resched();
            /* sleep until the next interrupt wakes this thread */
            schedule();
    }

2007 Linux Symposium, Volume Two

Here's the flow that occurs when an interrupt is triggered:
The architecture function do_IRQ() (see arch/<arch>/kernel/irq.c;
this may be different in some architectures) calls one of the
following chip handlers:
handle_simple_irq
handle_level_irq
handle_fasteoi_irq
handle_edge_irq
handle_percpu_irq
Each of these sets the IRQ descriptor's status flag IRQ_INPROGRESS,
and then calls redirect_hardirq().
redirect_hardirq() checks if threaded interrupts are enabled, and if
the current IRQ is threaded (the IRQ flag IRQ_NODELAY is not set),
then the associated thread (do_irqd) is awakened. The interrupt line
is masked and the interrupt exits. The cause of the interrupt has
not been handled yet, but since the interrupt line has been masked,
that interrupt will not trigger again. When the interrupt thread is
scheduled, it will handle the interrupt, clear the IRQ_INPROGRESS
status flag, and unmask the interrupt line.
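
The flow above can be followed step by step in a single-threaded simulation. This is a sketch, not kernel code: the structure and the sim_ functions are hypothetical stand-ins for the state and steps the text describes, and the behavior of the IRQ_NODELAY case (servicing directly in interrupt context) is an assumption that the text only implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-line state described in the text. */
struct sim_irq_line {
    bool in_progress;   /* models the IRQ_INPROGRESS status flag */
    bool masked;        /* line masked: the device cannot re-trigger */
    bool thread_woken;  /* the handler thread has been made runnable */
    bool nodelay;       /* models IRQ_NODELAY: do not use a thread */
    int handled;        /* how many interrupts have been serviced */
};

/*
 * Hard interrupt entry. Returns false if the line is masked and the
 * interrupt could not be delivered.
 */
bool sim_hard_irq(struct sim_irq_line *line)
{
    if (line->masked)
        return false;

    line->in_progress = true;
    if (line->nodelay) {
        /* Assumed: not threaded, so service it in interrupt context. */
        line->handled++;
        line->in_progress = false;
        return true;
    }

    /* Threaded: wake the thread, mask the line, and exit. */
    line->thread_woken = true;
    line->masked = true;
    return true;
}

/* One pass of the handler thread once the scheduler runs it. */
void sim_irq_thread(struct sim_irq_line *line)
{
    if (!line->thread_woken)
        return;

    /* Invariant from the flow: a woken thread implies a masked line. */
    assert(line->masked);

    line->thread_woken = false;
    line->handled++;            /* the device is finally serviced */
    line->in_progress = false;  /* clear the in-progress flag */
    line->masked = false;       /* the line may trigger again */
}
```

Between sim_hard_irq() and sim_irq_thread() the interrupt is acknowledged but not yet serviced, which is exactly the window in which a higher priority thread can run ahead of the device's handler.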

The interrupt priority inversion latency is only the time spent
triggering the interrupt, masking the interrupt line, waking the
interrupt thread, and returning to the interrupted code, which takes
a few microseconds on a modern computer system. With the RT patch, a
thread may be given a higher priority than a device handler
interrupt thread, so when the device triggers an interrupt, the
interrupt priority inversion latency is only the masking of the
interrupt line and the waking of the interrupt thread that will
handle that interrupt. Since the
high priority thread may be of a higher priority than the
interrupt threa