FDA XML Data Format Design Specification
Administration
DRAFT
FDA XML Data Format
Revision C
Design Specification
04/18/02
PRELIMINARY SPECIFICATION FOR COMMENT AND REVIEW
Page 2 of 27
1 Introduction
1.1 Purpose
This specification defines the design of the FDA XML Data Format (FDADF) and ensures that all
interested individuals and organizations involved with the project have the same understanding of the
data format design.
1.2 Scope of Specification
This document covers the design of the waveform data format as well as relevant submission
information. It identifies design elements that meet the requirements for the data set previously
specified in the FDA XML Data Format Requirements Specification[1], which was initiated after the
FDA's meeting on November 19th, 2001[2]. Applicability and interaction with other standards bodies
and data definitions, e.g. CDISC and HL-7, as well as current practice with SAS submission data sets,
are also discussed. The design for the structure of the data on electronic media is also covered.
2 Overview
2.1 Background
New Drug Application (NDA) sponsors collect biological data, often as waveforms from subjects
dosed with the candidate drug. A number of measurements are made from the data, or from close
derivations of that data. Those measurements are compiled into datasets and statistically analyzed.
The datasets are submitted with the NDA to support the findings.
2.1.1 Issue
The FDA would like to assess the accuracy and consistency of the measurements made from the
collected biological data. It cannot do so without the opportunity to view the biological data used
for making those measurements.
2.1.2 Goal
The goal is to facilitate the submission of the biological data, or close derivations of it, used to
make the measurements. The biological data should be annotated with points and intervals to show the
reviewer the relevant landmarks used for making the measurements.
2.1.3 Process Description
During the course of a drug study, a subject is given a dosage of some compound, either the drug
under study or a placebo. Periodically, recording devices collect biological data from the subject. Each
recording session is typically made up of one or more periodically sampled channels (waveforms).
Sometimes measurements are made on the raw waveforms themselves, and at other times on
transformations of that data into other domains. Therefore, the data from which measurements are
made can take many different forms: electrical-potential vs. sample-time, electrical-potential vs.
cycle-time, pressure vs. sample-time, power vs. frequency, or other forms that future requirements
may add. Whatever domain the measured data is in, it is related to a period of real-time: the period
during which the recording was made.
2.2 Recording
Typically a recording device will periodically sample one or more biological sensors. The sample
values will generally represent biological parameters, e.g. temperature, pressure, oxygen saturation,
electrical potential, at each time point. Customarily the samples are plotted vs. time and the resulting
waveform imparts meaning to a clinician. Additional information can be derived from the waveforms by
making measurements on the waveforms themselves; these measurements are usually included in the
statistical datasets provided to the FDA.
However, the sensor-value vs. time waveform is not the only data from which measurements can be
made. For example, if the waveform is cyclical in nature (e.g. ECG, BP), an average cycle can be
derived. Measurements can be made on that average cycle and can be used for further statistical
analysis. Other derivations can be imagined, for example, analyzing the frequency content of the
average cycle. Viewing power versus frequency might give an idea of how much energy is in a certain
part of the frequency spectrum, and this may be useful for certain types of analysis.
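The two derivations described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not part of the format: the function names, the fixed cycle
length, the known cycle start indices, and the synthetic sine-wave signal are all hypothetical, and a
naive discrete Fourier transform stands in for whatever spectral method a sponsor actually uses.

```python
import cmath
import math


def average_cycle(samples, cycle_starts, cycle_length):
    """Average fixed-length windows that begin at each cycle start index."""
    windows = [
        samples[start:start + cycle_length]
        for start in cycle_starts
        if start + cycle_length <= len(samples)  # skip incomplete cycles
    ]
    if not windows:
        raise ValueError("no complete cycles to average")
    # zip(*windows) walks the windows column by column (same cycle-time).
    return [sum(column) / len(windows) for column in zip(*windows)]


def power_spectrum(cycle, sample_rate_hz):
    """Return (frequency_hz, power) pairs from a naive DFT of one cycle."""
    n = len(cycle)
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * i / n)
            for i, value in enumerate(cycle)
        )
        spectrum.append((k * sample_rate_hz / n, abs(coefficient) ** 2 / n))
    return spectrum


# Hypothetical signal: three identical 8-sample cycles of a pure sine wave.
SAMPLE_RATE_HZ = 8.0
one_cycle = [math.sin(2 * math.pi * i / 8) for i in range(8)]
samples = one_cycle * 3

mean_cycle = average_cycle(samples, cycle_starts=[0, 8, 16], cycle_length=8)
spectrum = power_spectrum(mean_cycle, SAMPLE_RATE_HZ)
peak_frequency, _ = max(spectrum, key=lambda pair: pair[1])
```

Here the average cycle is a sensor-value vs. cycle-time dataset and the spectrum is a power vs.
frequency dataset; both are derived from the same period of the recording.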
3 Data Format
3.1 Technology
The FDA desires to use XML as the underlying technology for the specification. This matches the
Agency's strategic direction for data submissions. It is also aligned with other industry initiatives
such as CDISC's Submission Data Model (SDM).
3.2 Description
The data is assumed to be two-dimensional (2-D) in nature. It is also assumed that the data or
the data from which it is derived was collected from a subject in a drug study. The recording session
producing the source data has a real start time and duration. Datasets can, therefore, be related to real-
time, even if real-time is not one of the dimensions.
The 2-D datasets can also be annotated. The annotations will give the FDA reviewer domain-
specific landmarks demonstrating how the data was used. The annotations are intended to give a precise
indication of where measurements and fiducial points, e.g. R-peak, QRS-onset, and T-offset, used in
analysis were made or placed. Some computed measurements, e.g. QTc, cannot be directly included in
the waveform annotation, but would be supplied in the corresponding submission data.
The 2-D datasets can be grouped together when they are related by a common X-axis dimension.
For example, ECG rhythm leads can be grouped together because they share a common X-axis
sample-time dimension. Leads of an ECG median beat are related by a common cycle-time X-axis
dimension. The group of datasets can therefore share common annotations made in the X-axis domain.
The relationships between real-time, the recording session, dataset groups and datasets can be thought of as a set of
related coordinate systems.
Dataset groups can optionally share a dimension that is related to the recording
session time-domain by simple translate/scale transformations. If the group is not directly related, e.g.
the common dimension of a median beat is cycle-time, not real-time, the group can only be related to a
period of recording session time, but cannot be directly plotted along the recording session timeline.
Therefore, each dataset group will have attributes relating it to the period of recording session time it is
derived from, but will not describe the relationship as a pure translate/scale transformation.
[Figure: related coordinate systems. A Real Time axis (ticks from 10:24:56 to 10:25:05) is drawn with
the Recording Session, Plot Group, and XYPlot timelines. Labeled quantities:
RecordingSessionPlots.StartDate, RecordingSessionPlots.StartTime, PlotGroup.TimeSinceSessionStart,
PlotGroup.TimeDuration, and XYPlot.XOffset. A Domain Boundary separates the Recording Session Domain
from the Plot Group - X Axis Domain.]
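For a plot group whose X axis is related to real-time, the chain of translations between these
coordinate systems can be sketched as follows. The sketch rests on assumptions this section does not
settle: that StartDate and StartTime are formatted as YYYY-MM-DD and HH:MM:SS, that
TimeSinceSessionStart, TimeDuration, and XOffset are expressed in seconds, and that the offsets simply
add. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_session_start(start_date, start_time):
    """Combine StartDate and StartTime (assumed formats) into one datetime."""
    return datetime.strptime(f"{start_date} {start_time}", "%Y-%m-%d %H:%M:%S")


def plot_x_to_real_time(session_start, time_since_session_start_s,
                        x_offset_s, x_value_s):
    """Translate an X value of an XYPlot into wall-clock time.

    Assumes a real-time X axis measured in seconds, so only translation
    is needed (no scaling).
    """
    elapsed = time_since_session_start_s + x_offset_s + x_value_s
    return session_start + timedelta(seconds=elapsed)


def group_period(session_start, time_since_session_start_s, time_duration_s):
    """Return the real-time period a plot group was derived from.

    This is all that can be said for groups, such as median beats, whose
    X axis is not real-time.
    """
    begin = session_start + timedelta(seconds=time_since_session_start_s)
    return begin, begin + timedelta(seconds=time_duration_s)


start = parse_session_start("2002-04-18", "10:24:56")
print(plot_x_to_real_time(start, 2.0, 0.5, 1.5))  # 2002-04-18 10:25:00
```

A group with a cycle-time X axis would use only `group_period`, since its points cannot be placed
individually on the recording session timeline.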
3.3 The Entity-Relationship Model
The following diagram shows the entities and their relationships. The crow's foot connector shows a
many-to-one relationship. For example, RecordingSessionPlots may contain 0 or more PlotGroups, and a
PlotGroup must contain exactly one XAxisDomain.
[Diagram: entities and their attributes]
RecordingSessionPlots: StartDate, StartTime, Duration, UniqueID, FormatVersion
TrialIdentifiers: StudyID, SiteID, InvestigatorID, UniqueSubjectID, SubjectID, SubjectAge, SubjectSex,
    SubjectRace, TreatmentCode, TreatmentGroup, Country, VisitNumber, VisitDay, VisitName
RecordingDevice: Type, Manufacturer, Model, SerialNumber, DeviceID, SoftwareVersion
PlotGroup: Label, Comment, TimeSinceSessionStart, TimeDuration
XAxisDomain: Unit, Label, MinorTickInterval, MajorTickInterval, LogScale?, RealTime?
XAxisNotation: BeginningValue, EndingValue, Label, Comment
XYPlot: Label, Comment, XOffset, Connected?, AspectRatio
YAxisDomain: Unit, Label, MinorTickInterval, MajorTickInterval, LogScale?
YAxisNotation: BeginningValue, EndingValue, Label, Comment
PointNotation: XValue, YValue, Label, Comment
XValues: Scale, Offset, InitialValue, Increment, Values
YValues: Scale, Offset, InitialValue, Increment, Values
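As a reading aid, the containment rules stated above (0 or more PlotGroups per RecordingSessionPlots,
exactly one XAxisDomain per PlotGroup) can be mirrored in a small Python sketch. It covers only a
subset of the entities and attributes. The Python field names, the placement of XYPlots inside a
PlotGroup, and the unit string "s" are assumptions made for illustration, not definitions from this
specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class XAxisDomain:
    unit: str
    label: str = ""
    real_time: bool = False  # mirrors the RealTime? attribute


@dataclass
class XYPlot:
    label: str
    x_offset: float = 0.0  # mirrors XOffset


@dataclass
class PlotGroup:
    label: str
    x_axis_domain: XAxisDomain  # required: exactly one per PlotGroup
    time_since_session_start: float = 0.0
    time_duration: float = 0.0
    # Assumption: the plots sharing the X axis are held by the group.
    xy_plots: List[XYPlot] = field(default_factory=list)


@dataclass
class RecordingSessionPlots:
    start_date: str
    start_time: str
    format_version: str = "1.0"
    plot_groups: List[PlotGroup] = field(default_factory=list)  # 0 or more


# Hypothetical session: two ECG rhythm leads sharing a sample-time X axis.
rhythm = PlotGroup(
    label="ECG rhythm",
    x_axis_domain=XAxisDomain(unit="s", label="sample-time", real_time=True),
    xy_plots=[XYPlot(label="Lead I"), XYPlot(label="Lead II")],
)
session = RecordingSessionPlots("2002-04-18", "10:24:56", plot_groups=[rhythm])
```

Making `x_axis_domain` a required field while `plot_groups` defaults to an empty list is one way to
express the "exactly one" and "0 or more" cardinalities in code.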
3.4 XML DTD
Below is the suggested DTD for a RecordingSessionPlots:
<!ELEMENT RecordingSessionPlots (StartDate, StartTime, Duration?, UniqueID?, TrialIdentifiers?,
    RecordingDevice?, PlotGroup*)>
<!ELEMENT StartDate (#PCDATA)>
<!ELEMENT StartTime (#PCDATA)>
<!ELEMENT Duration (#PCDATA)>
<!ELEMENT UniqueID (#PCDATA)>
<!ATTLIST RecordingSessionPlots FormatVersion CDATA "1.0">
<!ELEMENT TrialIdentifiers (StudyID?, SiteID?, InvestigatorID?, UniqueSubjectID?, SubjectID?,
    SubjectAge?, SubjectAgeUnits?, SubjectSex?, SubjectRace?, TreatmentCode?, TreatmentGroup?,
    Country?, VisitNumber?, VisitDay?, VisitName?)>
<!ELEMENT StudyID (#PCDATA)>
<!ELEMENT SiteID (#PCDATA)>
<!ELEMENT InvestigatorID (#PCDATA)>
<!ELEMENT SubjectID (#PCDATA)>
&l