NSF SCI #0438193: Annual Report for Year 2005

Mobile-Agent-Based Middleware for Distributed Job Coordination
Munehiro Fukuda
Computing and Software Systems, University of Washington, Bothell
email: mfukuda@u.washington.edu
January 21, 2006
Abstract
This annual report presents the PI's research activities conducted in year 2005 under NSF SCI
#0438193: Mobile-Agent-Based Middleware for Distributed Job Coordination. This is an RUI project
to implement the AgentTeamwork system, which dispatches a collection of mobile agents to remote
computing sites to coordinate the execution of a user job. The PI, his undergraduate research assistants,
and research collaborators from Ehime University have implemented and enhanced AgentTeamwork's
mobile agent execution platform, system agents, and language support utilities including the mpiJava
API. We have evaluated AgentTeamwork's job coordination performance using a Giga Ethernet cluster
of 24 DELL computing nodes and have reported the results in our academic papers, including one to
be published in the International Journal of Applied Intelligence.
UW Bothell Distributed Systems Laboratory
Contents
1 Overview
2 Research Activities
  2.1 Research Equipment
  2.2 System Configuration
  2.3 Implementation
    2.3.1 UWAgent
    2.3.2 AgentTeamwork's System Agents
    2.3.3 User Program Wrapper and Preprocessor
    2.3.4 GridTcp and mpiJava
3 Major Findings
  3.1 Computational Granularity
  3.2 Computational Scalability
4 Student Supervision
5 Dissemination
  5.1 Publications
  5.2 Colloquia
  5.3 Contribution to Partners' Publication
6 Budget Activities
  6.1 Equipment
  6.2 PI's Salary
  6.3 Student Salary
  6.4 Travels
7 Plan for Year 2006
8 Final Comments
1 Overview
NSF SCI #0438193: Mobile-Agent-Based Middleware for Distributed Job Coordination is an RUI
project to implement AgentTeamwork, a system that allows a user to dispatch a job with a collection of
mobile agents. Each agent is deployed to a remote computing site, where it launches a job, monitors the
job's execution, recovers the job at another site upon a crash, and even migrates it to an idle computing
node for better performance.
Year 2005 was the first year of our three-year NSF-granted research activities. Starting with the
purchase and installation of research equipment, we completed the following work:
1. an enhancement of our mobile-agent execution platform that serves as AgentTeamwork's
infrastructure,
2. an extension of AgentTeamwork's system agents that can dispatch a job, monitor remote computing
resources, and detect agent crashes in their hierarchy,
3. a design of AgentTeamwork's preprocessor, which translates a Java program into code accepted by
AgentTeamwork, together with a study of code cryptography to encrypt that code for security purposes, and
4. an implementation of the mpiJava API with Java sockets and our fault-tolerant socket library
named GridTcp.
In addition to the research activities, this report gives details of the PI's student supervision,
publications, and budget activities in 2005 as well as his research plan for year 2006.
2 Research Activities
This section introduces the research equipment used for our AgentTeamwork development, gives an
overview of the AgentTeamwork system, and explains our implementation progress in year 2005.
2.1 Research Equipment
We purchased a Giga Ethernet cluster of 24 DELL computing nodes with the NSF grant in February
2005. As shown in Figure 1, it is now connected through the UW Bothell campus backbone
to our Myrinet-2000 cluster of eight DELL desktop workstations, which was installed in
summer 2002. Table 1 summarizes the specifications of these two cluster systems. Our intention for
this cluster configuration is to simulate low-parallelism, tightly-coupled CPU connection versus
high-parallelism, loosely-coupled CPU connection by combining these two clusters and the department's
instructional cluster: (1) using the 2Gbps Myrinet cluster for 8-way parallel computation, (2) using the
Giga Ethernet cluster for 16- and 24-way computation, (3) using both for 32-way computation, and (4)
adding the 100Mbps/1Gbps 32-node instructional cluster to our systems so as to realize 64-way
computation.
Specification  Giga Ethernet Cluster                    Myrinet 2000 Cluster
# nodes        24                                       8
CPU            3.2GHz/1MB cache Xeon                    2.8GHz Xeon
Memory         512MB                                    512MB
HDD            36GB SCSI                                60GB SCSI
Bandwidth      1Gbps                                    2Gbps and 1Gbps
Installation   Rack mounted                             Desktop
Purchase       With NSF #0438193/departmental budgets   With PI's start-up money in year 2001
Table 1: Cluster specifications
[Figure 1 diagram. Labels shown: campus backbone (public IPs, 100Mbps); two Giga Ethernet switches
(public IPs, 1Gbps); eight-node Myrinet-2000 cluster, mnode0 to mnode7, with a Myrinet 2000 switch
(local IPs, 2Gbps); 24-node Giga Ethernet cluster, mnode8 to mnode31; NSF server; cluster gateway;
PI's workstation; hosts perseus.uwb.edu and medusa.uwb.edu.]
Figure 1: A Configuration of Research Equipment
[Figure 2 diagram. Layers from top to bottom:]
Java user applications
mpiJava API: mpiJava-S (over Java socket) and mpiJava-A (over GridTcp)
User program wrapper
Commander, resource, sentinel, and bookkeeper agents
UWAgents mobile agent execution platform
Operating systems
Figure 2: AgentTeamwork execution layers
2.2 System Configuration
Figure 2 shows the AgentTeamwork execution layers from the top application level to the underlying
operating systems. The system supports the mpiJava API [6] for high-performance Java applications,
which may also call native functions and use socket-based communication directly. While
complying with the original API, AgentTeamwork provides two mpiJava implementations:
mpiJava-S, which establishes inter-process communication using conventional Java sockets, and
mpiJava-A, which realizes message-recording and error-recoverable TCP connections using our
GridTcp socket library. The user selects the implementation with the arguments passed to the MPJ.Init()
function. Below mpiJava-S and mpiJava-A is the user program wrapper, which periodically serializes a user
process into a byte-streamed snapshot. The process execution is coordinated and monitored by
AgentTeamwork's system-provided agents such as commander, resource, sentinel, and bookkeeper agents.
Each process snapshot is captured by its local sentinel agent and sent to the corresponding bookkeeper
agent for recovery purposes. All these agents run on top of the UWAgents mobile agent execution
platform, which we have developed in Java as an infrastructure for agent-based grid-computing
middleware.
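As a rough illustration of the snapshot step described above, the following sketch serializes a small
state object into a byte array and rebuilds it with standard Java object serialization. It is not
AgentTeamwork's wrapper code: the class SnapshotSketch, the UserState fields, and the methods
takeSnapshot and restore are hypothetical names chosen for this example, and the actual wrapper may
capture a process differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch only; these names are not part of AgentTeamwork.
class SnapshotSketch {
    // Stand-in for the state of a user process (illustrative fields).
    static class UserState implements Serializable {
        private static final long serialVersionUID = 1L;
        int iteration;          // how far the computation has progressed
        double[] partialSums;   // intermediate results to preserve

        UserState(int iteration, double[] partialSums) {
            this.iteration = iteration;
            this.partialSums = partialSums;
        }
    }

    // Serialize the state into a byte-streamed snapshot.
    static byte[] takeSnapshot(UserState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    // Rebuild the state from a snapshot, as a recovery step might do.
    static UserState restore(byte[] snapshot)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (UserState) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UserState before = new UserState(3, new double[] {0.5, 1.5});
        byte[] snapshot = takeSnapshot(before);  // bytes that could be sent elsewhere
        UserState after = restore(snapshot);     // a fresh copy of the saved state
        System.out.println("restored iteration " + after.iteration
                + " from " + snapshot.length + " bytes");
    }
}
```

In this sketch the byte array plays the role of the snapshot that a sentinel agent would forward to a
bookkeeper agent; how those agents actually transfer and store snapshots is outside what the example
shows.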
An application program is coded in the AgentTeamwork-specific framework as shown in Figure 3. If
the program uses mpiJava-A, it must include a GridTcp object in its declaration of system-provided
objects (line 4). The code consists of a collection of methods, each of which is named func_ followed
by a 0-based index and returns the index of the next method to call. The application starts from
func_0, repeatedly calls the method indexed by the return value of the previous method, and ends in
the method whose return value is -2 (i.e., func_2 in this example code). The MPJ.Init function invokes
mpiJava-A when it receives an ipEntry object, which the user program wrapper automatically initializes
to map an IP name to the corresponding MPI rank (line 8). Following MPJ.Init, a user may use any
mpiJava functions for inter-process comm