Building a computer ten times more powerful than all the networked computing capability in the United States is the subject of this book by leading figures in the high performance computing community. It summarizes the near-term initiatives, including the technical and policy agendas for what could be a twenty-year effort to build a petaFLOPS-scale computer. (A FLOP -- a Floating Point OPeration -- is the standard unit used to measure computer performance, and a petaFLOPS computer would perform a million billion such operations per second.) Chapters focus on four interrelated areas: applications and algorithms, device technology, architecture and systems, and software technology. While a petaFLOPS machine is beyond anything within contemporary experience, early research into petaFLOPS system design and methodologies is essential to U.S. leadership in all facets of computing into the next century. The findings reported here explore new and fertile ground.
Among them: construction of an effective petaFLOPS computing system will be feasible in two decades, although effectiveness and applicability will depend on dramatic cost reductions as well as innovative approaches to system software and programming methodologies; a mix of technologies such as semiconductors, optics, and possibly cryogenics will be required; and while no fundamental paradigm shift in system architecture is expected, active latency management will be essential, requiring a high degree of fine-grain parallelism and the mechanisms to exploit it.

Scientific and Engineering Computation series

Supercomputing research, the goal of which is to make computers that are ever faster and more powerful, has been at the cutting edge of computer technology since the early 1960s. Until recently, such research cost millions of dollars, and many of the companies that originally made supercomputers are now out of business. The early supercomputers used distributed computing and parallel processing to link processors together in a single machine, often called a mainframe. Exploiting the same technology, researchers are now using off-the-shelf PCs to produce computers with supercomputer performance. It is now possible to make a supercomputer for less than $40,000. Given this new affordability, a number of universities and research laboratories are experimenting with installing such Beowulf-type systems in their facilities.

This how-to guide provides step-by-step instructions for building a Beowulf-type computer, including the physical elements that make up a clustered PC computing system, the software required (most of which is freely available), and insights on how to organize the code to exploit parallelism. The book also includes a list of potential pitfalls.
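As a rough sketch of what "organizing the code to exploit parallelism" can look like in practice (this example is illustrative and not drawn from the book), the following C program uses MPI, a message-passing library commonly run on Beowulf clusters, to split a summation across the nodes of a cluster; the problem size and variable names are hypothetical.

/* Illustrative sketch: each cluster node (MPI rank) sums its own slice
 * of the index range, and rank 0 collects the partial results.
 * Build with an MPI compiler wrapper (e.g. mpicc) and launch with mpirun. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000L   /* hypothetical problem size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of cluster nodes */

    /* Each rank handles only its own contiguous slice of the work. */
    long chunk = N / size;
    long lo = rank * chunk;
    long hi = (rank == size - 1) ? N : lo + chunk;
    double local = 0.0;
    for (long i = lo; i < hi; i++)
        local += (double)i;

    /* Combine the partial sums on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f\n", total);
    MPI_Finalize();
    return 0;
}

The same pattern, dividing a data set among nodes and combining partial results over the network, underlies most programs written for clustered PC systems.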

Comprehensive guides to the latest Beowulf tools and methodologies.

Beowulf clusters, which exploit mass-market PC hardware and software in conjunction with cost-effective commercial network technology, are becoming the platform for many scientific, engineering, and commercial applications. With growing popularity has come growing complexity. Addressing that complexity, Beowulf Cluster Computing with Linux and Beowulf Cluster Computing with Windows provide system users and administrators with the tools they need to run the most advanced Beowulf clusters. The book is appearing in both Linux and Windows versions in order to reach the entire PC cluster community, which is divided into two distinct camps according to the node operating system. Each book consists of three stand-alone parts. The first provides an introduction to the underlying hardware technology, assembly, and configuration. The second part offers a detailed presentation of the major parallel programming libraries. The third, and largest, part describes software infrastructures and tools for managing cluster resources. This includes some of the most popular software packages available for distributed task scheduling, as well as tools for monitoring and administering system resources and user accounts. Approximately 75% of the material in the two books is shared, with the other 25% pertaining to the specific operating system. Most of the chapters include text specific to the operating system. The Linux volume includes a discussion of parallel file systems.
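As an illustrative sketch only (not material from either volume), the following C program shows the message-passing style that the parallel programming part of these books covers, using MPI: rank 0 sends an integer to rank 1 over the cluster network and waits for a reply. The payload value and message tags are hypothetical.

/* Illustrative point-to-point message exchange between two cluster nodes.
 * Run with at least two processes, e.g. "mpirun -np 2 ./ping". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;                    /* hypothetical value */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        int reply;
        MPI_Recv(&reply, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received reply %d\n", reply);
    } else if (rank == 1) {
        int incoming;
        MPI_Recv(&incoming, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int reply = incoming + 1;
        MPI_Send(&reply, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Explicit sends and receives like these are the low-level building blocks; the cluster-management and scheduling tools described in the third part of each book handle where such processes run and how their resources are allocated.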