A SPATIAL MODELING AND DECISION SUPPORT SYSTEM FOR CONSERVATION OF BIOLOGICAL DIVERSITY


COMPUTATIONAL ADVANCES


As is typical of regional natural resource databases, ours is moderately large (50-100 gigabytes across thousands of files) and composed of very diverse geographic data and information, ranging from text and tables to images, maps, and models. Usually 5-10 individuals work simultaneously on complex applications such as spatial analysis, image processing, word processing, graphics, spreadsheet analysis, linear programming, and statistical software. There are many ways to configure a network of workstations and additional disk storage.

A highly technical GIS/image processing environment executes a wide variety of commands, and different types of commands hit different bottlenecks. Depending on the nature of the command, the bottleneck may be CPU, local disk I/O, or NFS I/O.

An example of a CPU-intensive operation on a raster is the focal mean. This operation produces a new raster whose values are the means of the pixels inside a floating window passed over the input grid. The CPU bottleneck is a function of the size of the floating window: in a 10 x 10 window there are 100 add operations and a divide operation for every input pixel, so a single Landsat Thematic Mapper satellite image (seven bands of roughly 75 million pixels each) would involve over 52 billion add operations. For this size of window, CPU is definitely the bottleneck.
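To make the arithmetic concrete, here is a minimal sketch of a focal mean in Python; it is an illustration rather than the actual routine in our GIS software, and the numpy usage and edge-replication boundary rule are our own assumptions.

    import numpy as np

    def focal_mean(grid, window=10):
        # Naive focal mean: each output pixel is the average of its
        # window x window neighborhood, i.e. window*window add
        # operations plus one divide per input pixel.
        half = window // 2
        rows, cols = grid.shape
        # Replicate edge pixels so the window always fits; this is
        # just one reasonable boundary rule.
        padded = np.pad(grid.astype(np.float64), half, mode="edge")
        out = np.empty((rows, cols))
        for r in range(rows):
            for c in range(cols):
                block = padded[r:r + window, c:c + window]
                out[r, c] = block.sum() / (window * window)
        return out

At 100 adds per pixel, seven bands of roughly 75 million pixels each account for the 52 billion additions cited above.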

Image processing operations are very often limited by disk I/O instead of CPU. A single Landsat Thematic Mapper satellite image at 25m resolution consumes approximately 525 megabytes of disk space. Simply extracting one of seven bands for further processing is severely limited by disk I/O.
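As a sketch of why this step is I/O-bound, the following pulls one 8-bit band out of a band-sequential (BSQ) file. The dimensions and the BSQ layout are assumptions chosen only to approximate the scene size above; real TM products vary.

    import numpy as np

    # Hypothetical dimensions: roughly 75 MB per 8-bit band, or about
    # 525 MB for all seven bands.
    ROWS, COLS, BANDS = 8000, 9000, 7

    def extract_band(path, band, rows=ROWS, cols=COLS):
        # Almost no arithmetic happens here; the cost is reading
        # ~75 MB from disk (and typically writing ~75 MB back out).
        band_bytes = rows * cols              # one byte per pixel
        with open(path, "rb") as f:
            f.seek(band * band_bytes)         # skip preceding bands
            data = np.fromfile(f, dtype=np.uint8, count=band_bytes)
        return data.reshape(rows, cols)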

NFS I/O was one of our largest bottlenecks. We have measured NFS writes taking as much as five times longer than NFS reads. We raised this finding with IBM technical support, but our tests on other architectures (DEC Alpha, Sun SPARC) suggest this is simply the nature of NFS.

Therefore, when IBM-ERP provided new opportunities, we based our equipment choices on the need for more "compute power." Compute power is a difficult term to define, but for GIS and remote sensing applications specifically, we determined that increasing the performance and quantity of certain resources would greatly speed productivity. To reduce CPU bottlenecks, we selected several high-end RS6000 workstations with our most demanding CPU-intensive tasks in mind; GIS and image processing algorithms require a fast CPU and plenty of memory for data buffering to execute in a reasonable amount of time. To alleviate disk I/O bottlenecks, we chose only high-capacity SCSI-2 hard disks and an FDDI network to move data quickly. The quantity of disk space is important not only for storage but also for temporary processing space.

Despite our 100 megabit/second FDDI network and Prestoserve adapter, which have somewhat improved NFS performance, NFS remains the culprit behind many processing delays. We rely on two inelegant workarounds to alleviate the NFS write bottleneck. The first is to strategically allocate moderate quantities of hard disk space to every workstation; users can then perform most of their compute tasks on local disk, minimizing their use of NFS. The second is to educate users that a file can be copied from one machine to another much faster by logging in to the destination machine and NFS reading ("pulling") the file than by NFS writing ("pushing") it from the local machine, as illustrated below.
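The pull-versus-push difference amounts to nothing more than where the copy is initiated; in this sketch the host names and paths are hypothetical, and both remote directories are ordinary NFS mounts.

    import shutil

    # Run on the SOURCE machine: every block becomes an NFS *write*
    # ("push") to the remote disk -- the slow path we measured.
    shutil.copy("/local/data/scene.img", "/nfs/hostB/data/scene.img")

    # Run on the DESTINATION machine instead: every block is an NFS
    # *read* ("pull") from the remote disk, up to five times faster.
    shutil.copy("/nfs/hostA/data/scene.img", "/local/data/scene.img")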

Our IBM hardware configuration pairs a large file server and a compute server for very intensive jobs with several high-performance workstations for general processing. This has proven a reliable and effective design for our computing needs. The 58H workstation has a total of 24.5 gigabytes of disk to accommodate the very large GIS and satellite imagery data sets we use. We also received a 39H workstation with 13.5 gigabytes of disk; this machine has a more powerful CPU than the 58H and is dedicated as a compute server. The 58H remains the primary file server, while the 39H provides some additional disk space. We also requested a 133 MHz Pentium PC and a 486DX100 laptop to accommodate our mobile and PC applications. These machines are connected to the network and can access the mass file stores on any of the disk servers and workstations. We now have 5 Xstations, which have proven a cost-effective means for obtaining fully functional seats. Our current IBM hardware includes:

Qty  Model  Memory (MB)  Disk (GB)  FDDI  Other Peripherals
1    58H    256          24.5       y     4 tape drives, CD-ROM, two external disk boxes
1    39H    128          13.5       y     CD-ROM
1    370    128          2.5        y     23" monitor
1    370    128          2.5        y
1    370    64           2.5        y
1    355    64           1.2        n
4    150    14           n/a        n/a
1    160    16           n/a        n/a
1    8240   n/a          n/a        y     8 ports

Here is an example of the increase in productivity provided by the new computing facilities. The task is to load a single Landsat Thematic Mapper (TM) image and project it into a different map projection at a different image resolution. The original scene size is about 525 megabytes and the result is about 35 megabytes. The input consists of the 7 individual data bands of the TM sensor. With our original computing facilities, each of the 7 bands had to be loaded and processed individually because there was not enough disk space to house all the original data at once. Processing all 7 bands took 2 days running overnight and required about 2 hours of person-time to input parameters, load tapes, and juggle files. Processing all 30 TM scenes required to cover the state of California would therefore take 60 working days and 60 person-hours to complete.

With the new facilities and the coding of a smart "wrapper" program to handle the parameters, this procedure has been greatly simplified and accelerated. All the input data are now loaded at once and processed with a single command line, thanks to the availability of ample disk space. With the fast CPU and I/O of the new computers, the process completes in about 3 hours. Person-time is reduced to about 1 minute per scene, and there are no files or tapes to juggle to store intermediate output. Eight scenes can be processed in one day with minimal user intervention, so processing all scenes for California would take 4 working days: a performance increase of 15 times. With the increased CPU power, faster I/O, and the application of a smart program, productivity has increased greatly, leaving more time for other work.
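The sketch below suggests the flavor of such a wrapper in Python; the reproject command, its flags, and the file naming are placeholders, since the actual projection tool and parameters are specific to our GIS software.

    import glob
    import subprocess
    import sys

    def process_scene(scene_dir, out_dir):
        # Reproject every band of a TM scene in one pass. With ample
        # disk space, all seven input bands sit on disk at once and
        # there are no tapes or intermediate files to juggle.
        for band_file in sorted(glob.glob(scene_dir + "/band*.img")):
            out_file = out_dir + "/" + band_file.split("/")[-1]
            # "reproject" stands in for whatever projection command
            # the site runs; the flags are illustrative only.
            subprocess.run(["reproject", band_file, out_file,
                            "-projection", "albers"], check=True)

    if __name__ == "__main__":
        # One command line per scene: about a minute of person-time.
        process_scene(sys.argv[1], sys.argv[2])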
