Acquisition and Evaluation of Data Sets for Comparative Assessment of Risk to Biodiversity on a Continental Scale: Threats to Biodiversity


David M. Stoms, William A. Kuhn, Frank W. Davis,
with Mary E. Cablk, William W. Hargrove, and Lawrence L. Master

Institute for Computational Earth System Science
and Department of Geography, University of California
Santa Barbara, CA 93106

Final Report to the Environmental Protection Agency

Report Date: September 30, 1998

Executive Summary

The EPA's Scientific Advisory Board, in a report, Reducing Risk: Priorities and Strategies for Environmental Protection, identified biological depletion and habitat modification among the four highest priority ecological risks in the United States. In recognition that the loss of biological diversity can only be effectively addressed through cooperation of vested interests, EPA formed the Biodiversity Research Consortium (BRC) to develop the technical information and databases needed to assess and manage risks to biodiversity. The BRC proposed a national assessment of comparative risks to biodiversity. The objective of a national assessment would be to identify those areas having species assemblages that contribute the greatest genetic diversity to the biota of their biogeographic regions and high-risk areas requiring management intervention to sustain biodiversity.

Before authorizing a national assessment, the Scientific Advisory Board urged the BRC to demonstrate that reliable data and methodology exist. To answer both the ecological and technical (data and analysis) components of these questions, the BRC sponsored and coordinated pilot studies on five aspects of a potential national assessment:

  1. Vertebrate species richness as a metric of biological diversity;
  2. Land use/land cover derived from remote sensing as a measure of environmental diversity;
  3. Analysis of the species and land characterization data;
  4. Analysis of stressor data derived from existing sources for anthropogenic stressors and natural environmental factors; and
  5. Site selection algorithms for prioritizing sites for conservation based on species composition and/or stressors.

In 1994, the BRC entered a cooperative agreement with the University of California, Santa Barbara (UCSB), to continue the investigation by developing and analyzing a GIS database of stressors and biophysical factors. This report presents the results of the UCSB research related to aspect #4 above.

A stressor can be defined as a factor operating at the organism, population, community, or ecosystem level of organization that causes or may cause a change in habitat, or a change in exposure to adverse physical, chemical, or biological conditions. We can distinguish between stressor as a disturbance and stress as the response to disturbance. This definition implies that the effects of stressors should be measured in relation to reference conditions. Indicators of stress can also be divided into stressors that directly affect biodiversity at one or more of the ecological levels mentioned above and conditions that characterize the effects of stress.

EPA specified that the study area be the three West Coast states of the United States: Washington, Oregon, and California for which the requisite data on species distributions by hexagon were being compiled by The Nature Conservancy. This region, referred to in this report as the West Coast Transect or WCT, spans 16 degrees of latitude and 10 degrees of longitude. The WCT environments range from cool, moist temperate rainforest in the Pacific Northwest to hot, arid deserts in southeastern California near the border with Mexico. The region provides a good sampling of environments and habitats to examine patterns of richness and relationships with biophysical variables. This region has also experienced a wide range in the magnitude and types of environmental stress. Including three states in the study area, containing a combination of public and private lands, required that data sets evaluated in this pilot study be comprehensive across states and ownership, and therefore would be reasonable prototypes of a national database. Sampling units for the BRC national assessment were defined as the 640 km² sampling hexagons designed for EPA's Environmental Monitoring and Assessment Program.

The UCSB research attempted to answer several questions regarding the level of risk to biodiversity. Which existing data sets relate to stressors and biophysical factors that affect biodiversity? How should they be manipulated to create a consistent database (i.e., structured by EMAP hexagons) that best represents these factors with respect to species richness and stress? Beyond identifying a candidate set of stressors (and biophysical factors), the BRC program needed answers to two specific questions:

  1. How well are the stressor (and biophysical factor) data correlated with patterns of biological diversity?
  2. How consistent are the stressor (and biophysical factor) data?

As a pilot study, the primary criteria for evaluating project success will be whether the two BRC questions about consistency between data sets and their relationship to biodiversity were answered. Stressors and biophysical factors should be selected for analysis based on two criteria:

The next three sections are organized around the major research themes of the study: patterns of species richness and relationships with biophysical factors, the potential for remotely sensed data to estimate environmental stress, and an assessment of risk to regional biodiversity. The subsequent section describes the application of stressor data in several related research projects. The report concludes with our answers to the two research questions above and with our recommendations for future research.

Comparisons across Taxa of Biophysical Predictors of Species Richness

Recent improvements in the quality and resolution of spatial data on species distributions and related biophysical factors prompted a study to re-examine the relationship between them. Our research objectives in this study were to 1) identify common biophysical factors that predict richness among taxa, 2) evaluate the potential contribution of remotely sensed data as an integrator of biophysical factors for predicting richness, and 3) identify natural stressors from among the biophysical variables that could be used in Chapter 4 for analysis with anthropogenic stressors.

Data on species distributions had been compiled by EMAP hexagon by The Nature Conservancy for the three western states. Improved biophysical data include new interpolations of climate variables that account for topographic effects, recently completed soils maps, and remotely sensed data products known to be related to primary production. Species richness was compiled for birds, mammals, amphibians, reptiles, rare vertebrates, and trees at the resolution of EMAP hexagons. Biophysical factors were summarized as means and standard deviations of the pixel values within hexagons (see Mean annual precipitation as an example). The relationships were examined by regression tree analysis to identify the most useful predictors of richness and compare these with other findings from the scientific literature (see the regression tree predicting tree species richness as an example).
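The hexagon summary step described above can be illustrated with a minimal sketch. It assumes each pixel has already been tagged with the identifier of the hexagon that contains it; the function name, the hexagon identifiers, and the precipitation values are hypothetical and are not taken from the project database.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_hexagon(pixels):
    """Group (hexagon_id, value) pairs and return {hexagon_id: (mean, std)}."""
    groups = defaultdict(list)
    for hex_id, value in pixels:
        groups[hex_id].append(value)
    # The mean stands for the regional condition of the hexagon;
    # the standard deviation stands for heterogeneity within it.
    return {h: (mean(v), pstdev(v)) for h, v in groups.items()}


# Hypothetical mean annual precipitation pixels (mm/yr), tagged by hexagon.
pixels = [
    ("H1", 400.0), ("H1", 600.0),
    ("H2", 1500.0), ("H2", 1500.0), ("H2", 1500.0),
]
summary = summarize_by_hexagon(pixels)
```

Each hexagon then contributes two candidate predictors per biophysical factor, a mean and a standard deviation, to the regression tree analysis.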

The most frequently used biophysical variables in the regression tree models for the different taxonomic groups represented broad regional patterns of climatic conditions, i.e., mean values in hexagons. Variables representing habitat heterogeneity within EMAP hexagons, i.e., standard deviations, were very seldom selected for the models, and then only for lower level splits in the regression tree. The one exception was for mammals where the standard deviation of mean annual precipitation was the most important variable in predicting richness. Even the habitat heterogeneity factor derived from the number of ecosystems in a hexagon, as mapped by Hargrove and Luxmoore (1998), was never selected in any richness model. Soil factors were not generally useful in this study area in predicting richness patterns. The climate factors adequately captured the regional patterns of richness so that elevation and topographic diversity or relief were not needed, although they were frequently cited in past literature. Perhaps interpolation of precipitation and temperature over digital elevation models to high resolution grids was a substantial improvement over the cruder estimates from meteorological stations used in the past. Solar irradiance was a factor in every model except for rare species. Rarely was the same factor selected for the primary split in a regression tree for more than one taxon.

The satellite-based remote sensing data played a small role in predicting richness. The set of models with NDVI factors caused a small improvement in the variation explained over those without them in several groups. They were not used at all in models for rare vertebrates or trees. The variation explained for reptiles was actually reduced slightly when using NDVI factors but this may be a reflection more on the subjectivity in pruning regression trees than on the usefulness of the data. Potential weaknesses in using NDVI composite data from USGS in biodiversity studies have been discussed elsewhere (Stoms et al. 1997). In addition, NDVI reflects current land use and disturbance, whereas the TNC species data tend to portray long-term patterns of species distributions. At the scale of EMAP hexagons, actual productivity may be less useful than potential primary production.

Land use information may be better applied in understanding potential stresses on biodiversity. Human land use can divert the natural inputs of precipitation and energy away from native species and ecosystems. This change would be reflected in imagery obtained from space. Land use also disrupts normal ecosystem flows and processes by reducing and fragmenting habitats, increasing mortality, introducing superior competitors or predators, and exposing species to harmful toxins. While these stressors do not necessarily modify the basic inputs found in this chapter to be associated with patterns of species richness, they lower the potential of native species to use them fully. Species are also exposed to natural stressors, such as extremes of heat and cold between seasons of the year.

Modeling Potential NDVI to Monitor Environmental Stress

One of the most dramatic impacts of human activities on ecosystem functioning is the appropriation of net primary production (NPP) for human uses at the expense of much of the rest of the biota. Assessing the amount of this appropriation and monitoring its expansion is difficult at the resolution needed for regional scale studies. Data are often too coarse (e.g., county, state, or national level statistics), are only collected in some areas (e.g., National Resource Inventory on non-federal lands), or are collected infrequently (e.g., the decadal census).

Periodic satellite remote sensing offers one tool for monitoring ecosystem functioning. Therefore a method that tracks changes in NPP would estimate levels of environmental stress that may cause decline or extinction of local species populations. In this study, rather than using AVHRR imagery to detect changes between two dates, we developed a simple predictive model of potential greenness in the absence of human land use effects. Sample pixels were selected from nature reserves as mapped for the Gap Analysis Program where land uses generally maintain natural ecological processes. A regression tree model for the West Coast Transect study area (California, Oregon, and Washington) used mean annual precipitation, mean January and July temperatures, and available soil water capacity, and it captured most of the variation in NDVI in undisturbed natural areas. Actual greenness deviated as expected from potential greenness in response to human land use effects such as urbanization and agriculture.

Patterns of NDVI: a) actual and b) predicted (or potential) time integrated NDVI, c) positive deviations between predicted and actual time integrated NDVI (greenness was predicted higher than actual) and d) negative deviations (greenness less than predicted).
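The idea of potential greenness can be sketched in a few lines. This is a deliberate simplification, not the model developed in the study: it fits a regression tree with a single split on one variable (precipitation), whereas the study's tree used several climate and soil variables. All function names and data values are hypothetical.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the summed squared error."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct x values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    """Return the leaf mean for the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical training pixels sampled from nature reserves:
# mean annual precipitation (mm/yr) and time integrated NDVI.
reserve_precip = [150, 200, 250, 1200, 1400, 1600]
reserve_ndvi = [10.0, 12.0, 14.0, 50.0, 52.0, 54.0]
stump = fit_stump(reserve_precip, reserve_ndvi)

# A hypothetical urbanized pixel in a wet climate.
actual_ndvi = 20.0
potential_ndvi = predict(stump, 1300)
# Positive deviation: greenness was predicted higher than observed.
deviation = potential_ndvi - actual_ndvi
```

The sign convention follows the caption above: a positive deviation means the model predicted more greenness than was observed, as would be expected where land use has removed vegetation.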

Regression trees are a useful exploratory data analysis technique for predictive modeling as was done for NDVI here. The method typically captures broad regional patterns at the upper levels of the hierarchical divisions, and then incorporates local processes or effects at lower levels. In fact, unexpected relationships may emerge that contradict the results of simple correlation analysis. For instance, mean July temperature was negatively correlated with time integrated NDVI because the hottest locations in the deserts have very sparse vegetation cover. In the branch of the model with high precipitation and cold January temperatures, however, warmer July temperatures predicted higher NDVI in the interior mountains. This important local relationship would probably have been overlooked in multiple regression analysis because of the conflicting trends in subsets of the data. The drawback of regression tree analysis is that pruning trees is still a case-specific process. Analysts could produce different trees from the same data set based on how they choose to prune the initial tree that fits all observations. Defining the study region to be modeled could also influence the final shape of the tree. Some local patterns may have been better predicted if the West Coast Transect had been partitioned into smaller subregions first. In this case, when a categorical variable for ecoregions was tested in the regression tree analysis, it did not enter the final model. The risks of doing so are that artificially abrupt changes will be created in the predicted variable at the boundaries of subregions and that training samples will be inadequate to represent the variation in each subregion.

While this approach appears promising for monitoring environmental stress, it is far from operational. Additional research is needed to corroborate these preliminary findings and to make the methodology reliable. Limitations of the present study can be summarized as those of AVHRR data, sampling of training sites, and validation. For all these reasons, we present the regression tree model developed in this report only as a promising initial step rather than as a finished work.

There are several limitations to the AVHRR data set used in this study. The model was developed using only 1990 NDVI composites from USGS. We know that NDVI metrics can vary significantly between years in response to interannual differences in weather, local disturbance, and sensor factors (e.g., calibration, drift in time of acquisition, and replacement of satellites). Using a single date to develop the predictive model has two limitations. First, the variables that enter the model, and therefore potential greenness, are likely to be sensitive to the choice of year(s) used to construct it. Similarly, the deviations calculated in comparison with a given year's NDVI data might be sensitive to the data used in model building. Time integrated NDVI is probably less sensitive than other metrics such as the date of first green-up. It would still be prudent to test the sensitivity of the model and deviations to the choice of year. At the very least, sensitivity analysis would identify the effects of year-to-year variation in actual NDVI and deviations from baseline and therefore the level of deviation that would robustly detect actual change. Perhaps a model could be developed using NDVI averaged over many years to remove the influence of weather variability, or alternatively to use standardized principal components of multi-year data.

Modeling was limited to a study area on the west coast of the United States. Comparisons with other regions and for the entire nation will be needed to evaluate how general our results are. The compositing strategy used by USGS is known to produce NDVI images that tend to be biased towards off-nadir views, particularly in the western U.S. This satellite zenith angle bias blurs the effective resolution of the NDVI images and tends to inflate the apparent value of NDVI from a combination of atmospheric and bi-directional reflection effects. Ideally, alternative compositing algorithms should be tested in comparison with the maximum value compositing algorithm, currently in use by USGS, for the robustness of the model and its accuracy in detecting real change in greenness. Other vegetation indices should also be examined, since some of these are less sensitive to atmospheric effects and soil background color. Others have had success combining land-surface temperature with NDVI to detect changes in vegetation.

The second issue relates to the choice of training data used to develop the regression tree model. We selected pixels randomly from managed areas that, in general, are not urbanized, cultivated, or intensively managed for resource extraction. The training data were extracted from areas being managed primarily for biodiversity objectives as mapped by the Gap Analysis Program. This does not mean that all pixels in the training set have been unaffected by human influences such as livestock grazing, mining, dams, or invasion of exotic species. Wildfires are routinely suppressed in the majority of these areas with varying effects on the plant canopy. Selecting training pixels known to be as similar as possible to presettlement conditions would be essential to develop the best predictive model. This will require a substantial effort to identify enough such sites to provide an adequate sample of all environments in the study area. Finding adequate sites will be problematic in many parts of the world where much of the landscape has already been altered.

This pilot study only evaluated deviations between potential and actual NDVI against mapped information on land use and land management. Further work is needed to interpret the results with higher resolution map or field information to determine if apparent deviations truly reflect environmental stress and not model errors.

The launch of the Moderate Resolution Imaging Spectrometer (MODIS) is currently scheduled for late 1998. MODIS will provide global image coverage of better quality than its AVHRR predecessor. Radiometric calibration, atmospheric corrections using specially chosen water absorption bands, and other enhancements will greatly improve the quality of daily images for monitoring environmental stress. To make our predictive modeling technique useful in the MODIS era will require careful calibration of the two data sets. The new MODIS data stream can then be appended to the historical time series compiled of AVHRR imagery. MODIS will give us a more useful tool for monitoring changes in primary production and will extend our historical record of change.

Assessment of Risks to Regional Biological Diversity

In this analysis, we set out to answer three questions. The first question dealt with the pattern of species richness for rare and vulnerable terrestrial vertebrates. By extracting the species from the TNC database that were ranked as G1-G3 and tallying the number of species in each hexagon, we observed a distinct geographic pattern in richness. The hexagons with the greatest number of species (9-14) occurred along the California coastal zone, with a narrow band in the north and a wider band in the south. Richness of rare species generally declined with distance from the coastline. Desert regions in California generally had 1-3 rare species, while the interior hexagons of Oregon and Washington frequently had none.
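The tally of rare species by hexagon can be sketched as follows, assuming the species data are available as (hexagon, species) occurrence pairs with a global rank for each species. The species codes, hexagon identifiers, and function name are hypothetical and do not reflect the structure of the TNC database.

```python
from collections import Counter

# Global ranks treated as rare and vulnerable in this analysis.
RARE_RANKS = {"G1", "G2", "G3"}


def rare_richness(occurrences, ranks):
    """Count distinct G1-G3 species per hexagon."""
    seen = set()
    counts = Counter()
    for hex_id, species in occurrences:
        is_rare = ranks.get(species) in RARE_RANKS
        # Skip duplicate records so each species counts once per hexagon.
        if is_rare and (hex_id, species) not in seen:
            seen.add((hex_id, species))
            counts[hex_id] += 1
    return counts


# Hypothetical ranks and occurrence records.
ranks = {"sp_a": "G1", "sp_b": "G3", "sp_c": "G5"}
occurrences = [
    ("H1", "sp_a"), ("H1", "sp_b"), ("H1", "sp_c"),
    ("H1", "sp_a"),  # duplicate record
    ("H2", "sp_c"),
]
richness = rare_richness(occurrences, ranks)
```

Hexagons with no rare species simply have no entry, and a `Counter` reports them as zero when queried.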

Regression tree modeling was applied to answer the second question about the relationship of this pattern of richness to biophysical and anthropogenic stressors. Data were compiled for 13 stressors. Two data sets represented natural stressors, 8 were anthropogenic stressors, and 3 were derived from satellite data and represented a combination of both types of stressors. Some of the data sets correspond to actual stressors (e.g., roads), while others are surrogate measures of environmental conditions (e.g., an index of habitat condition or percentage of area protected). For simplicity, we refer to all factors as "stressors" throughout the text. By far the most important predictors of rare species richness were two natural stressors, seasonal temperature difference and degree-day cool sum. These two variables represent the extremes of hot and cold to which rare species must adapt and to the severity of the winter in which body temperature must be preserved and food must be available. Rare species richness was highest in hexagons with the lowest values of these stressors, that is, where the climate is relatively mild year-round such as with a marine influence. The only anthropogenic stressors selected in the regression tree model were the number of exotic species (both total and terrestrial vertebrates alone). Rare species and exotic species tended to have similar distribution patterns. It is unclear from our analysis whether exotic species have caused more vertebrates to become rare and vulnerable or simply that, in the West Coast Transect study area, both are influenced by the same, undetermined ecological processes. The more direct measures of stress such as population density, roadedness, or habitat loss were not used by the regression tree model.

Our third question, about the value of satellite data in estimating environmental stress and the number of vulnerable species, produced a negative result. None of the three measures of environmental stress from NDVI were selected by the regression tree. The most significant differences between potential and actual NDVI were in urban and agricultural areas, which were not generally associated with large numbers of rare species. It may be that the vulnerable species have already been extirpated from these hexagons and were thus not in TNC's database.

Our study was hindered by a lack of stressor data at the required resolution. A great deal of data, however, exist at the county scale or similar geographic units. It may still be possible to use these data to estimate stressors at the finer, hexagon scale through development of smart interpolation methods. For instance, grazing density could be inferred based on a model that uses commonly available spatial data such as topography. This kind of GIS model can explicitly limit predicted land uses to appropriate environmental settings and land stewards while disaggregating county level statistics. Other land uses such as logging might be modeled in a similar manner, as might the agricultural census data on chemical applications.
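One simple form that smart interpolation could take is proportional allocation: a county total is divided among hexagons according to a suitability weight for each hexagon. The sketch below is only one possible reading of the idea and was not implemented in this study. The weights are assumed to come from a separate GIS model (for example, the area of suitable terrain and ownership in each hexagon), and all names and numbers are hypothetical.

```python
def disaggregate(county_total, hexagon_weights):
    """Allocate a county total to hexagons in proportion to suitability weights."""
    total_weight = sum(hexagon_weights.values())
    if total_weight == 0:
        # No suitable land anywhere in the county: allocate nothing.
        return {h: 0.0 for h in hexagon_weights}
    return {
        h: county_total * w / total_weight
        for h, w in hexagon_weights.items()
    }


# Hypothetical county reporting 1000 head of livestock, with the area (km2)
# of suitable rangeland in each hexagon used as the weight.
weights = {"H1": 30.0, "H2": 10.0, "H3": 0.0}
grazing = disaggregate(1000.0, weights)
```

A weight of zero keeps the predicted land use out of hexagons with inappropriate settings, while the allocated values still sum to the county statistic.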

We mentioned at the beginning of this chapter how human activity has appropriated a large proportion of NPP and energy. This monumental alteration of ecosystem function has only been estimated at national or global levels. We were disappointed in our attempts to apply this approach at the hexagon level. Data inputs were often too coarse, as discussed above. It may yet be possible to implement this approach, but it will require greater use of the coarse-scale data and smart interpolation techniques. One intriguing possibility of relating energy usage to biodiversity loss is to estimate energy usage from the nighttime lights data from DMSP satellite data.

There is an alternative approach that could be taken in using the stressor data sets. Rather than using them to predict richness of vulnerable species in a modeling context, they could be used to identify potential "train wrecks" in hexagons where both stresses and biodiversity are high. This would require the development either of thresholds for levels of stressors that correspond to threat to biodiversity or of a new index that synthesizes the effects of stresses into an overall metric of threat.
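The threshold version of this alternative can be sketched as a simple overlay. The study did not derive such thresholds, so the cutoff values here are placeholders: the richness cutoff borrows the lower bound of the highest richness class reported above (9 species), and the stress values assume a hypothetical index scaled from 0 to 1.

```python
def flag_train_wrecks(hexagons, richness_min, stress_min):
    """Return the ids of hexagons where both richness and stress are high."""
    return sorted(
        hex_id
        for hex_id, (richness, stress) in hexagons.items()
        if richness >= richness_min and stress >= stress_min
    )


# Hypothetical hexagons: (rare species richness, stress index in [0, 1]).
hexagons = {
    "H1": (12, 0.90),  # rich and stressed
    "H2": (12, 0.20),  # rich but lightly stressed
    "H3": (1, 0.95),   # stressed but few rare species
}
flagged = flag_train_wrecks(hexagons, richness_min=9, stress_min=0.8)
```

Only hexagons that exceed both cutoffs are flagged, which is why the choice of thresholds, or of the index that combines individual stressors, would need to be justified before the overlay is used for planning.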

Applications of Stressors Data in Regional Conservation Planning

During the project period, we also participated in several related studies. These gave us the opportunity to develop methods for compiling and processing stressor data in smaller portions of the WCT study area and to apply these data in planning exercises. Two studies used similar measures of stress and management issues to guide selection of potential new biodiversity management areas. In both studies, stressors were used to rate the suitability of sites for biodiversity management as a complement to biological information. The first of these studies was part of the Sierra Nevada Ecosystem Project (SNEP) funded by the U.S. Forest Service and directed by Congress. The reserve selection modeling in this case was exploratory rather than for decision making. The second project, however, assisted TNC in planning their conservation portfolio for the Columbia Plateau ecoregion. Stressor data were valuable in the identification of an initial portfolio, which TNC staff then revised with detailed local knowledge of site conditions. The California Gap Analysis Project used stressor measures of roadedness, projected population growth, and level of protection to predict vulnerability of native plant communities.


We recommend three areas for further research.

  1. Expand the analysis of biophysical correlates of species richness to other regions. The findings reported here are only applicable to the three west coast states. While the study area covers a wide range of environmental variation, from hot deserts to cool, moist temperate rainforest and alpine tundra, there are many environments of the United States that do not occur here. Currently, species data by hexagon have been compiled for the states of the Chesapeake Bay region (Pennsylvania, Delaware, Maryland, West Virginia, and Virginia) (Larry Master, personal communication). The Chesapeake Bay region is characterized by a continental climate, with hot, humid temperate forests and would make an interesting complement to the West Coast Transect. Another option would be to expand the WCT to include British Columbia and Baja California. Data still need to be compiled for these adjacent regions, and data consistency would be a concern.
  2. Continue the investigation of potential NDVI (or other suitable index derived directly from remotely sensed data) as a tool for assessing and monitoring environmental stress. The study should be expanded to other regions (as in recommendation #1) to determine if the findings reported here can be generalized to all environments. The analysis needs to incorporate more years of satellite data to look for trends as opposed to aberrations in 1990. In this regard, some deviation between observed and predicted NDVI may be the result of local weather phenomena instead of land use impacts. The data processing and modeling approach need to be studied more rigorously. The compositing method used to develop the NDVI time series is known to have some bias toward off-nadir views, particularly on the west coast. The influence of this bias on the NDVI modeling needs to be explored. Training data for developing the model of potential NDVI were selected from a sample of pixels in nature reserves where it was assumed that land use impacts were negligible. Selecting training pixels known to be as similar as possible to undisturbed conditions would be essential to develop the best predictive model. This will require a substantial effort to identify enough such sites to provide an adequate sample of all environments. Further work is also needed to interpret results of the analysis with higher resolution map or field information to determine if apparent deviations truly reflect environmental stress and not model errors. If successful, this approach could be a valuable part of environmental monitoring in the MODIS era.
  3. The analysis of stressor data with richness of rare species found little influence of anthropogenic stressors on rarity. This result may be due in part to the absence of key stressors in the analysis. Many stressors reported in the literature in summaries of causes of endangerment were not available for our study because they did not meet the criteria for spatial resolution or consistency. Further work is needed to model county-level statistics to higher resolution using "smart interpolation." This approach might make it possible to model the appropriation of net primary production by humans at the expense of the remainder of the ecosystem at the scale of hexagons. Another reason our results did not find strong association between stressors and rarity could be that the majority of rare species in the study area are not rare because of declining populations in response to stress but are naturally rare. If so, it may be more productive to use the stressor data to predict "hot spots" where large numbers of rare and vulnerable species coincide with high levels of stress. Using stressors for conservation planning rather than for research may require development of a stress index to integrate factors. We have used a subset of stressors as measures of suitability for selection of sites as biodiversity management areas and as a threat index for predicting the vulnerability of native plant communities.
