## IEM Representivity Analysis

#### Daryl Herzmann and Jeff Wolt

Department of Agronomy, Iowa State University

The scale of representativity of a sampling network determines the area and time over which a measurement represents true conditions. This document provides information and statistics regarding the spatial representativity of the Iowa Environmental Mesonet (IEM), following the methodology described by Dubois (2000).

- Introduction
- Nearest Neighbor Index
- Fractal Dimension
- Morisita Index
- Thiessen Polygons
- Coefficient of Representativity
- Conclusions

### 1. Introduction:

The IEM is a collection of environmental monitoring networks that have been linked to produce a more precise analysis of weather and climatic variables in the state than can be achieved with any single network. To produce higher resolution products for the entire state, point observations are often interpolated onto a regular grid using well-known algorithms such as nearest neighbor or a weighting function. This process is not trivial, because almost any analysis algorithm has numerous tunable parameters. It is therefore important to investigate the characteristics of the Iowa Environmental Mesonet to determine the scales at which phenomena can be accurately analyzed.
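To make the role of tunable parameters concrete, here is a minimal sketch of one such weighting function, inverse-distance weighting. The function name `idw`, the input format, and the `power` and `radius` parameters are illustrative assumptions, not the IEM's analysis code.

```python
import math

def idw(stations, x, y, power=2.0, radius=100.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (x_km, y_km, value) tuples (hypothetical format).
    power and radius are tunable; different choices give different grids.
    Returns None when no station lies within radius.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value  # grid point coincides with a station
        if d <= radius:
            w = 1.0 / d ** power
            num += w * value
            den += w
    return num / den if den > 0.0 else None
```

Changing `power` from 2 to 1 at the same grid point shifts the estimate toward the more distant stations, which is one reason the choice of parameters matters.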

Since the IEM is a collection of networks, each one is first evaluated on its own. Networks of similar data quality are then grouped and aggregated to explore the spatial statistical benefits of the IEM partnership.

Abbreviation | Description |
---|---|
ASOS | The Automated Surface Observing System sites are located at primary airports and are maintained by the National Weather Service. This network is rigorously checked and quality controlled. For Iowa, a limitation of the ASOS network is the lack of sites in Westcentral and Southwest Iowa. |
AWOS | The Automated Weather Observing System sites are located at secondary airports and are maintained by state government entities via third-party contractors. Data quality should be comparable with the ASOS network since both use similar instrumentation, but a number of factors produce biases relative to the ASOS network. The main issues are siting, maintenance schedules, and network communication. Most of the significant differences between the ASOS and AWOS networks show up in the temperature and dew point measurements and are due to siting. |
RWIS | The Roadway Weather Information System sites are located alongside interstates and highways. They are maintained by their respective state department of transportation via a third-party contractor. Atmospheric data from this network are of lesser quality than data from the ASOS or AWOS networks. The primary purpose of the RWIS network is automated observation of winter weather. The IEM collects RWIS data from Iowa, Minnesota, Wisconsin, Nebraska, and Kansas. Other states may have RWIS networks, but their data are often proprietary. |
SchoolNet | The IEM collects data from three SchoolNet networks operated by KCCI-TV, KIMT-TV, and KELO-TV. Data quality is close to that of the RWIS network, but siting issues make measurements like wind and precipitation problematic. |
ISUAG | The ISU Ag Climate sites primarily provide soil temperature and solar radiation readings. They are maintained by the Iowa State University Experiment Station. The main issue with this network is the timing of the observations received: each station is called once a day to download its observation record from the previous day. |

Abbreviation | Description |
---|---|
Tier 1 | ASOS + AWOS: The combination of these two networks is commonly used in the meteorological community to produce various analysis products. Both networks are located at airports and are for the most part well maintained. Their instrumentation is also very similar. |
Tier 2 | ASOS + AWOS + RWIS: The RWIS network is included in this grouping since its data quality reputation is the closest to that of the ASOS + AWOS networks. The quality of the instrumentation used at the RWIS sites is close to that found at the AWOS sites. |
Tier 3 | ASOS + AWOS + RWIS + SchoolNet + ISUAG: All networks included. |

### 2. Nearest Neighbor Index (NNI)

The NNI is a statistic comparing the mean minimum distances between sampling points to those that are expected by chance over some sampling area.

#### Procedure:

Point shapefile coverages for each of the network groups were created. These files were loaded into ESRI's ArcGIS 9.1 software. An ArcGIS extension named
*Nearest Neighbor Analysis / Event-Event Distances* written by Dr M. Sawada was used to compute the NNI.

#### Data:

Network | Sites | NNI | Z | Avg Min Distance [km] | Expected Min Distance [km] |
---|---|---|---|---|---|
ASOS | 38 | 1.05 | 0.76 | 38.3 | 36.5 |
AWOS | 76 | 1.14 | 2.12 | 40.9 | 36.0 |
RWIS | 101 | 1.10 | 1.88 | 34.2 | 31.0 |
SchoolNet | 102 | 0.69 | 5.54 | 21.4 | 31.0 |
ISUAG | 12 | 0.76 | 1.44 | 74.7 | 98.8 |
Tier 1 | 114 | 0.82 | 3.83 | 20.6 | 25.2 |
Tier 2 | 215 | 0.72 | 7.57 | 13.9 | 19.2 |
Tier 3 | 329 | 0.76 | 7.98 | 12.0 | 15.7 |

Table 3 presents the results of the NNI computation, together with the Z statistic, the average minimum distance between two sites, and the expected minimum distance for a spatially homogeneous distribution of sites.

#### Analysis:

NNI Value | Details |
---|---|
NNI > 1 | Increasing values greater than 1 indicate more dispersion. |
NNI = 1 | Sampling points are randomly but homogeneously distributed, with no tendency toward clustering or dispersion. |
NNI < 1 | Decreasing values less than 1 indicate more clustering. |

Table 3 shows the results of the NNI computation. The ASOS network has the NNI value closest to unity, which is the desired value for this statistic. This implies that the ASOS network is representative at the expected minimum distance of 36.5 km. The AWOS and RWIS networks also have values fairly close to unity. The SchoolNet and ISUAG values, well below unity, indicate clustering. Among the grouped networks, Tier 1 has the value closest to unity. The addition of more clustered networks in Tier 2 and Tier 3 negatively impacts the NNI statistic.

The Z statistic is also presented in Table 3. If the NNI statistic
indicates dispersion or clustering, the Z statistic tests whether the
difference of NNI from unity (randomness) is significant (Clark and Evans 1954).
Values above 1.96 or below -1.96 indicate 95% confidence that the
sites are **not** randomly distributed. The only networks within
this desired range are the ASOS, RWIS, and ISUAG networks. The large value
of 5.54 for the SchoolNet clearly indicates the clustering of this network
given our domain of interest. It is interesting to note that all three
"tiers" of networks have rather large Z values. This can be explained by
considering that many cities (spatially confined areas) have a station from
more than one of the networks. Thus the combination of networks into tiers
produces some clustering.
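
The values in Table 3 came from the ArcGIS extension named above. As a rough illustration of what the NNI and Z statistic measure, the following minimal sketch uses the standard Clark and Evans (1954) formulas. The function name and the `area` argument are assumptions: the study area used for Table 3 is not stated in this document, and no edge correction is applied here.

```python
import math

def nearest_neighbor_index(points, area):
    """Clark and Evans (1954) nearest neighbor index for 2-D points.

    points: list of (x, y) in a projected coordinate system (e.g. km).
    area: size of the study domain in the same units squared (assumed input).
    Returns (nni, z, observed_mean, expected_mean).
    """
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    density = n / area
    expected = 0.5 / math.sqrt(density)  # mean for a random pattern
    std_err = 0.26136 / math.sqrt(n * density)  # standard error of that mean
    z = (observed - expected) / std_err
    return observed / expected, z, observed, expected
```

With this sign convention, z is positive for dispersed patterns (NNI above 1) and negative for clustered ones (NNI below 1). Both statistics depend directly on the chosen `area`, so the same sites can look clustered or dispersed depending on the domain.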

### 3. Fractal Dimension

The fractal dimension ($D_m$) is used to characterize the distribution of a
geophysical data array (Lovejoy et al. 1986). A uniform distribution of
elements on the surface of the Earth will have a fractal dimension of 2.
All other distributions should have a value between 0 and 2, with the degree
of inhomogeneity being measured by $2 - D_m$.

The correlation dimension ($D_c$) is often used as an approximation
to the fractal dimension since it is easier to compute. To compute
$D_c$, rings of increasing radius are formed around each member of
the observing network. The number of stations within each of these rings
is counted and then averaged over rings of the same radius. The result is
an average station frequency at various distances.

Doswell and Lasher-Trapp (1997) point out two problems with the computation of the correlation dimension. The first is the treatment of stations located near the edge of the observing network. These stations have most of their neighbors in one preferred direction, so data from them bias the computation. The second is the arbitrary nature of the computation itself, which involves finding the slope of a line relating the natural log of the average station count to the natural log of the radius. The slope of this line, and thus the correlation dimension, is often not consistent throughout the range of data. Two parts of the curve may each appear very linear, yet have very different slopes.

#### Procedure

A Python script was written to compute the needed variables for the correlation dimension statistic. The spatial computations were done in the PostGIS database using the NAD83 UTM Zone 15 North projection.
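
That script is not reproduced here. The following is a minimal sketch of the computation under stated assumptions: stations are counted cumulatively within each radius (rather than in annular rings), distances are computed in plain Python rather than PostGIS, and the function name and the `radii` argument are hypothetical.

```python
import math

def correlation_dimension(points, radii):
    """Slope of ln(mean neighbor count) versus ln(radius).

    points: list of (x, y) in projected units (e.g. km).
    radii: radii spanning the regression range; the choice of range
    is arbitrary and changes the answer.
    """
    n = len(points)
    log_r, log_c = [], []
    for r in radii:
        # Total count of other stations within distance r, over all stations.
        total = sum(
            1
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= r
        )
        if total > 0:
            log_r.append(math.log(r))
            log_c.append(math.log(total / n))
    # Ordinary least-squares slope through the (ln r, ln C) pairs.
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_c) / len(log_c)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_c))
    sxx = sum((x - mean_x) ** 2 for x in log_r)
    return sxy / sxx
```

Points along a straight line give a slope near 1 and a filled square grid gives a slope somewhat below 2, the shortfall coming from stations near the edge having fewer neighbors. No edge correction is attempted in this sketch.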

#### Data

Figure 1: Correlation Dimension

Distance | ASOS | AWOS | RWIS | SchoolNet | ISUAG | Tier 1 | Tier 2 | Tier 3 |
---|---|---|---|---|---|---|---|---|
30-100 km | 1.01 | 0.85 | 0.84 | 0.70 | 0.69 | 0.91 | 0.79 | 0.76 |
30-300 km | 2.23 | 1.90 | 1.86 | 1.38 | 1.99 | 2.00 | 1.82 | 1.75 |

#### Analysis

The issues pointed out by Doswell and Lasher-Trapp (1997) are evident in this analysis. The arbitrary selection of points used in the linear regression yields results that may not be accurate for the entire range of scales represented. Ideally, values should not exceed 2, with 2 being a uniform distribution in two dimensions. In both ranges, values decrease as networks are combined into the Tier 1, 2, and 3 collections. The reason for this is the increased bias introduced by edge effects as the center of the domain becomes more densely populated while the exterior portions of the state (namely Northeast and Southern Iowa) remain comparatively sparse.

### 4. Morisita Index

The Morisita index investigates how much clustering occurs when a sampling network is broken up into regular cells. If each cell has the same number of observation points inside of it, then the index should indicate uniformity in space at that scale. The equation is as follows:

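$$I = Q \, \frac{\sum_{i=1}^{Q} n_i (n_i - 1)}{N (N - 1)}$$

where $Q$ is the number of cells, $n_i$ is the number of samples in the $i^{th}$ cell, and $N$ is the total number of sampling points.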

#### Procedure:

A Python script was written to compute this value. The spatial computations were done in NAD83 UTM Zone 15N, which is a common projection for Iowa.

Figure 2 illustrates the procedure for computing the Morisita index for a hypothetical network. Within an area of interest, there is a distribution of observation points (left side graphic). The area of interest is divided into equal-sized cells; the cells that contain sites (right side graphic) are the contributing members to the Morisita index.

Figure 2: Illustration of Morisita Index. Left: observation points. Right: the same area divided up by 25 cells.

In this example, the Morisita index is around 13. This is a very high number and indicates clustering. This procedure is iterated over a range of grid cell sizes to arrive at a better understanding of how the clustering varies at different scales.
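
The original script is not reproduced here. As a minimal sketch, assuming the standard form of the index, I = Q Σ n_i (n_i - 1) / (N (N - 1)) with Q the number of cells, the computation for one cell size might look like the following. The function name and grid arguments are hypothetical.

```python
def morisita_index(points, xmin, ymin, cell_size, nx, ny):
    """Morisita index for points binned into an nx-by-ny grid of square cells.

    I = Q * sum(n_i * (n_i - 1)) / (N * (N - 1)), where Q is the number
    of cells, n_i the count in cell i, and N the total number of points.
    Points outside the grid are ignored.
    """
    counts = {}
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row = int((y - ymin) // cell_size)
        if 0 <= col < nx and 0 <= row < ny:
            counts[(row, col)] = counts.get((row, col), 0) + 1
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two points inside the grid")
    pairs = sum(n * (n - 1) for n in counts.values())
    return nx * ny * pairs / (total * (total - 1))
```

Calling the function in a loop over several `cell_size` values (with `nx` and `ny` adjusted to cover the same domain) gives the index as a function of scale. If every point falls in a single cell the index equals the number of cells, which is its maximum.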

#### Data:

Figure 3: Morisita Index

#### Analysis:

Value | Conclusion |
---|---|
0 | No clustering. All of the cells had 1 or fewer sampling points. |
approaches 1 | For regular distributions, the index increases to a value of 1 with increasing cell size. |
nearly 1 | If the distribution of samples is random but homogeneous, the value will fluctuate around 1. |
greater than 1 | Clusters are present. |

Figure 3 indicates the scales at which clustering (values larger than 1) is occurring. At the smallest scales of 10-40 km, clustering is evident in the Tier 2 and Tier 3 networks. This makes practical sense considering the co-placement of RWIS sites near cities with ASOS/AWOS sites in Tier 2 and the overall clustering of SchoolNet sites in Tier 3. The general clustering of SchoolNet sites is very evident, with values well above unity.

### 5. Thiessen/Voronoi polygons

Thiessen polygons have been a traditional staple of the spatial statistician. Each constructed polygon contains exactly one measurement point, and every location within the polygon is closer to that measurement point than to any other. Isolated measurements will therefore have larger polygons than clustered measurements. Histograms of the area covered by these polygons provide insight into network homogeneity.

#### Procedure:

The point coverage shapefiles were loaded into ArcGIS 9.1. The ArcInfo Thiessen polygon tool was then used to generate the polygons. These polygons were then clipped by the border of Iowa within ArcGIS as well. The resulting coverage was exported to shapefile and then loaded into the IEM's spatial database for analysis.
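
The polygons in this study were built with ArcGIS. For readers without that tool, polygon areas can be approximated by assigning each cell of a fine raster to its nearest site. The sketch below is such an approximation, not the procedure used here; it clips to a rectangular bounding box rather than the Iowa border, and the function name and arguments are hypothetical.

```python
import math

def thiessen_areas(sites, xmin, ymin, xmax, ymax, cell=1.0):
    """Approximate Thiessen polygon areas clipped to a bounding box.

    Each raster cell is assigned to the site nearest its center, so the
    result approaches the true polygon areas as `cell` shrinks.
    sites: list of (x, y); returns a list of areas in the same order.
    """
    areas = [0.0] * len(sites)
    nx = int(round((xmax - xmin) / cell))
    ny = int(round((ymax - ymin) / cell))
    for i in range(nx):
        cx = xmin + (i + 0.5) * cell
        for j in range(ny):
            cy = ymin + (j + 0.5) * cell
            # Index of the site closest to this cell center.
            nearest = min(
                range(len(sites)),
                key=lambda k: math.hypot(sites[k][0] - cx, sites[k][1] - cy),
            )
            areas[nearest] += cell * cell
    return areas
```

A histogram of the returned areas then plays the same role as the histograms in the figures below: isolated sites show up as a tail of large areas.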

#### Data:

Figure 4: Tier 1

Figure 5: Tier 2

Figure 6: Tier 3

#### Analysis:

Thiessen polygons provide a quick synopsis of how large an area is represented by a single point. Undersampled areas in a network can quickly be seen, since large anomalous polygons visually stick out.

Figure 4 shows the histogram and Thiessen Polygons for Tier 1. The undersampled areas of Northern Missouri, Northeastern Kansas, and South Dakota are clearly seen by their comparatively large polygons. The histogram illustrates this difference with a large clustering of smaller polygons and then numerous large ones outside of Iowa.

The addition of the RWIS networks in Tier 2, shown in Figure 5, almost exacerbates the situation. The lack of RWIS sites in Missouri and South Dakota does not help shrink the size of the polygons in those areas. The result can be seen in the histogram, which shows a general trend toward smaller polygons but still contains some very large ones.

For Tier 3, the clustering of sites in Central Iowa with the SchoolNet
network creates a large number of smaller polygons and doesn't help to eliminate
the larger polygons in Missouri and Illinois. The histogram shows a mode
around 500 km² with around 15% of the polygons at least 7 times larger. The
conclusion is that for the **entire domain**, the addition of other networks
is only beneficial when those networks have stations in each state used for
this analysis.

### 6. Coefficient of Representativity

The Coefficient of Representativity (CR) was proposed by Dubois (2000) as a measure to combine the Thiessen polygons and the distance to the nearest neighbor.

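The CR is the product of two ratios:

$$CR = A \cdot B = \frac{S_{Th}}{S_{m}} \cdot \frac{d_{NN}^{2}}{S_{m}}$$

The first (A) is the ratio of the area of the Thiessen polygon ($S_{Th}$) to the mean Thiessen polygon ($S_{m}$: total area divided by number of sampling points); the second (B) is a ratio between the squared distance between nearest neighbors ($d_{NN}^{2}$) and the same mean Thiessen polygon.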

#### Procedure:

Thiessen polygons and network coverages were loaded into the IEM spatial database. Python scripts were then written to analyze distances and compute other parameters needed for the CR.
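
Those scripts are not reproduced here. As a minimal sketch, assuming the CR is the product of the two ratios described above (polygon area over mean polygon area, times squared nearest-neighbor distance over mean polygon area), the per-site computation could look like this. The function name and input lists are hypothetical, and areas and distances must use consistent units.

```python
def coefficient_of_representativity(polygon_areas, nn_distances):
    """CR per site as the product of two ratios (assumed form).

    A = S_Th / S_m and B = d_NN**2 / S_m, where S_m is the mean polygon
    area (total area divided by the number of sites).
    polygon_areas: Thiessen polygon area for each site (e.g. km^2).
    nn_distances: nearest-neighbor distance for each site (e.g. km).
    """
    if len(polygon_areas) != len(nn_distances):
        raise ValueError("inputs must have one entry per site")
    mean_area = sum(polygon_areas) / len(polygon_areas)
    return [
        (s_th / mean_area) * (d_nn ** 2 / mean_area)
        for s_th, d_nn in zip(polygon_areas, nn_distances)
    ]
```

For a regular square grid with spacing d, every polygon has area d² and every nearest neighbor is at distance d, so each site has a CR of exactly 1 under this form.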

#### Data:

Figure 7: CR, Tier 1

Figure 8: CR, Tier 2

Figure 9: CR, Tier 3

#### Analysis:

CR Value | Conclusion |
---|---|
CR > 1 | The measure is isolated. |
CR ≈ 1 | Ideal case, measure is spatially uniform. |
CR < 1 | The measure is clustered. |

Tier 1 clearly shows the lack of observing sensors in Missouri, Nebraska, and South Dakota. Ideally, the "white" color would dominate the map indicating areas of uniform representativity. The analysis shows that perhaps only portions of Northcentral Iowa are representatively measured by sensors in Tier 1.

It is interesting to note the dark blotches that appear in Figure 8 for Tier 2. The dark blues represent clustering, and they mark locations where an RWIS site is close to one or more ASOS/AWOS sites. The overall impact of adding RWIS sites is to increase the relative sparsity in states where the IEM doesn't collect RWIS data.

The addition of SchoolNet data in Tier 3 only makes the contrast more distinct. It is interesting to note the strong gradient of CR along the Iowa/Missouri border. This is a result of a dearth of observations in Northern Missouri.

### 7. Conclusions

To evaluate the spatial representativity of the Iowa Environmental Mesonet (IEM), five spatial statistics were computed following the work done by Dubois (2000). While each statistic showed some positive aspects of the Mesonet collaboration, they clearly point out the need for caution before blindly combining all available data into a high resolution product.

The CR figures most clearly illustrate the difficulties of adding stations from other networks and their localized impact on the overall analysis. It is particularly dangerous to add networks that are not uniformly distributed across our domain of interest. The easy solution would be to add stations in the sparse areas to 'fill in the holes'. Up until this study, the IEM has focused on adding other networks to fill in these holes without realizing the implications of additional clustering in areas like Central and Eastern Iowa. This "catch-22" can clearly be seen in Figure 8 for Tier 2 when the RWIS sites, which are often located near the same towns as ASOS/AWOS sites, are added. The addition of nearly 70 SchoolNet sites in Central Iowa only further exacerbates the situation, with undersampled areas showing up over the rest of the state as shown in Figure 9.

Clearly, this study points out the need for a disciplined approach rather than adding networks in an ad hoc manner to produce an analysis. Perhaps the non-standard networks included in Tier 2 and Tier 3 can be subsampled to selectively augment the undersampled areas shown in Figure 7. There will always be trade-offs when combining networks and data. It is important that one of these trade-offs is not a degradation in data quality.

### References

Clark, P. and F. Evans, 1954: Distance to nearest neighbor as a measure
of spatial relationships in populations. *Ecology*, **35**, 445-453.

Doswell III, C.A. and S. Lasher-Trapp, 1997: On measuring the degree of
irregularity in an observing network. *Journal of Atmospheric and Oceanic
Technology*, **14**, 120-132.

Dubois, G., 2000: How representative are samples in a sampling network?
*J Geographic Info Decision Anal*, **4**, 1-10.

Lovejoy, S., D. Schertzer, and P. Ladoy, 1986: Fractal characterization
of inhomogeneous geophysical measuring networks. *Nature*, **319**,
43-44.

### Links

- Workshop on Methods and Tools for the design of Surface observation networks, SWS - 2002, held in Zurich, 14-15 May 2002