Documented Datasets: climodat, vtec, metar

IEM Climodat Reports and Data

Summary

This document describes the once daily climate dataset that provides observations and estimates of high and low temperature, precipitation, snowfall, and snow depth. Parts of this dataset have been curated over the years by a number of Iowa State employees including Dr Shaw, Dr Carlson, and Dr Todey.

  • Download Interface: IEM On-Demand
  • Spatial Domain: Midwestern US
  • Temporal Domain: 1893-today
  • Variables Provided: Once daily high and low temperature, precipitation, snowfall, snow depth

Justification for processing

The most basic and important long-term climate records are the once daily reports of high and low temperature along with precipitation and sometimes snow. The most commonly asked questions about the IEM datasets are climate related, so curating a long-term dataset of daily observations is necessary.

Other Sources of Information

A great source of much of the same type of data is the Regional Climate Centers' ACIS. The complication when comparing IEM Climodat data to other sources is the difference in station identifiers used. The history of station identifiers is long and complicated. The National Centers for Environmental Information (NCEI) has made strides in attempting to straighten the identifiers out. This continues to be complicated because the upstream data source uses a completely different set of identifiers, known as National Weather Service Location Identifiers (NWSLI), which differ from what NCEI or the IEM uses for climate datasets.

Processing and Quality Control

There is nothing easy or trivial about processing or quality control of this dataset. After decades of work, plenty of issues remain. Having human observers be the primary driver of this dataset is both a blessing and a curse. The good aspects include the archive dating back to the late 1800s for some locations and relatively high data quality. The bad aspects include lots of metadata issues due to observation timing, station moves, and equipment siting.

The primary data source for this dataset is the National Weather Service COOP observers. These reports come to the IEM over a variety of paths:

  • Realtime reports found in NWS SHEF Text Products, processed in realtime by the IEM
  • Quality controlled reports sent to the IEM by the State of Iowa Climatologist
  • Via manually downloaded data archives provided by NCEI
  • Via web services provided by RCC ACIS

The merging of these four datasets creates a bit of a nightmare to manage.

Frequently Asked Questions

  1. Where does the radiation data come from?

    The NWS COOP Network does not provide observations of daily solar radiation, but this variable is extremely important to many folks who use this information for modelling. As a convenience, the IEM processes a number of reanalysis datasets and produces point sampling from the gridded information to provide "daily" radiation totals. A major complication is that the "daily" COOP observations are typically made at 7 AM, while the gridded solar radiation data is extracted on close to a local calendar day basis. In general, the 7 AM value is for the previous day.

  2. Where does the non-radiation data come from?

    This information is primarily driven by the realtime processing of NWS COOP observations done by the IEM. For data older than the past handful of years, it is taken from the NCEI databases and now the ACIS web services. Some manual work is done to meld the differences in site identifiers used between the various online resources.

NWS Valid Time Extent Code (VTEC) Archives

Summary

The National Weather Service uses a rather complex and extensive suite of products and methodologies to issue watches, warnings, and advisories (WWA). Back in the 1990s and early 2000s, the automated processing of these products was extremely difficult and rife with errors. To help with automated parsing, the NWS implemented a system called Valid Time Extent Code (VTEC), which provides a more programmatic description of what an individual WWA product is doing. The implementation began in 2005 and was mostly wrapped up by 2008. The IEM attempts to do high fidelity processing of this data stream and has a number of Internet unique archives and applications to view this information.

  • Download Interface: Shapefile/KML Download
  • Spatial Domain: United States, including Guam, Puerto Rico and some other islands
  • Temporal Domain: Most WWA types go back to 2008 or 2005, an archive of Flash Flood Warnings goes back to 2002 or so, and Tornado / Severe Thunderstorm Warnings go back to 1986

Justification for processing

NWS issued WWA alerts are an important environmental hazard dataset and have broad interest in the research and insurance industries. Even in 2017, there are very few places where you can find long term archives of this information in usable formats.

Other Sources of Information

The National Centers for Environmental Information has raw text product archives, but these do not contain processed atomic data of the actual WWA alerts, so the user is left to the adventure of text parsing the products. Otherwise, it is not clear if any other archive of this information exists on the Internet.

Processing and Quality Control

The pyIEM Python package is the primary code that does the text parsing and databasing of the WWA products. A large number of unit tests exist against the many variations and quirks found while processing the WWA data stream since the mid 2000s. New quirks and edge cases are still found today, with minor corrections made to the archive when necessary. The IEM continuously alerts and annoys the NWS when various issues are found, hoping to get the NWS to correct their products. While it has been a long and frustrating process, things do eventually get fixed, leading to more robust data archives.

The pyIEM parsers send emails to the IEM developer when issues are found. The parser alerts when the following errors are encountered:

  • VTEC Event IDs (ETNs) being used that are out of sequential order.
  • Warning product segments are missing or have invalid Universal Geographic Code (UGC) encoding
  • Product segment has invalid VTEC encoding
  • Polygons included in the warning are invalid or counterclockwise
  • Timestamps are formatted incorrectly
  • The UGC / VTEC sequence of a particular product contains logical errors, for example a UGC zone silently drops out or into a warning.
  • Products are expired outside of the acceptable temporal bounds
  • Any other type of error and/or code bug that caused a processing fault
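Two of the checks listed above can be sketched in a few lines of Python. This is a minimal illustration and not the pyIEM implementation: the function names `signed_area`, `polygon_problems`, and `etns_out_of_sequence` are hypothetical, and the rule that each newly issued ETN should be exactly one more than the highest seen so far is an assumption made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def signed_area(points: Sequence[Point]) -> float:
    """Shoelace formula: positive for counterclockwise rings in x/y space."""
    pts = list(points)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def polygon_problems(ring: Sequence[Point]) -> List[str]:
    """Return a list of human-readable problems found with a warning polygon."""
    pts = list(ring)
    # Drop a repeated closing point so it is not counted twice.
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]
    if len(pts) < 3:
        return ["fewer than three distinct points"]
    area = signed_area(pts)
    if area == 0:
        return ["zero area"]
    if area > 0:
        return ["counterclockwise"]
    return []


def etns_out_of_sequence(etns: Sequence[int]) -> List[int]:
    """Flag ETNs that are not one more than the highest ETN seen so far.

    Assumption for this sketch: the input holds only newly issued events
    for one office, phenomena, significance, and year, in issuance order.
    """
    flagged = []
    highest = 0
    for etn in etns:
        if etn != highest + 1:
            flagged.append(etn)
        highest = max(highest, etn)
    return flagged


if __name__ == "__main__":
    print(polygon_problems([(0, 0), (1, 0), (1, 1), (0, 1)]))
    print(etns_out_of_sequence([1, 2, 4, 3]))
```

A real parser would likely email or log these findings rather than return them, and would also need to handle self-intersecting polygons, which the signed-area test alone does not detect.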

Frequently Asked Questions

  1. How do Severe Thunderstorm, Flash Flood, or Tornado warnings have VTEC codes for dates prior to implementation?

    Good question! A number of years ago, a kind NWS manager provided a database dump of their curated WWA archive for dates between 1986 and 2005. While not perfect, this archive was the best/only source that was known at the time. The IEM did some logic processing and attempted to back-compute VTEC ETNs for this archive of warnings. The database was atomic to a local county/parish, so some logic was done to merge multiple counties when they spatially touched and had similar issuance timestamps. As noted above, automated machine parsing of the raw text is next to impossible. The ETNs were assigned as a convenience so that various IEM apps and plots would present this data online.

  2. The database has Weather Forecast Offices (WFOs) issuing WWA products for dates prior to the office even existing? How can this be!?!?

    Yeah, this is somewhat poor, but was done to again provide some continuity with current day operations. The archive database provided to the IEM did not contain the issuing forecast office, so without a means to properly attribute these, the present day WFOs were used. This issue is rarely raised by IEM users, but it is good to document. Maybe someday, a more authoritative archive will be made and these old warnings can be assigned to the various WSOs, etc. that existed at the time.

  3. What are the VTEC phenomena and significance codes?

    The phenomena code (two characters) and significance code (one character) denote the particular WWA hazard at play with the product. The NWS VTEC Site contains a one pager PDF that documents these codes. The NWS uses these codes to color encode the WWA Map found on their homepage. You can find a lookup reference table of these codes and colors here.

  4. How do polygon warnings exist in the IEM archive prior to being official?

    The NWS offices started experimenting with polygons beginning in 2002. These polygons were included with the warnings, but sometimes were not geographically valid and/or leaked well outside of a local office's CWA bounds. On 1 October 2007, these polygons became the official warning for some VTEC types. In general, the IEM's data ingestor attempts to save these polygons whenever found.
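To make the phenomena and significance codes from question 3 concrete, here is a minimal sketch of decoding a primary VTEC string with a regular expression. It is not the pyIEM parser. The function name `decode_vtec`, the sample string, and the two small lookup tables (which list only a handful of codes) are illustrative assumptions.

```python
import re
from datetime import datetime, timezone

# Only a few codes are listed here for illustration; the NWS defines many more.
PHENOMENA = {"TO": "Tornado", "SV": "Severe Thunderstorm", "FF": "Flash Flood"}
SIGNIFICANCE = {"W": "Warning", "A": "Watch", "Y": "Advisory", "S": "Statement"}

VTEC_RE = re.compile(
    r"/(?P<status>[OTEX])\.(?P<action>[A-Z]{3})\.(?P<office>[A-Z]{4})\."
    r"(?P<phenomena>[A-Z]{2})\.(?P<significance>[A-Z])\.(?P<etn>\d{4})\."
    r"(?P<begin>\d{6}T\d{4}Z)-(?P<end>\d{6}T\d{4}Z)/"
)


def _parse_time(stamp):
    """Convert yymmddThhmmZ to an aware UTC datetime; all zeros means undefined."""
    if stamp.startswith("000000"):
        return None
    return datetime.strptime(stamp, "%y%m%dT%H%MZ").replace(tzinfo=timezone.utc)


def decode_vtec(text):
    """Return a dict describing the first VTEC string in text, or None."""
    match = VTEC_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    phenomena = fields["phenomena"]
    significance = fields["significance"]
    return {
        "action": fields["action"],
        "office": fields["office"],
        "phenomena": phenomena,
        "significance": significance,
        "etn": int(fields["etn"]),
        "begin": _parse_time(fields["begin"]),
        "end": _parse_time(fields["end"]),
        # Fall back to the raw code when it is not in the small tables above.
        "label": "%s %s" % (
            PHENOMENA.get(phenomena, phenomena),
            SIGNIFICANCE.get(significance, significance),
        ),
    }


if __name__ == "__main__":
    sample = "/O.NEW.KDMX.TO.W.0012.170306T2315Z-170307T0000Z/"
    print(decode_vtec(sample))
```

The office, phenomena, significance, and ETN together (along with the year) identify an event, which is why the ETN is converted to an integer here rather than kept as zero-padded text.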

ASOS/AWOS Global METAR Archives

Summary

The primary format in which worldwide airport weather station data is reported is called METAR. This format is somewhat archaic, but well known and utilized in the community. The IEM gets a feed of this data from Unidata's IDD data stream. The weather stations included are typically called "Automated Surface Observing System (ASOS)". The term "Automated Weather Observing System (AWOS)" is often used interchangeably.

  • Download Interface: IEM On-Demand
  • Spatial Domain: Worldwide
  • Temporal Domain: 1928-present (US), 2012-present (Worldwide)

Justification for processing

The highest quality weather information comes from the ASOS sites. These stations get routine maintenance and considerable quality control, and they are the baseline hourly interval dataset used by all kinds of folks. The data stream processed by the IEM contains global stations, so extending the ingest to the entire data stream was not a significant effort.

Other Sources of Information

NCEI Integrated Surface Database (ISD) is likely the most authoritative source of this information.

Processing and Quality Control

A Python based ingestor using the metar package processes this information into the IEM database.

Frequently Asked Questions

  1. Why is precipitation data all missing / zero for non-US locations?

It is the IEM's understanding that precipitation is not included in the global data streams due to previous data distribution agreements. The precipitation data is considered of very high value as it can be used to model and predict the status of agricultural crops in the country. Such information could push commodity markets. For the present day, other satellite datasets likely negate some of these advantages, but alas.

  2. How are "daily" precipitation totals calculated?

In general, the ASOS stations operate in local standard time year round. This has some implications for the computation of various daily totals, because during daylight saving time the calendar day total will represent a 1 AM to 1 AM local daylight time period. For the context of this METAR dataset, not all METAR reporting sites will generate a total that can be used for assignment of a calendar day's total. So the IEM uses a number of approaches to arrive at this total.

  • A script manually totals up the hourly precipitation reports and computes a true local calendar day total for the station. This total may later be overwritten by either of the below.
  • A real-time ingest process gleans the daily totals from the Daily Summary Message (DSM) issued by some ASOS sites.
  • A real-time ingest process gleans the daily totals from the Climate Report (CLI) that is issued for some ASOS sites by their respective local NWS Forecast Office.

Not all stations have DSM and/or CLI products, so the manual totaling provides a minimum accounting. The complication is that this total does not cover the same period that a CLI/DSM product does. So complicated!
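The period mismatch described above can be illustrated with a short Python sketch. It is not the IEM's script. The function name `daily_totals`, the sample reports, and the fixed offsets (UTC-6 for standard time, UTC-5 for daylight time, as for a US Central Time station) are assumptions for the example, and each hourly amount is simply assigned to the day of its report timestamp.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical station in US Central Time, using fixed offsets for clarity.
CST = timezone(timedelta(hours=-6), "CST")  # local standard time (DSM/CLI style)
CDT = timezone(timedelta(hours=-5), "CDT")  # local daylight time (civil day in summer)


def daily_totals(hourly_reports, tz):
    """Sum (utc_datetime, precip) pairs into calendar days as seen in tz."""
    totals = defaultdict(float)
    for valid_utc, precip in hourly_reports:
        if precip is None:  # skip missing reports
            continue
        totals[valid_utc.astimezone(tz).date()] += precip
    return dict(totals)


if __name__ == "__main__":
    # Made-up hourly precipitation reports (inches) around local midnight.
    reports = [
        (datetime(2017, 7, 2, 4, 30, tzinfo=timezone.utc), 0.10),
        (datetime(2017, 7, 2, 5, 30, tzinfo=timezone.utc), 0.25),
        (datetime(2017, 7, 2, 6, 30, tzinfo=timezone.utc), 0.05),
    ]
    print("civil (daylight) days:", daily_totals(reports, CDT))
    print("standard time days:  ", daily_totals(reports, CST))
```

With these sample reports, the 05:30 UTC report lands on 2 July when days are cut at local daylight midnight but on 1 July when days are cut at local standard midnight, so the two daily totals differ even though the hourly data is identical.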