Further assessment was also done on one of our sources, known as “russsource.” This source contained over 36,000 stations reporting maximum and minimum temperature. Although the original format was consistent across all stations, the source turned out to be a compilation of 27 individual sources. These were split up and placed individually in the merge, following the source hierarchy defined by the databank working group. Because of some duplication with sources used in GHCN-D, only 20 of the 27 sources were included. In addition, station IDs were brought into the Stage Two data so that the merge’s ID test could be applied. The same was done for the source known as “ghcnsource.”
Other than the above, no additional sources were added to the source hierarchy. One source, however, was removed (crutem4). Its stations were used as a last resort, and their data had been changed by bias corrections. Candidate stations from crutem4 were therefore matched with their respective target stations by the metadata tests, but were then judged unique by the data tests because of these corrections. To avoid excessive station duplication, this source was removed.
Changes to Merge Algorithm
The first step of the merge algorithm compares the metadata of a target and a candidate station, including each station’s latitude, longitude, elevation and name. A quasi-probabilistic comparison is made, and the result is a metadata metric between 0 and 1. In version 1.0.0, this metric needed to pass a threshold of 0.50 for the candidate to be considered for merging. Analysis showed that this let too many candidates through and forced merges between stations that should not have been merged. As a result, a stricter threshold of 0.75 was applied.
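To illustrate the effect of the stricter threshold, here is a minimal Python sketch. It is not the databank's actual quasi-probabilistic comparison: the scoring functions, the decay scales (`dist_scale_km`, `elev_scale_m`), the equal weighting, and the names `Station`, `metadata_metric` and `passes_metadata_test` are all assumptions made for illustration. Only the two thresholds, 0.50 and 0.75, come from the text above, and treating "pass" as greater than or equal is also an assumption.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher

OLD_THRESHOLD = 0.50  # metadata threshold in v1.0.0 (from the text)
NEW_THRESHOLD = 0.75  # stricter metadata threshold in v1.1.0 (from the text)


@dataclass
class Station:
    """Hypothetical container for the four metadata fields that are compared."""
    name: str
    lat: float   # degrees north
    lon: float   # degrees east
    elev: float  # metres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def metadata_metric(target, candidate, dist_scale_km=10.0, elev_scale_m=100.0):
    """Illustrative metric in [0, 1]; NOT the databank's real formula.

    Each component is 1 for a perfect match and falls toward 0 as the
    stations differ. The components are averaged with equal weight.
    """
    dist = haversine_km(target.lat, target.lon, candidate.lat, candidate.lon)
    dist_score = math.exp(-dist / dist_scale_km)
    elev_score = math.exp(-abs(target.elev - candidate.elev) / elev_scale_m)
    name_score = SequenceMatcher(None, target.name.upper(), candidate.name.upper()).ratio()
    return (dist_score + elev_score + name_score) / 3.0


def passes_metadata_test(metric, threshold=NEW_THRESHOLD):
    """True if the candidate goes on to be considered for merging."""
    return metric >= threshold


if __name__ == "__main__":
    target = Station("AAAA", 10.0, 20.0, 100.0)
    candidate = Station("BBBB", 10.0, 20.0, 150.0)  # same spot, different name and elevation
    metric = metadata_metric(target, candidate)
    print(f"metric = {metric:.3f}")
    print("passes v1.0.0 threshold:", passes_metadata_test(metric, OLD_THRESHOLD))
    print("passes v1.1.0 threshold:", passes_metadata_test(metric, NEW_THRESHOLD))
```

Under these assumed scores, a candidate that shares coordinates with the target but differs in name and elevation clears 0.50 yet falls short of 0.75. This is the kind of borderline pairing the stricter threshold is meant to keep out of the merge.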
|Figure 2: Station count of recommended merge v1.1.0 by year from 1850 to 2014, compared to version 1.0.0, along with GHCN-M version 3.|
|Figure 3: Percentage of global coverage with respect to 5-degree gridboxes for the recommended merge v1.1.0 by year from 1850 to 2014, compared to version 1.0.0, along with GHCN-M version 3.|