Friday, April 8, 2016

Sand Mining in Western Wisconsin: Geocoding Mine Locations

Objective:

This post steps away from Python coding and returns to frac sand mining in Wisconsin. The main purpose of the assignment is to geocode the locations of all sand mines in Wisconsin using ESRI products such as ArcMap and ArcGIS Online. Once the mine addresses have been geocoded, the results will be compared to the actual locations of the mines and to the results of other students completing the same lab.

Methods:

The first step of the project was to normalize a DNR-supplied dataset of the locations of all mines in Wisconsin. Each student in the class was required to normalize and geocode 16 mines, so each mine was geocoded by at least two people in the class. This gives every student independent results to compare against.

The DNR dataset is in poor shape, as its creators followed few conventions of data normalization; the worst offence is multiple attributes stored in the same field. New fields were added and the data was sorted out manually, without the aid of any tools. Had each student been assigned many more than 16 mines, an automated process would have been more time efficient.
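
For a larger batch of mines, the manual sorting described above could be roughly automated. The following is only a minimal sketch: the real layout of the DNR address field is not reproduced here, so the PLSS pattern (optional quarter sections, then section, township, and range, as in "NE SW S12 T25N R10W"), the sample string, and the function name `split_address_field` are all assumptions made for illustration.

```python
import re

# ASSUMED pattern for a PLSS description such as "NE SW S12 T25N R10W".
# The real DNR strings may be formatted differently.
PLSS_PATTERN = re.compile(
    r"\b(?:(?:NE|NW|SE|SW)\s+)*"      # optional quarter / quarter-quarter
    r"S(?:EC\.?)?\s*\d{1,2}\s+"       # section number
    r"T\s*\d{1,2}\s*N\s+"             # township (Wisconsin townships are north)
    r"R\s*\d{1,2}\s*[EW]",            # range, east or west
    re.IGNORECASE,
)


def split_address_field(raw):
    """Split one combined field into (street_address, plss) strings."""
    match = PLSS_PATTERN.search(raw)
    if match is None:
        # No PLSS description found: treat the whole field as the address.
        return raw.strip(" ,;"), ""
    plss = match.group(0).strip()
    # Everything outside the PLSS match is kept as the street address.
    street = (raw[:match.start()] + raw[match.end():]).strip(" ,;")
    return street, plss


if __name__ == "__main__":
    # Made-up record, not taken from the DNR dataset.
    sample = "N1234 County Road X, Arcadia, WI 54612; NE SW S12 T25N R10W"
    print(split_address_field(sample))
```

Records that match no pattern keep their full text in the address column, so they can still be reviewed by hand.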

Figure one. Possible address locations after geocoding function.  
Figure two. Model used to query and merge class data into one dataset.
After the dataset was edited enough to be compatible with ESRI ArcMap, it was geocoded by address. As all but one address was present in this dataset, finding the correct locations for the points was almost a simple process. When an address is geocoded, the GIS software attempts to match it with the most likely location it can find. Unfortunately these matches are estimates, and many addresses return several candidate points, each with its own match probability. Figure one displays the candidate points for one such address. The central point is the truest to reality, so it was the obvious choice for the address location, even if the calculated probability for the other points is higher.

There were several address locations which did not seem to correspond to the location of any mine. It was a tedious, yet achievable, task to find the true locations of the mines with the use of Public Land Survey System (PLSS) data and aerial photography. Google Maps was heavily utilized as well.

Once the points were geocoded it was necessary to compare individual results with those of the rest of the class. All student results were stored in a shared class folder for all to access. To compare results, a new feature class was created from each student’s results with an SQL statement. These output files were then combined with the Merge tool (see figure two for the model). The merged feature class was then used together with the individual geocoding results in ArcMap’s Generate Near Table tool. A 1 kilometer search radius was set for each input mine location, and the planar distance method was used.
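
To make the near-table step more concrete, here is a plain-Python approximation of what a planar nearest-point search with a 1 kilometer radius computes. It is a sketch under stated assumptions, not ArcMap's implementation: the function name `near_table`, the point identifiers, and the coordinates are hypothetical, and the coordinates are assumed to be in a projected system measured in meters.

```python
import math


def near_table(input_points, near_points, search_radius_m=1000.0):
    """For each input point, find the closest near point within the radius.

    Both arguments map an identifier to projected (x, y) coordinates in
    meters. Returns a list of (input_id, near_id, distance_m) rows; input
    points with no neighbor inside the radius produce no row.
    """
    rows = []
    for in_id, (x1, y1) in input_points.items():
        best = None
        for near_id, (x2, y2) in near_points.items():
            # Planar (straight-line) distance in the projected plane.
            d = math.hypot(x2 - x1, y2 - y1)
            if d <= search_radius_m and (best is None or d < best[1]):
                best = (near_id, d)
        if best is not None:
            rows.append((in_id, best[0], best[1]))
    return rows


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    mine_points = {"A": (0.0, 0.0), "B": (5000.0, 5000.0)}
    class_points = {"c1": (30.0, 40.0), "c2": (600.0, 800.0), "c3": (9000.0, 9000.0)}
    for row in near_table(mine_points, class_points):
        print(row)
```

Because the search is limited to the radius, a point whose nearest counterpart lies farther away simply gets no row, which is worth keeping in mind when reading the distance tables.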

Another aspect of accuracy assessment used for this lab was the comparison of individual geocoding results with the actual geographic coordinates of the mines. A feature class was created from the newly supplied coordinates for the mine locations. This feature class was used in the Generate Near Table tool in the same fashion (see tables three and four in results).
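
As a rough independent check on such a comparison, the separation between a geocoded point and a supplied coordinate can also be computed directly from latitude and longitude. The sketch below uses the haversine formula on a spherical Earth, which is a different technique from the planar method used in the lab; the function name `haversine_m` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    geocoded = (44.2500, -91.5000)
    supplied = (44.2510, -91.4990)
    print(round(haversine_m(*geocoded, *supplied), 1), "m")
```

Over the short distances involved here, the spherical and planar results should agree closely, so a large disagreement would point to a projection or data-entry problem rather than a real difference.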

Results:

Map one. Individually geocoded mine locations. 

Table one. Source data for geocoding project. Note how the address field is populated with multiple attributes.

Table two. Source data after some normalization. Note the additional PLSS field. 

Standardizing the source Excel data table was a simple yet tedious exercise of sorting out all the information in the address column. As the data was compiled in far too few columns (see table one), a new column was required to rectify the issue and prepare the points for mapping within ESRI ArcMap.

After the mine address points were checked and fine-tuned, their locations could be matched to several mines that are visible in aerial photography. Others were not so easy to locate, as they could not be identified in aerial photographs. As mentioned earlier, imagery from Google Maps was used extensively to search out the mines’ locations.

Table three. Distance comparison between individual and class geocoding results. Distance in meters.

Figure three. Statistics and distribution for distance measurements between individual and class geocoding results.

Table four. Distance comparison between individual geocoding results and geographic coordinate locations of mines.

Figure four. Statistics and distribution for distances between individual geocoding results and geographic coordinate locations of mines.

Discussions:

From the results section it is easy to see that there are many differences between individual, class, and actual mine locations. While the majority of the individual points were within 200 meters of both the class points and the actual geographic coordinates for each mine, several class points were outliers, lying 4-8 kilometers from the individual point. The individual points and coordinate points were most often closely associated. The minor discrepancies seen in almost all of the points come from how the points were mapped: the coordinate points were at the approximate center of the mines, while the individual points were plotted at the entrance of each mine.
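
The kind of summary discussed above (how many points fall within 200 meters, and which ones are outliers) can be sketched in a few lines of Python. This is an illustration only: the function name `summarize_distances` is hypothetical, the 200 meter threshold is borrowed from the paragraph above, and the sample distances are made up rather than taken from the lab's tables.

```python
import statistics


def summarize_distances(distances_m, threshold_m=200.0):
    """Summarize near-table distances and separate out the outliers."""
    close = [d for d in distances_m if d <= threshold_m]
    far = [d for d in distances_m if d > threshold_m]
    return {
        "count": len(distances_m),
        "mean_m": statistics.mean(distances_m),
        "median_m": statistics.median(distances_m),
        "max_m": max(distances_m),
        "within_threshold": len(close),
        "outliers_m": sorted(far),
    }


if __name__ == "__main__":
    # Made-up distances in meters, for illustration only.
    sample = [50.0, 100.0, 150.0, 4000.0]
    print(summarize_distances(sample))
```

Reporting the median next to the mean is a deliberate choice: a single outlier several kilometers away pulls the mean far above the typical distance, while the median barely moves.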

Generally the automated plotting of the points by the geocoding process was not as accurate as one would like; this is an inherent issue with how the geocoding program estimates the general location of an address. Several other issues had to do with the quality of the source data. Some points were initially placed far from their actual location, meaning they had to be manually relocated. This translation issue was more than likely caused by the poor normalization of the source data.

After the addresses were digitized and relocated there were still several very minor issues regarding the projections of several feature classes. These paled in comparison to the most difficult aspect to overcome: the temporal accuracy of reference images. There were two mines in particular that could not be immediately located. The first was a relatively new mine that ESRI’s ageing aerial photographs of the area did not show. The other mine could not be identified even with aerial photography from both ESRI and Google, and even after the coordinate location of the mine had been obtained. This is probably because the registration for the mine was recently added to the DNR database but no mining has yet occurred on location. The point designating the address for this mine is more than likely less accurate than the other points in the dataset.

Honestly, the only real way to know whether the address location points are correct is to visit every mine in person. This is not cost effective, so a good alternative is to use georectified aerial photography to associate the points with the mines. In the above exercise this was a paramount technique for ensuring the accuracy of the mines’ locations.

Conclusions:

Having geocoded and compared 16 mine addresses with their actual and independently geocoded counterparts, it was moderately surprising how much variation there was among the chosen locations. When individual results were compared to class results, most points were quite close to one another. There were a few outliers with extensive distances when comparing individual and actual locations. The pattern in that comparison is that the majority of points were off by a larger amount than in the class comparison, but its few outliers were closer to the mean distance than those of the class comparison.
