Using Multifield K-Nearest-Neighbor (KNN) Clustering

Introduction

Multifield K-Nearest-Neighbor Clustering, abbreviated as KNNMultifield, is an online tool that partitions data into clusters based on multiple attributes, or fields. For more information on KNN, please click here.

How to access the tool?

KNNMultifield can be accessed using a standard web browser. It has been tested on Google Chrome, Microsoft Internet Explorer, Mozilla Firefox, and Safari (Mac only). The tool can be accessed at

http://dsiweb.cse.msu.edu/

Input Data file format

The input data must be stored in a comma-separated values (CSV) text file. The first row must contain the header information, with a variable name for each column of the data. The rest of the file should be numerical data, where each field (column) is separated by a comma. Since this is a geospatial tool, there should be two columns providing the latitude and longitude of each point. The coordinates should be provided in the WGS84 geographic coordinate system. The order of the columns is not important. A sample data file is provided on the website.
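The coordinate requirement above can be checked before uploading. The following is a minimal sketch in Python, not part of the tool itself. It assumes the coordinate columns are named "latitude" and "longitude" (as in the sample selection later in this guide) and that valid WGS84 values fall within -90 to 90 degrees for latitude and -180 to 180 degrees for longitude. The function name check_coordinates is hypothetical.

```python
import csv
import io


def check_coordinates(csv_text, lat_name="latitude", lon_name="longitude"):
    """Return a list of problems found in the header and coordinate columns.

    The column names are assumptions; pass different names if your file
    uses other headers.
    """
    # Skip completely blank lines so a trailing newline does not count as a row.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ["file is empty"]
    header = [name.strip() for name in rows[0]]
    problems = []
    for name in (lat_name, lon_name):
        if name not in header:
            problems.append(f"missing column: {name}")
    if problems:
        return problems
    lat_index, lon_index = header.index(lat_name), header.index(lon_name)
    # Data rows are counted from 1, starting after the header row.
    for row_number, row in enumerate(rows[1:], start=1):
        try:
            lat, lon = float(row[lat_index]), float(row[lon_index])
        except (ValueError, IndexError):
            problems.append(f"data row {row_number}: coordinate is not a number")
            continue
        if not -90.0 <= lat <= 90.0:
            problems.append(f"data row {row_number}: latitude {lat} out of range")
        if not -180.0 <= lon <= 180.0:
            problems.append(f"data row {row_number}: longitude {lon} out of range")
    return problems
```

An empty list means the header and coordinate columns look plausible; it does not guarantee that the tool will accept the file.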

NOTE: Other than the headers and the commas, all the other entries should be numerical. If there is a column/field that has NON-NUMERIC DATA, such as a name, remove that column from the CSV file; otherwise, the clustering WILL FAIL.
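One way to find and remove non-numeric columns before uploading is sketched below in Python, using only the standard csv module. This is an illustration under assumptions, not the tool's own validation: the function names are hypothetical, and "numeric" here means anything Python's float() accepts, which may be more permissive than what the server expects.

```python
import csv
import io


def non_numeric_columns(csv_text):
    """Return the header names of columns that hold any non-numeric entry."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    bad = []
    for index, name in enumerate(header):
        for row in data:
            try:
                float(row[index])
            except (ValueError, IndexError):
                # A missing or unparseable entry marks the whole column.
                bad.append(name)
                break
    return bad


def drop_columns(csv_text, names):
    """Return the CSV text with the named columns removed."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    keep = [i for i, name in enumerate(rows[0]) if name not in names]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


# Hypothetical sample with one text column that would make clustering fail.
sample = "name,latitude,longitude,V2\nA,42.7,-84.5,1.5\nB,42.8,-84.4,2.0\n"
bad = non_numeric_columns(sample)
cleaned = drop_columns(sample, bad)
```

In this sample, the "name" column is flagged and removed, and the remaining columns are written back unchanged.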

How to use the tool?

There are two steps in using this tool: (1) loading data and (2) clustering. Once the tool is loaded, it automatically focuses on step 1 (Figure 1).

Figure 1: KNNMultifield once it is fully loaded.

Step 1: Loading Data

Wait for the page to load completely. Once the page is fully loaded, click the “Choose File” button under the “Step 1: Loading Data” cascade pane. As long as no data is loaded, the tool displays a message in red that reads:

“No Data is Loaded. Load some data first.”

If the data is loaded successfully, a message confirms that loading is done and reports the number of records that were loaded (Figure 2). The red message also changes to green and reads:

“Data is loaded. You can proceed now to Step 2: Clustering”

Figure 2: Once the data is loaded, a prompt reports the number of records.

“Hide” and “Show” buttons are also available. They hide or show the original, unclustered data.

Step 2: Clustering

Once the data is properly loaded, click on the “Step 2: Clustering” cascade pane to activate the clustering controls. The Step 2 cascade pane is located in the lower left part of the page. If the data is properly loaded and is currently shown on the screen, all the columns, or parameters, should be listed (Figure 3).

Figure 3: Clustering Tool

Under “Select Fields:”, select the fields that you want to use in clustering. At least one field must be selected, and up to 10 fields can be selected simultaneously. To select multiple fields, hold CTRL while clicking.

Now select the number of clusters. The number of clusters must be an integer between 2 and 10.
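The selection limits described above (1 to 10 fields, and an integer number of clusters between 2 and 10) can be summarized as a small check. The Python sketch below is only an illustration of those rules; the function name check_selection and its messages are hypothetical and are not taken from the tool.

```python
def check_selection(selected_fields, available_fields, n_clusters):
    """Return a list of problems with a field selection and cluster count."""
    problems = []
    if not 1 <= len(selected_fields) <= 10:
        problems.append("select between 1 and 10 fields")
    unknown = [name for name in selected_fields if name not in available_fields]
    if unknown:
        problems.append("unknown fields: " + ", ".join(unknown))
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_clusters, bool) or not isinstance(n_clusters, int):
        problems.append("number of clusters must be an integer")
    elif not 2 <= n_clusters <= 10:
        problems.append("number of clusters must be between 2 and 10")
    return problems
```

For instance, selecting three of the available fields with 10 clusters passes the check, while an empty selection or a cluster count of 1 is reported as a problem.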

In this example, we choose “latitude”, “longitude”, and “V2” from the sample data set and set the number of clusters to 10 (Figure 4). Once you are done with your selection, click on “KNN Clustering”.

Figure 4: Choosing longitude, latitude, and V2 as the clustering fields, and setting the number of clusters to 10.

At this point, the tool collects the information and sends it to the DSI geoprocessing server. The actual processing is performed on the server, located at Michigan State University. Depending on the internet connection and how busy the server is, this might take a while. Meanwhile, a message indicates that the server is busy processing (Figure 5). Once the processing finishes successfully, a green message appears showing that the procedure was successful, and the results are shown on the screen.

Figure 5: While the server is busy processing the data, a message appears indicating the state of the server. In this picture, it shows that the server is processing.

Figure 6: If the clustering is successful, the results are shown on the screen along with the legend.