

Module 1

Introduction to Self-Organizing Map


Chapter 1: Introduction to SOM

The Self-Organizing Map (SOM) is an artificial neural network used for the visualization of high-dimensional data. It converts the nonlinear statistical relationships among high-dimensional data into simple geometric relationships of their image points on a low-dimensional display, usually a regular two-dimensional grid of nodes. Essentially the SOM compresses information while preserving the most important topological and/or metric relationships of the primary data elements on display [1].

Understanding how the SOM learns to represent high-dimensional data without any hidden layers was a challenge we faced while creating this tutorial. The SOM, unlike most popular neural networks, learns from the input data using a single grid of neurons that adapt and cooperate to form the final representation [1]. Its learning mechanism is fascinating, and the algorithm’s effectiveness in dimensionality reduction can be observed in the modules of this series.

Figure 1: Kohonen model or SOM

Learning Mechanism

The SOM model is an unsupervised system based on competitive learning, in which the output neurons compete amongst themselves to be activated such that only one is activated at a time; this winning neuron is called the Best Matching Unit (BMU). The BMU is found by comparing the neurons’ weights with a randomly selected vector from the input training data: the weight vector with the shortest distance to the selected input vector is the winner. There are numerous ways to measure this distance; the most commonly used is the Euclidean distance. The function used to evaluate the BMU is called the discriminant function [2,3]. In this tutorial, the Euclidean distance has been used as the discriminant function. Several other distance functions are available, such as the Manhattan, Minkowski, and Hamming distances, and the learner of this tutorial can experiment with them and observe how the results differ.

Figure 2: BMU neighbourhood update

Chapter 2: Algorithm

Figure 3: SOM flowchart

Steps:

1) Initialization: Create an m x n grid of neurons, where m and n are positive integers corresponding to the number of rows and columns of the grid. These numbers depend on the nature of the input data, and suitable values are usually determined through trial and error. Then, randomly initialize the weight vector of each neuron in the grid. For more details, kindly refer to section 3.1 of this module.

2) Sampling: Select a random row (vector) from the input training data. More details on this step are given in section 3.2 of this module.

3) Competition: Neurons compete among themselves to become the BMU and the winner is determined using the discriminant function. Refer to section 3.3 of this module for more details.

4) Cooperation: The winning neuron determines the spatial location of a topological neighbourhood of excited neurons, thereby providing the basis for cooperation among neighbouring neurons [3]. For more details kindly refer to section 3.4 of this module.

5) Adaptation: Weights of the winning neuron and its neighbours get updated such that the response of the winning neuron to the subsequent application of a similar input pattern is enhanced [3]. Refer to section 3.5 of this module for more details.

6) Go back to Step 2: Take a different sample and repeat the process until map convergence is achieved. For more details on this step, kindly refer to chapter 6 of this module.

These steps summarize the basic functioning of the SOM algorithm proposed by Kohonen; we defined them concisely after evaluating several adaptations of the SOM algorithm. In our implementation, the m x n grid of neurons is an n x n square grid. Published articles describe variations of the original algorithm created for different applications, but this tutorial follows the most commonly used procedure. Executing these steps in R involved a lot of trial and error due to the lack of open-source material on implementing the SOM algorithm purely in R.

Chapter 3: Guided Tutorial in R

3.1: Initialization

Create a grid of neurons with randomly initialized weights. The neurons are represented by weight vectors having the same dimensions as input. The random weight values are generated using the rnorm() function, which outputs normally distributed random numbers.

# 3.1.1) Creating a matrix with the name "grid" having 10 rows and 5 columns.
rows = 10
columns = 5
grid <- matrix(data = rnorm(rows*columns),  nrow = (rows), ncol = columns)
grid
##               [,1]        [,2]       [,3]       [,4]        [,5]
##  [1,]  0.758519418 -0.01862631  0.7935564  0.5045411 -0.94585286
##  [2,] -2.518547828 -0.83360121  1.4057935  1.8534798  2.28890678
##  [3,] -0.248307740  1.22421538  0.2626770 -1.2381788  0.53212445
##  [4,] -1.600305142 -1.69102909  0.2449216  2.4828802 -0.84935493
##  [5,]  0.198249556  1.29998920 -2.0089154 -0.2842236 -0.04306923
##  [6,]  2.103836363  0.07407905 -1.5085595 -1.7310954 -2.78811697
##  [7,] -0.235551096 -1.29560439 -0.4107427 -0.2933023 -0.02152416
##  [8,]  0.009826559 -3.34077560 -0.5299471  0.4765224  0.69341081
##  [9,]  0.632942364 -1.45163733  0.4258880  0.1789159  0.18111266
## [10,] -0.321978923 -0.84980877  0.8515051  0.5493414 -0.47959513

Before implementing this step in R, the implications of using randomly initialized weights were considered. Several published articles [4,5] mention the advantages of initializing the weights in a close range of input vectors within the given space. We decided to randomly initialize the weights for a generic implementation of the initialization step.
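
As an aside, a minimal sketch of the alternative initialization mentioned in [4,5], in which each neuron's initial weight vector is drawn from the input data itself, might look as follows. The function name and the example data here are our own illustration and are not used elsewhere in this tutorial.

# Alternative initialization (not used in this tutorial): draw each neuron's
# initial weights from a randomly sampled input row.
sample_initialised_grid <- function(input_data, n_neurons) {
  idx <- sample(1:nrow(input_data), size = n_neurons, replace = TRUE)
  as.matrix(input_data[idx, , drop = FALSE])
}

# Example: a 10-neuron grid initialized from a random 30 x 5 input matrix.
example_data <- matrix(rnorm(150), nrow = 30, ncol = 5)
sample_initialised_grid(example_data, n_neurons = 10)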

3.2: Sampling

Select a random row (vector) from input data. The sampling is performed using R’s sample() function, which retrieves a random value from its input data. In this example, we will pass a list of row numbers to this function to sample a random row number. Then we extract this row from the data matrix via indexing.

# 3.2.1) Sampling a random row (vector) from an input data matrix with the name "data".
data <- matrix(data = rnorm(150),  nrow = 30, ncol = 5)
sample_input_row <- as.vector(unlist(data[sample(1:nrow(data), size = 1, replace = F), ])) 
sample_input_row
## [1] 0.8718443 0.7427261 0.7289442 0.2095494 1.4608163

3.3: Competition

Neurons compete among themselves to become the BMU, which is determined using the discriminant function. Here, we use the Euclidean distance as the discriminant function; its formula is shown below:

\[\begin{equation} d\left( x,y\right) = \sqrt {\sum _{i=1}^{n} \left( y_{i}- x_{i}\right)^2 } \end{equation}\]
Equation 1: Euclidean distance formula

where \(x\) & \(y\) are vectors in the \(n\) dimensional space and \(x_{i}\) & \(y_{i}\) are the values of the \(i^{th}\) vector coordinates.

# 3.3.1) Creating a function to calculate euclidean distance.
euclidean_distance <- function(x, y) {
  ret <- sqrt(sum((x - y)^2))
  return(ret)
}

# Running the function over a sample input.
x <- c(2,3,4)
y <- c(1,7,9)
euclidean_distance(x,y)
## [1] 6.480741
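
Chapter 1 mentioned that the learner can experiment with other distance functions such as the Manhattan and Minkowski distances. The sketch below shows one way these could be written in R; the function names are our own and are not used elsewhere in this tutorial. Note that the Minkowski distance with p = 2 reduces to the Euclidean value computed above.

# Illustrative alternative distance functions (for experimentation only).
manhattan_distance <- function(x, y) {
  sum(abs(x - y))
}

# Minkowski distance of order p; p = 1 gives Manhattan and p = 2 gives Euclidean.
minkowski_distance <- function(x, y, p = 3) {
  sum(abs(x - y)^p)^(1 / p)
}

# Comparing the metrics on the same pair of vectors used above.
manhattan_distance(x, y)
minkowski_distance(x, y, p = 2)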

The discriminant function calculates the distance between all neurons’ weights and the input vector. BMU is the neuron whose weights are closest to the input vector.

Figure 4: Discriminant function calculation

# 3.3.2) Defining the BMU function with parameters "x" and "input_grid", where "x" is a single input data row and "input_grid" is the grid.
BMU <- function(x, input_grid) { 
  distance <- 0
  min_distance <- Inf # Initialize the variable for minimum distance with infinity.
  min_ind <- -1 
  # Iterate through grid.
  for (e in 1:nrow(input_grid))
  {
    # Distance calculation.
    distance <- euclidean_distance(x, input_grid[e, ]) 
    if (distance < min_distance) {
      # Update minimum distance for winning unit.
      min_distance <- distance 
      # Update winning neuron.
      min_ind <- e 
    }
  }
  # Return the zero-based index of the BMU.
  return(min_ind-1) 
}
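
As a quick usage sketch, we can reuse the grid created in section 3.1 and the row sampled in section 3.2 (both have five columns, so the dimensions match). BMU() returns a zero-based index, which the training code in section 4.4 later converts into a 2D grid coordinate.

# Quick check: find the winning neuron for the previously sampled input row.
winner <- BMU(sample_input_row, grid)
winner
# Since the returned index is zero-based, the winning neuron's weights are:
grid[winner + 1, ]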

While creating the BMU function, we understood how the best neuron is selected using distance calculation. As the input is randomly sampled for each iteration, we wrapped the process of competition between neurons into a separate function. We decided to use the Euclidean distance to find the neuron with minimal distance between its weights and the input and declare it the winner. It seems counter-intuitive to randomly sample input as one may argue that the map will keep changing every time a new input is sampled and will never converge. This problem will be solved in the remaining steps of the algorithm.

3.4: Cooperation

The winning neuron determines the spatial location of a topological neighbourhood of excited neurons which will cooperate. The influence of the BMU over its neighbourhood can be determined using the following equation -

\[\begin{equation} influence = exp(-(distance^{2}) / (2 * (radius^{2}))) \end{equation}\]
Equation 2: Neighbourhood influence calculation formula


where \(distance\) is the lateral distance between neurons in the grid and \(radius\) is the radius of influence of a BMU over its neighbourhood.

Note: Remember that the lateral distance between neurons in the grid is completely different from the Euclidean distance between weights and input vector. Refer to Figure 5 to understand the difference between the lateral distance and the distance calculated by the discriminant function. All subtractions shown in the figure are vector subtractions.

Figure 5: Lateral distance v/s the distance calculated by the discriminant function

# 3.4.1) Defining a function to calculate the neighbourhood influence using lateral distance and radius.
influence_calculation <- function(distance, radius) {
  ret <- exp(-(distance^2) / (2 * (radius^2)))
  return(ret)
}

# Running the function over a sample input.
lateral_distance <- 2
radius <- 4
influence_calculation(lateral_distance,radius)
## [1] 0.8824969

Understanding the difference between the lateral distance and the distance calculated by the discriminant function was one of the challenges faced while implementing the SOM algorithm. Without a clear understanding of the distinction between the two distances, it is hard to see how the SOM learns to represent high-dimensional input data with a simple 2D grid of neurons.
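
To make the distinction concrete, the following sketch (our own illustration, not part of the tutorial pipeline) computes both distances for two neurons of a 3 x 3 grid, using the euclidean_distance() function defined in section 3.3.

# Illustration: lateral distance vs. discriminant (weight-space) distance.
# Two neurons sitting at grid positions (1,1) and (3,2) of a 3 x 3 map,
# each carrying a 5-dimensional weight vector.
pos_a <- c(1, 1)
pos_b <- c(3, 2)
w_a <- rnorm(5)
w_b <- rnorm(5)

# Lateral distance: measured on the 2D map, independent of the weights.
sqrt(sum((pos_a - pos_b)^2))

# Discriminant distance: measured in the 5D weight space, independent of
# where the neurons sit on the map.
euclidean_distance(w_a, w_b)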

3.5: Adaptation

Weights are adjusted with respect to the winning neuron such that its response towards the subsequent application of a similar input pattern is enhanced.

\[\begin{equation} new\_radius = radius * exp(-current\_iteration / time\_constant) \end{equation}\]
Equation 3: Radius decay formula

where \(radius\) is the initial radius of the neighbourhood, \(current\_iteration\) is the iteration of data sampling that the program is currently on, and \(time\_constant\) is a constant that controls how quickly the radius shrinks over the iterations (in this tutorial it is set to the total number of iterations divided by the natural logarithm of the initial radius).

# 3.5.1) Function for the decaying radius for a given iteration "current_iteration".
decay_radius_function <- function(radius, current_iteration, time_constant) {
  ret <- radius * exp(-current_iteration / time_constant)
  return(ret)
}

# Calculate radius of neighbourhood with the time constant 4, with initial radius 3 and at the 4th iteration.
time_constant <- 4
initial_radius <- 3
iteration_number <- 4
decay_radius_function(initial_radius,iteration_number,time_constant)
## [1] 1.103638
\[\begin{equation} new\_learning\_rate = learning\_rate * exp(-current\_iteration / n\_iteration) \end{equation}\]
Equation 4: Learning rate decay formula


where \(learning\_rate\) is the old learning rate to be updated, \(current\_iteration\) is the iteration of data sampling that the program is currently on and \(n\_iteration\) is the total number of iterations the SOM is trained over.

# 3.5.2) Function for the decaying learning rate.
decay_learning_rate <- function(learning_rate, current_iteration, n_iteration) {
  ret <- learning_rate * exp(-current_iteration / n_iteration)
  return(ret)
}

# Calculating the learning rate of model at the 3rd iteration out of a total of 100 iterations and initial learning rate of 0.1.
initial_learning_rate <- 0.1
current_iteration <- 3
total_iterations <- 100
decay_learning_rate(initial_learning_rate,current_iteration,total_iterations)
## [1] 0.09704455

This step is one of the most important steps of the SOM algorithm. Both the radius and the learning rate decay exponentially, so after a sufficient number of iterations only the winning neuron and its closest neighbours are updated. The decaying radius lets the map organize itself globally at first and then fixate on finer, local adjustments, allowing the map to become more precise with each iteration so that the neurons distinctly represent the input data patterns. The decaying learning rate stabilizes the updates and assists convergence.
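
To tie the pieces of this step together, the sketch below applies the update rule used later in section 4.4, new_weights = weights + learning_rate * influence * (input - weights), to a single neighbouring neuron. The numeric values are illustrative only; the decay and influence functions are the ones defined above.

# Single-neuron sketch of the adaptation step (illustrative values only).
x_input <- c(0.5, -1.2, 0.3)   # a sampled input vector
w <- c(0.1, 0.4, -0.2)         # current weights of a neuron inside the neighbourhood

lateral_dist <- 1              # its lateral distance from the BMU on the grid
radius <- decay_radius_function(3, current_iteration = 4, time_constant = 4)
lr <- decay_learning_rate(0.1, current_iteration = 3, n_iteration = 100)
inf <- influence_calculation(lateral_dist, radius)

# Move the weights a small step towards the input vector.
w_new <- w + lr * inf * (x_input - w)
w_new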

Chapter 4: Implementation

4.1: Loading the dataset

In this module, we will demonstrate the working of the SOM over a sample dataset [6] with 3 dimensions. The dataset consists of admission parameters, with the target variable “admit” indicating whether a student gets admitted or not. We shall load this dataset [6] from the working directory after importing all the necessary libraries.

Code Snippet

set.seed(222)
library(dplyr)

# 4.1.1) Reading the data and scaling it.
data <- read.csv("Resources/binary.csv", header = T)
X <- scale(data[, -1])
data <- X

4.2: Initialization

The SOM is basically a grid of neurons where each neuron contains a weight vector at a position (i,j) in the grid. We begin by assigning random values for the initial weight vectors w. The dimensions of the weight vector are equal to those of the input data.

Figure 6: Weights matrix

The SOM grid consists of neurons, each containing a weight vector of length p, where p is the number of variables in the input dataset. This step results in a matrix with \(n^2\) rows, a flattened version of the original \(n*n\) SOM grid. Each neuron contains randomly initialized weight values that will later be modified by the SOM operations.

Code Snippet

# 4.2.1) Initializing the weights of the neural network by creating a 3x3 grid in accordance with the input dataset.

# -----------------------------------------------------
# This is Step 1 of the Algorithm: Initialization
# -----------------------------------------------------

create_grid <- function(n,p) {
  ret <- matrix(data = rnorm(n *n * p), nrow = (n*n), ncol = p)
  return(ret)
}
grid <- create_grid(3,3)
grid
##               [,1]         [,2]       [,3]
##  [1,]  1.487757090 -1.201023506  0.5194904
##  [2,] -0.001891901  1.052458495 -0.7462955
##  [3,]  1.381020790 -1.305063566  0.7264546
##  [4,] -0.380213631 -0.692607634  0.7136567
##  [5,]  0.184136230  0.602648854 -0.6500629
##  [6,] -0.246895883 -0.197753074  1.4986962
##  [7,] -1.215560910 -1.185874517 -1.4358281
##  [8,]  1.561405098 -2.005512989 -2.1613182
##  [9,]  0.427310197  0.007509885  0.3952199
# -----------------------------------------------------
# This is Step 2 of the Algorithm: Sampling
# -----------------------------------------------------

# NOTE: Sampling is performed inside the training loop of the SOM function defined in section 4.4 below.

4.3: Best Matching Unit (BMU)

The SOM uses competitive learning to select the best matching unit at each iteration: the winner is the neuron whose discriminant function value, i.e., its distance from the randomly sampled input vector, is the smallest. The following is the initial implementation of Step 3, without any optimization for faster processing.

Code Snippet

# -----------------------------------------------------
# This is Step 3 of the Algorithm: Competition
# -----------------------------------------------------

# 4.3.1) Euclidean distance function for finding BMU.
euclidean_distance <- function(x, y) {
  ret <- sqrt(sum((x - y)^2))
  return(ret)
}

# 4.3.2) BMU function to obtain winning neuron.
BMU <- function(x, input_grid) { 
  distance <- 0
  min_distance <- Inf 
  min_ind <- -1 
  # Iterating through grid.
  for (e in 1:nrow(input_grid))
  {
    # Distance calculation.
    distance <- euclidean_distance(x, input_grid[e, ]) 
    if (distance < min_distance) {
      # Updating minimum distance.
      min_distance <- distance 
      # Updating winning neuron.
      min_ind <- e 
    }
  }
  # Returns the zero-based index of the BMU.
  return(min_ind-1) 
}

4.4: Training the SOM

The SOM follows the algorithm described above to fit the training data until the map stops changing, in other words until the model converges. In each iteration of an epoch, the algorithm randomly picks a data point from the dataset and finds the winning neuron. It then computes the neighbourhood using the radius function and updates the weights according to the value of the influence and the given learning rate. The algorithm terminates when there is no change between the previous set of weights and the new ones generated after a single iteration.

Code Snippet

# -----------------------------------------------------
# This is Step 4 of the Algorithm: Cooperation
# -----------------------------------------------------

# 4.4.1) Function to calculate the decaying learning rate at a particular iteration.
decay_learning_rate <- function(learning_rate, current_iteration, n_iteration) {
  ret <- learning_rate * exp(-current_iteration / n_iteration)
  return(ret)
}

# -----------------------------------------------------
# This is Step 5 of the Algorithm: Adaptation
# -----------------------------------------------------

# 4.4.2) Function to calculate the decaying radius at a particular iteration.
decay_radius_function <- function(radius, current_iteration, time_constant) {
  ret <- radius * exp(-current_iteration / time_constant)
  return(ret)
}

# 4.4.3) Function to calculate influence of BMU over neighbouring neurons.
influence_calculation <- function(distance, radius) {
  ret <- exp(-(distance^2) / (2 * (radius^2)))
  return(ret)
}

# 4.4.4) Creating a function to train the SOM.
SOM <- function(x, input_grid) {
  # Defining the training parameters.
  # 1) Number of iterations.
  n_iteration <- 400 
  # 2) Initial learning rate.
  initial_learning_rate <- 0.05 
  # 3) Initial radius.
  initial_radius <- 3 
  # 4) Value for time constant.
  time_constant <- n_iteration / log(initial_radius) 
  # 5) Physical locations of neurons to figure out lateral distance.
  lateral_distance_points <- expand.grid(1:sqrt(nrow(input_grid)),1:sqrt(nrow(input_grid)))
  # Taking the square root of the number of rows of the square grid.
  rows <- sqrt(nrow(input_grid)) 
  # 6) Number of epochs.
  n_epochs <- 10 
  for(ne in 1:n_epochs)
  {
    print(ne)
    old_grid <- input_grid
    # Looping through for training.
    for (i in 1:n_iteration) 
    {
      
      # -----------------------------------------------------
      # This is Step 2 of the Algorithm: Sampling
      # -----------------------------------------------------

      # Selecting random input row from the input dataset.
      sample_input_row <- as.vector(unlist(x[sample(1:nrow(x), size = 1, replace = F), ])) 
      # Decaying the radius.
      new_radius <- decay_radius_function(initial_radius, i, time_constant) 
      # Decaying the learning rate.
      new_learning_rate <- max(decay_learning_rate(initial_learning_rate, i, n_iteration), 0.01) 
       # Finding BMU for the given input row.
      index_temp <- BMU(sample_input_row, input_grid)
      # Converting a 1D co-ordinate to a 2D co-ordinate to find the lateral distance on the map.
      index_new <- c((as.integer(index_temp/rows))+1,(index_temp%%rows)+1) 
      # Finding euclidean distance between the given BMU and all other units on the map.
      lateral_distance <- sqrt(rowSums(sweep(lateral_distance_points,2,index_new)^2)) 
      # Finding neurons that are within the radius of the winning unit.
      rn <- which(lateral_distance<=new_radius)
      # Calculating the influence of the winning neuron on neighbours.
      inf <- influence_calculation(lateral_distance[rn],new_radius) 
      # Matrix that stores the difference between the data point and the weights of the winning neuron & its neighbours.
      diff_grid <- (sweep(input_grid[rn,],2,sample_input_row))*-1
      # Updating the winning and neighbouring neurons.
      updated_weights <- new_learning_rate*inf*diff_grid 
      # Updating grid entries that are either the winning neuron or its neighbours.
      input_grid[rn,]=input_grid[rn,]+updated_weights
      if(isTRUE(all.equal(old_grid,input_grid)))
      {
        print(i)
        print("Converged")
      }
    }
  }
  # Returning the updated SOM weights.
  return(input_grid) 
}
# Finding the training time.
start <- Sys.time()
gridSOM=SOM(data,grid)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
end <- Sys.time()
gridSOM
##              [,1]       [,2]         [,3]
##  [1,]  0.83783459  0.7460597 -0.120488929
##  [2,]  0.68457691  0.3751961 -0.062564377
##  [3,]  0.08935171 -0.1923388 -0.204433588
##  [4,]  0.63571536  0.3166299 -0.086911459
##  [5,]  0.30964489 -0.2942021  0.196228014
##  [6,] -0.36909612 -0.7618715  0.048797707
##  [7,]  0.09789500 -0.1890788 -0.202367096
##  [8,] -0.32060957 -0.7080236  0.003513688
##  [9,] -0.77499625 -0.9723701  0.021289937
time_taken <- end - start
print(time_taken)
## Time difference of 4.022394 secs

The following chunk of code converts the \(n\) dimensional weight vectors returned by the SOM function into a 2D grid of colors for graphical representation. The idea is to reduce each weight vector to a single number, interpret that number as a wavelength of visible light, and display the corresponding color on a 2D grid. A natural choice for the range of colors is everything the human eye can see, i.e., electromagnetic wavelengths from 360 nm to 720 nm, so each vector must be converted to a single number in that interval. The R package photobiology [7] is used to convert each wavelength value into a hex color value. Since the SOM function returns all weight vectors in a linear array, each row vector is replaced by a single number (its vector norm in the code below), resulting in a linear array of numbers; this array is then mapped onto a 2D plane, and each element of the plane is filled with the color corresponding to the associated number. One issue with this approach was that numbers between 360 & 400 and between 700 & 720 resulted in colors that were hard to distinguish from each other, so after some trial and error the lower wavelength limit was set to 400 and the upper limit to 700.

library(photobiology)

# 4.4.5) Function to visualize the grid of neurons.
drawGrid <- function(weight,dimension){
  weight <- as.matrix(weight, ncol = ncol(weight))
  norm.matrix<-NULL
  for(i in 1:length(weight[,1])){
    a<-norm(weight[i,], type = "2")
    norm.matrix <- rbind(norm.matrix,a)
  }
  
  # Mapping to the numbers between 400 and 700.
  input_start <- min(norm.matrix)
  input_end <- max(norm.matrix)
  output_start <- 400
  output_end <- 700
  
  # Calculating wavelength.
  color<-NULL
  for(i in 1:length(norm.matrix)){
    input = norm.matrix[i]
    output = output_start + ((output_end - output_start) / (input_end - input_start)) * (input - input_start)
    color <- rbind(color,output)
  }
  
  # Getting colors (hex values) from the wavelength.
  color.rgb <- w_length2rgb(color)
  
  # Plotting the grid.
  dim <- max(dimension)+1
  plot(1:dim, type = "n")
  for (i in 1:dimension[1]) {
    for(j in 1:dimension[2]){
      rect(i,j,i+1,j+1, col = color.rgb[i*dimension[1]+j - dimension[1]])
    }
  }
} 
gridSOM=matrix(unlist(gridSOM),ncol=3)
drawGrid(gridSOM,c(3,3))

Chapter 5: Optimization in R

The base algorithm took over 40 seconds to run on a dataset of just 400 rows. Thus, efforts were made to optimize the algorithm while retaining its fundamental logic. Mainly the following two operations were performed to optimize the code -

  • Eliminating as many for loops as possible.
  • Converting every potential operation to a matrix operation.

With the help of the R profiler, used via the profvis() function of the profvis package [8], it was determined that the BMU and SOM functions consumed the most time. Finally, the following changes were made -

  • BMU: The entire function was reduced to just 3 lines of code. The 1st line uses the sweep() function together with rowSums() to compute the squared Euclidean distance between every neuron in the grid and the given data point (the square root is unnecessary for finding the minimum). The 2nd line uses the which.min() function to find the winning neuron in the grid. The final line returns the index of the winning neuron.
  • SOM: A similar approach was applied to the SOM function. The lateral distances from the winning neuron are computed using the sweep() function, and the weights are updated by first finding the required indices using the which() function and then applying matrix operations instead of a for loop.

Finally, the entire algorithm was sped up by approximately 10 times.
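
For readers who want to reproduce the profiling step mentioned above, a minimal sketch using the profvis package [8] is shown below. It assumes that data, grid, and the SOM function from Chapter 4 are already defined in the workspace, and it opens an interactive view of where the run time is spent.

# Minimal profiling sketch: wrap a full training run in profvis() to see
# which functions account for most of the run time.
library(profvis)
profvis({
  gridSOM <- SOM(data, grid)
})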

BMU_Vectorised <- function(x, input_grid) { 
  # 5.1) Calculating the squared distance of the row "x" from all the neurons using matrix operations (the square root is not needed to find the minimum).
  dist_mtrx=rowSums(sweep(input_grid,2,x)^2)
  # 5.2) Finding the location of the neuron associated with the minimum distance.
  min_ind=which.min(dist_mtrx)
  # 5.3) Returning the zero-indexed value of the winning neuron.
  return (min_ind-1) 
}
# 5.4) Faster SOM implementation using the vectorised BMU function.
SOM <- function(x, input_grid) {
  n_iteration <- 400 
  initial_learning_rate <- 0.05 
  initial_radius <- 3 
  time_constant <- n_iteration / log(initial_radius) 
  lateral_distance_points=expand.grid(1:sqrt(nrow(input_grid)),1:sqrt(nrow(input_grid)))
  rows=sqrt(nrow(input_grid)) 
  n_epochs=10 
  for(ne in 1:n_epochs)
  {
    print(ne)
    old_grid=input_grid
    for (i in 1:n_iteration) 
    {
      sample_input_row <- as.vector(unlist(x[sample(1:nrow(x), size = 1, replace = F), ])) 
      new_radius <- decay_radius_function(initial_radius, i, time_constant) 
      new_learning_rate <- max(decay_learning_rate(initial_learning_rate, i, n_iteration), 0.01) 
      index_temp <- BMU_Vectorised(sample_input_row, input_grid)
      index_new=c((as.integer(index_temp/rows))+1,(index_temp%%rows)+1) 
      lateral_distance=sqrt(rowSums(sweep(lateral_distance_points,2,index_new)^2))
      rn=which(lateral_distance<=new_radius) 
      inf=influence_calculation(lateral_distance[rn],new_radius) 
      diff_grid=(sweep(input_grid[rn,],2,sample_input_row))*-1 
      updated_weights=new_learning_rate*inf*diff_grid 
      input_grid[rn,]=input_grid[rn,]+updated_weights 
      if(isTRUE(all.equal(old_grid,input_grid)))
      {
        print(i)
        print("Converged")
      }
    }
  }
  return(input_grid) 
}
start <- Sys.time()
gridSOM=SOM(data,grid)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
end <- Sys.time()
gridSOM
##              [,1]       [,2]         [,3]
##  [1,]  0.93563378  0.5028079 -0.432887318
##  [2,]  0.61205256  0.4273599 -0.249594103
##  [3,] -0.09235935  0.1561476 -0.003106676
##  [4,]  0.59269828  0.4116258 -0.247425108
##  [5,]  0.07422356  0.1578972  0.142911783
##  [6,] -0.49782169 -0.3827718  0.367834940
##  [7,] -0.03019991  0.1661333 -0.082931683
##  [8,] -0.49455171 -0.3881704  0.363960276
##  [9,] -0.70956158 -0.7723888  0.488078157
time_taken <- end - start
print(time_taken)
## Time difference of 4.400472 secs

Chapter 6: Running the model over multiple epochs

The SOM implementation was first developed only for a single epoch. However, the visualization showed that it did not provide accurate clusters and we believed that further training of the model was required. But how much training would be sufficient for convergence? That was an unsolved problem. To solve it, we measured the progress of the SOM by finding the difference in the map weights of consecutive iterations and plotting the difference against time. We observed that as the map learned the representation better over time, the difference between map weights gradually shrank and eventually reached zero, resulting in convergence. Thus the convergence criterion was obtained. The difference plot for the admissions data can be generated by executing the code below.

# 6.1) Training the model over multiple epochs and observing the process to find the iteration of convergence.
SOM <- function(x, input_grid) {
  breaker <- 0
  n_iteration <- nrow(x) 
  initial_learning_rate <- 0.05 
  initial_radius <- 3 
  time_constant <- n_iteration / log(initial_radius)
  lateral_distance_points=expand.grid(1:sqrt(nrow(input_grid)),1:sqrt(nrow(input_grid)))
  rows=sqrt(nrow(input_grid))
  n_epochs=40 
  new_radius <- initial_radius
  l <- c()
  for(ne in 1:n_epochs)
  {
    extra <- ((ne-1)*n_iteration) # Number of iterations completed in previous epochs.
    for (i in 1:n_iteration)
    {
      old_grid=input_grid
      curr_i <- extra + i
      sample_input_row <- as.vector(unlist(x[sample(1:nrow(x), size = 1, replace = F), ])) 
      new_radius <- decay_radius_function(initial_radius, curr_i, time_constant) 
      new_learning_rate <- decay_learning_rate(initial_learning_rate,curr_i, n_iteration)
      index_temp <- BMU_Vectorised(sample_input_row, input_grid) 
      index_new=c((as.integer(index_temp/rows)+1),(index_temp%%rows)+1) 
      lateral_distance=sqrt(abs(rowSums(sweep(lateral_distance_points,2,index_new)^2))) 
      rn=which(lateral_distance<=new_radius) 
      inf=influence_calculation(lateral_distance[rn],new_radius)
      if(length(rn)!=1) 
      {
        diff_grid=(sweep(input_grid[rn,],2,sample_input_row))*-1 
        updated_weights=new_learning_rate*inf*diff_grid
        input_grid[rn,]=input_grid[rn,]+updated_weights 
      }
      else 
      {
        diff_row=(input_grid[rn,]-sample_input_row)*-1 
        updated_weights=new_learning_rate*inf*diff_row 
        input_grid[rn,]=input_grid[rn,]+updated_weights 
      }
      l <- c(l,euclidean_distance(old_grid,input_grid))
      if(isTRUE(all.equal(old_grid,input_grid)))
      {
        print(curr_i)
        print("Converged")
        breaker <- 1
        break
      }
    }
    if(breaker ==1)
    {
      break
    }
  }
  return(list(input_grid,l)) 
}
y <- SOM(data,grid)
## [1] 5520
## [1] "Converged"
gridSOM <- y[1]
gridSOM
## [[1]]
##              [,1]       [,2]       [,3]
##  [1,]  1.00508462  1.0722503  0.1352241
##  [2,]  0.46464851  0.7124700 -0.6238244
##  [3,]  0.09451207 -0.3290118 -0.6349856
##  [4,]  0.46464852  0.7124700 -0.6238244
##  [5,] -0.20563534  0.3614995  0.0752349
##  [6,] -0.61770607 -0.6731773 -0.1049023
##  [7,]  0.09451208 -0.3290117 -0.6349856
##  [8,] -0.61770605 -0.6731773 -0.1049022
##  [9,] -0.77777217 -0.7624095  1.0044831
l <- y[2]

t=1:length(l[[1]])
plot(t,l[[1]])

The above plot shows how much the map changes over the course of training. The points on the graph represent the difference in the weights of the neural network between consecutive training iterations. Initially the differences are large and the curve falls steeply, indicating rapid learning. As the training of the SOM progresses, the neighbourhood radius and learning rate decrease and the map fixates on finer details, so the change per iteration becomes smaller and smaller until the map converges.
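
To relate the flattening of this curve to the shrinking neighbourhood, the decay curves themselves can be plotted with the functions defined earlier. The sketch below uses the same parameter values as the Chapter 6 run (initial radius 3, initial learning rate 0.05, 400 iterations per epoch) up to the convergence iteration reported above.

# Illustrative sketch: radius and learning-rate decay over the training run.
iters <- 1:5520
radii <- decay_radius_function(3, iters, time_constant = 400 / log(3))
rates <- decay_learning_rate(0.05, iters, n_iteration = 400)
par(mfrow = c(1, 2))
plot(iters, radii, type = "l", xlab = "iteration", ylab = "neighbourhood radius")
plot(iters, rates, type = "l", xlab = "iteration", ylab = "learning rate")
par(mfrow = c(1, 1))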

Chapter 7: Conclusion

The SOM was successfully implemented and visualized in R, and the code was later optimized for faster execution. For testing, the algorithm was applied to the UCLA Graduate School Admissions data [6], and the final grid of colors obtained from the algorithm indicated the presence of 3 to 4 clusters. Several hurdles were faced while understanding and implementing the algorithm; these were overcome by reading the literature on SOM and studying existing SOM code libraries available online. The inspiration for this project was taken from the original Kohonen map and various implementations of the SOM in other programming languages. In this module, the theoretical concepts are explained using graphics to assist the readers of this open-source material.

References

[1] Kohonen, T. 2012. Self-organizing maps. Springer Science & Business Media.
[2] Self Organizing Map Tutorial System by Jae-Wook Ahn and Sue Yeon Syn.
http://www.pitt.edu/is2470pb/Spring05/FinalProjects/Group1a/tutorial/som.html
[3] Self Organizing Maps: Fundamentals. https://www.cs.bham.ac.uk/jxb/NN/l16.pdf
[4] Self-Organizing Map (SOM) by Jaakko Hollmen. https://users.ics.aalto.fi/jhollmen/dippa/node20.html
[5] Self-organizing Maps by Kevin Pang. https://www.cs.hmc.edu/kpang/nn/som.html
[6] UCLA Graduate School Admissions Data. https://stats.idre.ucla.edu/stat/data/binary.csv
[7] Aphalo, Pedro J. (2015) The r4photobiology suite. UV4Plants Bulletin, 2015:1, 21-29. DOI: 10.19232/uv4pb.2015.1.14
[8] Winston Chang, Javier Luraschi and Timothy Mastny (2020). profvis: Interactive Visualizations for Profiling R Code. R package version 0.3.7. https://CRAN.R-project.org/package=profvis
