Open Source For Geeks: Merge Sort in java

Friday, 21 February 2014

Merge Sort in java

Background

Mergesort is a divide and conquer algorithm that was invented by John von Neumann in 1945.

Conceptually it works as follows -

Divide the unsorted list into n sublists, each containing 1 element (a list of 1 element is considered sorted).
Repeatedly merge sublists to produce newly sorted sublists until there is only 1 sublist remaining. This will be the sorted list.

Code :

import java.util.Arrays;

/**
 * Created with IntelliJ IDEA.
 * User: aniket
 * Date: 12/2/14
 * Time: 11:24 AM
 */
public class MergeSort {

    public void sort(int[] array) {
        mergeSort(array,0,array.length-1);

    }

    private void mergeSort(int[] array, int start, int end){

        if(start == end){
            return;
        }
        int mid = (start + end)/2;
        mergeSort(array, start, mid);
        mergeSort(array, mid+1, end);
        merge(array,start,mid,end);

    }

    private void merge(int[] array, int start, int mid, int end){

        int[] leftArray = Arrays.copyOfRange(array,start,mid+1);
        int[] rightArray = Arrays.copyOfRange(array,mid+1,end+1);

        int i = 0;
        int j = 0;

        int counter = start;

        while(i<leftArray.length && j<rightArray.length){
            if(leftArray[i] < rightArray[j]){
                array[counter] = leftArray[i];
                i++;
            }
            else {
                array[counter] = rightArray[j];
                j++;
            }
            counter++;
        }

        if(i == leftArray.length){
            while(j != rightArray.length){
                array[counter] = rightArray[j];
                j++;
                counter++;
            }
        }
        else{//j == rightArray.length
            while(i != leftArray.length){
                array[counter] = leftArray[i];
                i++;
                counter++;
            }

        }

    }

    public static void main(String args[]){

        int[] array = new int[]{2,7,5,9,4,7,1,0};
        System.out.println("Array Before Sort : " + Arrays.toString(array));
        new MergeSort().sort(array);
        System.out.println("Array after Sort : " + Arrays.toString(array));

    }

}

Output :

Array Before Sort : [2, 7, 5, 9, 4, 7, 1, 0]
Array after Sort : [0, 1, 2, 4, 5, 7, 7, 9]

Complexity

In sorting n objects, merge sort has an average and worst-case performance of O(n log n).

You can visualize it with the following diagram -

Handling very large data

If the data set is very large (large that what can fit into RAM/main memory) then above approach will not work. In that case, you need to do what is popularly known as -

External sorting

External merge sort is one such external sorting algorithm

Sort chunks that each fit in RAM, then merges the sorted chunks together.
First divide the file into runs such that the size of a run is small enough to fit into main memory. Then sort each run in main memory using merge sort sorting algorithm.
Finally merge the resulting runs together into successively bigger runs, until the file is sorted.

Example -

For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:

Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
Write the sorted data to disk.
Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.
Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally -- because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.

You can use the following approach to sort sorted runs -

Merge k sorted arrays