Open Source For Geeks: March 2014

Wednesday, 26 March 2014

Find the Number Occurring Odd Number of Times

Question :

You are given an array of integers. All integers in the array appear exactly twice except one integer. Your goal is to find and return that integer. For example, given the array 1,2,3,4,4,3,2,1,7,9,9 you have to return 7.

Approach :

When I first encountered this question in an interview, I said we could use a HashMap: iterate over the array and store each integer in the map, with the integer itself as the key and the number of times it occurs in the array as the value. Then iterate over the map and return the key whose value is one.

Time Complexity -> O(N)

Space Complexity -> O(N)
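A minimal Java sketch of this HashMap approach. The class and method names are my own, hypothetical choices; it returns the first key with a count of one and throws if there is none.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapSingleFinder {

    static int findSingle(int[] array) {
        // First pass: count how many times each value occurs
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : array) {
            Integer c = counts.get(value);
            counts.put(value, c == null ? 1 : c + 1);
        }
        // Second pass: return the value that occurs exactly once
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("no element occurs exactly once");
    }

    public static void main(String[] args) {
        System.out.println(findSingle(new int[]{1, 2, 3, 4, 4, 3, 2, 1, 7, 9, 9}));   // 7
    }
}
```

The map holds up to N/2 + 1 entries, which is where the O(N) space comes from.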

But there is a better approach, one that needs no extra space.

XOR all the elements together. The result is the number that occurs an odd number of times, i.e. 7 in our case.

Later I found out that this is a very common interview question. The general version is:

Given an array of positive integers, all numbers occur an even number of times except one number, which occurs an odd number of times. Find that number in O(n) time and constant space. (GeeksForGeeks)

You can see a simple C solution at the above link. The code below is for the original question I encountered: simply XOR all the values and return the result.

Code :

package Arrays;

/**
 * Created by Aniket on 3/26/14.
 */
public class SingleCountElementFinder {

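    public static int returnNumber(int[] array){

        // XOR is commutative and associative, so equal pairs cancel out (x ^ x == 0)
        // whatever their order. Starting from 0 (x ^ 0 == x) also avoids an
        // ArrayIndexOutOfBoundsException on an empty array.
        int no = 0;

        for(int i=0;i<array.length;i++){
            no = no ^ array[i];
        }

        return no;

    }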


    public static void main(String args[]){

        int[] array = new int[]{1,2,3,4,4,3,2,1,7,9,9};
        System.out.println("Single occurring element : " + SingleCountElementFinder.returnNumber(array));

    }

}

Output :

Single occurring element : 7

NOTE: If you did not know already, XOR gives 1 for a bit position where the two inputs differ and 0 where they are the same. So XORing a number with itself gives 0, and XORing any number with 0 gives back that number. Because XOR is also commutative and associative, the order of the elements does not matter: each pair of equal numbers cancels out to 0, and what remains is the single unpaired number.

Sunday, 16 March 2014

Processes and Threads in Linux

Linux Processes

In its most basic form, a Linux process can be visualized as a running instance of a program. For example, just open a text editor on your Linux box and a text editor process will be born.

The first command (gedit &) opens a gedit window, while the second command (ps -aef | grep gedit) checks whether there is an associated process: ps lists the running processes, and its output is piped to grep, which searches for the required token, i.e. gedit in our case. In the result you can see that there is a process associated with gedit.

You can see two entries matching the word gedit among the processes. Each process in Linux is associated with a unique PID (process ID). In the screenshot above, the number in the 2nd column is the PID of the process, so gedit has a PID of 2343. So what is 2039? It is the PPID (parent process ID). We ran the gedit command in a terminal instance, so the terminal is the parent of all the processes we run from it, gedit being one of them. How do we verify that 2039 is indeed the PID of the parent terminal process? To find the PID of the terminal, you can simply type ps in your terminal.

You can see PID 2039 corresponding to the bash process, which is our terminal. I use bash, but you may be using another shell like ksh, csh etc. To find which shell you are using, you can refer to one of my earlier posts:

How to find current Shell in Linux?
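The same PID/PPID relationship can be observed from Java. This sketch assumes Java 9 or later (for the ProcessHandle API) and a Unix-like system with a sleep command on the PATH; the class name is hypothetical. The JVM plays the role of the terminal and a sleep child plays the role of gedit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

class PidPpidDemo {

    // Starts a short-lived child and returns the PPID the OS reports for it,
    // or -1 if the parent cannot be determined.
    static long childsParentPid() {
        try {
            Process child = new ProcessBuilder("sleep", "5").start();
            try {
                return child.toHandle().parent().map(ProcessHandle::pid).orElse(-1L);
            } finally {
                child.destroy();                        // do not leave the child running
                child.waitFor(5, TimeUnit.SECONDS);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();
        System.out.println("JVM PID      : " + self.pid());
        // Just as gedit's PPID was bash, the JVM's parent is whatever launched it
        System.out.println("JVM PPID     : "
                + self.parent().map(p -> String.valueOf(p.pid())).orElse("unknown"));
        System.out.println("child's PPID : " + childsParentPid());
    }
}
```

The child's PPID should match the JVM's own PID, in the same way gedit's PPID matched the PID of bash.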

How PIDs are assigned to process?

As per Wikipedia:

Under Unix, process IDs are usually allocated on a sequential basis, beginning at 0 and rising to a maximum value which varies from system to system. Once this limit is reached, allocation restarts at zero and again increases. However, for this and subsequent passes any PIDs still assigned to processes are skipped.

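But there is a small update to the above for Linux. The kernel hands out PIDs sequentially up to a maximum (PID_MAX_DEFAULT by default), and when the counter wraps around it restarts allocation at RESERVED_PIDS rather than at 1, because the lowest PIDs are typically held by kernel threads and system daemons started at boot. The maximum can be changed at runtime through /proc/sys/kernel/pid_max.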
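To make the wrap-around rule concrete, here is a toy Java model of sequential PID allocation. It is not kernel code: the class name is hypothetical, and the constructor arguments merely stand in for RESERVED_PIDS and the PID maximum, shrunk to tiny values so the wrap is easy to follow.

```java
import java.util.HashSet;
import java.util.Set;

class ToyPidAllocator {

    private final int reserved;   // stand-in for RESERVED_PIDS
    private final int pidMax;     // stand-in for the PID maximum (exclusive)
    private final Set<Integer> inUse = new HashSet<>();
    private int last = 0;         // most recently handed-out PID

    ToyPidAllocator(int reserved, int pidMax) {
        this.reserved = reserved;
        this.pidMax = pidMax;
    }

    int allocate() {
        int candidate = last;
        for (int tries = 0; tries < pidMax; tries++) {
            candidate++;
            if (candidate >= pidMax) {
                candidate = reserved;          // wrap around, skipping the low PIDs
            }
            if (!inUse.contains(candidate)) {  // PIDs still in use are skipped
                inUse.add(candidate);
                last = candidate;
                return candidate;
            }
        }
        throw new IllegalStateException("no free PIDs");
    }

    void release(int pid) {
        inUse.remove(pid);
    }

    public static void main(String[] args) {
        ToyPidAllocator a = new ToyPidAllocator(3, 8);
        for (int i = 0; i < 7; i++) System.out.print(a.allocate() + " ");  // 1 2 3 4 5 6 7
        a.release(4);
        System.out.println("| after wrap: " + a.allocate());              // 4
    }
}
```

Note that once the counter has wrapped, a freed low PID such as 1 or 2 is never handed out again, because allocation restarts at the reserved boundary.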

Each process has a priority, which the kernel uses when deciding which process to context switch to. A process can be preempted if a process with higher priority is ready to be executed.

For example, if a process is waiting for a system resource, like some text from a file on disk, the kernel can schedule another ready process and come back to the waiting process when the data is available. This keeps the ball rolling for the operating system as a whole and gives the user the feeling that tasks are being run in parallel.

Processes can talk to other processes using inter-process communication (IPC) methods and can share data using techniques like shared memory.

How processes are created in Linux?

In Linux, fork() is used to create new processes. These new processes are called child processes. Each child process initially shares all the segments (text, stack, heap etc.) with its parent until either of them tries to modify the stack or heap; at that point a separate copy of the affected memory is made, so that the changes remain private to the process that made them (this is known as copy-on-write). The text segment is read-only, so parent and child keep sharing the same text segment. The C fork function article explains more about fork().

Step By Step

The fork() system call does the following in a UNIX system:

Allocates a slot in the process table for the new process.
Assigns a unique process ID to the new process.
Makes a copy of the process image of the parent, with the exception of shared memory.
Increases the counters for any files owned by the parent, to reflect that an additional process now also owns these files.
Puts the child process in the ready-to-run state.
Returns the process ID (PID) of the child to the parent process and 0 to the child process.

Note: All this work is done in kernel space, in the context of the parent process.

The diagram above shows the process table and how each entry in it points to a process image.

A process image consists of:

User data
User program
System stack (kernel space)
Process control block (PCB) containing the process attributes

The PCB looks like the following:

It holds the process state (e.g. ready to run, sleeping, preempted etc.), the process number or PID which we talked about earlier, registers, the program counter (PC), file descriptors etc.
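The fork() steps and the PCB described above can be sketched as a toy Java model. Everything here (ToyFork, Pcb, the string-keyed map standing in for the process image and the string list standing in for open files) is hypothetical and heavily simplified: the real kernel works with page tables and file descriptor tables, and Linux defers the image copy with copy-on-write instead of copying eagerly as this sketch does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyFork {

    enum State { READY_TO_RUN, RUNNING, SLEEPING }

    // Simplified PCB holding only a few of the attributes a real one has
    static final class Pcb {
        final int pid;
        final int ppid;
        State state;
        final Map<String, Integer> userData;  // stands in for the process image
        final List<String> openFiles;         // stands in for the file descriptors

        Pcb(int pid, int ppid, State state, Map<String, Integer> userData, List<String> openFiles) {
            this.pid = pid;
            this.ppid = ppid;
            this.state = state;
            this.userData = userData;
            this.openFiles = openFiles;
        }
    }

    final Map<Integer, Pcb> processTable = new HashMap<>();
    final Map<String, Integer> fileRefCounts = new HashMap<>();
    private int nextPid = 1;

    // Creates a first process (no parent, so ppid 0) owning the given files
    Pcb createInit(List<String> files) {
        Pcb init = new Pcb(nextPid++, 0, State.RUNNING, new HashMap<String, Integer>(), new ArrayList<>(files));
        for (String f : files) fileRefCounts.merge(f, 1, Integer::sum);
        processTable.put(init.pid, init);
        return init;
    }

    // Returns {value seen by the parent, value seen by the child}
    int[] fork(int parentPid) {
        Pcb parent = processTable.get(parentPid);
        int childPid = nextPid++;                                    // table slot + unique PID
        Pcb child = new Pcb(childPid, parentPid,
                State.READY_TO_RUN,                                  // ready-to-run state
                new HashMap<>(parent.userData),                      // copy of the parent's image
                new ArrayList<>(parent.openFiles));
        for (String f : child.openFiles) {
            fileRefCounts.merge(f, 1, Integer::sum);                 // one more owner per file
        }
        processTable.put(childPid, child);
        return new int[]{childPid, 0};                               // child's PID to parent, 0 to child
    }
}
```

Because the child gets its own copy of the map, a write in the child is invisible to the parent, which is the observable effect copy-on-write also guarantees.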

Process States

From the forking (birth) of a process to its end (resources freed up and entry removed from the process table), a process goes through various states. The diagram below shows the state chart of a process in UNIX.
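A much simplified version of such a state chart can be written as a Java enum with allowed transitions. The state names and transitions here are my own simplification, not the full UNIX chart, which also has swapped-out variants and separate user/kernel running states.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical process state model
enum ProcState {
    CREATED,   // just forked
    READY,     // ready to run, waiting for the CPU
    RUNNING,
    SLEEPING,  // blocked, e.g. waiting for disk I/O
    ZOMBIE;    // exited; entry stays in the process table until the parent collects it

    Set<ProcState> next() {
        switch (this) {
            case CREATED:  return EnumSet.of(READY);
            case READY:    return EnumSet.of(RUNNING);
            case RUNNING:  return EnumSet.of(READY, SLEEPING, ZOMBIE);  // preempted, blocks, or exits
            case SLEEPING: return EnumSet.of(READY);                    // awaited event occurred
            default:       return EnumSet.noneOf(ProcState.class);      // ZOMBIE: only removal remains
        }
    }

    boolean canMoveTo(ProcState target) {
        return next().contains(target);
    }
}
```

Note that a sleeping process does not go straight back to running: when its event occurs it becomes ready, and the scheduler decides when it runs.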

Threads in Linux

Threads in Linux are nothing but flows of execution within a process. A process containing multiple execution flows is known as a multi-threaded process.

A process that is not multi-threaded has only one execution flow, the main one, and hence it is also known as a single-threaded process. The Linux kernel has no separate concept of a thread: it views each thread as a separate process, but these processes are somewhat different from normal processes. I will explain the difference in the following paragraphs.

Threads are often confused with light weight processes, or LWPs. The reason dates back to the time when Linux supported threads at user level only. This meant that even a multi-threaded application was viewed by the kernel as a single process. This posed big challenges for the library that managed these user-level threads, because it had to make sure that a blocking call issued by one thread did not stall the execution of all the other threads.

Later on the implementation changed, and a kernel process was attached to each thread so that the kernel could take care of scheduling it. But, as discussed earlier, the Linux kernel does not see them as threads: each thread is viewed as a process inside the kernel. These processes are known as light weight processes.

The main difference between a light weight process (LWP) and a normal process is that LWPs share the same address space and other resources like open files. Because these resources are shared, such processes are considered light weight compared to normal processes, hence the name.

So, effectively, threads and light weight processes are the same thing. It's just that thread is the term used at user level, while light weight process is the term used at kernel level.

From an implementation point of view, threads in Linux are created using functions exposed by the POSIX-compliant pthread library. Internally, clone() is used to create both normal and light weight processes: to create a normal process, fork() is used, which in turn calls clone() with appropriate arguments, while to create a thread or LWP, a pthread library function calls clone() with the relevant flags. So the main difference comes from the different flags passed to clone().
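Java threads make the shared address space easy to see. On Linux, the HotSpot JVM maps each Java thread to a native thread, so each one is an LWP of the JVM process. In this sketch (class name hypothetical), two threads increment one counter object that lives on the shared heap; while such a program runs, ps -eLf should list several LWPs under a single PID for the JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedMemoryThreads {

    // Both threads update the same heap object because threads share
    // their process's address space.
    static int runTwoThreads(final int incrementsPerThread) {
        final AtomicInteger counter = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.incrementAndGet();   // atomic, so concurrent updates are not lost
            }
        };
        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000));   // 200000
    }
}
```

Two separate processes created with fork() would each get their own copy of the counter instead, and neither would see the other's increments.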

PIDs - User and Kernel View

In the kernel, each thread has its own ID, called a PID (although it would arguably make more sense to call it a TID), and it also has a TGID (thread group ID), which is the PID of the thread that started the whole process.

Simplistically, when a new process is created, it appears as a thread where both the PID and TGID are the same (new) number.

When a thread starts another thread, that started thread gets its own PID (so the scheduler can schedule it independently) but it inherits its TGID from the thread that created it.

That way, the kernel can happily schedule threads independently of which process they belong to, while processes (thread group IDs) are what get reported to you.

For example, refer to the following diagram:

You can see that starting a new process gives you a new PID and a new TGID (both set to the same value), while starting a new thread gives you a new PID while maintaining the same TGID as the thread that started it.
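The PID/TGID rule from the diagram can be modeled in a few lines of Java. This is a toy model with hypothetical names, not how the kernel stores tasks; it only reproduces the two rules: a new process gets a fresh ID used for both PID and TGID, and a new thread gets a fresh PID but inherits its creator's TGID.

```java
class ToyTaskIds {

    static final class Task {
        final int pid;   // per-task ID (what user space calls a TID)
        final int tgid;  // thread group ID (what user space calls the PID)

        Task(int pid, int tgid) {
            this.pid = pid;
            this.tgid = tgid;
        }

        @Override
        public String toString() {
            return "PID=" + pid + " TGID=" + tgid;
        }
    }

    private int nextId;

    ToyTaskIds(int firstId) {
        this.nextId = firstId;
    }

    Task newProcess() {
        int id = nextId++;
        return new Task(id, id);                 // both set to the same new value
    }

    Task newThread(Task creator) {
        return new Task(nextId++, creator.tgid); // new PID, inherited TGID
    }

    public static void main(String[] args) {
        ToyTaskIds ids = new ToyTaskIds(42);
        Task p = ids.newProcess();
        Task t = ids.newThread(p);
        System.out.println(p + " | " + t);       // PID=42 TGID=42 | PID=43 TGID=42
    }
}
```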

PS : I picked up some basic knowledge from thegeekstuff and added some extra points and diagrams to make it more easily understandable.

Friday, 14 March 2014

Iterative binary tree traversal

Background

In the last post, Binary Tree Traversal, we saw the recursive method to print a tree. We saw the DFS (depth first search) approach, which includes pre-order, post-order and in-order traversal, and the BFS (breadth first search) approach, which includes level-order traversal.

In this post we will see an iterative way of implementing the DFS approach. The implementation is very simple and uses a stack data structure.

Code :

package Tree;

import java.util.Stack;

/**
 * Created by Aniket on 3/14/14.
 */
public class IterativeTreePrinter {

    public static void printIterativePreOrderTraversal(TreeNode root){

        Stack<TreeNode> stack = new Stack<TreeNode>();

        while(root != null){
            System.out.println("Data : " + root.getData());
            // push the right child before the left one so the left subtree is popped (visited) first
            if(root.getRightNode() != null){
                stack.push(root.getRightNode());
            }
            if(root.getLeftNode() != null){
                stack.push(root.getLeftNode());
            }

            if(!stack.isEmpty()){
                root = stack.pop();
            }
            else {
                root = null;
            }
        }
    }


    public static void printIterativeInOrderTraversal(TreeNode root){

        Stack<TreeNode> stack = new Stack<TreeNode>();

        while(!stack.isEmpty() || root != null){
            if(root != null){
                stack.push(root);
                root = root.getLeftNode();
            }
            else {
                root = stack.pop();
                System.out.println("Data : " + root.getData());
                root = root.getRightNode();
            }
        }
    }

    public static void printIterativePostOrder(TreeNode root){

        Stack<TreeNode> stack = new Stack<TreeNode>();
        TreeNode peekNode = null;
        TreeNode lastVisitedNode = null;

        while(!stack.isEmpty() || root != null){

            if(root != null){
                stack.push(root);
                root = root.getLeftNode();
            }
            else {
                peekNode = stack.peek();
                if(peekNode.getRightNode() != null && peekNode.getRightNode() != lastVisitedNode){
                    root = peekNode.getRightNode();
                }
                else {
                    stack.pop();
                    System.out.println("Data : " + peekNode.getData());
                    lastVisitedNode = peekNode;
                }

            }


        }
    }

}

Output :

The output takes a lot of lines, since I am printing each value on its own line, so I am skipping it here. I have tested and verified the code; the output is the same as that of the recursive method. You can check the output provided in the post Binary Tree Traversal.
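For a quick check without the TreeNode class from the earlier post, here is a self-contained sketch of the same three iterative traversals. It assumes a minimal, hypothetical Node class in place of TreeNode, collects values into lists instead of printing them, and uses ArrayDeque instead of java.util.Stack, since the Stack Javadoc recommends Deque implementations for LIFO use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeTraversalCheck {

    // Hypothetical stand-in for the post's TreeNode
    static final class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
    }

    static List<Integer> preOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            out.add(n.data);
            if (n.right != null) stack.push(n.right);  // right first so left is popped first
            if (n.left != null) stack.push(n.left);
        }
        return out;
    }

    static List<Integer> inOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root;
        while (cur != null || !stack.isEmpty()) {
            if (cur != null) { stack.push(cur); cur = cur.left; }             // go left as far as possible
            else { cur = stack.pop(); out.add(cur.data); cur = cur.right; }   // visit, then go right
        }
        return out;
    }

    static List<Integer> postOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root, lastVisited = null;
        while (cur != null || !stack.isEmpty()) {
            if (cur != null) { stack.push(cur); cur = cur.left; }
            else {
                Node peek = stack.peek();
                if (peek.right != null && peek.right != lastVisited) cur = peek.right;  // right subtree pending
                else { stack.pop(); out.add(peek.data); lastVisited = peek; }          // both subtrees done
            }
        }
        return out;
    }

    //        1
    //       / \
    //      2   3
    //     / \
    //    4   5
    static Node sampleTree() {
        return new Node(1, new Node(2, new Node(4, null, null), new Node(5, null, null)), new Node(3, null, null));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(preOrder(root));   // [1, 2, 4, 5, 3]
        System.out.println(inOrder(root));    // [4, 2, 5, 1, 3]
        System.out.println(postOrder(root));  // [4, 5, 2, 3, 1]
    }
}
```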

Saturday, 8 March 2014

What's new in Java 7?

I know it is kind of late to write this post, considering that Java 7, a major update, was made available to developers on July 28, 2011. We have seen a lot of updates and patches for it since then; in fact, Java 8 is due on 18 March 2014. However, I thought it would be a good idea to list some of the major features introduced in Java 7.

Strings in switch statement

From Java 7 you can use a String in switch statements. The switch statement compares the String object in its expression with the expressions associated with each case label as if it were using the String.equals method; consequently, the comparison of String objects in switch statements is case sensitive. The Java compiler generally generates more efficient bytecode from switch statements that use String objects than from chained if-then-else statements. (More on Switch statement)

Example

import java.util.Scanner;

public class StringSwitchExample {

    public static void main(String args[]){

        System.out.println("Enter your country");
        Scanner scanner = new Scanner(System.in);
        String input = scanner.nextLine();

        // case labels are matched with String.equals, so "usa" would fall to default
        switch(input){

            case "USA" :
                System.out.println("You are from USA");
                break;
            case "India" :
                System.out.println("You are from India");
                break;
            default :
                System.out.println("You are from " + input);
        }
    }
}

The try-with-resources Statement

The try-with-resources statement is a try statement that declares one or more resources. A resource is an object that must be closed after the program is finished with it. The try-with-resources statement ensures that each resource is closed at the end of the statement. Any object that implements java.lang.AutoCloseable, which includes all objects which implement java.io.Closeable, can be used as a resource.

Prior to Java 7, when this feature was not available, programmers used to close resources in the finally block. For example, let's say you need to write a function that takes a file path as an argument and returns the first line of that file. Prior to Java 7 you would write:
```
static String readFirstLineFromFileWithFinallyBlock(String path) throws IOException {
  BufferedReader br = new BufferedReader(new FileReader(path));
  try {
    return br.readLine();
  } finally {
    if (br != null) br.close();
  }
}
```
but now you can do
```
static String readFirstLineFromFile(String path) throws IOException {
  try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    return br.readLine();
  }
}
```
Simple and elegant!

Note: A try-with-resources statement can have catch and finally blocks just like an ordinary try statement. In a try-with-resources statement, any catch or finally block is run after the resources declared have been closed.

Note : The close methods of resources are called in the opposite order of their creation.

Note: In a plain try-catch-finally statement, if the try block (or a catch block) throws an exception and the finally block also throws an exception, the exception from the try block is discarded and the one from the finally block is thrown.

Note: Java 7 try-with-resources statements add a twist, since closing a resource can throw an exception too. As before, an exception thrown by a finally block replaces all other exceptions. Otherwise, if exceptions are thrown both by the try block and by closing a resource, the one from the try block takes precedence, and the exceptions from close() are attached to it as suppressed exceptions (retrievable with Throwable.getSuppressed()). However, if an exception is thrown while initializing a resource in the try-with-resources header, the try block will not execute at all (any resources already initialized are still closed).

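The precedence rules in the notes above can be checked with a small program. FailingResource is a hypothetical resource whose close() always throws. With two such resources and a failing try block, the try block's exception is the one that propagates, and both close() failures are attached to it as suppressed exceptions, in reverse order of creation.

```java
class SuppressedDemo {

    // Hypothetical resource whose close() always fails
    static class FailingResource implements AutoCloseable {
        final String name;

        FailingResource(String name) {
            this.name = name;
        }

        @Override
        public void close() throws Exception {
            throw new Exception("close failed: " + name);
        }
    }

    static Exception run() {
        Exception caught = null;
        try (FailingResource a = new FailingResource("a");
             FailingResource b = new FailingResource("b")) {
            throw new IllegalStateException("body failed");
        } catch (Exception e) {
            caught = e;   // b and a have already been closed at this point
        }
        return caught;
    }

    public static void main(String[] args) {
        Exception e = run();
        System.out.println("thrown: " + e.getMessage());
        for (Throwable s : e.getSuppressed()) {
            System.out.println("suppressed: " + s.getMessage());
        }
        // thrown: body failed
        // suppressed: close failed: b
        // suppressed: close failed: a
    }
}
```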
Catching Multiple Exception Types

In Java SE 7 and later, a single catch block can handle more than one type of exception. This feature can reduce code duplication and lessen the temptation to catch an overly broad exception.

Consider the following example, which contains duplicate code in each of the catch blocks:
```
catch (IOException ex) {
     logger.log(ex);
     throw ex;
} catch (SQLException ex) {
     logger.log(ex);
     throw ex;
}
```
The following example, which is valid in Java SE 7 and later, eliminates the duplicated code:
```
catch (IOException|SQLException ex) {
    logger.log(ex);
    throw ex;
}
```
The catch clause specifies the types of exceptions that the block can handle, and each exception type is separated with a vertical bar (|).

Note: If a catch block handles more than one exception type, then the catch parameter is implicitly final. In this example, the catch parameter ex is final and therefore you cannot assign any values to it within the catch block.

Note : In a multi-catch block, you cannot combine catch handlers for two exceptions that share a base- and derived-class relationship. You can only combine catch handlers for exceptions that do not share the parent-child relationship between them.

Bytecode generated by compiling a catch block that handles multiple exception types will be smaller (and thus superior) than compiling many catch blocks that handle only one exception type each. A catch block that handles multiple exception types creates no duplication in the bytecode generated by the compiler; the bytecode has no replication of exception handlers.
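A runnable example of a multi-catch block. NumberFormatException and ArithmeticException are both runtime exceptions but neither extends the other, so they can legally share one handler; the class and method names are hypothetical.

```java
class MultiCatchDemo {

    static String classify(String numerator, String denominator) {
        try {
            int result = Integer.parseInt(numerator) / Integer.parseInt(denominator);
            return "ok: " + result;
        } catch (NumberFormatException | ArithmeticException ex) {
            // ex is implicitly final here, so "ex = null;" would not compile
            return "failed: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("10", "2"));    // ok: 5
        System.out.println(classify("ten", "2"));   // failed: NumberFormatException
        System.out.println(classify("10", "0"));    // failed: ArithmeticException
    }
}
```

Listing, say, IllegalArgumentException | NumberFormatException instead would be rejected by the compiler, because NumberFormatException is a subclass of IllegalArgumentException.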

More details on this part.

JDBC

Java 7 ships with JDBC 4.1. JDBC-related features you get with it include:
- You no longer have to explicitly load the driver class using Class.forName(driver); drivers found on the class path are loaded automatically. (Strictly speaking, this arrived with JDBC 4.0 in Java 6.)
- You can use a try-with-resources statement to automatically close resources of type Connection, ResultSet, and Statement.
- The RowSetFactory interface and the RowSetProvider class let you create all types of row sets supported by your JDBC driver.

Interned Strings are allocated in the heap rather than in PermGen

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

Garbage-First Collector (G1)

The Garbage-First (G1) garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.
Java File I/O (NIO.2)

Java 7 introduced a new set of I/O APIs called NIO.2. You can read about it in the detailed post Java File I/O (NIO.2).
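As a small taste of NIO.2, here is the earlier readFirstLineFromFile example rewritten with java.nio.file.Files and Path, both introduced in Java 7. The class name, temporary file and its contents are made up for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class Nio2FirstLine {

    static String readFirstLine(Path path) throws IOException {
        // Files.newBufferedReader replaces new BufferedReader(new FileReader(path))
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio2-demo", ".txt");
        Files.write(tmp, Arrays.asList("first line", "second line"), StandardCharsets.UTF_8);
        System.out.println(readFirstLine(tmp));   // first line
        Files.delete(tmp);
    }
}
```

Unlike FileReader, which uses the platform default charset, the NIO.2 calls here take an explicit charset.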


Monday, 3 March 2014

HeapSort in Java

Heap sort sorts an array in place with time complexity O(N log N). Following is the code to do so.

Binary Heap

A binary heap has the following two properties:

Structural property: All levels except the last are full, and the last level is filled from the left.
Heap property: In a min heap, the priority of a node is at least as large as its parent's. Similarly, in a max heap (which the code below uses), the priority of a parent is at least as large as that of its children.
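To make the heap property concrete, here is a small checker for a max heap stored in an array. It uses 0-based indexes (children of i at 2i+1 and 2i+2), whereas the HeapSort code below uses 1-based positions (2i and 2i+1) and subtracts 1 when accessing the array. The class name is hypothetical.

```java
class MaxHeapCheck {

    // True if every element is at least as large as each of its children
    static boolean isMaxHeap(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < a.length && a[left] > a[i]) return false;
            if (right < a.length && a[right] > a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMaxHeap(new int[]{16, 14, 10, 8, 7, 9, 3, 2, 4, 1}));  // true
        System.out.println(isMaxHeap(new int[]{4, 1, 3, 2, 16, 9, 10, 14, 8, 7}));  // false
    }
}
```

The structural property needs no check here: an array with no gaps is always a complete tree filled from the left.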

Code :

package Sorts;

import java.util.Arrays;

/**
 * Created by Aniket on 3/3/14.
 */
public class HeapSort {

    public int getLeft(int i){
        return 2*i;
    }

    public int getRight(int i){
        return 2*i+1;
    }

    public int getParent(int i){
        return i/2;
    }

    public void maxHeapify(int[]  array, int position){

        int leftPosition = getLeft(position);
        int rightPosition = getRight(position);

        int largest = 0;

        if(leftPosition <= array.length && array[leftPosition-1] > array[position-1]){
            largest = leftPosition;
        }
        else {
            largest = position;
        }
        if(rightPosition <= array.length && array[rightPosition-1] > array[largest-1]){
            largest = rightPosition;
        }

        if(largest != position){
            swap(array,position-1,largest-1);
            maxHeapify(array, largest);
        }
    }

    public void maxHeapifyForSort(int[]  array, int position, int maxPos){

        int leftPosition = getLeft(position);
        int rightPosition = getRight(position);

        int largest = 0;

        if(leftPosition <= maxPos && array[leftPosition-1] > array[position-1]){
            largest = leftPosition;
        }
        else {
            largest = position;
        }
        if(rightPosition <= maxPos && array[rightPosition-1] > array[largest-1]){
            largest = rightPosition;
        }

        if(largest != position){
            swap(array,position-1,largest-1);
            maxHeapifyForSort(array, largest,maxPos);
        }
    }

    public void buildMaxHeap(int[] array){

        for(int i=(array.length)/2;i>0;i--){
            maxHeapify(array,i);
        }

    }

    public void swap(int[] array, int i, int j){
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }

    public void heapSort(int[] array){
        buildMaxHeap(array);
        for(int i=array.length;i>0;i--){
            swap(array,0,i-1);
            maxHeapifyForSort(array,1,i-1);
        }
    }

    public static void main(String args[]){

        int[] array = new int[]{4,1,3,2,16,9,10,14,8,7};
        System.out.println("Original Array : " + Arrays.toString(array));
        //new HeapSort().buildMaxHeap(array);
        new HeapSort().heapSort(array);
        System.out.println("After Sorting Array : " + Arrays.toString(array));


    }

}

Output :

Original Array : [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
After Sorting Array : [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]

NOTE: I have shown two similar methods here, maxHeapify(int[] array, int position) and maxHeapifyForSort(int[] array, int position, int maxPos), but for sorting you only need the second one. If your goal is only to build a max heap, the first one will do just fine. During the sort, each iteration swaps the current maximum to the end of the array, and those trailing positions must no longer be treated as part of the heap; hence heapify needs a limit position (maxPos) telling it how far it should operate. What heapify does is restore the max-heap property: every element must be at least as large as its children.

NOTE: Heap sort is an in-place algorithm (an in-place algorithm transforms its input without using an auxiliary data structure).

Applications: Heaps are generally used to implement priority queues (min or max heap), so that you can read the min or max element in O(1). Insertion takes O(log N), as does removing the top element. Heaps can also be used to track real-time data, say, to keep the scores of the top 3 people in a game while their scores change in real time.
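For the top-3 scores idea, java.util.PriorityQueue (a binary min heap) is enough: keep at most three scores with the smallest of them at the head, and replace the head whenever a bigger score arrives. This is a sketch with a hypothetical TopScores class that treats scores as a stream of new values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopScores {

    private final int k;
    private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();  // head = smallest of the top k

    TopScores(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    void offer(int score) {
        if (minHeap.size() < k) {
            minHeap.offer(score);             // O(log k)
        } else if (score > minHeap.peek()) {  // peek is O(1)
            minHeap.poll();                   // evict the current k-th best
            minHeap.offer(score);
        }
    }

    // Current top k, highest first
    List<Integer> best() {
        List<Integer> out = new ArrayList<>(minHeap);
        Collections.sort(out, Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        TopScores top = new TopScores(3);
        for (int s : new int[]{50, 10, 70, 30, 90, 60}) top.offer(s);
        System.out.println(top.best());   // [90, 70, 60]
    }
}
```

If an existing player's score changes rather than a new score arriving, PriorityQueue has no decrease-key operation, so you would have to remove(Object) the old entry (linear in the queue size) and offer the new one.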
