Friday, February 11, 2011

...Types of Sorting and Sorting Algorithms....


Sorting
Sorting is any process of arranging items in some sequence and/or in different sets, and accordingly, it has two common, yet distinct meanings:
  1. Ordering: arranging items of the same kind, class, nature, etc. in some ordered sequence,
  2. Categorizing: grouping and labeling items with similar properties together (by sorts).

Here, I will present four kinds of sorting: Selection Sort, Bubble Sort, Insertion Sort, and Shell Sort.

SELECTION SORT
Definition:
Selection sort is a simple sorting algorithm that improves on the performance of bubble sort. It works by first finding the smallest element with a linear scan and swapping it into the first position in the list, then finding the second smallest element by scanning the remaining elements, and so on. Selection sort is unusual compared to almost any other algorithm in that its running time is not affected by the initial ordering of the list: it performs the same number of operations regardless, because of its simple structure. Selection sort requires at most (n - 1) swaps and hence O(n) memory writes. Thus selection sort can be very attractive if writes are the most expensive operation, but otherwise it will usually be outperformed by insertion sort or by more sophisticated algorithms.
Selection sort is an attempt to localize the exchanges of array elements by finding a misplaced element first and putting it in its final place.

• The list is divided into two sub lists, sorted and unsorted, which are separated by an imaginary wall.

• We select the smallest element from the unsorted sub list and swap it with the element at the beginning of the unsorted data.

• Then the smallest value among the remaining elements is selected and put in the second position, and so on.

• After each selection and swap, the imaginary wall between the two sub lists moves one element ahead, increasing the number of sorted elements and decreasing the number of unsorted ones.

• Each time we move one element from the unsorted sub list to the sorted sub list, we say that we have completed a sort pass.

• A list of n elements requires n-1 passes to completely rearrange the data.

  • Time Complexity of Selection Sort
§  The number of comparisons and moves is independent of the data (the initial order of elements doesn't matter).
§  Therefore, the best, average and worst case time complexities of Selection Sort are all O(n^2).
  • Space Complexity of Selection Sort
§  Besides the collection itself, the only extra storage for this sort is the single temp variable used in the swap.
§  Therefore, the space complexity of Selection Sort is O(1).


Selection Sort Algorithm  
/* Sorts by selecting smallest element in unsorted
portion of array and exchanging it with element
at the beginning of the unsorted list.
*/
void selectionSort(int list[], int n)
{
    int cur, j, smallest, tempData;
    for (cur = 0; cur < n - 1; cur++){
        smallest = cur;
        for (j = cur + 1; j < n; j++)
            if(list[j] < list[smallest])
                smallest = j;
        // Smallest selected; swap with current element
        tempData = list[cur];
        list[cur] = list[smallest];
        list[smallest] = tempData;
    }
}
The outer loop is executed n - 1 times, and on pass cur the inner loop performs (n - 1 - cur) comparisons.
So the total is the sum of these counts over all passes, and the order works out to O(n^2).
The number of comparisons remains the same in all cases.
There may be some saving in terms of data movements (swaps).
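
Written out as a worked sum (assuming the corrected loop bounds above, where cur runs from 0 to n - 2), the comparison count is:

\[
\sum_{cur=0}^{n-2} (n - 1 - cur) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = O(n^2)
\]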


BUBBLE SORT
Imagine all elements are objects of various sizes and are placed in a vertical column. If the objects are allowed to float, then the smallest element will bubble its way to the top. This is how the algorithm gets its name.

• The list is scanned from the bottom up, and two adjacent elements are interchanged if they are found to be out of order with respect to each other. Then the next pair of adjacent elements is considered, and so on.

• Thus the smallest element is bubbled from the unsorted list to the top of the array.

• After that, the wall moves one element ahead, increasing the number of sorted elements and decreasing the number of unsorted ones.

• Each time an element moves from the unsorted part to the sorted part, one sort pass is completed.

• Given a list of n elements, bubble sort requires up to n-1 passes to sort the data.

• Bubble sort was originally written to "bubble up" the highest element in the list. From an efficiency point of view it makes no difference whether the high element is bubbled or the low element is bubbled.

In some references, bubble sort is described as "sorting by repeated comparison and exchanging." It is easy to remember and to program, and little time is needed to complete a single step. However, this sorting algorithm is normally used only where the value of n is not too big and the programming effort is to be kept to a minimum.



Bubble Sort Algorithm
/* Sorts list using bubble sort. Adjacent elements
are compared and exchanged until list is
completely ordered.
*/
void bubbleSort(int list[], int n)
{
    int cur, j, temp;
    for (cur = 0; cur < n - 1; cur++){
        for (j = n - 1; j > cur; j--)
            if(list[j] < list[j - 1]){
                temp = list[j];
                list[j] = list[j - 1];
                list[j - 1] = temp;
            }
    }
    return;
}

What is the time complexity of the Bubble sort algorithm?

The number of comparisons for the worst case (when the array is sorted in reverse order) equals the number of iterations of the inner loop, i.e. O(n^2). How many comparisons are made in the best case? It turns out that the number of comparisons remains the same.

What about the number of swaps?
In the worst case, the number of movements is 3n(n - 1)/2: there are n(n - 1)/2 swaps, and each swap takes three assignments. In the best case, when all elements are already ordered, no swaps are necessary.
For a randomly ordered array, the number of movements is around half of the worst-case figure.

Compare this with insertion sort. Here each element has to be bubbled one step at a time; it does not go directly to its proper place as in insertion sort. It could be said that insertion sort is about twice as fast as bubble sort in the average case, because bubble sort makes approximately twice as many comparisons.

In summary, we can say that bubble sort, insertion sort and selection sort are not very efficient methods, as their complexity is O(n^2).

INSERTION SORT
The key idea is to pick up a data element and insert it into its proper place in the partial data considered so far. An outline of the algorithm is as follows:
Let the n data elements be stored in an array list[ ]. Then,
for ( cur = 1; cur < n ; cur++ )
move all elements list[ j ] greater than list[ cur ] one position to the right;
place list[ cur ] in its proper position.

Note that the sorting is restricted to only a fraction of the array in each iteration.
The list is divided into two parts: sorted and unsorted.
• In each pass, the first element of the unsorted part is picked up, transferred to the sorted sub list, and inserted at the appropriate place. A list of n elements will take at most n-1 passes to sort the data.


Insertion Sort Algorithm
/* With each pass, first element in unsorted
sub list is inserted into sorted sub list.
*/
void insertionSort(int list[], int n)
{
    int cur, located, temp, j;
    for (cur = 1; cur < n; cur++){
        located = 0;
        temp = list[cur];
        for (j = cur - 1; j >= 0 && !located;)
            if(temp < list[j]){
                list[j + 1] = list[j];
                j--;
            }
            else
                located = 1;
        list[j + 1] = temp;
    }
    return;
}

An advantage of this method is that it moves data only when it is really necessary.
If the array is already in order, no moves are performed.
However, it overlooks the fact that some elements may already be in their proper positions.
When an item is to be inserted, all elements greater than this item have to be moved.
There may be a large number of redundant moves, as a properly located element may be moved
and only later brought back to its position.

The best case is when the data are already in order. Only one comparison is made for each position, so the comparison complexity is O(n), and the data movement is 2n - 1, i.e. it is O(n).

The worst case is when the data are in reverse order.
Each data element has to be moved to a new position, and for that each of the other elements
has to be shifted. This works out to a complexity of O(n^2).
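
As a worked count for this reverse-ordered case (each new element is compared with, and shifted past, every element already in the sorted part):

\[
1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} = O(n^2)
\]

comparisons, and about the same number of element shifts.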

SHELL SORT
 
Shell sort is one of the oldest fast sorting algorithms. Also called diminishing increments sort, it can be seen as an improved insertion sort. It is named after its inventor, Donald Shell. Insertion sort does a fixed amount of work to remove each inversion, while Shell sort can remove several inversions with one exchange. It typically works very well on real data, better than on random sequences. It is fast, easy to understand and easy to implement; however, its complexity analysis is a little more complicated. The idea of Shell sort is the following: arrange the data sequence in a two-dimensional array and sort the columns of that array. The result is that the data sequence is partly sorted.
Its running time is proved to be O(N^(3/2)) in the worst case, and seems on average to be O(N^(5/4)) (Weiss p230). It might even be as good as O(N^(7/6)).

We can consider Shell sort to be an elegant extension of insertion sort that gains speed by allowing exchanges of elements that are far apart. It sorts slices with a particular step h; such a file is said to be h-sorted. If we first h-sort a file using a large value of h, elements move long distances, and the efficiency of h-sorting for smaller values of h improves. When we get to h = 1 we perform a regular insertion sort and thus end with a sorted file. For example, 4-sorting the array 35 33 42 10 14 19 27 44 means independently sorting the interleaved subsequences (35, 14), (33, 19), (42, 27) and (10, 44), which gives 14 19 27 10 35 33 42 44.
But the details of this elegant idea are non-trivial. First of all, the result of h-sorting a file that is already k-sorted is a file that is both h- and k-sorted. And while any sequence of values of h will work as long as it ends with h = 1, some sequences are clearly better than others.


Shell Sort Algorithm
void shellsort(int a[], int n)
{
    int i, j, k, h, v;
    /* Gap sequence; the last gap must be 1 */
    int cols[16] = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                    1968, 861, 336, 112, 48, 21, 7, 3, 1};
    for (k = 0; k < 16; k++)
    {
        h = cols[k];
        /* Insertion sort on the elements that are h apart */
        for (i = h; i < n; i++)
        {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v)
            {
                a[j] = a[j - h];
                j = j - h;
            }
            a[j] = v;
        }
    }
}
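
As a small usage sketch (the sample values and the main function are my own illustration, not from the cited sources), the function above can be called like this, assuming it sits in the same file:

#include <stdio.h>

int main(void)
{
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44};
    int n = sizeof(a) / sizeof(a[0]);

    shellsort(a, n);                 /* sorts the array in place */

    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);         /* prints: 10 14 19 27 33 35 42 44 */
    printf("\n");
    return 0;
}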

References:
  • Cmput 115 - Lecture 9, Department of Computing Science, University of Alberta, © Duane Szafron 2000
  • Keone Hon
  • Owen Astrachan, Computer Science Department, Duke University

Friday, February 4, 2011

...Algorithm...

….Empirical Analysis….

When we say "empirical", it basically means a result obtained from observation or experiment rather than from theory. We can say that something has undergone empirical analysis when the result was obtained by actually performing an experiment, not from mere speculation or hearsay.
As I explored internet sites, I learned that there are steps for Empirical Analysis, such as the following (a timing sketch in C follows the list):
1. Decide on the essential operation to measure.
2. Think about and choose your input sample (range, size, ...).
3. Convert the algorithm to a program.
4. Create a sample of inputs.
5. Execute the program; observe and record the data.
6. Evaluate the data.
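
Here is a rough, minimal sketch of these steps in C. The choice of bubbleSort (from the previous post) as the measured routine, the sample size, and the use of rand() and clock() are my own assumptions, not taken from any source:

/* Empirical analysis sketch: generate a random sample, run a sort
   on it, and record the elapsed time. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void bubbleSort(int list[], int n);   /* defined earlier in this post */

int main(void)
{
    int n = 10000;                     /* step 2: chosen sample size */
    int *data = malloc(n * sizeof(int));
    if (data == NULL)
        return 1;

    srand((unsigned) time(NULL));      /* step 4: create a sample of inputs */
    for (int i = 0; i < n; i++)
        data[i] = rand();

    clock_t start = clock();           /* step 5: execute and observe */
    bubbleSort(data, n);
    clock_t end = clock();

    printf("n = %d, time = %f seconds\n",
           n, (double)(end - start) / CLOCKS_PER_SEC);

    free(data);
    return 0;
}

Running it for several values of n and comparing the recorded times would complete step 6.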



To establish the quantity of resources, such as time and storage, necessary to execute an algorithm is what we call analysis of algorithms. Almost all algorithms are designed to work with inputs of arbitrary length. Frequently, the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity).
Analysis of algorithms is a very important part of the broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm that solves a given computational problem. These estimates provide insight into reasonable directions in the search for efficient algorithms.
(Wikipedia.com)


Generally, analyzing an algorithm is basically studying the structure of the algorithm and drawing conclusions about how the execution of that algorithm, that is, the program, will behave.
According to Bruno R. Preiss, as we do analysis of algorithms, we can:
  • determine the running time of a program as a function of its inputs;
  • determine the total or maximum memory space needed for program data;
  • determine the total size of the program code;
  • determine whether the program correctly computes the desired result;
  • determine the complexity of the program--e.g., how easy is it to read, understand, and modify;
  • determine the robustness of the program--e.g., how well does it deal with unexpected or erroneous inputs?

….Big-Oh Notation….

Big-Oh (the "O" means "order of") notation is primarily concerned with what happens for very large values of N; therefore only the largest term in a polynomial is needed. All smaller terms are dropped.
According to Swartz, Big-Oh notation is not always useful, for the following reasons:
  • Too hard to analyze. Many algorithms are simply too hard to analyze mathematically.
  • Average case unknown. There may not be sufficient information to know what the most important "average" case really is, therefore analysis is impossible.
  • Unknown constant. Both walking and traveling at the speed of light have a time-as-function-of-distance big-oh complexity of O(N). Although they have the same big-oh characteristics, one is rather faster than the other. Big-oh analysis only tells you how it grows with the size of the problem, not how efficient it is.
  • Small data sets. If there are no large amounts of data, algorithm efficiency may not be important.
(Fred Swartz)

This notation is useful because it lets us estimate the CPU time or memory resources that an algorithm will require.
Big-Oh notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented by the same O notation.
Although it was developed as part of pure mathematics, this notation is now also routinely used in the analysis of algorithms to describe an algorithm's usage of computational resources: the worst case or average case running time or memory usage of an algorithm is frequently expressed as a function of the length of its input using big O notation. This allows algorithm designers to predict the behavior of their algorithms and to decide which of multiple algorithms to use, in a way that is independent of computer architecture or clock rate. Because big O notation ignores multiplicative constants on the running time and disregards efficiency for small input sizes, it does not always reveal the fastest algorithm in practice or for practically sized data sets, but the approach is still very effective for comparing the scalability of various algorithms as input sizes become large.
(Wikipedia.com)


Formula:
f(n) = O(g(n))
if f(n) grows at the same rate as or slower than g(n),
that is, if f(n) = Θ(g(n)) or f(n) = o(g(n)).
Example:
n + 5 = Θ(n) = O(n) = O(n^2)
      = O(n^3) = O(n^5)
the closest estimate: n + 5 = Θ(n)
the general practice is to use the Big-Oh notation:
            n + 5 = O(n)
Rules to manipulate Big-Oh expressions
Rule 1:
a. If T1(N) = O(f(N)) and T2(N) = O(g(N)),
then T1(N) + T2(N) = max( O(f(N)), O(g(N)) )

b. If T1(N) = O(f(N)) and T2(N) = O(g(N)),
then T1(N) * T2(N) = O( f(N) * g(N) )
Rule 2:
If T(N) is a polynomial of degree k,
then T(N) = Θ(N^k)
Rule 3:
log^k N = O(N) for any constant k.
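
As a quick worked illustration of how these rules combine (my own example, not from the source):

\[
T_1(N) = O(N^2), \quad T_2(N) = O(N \log N)
\]
\[
T_1(N) + T_2(N) = O(N^2) \quad \text{(Rule 1a)}
\]
\[
T_1(N) \cdot T_2(N) = O(N^3 \log N) \quad \text{(Rule 1b)}
\]
\[
3N^4 + N^2 + 7 = \Theta(N^4) \quad \text{(Rule 2)}
\]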

Examples:
n^2 + n = O(n^2)
we disregard any lower-order term
n log(n) = O(n log(n))
n^2 + n log(n) = O(n^2)