UCT Algorithm Circle: Intermediate Class: Sorting and Searching

Marco Gallotta

13 August, 2009

Sorting

Problem: Given a list $a_1, a_2, \ldots, a_n$ of unordered data, reorder it into the list $a'_1, a'_2, \ldots, a'_n$ such that $a'_k \le a'_{k+1}$

- Sorting is one of the most heavily researched areas of Computer Science
- Google sorted 1PB in 6 hours and 2 minutes in 2008
- We will be sorting integers, but any data type with an ordering (including floats, strings and even Python tuples containing comparable data) can be sorted
- For now we will assume that each element in the list is unique
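As a small illustration of the problem statement, the sketch below checks the condition $a'_k \le a'_{k+1}$ directly and sorts a list of tuples with Python's built-in `sorted`. The helper name `is_sorted` and the sample `records` are made up for this example.

```python
def is_sorted(items):
    """Return True if every element is <= its successor."""
    return all(items[k] <= items[k + 1] for k in range(len(items) - 1))


# Tuples compare element by element, so they have an ordering too.
records = [(2, "pear"), (1, "fig"), (2, "apple")]
ordered = sorted(records)

print(is_sorted(records))  # the original order violates the condition
print(ordered)
print(is_sorted(ordered))
```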

Human Sorting

- We often sort things ourselves without the help of a computer, e.g. sorting books in alphabetical order
- Can you think of what method you use to sort?
- Can we come up with an algorithm that we can implement as a computer program?

O(N^2) Sorting Algorithms

- Bubble sort
  - Compares every pair of adjacent elements, swapping them if they’re out of order
  - Repeats the process until no swaps occur in an iteration of the process
- Selection sort
  - Iterates through the list, selecting the minimum and swapping it with the first element in the list
  - Repeats the process to sort the sublist $a'_2, a'_3, \ldots, a'_n$
- Insertion sort
  - Maintains a sorted sublist $a'_1, a'_2, \ldots, a'_k$ and an unsorted sublist $a_{k+1}, a_{k+2}, \ldots, a_n$
  - Takes elements from the unsorted sublist and inserts them into the sorted sublist until the unsorted sublist is empty
  - Works better on linked lists (next week’s lecture)
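The three quadratic algorithms described above could be sketched in Python roughly as follows. This is one possible rendering, not code from the lecture: the function names are our own, each function sorts a copy of its input, and the insertion sort works on an array-backed list rather than a linked list.

```python
def bubble_sort(items):
    """Swap adjacent out-of-order pairs until a full pass makes no swaps."""
    a = list(items)  # work on a copy
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return a


def selection_sort(items):
    """Repeatedly swap the minimum of the unsorted suffix to its front."""
    a = list(items)
    for start in range(len(a)):
        # Index of the smallest element in a[start:].
        smallest = min(range(start, len(a)), key=a.__getitem__)
        a[start], a[smallest] = a[smallest], a[start]
    return a


def insertion_sort(items):
    """Grow a sorted prefix by inserting each next element into it."""
    a = list(items)
    for k in range(1, len(a)):
        value = a[k]
        j = k - 1
        while j >= 0 and a[j] > value:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = value
    return a


for sort in (bubble_sort, selection_sort, insertion_sort):
    print(sort.__name__, sort([7, 5, 4, 8, 1]))
```

Each function has two nested loops over the list, which is where the O(N^2) bound comes from.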

Can we do better?

Decision Tree

- Think of the sorting process as searching through a binary tree, where each node is a permutation of the list
- Root is the initial order of the list
- Children are the results when a comparison is “less than” or “greater than”
- The decision tree below shows the first two comparisons of bubble sort on [a, b, c]

[a, b, c]
  a < b: [a, b, c]
    b < c: [a, b, c]
    b > c: [a, c, b]
  a > b: [b, a, c]
    a < c: [b, a, c]
    a > c: [b, c, a]

- Each comparison results in at most one change in the order of the list, so we can ignore all other operations in our analysis
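One way to make the decision tree concrete is to record, for every permutation of three elements, the sequence of comparison outcomes that bubble sort sees. The sketch below assumes the simple "repeat full passes until no swap" variant of bubble sort; `bubble_sort_trace` and `traces` are names introduced only for this illustration.

```python
from itertools import permutations


def bubble_sort_trace(items):
    """Bubble sort a copy of items.

    Returns the sorted list and a tuple of booleans, one per comparison,
    that is True when the compared pair was out of order. The tuple is
    the path followed through the decision tree.
    """
    a = list(items)
    outcomes = []
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):
            out_of_order = a[i] > a[i + 1]
            outcomes.append(out_of_order)
            if out_of_order:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return a, tuple(outcomes)


traces = {p: bubble_sort_trace(p)[1] for p in permutations([1, 2, 3])}

for perm, path in traces.items():
    print(perm, len(path), path)

# Every input order follows its own path, so the tree needs 3! = 6 leaves.
print(len(set(traces.values())))
# The longest path is the worst-case number of comparisons.
print(max(len(path) for path in traces.values()))
```

The longest recorded path corresponds to the height of the tree, which is the quantity the worst-case analysis cares about.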

Strict Lower Bound

- Worst case complexity is the height of the decision tree
- A clever sorting algorithm would have an almost-complete tree
- How many distinct nodes are there in a decision tree for a list of size N?
- There is a node for every permutation of the list, so there are at least N! distinct nodes
- Therefore even the cleverest sorting algorithm has a decision tree of height at least $\log_2 N!$ (the minimum height of a binary tree with N! nodes)
- $\sqrt{2\pi N}\left(\frac{N}{e}\right)^N \le N!$ (Stirling’s approximation)
- Taking logs on both sides shows that $\log N!$ grows as $N \log N$
- Therefore it is theoretically possible for a comparison-based sorting algorithm to be O(N log N), but no better!
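The step from Stirling's bound to N log N can be written out as below. The upper bound uses the extra observation $N! \le N^N$, which is not on the slide but follows because each of the N factors of N! is at most N.

```latex
\begin{align*}
\log_2 N! &\ge \log_2\!\left(\sqrt{2\pi N}\left(\frac{N}{e}\right)^{N}\right)
           = N\log_2 N - N\log_2 e + \tfrac{1}{2}\log_2(2\pi N), \\
\log_2 N! &\le \log_2 N^{N} = N\log_2 N .
\end{align*}
```

The lower bound is dominated by the $N\log_2 N$ term, so together the two lines give $\log_2 N! = \Theta(N \log N)$.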

Merge Sort

- Merging two sorted lists of length N and M can be done in O(N + M) time
  - Use two pointers, one per list
  - At each step pick the smaller of the two numbers they point to and advance that pointer
  - This picks the numbers in sorted order
- Merge sort splits the list in half, recursively sorts each sublist and then merges them
- Lists of length 0 and 1 are already sorted, forming the base case

Split:
[7, 5, 4, 8, 1]
[7, 5, 4] [8, 1]
[7, 5] [4] [8] [1]
[7] [5] [4] [8] [1]

Merge:
[5, 7] [4] [1, 8]
[4, 5, 7] [1, 8]
[1, 4, 5, 7, 8]
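A minimal Python sketch of the two-pointer merge and the recursive split might look like this. It returns a new list rather than sorting in place, and it gives the extra element to the left half when the length is odd so that the splits match the [7, 5, 4, 8, 1] example; both choices are assumptions of this sketch.

```python
def merge(left, right):
    """Merge two sorted lists in O(len(left) + len(right)) time."""
    merged = []
    i = j = 0  # one pointer per list
    while i < len(left) and j < len(right):
        # Taking from the left on ties keeps equal elements in order.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items."""
    if len(items) <= 1:  # base case: already sorted
        return list(items)
    mid = (len(items) + 1) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge([4, 5, 7], [1, 8]))
print(merge_sort([7, 5, 4, 8, 1]))
```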

Merge Sort: Analysis

- If merge sort takes T(N) time on a list of length N, then $T(N) = 2T(N/2) + N$
- Using Mathematical Induction we can show that this gives a complexity of O(N log N), which is both the average and worst case
- Merge sort can sort in-place using linked lists (next week’s lecture), but it is rather tricky to implement
- Variants that detect runs that are already in order work well on already sorted data, requiring only O(N) time
- Python’s built-in sort is Timsort, a stable merge sort variant of this kind
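To see where the O(N log N) comes from before proving it by induction, the recurrence can be unrolled. The sketch below assumes N is a power of two, $N = 2^k$, and that $T(1)$ is a constant.

```latex
\begin{align*}
T(N) &= 2\,T(N/2) + N \\
     &= 4\,T(N/4) + 2N \\
     &= 8\,T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^{k}\,T(N/2^{k}) + kN \\
     &= N\,T(1) + N\log_2 N \qquad (k = \log_2 N) \\
     &= O(N \log N).
\end{align*}
```

Each level of the recursion does N work in total for merging, and there are $\log_2 N$ levels.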

Quicksort

- Quicksort picks a pivot element from the list
- The list is reordered with all elements less than the pivot before the pivot and all other elements after the pivot
- The sublists on either side of the pivot are then sorted recursively
- Same base case as with merge sort

[7, 5, 4, 8, 1]
[5, 4, 1] [7] [8]
[4, 1] [5] []
[1] [4] []
[1, 4, 5, 7, 8]

    def qsort(a):
        if not a:  # base case: the empty list is already sorted
            return []
        x, xs = a[0], a[1:]  # the first element is the pivot
        return (qsort([y for y in xs if y < x]) + [x] +
                qsort([y for y in xs if y >= x]))

Quicksort: Analysis

- Quicksort is average case O(N log N) and typically faster in practice than other O(N log N) algorithms
- The issue is its worst case: when all elements always fall on the same side of the pivot, it degenerates to O(N^2)
- On what cases will we get this worst case performance?
- Select the pivot randomly for expected O(N log N) performance
- C++’s std::sort is typically implemented as introsort, which uses quicksort down to a certain depth and then switches over to heapsort to achieve O(N log N) worst case
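Random pivot selection could be sketched as follows. This version sorts the list in place and partitions with the Lomuto scheme, which is our choice for the illustration and not something the slides specify. Like the slides, it is best suited to lists of unique elements; many equal elements would push it back towards the quadratic case.

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:  # zero or one element: already sorted
        return
    # Pick a random pivot and park it at the end of the range.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    store = lo  # a[lo:store] holds the elements less than the pivot
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]  # move the pivot to its final position
    quicksort(a, lo, store - 1)
    quicksort(a, store + 1, hi)


data = [7, 5, 4, 8, 1]
quicksort(data)
print(data)
```

With a random pivot, no fixed input (such as an already sorted list) can reliably trigger the worst case.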

Searching

- Problem: Given a list of data, search for a given element
- The simplest algorithm is linear search, which checks each element of the list in order
- Average and worst case is O(N)
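Linear search is short enough to write out in full. In this sketch the function name and the convention of returning -1 for a missing element are our own choices.

```python
def linear_search(items, target):
    """Return the index of the first occurrence of target, or -1 if absent."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


print(linear_search([7, 5, 4, 8, 1], 8))
print(linear_search([7, 5, 4, 8, 1], 3))
```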

Can we do better?

Binary Search

- Lists are often presorted: think of class lists and telephone directories
- How do you usually search for a number in a telephone directory?
- We open the book in the middle and if the name is there, we stop
- Since the names are sorted, we know which half the name we’re looking for is in
- Search for the name in this half of the book
- This algorithm is called binary search and has a worst case complexity of O(log N)
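The telephone directory procedure translates into code along these lines. This is an iterative sketch that assumes the input list is already sorted in ascending order; the function name and the -1 return value for a missing element are illustrative choices.

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # open the book in the middle
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1


names = ["Alice", "Bongani", "Carla", "Dumisani", "Emma"]
print(binary_search(names, "Dumisani"))
print(binary_search(names, "Zed"))
```

Each iteration halves the range still under consideration, which gives the O(log N) bound. Python's standard library also offers the `bisect` module for searching sorted lists.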
