Diferencia entre revisiones de «Estructuras de datos dinámicas/Algoritmos de ordenamiento»

Contenido eliminado Contenido añadido
Rgfernan (discusión | contribs.)
Nueva página: '''Quicksort''' is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes <math>\Theta(n~\log~n)</math> (big O notation...
 
Sin resumen de edición
Línea 1:
Un algoritmo de ordenamiento recibe como entrada una secuencia de elementos, y devuelve una secuencia con los mismos elementos, en la cual el orden relativo de cada elemento respecto de los demás respeta un criterio dado.
'''Quicksort''' is a well-known [[sorting algorithm]] developed by [[C. A. R. Hoare]] that, [[average performance|on average]], makes <math>\Theta(n~\log~n)</math> ([[big O notation]]) comparisons to sort ''n'' items. However, in the [[worst-case performance|worst case]], it makes <math>\Theta(n^2)</math> comparisons. Typically, quicksort is significantly faster in practice than other <math>\Theta(n~\log~n)</math> algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data it is possible to make design choices which minimize the possibility of requiring quadratic time.
 
== Quicksort ==
Quicksort is a [[comparison sort]] and is not a [[Sorting algorithm#Classification|stable sort]].
 
Quicksort es un algoritmo de ordenamiento que utiliza la técnica 'divide y vencerás'. Consiste en dividir la secuencia a ser ordenada en dos partes, de tal manera que cada unos de elementos de una de las partes sea mayor o igual a todos los elementos de la segunda parte. Cada una de las partes se divide y se le aplica a su vez este procedimiento recursivamente, hasta llegar a las secuencias unitarias. Al finalizar este procedimiento, la secuencia completa queda ordenada.
== The algorithm ==
 
=== Algoritmo ===
Quicksort sorts by employing a [[divide and conquer algorithm|divide and conquer]] strategy to divide a [[List (computing)|list]] into two sub-lists.
# Elegir un elemento de la secuencia, llamado ''pivote''.
# Reordenar la secuencia de tal manera que los elementos menores que le pivote aparezcan primero, y los mayores después (el orden de los elementos iguales al pivote no importa).
# Ordenar recursivamente las subsecuencias.
 
El caso base de esta recursión son las secuencias de un elemento.
The steps are:
# Pick an element, called a [[Pivot element|''pivot'']], from the list.
# Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the '''partition''' operation.
# [[recursion|Recursively]] sort the sub-list of lesser elements and the sub-list of greater elements.
 
El algoritmo se puede expresar de la siguiente forma:
The [[Base case#Recursive programming|base case]] of the recursion are lists of size zero or one, which are always sorted. The algorithm always terminates because it puts at least one element in its final place on each iteration (the loop invariant).
 
public static Collection quickSort(Collection secuencia)
In simple [[pseudocode]], the algorithm might be expressed as:
{
Collection salida, menores, mayores, iguales;
if(secuencia.size() == 1) return;
Iterator iteradorSecuencia = secuencia.iterator();
Comparable pivote = obtenerPivote(secuencia);
while (iteradorSecuencia.hasNext())
{
Comparable elementoActual=(Comparable) iteradorSecuencia.next();
if(pivote.compareTo(elementoActual)<0)
mayores.add(elementoActual);
else if (pivote.compareTo(elementoActual)>0)
menores.add(elementoActual);
else
iguales.add(elementoActual);
}
mayores = quickSort(mayores);
menores = quickSort(menores);
salida.addAll(menores);
salida.addAll(iguales);
salida.addAll(mayores);
return salida;
}
 
==== Versión de ordenamiento sin espacio de almacenamiento adicional ====
'''function''' quicksort(array)
La desventaja de la versión precedente es que requiere un almacenamiento extra de <math>\Omega (n)</math>. Esto puede evitarse ordenando sobre una misma secuencia:
'''var''' ''list'' less, pivotList, greater
'''if''' length(array) ≤ 1
'''return''' array
select a pivot value ''pivot'' from array
'''for each''' x '''in''' array
'''if''' x < pivot '''then''' add x to less
'''if''' x = pivot '''then''' add x to pivotList
'''if''' x > pivot '''then''' add x to greater
'''return''' concatenate(quicksort(less), pivotList, quicksort(greater))
 
public static void quickSort(List secuencia, int indiceInferior, int indiceSuperior)
Notice that we only examine elements by comparing them to other elements. This makes quicksort a [[comparison sort]].
{
int i=indiceInferior, j = indiceSuperior;
if(indiceInferior == indiceSuperior) return;
Comparable pivote = obtenerPivote(secuencia);
for(;i!=j;)
{
for(;((Comparable)secuencia.get(i)).compareTo(pivote)<0;i++);
for(;((Comparable)secuencia.get(j)).compareTo(pivote)<0;j--);
intercambiar(secuencia,i,j);
i++;
j++;
}
quickSort(secuencia, indiceInferior, i);
quickSort(secuencia, i+1, indiceSuperior);
}
public static List quickSort(List secuencia)
{
quickSort(secuencia,0,secuencia.size());
return secuencia;;
}
 
=== Version with in-place partitionAnálisis ===
 
En el mejor de los casos, en cada paso se dividirá la secuencia en dos partes, de manera que la profundidad del árbol de llamadas iterativas será<math>\log_2n </math>, y como la cantidad de elementos a ser comparados es <math>n</math>, el orden de la complejidad del algoritmo será <math>\mathcal{\Omega}(n \log n)</math>
[[Image:Partition example.svg|right|200px|thumb|In-place partition in action on a small list. The boxed element is the pivot element, blue elements are less or equal, and red elements are larger.]]
The disadvantage of the simple version above is that it requires <math>\Omega</math>(n) extra storage space, which is as bad as [[mergesort]] (see [[big-O notation]] for the meaning of <math>\Omega</math>). The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complicated version which uses an [[in-place]] partition algorithm and can achieve <math>\mathcal{O}(\log~n)</math> space use on average for good pivot choices:
 
'''function''' partition(array, left, right, pivotIndex)
pivotValue := array[pivotIndex]
swap( array, pivotIndex, right) ''// Move pivot to end''
storeIndex := left - 1
'''for''' i ''' from ''' left '''to''' right-1
'''if''' array[i] <= pivotValue
storeIndex := storeIndex + 1
swap( array, storeIndex, i)
swap( array, right, storeIndex+1) ''// Move pivot to its final place''
'''return''' storeIndex+1
 
This form of the partition algorithm is not the original form; multiple variations can be found in various textbooks, such as versions not having the storeIndex. However, this form is probably the easiest to understand.
 
This is the in-place partition algorithm. It partitions the portion of the array between indexes ''left'' and ''right'', inclusively, by moving all elements less than or equal to <code>a[pivotIndex]</code> to the beginning of the subarray, leaving all the greater elements following them. In the process it also finds the final position for the pivot element, which it returns. It temporarily moves the pivot element to the end of the subarray, so that it doesn't get in the way. Because it only uses exchanges, the final list has the same elements as the original list. Notice that an element may be exchanged multiple times before reaching its final place.
 
Once we have this, writing quicksort itself is easy:
 
'''function''' quicksort(array, left, right)
'''if''' right > left
select a pivot index (e.g. pivotIndex := left)
pivotNewIndex := partition(array, left, right, pivotIndex)
quicksort(array, left, pivotNewIndex-1)
quicksort(array, pivotNewIndex+1, right)
 
=== Parallelization ===
 
Like [[mergesort]], quicksort can also be easily [[parallel algorithm|parallelized]] due to its divide-and-conquer nature. Individual in-place partition operations are difficult to parallelize, but once divided, different sections of the list can be sorted in parallel. If we have <math>p</math> processors, we can divide a list of <math>n</math> elements into <math>p</math> sublists in <math>\Theta(n)</math> average time, then sort each of these in <math>\mathcal{O}\left(\frac{n}{p}~\log\frac{n}{p}\right)</math>
average time. Ignoring the <math>\mathcal{O}(n)</math> preprocessing, this is [[linear speedup]]. Given <math>\Theta(n)</math> processors, only <math>\mathcal{O}(n)</math> time is required overall.
 
One advantage of parallel quicksort over other parallel sort algorithms is that no synchronization is required. A new thread is started as soon as a sublist is available for it to work on and it does not communicate with other threads. When all threads complete, the sort is done.
 
Other more sophisticated parallel sorting algorithms can achieve even better time bounds. For example, in 1991 David Powers described a parallelized quicksort that can operate in <math>\mathcal{O}(\log~n)</math> time given enough processors by performing partitioning implicitly.<sup>[[#1|1]]</sup>
 
== Formal analysis ==
 
From the initial description it's not obvious that quicksort takes <math>\mathcal{O}(n~\log~n)</math> time on average. It's not hard to see that the partition operation, which simply loops over the elements of the array once, uses <math>\Theta(n)</math> time. In versions that perform concatenation, this operation is also <math>\Theta(n)</math>.
 
In the best case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only <math>\log~n</math> nested calls before we reach a list of size 1. This means that the depth of the [[call tree]] is <math>\mathcal{O}(\log~n)</math>. But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only <math>\mathcal{O}(n)</math> time all together (each call has some constant overhead, but since there are only <math>\mathcal{O}(n)</math> calls at each level, this is subsumed in the <math>\mathcal{O}(n)</math> factor). The result is that the algorithm uses only <math>\mathcal{O}(n~\log~n)</math> time.
 
An alternate approach is to set up a [[recurrence relation]] for <math>\mathrm{T}(n)</math> factor), the time needed to sort a list of size <math>n</math>. Because a single quicksort call involves <math>\mathcal{O}(n)</math> factor) work plus two recursive calls on lists of size <math>\frac{n}{2}</math> in the best case, the relation would be:
 
:<math>\mathrm{T}(n) = \mathcal{O}(n) + 2\mathrm{T}(\frac{n}{2}).</math>
 
The [[master theorem]] tells us that <math>\mathrm{T}(n) = \Theta(n~\log~n)</math>.
 
In fact, it's not necessary to divide the list this precisely; even if each pivot splits the elements with 99% on one side and 1% on the other (or any other fixed fraction), the call depth is still limited to <math>100~log~n</math>, so the total running time is still <math>\mathcal{O}(n~\log~n)</math>.
 
In the worst case, however, the two sublists have size 1 and <math>n-1</math>, and the call tree becomes a linear chain of <math>n</math> nested calls. The ''i''th call does <math>\mathcal{O}(n-i)</math> work, and <math>\sum_{i=0}^n (n-i) = \mathcal{O}(n^2)</math>. The recurrence relation is:
 
:<math>\mathrm{T}(n) = \mathcal{O}(n)~+~\mathrm{T}(1)~+~\mathrm{T}(n-1) = \mathcal{O}(n)~+~\mathrm{T}(n-1)</math>
 
This is the same relation as for [[insertion sort]] and [[selection sort]], and it solves to <math>\mathrm{T}(n) = \Theta(n^2)</math>.
 
=== Randomized quicksort expected complexity ===
 
Randomized quicksort has the desirable property that it requires only <math>\mathcal{O}(n~\log~n)</math> [[expected value|expected]] time, regardless of the input. But what makes random pivots a good choice?
 
Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most <math>2~log_2~n</math> times before reaching lists of size 1, yielding an <math>\mathcal{O}(n~\log~n)</math> algorithm.
 
Unfortunately, a random choice will only choose from these middle parts half the time. The surprising fact is that this is good enough. Imagine that you are flipping a coin over and over until you get <math>k</math> heads. Although this could take a long time, on average only <math>2~k</math> flips are required, and the chance that you won't get <math>k</math> heads after <math>100~k</math> flips is infinitesimally small. By the same argument, quicksort's recursion will terminate on average at a call depth of only <math>2~(2~\log_2~n)</math>. But if its average call depth is <math>\mathcal{O}(\log~n)</math>, and each level of the call tree processes at most <math>n</math> elements, the total amount of work done on average is the product, <math>\mathcal{O}(n~\log~n)</math>.
 
=== Average complexity ===
 
Even if we aren't able to choose pivots randomly, quicksort still requires only <math>\mathcal{O}(n~\log~n)</math> time over all possible permutations of its input. Because this average is simply the sum of the times over all permutations of the input divided by <math>n</math> factorial, it's equivalent to choosing a random permutation of the input. When we do this, the pivot choices are essentially random, leading to an algorithm with the same running time as randomized quicksort.
 
More precisely, the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation:
 
:<math>\mathrm{C}(n) = n - 1 + \frac{1}{n} \sum_{i=0}^{n-1} (\mathrm{C}(i)+\mathrm{C}(n-i-1)) = 2n \ln n = 1.39n \log_2 n.</math>
 
Here, <math>n-1</math> is the number of comparisons the partition uses. Since the pivot is equally likely to fall anywhere in the sorted list order, the sum is averaging over all possible splits.
 
This means that, on average, quicksort performs only about 39% worse than the ideal number of comparisons, which is its best case. In this sense it is closer to the best case than the worst case. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.
 
C(n) = (n-1) + C(n/2) + C(n/2)
= (n-1) + 2C(n/2)
= (n-1) + 2((n/2) - 1 + 2C(n/4))
= n + n + 4C(n/4) - 1 - 2
= n + n + n + 8C(n/8) - 1 - 2 - 4
= ...
= kn + 2^kC(n/(2^k)) - (1 + 2 + 4 + . . . . . + 2^(k-1))
where log<sub>2</sub>n > k > 0
= kn + 2^kC(n/(2^k)) - 2^k + 1
-> nlog<sub>2</sub>n + nC(1) - n + 1.
 
=== Space complexity ===
 
The space used by quicksort depends on the version used.
 
Quicksort has a space complexity of <math>\mathcal{O}(\log~n)</math>, even in the worst case, when it is carefully implemented such that
* in-place partitioning is used. This requires <math>\mathcal{O}(1)</math>.
* After partitioning, the partition with the fewest elements is (recursively) sorted first, requiring at most <math>\mathcal{O}(\log~n)</math> space. Then the other partition is sorted using tail-recursion or iteration. (This idea is commonly attributed to R.M.Sedgewick [http://www.cs.columbia.edu/~hgs/teaching/isp/hw/qsort.c ][http://www.ugrad.cs.ubc.ca/~cs260/chnotes/ch6/Ch6CovCompiled.html ][http://home.tiscalinet.ch/t_wolf/tw/ada95/sorting/index.html ])
<!-- please replace these random links with a good reference to Sedgewick ... or perhaps Knuth quoting Sedgewick -->
 
The version of quicksort with in-place partitioning uses only constant additional space before making any recursive call. However, if it has made <math>\mathcal{O}(\log~n)</math> nested recursive calls, it needs to store a constant amount of information from each of them. Since the best case makes at most <math>\mathcal{O}(\log~n)</math> nested recursive calls, it uses <math>\mathcal{O}(\log~n)</math> space. The worst case makes <math>\mathcal{O}(n)</math> nested recursive calls, and so needs <math>\mathcal{O}(n)</math> space.
 
We are eliding a small detail here, however. If we consider sorting arbitrarily large lists, we have to keep in mind that our variables like ''left'' and ''right'' can no longer be considered to occupy constant space; it takes <math>\mathcal{O}(\log~n)</math> bits to index into a list of <math>n</math> items. Because we have variables like this in every stack frame, in reality quicksort requires <math>\mathcal{O}(\log^2n)</math> bits of space in the best and average case and <math>\mathcal{O}(n~\log~n)</math> space in the worst case. This isn't too terrible, though, since if the list contains mostly distinct elements, the list itself will also occupy <math>\mathcal{O}(n~\log~n)</math> bits of space.
 
The not-in-place version of quicksort uses <math>\mathcal{O}(n)</math> space before it even makes any recursive calls. In the best case its space is still limited to <math>\mathcal{O}(n)</math>, because each level of the recursion uses half as much space as the last, and
 
:<math>\sum_{i=0}^{\infty} \frac{n}{2^i} = 2n.</math>
 
Its worst case is dismal, requiring
 
:<math>\sum_{i=0}^n (n-i+1) = \Theta (n^2)</math>
 
space, far more than the list itself. If the list elements are not themselves constant size, the problem grows even larger; for example, if most of the list elements are distinct, each would require about <math>\mathcal{O}(\log~n)</math> bits, leading to a best-case <math>\mathcal{O}(n~\log~n)</math> and worst-case <math>\mathcal{O}(n^2~\log~n)</math> space requirement.
 
== Relationship to selection ==
 
A [[selection algorithm]] chooses the ''k''th smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, except that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist which contains the desired element. This small change lowers the average complexity to linear or <math>\Theta(n)</math> time, and makes it an [[in-place algorithm]]. A variation on this algorithm brings the worst-case time down to <math>\mathcal{O}(n)</math> (see [[selection algorithm]] for more information).
 
Conversely, once we know a worst-case <math>\mathcal{O}(n)</math> selection algorithm is available, we can use it to find the ideal pivot (the median) at every step of quicksort, producing a variant with worst-case <math>\mathcal{O}(n~\log~n)</math> running time. In practical implementations, however, this variant is considerably slower on average.
 
==Notes ==
<div id="1"></div>1. David M. W. Powers. [http://citeseer.ist.psu.edu/327487.html Parallelized Quicksort with Optimal Speedup]. ''Proceedings of International Conference on Parallel Computing Technologies''. [[Novosibirsk]]. 1991.
 
==Otra definición==
Quicksort
 
Quicksort is one of the fastest and simplest sorting algorithms [Hoa 62]. It works recursively by a divide-and-conquer strategyexplanation.
 
Idea
 
First, the sequence to be sorted a is partitioned into two parts, such that all elements of the first part b are less than or equal to all elements of the second part c (divide). Then the two parts are sorted separately by recursive application of the same procedure (conquer). Recombination of the two parts yields the sorted sequence (combine). Figure 1 illustrates this approach.
 
The first step of the partition procedure is choosing a comparison element x. All elements of the sequence that are less than x are placed in the first part, all elements greater than x are placed in the second part. For elements equal to x it does not matter into which part they come. In the following algorithm it may also happen that an element equal to x remains between the two parts.
 
Algorithm Partition
Input: sequence a0, ..., an-1 with n elements
Output: permutation of the sequence such that all elements a0, ..., aj are less than or equal to all elements ai, ..., an-1 (i > j)
Method:
 
1. choose the element in the middle of the sequence as comparison element x
 
let i = 0 and j = n-1
 
while i<=j
1. search the first element ai which is greater than or equal to x
 
search the last element aj which is less than or equal to x
 
if i<=j
1. exchange ai and aj
 
let i = i+1 and j = j-1
 
 
After partitioning the sequence, Quicksort treats the two parts recursively by the same procedure. The recursion ends whenever a part consists of one element only.
 
Program
 
The following Java program implements Quicksort.
 
void quicksort (int[] a, int lo, int hi)
{
// lo is the lower index, hi is the upper index
// of the region of array a that is to be sorted
int i=lo, j=hi, h;
int x=a[(lo+hi)/2];
 
// partition
do
{
while (a[i]<x) i++;
while (a[j]>x) j--;
if (i<=j)
{
h=a[i]; a[i]=a[j]; a[j]=h;
i++; j--;
}
} while (i<=j);
 
// recursion
if (lo<j) quicksort(a, lo, j);
if (i<hi) quicksort(a, i, hi);
}
 
Analysis
 
The best-case behavior of the Quicksort algorithm occurs when in each recursion step the partitioning produces two parts of equal length. In order to sort n elements, in this case the running time is in T(n log(n)). This is because the recursion depth is log(n) and on each level there are n elements to be treated (Figure 2 a).
 
The worst case occurs when in each recursion step an unbalanced partitioning is produced, namely that one part consists of only one element and the other part consists of the rest of the elements (Figure 2 c). Then the recursion depth is n-1 and Quicksort runs in time T(n2).
 
In the average case a partitioning as shown in Figure 2 b is to be expected.
 
The choice of the comparison element x determines which partition is achieved. Suppose that the first element of the sequence is chosen as comparison element. This would lead to the worst case behavior of the algorithm when the sequence is initially sorted. Therefore, it is better to choose the element in the middle of the sequence as comparison element.
 
Even better would it be to take the n/2-th greatest element of the sequence (the median). Then the optimal partition is achieved. Actually, it is possible to compute the median in linear time [AHU 74]. This variant of Quicksort would run in time O(n log(n)) even in the worst case.
 
However, the beauty of Quicksort lies in its simplicity. And it turns out that even in its simple form Quicksort runs in O(n log(n)) on the average. Moreover, the constant hidden in the O-notation is small. Therefore, we trade this for the (rare) worst case behavior of T(n2).
 
Proposition: The time complexity of Quicksort is in
T(n log(n)) in the average case and in
T(n2) in the worst case
 
Conclusions
 
Quicksort turns out to be the fastest sorting algorithm in practice. It has a time complexity of T(n log(n)) on the average. However, in the (very rare) worst case Quicksort is as slow as Bubblesort, namely in T(n2). There are sorting algorithms with a time complexity of O(n log(n)) even in the worst case, e.g. Heapsort and Mergesort. But on the average, these algorithms are by a constant factor slower than Quicksort.
 
It is possible to obtain a worst case complexity of O(n log(n)) with a variant of Quicksort (by choosing the median as comparison element). But this algorithm is on the average and in the worst case by a constant factor slower than Heapsort or Mergesort; therefore, it is not interesting in practice.