# Sorting a list by a priority list in Python

TL;DR: jump to the solution.

Let's say you want to sort some list according the order another list, how would you do that in Python?

An example would be to sort a list of foods according a list of your favourite foods, ordered from your most liked to least liked foods:

>>> my_favourite_foods = ['Coffee', 'Chocolate', 'Bagels', 'Lasagne']
>>> available_foods = ['Coffee', 'Beetroot', 'Bagels', 'Burgers', 'Chocolate']
>>> sort_by_priority_list(values=available_foods, priority=my_favourite_foods)
['Coffee', 'Chocolate', 'Bagels', 'Beetroot', 'Burgers']
1
2
3
4

I had to implement a function like this for one of our projects today. Another dev sent me this function to get me started:

def sort_by_priority_list(values, priority):
    priority_dict = dict(
        zip(
            priority,
            range(len(priority)),
        ),
    )
    priority_getter = priority_dict.get # dict.get(key)
    return sorted(values, key=priority_getter)
1
2
3
4
5
6
7
8
9

How does this function work?

It starts off by creating a dictionary for which the keys are the items in priority and the values are the indices of the items in priority. For the above example, it would look something like this:

>>> priority_dict
{'Coffee': 0, 'Chocolate': 1, 'Bagels': 2, 'Lasagne': 3}
1
2

Another to way create priority_dict would be:

priority_dict = {k: i for i, k in enumerate(priority)}
1

Next up it creates a function called priority_getter like this:

priority_getter = priority_dict.get # dict.get(key)
1

Another way to define priority_getter would be:

def priority_getter(key):
  return priority_dict.get(key)
1
2

priority_getter gets the value of priority_dict at a given key.

Why do we need priority_getter?

At the last line of our function we use it to return a sorted list of our input values:

sorted(values, key=priority_getter)
1

Here, instead of sorting values by the items in the values, we sort it by calling priority_getter on each item. In other words, instead of comparing two items as usual like item1 > item2, sorted will now compare them using priority_getter(item1) > priority_getter(item2) - which amounts to: priority_dict.get(item1) > priority_dict.get(item2).

There's one issue with this solution though: it fails when there are any items in values that aren't in priority:

>>> sort_by_priority_list(values=[1,2,3], priority=[1,2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-3d6470298442> in <module>()
----> 1 sort_by_priority_list([1,2,3], [1,2])

<ipython-input-20-b8e90b47f443> in sort_by_priority_list(values, priority)
      7     )
      8     priority_getter = priority_dict.get # dict.get(key)
----> 9     return sorted(values, key=priority_getter)

TypeError: '<' not supported between instances of 'NoneType' and 'int'
1
2
3
4
5
6
7
8
9
10
11
12

How can we fix this?

First we need to figure out why we get this error.

Remember that for every item in values, we try to get a value from priority_dict using priority_dict.get(value). dict.get returns None if the dict doesn't contain the given key. We can pass a default value to it as an additional argument to make it return the default value instead of None:

def priority_getter(value):
    return priority_dict.get(value, len(values))
1
2

We set the default value to len(values) because we want the value of keys that aren't in our priority list to be higher (i.e. a lesser priority) than any of the items in our priority list. Why? So that any items in values that aren't in priority will always be put at the back of the resulting list.

Instead of modifying priority_getter, we also have to option of using a defaultdict (opens new window). When we create a defaultdict, we have to pass a function which returns a default value as its first argument. This function is called the default_factory.

priority_dict = defaultdict(
    lambda: len(priority), zip(priority, range(len(priority)),),
)
1
2
3

If we use defaultdict, we need to change priority_getter = priority_dict.get to use __getitem__ instead of get because defaultdict's default_factory function only gets called when a key isn't found when __getitem__ is called. defaultdict.get would return None in same way that it would return None for a normal dict.

# Final Solution

This is what our final solution looks like when we use zip and defaultdict:

from collections import defaultdict
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sort_by_priority_list(values: Iterable[T], priority: List[T]) -> List[T]:
    """
    Sorts an iterable according to a list of priority items.
    Usage:
    >>> sort_by_priority_list(values=[1,2,2,3], priority=[2,3,1])
    [2, 2, 3, 1]
    >>> sort_by_priority_list(values=set([1,2,3]), priority=[2,3])
    [2, 3, 1]
    """
    priority_dict = defaultdict(
        lambda: len(priority), zip(priority, range(len(priority)),),
    )
    priority_getter = priority_dict.__getitem__  # dict.get(key)
    return sorted(values, key=priority_getter)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

And here it is without using zip and defaultdict:

from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sort_by_priority_list(values: Iterable[T], priority: List[T]) -> List[T]:
    """
    Sorts an iterable according to a list of priority items.
    Usage:
    >>> sort_by_priority_list(values=[1,2,2,3], priority=[2,3,1])
    [2, 2, 3, 1]
    >>> sort_by_priority_list(values=set([1,2,3]), priority=[2,3])
    [2, 3, 1]
    """
    priority_dict = {k: i for i, k in enumerate(priority)}
    def priority_getter(value):
        return priority_dict.get(value, len(values))
    return sorted(values, key=priority_getter)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

I think the solution without zip and defaultdict is a bit easier to read and understand. It is also slightly faster than the first solution for large inputs.

Newsletter

If you'd like to subscribe to my blog, please enter your details below. You can unsubscribe at any time.

Powered by Buttondown.

Last Updated: 11/20/2023, 10:04:51 AM