19 March 2023

How to sort a 2 dimensional list by 2 or more keys in different orders in Python?

https://wiki.python.org/moin/HowTo/Sorting/#Key_Functions

With pandas, this is easy:

import pandas as pd

dist_list = [
    ['Ada Taylor', 4.00],
    ['Iris White', 26.02],
    ['Octavia Smith', 7.29],
    ['Tracy Jones', 4.00],
]

# Convert the list into a pandas DataFrame.
dist_df = pd.DataFrame(dist_list, columns=['Name', 'Distance'])

# Sort it by Distance in descending order. If some people
# ran the same distance, sort those people by Name in
# ascending order.
dist_df.sort_values(by=['Distance', 'Name'], ascending=[False, True], inplace=True)

print(dist_df)

            Name  Distance
1     Iris White     26.02
2  Octavia Smith      7.29
0     Ada Taylor      4.00
3    Tracy Jones      4.00

Without pandas, we can get the same result by sorting twice, which works because Python's sort is stable. First sort by the secondary key (Name) in its order (ascending), then sort by the primary key (Distance) in its order (descending).

# Convert the list into a dictionary, while grouping
# and aggregating the distances by person.
dist_dict = {}
for person, distance in dist_list:
    if person not in dist_dict:
        dist_dict[person] = distance
    else:
        dist_dict[person] += distance

# Convert the dictionary back into a list, in order to sort it.
dist_list = [[person, distance] for person, distance in dist_dict.items()]

# Firstly, sort it by secondary key - person, in ascending order.
dist_list = sorted(dist_list, key=lambda x: x[0], reverse=False)

# Then, sort it by primary key - distance, in descending order.
dist_list = sorted(dist_list, key=lambda x: x[1], reverse=True)

print(dist_list)

[['Iris White', 26.02], ['Octavia Smith', 7.29], ['Ada Taylor', 4.0], ['Tracy Jones', 4.0]]
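
When the descending key is numeric, the two passes can be collapsed into one by negating that key inside a tuple key. This is a minimal sketch, assuming the same rows as above; the name `ranked` is introduced here only for illustration. It does not carry over to non-numeric keys such as strings, where the two-pass approach is still needed.

```python
dist_list = [
    ['Ada Taylor', 4.00],
    ['Iris White', 26.02],
    ['Octavia Smith', 7.29],
    ['Tracy Jones', 4.00],
]

# Tuples compare element by element, so -distance puts larger
# distances first and the name breaks ties in ascending order.
ranked = sorted(dist_list, key=lambda row: (-row[1], row[0]))

print(ranked)
```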


We can also do this with the 'itemgetter()' function from the 'operator' module in the Python standard library.


from operator import itemgetter

# key must be passed by keyword in Python 3.
dist_list = sorted(dist_list, key=itemgetter(0), reverse=False)
dist_list = sorted(dist_list, key=itemgetter(1), reverse=True)

print(dist_list)

[['Iris White', 26.02], ['Octavia Smith', 7.29], ['Ada Taylor', 4.0], ['Tracy Jones', 4.0]]


For more details, please look at the Python wiki page linked at the top of this post. I copied the relevant sections below.

Operator Module Functions

The key-function patterns shown above are very common, so Python provides convenience functions to make accessor functions easier and faster. The operator module has itemgetter, attrgetter, and starting in Python 2.6 a methodcaller function.

Using those functions, the above examples become simpler and faster.

>>> from operator import itemgetter, attrgetter, methodcaller

>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The operator module functions allow multiple levels of sorting. For example, to sort by grade then by age:

>>> sorted(student_tuples, key=itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

The third function from the operator module, methodcaller, is used in the following example, in which the weighted grade of each student is shown before sorting on it:

>>> [(student.name, student.weighted_grade()) for student in student_objects]
[('john', 0.13333333333333333), ('jane', 0.08333333333333333), ('dave', 0.1)]
>>> sorted(student_objects, key=methodcaller('weighted_grade'))
[('jane', 'B', 12), ('dave', 'B', 10), ('john', 'A', 15)]
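
The copied snippets use `student_tuples` and `student_objects` without defining them. The following is a reconstruction rather than the wiki's verbatim text: the data and the `weighted_grade` formula are assumptions, chosen so that they reproduce the outputs shown above.

```python
from operator import itemgetter, attrgetter, methodcaller


class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        # Print like a tuple so the output matches the examples.
        return repr((self.name, self.grade, self.age))

    def weighted_grade(self):
        # Assumed formula: 'C' -> 0, 'B' -> 1, 'A' -> 2, divided by age.
        return 'CBA'.index(self.grade) / self.age


student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]
student_objects = [Student(*t) for t in student_tuples]

by_age = sorted(student_tuples, key=itemgetter(2))
by_grade_age = sorted(student_objects, key=attrgetter('grade', 'age'))
by_weight = sorted(student_objects, key=methodcaller('weighted_grade'))

print(by_age)
print(by_grade_age)
print(by_weight)
```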

Ascending and Descending

Both list.sort() and sorted() accept a reverse parameter with a boolean value. This is used to flag descending sorts. For example, to get the student data in reverse age order:

>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
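
Two details about `reverse` are easy to miss, and the sketch below illustrates them: `sorted()` returns a new list while `list.sort()` sorts in place and returns `None`, and `reverse=True` still keeps records with equal keys in their original relative order. The `data` list and the names `descending` and `in_place` are a small sample chosen for illustration.

```python
from operator import itemgetter

data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

# sorted() returns a new list and leaves data untouched.
# Equal keys keep their original order even with reverse=True.
descending = sorted(data, key=itemgetter(0), reverse=True)
print(descending)
# [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]

# list.sort() reorders the list in place and returns None.
in_place = list(data)
result = in_place.sort(key=itemgetter(0), reverse=True)
print(result)                  # None
print(in_place == descending)  # True
```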

Sort Stability and Complex Sorts

Starting with Python 2.2, sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved.

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]

Notice how the two records for 'blue' retain their original order so that ('blue', 1) is guaranteed to precede ('blue', 2).

This wonderful property lets you build complex sorts in a series of sorting steps. For example, to sort the student data by descending grade and then ascending age, do the age sort first and then sort again using grade:

>>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
>>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset.
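
The sort-twice technique generalizes to any number of keys: apply the passes from the least significant key to the most significant one. Below is a minimal sketch, where `multisort` and its `specs` argument are hypothetical names introduced for illustration, applied to `[name, distance]` rows like the ones in the first part of this post.

```python
from operator import itemgetter


def multisort(rows, specs):
    """Return rows sorted by several (index, reverse) specs.

    specs is listed from the primary key to the least significant
    key; the passes run in the opposite order and rely on stability.
    """
    result = list(rows)  # work on a copy, leave the input untouched
    for index, reverse in reversed(specs):
        result.sort(key=itemgetter(index), reverse=reverse)
    return result


dist_list = [
    ['Tracy Jones', 4.00],
    ['Iris White', 26.02],
    ['Ada Taylor', 4.00],
    ['Octavia Smith', 7.29],
]

# Distance (index 1) descending, then name (index 0) ascending.
print(multisort(dist_list, [(1, True), (0, False)]))
```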
