Data Structures in Python

international students learning python

Ultimate Guide to Data Structures in Python

Data structures can hold multiple values such as numbers, letters, characters even lists. In programming the most common data structure is the array and this is also the primary data structure in python although it is called a list.

We explain the wide range of data structures in python and in computer science. Normally online videos or texts cover the same material, such as for job interviews, but we want to present something a bit more interesting.

This article goes from beginner to advanced so the early parts will be of interest to readers new to python, whilst the later parts should be of interest to anyone wishing to learn about data structures and how python works.

There are built-in data structures in python like sets, tuples and dictionaries, and strings contain multiple characters. We compare these to lists.

In computer science we use data structures like stacks, queues, priority queues, hash tables and linked lists. These are often implemented using lists or a built-in function in python.

We also have more complex data structures in computer science such as types of trees or graphs.

Python is also great for data science where we use types of array data structures like numpy arrays and series in pandas. We also have data frames, a table-like data structure consisting of multiple series in pandas.

There are also other types of built in options in python such as collections.

Finally, in most universities we study data structures with algorithms. These consist mainly of search-based or sorting algorithms. As efficiency and complexity are concerns we need to have some understanding of these for our data structures.

So how we store and access values, and how this is affected both in memory and time, is a concern for the data structure and the common operations applied to it, such as find, select, update, insert and delete.

This is an extensive amount of material and you may find it a valuable reference in future studies. Please familiarize yourself with the table of contents and the links to the various sections.

Python Data Structures

There are built-in data structures in python called lists, sets, tuples and dictionaries, in addition to strings that hold multiple characters.

Other data structures can be created using lists, are available in a python library that can be added to your python code using import, or, can be created using python code involving  classes and methods.

To begin we will look at the main python data structures, initially in a easy-to-understand manner suitable for beginners, and then progress to more detail on how lists work for example.

Other data structures in computer science, data science and in python will follow below.

Lists in Python for beginners

The most common data structure in python is the list, that stores multiple values in the same way a list does in real life and is similar to arrays in other programming languages.

If you are a beginner then lists can be used in most of your programs when you need to store more than one item. Lists allow items to be easily added, updated, deleted, accessed and printed.

Lets see a simple list with numbers 1 to 5.

12345

A list can be considered a series of boxes that can contain values. Now lets see how we can perform actions on the list below.

We give the list a name (e.g. my_list) and assign the name to the list by using the equals sign and the list in square brackets, with the values separated by commas. Here is our new list

my_list = [20, 40, 5, 15, 20]

We will use this list in all the examples below, even after we add, update or delete elements, so that it is easier to understand each operation.

The list does not have to be in order, it can contain the same value more than once, and it does not have to be numbers, it could be strings, floating point numbers, a mixture of types or even mixed types.

List Operations

There are many list functions that can be applied to the list, lets start with the basic operations. To add an item to the list we use the append() function. This adds the new value to the end of the list.

To print the list in full we can simple use print with the list name in the brackets as we do with single variables.

my_list = [20, 40, 5, 15, 20]
my_list.append(25)
print(my_list)

output: [20, 40, 5, 15, 20, 25]

To access and update an element in the list we use the index. The index is a number that represents the position of a value in the list, such as how many boxes is it from the beginning.

The first element, in the first position, is zero (0) boxes or places from the beginning, so we use the index number 0. The next item in the list is at the beginning +1, therefore we use the index 1 to access that element.

print(my_list[0])
print(my_list[3])

output: 20
output: 15

The index number continues to the end of the list. In our example of a list we can access the first element using the index 0 in square brackets after the list name.

If we print my_list[0], the first list element, then the output will be the value in the first position, which is 20. The item in the fourth position, which is 3 places from the beginning, will have an index of 3. If we print my_list[4] the output is the value 15.

For reference if you enter a negative number then the index will start at the end of the list so [-1] is the index of the last number, [-2] then second last number, etc.

We can change a value using the index to select the item we wish to change, and assign it the new value. Let’s change the second value in our list to 30.

my_list = [20, 40, 5, 15, 20]
my_list[1] = 30
print(my_list)

output: [20, 30, 5, 15, 20]

Finally, we can delete or remove a list item. There are two approaches. To remove a list item using the value then use remove(), to delete an item by its index use pop().

my_list = [20, 40, 5, 15, 20]
my_list.remove(5)
print(my_list)

output: [20, 40, 15, 20]

If we have multiple values then it will remove the first instance of that numbers, for example:

my_list = [20, 40, 5, 15, 20]
my_list.remove(20)
print(my_list)

output: [40, 5, 15, 20]

To delete a number from within the list we can  use pop() with the index inside the pop function brackets.

my_list.pop(3)
print(my_list)

output: [20, 40, 5, 20]

If we wish to remove the last item in the list we can use pop() without the index number.

my_list = [20, 40, 5, 15, 20]
my_list.pop()
print(my_list)

output: [20, 40, 5, 15]

There are many list functions. If you are a beginner and would like to know more about these functions then watch this video on YouTube called ‘Python Coder | # Beginner | Lists

Print List Elements

So far we have printed the entire list, but we can print each element either by using the index as we have seen or by using a for loop.

for i in my_list:
   print(i)

output:
20
40
5
15
20

To print the list items next to their indexes we can use range(). Range has the advantage of starting from 0, so we only need to know the length of the list.

Even if the list size changes the use of the len() function inside range will still result in the full list being printed.

for i in range(len(my_list)):
   print(i, my_list[i])

output:
0 20
1 40
2 5
3 15
4 20

Slice a List

If you wish to have a subset of your list then you use the start and end positions to slice the list. For example, we want to take our original list of five numbers and slice the middle three numbers into a new list.

new_list = my_list.[ 1 : 4 ]
print(new_list)

output: [40, 5, 15]

We use the start position or index of 1, then use a colon, then use the end position of 4. If you leave either of the start or end position empty the list will be sliced from the start or end position as follows:

new_list = my_list.[  : 4 ]
print(new_list)

output: [20, 40, 5, 15]

new_list = my_list.[ 1 :  ]
print(new_list)

output: [40, 5, 15, 20]

Sets in Python

Sets, as it mathematics, are a collection of unique elements without order. Usual set operations (union, intersection, etc) are available in python.

Sets are very different from lists in python and unless you are specifically dealing with mathematical sets then it is more likely that a list is the data structure for your needs.

A common use of sets is to convert a list to a set to remove duplicates and then to convert the set back to a list.

This is not a suggested approach if efficiency and memory use is a concern, but it does have interesting consequences. Here is an example:

alist = [10,20,30,30,40,40,50]
alist = list(set(alist))
print (alist)

Output: [40, 10, 50, 20, 30]

The duplicates are removed but the list has changed order. Sets not only have unique elements but are not ordered.

Lists in Python for non-beginners

Data structures are often taught with algorithms and the complexity of operations applied on the data structures are important.

For example, with a list we can add an item to the end of a list in one operation, but what if we want to add an item not at the end, at the beginning for example. This is more complex or expensive, computationally expensive.

my_list.remove(20)
print(my_list)

output: [40, 5, 15, 20]

In the remove example, the second item with the value 40 is moved to the first position, the third item with the value 5 is moved to the second position, 15 moves one place to index 2 position, and the last item is now at index 3 not index 4 position.

As a beginner this is not important but if you write programs with larger lists or change list values often, then you may not want this operational expense.

The numbers are not in order in the list, this could be desired. But if access time is important then the list could be sorted and elements located easier.

This depends on use as the frequent expense of sorting the list would have to be worth it for the access benefits.

Sort a List in Python

The numbers are not necessarily in order in a python list, this could be desired. But if access time is important then the list could be sorted and elements located easier.

This depends on use as the frequent expense of sorting the list would have to be worth it for the access benefits.

Python has a sort() function for lists, although it is not the fastest or most efficient, it is effective on lists that are frequently ordered.

my_list.sort()
print(my_list)

output: [5, 15, 20, 20, 40]

The following definition explains the sorting algorithm that is used in the sort function in python called Timsort:

Timsort (derived from merge sort and insertion sort) was introduced in 2002 and while slower than quicksort for random data, Timsort performs better on ordered data. [1]

Lists vs Arrays

Python lists are not the same as arrays. Arrays in other programming languages are initially declared and they store items of the same value. In this approach the array length is known as each position and storage requirement is known in advance.

But python lists can contain different data types or even other nested lists. How?

Python lists do not store the values directly. Instead in python the lists store a pointer to the values in memory. In this approach the position of each value from the beginning is known as it is a pointer in each place.

The value is accessed using the pointer when required. This approach also has the effect on the memory allocation of the python list, seen below.

There are arrays in python that are similar to arrays from other languages, but, in python, we use lists.

List Memory Allocation

When the list is created it has a limited amount of storage (up to 4 elements), and storage is increased at various stages as the amount of elements in the list grows.

The size of the list in bytes, and the stages of the increases in regard to elements are as follows:

    • 224 bytes      empty list 
    • 736 bytes      over 4 elements
    • 2272 bytes    list over 16 elements
    • 8416 bytes    list over 76 elements
    • 32992 bytes  list over 306 elements

Built-in Python Data Structures for non-beginners

Lists are the predominant built-in data structure in python but we also use sets, tuples, dictionaries and even strings.

Tuples

Tuples are not mutable and therefore are different from lists although they share several functions that use the same syntax.

As tuples have little functionality they are easier to store and useful for temporary variables for example.

As tuples can’t be changed and list functions like append() are not available, if a user wants to add an element to a tuple then the ‘+’ operator can be used to add another tuple, this is called concatenation.

Sets

Sets are implemented in python using a hash table, a popular technique to perform insertion, deletion, and traversal in O(1) on average. Hashing tables are explain in detail later.

“Python Sets are implemented using dictionary with dummy variables, where key beings the members set with greater optimizations to the time complexity.” [2]

There is also the frozenset that is part of collections and explained in the collections section.

Set Time Complexity

Set operations have the following complexity:

  • O(n) for search ‘in‘  set
  • O(n) for set difference (x, y) where n is length of set x
  • O(z) where z is length of set x + length of set y for set union (x, y)
  • O(z) where z is length of set x * length of set y for set intersection (x, y)

Set Memory Allocation

Sets have a similar style of memory use to lists with increases at different size stages, but use less bytes.There are more frequent increases but the size remains less than the list of an equal size of items.

An empty set uses 64 bytes compared to 224 bytes for an empty list.

The addition of one set element increases its size to 96 bytes, which then increases to 128 bytes with 5 elements.

The size increases with an addition of four more elements to 192, then 8 more elements to 264 when there are 17 elements.

Data Structure Code Examples

A comprehensive list of functions relating to the data structures featured in this article. This can act as a great resource for students and learners everywhere for all your data structure needs.

Lists in Python

There are 11 python list functions and several common functions applied on python lists. We provide two examples for pop() as it can remove an item using an index or the default removes the last element.

List Functions

append()
alist = [‘A’, ’B’, ’C’]
alist.append(‘d’)
print (alist)

Output:     [’A’, ’B’, ’C’, ’d’]

extend()
alist = [‘A’, ’B’, ’C’]
blist = [‘d’, ’e’, ’f’]
alist.extend(blist)
print (alist)

Output:     [’A’, ’B’, ’C’, ’d’,’e’, ’f’]

insert()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
alist.insert(3,‘z’)
print (alist)

Output: [’A’, ’B’, ’C’, ’z’, ’D ’, ’E ’, ’F’]

remove()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
alist.remove(‘E’)
print (alist)

Output: [’A’, ’B’, ’C’, ’D ’, ’F’]

pop()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
x = alist.pop()
print (alist)

Output: [’A’, ’B’, ’C’, ’z’, ’D ’, ’E ’]

pop(n)
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
x = alist.pop(2)
print (x)

Output: 2

clear()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
alist.clear()
print (alist)

Output: []

index()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
x = alist.index(‘C’)
print (x)

Output: 2

count()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
x = alist.count('B')
print (x)

Output: 1

sort()
alist = [‘E’, ’A’, ’C’, ’D’,’ B’]
alist.sort()
print (alist)

Output: [’A’, ’B’, ’C’, ’D’, ’E’]

reverse()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
alist.reverse()
print (alist)

Output: [‘F’, ’E’, ’D’, ’C’, ’B’, ’A’]

copy()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
x = alist.copy()
print (alist)

Output: [’A’, ’B’, ’C’, ’D ’, ’E ’, ’F’]

Functions Applied on Lists

len()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
print (len(alist))

Output: 6

type()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
print (type(alist))
Output: <class ‘list’>

min()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
print (min(alist))
Output: A

max()
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
print (min(alist))
Output: F

sum()
alist = [1, 2, 3]
print (sum(alist))
Output: 6

Operations Applied on Lists

in
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
if ‘B’ in alist:
    print (‘yes’)
Output: yes

in
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
if ‘b’ in alist:
    print (‘yes’)
no output

not in
alist = [‘A’,’B’,’C’,’D’,’E’, ‘F’]
if ‘b’ not in alist:
print (‘yes’)
Output: yes

+ operator
alist = [‘A’, ’B’, ’C’]
blist = [‘d’, ’e’, ’f’]
alist = alist + blist
print (alist)

Output:     [’A’, ’B’, ’C’, ’d’, ’e’, ’f’]

* operator
alist = [1, 2, 3]
print (alist*2)
Output:   [1, 2, 3, 1, 2, 3]

Apply keyword del to Delete a List

alist = [1, 2, 3]
del alist
print (alist)

Output:  NameError: name ‘alist’ is not defined

Tuples in Python

List functions that can be applied on tuples have the same syntax, but not all list functions are available for tuples. List functions that are not available for tuples:

  • append()
  • extend()
  • insert()
  • remove()
  • pop()
  • clear()
  • copy()

There are also sort(), that is not available but tuples can use sorted(), and reversed() that is different. Using the sorted() function on a tuple results in a list.

Online there is a tuple function called cmp() that compares tuples in python 2, but it does not work in python 3.

Tuple Functions

insert()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
atup.insert(3,‘z’)
print (atup)

Output: (’A’, ’B’, ’C’, ’z’, ’D ’, ’E ’, ’F’)

index()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
x = atup.index(‘C’)
print (x)

Output: 2

count()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
x = atup.count('B')
print (x)

Output: 1

Tuple Functions (different from list functions)

sorted() - output is a list
atup = (‘E’, ’A’, ’C’, ’D’,’ B’)
x = sort(atup)
print (x)

Output: [’A’, ’B’, ’C’, ’D’, ’E’]

reverse()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
x = atup.reverse()
print (tupe(atup))

Output: (‘F’, ’E’, ’D’, ’C’, ’B’, ’A’)

Functions Applied on Tuples

len()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
print (len(atup))

Output: 6

type()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
print (type(atup))
Output: <class ‘tuple’>

min()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
print (min(atup))
Output: A

max()
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
print (min(atup))
Output: F

sum()
atup = (1, 3, 5)
print (sum(atup))
Output: 9

Operations Applied on Tuples

in
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
if ‘B’ in atup:
    print (‘yes’)
Output: yes

in
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
if ‘b’ in atup:
    print (‘yes’)
no output

not in
atup = (‘A’,’B’,’C’,’D’,’E’, ‘F’)
if ‘b’ not in atup:
print (‘yes’)
Output: yes

+ operator
atup = (‘A’, ’B’, ’C’)
btup = (‘d’, ’e’, ’f’)
atup = atup + btup
print (atup)

Output:     (’A’, ’B’, ’C’, ’d’, ’e’, ’f’)

* operator
atup = (1, 2, 3)
print (atup*2)
Output:   (1, 2, 3, 1, 2, 3)

Apply keyword del to Delete a Tuple

atup = (1, 2, 3)
del atup
print (atup)

Output:  NameError: name ‘atup’ is not defined

Sets in Python

create a set
set_X = {10,20,30}
print(type(set_X))
Output:   <class ‘set’>

len()
set_X = {10,20,30}
print(len(set_X))
Output:   3

in
set_X = {10,20,30}
print(10 in set_X)
Output:   True

not in
set_X = {10,20,30}
print(10 not in set_X)
Output:   False

Set Functions

add()
set_X = {10,20,30}
set_X.add(40)
print(set_X)
Output:   {40, 10, 20, 30}

union()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
z = x.union(y)
print(z)
Output:   {‘a’, ‘b’, ‘d’, ‘c’}

intersection()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
z = x.intersection(y)
print(z)
Output:   {‘b’,  ‘c’}

intersection_update()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
x.intersection(y)
print(x)
Output:   {‘b’,  ‘c’}

difference()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
print (x.difference(y))
print (y.difference(x))
Output:   {‘a’}
                  {‘d’}

difference_update()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
x.difference(y)
print (x)
x = {"a", "b", "c"}
y = {"b", "c", "d"}

y.difference(x)
print(y)

Output:   {‘a’}
                  {‘d’}

issubset()
set_X = {10,20,30}
t = {10,20,30,40,50}
print(set_X.issubset(t))
t = {10,20}
print(set_X.issubset(t))

Output:   True
                  False

issuperset()
set_X = {10,20,30}
t = {10,20,30,40,50}
print(set_X.issuperset(t))
t = {10,20}
print(set_X.issuperset(t))

Output:   False
                  True

isdisjoint()
x = {"a", "b", "c"}
y = {"d","e"}
print (x.disjoint(y))
Output:   True

symmetric_difference()
set_X = {10,20,30}
t = {10,20,30,40,50}
print(set_X.symmetric_difference(t))
t = {10,20}
print(set_X.symmetric_difference(t))

Output:   {50, 40}
                 {30}

symmetric_difference_update()
set_X = {10,20,30}
t = {10,20,30,40,50}
set_X.symmetric_difference_update(t))
print(set_X)
t = {10,20}
set_X.symmetric_difference_update(t))
print(set_X)

Output:   {40, 50}
                 {30}

update()
x = {"a", "b", "c"}
y = {"b", "c", "d"}
x.update(y)
print(x)
Output:   {‘a’, ‘b’, ‘d’, ‘c’}

discard()
set_X = {10,20,30}
set_X.discard(10)
print(set_X)
Output:   {20, 30}

pop() – removes a random item
set_X = {10,20,30}
set_X.pop()
print(set_X)
Output:   {20, 30}

remove()
set_X = {10,20,30}
x = 20
set_X.remove(x)
print(set_X)
Output:   {10, 30}

clear()
set_X = {10,20,30}
set_X.clear()
print(set_X)
Output:   set()

copy()
set_X = {10,20,30}
i = set_X.copy()
print(i)
Output:   {10,20,30}

Dictionaries in Python

Although there are several dictionary functions that we have already seen in other data structures, we mainly use the primary functions of values(), keys(), items() and get().

Here is the full list of dictionary functions:

  • items()
  • keys()
  • values()
  • fromkeys()
  • get()
  • pop()
  • popitem()
  • clear()
  • copy()
  • update()
  • setdefault()

Dictionary Functions

Here is a list of dictionary functions and common operations that are applied on dictionaries.

items()
adict={1:'a', 7:'d', 4:'z'}
for i in adict.items():
    print(i)
 Output:
(1, ‘a’)
(7, ‘d’)
(4, ‘z’)

keys()
adict={1:'a', 7:'d', 4:'z'}
for i in adict.keys():
    print(i)
Output:
1
7
4

values()
adict={1:'a', 7:'d', 4:'z'}
for i in adict.values():
    print(i)
Output:
a
d
z

Order a Dictionary

To change the order of the keys or values of the dictionary it is possible to use a sorted function and a reverse function. Here is a series of ways to order a dictionary:

adict= {1:'a', 7:'d', 4:'z'}

#reversed
rdict = dict(reversed(list(adict.items())))

for k,v in rdict.items():
    print(k,v)
4 z
7 d
1 a

#ordered by keys
for k in sorted(adict.keys()):
print(k)

1
4
7

#ordered by values
for v in sorted(adict.values()):
     print(v)

a
d
z

#ordered
sdict = dict(sorted(adict.items()))
for k,v in sdict.items():

    print(k,v)

1 a
4 z
7 d

#ordered in reverse by key
for i in sorted(adict.keys(), reverse=True):
    print (i,adict[i])

7 d
4 z
1 a

#ordered by value
for k in sorted(adict, key=adict.get):
     print(k, adict[k])

1 a
7 d
4 z

#ordered in reverse by value
for k in sorted(adict, key=adict.get, reverse=True):
     print(k, adict[k])

4 z
7 d
1 a

References

[1] Is There a Sorting Algorithm Faster than Quicksort and Timsort? slashdot, at https://developers.slashdot.org/story/20/07/25/0050202/is-there-a-sorting-algorithm-faster-than-quicksort-and-timsort, accessed 13th July 2022

[2] Sets in Python, Geeks for Geeks, at https://www.geeksforgeeks.org/sets-in-python/?ref=lbp, accessed 13th July 2022

Leave a Comment