Sets and Frozensets

Introduction

Graphical Depiction of Sets as Circles

In this chapter of our tutorial, we are dealing with Python's implementation of sets. Though sets are nowadays an integral part of modern mathematics, this has not always been the case. The set theory had been rejected by many, even by some great thinkers. One of them was the philosopher Wittgenstein. He didn't like the set theory and complained mathematics is "ridden through and through with the pernicious idioms of set theory...". He dismissed the set theory as "utter nonsense", as being "laughable" and "wrong". His criticism appeared years after the death of the German mathematician Georg Cantor, the founder of the set theory. David Hilbert defended it from its critics by famously declaring: "No one shall expel us from the Paradise that Cantor has created.

Cantor defined a set at the beginning of his "Beiträge zur Begründung der transfiniten Mengenlehre" as: "A set is a gathering together into a whole of definite, distinct objects of our perception and of our thought - which are called elements of the set." Nowadays, we can say in "plain" English: A set is a well-defined collection of objects.

The elements or members of a set can be anything: numbers, characters, words, names, letters of the alphabet, even other sets, and so on. Sets are usually denoted with capital letters. This is not the exact mathematical definition, but it is good enough for the following.

The data type "set", which is a collection type, has been part of Python since version 2.4. A set contains an unordered collection of unique and immutable objects. The set data type is, as the name implies, a Python implementation of the sets as they are known from mathematics. This explains, why sets unlike lists or tuples can't have multiple occurrences of the same element.

Sets

If we want to create a set, we can call the built-in set function with a sequence or another iterable object:

In the following example, a string is singularized into its characters to build the resulting set x:

x = set("A Python Tutorial")
x
Output::
{' ', 'A', 'P', 'T', 'a', 'h', 'i', 'l', 'n', 'o', 'r', 't', 'u', 'y'}
type(x)
Output::
set

We can pass a list to the built-in set function, as we can see in the following:

x = set(["Perl", "Python", "Java"])
x
Output::
{'Java', 'Perl', 'Python'}

Now, we want to show what happens, if we pass a tuple with reappearing elements to the set function - in our example the city "Paris":

cities = set(("Paris", "Lyon", "London","Berlin","Paris","Birmingham"))
cities
Output::
{'Berlin', 'Birmingham', 'London', 'Lyon', 'Paris'}

As we have expected, no doubles occur in the resulting set of cities.

Immutable Sets

Sets are implemented in a way, which doesn't allow mutable objects. The following example demonstrates that we cannot include, for example, lists as elements:

cities = set((["Python","Perl"], ["Paris", "Berlin", "London"]))

cities
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-5a60d6eeb901> in <module>
----> 1 cities = set((["Python","Perl"], ["Paris", "Berlin", "London"]))
      2 
      3 cities

TypeError: unhashable type: 'list'

Tuples on the other hand are fine:

In [ ]:
cities = set((("Python","Perl"), ("Paris", "Berlin", "London")))

Frozensets

Though sets can't contain mutable objects, sets are mutable:

cities = set(["Frankfurt", "Basel","Freiburg"])
cities.add("Strasbourg")
cities
Output::
{'Basel', 'Frankfurt', 'Freiburg', 'Strasbourg'}

Frozensets are like sets except that they cannot be changed, i.e. they are immutable:

cities = frozenset(["Frankfurt", "Basel","Freiburg"])
cities.add("Strasbourg")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-9abedff8e0b3> in <module>
      1 cities = frozenset(["Frankfurt", "Basel","Freiburg"])
----> 2 cities.add("Strasbourg")

AttributeError: 'frozenset' object has no attribute 'add'

Improved notation

We can define sets (since Python2.6) without using the built-in set function. We can use curly braces instead:

adjectives = {"cheap","expensive","inexpensive","economical"}
adjectives
Output::
{'cheap', 'economical', 'expensive', 'inexpensive'}

Set Operations

add(element)

A method which adds an element to a set. This element has to be immutable.
colours = {"red","green"}
colours.add("yellow")
colours
Output::
{'green', 'red', 'yellow'}
colours.add(["black","white"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-b914ecb44006> in <module>
----> 1 colours.add(["black","white"])

TypeError: unhashable type: 'list'

Of course, an element will only be added, if it is not already contained in the set. If it is already contained, the method call has no effect.

clear()

All elements will be removed from a set.
cities = {"Stuttgart", "Konstanz", "Freiburg"}
cities.clear()
cities
Output::
set()

copy

Creates a shallow copy, which is returned.
more_cities = {"Winterthur","Schaffhausen","St. Gallen"}
cities_backup = more_cities.copy()
more_cities.clear()
cities_backup 
Output::
{'Schaffhausen', 'St. Gallen', 'Winterthur'}
Just in case, you might think, an assignment might be enough:
more_cities = {"Winterthur","Schaffhausen","St. Gallen"}
cities_backup = more_cities
more_cities.clear()
cities_backup
Output::
set()
The assignment "cities_backup = more_cities" just creates a pointer, i.e. another name, to the same data structure. 

difference()

This method returns the difference of two or more sets as a new set.
x = {"a","b","c","d","e"}
y = {"b","c"}
z = {"c","d"}
x.difference(y) 
Output::
{'a', 'd', 'e'}
x.difference(y).difference(z)
Output::
{'a', 'e'}
Instead of using the method difference, we can use the operator "-":
x - y
Output::
{'a', 'd', 'e'}
x - y - z
Output::
{'a', 'e'}

difference_update()

The method difference_update removes all elements of another set from this set. x.difference_update(y) is the same as "x = x - y" 
x = {"a","b","c","d","e"}
y = {"b","c"}
x.difference_update(y)
x = {"a","b","c","d","e"}
y = {"b","c"}
x = x - y
x
Output::
{'a', 'd', 'e'}

discard(el)

An element el will be removed from the set, if it is contained in the set. If el is not a member of the set, nothing will be done.
x = {"a","b","c","d","e"}
x.discard("a")
x    
Output::
{'b', 'c', 'd', 'e'}
x.discard("z")
x   
Output::
{'b', 'c', 'd', 'e'}

remove(el)

works like discard(), but if el is not a member of the set, a KeyError will be raised.
x = {"a","b","c","d","e"}
x.remove("a")
x   
Output::
{'b', 'c', 'd', 'e'}
x.remove("z")    
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-7c787df95113> in <module>
----> 1 x.remove("z")

KeyError: 'z'

union(s)

This method returns the union of two sets as a new set, i.e. all elements that are in either set.
x = {"a","b","c","d","e"}
y = {"c","d","e","f","g"}
x.union(y)   
Output::
{'a', 'b', 'c', 'd', 'e', 'f', 'g'}
This can be abbreviated with the pipe operator "|":
x = {"a","b","c","d","e"}
y = {"c","d","e","f","g"}
x | y
Output::
{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

intersection(s)

Returns the intersection of the instance set and the set s as a new set. In other words, a set with all the elements which are contained in both sets is returned.
x = {"a","b","c","d","e"}
y = {"c","d","e","f","g"}
x.intersection(y)
Output::
{'c', 'd', 'e'}
This can be abbreviated with the ampersand operator "&":
x = {"a","b","c","d","e"}
y = {"c","d","e","f","g"}
x  & y
Output::
{'c', 'd', 'e'}

isdisjoint()

This method returns True if two sets have a null intersection.
x = {"a","b","c"}
y = {"c","d","e"}
x.isdisjoint(y)
Output::
False
x = {"a","b","c"}
y = {"d","e","f"}
x.isdisjoint(y) 
Output::
True

issubset()

x.issubset(y) returns True, if x is a subset of y. "<=" is an abbreviation for "Subset of" and ">=" for "superset of"
"<" is used to check if a set is a proper subset of a set.  
x = {"a","b","c","d","e"}
y = {"c","d"}
x.issubset(y)
Output::
False
y.issubset(x)
Output::
True
x < y
Output::
False
y < x # y is a proper subset of x   
Output::
True
x < x # a set can never be a proper subset of oneself.
Output::
False
x <= x 
Output::
True

issuperset()

x.issuperset(y) returns True, if x is a superset of y. ">=" is an abbreviation for "issuperset of"
">" is used to check if a set is a proper superset of a set.    
x = {"a","b","c","d","e"}
y = {"c","d"}
x.issuperset(y)
Output::
True
x > y
Output::
True
x >= y
Output::
True
x >= x   
Output::
True
x > x
Output::
False
x.issuperset(x)
Output::
True

pop()

pop() removes and returns an arbitrary set element. The method raises a KeyError if the set is empty.
x = {"a","b","c","d","e"}
x.pop()
Output::
'e'
x.pop()
Output::
'a'