Sets in Python are unordered collections of unique items: a set never contains duplicates. Their main use case is checking whether an item exists in a collection, which is useful in many different situations.
Creating a set is easy, and looks similar to how we define lists in Python. The only difference is that we use {} curly brackets to define a set:
mySet = { "some", "set", "of", "items" }
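One thing worth knowing when using curly brackets: an empty pair of brackets creates a dictionary, not a set. The short sketch below illustrates this; the variable names are only for illustration.

```python
# Empty curly brackets create a dict, so use set() for an empty set.
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
print(len(empty_set))               # 0
```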
Sets can also be defined from lists using the set() function:
mySet = set([ 'some', 'list', 'becoming', 'a', 'set' ])
# set is { 'some', 'list', 'becoming', 'a', 'set' }
You can also create sets from strings, using the same set() function. Each character becomes an item, and repeated characters are kept only once:
mySet = set('somestring')
# set is { 's', 'o', 'm', 'e', 't', 'r', 'i', 'n', 'g' } (order may vary)
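Because set() discards duplicates, it is a common way to count or collect the unique values in a list or string. Here is a minimal sketch using made-up sample data (the tags list is hypothetical):

```python
# Hypothetical sample data containing repeated values.
tags = ["python", "sets", "python", "lists", "sets"]
unique_tags = set(tags)

print(len(tags))            # 5
print(len(unique_tags))     # 3
print(sorted(unique_tags))  # ['lists', 'python', 'sets']

# The same applies to strings: the letter "s" appears twice in "somestring".
letters = set("somestring")
print(len("somestring"))    # 10
print(len(letters))         # 9
```

Note that sorted() is used only to get a predictable order for printing, since sets themselves are unordered.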
As with other countable types of data, we can use len to get the length of a set, too:
mySet = set([ 'some', 'list', 'becoming', 'a', 'set' ])
print(len(mySet)) # Prints 5
Finally, we can also define what is known as a frozenset, which is simply an immutable version of a set, using the frozenset() function:
mySet = frozenset([ 'some', 'list', 'becoming', 'a', 'set' ])
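One practical consequence of immutability is that a frozenset is hashable, so it can be used as a dictionary key or as an item inside another set, which a regular set cannot. The sketch below illustrates this with hypothetical names:

```python
# Two frozensets with the same items are equal and hash the same.
axes_a = frozenset(["x", "y"])
axes_b = frozenset(["y", "x"])

# A frozenset can be a dictionary key.
labels = {axes_a: "axes"}
print(labels[axes_b])  # axes

# A frozenset can also be an item of another set.
groups = {axes_a, frozenset(["z"])}
print(len(groups))  # 2

# A regular set is not hashable, so this raises a TypeError.
try:
    bad_groups = {set(["x"])}
    nested_set_allowed = True
except TypeError:
    nested_set_allowed = False
print(nested_set_allowed)  # False
```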
We can combine two sets into one using the | operator. If an item exists in both sets, only one copy of it will be brought over. Here's an example where we combine two sets:
mySet = { "set", "one" }
myNewSet = { "set", "two" }
combinedSet = mySet | myNewSet
print(combinedSet) # { "set", "one", "two" }
We can intersect sets using &. That means we'll end up with a set containing only the items that exist in both. Using the same example, we can therefore create a set containing only the item set:
mySet = { "set", "one" }
myNewSet = { "set", "two" }
combinedSet = mySet & myNewSet
print(combinedSet) # { "set" }
Another way we can combine sets is by subtraction, to end up with a new set that contains only the items of the first set that are not in the second. For example, the new set below only has one item, cool, since mySet and mySecondSet both contain "set" and "one":
mySet = { "set", "one", "cool" }
mySecondSet = { "set", "one" }
myNewSet = mySet - mySecondSet
print(myNewSet) # { "cool" }
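Unlike combining and intersecting, subtraction depends on the order of the operands. A small sketch, with an extra hypothetical item added to the second set to make the difference visible:

```python
my_set = {"set", "one", "cool"}
my_second_set = {"set", "one", "extra"}

# Items in my_set that are not in my_second_set.
left_only = my_set - my_second_set
print(left_only)   # {'cool'}

# Swapping the operands gives a different result.
right_only = my_second_set - my_set
print(right_only)  # {'extra'}
```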
Finally, we can do what is called symmetric difference, where we end up with a set that contains items found in either mySet or mySecondSet, but not both:
mySet = { "set", "one", "cool", "nice" }
mySecondSet = { "set", "one", "friendly" }
myNewSet = mySet ^ mySecondSet
print(myNewSet) # { "cool", "nice", "friendly" }
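Each of these operators also has a method equivalent: union(), intersection(), difference() and symmetric_difference(). One difference is that the methods accept any iterable, such as a list, while the operators require both sides to be sets. The following is a minimal sketch with illustrative variable names:

```python
my_set = {"set", "one"}
other_items = ["set", "two"]  # a list, not a set

# The method forms accept any iterable.
combined = my_set.union(other_items)
common = my_set.intersection(other_items)
only_mine = my_set.difference(other_items)
either = my_set.symmetric_difference(other_items)

print(sorted(combined))   # ['one', 'set', 'two']
print(sorted(common))     # ['set']
print(sorted(only_mine))  # ['one']
print(sorted(either))     # ['one', 'two']

# The operator form raises a TypeError when given a list.
operator_failed = False
try:
    my_set | other_items
except TypeError:
    operator_failed = True
print(operator_failed)    # True
```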
The main use case for sets is testing membership, to see if an item exists in a set. We can do this using the in and not in keywords. Let's look at an example. If we want to check that orange is in our fruits set, we use in:
fruits = { "orange", "apple", "peach" }
print("orange" in fruits) # True
Or, if we want to check if orange is not in fruits, we use not in:
fruits = { "orange", "apple", "peach" }
print("orange" not in fruits) # False
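Membership tests are often used to filter other collections. As a rough sketch, assuming a hypothetical basket list that we want to check against a set of allowed fruits:

```python
allowed_fruits = {"orange", "apple", "peach"}
basket = ["apple", "kiwi", "orange", "apple"]  # hypothetical input list

# Keep only the items that are members of the set.
accepted = [fruit for fruit in basket if fruit in allowed_fruits]
# Collect the items that are not members of the set.
rejected = [fruit for fruit in basket if fruit not in allowed_fruits]

print(accepted)  # ['apple', 'orange', 'apple']
print(rejected)  # ['kiwi']
```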
As with lists, we can make a copy of a set using the copy() method available on all sets. The copy has the same value but is a separate object in memory. That means the two sets are the same when compared by value using ==, but not the same when compared by identity using is:
mySet = { "set", "one" }
mySetCopy = mySet.copy()
print(mySet == mySetCopy) # True
print(mySet is mySetCopy) # False
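The practical effect is that changes to a copy do not affect the original, whereas plain assignment only creates a second name for the same set. A short sketch with illustrative names:

```python
my_set = {"set", "one"}
my_set_copy = my_set.copy()  # a separate set with the same items
my_alias = my_set            # no copy: another name for the same set

my_set_copy.add("two")       # changes only the copy
my_alias.add("three")        # changes my_set as well

print(sorted(my_set))        # ['one', 'set', 'three']
print(sorted(my_set_copy))   # ['one', 'set', 'two']
print(my_alias is my_set)    # True
```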
Another really useful feature of sets is the ability to check if a set is a superset or subset of another set (which is a bit of a tongue twister).
Let's say we have two sets, as shown below:
mySet = { "set", "one", "two" }
mySecondSet = { "set", "one" }
mySecondSet is in fact a subset of mySet, since it is fully contained within mySet. We can test for this using the <= operator:
mySet = { "set", "one", "two" }
mySecondSet = { "set", "one" }
print(mySecondSet <= mySet) # True
We can also use the < operator to check for true subsets (also called proper subsets), meaning that mySecondSet is contained within mySet but is not equal in value to mySet. In the example above, this is also true:
mySet = { "set", "one", "two" }
mySecondSet = { "set", "one" }
print(mySecondSet < mySet) # True
In the following example, however, mySecondSet is indeed a subset of mySet, but it is not a true subset, since both are equal in value:
mySet = { "set", "one" }
mySecondSet = { "set", "one" }
print(mySecondSet <= mySet) # True
print(mySecondSet < mySet) # False
Supersets work exactly the same way as subsets. The only difference is that the operator points the opposite way around: > is used to check for true supersets, while >= is used to check for any supersets. Using our example from before, mySet is a superset of mySecondSet, so the following returns True:
mySet = { "set", "one", "two" }
mySecondSet = { "set", "one" }
print(mySet > mySecondSet) # True
And similarly, while mySet is a superset of mySecondSet below, it is not a true superset, so > returns False, while >= returns True:
mySet = { "set", "one" }
mySecondSet = { "set", "one" }
print(mySet >= mySecondSet) # True
print(mySet > mySecondSet) # False
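The <= and >= checks also exist as the methods issubset() and issuperset(), which accept any iterable. It is also worth seeing that two sets can be unrelated, in which case neither is a subset of the other. A minimal sketch (the left and right sets are hypothetical):

```python
my_set = {"set", "one", "two"}
my_second_set = {"set", "one"}

# Method equivalents of <= and >=.
is_subset = my_second_set.issubset(my_set)
is_superset = my_set.issuperset(my_second_set)
print(is_subset)    # True
print(is_superset)  # True

# The methods accept any iterable, such as a list.
subset_of_list = my_second_set.issubset(["set", "one", "two"])
print(subset_of_list)  # True

# Partially overlapping sets: neither contains the other.
left = {"a", "b"}
right = {"b", "c"}
print(left <= right)  # False
print(left >= right)  # False
```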
Sometimes, you'll also want to check if two sets share no items at all. For example, { "one", "two" } and { "three", "four" } have no items in common, which makes them disjoint. In Python, the isdisjoint method allows us to check for that:
mySet = { "one", "two" }
mySecondSet = { "three", "four" }
print(mySet.isdisjoint(mySecondSet)) # True
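For contrast, here is a small sketch of the opposite case, where a single shared item is enough for isdisjoint to return False. The overlapping set is a made-up example:

```python
my_set = {"one", "two"}
overlapping = {"two", "three"}

# "two" is in both sets, so they are not disjoint.
disjoint_with_overlap = my_set.isdisjoint(overlapping)
print(disjoint_with_overlap)  # False

# isdisjoint accepts any iterable, such as a list.
disjoint_with_list = my_set.isdisjoint(["three", "four"])
print(disjoint_with_list)     # True

# Two sets are disjoint exactly when their intersection is empty.
print(len(my_set & overlapping) == 0)  # False
```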
While everything we've talked about so far applies both to frozensets and sets, there are also a few other methods available only to sets, which allow us to mutate their value. These are:
set.add('item') - adds an item to the set.
set.remove('item') - removes an item from the set, raising a KeyError if the item is not present.
set.update(newSet) - adds all items from newSet to the original set. This can also be written as set |= newSet
set.clear() - removes all items from a set.
set.pop() - removes and returns an arbitrary item from the set. Since sets are unordered, pop() takes no index argument.
set.intersection_update(newSet) - keeps only items found in both set and newSet. Can also be written as set &= newSet
set.difference_update(newSet) - takes set, and removes any items found in newSet. Can also be written as set -= newSet
set.symmetric_difference_update(newSet) - keeps only items found in either set or newSet, but not both. Can also be written as set ^= newSet
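To see how these methods change a set in place, here is a short walk-through on a single set. It is only a sketch with illustrative values; note that pop() takes no argument and removes an arbitrary item, because sets have no positions:

```python
my_set = {"set", "one"}

my_set.add("two")                  # {"set", "one", "two"}
my_set.add("two")                  # already present, so nothing changes
my_set.remove("one")               # {"set", "two"}

# remove() raises a KeyError when the item is missing.
remove_failed = False
try:
    my_set.remove("missing")
except KeyError:
    remove_failed = True

my_set.update({"three", "four"})   # {"set", "two", "three", "four"}
my_set &= {"set", "two", "three"}  # {"set", "two", "three"}
my_set -= {"three"}                # {"set", "two"}
my_set ^= {"two", "five"}          # {"set", "five"}
after_updates = sorted(my_set)
print(after_updates)               # ['five', 'set']

popped = my_set.pop()              # removes an arbitrary item
print(popped in {"set", "five"})   # True

my_set.clear()                     # removes everything that is left
print(len(my_set))                 # 0
```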
While add, remove, clear and pop provide easy ways to add and remove items from sets, update and the last 3 methods do the same things we talked about before when we covered combining and intersecting sets. The difference here is that these methods change the set itself instead of returning a new one. While this is possible on normal sets, we cannot apply these methods to a frozenset.
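As a quick illustration of that last point, the sketch below shows that a frozenset has no add method at all, while the non-mutating operators still work and return a new frozenset:

```python
frozen = frozenset(["set", "one"])

# frozenset has no mutating methods, so this raises an AttributeError.
add_failed = False
try:
    frozen.add("two")
except AttributeError:
    add_failed = True
print(add_failed)             # True

# Operators such as | still work, because they build a new object.
bigger = frozen | {"two"}
print(type(bigger).__name__)  # frozenset
print(sorted(bigger))         # ['one', 'set', 'two']
print(sorted(frozen))         # ['one', 'set']
```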
That should be everything you need to know about sets in Python. I hope you've enjoyed this guide. Thanks for reading!