Python has had sets for a while, but they aren't used as much as lists. Nevertheless, they can be very beneficial in certain cases.
A set is a collection of unique objects.
A basic use case is removing duplication.
test_set = ['spam', 'spam', 'eggs', 'spam', 'bacon', 'eggs']
print(set(test_set))
# {'eggs', 'spam', 'bacon'}
print(list(set(l)))
#['eggs', 'spam', 'bacon']
There are two types of sets in Python:
The set Type itself is not hashable, however, the set elements must be hashable to enforce uniqueness. The frozenset Type is hashable since it's immutable by default.
In addition to this, Set Type implements many operations as Infix Operators, for example, given two sets a and b:
# Defining the two sets
first_set = {1, 5, 7, 4, 5}
second_set = {4, 5, 6, 7, 8}
# Creating the intersection of the two sets
intersect_set = first_set & second_set
print(intersect_set) # Output: {4, 5}
# Creating the union of the two sets
union_set = first_set | second_set
print(union_set) # Output: {1, 2, 3, 4, 5, 6, 7, 8}
# Creating the difference of the two sets
diff_set = first_set - second_set
print(diff_set) # Output: {1, 2, 3}
Python provides different ways to create or instantiate set objects, the first is through Literal Syntax and the second is Set Comprehension, and this comes as no surprise since a set is a Python sequence.
The standard string representation of sets always uses the {…} notation.
The syntax is quite simple with the one caveat, there is no way to create an empty Set, so we must call the constructor directly.
not_an_empty_set = {1, 5, 7, 4, 5} # Set Literal
empty_sest = set() # Empty Set
The syntax {1, 2, 3} is quicker and easier to read than writing out set([1, 2, 3]). It's slower to evaluate the latter form since Python has to check the set name, build a list, and afterward pass it to the constructor.
SetComp is similar to listComp and dictComp, it’s the creation of a set based on existing iterables.
new_set = {s for s in [1, 2, 1, 0]}
print(new_set)
# set([0, 1, 2])
The set Type and frozenset Type are implemented with a hash table, and this can have certain effects:
Set elements must be hashable objects. They must implement proper "hash" and "eq" methods.
Membership testing is very efficient. A set may have millions of elements, but an element can be located directly by computing its hash code and deriving an index offset.
Sets have a significant memory overhead, sometimes a low-level array would be more compact and efficient.
If two elements are different but have the same hash code, their position depends on which element is added first, so basically, element ordering depends on the insertion order.
Adding elements to a set may change the order of existing elements. That’s because the algorithm becomes less efficient if the hash table is more than two-thirds full.
Dict views implement similar operations to set operations, using set operators with views will save a lot of loops and ifs when inspecting the contents of dictionaries in your code.
The Python Standard Library documentation, “
The third chapter of the book, Fluent Python
The first chapter of the book, Python Cookbook, 3rd Edition.
Also published here.