Sets
5. Sets
A set is an unordered collection of unique hashable values.
Example
nums = {1, 2, 3}
Important properties
- unordered → no guaranteed position
- unique → duplicates are removed automatically
- mutable → you can add / remove items
Uniqueness
This is the defining feature.
nums = {1, 2, 2, 3} # {1, 2, 3}
The duplicate 2 is automatically removed.
1.5.2 Why sets are powerful
The two biggest reasons:
Uniqueness
Automatic deduplication.
Fast membership
Average lookup complexity: O(1)
Membership checks
3 in nums
This is usually much faster than:
3 in [1, 2, 3]
because list membership is O(n).
A strong verbal answer should explicitly mention hashing.
1.5.3 How set lookup works conceptually
Sets use hashing, just like dictionaries.
Mental model
value -> hash -> bucket -> exists?
This is why lookup is usually constant time.
1.5.9 Common set operations
add()
nums = {1, 2}
nums.add(3) # {1, 2, 3}
Adds a single element. Does nothing if already present (sets auto-deduplicate).
Complexity: O(1)
remove()
nums.remove(2)
Removes the element. Raises KeyError if missing.
discard()
Safer version — raises no error if missing.
nums.discard(5) # No error even if 5 not in set
Complexity: O(1)
Interview context: Use discard() in production to avoid crashes.
pop()
Remove and return an arbitrary element.
item = nums.pop() # Removes random element
Raises KeyError if set is empty.
Complexity: O(1)
Union: | (all elements from both)
a = {1, 2, 3}
b = {3, 4, 5}
a | b # {1, 2, 3, 4, 5}
a.union(b) # Same result
Intersection: & (only in both)
a & b # {3}
a.intersection(b) # Same result
Real-world example:
backend_skills = {"python", "sql", "docker"}
candidate_skills = {"python", "aws", "docker"}
common = backend_skills & candidate_skills # {"python", "docker"}
Difference: - (in first but not second)
a - b # {1, 2}
a.difference(b) # Same result
Real example: required skills not in candidate's
required = {"python", "sql", "docker"}
candidate = {"python", "aws"}
missing = required - candidate # {"sql", "docker"}
Symmetric Difference: ^ (in exactly one)
a ^ b # {1, 2, 4, 5}
Elements in a or b, but not both.
1.5.10 Important interview patterns
Duplicate detection
len(nums) != len(set(nums)) # Has duplicates
Visited tracking (graphs/DFS)
visited = set()
visited.add(node)
if node in visited: # O(1) check
...
Common elements
common = set(a) & set(b) # O(n + m)
Much faster than nested loops O(n*m).
Unique count
unique = len(set(nums))
Set operations summary
| Operation | Method | Symbol | Use Case |
|---|---|---|---|
| Union | .union(b) | a | b | All unique elements |
| Intersection | .intersection(b) | a & b | Common elements |
| Difference | .difference(b) | a - b | In a, not in b |
| Sym Difference | .symmetric_difference(b) | a ^ b | In exactly one |
When to use sets vs lists vs dicts
Use set when:
- Order doesn't matter
- You need O(1) membership
- You need unique values
- You're doing set operations (union, intersection)
Use list when:
- Order matters
- You need indexing
[i] - You need duplicates
- Items not hashable
Use dict when:
- You need key-value mapping
- Fast lookup by key
- Order matters (Python 3.7+)
- Graph adjacency lists
Verbal interview questions
Answer these out loud:
- Why is set membership usually
O(1)? - Why use a set over a list?
- Explain union
- Explain intersection
- Why does converting list → set lose ordering?
Coding drills
Drill 1: duplicates
def has_duplicates(nums: list[int]) -> bool:
...
Drill 2: common elements
def common(a: list[int], b: list[int]) -> set[int]:
...
Use intersection.
Drill 3: preserve order dedupe
def dedupe_preserve_order(nums: list[int]) -> list[int]:
...
Must-answer interview questions
- list vs tuple
- set vs list for membership checks
- dict lookup complexity
- when ordering matters
Additional coding drills
- word frequency counter
- deduplicate list while preserving order
- flatten nested lists
- group objects by field
1.5.11 Common interview problems
Sets are heavily used in interview problems.
Problem 1: Contains Duplicate
Question: Given a list, return True if there are duplicates.
def containsDuplicate(nums: list[int]) -> bool:
# nums = [1, 2, 3, 1] → True
# nums = [1, 2, 3] → False
...
Key insight: If len(set) < len(list), there are duplicates.
return len(nums) != len(set(nums))
Or:
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
Complexity: O(n) time, O(n) space.
Problem 2: Find First Duplicate
Question: Find the first duplicate in a list while preserving order.
def firstDuplicate(nums: list[int]) -> int:
# nums = [1, 2, 3, 2, 4] → 2 (first duplicate found)
# nums = [1, 2, 3] → -1 (no duplicate)
...
Key insight: Scan left-to-right, track seen values.
seen = set()
for num in nums:
if num in seen:
return num
seen.add(num)
return -1
Complexity: O(n) time, O(n) space.
Problem 3: Intersection of Two Arrays
Question: Find common elements between two lists.
def intersection(nums1: list[int], nums2: list[int]) -> list[int]:
# nums1 = [1, 2, 2, 1], nums2 = [2, 2]
# Return [2]
...
Key insight: Convert to sets, use intersection operator.
return list(set(nums1) & set(nums2))
Complexity: O(n + m) time, O(min(n, m)) space.
Problem 4: Union of Two Sets
Question: Find all unique elements across two lists.
def union(nums1: list[int], nums2: list[int]) -> list[int]:
# nums1 = [1, 2], nums2 = [2, 3]
# Return [1, 2, 3]
...
Key insight: Use union operator.
return list(set(nums1) | set(nums2))
Complexity: O(n + m) time.
Problem 5: Happy Number (using sets)
Question: Detect if we enter a cycle when repeatedly summing squares of digits.
def isHappy(n: int) -> bool:
# n = 7 → True (eventually reaches 1)
# n = 2 → False (enters cycle)
...
Key insight: Track seen values in a set. If we see a value twice, we're in a cycle.
def get_next(num):
total = 0
while num > 0:
digit = num % 10
total += digit ** 2
num //= 10
return total
seen = set()
while n != 1 and n not in seen:
seen.add(n)
n = get_next(n)
return n == 1
Complexity: O(k) where k = iterations before reaching 1 or cycle.
Problem 6: Unique Email Addresses
Question: Count unique email addresses (ignoring rules about . and +).
def numUniqueEmails(emails: list[str]) -> int:
# Rules: ignore dots before @, ignore everything after + (before @)
# "test.email+tag@example.com" → "testemail@example.com"
...
Key insight: Normalize each email, add to set for automatic deduplication.
unique = set()
for email in emails:
local, domain = email.split("@")
local = local.split("+")[0].replace(".", "") # Remove + part, remove dots
unique.add(local + "@" + domain)
return len(unique)
Complexity: O(n * m) where m = avg email length.
Problem 7: Valid Sudoku
Question: Determine if a partially filled Sudoku board is valid (no duplicates in rows, columns, 3x3 boxes).
def isValidSudoku(board: list[list[str]]) -> bool:
# board[i][j] is either a digit or "."
...
Key insight: Use sets to track numbers seen in rows, columns, and 3x3 boxes.
rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [set() for _ in range(9)]
for i in range(9):
for j in range(9):
num = board[i][j]
if num == ".":
continue
box_idx = (i // 3) * 3 + (j // 3)
if num in rows[i] or num in cols[j] or num in boxes[box_idx]:
return False
rows[i].add(num)
cols[j].add(num)
boxes[box_idx].add(num)
return True
Complexity: O(1) for 9x9 board (fixed size).
Problem 8: Longest Consecutive Sequence
Question: Find longest sequence of consecutive integers.
def longestConsecutive(nums: list[int]) -> int:
# nums = [100, 4, 200, 1, 3, 2]
# Return 4 ([1, 2, 3, 4])
...
Key insight: Use set for O(1) lookup. For each number, check if it starts a sequence.
num_set = set(nums)
longest = 0
for num in num_set:
if num - 1 not in num_set: # Start of sequence
length = 1
while num + length in num_set:
length += 1
longest = max(longest, length)
return longest
Complexity: O(n) amortized (each number visited at most twice).
Problem 9: Set Difference (Required vs Offered)
Question: Find skills required but not offered in a list.
def missingSkills(required: list[str], offered: list[str]) -> list[str]:
# required = ["python", "sql", "docker"]
# offered = ["python", "aws"]
# Return ["sql", "docker"]
...
Key insight: Set difference.
return list(set(required) - set(offered))
Complexity: O(n + m) where n = required, m = offered.
Problem 10: Group Duplicate Words
Question: Group words that are anagrams of each other.
def groupAnagrams(words: list[str]) -> list[list[str]]:
# words = ["eat", "tea", "ate", "bat", "tab", "cat"]
# Return [["eat", "tea", "ate"], ["bat", "tab"], ["cat"]]
...
Key insight: Sorted string as key, use dict to group.
groups = {}
for word in words:
key = ''.join(sorted(word))
groups.setdefault(key, []).append(word)
return list(groups.values())
or with sets for unique tracking:
from collections import defaultdict
groups = defaultdict(set)
for word in words:
key = ''.join(sorted(word))
groups[key].add(word)
return [list(group) for group in groups.values()]
Complexity: O(n * k log k) where k = avg word length.