🐍 Python Course

Sets

5. Sets

A set is an unordered collection of unique hashable values.

Example

nums = {1, 2, 3}

Important properties

  • unordered → no guaranteed position
  • unique → duplicates are removed automatically
  • mutable → you can add / remove items

Uniqueness

This is the defining feature.

nums = {1, 2, 2, 3}   # {1, 2, 3}

The duplicate 2 is automatically removed.


1.5.2 Why sets are powerful

The two biggest reasons:

Uniqueness

Automatic deduplication.

Fast membership

Average lookup complexity: O(1)

Membership checks

3 in nums

This is usually much faster than:

3 in [1, 2, 3]

because list membership is O(n).

A strong verbal answer should explicitly mention hashing.


1.5.3 How set lookup works conceptually

Sets use hashing, just like dictionaries.

Mental model

value -> hash -> bucket -> exists?

This is why lookup is usually constant time.


1.5.9 Common set operations

add()

nums = {1, 2}
nums.add(3)   # {1, 2, 3}

Adds a single element. Does nothing if already present (sets auto-deduplicate).

Complexity: O(1)


remove()

nums.remove(2)

Removes the element. Raises KeyError if missing.

discard()

Safer version — raises no error if missing.

nums.discard(5)  # No error even if 5 not in set

Complexity: O(1)

Interview context: Use discard() in production to avoid crashes.


pop()

Remove and return an arbitrary element.

item = nums.pop()  # Removes random element

Raises KeyError if set is empty.

Complexity: O(1)


Union: | (all elements from both)

a = {1, 2, 3}
b = {3, 4, 5}
a | b  # {1, 2, 3, 4, 5}
a.union(b)  # Same result

Intersection: & (only in both)

a & b  # {3}
a.intersection(b)  # Same result

Real-world example:

backend_skills = {"python", "sql", "docker"}
candidate_skills = {"python", "aws", "docker"}
common = backend_skills & candidate_skills  # {"python", "docker"}

Difference: - (in first but not second)

a - b  # {1, 2}
a.difference(b)  # Same result

Real example: required skills not in candidate's

required = {"python", "sql", "docker"}
candidate = {"python", "aws"}
missing = required - candidate  # {"sql", "docker"}

Symmetric Difference: ^ (in exactly one)

a ^ b  # {1, 2, 4, 5}

Elements in a or b, but not both.


1.5.10 Important interview patterns

Duplicate detection

len(nums) != len(set(nums))  # Has duplicates

Visited tracking (graphs/DFS)

visited = set()
visited.add(node)
if node in visited:  # O(1) check
    ...

Common elements

common = set(a) & set(b)  # O(n + m)

Much faster than nested loops O(n*m).

Unique count

unique = len(set(nums))

Set operations summary

OperationMethodSymbolUse Case
Union.union(b)a | bAll unique elements
Intersection.intersection(b)a & bCommon elements
Difference.difference(b)a - bIn a, not in b
Sym Difference.symmetric_difference(b)a ^ bIn exactly one

When to use sets vs lists vs dicts

Use set when:

  • Order doesn't matter
  • You need O(1) membership
  • You need unique values
  • You're doing set operations (union, intersection)

Use list when:

  • Order matters
  • You need indexing [i]
  • You need duplicates
  • Items not hashable

Use dict when:

  • You need key-value mapping
  • Fast lookup by key
  • Order matters (Python 3.7+)
  • Graph adjacency lists

Verbal interview questions

Answer these out loud:

  • Why is set membership usually O(1)?
  • Why use a set over a list?
  • Explain union
  • Explain intersection
  • Why does converting list → set lose ordering?

Coding drills

Drill 1: duplicates

def has_duplicates(nums: list[int]) -> bool:
    ...

Drill 2: common elements

def common(a: list[int], b: list[int]) -> set[int]:
    ...

Use intersection.

Drill 3: preserve order dedupe

def dedupe_preserve_order(nums: list[int]) -> list[int]:
    ...

Must-answer interview questions

  • list vs tuple
  • set vs list for membership checks
  • dict lookup complexity
  • when ordering matters

Additional coding drills

  • word frequency counter
  • deduplicate list while preserving order
  • flatten nested lists
  • group objects by field

1.5.11 Common interview problems

Sets are heavily used in interview problems.

Problem 1: Contains Duplicate

Question: Given a list, return True if there are duplicates.

def containsDuplicate(nums: list[int]) -> bool:
    # nums = [1, 2, 3, 1] → True
    # nums = [1, 2, 3] → False
    ...

Key insight: If len(set) < len(list), there are duplicates.

return len(nums) != len(set(nums))

Or:

seen = set()
for num in nums:
    if num in seen:
        return True
    seen.add(num)
return False

Complexity: O(n) time, O(n) space.


Problem 2: Find First Duplicate

Question: Find the first duplicate in a list while preserving order.

def firstDuplicate(nums: list[int]) -> int:
    # nums = [1, 2, 3, 2, 4] → 2 (first duplicate found)
    # nums = [1, 2, 3] → -1 (no duplicate)
    ...

Key insight: Scan left-to-right, track seen values.

seen = set()
for num in nums:
    if num in seen:
        return num
    seen.add(num)
return -1

Complexity: O(n) time, O(n) space.


Problem 3: Intersection of Two Arrays

Question: Find common elements between two lists.

def intersection(nums1: list[int], nums2: list[int]) -> list[int]:
    # nums1 = [1, 2, 2, 1], nums2 = [2, 2]
    # Return [2]
    ...

Key insight: Convert to sets, use intersection operator.

return list(set(nums1) & set(nums2))

Complexity: O(n + m) time, O(min(n, m)) space.


Problem 4: Union of Two Sets

Question: Find all unique elements across two lists.

def union(nums1: list[int], nums2: list[int]) -> list[int]:
    # nums1 = [1, 2], nums2 = [2, 3]
    # Return [1, 2, 3]
    ...

Key insight: Use union operator.

return list(set(nums1) | set(nums2))

Complexity: O(n + m) time.


Problem 5: Happy Number (using sets)

Question: Detect if we enter a cycle when repeatedly summing squares of digits.

def isHappy(n: int) -> bool:
    # n = 7 → True (eventually reaches 1)
    # n = 2 → False (enters cycle)
    ...

Key insight: Track seen values in a set. If we see a value twice, we're in a cycle.

def get_next(num):
    total = 0
    while num > 0:
        digit = num % 10
        total += digit ** 2
        num //= 10
    return total

seen = set()
while n != 1 and n not in seen:
    seen.add(n)
    n = get_next(n)

return n == 1

Complexity: O(k) where k = iterations before reaching 1 or cycle.


Problem 6: Unique Email Addresses

Question: Count unique email addresses (ignoring rules about . and +).

def numUniqueEmails(emails: list[str]) -> int:
    # Rules: ignore dots before @, ignore everything after + (before @)
    # "test.email+tag@example.com" → "testemail@example.com"
    ...

Key insight: Normalize each email, add to set for automatic deduplication.

unique = set()
for email in emails:
    local, domain = email.split("@")
    local = local.split("+")[0].replace(".", "")  # Remove + part, remove dots
    unique.add(local + "@" + domain)
return len(unique)

Complexity: O(n * m) where m = avg email length.


Problem 7: Valid Sudoku

Question: Determine if a partially filled Sudoku board is valid (no duplicates in rows, columns, 3x3 boxes).

def isValidSudoku(board: list[list[str]]) -> bool:
    # board[i][j] is either a digit or "."
    ...

Key insight: Use sets to track numbers seen in rows, columns, and 3x3 boxes.

rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [set() for _ in range(9)]

for i in range(9):
    for j in range(9):
        num = board[i][j]
        if num == ".":
            continue
        
        box_idx = (i // 3) * 3 + (j // 3)
        
        if num in rows[i] or num in cols[j] or num in boxes[box_idx]:
            return False
        
        rows[i].add(num)
        cols[j].add(num)
        boxes[box_idx].add(num)

return True

Complexity: O(1) for 9x9 board (fixed size).


Problem 8: Longest Consecutive Sequence

Question: Find longest sequence of consecutive integers.

def longestConsecutive(nums: list[int]) -> int:
    # nums = [100, 4, 200, 1, 3, 2]
    # Return 4 ([1, 2, 3, 4])
    ...

Key insight: Use set for O(1) lookup. For each number, check if it starts a sequence.

num_set = set(nums)
longest = 0

for num in num_set:
    if num - 1 not in num_set:  # Start of sequence
        length = 1
        while num + length in num_set:
            length += 1
        longest = max(longest, length)

return longest

Complexity: O(n) amortized (each number visited at most twice).


Problem 9: Set Difference (Required vs Offered)

Question: Find skills required but not offered in a list.

def missingSkills(required: list[str], offered: list[str]) -> list[str]:
    # required = ["python", "sql", "docker"]
    # offered = ["python", "aws"]
    # Return ["sql", "docker"]
    ...

Key insight: Set difference.

return list(set(required) - set(offered))

Complexity: O(n + m) where n = required, m = offered.


Problem 10: Group Duplicate Words

Question: Group words that are anagrams of each other.

def groupAnagrams(words: list[str]) -> list[list[str]]:
    # words = ["eat", "tea", "ate", "bat", "tab", "cat"]
    # Return [["eat", "tea", "ate"], ["bat", "tab"], ["cat"]]
    ...

Key insight: Sorted string as key, use dict to group.

groups = {}
for word in words:
    key = ''.join(sorted(word))
    groups.setdefault(key, []).append(word)
return list(groups.values())

or with sets for unique tracking:

from collections import defaultdict
groups = defaultdict(set)
for word in words:
    key = ''.join(sorted(word))
    groups[key].add(word)
return [list(group) for group in groups.values()]

Complexity: O(n * k log k) where k = avg word length.