5. Sets

A set is an unordered collection of unique hashable values.

Example

nums = {1, 2, 3}

Important properties

unordered → no guaranteed position
unique → duplicates are removed automatically
mutable → you can add / remove items

Uniqueness

This is the defining feature.

nums = {1, 2, 2, 3}   # {1, 2, 3}

The duplicate 2 is automatically removed.

1.5.2 Why sets are powerful

The two biggest reasons:

Uniqueness

Automatic deduplication.

Fast membership

Average lookup complexity: O(1)

Membership checks

3 in nums

This is usually much faster than:

3 in [1, 2, 3]

because list membership is O(n).

A strong verbal answer should explicitly mention hashing.

1.5.3 How set lookup works conceptually

Sets use hashing, just like dictionaries.

Mental model

value -> hash -> bucket -> exists?

This is why lookup is usually constant time.

1.5.9 Common set operations

`add()`

nums = {1, 2}
nums.add(3)   # {1, 2, 3}

Adds a single element. Does nothing if already present (sets auto-deduplicate).

Complexity: O(1)

`remove()`

nums.remove(2)

Removes the element. Raises KeyError if missing.

`discard()`

Safer version — raises no error if missing.

nums.discard(5)  # No error even if 5 not in set

Complexity: O(1)

Interview context: Use discard() in production to avoid crashes.

`pop()`

Remove and return an arbitrary element.

item = nums.pop()  # Removes random element

Raises KeyError if set is empty.

Complexity: O(1)

Union: `|` (all elements from both)

a = {1, 2, 3}
b = {3, 4, 5}
a | b  # {1, 2, 3, 4, 5}
a.union(b)  # Same result

Intersection: `&` (only in both)

a & b  # {3}
a.intersection(b)  # Same result

Real-world example:

backend_skills = {"python", "sql", "docker"}
candidate_skills = {"python", "aws", "docker"}
common = backend_skills & candidate_skills  # {"python", "docker"}

Difference: `-` (in first but not second)

a - b  # {1, 2}
a.difference(b)  # Same result

Real example: required skills not in candidate's

required = {"python", "sql", "docker"}
candidate = {"python", "aws"}
missing = required - candidate  # {"sql", "docker"}

Symmetric Difference: `^` (in exactly one)

a ^ b  # {1, 2, 4, 5}

Elements in a or b, but not both.

1.5.10 Important interview patterns

Duplicate detection

len(nums) != len(set(nums))  # Has duplicates

Visited tracking (graphs/DFS)

visited = set()
visited.add(node)
if node in visited:  # O(1) check
    ...

Common elements

common = set(a) & set(b)  # O(n + m)

Much faster than nested loops O(n*m).

Unique count

unique = len(set(nums))

Set operations summary

Operation	Method	Symbol	Use Case
Union	`.union(b)`	`a \| b`	All unique elements
Intersection	`.intersection(b)`	`a & b`	Common elements
Difference	`.difference(b)`	`a - b`	In a, not in b
Sym Difference	`.symmetric_difference(b)`	`a ^ b`	In exactly one

When to use sets vs lists vs dicts

Use set when:

Order doesn't matter
You need O(1) membership
You need unique values
You're doing set operations (union, intersection)

Use list when:

Order matters
You need indexing [i]
You need duplicates
Items not hashable

Use dict when:

You need key-value mapping
Fast lookup by key
Order matters (Python 3.7+)
Graph adjacency lists

Verbal interview questions

Answer these out loud:

Why is set membership usually O(1)?
Why use a set over a list?
Explain union
Explain intersection
Why does converting list → set lose ordering?

Coding drills

Drill 1: duplicates

def has_duplicates(nums: list[int]) -> bool:
    ...

Drill 2: common elements

def common(a: list[int], b: list[int]) -> set[int]:
    ...

Use intersection.

Drill 3: preserve order dedupe

def dedupe_preserve_order(nums: list[int]) -> list[int]:
    ...

Must-answer interview questions

list vs tuple
set vs list for membership checks
dict lookup complexity
when ordering matters

Additional coding drills

word frequency counter
deduplicate list while preserving order
flatten nested lists
group objects by field

1.5.11 Common interview problems

Sets are heavily used in interview problems.

Problem 1: Contains Duplicate

Question: Given a list, return True if there are duplicates.

def containsDuplicate(nums: list[int]) -> bool:
    # nums = [1, 2, 3, 1] → True
    # nums = [1, 2, 3] → False
    ...

Key insight: If len(set) < len(list), there are duplicates.

return len(nums) != len(set(nums))

Or:

seen = set()
for num in nums:
    if num in seen:
        return True
    seen.add(num)
return False

Complexity: O(n) time, O(n) space.

Problem 2: Find First Duplicate

Question: Find the first duplicate in a list while preserving order.

def firstDuplicate(nums: list[int]) -> int:
    # nums = [1, 2, 3, 2, 4] → 2 (first duplicate found)
    # nums = [1, 2, 3] → -1 (no duplicate)
    ...

Key insight: Scan left-to-right, track seen values.

seen = set()
for num in nums:
    if num in seen:
        return num
    seen.add(num)
return -1

Complexity: O(n) time, O(n) space.

Problem 3: Intersection of Two Arrays

Question: Find common elements between two lists.

def intersection(nums1: list[int], nums2: list[int]) -> list[int]:
    # nums1 = [1, 2, 2, 1], nums2 = [2, 2]
    # Return [2]
    ...

Key insight: Convert to sets, use intersection operator.

return list(set(nums1) & set(nums2))

Complexity: O(n + m) time, O(min(n, m)) space.

Problem 4: Union of Two Sets

Question: Find all unique elements across two lists.

def union(nums1: list[int], nums2: list[int]) -> list[int]:
    # nums1 = [1, 2], nums2 = [2, 3]
    # Return [1, 2, 3]
    ...

Key insight: Use union operator.

return list(set(nums1) | set(nums2))

Complexity: O(n + m) time.

Problem 5: Happy Number (using sets)

Question: Detect if we enter a cycle when repeatedly summing squares of digits.

def isHappy(n: int) -> bool:
    # n = 7 → True (eventually reaches 1)
    # n = 2 → False (enters cycle)
    ...

Key insight: Track seen values in a set. If we see a value twice, we're in a cycle.

def get_next(num):
    total = 0
    while num > 0:
        digit = num % 10
        total += digit ** 2
        num //= 10
    return total

seen = set()
while n != 1 and n not in seen:
    seen.add(n)
    n = get_next(n)

return n == 1

Complexity: O(k) where k = iterations before reaching 1 or cycle.

Problem 6: Unique Email Addresses

Question: Count unique email addresses (ignoring rules about . and +).

def numUniqueEmails(emails: list[str]) -> int:
    # Rules: ignore dots before @, ignore everything after + (before @)
    # "test.email+tag@example.com" → "testemail@example.com"
    ...

Key insight: Normalize each email, add to set for automatic deduplication.

unique = set()
for email in emails:
    local, domain = email.split("@")
    local = local.split("+")[0].replace(".", "")  # Remove + part, remove dots
    unique.add(local + "@" + domain)
return len(unique)

Complexity: O(n * m) where m = avg email length.

Problem 7: Valid Sudoku

Question: Determine if a partially filled Sudoku board is valid (no duplicates in rows, columns, 3x3 boxes).

def isValidSudoku(board: list[list[str]]) -> bool:
    # board[i][j] is either a digit or "."
    ...

Key insight: Use sets to track numbers seen in rows, columns, and 3x3 boxes.

rows = [set() for _ in range(9)]
cols = [set() for _ in range(9)]
boxes = [set() for _ in range(9)]

for i in range(9):
    for j in range(9):
        num = board[i][j]
        if num == ".":
            continue
        
        box_idx = (i // 3) * 3 + (j // 3)
        
        if num in rows[i] or num in cols[j] or num in boxes[box_idx]:
            return False
        
        rows[i].add(num)
        cols[j].add(num)
        boxes[box_idx].add(num)

return True

Complexity: O(1) for 9x9 board (fixed size).

Problem 8: Longest Consecutive Sequence

Question: Find longest sequence of consecutive integers.

def longestConsecutive(nums: list[int]) -> int:
    # nums = [100, 4, 200, 1, 3, 2]
    # Return 4 ([1, 2, 3, 4])
    ...

Key insight: Use set for O(1) lookup. For each number, check if it starts a sequence.

num_set = set(nums)
longest = 0

for num in num_set:
    if num - 1 not in num_set:  # Start of sequence
        length = 1
        while num + length in num_set:
            length += 1
        longest = max(longest, length)

return longest

Complexity: O(n) amortized (each number visited at most twice).

Problem 9: Set Difference (Required vs Offered)

Question: Find skills required but not offered in a list.

def missingSkills(required: list[str], offered: list[str]) -> list[str]:
    # required = ["python", "sql", "docker"]
    # offered = ["python", "aws"]
    # Return ["sql", "docker"]
    ...

Key insight: Set difference.

return list(set(required) - set(offered))

Complexity: O(n + m) where n = required, m = offered.

Problem 10: Group Duplicate Words

Question: Group words that are anagrams of each other.

def groupAnagrams(words: list[str]) -> list[list[str]]:
    # words = ["eat", "tea", "ate", "bat", "tab", "cat"]
    # Return [["eat", "tea", "ate"], ["bat", "tab"], ["cat"]]
    ...

Key insight: Sorted string as key, use dict to group.

groups = {}
for word in words:
    key = ''.join(sorted(word))
    groups.setdefault(key, []).append(word)
return list(groups.values())

or with sets for unique tracking:

from collections import defaultdict
groups = defaultdict(set)
for word in words:
    key = ''.join(sorted(word))
    groups[key].add(word)
return [list(group) for group in groups.values()]

Complexity: O(n * k log k) where k = avg word length.

Sets

5. Sets

Example

Important properties

Uniqueness

1.5.2 Why sets are powerful

Uniqueness

Fast membership

Membership checks

1.5.3 How set lookup works conceptually

Mental model

1.5.9 Common set operations

add()

remove()

discard()

pop()

Union: | (all elements from both)

Intersection: & (only in both)

Difference: - (in first but not second)

Symmetric Difference: ^ (in exactly one)

1.5.10 Important interview patterns

Duplicate detection

Visited tracking (graphs/DFS)

Common elements

Unique count

Set operations summary

When to use sets vs lists vs dicts

Verbal interview questions

Coding drills

Drill 1: duplicates

Drill 2: common elements

Drill 3: preserve order dedupe

Must-answer interview questions

Additional coding drills

1.5.11 Common interview problems

Problem 1: Contains Duplicate

Problem 2: Find First Duplicate

Problem 3: Intersection of Two Arrays

Problem 4: Union of Two Sets

Problem 5: Happy Number (using sets)

Problem 6: Unique Email Addresses

Problem 7: Valid Sudoku

Problem 8: Longest Consecutive Sequence

Problem 9: Set Difference (Required vs Offered)

Problem 10: Group Duplicate Words

`add()`

`remove()`

`discard()`

`pop()`

Union: `|` (all elements from both)

Intersection: `&` (only in both)

Difference: `-` (in first but not second)

Symmetric Difference: `^` (in exactly one)