Data Structures and Algorithms Crash Course
Data Structures
Computer Components
- CPU: executes instructions, does calculations (in billions/sec)
- Control unit: fetches instructions from memory and decodes what they mean
- ALU (Arithmetic Logic Unit): does the calculations through logic gates
- Registers: tiny storage spots (few bytes), hold numbers while they are processed
- RAM: temp storage, holds running programs
- Storage (Disk): permanent storage, large but slower than RAM, program loads from storage to RAM and executed in CPU
- This is the “Von Neumann” architecture
- Store both program instructions and data in same memory (RAM)
- Have CPU fetch and execute instructions sequentially
- CPU is ~100x faster than RAM, ~10,000,000x faster than disk
- The CPU itself holds only about 64 bytes of data in its registers
- Data Structures live in RAM
References vs Primitives
- Values like integers and booleans are simple, so they are stored directly by the variable.
- Values like lists, or custom objects are stored via a reference. The variable holds a reference to the value.
- Reference is a way to find where the data lives in memory.
- When you copy a reference and change the value through it, the change is visible through both variables, since they reference the same object (see the sketch below).
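A minimal C++ sketch of the difference (C++ copies objects by default, so a raw pointer stands in for the reference here):
#include <iostream>
#include <vector>

int main() {
    int a = 5;
    int b = a;                   // primitive: b gets its own copy of the value
    b = 10;
    std::cout << a << std::endl; // 5, unchanged

    std::vector<int> v = {1, 2, 3};
    std::vector<int>* p = &v;    // p holds a reference (an address) to v, not a copy
    p->push_back(4);             // mutate through the reference...
    std::cout << v.size() << std::endl; // 4, ...and v sees the change
}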
Memory Size of Types
- Byte: 8 bits (8 ones or zeros)
- Ex:
- Bool: 1 byte, Integer: 4 bytes, Double: 8 bytes
- Why size matters for arrays?
- When initializing, the array needs to know how much space to allocate, and different types have different needs (an array of 5 ints vs an array of 5 bools)
- Address formula: base_address + (index x element_size)
- ex: 1000 + (2 × 4 bytes) = 1008
Memory Alignment
- Small gaps are left in memory between variables to keep each one aligned to a boundary that is a multiple of its element size. This allows for efficient reading by the CPU.
Strings are special
- can be variable in length
- Usually stored as 3 components:
- a pointer (8 bytes) that references the actual character data
- the character data itself, stored elsewhere in memory
- length info to track how many characters exist
Stack and Heap
- Computers organize memory of a program in two main regions:
Stack
- stack: fast, organized region for temp data that follows strict rules (LIFO)
- only add or remove from top
- when function is called a new “frame” is added to the stack containing:
- function params
- local variables
- return address (where to go back after the function finishes)
- when the function returns, its frame is removed from the stack
- extremely fast
- stack is only a few MB
- the stack stores primitive values like int and bool directly; complex types like objects and arrays are stored elsewhere, with only their references living on the stack.
Heap
- heap: flexible region for data that needs to live longer or grow dynamically
- “large warehouse”
- store items of any size for as long as you need
- objects and arrays that grow dynamically stay on the heap
- slower than stack access, still fast
- can be GBs in size
- large data structures live here
- Heap memory requires explicit management
- C++ needs manual allocation and freeing of heap memory
- Python uses garbage collection (automatic)
- Stack references point to heap objects
- the reference (pointer) lives on the stack, the actual data lives on the heap (see the sketch below)
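A small C++ sketch of where things live; the pointer sits on the stack, the block it points to sits on the heap:
#include <iostream>

int main() {
    int x = 42;                // primitive: lives in main's stack frame
    int* arr = new int[1000];  // the 1000 ints live on the heap
    // the pointer `arr` itself lives on the stack and references the heap block
    arr[0] = 7;
    std::cout << x << " " << arr[0] << std::endl;
    delete[] arr;              // C++: heap memory must be freed manually
    return 0;                  // x is reclaimed automatically when the frame pops
}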
How it helps in coding?
- Too many nested function calls or large local arrays can cause stack overflow errors.
- Recursive functions that never terminate are a common cause of this problem
- Stack operations: fast because allocation and deallocation simply move a pointer
- Heap operations: must search for available space and track allocations, adding overhead
- When performance matters, keeping frequently accessed data on the stack helps.
- Memory leaks happen on the heap when allocated objects are never freed.
References vs Values (Mutation problem)
- reference types share data
- when you assign one variable to another, you copy the reference, not the entire object. Both variables now point to the same object in memory.
- if you change the value via one variable, the other changes too, as they refer to the same object.
- common source of bugs
- happens when passing objects to functions
- function receives a copy of the reference (pointing to the same object)
- function mutates the object, the caller sees the change
- “pass by reference” behavior
- If you do not want the passed value to change, “copy”
- a true independent copy, without affecting the original
- deep copies duplicate the object itself, not just the reference.
- deep copies make sure even nested objects are truly independent copies
- In C++, can pass by value (copy), by reference (with &), or by a pointer.
- “reference is an immutable pointer which cannot be null”
- References are much faster than copying the entire array
- but be mindful of mutations (see the sketch below)
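A short C++ sketch contrasting the two behaviors (function names are just for illustration):
#include <iostream>
#include <vector>

// pass by value: the function works on a private copy, the caller is unaffected
void byValue(std::vector<int> v) { v.push_back(99); }

// pass by reference (&): the function mutates the caller's object
void byReference(std::vector<int>& v) { v.push_back(99); }

int main() {
    std::vector<int> data = {1, 2, 3};
    byValue(data);
    std::cout << data.size() << std::endl; // 3: only the copy changed
    byReference(data);
    std::cout << data.size() << std::endl; // 4: the caller sees the mutation
}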
Why Data Structures?
- different way of organizing data makes search and insertion operations faster
- how you organize data determines how fast you can work with it
- linear search on an unsorted array (O(n), up to 1000 checks for n = 1000 in the worst case) vs binary search on a sorted array (O(log n), about 10 checks worst case).
- O(log(n)): want this as much as possible as growth is very slow even when n explodes.
- By imposing structure on data, you can enable a faster algorithm.
- Arrays are good for access, linked lists are good for insertion, hash tables are good for membership checks, stacks and queues enforce specific orderings.
Matching structure to problem
At scale, performance determines if software is usable
- Array: fast index access O(1); slow middle insert O(n)
- Hash Table: fast lookup O(1); fast insert/delete O(1)
- Linked List: Fast head insert O(1), slow access O(n)
- Stack: LIFO order, used for undo, backtracking
- Queue: FIFO order, used for scheduling and BFS
- Trees: fast sorted operations O(log n); hierarchical data, DFS
Algorithmic Complexity
Counting operations
- Operations: assignment, reading, arithmetic, comparisons, array access, function calls
- each of these counts as one operation
- Loops multiply Operations
- Nested loops multiply operations across all levels
- Counting predicts performance before running code.
- Counting operations as a function of input size is what matters most
Constant vs Linear Time
- Constant: operations stay same despite input size
- O(1)
- Ex: array access always takes 1 operation
- address = base_address + (index × element_size)
- Other ex: variable assignment, array update at an index, simple arithmetic, comparison, getting vector size, vector push_back
- Ideal
- Linear: Operations grow proportionally with input size
- O(n)
- Ex: searching an unsorted array (linear search), summing all elements, finding the max, counting occurrences
- Acceptable for most tasks
- Recognizing Constant Ops: No loops, recursion. Direct access.
- Recognizing Linear Ops: Single loop through array/list, visiting every element once, searching unsorted data, summing/counting or processing all elements.
- Constant factors don’t change the category; two separate for loops are still O(n)
- We care about growth rate, not exact counts.
Quadratic Time
- O(n^2)
- In presence of nested loops:
- Ex: comparing all pairs in an array (for distance, collision, other prop); generating multiplication table
- Gets slow quickly; okay for very small input sizes, fails catastrophically at large input sizes.
Avoiding Quadratic Time
- Use better data structures: hash tables for O(1) lookup instead of O(n) search. For dupe detection, use a set instead of nested loops (see the sketch after this list).
- Sort first, then use an efficient algorithm
- Sorting is O(n log n), which is much better than O(n^2)
- After sorting, many problems become O(n)
- Use two-pointer techniques
- Sometimes, two pointers moving through a sorted array can replace nested loops
- Look for math patterns
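A sketch of the set-based dupe detection replacing the nested loops:
#include <iostream>
#include <unordered_set>
#include <vector>

// O(n^2): compare every pair with nested loops
bool hasDuplicateSlow(const std::vector<int>& a) {
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = i + 1; j < a.size(); j++)
            if (a[i] == a[j]) return true;
    return false;
}

// O(n) on average: one pass, with O(1) membership checks in a hash set
bool hasDuplicateFast(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    for (int x : a) {
        if (seen.count(x)) return true; // seen before: duplicate
        seen.insert(x);
    }
    return false;
}

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    std::cout << hasDuplicateSlow(v) << " " << hasDuplicateFast(v) << std::endl; // 1 1
}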
Best, worst, and average case
- Best case: luckiest input, min possible operations.
- For linear search, first position is the target O(1)
- Worst case: max possible operations
- Average Case: expected performance
- Linear search: (n+1)/2 ≈ n/2 operations on average (half of the worst case); still linear growth O(n)
- Focus on worst case:
- Guarantees matter
- Often common
- Average case can be hard to define: real world data distribution are rarely uniform
- Best case is usually trivial
- Ex: finding minimum has same O(n) for best, avg, and worst.
- Have to go through all elements
Space Complexity:
- Time complexity (above): how many operations an algorithm performs
- Algorithms also take memory
- Count how much memory an algorithm needs relative to its input size
- O(1): constant space; count, sum
- “in-place” algorithms: modify existing data structures rather than creating new ones
- O(n): linear space; new array of size n
- Auxiliary arrays; creating new data structures proportional to input size; needed to store intermediate results or to build output without modifying the input.
- O(n^2): quadratic space; matrix n x n
- matrices; common in problems involving grids, graphs, or pairwise comparisons.
- Space complexity: how much additional memory an algorithm needs beyond its input
Space-Time trade offs
- Can trade memory for speed, or vice versa
- Minimal space, slower
- store only last few values
- More space, faster lookups
- store all computed vals in array
- Depends on constraints: limited memory vs fast repeated access (see the sketch below)
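A small illustration of the trade-off, using Fibonacci numbers as an assumed example: recompute with O(1) space, or precompute a table once for O(1) repeated lookups:
#include <iostream>
#include <vector>

// minimal space: O(1) extra memory, keep only the last two values
long long fibLean(int n) {
    long long prev = 0, curr = 1; // fib(0), fib(1)
    for (int i = 0; i < n; i++) {
        long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return prev; // after n steps, prev == fib(n)
}

int main() {
    // more space, faster lookups: O(n) memory, then every query is O(1)
    int n = 50;
    std::vector<long long> table(n + 1);
    table[0] = 0; table[1] = 1;
    for (int i = 2; i <= n; i++) table[i] = table[i - 1] + table[i - 2];
    std::cout << fibLean(40) << " " << table[40] << std::endl; // both print 102334155
}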
When space matters:
- Large datasets
- Embedded systems: limited RAM
- Server apps
- Mobile apps
Arrays:
- Collection of elements stored in contiguous memory locations; meaning they sit side by side in the computer’s memory
- Elements stored sequentially (4 bytes apart for 32-bit ints)
- Zero-based indexing: first element at index 0
- Address calculation: address = base + (index x element_size)
- O(1) direct access, key insight
- the CPU cache loads nearby data, making sequential access fast
- Reading and updating are O(1) constant time operations.
Iteration
- Linear search (linear iteration, for loop)
- start at i = 0, check each element and iterate until found
- return -1 if not found
- Visit each element one by one
- O(n) linear time operation
- Tasks: summing, searching
- Array isn’t sorted, so you can’t skip elements (see the sketch below)
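A minimal linear search matching the steps above:
#include <vector>

// visit each element once: O(n)
int linearSearch(const std::vector<int>& arr, int target) {
    for (int i = 0; i < (int)arr.size(); i++) {
        if (arr[i] == target) return i; // found: return its index
    }
    return -1; // not found
}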
Adding/Removing elements at the end
- Adding to the end (append/push) is an O(1) operation
- nothing to move, update length
- Common pattern is to start with empty array and build it by appending
- Each append is O(1)
- no need to specify size
- total time for n appends, O(n)
- Removing from end: called pop
- O(1), no element to shift
Adding/Removing from Middle
- Much slower: elements must shift
- Middle operations O(n):
- Insert/Delete: must shift elements right (insert), left (delete)
- Insert (shift right):
- start from the end and work backwards
- shift each element one position to the right
- continue until index i, then place the new element at i
- update the array’s length
- n - i shifts total
- Delete (shift left):
- remove element at i
- shift each element after i one position to the left
- continue until end
- update array’s length
- n - i - 1 shifts total
- Alternative for deletions:
- swap with the last element, then pop: O(1), but does not preserve order
- if frequent inserts/deletes, use a linked list
- if order must be preserved, work at the end of the array instead (see the sketch below)
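A sketch of the middle-operation costs with std::vector, plus the swap-and-pop trick for when order doesn’t matter:
#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<int> a = {10, 20, 30, 40};

    a.insert(a.begin() + 1, 15); // O(n): 20, 30, 40 all shift right -> {10, 15, 20, 30, 40}
    a.erase(a.begin() + 2);      // O(n): everything after index 2 shifts left -> {10, 15, 30, 40}

    // swap-and-pop delete at index 1: O(1), but order is NOT preserved
    std::swap(a[1], a.back());
    a.pop_back();                // {10, 40, 30}

    for (int x : a) std::cout << x << " "; // 10 40 30
}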
Dynamic Arrays
- Traditional arrays need a specified size; if that size is reached, you must allocate a new array and copy everything over manually
- Modern languages have dynamic arrays:
- List in python
- Vector in C++
- A dynamic array maintains two separate values: size (current number of elements) and capacity (number of slots allocated in memory).
- An empty dynamic array will allocate a small capacity (e.g., 4 elements)
- When capacity is reached, it allocates a memory block with double the capacity and copies everything over.
- Doubling capacity each time
- 4->8->16->32->64
- Why doubling each time?
- Amortized
- Copying takes O(n) time
- If it resized on every append, n appends would require O(n^2) operations.
- With doubling, resizes happen only at powers of 2.
- For n appends, total copy work is O(n) and total append work is O(n), giving O(n) total time for n operations.
- Divide by n and you get O(1) per operation on average.
- This is called amortized O(1) time.
- While a single append might trigger an expensive O(n) resize, most appends take O(1). Averaged out, each append costs O(1).
- Memory tradeoff: uses more memory than strictly necessary. Right after a resize, the array is only 50% full.
- Acceptable for most apps; switch to fixed-size arrays if memory efficiency matters (see the demo below).
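A quick way to watch size vs capacity in action with std::vector (the growth factor is implementation-defined, commonly 1.5x or 2x):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    size_t lastCap = v.capacity();
    for (int i = 0; i < 100; i++) {
        v.push_back(i);
        if (v.capacity() != lastCap) { // a resize (reallocate + copy) just happened
            lastCap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << lastCap << std::endl;
        }
    }
}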
Strings
- Array of chars
- Strings live in the heap (like arrays)
- When the variable is created, it holds a reference pointing to the actual character data stored in heap memory.
Indexing
- Zero Indexing
- Works just like array indexing
Iteration
- used for counting chars, validating formats, transforming, searching for patterns.
- same as array iteration
Concatenation
- join multiple strings into one
- `+` operator to concatenate
- or use append()
- where strings are immutable, concatenating creates a new string object each time.
- in C++, .reserve(num_chars) avoids reallocation when appending.
Comparison
- equality test: same chars in same sequence
- lexicographical comparison (position by position)
String Methods
- .find(“phrase”);
- .substr(pos, len);
- .replace(pos, len, str);
- .length(); .size(); .empty();
Immutability
- strings are immutable: cannot be changed after they are created.
- modification creates a new string instead
- Python enforces string immutability
- C++ allows mutable strings. std::string supports direct char assignment
- make copy if you want to preserve
Character Counting
- Uses hash maps (or arrays) to track how many times each char appears in a string.
- Iterate through string once, update counts as you go
- Create a map from char to frequency
- Use unordered maps
- Used in anagram detection (compares two freq maps for equality)
- Finding dupes if any count exceeds one
- Character validation verifies required chars are present
- Pattern matching
- Transforms strings into structured data for analysis
- iterate once through string
- O(n) time to build a map
- O(1) lookups for each char frequency
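A sketch of character counting plus the anagram check described above:
#include <iostream>
#include <string>
#include <unordered_map>

// build a char -> frequency map in one O(n) pass
std::unordered_map<char, int> charCounts(const std::string& s) {
    std::unordered_map<char, int> counts;
    for (char c : s) counts[c]++; // operator[] default-initializes counts to 0
    return counts;
}

// anagram check: two words are anagrams iff their frequency maps are equal
bool isAnagram(const std::string& a, const std::string& b) {
    return a.size() == b.size() && charCounts(a) == charCounts(b);
}

int main() {
    std::cout << isAnagram("listen", "silent") << std::endl; // 1 (true)
    std::cout << charCounts("hello")['l'] << std::endl;      // 2
}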
Linked List
- To insert into an array at a certain position, you need to move everything to the right of it by 1. Up to n operations
- Linked Lists scatter elements throughout memory and connect them through pointers
- No contiguous blocks needed
- Easy to insert O(1)
- Linked List is made of:
- Nodes: contains the data and the pointer to next node
- Must follow the chain to iterate, cannot jump
- Starts with a head pointer (points to the first node); each node points to the next node in sequence.
- Last node points to nothing (represented as null pointer), marks end of the list.
- Inserting: create the new node, point it at the node that should follow it, then change the previous node’s pointer to point at the new node. Deletion works similarly. O(1)
- Accessing: a linked list requires traversal to reach an index, unlike an array. O(n) access complexity for a linked list.
- Arrays optimize for access speed, linked lists optimize for modification flexibility.
Nodes and Pointers
- Node is a simple struct holding data and a pointer to the next node
#include <iostream>
using namespace std;

struct Node {
int data;
Node* next;
Node(int data) : data(data), next(nullptr) {}
};
// Create a node with data = 5
Node* node = new Node(5);
cout << node->data << endl; // 5
cout << node->next << endl; // 0 (nullptr)
// Create two nodes
Node* first = new Node(10);
Node* second = new Node(20);
// Link first to second
first->next = second;
// Now: first -> second -> nullptr
cout << first->data << endl; // 10
cout << first->next->data << endl; // 20
cout << second->next << endl; // 0 (nullptr)
Traverse O(N)
void traverse(Node* head) {
Node* current = head;
while (current != nullptr) {
cout << current->data << endl;
current = current->next;
}
}
- Binary search does not work on linked lists (no O(1) access by index)
Insert
- At head: easy, O(1). At tail: have to traverse the entire list if there is no tail pointer.
- With a tail pointer, O(1)
- Implementations like queues maintain a tail pointer.
Deletion
- Skip over the unwanted node to disconnect it from the chain: change the previous node’s pointer to point at the next node.
- C++ requires explicit memory deallocation (delete the removed node)
- Edge cases matter: empty list (nothing to delete); single node (return null); target not found (return the original head); multiple occurrences (delete only the first, or remove the early return to delete all).
- O(n) for delete by val, search dominates complexity
- O(1) delete at head
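A delete-by-value sketch using the Node struct from above, covering the edge cases listed:
// delete the first node whose data == target; returns the (possibly new) head
Node* deleteValue(Node* head, int target) {
    if (head == nullptr) return nullptr;     // edge case: empty list
    if (head->data == target) {              // edge case: target at head, O(1)
        Node* newHead = head->next;
        delete head;                         // C++: free the node explicitly
        return newHead;
    }
    Node* current = head;
    while (current->next != nullptr) {       // O(n): the search dominates
        if (current->next->data == target) {
            Node* doomed = current->next;
            current->next = doomed->next;    // unlink: skip over the node
            delete doomed;
            return head;                     // first occurrence only
        }
        current = current->next;
    }
    return head;                             // target not found: list unchanged
}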
Linked List vs Array
- Arrays for jumping around random positions (binary search, matrix ops)
- Insert at begin: linked list O(1)
- Insert at end: arrays with spare capacity win, O(1) for a dynamic array; linked lists need a tail pointer, else O(n)
- Insert in middle: both O(n); traversal kills the linked list, even though the insertion itself is O(1).
- Deletion (mirrors insertion)
- Both are essentially O(n) for value-based deletion
- Search: O(n) for unsorted data; arrays can be sorted for O(log n) binary search, sorted linked lists are still O(n)
- Memory usage: arrays are contiguous (faster access thanks to CPU caching); linked lists scatter memory (extra pointer per node, slower access)
- sequential traversal is faster on arrays (even though both are O(n)) due to CPU caching and pointer-chasing overhead
- Linked list good for complex structures: trees, graphs, node based designs
Queues
FIFO Model
- First In, First Out
- Queue: process items in the order they arrive
- Rules of a queue:
- Items come out the way they went in
- Can only add to the back
- Can only remove from the front
- Can look at the front without removing it
- Can’t access items in the middle, can’t remove from the back
- If you want the third item, must remove the first two.
- FIFO maintains event ordering: if A needs to happen before B, queue ensures A completes before B starts.
- Can only interact with the front and the back
Queue Operations
- Enqueue: add at the back of the queue
#include <queue>
#include <string>
std::queue<std::string> q;
q.push("Task 1"); // Enqueue first task
q.push("Task 2"); // Enqueue second task
q.push("Task 3"); // Enqueue third task
// Queue: [Task 1, Task 2, Task 3]
Dequeue: remove (and return) from the front
// Assume queue has Task 1, Task 2, Task 3
std::string first = q.front(); // Get front
q.pop(); // Remove it
std::cout << first << std::endl; // "Task 1"
// Queue now: [Task 2, Task 3]
std::string nextTask = q.front();
q.pop();
std::cout << nextTask << std::endl; // "Task 2"
// Queue now: [Task 3]
Peek: look at the front without removing it
// Assume queue has First, Second, Third
if (!q.empty()) {
std::string frontItem = q.front();
std::cout << frontItem << std::endl; // "First"
}
// Queue unchanged
Size and empty check
// Assume queue has A, B, C
int size = q.size();
std::cout << "Queue has " << size << " items" << std::endl;
bool isEmpty = q.empty();
std::cout << isEmpty << std::endl; // false
Queue w/ Array
- Build a queue with two index variables: one front, one back
- Enqueue: place at the back and increment back index
- Dequeue: remove item from front index and increment it
- Space on the left becomes “wasted”; you can’t reuse it.
- Both indices only move right
- Solutions for the wasted space when the array fills up:
- Circular buffer (wrap around); see the sketch below
- Resize the array: O(n) copying time.
- Use a linked list instead
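A minimal circular-buffer queue sketch (fixed capacity; a fuller implementation would resize instead of throwing):
#include <stdexcept>
#include <vector>

// front and back indices wrap around with %, so slots freed by dequeue get reused
class CircularQueue {
    std::vector<int> data;
    size_t front = 0, count = 0;
public:
    explicit CircularQueue(size_t capacity) : data(capacity) {}
    void enqueue(int x) {
        if (count == data.size()) throw std::runtime_error("queue full");
        data[(front + count) % data.size()] = x; // back index, wrapped
        count++;
    }
    int dequeue() {
        if (count == 0) throw std::runtime_error("queue empty");
        int x = data[front];
        front = (front + 1) % data.size();       // advance front, wrapped
        count--;
        return x;
    }
    bool empty() const { return count == 0; }
};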
Queue w/ Linked List
- Much better choice
- No wasted space problem
- Dequeue from head pointer
- Enqueue from tail pointer
- Dynamic size is a perk, but each node needs extra memory for its next pointer.
Real World Apps
- Task scheduling: used in operating systems; when multiple programs need CPU time, they join a queue. Tasks arriving first get processed first.
- Print Queue
- Request handling: when requests arrive faster than a web server can handle, it queues them for processing.
- The queue absorbs bursts of traffic.
- If traffic arrives faster than workers process it, the queue grows.
- When traffic slows, workers drain the queue.
- this buffering prevents request loss during spikes.
- Message Queues:
- Service A enqueues a message, service B dequeues and processes it.
- This decouples the services; A doesn’t wait for B to finish (Pub/Sub)
- If service B crashes, messages remain safe in the queue, and service A continues producing.
- B restarts and continues processing
- When to use queues:
- Fair Ordering: First come first served
- Buffering: absorb bursts of work items
- Decoupling: producer and consumer run at diff speeds
- Reliability: store work items safely until processed
Hashmaps
- Hash Table: stores data as key-value pairs
- Hash function: transforms a key into a number; this number tells you where to find the value in an array. Basically, returns an array index.
- Ex: “Alice” hashes to 42, so the value for the “Alice” key lives at array index 42.
- Hash codes are large numbers
- Modulo operation converts hash code to array index
- index = hash_code % array_capacity
- Direct Access:
- Hash the key to get a number
- Convert the number to an array index with mod op
- Look up value at that index.
- No matter how many items you store, lookup is constant time O(1)
- hash + array access
- much better than linear search O(n)
- How to handle collisions?
- If two keys produce the same index (called a collision)
- On average still O(1) performance
Hash Functions
- Convert keys to indices
- Good hash function must distribute keys uniformly among available slots (to prevent excessive collisions)
- Function must be deterministic
- Should be fast to compute
- Must work with whatever keys you need
- Polynomial rolling hash (for strings)
- hash = (s[0] × p^(n-1) + s[1] × p^(n-2) + … + s[n-1] × p^0) mod table_size
- hash = (‘a’ × 31² + ‘b’ × 31¹ + ‘c’ × 31⁰) mod table_size
- p here is a prime number
- Distributes strings well and keeps “abc” and “cba” from colliding
- Integer Hashing:
- hash = (key × 2654435761) % table_size.
- Why modulo?
- makes sure the result fits your table’s bounds
- Collisions are going to happen
- hash functions map infinitely many possible keys to finitely many slots
- a good hash function minimizes collisions through even distribution
// hash func
std::hash<std::string> str_hash;
size_t hval = str_hash("Alice");
// unordered map uses hash func automatically
std::unordered_map<std::string, int> map;
map["Alice"] = 30;
Collision Handling
- Two different keys hash to the same index
- need to store multiple entries that hash to the same location
Chaining
- At each slot, store a linked list
- When collision occurs, add a new entry to the list
- When searching for a key, hash it to find its slot, then search through the short list at that slot
- still very fast O(1) on average.
Hash to Array Index
index = hash_code % capacity
- capacity affects distribution
- when resizing a hash table, all indices change, so every entry must be rehashed
Complete lookup process
- Hash the key: hash_code = hash(“Alice”)
- Convert to index: index = hash_code % capacity
- Access array: value = array[index]
Common Ops:
- Insert O(1): add a key-value pair
- Lookup O(1): retrieve the value for a key
- Delete O(1): remove a key and its value
- Fast data storage and retrieval
#include <unordered_map>
#include <string>
#include <iostream>
// Create empty hash table
std::unordered_map<std::string, std::string> phoneBook;
// Insert key-value pairs
phoneBook["Alice"] = "555-1234";
phoneBook["Bob"] = "555-5678";
phoneBook["Carol"] = "555-9012";
// Lookup by key
std::cout << phoneBook["Alice"] << std::endl; // "555-1234"
// Check if key exists
if (phoneBook.find("Bob") != phoneBook.end()) {
std::cout << "Found Bob" << std::endl;
}
// Delete a key
phoneBook.erase("Carol");
Iteration: does not guarantee order (follows the internal array layout)
std::unordered_map<std::string, std::string> phoneBook = {
{"Alice", "555-1234"},
{"Bob", "555-5678"}
};
// Iterate over key-value pairs
for (const auto& [name, number] : phoneBook) {
std::cout << name << ": " << number << std::endl;
}
Sets - Uniqueness
- Set: collection that stores only unique elements, no dupes
- Can’t add Alice twice
- Sets are hash tables without values, just keys
- Ops: Union, Intersection, Difference
#include <unordered_set>
std::unordered_set<int> numbers;
numbers.insert(1);
numbers.erase(2);
numbers.find(2);
#include <unordered_set>
#include <algorithm>
std::unordered_set<int> a = {1, 2, 3};
std::unordered_set<int> b = {2, 3, 4};
// Union
std::unordered_set<int> u = a;
u.insert(b.begin(), b.end()); // {1, 2, 3, 4}
// Intersection (manual)
std::unordered_set<int> intersect;
for (int x : a) {
if (b.find(x) != b.end()) {
intersect.insert(x); // {2, 3}
}
}
// Difference (manual)
std::unordered_set<int> diff;
for (int x : a) {
if (b.find(x) == b.end()) {
diff.insert(x); // {1}
}
}
- Use: fast membership testing O(1), automatic deduplication, set operations, counting unique elements (set size = number of distinct items)
Stack (LIFO Model)
- Last In, First Out
- Ex: browser back button
- Stack Ops:
- push (add element to top of stack); O(1)
- pop: remove element from top of stack; O(1)
- peek: look at top element without removing
- isEmpty: check if the stack has any elements
- Cannot pop an empty stack (stack underflow)
#include <stack>
// Using C++ stack
std::stack<int> stack;
// Push elements
stack.push(10);
stack.push(20);
stack.push(30);
// Stack: [10, 20, 30] (30 is on top)
// Pop element
int top = stack.top(); // Get top value (30)
stack.pop(); // Remove it
// Stack: [10, 20]
// Peek at top
if (!stack.empty()) {
int topElement = stack.top(); // Returns 20
}
stack.size();
Stack Apps
- Function Call Stack: each call creates a stack frame containing the function’s local variables and the return address
- the frame goes onto the call stack
- when the function finishes, its frame pops off the stack and execution returns to the caller
Call sequence:
A starts: Stack = [A]
A calls B: Stack = [A, B]
B calls C: Stack = [A, B, C]
C finishes: Stack = [A, B]
B finishes: Stack = [A]
A finishes: Stack = []
- Basis for recursion: if recursion goes too deep, the call stack runs out of space and you get a stack overflow.
- Undo mechanism
- Browser navigation (Back stack and Forward Stack)
- Matching parentheses (check for balance): check if brackets are balanced (see the sketch after this list)
- Expression evaluation: calculators use stacks to evaluate math expressions
Postfix expression evaluation example: 3 4 + 2 *
Push 3: stack = [3]
Push 4: stack = [3, 4]
See +: pop 4 and 3, compute 3 + 4 = 7, push 7: stack = [7]
Push 2: stack = [7, 2]
See *: pop 2 and 7, compute 7 * 2 = 14, push 14: stack = [14]
Result: 14
- DFS: graph traversal uses a stack to remember which nodes to visit next. The algorithm goes as deep as possible before backtracking
- Backtracking: puzzle and maze solvers use a stack to try a path and backtrack when stuck
- Syntax parsing: compilers use stack to parse code structure and check syntax.
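A balanced-brackets checker as sketched in the matching-parentheses bullet above:
#include <stack>
#include <string>

// push openers; on each closer, pop and check that it matches
bool isBalanced(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (st.empty()) return false; // closer with no opener
            char open = st.top();
            st.pop();
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{')) return false; // mismatched pair
        }
    }
    return st.empty(); // leftover openers mean unbalanced
}
// isBalanced("{[()]}") == true, isBalanced("(]") == false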
Imp Stack w/ Array
- End of array = top of stack
- Track “top” index
- Push = append to end
- Since we add at the end, there is no shifting: O(1)
- Pop = remove from end; no shift, O(1)
#include <vector>
#include <stdexcept>
template<typename T>
class Stack {
private:
std::vector<T> items;
public:
void push(const T& item) {
items.push_back(item); // O(1) amortized
}
T pop() {
if (isEmpty()) {
throw std::runtime_error("Pop from empty stack");
}
T top = items.back();
items.pop_back(); // O(1)
return top;
}
T peek() const {
if (isEmpty()) {
throw std::runtime_error("Peek from empty stack");
}
return items.back();
}
bool isEmpty() const {
return items.empty();
}
size_t size() const {
return items.size();
}
};
- Cache friendly (contiguous storage)
- Occasional resize is O(n), amortized O(1)
- the standard choice for stack implementations
Stack vs Queue (LIFO vs FIFO)
- Linear data structures; store elements sequentially
- Queue preserves order, stack reverses it.
- All ops are O(1)
- Both can be implemented with arrays or linked lists (arrays more common)
- Stack with array: O(1) add/remove
- Queue with array: add at end, remove from front; needs a circular buffer or linked list for O(1)
Sorting
- Sorting allows for faster search techniques (ex: binary O(log n))
- Sorted data enables optimization
- Algorithms become simpler and faster
- Finding dupes in unsorted array requires comparing every element with every other element
- O(n^2) comparisons
- In a sorted array, dupes sit next to each other: O(n) time.
- Merging two lists
- Finding the closest pair of numbers
- Sort once in O(n log n), scan for the min difference in O(n)
- Much better than the original O(n^2)-or-worse brute force method
When to Sort First
- Looking for dupes/patterns: sort groups similar element together, dupes become adjacent and patterns become visible.
- W/o sorting: O(n^2)
- W/ sorting: O(n log n) plus one O(n) pass through the array to find dupes
- Multiple searches on the same data: sort once; then use binary search repeatedly. Sorting costs spreads across all searches.
- Sort first, then solve with a simple scan.
- Pay O(n log n) to enable an O(n) solution, instead of O(n^2) or worse.
- Comparing or merging collections: sorted data merges cleanly.
- Finding closest, or farthest pairs.
- Sorted data enables two-pointer technique.
- Enables one pointer at start and one at end, moving them toward each other based on comparisons.
- Two-sum problem (find a pair that sums to a target): unsorted means checking every pair, O(n^2); after sorting, two pointers give an O(n) solution.
- Detecting anagrams:
- sort the chars in both words and compare.
- simpler than counting frequencies and comparing
- Interval Problems
- time ranges, scheduling conflicts, overlapping events; these usually require sorting by start time.
- Trade-off analysis: sorting takes O(n log n) with the built-in sort; many problems then need only a single O(n) scan to solve. Much better than O(n^2) on unsorted data
Problem Solve Patterns
- Recognize signals to sort first:
- dupes, unique elements
- find pairs with a specific sum/difference (two pointers)
- work with intervals, ranges
- comparing sets of elements for matches
- detecting anagrams or char patterns
When not to Sort
- If you only search once, sorting might cost more than it saves.
- Arrives already sorted
- Need to preserve original order (like a queue)
- Priority Queues (Example)
- Network Routing (Example): routers maintain sorted routing tables; when a packet arrives, the router uses binary search to find the best route in O(log n) time instead of checking every route.
Selection Sort
- Builds a sorted array by repeatedly finding the smallest element in the unsorted region and placing it at the front (of the unsorted section)
- Each pass swaps one min at the front
- O(n^2) running time
- Very few swaps compared to the number of comparisons
- n - 1 swaps total for an array of size n
- When to use?
- when moving elements is expensive
- small datasets
- for most sorting needs, use built-in sort funcs
- Performs O(n^2) whether the data is partially sorted or fully unsorted; it cannot take advantage of existing order
void selectionSort(std::vector<int>& arr) {
int n = arr.size();
// Outer loop moves the boundary
for (int i = 0; i < n - 1; i++) {
// Track index of minimum in unsorted portion
int minIndex = i;
// Inner loop scans unsorted portion
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[minIndex]) {
minIndex = j;
}
}
// Swap minimum to boundary position
std::swap(arr[i], arr[minIndex]);
}
}
Insertion Sort
- Sorting a hand of cards:
- hold some cards in your left hand, already sorted
- your right hand holds the unsorted cards
- take one card from your right hand and insert it at the correct position in your left
- Each pass:
- Take next element from unsorted portion
- place it in order (smallest first) in the sorted portion: compare from the right end of the sorted portion, shifting each larger element one position to the right to make room. Ex:
- Initially, element at index 0 (12) is considered sorted
- Take 11 from unsorted portion
- Compare 11 with 12: 11 < 12, so shift 12 right
- Insert 11 at index 0
- Result: [11, 12, 13, 5, 6]
- Best case advantage
- Performs well on sorted, or nearly sorted arrays
- O(n) best case: only n - 1 comparisons for an already-sorted array
- Algorithm of choice for maintaining a sorted array while inserting new elements:
- Each insertion takes O(n) in worst case
- Worst case:
- O(n^2): array in reverse order requires max shifts
- O(n^2): for random arrays
- good for small datasets
- provides stability
void insertionSort(std::vector<int>& arr) {
int n = arr.size();
// Start from second element
for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
// Shift elements right until correct position found
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
// Insert key at correct position
arr[j + 1] = key;
}
}
Bubble Sort
- Compares adjacent elements and swaps them if they’re in the wrong order
- Repeats this process moving left to right until no more swaps are needed.
- Large values “bubble up” to their correct positions at the end of the array, like bubbles rising to the surface of water.
- After a single pass, the largest element ends up in its final position at the end (it “bubbles up”)
- Algo needs multiple passes
- n-1 passes for an array of size n
- Each pass places one element in its final position
- Optimization trick: if a pass completes with no swaps then array must be sorted
- Good for partially sorted arrays
- Best case O(n): fully sorted, one pass
- Otherwise: O(n^2)
- Worst case O(n^2) for an array in reverse order; random arrays are also O(n^2).
- Works well on nearly sorted data (with the optimization trick) and on small arrays.
void bubbleSort(std::vector<int>& arr) {
int n = arr.size();
// Outer loop for each pass
for (int i = 0; i < n - 1; i++) {
bool swapped = false;
// Inner loop for comparisons
for (int j = 0; j < n - 1 - i; j++) {
// Compare adjacent elements
if (arr[j] > arr[j + 1]) {
// Swap if in wrong order
std::swap(arr[j], arr[j + 1]);
swapped = true;
}
}
// Early termination if no swaps
if (!swapped) break;
}
}
Comparing These 3 Sorting Algo
- None of these are used in production
- Only insertion sort sees real use, for small datasets and/or embedded systems
- Insertion and bubble sort are stable, selection sort is not
- Stable means the original order of equal elements is preserved.
- Selection sort when write operations are expensive
- Bubble sort is mostly just for demos
Built-in Sort Functions
- Run in O(n log n) time
- For prod code, never implement your own sorting algo
- Timsort (Python): hybrid algorithm that adapts to patterns in the data
- Introsort (typical C++ std::sort): combines multiple strategies based on the data’s characteristics
- Both rely on divide and conquer
- divide into small subproblems, solve each piece, and combine
- Most common algos are merge sort and quicksort.
- By dividing the problem, these algo change growth rate from O(n^2) to O(n log n).
- Sorting numbers
#include <algorithm>
#include <vector>
std::vector<int> numbers = {5, 2, 8, 1, 9};
std::sort(numbers.begin(), numbers.end());
// numbers is now [1, 2, 5, 8, 9]
#include <algorithm>
#include <vector>
#include <string>
#include <cctype>
std::vector<std::string> words = {
"zebra", "apple", "Banana", "cherry"
};
// Default sort
std::sort(words.begin(), words.end());
// [Banana, apple, cherry, zebra]
// Sort by length
std::sort(words.begin(), words.end(),
[](const std::string& a, const std::string& b) {
return a.length() < b.length();
}
);
#include <algorithm>
#include <vector>
#include <string>
#include <iostream>
struct Product {
std::string name;
int price;
double rating;
};
std::vector<Product> products = {
{"Laptop", 999, 4.5},
{"Mouse", 25, 4.8},
{"Keyboard", 79, 4.2},
{"Monitor", 299, 4.6}
};
// Sort by price (ascending)
std::sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
return a.price < b.price;
}
);
// Sort by multiple criteria: price, then rating
std::sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
if (a.price != b.price) {
return a.price < b.price;
}
return a.rating > b.rating;
}
);
- Comparators: take two elements and return a value indicating their relative order.
- if a < b : return -1, place a before b
- if a == b : return 0, keep curr order
- if a > b : return 1, place b before a
- Sort function calls your comparator repeatedly, after many comparisons the sort func determines final order.
- C++ comparators return bool: true if a should come before b, false otherwise. (Use a lambda to keep it concise)
- stable vs unstable sort: stable sort preserves the relative order of equal elements
- Python’s sort is stable
- C++ std::sort is not guaranteed to be stable; std::stable_sort guarantees stability.
Beyond Basic Sorting
- Advanced sorting algos use divide and conquer to get O(n log n).
- The log factor comes from dividing the problem in half
- a 1000-element array only requires ~10 halvings (2^10 = 1024)
- each element participates in roughly log n comparisons, which is why the total is O(n log n)
Merge Sort
- Merge Sort: splits array in half until each piece is one element.
- single element are trivially sorted
- then it merges these sorted pieces back together, maintaining order
- merging two sorted arrays: compare the first element of each and take the smaller one; repeat until both arrays are exhausted
- the merge step takes O(n), since each element moves once.
- recursive splitting creates log n levels of depth.
- Total work O (n log n)!!!
- Worst case = best case = avg case = O(n log n)
- Space: O(n) memory to hold merged results temporarily.
- can matter when sorting very large datasets.
- this extra memory is the tradeoff of merge sort (see the sketch below)
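A merge sort sketch following the steps above; the tmp buffer is the O(n) extra space:
#include <vector>

// merge the sorted halves [left, mid] and (mid, right] using a temp buffer: O(n)
void merge(std::vector<int>& a, int left, int mid, int right) {
    std::vector<int> tmp; // the O(n) extra space
    int i = left, j = mid + 1;
    while (i <= mid && j <= right)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]); // take the smaller front element
    while (i <= mid) tmp.push_back(a[i++]);  // drain leftovers
    while (j <= right) tmp.push_back(a[j++]);
    for (int k = 0; k < (int)tmp.size(); k++) a[left + k] = tmp[k];
}

// split until single elements, then merge back up: log n levels of O(n) work
void mergeSort(std::vector<int>& a, int left, int right) {
    if (left >= right) return; // 0 or 1 element: already sorted
    int mid = left + (right - left) / 2;
    mergeSort(a, left, mid);
    mergeSort(a, mid + 1, right);
    merge(a, left, mid, right);
}
// call as: mergeSort(a, 0, (int)a.size() - 1);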
Quicksort
- Selects a pivot element and partitions the array around it
- Elements smaller than the pivot go left, larger go right
- the pivot ends up in its final sorted position.
- the algorithm recursively sorts the left and right partitions.
- sorts in place: no extra arrays, only the O(log n) recursion stack.
- On average O(n log n), and often faster than merge sort in practice due to better cache locality and fewer memory operations
- However, quick sort’s performance depends on pivot selection
- Worst case O(n^2): a pivot choice that leads to unbalanced partitions (like the first element of an already-sorted array)
- use median-of-three or randomized pivots to avoid the problem (see the sketch below).
- Avg case: O (n log n)
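A quicksort sketch using Lomuto partitioning, with a middle pivot as a cheap guard against the sorted-input worst case:
#include <utility>
#include <vector>

// place the pivot in its final position: smaller elements left, larger right
int partition(std::vector<int>& a, int low, int high) {
    std::swap(a[low + (high - low) / 2], a[high]); // middle element as pivot
    int pivot = a[high];
    int i = low;
    for (int j = low; j < high; j++)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[high]); // pivot lands at index i
    return i;
}

void quickSort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;
    int p = partition(a, low, high);
    quickSort(a, low, p - 1);  // sort left of pivot
    quickSort(a, p + 1, high); // sort right of pivot
}
// call as: quickSort(a, 0, (int)a.size() - 1);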
Math for Algos
- Number Bases: base 10 (10 unique digits), base 2 (binary): two unique digits.
- Ex: 1011 -> 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 11
- Logs: log2(n) is how many times n can be halved before reaching 1; the growth rate behind binary search and divide and conquer
- Sets: collection of distinct elements (order not matter)
- Permutation: specific arrangement of the elements in a set, order is crucial
- counting permutations: n! = n × (n-1) × (n-2) × … × 1
- Number of subsets a set can have: each element can be in it or out of it only two possibilities. If a set has n elements, there will be 2^n subsets. (Includes empty subset and the original set itself)
- Arithmetic Sequence: sequence of numbers where the difference between consecutive terms is constant. Ex: 1, 3, 5, 7, 9
- Sum of arithmetic sequence: (first_element + last_element) × number_of_elements / 2
- Useful for analyzing nested loop complexity.
- Geometric Sequence: the ratio between consecutive terms is constant. Ex: 1, 2, 4, 8, 16
- Sum of geometric sequence: first_element × (1 - ratio^number_of_elements) / (1 - ratio)
- Useful for counting the number of nodes in a perfect binary tree.
- Modular Arithmetic: system of ints that “wrap around” when reaching a certain value.
(a + b) MOD c = ((a MOD c) + (b MOD c)) MOD c
- Useful for problems involving divisibility
- ex: the classic primality test checks whether n is divisible by any integer in the range [2, sqrt(n)]. Why? If n = a × b, the smaller factor is at most sqrt(n), so checking up to sqrt(n) finds every factor pair.
Binary Search
- Array search algo
- Narrow down search range by half each time
- Key Requirement: Array is sorted
- O(log N ) runtime
- Pick the middle element: if the target is smaller, move the right pointer to mid - 1; if larger, move the left pointer to mid + 1
- Use iterative version for O(1) memory.
int binary_search(const std::vector<int>& arr, int target) {
int left = 0;
int right = arr.size() - 1;
while (left <= right) { // <= here because left and right could point to the same element, < would miss it
int mid = left + (right - left) / 2; // use `(right - left) / 2` to prevent `left + right` potential overflow
// found target, return its index
if (arr.at(mid) == target) return mid;
if (arr.at(mid) < target) {
// middle less than target, discard left half by making left search boundary `mid + 1`
left = mid + 1;
} else {
// middle greater than target, discard right half by making right search boundary `mid - 1`
right = mid - 1;
}
}
return -1; // if we get here we didn't hit above return so we didn't find target
}
- Can be used to find the boundary of a boolean condition: the first index where a predicate flips to true
- General template:
// `feasible(mid)` is a problem-specific predicate that is false up to some index, then true
int first_true_index(const std::vector<int>& arr) {
int left = 0;
int right = arr.size() - 1;
int firstTrueIndex = -1;
while (left <= right) {
int mid = left + (right - left) / 2;
if (feasible(mid)) {
firstTrueIndex = mid;
right = mid - 1;
} else {
left = mid + 1;
}
}
return firstTrueIndex;
}
- good for finding the pivot in a rotated sorted array (the point where it was rotated)
- good for searching over a monotonic condition to find the max/min feasible value of a variable.
Two Pointers
- Solves problems involving iterable data structures (like arrays).
- Uses two or more pointers that traverse the structure.
- They don’t literally have to be pointers (indices work too)
- No singular way to implement it, problem dependent.
- General Algorithm:
- Two moving pointers, regardless of directions, moving dependently or independently
- A function that uses the entries referenced by the two pointers, relating them to the solution in some way
- A way of deciding which pointer to move
- A way to process the array when pointers are moved.
Classifications
- Same-direction pointers, ex: removing dupes in place
- Opposite directions (binary search a sub-method here)
- Sliding window vs two pointers: similar, but a sliding window computes something over the entire interval between the two pointers, keeping track of a cumulative result for the window; plain two pointers typically only uses the elements at the pointers themselves.
Non Array Apps
- Can be used on a linked list (anything iterable)
- Detecting a cycle in a linked list (see the sketch below)
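A cycle-detection sketch with fast/slow pointers (Floyd’s algorithm), reusing the Node struct from the linked-list section:
// if there is a cycle, the fast pointer eventually laps the slow one
bool hasCycle(Node* head) {
    Node* slow = head;
    Node* fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;              // moves 1 step
        fast = fast->next->next;        // moves 2 steps
        if (slow == fast) return true;  // pointers met inside a cycle
    }
    return false;                       // fast hit the end: no cycle
}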
Why use two Pointers
- More efficient than the naive solution
- Converts O(n^2) to O(n) by passing through the array once instead of using two nested loops (see the sketch below).
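A two-pointer two-sum sketch on a sorted array, as described in the sorting section:
#include <utility>
#include <vector>

// opposite-direction pointers: find indices of two values summing to target in O(n)
std::pair<int, int> twoSumSorted(const std::vector<int>& a, int target) {
    int left = 0, right = (int)a.size() - 1;
    while (left < right) {
        int sum = a[left] + a[right];
        if (sum == target) return {left, right};
        if (sum < target) left++; // need a bigger sum: move the left pointer right
        else right--;             // need a smaller sum: move the right pointer left
    }
    return {-1, -1};              // no pair found
}
// twoSumSorted({1, 3, 4, 6, 9}, 10) returns {0, 4} (1 + 9)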