Data Structures and Algorithms Crash Course
Data Structures
Computer Components
- CPU: executes instructions, does calculations (in billions/sec)
- Control unit: fetches instructions from memory and decodes what they mean
- ALU (Arithmetic Logic Unit): does the calculations through logic gates
- Registers: tiny storage spots (few bytes), hold numbers while they are processed
- RAM: temp storage, holds running programs
- Storage (Disk): permanent storage, large but slower than RAM, program loads from storage to RAM and executed in CPU
- This is the “Von Neumann” architecture
- Store both program instructions and data in same memory (RAM)
- Have CPU fetch and execute instructions sequentially
- CPU is ~100x faster than RAM, ~10,000,000x faster than disk
- The CPU itself holds only about 64 bytes of data in its registers
- Data Structures live in RAM
References vs Primitives
- Values like integers and booleans are simple, so they are stored directly by the variable.
- Values like lists, or custom objects are stored via a reference. The variable holds a reference to the value.
- Reference is a way to find where the data lives in memory.
- When you copy a reference and change the value through it, the change is visible through both variables, since they reference the same object (see the sketch below).
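A minimal C++ sketch of the difference (C++ copies objects by default, so a raw pointer stands in for the reference here):
#include <iostream>
#include <vector>

int main() {
    int a = 5;
    int b = a;                   // primitive: b gets its own copy of the value
    b = 10;
    std::cout << a << std::endl; // 5, unchanged

    std::vector<int> v = {1, 2, 3};
    std::vector<int>* p = &v;    // p holds a reference (an address) to v, not a copy
    p->push_back(4);             // mutate through the reference...
    std::cout << v.size() << std::endl; // 4, ...and v sees the change
}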
Memory Size of Types
- Byte: 8 bits (8 ones or zeros)
- Ex:
- Bool: 1 byte, Integer: 4 bytes, Double: 8 bytes
- Why size matters for arrays?
- When initializing, the array needs to know how much space to allocate, and different types have different needs (an array of 5 ints vs an array of 5 bools)
- Address formula: base_address + (index x element_size)
- ex: 1000 + (2 × 4 bytes) = 1008
Memory Alignment
- Small gaps are left in memory between variables to keep each one aligned to a boundary that is a multiple of its element size. This allows for efficient reading by the CPU.
Strings are special
- can be variable in length
- Usually stored as 3 components:
- a pointer (8 bytes) that references the actual character data
- the character data itself, stored elsewhere in memory
- length info to track how many characters exist
Stack and Heap
- Computers organize memory of a program in two main regions:
Stack
- stack: fast, organized region for temp data that follows strict rules (LIFO)
- only add or remove from top
- when function is called a new “frame” is added to the stack containing:
- function params
- local variables
- return address (where to go back after the function finishes)
- when the function returns, its frame is removed from the stack
- extremely fast
- stack is only a few MB
- the stack stores primitive values like int and bool directly; complex types like objects and arrays are stored elsewhere, with only their references living on the stack.
Heap
- heap: flexible region for data that needs to live longer or grow dynamically
- “large warehouse”
- store items of any size for as long as you need
- objects and arrays that grow dynamically stay on the heap
- slower than stack access, still fast
- can be GBs in size
- large data structures live here
- Heap memory requires explicit management
- C++ needs manual allocation and freeing of heap memory
- Python uses garbage collection (automatic)
- Stack references point to heap objects
- the reference (pointer) lives on the stack, the actual data lives on the heap (see the sketch below)
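A small C++ sketch of where things live; the pointer sits on the stack, the block it points to sits on the heap:
#include <iostream>

int main() {
    int x = 42;                // primitive: lives in main's stack frame
    int* arr = new int[1000];  // the 1000 ints live on the heap
    // the pointer `arr` itself lives on the stack and references the heap block
    arr[0] = 7;
    std::cout << x << " " << arr[0] << std::endl;
    delete[] arr;              // C++: heap memory must be freed manually
    return 0;                  // x is reclaimed automatically when the frame pops
}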
How it helps in coding?
- Too many nested function calls or large local arrays can cause stack overflow errors.
- Recursive functions that never terminate are a common cause of this problem
- Stack operations: fast because allocation and deallocation simply move a pointer
- Heap operations: must search for available space and track allocations, adding overhead
- When performance matters, keeping frequently accessed data on the stack helps.
- Memory leaks happen on the heap when allocated objects are never freed.
References vs Values (Mutation problem)
- reference types share data
- when you assign one variable to another, you copy the reference, not the entire object. Both variables now point to the same object in memory.
- if you change the value via one variable, the other changes too, as they refer to the same object.
- common source of bugs
- happens when passing objects to functions
- function receives a copy of the reference (pointing to the same object)
- function mutates the object, the caller sees the change
- “pass by reference” behavior
- If you do not want the passed value to change, “copy”
- a true independent copy, without affecting the original
- deep copies duplicate the object itself, not just the reference.
- deep copies make sure even nested objects are truly independent copies
- In C++, can pass by value (copy), by reference (with &), or by a pointer.
- “reference is an immutable pointer which cannot be null”
- References are much faster than copying the entire array
- but be mindful of mutations (see the sketch below)
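A short C++ sketch contrasting the two behaviors (function names are just for illustration):
#include <iostream>
#include <vector>

// pass by value: the function works on a private copy, the caller is unaffected
void byValue(std::vector<int> v) { v.push_back(99); }

// pass by reference (&): the function mutates the caller's object
void byReference(std::vector<int>& v) { v.push_back(99); }

int main() {
    std::vector<int> data = {1, 2, 3};
    byValue(data);
    std::cout << data.size() << std::endl; // 3: only the copy changed
    byReference(data);
    std::cout << data.size() << std::endl; // 4: the caller sees the mutation
}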
Why Data Structures?
- different way of organizing data makes search and insertion operations faster
- how you organize data determines how fast you can work with it
- linear search on an unsorted array (O(n), up to 1000 checks for n = 1000 in the worst case) vs binary search on a sorted array (O(log n), about 10 checks worst case).
- O(log(n)): want this as much as possible as growth is very slow even when n explodes.
- By imposing structure on data, you can enable a faster algorithm.
- Arrays are good for access, linked lists are good for insertion, hash tables are good for membership checks, stacks and queues enforce specific orderings.
Matching structure to problem
At scale, performance determines if software is usable
- Array: fast index access O(1); slow middle insert O(n)
- Hash Table: fast lookup O(1); fast insert/delete O(1)
- Linked List: Fast head insert O(1), slow access O(n)
- Stack: LIFO order, used for undo, backtracking
- Queue: FIFO order, used for scheduling and BFS
- Trees: fast sorted operations O(log n); hierarchical data, DFS
Algorithmic Complexity
Counting operations
- Operations: assignment, reading, arithmetic, comparisons, array access, function calls
- each of these counts as one operation
- Loops multiply Operations
- Nested loops multiply operations across all levels
- Counting predicts performance before running code.
- Counting operations as a function of input size is what matters most
Constant vs Linear Time
- Constant: operations stay same despite input size
- O(1)
- Ex: array access always takes 1 operation
- address = base_address + (index × element_size)
- Other ex: variable assignment, array update at an index, simple arithmetic, comparison, getting vector size, vector push_back
- Ideal
- Linear: Operations grow proportionally with input size
- O(n)
- Ex: searching an unsorted array (linear search), summing all elements, finding the max, counting occurrences
- Acceptable for most tasks
- Recognizing Constant Ops: No loops, recursion. Direct access.
- Recognizing Linear Ops: Single loop through array/list, visiting every element once, searching unsorted data, summing/counting or processing all elements.
- Constant factors don’t change the category; two separate for loops are still O(n)
- We care about growth rate, not exact counts.
Quadratic Time
- O(n^2)
- In presence of nested loops:
- Ex: comparing all pairs in an array (for distance, collision, other prop); generating multiplication table
- Gets slow quickly; okay for very small input sizes, fails catastrophically at large input sizes.
Avoiding Quadratic Time
- Use better data structures: hash tables for O(1) lookup instead of O(n) search. For dupe detection, use a set instead of nested loops (see the sketch after this list).
- Sort first, then use an efficient algorithm
- Sorting is O(n log n), which is much better than O(n^2)
- After sorting, many problems become O(n)
- Use two-pointer techniques
- Sometimes, two pointers moving through a sorted array can replace nested loops
- Look for math patterns
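A sketch of the set-based dupe detection replacing the nested loops:
#include <iostream>
#include <unordered_set>
#include <vector>

// O(n^2): compare every pair with nested loops
bool hasDuplicateSlow(const std::vector<int>& a) {
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = i + 1; j < a.size(); j++)
            if (a[i] == a[j]) return true;
    return false;
}

// O(n) on average: one pass, with O(1) membership checks in a hash set
bool hasDuplicateFast(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    for (int x : a) {
        if (seen.count(x)) return true; // seen before: duplicate
        seen.insert(x);
    }
    return false;
}

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    std::cout << hasDuplicateSlow(v) << " " << hasDuplicateFast(v) << std::endl; // 1 1
}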
Best, worst, and average case
- Best case: luckiest input, min possible operations.
- For linear search, first position is the target O(1)
- Worst case: max possible operations
- Average Case: expected performance
- Linear search: (n+1)/2 ≈ n/2 operations on average (half of the worst case); still linear growth O(n)
- Focus on worst case:
- Guarantees matter
- Often common
- Average case can be hard to define: real world data distribution are rarely uniform
- Best case is usually trivial
- Ex: finding minimum has same O(n) for best, avg, and worst.
- Have to go through all elements
Space Complexity:
- Time complexity (above): how many operations an algorithm performs
- Algorithms also take memory
- Count how much memory an algorithm needs relative to its input size
- O(1): constant space; count, sum
- “in-place” algorithms: modify existing data structures rather than creating new ones
- O(n): linear space; new array of size n
- Auxiliary arrays; creating new data structures proportional to input size; needed to store intermediate results or to build output without modifying the input.
- O(n^2): quadratic space; matrix n x n
- matrices; common in problems involving grids, graphs, or pairwise comparisons.
- Space complexity: how much additional memory an algorithm needs beyond its input
Space-Time trade offs
- Can trade memory for speed, or vice versa
- Minimal space, slower
- store only last few values
- More space, faster lookups
- store all computed vals in array
- Depends on constraints: limited memory vs fast repeated access (see the sketch below)
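A small illustration of the trade-off, using Fibonacci numbers as an assumed example: recompute with O(1) space, or precompute a table once for O(1) repeated lookups:
#include <iostream>
#include <vector>

// minimal space: O(1) extra memory, keep only the last two values
long long fibLean(int n) {
    long long prev = 0, curr = 1; // fib(0), fib(1)
    for (int i = 0; i < n; i++) {
        long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return prev; // after n steps, prev == fib(n)
}

int main() {
    // more space, faster lookups: O(n) memory, then every query is O(1)
    int n = 50;
    std::vector<long long> table(n + 1);
    table[0] = 0; table[1] = 1;
    for (int i = 2; i <= n; i++) table[i] = table[i - 1] + table[i - 2];
    std::cout << fibLean(40) << " " << table[40] << std::endl; // both print 102334155
}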
When space matters:
- Large datasets
- Embedded systems: limited RAM
- Server apps
- Mobile apps
Arrays:
- Collection of elements stored in contiguous memory locations; meaning they sit side by side in the computer’s memory
- Elements stored sequentially (4 bytes apart for 32-bit ints)
- Zero-based indexing: first element at index 0
- Address calculation: address = base + (index x element_size)
- O(1) direct access, key insight
- the CPU cache loads nearby data, making sequential access fast
- Reading and updating are O(1) constant time operations.
Iteration
- Linear search (linear iteration, for loop)
- start at i = 0, check each element and iterate until found
- return -1 if not found
- Visit each element one by one
- O(n) linear time operation
- Tasks: summing, searching
- Array isn’t sorted, so you can’t skip elements (see the sketch below)
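A minimal linear search matching the steps above:
#include <vector>

// visit each element once: O(n)
int linearSearch(const std::vector<int>& arr, int target) {
    for (int i = 0; i < (int)arr.size(); i++) {
        if (arr[i] == target) return i; // found: return its index
    }
    return -1; // not found
}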
Adding/Removing elements at the end
- Adding to the end (append/push) is an O(1) operation
- nothing to move, update length
- Common pattern is to start with empty array and build it by appending
- Each append is O(1)
- no need to specify size
- total time for n appends, O(n)
- Removing from end: called pop
- O(1), no element to shift
Adding/Removing from Middle
- Much slower: elements must shift
- Middle operations O(n):
- Insert/Delete: must shift elements right (insert), left (delete)
- Insert (shift right):
- start from the end and work backwards
- shift each element one position to the right
- continue until index i, then place the new element at i
- update the array’s length
- n - i shifts total
- Delete (shift left):
- remove element at i
- shift each element after i one position to the left
- continue until end
- update array’s length
- n - i - 1 shifts total
- Alternative for deletions:
- swap with the last element, then pop: O(1), but does not preserve order
- if frequent inserts/deletes, use a linked list
- if order must be preserved, work at the end of the array instead (see the sketch below)
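A sketch of the middle-operation costs with std::vector, plus the swap-and-pop trick for when order doesn’t matter:
#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<int> a = {10, 20, 30, 40};

    a.insert(a.begin() + 1, 15); // O(n): 20, 30, 40 all shift right -> {10, 15, 20, 30, 40}
    a.erase(a.begin() + 2);      // O(n): everything after index 2 shifts left -> {10, 15, 30, 40}

    // swap-and-pop delete at index 1: O(1), but order is NOT preserved
    std::swap(a[1], a.back());
    a.pop_back();                // {10, 40, 30}

    for (int x : a) std::cout << x << " "; // 10 40 30
}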
Dynamic Arrays
- Traditional arrays need a specified size; if that size is reached, you must allocate a new array and copy everything over manually
- Modern languages have dynamic arrays:
- List in python
- Vector in C++
- A dynamic array maintains two separate values: size (current number of elements) and capacity (number of slots allocated in memory).
- An empty dynamic array will allocate a small capacity (e.g., 4 elements)
- When capacity is reached, it allocates a memory block with double the capacity and copies everything over.
- Doubling capacity each time
- 4->8->16->32->64
- Why doubling each time?
- Amortized
- Copying takes O(n) time
- If it resized on every append, n appends would require O(n^2) operations.
- With doubling, resizes happen only at powers of 2.
- For n appends, total copy work is O(n) and total append work is O(n), giving O(n) total time for n operations.
- Divide by n and you get O(1) per operation on average.
- This is called amortized O(1) time.
- While a single append might trigger an expensive O(n) resize, most appends take O(1). Averaged out, each append costs O(1).
- Memory tradeoff: uses more memory than strictly necessary. Right after a resize, the array is only 50% full.
- Acceptable for most apps; switch to fixed-size arrays if memory efficiency matters (see the demo below).
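A quick way to watch size vs capacity in action with std::vector (the growth factor is implementation-defined, commonly 1.5x or 2x):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    size_t lastCap = v.capacity();
    for (int i = 0; i < 100; i++) {
        v.push_back(i);
        if (v.capacity() != lastCap) { // a resize (reallocate + copy) just happened
            lastCap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << lastCap << std::endl;
        }
    }
}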
Strings
- Array of chars
- Strings live in the heap (like arrays)
- When the variable is created, it holds a reference pointing to the actual character data stored in heap memory.
Indexing
- Zero Indexing
- Works just like array indexing
Iteration
- used for counting chars, validating formats, transforming, searching for patterns.
- same as array iteration
Concatenation
- join multiple strings into one
- `+` operator to concatenate
- or use append()
- where strings are immutable, concatenating creates a new string object each time.
- in C++, .reserve(num_chars) avoids reallocation when appending.
Comparison
- equality test: same chars in same sequence
- lexicographical comparison (position by position)
String Methods
- .find(“phrase”);
- .substr(pos, len);
- .replace(pos, len, str);
- .length(); .size(); .empty();
Immutability
- strings are immutable: cannot be changed after they are created.
- modification creates a new string instead
- Python enforces string immutability
- C++ allows mutable strings. std::string supports direct char assignment
- make copy if you want to preserve
Character Counting
- Uses hash maps (or arrays) to track how many times each char appears in a string.
- Iterate through string once, update counts as you go
- Create a map from char to frequency
- Use unordered maps
- Used in anagram detection (compares two freq maps for equality)
- Finding dupes if any count exceeds one
- Character validation verifies required chars are present
- Pattern matching
- Transforms strings into structured data for analysis
- iterate once through string
- O(n) time to build a map
- O(1) lookups for each char frequency
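A sketch of character counting plus the anagram check described above:
#include <iostream>
#include <string>
#include <unordered_map>

// build a char -> frequency map in one O(n) pass
std::unordered_map<char, int> charCounts(const std::string& s) {
    std::unordered_map<char, int> counts;
    for (char c : s) counts[c]++; // operator[] default-initializes counts to 0
    return counts;
}

// anagram check: two words are anagrams iff their frequency maps are equal
bool isAnagram(const std::string& a, const std::string& b) {
    return a.size() == b.size() && charCounts(a) == charCounts(b);
}

int main() {
    std::cout << isAnagram("listen", "silent") << std::endl; // 1 (true)
    std::cout << charCounts("hello")['l'] << std::endl;      // 2
}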
Linked List
- To insert into an array at a certain position, you need to move everything to the right of it by 1. Up to n operations
- Linked Lists scatter elements throughout memory and connect them through pointers
- No contiguous blocks needed
- Easy to insert O(1)
- Linked List is made of:
- Nodes: contains the data and the pointer to next node
- Must follow the chain to iterate, cannot jump
- Starts with a head pointer (points to the first node); each node points to the next node in sequence.
- Last node points to nothing (represented as null pointer), marks end of the list.
- Inserting: create the new node, point it at the node that should follow it, then change the previous node’s pointer to point at the new node. Deletion works similarly. O(1)
- Accessing: a linked list requires traversal to reach an index, unlike an array. O(n) access complexity for a linked list.
- Arrays optimize for access speed, linked lists optimize for modification flexibility.
Nodes and Pointers
- Node is a simple struct holding data and a pointer to the next node
#include <iostream>
using namespace std;

struct Node {
int data;
Node* next;
Node(int data) : data(data), next(nullptr) {}
};
// Create a node with data = 5
Node* node = new Node(5);
cout << node->data << endl; // 5
cout << node->next << endl; // 0 (nullptr)
// Create two nodes
Node* first = new Node(10);
Node* second = new Node(20);
// Link first to second
first->next = second;
// Now: first -> second -> nullptr
cout << first->data << endl; // 10
cout << first->next->data << endl; // 20
cout << second->next << endl; // 0 (nullptr)
Traverse O(N)
void traverse(Node* head) {
Node* current = head;
while (current != nullptr) {
cout << current->data << endl;
current = current->next;
}
}
- Binary search does not work on linked lists (no O(1) access by index)
Insert
- At head: easy, O(1). At tail: have to traverse the entire list if there is no tail pointer.
- With a tail pointer, O(1)
- Implementations like queues maintain a tail pointer.
Deletion
- Skip over the unwanted node to disconnect it from the chain: change the previous node’s pointer to point at the next node.
- C++ requires explicit memory deallocation (delete the removed node)
- Edge cases matter: empty list (nothing to delete); single node (return null); target not found (return the original head); multiple occurrences (delete only the first, or remove the early return to delete all).
- O(n) for delete by val, search dominates complexity
- O(1) delete at head
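A delete-by-value sketch using the Node struct from above, covering the edge cases listed:
// delete the first node whose data == target; returns the (possibly new) head
Node* deleteValue(Node* head, int target) {
    if (head == nullptr) return nullptr;     // edge case: empty list
    if (head->data == target) {              // edge case: target at head, O(1)
        Node* newHead = head->next;
        delete head;                         // C++: free the node explicitly
        return newHead;
    }
    Node* current = head;
    while (current->next != nullptr) {       // O(n): the search dominates
        if (current->next->data == target) {
            Node* doomed = current->next;
            current->next = doomed->next;    // unlink: skip over the node
            delete doomed;
            return head;                     // first occurrence only
        }
        current = current->next;
    }
    return head;                             // target not found: list unchanged
}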
Linked List vs Array
- Arrays for jumping around random positions (binary search, matrix ops)
- Insert at begin: linked list O(1)
- Insert at end: arrays with spare capacity win, O(1) for a dynamic array; linked lists need a tail pointer, else O(n)
- Insert in middle: both O(n); traversal kills the linked list, even though the insertion itself is O(1).
- Deletion (mirrors insertion)
- Both are essentially O(n) for value-based deletion
- Search: O(n) for unsorted data; arrays can be sorted for O(log n) binary search, sorted linked lists are still O(n)
- Memory usage: arrays are contiguous (faster access thanks to CPU caching); linked lists scatter memory (extra pointer per node, slower access)
- sequential traversal is faster on arrays (even though both are O(n)) due to CPU caching and pointer-chasing overhead
- Linked list good for complex structures: trees, graphs, node based designs
Queues
FIFO Model
- First In, First Out
- Queue: process items in the order they arrive
- Rules of a queue:
- Items come out the way they went in
- Can only add to the back
- Can only remove from the front
- Can look at the front without removing it
- Can’t access items in the middle, can’t remove from the back
- If you want the third item, must remove the first two.
- FIFO maintains event ordering: if A needs to happen before B, queue ensures A completes before B starts.
- Can only interact with the front and the back
Queue Operations
- Enqueue: add at the back of the queue
#include <queue>
#include <string>
std::queue<std::string> q;
q.push("Task 1"); // Enqueue first task
q.push("Task 2"); // Enqueue second task
q.push("Task 3"); // Enqueue third task
// Queue: [Task 1, Task 2, Task 3]
Dequeue: remove (and return) from the front
// Assume queue has Task 1, Task 2, Task 3
std::string first = q.front(); // Get front
q.pop(); // Remove it
std::cout << first << std::endl; // "Task 1"
// Queue now: [Task 2, Task 3]
std::string nextTask = q.front();
q.pop();
std::cout << nextTask << std::endl; // "Task 2"
// Queue now: [Task 3]
Peek: look at the front without removing it
// Assume queue has First, Second, Third
if (!q.empty()) {
std::string frontItem = q.front();
std::cout << frontItem << std::endl; // "First"
}
// Queue unchanged
Size and empty check
// Assume queue has A, B, C
int size = q.size();
std::cout << "Queue has " << size << " items" << std::endl;
bool isEmpty = q.empty();
std::cout << isEmpty << std::endl; // false
Queue w/ Array
- Build a queue with two index variables: one front, one back
- Enqueue: place at the back and increment back index
- Dequeue: remove item from front index and increment it
- Space on the left becomes “wasted”; you can’t reuse it.
- Both indices only move right
- Solutions for the wasted space when the array fills up:
- Circular buffer (wrap around); see the sketch below
- Resize the array: O(n) copying time.
- Use a linked list instead
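A minimal circular-buffer queue sketch (fixed capacity; a fuller implementation would resize instead of throwing):
#include <stdexcept>
#include <vector>

// front and back indices wrap around with %, so slots freed by dequeue get reused
class CircularQueue {
    std::vector<int> data;
    size_t front = 0, count = 0;
public:
    explicit CircularQueue(size_t capacity) : data(capacity) {}
    void enqueue(int x) {
        if (count == data.size()) throw std::runtime_error("queue full");
        data[(front + count) % data.size()] = x; // back index, wrapped
        count++;
    }
    int dequeue() {
        if (count == 0) throw std::runtime_error("queue empty");
        int x = data[front];
        front = (front + 1) % data.size();       // advance front, wrapped
        count--;
        return x;
    }
    bool empty() const { return count == 0; }
};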
Queue w/ Linked List
- Much better choice
- No wasted space problem
- Dequeue from head pointer
- Enqueue from tail pointer
- Dynamic size is a perk, but each node needs extra memory for its next pointer.
Real World Apps
- Task scheduling: used in operating systems; when multiple programs need CPU time, they join a queue. Tasks arriving first get processed first.
- Print Queue
- Request handling: when requests arrive faster than a web server can handle, it queues them for processing.
- The queue absorbs bursts of traffic.
- If traffic arrives faster than workers process it, the queue grows.
- When traffic slows, workers drain the queue.
- this buffering prevents request loss during spikes.
- Message Queues:
- Service A enqueues a message, service B dequeues and processes it.
- This decouples the services; A doesn’t wait for B to finish (Pub/Sub)
- If service B crashes, messages remain safe in the queue, and service A continues producing.
- B restarts and continues processing
- When to use queues:
- Fair Ordering: First come first served
- Buffering: absorb bursts of work items
- Decoupling: producer and consumer run at diff speeds
- Reliability: store work items safely until processed
Hashmaps
- Hash Table: stores data as key-value pairs
- Hash function: transforms a key into a number; this number tells you where to find the value in an array. Basically, returns an array index.
- Ex: “Alice” hashes to 42, so the value for the “Alice” key lives at array index 42.
- Hash codes are large numbers
- Modulo operation converts hash code to array index
- index = hash_code % array_capacity
- Direct Access:
- Hash the key to get a number
- Convert the number to an array index with mod op
- Look up value at that index.
- No matter how many items you store, lookup is constant time O(1)
- hash + array access
- much better than linear search O(n)
- How to handle collisions?
- If two keys produce the same index (called a collision)
- On average still O(1) performance
Hash Functions
- Convert keys to indices
- Good hash function must distribute keys uniformly among available slots (to prevent excessive collisions)
- Function must be deterministic
- Should be fast to compute
- Must work with whatever keys you need
- Polynomial rolling hash (for strings)
- hash = (s[0] × p^(n-1) + s[1] × p^(n-2) + … + s[n-1] × p^0) mod table_size
- hash = (‘a’ × 31² + ‘b’ × 31¹ + ‘c’ × 31⁰) mod table_size
- p here is a prime number
- Distributes strings well and keeps “abc” and “cba” from colliding
- Integer Hashing:
- hash = (key × 2654435761) % table_size.
- Why modulo?
- makes sure the result fits your table’s bounds
- Collisions are going to happen
- hash functions map infinitely many possible keys to finitely many slots
- a good hash function minimizes collisions through even distribution
// hash func
std::hash<std::string> str_hash;
size_t hval = str_hash("Alice");
// unordered map uses hash func automatically
std::unordered_map<std::string, int> map;
map["Alice"] = 30;
Collision Handling
- Two different keys hash to the same index
- need to store multiple entries that hash to the same location
Chaining
- At each slot, store a linked list
- When collision occurs, add a new entry to the list
- When searching for a key, hash it to find its slot, then search through the short list at that slot
- still very fast O(1) on average.
Hash to Array Index
index = hash_code % capacity
- capacity affects distribution
- when resizing a hash table, all indices change, so every entry must be rehashed
Complete lookup process
- Hash the key: hash_code = hash(“Alice”)
- Convert to index: index = hash_code % capacity
- Access array: value = array[index]
Common Ops:
- Insert O(1): add a key-value pair
- Lookup O(1): retrieve the value for a key
- Delete O(1): remove a key and its value
- Fast data storage and retrieval
#include <unordered_map>
#include <string>
#include <iostream>
// Create empty hash table
std::unordered_map<std::string, std::string> phoneBook;
// Insert key-value pairs
phoneBook["Alice"] = "555-1234";
phoneBook["Bob"] = "555-5678";
phoneBook["Carol"] = "555-9012";
// Lookup by key
std::cout << phoneBook["Alice"] << std::endl; // "555-1234"
// Check if key exists
if (phoneBook.find("Bob") != phoneBook.end()) {
std::cout << "Found Bob" << std::endl;
}
// Delete a key
phoneBook.erase("Carol");
Iteration: does not guarantee order (follows the internal array layout)
std::unordered_map<std::string, std::string> phoneBook = {
{"Alice", "555-1234"},
{"Bob", "555-5678"}
};
// Iterate over key-value pairs
for (const auto& [name, number] : phoneBook) {
std::cout << name << ": " << number << std::endl;
}
Sets - Uniqueness
- Set: collection that stores only unique elements, no dupes
- Can’t add Alice twice
- Sets are hash tables without values, just keys
- Ops: Union, Intersection, Difference
#include <unordered_set>
std::unordered_set<int> numbers;
numbers.insert(1);
numbers.erase(2);
numbers.find(2);
#include <unordered_set>
#include <algorithm>
std::unordered_set<int> a = {1, 2, 3};
std::unordered_set<int> b = {2, 3, 4};
// Union
std::unordered_set<int> u = a;
u.insert(b.begin(), b.end()); // {1, 2, 3, 4}
// Intersection (manual)
std::unordered_set<int> intersect;
for (int x : a) {
if (b.find(x) != b.end()) {
intersect.insert(x); // {2, 3}
}
}
// Difference (manual)
std::unordered_set<int> diff;
for (int x : a) {
if (b.find(x) == b.end()) {
diff.insert(x); // {1}
}
}
- Use: fast membership testing O(1), automatic deduplication, set operations, counting unique elements (set size = number of distinct items)
Stack (LIFO Model)
- Last In, First Out
- Ex: browser back button
- Stack Ops:
- push (add element to top of stack); O(1)
- pop: remove element from top of stack; O(1)
- peek: look at top element without removing
- isEmpty: check if the stack has any elements
- Cannot pop an empty stack (stack underflow)
#include <stack>
// Using C++ stack
std::stack<int> stack;
// Push elements
stack.push(10);
stack.push(20);
stack.push(30);
// Stack: [10, 20, 30] (30 is on top)
// Pop element
int top = stack.top(); // Get top value (30)
stack.pop(); // Remove it
// Stack: [10, 20]
// Peek at top
if (!stack.empty()) {
int topElement = stack.top(); // Returns 20
}
stack.size();
Stack Apps
- Function Call Stack: each call creates a stack frame containing the function’s local variables and the return address
- the frame goes onto the call stack
- when the function finishes, its frame pops off the stack and execution returns to the caller
Call sequence:
A starts: Stack = [A]
A calls B: Stack = [A, B]
B calls C: Stack = [A, B, C]
C finishes: Stack = [A, B]
B finishes: Stack = [A]
A finishes: Stack = []
- Basis for recursion: if recursion goes too deep, the call stack runs out of space and you get a stack overflow.
- Undo mechanism
- Browser navigation (Back stack and Forward Stack)
- Matching parentheses (check for balance): check if brackets are balanced (see the sketch after this list)
- Expression evaluation: calculators use stacks to evaluate math expressions
Postfix expression evaluation example: 3 4 + 2 *
Push 3: stack = [3]
Push 4: stack = [3, 4]
See +: pop 4 and 3, compute 3 + 4 = 7, push 7: stack = [7]
Push 2: stack = [7, 2]
See *: pop 2 and 7, compute 7 * 2 = 14, push 14: stack = [14]
Result: 14
- DFS: graph traversal uses a stack to remember which nodes to visit next. The algorithm goes as deep as possible before backtracking
- Backtracking: puzzle and maze solvers use a stack to try a path and backtrack when stuck
- Syntax parsing: compilers use stack to parse code structure and check syntax.
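A balanced-brackets checker as sketched in the matching-parentheses bullet above:
#include <stack>
#include <string>

// push openers; on each closer, pop and check that it matches
bool isBalanced(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (st.empty()) return false; // closer with no opener
            char open = st.top();
            st.pop();
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{')) return false; // mismatched pair
        }
    }
    return st.empty(); // leftover openers mean unbalanced
}
// isBalanced("{[()]}") == true, isBalanced("(]") == false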
Imp Stack w/ Array
- End of array = top of stack
- Track “top” index
- Push = append to end
- Since we add at the end, there is no shifting: O(1)
- Pop = remove from end; no shift, O(1)
#include <vector>
#include <stdexcept>
template<typename T>
class Stack {
private:
std::vector<T> items;
public:
void push(const T& item) {
items.push_back(item); // O(1) amortized
}
T pop() {
if (isEmpty()) {
throw std::runtime_error("Pop from empty stack");
}
T top = items.back();
items.pop_back(); // O(1)
return top;
}
T peek() const {
if (isEmpty()) {
throw std::runtime_error("Peek from empty stack");
}
return items.back();
}
bool isEmpty() const {
return items.empty();
}
size_t size() const {
return items.size();
}
};
- Cache friendly (contiguous storage)
- Occasional resize is O(n), amortized O(1)
- the standard choice for stack implementations
Stack vs Queue (LIFO vs FIFO)
- Linear data structures; store elements sequentially
- Queue preserves order, stack reverses it.
- All ops are O(1)
- Both can be implemented with arrays or linked lists (arrays more common)
- Stack with array: O(1) add/remove
- Queue with array: add at end, remove from front; needs a circular buffer or linked list for O(1)
Sorting
- Sorting allows for faster search techniques (ex: binary O(log n))
- Sorted data enables optimization
- Algorithms become simpler and faster
- Finding dupes in unsorted array requires comparing every element with every other element
- O(n^2) comparisons
- In a sorted array, dupes sit next to each other: O(n) time.
- Merging two lists
- Finding the closest pair of numbers
- Sort once in O(n log n), scan for the min difference in O(n)
- Much better than the original O(n^2)-or-worse brute force method
When to Sort First
- Looking for dupes/patterns: sort groups similar element together, dupes become adjacent and patterns become visible.
- W/o sorting: O(n^2)
- W/ sorting: O(n log n) plus one O(n) pass through the array to find dupes
- Multiple searches on the same data: sort once; then use binary search repeatedly. Sorting costs spreads across all searches.
- Sort first, then solve with a simple scan.
- Pay O(n log n) to enable an O(n) solution, instead of O(n^2) or worse.
- Comparing or merging collections: sorted data merges cleanly.
- Finding closest, or farthest pairs.
- Sorted data enables two-pointer technique.
- Enables one pointer at start and one at end, moving them toward each other based on comparisons.
- Two-sum problem (find a pair that sums to a target): unsorted means checking every pair, O(n^2); after sorting, two pointers give an O(n) solution.
- Detecting anagrams:
- sort the chars in both words and compare.
- simpler than counting frequencies and comparing
- Interval Problems
- time ranges, scheduling conflicts, overlapping events; these usually require sorting by start time.
- Trade-off analysis: sorting takes O(n log n) with the built-in sort; many problems then need only a single O(n) scan to solve. Much better than O(n^2) on unsorted data
Problem Solve Patterns
- Recognize signals to sort first:
- dupes, unique elements
- find pairs with a specific sum/difference (two pointers)
- work with intervals, ranges
- comparing sets of elements for matches
- detecting anagrams or char patterns
When not to Sort
- If you only search once, sorting might cost more than it saves.
- Arrives already sorted
- Need to preserve original order (like a queue)
- Priority Queues (Example)
- Network Routing (Example): routers maintain sorted routing tables; when a packet arrives, the router uses binary search to find the best route in O(log n) time instead of checking every route.
Selection Sort
- Builds a sorted array by repeatedly finding the smallest element in the unsorted region and placing it at the front (of the unsorted section)
- Each pass swaps one min at the front
- O(n^2) running time
- Very few swaps compared to the number of comparisons
- n - 1 swaps total for an array of size n
- When to use?
- when moving elements is expensive
- small datasets
- for most sorting needs, use built-in sort funcs
- Performs O(n^2) whether the data is partially sorted or fully unsorted; it cannot take advantage of existing order
void selectionSort(std::vector<int>& arr) {
int n = arr.size();
// Outer loop moves the boundary
for (int i = 0; i < n - 1; i++) {
// Track index of minimum in unsorted portion
int minIndex = i;
// Inner loop scans unsorted portion
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[minIndex]) {
minIndex = j;
}
}
// Swap minimum to boundary position
std::swap(arr[i], arr[minIndex]);
}
}
Insertion Sort
- Sorting a hand of cards:
- hold some cards in your left hand, already sorted
- your right hand holds the unsorted cards
- take one card from your right hand and insert it at the correct position in your left
- Each pass:
- Take next element from unsorted portion
- place it in order (smallest first) in the sorted portion: compare from the right end of the sorted portion, shifting each larger element one position to the right to make room. Ex:
- Initially, element at index 0 (12) is considered sorted
- Take 11 from unsorted portion
- Compare 11 with 12: 11 < 12, so shift 12 right
- Insert 11 at index 0
- Result: [11, 12, 13, 5, 6]
- Best case advantage
- Performs well on sorted, or nearly sorted arrays
- O(n) best case: only n - 1 comparisons for an already-sorted array
- Algorithm of choice for maintaining a sorted array while inserting new elements:
- Each insertion takes O(n) in worst case
- Worst case:
- O(n^2): array in reverse order requires max shifts
- O(n^2): for random arrays
- good for small datasets
- provides stability
void insertionSort(std::vector<int>& arr) {
int n = arr.size();
// Start from second element
for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
// Shift elements right until correct position found
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
// Insert key at correct position
arr[j + 1] = key;
}
}
Bubble Sort
- Compares adjacent elements and swaps them if they’re in the wrong order
- Repeats this process moving left to right until no more swaps are needed.
- Large values “bubble up” to their correct positions at the end of the array, like bubbles rising to the surface of water.
- After a single pass, the largest element ends up in its final position at the end (it “bubbles up”)
- Algo needs multiple passes
- n-1 passes for an array of size n
- Each pass places one element in its final position
- Optimization trick: if a pass completes with no swaps then array must be sorted
- Good for partially sorted arrays
- Best case O(n): fully sorted, one pass
- Otherwise: O(n^2)
- Worst case O(n^2) for an array in reverse order; random arrays are also O(n^2).
- Works well on nearly sorted data (with the optimization trick) and on small arrays.
void bubbleSort(std::vector<int>& arr) {
int n = arr.size();
// Outer loop for each pass
for (int i = 0; i < n - 1; i++) {
bool swapped = false;
// Inner loop for comparisons
for (int j = 0; j < n - 1 - i; j++) {
// Compare adjacent elements
if (arr[j] > arr[j + 1]) {
// Swap if in wrong order
std::swap(arr[j], arr[j + 1]);
swapped = true;
}
}
// Early termination if no swaps
if (!swapped) break;
}
}
Comparing These 3 Sorting Algo
- None of these are used in production
- Only insertion sort sees real use, for small datasets and/or embedded systems
- Insertion and bubble sort are stable, selection sort is not
- Stable means the original order of equal elements is preserved.
- Selection sort when write operations are expensive
- Bubble sort is mostly just for demos
Built-in Sort Functions
- Run in O(n log n) time
- For prod code, never implement your own sorting algo
- Timsort (Python): hybrid algorithm that adapts to patterns in the data
- Introsort (typical C++ std::sort): combines multiple strategies based on the data’s characteristics
- Both rely on divide and conquer
- divide into small subproblems, solve each piece, and combine
- Most common algos are merge sort and quicksort.
- By dividing the problem, these algo change growth rate from O(n^2) to O(n log n).
- Sorting numbers
#include <algorithm>
#include <vector>
std::vector<int> numbers = {5, 2, 8, 1, 9};
std::sort(numbers.begin(), numbers.end());
// numbers is now [1, 2, 5, 8, 9]
#include <algorithm>
#include <vector>
#include <string>
#include <cctype>
std::vector<std::string> words = {
"zebra", "apple", "Banana", "cherry"
};
// Default sort
std::sort(words.begin(), words.end());
// [Banana, apple, cherry, zebra]
// Sort by length
std::sort(words.begin(), words.end(),
[](const std::string& a, const std::string& b) {
return a.length() < b.length();
}
);
#include <algorithm>
#include <vector>
#include <string>
#include <iostream>
struct Product {
std::string name;
int price;
double rating;
};
std::vector<Product> products = {
{"Laptop", 999, 4.5},
{"Mouse", 25, 4.8},
{"Keyboard", 79, 4.2},
{"Monitor", 299, 4.6}
};
// Sort by price (ascending)
std::sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
return a.price < b.price;
}
);
// Sort by multiple criteria: price, then rating
std::sort(products.begin(), products.end(),
[](const Product& a, const Product& b) {
if (a.price != b.price) {
return a.price < b.price;
}
return a.rating > b.rating;
}
);
- Comparators: take two elements and return a value indicating their relative order.
- if a < b : return -1, place a before b
- if a == b : return 0, keep curr order
- if a > b : return 1, place b before a
- Sort function calls your comparator repeatedly, after many comparisons the sort func determines final order.
- C++ comparators return bool: true if a should come before b, false otherwise. (Use a lambda to keep it concise)
- stable vs unstable sort: stable sort preserves the relative order of equal elements
- Python’s sort is stable
- C++ std::sort is not guaranteed to be stable; std::stable_sort guarantees stability.
Beyond Basic Sorting
- Advanced sorting algos use divide and conquer to get O(n log n).
- The log factor comes from dividing the problem in half
- a 1000-element array only requires ~10 halvings (2^10 = 1024)
- each element participates in roughly log n comparisons, which is why the total is O(n log n)
Merge Sort
- Merge Sort: splits array in half until each piece is one element.
- single element are trivially sorted
- then it merges these sorted pieces back together, maintaining order
- merging two sorted arrays: compare the first element of each and take the smaller one; repeat until both arrays are exhausted
- the merge step takes O(n), since each element moves once.
- recursive splitting creates log n levels of depth.
- Total work O (n log n)!!!
- Worst case = best case = avg case = O(n log n)
- Space: O(n) memory to hold merged results temporarily.
- can matter when sorting very large datasets.
- this extra memory is the tradeoff of merge sort (see the sketch below)
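A merge sort sketch following the steps above; the tmp buffer is the O(n) extra space:
#include <vector>

// merge the sorted halves [left, mid] and (mid, right] using a temp buffer: O(n)
void merge(std::vector<int>& a, int left, int mid, int right) {
    std::vector<int> tmp; // the O(n) extra space
    int i = left, j = mid + 1;
    while (i <= mid && j <= right)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]); // take the smaller front element
    while (i <= mid) tmp.push_back(a[i++]);  // drain leftovers
    while (j <= right) tmp.push_back(a[j++]);
    for (int k = 0; k < (int)tmp.size(); k++) a[left + k] = tmp[k];
}

// split until single elements, then merge back up: log n levels of O(n) work
void mergeSort(std::vector<int>& a, int left, int right) {
    if (left >= right) return; // 0 or 1 element: already sorted
    int mid = left + (right - left) / 2;
    mergeSort(a, left, mid);
    mergeSort(a, mid + 1, right);
    merge(a, left, mid, right);
}
// call as: mergeSort(a, 0, (int)a.size() - 1);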
Quicksort
- Selects a pivot element and partitions the array around it
- Elements smaller than the pivot go left, larger go right
- the pivot ends up in its final sorted position.
- the algorithm recursively sorts the left and right partitions.
- sorts in place: no extra arrays, only the O(log n) recursion stack.
- On average O(n log n), and often faster than merge sort in practice due to better cache locality and fewer memory operations
- However, quick sort’s performance depends on pivot selection
- Worst case O(n^2): a pivot choice that leads to unbalanced partitions (like the first element of an already-sorted array)
- use median-of-three or randomized pivots to avoid the problem (see the sketch below).
- Avg case: O (n log n)
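A quicksort sketch using Lomuto partitioning, with a middle pivot as a cheap guard against the sorted-input worst case:
#include <utility>
#include <vector>

// place the pivot in its final position: smaller elements left, larger right
int partition(std::vector<int>& a, int low, int high) {
    std::swap(a[low + (high - low) / 2], a[high]); // middle element as pivot
    int pivot = a[high];
    int i = low;
    for (int j = low; j < high; j++)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[high]); // pivot lands at index i
    return i;
}

void quickSort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;
    int p = partition(a, low, high);
    quickSort(a, low, p - 1);  // sort left of pivot
    quickSort(a, p + 1, high); // sort right of pivot
}
// call as: quickSort(a, 0, (int)a.size() - 1);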
Math for Algos
- Number Bases: base 10 (10 unique digits), base 2 (binary): two unique digits.
- Ex: 1011 -> 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 11
- Logs: log2(n) is how many times n can be halved before reaching 1; the growth rate behind binary search and divide and conquer
- Sets: collection of distinct elements (order not matter)
- Permutation: specific arrangement of the elements in a set, order is crucial
- counting permutations: n! = n × (n-1) × (n-2) × … × 1
- Number of subsets a set can have: each element can be in it or out of it only two possibilities. If a set has n elements, there will be 2^n subsets. (Includes empty subset and the original set itself)
- Arithmetic Sequence: sequence of numbers where the difference between consecutive terms is constant. Ex: 1, 3, 5, 7, 9
- Sum of arithmetic sequence: (first_element + last_element) × number_of_elements / 2
- Useful for analyzing nested loop complexity.
- Geometric Sequence: the ratio between consecutive terms is constant. Ex: 1, 2, 4, 8, 16
- Sum of geometric sequence: first_element × (1 - ratio^number_of_elements) / (1 - ratio)
- Useful for counting the number of nodes in a perfect binary tree.
- Modular Arithmetic: system of ints that “wrap around” when reaching a certain value.
(a + b) MOD c = ((a MOD c) + (b MOD c)) MOD c
- Useful for problems involving divisibility
- ex: the classic primality test checks whether n is divisible by any integer in the range [2, sqrt(n)]. Why? If n = a × b, the smaller factor is at most sqrt(n), so checking up to sqrt(n) finds every factor pair.
Binary Search
- Array search algo
- Narrow down search range by half each time
- Key Requirement: Array is sorted
- O(log N ) runtime
- Pick the middle element: if the target is smaller, move the right pointer to mid - 1; if larger, move the left pointer to mid + 1
- Use iterative version for O(1) memory.
int binary_search(const std::vector<int>& arr, int target) {
int left = 0;
int right = arr.size() - 1;
while (left <= right) { // <= here because left and right could point to the same element, < would miss it
int mid = left + (right - left) / 2; // use `(right - left) / 2` to prevent `left + right` potential overflow
// found target, return its index
if (arr.at(mid) == target) return mid;
if (arr.at(mid) < target) {
// middle less than target, discard left half by making left search boundary `mid + 1`
left = mid + 1;
} else {
// middle greater than target, discard right half by making right search boundary `mid - 1`
right = mid - 1;
}
}
return -1; // if we get here we didn't hit above return so we didn't find target
}
- Can be used to find the boundary of a boolean condition: the first index where a predicate flips to true
- General template:
// `feasible(mid)` is a problem-specific predicate that is false up to some index, then true
int first_true_index(const std::vector<int>& arr) {
int left = 0;
int right = arr.size() - 1;
int firstTrueIndex = -1;
while (left <= right) {
int mid = left + (right - left) / 2;
if (feasible(mid)) {
firstTrueIndex = mid;
right = mid - 1;
} else {
left = mid + 1;
}
}
return firstTrueIndex;
}
- good for finding the pivot in a rotated sorted array (the point where it was rotated)
- good for searching over a monotonic condition to find the max/min feasible value of a variable.
Two Pointers
- Solves problems involving iterable data structures (like arrays).
- Uses two or more pointers that traverse the structure.
- They don’t literally have to be pointers (indices work too)
- No singular way to implement it, problem dependent.
- General Algorithm:
- Two moving pointers, regardless of directions, moving dependently or independently
- A function that uses the entries referenced by the two pointers, relating them to the solution in some way
- A way of deciding which pointer to move
- A way to process the array when pointers are moved.
Classifications
- Same-direction pointers, ex: removing dupes in place
- Opposite directions (binary search a sub-method here)
- Sliding window vs two pointers: similar, but a sliding window computes something over the entire interval between the two pointers, keeping track of a cumulative result for the window; plain two pointers typically only uses the elements at the pointers themselves.
Non Array Apps
- Can be used on a linked list (anything iterable)
- Detecting a cycle in a linked list (see the sketch below)
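A cycle-detection sketch with fast/slow pointers (Floyd’s algorithm), reusing the Node struct from the linked-list section:
// if there is a cycle, the fast pointer eventually laps the slow one
bool hasCycle(Node* head) {
    Node* slow = head;
    Node* fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;              // moves 1 step
        fast = fast->next->next;        // moves 2 steps
        if (slow == fast) return true;  // pointers met inside a cycle
    }
    return false;                       // fast hit the end: no cycle
}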
Why use two Pointers
- More efficient than the naive solution
- Converts O(n^2) to O(n) by passing through the array once instead of using two nested loops (see the sketch below).
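A two-pointer two-sum sketch on a sorted array, as described in the sorting section:
#include <utility>
#include <vector>

// opposite-direction pointers: find indices of two values summing to target in O(n)
std::pair<int, int> twoSumSorted(const std::vector<int>& a, int target) {
    int left = 0, right = (int)a.size() - 1;
    while (left < right) {
        int sum = a[left] + a[right];
        if (sum == target) return {left, right};
        if (sum < target) left++; // need a bigger sum: move the left pointer right
        else right--;             // need a smaller sum: move the right pointer left
    }
    return {-1, -1};              // no pair found
}
// twoSumSorted({1, 3, 4, 6, 9}, 10) returns {0, 4} (1 + 9)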