## why数据结构与算法

#### 一个数据结构的组成(From Wikipedia)

- A collection of data items
- The relationships among them
- The operations that can be applied to the data structure

#### 数据结构的Common operations(From Geeks4Geeks)

- Access: Get a data item in the given data structure.
- Search: Find a particular data item in the given data structure.
- Insert: Add a data item in the given data structure.
- Delete: Delete a data item in the given data structure.

#### Why不同的数据结构

根据不同的应用场景，对于以上common operations调用的数量不一样，而不同的数据结构对于不同的common operations效率不一样，因此需要不同的数据结构来应对不同的应用场景，使得整个程序的效率更高。

#### 数据结构的几大类

- Primitive Data Structure: Primitive data structures are predefined data types. They are supported by all programming languages. All the programming languages like java, c#, or any object-oriented programming language are all inherited from C and C++.
- Integer, Float, Char, Boolean, Pointer

- Non-primitive Data Structure: Not predefined in programming languages. They can be implemented with the help of primitive data types
- Linear: The data items are arranged in a
**sequential manner**.- Static: Array
- Dynamic: Linked List, Stack, Queue

- Nonlinear: The data items are arranged in a
**random manner**,也因此使得- Hash table
- Tree(Also check Wikipedia Trees: One tree node can have more than 2 child nodes)
**Graph**(与其他数据结构的设计初衷都不同，主要是为了表达network)

- Linear: The data items are arranged in a

#### 算法快慢的评价指标

- Space Complexity
- Time Complexity: 通常我们主要关注
**Big-O**(Asymptotic上界), Big-Ω(Asymptotic下界)和Big-Θ(When the lower and upper bounds are the same)。从最快到最慢的次序是[latexpage] $O(1)< O(log(n)) < O(n) < O(nlog(n)) < O(n^2) < O(2^n) < O(n!)$

#### 不同数据结构对4种common operations的效率

从下图中可以看出： static linear的Array是方便access，不方便search, insert和delete。所有的linear dynamic的数据结构(Queue, stack和linked-list)都是方便insert和delete，但是不方便access和search。Nonlinear的Tree是为了search快，Hash table是search, delete和insert都快，而Graph是有别于其他的2个nonlinear data structures(hash table和tree)以及其他的linear data structures

#### 用到指针的数据结构

## 算法的几大类

基于用途对算法分类(Princeton CS算法课程https://algs4.cs.princeton.edu/home/)：

- Sorting algorithm
- Searching/Traversal algorithm
- Graph algorithm

基于pattern对算法分类(https://cs.lmu.edu/~ray/notes/algpatterns/)

**Brute Force**: Not really a “pattern”. Enumerate all possible solutions, unintelligently, and try them all until you find a solution.**Divide and Conquer**: Breaking down a problem into multiple**independent****subproblems**(mutiple), solving the subproblems (recursively), and combining those solutions into a solution for the original problem.- Mergesort, Quicksort, Median, Karatsuba’s Integer Multiplication, Matrix Multiplication, FFT, Nearest Neighbors

- Decrease and Conquer: A variant of divide and conquer where the problem is broken down into one subproblem.
- Binary search, Factorial, Selection Sort, Largest Number, Greatest Common Divisor, Topological Sort, Insertion or lookup in a binary search tree, Computing the median

- The Greedy Method: Solving a problem by doing the “best looking” thing at each step. (May miss a solution, or may miss the optimal one. But in some cases where it is known to work, it is a great approach.)
- Minimum Spanning Trees, Naive coin changing, Huffman Compression, Dijkstra’s Shortest Path

- Dynamic Programming: Solving an optimization problem by breaking down a problem into multiple overlapping(not independent) subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. The idea is to cache the results of overlapping subproblems. Can be done bottom-up (table construction) or top-down (recursive with memorization).
- Interval scheduling, Longest common subsequence, Coin changing, Levenshtein distance, Matrix-chain multiplication, Integer knapsack, Shortest path, Word wrap, Traveling salesperson

- Backtracking: A method for systematically generating possible solutions to a problem in which you sometimes have to back up when realizing your partially generated candidate can’t possibly be extended to a real solution.
- Map coloring, Eight queens, Knight’s Tour, Maze solving, Regular expression matching, Generic pathfinding

- Branch and Bound: Backtracking applied to optimization problems.
- Satisfiability, Traveling salesperson, Integer programming, Nearest neighbor search, Nonlinear programming

- Hill Climbing: Solving (or finding an approximate solution to) an optimization problem by generating candidate solutions that are (hopefully) improvements over the previous candidate.
- Basic Hill Climbing chooses the “best” next step, Genetic algorithms choose a genetic mutation of the previous candidate. Simulated Annealing makes the next choice based on a particular formula used in metallurgy.

- Particle Swarm Optimization: Solving an optimization problem with a bunch of decentralized particles all searching for a solution with something that looks like it has a collective organization (e.g. ant colonies, bird flocks, animal herds, etc.)
- Neural network training, Finite element updating

**Las Vegas**: A randomized algorithm that always produces the correct answer but makes no guarantees on how long it will run or how much space it will need (in the worst case).- Used as a defense against algorithm complexity attacks.
- Randomized Quicksort, Finding a value in a collection

**Monte Carlo**: A randomized algorithm that has time and space guarantees but has a small probability of giving the wrong answer. The probability of error can be reduced by running the algorithm longer.- Used when all known deterministic algorithms for a problem are too slow, or when estimation is an inherent part of the problem.
- Miller-Rabin primality test, Approximating π (by throwing darts), Approximating integrals, Game playing

: Solving a problem by reducing, or transforming, it to a similar (usually easier) problem whose solution implies a solution to the original.**Transform and Conquer****Preprocessing**: Playing tricks with the input (input enhancement) or building up a cache (prestructuring) prior to doing the official run.- Table of counts for counting sort, Boyer-Moore pattern matching, Storing often used data in a hashtable, Store often-used data in a search tree (B-tree, BST, Red-black, …), Heapify, prior to heapsort.

## 1 Comment