Why Data Structures and Algorithms

The Components of a Data Structure (From Wikipedia)

  1. A collection of data items
  2. The relationships among them
  3. The operations that can be applied to the data structure

Common Operations on Data Structures (From Geeks4Geeks)

  1. Access: Get a data item in the given data structure.
  2. Search: Find a particular data item in the given data structure.
  3. Insert: Add a data item to the given data structure.
  4. Delete: Remove a data item from the given data structure.
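
As a concrete, minimal sketch of these four operations, here they are on Python's built-in list (the values below are purely illustrative):

```python
# The four common operations on a Python list (an array-backed sequence).
items = [3, 1, 4, 1, 5]

third = items[2]       # Access: get the item at index 2 (O(1)) -> 4
pos = items.index(5)   # Search: find the position of the value 5 (O(n)) -> 4
items.insert(2, 9)     # Insert: add 9 at index 2 -> [3, 1, 9, 4, 1, 5]
items.remove(1)        # Delete: remove the first 1 -> [3, 9, 4, 1, 5]

print(third, pos, items)
```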

Why Different Data Structures

Different application scenarios invoke the common operations above in different proportions, and different data structures have different efficiencies for each of these operations. Different data structures are therefore needed for different application scenarios, so that the program as a whole runs more efficiently.

Major Categories of Data Structures

Based on https://www.webpageacademy.com/cms/college-it-subjects
  • Primitive Data Structure: Primitive data structures are predefined data types supported by essentially every programming language. Languages such as Java and C# inherit these basic types largely from C and C++.
    • Integer, Float, Char, Boolean, Pointer
  • Non-primitive Data Structure: Not predefined by the programming language; they are built with the help of primitive data types.
    • Linear: The data items are arranged sequentially, either statically (Array) or dynamically (Linked list, Stack, Queue).
    • Nonlinear: The data items are not arranged in a sequential order.
      • Hash table
      • Tree (see also Wikipedia on trees: one tree node can have more than 2 child nodes)
      • Graph (designed with a different intent than the other data structures; mainly meant to represent a network)
From https://www.hellocodeclub.com/when-to-use-which-data-structure-top-6-data-structures/
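
As a small illustration of how a non-primitive structure is built from primitive types, here is a hypothetical general tree node in Python; as noted above, one tree node can hold more than 2 children:

```python
class TreeNode:
    """A general (n-ary) tree node: one node may have more than 2 children."""
    def __init__(self, value):
        self.value = value     # a primitive data item (int, float, str, ...)
        self.children = []     # references to any number of child nodes

    def add_child(self, value):
        child = TreeNode(value)
        self.children.append(child)
        return child

root = TreeNode("root")
for name in ("a", "b", "c"):       # three children: more than a binary tree allows
    root.add_child(name)
print([child.value for child in root.children])   # ['a', 'b', 'c']
```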

Metrics for Evaluating Algorithm Performance

  1. Space Complexity
  2. Time Complexity: We usually focus on Big-O (the asymptotic upper bound), Big-Ω (the asymptotic lower bound), and Big-Θ (when the lower and upper bounds are the same). From fastest to slowest, the order is $O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(2^n) < O(n!)$.
From https://github.com/TSiege/Tech-Interview-Cheat-Sheet#algorithm-basics
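
To make the gap between these classes concrete: for $n = 10^6$, $\log_2 n \approx 20$, $n \log_2 n \approx 2 \times 10^7$, and $n^2 = 10^{12}$, while $2^n$ and $n!$ are astronomically large, so an $O(n \log n)$ algorithm is practical at this input size but an $O(n^2)$ or slower one usually is not.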

Efficiency of Different Data Structures for the Four Common Operations

From the complexity chart at the link below, we can see that the array, a static linear structure, is convenient for access but inconvenient for search, insert, and delete. All of the dynamic linear data structures (queue, stack, and linked list) are convenient for insert and delete but inconvenient for access and search. Among the nonlinear structures, the tree is designed for fast search, and the hash table is fast for search, insert, and delete, while the graph serves a purpose different from the other two nonlinear data structures (hash table and tree) and from the linear data structures.

From https://www.bigocheatsheet.com/
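
As a rough sketch of these trade-offs in Python (treating the built-in list as a dynamic array, collections.deque as a stand-in for a linked list, and dict as a hash table; the complexities in the comments are the usual textbook ones):

```python
from collections import deque

arr = list(range(10))          # list as a dynamic array

x = arr[7]                     # access by index: O(1)
found = 7 in arr               # search by value: O(n)
arr.insert(0, -1)              # insert at the front shifts everything: O(n)

dq = deque(arr)                # deque as a linked-list-like structure
dq.appendleft(-2)              # insert at an end: O(1)
dq.pop()                       # delete at an end: O(1)
# ...but there is no O(1) access by index into the middle, and search is O(n).

table = {k: k * k for k in arr}   # dict as a hash table
hit = 9 in table               # search: O(1) on average
table[11] = 121                # insert: O(1) on average
del table[0]                   # delete: O(1) on average

print(x, found, hit, len(arr), len(dq), len(table))
```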

Data Structures That Use Pointers
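
This heading presumably refers to structures such as linked lists, trees, and graphs, whose nodes are connected through pointers. Below is a minimal singly linked list sketch in Python, where object references play the role that pointers play in C; the helper names are only illustrative:

```python
class ListNode:
    """A singly linked list node; `next` is the 'pointer' to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node    # in C this would be a `struct ListNode *`

def build_linked_list(values):
    """Build a linked list from an iterable and return its head node."""
    head = None
    for v in reversed(list(values)):
        head = ListNode(v, head)
    return head

# Traverse the list by following the pointer from node to node.
node = build_linked_list([1, 2, 3])
while node is not None:
    print(node.value)            # 1, then 2, then 3
    node = node.next
```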

Major Categories of Algorithms

Classifying algorithms by purpose (Princeton CS algorithms course, https://algs4.cs.princeton.edu/home/):

  • Sorting algorithm
  • Searching/Traversal algorithm
  • Graph algorithm

Classifying algorithms by pattern (from https://cs.lmu.edu/~ray/notes/algpatterns/):

  • Brute Force: Not really a “pattern”. Enumerate all possible solutions, unintelligently, and try them all until you find a solution.
  • Divide and Conquer: Breaking down a problem into multiple independent subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. (A merge sort sketch follows this list.)
    • Mergesort, Quicksort, Median, Karatsuba’s Integer Multiplication, Matrix Multiplication, FFT, Nearest Neighbors
  • Decrease and Conquer: A variant of divide and conquer where the problem is broken down into a single subproblem. (A binary search sketch follows this list.)
    • Binary search, Factorial, Selection Sort, Largest Number, Greatest Common Divisor, Topological Sort, Insertion or lookup in a binary search tree, Computing the median
  • The Greedy Method: Solving a problem by doing the “best looking” thing at each step. (May miss a solution, or may miss the optimal one. But in some cases where it is known to work, it is a great approach.) See the coin-changing sketch after this list, which contrasts greedy with dynamic programming.
    • Minimum Spanning Trees, Naive coin changing, Huffman Compression, Dijkstra’s Shortest Path
  • Dynamic Programming: Solving an optimization problem by breaking it down into multiple overlapping (not independent) subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. The idea is to cache the results of the overlapping subproblems. Can be done bottom-up (table construction) or top-down (recursion with memoization). See the coin-changing sketch after this list.
    • Interval scheduling, Longest common subsequence, Coin changing, Levenshtein distance, Matrix-chain multiplication, Integer knapsack, Shortest path, Word wrap, Traveling salesperson
  • Backtracking: A method for systematically generating possible solutions to a problem in which you sometimes have to back up when realizing your partially generated candidate can’t possibly be extended to a real solution. (An eight queens sketch follows this list.)
    • Map coloring, Eight queens, Knight’s Tour, Maze solving, Regular expression matching, Generic pathfinding
  • Branch and Bound: Backtracking applied to optimization problems.
    • Satisfiability, Traveling salesperson, Integer programming, Nearest neighbor search, Nonlinear programming
  • Hill Climbing: Solving (or finding an approximate solution to) an optimization problem by generating candidate solutions that are (hopefully) improvements over the previous candidate. 
    • Basic Hill Climbing chooses the “best” next step, Genetic algorithms choose a genetic mutation of the previous candidate. Simulated Annealing makes the next choice based on a particular formula used in metallurgy.
  • Particle Swarm Optimization: Solving an optimization problem with a bunch of decentralized particles all searching for a solution with something that looks like it has a collective organization (e.g. ant colonies, bird flocks, animal herds, etc.)
    • Neural network training, Finite element updating
  • Las Vegas: A randomized algorithm that always produces the correct answer but makes no guarantees on how long it will run or how much space it will need (in the worst case).
    • Used as a defense against algorithm complexity attacks.
    • Randomized Quicksort, Finding a value in a collection
  • Monte Carlo: A randomized algorithm that has time and space guarantees but has a small probability of giving the wrong answer. The probability of error can be reduced by running the algorithm longer. (A π-approximation sketch follows this list.)
    • Used when all known deterministic algorithms for a problem are too slow, or when estimation is an inherent part of the problem.
    • Miller-Rabin primality test, Approximating π (by throwing darts), Approximating integrals, Game playing
  • Transform and Conquer: Solving a problem by reducing, or transforming, it to a similar (usually easier) problem whose solution implies a solution to the original.
  • Preprocessing: Playing tricks with the input (input enhancement) or building up a cache (prestructuring) prior to doing the official run.
    • Table of counts for counting sort, Boyer-Moore pattern matching, Storing often used data in a hashtable, Store often-used data in a search tree (B-tree, BST, Red-black, …), Heapify, prior to heapsort.
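
A minimal merge sort sketch in Python, illustrating Divide and Conquer (split into two independent halves, solve each recursively, combine the results):

```python
def merge_sort(items):
    """Sort a list by splitting it in half, sorting each half recursively,
    and merging the two sorted halves back together."""
    if len(items) <= 1:                 # base case: already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # solve the two independent subproblems
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0             # combine: merge the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```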
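
A binary search sketch in Python, illustrating Decrease and Conquer (each step discards half of the input and leaves a single smaller subproblem):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1      # discard the left half: one smaller subproblem
        else:
            hi = mid - 1      # discard the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1
```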
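
A coin-changing sketch in Python contrasting the Greedy Method with Dynamic Programming; with the (assumed) coin set {1, 3, 4}, the naive greedy choice misses the optimum, while the bottom-up DP, which caches the results of overlapping subproblems, finds it:

```python
def greedy_coin_change(coins, amount):
    """Naive greedy: repeatedly take the largest coin that still fits.
    Fast, but not always optimal (and may fail to reach the exact amount)."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def dp_coin_change(coins, amount):
    """Bottom-up DP: best[a] = minimum number of coins needed to make a."""
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a:
                best[a] = min(best[a], best[a - coin] + 1)
    return best[amount] if best[amount] != float("inf") else None

coins = [1, 3, 4]
print(greedy_coin_change(coins, 6))   # 3 coins (4 + 1 + 1): greedy misses the optimum
print(dp_coin_change(coins, 6))       # 2 coins (3 + 3): DP finds it
```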
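
A backtracking sketch in Python for the eight queens puzzle: partial placements are extended row by row, and the search backs up as soon as a partial candidate cannot be extended to a full solution:

```python
def solve_queens(n, placed=()):
    """Yield all ways to place n queens so that none attack each other.
    placed[r] is the column of the queen already placed in row r."""
    row = len(placed)
    if row == n:                      # all rows filled: a complete solution
        yield placed
        return
    for col in range(n):
        # Conflict check: same column or same diagonal as an earlier queen?
        if any(col == c or abs(col - c) == row - r
               for r, c in enumerate(placed)):
            continue                  # prune this candidate, try the next column
        # Extend the candidate; if it dead-ends, we back up to this loop.
        yield from solve_queens(n, placed + (col,))

solutions = list(solve_queens(8))
print(len(solutions))                 # 92 solutions for the eight queens puzzle
```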
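
A Monte Carlo sketch in Python that approximates π by "throwing darts" at the unit square; the answer is only probabilistic, but throwing more darts reduces the expected error:

```python
import random

def estimate_pi(num_darts):
    """Throw darts uniformly into the unit square and count how many land
    inside the quarter circle of radius 1; that fraction approximates pi/4."""
    inside = 0
    for _ in range(num_darts):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_darts

print(estimate_pi(10_000))      # roughly 3.14, with some random error
print(estimate_pi(1_000_000))   # usually closer to 3.14159
```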
