Why Data Structures and Algorithms

The Components of a Data Structure (From Wikipedia)

  1. A collection of data items
  2. The relationships among them
  3. The operations that can be applied to the data structure

Common Operations on Data Structures (From Geeks4Geeks)

  1. Access: Get a data item in the given data structure.
  2. Search: Find a particular data item in the given data structure.
  3. Insert: Add a data item to the given data structure.
  4. Delete: Remove a data item from the given data structure.
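
As a concrete, minimal sketch of these four operations, here they are on Python's built-in list (the values below are purely illustrative):

```python
# The four common operations on a Python list (an array-backed sequence).
items = [3, 1, 4, 1, 5]

third = items[2]       # Access: get the item at index 2 (O(1)) -> 4
pos = items.index(5)   # Search: find the position of the value 5 (O(n)) -> 4
items.insert(2, 9)     # Insert: add 9 at index 2 -> [3, 1, 9, 4, 1, 5]
items.remove(1)        # Delete: remove the first 1 -> [3, 9, 4, 1, 5]

print(third, pos, items)
```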

Why Different Data Structures

Different application scenarios invoke the common operations above in different proportions, and different data structures have different efficiencies for each of these operations. Different data structures are therefore needed for different application scenarios, so that the program as a whole runs more efficiently.

Major Categories of Data Structures

Based on https://www.webpageacademy.com/cms/college-it-subjects
  • Primitive Data Structure: Primitive data structures are predefined data types supported by essentially every programming language. Languages such as Java and C# inherit these basic types largely from C and C++.
    • Integer, Float, Char, Boolean, Pointer
  • Non-primitive Data Structure: Not predefined by the programming language; they are built with the help of primitive data types.
    • Linear: The data items are arranged sequentially, either statically (Array) or dynamically (Linked list, Stack, Queue).
    • Nonlinear: The data items are not arranged in a sequential order.
      • Hash table
      • Tree (see also Wikipedia on trees: one tree node can have more than 2 child nodes)
      • Graph (designed with a different intent than the other data structures; mainly meant to represent a network)
From https://www.hellocodeclub.com/when-to-use-which-data-structure-top-6-data-structures/
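
As a small illustration of how a non-primitive structure is built from primitive types, here is a hypothetical general tree node in Python; as noted above, one tree node can hold more than 2 children:

```python
class TreeNode:
    """A general (n-ary) tree node: one node may have more than 2 children."""
    def __init__(self, value):
        self.value = value     # a primitive data item (int, float, str, ...)
        self.children = []     # references to any number of child nodes

    def add_child(self, value):
        child = TreeNode(value)
        self.children.append(child)
        return child

root = TreeNode("root")
for name in ("a", "b", "c"):       # three children: more than a binary tree allows
    root.add_child(name)
print([child.value for child in root.children])   # ['a', 'b', 'c']
```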

Metrics for Evaluating Algorithm Performance

  1. Space Complexity
  2. Time Complexity: We usually focus on Big-O (the asymptotic upper bound), Big-Ω (the asymptotic lower bound), and Big-Θ (when the lower and upper bounds are the same). From fastest to slowest, the order is $O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(2^n) < O(n!)$.
From https://github.com/TSiege/Tech-Interview-Cheat-Sheet#algorithm-basics
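
To make the gap between these classes concrete: for $n = 10^6$, $\log_2 n \approx 20$, $n \log_2 n \approx 2 \times 10^7$, and $n^2 = 10^{12}$, while $2^n$ and $n!$ are astronomically large, so an $O(n \log n)$ algorithm is practical at this input size but an $O(n^2)$ or slower one usually is not.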

Efficiency of Different Data Structures for the Four Common Operations

From the complexity chart at the link below, we can see that the array, a static linear structure, is convenient for access but inconvenient for search, insert, and delete. All of the dynamic linear data structures (queue, stack, and linked list) are convenient for insert and delete but inconvenient for access and search. Among the nonlinear structures, the tree is designed for fast search, and the hash table is fast for search, insert, and delete, while the graph serves a purpose different from the other two nonlinear data structures (hash table and tree) and from the linear data structures.

From https://www.bigocheatsheet.com/
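
As a rough sketch of these trade-offs in Python (treating the built-in list as a dynamic array, collections.deque as a stand-in for a linked list, and dict as a hash table; the complexities in the comments are the usual textbook ones):

```python
from collections import deque

arr = list(range(10))          # list as a dynamic array

x = arr[7]                     # access by index: O(1)
found = 7 in arr               # search by value: O(n)
arr.insert(0, -1)              # insert at the front shifts everything: O(n)

dq = deque(arr)                # deque as a linked-list-like structure
dq.appendleft(-2)              # insert at an end: O(1)
dq.pop()                       # delete at an end: O(1)
# ...but there is no O(1) access by index into the middle, and search is O(n).

table = {k: k * k for k in arr}   # dict as a hash table
hit = 9 in table               # search: O(1) on average
table[11] = 121                # insert: O(1) on average
del table[0]                   # delete: O(1) on average

print(x, found, hit, len(arr), len(dq), len(table))
```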

Data Structures That Use Pointers
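
This heading presumably refers to structures such as linked lists, trees, and graphs, whose nodes are connected through pointers. Below is a minimal singly linked list sketch in Python, where object references play the role that pointers play in C; the helper names are only illustrative:

```python
class ListNode:
    """A singly linked list node; `next` is the 'pointer' to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node    # in C this would be a `struct ListNode *`

def build_linked_list(values):
    """Build a linked list from an iterable and return its head node."""
    head = None
    for v in reversed(list(values)):
        head = ListNode(v, head)
    return head

# Traverse the list by following the pointer from node to node.
node = build_linked_list([1, 2, 3])
while node is not None:
    print(node.value)            # 1, then 2, then 3
    node = node.next
```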

Major Categories of Algorithms

Classifying algorithms by purpose (Princeton CS algorithms course, https://algs4.cs.princeton.edu/home/):

  • Sorting algorithm
  • Searching/Traversal algorithm
  • Graph algorithm

Classifying algorithms by pattern (from https://cs.lmu.edu/~ray/notes/algpatterns/):

  • Brute Force: Not really a “pattern”. Enumerate all possible solutions, unintelligently, and try them all until you find a solution.
  • Divide and Conquer: Breaking down a problem into multiple independent subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. (A merge sort sketch follows this list.)
    • Mergesort, Quicksort, Median, Karatsuba’s Integer Multiplication, Matrix Multiplication, FFT, Nearest Neighbors
  • Decrease and Conquer: A variant of divide and conquer where the problem is broken down into a single subproblem. (A binary search sketch follows this list.)
    • Binary search, Factorial, Selection Sort, Largest Number, Greatest Common Divisor, Topological Sort, Insertion or lookup in a binary search tree, Computing the median
  • The Greedy Method: Solving a problem by doing the “best looking” thing at each step. (May miss a solution, or may miss the optimal one. But in some cases where it is known to work, it is a great approach.) See the coin-changing sketch after this list, which contrasts greedy with dynamic programming.
    • Minimum Spanning Trees, Naive coin changing, Huffman Compression, Dijkstra’s Shortest Path
  • Dynamic Programming: Solving an optimization problem by breaking it down into multiple overlapping (not independent) subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem. The idea is to cache the results of the overlapping subproblems. Can be done bottom-up (table construction) or top-down (recursion with memoization). See the coin-changing sketch after this list.
    • Interval scheduling, Longest common subsequence, Coin changing, Levenshtein distance, Matrix-chain multiplication, Integer knapsack, Shortest path, Word wrap, Traveling salesperson
  • Backtracking: A method for systematically generating possible solutions to a problem in which you sometimes have to back up when realizing your partially generated candidate can’t possibly be extended to a real solution. (An eight queens sketch follows this list.)
    • Map coloring, Eight queens, Knight’s Tour, Maze solving, Regular expression matching, Generic pathfinding
  • Branch and Bound: Backtracking applied to optimization problems.
    • Satisfiability, Traveling salesperson, Integer programming, Nearest neighbor search, Nonlinear programming
  • Hill Climbing: Solving (or finding an approximate solution to) an optimization problem by generating candidate solutions that are (hopefully) improvements over the previous candidate. 
    • Basic Hill Climbing chooses the “best” next step, Genetic algorithms choose a genetic mutation of the previous candidate. Simulated Annealing makes the next choice based on a particular formula used in metallurgy.
  • Particle Swarm Optimization: Solving an optimization problem with a bunch of decentralized particles all searching for a solution with something that looks like it has a collective organization (e.g. ant colonies, bird flocks, animal herds, etc.)
    • Neural network training, Finite element updating
  • Las Vegas: A randomized algorithm that always produces the correct answer but makes no guarantees on how long it will run or how much space it will need (in the worst case).
    • Used as a defense against algorithm complexity attacks.
    • Randomized Quicksort, Finding a value in a collection
  • Monte Carlo: A randomized algorithm that has time and space guarantees but has a small probability of giving the wrong answer. The probability of error can be reduced by running the algorithm longer. (A π-approximation sketch follows this list.)
    • Used when all known deterministic algorithms for a problem are too slow, or when estimation is an inherent part of the problem.
    • Miller-Rabin primality test, Approximating π (by throwing darts), Approximating integrals, Game playing
  • Transform and Conquer: Solving a problem by reducing, or transforming, it to a similar (usually easier) problem whose solution implies a solution to the original.
  • Preprocessing: Playing tricks with the input (input enhancement) or building up a cache (prestructuring) prior to doing the official run.
    • Table of counts for counting sort, Boyer-Moore pattern matching, Storing often used data in a hashtable, Store often-used data in a search tree (B-tree, BST, Red-black, …), Heapify, prior to heapsort.
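
A minimal merge sort sketch in Python, illustrating Divide and Conquer (split into two independent halves, solve each recursively, combine the results):

```python
def merge_sort(items):
    """Sort a list by splitting it in half, sorting each half recursively,
    and merging the two sorted halves back together."""
    if len(items) <= 1:                 # base case: already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # solve the two independent subproblems
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0             # combine: merge the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```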
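
A binary search sketch in Python, illustrating Decrease and Conquer (each step discards half of the input and leaves a single smaller subproblem):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1      # discard the left half: one smaller subproblem
        else:
            hi = mid - 1      # discard the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1
```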
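
A coin-changing sketch in Python contrasting the Greedy Method with Dynamic Programming; with the (assumed) coin set {1, 3, 4}, the naive greedy choice misses the optimum, while the bottom-up DP, which caches the results of overlapping subproblems, finds it:

```python
def greedy_coin_change(coins, amount):
    """Naive greedy: repeatedly take the largest coin that still fits.
    Fast, but not always optimal (and may fail to reach the exact amount)."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def dp_coin_change(coins, amount):
    """Bottom-up DP: best[a] = minimum number of coins needed to make a."""
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a:
                best[a] = min(best[a], best[a - coin] + 1)
    return best[amount] if best[amount] != float("inf") else None

coins = [1, 3, 4]
print(greedy_coin_change(coins, 6))   # 3 coins (4 + 1 + 1): greedy misses the optimum
print(dp_coin_change(coins, 6))       # 2 coins (3 + 3): DP finds it
```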
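
A backtracking sketch in Python for the eight queens puzzle: partial placements are extended row by row, and the search backs up as soon as a partial candidate cannot be extended to a full solution:

```python
def solve_queens(n, placed=()):
    """Yield all ways to place n queens so that none attack each other.
    placed[r] is the column of the queen already placed in row r."""
    row = len(placed)
    if row == n:                      # all rows filled: a complete solution
        yield placed
        return
    for col in range(n):
        # Conflict check: same column or same diagonal as an earlier queen?
        if any(col == c or abs(col - c) == row - r
               for r, c in enumerate(placed)):
            continue                  # prune this candidate, try the next column
        # Extend the candidate; if it dead-ends, we back up to this loop.
        yield from solve_queens(n, placed + (col,))

solutions = list(solve_queens(8))
print(len(solutions))                 # 92 solutions for the eight queens puzzle
```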
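
A Monte Carlo sketch in Python that approximates π by "throwing darts" at the unit square; the answer is only probabilistic, but throwing more darts reduces the expected error:

```python
import random

def estimate_pi(num_darts):
    """Throw darts uniformly into the unit square and count how many land
    inside the quarter circle of radius 1; that fraction approximates pi/4."""
    inside = 0
    for _ in range(num_darts):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_darts

print(estimate_pi(10_000))      # roughly 3.14, with some random error
print(estimate_pi(1_000_000))   # usually closer to 3.14159
```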
