Paper Reading | Searching for MobileNet V3

Posted on 2020-02-10

论文题目：Seaching for MobileNet v3
论文作者：
相关代码：

摘要

本文展示了基于网络搜索技术和新的网络结构设计的下一代的MobileNet（轻量级网络）。MobileNet v3结合了硬件网络结构搜索（NAS）和NetAdapt算法，并且通过新的网络结构来改进。本文开始介绍了如何自动进行网络搜索，然后通过调整互补的方法来整体上提升网络的性能。通过这个方法，文章介绍了基于资源多少情况下的两种新的轻量级网络：MobileNet v3-Large 和MobileNet v3-Small。对于语义分割（或者其他密集像素预测）的task而言，我们提出了一个新的有效的分割decoder，Lite Reduced Atrous Spatial Pyramid Pooling(LR-ASPP) 轻量ASPP。MobileNetv3 -large 比 MobileNetv2 在imageNet上的准确度提高了3.2%，速度（计算复杂度）提高了 20%；MobileNetv3 -small 比 MobileNetv2 在imageNet上的准确度提高了6.6% ；MobileNetv3 -large 比 MobileNetv2 在COCO上的检测速率快了25%，准确率二者相当；MobileNetv3 -large LR-ASPP比 MobileNetv2 R-ASPP 在 Cityspace 分割人物快了34%，准确率两者相当。

Algorithm | LinkedList

Posted on 2020-01-17 In 算法

啊，好像不怎么碰到Linked List使用的情况啊。之前上一个文章里总结的都是 Array的问题，其实在python里也叫list，所以就比较容易混淆。array list 和linked list的区别就在于，array list有直接的index 可以在 O(1) 的时间复杂度下找到想要找的数值。但是linked list就需要从头开始找。但是linked list比较方便于插入和删除，只需要找到对应的点改变前后两个node的.next属性就可以了，但是array list就很麻烦，需要对每一个list中的值进行移动。

NO.2 Add Two Numbers

题目

You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order and each of their nodes contain a single digit. Add the two numbers and return it as a linked list.

You may assume the two numbers do not contain any leading zero, except the number 0 itself.

Example:
1
2
3
Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8
Explanation: 342 + 465 = 807.

解题思路

这道题其实就是简单的运用linked list的方面，对于我自己，需要注意的点是：

当两个数相加的和大于10 的时候需要进位，熟练运用 % 和 / 运算，
需要加dummyhead

第一点是我可以想到的，第二点我查了一些帖子，都是在说为什么需要dummy head。是因为如果是doubly linked list时，有了dummyhead就不需要多去判定原来的head node是不是为空。【这类问题还没遇到】另一个原因就是，list整体需要变动，就是我们这道题，定义的两位之和需要不断向后移动来更新.next的内容

Solution

def addTwoNumbers(self, l1, l2):
        """
        :type l1: ListNode
        :type l2: ListNode
        :rtype: ListNode
        """
        addto = 0
        l = ListNode(0)
        dummyhead = l
        
        while l1 and l2:
            l.next = ListNode((addto + l1.val + l2.val) % 10)
            addto = (addto + l1.val + l2.val) / 10
            l = l.next
            l1 = l1.next
            l2 = l2.next
            
        while l1:
            l.next = ListNode((addto + l1.val) % 10)
            addto = (addto + l1.val) / 10
            l = l.next
            l1 = l1.next
            
        while l2:
            l.next = ListNode((addto + l2.val) % 10)
            addto = (addto + l2.val) / 10
            l = l.next
            l2 = l2.next
        print(addto)   
        if addto:
            l.next = ListNode(addto)
            
        return dummyhead.next

应该可以再合并一些情况，但是思路就是这样。

NO.19 Remove Nth Node from the end of the list.

>

解题思路

简单的来说可以先把head中的所有node遍历一遍存在一个list中，然后再遍历一遍根据for循环来找到倒数第n个值，也就是要判定 i == len(vals) - n，把其他的值都存在新的listnode中。但是这部分要遍历两遍。

可以一遍吗？

Solution

# 遍历两遍
def removeNthFromEnd(self, head, n):
        """
        :type head: ListNode
        :type n: int
        :rtype: ListNode
        """
        vals = []
        while head:
            vals.append(head.val)
            head = head.next
        
        newhead = ListNode(0)
        dummy = newhead
        for i in range(len(vals)):
            if i == len(vals) - n:
                continue
            newhead.next = ListNode(vals[i])
            newhead = newhead.next
            
        return dummy.next
# 遍历一遍
    def removeNthFromEnd(self, head, n):
        """
        :type head: ListNode
        :type n: int
        :rtype: ListNode
        """
        dummy = ListNode(0)
        dummy.next = head
        
        new1 = dummy
        new2 = dummy
        
        while n>0:
            new1 = new1.next
            n = n - 1
            
        while new1.next:
            new1 = new1.next
            new2 = new2.next
        new2.next = new2.next.next
        
        return dummy.next

NO.24

Solution

# Change numbers
def swapPairs(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        if not head or not head.next: return head
        
        dummy = ListNode(0)
        new = dummy        
        while head and head.next:
            val1 = head.val
            head = head.next
            val2 = head.val
            head = head.next
            
            new.next = ListNode(val2)
            new = new.next
            new.next = ListNode(val1)
            new = new.next
        
        new.next = head
        return dummy.next
# Change nodes.
def swapPairs(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        if not head or not head.next: return head
        
        dummy = ListNode(0)
        dummy.next = head
        
        new = dummy
        
        while new.next and new.next.next:
            first = new.next
            second = new.next.next
            new.next = second
            first.next = second.next
            second.next = first
            
            new = new.next.next
 
        return dummy.next

NO.

Estimation and Detection | Properties of estimators

Posted on 2020-01-17 In 课堂笔记

Intro

这部分主要讲的是对于已知分布的随机变量的分布参数的估计，这里对参数的估计可以具备三种不同的性质：分别置无偏，有效，一致。这些性质可以帮我们更好的对参数进行估计。另外，在我们发现有多个

Properties of estimators

在现实生活中，我们想到得知一个整体的某个特征 $k$，也就是 $k$ 值的大小。但有时候由于整体太过庞大，我们无法对整体中的每一个个体的该特征 $k$ 都进行统计计算，这时候我们只能抽样整体中的一小部分的个体，由他们组成我们要估计的样本空间。那对于这些小部分个体的 $k$ 我们需要构建估计量，来估计 $k$ 的值。

这时候，对于估计量来说，有三个特征：无偏，有效，一致

Unbiased (无偏)

假设现在我们的样本空间${X_1, X_2,…,X_N}$ ，无偏估计是指当我们在对样本的 $\theta$ 特征进行估计的时候，估计量 $T{(\theta)}$ 的期望等于 $\theta$，即 $E{T(\theta)} = \theta$. 若该估计量是有偏估计，那么 $E{T(\theta)} = g(\theta)$，即估计量的期望是关于被估计参数的方程，如果这个方程是线性的，那么这个估计量是可以从有偏变到无偏的。

Consistency（一致）

对于一致估计量，我们希望在我们增大样本数据的时候，估计量的方差会趋近于0.

假设现在我们的样本${X_1, X_2,…,X_N}$ 作为估计样本$\theta$ 的估计量为 $T{X_1, X_2,…,X_N}$ ,若样本量增大 N值变大，$Var{T{X_1, X_2,…,X_N}}$ 趋近于0，那么我们称这个估计量是一致的。

Efficient (有效)

Finsher-Neyman Factorization theorem

$p(\underline{X}|\theta) = h(\underline{X})g(T(\underline{X}),\theta)$

上式表示，当且仅当$T(\underline{X})$ 是充分统计量的时候，样本空间的联合概率 $p(\underline{X}|\theta)$ 可以表示为分离的两部分的乘积。

MVUE & Sufficient Statistics

为什么需要MVUE呢，因为在我们对 ${X_1, X_2,…,X_N}$ 分布的参数进行估计的时候，可能 $\theta$ 的估计量有很多都是无偏估计（unbiased）。如何在这么多的无偏估计中选一个最好的呢，我们选择那个variance比较小的。

引入MVUE的时候，我们需要先说明一个概念：sufficient statistics

Sufficient Statistics

假设样本空间 ${X_1, X_2,…,X_N}$ ,IID (independent and identical distribution)
服从分布 $p_i(X_i|\theta)$

这时，如果联合分布 $P(X_1,…X_N|T{X_1, X_2,…,X_N},\theta)$中的估计量是不含 $\theta$ 变量的，那么估计量 $T{X_1, X_2,…,X_N}$ 就是充分统计量(sufficient statistics)。其实也就是说我们可以用 $T{X_1, X_2,…,X_N}$ 来估计样本 ${X_1, X_2,…,X_N}$ 的分布而不需要用到样本分布的参数 $\theta$ 的时候，我们就说此时 $T{X_1, X_2,…,X_N}$ 是充分统计量。

我们在计算充分统计量的时候可以通过计算 ${X_1, X_2,…,X_N}$ 这些变量的联合概率，然后提取和 $\theta$ 不相关的部分提取出来，就是充分统计量。提取的方法我们称作 Fishser-Neyman factorization theorem.

举个例子

假设 $x1, x_2, …,x_n$ 服从伯努利分布，即： $\begin{align}x_i = \begin{cases} 1 &{w.p. \theta}\0 & w.p. (1-\theta)\end{cases}\end{align}$ 表示$x_i$ 在伯努利分布下，取 1 的概率是 $\theta$ 取 0 的概率是 $1- \theta$. 这样的话，联合分布 $p({\bf x}|\theta) = \prod\limits{i = 1}^{n}\theta^{xi}(1 - \theta)^{1 - x_i} = \theta^{\sum\limits{i = 1}^{n}xi}(1 - \theta)^{n - \sum\limits{i = 1}^{n}x_i}$

这样可以得到，$\theta$ 的充分统计量是 $T(x1,x_2,…,x_n) = \sum\limits{i = 1}^{n}x_i$. 这样的话，充分统计量就是独立于参数$\theta$ 外的量了，也就是说$p({\bf x}|\theta) = \theta^{T}(1 - \theta)^{n - T}$

既然得知了 $\theta$ 的充分统计量 $T(x1,x_2,…,x_n) = \sum\limits{i = 1}^{n}x_i$，我们可以首先来看一下 $T$ 的取值范围：$0\leq T\leq n$, 那么关于 $T(x_1,x_2,…,x_n) $ 就有 $n$ 个不同的取值，$T(x_1, x_2, …,x_n)$ 的分布就为 $P(T(x_1, x_2, …,x_n) = k | \theta) = {\tbinom{k}{n}\theta^k(1 - \theta)^{n-k}}$，$P(x_1, …,x_n,T(x_1, x_2, …,x_n) = k |\theta) = {\theta^k(1 - \theta)^{n-k}}$ 这样的话：$P(x_1,…x_n|T(x_1, x_2, …,x_n) = k , \theta) = \frac{\theta^k(1 - \theta)^{n-k}}{\tbinom{k}{n}\theta^k(1 - \theta)^{n-k}} = \frac{1}{\tbinom{k}{n}}$.

这说明 ,在已知充分统计量的时候，联合分布是和参数 $\theta$ 无关的。同时充分统计量也减少了在数据中和参数 $\theta$ 相关的部分。

Rao-Backwell theorem

令 $g(x_1, x_2, …x_n)$ 为参数 $\theta$ 的一个无偏估计量，那么 $E{g({\bf x})} |T({\bf x})}$则是参数 $\theta$ 的MVUE，也就是说 $E{\hat g} = E{E{g({\bf x})} |T({\bf x})} = \theta$, 且 $VAR{\hat g({\bf x})} \leq VAR{g({\bf x})}$

How to find the MVUE

找到参数的 $T$, 充分统计量
找到任意一个参数的无偏估计量 $g({\bf x})$
计算 MVUE 为 $\hat g({\bf x}) = E{ g({\bf x}) | T({\bf x })}$

Estimation and Detection| Bayes Decision Theory

Posted on 2020-01-16 In 课堂笔记

Intro

这部分就是贝叶斯分类器，也是模式识别中最简单的分类器。贝叶斯分类器的思想在神经网络的目标方程中也有所涉及。对于贝叶斯分类器来讲，重要的是进行分类的指标，也就是feature的分布，以及不同类别的先验概率)。

*先验概率和后验概率的区别在于是否观测到x

Estimation and Detection| Introduction

Posted on 2020-01-15 In 课堂笔记

Estimation and Detection

哈哈哈哈哈哈没想到吧我又来挖坑了，之前的坑会赶紧补上的。

这部分是开坑新课，也就是检测与估计，说白了会包含模式识别的内容。然后是自己和老师的笔记稍微整理和回顾一下。我们老师的课都偏数学，所以一切推导过程如果清晰，那么整个内容就清晰明了了。希望之前你学过概率论。

课程目录

Bayes decision theory
- Classifier
- Probability of error and SNR
- Feature selection
Parametric Estimation and Supervised Learning
- Optimal Estimatiors
- Sufficient Statistics
- Maximum Likelihood Estimation
- Cramer Rao Bounds
- Bayesian Learning
Hypothesis testing
- NP Lemma
- GLRT
Unbiased Classifier and Asymptotic Bayesian Methods
- KL Information
- The concept of Parsimony
Decision fusion
Parametric Estimation and Unsupervised Learning
- Mixture density estimation
- EM algorithm
Nonparametric Supervised Learning
- Nonparametric testing

Paper Reading | ClothFlow: A Flow-Based Model for Clothed Person Generation

Posted on 2020-01-14 In 论文阅读

论文题目： ClothFlow: A Flow-Based Model for Clothed Person Generation
论文作者：$Xintong Han, Xiaojun Hu, Weilin Huang∗, Matthew R. Scott$
论文原文中的一些翻译可能不准确，（）中的部分为论文原文中的术语

摘要

Algorithm | Leetcode - search

Posted on 2020-01-14 In 算法

搜索Tag部分

这部分主要用来记录总结Leetcode中数组部分的题目和思路分析

语言：python

NO.39 Combination Sum

题目

Given a set of candidate numbers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sums to target.

The same repeated number may be chosen from candidates unlimited number of times.

Note:

All numbers (including target) will be positive integers.

The solution set must not contain duplicate combinations.

Example 1:
1
2
3
4
5
6
Input: candidates = [2,3,6,7], target = 7,
A solution set is:
[
  [7],
  [2,2,3]
]
Example 2:
1
2
3
4
5
6
7
Input: candidates = [2,3,5], target = 8,
A solution set is:
[
  [2,2,2,2],
  [2,3,3],
  [3,5]
]

解题思路

这道题是搜索类题目，需要用到DFS。

DFS其实是一个递归的过程，对于所有的递归过程我们需要知道的是停止的点，以及边界的变化。这个组合数求和的题目终止的点就是当所加数之和大于target或者等于target的时候。更新的部分应该就是target的变化，以及path的变化。

几个值得注意的地方：

什么时候变化target？
什么时候更新path？

在做下次的变化的时候同时跟新target和path

def dfs(nums, idx, target, path, sol):
  if target == 0: 
    sol.append(path)
    return 
  if target < 0:
    return
  for i in range(idx, len(nums)):
    dfs(nums, i , target - nums[i], path + [nums[i]], sol)

需要用到动态规划：

Solution

def combinationSum(self, candidates, target):
        """
        :type candidates: List[int]
        :type target: int
        :rtype: List[List[int]]
        """
        sol = []
        candidates.sort()
        self.dfs(candidates, 0, [], sol, target)
        
        return sol
        
    def dfs(self, nums, idx, path, sol, target):
        
        if target == 0:
            sol.append(path)
            return
        if target < 0:
            return
        
        for i in range(idx, len(nums)):
            self.dfs(nums, i, path + [nums[i]], sol, target - nums[i])

算法 | 【Leetcode】数组array类题目总结

Posted on 2020-01-09 In 算法

数组Tag部分

这部分主要用来记录总结Leetcode中数组部分的题目和思路分析

语言：python

NO.1 Two Sums

NO.4 Median of Two Sorted Arrays

论文阅读 |U-Net: Convolutional Networks for Biomedical Image Segmentation

Posted on 2020-01-08 In 论文阅读

论文题目：U-Net: Convolutional Networks for Biomedical Image Segmentation
论文作者：$Olaf Ronneberger, Philipp Fischer, Thomas Bro$
相关代码：https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
论文原文中的一些翻译可能不准确，（）中的部分为论文原文中的术语

摘要

本文主要介绍了一种能够最大程度利用数据增强进行训练的模型，并且在数据分割任务中表现出色。这个网络结构包含了一个降维采样（contracting path）的过程来提取图片中的内容信息，以及一个对称的上采样（expanding path）可以准确提供分割的位置。文章展示了该网络可以通过比较少的图片进行训练，然后得到比之前的方法更好的分割数据。

Notes | Kmeans 和 KNN

Posted on 2020-01-08 In 干货

KNN（k-Nearest Neighbor）

字面意思可以理解为是K个最近的近邻点。

这个方法主要用于有监督学习的分类算法，即给定一个点不知道它属于哪个类的时候，根据这个点和周围k个最近已分类的点进行判断。相当于一个投票机制下的分类器，需要看这个新的引入点周围的k个临近点，占哪个分类的比较多，则这个新的引入点就被称作是分类较多的那一类。

K means

字面意思可以理解为是k个平均，也就是k个平均中心。

这个方法主要用于无监督学习的聚类算法，即对一堆未分类的数据，根据已知的类的数目K（即，已知这堆点可以被分为K类），进行分类。

主要流程

已知：一堆未分类点，K值

随意在这堆点中选择 K 个点，每个点为K个分类个子的中心，然后对于这堆点中其他的点进行分类。分类的方法是，对于任意一个点，计算它和之前随机选定的K个点之间的距离，该点离哪个中心点最近，这个点就属于那一类。在对所有的点进行分类之后，可以根据每一类已有的点计算他们每一类的中心点（即均值中心点），然后进行中心点的更新，中心点更新之后，对该点集中所有的点重新进行分类计算，然后计算个点新的类别。重复这个步骤直到更新后的中心点与上次计算的中心点之间的差距小于一个阈值。这样就得到了K类。

如果我不知道K怎么办

可以通过手肘法或轮廓间距法对K值进行估计

手肘法
轮廓间距法