Data normalization rescales data proportionally so that it falls into a small, specific interval such as [0, 1] or [-1, 1]. The main purpose is to turn the values into dimensionless numbers, so that indicators with different units or magnitudes can be compared and weighted.

The main method is Min-Max Normalization: (X - min(X)) / (max(X) - min(X))

In R it can be computed with the following code:

min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
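
For example, here is a minimal sketch (the data frame and column names below are made up for illustration) that applies min_max_norm to every column of a data frame:

# Toy data frame with two columns on very different scales (values are illustrative)
df <- data.frame(height_cm = c(150, 165, 180, 172),
                 income    = c(30000, 52000, 87000, 61000))

# Apply min-max normalization column by column; every value now lies in [0, 1]
df_norm <- as.data.frame(lapply(df, min_max_norm))
df_norm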

Data standardization rescales the data so that it has zero mean and unit standard deviation; unlike normalization, the result is not bounded to a fixed interval.

The main method is Z-Score Normalization: (X - μ) / σ

In R it can be computed with the following code:

scale(X)
# scale(x, center = TRUE, scale = TRUE) returns the z-score standardized result;
# by default it standardizes to mean 0 and standard deviation 1
# center = TRUE centers the data (only subtracts the mean, nothing else)
# scale = TRUE scales the data (divides by the standard deviation)
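
As a quick sanity check, the same result can be obtained by hand (a minimal sketch; z_score and the sample vector x are just illustrative names):

# Manual z-score: subtract the mean, divide by the standard deviation
z_score <- function(x) {
  (x - mean(x)) / sd(x)
}

x <- c(2, 4, 6, 8, 10)                      # illustrative data
all.equal(z_score(x), as.numeric(scale(x))) # TRUE: scale() gives the same z-scores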

At first glance the two concepts look quite similar, so here is a comparison of their differences (the table below is quoted from the references):

| No. | Normalisation | Standardisation |
| --- | --- | --- |
| 1 | Minimum and maximum values of the features are used for scaling. | Mean and standard deviation are used for scaling. |
| 2 | Used when features are on different scales. | Used when we want to ensure zero mean and unit standard deviation. |
| 3 | Scales values to [0, 1] or [-1, 1]. | Not bounded to a certain range. |
| 4 | Strongly affected by outliers. | Much less affected by outliers. |
| 5 | Scikit-Learn provides the MinMaxScaler transformer for normalization. | Scikit-Learn provides the StandardScaler transformer for standardization. |
| 6 | Squishes the n-dimensional data into an n-dimensional unit hypercube. | Translates the data so that the mean vector of the original data sits at the origin, then squishes or expands it. |
| 7 | Useful when we do not know the distribution of the data. | Useful when the feature distribution is normal (Gaussian). |
| 8 | Often called scaling normalization. | Often called z-score normalization. |
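
To make the point about outliers (row 4) concrete, here is a small illustrative comparison on made-up data containing one extreme value:

x <- c(1, 2, 3, 4, 100)       # 100 is an outlier
min_max_norm(x)               # approx. 0.00 0.01 0.02 0.03 1.00 -- normal points squeezed near 0
as.numeric(scale(x))          # approx. -0.48 -0.46 -0.44 -0.41 1.79 -- not bounded to [0, 1]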

It is also recommended to carry out this kind of preprocessing during data processing and modeling, to improve the efficiency and accuracy of the analysis.

References:

1. https://www.statology.org/how-to-normalize-data-in-r/

2. https://www.geeksforgeeks.org/normalization-vs-standardization/

3. https://www.cnblogs.com/ooon/p/4947347.html