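Standard error, standard deviation and confidence intervals are basic tools in biostatistics. Below we briefly explain how they differ and how to plot them.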

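Standard deviation (SD): abbreviated S.D., SD or s, the standard deviation measures how widely the data points are spread around the mean. It describes the variability among individual observations.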
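As a worked example, the sample standard deviation is s = √(Σ(xᵢ − x̄)² / (n − 1)). The short sketch below assumes only base R and the built-in iris data. It computes s by hand for the setosa sepal lengths and checks the result against `sd()`.

```r
# Sepal.Length of the setosa species from the built-in iris data
x <- iris$Sepal.Length[iris$Species == "setosa"]
n <- length(x)

# Sample standard deviation by hand: squared deviations from the mean,
# divided by n - 1 (not n), then the square root
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Base R's sd() uses the same n - 1 denominator
sd_builtin <- sd(x)

all.equal(sd_manual, sd_builtin)  # TRUE
```

The n − 1 denominator (Bessel's correction) is what `sd()` uses, so the two values agree up to floating-point rounding.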

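Standard error (SE): abbreviated S.E. or SE. Here this means the standard error of the mean: the standard deviation of the sample mean over repeated samples, estimated as SD / √n. It measures how precisely the sample mean estimates the population mean. A smaller SE means the sample mean is likely to lie closer to the population mean. A larger SE means the means of repeated samples would scatter more widely.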
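The link between SE and "the spread of sample means" can be checked by simulation. This sketch assumes a normal population with mean 10 and SD 2 and samples of size 25. None of these values come from the original post. It compares the theoretical SE, σ / √n = 0.4, with the SD of 10,000 simulated sample means. It then shows the usual single-sample estimate s / √n.

```r
# Assumed setup: normal population with mean 10 and SD 2, samples of size 25
set.seed(1)
pop_mean <- 10
pop_sd <- 2
n <- 25

# Theoretical standard error of the mean: sigma / sqrt(n) = 2 / 5 = 0.4
se_theory <- pop_sd / sqrt(n)

# Draw 10,000 samples and keep each sample's mean
sample_means <- replicate(10000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# The spread of those means should be close to se_theory
sd(sample_means)

# In practice there is only one sample, so SE is estimated as s / sqrt(n)
one_sample <- rnorm(n, mean = pop_mean, sd = pop_sd)
se_hat <- sd(one_sample) / sqrt(n)
se_hat
```

`sd(sample_means)` should land near 0.4, while `se_hat` varies from sample to sample because it is itself an estimate.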

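Confidence interval (CI): also called an interval estimate, a confidence interval is a range for a population parameter constructed from sample statistics. For a mean, the 95% CI is usually computed as mean ± t × SE, where t is the 0.975 quantile of the t distribution with n − 1 degrees of freedom.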
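A minimal check of that formula, assuming base R and the iris data. It builds the 95% interval for the setosa mean by hand and compares it with the interval reported by `t.test()`. Note that `qt((1 - 0.05) / 2 + .5, n - 1)`, used in the plotting code below, is just `qt(0.975, n - 1)`.

```r
x <- iris$Sepal.Length[iris$Species == "setosa"]
n <- length(x)
se <- sd(x) / sqrt(n)

# Half-width of a 95% t-based interval: upper 2.5% quantile times SE
half_width <- qt(0.975, df = n - 1) * se
ci_manual <- c(mean(x) - half_width, mean(x) + half_width)

# The one-sample t.test() reports the same interval
ci_ttest <- t.test(x, conf.level = 0.95)$conf.int

all.equal(ci_manual, as.numeric(ci_ttest))  # TRUE
```

`as.numeric()` drops the `conf.level` attribute that `t.test()` attaches, so only the two endpoints are compared.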

Below we show how to add each of them to a bar chart:


```r
# Load packages
library(tidyverse)  # dplyr for the summary, ggplot2 for plotting
library(ggthemes)   # theme_few()
library(patchwork)  # combine plots with `+`

# Demo data
data <- iris

# Compute mean, SD, SE and the 95% CI half-width for each species
my_sum <- data %>%
    group_by(Species) %>%
    summarise(
        n = n(),
        mean = mean(Sepal.Length),
        # Standard deviation
        sd = sd(Sepal.Length)
    ) %>%
    # Standard error of the mean
    mutate(se = sd / sqrt(n)) %>%
    # 95% CI half-width: (1 - 0.05) / 2 + .5 = 0.975, the upper t quantile with n - 1 df
    mutate(ci = se * qt((1 - 0.05) / 2 + .5, n - 1))

# Standard error
p1 <- ggplot(my_sum) +
    geom_bar(
        aes(x = Species, y = mean),
        stat = "identity",
        color = "black",
        fill = "black",
        alpha = 0.7,
        width = 0.5
    ) +
    # Error bar (upper half only: from mean to mean + SE)
    geom_errorbar(
        aes(x = Species, ymin = mean, ymax = mean + se),
        width = 0.2,
        colour = "black",
        alpha = 0.9,
        linewidth = 0.5  # `size` is deprecated for lines since ggplot2 3.4.0
    ) +
    ggtitle("standard error") +
    # A complete theme replaces earlier theme() settings, so tweak the title after it
    theme_few() +
    theme(plot.title = element_text(size = 6)) +
    # Start the y axis at 0
    scale_y_continuous(expand = c(0, 0), limits = c(0, 10), breaks = c(0, 5, 10)) +
    xlab("") +
    ylab("Sepal Length")

# Standard deviation
p2 <- ggplot(my_sum) +
    geom_bar(
        aes(x = Species, y = mean),
        stat = "identity",
        fill = "black",
        alpha = 0.7,
        width = 0.5
    ) +
    # Error bar from mean to mean + SD
    geom_errorbar(
        aes(x = Species, ymin = mean, ymax = mean + sd),
        width = 0.2,
        colour = "black",
        alpha = 0.9,
        linewidth = 0.5
    ) +
    ggtitle("standard deviation") +
    theme_few() +
    theme(plot.title = element_text(size = 6)) +
    scale_y_continuous(expand = c(0, 0), limits = c(0, 10), breaks = c(0, 5, 10)) +
    xlab("") +
    ylab("Sepal Length")

# Confidence interval
p3 <- ggplot(my_sum) +
    geom_bar(
        aes(x = Species, y = mean),
        stat = "identity",
        fill = "black",
        alpha = 0.7,
        width = 0.5
    ) +
    # Error bar from mean to the upper 95% confidence limit
    geom_errorbar(
        aes(x = Species, ymin = mean, ymax = mean + ci),
        width = 0.2,
        colour = "black",
        alpha = 0.9,
        linewidth = 0.5
    ) +
    ggtitle("confidence interval") +
    theme_few() +
    theme(plot.title = element_text(size = 6)) +
    scale_y_continuous(expand = c(0, 0), limits = c(0, 10), breaks = c(0, 5, 10)) +
    xlab("") +
    ylab("Sepal Length")

# Place the three panels side by side
p1 + p2 + p3
```

That completes a simple analysis. Readers are encouraged to try it themselves.
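If you do, note that the three plotting blocks differ only in which column sets the error bar and in the title. The sketch below folds them into one helper. The helper `bar_with_error()` is hypothetical and not part of any package. It also draws symmetric bars (mean ± value) instead of the upper half only, which is a different design choice from the code above. It assumes ggplot2 ≥ 3.4.0 for `linewidth` and uses the `.data[[...]]` pronoun to pick a column by name.

```r
library(dplyr)
library(ggplot2)
library(ggthemes)
library(patchwork)

# Summary table with the same quantities as in the main example
my_sum <- iris %>%
  group_by(Species) %>%
  summarise(n = n(), mean = mean(Sepal.Length), sd = sd(Sepal.Length)) %>%
  mutate(se = sd / sqrt(n), ci = se * qt(0.975, n - 1))

# Hypothetical helper: one bar panel whose error bar spans
# mean - value to mean + value, where value is the column named by err_col
bar_with_error <- function(df, err_col, title) {
  ggplot(df, aes(x = Species, y = mean)) +
    geom_col(fill = "black", alpha = 0.7, width = 0.5) +
    geom_errorbar(
      aes(ymin = mean - .data[[err_col]], ymax = mean + .data[[err_col]]),
      width = 0.2, colour = "black", linewidth = 0.5
    ) +
    ggtitle(title) +
    theme_few() +
    theme(plot.title = element_text(size = 6)) +
    scale_y_continuous(expand = c(0, 0), limits = c(0, 10), breaks = c(0, 5, 10)) +
    labs(x = NULL, y = "Sepal Length")
}

p <- bar_with_error(my_sum, "se", "standard error") +
  bar_with_error(my_sum, "sd", "standard deviation") +
  bar_with_error(my_sum, "ci", "confidence interval")
p
```

Drawing both halves of the bar makes it easier to see how much wider the SD bars are than the SE and CI bars for the same data.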

References:

1. https://www.data-to-viz.com/caveat/error_bar.html