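A probability density function helps identify the regions where the values of a random variable are more likely or less likely to fall. A density plot gives a quick overview of how the data are distributed as a whole.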
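As a quick illustration of what a density estimate provides, here is a minimal sketch in base R. It uses simulated values (an assumption for illustration, not the survey data used in this post) and the default settings of `density()`. It checks that the estimated curve integrates to roughly 1 and locates the region where the density is highest.

```r
# Minimal sketch with simulated data (not the survey data used in this post).
set.seed(42)
x <- rnorm(500, mean = 50, sd = 10)

# Kernel density estimate evaluated on a regular grid (512 points by default).
d <- density(x)

# Approximate the area under the curve with a Riemann sum;
# for a density estimate it should be close to 1.
area <- sum(d$y) * diff(d$x[1:2])

# The x position where the estimated density peaks,
# i.e. the region where values are most likely.
peak <- d$x[which.max(d$y)]

print(round(area, 2))
print(peak)
```

Calling `plot(d)` on the same object draws the curve, which is the single-sample version of the multi-sample plot in this post.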
The following shows how to draw probability density curves for several samples in R.
# Load packages
library(tidyverse)
library(viridis)
library(patchwork)
library(ggthemes)
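# Load the data (one column per phrase, one row per respondent)
data <- read.table("probly.csv", header=TRUE, sep=",")
# Reshape to long format, restore spaces in the phrase names
# (read.table turns them into dots), and round the values
data <- data %>%
  gather(key="text", value="value") %>%
  mutate(text = gsub("\\.", " ", text)) %>%
  mutate(value = round(as.numeric(value), 0))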
# Annotations: four labels and the coordinates at which to place them
annot <- data.frame(
  text = c("Almost No Chance", "About Even", "Probable", "Almost Certainly"),
  x = c(5, 53, 65, 79),
  y = c(0.15, 0.4, 0.06, 0.1)
)
# Draw the plot
data %>%
  filter(text %in% c("Almost No Chance", "About Even", "Probable", "Almost Certainly")) %>%
  mutate(text = fct_reorder(text, value)) %>%
  ggplot(aes(x=value, color=text, fill=text)) +
  geom_density(alpha=0.6) +
  scale_fill_viridis(discrete=TRUE) +
  scale_color_viridis(discrete=TRUE) +
  geom_text(data=annot, aes(x=x, y=y, label=text, color=text), hjust=0, size=4.5) +
  theme_few() +
  theme(
    legend.position="none",
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  # x is the assigned probability; y is the estimated density
  xlab("Assigned Probability (%)") +
  ylab("Density") +
  ggtitle("How people perceive probability vocabulary")
That completes a simple multi-sample probability density plot.
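If `probly.csv` is not at hand, the reshaping step of the script can be tried on its own. The sketch below uses a small made-up data frame (the values and the name `wide` are hypothetical) and `pivot_longer()` from tidyr, which does the same job as the `gather()` call above. It assumes the real file has one column per phrase and one row per respondent, as the script implies.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: one column per phrase, one row per respondent.
# The dotted names mimic what read.table() produces from headers with spaces.
wide <- data.frame(
  Almost.No.Chance = c(2, 5, 1),
  About.Even = c(50, 50, 45)
)

# Reshape to long format: one row per (phrase, value) pair,
# then turn the dots in the phrase names back into spaces.
long <- wide %>%
  pivot_longer(cols = everything(), names_to = "text", values_to = "value") %>%
  mutate(text = gsub("\\.", " ", text))

print(long)
```

Running this on the real data and inspecting `unique(long$text)` is a quick way to confirm that the phrase names match the ones used in `filter()`; a mismatch there would leave the plot empty.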
References:
1. https://www.data-to-viz.com/caveat/multi_distribution.html
2. https://support.minitab.com/zh-cn/minitab/18/help-and-how-to/probability-distributions-and-random-data/supporting-topics/basics/using-the-probability-density-function-pdf/
GK
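Hi, thanks for sharing the code. I still haven't been able to reproduce the figure in this post from its data, though. Could we go over it in more detail?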
陈浩
What problem did you run into? Please post the error message you got.