[Sampling Survey] Lab Exercises

news/2025/1/13 3:35:01/

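Contents

    • 1. Simple sampling with arrays and matrices
      • (1) Constructing arrays
      • (2) Constructing a matrix
      • (3) Generating random numbers from a normal distribution
      • (4) Drawing several samples from a normal population
      • (5) Computing statistics over the rows or columns of a matrix
    • 2. Plotting in R
      • (1) Plotting Gamma(1 / 2, 1), ..., Gamma(13 / 2, 1) with curve()
      • (2) Plotting Be(1 / 2, 1 / 2) and Be(3 / 2, 3 / 2) with curve()
      • (3) Plotting N(0, 1), t(20), t(10), t(5), t(2) with plot()
    • 3. Simple sampling (I)
      • (1) Loading the "cars" data set in R
      • (2) Error for iteration = 100, N = 50, n = 10
      • (3) Error for iteration = 500, N = 50, n = 10
      • (4) Error for iteration = 1000, N = 50, n = 10
      • (5) Writing a loop that computes the error for iteration from 100 to 2000 in steps of 100
      • (6) Writing a program that plots how the error in (5) changes
      • (7) Saving the results of (5) to the "record" file
    • 4. Simple sampling (II)
      • 1. Generating 200 normal random numbers with mean 15 and standard deviation 1
      • 2. Drawing a sample of size 20 by simple random sampling (without replacement)
      • 3. Drawing 100 samples as in step 2, using a for loop and replicate()
      • 4. Computing the sample mean and sample standard deviation of each of the 100 samples
      • 5. Computing a 95% confidence interval for the population mean from each sample
      • 6. Plotting the 100 confidence intervals in a Cartesian coordinate system
      • 7. Computing the coverage probability of the 100 confidence intervals
    • 5. Simulating the central limit theorem
      • (1) Generating random numbers from a common distribution
      • (2) Drawing a histogram and setting graphical parameters
      • (3) Drawing the kernel density curve of the distribution and setting graphical parameters
      • (4) Drawing the normal density function and setting graphical parameters
    • 6. Stratified sampling
      • Problem 1
      • Problem 2
      • Problem 3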

1. Simple sampling with arrays and matrices

(1) Constructing arrays

c(1, 2, 3, 4, 5, 6)
seq(from=1, to=10, by= 2)
seq(1,100,4)
1:10

(2) Constructing a matrix

matrix(1:12, 3, 4)
# 3 rows and 4 columns

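(3) Generating random numbers from a normal distribution

rnorm(100, 0, 1)      # generate 100 random numbers from the normal distribution with mean 0 and standard deviation 1
m = rnorm(100, 0, 1)
n1 = sample(m, 10)    # draw 10 values from m (without replacement)
n1                    # print the sample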

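(4) Drawing several samples from a normal population

m = rnorm(100, 0, 1)
n1 = sample(m, 10)
n2 = sample(m, 10)
mean1 = mean(n1)          # mean of sample n1
mean2 = mean(n2)          # mean of sample n2
mean3 = mean(c(n1, n2))   # mean of the pooled elements of n1 and n2
# Compare the population mean with the mean of the pooled samples
> mean3
[1] 0.52052
mean0 = mean(m)           # population mean
mean0
[1] 0.05755668
# In this run the population mean 0.05755668 is smaller than the sample mean 0.52052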

(5) Computing statistics over the rows or columns of a matrix
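
The source gives no code for this step. A minimal sketch, assuming the 3 x 4 matrix from step (2) as example data (the names A, row_means, col_means and col_sds are illustrative, not from the original), could use apply() with MARGIN = 1 for rows and MARGIN = 2 for columns:

```r
# Example data (an assumption): the 3 x 4 matrix from step (2)
A <- matrix(1:12, 3, 4)

row_means <- apply(A, 1, mean)  # MARGIN = 1: one value per row
col_means <- apply(A, 2, mean)  # MARGIN = 2: one value per column
col_sds   <- apply(A, 2, sd)    # standard deviation of each column

row_means
col_means
col_sds
```

For means and sums specifically, the built-in rowMeans(), colMeans(), rowSums() and colSums() give the same results and are usually faster than apply().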

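2. Plotting in R

require()   # loads a package (library)
graphics    # name of the plotting package
expression  # builds an expression, used here for axis labels with mathematical symbols

(1) Plotting Gamma(1 / 2, 1), ..., Gamma(13 / 2, 1) with curve()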


require(graphics)               # load the graphics package
par(lwd = 2)                    # set the line width
color = hsv(seq(0, 1, 1 / 7))   # set the colors
curve(dgamma(x, 1 / 2, 1), xlim = c(0, 10), ylim = c(0, 0.7),    # axis ranges
      ylab = expression(paste("Ga(", alpha, ",", lambda, ")")),  # y-axis label
      col = color[1])
number <- 7
for (i in 2 : number) {
  curve(dgamma(x, (i - 1) + 1 / 2, 1), col = color[i], add = T)  # add the next curve to the same axes
}
# draw the legend once, after all curves have been added
legend("topright", c("Ga(1 / 2, 1)", "Ga(3 / 2, 1)", "Ga(5 / 2, 1)", "Ga(7 / 2, 1)", "Ga(9 / 2, 1)", "Ga(11 / 2, 1)", "Ga(13 / 2, 1)"),
       col = color[1:number], bty = "n", cex = 1.2, pch = 15)

(2) Plotting Be(1 / 2, 1 / 2) and Be(3 / 2, 3 / 2) with curve()


require(graphics)               # load the graphics package
par(lwd = 2)                    # set the line width
color = hsv(seq(0, 1, 1 / 7))   # set the colors
Be = function(x, a, b) {
  dbeta(x, a, b)
}
curve(Be(x, 1 / 2, 1 / 2), xlim = c(0, 1), ylim = c(0, 2),  # draw the curve and set the axis ranges; the Beta density is supported on [0, 1]
      ylab = expression(paste("Be(", a, ",", b, ")")),      # y-axis label
      col = color[1],                                       # color
      main = "Beta-Density", lty = 1)                       # plot title
curve(Be(x, 3 / 2, 3 / 2), col = color[2], add = T)         # add = T: the axes already exist, so only add the line
legend("topright", c("Be(1 / 2, 1 / 2)", "Be(3 / 2, 3 / 2)"), col = color[1:2], bty = "n", cex = 1.2, pch = 15)  # legend in the top-right corner

(3) Plotting N(0, 1), t(20), t(10), t(5), t(2) with plot()
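
The source gives no code for this step. One possible sketch, assuming an evaluation grid on [-4, 4] and an arbitrary color and line-type scheme (neither is from the original), draws the standard normal density with plot() and overlays the t densities with lines():

```r
require(graphics)

x     <- seq(-4, 4, length.out = 401)   # evaluation grid (an assumption)
dfs   <- c(20, 10, 5, 2)                # degrees of freedom named in the heading
color <- c("black", "blue", "darkgreen", "orange", "red")

# N(0, 1) first; it has the highest peak, so its range fits all curves
plot(x, dnorm(x, 0, 1), type = "l", lwd = 2, col = color[1],
     xlab = "x", ylab = "density", main = "N(0, 1) and t densities")

# overlay t(20), t(10), t(5), t(2)
for (k in seq_along(dfs)) {
  lines(x, dt(x, df = dfs[k]), lwd = 2, lty = k + 1, col = color[k + 1])
}

legend("topright", c("N(0, 1)", paste0("t(", dfs, ")")),
       col = color, lty = 1:5, lwd = 2, bty = "n")
```

As the degrees of freedom decrease, the t density gets a lower peak and heavier tails than N(0, 1), which is what the overlay is meant to show.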

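3. Simple sampling (I)

replicate  # repeats the evaluation of an expression a given number of times

(1) Loading the "cars" data set in R

data("cars")
Y = cars[, 2] # take the second column of the cars data set (dist) as the vector Y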

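(2) Error for iteration = 100, N = 50, n = 10

N = length(Y)
n = 10
iteration = 100
y = replicate(iteration, sample(Y, n))
ybar = apply(y, 2, mean)   # mean of each column, i.e. of each sample
Eybar = mean(ybar)         # average of the sample means (estimates their expectation)
Dx = Eybar - mean(Y)       # error relative to the population mean
Dx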

(3) Error for iteration = 500, N = 50, n = 10

data("cars")
Y = cars[, 2] 
N = length(Y)
n = 10
iteration = 500
y = replicate(iteration, sample(Y, n))
ybar = apply(y, 2, mean)
Eybar = mean(ybar)
Dx = Eybar - mean(Y)
Dx
[1] -0.1918
Eybar
[1] 42.7882
ybar

(4) Error for iteration = 1000, N = 50, n = 10

data("cars")
Y = cars[, 2]
N = length(Y)
n = 10
iteration = 1000
y = replicate(iteration, sample(Y, n))
ybar = apply(y, 2, mean)
Eybar = mean(ybar)
Dx = Eybar - mean(Y)
Dx
[1] 0.0226

(5) Writing a loop that computes the error for iteration from 100 to 2000 in steps of 100

data("cars")
Y = cars[, 2]
N = length(Y)
n = 10
for (iteration in seq(100, 2000, 100)) {
  y = replicate(iteration, sample(Y, n))
  ybar = apply(y, 2, mean)
  Eybar = mean(ybar)
  Dx = Eybar - mean(Y)
  print(Dx)
}

(6) Writing a program that plots how the error in (5) changes

require(graphics)
data("cars")
Y = cars[, 2]
N = length(Y)
n = 10
Dx = c()   # one error per value of iteration
for (iteration in seq(100, 2000, 100)) {
  y = replicate(iteration, sample(Y, n))
  ybar = apply(y, 2, mean)
  Eybar = mean(ybar)
  Dx[iteration / 100] = Eybar - mean(Y)
}
plot(x = seq(100, 2000, 100), y = Dx, type = 'o', xlab = 'iteration', ylab = 'Dx', col = 'red')

(7) Saving the results of (5) to the "record" file

write.table(Dx, file = 'record.txt', col.names=F)

4. Simple sampling (II)

1. Generating 200 normal random numbers with mean 15 and standard deviation 1

mu = 15     # mean
sigma = 1   # standard deviation
N = 200
r = rnorm(N, mu , sigma)
r

2. Drawing a sample of size 20 by simple random sampling (without replacement)

mu = 15     # mean
sigma = 1   # standard deviation
N = 200
n = 20
Y = rnorm(N, mu, sigma)
r = sample(Y, n)
r

3. Drawing 100 samples as in step 2, using a for loop and replicate()

# replicate()
mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
number = 100
r = replicate(number, sample(Y, n))
r
# for loop
mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
number = 100
r = matrix(0, n, number)   # preallocate an n x number matrix, one column per sample
for (i in 1 : number) r[, i] = sample(Y, n)
r

4. Computing the sample mean and sample standard deviation of each of the 100 samples

mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
number = 100
y = replicate(number, sample(Y, n))
ybar = apply(y, 2, mean)       # mean of each sample
s = apply(y, 2, sd)            # standard deviation of each sample
ybar

5. Computing a 95% confidence interval for the population mean from each sample

mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
number = 100
y = replicate(number, sample(Y, n))
ybar = apply(y, 2, mean)
s = apply(y, 2, sd)
f = n / N      # sampling fraction
u = 1.96       # 0.975 quantile of N(0, 1), for a 95% confidence level
yl = ybar - u * sqrt((1 - f) / n) * s
yu = ybar + u * sqrt((1 - f) / n) * s
yl		# lower confidence limits
yu		# upper confidence limits
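
For reference, the usual normal-approximation interval for the population mean under simple random sampling without replacement can be written as follows. The subscript i (the sample index) and the symbol u for the normal quantile are notation chosen here to mirror the code variables, not taken from the original text:

```latex
\left[\; \bar y_i - u_{1-\alpha/2}\sqrt{\frac{1-f}{n}}\; s_i ,\;\;
         \bar y_i + u_{1-\alpha/2}\sqrt{\frac{1-f}{n}}\; s_i \;\right],
\qquad f = \frac{n}{N}, \qquad i = 1, \dots, 100
```

With a 95% confidence level, α = 0.05 and u is approximately 1.96; with N = 200 and n = 20, the finite population correction is 1 - f = 0.9.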

6. Plotting the 100 confidence intervals in a Cartesian coordinate system


mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
Ybar = mean(Y)   # population mean, the target of the intervals
number = 100
y = replicate(number, sample(Y, n))
ymean = apply(y, 2, mean)
s = apply(y, 2, sd)
f = n / N
u = 1.96
yl = ymean - u * sqrt((1 - f) / n) * s
yu = ymean + u * sqrt((1 - f) / n) * s
# set up an empty plot
plot(1, xlim = c(0.5, number + 0.5), ylim = c(min(yl) - 0.5, max(yu) + 0.5),
     type = "n", xlab = "Trials", ylab = "Confidence Intervals",
     main = expression(paste("Confidence Interval of 1-", alpha, sep = " ")))
# draw the intervals: blue if they cover the population mean, red otherwise
for (i in 1 : number) {
  arrows(i, yl[i], i, yu[i], length = 0.1, angle = 90, code = 3,
         col = ifelse(Ybar > yl[i] & Ybar < yu[i], "blue", "red"))
  points(i, ymean[i])
}
#Sys.sleep(0.5)

7. Computing the coverage probability of the 100 confidence intervals

mu = 15
sigma = 1
N = 200
n = 20
Y = rnorm(N, mu, sigma)
Ybar = mean(Y)   # population mean, the target of the intervals
number = 100
y = replicate(number, sample(Y, n))
ymean = apply(y, 2, mean)
s = apply(y, 2, sd)
f = n / N
u = 1.96
yl = ymean - u * sqrt((1 - f) / n) * s
yu = ymean + u * sqrt((1 - f) / n) * s
plot(1, xlim = c(0.5, number + 0.5), ylim = c(min(yl) - 0.5, max(yu) + 0.5),
     type = "n", xlab = "Trials", ylab = "Confidence Intervals",
     main = expression(paste("Confidence Interval of 1-", alpha, sep = " ")))
cn = 0   # number of intervals that cover the population mean
for (i in 1 : number) {
  arrows(i, yl[i], i, yu[i], length = 0.1, angle = 90, code = 3,
         col = ifelse(Ybar > yl[i] & Ybar < yu[i], "blue", "red"))
  points(i, ymean[i])
  cn = cn + as.numeric(Ybar > yl[i] & Ybar < yu[i])
}
abline(h = Ybar, lty = 2)   # dashed line at the population mean
cp = cn / number            # empirical coverage probability
cat("Confidence Probability = ", cp)

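5. Simulating the central limit theorem

(1) Generating random numbers from a common distribution

N = 1000
# draw one value from 1-6 at random, repeated 1000 times
Y1 = replicate(N, sample(1:6, 1)) # arguments: number of repetitions, expression to repeat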
Y1

(2) Drawing a histogram and setting graphical parameters


N = 1000
# draw one value from 1-6 at random, repeated 1000 times
Y1 = replicate(N, sample(1:6, 1)) # arguments: number of repetitions, expression to repeat
hist(Y1, probability = T, breaks = 0.5 + (0:6) * 1,
     xlab = expression(paste(Y[1], "=", X[1])),
     ylab = "Histogram", ylim = c(0, 0.25),
     col = "light blue")

(3) Drawing the kernel density curve of the distribution and setting graphical parameters


### kernel density curve
# density() computes the kernel density estimate
lines(density(Y1), lwd = 3, col = "blue")

(4) Drawing the normal density function and setting graphical parameters


### normal density function: line type, line width, color
# curve() draws the normal density function
curve(dnorm(x, mean(Y1), sd(Y1)), lty = 3, lwd = 3, col = "red", add = T)
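
The steps above use a single die per repetition, so the histogram stays flat. As a hedged extension that is not part of the original exercise, the sketch below averages k dice per repetition, which is where the central limit theorem becomes visible. The values of k and the seed, and the name Yk, are assumptions chosen for illustration:

```r
set.seed(1)   # arbitrary seed, only for reproducibility
N <- 1000     # number of repetitions, as in the steps above
k <- 30       # dice averaged per repetition (a hypothetical choice)

# each repetition: roll k dice (with replacement) and take their mean
Yk <- replicate(N, mean(sample(1:6, k, replace = TRUE)))

hist(Yk, probability = TRUE, col = "light blue",
     xlab = "mean of k dice", main = "")
lines(density(Yk), lwd = 3, col = "blue")   # kernel density estimate

# normal density with the theoretical mean 3.5 and sd sqrt((35 / 12) / k)
curve(dnorm(x, 3.5, sqrt(35 / 12 / k)), lty = 3, lwd = 3, col = "red", add = TRUE)
```

A single fair die has mean 3.5 and variance 35 / 12, so the mean of k dice has standard deviation sqrt((35 / 12) / k); the kernel density and the red normal curve should nearly coincide for moderate k.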

6. Stratified sampling

Problem 1


# 1.1
# Correlation, Variance and Covariance (Matrices)
data = c(43, 84, 98, 0, 10, 44, 0, 124, 13, 0, 50, 147, 62, 87, 84, 158, 170, 104, 56, 160, 228, 262, 110, 232, 139, 178, 334, 0, 63, 220, 17, 34, 25, 34, 36, 0, 25, 7, 15, 31)
sam = matrix(data, nrow = 4, ncol = 10, byrow = TRUE) # byrow = TRUE fills the matrix row by row, one stratum per row
total = c(1411, 4700, 2558, 14997)
Wh = c()
Sh = c()
ybar = c()
nh = c(10, 10, 10, 10)
N = sum(total)
for (i in (1 : length(total))) {
  Wh[i] = total[i] / N      # stratum weight: share of the population in stratum i
  Sh[i] = var(sam[i, ])     # sample variance of stratum i
  ybar[i] = mean(sam[i, ])  # mean of stratum i (sam[i, ] is row i)
}
Y = sum(total * ybar)       # estimated total number of cattle
Vss = sum(Wh ^ 2 * Sh / nh) - sum(Wh * Sh / N)
sd = sqrt(Vss)
sd
Mean = sum(Wh * ybar)
Mean
cv = sd / Mean # coefficient of variation
cv
# 1.2
s2 = 1 / (sum(nh) - 1) * sum((sam - mean(sam)) ^ 2)
s2
Vsrs = (1 - sum(nh) / N) * (s2 / sum(nh))
Vsrs
deff = Vss / Vsrs
deff
# 1.3
Vyst = 1 / sum(nh) * (sum(Wh * sqrt(Sh))) ^ 2 - 1 / N * (sum(Wh * Sh))
change = Vyst / Vss
change
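
Read off the code above, the three parts appear to compute the following quantities. The notation is chosen here to mirror the variables (W_h for Wh, s_h^2 for Sh, n for sum(nh)), and the label "Neyman" for part 1.3 is an interpretation of the formula, not something stated in the source:

```latex
v(\bar y_{st}) = \sum_h \frac{W_h^2 s_h^2}{n_h} - \frac{1}{N}\sum_h W_h s_h^2,
\qquad W_h = \frac{N_h}{N}

v_{srs}(\bar y) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n},
\qquad \mathrm{deff} = \frac{v(\bar y_{st})}{v_{srs}(\bar y)}

v_{Neyman}(\bar y_{st}) = \frac{1}{n}\left(\sum_h W_h s_h\right)^2 - \frac{1}{N}\sum_h W_h s_h^2
```

The first line is the variance estimate of the stratified mean with finite population correction, the second is the design effect relative to simple random sampling of the same total size, and the third is the variance that the same total sample size would give under optimal (Neyman) allocation.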

Problem 2


# 1.1
pre = c(550, 720, 1500, 1020, 620, 980, 928, 1200, 1350, 1750, 670, 729, 1530)
now = c(610, 780, 1600, 1030, 600, 1050, 977, 1440, 1570, 2210, 980, 865, 1710)
y_tilde = sum(now) / sum(pre) * 128200
y_tilde
t.test(now / pre * 128200)

data:  now/pre * 128200
t = 32.17, df = 12, p-value = 5.142e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 135639 155347
sample estimates:
mean of x
   145493
# 1.2
pre = c(550, 720, 1500, 1020, 620, 980, 928, 1200, 1350, 1750, 670, 729, 1530)
now = c(610, 780, 1600, 1030, 600, 1050, 977, 1440, 1570, 2210, 980, 865, 1710)
y_tilde = sum(now) / sum(pre) * 128200
y_tilde
t.test(now / pre * 128200)
xbar = mean(pre)
ybar = mean(now)
N = 123
n = 13
X = 128200
sxy = cov(pre, now)
varx = var(pre)
vary = var(now)
R = ybar / xbar            # sample ratio
Yrbar = R * X              # ratio estimate of the total
a = sxy / varx             # regression slope
Xbar = X / N
ylr = N ^ 2 * (1 - n / N)  # scale factor only; not used below
se = sqrt((n - 1) * (vary - a * sxy) / (n - 2))   # residual standard deviation
vYlr = N ^ 2 * (1 - n / N) * se ^ 2 / n
sd = sqrt(vYlr)
alpha = 0.05
# Q() is not defined in this article; it is presumably a user-defined helper
# that returns the (1 - alpha) confidence interval
(C = Q(Yrbar, sd, alpha))
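
The function Q() is called above but never defined in the source. Purely as an assumption about what it might do, the sketch below defines a stand-in that returns a normal-approximation (1 - alpha) confidence interval around a point estimate. The argument names est and se, and the example numbers, are hypothetical:

```r
# Hypothetical stand-in for the undefined Q(): normal-approximation
# (1 - alpha) confidence interval around a point estimate.
# The author's actual Q() may differ.
Q <- function(est, se, alpha = 0.05) {
  u <- qnorm(1 - alpha / 2)   # two-sided normal quantile
  c(lower = est - u * se, upper = est + u * se)
}

# usage with made-up numbers (not results from the article)
ci <- Q(100, 10, 0.05)
ci
```

With such a helper in scope, the call Q(Yrbar, sd, alpha) would return the lower and upper limits as a named vector of length two.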

Problem 3


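N1 = 120
N2 = 180
n1 = 6
n2 = 9
y1 = c(210, 160, 75, 280, 300, 190)
x1 = c(204, 143, 82, 256, 275, 198)
y2 = c(150, 200, 125, 60, 110, 100, 180, 75, 90)
x2 = c(137, 189, 119, 63, 103, 107, 159, 63, 87)
X1 = 24500
X2 = 21200
f1 = n1 / N1
f2 = n2 / N2
y1bar <- mean(y1)
x1bar <- mean(x1)
y2bar <- mean(y2)
x2bar <- mean(x2)
y1var = var(y1)
x1var = var(x1)
y2var = var(y2)
x2var = var(x2)
sxy1 = cov(x1, y1)
sxy2 = cov(x2, y2)
# (1) separate ratio estimator
R1 = y1bar / x1bar
R2 = y2bar / x2bar
yrs = X1 * R1 + X2 * R2
yrs    # estimated population total
[1] 48206.45
yrsvar = (N1 ^ 2 * (1 - f1) / n1 * (y1var + R1 ^ 2 * x1var - 2 * R1 * sxy1) + N2 ^ 2 * (1 - f2) / n2 * (y2var + R2 ^ 2 * x2var - 2 * R2 * sxy2))
yrssd = sqrt(yrsvar)
yrssd  # standard error
[1] 751.4072
# (2) combined ratio estimator
N = N1 + N2
W1 = N1 / N
W2 = N2 / N
xstbar = W1 * x1bar + W2 * x2bar
ystbar = W1 * y1bar + W2 * y2bar
RC = ystbar / xstbar
X = X1 + X2   # total of the auxiliary variable over both strata
yrc = X * RC
yrc    # estimated population total
[1] 48209.84
# The original listing did not define X here and reported 135240.7, which
# corresponds to X = 128200 left over from Problem 2
yrcvar = (N1 ^ 2 * (1 - f1) / n1 * (y1var + RC ^ 2 * x1var - 2 * RC * sxy1) + N2 ^ 2 * (1 - f2) / n2 * (y2var + RC ^ 2 * x2var - 2 * RC * sxy2))
yrcsd = sqrt(yrcvar)
yrcsd  # standard error
[1] 746.388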

http://www.ppmy.cn/news/406340.html
