Spark 2.x Tutorial: Logistic Regression Classifiers


Overview

Logistic regression is a classic classification method in statistical learning and belongs to the family of log-linear models. The response variable of a logistic regression can be either binary or multi-class.
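At its core, binomial logistic regression passes a linear score through the logistic (sigmoid) function to obtain a class probability. A minimal pure-Scala sketch of that idea (the weight and feature values are hypothetical, for illustration only):

```scala
// logistic (sigmoid) function: maps any real-valued score into (0, 1)
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// P(y = 1 | x) = sigmoid(w . x + b) for weight vector w and intercept b
def predictProb(w: Array[Double], b: Double, x: Array[Double]): Double =
  sigmoid(w.zip(x).map { case (wi, xi) => wi * xi }.sum + b)
```

Scores above 0 map to probabilities above 0.5, which is why 0.5 is the natural default classification threshold.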

Example Code

We use the iris dataset as our running example. The dataset describes iris flowers by their measurements: it contains 150 records divided into 3 classes of 50 records each, and each record has 4 attributes. It is a very common training/test set in data mining and classification. For clarity, we mainly use the last two attributes (petal length and petal width) for classification. spark.ml currently supports both binary and multi-class classification, so we will work through three cases: binomial logistic regression for a binary classification problem, multinomial logistic regression for a binary classification problem, and multinomial logistic regression for a multi-class classification problem.

Binomial Logistic Regression for a Binary Classification Problem

First we take only the last two classes of the data and run a binary classification analysis with binomial logistic regression.

1. Import the required packages:

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression, LogisticRegressionModel}
import org.apache.spark.sql.functions

2. Read and briefly inspect the data:

Import spark.implicits._ so that an RDD can be implicitly converted to a DataFrame. We use a case class to define a schema, Iris, which describes the structure of the data we need. We then read the text file; the first map splits each line on ",". In our dataset each line splits into 5 parts: the first 4 are the flower's features and the last is its class. We store the features in a Vector, build an RDD of Iris records, convert it to a DataFrame, and finally call show() to look at some of the data.

scala> import spark.implicits._
import spark.implicits._

scala> case class Iris(features: org.apache.spark.ml.linalg.Vector, label: String)
defined class Iris

// read the file with spark.sparkContext.textFile()
scala> val data = spark.sparkContext.textFile("file:///root/data/iris.txt").map(_.split(",")).map(p => Iris(Vectors.dense(p(0).toDouble,p(1).toDouble,p(2).toDouble,p(3).toDouble),p(4).toString())).toDF()
data: org.apache.spark.sql.DataFrame = [features: vector, label: string]

// spark.read.textFile() also works
scala> val dataTest = spark.read.textFile("file:///root/data/iris.txt").map(_.split(",")).map(p => Iris(Vectors.dense(p(0).toDouble,p(1).toDouble,p(2).toDouble,p(3).toDouble),p(4).toString())).toDF()
dataTest: org.apache.spark.sql.DataFrame = [features: vector, label: string]

scala> data.show()
+-----------------+-----------+
|         features|      label|
+-----------------+-----------+
|[5.1,3.5,1.4,0.2]|Iris-setosa|
|[4.9,3.0,1.4,0.2]|Iris-setosa|
|[4.7,3.2,1.3,0.2]|Iris-setosa|
|[4.6,3.1,1.5,0.2]|Iris-setosa|
|[5.0,3.6,1.4,0.2]|Iris-setosa|
|[5.4,3.9,1.7,0.4]|Iris-setosa|
|[4.6,3.4,1.4,0.3]|Iris-setosa|
|[5.0,3.4,1.5,0.2]|Iris-setosa|
|[4.4,2.9,1.4,0.2]|Iris-setosa|
|[4.9,3.1,1.5,0.1]|Iris-setosa|
|[5.4,3.7,1.5,0.2]|Iris-setosa|
|[4.8,3.4,1.6,0.2]|Iris-setosa|
|[4.8,3.0,1.4,0.1]|Iris-setosa|
|[4.3,3.0,1.1,0.1]|Iris-setosa|
|[5.8,4.0,1.2,0.2]|Iris-setosa|
|[5.7,4.4,1.5,0.4]|Iris-setosa|
|[5.4,3.9,1.3,0.4]|Iris-setosa|
|[5.1,3.5,1.4,0.3]|Iris-setosa|
|[5.7,3.8,1.7,0.3]|Iris-setosa|
|[5.1,3.8,1.5,0.3]|Iris-setosa|
+-----------------+-----------+
only showing top 20 rows

Since this is a binary classification problem, we do not need all 3 classes of data; we select two of them. First we register the DataFrame we just obtained as a temporary view named iris. Once registered, it can be queried with SQL; for example, here we select all rows whose class is not "Iris-setosa". After selecting the data we need, we can print the result to confirm that no "Iris-setosa" rows remain.

scala> data.createOrReplaceTempView("iris")

scala> val df = spark.sql("select * from iris where label != 'Iris-setosa'")
df: org.apache.spark.sql.DataFrame = [features: vector, label: string]

// There are three classes (Iris-setosa, Iris-versicolor and Iris-virginica)
// of 50 samples each; after removing Iris-setosa, 100 samples remain.
scala> df.count()
res4: Long = 100

scala> df.map(t => t(1)+":"+t(0)).collect().foreach(println)
Iris-versicolor:[7.0,3.2,4.7,1.4]
Iris-versicolor:[6.4,3.2,4.5,1.5]
Iris-versicolor:[6.9,3.1,4.9,1.5]
Iris-versicolor:[5.5,2.3,4.0,1.3]
Iris-versicolor:[6.5,2.8,4.6,1.5]
Iris-versicolor:[5.7,2.8,4.5,1.3]
Iris-versicolor:[6.3,3.3,4.7,1.6]
Iris-versicolor:[4.9,2.4,3.3,1.0]
Iris-versicolor:[6.6,2.9,4.6,1.3]
Iris-versicolor:[5.2,2.7,3.9,1.4]
Iris-versicolor:[5.0,2.0,3.5,1.0]
Iris-versicolor:[5.9,3.0,4.2,1.5]
Iris-versicolor:[6.0,2.2,4.0,1.0]
Iris-versicolor:[6.1,2.9,4.7,1.4]
Iris-versicolor:[5.6,2.9,3.6,1.3]
Iris-versicolor:[6.7,3.1,4.4,1.4]
Iris-versicolor:[5.6,3.0,4.5,1.5]
Iris-versicolor:[5.8,2.7,4.1,1.0]
Iris-versicolor:[6.2,2.2,4.5,1.5]
Iris-versicolor:[5.6,2.5,3.9,1.1]
Iris-versicolor:[5.9,3.2,4.8,1.8]
Iris-versicolor:[6.1,2.8,4.0,1.3]
Iris-versicolor:[6.3,2.5,4.9,1.5]
Iris-versicolor:[6.1,2.8,4.7,1.2]
Iris-versicolor:[6.4,2.9,4.3,1.3]
Iris-versicolor:[6.6,3.0,4.4,1.4]
Iris-versicolor:[6.8,2.8,4.8,1.4]
Iris-versicolor:[6.7,3.0,5.0,1.7]
Iris-versicolor:[6.0,2.9,4.5,1.5]
Iris-versicolor:[5.7,2.6,3.5,1.0]
Iris-versicolor:[5.5,2.4,3.8,1.1]
Iris-versicolor:[5.5,2.4,3.7,1.0]
Iris-versicolor:[5.8,2.7,3.9,1.2]
Iris-versicolor:[6.0,2.7,5.1,1.6]
Iris-versicolor:[5.4,3.0,4.5,1.5]
Iris-versicolor:[6.0,3.4,4.5,1.6]
Iris-versicolor:[6.7,3.1,4.7,1.5]
Iris-versicolor:[6.3,2.3,4.4,1.3]
Iris-versicolor:[5.6,3.0,4.1,1.3]
Iris-versicolor:[5.5,2.5,4.0,1.3]
Iris-versicolor:[5.5,2.6,4.4,1.2]
Iris-versicolor:[6.1,3.0,4.6,1.4]
Iris-versicolor:[5.8,2.6,4.0,1.2]
Iris-versicolor:[5.0,2.3,3.3,1.0]
Iris-versicolor:[5.6,2.7,4.2,1.3]
Iris-versicolor:[5.7,3.0,4.2,1.2]
Iris-versicolor:[5.7,2.9,4.2,1.3]
Iris-versicolor:[6.2,2.9,4.3,1.3]
Iris-versicolor:[5.1,2.5,3.0,1.1]
Iris-versicolor:[5.7,2.8,4.1,1.3]
Iris-virginica:[6.3,3.3,6.0,2.5]
Iris-virginica:[5.8,2.7,5.1,1.9]
Iris-virginica:[7.1,3.0,5.9,2.1]
Iris-virginica:[6.3,2.9,5.6,1.8]
Iris-virginica:[6.5,3.0,5.8,2.2]
Iris-virginica:[7.6,3.0,6.6,2.1]
Iris-virginica:[4.9,2.5,4.5,1.7]
Iris-virginica:[7.3,2.9,6.3,1.8]
Iris-virginica:[6.7,2.5,5.8,1.8]
Iris-virginica:[7.2,3.6,6.1,2.5]
Iris-virginica:[6.5,3.2,5.1,2.0]
Iris-virginica:[6.4,2.7,5.3,1.9]
Iris-virginica:[6.8,3.0,5.5,2.1]
Iris-virginica:[5.7,2.5,5.0,2.0]
Iris-virginica:[5.8,2.8,5.1,2.4]
Iris-virginica:[6.4,3.2,5.3,2.3]
Iris-virginica:[6.5,3.0,5.5,1.8]
Iris-virginica:[7.7,3.8,6.7,2.2]
Iris-virginica:[7.7,2.6,6.9,2.3]
Iris-virginica:[6.0,2.2,5.0,1.5]
Iris-virginica:[6.9,3.2,5.7,2.3]
Iris-virginica:[5.6,2.8,4.9,2.0]
Iris-virginica:[7.7,2.8,6.7,2.0]
Iris-virginica:[6.3,2.7,4.9,1.8]
Iris-virginica:[6.7,3.3,5.7,2.1]
Iris-virginica:[7.2,3.2,6.0,1.8]
Iris-virginica:[6.2,2.8,4.8,1.8]
Iris-virginica:[6.1,3.0,4.9,1.8]
Iris-virginica:[6.4,2.8,5.6,2.1]
Iris-virginica:[7.2,3.0,5.8,1.6]
Iris-virginica:[7.4,2.8,6.1,1.9]
Iris-virginica:[7.9,3.8,6.4,2.0]
Iris-virginica:[6.4,2.8,5.6,2.2]
Iris-virginica:[6.3,2.8,5.1,1.5]
Iris-virginica:[6.1,2.6,5.6,1.4]
Iris-virginica:[7.7,3.0,6.1,2.3]
Iris-virginica:[6.3,3.4,5.6,2.4]
Iris-virginica:[6.4,3.1,5.5,1.8]
Iris-virginica:[6.0,3.0,4.8,1.8]
Iris-virginica:[6.9,3.1,5.4,2.1]
Iris-virginica:[6.7,3.1,5.6,2.4]
Iris-virginica:[6.9,3.1,5.1,2.3]
Iris-virginica:[5.8,2.7,5.1,1.9]
Iris-virginica:[6.8,3.2,5.9,2.3]
Iris-virginica:[6.7,3.3,5.7,2.5]
Iris-virginica:[6.7,3.0,5.2,2.3]
Iris-virginica:[6.3,2.5,5.0,1.9]
Iris-virginica:[6.5,3.0,5.2,2.0]
Iris-virginica:[6.2,3.4,5.4,2.3]
Iris-virginica:[5.9,3.0,5.1,1.8]

3. Build the ML pipeline

Take the label column and the feature column, index each of them, and give the indexed columns new names.

scala> val labelIndexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel").fit(df)
labelIndexer: org.apache.spark.ml.feature.StringIndexerModel = strIdx_a43d5773da03

scala> val featureIndexer = new VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").fit(df)
featureIndexer: org.apache.spark.ml.feature.VectorIndexerModel = vecIdx_65fcd96b5dbc

Next, we randomly split the data into a training set (70%) and a test set (30%).

scala> val Array(trainingData, testData) = df.randomSplit(Array(0.7, 0.3))
trainingData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [features: vector, label: string]
testData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [features: vector, label: string]

Then we set the logistic regression parameters. Here we use setter methods throughout; a ParamMap can also be used (see the Spark MLlib documentation for details). We set the maximum number of iterations to 10, the regularization parameter to 0.3, and so on. All configurable parameters, along with the values we have already set, can be listed with explainParams().

scala> val lr = new LogisticRegression().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_d5c473ad2b66
scala> println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")
LogisticRegression parameters:
aggregationDepth: suggested depth for treeAggregate (>= 2) (default: 2)
elasticNetParam: the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty (default: 0.0, current: 0.8)
family: The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial. (default: auto)
featuresCol: features column name (default: features, current: indexedFeatures)
fitIntercept: whether to fit an intercept term (default: true)
labelCol: label column name (default: label, current: indexedLabel)
lowerBoundsOnCoefficients: The lower bounds on coefficients if fitting under bound constrained optimization. (undefined)
lowerBoundsOnIntercepts: The lower bounds on intercepts if fitting under bound constrained optimization. (undefined)
maxIter: maximum number of iterations (>= 0) (default: 100, current: 10)
predictionCol: prediction column name (default: prediction)
probabilityCol: Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol: raw prediction (a.k.a. confidence) column name (default: rawPrediction)
regParam: regularization parameter (>= 0) (default: 0.0, current: 0.3)
standardization: whether to standardize the training features before fitting the model (default: true)
threshold: threshold in binary classification prediction, in range [0, 1] (default: 0.5)
thresholds: Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold (undefined)
tol: the convergence tolerance for iterative algorithms (>= 0) (default: 1.0E-6)
upperBoundsOnCoefficients: The upper bounds on coefficients if fitting under bound constrained optimization. (undefined)
upperBoundsOnIntercepts: The upper bounds on intercepts if fitting under bound constrained optimization. (undefined)
weightCol: weight column name. If this is not set or empty, we treat all instance weights as 1.0 (undefined)

Here we set up a labelConverter, whose purpose is to convert the predicted numeric class back to its string label.

scala> val labelConverter = new IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(labelIndexer.labels)
labelConverter: org.apache.spark.ml.feature.IndexToString = idxToStr_9b53b11ef180

Build the pipeline, set its stages, and call fit() to train the model.

scala> val lrPipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, lr, labelConverter))
lrPipeline: org.apache.spark.ml.Pipeline = pipeline_2c8791fd18dd
scala> val lrPipelineModel = lrPipeline.fit(trainingData)
17/09/07 10:05:03 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 10:05:03 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
lrPipelineModel: org.apache.spark.ml.PipelineModel = pipeline_2c8791fd18dd

A Pipeline is essentially an Estimator: calling fit() on it produces a PipelineModel, which is essentially a Transformer. The PipelineModel can then call transform() to make predictions and produce a new DataFrame; in other words, we use the trained model to evaluate the test set.

scala> val lrPredictions = lrPipelineModel.transform(testData)
lrPredictions: org.apache.spark.sql.DataFrame = [features: vector, label: string ... 6 more fields]
scala> lrPredictions.show()
+-----------------+---------------+------------+------------------+--------------------+--------------------+----------+---------------+
|         features|          label|indexedLabel|   indexedFeatures|       rawPrediction|         probability|prediction| predictedLabel|
+-----------------+---------------+------------+------------------+--------------------+--------------------+----------+---------------+
|[5.6,3.0,4.5,1.5]|Iris-versicolor|         0.0| [5.6,9.0,4.5,5.0]|[0.00632886769305...|[0.50158221164203...|       0.0|Iris-versicolor|
|[5.8,2.7,4.1,1.0]|Iris-versicolor|         0.0| [5.8,6.0,4.1,0.0]|[0.46667028524646...|[0.61459535539331...|       0.0|Iris-versicolor|
|[6.0,2.2,4.0,1.0]|Iris-versicolor|         0.0| [6.0,1.0,4.0,0.0]|[0.47942122856815...|[0.61761119701304...|       0.0|Iris-versicolor|
|[6.6,3.0,4.4,1.4]|Iris-versicolor|         0.0| [6.6,9.0,4.4,4.0]|[0.15960167914787...|[0.53981593737467...|       0.0|Iris-versicolor|
|[6.7,3.0,5.0,1.7]|Iris-versicolor|         0.0| [6.7,9.0,5.0,7.0]|[-0.1025771337303...|[0.47437817884138...|       1.0| Iris-virginica|
|[4.9,2.5,4.5,1.7]| Iris-virginica|         1.0| [4.9,4.0,4.5,7.0]|[-0.2173356236255...|[0.44587895949586...|       1.0| Iris-virginica|
|[5.5,2.4,3.8,1.1]|Iris-versicolor|         0.0| [5.5,3.0,3.8,1.0]|[0.35802577541757...|[0.58856244608996...|       0.0|Iris-versicolor|
|[5.6,3.0,4.1,1.3]|Iris-versicolor|         0.0| [5.6,9.0,4.1,3.0]|[0.18536505738574...|[0.54620902741999...|       0.0|Iris-versicolor|
|[5.7,2.5,5.0,2.0]| Iris-virginica|         1.0|[5.7,4.0,5.0,10.0]|[-0.4348861348778...|[0.39296017326534...|       1.0| Iris-virginica|
|[5.7,2.9,4.2,1.3]|Iris-versicolor|         0.0| [5.7,8.0,4.2,3.0]|[0.19174052904658...|[0.54778881119272...|       0.0|Iris-versicolor|
|[5.8,2.6,4.0,1.2]|Iris-versicolor|         0.0| [5.8,5.0,4.0,2.0]|[0.28763409555377...|[0.57141682194390...|       0.0|Iris-versicolor|
|[5.8,2.7,3.9,1.2]|Iris-versicolor|         0.0| [5.8,6.0,3.9,2.0]|[0.28763409555377...|[0.57141682194390...|       0.0|Iris-versicolor|
|[5.8,2.7,5.1,1.9]| Iris-virginica|         1.0| [5.8,6.0,5.1,9.0]|[-0.3389925683706...|[0.41605421498910...|       1.0| Iris-virginica|
|[5.9,3.0,5.1,1.8]| Iris-virginica|         1.0| [5.9,9.0,5.1,8.0]|[-0.2430990018634...|[0.43952279234922...|       1.0| Iris-virginica|
|[6.0,2.7,5.1,1.6]|Iris-versicolor|         0.0| [6.0,6.0,5.1,6.0]|[-0.0576873405098...|[0.48558216299242...|       1.0| Iris-virginica|
|[6.0,2.9,4.5,1.5]|Iris-versicolor|         0.0| [6.0,8.0,4.5,5.0]|[0.03183075433644...|[0.50795701676004...|       0.0|Iris-versicolor|
|[6.1,2.6,5.6,1.4]| Iris-virginica|         1.0| [6.1,5.0,5.6,4.0]|[0.12772432084363...|[0.53188774193068...|       0.0|Iris-versicolor|
|[6.1,3.0,4.6,1.4]|Iris-versicolor|         0.0| [6.1,9.0,4.6,4.0]|[0.12772432084363...|[0.53188774193068...|       0.0|Iris-versicolor|
|[6.3,2.5,5.0,1.9]| Iris-virginica|         1.0| [6.3,4.0,5.0,9.0]|[-0.3071152100663...|[0.42381903909131...|       1.0| Iris-virginica|
|[6.3,2.9,5.6,1.8]| Iris-virginica|         1.0| [6.3,8.0,5.6,8.0]|[-0.2175971152200...|[0.44581435344357...|       1.0| Iris-virginica|
+-----------------+---------------+------------+------------------+--------------------+--------------------+----------+---------------+
only showing top 20 rows

Note: probability is a Vector. For example, in the first row above probability is [0.5015822116420338,0.4984177883579662], the predicted probability of each class; the two entries sum to 1.
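The sum-to-one property and the way the predicted class is chosen are easy to check by hand with the first row's values:

```scala
// probability entries copied from the first prediction row above
val prob = Array(0.5015822116420338, 0.4984177883579662)

// the entries are per-class probabilities and sum to 1
val total = prob.sum

// the class with the largest probability is the predicted class (index 0 here,
// which IndexToString maps back to "Iris-versicolor")
val predictedIndex = prob.indexOf(prob.max)
```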

Finally, we can print the predictions: select picks the columns to output, collect gathers all rows, and foreach prints each row. Each printed line shows, in order, the row's true class and feature values, the predicted probability of each class, and the predicted class.

scala> lrPredictions.select("predictedLabel", "label", "features", "probability").collect().foreach { case Row(predictedLabel: String, label: String, features: Vector, prob: Vector) => println(s"($label, $features) --> prob=$prob, predicted Label=$predictedLabel")}
(Iris-versicolor, [5.6,3.0,4.5,1.5]) --> prob=[0.5015822116420338,0.4984177883579662], predicted Label=Iris-versicolor
(Iris-versicolor, [5.8,2.7,4.1,1.0]) --> prob=[0.6145953553933187,0.3854046446066813], predicted Label=Iris-versicolor
(Iris-versicolor, [6.0,2.2,4.0,1.0]) --> prob=[0.617611197013045,0.38238880298695493], predicted Label=Iris-versicolor
(Iris-versicolor, [6.6,3.0,4.4,1.4]) --> prob=[0.539815937374677,0.460184062625323], predicted Label=Iris-versicolor
(Iris-versicolor, [6.7,3.0,5.0,1.7]) --> prob=[0.47437817884138095,0.525621821158619], predicted Label=Iris-virginica
(Iris-virginica, [4.9,2.5,4.5,1.7]) --> prob=[0.44587895949586764,0.5541210405041324], predicted Label=Iris-virginica
(Iris-versicolor, [5.5,2.4,3.8,1.1]) --> prob=[0.5885624460899619,0.41143755391003817], predicted Label=Iris-versicolor
(Iris-versicolor, [5.6,3.0,4.1,1.3]) --> prob=[0.5462090274199968,0.45379097258000317], predicted Label=Iris-versicolor
(Iris-virginica, [5.7,2.5,5.0,2.0]) --> prob=[0.39296017326534144,0.6070398267346586], predicted Label=Iris-virginica
(Iris-versicolor, [5.7,2.9,4.2,1.3]) --> prob=[0.5477888111927275,0.45221118880727246], predicted Label=Iris-versicolor
(Iris-versicolor, [5.8,2.6,4.0,1.2]) --> prob=[0.5714168219439002,0.42858317805609975], predicted Label=Iris-versicolor
(Iris-versicolor, [5.8,2.7,3.9,1.2]) --> prob=[0.5714168219439002,0.42858317805609975], predicted Label=Iris-versicolor
(Iris-virginica, [5.8,2.7,5.1,1.9]) --> prob=[0.41605421498910905,0.583945785010891], predicted Label=Iris-virginica
(Iris-virginica, [5.9,3.0,5.1,1.8]) --> prob=[0.4395227923492292,0.5604772076507708], predicted Label=Iris-virginica
(Iris-versicolor, [6.0,2.7,5.1,1.6]) --> prob=[0.4855821629924292,0.5144178370075708], predicted Label=Iris-virginica
(Iris-versicolor, [6.0,2.9,4.5,1.5]) --> prob=[0.5079570167600491,0.4920429832399509], predicted Label=Iris-versicolor
(Iris-virginica, [6.1,2.6,5.6,1.4]) --> prob=[0.531887741930683,0.46811225806931694], predicted Label=Iris-versicolor
(Iris-versicolor, [6.1,3.0,4.6,1.4]) --> prob=[0.531887741930683,0.46811225806931694], predicted Label=Iris-versicolor
(Iris-virginica, [6.3,2.5,5.0,1.9]) --> prob=[0.4238190390913101,0.5761809609086899], predicted Label=Iris-virginica
(Iris-virginica, [6.3,2.9,5.6,1.8]) --> prob=[0.44581435344357156,0.5541856465564284], predicted Label=Iris-virginica
(Iris-virginica, [6.3,3.3,6.0,2.5]) --> prob=[0.30064595369512165,0.6993540463048784], predicted Label=Iris-virginica
(Iris-virginica, [6.4,3.1,5.5,1.8]) --> prob=[0.4473900414350662,0.5526099585649339], predicted Label=Iris-virginica
(Iris-virginica, [6.7,2.5,5.8,1.8]) --> prob=[0.45212332546733563,0.5478766745326644], predicted Label=Iris-virginica
(Iris-virginica, [6.7,3.0,5.2,2.3]) --> prob=[0.3453175901585396,0.6546824098414604], predicted Label=Iris-virginica
(Iris-versicolor, [6.7,3.1,4.7,1.5]) --> prob=[0.5191054573756626,0.4808945426243374], predicted Label=Iris-versicolor
(Iris-virginica, [6.8,3.2,5.9,2.3]) --> prob=[0.3467603323149756,0.6532396676850243], predicted Label=Iris-virginica
(Iris-virginica, [7.2,3.0,5.8,1.6]) --> prob=[0.5047044410242485,0.49529555897575156], predicted Label=Iris-versicolor
(Iris-virginica, [7.3,2.9,6.3,1.8]) --> prob=[0.4616150767129017,0.5383849232870982], predicted Label=Iris-virginica
(Iris-virginica, [7.7,2.6,6.9,2.3]) --> prob=[0.3598694101321466,0.6401305898678533], predicted Label=Iris-virginica
(Iris-virginica, [7.7,2.8,6.7,2.0]) --> prob=[0.4237551850416873,0.5762448149583126], predicted Label=Iris-virginica

4. Model evaluation

Create a MulticlassClassificationEvaluator instance and use its setters to configure the prediction column name and the true-label column name; then compute the accuracy and the test error. (Note that this evaluator's default metric is f1 rather than plain accuracy; call setMetricName("accuracy") if you want the latter explicitly.)

scala> val evaluator = new MulticlassClassificationEvaluator().setLabelCol("indexedLabel").setPredictionCol("prediction")
evaluator: org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator = mcEval_e27111142138

scala> val lrAccuracy = evaluator.evaluate(lrPredictions)
lrAccuracy: Double = 0.8666666666666667

scala> println("Test Error = " + (1.0 - lrAccuracy))
Test Error = 0.1333333333333333
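As a sanity check, accuracy here is simply the fraction of test rows whose predicted class index matches the true index. A minimal sketch of that computation over plain Scala collections (the label values below are hypothetical):

```scala
// accuracy = number of correct predictions / total predictions
def accuracy(labels: Seq[Double], preds: Seq[Double]): Double = {
  require(labels.nonEmpty && labels.length == preds.length)
  labels.zip(preds).count { case (l, p) => l == p }.toDouble / labels.length
}
```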

The accuracy above is about 86.7%. Next we can retrieve the trained logistic regression model itself. As noted earlier, lrPipelineModel is a PipelineModel, so we can get the model from its stages, as follows:

scala> val lrModel = lrPipelineModel.stages(2).asInstanceOf[LogisticRegressionModel]
lrModel: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_d5c473ad2b66

scala> println("Coefficients: " + lrModel.coefficients+"\n"+"Intercept: "+lrModel.intercept+"\n"+"numClasses: "+lrModel.numClasses+"\n"+"numFeatures: "+lrModel.numFeatures)
Coefficients: [-0.06375471660847784,0.0,0.0,0.08951809484634254]
Intercept: -0.09689292891729179
numClasses: 2
numFeatures: 4
  • Coefficients: the weight vector (W)
  • Intercept: the intercept (b)
  • numClasses: the number of classes
  • numFeatures: the number of features

Together these define the model's decision function, Wx + b.
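Using the coefficients and intercept printed above, the decision function can be evaluated directly. Note the model was fit on indexedFeatures (the VectorIndexer output), so this sketch only illustrates the form of the computation rather than reproducing a pipeline prediction:

```scala
// coefficients (W) and intercept (b) copied from the fitted model above
val w = Array(-0.06375471660847784, 0.0, 0.0, 0.08951809484634254)
val b = -0.09689292891729179

// raw score Wx + b for a feature vector x
def score(x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

// probability of class 1 via the logistic function
def probClass1(x: Array[Double]): Double = 1.0 / (1.0 + math.exp(-score(x)))
```

Because only the first and fourth coefficients are non-zero (an effect of the L1 part of the elastic-net penalty), the model effectively uses just two of the four features.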

5. Model summary

Spark's ML library also provides a summary of the model, though currently only for binomial logistic regression, and it must be explicitly cast to BinaryLogisticRegressionSummary. In the code below we first obtain the summary of the binomial model; then we get the value of the loss function at each of the 10 iterations and print it. The loss decreases steadily over the iterations, and the smaller the loss, the better the model. Next, we cast the summary to BinaryLogisticRegressionSummary to access the metrics used to evaluate model performance. The ROC lets us judge the model: areaUnderROC reaches 0.9848856209150327, so our classifier performs well. Finally, we choose the best threshold by maximizing fMeasure, a metric that combines precision and recall; maximizing it selects the most suitable classification threshold.

scala> val trainingSummary = lrModel.summary
trainingSummary: org.apache.spark.ml.classification.LogisticRegressionTrainingSummary = org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary@20e7a10

scala> val objectiveHistory = trainingSummary.objectiveHistory
objectiveHistory: Array[Double] = Array(0.6927389617440809, 0.6899308387407462, 0.6865569217265758, 0.6736046289373718, 0.6694299346003736, 0.6681137586393369, 0.6679359916200565, 0.6673276439679328, 0.6668085232695621, 0.6665822228169366, 0.6663915835564641)

scala> objectiveHistory.foreach(loss => println(loss))
0.6927389617440809
0.6899308387407462
0.6865569217265758
0.6736046289373718
0.6694299346003736
0.6681137586393369
0.6679359916200565
0.6673276439679328
0.6668085232695621
0.6665822228169366
0.6663915835564641

scala> val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary]
binarySummary: org.apache.spark.ml.classification.BinaryLogisticRegressionSummary = org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary@20e7a10

scala> println(s"areaUnderROC: ${binarySummary.areaUnderROC}")
areaUnderROC: 0.9848856209150327

scala> val fMeasure = binarySummary.fMeasureByThreshold
fMeasure: org.apache.spark.sql.DataFrame = [threshold: double, F-Measure: double]

scala> val maxFMeasure = fMeasure.select(functions.max("F-Measure")).head().getDouble(0)
maxFMeasure: Double = 0.955223880597015

scala> val bestThreshold = fMeasure.where($"F-Measure" === maxFMeasure).select("threshold").head().getDouble(0)
bestThreshold: Double = 0.5399690045423275

scala> lrModel.setThreshold(bestThreshold)
res17: lrModel.type = logreg_d5c473ad2b66
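The threshold selection above can be mirrored on plain collections: fMeasureByThreshold yields (threshold, F-measure) pairs, and we pick the threshold whose F-measure is largest. A minimal sketch with hypothetical values:

```scala
// F-measure (F1) is the harmonic mean of precision and recall
def f1(precision: Double, recall: Double): Double =
  2.0 * precision * recall / (precision + recall)

// hypothetical (threshold, F-measure) pairs, shaped like fMeasureByThreshold rows
val byThreshold = Seq((0.3, 0.90), (0.54, 0.955), (0.7, 0.88))

// the best threshold is the one that maximizes the F-measure
val bestT = byThreshold.maxBy(_._2)._1
```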

Multinomial Logistic Regression for a Binary Classification Problem

For a binary classification problem we can also use multinomial logistic regression. The procedure is much the same as with binomial logistic regression, except that when configuring the model the family parameter is set to multinomial; here we only list the results:

scala> val mlr = new LogisticRegression().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8).setFamily("multinomial")
mlr: org.apache.spark.ml.classification.LogisticRegression = logreg_82bf612d153e

scala> val mlrPipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, mlr, labelConverter))
mlrPipeline: org.apache.spark.ml.Pipeline = pipeline_016e81dbca1f

scala> val mlrPipelineModel = mlrPipeline.fit(trainingData)
mlrPipelineModel: org.apache.spark.ml.PipelineModel = pipeline_016e81dbca1f

scala> val mlrPredictions = mlrPipelineModel.transform(testData)
mlrPredictions: org.apache.spark.sql.DataFrame = [features: vector, label: string ... 6 more fields]

scala> mlrPredictions.select("predictedLabel", "label", "features", "probability").collect().foreach { case Row(predictedLabel: String, label: String, features: Vector, prob: Vector) => println(s"($label, $features) --> prob=$prob, predictedLabel=$predictedLabel")}
(Iris-virginica, [4.9,2.5,4.5,1.7]) --> prob=[0.4706991400166566,0.5293008599833434], predictedLabel=Iris-virginica
(Iris-versicolor, [5.1,2.5,3.0,1.1]) --> prob=[0.6123754219240134,0.38762457807598644], predictedLabel=Iris-versicolor
(Iris-versicolor, [5.5,2.3,4.0,1.3]) --> prob=[0.5724859784244956,0.42751402157550444], predictedLabel=Iris-versicolor
(Iris-versicolor, [5.5,2.4,3.8,1.1]) --> prob=[0.617700896993959,0.3822991030060409], predictedLabel=Iris-versicolor
(Iris-virginica, [5.8,2.7,5.1,1.9]) --> prob=[0.43670908827255583,0.563290911727444], predictedLabel=Iris-virginica
(Iris-versicolor, [5.9,3.0,4.2,1.5]) --> prob=[0.5316312191190347,0.4683687808809653], predictedLabel=Iris-versicolor
(Iris-versicolor, [6.1,3.0,4.6,1.4]) --> prob=[0.5577018837203559,0.44229811627964416], predictedLabel=Iris-versicolor
(Iris-virginica, [6.2,3.4,5.4,2.3]) --> prob=[0.3525986631597158,0.6474013368402842], predictedLabel=Iris-virginica
(Iris-versicolor, [6.3,2.3,4.4,1.3]) --> prob=[0.5834583948072782,0.4165416051927219], predictedLabel=Iris-versicolor
(Iris-versicolor, [6.3,3.3,4.7,1.6]) --> prob=[0.5138182157242249,0.4861817842757751], predictedLabel=Iris-versicolor
(Iris-virginica, [6.3,3.3,6.0,2.5]) --> prob=[0.31220890350252506,0.6877910964974749], predictedLabel=Iris-virginica
(Iris-virginica, [6.3,3.4,5.6,2.4]) --> prob=[0.33271906258823314,0.6672809374117669], predictedLabel=Iris-virginica
(Iris-virginica, [6.4,2.8,5.6,2.2]) --> prob=[0.3769557902496345,0.6230442097503656], predictedLabel=Iris-virginica
(Iris-versicolor, [6.4,3.2,4.5,1.5]) --> prob=[0.5386254134782479,0.461374586521752], predictedLabel=Iris-versicolor
(Iris-versicolor, [6.5,2.8,4.6,1.5]) --> prob=[0.5400225182386277,0.4599774817613724], predictedLabel=Iris-versicolor
(Iris-virginica, [6.5,3.0,5.5,1.8]) --> prob=[0.4697204592971098,0.5302795407028901], predictedLabel=Iris-virginica
(Iris-virginica, [6.5,3.2,5.1,2.0]) --> prob=[0.42334262258749805,0.576657377412502], predictedLabel=Iris-virginica
(Iris-versicolor, [6.6,2.9,4.6,1.3]) --> prob=[0.587552435510627,0.412447564489373], predictedLabel=Iris-versicolor
(Iris-virginica, [6.7,3.0,5.2,2.3]) --> prob=[0.35904307162043886,0.6409569283795613], predictedLabel=Iris-virginica
(Iris-virginica, [6.7,3.3,5.7,2.1]) --> prob=[0.4033033147932801,0.5966966852067198], predictedLabel=Iris-virginica
(Iris-virginica, [6.8,3.0,5.5,2.1]) --> prob=[0.40465727034257987,0.5953427296574202], predictedLabel=Iris-virginica
(Iris-virginica, [6.8,3.2,5.9,2.3]) --> prob=[0.36033816936772717,0.6396618306322728], predictedLabel=Iris-virginica
(Iris-versicolor, [6.9,3.1,4.9,1.5]) --> prob=[0.5456044340218952,0.4543955659781048], predictedLabel=Iris-versicolor
(Iris-virginica, [6.9,3.2,5.7,2.3]) --> prob=[0.361635302911029,0.638364697088971], predictedLabel=Iris-virginica
(Iris-virginica, [7.2,3.2,6.0,1.8]) --> prob=[0.47953540973471454,0.5204645902652856], predictedLabel=Iris-virginica
(Iris-virginica, [7.2,3.6,6.1,2.5]) --> prob=[0.3231782795184636,0.6768217204815363], predictedLabel=Iris-virginica
(Iris-virginica, [7.3,2.9,6.3,1.8]) --> prob=[0.48093901384053533,0.5190609861594646], predictedLabel=Iris-virginica
(Iris-virginica, [7.4,2.8,6.1,1.9]) --> prob=[0.4589531699542302,0.5410468300457698], predictedLabel=Iris-virginica
(Iris-virginica, [7.7,2.8,6.7,2.0]) --> prob=[0.43989505147330155,0.5601049485266986], predictedLabel=Iris-virginica

scala> val mlrAccuracy = evaluator.evaluate(mlrPredictions)
mlrAccuracy: Double = 1.0

scala> println("Test Error = " + (1.0 - mlrAccuracy))
Test Error = 0.0

scala> val mlrModel = mlrPipelineModel.stages(2).asInstanceOf[LogisticRegressionModel]
mlrModel: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_82bf612d153e

scala> println("Multinomial coefficients: " + mlrModel.coefficientMatrix+"Multinomial intercepts: "+mlrModel.interceptVector+"numClasses: "+mlrModel.numClasses+"numFeatures: "+mlrModel.numFeatures)
Multinomial coefficients: 0.028116025552572706   0.0  0.0  -0.046949976003541706
-0.028116025552572695  0.0  0.0  0.046949976003541706
Multinomial intercepts: [0.13221236569525302,-0.13221236569525302]
numClasses: 2
numFeatures: 4

Multinomial Logistic Regression for a Multi-class Classification Problem

For a multi-class problem we need multinomial logistic regression for the analysis. Here we use the full iris dataset, i.e. all three classes. The procedure is essentially the same as above, so again we only list the results:
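Internally, the multinomial model produces one raw score per class and converts them into the per-class probabilities shown below using the softmax function. A minimal pure-Scala sketch (the score values are hypothetical):

```scala
// softmax: turns one raw score per class into class probabilities
def softmax(scores: Array[Double]): Array[Double] = {
  val m = scores.max                      // shift by the max for numerical stability
  val exps = scores.map(s => math.exp(s - m))
  val total = exps.sum
  exps.map(_ / total)
}

// the predicted class is the index with the largest probability
def predictClass(scores: Array[Double]): Int = {
  val p = softmax(scores)
  p.indexOf(p.max)
}
```

With two classes, softmax reduces to the logistic function applied to the score difference, which is why the binary case above is a special case of the multinomial one.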

scala> mlrPredictions.select("predictedLabel", "label", "features", "probability").collect().foreach { case Row(predictedLabel: String, label: String, features:Vector, prob: Vector) => println(s"($label, $features) --> prob=$prob, predictedLabel=$predictedLabel")}
(Iris-setosa, [4.3,3.0,1.1,0.1]) --> prob=[0.49856067730476944,0.2623440805400292,0.23909524215520148], predictedLabel=Iris-setosa
(Iris-setosa, [4.4,2.9,1.4,0.2]) --> prob=[0.46571089790971687,0.277891570222724,0.25639753186755915], predictedLabel=Iris-setosa
(Iris-setosa, [4.6,3.4,1.4,0.3]) --> prob=[0.5001101367665973,0.25928904940719977,0.24060081382620296], predictedLabel=Iris-setosa
(Iris-setosa, [4.6,3.6,1.0,0.2]) --> prob=[0.5463459284110406,0.236823238870237,0.2168308327187224], predictedLabel=Iris-setosa
(Iris-setosa, [4.7,3.2,1.6,0.2]) --> prob=[0.48370179709200706,0.2689591735381297,0.24733902936986318], predictedLabel=Iris-setosa
(Iris-setosa, [4.8,3.0,1.4,0.1]) --> prob=[0.4851576852808171,0.2693562861247639,0.24548602859441893], predictedLabel=Iris-setosa
(Iris-setosa, [4.9,3.0,1.4,0.2]) --> prob=[0.47467118791268154,0.2733753207454508,0.2519534913418676], predictedLabel=Iris-setosa
(Iris-setosa, [4.9,3.1,1.5,0.1]) --> prob=[0.48967732688779486,0.267131626783035,0.2431910463291701], predictedLabel=Iris-setosa
(Iris-versicolor, [5.0,2.3,3.3,1.0]) --> prob=[0.26303122888674907,0.36560215832179155,0.3713666127914594], predictedLabel=Iris-virginica
(Iris-setosa, [5.0,3.5,1.3,0.3]) --> prob=[0.5135688079539743,0.2524416257183621,0.2339895663276636], predictedLabel=Iris-setosa
(Iris-setosa, [5.0,3.5,1.6,0.6]) --> prob=[0.4686356517088239,0.2713034457686629,0.26006090252251307], predictedLabel=Iris-setosa
(Iris-setosa, [5.1,3.5,1.4,0.3]) --> prob=[0.5091020180722664,0.25475974124614675,0.23613824068158687], predictedLabel=Iris-setosa
(Iris-setosa, [5.1,3.8,1.5,0.3]) --> prob=[0.531570061574297,0.24348517949467904,0.22494475893102403], predictedLabel=Iris-setosa
(Iris-setosa, [5.1,3.8,1.9,0.4]) --> prob=[0.503222274154322,0.25683175058110785,0.23994597526457007], predictedLabel=Iris-setosa
(Iris-setosa, [5.2,3.5,1.5,0.2]) --> prob=[0.5151370941776632,0.2529823490495923,0.2318805567727446], predictedLabel=Iris-setosa
(Iris-setosa, [5.3,3.7,1.5,0.2]) --> prob=[0.5330773525753305,0.24387796024925384,0.22304468717541576], predictedLabel=Iris-setosa
(Iris-versicolor, [5.4,3.0,4.5,1.5]) --> prob=[0.2306542447600023,0.372383222489962,0.39696253275003573], predictedLabel=Iris-virginica
(Iris-setosa, [5.4,3.9,1.3,0.4]) --> prob=[0.5389512877303541,0.23848657002728416,0.2225621422423618], predictedLabel=Iris-setosa
(Iris-versicolor, [5.5,2.4,3.7,1.0]) --> prob=[0.25620601559263473,0.36919246180632764,0.37460152260103763], predictedLabel=Iris-virginica
(Iris-setosa, [5.5,3.5,1.3,0.2]) --> prob=[0.5240613549472979,0.24832602160956213,0.22761262344314004], predictedLabel=Iris-setosa
(Iris-setosa, [5.5,4.2,1.4,0.2]) --> prob=[0.5818115053858839,0.21899706180633755,0.19919143280777854], predictedLabel=Iris-setosa
(Iris-versicolor, [5.6,2.5,3.9,1.1]) --> prob=[0.24827164138938784,0.3712338899987297,0.38049446861188246], predictedLabel=Iris-virginica
(Iris-versicolor, [5.6,2.7,4.2,1.3]) --> prob=[0.23609842674482123,0.3733910806218104,0.39051049263336834], predictedLabel=Iris-virginica
(Iris-virginica, [5.6,2.8,4.9,2.0]) --> prob=[0.17353784667372726,0.38803750951559646,0.43842464381067625], predictedLabel=Iris-virginica
(Iris-versicolor, [5.6,2.9,3.6,1.3]) --> prob=[0.26994082035183004,0.35725015822484213,0.37280902142332784], predictedLabel=Iris-virginica
(Iris-setosa, [5.7,4.4,1.5,0.4]) --> prob=[0.5744990088621882,0.22068271118182198,0.20481827995598978], predictedLabel=Iris-setosa
(Iris-virginica, [5.8,2.8,5.1,2.4]) --> prob=[0.14589555459093273,0.39150544114527663,0.4625990042637906], predictedLabel=Iris-virginica
(Iris-virginica, [5.9,3.0,5.1,1.8]) --> prob=[0.19164845952411863,0.38448782728830505,0.42386371318757643], predictedLabel=Iris-virginica
(Iris-versicolor, [6.0,2.2,4.0,1.0]) --> prob=[0.23300779791940326,0.3802856918956981,0.3867065101848985], predictedLabel=Iris-virginica
(Iris-versicolor, [6.0,2.7,5.1,1.6]) --> prob=[0.18810463050749873,0.3900406691963187,0.42185470029618244], predictedLabel=Iris-virginica
(Iris-versicolor, [6.1,2.8,4.0,1.3]) --> prob=[0.24928433400912278,0.3671520807495573,0.3835635852413199], predictedLabel=Iris-virginica
(Iris-versicolor, [6.2,2.2,4.5,1.5]) --> prob=[0.18351550396686533,0.3934066675024647,0.42307782853066994], predictedLabel=Iris-virginica
(Iris-virginica, [6.2,2.8,4.8,1.8]) --> prob=[0.1888126898204262,0.38539188363903437,0.4257954265405395], predictedLabel=Iris-virginica
(Iris-versicolor, [6.2,2.9,4.3,1.3]) --> prob=[0.24600050420847877,0.3689652108789115,0.38503428491260977], predictedLabel=Iris-virginica
(Iris-virginica, [6.2,3.4,5.4,2.3]) --> prob=[0.17337730890542696,0.3825617039174212,0.44406098717715176], predictedLabel=Iris-virginica
(Iris-virginica, [6.3,2.9,5.6,1.8]) --> prob=[0.1729681423511942,0.3931462837297906,0.4338855739190153], predictedLabel=Iris-virginica
(Iris-virginica, [6.4,3.1,5.5,1.8]) --> prob=[0.18621090846131505,0.3872972795834499,0.42649181195523495], predictedLabel=Iris-virginica
(Iris-versicolor, [6.6,2.9,4.6,1.3]) --> prob=[0.23618909578565045,0.373766365784125,0.3900445384302246], predictedLabel=Iris-virginica
(Iris-virginica, [6.7,3.3,5.7,2.5]) --> prob=[0.1496994275680708,0.38855932284425526,0.4617412495876739], predictedLabel=Iris-virginica
(Iris-virginica, [6.8,3.0,5.5,2.1]) --> prob=[0.16265889090899283,0.39126984184915486,0.4460712672418523], predictedLabel=Iris-virginica
(Iris-virginica, [7.2,3.2,6.0,1.8]) --> prob=[0.1782593898810351,0.3913068582491216,0.43043375186984334], predictedLabel=Iris-virginica
(Iris-virginica, [7.7,2.6,6.9,2.3]) --> prob=[0.10733085394350968,0.41117706558989164,0.4814920804665987], predictedLabel=Iris-virginica
(Iris-virginica, [7.7,3.8,6.7,2.2]) --> prob=[0.16693678799079806,0.38877323991855633,0.44428997209064564], predictedLabel=Iris-virginica
(Iris-virginica, [7.9,3.8,6.4,2.0]) --> prob=[0.18714592916724979,0.3838745095632083,0.42897956126954184], predictedLabel=Iris-virginica

scala> val mlrAccuracy = evaluator.evaluate(mlrPredictions)
mlrAccuracy: Double = 0.6339712918660287

scala> println("Test Error = " + (1.0 - mlrAccuracy))
Test Error = 0.36602870813397126

scala> val mlrModel = mlrPipelineModel.stages(2).asInstanceOf[LogisticRegressionModel]
mlrModel: org.apache.spark.ml.classification.LogisticRegressionModel = logreg_9661a4f56149

scala> println("Multinomial coefficients: " + mlrModel.coefficientMatrix + "\nMultinomial intercepts: " + mlrModel.interceptVector + "\nnumClasses: " + mlrModel.numClasses + "\nnumFeatures: " + mlrModel.numFeatures)
Multinomial coefficients: 0.0  0.35442627664118775    -0.1787646656602406  -0.36662299325180614
0.0  0.0                    0.0                  0.0
0.0  -0.010992364266212548  0.0                  0.11193811404312962
Multinomial intercepts: [-0.10160079218819881,0.0863062310816332,0.01529456110656562]
numClasses: 3
numFeatures: 4
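The printed coefficientMatrix (numClasses × numFeatures) and interceptVector are all the model needs at prediction time: for each class k it computes a raw score z_k = w_k·x + b_k and then applies softmax to obtain the probability vector shown in the prob=[…] columns above. The following is a minimal sketch in plain Scala (no Spark dependency) using the coefficients printed above; the sample feature vector is illustrative only, since the pipeline's StringIndexer/VectorIndexer stages also transform the columns before they reach the model, so the raw iris values will not reproduce the exact probabilities listed earlier.

```scala
// Sketch: multinomial logistic regression prediction by hand, using the
// coefficientMatrix and interceptVector printed by the trained mlrModel.
object SoftmaxSketch {
  // Rows = classes, columns = features (copied from the output above).
  val coefficients: Array[Array[Double]] = Array(
    Array(0.0, 0.35442627664118775, -0.1787646656602406, -0.36662299325180614),
    Array(0.0, 0.0, 0.0, 0.0),
    Array(0.0, -0.010992364266212548, 0.0, 0.11193811404312962)
  )
  val intercepts: Array[Double] =
    Array(-0.10160079218819881, 0.0863062310816332, 0.01529456110656562)

  def predictProba(x: Array[Double]): Array[Double] = {
    // Raw score z_k = w_k . x + b_k for each class k.
    val z = coefficients.zip(intercepts).map { case (w, b) =>
      w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
    }
    // Numerically stable softmax: subtract the max before exponentiating.
    val m = z.max
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  def main(args: Array[String]): Unit = {
    val p = predictProba(Array(5.4, 3.0, 4.5, 1.5)) // illustrative input
    println(p.mkString("[", ",", "]")) // probabilities, summing to 1
    println(p.indexOf(p.max))          // index of the predicted class
  }
}
```

The predicted label is simply the class with the largest probability, which is what the predictedLabel column reports after IndexToString maps the class index back to its string label.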
