Quickly Mastering Serialization Tools: Protobuf and Boost.Serialization

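Table of Contents

- 0. Introduction
- 1. Introduction to Serialization/Deserialization
  - 1.1 Concepts
  - 1.2 Purpose
- 2. Common Serialization Tools
  - 2.1 Protobuf
    - 2.1.1 Usage
  - 2.2 Boost.Serialization
    - 2.2.1 Usage
- 3. Analysis of Suitable Scenarios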

0. Introduction

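Serialization is the conversion of data between its in-memory structure and a format suitable for transmission or storage. It is widely used in communication protocols and frameworks. This article explains the concepts of serialization and deserialization and introduces two commonly used tools. It serves as background for later articles on related frameworks.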

1. Introduction to Serialization/Deserialization

1.1 Concepts

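Serialization converts an object into a contiguous sequence of bytes (binary) or text. Deserialization converts such a binary or text sequence back into an object in memory.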

1.2 Purpose

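1) Persisting objects and restoring them later
2) Transmitting objects over the network and restoring them on the other side
3) Passing objects across processes or languages and restoring them there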

2. Common Serialization Tools

Common serialization tools include Protobuf (Google Protocol Buffers) and Boost's Serialization library. Both are introduced and compared below.

2.1 Protobuf

The official description of Protobuf reads:

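Protocol Buffers are a language-neutral, platform-neutral, extensible way of serializing structured data, for use in communication protocols, data storage, and more.
Protocol Buffers are a flexible, efficient, automated mechanism for serializing structured data. They are comparable to XML, but smaller (3 to 10 times), faster (20 to 100 times), and simpler.
You define how you want your data to be structured once. Specially generated source code then lets you easily write and read that structured data to and from a variety of data streams, in a variety of languages. You can even update your data structure without breaking deployed programs that were compiled against the old structure.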

2.1.1 Usage

1) Write a .proto file that defines the data structure

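message xxx {
  // Field rule: required -> the field must appear exactly once (proto2 only)
  // Field rule: optional -> the field may appear 0 or 1 times
  // Field rule: repeated -> the field may appear any number of times (including 0)
  // Types: int32, int64, sint32, sint64, string, fixed32 (32-bit), ...
  // Field numbers: 1 ~ 536870911, excluding 19000 ~ 19999 (reserved)
  field_rule type name = field_number;
}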

Let's define a simple file, example.proto:

syntax = "proto2";

message example {
  repeated int32 repeatedInt32Val = 4;
  repeated string repeatedStringVal = 5;
}

2) Generate the read/write interface from the .proto file

// $SRC_DIR: source directory containing the .proto file
// --cpp_out: generate C++ code
// $DST_DIR: target directory for the generated code
// xxx.proto: the .proto file to generate interface code for
protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/xxx.proto

// For our example, run:
protoc -I=. --cpp_out=. ./example.proto
// Generated files:
example.pb.cc  example.pb.h

3) Usage

#include "example.pb.h"
#include <fstream>
#include <iostream>
#include <string>

int main() {
    example a;
    // Add two elements to each repeated field (accessor names are lowercased by protoc)
    a.add_repeatedint32val(2);
    a.add_repeatedint32val(3);
    a.add_repeatedstringval("repeated1");
    a.add_repeatedstringval("repeated2");

    // Serialize the message in binary form into the file "result"
    std::string filename = "result";
    std::fstream output(filename, std::ios::out | std::ios::trunc | std::ios::binary);
    if (!a.SerializeToOstream(&output)) {
        std::cerr << "Failed to write " << filename << std::endl;
        return -1;
    }
    return 0;
}

// Compile (assuming the code above is saved as test.cpp)
g++ -I. ./example.pb.cc ./test.cpp -o test -lprotobuf

2.2 Boost.Serialization

The Boost website describes it as follows:

Here, we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context.
Depending on the context, this might be used to implement object persistence, remote parameter passing or other facility.
In this system we use the term "archive" to refer to a specific rendering of this stream of bytes.
This could be a file of binary data, text data, XML, or some other created by the user of this library.

2.2.1 Usage

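Boost.Serialization supports several archive formats (text, binary, XML, etc.). Using the text format to serialize an int as an example, usage is quite simple: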

// Link with -lboost_serialization
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <iostream>
#include <fstream>

void save()
{
    std::ofstream file("archiv.txt");
    boost::archive::text_oarchive oa(file);
    int i = 1;
    oa << i;   // write i into the text archive
}   // oa is destroyed before file, so the archive is complete on disk

void load()
{
    std::ifstream file("archiv.txt");
    boost::archive::text_iarchive ia(file);
    int i = 0;
    ia >> i;   // read i back from the archive
    std::cout << i << std::endl;   // prints 1
}

int main()
{
    save();
    load();
}

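Boost.Serialization is very powerful. It can serialize pointers by serializing the whole object they point to, and deserialization then recreates that object and restores the pointer to it. For user-defined classes there are two approaches, intrusive and non-intrusive:
1) Intrusive (a serialization function is defined inside the class)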

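#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/access.hpp>
#include <sstream>
#include <string>

class A
{
private:
    friend class boost::serialization::access;
    // Called by Boost both when saving and when loading
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & _number;
    }
public:
    A() : _number(0.0f) {}
    explicit A(float number) : _number(number) {}
private:
    float _number;
};

void TestArchive3()
{
    A a1(1.2f);
    std::ostringstream os;
    {
        boost::archive::binary_oarchive oa(os);
        oa << a1;   // serialize into an ostringstream
    }   // destroy the archive so that all data has been written to os
    std::string content = os.str();   // content holds the serialized data

    A a2;
    std::istringstream is(content);
    boost::archive::binary_iarchive ia(is);
    ia >> a2;   // deserialize from the string, recovering the original object
}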

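2) Non-intrusive (no serialization function is defined inside the class; the free function must be able to access the members being serialized, e.g. because they are public)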

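namespace boost {
namespace serialization {

// Free function picked up by Boost for type A; the class itself is unchanged.
// It is not a friend, so A::_number must be public here (unlike the intrusive version).
template<class Archive>
void serialize(Archive& ar, A& a, const unsigned int version)
{
    ar & a._number;
}

} // namespace serialization
} // namespace boost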

3. Analysis of Suitable Scenarios

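As shown above, Protobuf generates code from a schema (.proto) file and supports many languages. This makes it well suited to cross-platform, cross-language development; gRPC, for example, uses Protobuf for serialization. Boost.Serialization can only be used from C++, but it handles complex C++ types (such as pointers and STL containers) better. It is therefore a better fit for inter-process communication and for persisting and restoring data within C++ systems. Many common distributed databases use this approach for communication between cluster nodes and standalone nodes.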