string 的模拟实现

1. 模拟 string 的成员属性

2. constructor

3. destructor

4. size() and operator[]

5. iterator

6. copy constructor 和 operator=

6.1. 为什么用户必须实现字节拷贝构造和赋值

6.2. 普通实现

6.3. 优化实现

7. push_back() and reserve()

7.1. reserve

7.2. push_back

8. append()

9. operator+=

10. insert

10.1. insert (size_t pos, char ch)

10.2. insert(size_t pos, const char* str)

11. erase()

10. find()

11. substr()

12. resize()

13. operator>> 和 operator<<

13.1. operator<<(流插入运算符重载)

13.2. operator>>(流提取运算符重载)

14. string 的简单模拟实现

15. string 的一些例题

15.1. 字符串相加

15.2. 字符串最后一个单词的长度

15.3. 字符串中的第一个唯一字符

15.4. 验证回文串

1. 模拟 string 的成员属性

namespace Xq
{class string{public:// ... 函数实现private:char* _str;         // 存储字符串的堆空间size_t _capacity;   // string 对象的容量size_t _size;       // string 对象的实际大小};
}

2. constructor

第一个版本：

string(const char* str = ""): _str(new char[strlen(str)+1]), _size(strlen(str)), _capacity(strlen(str))
{strcpy(_str, str);
}

缺点：多次重复 strlen()。

第二个版本：

string(const char* str = ""):_size(strlen(str)), _str(new char[_size+1]), _capacity(_size)
{strcpy(_str, str);
}

虽然这种版本看似可以运行，但实际上，代码维护有风险，因为初始化列表的初始化顺序是由成员属性声明顺序决定的，如果不匹配可能有崩溃的风险，不推荐。

第三个版本：

string(const char* str = "")
{_size = strlen(str);_str = new char[_size + 1];_capacity = _size;strcpy(_str, str);
}

在这里，建议不在初始化列表显式初始化，在函数内部更好点

3. destructor

~string()
{if (_str) delete[] _str;_capacity = _size = 0;
}

4. size() and operator[]

size () 实现如下：

//size()只需要满足用户读的需求就可以了，因此在这里将权限放低，让const对象和非const对象可以同时调用
const size_t size() const
{return _size;
}

operator[] 实现如下：

//对于非const对象，用户需要得到可读可写的权限
char& operator[](size_t pos)
{assert(pos < _size);return _str[pos];
}
//对于const对象，用户只需要得到读的权限
const char& operator[](size_t pos) const
{assert(pos < _size);return _str[pos];
}

5. iterator

我们知道 string 本质上就是 basic_string<char>，所存储的元素是连续的，故 iterator 就是原生指针。

typedef char* iterator;
typedef const char* const_iterator;// 返回第一个元素的位置
iterator begin() 
{return _str;
}// 返回最后一个有效元素的下一个位置
iterator end()
{return _str + _size;
}const_iterator begin() const
{return _str;
}const_iterator end() const
{return _str + _size;
}

6. copy constructor 和 operator=

6.1. 为什么用户必须实现字节拷贝构造和赋值

我们知道，对于basic_string<char>来说，如果我们没有显示实现 copy constructor 和 operator= 那么编译器就会默认生成 copy constructor 和 operator=，编译器默认生成的这两个成员函数会对内置类型和自定义类型都处理，对内置类型按照字节序的方式进行处理，对自定义类型会去调用它的 copy constructor 和 operator=，在这里，编译器默认生成的不符合我们的需求，因为它会带来两个问题：

其中一个对象发生修改，另一个对象也会随之改变；
当这两个对象生命周期结束时，会调用析构函数，同一空间被析构两次进程crash掉；

其中一个对象发生修改，另一个对象也会随之改变，如下：

void Test()
{Xq::string str1("haha");Xq::string str2("hehe"); std::cout << "str1: " << str1.c_str() << std::endl;std::cout << "str2: " << str2.c_str() << std::endl;str1 = str2;   // 这里不会崩是因为, 析构还没实现for (size_t i = 0; i < str1.size(); ++i){str1[i] = 'x';}std::cout << "str1: " << str1.c_str() << std::endl;std::cout << "str2: " << str2.c_str() << std::endl;}

现象如下：

如果实现析构，那么同一空间被析构两次，进程自然会崩溃，如下：

很显然，编译器默认生成的copy constructor 和 operator= 不符合用户的需求，因此用户必须自己实现对应的 copy constructor 和 operator=；

6.2. 普通实现

拷贝构造函数的实现如下：

string(const string& copy)// 1.注意需要多开一个空间('\0'):_str(new char[copy._size+1]), _size(copy._size), _capacity(copy._capacity)
{//strcpy(_str, copy._str);// 2.依次赋值for (size_t i = 0; i <= copy._size; ++i){_str[i] = copy._str[i];}
}

赋值实现如下：

string& operator=(const string& copy)
{// 1.判断是否给自己赋值if (this != &copy){// 2.先开空间(含'\0')char* tmp = new char[copy._size + 1];delete[] _str;_str = tmp;// 3.依次赋值for (size_t i = 0; i <= copy._size; ++i){_str[i] = copy._str[i];}_size = copy._size;_capacity = copy._capacity;}// 4.返回对象return *this;
}

6.3. 优化实现

优化实现需要一个 string::swap (string 的成员函数) ，实现如下：

void swap(string& tmp)
{//1.调用标准库里的swapstd::swap(_str, tmp._str);std::swap(_size, tmp._size);std::swap(_capacity, tmp._capacity);
}

copy constructor 实现如下：

string(const string& copy)// 1. 注意这里需要初始化, 如果不初始化, // 交换后这个临时对象会调用析构, 非法访问, 其行为未定义:_str(nullptr), _size(0), _capacity(0)
{// 2. 生成一个临时对象string tmp(copy._str);// 3. 返回这个临时对象swap(tmp);
}

赋值实现如下：

// operator= 的两种优化实现:// 第一种: 生成一个临时对象
string& operator=(const string& copy)
{// 检测是否给自己赋值if (this != &copy){//string tmp(copy._str); // constructorstring tmp(copy); // copy constructorswap(tmp): }return *this;
}// 第二种: 利用传值传参会拷贝构造的机制
string& operator=(string copy)
{swap(copy);return *this;
}

7. push_back() and reserve()

7.1. reserve

void reserve(size_t new_capacity = 0)
{// 1. 只考虑增容if (new_capacity > _capacity){// 2. 先开新空间(包含'\0')char* tmp = new char[new_capacity + 1];// 3. 依次赋值for (size_t i = 0; i <= _size; ++i){tmp[i] = _str[i];}// 4. 释放旧空间delete[] _str;// 5. 更新新空间和新容量// size 没变, 因此不用更新_str = tmp;_capacity = new_capacity;}
}

7.2. push_back

void push_back(char ch)
{// 1. 检查容量if (_capacity == _size){// 2. 处理空对象 _capacity == 0 的特殊情况 reserve(_capacity == 0 ? 4 : 2 * _capacity);}// 3. push 'ch'_str[_size] = ch;++_size;// 4. 注意最后一定要更新'\0'_str[_size] = '\0';
}

8. append()

string& append(const char* str)
{// 1. 检查容量size_t len = strlen(str);size_t j = 0;if (_size + len > _capacity){reserve(_size + len + 1);}// 2. 依次添加每个字符(包含'\0')for (size_t i = _size; i <= _size + len; ++i){_str[i] = str[j++];}// 3. 更新_size_size += len;// 4. 返回该对象return *this;
}

9. operator+=

实现思路：复用 push_back 和 append

string& operator+=(char ch)
{push_back(ch);return *this;
}string& operator+=(const char* str)
{append(str);return *this;
}

10. insert

10.1. insert (size_t pos, char ch)

string& insert(size_t pos, char ch)
{// 1. 特殊处理(pos == 0)int head = pos;if (pos == 0){head = 0;}// 2. 检查容量if (_capacity == _size){reserve(_capacity == 0 ? 4 : 2 * _capacity);}// 在这里进行强转 (size_t 不能小于0)assert(head <= (int)_size);int end = _size;while (end >= head){_str[end + 1] = _str[end];--end;}_str[head] = ch;++_size;return *this;
}

上面是通过强转实现，也可以采用下面的方式：

string& insert(size_t pos, char ch)
{	// 1. 检查容量if (_capacity == _size){reserve(_capacity == 0 ? 4 : 2 * _capacity);}// 2. pos 必须是合法位置assert(pos <= _size);// 3. 找到 '\0' 的下一个位置size_t end = _size + 1;// 4. 依次移动while (end > pos){_str[end] = _str[end - 1];  --end;}// 5. insert ch_str[pos] = ch;++_size;// 6. 返回对象return *this;
}

10.2. insert(size_t pos, const char* str)

string& insert(size_t pos, const char* str)
{assert(pos <= _size);size_t len = strlen(str);// 1. 检查容量if (_size + len > _capacity){reserve(_size + len + 1);}// 2. 移动数据size_t end = _size + len;while (end >= pos + len){_str[end] = _str[end - len];--end;}// 3. 插入数据for (size_t j = 0; j < len; ++j){_str[pos++] = str[j];++_size;}// 4. 返回对象return *this;
}

11. erase()

string& erase(size_t pos, size_t len = npos)
{// 1. 从pos删除到_str的结尾if (len == npos || pos + len >= _size){_size = pos;_str[_size] = '\0';}// 2. 从pos位置开始删除len个字符else{size_t begin = pos + len;size_t str_len = strlen(_str);for (size_t i = begin; i < str_len; i++){_str[pos] = _str[i];++pos;}_size = pos;_str[_size] = '\0';}// 3. 返回对象return *this;
}

10. find()

size_t find(char ch, size_t pos = 0)
{assert(pos < _size);for (size_t i = 0; i < _size; ++i){if (ch == _str[i])return i;}return npos;
}size_t find(const char* str, size_t pos = 0)
{assert(str);assert(pos < _size);const char* ret = strstr(_str + pos, str);if (ret == nullptr){return npos;}else{return ret - _str;}
}

11. substr()


string substr(size_t pos = 0, size_t len = npos)
{assert(pos >= 0 && pos < _size);Xq::string ret;size_t i = pos;//1. +=到string对象的结尾if (len == npos || pos + len >= _size){for (i = pos; i < _size; ++i){ret += _str[i];}}//2. 从pos += len个字符else{while (len--){ret += _str[i++];}}return ret;
}

12. resize()

void resize(size_t n, char ch = '\0')
{// 1. 考虑增容,不考虑缩容if (n > _size){reserve(n);size_t i = _size;for (i = _size; i < n; ++i){_str[i] = ch;}_str[i] = '\0';_size = n;}else{_str[n] = '\0';_size = n;}
}

13. operator>> 和 operator<<

13.1. operator<<(流插入运算符重载)

实现方案一：通过友元函数访问类的私有成员：

namespace Xq
{    class string{public://...// 友元函数friend std::ostream& operator<<(std::ostream& out, const Xq::string& str);private://...}
}
// 必须在类外定义，因为需要第一个参数是out;
std::ostream& Xq::operator<<(std::ostream& out, const Xq::string& str)
{	for (size_t i = 0; i < str._size; ++i){out << str[i];}return out;
}

实现方案二：利用成员函数访问私有属性：

namespace Xq
{    class string{public://...private://...}
}
// 利用成员函数访问私有属性
std::ostream& operator<<(std::ostream& out, const Xq::string& str)
{for(size_t i = 0; i < str._size();++i){out << str[i];}return out;
}

13.2. operator>>(流提取运算符重载)

方案一：

std::istream& operator>>(std::istream& in, Xq::string& str)
{// 1. 保证对象为空str.erase(0);    // 2. 考虑到可能有大量输入的情况, 频繁+=, 影响效率, // 因此在这里预先开好空间, 一定程度上减少频繁扩容的问题str.reserve(15);char ch = 0;// 3. 在这里in不符合需求，因为in认为' '和'\n'是分隔符，in会忽略这两个字符// in >> ch;// 4. std::istream::get 这个成员函数会获取每一个字符ch = in.get();while (ch != ' ' && ch != '\n'){str += ch;ch = in.get();}return in;
}

上面的方案，频繁调用 operator+= 也会影响效率，第二种方案：

std::istream& operator>>(std::istream& in, Xq::string& str)
{// 1. 保证对象为空str.erase(0);// 2. 获取第一个字符, 以便于进入循环char ch = 0;ch = in.get();// 3. 利用一个缓冲字符串, 缓冲区的大小为Nconst size_t N = 32;char buffer[N] = { 0 };size_t i = 0;while (ch != ' ' && ch != '\n'){buffer[i++] = ch;// 3. 缓冲字符串满了，就调用+=，再重置缓冲字符串if (N-1 == i){	buffer[N - 1] = '\0';str += buffer;i = 0;}ch = in.get();}// 4. 防止缓冲字符串还有内容未+=到string对象里buffer[i] = '\0';str += buffer;return in;
}

14. string 的简单模拟实现

#include <iostream>
#include <cstring>
#include <string>
#include <cassert>namespace Xq
{class string{public:typedef char* iterator;typedef const char* const_iterator;string(const char* str = ""){_size = strlen(str);_str = new char[_size + 1];_capacity = _size;strcpy(_str, str);}string(const string& copy):_str(nullptr), _size(0), _capacity(0){string tmp(copy._str);swap(tmp);}void swap(string& tmp){std::swap(_str, tmp._str);std::swap(_size, tmp._size);std::swap(_capacity, tmp._capacity);}string& operator=(string copy){swap(copy);return *this;}~string(){delete[] _str;_str = nullptr;_capacity = _size = 0;}const char* c_str() const{return _str;}const size_t size() const{return _size;}iterator begin(){return _str;}iterator end(){return _str + _size;}const_iterator begin() const{return _str;}const_iterator end() const{return _str + _size;}void resize(size_t n, char ch = '\0'){//1.考虑增容,不考虑缩容if (n > _size){reserve(n);size_t i = _size;for (i = _size; i < n; ++i){_str[i] = ch;}_str[i] = '\0';_size = n;}else{_str[n] = '\0';_size = n;}}void reserve(size_t new_capacity = 0){if (new_capacity > _capacity){char* tmp = new char[new_capacity + 1];for (size_t i = 0; i <= _size; ++i){tmp[i] = _str[i];}delete[] _str;_str = tmp;_capacity = new_capacity;}}void push_back(char ch){//检查容量if (_capacity == _size){reserve(_capacity == 0 ? 4 : 2 * _capacity);}_str[_size] = ch;++_size;_str[_size] = '\0';}string& append(const char* str){//检查容量size_t len = strlen(str);size_t j = 0;if (_size + len > _capacity){reserve(_size + len + 1);}for (size_t i = _size; i <= _size + len; ++i){_str[i] = str[j++];}_size = _size + len;return *this;}string& operator+=(char ch){push_back(ch);return *this;}string& operator+=(const char* str){append(str);return *this;}string& insert(size_t pos, char ch){//检查容量/*int head = pos;if (pos == 0){head = 0;}if (_capacity == _size){reserve(_capacity == 0 ? 4 : 2 * _capacity);}assert(head <= (int)_size);int end = _size;while (end >= head){_str[end + 1] = _str[end];--end;}_str[head] = ch;++_size;return *this;*/if (_capacity == _size){reserve(_capacity == 0 ? 4 : 2 * _capacity);}assert(pos <= _size);size_t end = _size + 1;while (end > pos){_str[end] = _str[end - 1];--end;}_str[pos] = ch;++_size;return *this;}string& insert(size_t pos, const char* str){size_t len = strlen(str);if (_size + len > _capacity){reserve(_size + len + 1);}size_t end = _size + len;while (end >= pos + len){_str[end] = _str[end - len];--end;}for (size_t j = 0; j < len; ++j){_str[pos++] = str[j];++_size;}return *this;}string& erase(size_t pos, size_t len = npos){if (len == npos || pos + len >= _size){_size = pos;_str[_size] = '\0';}else{size_t begin = pos + len;size_t str_len = strlen(_str);for (size_t i = begin; i < str_len; i++){_str[pos] = _str[i];++pos;}_size = pos;_str[_size] = '\0';}return *this;}size_t find(char ch, size_t pos = 0){assert(pos < _size);for (size_t i = 0; i < _size; ++i){if (ch == _str[i])return i;}return npos;}size_t find(const char* str, size_t pos = 0){assert(str);assert(pos < _size);const char* ret = strstr(_str + pos, str);if (ret == nullptr){return npos;}else{return ret - _str;}}string substr(size_t pos = 0, size_t len = npos){assert(pos < _size);Xq::string ret;size_t i = pos;if (len == npos || pos + len >= _size){for (i = pos; i < _size; ++i){ret += _str[i];}}else{while (len--){ret += _str[i++];}}return ret;}char& operator[](size_t pos){assert(pos < _size);return _str[pos];}const char& operator[](size_t pos) const{assert(pos < _size);return _str[pos];}bool operator>(const string& s) const{return strcmp(_str, s._str) > 0;}bool operator==(const string& s) const{return strcmp(_str, s._str) == 0;}bool operator>=(const string& s) const{return strcmp(_str, s._str) >= 0;}bool operator<(const string& s) const{return !(*this >= s);}bool operator<=(const string& s) const{return !(*this > s);}bool operator!=(const string& s) const{return !(*this == s);}public:static size_t npos;friend std::ostream& operator<<(std::ostream& out, const Xq::string& str);private:char* _str;size_t _capacity;size_t _size;};
}
size_t Xq::string::npos = -1;
std::ostream& Xq::operator<<(std::ostream& out, const Xq::string& str)
{for (size_t i = 0; i < str._size; ++i){out << str[i];}//for (size_t i = 0; i < str.size(); ++i)//{//	out << str[i];//}return out;
}
std::istream& operator>>(std::istream& in, Xq::string& str)
{1.考虑到可能有大量输入的情况，频繁+=，影响效率，因此在这里预先开好空间，一定程度上减少频繁扩容的问题//str.reserve(15);//char ch = 0;2.在这里in不符合需求，因为cin认为' '和'\n'是分隔符，in会忽略这两个字符in >> ch;3.get()这个成员函数会获取每一个字符//ch = in.get();//while (ch != ' ' && ch != '\n')//{//	str += ch;//	ch = in.get();//}//return in;str.erase(0);char ch = 0;ch = in.get();const size_t N = 32;char buffer[N] = { 0 };size_t i = 0;while (ch != ' ' && ch != '\n'){buffer[i++] = ch;if (N - 1 == i){buffer[N - 1] = '\0';str += buffer;i = 0;}ch = in.get();}buffer[i] = '\0';str += buffer;return in;
}

现在知道了浅拷贝对于这种类 (有动态申请的资源) 的问题：

a. 一个对象的修改会影响另一个对象；
b. 同一空间，析构两次，进程崩溃。

因此提出了深拷贝，但是呢，深拷贝的代价是很大的；

因此有人针对b问题提出了新的解决方案：引用计数。

每个对象析构时会减减引用计数，最后一个对象生命周期结束 (引用计数为0) 才会调用析构函数，释放空间。

那么如何解决a问题呢：写时拷贝--- COW(copy on write)。

即当某一个对象发生修改时才会拷贝新的空间 (进行深拷贝)，如果没有对象进行写操作，只进行读操作，那么这些对象共用同一块空间(一定程度上减少了深拷贝带来的代价)，但是这种方案也会存在问题(比如多执行流场景下，可能存在线程安全的问题)。

15. string 的一些例题

15.1. 字符串相加

原题链接：415. 字符串相加 - 力扣（LeetCode）

class Solution {
public:string addStrings(string num1, string num2) {std::string ret;int next = 0;int index1 = num1.size() - 1;int index2 = num2.size() - 1;while(index1 >= 0 || index2 >= 0){int tmp1 = index1 >= 0 ? num1[index1] - '0' : 0;int tmp2 = index2 >= 0 ? num2[index2] - '0' : 0;int tmp3 = tmp1 + tmp2 + next;tmp3 > 9 ? next = 1: next = 0;// a. 头插，效率低//ret.insert(0,1,(tmp3 % 10) + '0');// b. 尾插+逆置，效率高ret.push_back(tmp3 % 10+'0');--index1;--index2;}if(next == 1)//ret.insert(0,1,'1');ret.push_back('1');//std::reverse;//template <class BidirectionalIterator>//void reverse (BidirectionalIterator first, BidirectionalIterator last);reverse(ret.begin(),ret.end());  //逆置return ret;}
};

15.2. 字符串最后一个单词的长度

原题链接：字符串最后一个单词的长度_牛客题霸_牛客网 (nowcoder.com)

#include <iostream>
using namespace std;
int main() {string str;// 此处用 cin 不会通过测试用例getline(std::cin,str);      size_t pos = str.rfind(' ');if(pos != string::npos){cout << (str.size() - pos - 1);}else{cout << str.size();}   
}

15.3. 字符串中的第一个唯一字符

原题链接：387. 字符串中的第一个唯一字符 - 力扣（LeetCode）

class Solution {
public:int firstUniqChar(string s) {int array[26] = {0};for(auto ch : s){//array[0]映射的就是'a'，以此类推;++array[ch - 'a']; }//保证了第一次出现的唯一字符for(size_t i = 0; i < s.size();++i){if(array[s[i] - 'a'] == 1)return i;   }return -1;}
};

15.4. 验证回文串

原题连接：125. 验证回文串 - 力扣（LeetCode）

class Solution {
public:// 判断是不是数组和字母bool is_letter_or_nums(char ch) {return (ch >='0' && ch <='9'|| ch >='A' && ch <='Z');}// 将所有小写字母转化为大写字母bool isPalindrome(string s) {    for(auto& ch : s){if(ch >= 'a' && ch <= 'z')ch-=32;}int left = 0;int right = s.size() - 1;while(left < right){while(left < right && !is_letter_or_nums(s[left]))++left;while(left < right && !is_letter_or_nums(s[right]))--right;if(s[left++] != s[right--])return false;}return true;}
};