The Role of Speciation in Evolving Topologies

- 0. Foreword
- 1. Using Extinction to Drive Species Evolution
- 2. NEAT Speciation
- Summary
- Series Links

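0. Foreword

In this section, we look at how NEAT uses speciation to track population diversity. The term comes from biology, where it describes how similar organisms evolve distinct traits and become separate species. Darwin used the origin of species as a way to describe how life on Earth evolved.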

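1. Using Extinction to Drive Species Evolution

NEAT applies the same idea, grouping genomes into species, for both optimization and diversification. Grouping genomes into species highlights how a diverse population of networks evolves. We generally want to keep the population diverse so that it does not get stuck in a local maximum or minimum.
A lack of diversity tends to leave an evolving population overspecialized or fixed at some local minimum or maximum. In the real world, organisms that become too specialized to adapt go extinct as their environment keeps changing.
We usually think of extinction as a bad thing, because we are now aware of the role human activity plays in the ongoing extinction of thousands of species worldwide. Without human intervention, however, extinction is a natural process that life on Earth has undergone for billions of years. In evolutionary computation, extinction can also be beneficial, because it encourages diversity and better-performing individuals.
NEAT uses extinction to force species to keep evolving or die out, which prevents species from stagnating or overspecializing and encourages population diversity. Next, we look at how speciation helps NEAT solve complex problems.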

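2. NEAT Speciation

In this section, we apply NEAT's speciation feature to the circles problem set and explore more of NEAT's configuration options.

(1) NEAT-Python exposes configuration options that control every aspect of genome evolution, including connections, nodes, activation and aggregation functions, and weights. These options make NEAT powerful, but they also make it harder to evolve networks on complex problems. Start by importing the libraries and generating the dataset: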

import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib.pyplot as plt
from IPython.display import clear_output
import time
import neat

number_samples = 15 #@param {type:"slider", min:10, max:1000, step:5}
difficulty = 1 #@param {type:"slider", min:1, max:5, step:1}
problem = "circles" #@param ["classification", "blobs", "gaussian quantiles", "moons", "circles"]
number_features = 2
number_classes = 2

def load_data(problem):
    if problem == "classification":
        clusters = 1 if difficulty < 3 else 2
        informs = 1 if difficulty < 4 else 2
        data = sklearn.datasets.make_classification(
            n_samples=number_samples,
            n_features=number_features,
            n_redundant=0,
            class_sep=1/difficulty,
            n_informative=informs,
            n_clusters_per_class=clusters)
    if problem == "blobs":
        data = sklearn.datasets.make_blobs(
            n_samples=number_samples,
            n_features=number_features,
            centers=number_classes,
            cluster_std=difficulty)
    if problem == "gaussian quantiles":
        data = sklearn.datasets.make_gaussian_quantiles(
            mean=None,
            cov=difficulty,
            n_samples=number_samples,
            n_features=number_features,
            n_classes=number_classes,
            shuffle=True,
            random_state=None)
    if problem == "moons":
        data = sklearn.datasets.make_moons(n_samples=number_samples)
    if problem == "circles":
        data = sklearn.datasets.make_circles(n_samples=number_samples)
    return data

data = load_data(problem)
X, Y = data

# Input Data
plt.figure("Input Data")
plt.scatter(X[:, 0], X[:, 1], c=Y, s=40, cmap=plt.cm.Spectral)

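(2) Set the NEAT configuration options. The fitness function is updated to produce a maximum fitness of 1.0, so fitness_threshold is updated accordingly. The number of hidden nodes is set to 25 to give the network topology room to grow. From experience, we know the circles problem can be solved with a simple architecture of a few layers. To reduce the number of topology changes inside the network, we lower the probabilities of adding or deleting connections and nodes: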

[NEAT]
fitness_criterion     = max
fitness_threshold     = .9
pop_size              = 100
reset_on_extinction   = 1

[DefaultGenome]
num_inputs              = 2
num_hidden              = 25
num_outputs             = 1
initial_connection      = partial_direct 0.5
feed_forward            = True
compatibility_disjoint_coefficient    = 1.0
compatibility_weight_coefficient      = 0.6
conn_add_prob           = 0.02
conn_delete_prob        = 0.02
node_add_prob           = 0.02
node_delete_prob        = 0.02
activation_default      = sigmoid
activation_options      = sigmoid
activation_mutate_rate  = 0.0
aggregation_default     = sum
aggregation_options     = sum
aggregation_mutate_rate = 0.0
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_replace_rate       = 0.1
bias_mutate_rate        = 0.7
bias_mutate_power       = 0.5
bias_max_value          = 30.0
bias_min_value          = -30.0
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_replace_rate   = 0.0
response_mutate_rate    = 0.0
response_mutate_power   = 0.0
response_max_value      = 30.0
response_min_value      = -30.0
weight_max_value        = 30
weight_min_value        = -30
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_mutate_rate      = 0.08
weight_replace_rate     = 0.01
weight_mutate_power     = 0.1
enabled_default         = True
enabled_mutate_rate     = 0.01

[DefaultSpeciesSet]
compatibility_threshold = 1.0

[DefaultStagnation]
species_fitness_func = max
max_stagnation  = 25

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2
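
The settings above use INI-style sections, so individual values can be read back with Python's standard configparser module to confirm what a run will use. The following is only a minimal sketch: it inlines a fragment of the file as a string (the name CONFIG_FRAGMENT is hypothetical) instead of reading the real 'config' file, and it does not go through NEAT-Python at all.

```python
from configparser import ConfigParser

# Hypothetical inline copy of two sections from the configuration file above.
CONFIG_FRAGMENT = """
[DefaultSpeciesSet]
compatibility_threshold = 1.0

[DefaultStagnation]
species_fitness_func = max
max_stagnation  = 25
"""

parser = ConfigParser()
parser.read_string(CONFIG_FRAGMENT)

# Read the two values as typed numbers rather than raw strings.
compatibility_threshold = parser.getfloat("DefaultSpeciesSet", "compatibility_threshold")
max_stagnation = parser.getint("DefaultStagnation", "max_stagnation")

print("compatibility_threshold:", compatibility_threshold)
print("max_stagnation:", max_stagnation)
```

Reading the values this way is a quick check that a section header has not been mistyped or merged into a neighboring line.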

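Because the circles problem can be solved by adjusting weights, this section focuses on minimizing weight changes, which lets genomes adapt gradually and adjust their weights slowly, much like lowering the learning rate when training a deep learning network. Two further options are updated to better control speciation: compatibility_threshold controls the distance between species, and max_stagnation controls how many generations to wait before checking a species for extinction.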
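
To make the effect of the small weight settings concrete, here is a simplified sketch of a per-weight mutation step. It is an illustration under stated assumptions, not NEAT-Python's own code: the function mutate_weight is hypothetical, and its defaults simply reuse the weight_mutate_rate, weight_replace_rate, weight_mutate_power, weight_init_* and weight_min/max_value numbers from the configuration in this section.

```python
import random

def mutate_weight(weight, rng, mutate_rate=0.08, replace_rate=0.01,
                  mutate_power=0.1, init_mean=0.0, init_stdev=1.0,
                  min_value=-30.0, max_value=30.0):
    """Hypothetical, simplified weight mutation for illustration only."""
    r = rng.random()
    if r < mutate_rate:
        # Small perturbation: nudge the existing weight by Gaussian noise.
        weight += rng.gauss(0.0, mutate_power)
    elif r < mutate_rate + replace_rate:
        # Rare replacement: draw a completely new weight.
        weight = rng.gauss(init_mean, init_stdev)
    # Keep the weight inside the configured bounds.
    return max(min(weight, max_value), min_value)

rng = random.Random(0)
weights = [0.5] * 1000
mutated = [mutate_weight(w, rng) for w in weights]
changed = sum(1 for before, after in zip(weights, mutated) if before != after)
print(f"{changed} of {len(weights)} weights changed in one pass")
```

With these rates, only a small fraction of weights moves in any one generation, and most of those moves are small, which is the gradual adjustment the paragraph above describes.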

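(3) Next, update the fitness evaluation function so that it better suits binary classification. Previously, fitness was evaluated with mean squared error (MSE). To account for misclassification more directly, a function such as binary cross-entropy could be used to compute the error; for simplicity, we instead compute the distance between the true label and the actual output. If the true label is 0 and the network outputs 0.9, the error is 0.9. Likewise, if the class is 1 and the network outputs 0.2, the error is 0.8. Squaring each error before appending it to the results removes any sign and lets us use np.mean to get the average error. The total fitness is then the maximum fitness (now 1) minus the average error: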

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     'config')
print(config.genome_type, config.genome_config, config.pop_size)

# Build a single untrained genome and network to try out the fitness calculation.
key = "fred"
genome = config.genome_type(key)
genome.configure_new(config.genome_config)
net = neat.nn.FeedForwardNetwork.create(genome, config)

results = []
for x, y in zip(X, Y):
    yi = net.activate(x)[0]
    if y < .5:
        error = yi - y
    else:
        error = y - yi
    print(yi, error)
    results.append(error*error)
fitness = 1 - np.mean(results)
print(fitness)

def eval_genomes(genomes, config):
    # Assign each genome a fitness of 1 minus its mean squared error.
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        results = []
        for x, y in zip(X, Y):
            yi = net.activate(x)[0]
            if y < .5:
                error = yi - y
            else:
                error = y - yi
            results.append(error*error)
        genome.fitness = 1 - np.mean(results)

def show_predictions(net, X, Y, name=""):
    """ display the labeled data X and a surface of prediction of model """
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    X_temp = np.c_[xx.flatten(), yy.flatten()]
    Z = []
    for x in X_temp:
        Z.append(net.activate(x))
    Z = np.array(Z)
    plt.figure("Predictions " + name)
    plt.contourf(xx, yy, Z.reshape(xx.shape), cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[:, 0], X[:, 1], c=Y, s=40, cmap=plt.cm.Spectral)
    plt.show()

import graphviz

def draw_net(config, genome, view=False, filename=None, node_names=None,
             show_disabled=True, prune_unused=False, node_colors=None, fmt='svg'):
    """ Receives a genome and draws a neural network with arbitrary topology. """
    # Attributes for network nodes.
    if graphviz is None:
        print("This display is not available due to a missing optional dependency (graphviz)")
        return

    # If requested, use a copy of the genome which omits all components that won't affect the output.
    if prune_unused:
        genome = genome.get_pruned_copy(config.genome_config)

    if node_names is None:
        node_names = {}
    assert type(node_names) is dict

    if node_colors is None:
        node_colors = {}
    assert type(node_colors) is dict

    node_attrs = {
        'shape': 'circle',
        'fontsize': '9',
        'height': '0.2',
        'width': '0.2'}

    dot = graphviz.Digraph(format=fmt, node_attr=node_attrs)

    inputs = set()
    for k in config.genome_config.input_keys:
        inputs.add(k)
        name = node_names.get(k, str(k))
        input_attrs = {'style': 'filled', 'shape': 'box',
                       'fillcolor': node_colors.get(k, 'lightgray')}
        dot.node(name, _attributes=input_attrs)

    outputs = set()
    for k in config.genome_config.output_keys:
        outputs.add(k)
        name = node_names.get(k, str(k))
        node_attrs = {'style': 'filled',
                      'fillcolor': node_colors.get(k, 'lightblue')}
        dot.node(name, _attributes=node_attrs)

    used_nodes = set(genome.nodes.keys())
    for n in used_nodes:
        if n in inputs or n in outputs:
            continue
        attrs = {'style': 'filled',
                 'fillcolor': node_colors.get(n, 'white')}
        dot.node(str(n), _attributes=attrs)

    for cg in genome.connections.values():
        if cg.enabled or show_disabled:
            # if cg.input not in used_nodes or cg.output not in used_nodes:
            #    continue
            input, output = cg.key
            a = node_names.get(input, str(input))
            b = node_names.get(output, str(output))
            style = 'solid' if cg.enabled else 'dotted'
            color = 'green' if cg.weight > 0 else 'red'
            width = str(0.1 + abs(cg.weight / 5.0))
            dot.edge(a, b, _attributes={'style': style, 'color': color, 'penwidth': width})

    dot.render(filename, view=view)
    dot.view()
    return dot

node_names = {-1: 'X1', -2: 'X2', 0: 'Classify'}
draw_net(config, genome, True, node_names=node_names)

from neat.math_util import mean, stdev

class CustomReporter(neat.reporting.BaseReporter):
    """Uses `print` to output information about the run; an example reporter class."""

    def __init__(self, show_species_detail, gen_display=100):
        self.show_species_detail = show_species_detail
        self.generation = None
        self.generation_start_time = None
        self.generation_times = []
        self.num_extinctions = 0
        self.gen_display = gen_display

    def start_generation(self, generation):
        clear_output()
        self.generation = generation
        print('\n ****** Running generation {0} ****** \n'.format(generation))
        self.generation_start_time = time.time()

    def end_generation(self, config, population, species_set):
        ng = len(population)
        ns = len(species_set.species)
        if self.show_species_detail:
            print('Population of {0:d} members in {1:d} species:'.format(ng, ns))
            print("   ID   age  size   fitness   adj fit  stag")
            print("  ====  ===  ====  =========  =======  ====")
            for sid in sorted(species_set.species):
                s = species_set.species[sid]
                a = self.generation - s.created
                n = len(s.members)
                f = "--" if s.fitness is None else f"{s.fitness:.3f}"
                af = "--" if s.adjusted_fitness is None else f"{s.adjusted_fitness:.3f}"
                st = self.generation - s.last_improved
                print(f"  {sid:>4}  {a:>3}  {n:>4}  {f:>9}  {af:>7}  {st:>4}")
        else:
            print('Population of {0:d} members in {1:d} species'.format(ng, ns))

        elapsed = time.time() - self.generation_start_time
        self.generation_times.append(elapsed)
        self.generation_times = self.generation_times[-10:]
        average = sum(self.generation_times) / len(self.generation_times)
        print('Total extinctions: {0:d}'.format(self.num_extinctions))
        if len(self.generation_times) > 1:
            print("Generation time: {0:.3f} sec ({1:.3f} average)".format(elapsed, average))
        else:
            print("Generation time: {0:.3f} sec".format(elapsed))

    def post_evaluate(self, config, population, species, best_genome):
        # pylint: disable=no-self-use
        fitnesses = [c.fitness for c in population.values()]
        fit_mean = mean(fitnesses)
        fit_std = stdev(fitnesses)
        best_species_id = species.get_species_id(best_genome.key)
        print('Population\'s average fitness: {0:3.5f} stdev: {1:3.5f}'.format(fit_mean, fit_std))
        print('Best fitness: {0:3.5f} - size: {1!r} - species {2} - id {3}'.format(
            best_genome.fitness, best_genome.size(), best_species_id, best_genome.key))
        # Every gen_display generations, plot species sizes and the best network's predictions.
        if (self.generation) % self.gen_display == 0:
            members = [len(s.members) for s in species.species.values()]
            num_generations = len(members)
            curves = np.array(members).T
            fig, ax = plt.subplots()
            ax.stackplot(range(num_generations), *curves)
            plt.title("Speciation")
            plt.ylabel("Size per Species")
            plt.xlabel("Generations")
            plt.show()
            self.best_fit = best_genome.fitness
            net = neat.nn.FeedForwardNetwork.create(best_genome, config)
            show_predictions(net, X, Y)
            time.sleep(5)

    def complete_extinction(self):
        self.num_extinctions += 1
        print('All species extinct.')

    def found_solution(self, config, generation, best):
        print('\nBest individual in generation {0} meets fitness threshold - complexity: {1!r}'.format(
            self.generation, best.size()))

    def species_stagnant(self, sid, species):
        if self.show_species_detail:
            print("\nSpecies {0} with {1} members is stagnated: removing it".format(sid, len(species.members)))

    def info(self, msg):
        print(msg)

# Create the population, which is the top-level object for a NEAT run.
p = neat.Population(config)

# Add a stdout reporter to show progress in the terminal.
p.add_reporter(CustomReporter(True, gen_display=10))

# Run until a solution is found.
winner = p.run(eval_genomes)

# Display the winning genome.
print('\nBest genome:\n{!s}'.format(winner))

# Show output of the most fit genome against training data.
print('\nOutput:')
winner_net = neat.nn.FeedForwardNetwork.create(winner, config)
show_predictions(winner_net, X, Y)
draw_net(config, winner, True, node_names=node_names)
for x, y in zip(X, Y):
    output = winner_net.activate(x)
    print("  input {!r}, expected output {!r}, got {!r}".format(x, y, output))
draw_net(config, winner, True)
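
The fitness calculation used in eval_genomes can be checked by hand on the two cases from the text (label 0 with output 0.9, and label 1 with output 0.2). The sketch below repeats the same branching in plain Python, with no NEAT network involved; the helper name squared_distance_fitness is hypothetical.

```python
def squared_distance_fitness(labels, outputs):
    """Fitness = 1 - mean squared distance between labels and outputs."""
    squared_errors = []
    for y, yi in zip(labels, outputs):
        # Same branching as eval_genomes: distance from the true label.
        if y < 0.5:
            error = yi - y
        else:
            error = y - yi
        squared_errors.append(error * error)
    return 1 - sum(squared_errors) / len(squared_errors)

labels = [0, 1]
outputs = [0.9, 0.2]  # stand-ins for net.activate(x)[0]
fitness = squared_distance_fitness(labels, outputs)
print(round(fitness, 3))  # squared errors 0.81 and 0.64, mean 0.725 -> 0.275
```

A network that outputs exactly the labels would score 1.0, which is why the maximum fitness is 1 and fitness_threshold is set just below it.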

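The figure below shows the results of evolving the network. Early in evolution, NEAT tracks only three species of networks. The number of individuals in each species is governed by the compatibility_threshold option. Compatibility is a measure of similarity between networks, taking into account the number of connections, connection weights, nodes, and so on. Lowering the compatibility threshold produces more species, because networks then need to differ only slightly to be treated as incompatible; likewise, raising the threshold reduces the number of species.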

Run results
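
The relationship between the compatibility threshold and the number of species can be sketched with a toy example. Everything here is an assumption made for illustration: genomes are reduced to dictionaries of connection weights, compatibility_distance is a simplified stand-in and not NEAT-Python's actual distance calculation, and speciate places each genome with the first sufficiently close representative.

```python
def compatibility_distance(g1, g2, disjoint_coefficient=1.0, weight_coefficient=0.6):
    """Toy distance between genomes given as {connection_key: weight} dicts."""
    shared = g1.keys() & g2.keys()
    disjoint = len(g1.keys() ^ g2.keys())  # connections present in only one genome
    weight_diff = sum(abs(g1[k] - g2[k]) for k in shared)
    size = max(len(g1), len(g2), 1)
    return (disjoint_coefficient * disjoint + weight_coefficient * weight_diff) / size

def speciate(genomes, threshold):
    """Group genomes; the first member of each group acts as its representative."""
    species = []
    for genome in genomes:
        for members in species:
            if compatibility_distance(genome, members[0]) < threshold:
                members.append(genome)
                break
        else:
            # No representative was close enough, so start a new species.
            species.append([genome])
    return species

genomes = [
    {(-1, 0): 0.5, (-2, 0): -0.3},
    {(-1, 0): 0.6, (-2, 0): -0.2},               # nearly identical weights
    {(-1, 1): 1.0, (-2, 1): 1.0, (1, 0): 0.8},   # different topology
    {(-1, 0): 2.5, (-2, 0): -0.3},               # same topology, one distant weight
]

for threshold in (1.0, 0.5):
    print(threshold, len(speciate(genomes, threshold)))
```

In this toy population, a threshold of 1.0 yields two species and a threshold of 0.5 yields three, because the genome with one distant weight no longer fits alongside the first two.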

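NEAT tracks the history of each species throughout evolution, and the max_stagnation option controls how many generations to wait before evaluating a particular species' progress. Once the stagnation period has elapsed, the species is assessed for improvement. If the species has not improved during the stagnation period, it goes extinct and is removed from the population. In the figure below, the left-hand chart shows that all species have been marked extinct, because they stagnated with no clear improvement in fitness. In fact, the current winning genome's results look relatively good, so the stagnation period is probably too short. Try different configuration options and see whether you can solve the circles problem with a fitness greater than 0.95.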
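
The stagnation check described above can be sketched as follows. This is a simplified illustration, not NEAT-Python's stagnation code: the function names are hypothetical, each species is represented only by a list of its best fitness per generation, and a species is assumed to count as stagnant once the number of generations since its last improvement reaches max_stagnation.

```python
def generations_since_improvement(history):
    """Count generations since the species last set a new best fitness."""
    best = float("-inf")
    last_improved = 0
    for generation, fitness in enumerate(history):
        if fitness > best:
            best = fitness
            last_improved = generation
    return len(history) - 1 - last_improved

def find_stagnant(histories, max_stagnation):
    """Return the ids of species that have gone too long without improving."""
    return [sid for sid, history in histories.items()
            if generations_since_improvement(history) >= max_stagnation]

# Hypothetical best-fitness histories, one value per generation.
histories = {
    1: [0.50, 0.55, 0.55, 0.55, 0.55],  # flat for the last 3 generations
    2: [0.40, 0.45, 0.50, 0.52, 0.60],  # improved in the latest generation
}

print(find_stagnant(histories, max_stagnation=3))  # species 1 is flagged
print(find_stagnant(histories, max_stagnation=5))  # nothing is flagged
```

Raising max_stagnation gives slowly improving species more time before they are removed, which is the adjustment suggested above when good solutions are going extinct too early.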

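Summary

Speciation not only increases population diversity but also reveals when evolving networks stagnate. The key to solving complex problems with NEAT is balancing the configuration options.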

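Series Links

Evolutionary Deep Learning
Life Simulation and Its Applications
Life Simulation and the Theory of Evolution
Genetic Algorithms Explained and Implemented
Common Genetic Operators in Genetic Algorithms
The DEAP Genetic Algorithm Framework
Getting Started with the DEAP Framework
Solving the N-Queens Problem with Genetic Algorithms
Solving the Traveling Salesman Problem with Genetic Algorithms
Reconstructing Images with Genetic Algorithms
Genetic Programming Explained and Implemented
Particle Swarm Optimization Explained and Implemented
Coevolution Explained and Implemented
Evolution Strategies Explained and Implemented
Differential Evolution Explained and Implemented
Neural Network Hyperparameter Optimization
Automatic Hyperparameter Optimization with Random Search
Automatic Hyperparameter Optimization with Grid Search
Automatic Hyperparameter Optimization with Particle Swarm Optimization
Automatic Hyperparameter Optimization with Evolution Strategies
Automatic Hyperparameter Optimization with Differential Search
Building Neural Networks with NumPy
Optimizing Deep Learning Models with Genetic Algorithms
Applying Neuroevolution Optimization in Keras
Building Convolutional Neural Networks with Keras
Encoding Convolutional Neural Network Architectures
Evolving Convolutional Neural Networks
Convolutional Autoencoders Explained and Implemented
Encoding Convolutional Autoencoder Architectures
Optimizing Autoencoder Models with Genetic Algorithms
Variational Autoencoders Explained and Implemented
Generative Adversarial Networks Explained and Implemented
WGAN Explained and Implemented
Encoding WGAN
Optimizing Generative Adversarial Networks with Genetic Algorithms
NEAT Explained and Implemented
Getting Started with NEAT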