数据结构-六度分离(Bloom过滤器)

news/2024/10/30 22:22:54/

你听说过“六度分离[1]”实验,并决定亲自测试。你的计划很简单:随机选择两个人,Alice和Bob,以及一个秘密单词。请Alice在他们最喜欢的社交媒体Touhiter上发布该消息,并将其发布给所有追随者,请他们按照相同的指示重新发布该消息。

然后等到Bob收到消息,问他们需要多少次回复。(当然,一次是不够的,所以你必须重复几次实验来建立信心。)

在本作业中,您将实现并模拟这一点。有一个包含4亿个顶点的底层(有向)图:您可以对每个用户进行以下访问(表示为一个顶点;您不必实现它):

- add_follower(Vertex): Adds a follower to the current user. 

- get_followers(): Returns a list of all the followers connected to the Vertex.

- get_username(): Returns the (guaranteed unique) username of the user.

您的目标是实现一个BFS(广度优先搜索)来执行实验,从Alice(作为输入提供给您的起始顶点)开始,到Bob(提供给您另一个节点)时停止。不幸的是,您无法承受使用太多内存的代价,尤其是对于40000000大小的数据结构而言。虽然错误概率很小,但您还是可以接受的,因此您必须首先实现Bloom过滤器(类BloomFilter),并通过实现GraphBFS方法修改BFS算法(类Bloom过滤器)来使用它:

bfs(self, start: Vertex, end: Vertex, bloom_filter: BloomFilter)

请注意,您不必实现类Vertex,也不必创建自己的哈希函数。它们将在测试期间提供给您的代码!您也不必指定比特数bits和散列函数m的数量,这也将在测试期间由您的代码传递(当然,对于您自己的测试,您可能希望使用问题e中的设置)。例如,作业的书面部分)。

[1] 六度分离:早在上个世纪60年代,美国著名社会心理学家米尔格伦(Stanley Milgram)就提出了“六级分隔”(Six Degrees of Separation)的理论,并设计了一个连锁信的实验来验证这个假设。他认为,任何两个陌生人都可以通过“朋友的朋友”建立联系,并且他们之间所间隔的人不会超过六个,无论是美国总统与威尼斯的船夫,或热带雨林中的土著与爱斯基摩人。也就是说,最多通过六个人你就能够认识任何一个陌生人。这就是著名的“小世界假设”。

从2001年秋天开始,美国哥伦比亚大学的社会学教授瓦茨(Duncan Watts)组建了一个研究小组,利用Email这一现代通信工具,开始进行“小世界假设”的实验(http://smallworld.columbia.edu)。在1年多时间里,总共有166个国家和地区的6万多名志愿者参与实验,实验结果证明,一封邮件平均被转发6次,即可回到接收者那里。

这个玄妙理论引来了数学家、物理学家和电脑科学家纷纷投入研究,结果发现,世界上许多其他的网络也有极相似的结构。经济活动中的商业联系网络、通过超文本链接的网络、生态系统中的食物链,甚至人类脑神经元,以及细胞内的分子交互作用网络,都有着完全相同的组织结构。

from typing import Callable, List"""
BloomFilter Class
----------This class represents the bloom filter to be used in the graph. Each BloomFilter consists of the following properties:- hash_functions: A list of hash functions used in the bloom filter. Each function hashes a string to a number in the range of [0, bits*elements).- bits: The number of bits per element in the bloom filter.- elements: The number of elements in the bloom filter (capacity).- filter: The bit array representing the bloom filter.The class also supports the following functions:- add(str): Adds a new entry to the bloom filter.- get(str): Returns a boolean that probabilistically indicates whether the given string is in the bloom filter.Your task is to complete the following functions which are marked by the TODO comment.
You are free to add properties and functions to the class as long as the given signatures remain identical.
Good Luck!
"""class BloomFilter:# These are the defined properties as described abovehash_functions: List[Callable[[str], int]]bits: intelements: intfilter: List[bool]def __init__(self, hash_functions: List[Callable[[str], int]], bits: int, elements: int) -> None:"""The constructor for the BloomFilter class.:param hash_functions: A list of hash functions used in the bloom filter. Each function hashes a string to a number in the range of [0, BITS).:param bits: The number of bits in the bloom filter.:param elements: The number of elements in the bloom filter (capacity)."""# TODO: Fill this inself.hash_functions = hash_functionsself.bits = bitsself.elements = elementsself.bitMap = [0] * self.bitsdef add(self, item: str) -> None:"""Adds a new entry to the bloom filter.:param item: The string to be added to the bloom filter."""# TODO: Fill this infor func in self.hash_functions:self.bitMap[func(item) % self.bits] = 1def get(self, item: str) -> bool:"""Given a string, returns a boolean that probabilistically indicates whether the given string is in the bloom filter.:param item: The string to be checked in the bloom filter.:return: A boolean that indicates whether the string has been seen before."""# TODO: Fill this infor func in self.hash_functions:if self.bitMap[func(item) % self.bits] != 1:return Falsereturn True"""
Vertex Class
----------This class represents an individual user in the social network. Each Vertex consists of the following properties:- username: The username for the current User. Guaranteed to be unique.- followers: A list of followers connected to this Vertex (adjlist)The class also supports the following functions:- add_follower(Vertex): Adds a follower to the current user. - get_followers(): Returns a list of all the followers connected to the Vertex.- get_username(): Returns the username of the User.**You do not need to edit this class**"""class Vertex:# These are the defined properties as described aboveusername: strfollowers: "List[Vertex]"def __init__(self, username: str) -> None:"""The constructor for the Vertex class.:param username: The username of the user."""self.username = usernameself.followers = []def add_follower(self, vertex: "Vertex") -> None:"""Adds a follower to the given user with the given distance.:param vertex: The vertex to add an edge to."""# vertex is None, self-loop or already in the adjlistif not vertex or vertex == self or vertex in self.followers:return# Add to adjlistself.followers.append(vertex)def get_followers(self) -> "List[Vertex]":"""Returns the list of followers for the user.:return: The list of followers for the user."""return self.followersdef get_username(self) -> str:"""Returns the username of the user.:return: The username of the user."""return self.username"""
Graph Class
----------This class represents the social network as a graph. The class supports the following functions:- bfs(Vertex, Vertex, BloomFilter): Perform a BFS from the start vertex to the end using the bloom filter.Your task is to complete the following functions which are marked by the TODO comment.
You are free to add properties and functions to the class as long as the given signatures remain identical.
Good Luck!
"""
from collections import dequeclass Graph:def bfs(self, start: Vertex, end: Vertex, bloom_filter: BloomFilter) -> int:"""Performs a BFS from the start vertex to the end using the bloom filter.Note you should use the bloom filter to avoid excessive memory usage with large graph inputs.The bloom filter is constructed from your filled in BloomFilter class.:param start: The start vertex.:param end: The end vertex.:param bloom_filter: The bloom filter used in the graph.:return: The degree of separation between the start and end vertices or -1 if they're not connected."""# TODO: Fill this inqueue = deque()queue.append(start)visited = bloom_filtervisited.add(start.get_username())count = 0while queue:v = queue.popleft()for w in v.get_followers():if w.get_username() == end.get_username():return countif not visited.get(w.get_username()):visited.add(w.get_username())queue.append(w)count += 1return -1from unittest import TestCaseBITS = 4
ELEMENTS = 4"""
This is a sample test suite for you to write your own test.
Note that the provided hash functions are NOT good hash function but
are used for demonstration purposes only. You should write your own
hash functions and test cases to test your implementation.No tests are provided or will be provided for the bfs function.
You are free (and encouraged!) to use this as a base to write your own tests.
"""# These are BAD hash function and are used for demonstration only.
# It just takes the first letter of a string and returns its ASCII value in the range [0, BITS * ELEMENTS).
# This means that it will always return the same value for a given string
# or any string that starts with the same letter.
def bad_hash_function_1(name: str) -> int:return ord(name[0]) % (BITS * ELEMENTS)def bad_hash_function_2(name: str) -> int:return (ord(name[0]) + 8) % (BITS * ELEMENTS)class TestBloomFilterSample(TestCase):def setUp(self) -> None:self.bf = BloomFilter([bad_hash_function_1, bad_hash_function_2], bits=16, elements=1)def test_bloom_filter_sample(self):self.bf.add("a")  # This should set the 1st and 9th index to Trueself.assertEqual(self.bf.get("a"), True, "Bloom Filter returns incorrect result")self.bf.add("z")  # This should set the 10th and 2nd index to Trueself.assertEqual(self.bf.get("z"), True, "Bloom Filter returns incorrect result")# This should return True because the 2nd and 10th index are set to True even though the key 'b' has not been added# This is what makes a Bloom Filter a probabilistic data structureself.assertEqual(self.bf.get("b"), True, "Bloom Filter returns incorrect result")


http://www.ppmy.cn/news/573881.html

相关文章

网易游戏(互娱)2021届秋招开始

网易游戏(互娱)秋招秋招截止至9.25 投递地址: https://game.campus.163.com/home 内推码:NaMukB 更多问题欢迎加qq群1062592666讨论,还有sp-offer内推~hr小姐姐也在的哦~

网易云音乐2020校招

网易云音乐2020校招 http://m.bole.netease.com/#/app/index?boleId67e074d29438c9d0&boleType2&type6&projectId19&signatureebb84bde09e56d114472e68564cbfa5a 点击马上投递,全程分享招聘进度

23届网易秋招内推信息

【网易游戏(互娱)】2023秋季校园招聘7.26(明天)即将启动! 👉加入专属答疑社群:QQ【818424638】(或扫描群二维码) 即可获取网易专属🎁求职资料大礼包&#x1f…

梦幻春晚服务器找不到,《梦幻西游2》春晚即将开启 网易CC全程直播

《梦幻西游( 有爱互动,与GM一起欢天喜地过猴年 万人挖高级宝图这种盛况空前的场景,你看过吗?想要变成最神奇的游戏造型吗?想要看见被你拿着鞭炮满场景跑的年兽吗?这些你统统都可以在梦幻春晚中看到。平日里神秘的GM大大…

CCF201709-1 打酱油

试题编号:201709-1试题名称:打酱油时间限制:1.0s内存限制:256.0MB 问题描述: 小明带着N元钱去买酱油。酱油10块钱一瓶,商家进行促销,每买3瓶送1瓶,或者每买5瓶送2瓶。请问小明最多可…

初试CCF认证有感

一年前,还是大二,当时学校推送CCF认证的优惠报名,没有当回事。因为我并没有认真学数据结构(有点后悔),直到班里有几个同学拿了比较高的分数后,CCF认证才引起了我的关注。上学期期末和朋友的无意…

网易cc题2

简单的字符串抽取,按顺序遍历即可,复杂度o(nm) 题目2 : crossgift 时间限制:5000ms 单点时限:1000ms 内存限制:256MB 描述 话说有个网易CC主播lily唱歌很好听,但是她粉丝很多,每次都有很多人要点歌。聪明的Lily想了个办法&…

大表姐今晚直播:计算机保研考研工作党集结令

中国计算机学会、计算机辅助设计与图形学专委会“走进高校”与“走进企业”联合活动(第二季第2期)将于今天(9月3日)20:00-21:30举办。 参与活动直播,了解计算机图形学大家庭,更有机会和 CEO/CTO 互动&#…