BlackBerry outage affects users worldwide for 3 days
Users on every continent were affected; see the service update on RIM's official site:
http://www.rim.com/newsroom/service-update.shtml
The cause was a failed core switch. There was a backup, but after the failover the backup switch did not work as expected.
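There are generally two ways to build a highly available server system.
Approach A: one primary server does the work while a secondary server stays idle but on standby at all times. When the primary fails, the standby is promoted to primary and takes over.
Approach B: several server nodes of the same type all run at the same time and share the workload. When one server fails, the work assigned to it is spread across the remaining nodes.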
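To make the difference concrete, here is a minimal sketch of how requests might be routed under each approach. The class names (`ActiveStandby`, `ActiveActive`), the node names, and the modulo-based sharing are all assumptions made for illustration; nothing here describes RIM's actual infrastructure.

```python
class ActiveStandby:
    """Approach A: only the primary serves; the standby idles until failover."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def fail_primary(self):
        # Promote the standby; it becomes the only serving node.
        self.primary, self.standby = self.standby, None

    def route(self, request_id):
        # Every request goes to whichever node is currently primary.
        return self.primary


class ActiveActive:
    """Approach B: every healthy node serves a share of the requests."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def fail_node(self, name):
        # Drop the failed node; the survivors absorb its share.
        self.nodes.remove(name)

    def route(self, request_id):
        # Simple modulo sharing over the nodes that are still alive.
        return self.nodes[request_id % len(self.nodes)]


a = ActiveStandby("primary", "standby")
before_a = [a.route(i) for i in range(4)]
a.fail_primary()
after_a = [a.route(i) for i in range(4)]

b = ActiveActive(["n0", "n1", "n2"])
before_b = [b.route(i) for i in range(6)]
b.fail_node("n1")
after_b = [b.route(i) for i in range(6)]

print(before_a, after_a)
print(before_b, after_b)
```

In the first case the standby sees no traffic at all until the moment of failure; in the second, every node is already carrying real requests before anything goes wrong.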
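The BlackBerry outage exposes some weaknesses of Approach A. Normally the primary does the work; when it breaks, you want to switch over to the standby, but the standby turns out not to work properly. The failover was surely tested at the outset. But because the standby normally does no real work (although it may sync some data and state from the primary), a configuration change made wrong during maintenance can go unnoticed, and so can a regression introduced as the software is revised.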
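Approach B is better in this respect. Because all nodes do real work all the time, they are known to function. When an individual server fails, service is not interrupted; the other servers simply carry a somewhat higher load because they take over the failed server's work.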
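The contrast can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Node` class, its `config_ok` flag (standing in for a latent misconfiguration), and the two helper functions are hypothetical and exist purely to show when a hidden fault is first observed.

```python
class Node:
    def __init__(self, name, config_ok=True):
        self.name = name
        # Hypothetical flag standing in for a latent misconfiguration.
        self.config_ok = config_ok

    def handle(self, request_id):
        if not self.config_ok:
            raise RuntimeError(f"{self.name}: broken configuration")
        return f"{self.name} handled {request_id}"


def first_failure_active_standby(primary, standby, fail_at, total):
    """Index of the first failed request under Approach A, or None.

    The primary serves requests 0..fail_at-1; the standby serves the rest.
    """
    for i in range(total):
        serving = primary if i < fail_at else standby
        try:
            serving.handle(i)
        except RuntimeError:
            return i
    return None


def first_failure_active_active(nodes, total):
    """Index of the first failed request under Approach B, or None."""
    for i in range(total):
        try:
            nodes[i % len(nodes)].handle(i)
        except RuntimeError:
            return i
    return None


good, bad = Node("good"), Node("bad", config_ok=False)

# Approach A: the bad standby is invisible until the primary dies at request 1000.
seen_a = first_failure_active_standby(good, bad, fail_at=1000, total=2000)

# Approach B: the bad node is hit on its very first request.
seen_b = first_failure_active_active([good, bad], total=2000)

print(seen_a, seen_b)
```

Under Approach A the fault surfaces exactly when the standby is needed most; under Approach B it surfaces almost immediately, while a healthy node is still serving.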
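More generally, a feature that has sat unused for ages often fails on the day you finally switch it on... I have run into this kind of misfortune many times in real projects...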
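Of course, Approaches A and B suit different situations, and each has its pros and cons. Approach A fits cases that need a control node, that is, a single node that makes centralized decisions. Approach B is better suited to stateless worker nodes or to intermediate forwarding nodes on a transmission path. A switch in the BlackBerry service is doing exactly that kind of intermediate forwarding, so Approach B seems the better choice: keep both switches on and both carrying traffic in normal times (would that use more power?), and when one fails all the traffic moves to the other, but at least you do not suddenly discover that the backup does not work.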
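One practical consequence of Approach B is worth quantifying. The sketch below assumes the load is shared perfectly evenly, which real systems only approximate; `per_node_load` is a hypothetical helper, not part of any library.

```python
def per_node_load(total_load, nodes, failed=0):
    """Load carried by each surviving node, assuming perfectly even sharing."""
    alive = nodes - failed
    if alive <= 0:
        raise ValueError("no surviving nodes")
    return total_load / alive


for n in (2, 4, 10):
    before = per_node_load(1.0, n)
    after = per_node_load(1.0, n, failed=1)
    # The ratio after/before equals n / (n - 1).
    print(n, round(before, 3), round(after, 3), round(after / before, 3))
```

With ten nodes, a single failure raises each survivor's load by about 11%. With only two switches, the survivor's load doubles, so each one needs enough headroom to carry all the traffic alone.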
P.S. Here is a statement from BlackBerry's CIO on the impact of the outage:
Wednesday 12th October – 17:44 (GMT-5)
Service update from RIM CIO
To All BlackBerry Customers:
I want to first apologize for the service interruptions and delays many of you have been experiencing this week. I also wanted to connect with you directly, give you an update on the service issues we are trying to solve, and answer some of the questions and concerns you’ve expressed.
You’ve depended on us for reliable, real-time communications, and right now we’re letting you down. We are taking this very seriously and have people around the world working around the clock to address this situation. We believe we understand why this happened and we are working to restore normal service levels in all markets as quickly as we can.
Here is the current status of service and issues for the various regions that were impacted:
For Europe, Middle East, India and Africa (EMEIA):
- Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers.
- BBM traffic is online and traffic is passing successfully
- Browsing is temporarily unavailable as the Support teams monitor service stability and continue to assess when this service can be safely brought online
- Support teams have added capacity to help with message delivery between regions and continents
For Canada and Latin America:
- Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers
- BBM and browsing services are online and traffic is passing successfully (except for three carrier networks in Latin America that are serviced by the EMEIA infrastructure – browsing is temporarily unavailable for those three carrier networks)
- Support teams are investigating reports of BBM delays
For the U.S.:
- Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers.
- Support teams have added capacity to help with message delivery between regions and continents
- BBM and browsing services are online and traffic is passing successfully
- Support teams are investigating reports of BBM delays
We will provide regular updates on BlackBerry.com, RIM.com and via our social channels. We are doing everything in our power to restore regular service everywhere and to restore your trust in us.
Yours sincerely,
Robin Bienfait
Chief Information Officer, RIM