Selected Readings in Computer Science (972)

Software Exam | Editor: simonwang | 2005-08-07



Bulletproof Storage

Disk systems will repair themselves or can be left unrepaired for years.

You can fly a two-engine plane with one engine, but how many passengers would want to be on it?

That’s the idea behind “bulletproof storage,” a concept that IBM has been developing for two years and plans to begin unveiling incrementally over the next one to three years.

IBM’s technology initiative deals with fault tolerance in every part of a storage system: disk, controller, network cards, power supplies and software. By building more-robust storage systems that can defer replacement of failed parts for up to three years because of redundant components, IBM believes it can also eliminate many human errors that happen when failing components are replaced.

According to Stanley Zaffos, an analyst at Gartner Inc., it will be another five to 10 years before the bulletproof storage concept is broadly embraced by users. But once it is, storage systems will require less maintenance and, therefore, cost less to maintain.

“We know how to build very reliable code. We use appliances every day that have software built into them that work forever: your automobile, your calculator, the disk drive in your PC, your telephone,” Zaffos says.

But IBM is looking to attack far more complex systems than telephones or calculators.

Under its bulletproof initiative, IBM is addressing disk-sector failures that grow along with disk capacity. While disk capacities double every 12 to 18 months, uncorrectable read/write error rates haven’t improved, nor has the probability of an uncorrectable error occurring on a disk read decreased. There are more sectors on today’s disks and, therefore, a greater chance of an uncorrectable error.
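The scaling argument above can be illustrated with a little arithmetic. The following is a minimal sketch, not IBM's model: it assumes each bit read fails independently with a fixed uncorrectable error rate, and the rate of 1e-14 per bit and the capacities are illustrative values that do not come from the article.

```python
import math


def p_uncorrectable(capacity_bytes: float, bit_error_rate: float) -> float:
    """Probability of at least one uncorrectable error in one full-disk read.

    Assumes independent errors per bit: 1 - (1 - rate) ** bits,
    evaluated with log1p/expm1 so tiny rates do not lose precision.
    """
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


ASSUMED_RATE = 1e-14  # hypothetical per-bit rate, held constant across generations

# Capacity doubles each generation while the error rate stays the same.
for gigabytes in (100, 200, 400, 800):
    p = p_uncorrectable(gigabytes * 1e9, ASSUMED_RATE)
    print(f"{gigabytes:4d} GB: {p:.3%} chance of an uncorrectable error per full read")
```

Under these assumptions the chance of hitting an uncorrectable error roughly doubles with each doubling of capacity, which is the effect the paragraph describes.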

The answer is to create self-healing capabilities for storage management software and more-robust RAID configurations.

IBM says that in about a year it will release storage systems that can support three simultaneous disk-drive failures in a single array by introducing additional parity disks into RAID configurations, offering many times the resiliency of a RAID configuration with two parity disks. Today, standard systems allow for only two disk failures.
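A rough way to see why tolerating a third failure helps is to compare the chance of losing an array under each scheme. This is only a sketch under simplifying assumptions that are not in the article: disks fail independently, each with the same probability during a window in which nothing is replaced, and both arrays have the same number of disks. The array size and failure probability are hypothetical.

```python
from math import comb


def p_data_loss(n_disks: int, tolerated: int, p_fail: float) -> float:
    """Probability that more than `tolerated` of `n_disks` fail in the window.

    Sums the binomial tail directly to avoid subtracting nearly equal numbers.
    """
    return sum(
        comb(n_disks, k) * p_fail**k * (1 - p_fail) ** (n_disks - k)
        for k in range(tolerated + 1, n_disks + 1)
    )


N_DISKS = 16   # hypothetical array size
P_FAIL = 0.05  # hypothetical per-disk failure probability in the window

two_parity = p_data_loss(N_DISKS, tolerated=2, p_fail=P_FAIL)
three_parity = p_data_loss(N_DISKS, tolerated=3, p_fail=P_FAIL)
print(f"loss probability, 2 failures tolerated: {two_parity:.5f}")
print(f"loss probability, 3 failures tolerated: {three_parity:.5f}")
print(f"improvement factor: {two_parity / three_parity:.1f}x")
```

The exact factor depends heavily on the assumed failure probability and array size; the point is only that each extra tolerated failure removes the most likely remaining loss scenario.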

But Zaffos argues that 80% of downtime today is caused by user error and software failures, not hardware failures. He says that the failures resulting from software are created by complexity and that there is an almost infinite number of failures that can occur in a complex system.

IBM is addressing those code failures with a software project called N-Version Programming, where two pieces of code in the same application save data and then compare the data to ensure that there are no errors.

In N-Version Programming, two copies of data are protected using different means. One copy might be protected by standard RAID-5 programming coded by Programmer A.

The second copy is protected by a different algorithm coded by Programmer B. That way, if the first copy gets corrupted due to a particular bug in the program written by Programmer A, then the second copy can be used.

The second copy may have its own bugs, but they will manifest in different ways at different times, and when they do, the first copy will be the good one that you can use. It’s like having a second person check the work of the first and fix any mistakes found.
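The two-copy idea can be sketched in a few lines. This is an illustration only, not IBM's design: instead of RAID-5 and a second parity algorithm, it swaps in two simple integrity tags (CRC-32 and SHA-256) as stand-ins for the code written by "Programmer A" and "Programmer B". The names `ProtectedCopy` and `TwoVersionStore` are hypothetical.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class ProtectedCopy:
    payload: bytes
    tag: bytes  # integrity tag computed by one of the two versions


# Version A (stand-in for Programmer A's code): CRC-32 tag.
def protect_a(data: bytes) -> ProtectedCopy:
    return ProtectedCopy(data, zlib.crc32(data).to_bytes(4, "big"))


def verify_a(copy: ProtectedCopy) -> bool:
    return zlib.crc32(copy.payload).to_bytes(4, "big") == copy.tag


# Version B (stand-in for Programmer B's code): SHA-256 tag.
def protect_b(data: bytes) -> ProtectedCopy:
    return ProtectedCopy(data, hashlib.sha256(data).digest())


def verify_b(copy: ProtectedCopy) -> bool:
    return hashlib.sha256(copy.payload).digest() == copy.tag


class TwoVersionStore:
    """Keeps two copies of the data, each protected by a different routine."""

    def __init__(self, data: bytes) -> None:
        self.write(data)

    def write(self, data: bytes) -> None:
        self.copy_a = protect_a(data)
        self.copy_b = protect_b(data)

    def read(self) -> bytes:
        a_ok, b_ok = verify_a(self.copy_a), verify_b(self.copy_b)
        if a_ok:
            if not b_ok:  # rebuild the bad copy from the good one
                self.copy_b = protect_b(self.copy_a.payload)
            return self.copy_a.payload
        if b_ok:
            self.copy_a = protect_a(self.copy_b.payload)
            return self.copy_b.payload
        raise IOError("both copies failed verification")


store = TwoVersionStore(b"ledger page 7")
store.copy_a.payload = b"ledger page 9"  # simulate corruption of copy A
print(store.read())                      # served from copy B; copy A is rebuilt
```

The design choice that matters is that the two protection paths share no code, so a bug in one is unlikely to damage both copies in the same way at the same time.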

One way IBM plans to detect and correct corrupted data is to create more-resilient storage software with repairable data structures. The code checks that certain conditions, which are described in rules, are met. For example, in a file system with multiple files, the sum of the space taken by the files plus the free space in the system must be equal to the total available space. The code will check this property automatically at various times and use a procedure to repair and fix problems if the property isn’t met.

In this case, the software isn’t checking the code to see that it’s functioning properly and isn’t checking data contents. If certain properties aren’t met, the software knows how to fix the data structures.
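The file-system rule described above lends itself to a short sketch. The `ToyFileSystem` class below is hypothetical and far simpler than real storage software; it also assumes the file table is the trusted source when the free-space counter disagrees, which the article does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    total_blocks: int
    files: dict[str, int] = field(default_factory=dict)  # file name -> blocks used
    free_blocks: int = field(init=False)

    def __post_init__(self) -> None:
        self.free_blocks = self.total_blocks - self.used_blocks()

    def used_blocks(self) -> int:
        return sum(self.files.values())

    def invariant_holds(self) -> bool:
        # Rule: space taken by files + free space == total available space.
        return self.used_blocks() + self.free_blocks == self.total_blocks

    def check_and_repair(self) -> bool:
        """Check the rule; if it is violated, rebuild the free-space counter.

        Returns True when a repair was performed. Only the data structure is
        examined, never the file contents or the code that updated it.
        """
        if self.invariant_holds():
            return False
        self.free_blocks = self.total_blocks - self.used_blocks()
        return True


fs = ToyFileSystem(total_blocks=100, files={"a.dat": 30, "b.dat": 20})
fs.free_blocks = 70           # simulate a corrupted counter
print(fs.check_and_repair())  # the violated rule is detected and repaired
print(fs.free_blocks)
```

A real system would run such checks periodically or on access and would need a rule for deciding which of the inconsistent fields to trust.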

But don’t expect to see fruit from N-Version Programming or checkable data structures for another two to three years.

防彈存儲

磁盤系統自行修理或者幾年不用修理。

雙引擎飛機能用一個引擎飛行,但有多少乘客愿意乘坐?

“防彈存儲”背后的想法就是這樣一個概念,IBM已經研究了兩年,并計劃在今后一至三年中不斷公布進展。

IBM的此項技術首創是要在存儲系統的方方面面:磁盤、控制器、網卡、電源和軟件,實現容錯。IBM相信,通過制造更健壯的、并由于有冗余部件從而能將故障部件的更換推遲最多三年的存儲系統,能避免很多在更換故障部件時產生的人為錯誤。

Gartner公司的分析師Stanley Zaffos稱,防彈存儲概念能為用戶廣為接受還需要5至10年的時間。但一旦得到認可,存儲系統將需要更少的維護,因而需要更低的維護成本。

Zaffos說:“我們知道如何編制非常可靠的程序。我們每天使用各種各樣的裝置:汽車、計算器、PC機中的磁盤機和電話,它們都內裝了使其能永遠工作的軟件。”

但IBM著眼于攻克比電話或計算器更複雜的系統。

在此項技術首創中,IBM要解決隨磁盤容量增加而增加的磁盤扇區故障。磁盤容量每12至18個月就翻一番,但無法糾正的讀/寫錯誤率沒有得到改進,而且發生在磁盤讀時的無法糾正的錯誤概率也沒有降低。今天的磁盤上有更多的扇區,因而出現無法糾正錯誤的機會就更多。

這個問題的答案是提供存儲管理軟件的自修復能力以及更健壯的RAID(冗余磁盤陣列)配置。

IBM稱,約在一年的時間里,將公布通過在RAID配置中增加奇偶盤而能在單個陣列中支持三個磁盤同時發生故障的存儲系統,這將比兩個奇偶盤RAID配置的彈性高出很多倍。今天,標準的系統只允許兩個磁盤出現故障。

但Zaffos認為,今天80%的宕機是由于用戶的錯誤和軟件故障,而不是硬件故障引起的。他說,軟件帶來的故障是因複雜性造成的,而在複雜系統中可能發生的故障幾乎是不計其數的。

IBM用一個叫N-Version Programming的軟件項目來解決這些程序故障,其中同一應用軟件中有兩段程序保存數據,然后通過比較數據來確保沒有錯誤。

在N-Version Programming中,使用不同的方式保護數據的兩個備份。一個備份可以用由程序員A編寫的標準RAID-5編程保護。

第二個備份由程序員B編寫的不同算法進行保護。這樣,如果第一個備份由于程序員A編寫的程序中的特定錯誤而被破壞了,就可以使用第二個備份。

第二個備份也可能有其自己的錯誤,但這些錯誤將以不同的方式、在不同的時間表現出來,當出現這些錯誤時,第一個備份將是好的,你可以使用。這好像是有第二個人來檢查第一個人的工作,一發現錯誤就糾正。

IBM計劃用來檢測和糾正被破壞數據的一個方法,就是用可修理的數據結構來生成更有彈性的存儲軟件。這種程序檢查在規則中描述的某些條件是否得到滿足。例如,在有多個文件的文件系統中,文件占用的空間與系統中未用的空間之和應該等于總的可用空間。上述程序在不同的時間自動檢查此特性,并在此特性未能得到滿足時啟用程序進行修理并糾正此問題。

此時,軟件不是檢查此程序,看看它是否正常運行,也不是檢查數據內容。如果某些特性未能滿足,軟件知道如何來修正數據結構。

但不要指望在今后兩三年內就能見到N-Version Programming或可檢查數據結構的成果。
