Selected Readings in Computer Science (968)

Ruankao (software qualification exam) | Editor: ice_fish | 2005-07-01



Stream-processing Engine

Applications that process real-time data streams are pushing the limits of traditional data processing technologies. These applications are characterized by the need for sub-second response times, whether they involve automating trades, monitoring networks for intrusions, or tracking credit card transactions for fraud. Applications that depend on the traditional store-and-query model cannot handle the volume and velocity of streaming data, whose value might exist only in the moment.

A stream-processing engine (SPE) is data management software that enables the execution of queries and computations, and ultimately actions, on streaming data in real time. Previously, queries and computations could be executed only on stored data, using standard database management systems. An SPE accepts SQL-like, stream-oriented, continuous queries and executes them over live event streams, outputting results in real time.
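As a rough illustration of a continuous query, the sketch below applies a filter and a projection to each event as it arrives rather than to a stored table. The event fields (`symbol`, `price`, `volume`) and the function name `continuous_query` are assumptions made for this example; they are not taken from any particular SPE product.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical event shape: {"symbol": str, "price": float, "volume": int}
Event = dict


def continuous_query(stream: Iterable[Event],
                     where: Callable[[Event], bool],
                     select: Callable[[Event], object]) -> Iterator[object]:
    """Emit one result per matching event as soon as that event arrives."""
    for event in stream:
        if where(event):
            yield select(event)


trades = [
    {"symbol": "ABC", "price": 10.0, "volume": 100},
    {"symbol": "XYZ", "price": 55.0, "volume": 20},
    {"symbol": "ABC", "price": 10.5, "volume": 5000},
]

# Roughly: SELECT symbol, price FROM trades WHERE volume > 1000
large_trades = list(
    continuous_query(
        iter(trades),
        where=lambda e: e["volume"] > 1000,
        select=lambda e: (e["symbol"], e["price"]),
    )
)
```

Because `continuous_query` is a generator, it would work unchanged on an unbounded source; the finite list is used here only so the example terminates.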

An SPE achieves real-time operation by integrating several mechanisms. First, it supports inbound processing, in which incoming event streams immediately start to flow through the continuous queries as they enter the system. The queries transform the events as they move, continuously producing results, all in main memory. Read or write operations to storage are optional and can be executed asynchronously in many cases.

Inbound processing overcomes a limitation of the traditional outbound processing model that conventional database management systems employ, in which data must be inserted into the database and indexed before any processing can take place. By removing storage from the critical path of processing, an SPE achieves significant performance gains compared with traditional processing approaches.
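The idea of taking storage off the critical path can be sketched as follows. Each event is transformed in memory first, and a background thread persists it afterwards. The names `run_inbound`, `transform` and `store` are hypothetical, and a plain list stands in for a real storage layer.

```python
import queue
import threading


def run_inbound(events, transform, store):
    """Process each event in memory first; persist it off the critical path."""
    pending = queue.Queue()

    def writer():
        # Background thread: drains the queue and writes to storage.
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more events
                break
            store.append(item)

    thread = threading.Thread(target=writer)
    thread.start()

    results = []
    for event in events:
        results.append(transform(event))  # result is available immediately
        pending.put(event)                # storage happens asynchronously
    pending.put(None)
    thread.join()
    return results


archive = []  # stand-in for a database or log (assumption for this sketch)
doubled = run_inbound([1, 2, 3], lambda x: x * 2, archive)
```

In the outbound model the order would be reversed: the write to `store` would have to complete before `transform` could run.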

Second, an SPE adopts a single-process model, in which all time-critical operations (including event processing, storage and execution of custom application logic) are run as part of one multi-threaded process. This integrated approach eliminates high-overhead process switches present in solutions that use multiple software systems to provide the same capabilities.

Third, an SPE provides a flexible, in-process storage model and standards-based access to external databases. In-memory hash tables are used for very fast insert and look-up operations. Embedded databases are used to ensure persistence of data and can be accessed and manipulated using SQL-style declarative queries. External, remote-process databases are accessible through standard Open Database Connectivity (ODBC) calls and are convenient to use when supporting legacy databases or facilitating database sharing with external applications.
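The first two storage tiers can be imitated with Python's standard library: a `dict` as the in-memory hash table and `sqlite3` as an embedded database queried with SQL. This is only an analogy, not how any specific SPE is built. The table layout and the `on_trade` handler are assumptions, and the database is opened with `":memory:"` so the example leaves no file behind (a file path would be needed for real persistence). ODBC access is not shown because it requires a third-party driver.

```python
import sqlite3

last_price = {}  # in-memory hash table: fast insert and look-up by symbol

# Embedded database running inside this process.
# ":memory:" is used for the sketch; pass a file path to persist data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (symbol TEXT, price REAL, volume INTEGER)")


def on_trade(symbol, price, volume):
    """Hypothetical event handler: update the hash table and record the trade."""
    last_price[symbol] = price
    db.execute("INSERT INTO trades VALUES (?, ?, ?)", (symbol, price, volume))


for trade in [("ABC", 10.0, 100), ("XYZ", 55.0, 20), ("ABC", 10.5, 300)]:
    on_trade(*trade)

# Declarative SQL query against the embedded store.
total_abc = db.execute(
    "SELECT SUM(volume) FROM trades WHERE symbol = ?", ("ABC",)
).fetchone()[0]
```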

An SPE has built-in operators for filtering, aggregating, correlating and merging that manipulate windows of events. Standard SQL is defined over finite-sized tables, so an execution engine knows when it has finished all its operations. In contrast, streams potentially never end, and an SPE must be told when to finish processing and output an answer.

The windowing construct serves this purpose by defining the scope of an operator. In a trading application, a one-hour window can be used to express a stream-oriented query that calculates an hourly volume-weighted average price. Windows are user-configurable and can be defined over time, number of events or breakpoints in other attributes of an event.
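The hourly volume-weighted average price mentioned above can be sketched with a time-based window. The example assumes trades arrive in timestamp order as `(timestamp_seconds, price, volume)` tuples with positive volume, and it uses non-overlapping one-hour windows; the tuple layout and the name `hourly_vwap` are illustrative choices.

```python
WINDOW_SECONDS = 3600  # one-hour window


def hourly_vwap(trades):
    """Yield (window_start, vwap) each time a one-hour window closes.

    Assumes trades are (timestamp_seconds, price, volume) in timestamp order.
    """
    current = None   # start time of the window being filled
    notional = 0.0   # running sum of price * volume
    volume = 0       # running sum of volume
    for ts, price, vol in trades:
        start = ts - ts % WINDOW_SECONDS
        if current is not None and start != current:
            # An event from a later window closes the current one.
            if volume:
                yield current, notional / volume
            notional, volume = 0.0, 0
        current = start
        notional += price * vol
        volume += vol
    if current is not None and volume:
        yield current, notional / volume  # flush the last, partial window


sample = [(0, 10.0, 100), (1800, 20.0, 300), (3600, 30.0, 50), (7100, 40.0, 50)]
vwaps = list(hourly_vwap(sample))
```

For the sample data, the first window holds two trades with a VWAP of (10 × 100 + 20 × 300) / 400 = 17.5, and the second holds two trades with a VWAP of (30 × 50 + 40 × 50) / 100 = 35.0. A count-based window would close after a fixed number of events instead of a change in `start`.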

Stream-oriented operators provide resiliency to imperfections in data streams caused by out-of-order or delayed data arrivals, both of which occur frequently in real-world scenarios. Resiliency is achieved by making operators time-sensitive: an operator can optionally be told to wait longer for out-of-order messages, or to time out and stop waiting for late messages that might never arrive.
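One common way to make an operator time-sensitive is a small reordering buffer with a bounded wait. That technique is an assumption here, since the text does not say how a given SPE implements it. In the sketch, events are `(timestamp, payload)` pairs, and `max_delay` is a hypothetical parameter for how far behind the newest timestamp an event may arrive before the operator stops waiting for it.

```python
import heapq


def reorder(events, max_delay):
    """Yield events in timestamp order, tolerating bounded disorder.

    events: iterable of (timestamp, payload) in arrival order.
    An event is held until the newest timestamp seen is at least
    max_delay ahead of it; events that arrive later than that are dropped.
    """
    heap = []
    watermark = float("-inf")  # timestamps below this are no longer awaited
    seq = 0                    # tie-breaker so payloads are never compared
    for ts, payload in events:
        if ts < watermark:
            continue  # too late: the operator already timed out on it
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        watermark = max(watermark, ts - max_delay)
        while heap and heap[0][0] <= watermark:
            t, _, p = heapq.heappop(heap)
            yield t, p
    while heap:  # end of input: flush whatever is still buffered
        t, _, p = heapq.heappop(heap)
        yield t, p


arrivals = [(1, "a"), (3, "c"), (2, "b"), (10, "d"), (4, "late")]
ordered = list(reorder(arrivals, max_delay=2))
```

With `max_delay=2`, the out-of-order event at timestamp 2 is restored to its place, while the event at timestamp 4 arrives after timestamp 10 has been seen and is discarded. A larger `max_delay` trades latency for completeness.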

Finally, an SPE supports distributed operation for improved scalability and availability. Incremental scalability is achieved by letting processing be partitioned and distributed across multiple machines transparently, without necessitating any changes in the application. High availability is crucial to preserve the integrity of applications and to avoid disruptions in real-time processing.
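Partitioning by key is one plausible way to spread processing across machines; the text does not specify the scheme, so the following is a sketch under that assumption. Lists stand in for worker machines, and `partition_for` and `distribute` are hypothetical names. `zlib.crc32` is used instead of the built-in `hash` because Python randomizes string hashes per process, which would route the same key differently on different machines.

```python
import zlib


def partition_for(key: str, num_workers: int) -> int:
    """Map a key to a worker index with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def distribute(events, num_workers):
    """Route (key, value) events so that each key always lands on one worker."""
    shards = [[] for _ in range(num_workers)]
    for key, value in events:
        shards[partition_for(key, num_workers)].append((key, value))
    return shards


events = [("ABC", 1), ("XYZ", 2), ("ABC", 3), ("DEF", 4)]
shards = distribute(events, num_workers=3)
```

Because all events for one key reach the same worker in arrival order, per-key operators such as the windowed aggregates described earlier can run on each worker without changes to the query.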

