Selected Readings in Computer Science, Part 2

Ruankao (software qualification exam) | Editor: zhxlbj2 | 2004-12-31

Data Cubes

DEFINITION: A data cube is a type of multidimensional matrix that lets users explore and analyze a collection of data from many different perspectives, usually considering three factors (dimensions) at a time.

When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.

Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another.
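The stacking idea can be sketched in a few lines of Python. This is only an illustration: the dimension names (quarter, product, region), the member names and the sales figures are all made up, and a real OLAP engine would not store a cube as nested lists.

```python
# Hypothetical dimension members, chosen only for illustration.
quarters = ["Q1", "Q2"]
products = ["widget", "gadget", "gizmo"]
regions = ["north", "south"]

# One 2-D table (products x regions) of unit sales per quarter.
q1_table = [
    [10, 12],  # widget: north, south
    [7, 0],    # gadget
    [3, 5],    # gizmo
]
q2_table = [
    [11, 9],
    [8, 2],
    [4, 6],
]

# Stacking the similarly structured tables yields a 3-D cube
# indexed as cube[quarter][product][region].
cube = [q1_table, q2_table]


def cell(quarter, product, region):
    """Look up one cell by member names rather than by positions."""
    q = quarters.index(quarter)
    p = products.index(product)
    r = regions.index(region)
    return cube[q][p][r]


print(cell("Q2", "gizmo", "south"))
```

Each 2-D table has the same rows and columns, which is what makes the stack a cube rather than a loose collection of reports.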

But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensions—Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.

In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
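A rough sketch of what indexing on a dimension buys us, again with made-up dimensions and figures: fixing one dimension returns a 2-D table, and summing a dimension out gives a coarser view. The function names here are hypothetical and not taken from any OLAP product.

```python
# Hypothetical 3-D cube: cube[quarter][product][region] holds unit sales.
quarters = ["Q1", "Q2"]
products = ["widget", "gadget"]
regions = ["north", "south", "west"]
cube = [
    [[10, 12, 4], [7, 0, 9]],  # Q1: one row per product, one column per region
    [[11, 9, 6], [8, 2, 1]],   # Q2
]


def slice_by_quarter(quarter):
    """Fix the quarter dimension; the result is a products x regions table."""
    return cube[quarters.index(quarter)]


def slice_by_region(region):
    """Fix the region dimension; the result is a quarters x products table."""
    r = regions.index(region)
    return [[row[r] for row in table] for table in cube]


def roll_up_regions():
    """Sum out the region dimension, leaving quarters x products totals."""
    return [[sum(row) for row in table] for table in cube]


print(slice_by_quarter("Q1"))
print(slice_by_region("south"))
print(roll_up_regions())
```

Whichever dimension is fixed, the answer comes from positional lookups into the same structure, with no search through the data.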

Relational or Multidimensional?

Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.

However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparser; that is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
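The effect of added dimensions on sparsity can be shown with simple arithmetic. The sketch below assumes, purely for illustration, that every dimension has 10 members and that 1,000 distinct attribute combinations actually occur in the data.

```python
from math import prod


def dense_cells(sizes):
    """Number of cells a dense multidimensional array must allocate."""
    return prod(sizes)


def density(populated, sizes):
    """Fraction of cells that actually hold aggregated data."""
    return populated / dense_cells(sizes)


# Hypothetical scenario: the number of populated cells stays fixed
# while the number of dimensions grows.
populated = 1_000
for n_dims in (3, 4, 6):
    sizes = [10] * n_dims
    print(n_dims, dense_cells(sizes), density(populated, sizes))

# A sparse alternative stores only the populated cells, keyed by their
# coordinates. It saves space, but a lookup now goes through a hash of the
# key rather than a direct positional offset into an array.
sparse_cube = {(0, 3, 7, 2): 125.0, (4, 4, 1, 9): 80.5}
print(sparse_cube.get((0, 3, 7, 2)), sparse_cube.get((1, 1, 1, 1)))
```

The dictionary is a stand-in for compressed storage in general: it keeps only what exists, at the cost of the array's built-in addressing.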

Data cubes can be built in other ways. Relational OLAP (ROLAP) uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to 2^n of them for n dimensions, one for each subset of the dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.

Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables need to include only those data cube cells that actually contain data.

However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
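A minimal sketch of one ROLAP cuboid, using Python's built-in sqlite3 module as a stand-in for an RDBMS. The table names, column names and figures are assumptions made for the example; they do not come from any particular OLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, quarter TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("widget", "north", "Q1", 10),
        ("widget", "south", "Q1", 12),
        ("widget", "north", "Q2", 11),
        ("gadget", "north", "Q1", 7),
        ("gadget", "north", "Q2", 2),
    ],
)

# One cuboid: the (product, region) view, aggregated over quarter.
# Each row carries its attribute values alongside the summary value,
# and only combinations that occur in the data get a row.
conn.execute(
    "CREATE TABLE cuboid_product_region AS "
    "SELECT product, region, SUM(units) AS total_units "
    "FROM sales GROUP BY product, region"
)

# There is no implicit positional index, so one is declared explicitly.
conn.execute(
    "CREATE INDEX idx_product_region "
    "ON cuboid_product_region (product, region)"
)

rows = conn.execute(
    "SELECT product, region, total_units FROM cuboid_product_region "
    "ORDER BY product, region"
).fetchall()
print(rows)
```

The (gadget, south) combination never occurs in the sample data, so the cuboid simply has no row for it, whereas a dense array would reserve an empty cell.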

From a structural perspective, data cubes are made up of two elements: dimensions and measures. Dimensions, as explained above, are the factors by which the data is organized; measures are simply the actual data values.

It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
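The following sketch shows the pre-aggregation idea under simple assumptions: hypothetical transaction records are summarized once into a cube whose keys are the dimension values and whose stored values are the measure. The names `transactions` and `build_cube` are invented for the example.

```python
from collections import defaultdict

# Hypothetical raw transactions: (product, region, quarter, units).
transactions = [
    ("widget", "north", "Q1", 4),
    ("widget", "north", "Q1", 6),
    ("widget", "south", "Q1", 12),
    ("gadget", "north", "Q2", 8),
]


def build_cube(records):
    """Aggregate raw records once; dimensions form the key, the measure is the value."""
    cube = defaultdict(int)
    for product, region, quarter, units in records:
        cube[(product, region, quarter)] += units
    return dict(cube)


cube = build_cube(transactions)

# A query reads a stored aggregate; nothing is recomputed from raw data.
before = cube[("widget", "north", "Q1")]

# A transaction that arrives after the build is not reflected in the cube...
transactions.append(("widget", "north", "Q1", 5))
stale = cube[("widget", "north", "Q1")]

# ...until the cube is rebuilt from the updated records.
refreshed = build_cube(transactions)[("widget", "north", "Q1")]

print(before, stale, refreshed)
```

The stale value in the middle is the price of pre-aggregation: reads are cheap, but the cube is only as current as its last build.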

The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sales profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.

