Topic Maps
Computers have so overloaded us with data that it has become increasingly difficult to find the information we seek. Beginning in the 1990s, powerful search engines like Yahoo, AltaVista and Google made the Web an incomparably valuable information resource, but the growth of available information has rendered even those remarkable tools far less useful. Google currently indexes more than 4 billion pages, and queries often return tens of thousands of pages that are arranged in no discernible order.
One promising approach, still in its infancy, is called topic mapping. A topic map is a kind of data structure, just as an outline or a set of categories is. Topic maps were standardized by the International Organization for Standardization in 2000 (ISO/IEC 13250), and their XML interchange syntax is known as XML Topic Maps, or XTM. XTM provides a basic model using XML tags to represent the structure of information resources, concepts and the relationships between them.
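To make the idea of "XML tags" concrete, here is a minimal sketch that reads a small hand-written fragment in the style of XTM 1.0 using Python's standard library. The element names follow our understanding of XTM 1.0 and should be checked against the specification; the ids, the example.org URL and the summarize helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XTM 1.0 documents.
XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# Illustrative fragment only: the ids and the URL are hypothetical.
XTM_DOC = f"""
<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="book-a">
    <baseName><baseNameString>Book A</baseNameString></baseName>
    <occurrence>
      <resourceRef xlink:href="http://example.org/books/a.html"/>
    </occurrence>
  </topic>
  <topic id="author-b">
    <baseName><baseNameString>Author B</baseNameString></baseName>
  </topic>
  <association>
    <member><topicRef xlink:href="#book-a"/></member>
    <member><topicRef xlink:href="#author-b"/></member>
  </association>
</topicMap>
"""


def summarize(xml_text):
    """Return (names by topic id, occurrence URIs by topic id, association members)."""
    root = ET.fromstring(xml_text.strip())
    ns = {"x": XTM}
    href = f"{{{XLINK}}}href"  # fully qualified name of the xlink:href attribute
    names, occurrences = {}, {}
    for topic in root.findall("x:topic", ns):
        tid = topic.get("id")
        names[tid] = [n.text for n in topic.findall("x:baseName/x:baseNameString", ns)]
        occurrences[tid] = [r.get(href) for r in topic.findall("x:occurrence/x:resourceRef", ns)]
    associations = [
        [ref.get(href).lstrip("#") for ref in assoc.findall("x:member/x:topicRef", ns)]
        for assoc in root.findall("x:association", ns)
    ]
    return names, occurrences, associations


names, occurrences, associations = summarize(XTM_DOC)
```

The fragment leaves out much of what a real XTM file can carry, such as member roles, scopes and subject identity. It is only meant to show that topics, their names, their occurrences and their associations each map to their own tags.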
How It Works
Let's start with a subject, a real-world entity or an idea that we're representing in our map by a topic. A subject can be almost anything, from an abstract concept to a specific document section, and the terms subject and topic are often used interchangeably.
The topic map model lets us attach three elements (called characteristics) to any given topic: its names, its associations with other topics and its occurrences (also called resources).
Names are mainly useful to people in dealing with topics, and a topic doesn't actually need a name: A typical cross-reference points to an unnamed topic. Also, we typically group topics according to some notion of type.
For example, if we're mapping an IT installation, we likely have topics for specific pieces of equipment, homegrown and purchased applications, data storage information and the like. Thus, our map would also include categorical topics such as hardware, software and data structures.
Associations are the conceptual heart of topic maps, indicating how one topic relates to another. For example, Book A (a topic) is written by (association) Author B (another topic).
Occurrences are the actual references: pointers to relevant information resources. Occurrences could include articles, books, images, audio and video segments, application code routines or even people. Typically, we refer to occurrences with uniform resource identifiers (URIs), an Internet Engineering Task Force standard for addressing and referencing resources. Web address URLs are a type of URI.
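The three characteristics can be sketched as a small in-memory data structure. This is a simplified illustration, not the standard's data model: the class names Topic and TopicMap, the example.org URL and the choice to store an association as a plain (topic, label, topic) triple are all assumptions made for brevity. Real topic maps also let each member of an association play a named role.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    """One topic plus the characteristics attached directly to it."""
    topic_id: str
    names: List[str] = field(default_factory=list)        # may be empty
    occurrences: List[str] = field(default_factory=list)  # URIs of resources
    types: List[str] = field(default_factory=list)        # ids of category topics


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    # Each association is (topic id, relationship label, topic id).
    associations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, topic: Topic) -> Topic:
        self.topics[topic.topic_id] = topic
        return topic

    def associate(self, left: str, label: str, right: str) -> None:
        # Both ends must already be topics in the map.
        if left not in self.topics or right not in self.topics:
            raise KeyError("both ends of an association must be topics")
        self.associations.append((left, label, right))

    def related(self, topic_id: str) -> List[Tuple[str, str]]:
        """Return (label, other topic id) for every association touching topic_id."""
        found = []
        for left, label, right in self.associations:
            if left == topic_id:
                found.append((label, right))
            elif right == topic_id:
                found.append((label, left))
        return found

    def instances_of(self, type_id: str) -> List[str]:
        """Return the ids of topics grouped under the given category topic."""
        return sorted(t.topic_id for t in self.topics.values() if type_id in t.types)


tm = TopicMap()
tm.add(Topic("book", names=["Book"]))  # a categorical topic
tm.add(Topic("book-a", names=["Book A"], types=["book"],
             occurrences=["http://example.org/books/a.html"]))  # hypothetical URL
tm.add(Topic("author-b", names=["Author B"]))
tm.associate("book-a", "is written by", "author-b")
```

With this in place, tm.related("author-b") lets us walk from an author to the books associated with them, and tm.instances_of("book") lists the topics grouped under a category.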
These characteristics of topics aren't universal. They exist within a limited context (called scope), where they are regarded as valid.
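One way to picture scope is as a label attached to each characteristic, which is then filtered by the context we are working in. The sketch below applies this to names only. The ScopedName class, the scope labels "legal-records" and "folklore", and the rule that an empty scope means "valid everywhere" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ScopedName:
    value: str
    # Contexts in which this name is valid; empty is treated here as "valid everywhere".
    scope: FrozenSet[str] = frozenset()


def names_in_context(names: List[ScopedName], context: str) -> List[str]:
    """Return the name values that are valid in the given context."""
    return [n.value for n in names if not n.scope or context in n.scope]


# Two names for one topic, each valid in a different (hypothetical) scope.
names = [
    ScopedName("William F. Bonney", frozenset({"legal-records"})),
    ScopedName("Billy the Kid", frozenset({"folklore"})),
]
```

Associations and occurrences could be scoped the same way by giving them a similar scope field.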
The final concept is identity. Ideally, there should be one topic for each subject, and vice versa. In practice, multiple topics can represent a single subject, as when different topic maps are merged. And in a single topic map, we might find “William F. Bonney” and “Billy the Kid” as separate topic names referring to the same subject, a historical person.
But the topic name “Billy the Kid” might also refer to the ballet. To get around these problems, we can unambiguously define the identity of a subject through resources called subject indicators.
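A rough sketch of how subject indicators resolve both problems: topics that point at the same indicator are merged, while topics that merely share a name are kept apart. The dictionary layout, the function name merge_by_subject_indicator and the example.org URIs are hypothetical, and real merging rules cover more cases than this.

```python
from typing import List, Set


def merge_by_subject_indicator(topics: List[dict]) -> List[dict]:
    """Merge topics that share at least one subject indicator URI."""
    merged: List[dict] = []
    for topic in topics:
        indicators: Set[str] = set(topic["indicators"])
        names: Set[str] = set(topic["names"])
        remaining = []
        for other in merged:
            if indicators & other["indicators"]:
                # Same subject: fold the earlier topic into this one.
                indicators |= other["indicators"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"indicators": indicators, "names": names})
        merged = remaining
    return merged


PERSON = "http://example.org/subjects/billy-the-kid-person"  # hypothetical URI
BALLET = "http://example.org/subjects/billy-the-kid-ballet"  # hypothetical URI

topics = [
    {"names": ["William F. Bonney"], "indicators": [PERSON]},
    {"names": ["Billy the Kid"], "indicators": [PERSON]},
    {"names": ["Billy the Kid"], "indicators": [BALLET]},
]
result = merge_by_subject_indicator(topics)
```

Here the two topics for the historical person collapse into one topic carrying both names, and the ballet remains a separate topic even though it shares the name “Billy the Kid”.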
The promise of topic maps is clear. Unfortunately, the idea of topic maps is still well ahead of its time. Tools for creating topic maps do exist, along with some implementations in specific subject areas, but these are primarily oriented toward representing and organizing content, and they don't yet adequately address the task of content creation.
But in a few more years, as Moore's Law continues to expand our computing capabilities, we may well see topic maps come into their own.