Topic Maps
Computers have so overloaded us with data that it has become increasingly difficult to find the information we seek. Beginning in the 1990s, powerful search engines like Yahoo, AltaVista and Google made the Web an incomparably valuable information resource, but the growth of available information has rendered even those remarkable tools far less useful. Google currently indexes more than 4 billion pages, and queries often return tens of thousands of pages arranged in no discernible order.
One promising approach, still in its infancy, is called topic mapping. A topic map is a kind of data structure, just as an outline or a set of categories is. Topic maps were standardized by the International Organization for Standardization in 2000 (ISO/IEC 13250), and their XML syntax is known as XML Topic Maps, or XTM. XTM provides a basic model that uses XML tags to represent the structure of information resources, concepts and the relationships between them.
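To make the XML representation concrete, here is a minimal sketch of reading a small XTM-style document with Python's standard library. The element names (topicMap, topic, baseName, baseNameString, occurrence, resourceRef, association, member, topicRef) follow XTM 1.0 as it is commonly documented and should be checked against the specification before relying on them. The topic ids, the example.org address and the summarize helper are hypothetical.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# A tiny, made-up topic map: two topics, one occurrence, one association.
SAMPLE = f"""<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="book-a">
    <baseName><baseNameString>Book A</baseNameString></baseName>
    <occurrence>
      <resourceRef xlink:href="http://example.org/book-a.html"/>
    </occurrence>
  </topic>
  <topic id="author-b">
    <baseName><baseNameString>Author B</baseNameString></baseName>
  </topic>
  <association>
    <member><topicRef xlink:href="#book-a"/></member>
    <member><topicRef xlink:href="#author-b"/></member>
  </association>
</topicMap>"""

NS = {"xtm": XTM}
HREF = f"{{{XLINK}}}href"  # fully qualified name of the xlink:href attribute


def summarize(xml_text):
    """Return (topic names by id, occurrence URIs, association member refs)."""
    root = ET.fromstring(xml_text)
    names = {}
    for topic in root.findall("xtm:topic", NS):
        names[topic.get("id")] = topic.findtext(
            "xtm:baseName/xtm:baseNameString", namespaces=NS
        )
    occurrences = [
        ref.get(HREF)
        for ref in root.findall("xtm:topic/xtm:occurrence/xtm:resourceRef", NS)
    ]
    associations = [
        [member.get(HREF) for member in assoc.findall("xtm:member/xtm:topicRef", NS)]
        for assoc in root.findall("xtm:association", NS)
    ]
    return names, occurrences, associations


names, occurrences, associations = summarize(SAMPLE)
print(names)
print(occurrences)
print(associations)
```

The sample leaves out most of what XTM allows, such as association types and member roles; it is meant only to show how topics, occurrences and associations sit side by side in one XML document.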
How It Works
Let's start with a subject, a real-world entity or an idea that we're representing in our map by a topic. A subject can be almost anything, from an abstract concept to a specific document section, and the terms subject and topic are often used interchangeably.
The topic map model lets us attach three elements (called characteristics) to any given topic: its names, its associations with other topics and its occurrences (also called resources).
Names are mainly useful to people in dealing with topics, and a topic doesn't actually need a name: A typical cross-reference points to an unnamed topic. Also, we typically group topics according to some notion of type.
For example, if we're mapping an IT installation, we likely have topics for specific pieces of equipment, homegrown and purchased applications, data storage information and the like. Thus, our map would also include categorical topics such as hardware, software and data structures.
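The IT installation example can be sketched as a small data structure. This is an illustration only, assuming Python 3.9 or later: the Topic class, its field names, the group_by_type helper and all topic ids are hypothetical and are not part of any topic map standard.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    names: list[str] = field(default_factory=list)  # may be empty: names are optional
    types: set[str] = field(default_factory=set)    # ids of the categorical topics


def group_by_type(topics):
    """Map each type id to the ids of the topics filed under it."""
    groups = defaultdict(list)
    for topic in topics:
        for type_id in sorted(topic.types):
            groups[type_id].append(topic.topic_id)
    return dict(groups)


topics = [
    Topic("hardware", ["Hardware"]),            # categorical topics are topics too
    Topic("software", ["Software"]),
    Topic("mail-server", ["Mail server"], {"hardware"}),
    Topic("payroll-app", ["Payroll application"], {"software"}),
    Topic("xref-17", [], {"software"}),         # an unnamed topic is still valid
]

print(group_by_type(topics))
```

Note that the categories are ordinary topics in the same map, so they can carry names, associations and occurrences of their own.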
Associations are the conceptual heart of topic maps, indicating how one topic relates to another. For example, Book A (a topic) is written by (association) Author B (another topic).
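The "Book A is written by Author B" example might be represented as follows. This is a rough sketch: the Association class, the role names "work" and "author", the related helper and the topic ids are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    assoc_type: str   # e.g. "written-by"
    roles: tuple      # pairs of (role, topic id)

    def player(self, role):
        """Return the topic id playing the given role, or None."""
        return dict(self.roles).get(role)


def related(associations, assoc_type, from_role, topic_id, to_role):
    """Follow associations of one type from one topic to the topics it relates to."""
    return [
        a.player(to_role)
        for a in associations
        if a.assoc_type == assoc_type
        and a.player(from_role) == topic_id
        and a.player(to_role) is not None
    ]


associations = [
    Association("written-by", (("work", "book-a"), ("author", "author-b"))),
    Association("written-by", (("work", "book-c"), ("author", "author-b"))),
]

# Who wrote Book A?
print(related(associations, "written-by", "work", "book-a", "author"))
# What did Author B write?
print(related(associations, "written-by", "author", "author-b", "work"))
```

Because an association names both of its ends, the same record answers the question in either direction; no separate "wrote" link is needed alongside "written by".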
Occurrences are the actual references: pointers to relevant information resources. Occurrences could include articles, books, images, audio and video segments, application code routines or even people. Typically, we refer to occurrences with uniform resource identifiers (URIs), an Internet Engineering Task Force standard for addressing and referencing resources. Web address URLs are a type of URI.
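As a small illustration of occurrences as URIs, the sketch below sorts a few references by scheme using Python's urllib.parse. It is a rough heuristic, not a URI validator: the classify_reference helper, the choice of which schemes count as URLs and the sample references are all assumptions.

```python
from urllib.parse import urlparse

# Assumption: treat these locator schemes as "URLs" for the purpose of the example.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify_reference(ref):
    """Roughly classify an occurrence reference by its URI scheme."""
    scheme = urlparse(ref).scheme.lower()
    if not scheme:
        return "not an absolute URI"
    return "URL" if scheme in LOCATOR_SCHEMES else "other URI"


# Hypothetical occurrences for the topic "book-a".
occurrences = [
    "http://example.org/book-a.html",  # a Web address
    "urn:example:author-b",            # a name rather than a location
    "shelf 3, box 12",                 # free text, not a URI at all
]

for ref in occurrences:
    print(ref, "->", classify_reference(ref))
```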
These characteristics of topics aren't universal. They exist within a limited context (called scope), where they are regarded as valid.
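Scope can be sketched as a set of context labels attached to each characteristic. The example below applies it to names only and assumes one simple rule: a name is valid when every label in its scope is present in the current context, and a name with an empty scope is valid everywhere. The ScopedName class, the names_in_context helper and the labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedName:
    value: str
    scope: frozenset = frozenset()  # empty scope: valid in any context


def names_in_context(names, context):
    """Return the names whose scope is fully covered by the given context."""
    return [n.value for n in names if n.scope <= context]


# Made-up names for a single mail server topic.
names = [
    ScopedName("mx01"),
    ScopedName("Mail server", frozenset({"english"})),
    ScopedName("Serveur de messagerie", frozenset({"french"})),
]

print(names_in_context(names, frozenset({"english"})))
print(names_in_context(names, frozenset({"french"})))
print(names_in_context(names, frozenset()))
```

The same filter could be applied to associations and occurrences, since all three kinds of characteristic are valid only within a scope.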
The final concept is identity. Ideally, there should be one topic for each subject, and vice versa. In practice, multiple topics can represent a single subject, as when different topic maps are merged. And in a single topic map, we might find “William F. Bonney” and “Billy the Kid” as separate topic names referring to the same subject, a historical person.
But the topic name “Billy the Kid” might also refer to the ballet of that name. To get around these problems, we can unambiguously define the identity of a subject through resources called subject indicators.
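The role of subject indicators in merging can be sketched as follows. The rule assumed here is that two topics represent the same subject when they share at least one subject indicator; names alone never trigger a merge. The Topic class, the merge_topics helper and the example.org indicator addresses are hypothetical.

```python
from dataclasses import dataclass, field

# Made-up subject indicators: one for the historical person, one for the ballet.
PERSON = "http://example.org/subjects/billy-the-kid-person"
BALLET = "http://example.org/subjects/billy-the-kid-ballet"


@dataclass
class Topic:
    names: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share a subject indicator; leave the rest separate."""
    merged = []
    for topic in topics:
        current = Topic(set(topic.names), set(topic.subject_indicators))
        remaining = []
        for other in merged:
            if current.subject_indicators & other.subject_indicators:
                # Same subject: fold the earlier topic into the current one.
                current.names |= other.names
                current.subject_indicators |= other.subject_indicators
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


topics = [
    Topic({"William F. Bonney"}, {PERSON}),
    Topic({"Billy the Kid"}, {PERSON}),
    Topic({"Billy the Kid"}, {BALLET}),
]

for topic in merge_topics(topics):
    print(sorted(topic.names), sorted(topic.subject_indicators))
```

The two topics for the person collapse into one that carries both names, while the ballet stays separate even though it shares the name "Billy the Kid".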
The promise of topic maps is clear. Unfortunately, the idea of topic maps is still well ahead of its time. Tools for creating topic maps do exist, along with some implementations in specific subject areas, but these are primarily oriented toward representing and organizing content, and they don't yet adequately address the task of content creation.
But in a few more years, as Moore's Law continues to expand our computing capabilities, we may well see topic maps come into their own.