Federal Information Management

Addressing critical issues faced by the U.S. Federal Government in managing its information resources: information architecture, information assurance and security, sharing, search, and others.

Sunday, June 25, 2006

Information sharing

Thanks to Alice Marshall for pointing me to the June 5 GCN article "Frictionless data: let it flow" on challenges pertaining to government-wide information-sharing efforts.
Below are some of the noteworthy points:

(a) What purpose is information sharing supposed to serve? What "functionality" within the government should it support? An example from the 9/11 Commission Report:
"Action officers should have been able to draw on all available knowledge about al-Qaeda in the government. Management should have ensured that information was shared."

(b) One major type of challenge to successful information sharing is political - it arises when "those that hold information don't feel all that strong of a need to share [it]". Overcoming this challenge involves, to a large extent, changing an organizational culture that supports what is known as "information hoarding" (see Rob Fay's discussion of effective organizational culture change, with implications for improving information sharing).

(c) Another type of challenge is technical - one example is getting different computer systems to use the same data set. This can be resolved, in particular, by creating a vocabulary of common terms for describing data elements (see, for instance, DEEDS - Data Elements for Emergency Department Systems).

(d) To implement such a vocabulary successfully, it is essential to have it defined through collaboration between the owners of those systems, rather than having it handed to them by some designated government agency.
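
To make points (c) and (d) a bit more concrete, here is a minimal sketch of what a shared vocabulary could look like in practice. The two systems, their field names and the common terms are all hypothetical (they are not taken from DEEDS); the only point is that each system owner contributes a mapping from its own field names to the terms agreed on jointly.

```python
# Hypothetical shared vocabulary agreed on by the owners of both systems.
COMMON_TERMS = {"patient_id", "arrival_time", "chief_complaint"}

# Each owner supplies a mapping from local field names to the common terms.
SYSTEM_A_MAP = {
    "pid": "patient_id",
    "arrived": "arrival_time",
    "complaint": "chief_complaint",
}
SYSTEM_B_MAP = {
    "PatientNumber": "patient_id",
    "ArrivalTS": "arrival_time",
    "ReasonForVisit": "chief_complaint",
}


def to_common(record, field_map):
    """Translate a system-specific record into the shared vocabulary."""
    unknown = set(record) - set(field_map)
    if unknown:
        raise KeyError(f"fields with no agreed common term: {sorted(unknown)}")
    translated = {field_map[name]: value for name, value in record.items()}
    if not set(translated) <= COMMON_TERMS:
        raise ValueError("mapping points at a term outside the shared vocabulary")
    return translated


record_a = {"pid": "17", "arrived": "2006-06-25T09:30", "complaint": "chest pain"}
record_b = {
    "PatientNumber": "17",
    "ArrivalTS": "2006-06-25T09:30",
    "ReasonForVisit": "chest pain",
}

# Once translated, records from the two systems can be compared directly.
print(to_common(record_a, SYSTEM_A_MAP) == to_common(record_b, SYSTEM_B_MAP))
```

A field that has no agreed term is rejected rather than silently passed through, which is one way of forcing the owners back to the table when the vocabulary has a gap.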

Click here to view the article.

Technorati tags: Federal Government

Wednesday, June 21, 2006

"The Economist" on Internet search engines

An article published in the June 15 edition of The Economist discusses some of the latest trends in the Internet search market.

Where are Internet search technology and the search market headed?

Below are a few interesting points from the article:

(a) Two major barriers to entry into the search business are (1) the lack of engineering talent and (2) the need to build large data centers that can simultaneously support millions of searches;

(b) How can search providers ensure the quality of search results, that is, their relevance to what the user is searching for? What mechanisms (or, rather, algorithms) do the leading search providers currently use to accomplish this task?

Google's PageRank is based on the number of links each search result receives from other Web pages (similar to academic papers that are recognized based on how many times they were cited in other research).
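
For readers who want to see the idea in code, below is a simplified sketch of the published PageRank calculation, in which a link counts for more when it comes from a page that is itself highly ranked. It is not Google's production system; the four-page link graph is made up, and the damping factor of 0.85 and the fixed number of iterations are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute ranks by repeated redistribution.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "d", which links back to "a".
toy_web = {"a": ["d"], "b": ["d"], "c": ["d", "a"], "d": ["a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, pages "b" and "c" receive no links at all and end up with only the baseline share, much like a paper that nobody cites.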

The mechanism used by Yahoo! extends the above algorithm by focusing on those links to each search result that are associated with actual people. (This is aimed at handling what are known as "subjective queries".) As one may notice, this mechanism works best when people "tag" information that they see on the Web (or, in other words, create metadata). Correspondingly, a limitation of this mechanism is that it may require a lot of work on the part of users.

Finally, Ask (formerly known as "Ask Jeeves") uses an algorithm called "ExpertRank": similar to PageRank, this algorithm looks at the popularity of each particular result among other Web pages. However, instead of looking at links from all possible pages, it focuses only on links from the most popular pages within the topic of the search. This mechanism is said to often return better results than Google's.
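
The difference between counting links from all pages and counting only links from popular pages within a topic can be shown with a toy example. This is just an illustration of the idea as the article describes it, not Ask's actual ExpertRank algorithm; the pages, the topic labels and the `top_k` cutoff are all hypothetical.

```python
from collections import Counter

# Hypothetical link graph as (source page, target page) pairs.
LINKS = [
    ("news", "cars1"), ("blog", "cars1"), ("shop", "cars1"),
    ("fan", "hubA"), ("cars1", "hubA"), ("cars2", "hubA"),
    ("fan", "hubB"), ("cars1", "hubB"), ("cars2", "hubB"),
    ("hubA", "cars2"), ("hubB", "cars2"),
]
# Hypothetical topic labels; pages not listed here count as off-topic.
PAGE_TOPICS = {
    "fan": "cars", "hubA": "cars", "hubB": "cars",
    "cars1": "cars", "cars2": "cars",
}


def all_link_counts(links):
    """Popularity measured as inbound links from any page."""
    return Counter(target for _, target in links)


def topic_link_counts(links, page_topics, topic, top_k=2):
    """Count only links that come from the most-linked pages within the topic."""
    on_topic = [
        (source, target) for source, target in links
        if page_topics.get(source) == topic and page_topics.get(target) == topic
    ]
    popularity = Counter(target for _, target in on_topic)
    experts = {page for page, _ in popularity.most_common(top_k)}
    return Counter(target for source, target in on_topic if source in experts)


everyone = all_link_counts(LINKS)
experts_only = topic_link_counts(LINKS, PAGE_TOPICS, "cars")
print("all links:   ", everyone["cars1"], "vs", everyone["cars2"])
print("expert links:", experts_only["cars1"], "vs", experts_only["cars2"])
```

Here "cars1" collects more links overall, but all of them come from off-topic pages, while "cars2" is the one that the two most popular on-topic pages point to.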

Click here to view the article.

Technorati tags: Federal Government

Sunday, June 18, 2006

Solution for insider threats to information systems

On June 1st, the Defense Information Systems Agency (DISA) issued a Request for Information (RFI) for an insider threat-focused observation solution that would be able to "detect authorized users who conduct malicious activity within a network or on a system". (Click here to read a synopsis of the RFI.)

How does DISA define an insider threat?

According to the X.509 Certificate Policy for the Defense Department (released in February 2005), an insider threat is “an entity with authorized access that has the potential to harm an information system through destruction, disclosure, modification of data, and/or denial of service”.

DISA distinguishes three scenarios of incidents associated with insider threats:
1. An authorized user "accidentally or inadvertently commits or omits some action that damages or compromises the system, one of its components, or information processed, stored, or transmitted by the system";
2. An authorized user "takes deliberate action to damage the system, one of its components, or its data for personal gain or vengeful reasons";
3. A "co-option of users with authorized access to the system, contractor support personnel, or employees with physical access to the system components arising with the motivation of financial gain".

As presented in a DISA memo titled "Threat Description And Environment" (a Microsoft Word file), insider threats can manifest themselves in the following ways:
(a) The unauthorized reading, copying or disclosure of sensitive information;
(b) The execution of denial-of-service attacks;
(c) The introduction into the system of viruses, worms or other malicious software;
(d) The destruction or corruption of data (intentional or unintentional);
(e) The exposure of sensitive data to compromise through the improper labeling or handling of printed output;
(f) The improper labeling or handling of magnetic media resulting in the compromise of sensitive information.

Also, as announced during the DISA Industry Day last March, the Agency is planning to acquire the following two capabilities:
1. The "Act Capability", which identifies the potential insider threat (the RFP will be released next month), and
2. The "React Capability", which verifies the threat and takes necessary actions to eliminate it (the RFP will be released in January 2007).

Technorati tags: Federal Government

Tuesday, June 06, 2006

"RFP checklist" for Enterprise Search - Part 2

In his recent comment on the previous post on this blog, Patrick Cormier made a special note of the second-to-last item on Sean M. Gallagher’s "RFP checklist" for enterprise search – "make vendors demonstrate search precision".

As Mr. Gallagher then points out (in the same item),
"Enterprise users (as opposed to general Google users) don't want hundreds of pages of results. They want exact answers to their queries. Ask search vendors how they tune their engines for better precision..."

Achieving greater search precision may also involve supplementing the search engine with a Business Intelligence tool, if most searches within an agency deal with structured (and known) data sources (for example, a database of small business owners' demographic characteristics) as opposed to unstructured (and largely unknown) ones (such as collections of Web pages). Business Intelligence tools are designed to return exact answers when search queries are executed over structured datasets.
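
Since "search precision" is the key term here, a short sketch of how it is usually measured may help: precision is the fraction of returned results that are actually relevant. The document identifiers and the relevance judgments below are made up for illustration.

```python
def precision(returned, relevant):
    """Fraction of returned results that are relevant (0.0 if nothing came back)."""
    returned = list(returned)
    if not returned:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(returned)


# Hypothetical relevance judgments for one query, and two engines' result lists.
relevant_docs = {"memo-12", "policy-3"}
engine_a = ["memo-12", "policy-3", "faq-7", "news-1"]
engine_b = ["memo-12", "policy-3"]

print(precision(engine_a, relevant_docs))  # 0.5
print(precision(engine_b, relevant_docs))  # 1.0
```

Both engines find the same two relevant documents, but the second one does so without padding the result list, which is what an enterprise user looking for an exact answer would prefer.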

(Among the interesting reviews of search vs. Business Intelligence is an article published by Information Week last March. It points, for example, to a growing trend in which search-engine companies, such as Google, form partnerships with online information providers, such as the Securities and Exchange Commission's Edgar database, to connect the search engines to the metadata underlying those databases.)

Given this, a slight modification to Mr. Gallagher's "checklist" may be suggested. This modification includes:
(1) Specifying (in the third item) that the concept of "federated search" refers to executing a single query over both structured and unstructured data sources;
(2) Putting the second-to-last item right next to the third one; and
(3) Adding the following sentence to the beginning of the second-to-last item - "If your users need to perform federated searches, consider implementing a Business Intelligence tool".
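
As a rough illustration of "federated search" in the sense used in item (1), the sketch below runs a single query over one structured and one unstructured source. The data, the function names and the matching rules are all hypothetical; a real deployment would sit on top of a database or Business Intelligence tool on one side and a search index on the other.

```python
# Hypothetical structured source: rows with known fields.
BUSINESS_OWNERS = [
    {"name": "Acme Bakery", "state": "Virginia", "employees": 12},
    {"name": "Delta Printing", "state": "Maryland", "employees": 40},
]
# Hypothetical unstructured source: free-text pages keyed by title.
WEB_PAGES = {
    "Grant notice": "New grants for small bakery owners in Virginia.",
    "Press release": "Agency opens a regional office in Maryland.",
}


def search_structured(rows, term):
    """Exact match against field values, as a database-style lookup would do."""
    term = term.lower()
    return [
        row["name"] for row in rows
        if any(str(value).lower() == term for value in row.values())
    ]


def search_unstructured(pages, term):
    """Keyword match anywhere in the page text."""
    term = term.lower()
    return [title for title, text in pages.items() if term in text.lower()]


def federated_search(term):
    """Run one query over both kinds of source and label where each hit came from."""
    return {
        "structured": search_structured(BUSINESS_OWNERS, term),
        "unstructured": search_unstructured(WEB_PAGES, term),
    }


print(federated_search("Maryland"))
```

Keeping the two result lists separate lets the user see which hits are exact answers from a known dataset and which are keyword matches from free text.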

To conclude, I would like to thank Mr. Gallagher for this excellent summary.

Technorati tags: Federal Government

Thursday, June 01, 2006

"RFP checklist" for Enterprise Search

One particularly noteworthy item published by GCN last month was the so-called "RFP checklist" for Enterprise Search prepared by Sean M. Gallagher. It suggests how a government agency that is considering investing in an enterprise search solution can clearly articulate its needs with regard to such solutions and, correspondingly, ensure that the investment fits the agency's business requirements.

Mr. Gallagher proposes to look at several characteristics that are common to most enterprise search tools available on the market and then specify desired "parameters" for those characteristics. One such characteristic, for example, is the type of content that would be indexed - if the agency needs to be able to search not only standard office automation document formats, but also images, video and audio content, then it should consider purchasing a specialized tool, rather than a "household-name" product.

Click here to access the article.

Technorati tags: Federal Government