Taxonomy Services

There’s a lot of content on your company’s website or intranet, but how do your website visitors or intranet users find the information that they need? They might choose to browse around the site by means of navigational links. Or they might instead choose to use the site search engine. Either way, how easy is it for them to find what they are looking for?

Having searched for information yourself on large websites, you already know that it’s not always easy to find the specific content that you want. Here’s what Intranet Design Magazine wrote about the challenges of information retrieval. Although it discusses company intranets, it applies to websites as well:

As company intranets grow in content, it becomes increasingly difficult to find the exact information that an intranet user may be looking for. Companies have traditionally used search engines to locate information on their intranets. However, many are finding that search engines (even the newer, so-called “intelligent” ones) are just not enough.

For example, perhaps you are looking up information on a particular subject. You type the word into the search engine interface and click “GO!” Within seconds you have a list of retrieved documents. But there are 87 of them! And there is little indication as to which document might be the one that you need. You have a choice of clicking on some of the entries with the hopes that the needed information will be within the first few documents, or spending literally hours combing through each one of them.

Why is it so difficult to find what you need?

Intelligent – Not!

One reason is the manner in which search engines operate. Generally, search engines look for every occurrence of the word that was typed into the search interface. Upon finding them, they list each and every document containing that word. However, the topic may only be mentioned within some of the documents, with no information of real value.

Also, you may be searching for more specific information regarding the topic, but are not sure how to narrow your search. Or perhaps the documents use certain words or phrases within the text, and although you are typing in synonymous terms, they are not the exact terms needed. Or it may be that a word is simply misspelled.

A search engine, like other computer automata, can’t allow for such errors.

For example, if you work in an insurance company, you may be looking for information regarding “theft.” Some of the documents use this precise word, so the search engine grabs those pages. But it does not retrieve any of the pages using the term “robbery” or “thievery.” You may not even understand why the search engine retrieved certain documents. In many instances, only the title of the document is listed, which doesn’t tell you much.

One way of improving the relevance of search results is to look for keywords that can be inserted as “metadata” within the pages of the intranet. This is one of the promises of eXtensible Markup Language (XML), which (among other things) lets authors tag pages precisely so users can more easily find them.

But metadata is no panacea. For one thing, the user may still be unsure how to narrow a search, resulting in an overabundance of irrelevant hits. Moreover, word processing tools like Microsoft Word have long given authors the ability to add metadata to documents. Yet how many times have you filled in those Summary Info fields? Any information retrieval scheme that relies on people to categorize their ideas will at best be limited, and at worst may interfere with the creation of intellectual capital.

In addition to the above-mentioned article, Brian O’Leary (in Book: A Futurist’s Manifesto) had this to say:

“Digital abundance is pushing publishers to create much more than title-level metadata. To manage abundance, publishers and their agents can (and do) use blunt instruments, like verticals, or somewhat more elegant tools, like search engines. But when it comes to discovery, access, and utility, nothing substitutes for authorial and editorial judgment, as evidenced in the structural and contextual tags applied to our content.

Context can’t be just a preference or an afterthought any more. Early and deep tagging is a search reality. In structural terms, our content fits search conventions, or it will not be referenced. And in contextual terms, our content needs to be deeply and consistently tagged, or it will face an increasingly tough time being found.”

So what should you do to overcome these challenges? Simply put, talk to us about our taxonomy services. Here are some essential steps we take when providing taxonomy services, to ensure quality information retrieval:

  1. To start off, all of your web pages or intranet pages must contain keywords. This is a form of metadata which is a great aid in information retrieval.
  2. As mentioned in the quote above, keyword tags must be applied consistently for all pages. Synonymous terms must be entered by whoever inputs the keywords. This allows users to search for an item or concept using different terms or phrases that mean the same thing as the terms or phrases used in the text of the web or intranet pages.
  3. Keywords must also include broader terms for the word. An example of this (for a medical web site) would be the phrase “root canal.” If the page was only tagged with the term “root canal” and a user typed the term “dentistry” into the site search box, the page would not be retrieved, even though it most certainly is about dentistry.
  4. Keywords must include narrower terms for the word. Just reverse the situation we mentioned above. If a page was only tagged with the term “dentistry” because that is the main topic of the page, but several paragraphs discussed root canals, and a user typed the term “root canal” into the site search box, the page would not be retrieved, even though there is some information on it about root canals.
  5. It would also be helpful for users to be guided to related information, by means of “See also” references.

So how can all of this vocabulary be controlled? By creating a taxonomy or a knowledge organization system which clearly defines synonymous terms, broader terms, narrower terms and even relationships between terms.

That’s what we do.

The result is that users find all of the information that they are looking for, without having to wade through hundreds (or even thousands) of irrelevant documents.
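To make the five steps above a little more concrete, here is a minimal sketch in Python of how a controlled vocabulary might sit alongside a site search box. Everything in it is an assumption made for illustration: the `Term` structure, the sample vocabulary entries, the page paths, and the `search` and `see_also` helpers are hypothetical and do not describe any particular client system or search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a hypothetical controlled vocabulary."""
    preferred: str
    synonyms: set = field(default_factory=set)   # step 2: same meaning
    broader: set = field(default_factory=set)    # step 3: parent topics
    related: set = field(default_factory=set)    # step 5: "See also"


# Illustrative entries only; a real taxonomy would be far larger.
VOCABULARY = {
    "dentistry": Term("dentistry", synonyms={"dental care"}),
    "root canal": Term(
        "root canal",
        synonyms={"endodontic therapy"},
        broader={"dentistry"},
        related={"dental crown"},
    ),
    "theft": Term("theft", synonyms={"robbery", "thievery"}),
}

# Step 1: every page carries keyword tags (hypothetical pages).
PAGES = {
    "/procedures/root-canal": {"root canal"},
    "/services/overview": {"dentistry", "root canal"},  # step 4: narrower tag too
    "/claims/stolen-property": {"theft"},
}


def resolve(query):
    """Map whatever the user typed to a preferred term, or None."""
    q = query.strip().lower()
    for term in VOCABULARY.values():
        if q == term.preferred or q in term.synonyms:
            return term.preferred
    return None


def narrower(preferred):
    """Collect every term that sits below `preferred` in the hierarchy."""
    found, frontier = set(), [preferred]
    while frontier:
        current = frontier.pop()
        for term in VOCABULARY.values():
            if current in term.broader and term.preferred not in found:
                found.add(term.preferred)
                frontier.append(term.preferred)
    return found


def search(query, pages=PAGES):
    """Return pages tagged with the resolved term or any narrower term."""
    preferred = resolve(query)
    if preferred is None:
        return []
    wanted = {preferred} | narrower(preferred)
    return sorted(path for path, tags in pages.items() if tags & wanted)


def see_also(query):
    """Return related terms to offer as "See also" suggestions."""
    preferred = resolve(query)
    return sorted(VOCABULARY[preferred].related) if preferred else []
```

In this sketch, a search for “dentistry” also picks up the page tagged only “root canal”, because the vocabulary records dentistry as its broader term, and a search for “robbery” finds the page tagged “theft” through the synonym list. A word that is missing from the vocabulary returns nothing here; a real system would more likely fall back to ordinary full-text search.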

BIM has created taxonomies (or helped implement them) for medical and financial institutions as well as for ecommerce sites. Here are some of the projects that we have worked on:

Healthstream, Inc:

Healthstream is a provider of online medical training courses, supplying content for WebMD and other Web-based educational resources for the medical community. Healthstream had a search engine on their site, but it was hard to find specific courses with it. They turned to BIM. We developed a taxonomy of all of the major terms that appeared within the courses. We then went about creating keyword meta-tags for each of the 774 courses. Healthstream is pleased to now have a keyword system that works along with their existing search engine to retrieve truly relevant information.

Whisk:

Whisk already had an existing, faceted taxonomy, composed of recipe ingredients. Kevin supervised a team of Spanish-language annotators who entered the ingredients and categorized terms by facets such as attributes, forms, varieties, etc. This allows for machine reading of the recipe ingredients.

Brookhaven National Laboratory (for the U.S. Dept. of Energy):

The United States Department of Energy wanted a better system for searching through their Standards Based Management Systems on their web site. Kevin created a controlled vocabulary for use in keyword tagging of their content, along with a site index. The index alone added up to over 200 pages when printed out. BNL created a navigation system for internal use, based on the taxonomy.

21st Century Online:

BIM cataloged articles within their online magazine. We also provided consulting services regarding structure, navigation, and labeling schemes for their web site.

Twenty-First Century Investors:

Desiring an intuitive method of searching through their web site, Twenty-First Century Investors called BIM. We read through and indexed their entire web site, creating a file which was used for a “back-of-the-book” style index.

Descartes Systems Group:

BIM created a web site index as an information retrieval system for their corporate site.

Affinity Corporation:

BIM cataloged information from a large, well-known online book store for marketing purposes.

Accelerating 1to1:

BIM created abstracts of information on one-to-one marketing from over 65 various web sites. The information was included in a searchable database.
