Web Site Indexing

Web site visitors need to find what they are looking for on your site.

If they can't find the product that they want, they can't buy it. Or if they can't find adequate information about a service that you offer, they might not be convinced that your company can do the job.

Aren't site search engines the answer?

Putting a search engine on your Web site can be of some help. However, Web site visitors have two major complaints about site search engines:

(1) They complain that they can't find the information that they are looking for (although the information is on the site).

(2) The site search engine gives them too much information. They search for a particular word, and hundreds or even thousands of pages are retrieved. Not only do they not know where to begin looking through all of these pages, but it turns out that many of the pages are not at all relevant to their search.

Why does this happen?

Users often can't find information that, in reality, is there on the site because the search engine is searching only for the term that the user typed into the search box. If it doesn't find that specific term, it will give them the famous "No Documents Found" message. For example, on a medical Web site, if a visitor typed in the word "cardiology", Web pages about the heart would not be retrieved unless they specifically used the word "cardiology" within the page.

The reason that too much information is retrieved, along with many documents that are not relevant, has to do with the way search engines work. Many search engines do free-text searches. This means that the engine is looking for each and every occurrence of a word within the text of a page, without regard to what the page is actually about. Using the medical Web site example again, if a visitor typed in the word "heart", a page would be retrieved even if it contained one section that reads "Depression among young people is often the result of a break-up with a girlfriend or boyfriend. Many have difficulty dealing with the split, and complain of a broken heart". Obviously, this sentence is not about the physical heart, but is speaking figuratively. A free-text search engine cannot tell the difference.
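Both problems can be reproduced with a few lines of Python. This is only a minimal sketch of a free-text search, not the code of any particular search engine; the page names, the page texts, and the function name free_text_search are all made up for the illustration.

```python
import re

# Hypothetical pages for the medical Web site example (names and text are assumptions).
PAGES = {
    "heart-anatomy.html": "The heart has four chambers that pump blood through the body.",
    "teen-depression.html": (
        "Many have difficulty dealing with the split, "
        "and complain of a broken heart."
    ),
}


def free_text_search(term, pages):
    """Return the names of pages whose text contains the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in pages.items() if pattern.search(text))


# No page uses the word "cardiology", so nothing is found.
print(free_text_search("cardiology", PAGES))

# The figurative "broken heart" page is retrieved along with the anatomy page.
print(free_text_search("heart", PAGES))
```

The first search misses the page about the heart because it never uses the word "cardiology". The second search returns the depression page because it matches the word and nothing more.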

What is the solution?

The solution lies in four basic steps that must be performed in order to have accurate information retrieval.

(1) The search engine must be set to search only keywords that are assigned to each Web page. This means that someone has to read through each page of the site, and then decide which words most accurately reflect the contents of that page. The text must be carefully examined for concepts that are implied, even if the word itself is not used within the text.

(2) A custom-designed thesaurus must be developed, based on the terminology used on the Web site as well as within the specific industry or profession. The thesaurus lists not only terms that mean the same thing as the chosen term, but also broader and narrower terms for it. An example of this (for our medical Web site) would be the phrase "root canal". A broader term for this phrase would be "dentistry".

(3) The search engine is set to recognize synonymous terms. If a user types in a certain word, all pages are retrieved that have words with meanings similar to the chosen term.

(4) Then the individual assigning the keywords uses the site thesaurus as a guide in assigning terms to the various pages. She or he assigns broader and narrower terms to the principal keywords as called for.

A fifth step can be used to create a super-precise information retrieval system. This step is the creation of sub-categories to go along with each keyword. Let's illustrate this with our medical Web site example.

Using the first four steps, we've created a search system which allows visitors to find all of the information dealing with the human heart. However, if the Web site is quite large, the search engine still might retrieve 20 or more pages that have to do with the heart. Although there is usually a title and paragraph summary showing what each page is about, these summaries can be very wordy and not easy to scan through. By creating sub-categories for the word "heart" (such as "heart disease", "heart surgery", "parts of the heart", etc.) and displaying these sub-topics as search results, Web site visitors can quickly and easily identify the topics that fit their needs. There is no need for Boolean operators or any special skills on the part of the user.
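One way to picture the fifth step is as a grouping of results by sub-category. The Python sketch below assumes that each index entry records a keyword, a sub-category, and a page; the entries, page names, and the function name subcategory_results are invented for the example.

```python
from collections import defaultdict

# Hypothetical index entries: (keyword, sub-category, page).
INDEX = [
    ("heart", "heart disease", "coronary-artery-disease.html"),
    ("heart", "heart disease", "heart-failure.html"),
    ("heart", "heart surgery", "bypass-surgery.html"),
    ("heart", "parts of the heart", "heart-valves.html"),
    ("dentistry", "root canal", "root-canal.html"),
]


def subcategory_results(keyword, index):
    """Group the pages indexed under a keyword by their sub-category."""
    groups = defaultdict(list)
    for kw, sub, page in index:
        if kw == keyword:
            groups[sub].append(page)
    # Sort sub-categories and pages so the display order is predictable.
    return {sub: sorted(pages) for sub, pages in sorted(groups.items())}


# Show the sub-topics first; the visitor picks one instead of scanning every page.
for sub, pages in subcategory_results("heart", INDEX).items():
    print(f"{sub} ({len(pages)})")
    for page in pages:
        print("  " + page)
```

The visitor sees a short list of sub-topics with page counts and chooses the one that fits, rather than reading through a long list of summaries.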

Implementing the above methods allows users to find all of the information that they are looking for, without having to wade through hundreds (or even thousands) of irrelevant documents.

Who should assign the keywords?

There is always the temptation to use content writers to assign the keywords. The thought is that they are already writing the page, so why not stick some terms in the keyword tags of the HTML page so that they can be used with a search engine later on? This almost inevitably results in inconsistent keyword tagging, failure to identify all of the words that might be used in a search, and (once again) user frustration.
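Inconsistent tagging of this kind is easy to detect once a controlled vocabulary exists. The Python sketch below uses the standard library's html.parser module to read a page's keywords meta tag and report any term that is not in the approved list. The vocabulary, the sample page, and the names KeywordMetaParser and unapproved_keywords are assumptions made for the illustration.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )


def unapproved_keywords(html, vocabulary):
    """Return the page's keywords that are missing from the controlled vocabulary."""
    parser = KeywordMetaParser()
    parser.feed(html)
    return [k for k in parser.keywords if k not in vocabulary]


# Hypothetical vocabulary and page written by a content author.
VOCABULARY = {"heart", "heart disease", "heart surgery"}
PAGE = (
    '<html><head><meta name="keywords" '
    'content="Heart, cardiac stuff, heart surgery"></head><body></body></html>'
)

print(unapproved_keywords(PAGE, VOCABULARY))
```

A check like this only catches terms that fall outside the vocabulary. It cannot tell whether the right concepts were chosen for the page, which still takes a person reading the text.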

Broccoli Information Management uses trained indexers and information architects to assign keywords to HTML files, to design custom thesauri, and to index Web sites. Kevin Broccoli, president of BIM, has experience indexing books, magazines, and Web sites. He is a member of the American Society of Indexers and past vice president of the New York Chapter of ASI. Kevin is also manager of ASI's Web Indexing Special Interest Group.

Please feel free to contact us to chat about your current site search system. Call us at 828-252-3107 or e-mail us at broccoli@bim.net. We look forward to meeting you!