The Linear Algebra Behind Search Engines - The Future of Information Retrieval

Amy Langville

Many search engines are beginning to incorporate structured text retrieval, which lets users search for text set in italics, capitals, or boldface. Google currently incorporates some structured retrieval: users can choose to search for keywords located in the title, body, or hyperlinks of documents. Google's advanced features include restricting a search to HTML, PostScript, or Excel documents, among other choices. Hotbot allows users to search for images, video, or MP3-formatted data, in addition to ordinary text. Its advanced features also include date filtering. For example, users can choose to retrieve only recent documents: those updated in the last two weeks, two months, or two years.

Another recent trend in search engine research is to use syntactic clues and query structure to improve retrieval results. These methods draw on the location of the keywords in the query, the use of prepositions, verbs, and adjectives, the hyperlink structure of document pages, and citation links. Other research efforts aim to accommodate errors and typos in the query, using pattern matching and edit distance.
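To make the idea of edit distance concrete, here is a minimal Python sketch. It assumes the Levenshtein definition (the fewest single-character insertions, deletions, or substitutions needed to turn one string into another), which is one common choice and not necessarily the one any particular engine uses. The helper closest_terms and its max_distance threshold are hypothetical names introduced only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed definition)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def closest_terms(query_term, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary terms within max_distance edits,
    nearest first."""
    scored = [(edit_distance(query_term, term), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


if __name__ == "__main__":
    vocabulary = ["matrix", "retrieval", "eigenvalue"]
    print(closest_terms("retreival", vocabulary))
```

A search engine could use such a function to suggest index terms that lie within a small number of edits of a misspelled query word. Scanning the whole vocabulary as done here is only practical for small term lists.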

Most keyword search engines, such as the library search engine for N. C. State University, allow for wildcards. For example, you can enter computer * to retrieve books about computer-aided analysis, the computer age, computer circuits, and many other such topics.
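As a rough illustration of wildcard matching, the Python sketch below uses the standard library's fnmatch module, in which * stands for any run of characters. How the N. C. State catalog actually interprets wildcards is not described here, so the word-by-word matching rule, the function name wildcard_search, and the sample titles are all assumptions.

```python
import re
from fnmatch import fnmatchcase


def wildcard_search(pattern, titles):
    """Return the titles containing at least one word that matches the
    wildcard pattern (assumed rule: match each word, ignoring case)."""
    pattern = pattern.lower()
    hits = []
    for title in titles:
        # Split the title into lowercase alphanumeric words.
        words = re.findall(r"[a-z0-9]+", title.lower())
        if any(fnmatchcase(word, pattern) for word in words):
            hits.append(title)
    return hits


if __name__ == "__main__":
    # Hypothetical catalog entries, for illustration only.
    catalog = [
        "Computer-aided analysis",
        "The computer age",
        "Computing surveys",
        "Linear algebra",
    ]
    print(wildcard_search("comput*", catalog))
```

Under these assumptions the pattern comput* matches every title containing a word that begins with "comput", so the first three titles are returned and "Linear algebra" is not.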

One of the most exciting applications of information retrieval is image retrieval. True image retrieval uses mathematical models to measure how similar candidate images are to an original image, without relying on text hints such as HTML image tags. Soon users will be able to retrieve images in addition to text documents. One application is locating possible suspects for a crime by electronically searching a criminal photo ID database for those whose photos most closely match the sketch rendered by the crime scene artist. The same idea can be extended to sound retrieval. Searching through audio files is as challenging as image retrieval, but may be possible in the near future.
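The text does not say which mathematical model measures image similarity. As a toy stand-in only, the Python sketch below reduces each grayscale image to an intensity histogram and compares histograms by the cosine of the angle between them. The function names, the four-bin histogram, and the tiny pixel lists are assumptions made for illustration. A real system would need far richer features.

```python
import math


def histogram(pixels, bins=4, max_value=256):
    """Count how many pixel intensities (0 to max_value - 1) fall in each bin."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return counts


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(query_pixels, database):
    """Return image names ordered from most to least similar to the query."""
    q = histogram(query_pixels)
    scored = [(cosine_similarity(q, histogram(pixels)), name)
              for name, pixels in database.items()]
    return [name for score, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Hypothetical four-pixel "images", for illustration only.
    database = {
        "dark": [0, 5, 10, 30],
        "mixed": [15, 25, 210, 230],
        "bright": [250, 240, 230, 220],
    }
    print(rank_images([10, 20, 200, 220], database))
```

No text labels are consulted here: the ranking depends only on pixel values, which is the sense in which the retrieval works without text hints. A histogram discards all spatial layout, so two very different pictures can score as identical under this measure.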

