The vector space model provides the framework for most information retrieval algorithms used today. However, the basic vector space model alone is not efficient enough. Many modifications and heuristics have been invented to speed up the basic model, giving rise to a popular model called the *Latent Semantic Indexing* (LSI) model (Berry, Drmac & Jessup, 1999).

The major mathematical entity in the LSI model is also a matrix. We have already discussed how the vector space model described in preceding pages can be viewed as a matrix. Now let's adjoin the 3 billion document vectors associated with the Web (each of which is a 300,000-D vector) to one another to form a 300,000-by-3,000,000,000 matrix, which is called the *term-by-document matrix*. This matrix might look like

where the (*i,j*)-element of the matrix contains a 1 if the word for position *i* in document *j* is a keyword and a 0 otherwise. Column 1 in this matrix is document 1's document vector, column 2 is document 2's vector, and so forth.
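To make the construction concrete, here is a minimal sketch in Python/NumPy; the tiny vocabulary and documents are invented for illustration, not taken from the text:

```python
import numpy as np

# Toy vocabulary and documents, invented for illustration.
terms = ["algebra", "applied", "computer", "graphics"]
docs = [
    {"applied", "algebra"},                          # document 1
    {"computer", "graphics"},                        # document 2
    {"applied", "algebra", "computer", "graphics"},  # document 3
]

# A[i, j] = 1 if term i is a keyword of document j, and 0 otherwise;
# column j of A is document j's document vector.
A = np.array([[1 if t in d else 0 for d in docs] for t in terms])
print(A)
```

The real Web-scale matrix is of course far too large to build densely like this; it would be stored in a sparse format.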

This huge term-by-document matrix may contain redundant information. Berry, Drmac & Jessup (1999) provide the following example:

There may be a document on the Web with keywords *applied linear algebra* and another document on the Web with keywords *computer graphics*. There might also be a document with keywords *linear algebra applied [to] computer graphics*. Clearly, the vectors for the first two documents sum to the vector for the third document.

In linear algebra, this is called *dependence among columns* of the matrix. This dependence among columns may be frequent, since there are so many documents on the Web. Fortunately, this dependence in the term-by-document matrix can be used to reduce the work required in information retrieval.
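This dependence is easy to verify numerically. The sketch below encodes the three documents from the quoted example as a small term-by-document matrix (the term ordering is an assumption of this illustration) and confirms that only two of the three columns are linearly independent:

```python
import numpy as np

# Term-by-document matrix for the three documents in the quoted example;
# the row (term) ordering is an assumption of this sketch:
# algebra, applied, computer, graphics, linear.
A = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
])

# Column 3 equals column 1 plus column 2, so only two columns are independent.
print(np.linalg.matrix_rank(A))  # 2
```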

Here is where the LSI model departs from the vector space model. In the most general terms, the LSI method manipulates the matrix to eradicate dependencies, and thus considers only the independent, smaller part of this large term-by-document matrix. In particular, the mathematical tool used to achieve this reduction is the *truncated singular value decomposition* (SVD) of the matrix.

Note: We introduce the SVD briefly in what follows; Berry, Drmac & Jessup (1999) and Meyer (2000) contain detailed explanations of the SVD of a matrix.

Basically, a truncated SVD of the term-by-document matrix reduces the *m*-by-*n* matrix **A** to an approximation **A _{k}** that is stored compactly with a smaller number of vectors. The subscript *k* denotes the rank of the approximation, and it is chosen much smaller than *r* = rank(**A**).

Let's pause now for a formal definition of the SVD, followed by a numerical example. Before defining the SVD, we also review some other matrix algebra terms. The following definitions and theorems are taken from Meyer (2000).

**Definition** The *rank* of an *m*-by-*n* matrix **A** is the number of linearly independent rows or columns of **A**.

**Definition** An *orthogonal matrix* is a real *m*-by-*n* matrix **P** of rank *r* whose columns (or rows) constitute an orthonormal basis for **R**^{r}. That is, the set of column (or row) vectors {**p**_{1}, **p**_{2}, …, **p**_{r}}, where *r* = rank(**P**), has the property that **p**_{i}^{T}**p**_{j} = 1 for *i* = *j*, and 0 otherwise.

**Theorem** For each *m*-by-*n* matrix **A** of rank *r*, there are orthogonal matrices **U**_{m-by-m} and **V**_{n-by-n} and a diagonal matrix **D**_{r-by-r} = diag(σ_{1}, σ_{2}, …, σ_{r}) such that

$$\mathbf{A} = \mathbf{U} \begin{pmatrix} \mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}_{m \times n} \mathbf{V}^{T}$$

with σ_{1} ≥ σ_{2} ≥ … ≥ σ_{r} > 0.

**Definition** The σ_{i}'s in the preceding theorem are called the *nonzero singular values* of **A**. When *r* < *p* = min(*m*, *n*), **A** is said to have *p* − *r* additional *zero singular values*. The factorization in the theorem is called the *singular value decomposition* (SVD) of **A**, and the columns of **U** (denoted **u**_{i}) and the columns of **V** (denoted **v**_{i}) are called the *left-hand* and *right-hand singular vectors* of **A**, respectively. Written another way, the SVD expresses **A** as a sum of rank-1 matrices:

$$\mathbf{A} = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^{T}.$$

**Definition** The *truncated SVD* is

$$\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^{T}, \qquad k < r.$$

**Theorem** (Meyer, 2000, p. 417) If σ_{1} ≥ σ_{2} ≥ … ≥ σ_{r} are the nonzero singular values of **A**_{m-by-n}, then for each *k*, the distance from **A** to the closest rank-*k* matrix is

$$\min_{\operatorname{rank}(\mathbf{B}) = k} \|\mathbf{A} - \mathbf{B}\|_2 = \sigma_{k+1}.$$

Thus, σ_{k+1} is a measure of how well **A _{k}** approximates **A**: the smaller σ_{k+1} is, the closer **A _{k}** is to **A** in the 2-norm.
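This theorem is easy to check numerically. A short sketch in Python/NumPy (the matrix here is a random stand-in, not data from the text): form a rank-*k* truncation and confirm that the 2-norm error is exactly the next singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((9, 7))  # random stand-in for a term-by-document matrix

# Full SVD: A = U @ diag(s) @ Vt, with singular values s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
# Keep only the k largest singular triplets: A_k = sum_{i<=k} s_i * u_i * v_i^T.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The 2-norm distance from A to the nearest rank-k matrix is s[k],
# i.e. sigma_{k+1} (Python indexing is zero-based).
err = np.linalg.norm(A - A_k, 2)
print(err, s[k])
```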

Figure 6 shows the storage savings obtained with the truncated SVD in general.

A numerical example may make these concepts clearer. Suppose the original term-by-document matrix **A** is 100-by-300. After a truncated SVD, suppose **A _{20}** approximates **A** well. Then, instead of storing the 100 · 300 = 30,000 elements of **A**, we need only store the 20 singular values plus the 20 left and 20 right singular vectors, a total of 20 + 20 · 100 + 20 · 300 = 8,020 numbers, which is roughly a quarter of the original storage.

Larger data collections often enjoy much greater storage savings. The Medlars collection uses a 5831 × 1033 term-by-document matrix, which can be approximated very well by the truncated SVD with *k* = 70; the truncated form requires about 13 times less storage than the original matrix.
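These storage figures are straightforward arithmetic: a rank-*k* truncated SVD of an *m*-by-*n* matrix keeps *k*(1 + *m* + *n*) numbers instead of *mn*. A quick sketch:

```python
# Storage comparison: a rank-k truncated SVD of an m-by-n matrix keeps k singular
# values plus k left (length-m) and k right (length-n) singular vectors.
def svd_storage(m, n, k):
    return k * (1 + m + n)

# The 100-by-300 example with k = 20:
print(svd_storage(100, 300, 20), 100 * 300)  # 8020 versus 30000

# The Medlars collection, 5831-by-1033, truncated at k = 70:
ratio = (5831 * 1033) / svd_storage(5831, 1033, 70)
print(round(ratio, 1))  # about 12.5, i.e. roughly 13 times less storage
```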

We close this section with a small example, taken from Berry & Browne (1999), showing how well the truncated SVD can approximate a matrix. Suppose **A** is the 9 × 7 matrix

As *k* increases, the truncated SVD better captures the essence of **A**, yet the storage savings decrease. For example, here is **A _{6}**:

Notice that **A _{6}** does a much better job of capturing the relationships among the vectors of the original matrix **A**.

*Figure: Angle measures between the document vectors and the query, using **A**, **A _{3}**, and **A _{6}**.*

Having demonstrated the truncated SVD's potential to save storage, let's now consider how the angle measures are computed using **A _{k}** in place of **A**. Writing **A _{k}** = **U**_{k}**D**_{k}**V**_{k}^{T}, where **U**_{k} and **V**_{k} hold the first *k* left and right singular vectors and **D**_{k} = diag(σ_{1}, …, σ_{k}), the cosine of the angle between the query vector **q** and document *j* (the column **A**_{k}**e**_{j}, with **e**_{j} the *j*th standard basis vector) is

$$\cos\theta_j = \frac{(\mathbf{A}_k \mathbf{e}_j)^T \mathbf{q}}{\|\mathbf{A}_k \mathbf{e}_j\|_2 \, \|\mathbf{q}\|_2} = \frac{(\mathbf{D}_k \mathbf{V}_k^T \mathbf{e}_j)^T (\mathbf{U}_k^T \mathbf{q})}{\|\mathbf{D}_k \mathbf{V}_k^T \mathbf{e}_j\|_2 \, \|\mathbf{q}\|_2},$$

since ||**U**_{k}**x**||_{2} = ||**x**||_{2} for any vector **x**, because the columns of **U**_{k} are orthonormal.

It's helpful to review the above equation, emphasizing the advantages of the truncated SVD. First, the document representations **D**_{k}**V**_{k}^{T}**e**_{j} have only *k* components rather than *m*, so each cosine is computed with much shorter vectors. Second, these short document vectors can be precomputed and stored once, and each new query requires only the single additional product **U**_{k}^{T}**q**.

In practice, *k* can be much smaller than *r*, and yet **A _{k}** still approximates **A** well.
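The angle computation with the truncated SVD can be sketched in Python/NumPy. In the sketch below (the term-by-document matrix and query are toy stand-ins invented for illustration), each document is represented by a *k*-dimensional vector, the query is folded into the same *k*-dimensional space once, and all cosines are computed with short vectors:

```python
import numpy as np

# Toy term-by-document matrix (5 terms, 3 documents) and query, both invented
# for illustration; terms: algebra, applied, computer, graphics, linear.
A = np.array([
    [1., 0., 1.],
    [1., 0., 1.],
    [0., 1., 1.],
    [0., 1., 1.],
    [1., 0., 1.],
])
q = np.array([1., 1., 0., 0., 1.])  # query: "applied linear algebra"

# Truncated SVD A_k = U_k D_k V_k^T with k = 2 (this A has rank 2, so A_2 = A here).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Dk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# k-dimensional document representations: column j is D_k V_k^T e_j.
docs_k = Dk @ Vtk
# The query is folded into the same k-dimensional space once per query.
q_k = Uk.T @ q

# Cosines between the query and each document, computed entirely in k dimensions;
# ||U_k x|| = ||x|| holds because the columns of U_k are orthonormal.
cos = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q))
print(cos)  # documents 1 and 3 match the query; document 2 does not
```

For a real collection one would compute only the leading *k* singular triplets with a sparse solver rather than a full SVD.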

Exercise: Experiment with an LSI Search Engine built in Matlab