Linear Algebra Quiz
  1. Suppose it takes a computer 10 seconds to compute 30,000 cosine measures of the angle between the query vector and 30,000 document vectors. How long would it take 5 computers working in parallel (at the same time) to do this same task (neglecting overhead times associated with computer-computer interaction)?
  2. Suppose a document collection has only 10 terms associated with it, as listed in the table below. For example, term 1 is fish. Create a query vector for the query orange, dog. You may use a weighting scheme if you like. What is the appropriate document vector for a document with keywords swim, bike, car, fish?

     Term Index   Term
     1            fish
     2            dog
     3            swim
     4            cat
     5            compute
     6            bike
     7            car
     8            bake
     9            orange
     10           blue
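
     One way to set these vectors up concretely, assuming a simple binary (0/1) weighting scheme rather than a frequency- or idf-based one, is the MATLAB sketch below; the index positions follow the table above.

     % Binary-weighted vectors over the 10-term index (a sketch, not the only
     % valid answer; any reasonable weighting scheme is acceptable).
     q = zeros(10,1);
     q([9 2]) = 1;          % query terms: 9 = orange, 2 = dog
     d = zeros(10,1);
     d([3 6 7 1]) = 1;      % document keywords: swim, bike, car, fish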

  3. If it takes one second to do a single 30,000 × 3,000,000 matrix by 3,000,000 × 1 vector multiplication, how long would it take to do 8,000,000 of these matrix-vector multiplications?

  4. Suppose a small document collection contains 45 documents pertaining to a particular query. Search engine A returns 29 relevant documents and 10 irrelevant documents, while search engine B returns 25 relevant documents and 6 irrelevant documents. Find the precision and recall associated with each search engine for this query. Which search engine performs better on this query with respect to these two measures?
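
     As a reminder, precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents in the collection that are retrieved. A minimal MATLAB sketch of the two formulas (the function handles are illustrative only, not part of the problem):

     % precision = (relevant retrieved) / (total retrieved)
     % recall    = (relevant retrieved) / (relevant documents in the collection)
     precision = @(rel_ret, total_ret) rel_ret / total_ret;
     recall    = @(rel_ret, total_rel) rel_ret / total_rel;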

  5. If a particular search engine returns all documents in the collection to the user as relevant, what is the recall? Is the precision for this system high or low?

  6. Distance between two points and length of a vector:

    1. The Euclidean distance between two 2-D points h = (h1, h2) and i = (i1, i2) is given by

         dist(h, i) = [(h1 - i1)^2 + (h2 - i2)^2]^(1/2).

      To see why this is so, draw the two points h and i in the xy-plane. Now use the Pythagorean theorem for right triangles (a^2 + b^2 = c^2) to find the distance between the two points h and i. Find the Euclidean distance between the points h = (3, 5) and i = (7, 10).

    2. The Euclidean distance of a 2-D point x = (x1, x2) (also called a 2-D vector) to the origin [the point (0, 0)] is called the length of the vector x:

         ||x|| = (x1^2 + x2^2)^(1/2).

      Find the Euclidean length of the 2-D vector x = (1, 4). [This is equivalent to asking: find the Euclidean distance between the points h = (1, 4) and i = (0, 0).] Find the length of the 2-D vector y = (8, 5).


    3. The Euclidean length of a 3-D vector x = (x1, x2, x3) is similarly given by

         ||x|| = (x1^2 + x2^2 + x3^2)^(1/2).

      You verified that this formula was correct by using a ruler and creating a 3-D plot system in the assembler section of the lab. Can you prove this by applying the Pythagorean theorem twice? Hint: First show that the distance from the origin to the point (x1, x2, 0) in the flat xy-plane is (x1^2 + x2^2)^(1/2).

    4. We can extend these Euclidean length measures to n-dimensional vectors:

         ||x|| = (x1^2 + x2^2 + ... + xn^2)^(1/2).

      Find the length of the 5-D vector x = (1 3 4 0 9)^T. (A MATLAB check is sketched below.)
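
       Hand computations for any of the lengths in this problem can be checked with MATLAB's norm command, which returns the Euclidean (2-norm) length of a vector; a minimal sketch for part 4:

       x = [1; 3; 4; 0; 9];            % the 5-D vector from part 4
       len_formula = sqrt(sum(x.^2))   % (x1^2 + x2^2 + ... + x5^2)^(1/2)
       len_norm    = norm(x)           % built-in 2-norm; should match len_formula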

  7. Use the cosine angle formula to compute the angle between the two vectors x = (1 2 3 4)^T and y = (1 0 0 2)^T.
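
     The cosine formula here is cos θ = (x · y) / (||x|| ||y||). A minimal MATLAB check of the hand computation:

     x = [1; 2; 3; 4];
     y = [1; 0; 0; 2];
     cos_theta = (x' * y) / (norm(x) * norm(y));
     theta_deg = acosd(cos_theta)     % use acos(cos_theta) for radians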

  8. Use the MATLAB command svd to compute the nearest rank-4 matrix A4 to the 9 × 7 term-by-document matrix A presented on page 7.

     Compute the nearest rank-3 matrix A3, and compare it to the A3 presented on page 7.
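
     One way to form the rank-k truncation from the full SVD in MATLAB is sketched below; here A stands for the 9 × 7 term-by-document matrix from page 7, which must be entered first.

     [U, S, V] = svd(A);                       % full SVD: A = U*S*V'
     k  = 4;                                   % use k = 3 for A3
     Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % nearest rank-k matrix to A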

  9. Use the MATLAB command rand(m,n) to create a dense 200 × 300 matrix of random elements ranging between 0 and 1. Call this matrix B. The three entities ui, vi, σi are called the ith singular triplet of the matrix B = UDV^T. Compute the first 10 (20, 50, 100) singular triplets of B using MATLAB's svds command. Notice how long it takes to compute the singular triplets for the different values of k (10, 20, 50, 100).
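
     A minimal sketch of the experiment, timing svds with tic/toc for each requested value of k:

     B = rand(200, 300);                % dense random matrix, entries in (0,1)
     for k = [10 20 50 100]
         tic;
         [U, S, V] = svds(B, k);        % first k singular triplets of B
         fprintf('k = %3d: %.3f seconds\n', k, toc);
     end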

  10. Use MATLAB's built-in gallery of matrices to create P and Q. Let P = gallery('frank',10,1) and Q = gallery('frank',30,1). Plot the singular values for each matrix. Do you notice a dropoff in the magnitude of the singular values for P or Q? Why is it a good idea to truncate at this dropoff? That is, why is it good to choose k = the dropoff point? Compute P5 and Q10. Do these truncated SVD approximations resemble the original matrices P and Q? Suppose the nonzero singular values σi for a matrix R are those in the table below. Argue that k = 8 is a good truncation point.

        i    σi
        1    18.24
        2    13.45
        3    7.34
        4    7.11
        5    5.59
        6    2.56
        7    .57
        8    .13
        9    .98 × 10^-5
        10   .33 × 10^-5
        11   .72 × 10^-6
        12   .19 × 10^-6
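
      A minimal MATLAB sketch of the computations in this problem; a logarithmic scale makes any dropoff in the singular values easier to spot:

      P = gallery('frank', 10, 1);
      Q = gallery('frank', 30, 1);
      semilogy(svd(P), 'o'); hold on;                    % singular values of P
      semilogy(svd(Q), 'x'); hold off; legend('P','Q');  % singular values of Q
      [Up, Sp, Vp] = svd(P);
      P5  = Up(:,1:5)  * Sp(1:5,1:5)   * Vp(:,1:5)';     % rank-5 truncation of P
      [Uq, Sq, Vq] = svd(Q);
      Q10 = Uq(:,1:10) * Sq(1:10,1:10) * Vq(:,1:10)';    % rank-10 truncation of Q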

  11. Using the P matrix from problem 10, measure the error EP in the approximation Pk to P using the k you chose in problem 10. That is, calculate EP = P - Pk and ||EP||_2 = ||P - Pk||_2. [Use MATLAB's norm command to compute ||EP||_2.] Now do the same for Q: compute EQ and ||EQ||_2. Compare ||EP||_2 with ||EQ||_2. Which approximation seems better, Pk or Qk? Is this a fair comparison? Would it be better to compare ||EP||_2 / ||P||_2 with ||EQ||_2 / ||Q||_2? Why or why not? ||EP||_2 is called the absolute error and ||EP||_2 / ||P||_2 is called the relative error. Do these terms make sense? In comparing ||EP||_2 to ||EQ||_2, is it more appropriate to use the relative or the absolute error?
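
      Continuing from the sketch after problem 10, with P5 and Q10 standing in for Pk and Qk (substitute whichever k you actually chose), the absolute and relative errors can be computed as follows; for a matrix argument, MATLAB's norm returns the 2-norm:

      EP = P - P5;                       % error matrix for the approximation to P
      abs_err_P = norm(EP)               % ||EP||_2 (absolute error)
      rel_err_P = norm(EP) / norm(P)     % ||EP||_2 / ||P||_2 (relative error)
      EQ = Q - Q10;
      abs_err_Q = norm(EQ)
      rel_err_Q = norm(EQ) / norm(Q)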




Answers