Create Your Own 3-D Physical Search Engine


Here you will investigate search engines by exploring their inner parts. You will build a physical search engine and visually understand how it works on the tiny 3-D Food! document collection.



3-D Physical Search Engine

Supplies

You need the following tools for this project: Instructions
  1. Orient yourself with the piece of paper that has the xy-plane marked on it. Practice by plotting the vector

    Note the scale of the axes: one tick mark represents 0.1 units. The first element of the vector represents the distance on the x-axis, and the second element represents the distance on the y-axis. Now measure in inches the length of the line connecting the origin and the point (0.5, 0.8). Remember the scale. Note that the length of this 2-D vector is equal to the square root of the sum of squares of each element in the vector. That is, for the 2-D vector

    the length is


  2. Now use the 2-D xy-plane and create a 3-D xyz-space. Place the edge of the xy-plane over the edge of the desk. Adhere the styrofoam block to the corner of the desk with tape. Make sure the top of the block sits about 1 inch above the face of the desk. Now tape the wooden stick to the styrofoam block so that the stick's base touches the origin and the stick stands vertical with the numbers facing the xy-plane. This will be the z-axis. See the Figure 1.

    Figure 1. Setup of desk


    Practice using the 3-D space by plotting the 3-D vectors and
    The first element represents distance on the x-axis, the second represents distance on the y-axis, and the third represents distance on the z-axis.

    Notice that the length of each 3-D vectors is, as with the 2-D vectors, the square root of the sum of squares of the elements in the vector. That is, for the 3-D vector

    the length is

Find the lengths of your practice vectors.


  1. Now return to the small Food! magazine document collection example. The term-by-document matrix A for this example is

    Recall that each column of this matrix represents a document vector corresponding to a particular article. Find the length of each document vector using the formula from step 2. Now cut a green pipe cleaner 1 inch longer than the length of each document vector and, using the stickers, label each pipe cleaner according to the document vector it represents.

  2. Plot each document vector on your 3-D space system by inserting one end of the pipe cleaner 1 inch into the styrofoam block at the origin at the proper angle so that the other end of the pipe cleaner touches its 3-D point.

  3. Once all the document vectors have been plotted, consider the query vector. Query vector 1 from this Food! example is

    Find the length of this query vector and cut a red pipe cleaner 1 inch longer than this length. Plot this query vector in your 3-D space.

  4. Visually inspect your 3-D graph and see if you can answer the question "Which document vectors (green pipe cleaners) are closest to this query vector (red pipe cleaner)?"

  5. Use your protractor to measure an angle of 50°. Fold the blue pipe cleaner in half, opening the legs so that the angle between them is 50°. Take the "V" end of the folded blue pipe cleaner and hold it at the origin. Take one leg of this "V" and fasten it around the red query pipe cleaner. Make sure the legs of the blue pipe cleaner stay spread 50°. Rotate the free, unattached leg of the blue pipe cleaner around the red pipe cleaner, sweeping out an invisible cone. Figure 2 shows a visible representation of the invisible cone. Only the red and blue pipe cleaners, not the green ones, are depicted, to simplify the picture.


    Figure 2: Cone created by sweeping blue pipe cleaner around red pipe cleaner

    Which document vectors fall in this cone? Notice that an angle of 50° has a cosine measure of 0.643. This means that all document vectors whose cosines of their angles with the query vector are greater than 0.643 will be retrieved by your physical 3-D search engine system. Suppose you want to be more lenient and return to the user all documents with a relevancy score above 0.3. What angle should you now sweep out? [You desire the angle θ such that cos(θ) = 0.3, which means θ = cos-1(0.3) = arccos(0.3) = 72.54°.] Now which document vectors would your physical 3-D search engine return to the user?
  6. Repeat these steps using query 2 from the Food! example, Since the document vectors remain the same, you can start at step 5. Be sure to remove the red pipe cleaner for query vector 1 and cut the remaining red pipe cleaner the appropriate length for query vector 2. For step 7, use stricter values for the cutoff value. Try a cutoff value of 0.8, which implies θ = cos-1(0.8) = arccos(0.8) = 36.87°, and a more lenient cutoff value of 0.7, which implies θ = cos-1(0.7) = arccos(0.7) = 47.57°.

Take a few minutes now to think about the limitations of this physical search engine system. For example, the human mind cannot visualize 4 and 5 dimensions easily, much less 30,000 dimensions. Also, how would the weighting scheme used in the term-by-document matrix affect the results? In this small Food! example, we used the percentage weighting scheme for vectors. That is, the ith element of a document vector represents the percentage of that document pertaining to term i. Suppose frequency weights had been used. Then the number of times the document mentions term i is stored in vector element i.