Google-opoly: Markov's strategy

Markov's strategy

The histograms in the previous applets computed the total number of times each web page was visited at the current stage of the game. Suppose instead, the histograms reported the proportion of time each web page had been visited. Calculating such proportions if Google-opoly were played forever computes the PageRank for the web pages in that network. The ability of PageRank to give useful information is a key part of the success of Google's search engine.

How does Google find the PageRank of the billions of web pages that it indexes? Clearly, the company doesn't pay its employees to roll multi-billion-sided dice. Does a computer program simulate the game of Google-opoly and randomly surf over the network? The answer is no but the technique that is used produces equivalent results. Instead of simulating a random surfing session, we will create a matrix G in which the entries g_{ij} equal the probability of being at web page i and moving to web page j.

Figure 6. The first Google-opoly game board from Figure 1.

Let's compute the first row of G for the network in Figure 1 which is reproduced in Figure 6. Entries in this row correlate to a probability of jumping from web page 1. The entry in the first column, g_{11}, equals the probability of moving from web page 1 to web page 1. This only happens through teleportation. In our game, 1/6 of the time we teleport, and at that time every page is an equally likely destination. So, each page has a (1/6)(1/6) = 1/36 chance of being visited through teleportation from web page 1. In the game, we follow hyperlinks on a page rather than teleport 5/6 of the time. Web page 1's only link is to web page 4. So, web page 4 will be visited through teleportation 1/36 of the time and through web page 1's hyperlink 5/6 of the time. So, there is a 31/36 chance of visiting web page 4 after your first turn of Google-opoly. Therefore, the first row of G is

\mathbf{g}_1 = (\matrix{1/36 & 1/36 & 1/36 & 31/36 & 1/36 & 1/36 }).

Before producing the entire matrix, let's produce the sixth row which assumes you are at the dangling node. So, we must teleport. Since all web pages are equally likely as a destination, each web page has a 1/6 probability of being visited from web page 6. So, the sixth row of G is

\mathbf{g}_6 = (\matrix{6/36 & 6/36 & 6/36 & 6/36 & 6/36 & 6/36 }).

Similar logic for all the rows produces

G = \left\lgroup\matrix{ 1/36 & 1/36 & 1/36 & 31/36 & 1/36 & 1/36 \cr 31/36 & 1/36 & 1/36 & 1/36 & 1/36 & 1/36 \cr 31/36 & 1/36 & 1/36 & 1/36 & 1/36 & 1/36 \cr 1/36 & 11/36 & 11/36 & 1/36 & 11/36 & 1/36 \cr 1/36 & 1/36 & 16/36 & 1/36 & 1/36 & 16/36 \cr 6/36 & 6/36 & 6/36 & 6/36 & 6/36 & 6/36 \cr }\right\rgroup,

which is called a Markov transition matrix. Our game states that we always start at web page 1. The vector

\mathbf{v} = (\matrix{ 1 & 0 & 0 & 0 & 0 & 0 }),

like the matrix G contains probabilities as its elements. Since the first entry is 1, the vector states that you are, with probability 1, at web page 1. To determine the probabilities of visiting the web pages after one turn, compute

\mathbf{v}G = \mathbf{v}_1 = (\matrix{1/36 & 1/36 & 1/36 & 31/36 & 1/36 & 1/36}).

The probabilities of visiting the web pages after two turns are

\mathbf{v}_1G = \mathbf{v}_2 = (\matrix{ 0.0779 & 0.2708 & 0.2824 & 0.0548 & 0.2708 & 0.0432 }).

Note that

\mathbf{v}_2 = \mathbf{v}_1 G = \mathbf{v} G^2 \text{. So, } \mathbf{v}_{100} = \mathbf{v}G^{100}.

In the applet in Figure 6, you can compute \mathbf{v}_n^T for increasingly large n. Doing so, what do you find? Note that the applet computes with G^T, simply to save space in the applet. As such, you find \mathbf{v}_{n}^T (a column vector) rather than a row vector as described in the text above.

Figure 6. Markov chain model of Google-opoly.

From your work with the applet in Figure 6, you should see that \mathbf{v}_{50} is the same to five decimal places as \mathbf{v}_{60}. Therefore, we have found the steady-state vector. From the entries, we see that 26.6% of the time, if we surf forever, we will visit web page 1. We have found the best web page to own in Google-opoly for this network. You can compute similar steady-state vectors for the other game boards. Notice how we found the PageRank of the network with \mathbf{v}_{50} using the Markov transition matrix, whereas we would need more than 50 (a lot more) rolls of the dice to find the same PageRank values for the system were we to randomly surf.