fengyiming: Social Network Analysis and an Example

For me, social network analysis is a mathematical theory in social networking field. Why do we need it(what can we obtain by using it) and how to use it?
Firstly, I want to give some simple concepts in SNA(Social Network Analysis). For example, let me use the following graph of a social network to illustrate(Fig1.).

Fig1. Graph of a simple social network

As we can see, it is a social network with a very simple structure that only consists of five people. The graph is called "sociograph". Now, let's express their relationships in a SNA way, which is called "sociomatrix" or "adjacency matrix". The sociomatrix is shown as following:

The entries in the matrix X represent the links between two arbitrary nodes. "1" means two nodes are adjacent within a distance of 1, "0" means these two nodes are not adjacent or not reachable. Because the graph of this social network is a undirected graph, therefore its sociomatrix is a symmetric matrix. It means X equals its transposed matrix.

Now, let's have an analysis on X. In the horizontal direction of Alice, "1" or "0" means Alice chooses some one or not chooses someone, respectively ("-" means Alice cannot choose herself). In the vertical direction of Alice, "1" or "0" means Alice is chosen or not chosen by someone.

What we are going to do are some computing works.

1. Density

Density is the proportion of links that exists out of all possible links. Because there are 5 nodes in the graph, thus, the number of all possible links is 10. It is very easy that the density of this social network is 0.6.

2. N-Clique

N-Clique is a set of nodes that are within distance N of each other. For example, set{Alice, Bob, Carol, David} is a 1-clique and the whole graph is a 2-clique.

In social network, cliques are groups.

3. K-Plex

K-Plex is a set of nodes in which every nodes has a tie to at least n-k others in the set. In this graph, the set{Alice, Bob, Carol, David} is a 2-Plex and the whole graph is a 4-Plex.

4. Group Degree Centralization

Degree of a node is the number of links that are incident with it. And the group degree centralization is used to look at the dispersion of degree centrality. By using the formula shown as following[1]:

The group degree centralization is 0.667.

5. Group Degree Closeness Centralization

Closeness is based on the inverse of the geodesic distance of each actor to every other actor, therefore the shorter the distance is, the larger the closeness will be. Group closeness centralization measures the overall level of closeness in a network, which is defined as the following formula[1]:

By using this formula, the group closeness can be calculated out as 0.756.

6. Group Betweenness Centralization

Betweeness is the proportion of the number of the shortest paths that one nodes on between two other nodes out of the number of all shortest paths between these two nodes. Group betweenness centralization measures the overall level of betweenness in a network by using the following formula[1]:

In this five people social network, the group betweeness centralization is 0.5625(Alice=1/12, Bob=Carol=Eva=0, David=7/12).

-------------------------------------------------------------------

After the introductions of these terminologies, I'd like to introduce some complicated concepts in SNA, like PageRank and HITS, both are algorithms of ranking the web-pages(ranking the nodes). Because there so many web-pages online that refer to some topics can be searched out and so many different relationships between the social network users, therefore it is important to find out which "node" is most important.

a). PageRank

Firstly, let's use PageRank to explore what behind the nodes and their relationships. The example above we have discussed is used to illustrate again. PageRank is the algorithm used bu Google search engine. Recall the following formula[2]:

Let's express this formula in the form of sociomatrix of the example, it is shown as below[2]:

Where vector P consists of rank prestige of Alice, Bob, Carol, David and Eva. The equation is recursive but may be computed by starting with any set of ranks and iterating the computation until it converges. Let's list each process values of them shown in the table below:

The rank prestige value of each people is converged from the round of fourteen, and we can show their rank prestige values and relationships as following:

By using PageRank algorithm, we can rank these five people in a descending order, like:David, Alice, Carol and Bob, Eva.

b). HITS

HITS is short for "Hyperlink- Induced Topic Search" also known as hubs and authorities. In HITS algorithm, each node acts as a hub and authority at the same time. Hub is a node that chooses other nodes, while authority is a node that chosen by other nodes. The following are the two equations to calculate both values, respectively.

Authority vector equation[3]:

and Hub vector equation:

In order to calculate both values, we use recursive calculation and iterating computing. At the first round, we set all the values equal 1. The processes and values are listed in two tables, shown as below:

After iterate the calculation for 8 rounds, these values have almost converged. We can draw a figure to show all values and relationships:

Therefore, from the figure we can list the descending order sequence of authorities, it is "David, Alice, Bob and Carol, Eva". Also, the descending order sequence of hubs can be given, it is "Eva, Carol and Bob, Alice, David ".

It means David is the one always chosen by others and with the highest authority. And Eva is a useful hub, because she can connect to David(the most influential guy) directly.

c). The Powers of a Sociomatrix

X^n means the sociomatrix multiply itself by n times. What does it mean? Let's still use the example.

The numbers of the entries in X^2 give us how many paths of distance 2 between two nodes. Like there are 2 different paths of distance 2 between David and Carol. In contrast with X, some entries of X^2 in the corresponding place have changed from zeros to a non-zero values. It means there is no path of distance 1 between two certain nodes but they are reachable with two links, (e.g. the geodesic distance of them is 2). Beside, the values stand for how many paths of distance 2 between these two nodes. Therefore, the calculation of the power of a sociomatrix is very useful, by checking the entries values(0 or not 0) of the matrix(shown as below) can give us reachabilitis of two arbitrary nodes:

Where g is the number of nodes in social network.

References:

[1]S. Wasserman & K. Faust (1994) Social Network Analysis, Cambridge University Press

[2]PageRank - Wikipedia, URL: http://en.wikipedia.org/wiki/PageRank

[3]HITS – Wikipedia, URL: http://en.wikipedia.org/wiki/HITS_algorithm

10 則留言:

Michael2012年3月14日晚上10:54
Wow, you did a really great job on SNA topic, and from your blog I see something deeper inside.
回覆刪除
回覆
adamparkinson2602012年3月15日清晨6:40
Wow, you have calculated the rank prestige matrix. It is a little hard for me that it request too much calculation. But the matrix also represent the same result from simple formula that David is the most influential.
回覆刪除
回覆
Xiang LIU2012年3月15日晚上8:04
It is really a great articl, and I think before you do some research, you should have some describe about the picture, that we can know better about the result we take. And its really a excellent work to have the result about ranking.
回覆刪除
回覆
Rock Lee2012年4月1日上午8:25
Thanks for your work, It is the most detailed one that I found in all the blogs. Even more than the assignment spec required.
回覆刪除
回覆
tsangchorshan2012年4月20日凌晨12:21
Good work and I study some from your input.
The calculation and formula are so prectise.
回覆刪除
回覆
Chenyu2012年4月24日凌晨3:14
Since you have introduced something about the powers of sociomatrix,I'd like to ask you something about the sociomatrix,because I am still a little confused about it.I don't quit understand the meaning of X^2 and what X^n actually means in the formula.Why it stands as the distance?
回覆刪除
回覆
Jason Zhang2012年4月28日清晨6:38
It is really hard to do the work like this, and I have to say you have really done a good job. From your post, we can clearly know how to do social network analysis. You have even done the analysis by calculate the rank prestige matrix, which I feel a little hard and haven't done in my post. We can learn much form your post!
回覆刪除
回覆
汪景晨_11550080262012年4月29日上午8:10
Thanks for your work, it is the most detailed, I find that in a blog. Even more than the operation specification.
回覆刪除
回覆