Saturday, March 31, 2012

Security differences between conventional and social networks

Security is so important! No one wants his or her personal information "attacked" by anyone. A security service is a service, provided by a layer of communicating open systems, which ensures adequate security of the systems or of data transfers, as defined by ITU-T Recommendation X.800.
X.800 and ISO 7498-2 (Information processing systems - Open systems interconnection - Basic Reference Model - Part 2: Security architecture) are technically aligned. This model is widely recognized.
This gives us a theoretical definition of security. Conventional network security services include [1]:
A. Authentication
a1. Peer entity authentication
a2. Data origin authentication
B. Access control
C. Data confidentiality
c1. Connection confidentiality
c2. Connectionless confidentiality
c3. Selective field confidentiality
c4. Traffic flow confidentiality
D. Data integrity
d1. Connection integrity with recovery
d2. Connection integrity without recovery
d3. Selective field connection integrity
d4. Connectionless integrity
d5. Selective field connectionless integrity
E. Non-repudiation
e1. Non-repudiation with proof of origin
e2. Non-repudiation with proof of delivery
A more general definition is given in CNSS Instruction No. 4009, dated 26 April 2010, by the Committee on National Security Systems of the United States of America [5]:
A capability that supports one, or more, of the security requirements (Confidentiality, Integrity, Availability). Examples of security services are key management, access control, and authentication[2].
So, what are the social network security objectives? Are they different? In fact, there are some differences. The objectives are:
1. Privacy
   (a) user profile privacy
   (b) communication privacy
   (c) message confidentiality
   (d) information disclosure
2. Integrity
3. Availability
From the above, we can find some differences, such as privacy and availability. These two are not included in the conventional network security services listed above. Conventional network security takes more factors into consideration, such as access control. For example, I have personal information on social network websites, such as my age, name and job. I want these items shared with my friends but not with people I don't know. But conventional network security does not focus on this.
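The "friends only" example above can be sketched in a few lines. This is only an illustration under my own assumptions: the Profile class, its field names and the visible_info function are hypothetical and do not describe how Weibo, Renren or any real site works.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical user profile with a friend list and personal items."""
    owner: str
    friends: set = field(default_factory=set)
    info: dict = field(default_factory=dict)


def visible_info(profile, viewer):
    """Return the personal items that `viewer` is allowed to see.

    Assumed policy: the owner and the owner's friends see everything,
    everybody else sees nothing.
    """
    if viewer == profile.owner or viewer in profile.friends:
        return dict(profile.info)
    return {}


me = Profile(owner="me", friends={"friend_a"},
             info={"age": 25, "job": "student"})
print(visible_info(me, "friend_a"))   # friend: sees the items
print(visible_info(me, "stranger"))   # stranger: sees nothing
```

The point of the sketch is that the decision depends on the social relationship between the viewer and the owner, which is exactly the part that user profile privacy adds.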

What I have written comes from Wikipedia, the lecture, and my own existing knowledge.
The social network security objectives are quoted from "lecture of week 10, slide 6-slide 10". The conventional network security part is quoted from Wikipedia.
The example I used comes from my own knowledge, and partly from my experience with social network websites such as Weibo, Renren and Hoopchina.

Wednesday, March 14, 2012

Social Network Analysis and an Example

For me, social network analysis is a mathematical theory in the social networking field. Why do we need it (what can we obtain by using it), and how do we use it?
Firstly, I want to give some simple concepts in SNA (Social Network Analysis), using the following graph of a social network to illustrate (Fig1.).

Fig1. Graph of a simple social network
As we can see, it is a social network with a very simple structure that consists of only five people. The graph is called a "sociogram". Now, let's express their relationships in the SNA way, with a "sociomatrix" or "adjacency matrix". The sociomatrix is shown as follows:
     
The entries in the matrix X represent the links between any two nodes. "1" means the two nodes are adjacent (at distance 1), and "0" means the two nodes are not adjacent. Because the graph of this social network is an undirected graph, its sociomatrix is a symmetric matrix, which means X equals its transpose.
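As a minimal sketch, the sociomatrix can be built from an edge list. The edge list below is my assumption: it is inferred from the measures quoted in this post (density 0.6, David linked to everyone, Alice linked to Bob and Carol). The code writes 0 on the diagonal where the text uses "-".

```python
# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def sociomatrix(names, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    X = [[0] * n for _ in range(n)]
    for u, v in edges:
        X[index[u]][index[v]] = 1
        X[index[v]][index[u]] = 1  # undirected: mirror every link
    return X


def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]


X = sociomatrix(NAMES, EDGES)
for name, row in zip(NAMES, X):
    print(f"{name:<6}", row)
print("symmetric:", X == transpose(X))
```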
Now, let's analyse X. In Alice's row, "1" or "0" means Alice chooses or does not choose someone, respectively ("-" means Alice cannot choose herself). In Alice's column, "1" or "0" means Alice is chosen or not chosen by someone.
Next, we are going to do some calculations.
1. Density  
Density is the proportion of links that exist out of all possible links. Because there are 5 nodes in the graph, the number of all possible links is 5*4/2 = 10. The graph has 6 links, so the density of this social network is 6/10 = 0.6.
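The density calculation can be checked with a short sketch. The edge list is an assumption inferred from the values quoted in this post, and the function name is my own.

```python
from itertools import combinations

# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def density(names, edges):
    """Links present divided by all possible links, g*(g-1)/2."""
    possible = len(list(combinations(names, 2)))
    return len(edges) / possible


print(density(NAMES, EDGES))  # 6 links out of 10 possible
```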
2. N-Clique
An N-clique is a set of nodes that are all within distance N of each other. For example, the set {Alice, Bob, David} is a 1-clique, because every pair in it is directly linked, and the whole graph is a 2-clique.
In a social network, cliques correspond to groups.
3. K-Plex
A K-plex is a set of n nodes in which every node has a tie to at least n-k others in the set. In this graph, the set {Alice, Bob, Carol, David} is a 2-plex and the whole graph is a 4-plex.
4. Group Degree Centralization
The degree of a node is the number of links that are incident with it. Group degree centralization measures the dispersion of degree centrality across the nodes. Using the group degree centralization formula from [1], the group degree centralization of this network is 0.667.
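A sketch of this calculation, assuming the formula from [1] is the Wasserman and Faust index C_D = sum_i [C_D(n*) - C_D(n_i)] / [(g-1)(g-2)], where C_D(n*) is the largest degree and g is the number of nodes. The edge list is also an assumption, inferred from the values quoted in this post.

```python
# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def degrees(names, edges):
    """Count the links incident with each node."""
    deg = {name: 0 for name in names}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_centralization(names, edges):
    """Sum of (largest degree - degree), divided by (g-1)(g-2)."""
    deg = degrees(names, edges)
    g = len(names)
    top = max(deg.values())
    return sum(top - d for d in deg.values()) / ((g - 1) * (g - 2))


print(degrees(NAMES, EDGES))
print(round(degree_centralization(NAMES, EDGES), 3))
```

With degrees 3, 2, 2, 4 and 1, the numerator is 1 + 2 + 2 + 0 + 3 = 8 and the denominator is 4*3 = 12, which gives 0.667.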
5. Group Closeness Centralization
Closeness is based on the inverse of the geodesic distance from each actor to every other actor, so the shorter the distances are, the larger the closeness will be. Group closeness centralization measures the overall level of closeness in a network. Using the formula from [1], the group closeness centralization of this network is 0.756.
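A sketch of this calculation, assuming the formula from [1] uses the standardized closeness C_C(n_i) = (g-1) / (sum of geodesic distances from n_i) and the group index sum_i [C_C(n*) - C_C(n_i)] / [(g-2)(g-1)/(2g-3)]. The edge list is an assumption inferred from the values quoted in this post.

```python
from collections import deque

# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def adjacency(names, edges):
    """Neighbour lists of an undirected graph."""
    adj = {name: [] for name in names}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def distances_from(adj, source):
    """Geodesic distances from `source`, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centralization(names, edges):
    """Return (closeness per node, group closeness centralization)."""
    adj = adjacency(names, edges)
    g = len(names)
    closeness = {}
    for name in names:
        dist = distances_from(adj, name)
        closeness[name] = (g - 1) / sum(dist.values())
    top = max(closeness.values())
    numerator = sum(top - c for c in closeness.values())
    denominator = (g - 2) * (g - 1) / (2 * g - 3)
    return closeness, numerator / denominator


closeness, group_closeness = closeness_centralization(NAMES, EDGES)
print(closeness)
print(round(group_closeness, 3))
```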
6. Group Betweenness Centralization
The betweenness of a node is the proportion of the shortest paths between two other nodes that pass through it, added up over all pairs of other nodes. Group betweenness centralization measures the overall level of betweenness in a network, using the formula from [1].
In this five-person social network, the group betweenness centralization is 0.5625 (Alice=1/12, Bob=Carol=Eva=0, David=7/12).
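A sketch of this calculation, assuming the formula from [1] normalizes each node's betweenness by (g-1)(g-2)/2 and defines the group index as sum_i [C_B(n*) - C_B(n_i)] / (g-1). The edge list is an assumption inferred from the values quoted in this post. Fractions are used so that 1/12 and 7/12 come out exactly.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def bfs_counts(adj, source):
    """Distances from `source` and number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(names, edges):
    """Normalized betweenness of each node (connected undirected graph)."""
    adj = {name: [] for name in names}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    info = {name: bfs_counts(adj, name) for name in names}
    g = len(names)
    score = {name: Fraction(0) for name in names}
    for s, t in combinations(names, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in names:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path when the distances add up
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += Fraction(sigma_s[v] * sigma_t[v], sigma_s[t])
    pairs = Fraction((g - 1) * (g - 2), 2)
    return {name: value / pairs for name, value in score.items()}


def betweenness_centralization(names, edges):
    """Sum of (largest betweenness - betweenness), divided by g-1."""
    b = betweenness(names, edges)
    top = max(b.values())
    return sum(top - value for value in b.values()) / (len(names) - 1)


print(betweenness(NAMES, EDGES))
print(float(betweenness_centralization(NAMES, EDGES)))
```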
-------------------------------------------------------------------
After introducing these terms, I'd like to introduce some more complicated concepts in SNA, such as PageRank and HITS. Both are algorithms for ranking web pages (ranking the nodes). Because so many web pages on a given topic can be found online, and there are so many different relationships between social network users, it is important to find out which "node" is the most important.
a). PageRank
Firstly, let's use PageRank to explore what is behind the nodes and their relationships, using the example discussed above again. PageRank is the algorithm used by the Google search engine. Recall the following formula [2]:
Let's express this formula in terms of the sociomatrix of the example, as shown below [2]:

where the vector P consists of the rank prestige values of Alice, Bob, Carol, David and Eva. The equation is recursive, but it can be computed by starting with any set of ranks and iterating the computation until it converges. The values at each iteration are listed in the table below:
The rank prestige value of each person has converged by round fourteen, and their rank prestige values and relationships can be shown as follows:
Using the PageRank algorithm, we can rank these five people in descending order: David, Alice, Carol and Bob (tied), Eva.
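The iteration can be sketched as follows. This is a sketch under assumptions, not necessarily the exact variant used for the values above: it uses the form PR(i) = (1-d)/N + d * sum over neighbours j of PR(j)/L(j), with damping factor d = 0.85, and treats each undirected link as a link in both directions. The edge list is inferred from the values quoted in this post, and the stopping tolerance is my own choice, so the number of rounds will not match the one reported above.

```python
# Assumed edge list, inferred from the measures quoted in this post.
NAMES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def pagerank(names, edges, d=0.85, tol=1e-10, max_iter=200):
    """Iterate the PageRank update until the largest change is below tol."""
    neighbours = {name: [] for name in names}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    n = len(names)
    rank = {name: 1.0 / n for name in names}  # any starting ranks work
    for _ in range(max_iter):
        new_rank = {
            name: (1 - d) / n
            + d * sum(rank[m] / len(neighbours[m]) for m in neighbours[name])
            for name in names
        }
        change = max(abs(new_rank[k] - rank[k]) for k in names)
        rank = new_rank
        if change < tol:
            break
    return rank


rank = pagerank(NAMES, EDGES)
for name in sorted(rank, key=rank.get, reverse=True):
    print(f"{name:<6} {rank[name]:.4f}")
```

Under these assumptions the order is David, Alice, then Bob and Carol with equal values, then Eva.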
b). HITS 
HITS is short for "Hyperlink-Induced Topic Search", also known as hubs and authorities. In the HITS algorithm, each node acts as a hub and an authority at the same time. A hub is a node that chooses other nodes, while an authority is a node that is chosen by other nodes. The following two equations are used to calculate the two values, respectively.
Authority vector equation[3]:



and Hub vector equation:
In order to calculate both values, we apply the two equations recursively and iterate the computation. In the first round, we set all the values equal to 1. The steps and values are listed in two tables, shown below:
After iterating the calculation for 8 rounds, these values have almost converged. We can draw a figure to show all the values and relationships:
Therefore, from the figure we can list the authorities in descending order: "David, Alice, Bob and Carol, Eva". The hubs can also be listed in descending order: "Eva, Carol and Bob, Alice, David".
It means David is the one who is always chosen by others and has the highest authority. Eva is a useful hub, because she connects directly to David (the most influential person).
c). The Powers of a Sociomatrix
X^n is the sociomatrix raised to the n-th power, that is, the product of n copies of X. What does it mean? Let's use the example again.
The entries of X^2 tell us how many paths of distance 2 there are between two nodes. For example, there are 2 different paths of distance 2 between Bob and Carol. Compared with X, some entries of X^2 have changed from zero to a non-zero value in the corresponding place. This means there is no path of distance 1 between those two nodes, but they are reachable with two links (i.e. their geodesic distance is 2). In addition, the value gives the number of paths of distance 2 between the two nodes. Therefore, calculating the powers of a sociomatrix is very useful: checking whether the entries of the following matrix are zero or not tells us the reachability of any two nodes:
X + X^2 + ... + X^(g-1)
where g is the number of nodes in the social network.
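The powers of the sociomatrix can be checked with a small sketch in plain Python. The matrix below is an assumption: it is the sociomatrix inferred from the values quoted in this post, in the order Alice, Bob, Carol, David, Eva, with 0 on the diagonal. The function names are my own.

```python
# Assumed sociomatrix, rows/columns: Alice, Bob, Carol, David, Eva.
X = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachability_sum(X):
    """Return X + X^2 + ... + X^(g-1), where g is the number of nodes."""
    g = len(X)
    total = [row[:] for row in X]
    power = [row[:] for row in X]
    for _ in range(g - 2):  # adds X^2 up to X^(g-1)
        power = matmul(power, X)
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


X2 = matmul(X, X)
print("paths of distance 2 between Bob and Carol:", X2[1][2])

total = reachability_sum(X)
reachable = all(total[i][j] > 0
                for i in range(len(X)) for j in range(len(X)) if i != j)
print("every pair reachable:", reachable)
```

The diagonal of X^2 also gives each node's degree, because a path of distance 2 from a node back to itself goes out and back along one of its links.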

References:
[1] S. Wasserman & K. Faust (1994), Social Network Analysis, Cambridge University Press
[2] PageRank - Wikipedia, URL: http://en.wikipedia.org/wiki/PageRank
[3] HITS - Wikipedia, URL: http://en.wikipedia.org/wiki/HITS_algorithm