In this interconnected world, it is more important than ever to understand not just the details about your data, but also how its different parts are related to each other. From social networks to supply chains to text analytics, network analysis is becoming a critical requirement – and network visualization is one of the best ways to make sense of the results. The new SAS Visual Analytics network visualization shows links between related nodes as well as additional attributes such as color, size or labels. This paper will explain the basic concepts of networks as well as provide detailed background information on how to use network visualizations within SAS Visual Analytics.
Bad news is not the only thing that travels fast. Diseases, financial meltdowns, social unrest: disruptive events can spread across the globe faster than ever before. What these diverse events have in common is that they spread or propagate over connected structures, namely transportation, information and social networks.
Networks are at the core of complex systems and are one of the reasons why their behavior is often surprising and unpredictable. Networks can’t be understood using traditional reductionist methods. To quote Aristotle, “the whole is larger than the sum of its parts”. The difference lies in how the parts are connected and influence each other.
If being oblivious to the nature of networks can leave one vulnerable to disruptive change, a deep understanding of networks can unlock massive gains. The reason is simple: networks are everywhere. Unlike the abstract world of relational databases, in the real world everything is interrelated. In recent years, companies like Facebook, Google and Wal-Mart have harnessed the power of networks of relationships, information and supply chains to dominate their competitors.
Gartner identifies five graphs in the world of business—social, intent, consumption, interest, and mobile—and says that the ability to leverage these graphs provides a “sustainable competitive advantage.”
We will see how you too can gain this advantage with the new network analysis and visualization features in SAS Visual Analytics.
A network is defined as a collection of objects in which some pairs of objects are connected by links. A link can represent any kind of relationship. This definition is very generic, explaining why we can find networks in so many diverse domains.
In mathematical terms, networks are represented by graphs (not to be confused with the common term used to describe some data visualizations). The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges. Graph properties are the subject of study of a branch of discrete mathematics called graph theory.
Along the same lines, network visualization is based on the visual display of graphs. The most common form is the node-link diagram, which uses a set of nodes (usually dots or circles) to represent the vertices, joined by lines or curves to represent the edges.
Vertex attributes can be mapped to visual node characteristics like size, color and shape. Edge attributes can be mapped to link width and color.
One edge attribute is particularly important: direction. Many relationships are undirected, or symmetric, such as the “friends” relationship in Facebook. But some network relationships are directed, or asymmetric, meaning that a link from A to B doesn’t imply a corresponding link from B to A. An example is the “follows” relationship in Twitter. In this case the visualization uses arrows or tapered edges to indicate the direction of the relationship.
The biggest challenge in network visualization is finding an optimal spatial placement for the nodes that makes the main characteristics of the network structure clearly visible. In a few specific cases, like transportation networks, node position is a simple mapping of node attributes that represent spatial coordinates. This is how Visual Analytics creates network diagrams overlaid on geographical maps.
But in general, node positions are generated dynamically. This is accomplished using a special kind of algorithm called a network layout. One of the most common layout algorithms, used by Visual Analytics, is the force-directed layout. It is based on a physics simulation where spring-like attractive forces are associated with the links, so that they attract the node endpoints to each other; these forces are balanced by repulsive forces associated with the nodes, like the ones in electrically charged particles. The layout will iterate over this system until a state of equilibrium (or a maximum number of iterations) is reached.
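The following minimal spring-electrical sketch in Python makes the layout loop concrete. It is not the algorithm Visual Analytics uses internally. The force constants, the step cap and the stopping rule are all assumptions chosen for illustration. Links pull their endpoints toward a preferred length, and every pair of nodes repels. Nodes move along the net force until the forces nearly cancel or the iteration limit is reached.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=500, spring_len=1.0,
                          k_spring=0.5, k_repel=0.1, step=0.2, max_move=0.2, seed=0):
    """Spring-electrical layout: links act as springs, every node pair repels."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Coulomb-like repulsion between every pair of nodes: k_repel / d^2.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k_repel / (d * d)
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Hooke-like spring along each link: k_spring * (d - spring_len).
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = k_spring * (d - spring_len)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
        # Move each node along its net force, capping the step for stability.
        largest = 0.0
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            largest = max(largest, mag)
            if mag > 0:
                move = min(step * mag, max_move)
                pos[n][0] += fx / mag * move
                pos[n][1] += fy / mag * move
        if largest < 1e-6:  # approximate equilibrium reached
            break
    return {n: (p[0], p[1]) for n, p in pos.items()}


layout = force_directed_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
```

The result maps each node to an (x, y) pair. The pairwise repulsion loop costs O(n²) per iteration. For this reason, layouts for large networks often approximate that term rather than computing it exactly.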
Stephen Hawking named this the “century of complexity”. We saw how networks represent the architecture of complexity. Understanding this architecture will give your business unique insights, like knowing which nodes could lead to catastrophic failures or which actors are crucial in a marketing campaign. The combination of SAS Visual Analytics network visualization with SAS graph analysis creates a powerful vehicle to find and deliver these insights.
Easley, David, and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World.
Heymann, Sébastien. Blog post in “Graph Viz 101”, September 23, 2013.
(Key Actor Analysis)
Hidalgo, Cesar. MAS.961 Networks, Complexity and Its Applications
Schutt, Rachel, and Cathy O’Neil. Doing Data Science: Straight Talk from the Frontline. Chapter 10, “Social Networks and Data Journalism”.
Matt Bogard, Western Kentucky University
The Competitive Dynamics of the Consumer Web: Five Graphs Deliver a Sustainable Advantage
“Realize that everything connects to everything else.”
― Leonardo da Vinci
“We cannot live only for ourselves. A thousand fibers connect us with our fellow men; and among those fibers, as sympathetic threads, our actions run as causes, and they come back to us as effects.”
― Herman Melville
“No man is an island, entire of itself.”
― John Donne, No Man Is An Island
“Everything is connected… no one thing can change by itself.”
“Eventually everything connects — people, ideas, objects… the quality of the connections is the key to quality per se.”
“Everything touches everything.”
— Jorge Luis Borges
“Social Network Analysis is much more than parsing tweets to see who’s flaming whom these days. At heart, it involves exploring the shifting web of relationships among people based on their profiles, interactions, and affinities.”
— James Kobielus, Information Management
aka six degrees of separation
Any two persons in the world are linked by at most six acquaintances
Truthy is a research project that helps you understand how communication spreads on Twitter.
We currently focus on tweets about politics, social movements and news.
Game Theory, Nash Equilibrium, Braess’s Paradox
NOTE: VAE Limitation:
Ideally (as in the terrorist network paper), the residuals should be used as a metric and displayed as an attribute in the network diagram, turning it into a ‘Key Actors Plot’.
From companies like Wal-Mart that turned supply-chain efficiency into a competitive advantage to police departments that use social network analysis to identify and disrupt gangs, the list of success stories is growing.
Second, find the key actors based on their centrality measures:
NOTE: Plot with key actor data
In general, scale-free networks are highly robust to random failures, but very sensitive to failures in the hubs [Albert et al. 2000].
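A toy experiment shows this asymmetry. The sketch below builds a hypothetical hub-and-spoke graph. This is an extreme case of a hub-dominated network, not a real scale-free dataset. The sketch then measures the largest connected component after a single node fails. `hub_and_spoke` and `largest_component_size` are illustrative helpers.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the surviving nodes of this component.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def hub_and_spoke(n_spokes):
    """Hypothetical network: one hub linked to n_spokes peripheral nodes."""
    adj = {"hub": set()}
    for i in range(n_spokes):
        spoke = f"s{i}"
        adj[spoke] = {"hub"}
        adj["hub"].add(spoke)
    return adj


adj = hub_and_spoke(10)
print(largest_component_size(adj, {"s3"}))   # random spoke fails: 10 of 11 nodes stay connected
print(largest_component_size(adj, {"hub"}))  # hub fails: the network shatters into single nodes
```

A randomly chosen node is almost always a spoke, so random failures barely dent connectivity. Losing the one hub disconnects everything.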
We saw how different aspects of the network topology can be extracted and added to the network data using SAS PROCs like OPTGRAPH and OPTNET as part of the data preparation phase. In future versions of Visual Analytics this type of analysis will be incorporated into the application itself, joining other features like forecasting and text analytics.
Analysing the network topology and its statistical properties goes hand-in-hand with network visualization and filtering to provide a comprehensive understanding of its nature. This combination extends the exploratory data analysis model supported by SAS Visual Analytics in a way that is perfectly suited to the exploration of networks.
In the following sections we will see a practical application of this specialized exploration workflow, sometimes called Exploratory Network Analysis (ENA), in two domains where networks play a crucial role: transportation and social media.
Transportation has long been a fertile ground for the study of networks. In fact, graph theory - the science that studies network topology - started when Euler solved the Seven Bridges of Königsberg problem in 1736, introducing the now common abstraction of nodes and links to represent a real-life network of land masses (two islands and the two river banks) connected by bridges.
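Euler's argument reduces to counting degrees. A connected multigraph can be walked crossing every link exactly once only if zero or two of its nodes have an odd number of links. The sketch below encodes the seven bridges as a multigraph. The letters for the four land masses are labels chosen here for illustration.

```python
from collections import Counter

# A = central island, B and C = the two river banks, D = the second island.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd = [node for node, d in degree.items() if d % 2 == 1]
# Euler: a walk crossing every bridge exactly once requires 0 or 2 odd-degree nodes.
print(dict(degree))        # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(len(odd) in (0, 2))  # False: all four land masses have odd degree
```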
The study of transportation networks is at the core of Logistics (the management of the flow of resources between the point of origin and the point of consumption) and, by extension, Supply Chain Management (SCM). Innovations in SCM, like adopting a retail hub-and-spoke system in the early 1970s, have allowed Wal-Mart to become the world’s largest retailer.
But we are going to look at transportation networks as a way to understand how networks, even when associated with systems that have the same generic purpose, can be fundamentally different in their nature.
Examples of networks in modern society: social networks, information, tech and economic systems
Manufacturing: networks of suppliers
Web sites: networks of users
Media: networks of advertisers
a network is any collection of objects in which some pairs are connected by links.
people by email exchange or friend/following relationships
financial institutions by borrower-lender relationship
blogs, by links to each other on the web
summarizing a network is difficult
Characteristics include parts that are more or less densely interconnected, presence of cores, natural splits into tightly-linked regions. Participants can be more central or more peripheral; straddle the boundaries of regions or sit in the middle of one.
Need a language to talk about structural features.
Structure, but also behavior and dynamics
Behavior: each individual action has implicit consequences for the outcomes of everyone in the system.
ways of thinking about networks:
Explicit structure: clusters
aggregate effects: popularity curves
Central themes and topics
Graph theory: structure
Game theory: behavior
social network analysis
Zachary’s karate club dataset is good for showing communities: the club eventually split into two rival clubs.
A graph (also called a network) is made of a set of entities, called nodes, and a set of relationships between entities (also called edges or links). The way nodes are connected constitutes the topology of the network. Moreover, additional information can be added in the form of properties, which are key-value pairs associated with each node or relationship. For example, individuals in a social network may be characterized by properties like gender, language, and age.
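A property graph of this kind can be sketched with a small Python class. The class and field names are hypothetical, chosen for illustration. They do not correspond to any particular graph database or SAS data layout.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Nodes and relationships, each carrying a dict of key-value properties."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source, target, properties)

    def add_node(self, node_id, **props):
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, source, target, **props):
        # Make sure both endpoints exist, even if they carry no properties yet.
        self.add_node(source)
        self.add_node(target)
        self.edges.append((source, target, props))


g = PropertyGraph()
g.add_node("alice", gender="F", language="en", age=34)
g.add_node("bob", gender="M", language="fr", age=29)
g.add_edge("alice", "bob", relation="friend", since=2012)
```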
The analysis of complex networks consists of (but is not limited to) diverse types of tasks, such as understanding the statistical properties of their topologies, identifying significant nodes, and detecting anomalies.
Faced with such diversity of data and the potentially unlimited number of analyses to perform at the first steps of a new project, analysts usually follow an exploratory approach to inspect the data and outline interesting perspectives before drilling down to specific issues. When the datasets describe complex networks, this process is called Exploratory Network Analysis (ENA); it relies on data visualization and manipulation to analyze complex networks. This framework takes its roots in the more general framework of Exploratory Data Analysis (EDA), which consists of performing a preliminary analysis guided by visualization before proposing a model or doing a statistical analysis.
The good news is that recently scientists have been learning to map our interconnectivity. Their maps are shedding new light on our weblike universe, offering surprises and challenges that could not even be imagined a few years ago. Detailed maps of the Internet have unmasked the Internet’s vulnerability to hackers. Maps of companies connected by trade or ownership have traced the trail of power and money in Silicon Valley. Maps of interactions between species in ecosystems have offered glimpses of humanity’s destructive impact on the environment. Maps of genes working together in a cell have provided insights into how cancer works. But the real surprise has come from placing these maps side by side. Just as diverse humans share skeletons that are almost indistinguishable, we have learned that these diverse maps follow a common blueprint. A string of recent breathtaking discoveries has forced us to acknowledge that amazingly simple and far-reaching natural laws govern the structure and evolution of all the complex networks that surround us.
Riding reductionism, we run into the hard wall of complexity. We have learned that nature is not a well-designed puzzle with only one way to put it back together. In complex systems the components can fit in so many different ways that it would take billions of years for us to try them all. Yet nature assembles the pieces with a grace and precision honed over millions of years. It does so by exploiting the all-encompassing laws of self-organization, whose roots are still largely a mystery to us.
We have come to see that we live in a small world, where everything is linked to everything else. We are witnessing a revolution in the making as scientists from all different disciplines discover that complexity has a strict architecture. We have come to grasp the importance of networks.
Networks are a ubiquitous way to represent complex systems, including those in the social and economic sciences. … applications of networks and complexity to diverse systems, including epidemic spreading, social networks and the evolution of economic development.
Spring 2011. (MIT OpenCourseWare: Massachusetts Institute of Technology), http://ocw.mit.edu/courses/media-arts-and-sciences/mas-961-networks-complexity-and-its-applications-spring-2011 (Accessed 10 Jan, 2014). License: Creative Commons BY-NC-SA
Matt Bogard. “Using Twitter to Demonstrate Basic Concepts from Network Analysis”. Jan. 2010.
Available at: http://works.bepress.com/matt_bogard/9
Social network analysis focuses on finding patterns in interactions between people or entities. These patterns may be described in the form of a network. Network analysis in general has many applications including models of student integration and persistence, business to business supply chains, terrorist cells, or analysis of social media such as Facebook and Twitter. This presentation provides a reference for basic concepts from social network analysis with examples using tweets from Twitter.
The most obvious use of SNA is its ability to identify key actors and entities within a network. Centrality measures within a network are means for measuring a node’s relative importance within the network.  It is well-accepted that “the ability to measure centrality in social networks has been a particularly useful development in social network analysis.” What is more interesting, however, is the number of centrality measures that social network analysts use to reveal different things about how key actors interact within a network.  For example, a node with a high degree centrality is connected to many other nodes. In Figure 3 below, it is unsurprising that the American Nuclear Society (ANS) has the highest degree centrality in its own Twitter network. However, a node with a high betweenness centrality is one that connects the cliques in the network. Figure 4 shows the same ANS network, reconfigured and revisualized with an emphasis on betweenness, with a new node, Nuclear.com, emerging as the most important.
ORCA software turns a database of arrest records into useful social networks. Here’s an analysis of one particular gang. Each circle represents a person. The more arrests, the larger the circle. Lines connect two people who have been arrested together. ORCA created likely subgroups within the gang, represented by different colors.
(Euro countries, US banks)
Gephi forums are located at https://forum.gephi.org/ and are home to an active user community that can help answer your questions about specific Gephi issues or functionality.
The Gephi wiki at https://wiki.gephi.org/index.php/Main_Page provides detailed information about a wide variety of topics, and includes user manuals, plugin information, community details, and much more.
The Gephi blog at https://gephi.org/blog/ provides periodic updates on major news about Gephi.
Manuel Lima curates the Visual Complexity website (http://www.visualcomplexity.com/vc/), an archive of interesting network graphs provided by a wide array of users. This is a great place to find inspiration for your future graphs.
The Complexity and Social Networks Blog can be found at http://blogs.iq.harvard.edu/netgov/. Here a wide variety of topics relating to network analysis are discussed.
The Center for Complex Network Research at Northeastern University hosts the BarabasiLab at http://www.barabasilab.com/. Here you will find an array of resources including books, projects, external sites, and much more.
Truthy is a site dedicated to the analysis of Twitter communications, and is found at http://truthy.indiana.edu/.
Coursera (https://www.coursera.org/course/sna) offers courses in Social Network Analysis (SNA) that provide both a theoretical as well as practical focus on how social networks work to connect our society.
LinkedIn plays host to several groups that focus on SNA and Gephi, including the Social Network Analysis Group, the Social Network Analysis in Practice group, and a Gephi group.
D.J. Watts and S.H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature 393, 440-442 (1998).
The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks.
Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
Preventing traffic jams:
The paradox is stated as follows:
“For each point of a road network, let there be given the number of cars starting from it, and the destination of the cars. Under these conditions one wishes to estimate the distribution of traffic flow. Whether one street is preferable to another depends not only on the quality of the road, but also on the density of the flow. If every driver takes the path that looks most favorable to him, the resultant running times need not be minimal. Furthermore, it is indicated by an example that an extension of the road network may cause a redistribution of the traffic that results in longer individual running times.”
The reason for this is that in a Nash equilibrium, drivers have no incentive to change their routes. If the system is not in a Nash equilibrium, selfish drivers must be able to improve their respective travel times by changing the routes they take. In the case of Braess’s paradox, drivers will continue to switch until they reach Nash equilibrium, despite the reduction in overall performance.
Braess Paradox - adding resources (routes) can increase scarcity (traffic jams)
Braess’s paradox, credited to the German mathematician Dietrich Braess, states that adding extra capacity to a network in which the moving entities selfishly choose their routes can, in some cases, reduce overall performance. This is because the Nash equilibrium of such a system is not necessarily optimal.
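The paradox is easiest to see with numbers. The sketch below uses an illustrative road network; the layout, the 4000 drivers and the travel-time functions are assumptions. There are two routes from Start to End. Each route combines one congestible road, taking (cars)/100 minutes, with one fixed road taking 45 minutes. An optional zero-cost link runs from A to B.

```python
DRIVERS = 4000


def route_times(x_upper, x_lower, x_shortcut):
    """Travel time in minutes on each route, given how many drivers use each one."""
    # Congestible roads take (number of cars) / 100 minutes; fixed roads take 45.
    start_to_a = (x_upper + x_shortcut) / 100
    b_to_end = (x_lower + x_shortcut) / 100
    upper = start_to_a + 45               # Start -> A -> End
    lower = 45 + b_to_end                 # Start -> B -> End
    shortcut = start_to_a + 0 + b_to_end  # Start -> A -> B -> End via the zero-cost link
    return upper, lower, shortcut


# Without the A -> B link, drivers split evenly at equilibrium.
print(route_times(2000, 2000, 0)[:2])  # (65.0, 65.0)

# With the link, everyone taking the shortcut is an equilibrium.
print(route_times(0, 0, DRIVERS))      # (85.0, 85.0, 80.0)
```

Without the link, each driver needs 65 minutes and no one gains by switching. With the link, every driver on Start → A → B → End needs 80 minutes. A single driver who leaves that route would face 85 minutes, so the situation is a Nash equilibrium even though everyone is 15 minutes worse off.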
Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke’s law are used to attract pairs of endpoints of the graph’s edges towards each other, while simultaneously repulsive forces like those of electrically charged particles based on Coulomb’s law are used to separate all pairs of nodes. In equilibrium states for this system of forces, the edges tend to have uniform length (because of the spring forces), and nodes that are not connected by an edge tend to be drawn further apart (because of the electrical repulsion).
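The two force laws described above can be written as follows. The spring constant $k_s$, the repulsion constant $k_r$ and the natural spring length $\ell$ are free parameters assumed here, not values taken from SAS Visual Analytics. Let $d_{uv}$ be the distance between nodes $u$ and $v$, let $\hat{\mathbf{e}}_{uv}$ be the unit vector pointing from $u$ toward $v$, and let $N(u)$ be the set of neighbors of $u$. The force acting on $u$ is then:

```latex
\mathbf{F}^{\mathrm{spring}}_{u \leftarrow v} = k_s\,(d_{uv} - \ell)\,\hat{\mathbf{e}}_{uv},
\qquad
\mathbf{F}^{\mathrm{rep}}_{u \leftarrow v} = -\frac{k_r}{d_{uv}^{2}}\,\hat{\mathbf{e}}_{uv},
\qquad
\mathbf{F}_u = \sum_{v \in N(u)} \mathbf{F}^{\mathrm{spring}}_{u \leftarrow v}
             + \sum_{v \neq u} \mathbf{F}^{\mathrm{rep}}_{u \leftarrow v}
```

The spring term pulls $u$ toward a linked neighbor when $d_{uv} > \ell$ (Hooke's law). The repulsion term pushes $u$ away from every other node with inverse-square strength (Coulomb's law). An equilibrium layout is one where $\mathbf{F}_u = \mathbf{0}$ for every node $u$.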
Networks represent the architecture of complexity. To fully understand complex systems, we need to move beyond architecture and uncover the laws governing the underlying dynamic processes, and, more importantly, how the two layers, architecture and dynamics, evolve together. This is the new era Stephen Hawking called the “century of complexity”.
“I think the next [21st] century will be the century of complexity. We have already discovered the basic laws that govern matter and understand all the normal situations. We don’t know how the laws fit together, and what happens under extreme conditions. But I expect we will find a complete unified theory sometime this century. There is no limit to the complexity that we can build using those basic laws.”
— Stephen Hawking
Management consulting companies, technology providers, social networking sites, and business corporations are now starting to turn their attention toward SNA as a management tool and business opportunity. However, far from being a mainstream management innovation, SNA is still a research-driven set of theories and methodologies with few applications in the business world. Still, the more company data are digitized, collected, stored, organized, and integrated in enterprise data warehouses, and the more data mining tools are able to extract information and knowledge from them, the more SNA will be able to support the identification and management of internal or external social networks for the creation of business value.
How are networks formed? If networks are the common structure behind so many radically different systems in nature, could we also find common laws that govern their evolution, much like the laws of physics and chemistry?
An answer to this question was proposed in 1959 by Erdős and Rényi. Their answer was very simple: randomness. Nature grows networks by connecting its nodes randomly. The highly complex structures found in networks were basically a consequence of this randomness.
This idea - the Theory of Random Networks - dominated our view of networks for many decades. And for a good reason: many natural phenomena (imagine the height of your co-workers or the number of tails in a long series of coin tosses) fit this model. Their measures are characterized by a bell curve, with a peaked distribution that decays rapidly on both sides. Most values are close to the average, and extreme values are rare. For random networks, this means that most nodes have about the same number of links, with a few exceptions on either side and almost no extreme outliers.
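A short simulation reproduces the peaked shape. This sketch uses the common G(n, p) formulation of the random-graph model, in which every pair of nodes is linked independently with probability p. The values n = 1000 and p = 0.01 are arbitrary choices. Each node's degree then follows a binomial distribution centered on p(n - 1).

```python
import random
from collections import Counter


def random_graph_degrees(n, p, seed=0):
    """Degree of each node in a G(n, p) random graph."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # link each pair independently with probability p
                degree[i] += 1
                degree[j] += 1
    return degree


n, p = 1000, 0.01
degrees = random_graph_degrees(n, p)
hist = Counter(degrees)
mean = sum(degrees) / n
print(f"mean degree {mean:.1f} (expected {p * (n - 1):.1f})")
for k in sorted(hist):
    print(f"{k:3d} {'#' * (hist[k] // 5)}")  # text histogram: peak near the mean, thin tails
```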
As an example, let’s look at the network formed by US cities and the interstate highways that connect them. Every city is connected to at least one other city by an interstate, most cities have about the same number of roads connecting them, and no city is connected directly to hundreds of others.
When we look at the distribution of the cities’ link counts (or node degree, as it is called in graph theory) in a histogram, we can see the typical peaked distribution of a random network.
(map and histogram for interstate).
But as time went by, scientists and researchers started to find networks where this rule simply didn’t apply. The most obvious example is the World Wide Web - the first “crawlers” came back reporting that some sites had a disproportionate number of links pointing to them, and they were far from being as rare as predicted by the distribution associated with random networks.
Soon other examples were found, from social networks to citations in scientific journals to interactions between molecules within cells.
A closer look at the link distribution of these networks showed not a bell curve but a continuously decreasing curve. This curve is the visual signature of a power law, and it has two major implications for how this new class of networks evolves:
These nodes - usually called hubs - have a significant impact on how the events propagate across the network. They are at the center of a number of disruptive behaviors, from epidemics to the spreading of news and riots to cascading financial crises.
We can see these characteristics in the US airline routing network. Here the network is formed by airports connected by direct flights. The first feature that jumps out at us is the existence of major hubs, airports like Chicago and Atlanta. Then we notice that most other airports have only a few links and are usually connected to one or more hubs.
A histogram of node degree will clearly show the continuously decreasing curve associated with a power law.
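Building such a histogram only requires an edge list. The route list below is hypothetical: two hubs, CHI and ATL, each serve a handful of small airports. It only illustrates the shape of the computation and of the result.

```python
from collections import Counter

# Hypothetical routes: two hubs, each serving several small airports.
routes = ([("CHI", f"a{i}") for i in range(1, 7)]
          + [("ATL", f"b{i}") for i in range(1, 6)]
          + [("CHI", "ATL"), ("a1", "a2")])

degree = Counter()
for u, v in routes:
    degree[u] += 1
    degree[v] += 1

# Number of airports with each degree: many with one link, a few hubs with many.
degree_histogram = Counter(degree.values())
for k in sorted(degree_histogram):
    print(k, "#" * degree_histogram[k])
```

Most airports fall in the first bar with a single link, and the two hubs sit alone far to the right. Real airline data would be expected to produce the same continuously decreasing shape with a much longer tail.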
(map and histogram for airports)
A peculiar aspect of complex systems, and one of the reasons they can be so difficult to predict, is the interplay between their structure and the behavior of their components.
This is particularly visible in social networks. Who you know affects what you do, and vice versa. In this scenario, the network is composed of actors (mostly people, but sometimes also automated agents or bots) linked by their relationships and the actions they enable, like following, liking and retweeting. The impact of what the actors do is largely a function of whom they relate to. Actors in critical places in the topology can affect the entire group in drastic ways.
So how do we go about understanding social networks? We begin by asking two basic questions:
These questions reveal how the macro and micro aspects of a network influence each other. Key actors shape the network structure (think about the hubs on a scale-free network) while communities define the scope of their influence.
Community detection, or clustering, is the process by which a network is partitioned into communities such that nodes within a community are more densely connected to each other than to nodes in other communities [optgraph.pdf].
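One common way to score how well a partition meets this definition is Newman's modularity Q. Q is the fraction of links that fall inside communities, minus the fraction expected if links were placed at random while preserving node degrees. The sketch below is illustrative. It does not claim to reproduce how OPTGRAPH detects or scores communities.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # community -> number of links inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (fraction of links inside) - (expected fraction)^2
    return sum(internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge link (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
good = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
bad = {1: "A", 3: "A", 5: "A", 2: "B", 4: "B", 6: "B"}
print(round(modularity(edges, good), 3))  # 0.357
print(round(modularity(edges, bad), 3))   # -0.214
```

The natural split into the two triangles scores clearly positive. A partition that cuts across the dense regions scores below zero.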
The study of communities has great practical value, as the nodes in these communities usually display common properties or have similar preferences - for example, the partition of the blogosphere along political lines [politicosphere] or gang membership among criminals [orca].
Thus identifying the communities in a network can further the understanding of how network function and topology affect each other. This has direct applications in marketing, sociology and many other areas.
Here is an example of how SAS Visual Analytics can be used to explore the communities in a network, based on the data from the Hartford, CT drug users study [drug_sna]. Fig. X shows a visualization of the network structure:
For such a small graph, network visualization by itself can provide many insights. We can see a clear bifurcation within the network, where each region appears to cluster around central communities. We can also see sparse peripheral structures with long “pendant chains”.
But we can learn a lot more about this network structure by using community detection. As we saw in the Data Preparation section, OPTGRAPH can be used to calculate a large number of network measures, including information about communities, and to annotate our graph dataset with them.
We can use this information to color the visualization by community (Fig X) allowing us to clearly differentiate between them.
Furthermore, we can use the community information as a categorical filter. This allows us to narrow our focus to a single community at a time, a starting point for further exploration and analysis. With less data we can add more detail, like labels and arrow directions, without the risk of overwhelming our ability to make sense of the visualization (Fig. X).
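Filtering on a community attribute amounts to taking an induced subgraph. You keep the nodes whose community matches, plus only the links whose two endpoints both survive. Below is a minimal sketch that assumes a hypothetical `community` attribute on each node.

```python
def filter_by_community(node_attrs, links, community_id):
    """Keep the nodes of one community and only the links between two kept nodes."""
    kept = {n for n, attrs in node_attrs.items() if attrs["community"] == community_id}
    kept_links = [(u, v) for u, v in links if u in kept and v in kept]
    return sorted(kept), kept_links


node_attrs = {"n1": {"community": 1}, "n2": {"community": 1},
              "n3": {"community": 2}, "n4": {"community": 2}}
links = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]
print(filter_by_community(node_attrs, links, 1))  # (['n1', 'n2'], [('n1', 'n2')])
```

The link between n2 and n3 crosses the community boundary, so it disappears together with n3.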
There are different ways to measure who is important - or central - to a network. Examples of such centrality measures include:
Key Actor Analysis [Conway] identifies key actors by plotting actors’ scores for Eigenvector centrality versus Betweenness. Any actor with a high score on both measures is clearly an important node in the network. But since these two measures are expected to be approximately linearly related, any actor that deviates strongly from that linear trend is also considered a key actor playing a very specific role.
An actor with high Betweenness but low Eigenvector centrality may provide the only path to a central actor. These are “gate-keepers”, connecting actors to a section of the network that would otherwise be isolated from the core.
On the other hand, an actor with low Betweenness but high Eigenvector centrality may have unique access to central actors. These are “pulse-takers”, well-connected actors at the core of the network.
These criteria allow us to identify not only key actors inside the network’s core, but also those with unique structural positions.
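The residual idea behind Key Actor Analysis can be sketched as an ordinary least-squares fit of Betweenness on Eigenvector centrality. Actors far above the line are candidate gate-keepers, and actors far below it are candidate pulse-takers. This is a sketch under stated assumptions, not Conway's exact procedure. The 1.5-standard-deviation cut-off and the scores below are hypothetical.

```python
from statistics import pstdev


def key_actor_residuals(eigen, between):
    """Least-squares fit of betweenness on eigenvector centrality; return the residuals."""
    n = len(eigen)
    mean_x = sum(eigen) / n
    mean_y = sum(between) / n
    sxx = sum((x - mean_x) ** 2 for x in eigen)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(eigen, between))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(eigen, between)]


def classify_key_actors(names, eigen, between, k=1.5):
    """Flag actors whose residual is more than k standard deviations from the fit."""
    resid = key_actor_residuals(eigen, between)
    cutoff = k * pstdev(resid)
    gate_keepers = [n for n, r in zip(names, resid) if r > cutoff]   # more betweenness than expected
    pulse_takers = [n for n, r in zip(names, resid) if r < -cutoff]  # less betweenness than expected
    return gate_keepers, pulse_takers


names = [f"actor{i}" for i in range(1, 11)]
eigen = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
between = [0.1, 0.2, 0.9, 0.4, 0.5, 0.6, 0.7, 0.8, 0.3, 1.0]
print(classify_key_actors(names, eigen, between))  # (['actor3'], ['actor9'])
```

The residuals are also the per-actor metric that could be mapped to node size or color in the network diagram.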
The implementation of the Key Actor Analysis in SAS Visual Analytics is straightforward. We already saw how to visualize the network structure. We can enrich that display by assigning centrality measures (calculated by OPTGRAPH during Data Preparation) to visual attribute roles like node size and color. Next, we create a bubble plot of Eigenvector centrality by Betweenness.
As illustrated in Fig. X, the bubble plot makes it easy to visually identify the key actors and data linking and brushing between the two visualizations highlights these actors in the context of the network structure. We can identify the core leadership, the mid-level actors (pulse-takers) and the bridges between communities (gate-keepers).
The analysis is not perfect, though; some key actors are not highlighted because the centrality measures are skewed by the data asymmetry between the two large communities in the network. At this point the community detection and filtering described above could be used to refine the results.
Degree - counts how many people are connected to you: for example, how many friends you have in Facebook (a symmetric, or undirected, relation) or how many followers you have in Twitter (an asymmetric, or directed, relation).
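Degree is a simple count over the link list. For directed relations it splits into in-degree (followers) and out-degree (accounts followed). The names and links below are hypothetical.

```python
from collections import Counter

friends = [("ann", "bob"), ("ann", "cara"), ("bob", "cara"), ("cara", "dan")]  # undirected
follows = [("bob", "ann"), ("cara", "ann"), ("dan", "ann"), ("ann", "dan")]    # (follower, followed)

# Undirected: each friendship adds one to both endpoints.
degree = Counter()
for u, v in friends:
    degree[u] += 1
    degree[v] += 1

followers = Counter(followed for _, followed in follows)  # in-degree
following = Counter(follower for follower, _ in follows)  # out-degree
```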
Closeness - if you are close to everyone, you should have a high closeness score. This is based on the notion of distance between nodes in a connected graph: the distance between two nodes is the length of the shortest path between them. The closeness of a node can then be computed as the sum of 2^(-distance) from this node to all other nodes, so that nearby nodes contribute more to the score than distant ones.
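Here is a minimal sketch of this score. It assumes the intended weight is 2^(-distance), so that nearby nodes contribute more. Distances come from a breadth-first search on an unweighted, connected graph. For comparison, the sketch also includes the more widely used variant: (n - 1) divided by the sum of distances. The example graph is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def decay_closeness(adj, node):
    """Sum of 2**(-distance) from node to every other reachable node."""
    return sum(2.0 ** -d for other, d in bfs_distances(adj, node).items() if other != node)


def classic_closeness(adj, node):
    """Common variant for a connected graph: (n - 1) / sum of distances."""
    dist = bfs_distances(adj, node)
    return (len(dist) - 1) / sum(dist.values())


# Path a - b - c - d: the inner nodes are closer to everyone.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(decay_closeness(adj, "b"), decay_closeness(adj, "a"))      # 1.25 0.875
print(classic_closeness(adj, "b"), classic_closeness(adj, "a"))  # 0.75 0.5
```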
Betweenness is a measure of how often a node lies on the shortest paths between other pairs of nodes. In practice it measures the extent to which nodes in a network can reach each other through a given node. Information is more likely to flow through nodes with high betweenness scores.
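Brandes' algorithm computes betweenness for every node at once. One breadth-first search per source node counts the shortest paths. A backward pass then accumulates how much each node contributes to the shortest paths between other pairs. The sketch below handles unweighted, undirected graphs and returns unnormalized scores. The example path graph is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized scores)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(adj))  # {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```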
Eigenvector Centrality - a node's score is proportional to the sum of the scores of the nodes connected to it. It measures whether a node is popular among popular nodes. Google’s PageRank is a variant of this centrality.
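Power iteration can approximate eigenvector centrality. Each step replaces every node's score by the sum of its neighbors' scores and then renormalizes. The sketch also adds each node's own score at every step, which means iterating with A + I instead of A. This leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs such as stars. The iteration limit and tolerance are assumptions.

```python
import math


def eigenvector_centrality(adj, iterations=100, tol=1e-10):
    """Power iteration on (A + I); returns scores with unit Euclidean norm."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Own score plus the sum of the neighbors' scores.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Star: one hub linked to three leaves.
star = {"hub": ["l1", "l2", "l3"], "l1": ["hub"], "l2": ["hub"], "l3": ["hub"]}
scores = eigenvector_centrality(star)
```

For the star, the hub's score is √3 times each leaf's score, about 0.707 against 0.408. The hub is central because it is connected to everyone, and each leaf inherits part of that centrality.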
The Impact of Social Media on Social Unrest in the Arab Spring
The internet role in Egypt demonstrations
Online social networks played a major role in Egypt’s demonstrations in January 2011; Facebook and Twitter were the basic tools used in the demonstrations. Facebook has been widespread in Egypt since 2008, and its popularity was also driven by political events: it was used to spread a very wide call for a nationwide strike on April 6, 2008. The call was very successful, and afterwards a group of political activists formed the “6th April Youth Movement”.
I think Twitter was the big winner because it started to be well known in Egypt during the January events.
From an observer’s perspective, we can identify three main roles of Facebook and Twitter in the January demonstrations: calling for demonstrations, disseminating news of the demonstrations, and increasing information circulation…
The Egyptian Revolution on Twitter
A system is an interconnected set of elements that is coherently organized in a way that achieves something. If you look at that definition closely for a minute, you can see that a system must consist of three kinds of things: elements, interconnections, and a function or purpose.
Meadows, Donella. (2013-01-18). Thinking in Systems: A Primer (p. 11). Chelsea Green Publishing. Kindle Edition.
A supply chain is a network of organizations performing various processes and activities to produce value in the form of products and services for the end customer (Christopher, 1992).
Logistics is the critical element in supply chain management. The main task of logistics is managing material and information flows, a key part of the overall task of SCM. SCM is concerned with managing the entire chain of processes from raw material supply to the end customer (Hutt & Speh, 2004).
Therefore, we conclude that social media on its own has been a useful but not sufficient tool for the organization and implementation of protest.
Social media played a distinctive role in two separate ways that we believe are particularly salient to our policy recommendations. First, it served to boost international attention to particular events by facilitating reporting from places to which the traditional media have limited access, and by providing a bottom-up, decentralized process for generating news stories.
Second, the positive use of social media by many protesters during the Arab Spring to discuss ideas and plan protest activities is being increasingly countered by its use by governments eager to repress the activities of protesters and stymie democratic movements.