Nowadays, hundreds of information sources and media outlets publish information every day about events in economics and business. They publish thousands of articles mentioning many companies and deals. This volume of regularly published data can contain valuable information that helps to predict or even foresee events in economics and business, to detect hidden facts and correlations, and to determine the loyalty and tolerance of particular sources and media towards different companies and deals.
Thus, to find and analyze significant news, the following research topics and tasks are of interest:
The role of Internet mass media has increased greatly in recent years. Internet media reach an ever-wider audience, and their role in shaping public opinion strengthens accordingly. Public organizations, political parties and companies, as well as public figures, politicians and entrepreneurs, are interested in any mention of them, or of specific facts concerning them, in the media, including Internet media. They often track the number of websites citing them and the various sources where significant news might appear. Finding significant news could also be useful for ordinary Internet users: the volume of daily media publications is large, and news that is significant for a particular person could easily be missed. Thus, identifying news that is significant (in some context) is an essential and topical problem.
The original provider of a news item could be an Internet mass medium (such as cnn.com, bbc.com and others), a page in a social network, or a blog. We assume that each news item has a unique URL: the address of the web page where the text of the news was published.
After a news item appears, it spreads across the WWW, and references to the original web page accumulate. Moreover, a news item can have more than one origin (source). It is common for the same news to appear independently on several websites within a short period of time, although the content of these versions may differ. Thus, the collection of all web pages publishing the news, together with the web pages that refer to them, forms a network. By analyzing the structure and properties of this network, we can draw conclusions about the significance of news items, their interconnections and their mutual influence.
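As a minimal sketch, assuming the references have already been extracted as (citing URL, cited URL) pairs, the network described above can be represented as a directed graph in plain Python. The function names and example URLs are hypothetical; collapsing repeated references between the same two pages into a single link and dropping self-references are illustrative modelling choices, not ones stated by the authors.

```python
from collections import defaultdict


def build_news_network(references):
    """Build a directed network from (citing_url, cited_url) pairs.

    Nodes are web pages identified by URL; a link u -> v means page u
    references page v. Repeated references between the same pair of pages
    are collapsed into one link, and self-references are ignored.
    Returns (nodes, out_links), where out_links maps a page to the set of
    pages it references.
    """
    nodes = set()
    out_links = defaultdict(set)
    for citing, cited in references:
        nodes.add(citing)
        nodes.add(cited)
        if citing != cited:
            out_links[citing].add(cited)
    return nodes, dict(out_links)


def in_degrees(nodes, out_links):
    """Number of distinct pages that reference each page."""
    deg = {n: 0 for n in nodes}
    for targets in out_links.values():
        for t in targets:
            deg[t] += 1
    return deg


# Hypothetical example URLs, for illustration only.
refs = [
    ("https://blog.example/a", "https://news.example/story"),
    ("https://forum.example/t/9", "https://news.example/story"),
    ("https://blog.example/a", "https://news.example/story"),  # repeated reference
    ("https://mirror.example/story", "https://news.example/story"),
]
nodes, out_links = build_news_network(refs)
print(in_degrees(nodes, out_links)["https://news.example/story"])  # -> 3
```

The in-degree (the number of distinct pages referencing a page) is a natural first proxy for how widely a news item has spread.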
We therefore have a network whose nodes are web pages, each uniquely identified by its URL, and whose links are references between these pages. Since the WWW is well known to be scale-free [1], it is natural to assume that our network is scale-free as well. To test this assumption, we need to compute the degree distribution of the network: if the network is scale-free, its degree distribution can be approximated by a power law.
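One way to carry out this step, sketched here under the assumption that the relevant degree is the in-degree (references received), is to compute the empirical distribution P(k) and estimate the power-law exponent gamma by maximum likelihood. The sketch uses the discrete-data approximation from Clauset, Shalizi and Newman rather than any method stated by the authors; function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical degree distribution P(k): fraction of nodes with degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def powerlaw_exponent_mle(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of gamma in P(k) ~ k**(-gamma).

    Uses the discrete-data approximation
        gamma ~= 1 + n / sum(ln(k_i / (k_min - 0.5)))
    over the n degrees with k_i >= k_min; smaller degrees are ignored.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees >= k_min")
    s = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / s


# Hypothetical in-degree sample, for illustration only.
sample = [1, 1, 1, 2, 2, 3, 5, 8, 13]
print(degree_distribution(sample))
print(powerlaw_exponent_mle(sample, k_min=1))
```

An exponent estimate alone does not establish that a network is scale-free: the approximation is most accurate when k_min is not very small, k_min itself has to be chosen, and the fit should be checked with a goodness-of-fit test and compared against alternative distributions.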
To describe the properties of the network more accurately, the functional form of its degree distribution must be specified more precisely. The processes that drive the evolution of real networks often produce systematic deviations from a pure power law. Several well-known forms often describe real networks: stretched exponential distributions, fitness-induced corrections, small-degree cutoffs and exponential cutoffs [1]. We are currently building a generative model that analytically predicts the expected functional form of the degree distribution, since this form is a key factor determining the properties of the system.
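For reference, the forms listed above are commonly parameterized as follows, where p_k is the probability that a randomly chosen node has degree k. The parameter names are ours, and the exact parameterizations used in [1] may differ; the logarithmic fitness correction shown is the one obtained in the Bianconi-Barabási fitness model when fitness is uniformly distributed.

```latex
\begin{aligned}
\text{pure power law:}\quad & p_k \propto k^{-\gamma} \\
\text{small-degree saturation:}\quad & p_k \propto (k + k_{\mathrm{sat}})^{-\gamma} \\
\text{exponential (high-degree) cutoff:}\quad & p_k \propto k^{-\gamma}\, e^{-k/k_{\mathrm{cut}}} \\
\text{stretched exponential:}\quad & p_k \propto \exp\!\left[-\left(k/k_0\right)^{\beta}\right], \quad 0<\beta<1 \\
\text{fitness correction (uniform fitness):}\quad & p_k \propto \frac{k^{-(1+C)}}{\ln k}, \quad C \approx 1.255
\end{aligned}
```

Comparing the fitted likelihoods of such alternatives with that of the pure power law is one way to decide which form best describes the news network.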
We study and interpret one such system property, node fitness: the ability of each node to acquire new connections in the network. If a node (a web page in our network) has high fitness, the corresponding news item is most likely significant, because the page attracts many references. We are also analyzing such nodes for the super-stable property [2] and trying to interpret it. The features of news spreading [3] are of particular interest in our research. We are currently studying them in order to understand and model the evolution path of the network, which determines its properties [1]. We are also investigating the thresholds at which a news item becomes significant. We also plan to present some initial findings of our research.
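To make the notion of node fitness concrete, here is a sketch of the Bianconi-Barabási fitness model, a standard formalization of this idea: each node i carries a fitness eta_i, and a new node links to node i with probability proportional to eta_i * k_i. This is an illustration under stated assumptions (uniform fitness in (0, 1], m links per new node, a complete seed graph), not the authors' generative model; the function name and parameters are hypothetical.

```python
import random


def fitness_model(n_nodes, m=2, seed=None):
    """Grow a network with the Bianconi-Barabasi fitness model.

    Every node gets a fitness eta in (0, 1]. Each new node attaches m links
    to distinct existing nodes, choosing node i with probability
    proportional to eta_i * k_i. Returns (fitness, degree) lists indexed by
    node id (ids are assigned in arrival order).
    """
    if n_nodes < m + 1:
        raise ValueError("n_nodes must be at least m + 1")
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every degree is m > 0.
    fitness = [1.0 - rng.random() for _ in range(m + 1)]  # values in (0, 1]
    degree = [m] * (m + 1)
    for _ in range(m + 1, n_nodes):
        # Attachment weights eta_i * k_i over the nodes present so far.
        weights = [f * k for f, k in zip(fitness, degree)]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(len(weights)), weights=weights)[0])
        for t in targets:
            degree[t] += 1
        fitness.append(1.0 - rng.random())
        degree.append(m)
    return fitness, degree


# Illustrative run: list the five best-connected nodes with their fitness.
fitness, degree = fitness_model(2000, m=2, seed=42)
top = sorted(range(len(degree)), key=degree.__getitem__, reverse=True)[:5]
for i in top:
    print(i, degree[i], round(fitness[i], 3))
```

In this model, a late-arriving page with high fitness can overtake older pages in degree, which matches the intuition above that a highly referenced page signals a significant news item.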