• Parallel Algorithms for Search and Analysis of Significant News

  • Introduction

  • Literature review

  • Parallel algorithms for search and analysis of significant news

  • Comparison of the approaches and Experimental results

  • Conclusion

  • References

  • Hundreds of information sources and media outlets publish information every day about events in economics and business. They publish thousands of articles and mention many companies and deals. This regularly published data can contain valuable information that helps to predict events in economics and business, to detect hidden facts and correlations, and to assess the loyalty and tolerance of particular sources and media towards different companies and deals.

    Thus, to find and analyze significant news, the following research topics and tasks are of interest:

    1. Determine the criteria for marking a news item or publication as significant in a given context. Text mining, clustering, and ranking methods could be applied here.
    2. Propose advanced algorithms for finding significant news in large amounts of data. Parallel algorithms based on ranking, Latent Semantic Indexing, and clustering, together with optimization methods such as Multi-Criteria Optimization, will be investigated, because processing speed matters and large volumes of data must be scanned regularly. The problem of determining significance and clustering therefore has to be studied in the Big Data setting. Both parallel stochastic and hybrid methods will be developed.
    3. Analyze the whole body of information, taking into account historical data and daily updates. Prediction problems arise here: we have to determine trends and assess significance with respect to the occurrence of hidden facts and correlations, the loyalty of media, and future events.
  • Parallel algorithms justification. (Stochastic and Hybrid Algorithms)

  • Network science approach

  • Optimization approach

    1. Albert-László Barabási (2013). Network Science.
    2. Gourab Ghoshal, Albert-László Barabási (2011). Ranking stability and super-stable nodes in complex networks.
    3. Dashun Wang, Zhen Wen, Hanghang Tong, Ching-Yung Lin, Chaoming Song, Albert-László Barabási (2011). Information Spreading in Context.
  • Introduction

    The role of Internet mass media has increased greatly in recent years. Internet media reach an ever larger audience, and their role in forming public opinion is growing accordingly. Public organizations, political parties, and companies, as well as public figures, politicians, and entrepreneurs, are interested in any mention of themselves, or of specific facts concerning them, in the media, including Internet media. They often track a number of web sites that cite them, along with other sources where significant news could appear. Finding significant news could also be useful for ordinary Internet users, since the number of daily media publications is large and news that is significant for a particular reader can easily be missed. Thus, identifying news that is significant in a given context is an important and timely problem.

    A news item may originate from an Internet mass-media outlet (such as cnn.com or bbc.com) as well as from a page in a social network or a blog. We assume that each news item has a unique URL: the address of the web page where its text was published.

    After a news item appears, it spreads across the WWW, and a number of references to the original web page appear. Moreover, a news item can have more than one origin: the same news often appears on several web sites independently within a short period of time, although the content may differ somewhat between them. The collection of all web pages that publish the news, together with the web pages that refer to them, therefore forms a network. By analysing the structure and properties of this network, we can draw conclusions about the significance of news items, their interconnections, and their mutual influence.
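    The construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the references have already been collected as (citing URL, cited URL) pairs, and the function name `build_reference_network` and the example URLs are hypothetical.

```python
def build_reference_network(references):
    """Build a directed network from (citing_url, cited_url) pairs.

    Returns two dicts keyed by URL: the set of pages each page refers to,
    and the number of distinct pages referring to each page (in-degree).
    """
    out_links = {}
    for citing, cited in references:
        # Every URL that occurs in a reference becomes a node.
        out_links.setdefault(citing, set())
        out_links.setdefault(cited, set())
        # Self-references carry no information about spreading; skip them.
        if citing != cited:
            out_links[citing].add(cited)  # a set ignores duplicate references

    in_degree = {url: 0 for url in out_links}
    for targets in out_links.values():
        for target in targets:
            in_degree[target] += 1
    return out_links, in_degree


# Hypothetical example: two pages refer to the same story, one of them twice.
example_references = [
    ("https://blog.example/a", "https://news.example/story"),
    ("https://social.example/p1", "https://news.example/story"),
    ("https://blog.example/a", "https://news.example/story"),
]
example_out_links, example_in_degree = build_reference_network(example_references)
```

    In this sketch the in-degree of a page is the number of distinct pages that refer to it, which is the quantity the later degree-distribution analysis would start from.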

    Thus we have a network whose nodes are web pages, unambiguously identified by their URLs, and whose links are references between web pages, expressed as URLs. Since the WWW is well known to be scale-free [1], it is natural to assume that our network is scale-free as well. To check this assumption, it is necessary to construct the degree distribution of the network. If the network is scale-free, the degree distribution can be approximated by a power law.
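    One possible way to carry out this check is sketched below. The text does not specify an estimation procedure, so the estimator is an assumption: it uses the standard approximate maximum-likelihood formula for a discrete power law, gamma ≈ 1 + n / Σ ln(k_i / (k_min - 1/2)), and the function names and the `k_min` parameter are illustrative. A fitted exponent alone does not establish that a network is scale-free; a goodness-of-fit test would still be needed.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return the empirical distribution p_k as a dict {degree: fraction of nodes}."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def estimate_power_law_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of gamma in p_k ~ k^(-gamma).

    Only degrees >= k_min are used; k_min must be at least 1.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Toy degree sequence (made up for illustration, not measured data).
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 21]
toy_p_k = degree_distribution(toy_degrees)
toy_gamma = estimate_power_law_exponent(toy_degrees, k_min=1)
```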

    The degree distribution has to be specified more precisely in order to obtain more accurate results on the network properties. The real processes that drive the evolution of real networks often produce systematic deviations from a pure power law. Several well-known forms often describe real networks: the stretched exponential distribution, fitness-induced corrections, small-degree cutoffs, and exponential cutoffs [1]. We are currently building a generative model that analytically predicts the expected functional form of the degree distribution, because this form is a key factor determining the system's properties.
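    For orientation, commonly used functional forms for three of these deviations are listed below next to the pure power law. They are given up to normalization, and the symbols k_sat, k_cut, beta, and lambda are notation assumed here for illustration; the form that the generative model will predict is not stated in the text.

```latex
\begin{align*}
  p_k &\sim k^{-\gamma}
      && \text{(pure power law)} \\
  p_k &\sim (k + k_{\mathrm{sat}})^{-\gamma}
      && \text{(small-degree saturation)} \\
  p_k &\sim k^{-\gamma}\, e^{-k/k_{\mathrm{cut}}}
      && \text{(exponential cutoff)} \\
  p_k &\sim k^{\beta-1}\, e^{-(\lambda k)^{\beta}}
      && \text{(stretched exponential)}
\end{align*}
```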

    We study and interpret node fitness, the ability of a node to acquire new connections in the network. If a node (a web page in our network) has high fitness, the corresponding news item is most likely significant, because it attracts many references. We are also analyzing whether such nodes are super-stable [2] and trying to interpret this property. The features of news spreading [3] are of particular interest in our research. We are currently studying them in order to understand and model the evolution path of the network, which determines its properties [1]. We are also investigating the thresholds at which a news item becomes significant. We also plan to present some initial findings of our research.
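    To illustrate how fitness can shape the number of references a page attracts, the sketch below simulates a growing network in which a new node links to an existing node i with probability proportional to eta_i * k_i (fitness times current degree). This attachment rule is an assumption made for illustration, in the spirit of the fitness models discussed in [1]; the text does not state which model is used, and the function name and parameters are hypothetical.

```python
import random


def simulate_fitness_network(n_nodes, m=2, seed=0):
    """Grow an undirected network with fitness-weighted preferential attachment.

    Each new node creates m links; an existing node i is chosen with
    probability proportional to fitness[i] * degree[i].
    Requires n_nodes >= m + 1.
    """
    rng = random.Random(seed)
    # Fitness values drawn uniformly from (0, 1]; this choice is an assumption.
    fitness = [1.0 - rng.random() for _ in range(n_nodes)]
    degree = [0] * n_nodes
    edges = []

    # Start from a fully connected core of m + 1 nodes.
    core = m + 1
    for i in range(core):
        for j in range(i + 1, core):
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1

    for new in range(core, n_nodes):
        weights = [fitness[i] * degree[i] for i in range(new)]
        targets = set()
        # Draw until m distinct existing nodes have been selected.
        while len(targets) < m:
            targets.add(rng.choices(range(new), weights=weights, k=1)[0])
        for target in targets:
            edges.append((new, target))
            degree[new] += 1
            degree[target] += 1
    return fitness, degree, edges


sim_fitness, sim_degree, sim_edges = simulate_fitness_network(50, m=2, seed=1)
```

    Under this rule, a late-arriving node with high fitness can overtake older nodes in degree, which matches the intuition that a significant news item attracts many references regardless of when it appeared.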

  • Network identification

  • Network simulation and analysis
