Abstract: Multipath communication technologies are an active research area. A natural benefit of multipath over single-path communication is higher throughput, which improves performance not only in ordinary Internet communication but also in Big Data centers, where the communication infrastructure can become a bottleneck of the system. In this paper, we introduce a new game-theoretical model for the evaluation of multiuser multipath communication technologies. We study the decision problem of the users (i.e., network clients) in a multipath communication system and develop a game-theoretical model for client payoff maximization, in which the decision variables of each client are its path requests. Due to limited hardware performance and limited service capacity, we assume that each client’s payoff depends on the other clients’ path requests. We apply the tools of game theory to describe the equilibrium behavior of the clients in this interaction. Using two examples, we show that our model is suitable for measuring payoffs both in money and in throughput. We also offer possible directions for the further development of our model.
Keywords: multipath communication; throughput aggregation; Data Center Networks; game theory modeling; concave games; performance analysis
Abstract: Classification of electroencephalograph (EEG) data is the common denominator of various recognition tasks related to EEG signals. Automated recognition systems are especially useful when continuous, long-term EEG is recorded and the resulting data are too voluminous to be analyzed in depth by human experts. EEG-related recognition tasks may support medical diagnosis, and they are core components of EEG-controlled devices such as web browsers or spelling devices for paralyzed patients. State-of-the-art solutions are based on machine learning. In this paper, we show that EEG datasets contain hubs, i.e., signals that appear as nearest neighbors of surprisingly many signals. This paper is the first to document this observation for EEG datasets. Next, we argue that the presence of hubs has to be taken into account in the classification of EEG signals; therefore, we adapt hubness-aware classifiers to EEG data. Finally, we present the results of our empirical study on a large, publicly available collection of EEG signals and show that hubness-aware classifiers outperform the state-of-the-art time-series classifier.
Keywords: electroencephalograph; nearest neighbor; classification; hubs
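The notion of a hub used in the abstract above can be illustrated with a minimal sketch. It counts, for each signal, how many other signals list it among their k nearest neighbors; signals with unusually high counts are hubs. The function name `k_occurrence`, the toy data, and the use of Euclidean distance are assumptions made for illustration only; the abstract does not state which distance measure the authors use.

```python
import numpy as np

def k_occurrence(X, k):
    """Count how often each row of X appears among the k nearest
    neighbors of the other rows (Euclidean distance, an assumption)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all rows.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A signal is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of every row.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Number of neighbor lists in which each index occurs.
    return np.bincount(neighbors.ravel(), minlength=n)

# Hypothetical one-dimensional "signals"; the second one is the
# nearest neighbor of two others and so has the highest count.
counts = k_occurrence([[0.0], [1.0], [2.5], [10.0]], k=1)
print(counts.tolist())
```

The counts always sum to n times k, so a strongly skewed distribution of counts is what indicates the presence of hubs.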
Abstract: Establishing an effective defense strategy in IT security is essential but very challenging. According to the 2014 Cyberthreat Defence Report, which involved more than 750 security decision makers and practitioners, more than 60% of organizations had been breached in 2013. Big data analytics in security makes it possible to gather and analyse massive amounts of digital information in order to predict and prevent these attacks. However, because it is difficult to collect the needed data in an efficient, complete and reliable fashion, the industry lacks, and could truly benefit from, a platform offering benchmark data that would allow the effectiveness of security defence algorithms to be gauged and improved. To this end, we introduce a platform that generates large parametrized datasets of simulated Internet traffic combining attack-free and malicious network traffic patterns. For the simulations, we use the ns-3 discrete-event network simulator. To make the resulting dataset appropriate for benchmarking intrusion detection systems, we investigate the statistical characteristics of normal and intrusive traffic patterns. Finally, we present a use case in which we validate our results.
Keywords: Network Traffic Simulation; Intrusion Detection; DDoS; NS-3
Abstract: Big Data is a technology developed for 3-V management of data, by which large volumes and different varieties of data are processed at optimal velocity. The data to be dealt with may be structured or unstructured. Relational databases (spreadsheets) are typical examples of structured data, and the methods and techniques for studying relational database management are well known. In this paper, we describe a formalism in which structured data can be considered a direct generalization of relational databases. In our generalization, higher-level structured data are defined recursively as a set or a queue of lower-level structured data. Consequently, our study shows that many concepts and results of relational database management can be transferred to structured data according to this generalization. Sub-data, the components of structured data, the functional dependencies between structured data, and the keys of structured data are defined and studied. Conversely, some concepts defined here for structured data can be applied to relational databases as a special case. We also define and study some operations on structured data and homomorphisms between structured data, which appear to be well suited to relational databases. The formalization introduced here offers effective methods for further structural and algebraic research on structured data.
Keywords: Data management; Big Data; Structured data; Relational database; Lattice; Partially ordered set
Abstract: Cluster computing frameworks are important in the “Big Data” world. The best-known framework is MapReduce, which was introduced by Google and is used by many companies. However, this technique does not solve all analytical problems effectively. Some cases need a different framework, and such frameworks can also run on the cluster. The cluster then needs a manager that manages the frameworks, so the performance analysis of cluster management systems becomes important. In this paper, we compare the performance of the two best-known cluster management systems (YARN and Mesos) under stress cases and analyze their resource usage techniques.
Keywords: cluster management; resource sharing; scheduling
Abstract: The efforts of the European Union (EU) in the energy supply domain aim to introduce intelligent grid management across the whole of the EU. In the target intelligent grid, 80% of all meters are planned to be smart meters generating data every 15 minutes. Thus, the energy data of the EU will grow rapidly in the very near future. Smart meters are being installed in a phased roll-out, and the first smart meter data samples are ready for different types of analysis in order to understand the data, to make precise predictions and to support intelligent grid control. In this paper, we propose an incremental heterogeneous ensemble model for time series prediction. The model was designed to make predictions for electricity load time series, taking into account their inherent characteristics, such as seasonal dependency and concept drift. The characteristics of the proposed ensemble model (robustness, a natural ability to parallelize and the ability to be trained incrementally) make it suitable for processing streams of data in a “big data” environment.
Keywords: big data; time series prediction; incremental learning; ensemble learning
Abstract: Mobile environments are based on wireless communication and on wireless networks that provide communication services via radio signals. Although both mobile and wireless systems may be free from space constraints, they suffer from certain unstable characteristics. These problems can be compensated for by applying countermeasures to the communication environment. This paper presents an adaptive communication management model based on fuzzy logic. The proposed model includes an estimation module to control the flow throughput, and adopts a policy of providing greater benefits to better links. In addition, the model includes tuned snooping and retransmission schemes to ensure the quality of service of wireless communication. Simulation results verify the efficiency of the proposed model.
Keywords: Adaptive Flow Management; Fuzzy Logic; Mobile Wireless Network
Abstract: The paper introduces a novel conceptualization algorithm optimized for a distributed Big Data environment. The proposed method uses a concept generation module based on clique detection in the context graph. The presented work proposes a novel incremental version of the Bron-Kerbosch maximal clique detection method. The efficiency of the method is evaluated with random context tests. The presented incremental model is comparable even with the usual batch methods. The analysis of the clique detection algorithm in a MapReduce architecture provides an efficiency comparison for large-scale contexts.
Keywords: ontologization; clique detection; incremental clique generation; MapReduce architecture
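For orientation, the classic batch Bron-Kerbosch method with pivoting, on which the abstract above builds, can be sketched as follows. This is the textbook batch variant, not the incremental or MapReduce version proposed by the authors, whose details the abstract does not give. The function name `bron_kerbosch`, the adjacency-dictionary representation, and the example graph are assumptions made for illustration.

```python
def bron_kerbosch(graph):
    """Yield every maximal clique of an undirected graph given as
    {vertex: set_of_neighbors} (classic batch variant with pivoting)."""
    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: excluded vertices.
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        # Only candidates that are not neighbors of the pivot need branching.
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())

# Hypothetical graph: a triangle a-b-c plus a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
for clique in bron_kerbosch(graph):
    print(sorted(clique))
```

The batch variant recomputes all cliques from scratch whenever the graph changes, which is the cost an incremental version aims to avoid.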
Abstract: In this paper, we discuss whether data collected by monitoring software developers' logs can be considered big. We hypothesize that it falls within the category of Big Data. The main topic of our paper, however, is how to facilitate the analysis of such data. Due to the specificity of the monitored activity, the analysis is at least partially explorative in nature. We hypothesize that visualisation can be a productive approach in such a case. We present several visualisation schemes (diagram types) and show them applied to the explorative analysis of data gathered within a four-year project in which we have been participating.
Keywords: Activity log; log stream; Programmer; Software development; Visualisation; Big data
Abstract: The large amount of information that is currently being collected (the so-called “big data”) has caused model-based Collaborative Filtering (CF) methods to encounter limitations, e.g., the sparsity problem and the scalability problem. It is difficult for model-based CF methods to address the trade-off between scalability and performance. Therefore, we propose a scalable clustering-based CF method that helps balance the two by re-locating elements in the cluster model. The proposed method is compared against existing methods in terms of Mean Absolute Error (MAE) and response time in order to assess its performance and scalability. The experimental results show that the proposed method improves the MAE and the response time by 50.79% and 48.25%, respectively.
Keywords: Big data; Recommender System; Adaptive System; Clustering-based Collaborative Filtering; Scalable System
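As a reminder of the accuracy measure named in the abstract above, MAE is the average absolute difference between actual and predicted ratings. The sketch below computes it together with a relative improvement in percent. The function names and the rating values are hypothetical, and the formula for relative improvement is an assumption, since the abstract does not say how its percentages were derived.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_improvement(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`
    (assumed definition; lower MAE and response time are better)."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical ratings, used only to show the calculation.
actual = [4, 3, 5]
baseline_mae = mean_absolute_error(actual, [2, 3, 1])
proposed_mae = mean_absolute_error(actual, [3, 3, 3])
print(baseline_mae, proposed_mae)
print(relative_improvement(baseline_mae, proposed_mae))
```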
Abstract: Information on the web is published not only in its original language but also in many other languages. Most recommendation systems lack mechanisms that help users overcome the language problem. In these systems, it is difficult to search for a specific value (e.g., a movie artist or a movie title in the movie domain) in the user's native language. In this paper, we present our approach to this problem. We develop an ontology-based multilingual recommendation system that uses integrated data from Linked Open Data to support users in different languages in the movie domain. As a case study, we develop the Multilingual Movie Recommendation System (MMRS) for searching. With this system, we illustrate a more comfortable and flexible implementation.
Keywords: multilingual entities; Linked Open Data; interlink; movie; recommendation system
Abstract: Production flow analysis includes various families of components and groups of machines. Machine-part cell formation means the optimal design of manufacturing cells consisting of similar machines producing similar products from a similar set of components. Most algorithms reorder the machine-part incidence matrix. We generalize this classical concept to handle more than two elements of the production process (e.g., machine, part, product, resource, operator). The application of this extended concept requires an efficient optimization algorithm for the simultaneous grouping of these elements. For this purpose, we propose a novel co-clustering technique based on crossing minimization of layered bipartite graphs. The method has been implemented as a MATLAB toolbox. The efficiency of the proposed approach and the developed tools is demonstrated by realistic case studies. The log-linear scalability of the algorithm is proven theoretically and confirmed experimentally.
Keywords: cell formation; co-clustering; co-crossing minimization
Abstract: Early Big Data solutions were not based on database management system (DBMS) principles. As the popularity of these solutions has increased and they have been applied in more data management scenarios in recent years, DBMS principles are being recognized as important factors in newly developed solutions. Big Data collections do not enforce document structure, but the fact that a data store is schema-less does not mean that the structure of the stored documents plays no role in the overall performance and flexibility of an application. In this paper, we explore a method for the conceptual modeling of document-based databases using Formal Concept Analysis (FCA). We show that FCA is a valuable visual analyzer for large-scale data, offering, for example, a means of reading the possibility of a nested schema design from the built concept lattice. Experiments using our method show that decisions about the modeling of data can affect application performance and database capacity.
Keywords: conceptual design; NoSQL database; document store; Formal Concept Analysis