Abstract: Location-based recommender systems (LBRSs) provide a technological solution for helping users cope with the vast amount of information coming from geo-localization services. Most online social networks capture the geographic location of users and their points of interest (POIs). Location-based social networks (LBSNs), like Foursquare, leverage technologies such as GPS, Web 2.0, and smartphones to allow users to share their locations (check-ins), search for POIs, look for discounts, comment on specific places, connect with friends, and find those who are near a specific location. LBRSs play an important role in social networks nowadays, as they generate suggestions based on techniques such as collaborative filtering (CF). In this traditional recommendation approach, predictions about a user’s preferences are based on the opinions of like-minded people. Users who can provide valuable information for prediction must first be selected from the complete network, and their opinions must then be weighted according to their expected contribution. In this paper, we propose and analyze a number of strategies for selecting neighbors within the CF framework, leveraging information contained in the users’ social network, common visits, visiting area, and POI categories as influential factors. An experimental evaluation with data from the Foursquare social network sheds some light on the impact of different mechanisms for weighting users in prediction.
Keywords: Location-based social networks; recommender systems; user-based collaborative filtering.
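To make the neighbor selection and weighting step of user-based CF concrete, here is a minimal sketch. The prediction for a user is a weighted average over the top-weighted neighbors who visited the POI, and each neighbor's weight blends rating similarity with a social-tie indicator. The data layout (dicts of per-POI ratings, which could equally be check-in counts), the cosine similarity, the blending parameter `alpha`, and the top-`k` rule are all illustrative assumptions, not the strategies evaluated in the paper. Other factors the abstract mentions, such as visiting area or POI categories, could be added to `neighbor_weight` in the same way.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {poi: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[p] * b[p] for p in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def neighbor_weight(u, v, ratings, friends, alpha=0.5):
    """Hypothetical weight: blend of rating similarity and a social-tie indicator."""
    social = 1.0 if v in friends.get(u, set()) else 0.0
    return alpha * cosine(ratings[u], ratings[v]) + (1 - alpha) * social

def predict(u, poi, ratings, friends, k=2, alpha=0.5):
    """Weighted average over the k best-weighted neighbors who rated `poi`."""
    candidates = [
        (neighbor_weight(u, v, ratings, friends, alpha), v)
        for v in ratings
        if v != u and poi in ratings[v]
    ]
    top = sorted(candidates, reverse=True)[:k]  # neighbor selection
    total = sum(w for w, _ in top)
    if total == 0:
        return None  # no usable neighbors for this POI
    return sum(w * ratings[v][poi] for w, v in top) / total
```

Because the friend `bob` also shares `ann`'s ratings, his opinion dominates the prediction for a POI `ann` has not visited, which is the kind of effect the neighbor-weighting strategies aim to control.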
Abstract: A bipolar rating scale is a linearly ordered set with a symmetry between the elements regarded as negative categories and those regarded as positive categories. First, we present a survey of bipolar rating scales used in psychology, sociology, medicine, recommender systems, opinion mining, and sentiment analysis. We discuss different particular cases of bipolar scales and, in particular, typical structures of bipolar scales with verbal labels that can be used to construct bipolar rating scales. Next, we introduce the concept of a bipolar scoring function that preserves the linear ordering and the symmetry of bipolar scales, study its properties, and propose methods for constructing bipolar scoring functions. We show that Pearson’s correlation coefficient, often used to analyze the relationship between rating profiles in recommender systems, can be misleading if the rating scales are bipolar. Based on general methods for constructing association measures, we propose new correlation measures on bipolar scales that are free from the drawbacks of Pearson’s correlation coefficient. Our correlation measures can be used in recommender systems, sentiment analysis, and opinion mining to analyze possible relationships between users’ opinions and their ratings of items.
Keywords: rating scale; bipolar scale; recommender system; opinion mining; sentiment analysis; correlation; association measure
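A small numeric example shows why Pearson's coefficient can mislead on a bipolar scale. On an assumed scale from -2 to 2 with neutral point 0, two users who rate every item positively get r = -1, because Pearson centers each profile on its own mean and so ignores the scale's neutral point. For contrast, the sketch also computes a cosine of deviations from the neutral point (a simple heuristic sometimes called constrained Pearson correlation); it is shown only to illustrate the issue and is not one of the measures proposed in the paper.

```python
from math import sqrt

def pearson(x, y):
    """Classical Pearson correlation: deviations are taken from each user's mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neutral_centered_cosine(x, y, neutral=0):
    """Cosine of deviations from the scale's neutral point (not the user mean)."""
    dx = [a - neutral for a in x]
    dy = [b - neutral for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den

# Bipolar scale -2..2 with neutral 0; both users rate every item positively.
alice = [1, 2, 1, 2]
bob = [2, 1, 2, 1]
print(round(pearson(alice, bob), 3))                  # -1.0
print(round(neutral_centered_cosine(alice, bob), 3))  # 0.8
```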
Abstract: For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. For English, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams and various combinations of them. The paper also experiments with different feature representations and machine learning algorithms. Moreover, the paper demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of character n-grams. This relatively simple, language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches that use the same corpus.
Keywords: authorship attribution; character n-grams; Portuguese; stylometry; computational linguistics; machine learning
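As a baseline illustration of character n-gram features for AA, the sketch below builds overlapping character n-gram frequency profiles and attributes a document to the author whose pooled profile is most similar by cosine. The nearest-profile classifier, lowercasing, whitespace normalization, and `n=3` are assumptions for illustration; the paper additionally separates n-grams by type and compares several representations and machine learning algorithms, which this sketch does not do.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams (spaces and punctuation kept)."""
    text = " ".join(text.lower().split())  # normalize whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    num = sum(p[g] * q[g] for g in p.keys() & q.keys())
    den = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

def attribute(doc, training, n=3):
    """Return the author whose pooled n-gram profile is most similar to `doc`."""
    profiles = {}
    for author, texts in training.items():
        profile = Counter()
        for t in texts:
            profile.update(char_ngrams(t, n))
        profiles[author] = profile
    target = char_ngrams(doc, n)
    return max(profiles, key=lambda a: cosine(target, profiles[a]))
```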
Abstract: In this paper, we introduce an algorithm for obtaining the subtrees (continuous and non-continuous syntactic n-grams) of a sentence’s dependency parse tree. Our algorithm traverses the dependency tree of each sentence in a text document and extracts all its subtrees (syntactic n-grams). Syntactic n-grams have been used successfully in the literature (by ourselves and by other authors) as features that characterize text documents in machine learning approaches to Natural Language Processing.
Keywords: syntactic n-grams; subtrees extraction; tree traversal; linguistic features
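One way to enumerate every connected subtree of a dependency tree is sketched below. It assumes the parse is given as a list of head indices (with -1 for the root), and for each node it combines, over all children, the choice of "leave the child out" or "take one of the child's rooted subtrees". This covers both continuous (path-shaped) and non-continuous (branching) syntactic n-grams. It returns node sets in sentence order only. A full syntactic n-gram representation would also record the tree shape (for example in bracketed form), and the paper's traversal may be organized differently. The number of subtrees can grow exponentially with branching.

```python
from itertools import product

def children_of(heads):
    """Map each token index to its dependents; heads[i] == -1 marks the root."""
    kids = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            kids[h].append(i)
    return kids

def rooted_subtrees(v, kids, memo):
    """All connected node sets whose topmost node is v, as sorted tuples.

    Each child either stays out or contributes one of its own rooted
    subtrees, so the options multiply (cartesian product) across children.
    """
    if v not in memo:
        options = [[()] + rooted_subtrees(c, kids, memo) for c in kids[v]]
        result = []
        for combo in product(*options):
            nodes = {v}
            for part in combo:
                nodes.update(part)
            result.append(tuple(sorted(nodes)))
        memo[v] = result
    return memo[v]

def all_subtrees(heads):
    """Every connected subtree of the dependency tree (single nodes included)."""
    kids = children_of(heads)
    memo = {}
    out = []
    for v in range(len(heads)):
        out.extend(rooted_subtrees(v, kids, memo))
    return out

tokens = ["John", "saw", "a", "dog"]
heads = [1, -1, 3, 1]  # "saw" is the root; "a" depends on "dog"
for nodes in all_subtrees(heads):
    if len(nodes) >= 2:  # syntactic n-grams with n >= 2
        print([tokens[i] for i in nodes])
```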
Abstract: In this paper, we describe an approach to creating a summary obfuscation corpus for the task of plagiarism detection. Our method is based on information from the Document Understanding Conferences (DUC) of the years 2001 and 2006, for the English language. An unattributed summary used within someone else’s document is considered a kind of plagiarism, because the original author’s ideas are still present, in succinct form. To create the corpus, we use a Named Entity Recognizer (NER) to identify the entities within an original document, its associated summaries, and target documents. These entities, together with similar paragraphs in the target documents, are then used to build fake suspicious documents and plagiarized documents. The corpus was tested in a plagiarism detection competition.
Keywords: corpus generation; plagiarism detection; obfuscation strategies
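The step of combining entities with similar target paragraphs could look roughly like the following sketch. The summary is inserted right after the target paragraph whose entity set overlaps most (by Jaccard index) with the summary's entities, and the insertion index is returned so it can serve as the gold annotation. The function names, the Jaccard criterion, and the "insert after the best paragraph" rule are hypothetical choices for illustration. Entity sets are taken as given, as they would come from whatever NER tool is used.

```python
def jaccard(a, b):
    """Jaccard overlap of two entity sets (0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def plant_summary(summary, summary_entities, target_paragraphs, paragraph_entities):
    """Insert `summary` after the target paragraph sharing the most entities.

    Returns the new paragraph list and the insertion index (for the gold annotation).
    """
    scores = [jaccard(summary_entities, ents) for ents in paragraph_entities]
    best = max(range(len(scores)), key=scores.__getitem__)
    doc = target_paragraphs[: best + 1] + [summary] + target_paragraphs[best + 1 :]
    return doc, best + 1
```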
Abstract: This paper proposes the use of computational techniques for automatically analyzing the vocabulary contained in an explanatory dictionary. Specifically, we propose extracting a set of words, called semantic primitives, that are considered sufficient to build the system of definitions in a dictionary. The proposed approach represents a dictionary as a directed graph and combines a multi-objective differential evolution algorithm with the PageRank weighting algorithm. The differential evolution algorithm extracts a set of primitives that fulfills two objectives: minimizing the set size and maximizing its degree of representation (PageRank), allowing the creation of a computational dictionary without cycles in its definitions. We experimented with a Spanish dictionary of the RAE (Real Academia Española). Our results show an improvement over other representative state-of-the-art algorithms.
Keywords: lexicography; computational lexicography; semantic primitives; defining vocabulary; explanatory dictionary; multiobjective bioinspired algorithms; differential evolution; weighting algorithms; PageRank.
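The two ingredients of the objective can be sketched separately: PageRank over the definition graph, where an edge points from a headword to each word used in its definition, and a check that treating a candidate primitive set as undefined leaves no definitional cycle. The graph encoding, damping factor `d=0.85`, and fixed iteration count are assumptions. The multi-objective differential evolution search over candidate sets, which the paper uses to trade set size against total PageRank, is not shown.

```python
def pagerank(graph, d=0.85, iters=100):
    """Power-iteration PageRank; graph maps word -> words used in its definition."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for v in nodes:
            out = graph.get(v, [])
            if out:
                share = d * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:  # dangling word: spread its rank uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank

def acyclic_without(graph, primitives):
    """True if no definitional cycle remains once primitives are treated
    as undefined (their outgoing definition edges are ignored)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(v):
        color[v] = GRAY
        if v not in primitives:
            for w in graph.get(v, []):
                c = color.get(w, WHITE)
                if c == GRAY or (c == WHITE and not visit(w)):
                    return False  # back edge found: a cycle
        color[v] = BLACK
        return True

    return all(color.get(v, WHITE) != WHITE or visit(v) for v in graph)
```

In a toy graph where "big" and "large" define each other, the empty primitive set fails the check, while taking "big" as a primitive breaks the cycle.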
Abstract: There are different channels for communicating the results of scientific research; however, several research communities state that Open Access (OA) is the future of academic publishing. Open Access platforms have adopted OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) as a standard for communication and interoperability. Nevertheless, open-source knowledge discovery services based on an OA index have not yet been developed. It is therefore necessary to address Knowledge Discovery (KD) within these platforms, aimed at students, teachers, and researchers, in order to retrieve both the requested resources and other relevant resources that were not explicitly requested. This objective is an important issue for resources structured under OAI-PMH, because interoperability with developments carried out outside their implementation environment is generally not a priority (Level 1, "Shared term definitions"). This is where the Semantic Web (SW) becomes a cornerstone of this work. Consequently, we propose OntoOAIV, a semantic approach for selective knowledge discovery and visualization over information structured with OAI-PMH, focused on supporting the scientific or academic research activities of a specific user. Because of the academic nature of resources structured with OAI-PMH, the chosen field of application is the context information of a student. Finally, to validate the proposed approach, we carry out KD with one student profile over the RUDAR (Roskilde University Digital Archive) and REDALYC (Red de Revistas Científicas de América Latina y el Caribe, España y Portugal) repositories, which implement the OAI-PMH protocol.
Keywords: Semantic Web; knowledge discovery; user profile ontology; ontology merging; OAI-PMH; visualization
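For readers unfamiliar with OAI-PMH, the harvesting side can be sketched with the Python standard library. The sketch builds a `ListRecords` request with the `oai_dc` metadata prefix, parses Dublin Core titles and subjects from the response, and filters them against a student profile. The base URL in the example is a hypothetical placeholder (the repositories' endpoints are not given here), and the keyword-overlap filter is a deliberately simple stand-in for OntoOAIV's ontology-based matching, not a description of it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def parse_records(xml_text):
    """Extract identifier, titles and subjects from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "id": ident,
            "titles": [e.text or "" for e in dc.findall("dc:title", NS)],
            "subjects": [e.text or "" for e in dc.findall("dc:subject", NS)],
        })
    return records

def match_profile(records, interests):
    """Hypothetical profile filter: keep records whose subjects overlap the interests."""
    wanted = {i.lower() for i in interests}
    return [r for r in records if wanted & {s.lower() for s in r["subjects"]}]

# Hypothetical endpoint, for illustration only.
print(list_records_url("https://repo.example.org/oai"))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets.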
Abstract: Edge detection is one of the most important low-level steps in image processing. In this work, we propose a fuzzy ensemble method for edge detection that uses a fuzzy c-means (FCM) approach to define the input membership functions of the fuzzy inference system (FIS). We tested the performance of the method on a public database with ground truth, and compared our proposal with classical and other fuzzy methods using F-measure curves and the precision metric. We conducted experiments with different levels of salt-and-pepper noise to evaluate the performance of the edge detectors. The metrics show that, with this fuzzy ensemble method, the result is robust to the choice of threshold in the binarization step. Under noisy conditions, the proposed method works better than other fuzzy approaches, and comparative results confirm that our proposal outperforms traditional techniques.
Keywords: edge detection; fuzzy inference system; fuzzy clustering; noise; image processing
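Here is a sketch of how FCM can supply membership functions for a FIS input. It runs fuzzy c-means on one-dimensional data, such as gradient magnitudes of an image, and turns each cluster into a Gaussian membership function whose center is the cluster center and whose width is the fuzzy-weighted spread. The use of Gaussian shapes, the width formula, `c=3` (for example "low / medium / high" gradient), the fuzzifier `m=2`, and the fixed iteration count are assumptions. The paper's actual FIS inputs, rule base, and ensemble are not reproduced here.

```python
import numpy as np

def fcm_1d(x, c=3, m=2.0, iters=100, seed=0):
    """Fuzzy c-means on 1-D data; returns sorted centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)  # memberships of each point sum to 1
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12  # avoid division by zero
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        u = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
    order = np.argsort(centers)
    return centers[order], u[order]

def gaussian_mfs(x, centers, u, m=2.0):
    """Hypothetical mapping from FCM output to Gaussian membership functions."""
    x = np.asarray(x, dtype=float)
    um = u ** m
    sigmas = np.sqrt((um * (x[None, :] - centers[:, None]) ** 2).sum(axis=1) / um.sum(axis=1))
    return [lambda v, c=c, s=s: np.exp(-0.5 * ((v - c) / s) ** 2) for c, s in zip(centers, sigmas)]
```

In practice `x` would be, for instance, the flattened output of `np.hypot(*np.gradient(image))`.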
Abstract: Skin cancer is a major health issue affecting a vast segment of the population, regardless of skin color. It can be detected using dermoscopy, which helps determine whether visible spots on the skin are benign or malignant tumors. Despite the specialists' experience, skin lesions are difficult to classify, which is why computer systems are being developed to increase the effectiveness of cancer detection. Systems that assist in skin cancer detection process digital images to determine the occurrence of tumors by interpreting clinical parameters, and they rely first of all on an accurate segmentation process to extract relevant features. Two well-known methods for analyzing lesions are ABCD (Asymmetry, Border, Color, Differential structures) and the 7-point checklist. Once clinically relevant features are extracted, they are used to classify the presence or absence of a tumor. However, irregular and diffuse lesion borders, low contrast, image artifacts, and the presence of various colors within the region of interest complicate image processing. In this article, we propose an intelligent system that implements the following method. The feature extraction stage begins with segmenting the image using the Wavelet-Fuzzy C-Means algorithm. Next, specific features are determined, among them the area and the asymmetry of the lesion. An ensemble of clusterers extracts the Red-Green-Blue values that correspond to one or more of the colors defined in the ABCD guide. The feature extraction stage also includes discovering structures that appear in the lesion using the Grey Level Co-occurrence Matrix (GLCM) method. Then, during the detection phase, an ensemble of classifiers determines the occurrence of a malignant tumor. Our experiments are performed on images taken from the ISIC repository.
The proposed system achieves a skin cancer detection accuracy above 88 percent. A comparison of this performance with that of other systems is also given.
Keywords: segmentation; fuzzy logic; color detection; classification
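To show what the GLCM step computes, the sketch below counts how often grey level i occurs next to grey level j at a fixed pixel offset, normalizes the counts to probabilities, and derives two standard Haralick-style texture statistics (contrast and homogeneity). The single horizontal offset, the symmetric counting, and the choice of statistics are assumptions for illustration. The abstract does not say which offsets or GLCM features the system uses to detect differential structures.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=None, symmetric=True):
    """Normalized grey-level co-occurrence matrix for the pixel offset (dy, dx)."""
    img = np.asarray(img, dtype=int)
    levels = levels or img.max() + 1
    P = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                P[img[r, c], img[r2, c2]] += 1  # count the (i, j) pair
    if symmetric:
        P = P + P.T  # count each pair in both directions
    return P / P.sum()

def contrast(P):
    """Sum of (i - j)^2 * p(i, j): large when neighbours differ strongly."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def homogeneity(P):
    """Sum of p(i, j) / (1 + |i - j|): large for smooth, uniform texture."""
    i, j = np.indices(P.shape)
    return float((P / (1.0 + np.abs(i - j))).sum())
```

A uniform patch has contrast 0, while a patch with sharp grey-level transitions, as in reticular or dotted structures, yields larger contrast.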
Abstract: In addition to serving security purposes, the closed-circuit video cameras commonly installed in business establishments can provide extra customer information, e.g., frequently visited areas. Such information helps marketing analysts better understand customer behavior and provide more satisfying service. Customer detection usually serves as an early step underlying customer behavior analysis. This article discusses a complete, automatic customer behavior analysis pipeline in detail, with a focus on customer detection. Conventional customer detection approaches rely on a single source of decision based on multiple small image areas. However, the human visual system also exploits many other cues, e.g., context, prior knowledge, sense of place, and even other sensory input, to interpret what one sees. Accounting for multiple cues may enable a more accurate detection system, but this requires a reliable integration mechanism. This article proposes a framework for integrating multiple cues for customer detection. The detection framework is evaluated on 609 image frames captured from a retailer’s video data, and the detected locations are compared against ground truth provided by our personnel. Miss rate versus false positives per window (FPPW) is used as the performance index. The detection framework shows at least a 42% improvement over the other control treatments. Our results support our hypothesis and show the potential of the framework.
Keywords: customer detection; human detection; video analytics; hot zone visualization; multiple-cue integration; global-local inference integration; ensemble framework
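The performance index named in the customer detection abstract, miss rate versus false positives per window (FPPW), can be computed from per-window detector scores. A window that truly contains a customer is missed when its score falls below the threshold, and a window without a customer is a false positive when its score reaches the threshold. Sweeping the threshold traces the curve. The `fuse` function is a hypothetical weighted-average stand-in for multi-cue integration, added only to show where a second cue would enter. The paper's actual integration mechanism is not described in the abstract.

```python
def miss_rate_fppw(scores_pos, scores_neg, threshold):
    """Miss rate and false positives per window at one detector threshold.

    scores_pos: scores of windows that contain a customer (ground truth)
    scores_neg: scores of windows that do not
    """
    misses = sum(s < threshold for s in scores_pos)
    false_pos = sum(s >= threshold for s in scores_neg)
    return misses / len(scores_pos), false_pos / len(scores_neg)

def curve(scores_pos, scores_neg):
    """Sweep thresholds over all observed scores to get (FPPW, miss rate) points."""
    points = []
    for t in sorted(set(scores_pos) | set(scores_neg)):
        mr, fppw = miss_rate_fppw(scores_pos, scores_neg, t)
        points.append((fppw, mr))
    return points

def fuse(local_scores, context_scores, w=0.7):
    """Hypothetical cue fusion: weighted average of a local and a context score."""
    return [w * a + (1 - w) * b for a, b in zip(local_scores, context_scores)]
```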
Abstract: Volcanic eruptions cause significant loss of life and property around the world each year. Their importance is highlighted by the sheer number of volcanoes for which eruptive activity is probable; these volcanoes are classified as being in a state of unrest. The Global Volcanism Program maintained by the Smithsonian Institution estimates that approximately 600 volcanoes, many close to major urban areas, are currently in this state of unrest. A spectrum of phenomena serves as precursors to eruption, including ground deformation, gas emission, and seismic activity. The precursors are caused by magma upwelling from the Moho to the shallow (2-5 km) subsurface and by magma movement in the volcano conduit immediately preceding eruption. Precursors share the fundamental petrologic processes of melt generation in the lithosphere and subsequent magma differentiation. Our ultimate objective is to apply state-of-the-art machine learning techniques to volcanic eruption forecasting. In this paper, we applied machine learning techniques to precursor data, such as those for the 1999 eruption of Redoubt volcano, Alaska, for which a comprehensive record of precursor activity exists in USGS public domain files and in global databases, such as the Smithsonian Institution Global Volcanism Program and Aerocom (which is part of the HEMCO database). The results obtained are geophysically meaningful.
Keywords: machine learning; volcanic activity; clustering
Abstract: Metagenomics allows researchers to sequence the genomes of many microorganisms directly from a natural environment, without the need to isolate them. This type of sequencing yields a huge set of DNA fragments from different organisms, which poses a new computational challenge: identifying the groups of DNA sequences that belong to the same organism. Although there are large databases of known species’ genomes and some similarity-based supervised algorithms, these databases represent only a very small fraction of existing microorganisms, and identifying a set of short fragments is very time-consuming. For these reasons, the reconstruction and identification of a set of metagenomic fragments is preceded by a binning process, which groups fragments at the same taxonomic level. In this paper, we propose a clustering algorithm based on iterative k-means and on a consensus of clusterings obtained with different distance functions. The results of the proposed method are reported for different sequence lengths and different combinations of distances. The proposed method outperforms both simple and iterative k-means.
Keywords: Metagenomics; consensus clustering; sequence binning; k-means; distance functions
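The two pieces around the clustering itself can be sketched as follows: a k-mer frequency profile that turns each DNA fragment into a vector, and a co-association consensus over several clusterings. In the consensus, two fragments are linked when they fall in the same cluster in more than a given fraction of the clusterings, and the consensus bins are the connected components. The k-mer length, the 0.5 threshold, and this evidence-accumulation style of consensus are assumptions. The individual k-means runs with different distance functions are assumed to have produced the input `labelings`, and the paper's exact consensus rule may differ.

```python
from itertools import product

def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k DNA k-mers (k-mers with other symbols are skipped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]

def consensus(labelings, threshold=0.5):
    """Co-association consensus: link items grouped together in more than
    `threshold` of the clusterings, then return connected-component labels."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(a):  # union-find root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
            if together > threshold:
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For example, if one of three clusterings splits off the fifth fragment but the other two group it with the third and fourth, the consensus keeps it in their bin.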
Abstract: In many practical problems, we need to find the values of the parameters that optimize the desired objective function. For example, for toll roads, it is important to set the toll values that lead to the fastest return on investment. Many optimization algorithms exist; the problem is that they often end up in a local optimum. One of the promising methods for avoiding local optima is the filled function method, in which we, in effect, first optimize a smoothed version of the objective function, and then use the resulting optimum to look for the optimum of the original function. Empirically, the best smoothing functions for this method turn out to be the Gaussian and the Cauchy functions. In this paper, we show that, from the viewpoint of computational complexity, these two smoothing functions are indeed the simplest. The Gaussian and Cauchy functions are not a panacea: in some cases, they still leave us with a local optimum. We therefore use computational complexity analysis to describe the next-simplest smoothing functions, which are worth trying in such situations.
Keywords: optimization; toll roads; filled function method; Gaussian and Cauchy smoothing
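The abstract's phrase "optimize a smoothed version of the objective function" can be read as convolving the objective with a kernel. Under that reading, the two kernels it names take the following standard normalized forms, where the width σ > 0 is an assumed parameter. Both integrate to 1, and a larger σ flattens more of the small-scale local optima. How the paper embeds these functions in the filled function construction may differ from this convolution reading.

```latex
\tilde f_\sigma(x) = \int_{-\infty}^{\infty} f(x - t)\,\rho_\sigma(t)\,dt,
\qquad
\rho^{\mathrm{Gauss}}_\sigma(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\,
  \exp\!\left(-\frac{t^2}{2\sigma^2}\right),
\qquad
\rho^{\mathrm{Cauchy}}_\sigma(t) = \frac{\sigma}{\pi\,(t^2 + \sigma^2)}.
```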
Abstract: Vibratory gyroscopes are now widely used in intelligent systems such as drones (for motion stabilization), robots (for accurate positioning of end-effectors), virtual reality systems (to change the image orientation as the head turns), and many others. During motion, these systems can be exposed to mechanical shocks and vibrations. To provide the required accuracy under such environmental conditions, gyroscopes must be robust to operating disturbances. This paper proposes a differential mode of operation for a single-mass vibratory gyroscope as a new operating mode with higher rejection factors for external disturbances such as shocks and vibrations, allowing it to meet the requirements of many important applications in intelligent systems. The test results presented in this paper show the excellent disturbance rejection properties of the differential mode of operation in comparison with the well-known rate mode. Although these disturbance rejection results were obtained for a non-MEMS gyroscope, the same results can certainly be obtained for MEMS gyroscopes, too.
Keywords: differential vibratory gyroscope; shock rejection factor; vibration sensitivity
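A toy model helps explain why a differential mode rejects shocks and vibrations. If the rotation-rate signal appears with opposite signs in two pickoff channels while an external disturbance appears with the same sign in both, halving their difference keeps the rate and cancels the disturbance. Any residual comes only from a gain mismatch ε between the channels, giving a rejection factor of about 1 + 2/ε in this model. The two-channel model, the symmetric mismatch, and this definition of the rejection factor are simplifying assumptions. They are not the gyroscope dynamics or the test procedure used in the paper.

```python
def channel_outputs(rate, disturbance, gain=1.0, mismatch=0.0):
    """Idealized two-channel pickoff: the rate signal appears with opposite
    signs, the common-mode disturbance with the same sign (gains mismatched by `mismatch`)."""
    s1 = gain * rate + (1.0 + mismatch / 2) * disturbance
    s2 = -gain * rate + (1.0 - mismatch / 2) * disturbance
    return s1, s2

def differential_output(s1, s2):
    """Differential readout: rate terms add, common-mode terms cancel."""
    return (s1 - s2) / 2

def rejection_factor(mismatch):
    """Disturbance seen by a single channel divided by the disturbance left after differencing."""
    s1, s2 = channel_outputs(rate=0.0, disturbance=1.0, mismatch=mismatch)
    residual = abs(differential_output(s1, s2))
    return float("inf") if residual == 0 else abs(s1) / residual
```

With a 1% channel mismatch, the model leaves about 1/201 of the single-channel disturbance, which illustrates why channel matching matters in practice.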