University of Macau Library | UML Digital Resources Hub

UM Dissertations & Theses Collection (澳門大學電子學位論文庫)

Title

Efficient query processing in time series data

English Abstract

With their rapid growth over the last decade, time series have become one of the most frequently used kinds of data in real-world applications, and their volume is expected to grow even larger in the near future. It is thus important to design efficient and effective algorithms to extract meaningful patterns from large volumes of time series data. Two types of extraction queries are studied in this thesis: discovering the longest-lasting correlated subsequence and discovering the time series motif.
Longest-lasting correlated subsequence: The search for similar subsequences is a core module for various analytical tasks in time series databases. Typically, the similarity computation requires users to set a length. However, there is no robust means of defining the proper length for different application needs. In this study, we examine a new query that returns the longest-lasting highly correlated subsequences in a time series database, which is particularly helpful for analyses without prior knowledge of the query length. A baseline, yet expensive, solution is to calculate the correlations for every possible subsequence length. To boost performance, we study a space-constrained index that provides a tight correlation bound for subsequences of similar lengths and offsets, using intra-object and inter-object grouping techniques. To the best of our knowledge, this is the first index to support a normalized distance measure for subsequences of arbitrary length. In addition, we study the use of a smart cache for disk-resident data (e.g., millions of time series) and a GPU-based technique for frequently updated data (e.g., non-indexable streaming time series) to compute the longest-lasting highly correlated subsequences.
Time series motif: Discovering motifs in time series data has received considerable attention, where the motif is the most correlated pair of subsequences in a time series. Prior works cannot offer fast correlation computation and pruning of subsequence pairs at the same time, as these two techniques require different orders of examining subsequence pairs. To address this issue, we propose a novel framework named Quick-Motif (QM) that adopts a two-level approach: it enables batch pruning at the outer level and fast correlation calculation at the inner level. We further propose two optimization techniques for the outer and inner levels.
The efficiency and effectiveness of all proposed methods in discovering the longest-lasting correlated subsequences and the time series motif are verified by extensive experimental evaluations on both real and synthetic datasets. Moreover, we discuss some other potential pattern extraction queries in this thesis.
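
The baseline mentioned in the abstract (calculating the correlation for every possible subsequence length) can be illustrated with a minimal Python sketch. The abstract does not give the exact query definition, so this sketch makes simplifying assumptions: it compares only two series, considers only windows that start at the same offset in both, and uses the Pearson correlation as the normalized measure. The names `pearson`, `longest_correlated_window`, `threshold`, and `min_len` are hypothetical and do not come from the thesis.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def longest_correlated_window(x, y, threshold, min_len=3):
    """Brute-force search (illustrative assumption, not the thesis's method).

    Tries every window length from longest to shortest and every start
    offset, and returns (length, start, correlation) for the first window
    whose correlation reaches `threshold`, or None if there is none.
    `min_len` defaults to 3 because any two-point window with non-constant
    values has correlation +1 or -1, which is not informative.
    """
    n = min(len(x), len(y))
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            r = pearson(x[start:start + length], y[start:start + length])
            if r >= threshold:
                return length, start, r
    return None


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 1, 9, 2]
    y = [2, 4, 6, 8, 10, 7, 0, 5]
    print(longest_correlated_window(x, y, threshold=0.999))
```

For series of length n, this sketch examines on the order of n^2 windows and recomputes each correlation from scratch, so its cost grows roughly cubically. That cost is what makes the baseline expensive and motivates the index, cache, and GPU techniques described in the abstract.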

Issue date

2016

Author

Li, Yu Hong

Faculty

Faculty of Science and Technology

Department

Department of Computer and Information Science

Degree

Ph.D.

Subject

Time-series analysis

Data mining

Supervisor

Gong, Zhi Guo

Files In This Item

Full-text (Intranet only)

Location

1/F Zone C

Library URL

991001901729706306