Abstract:
Data analysis using association rules is one of the fundamental data mining
approaches. It was introduced as a method for dependency analysis by Rakesh
Agrawal at the IBM Almaden Research Center in California, USA.
In this thesis, the established algorithms for association rule mining are
analyzed and systematized. The main goal is to gain a deeper understanding of
these algorithms, which until now have not been described within a coherent
framework. Combined with the results of an extensive evaluation of runtime and
memory usage, this analysis leads to a revised assessment of the different
approaches.
Based on these results, new algorithms for generating association rules are
developed. They rely on optimized pruning of the search space, a hybrid
approach, and the incorporation of a taxonomy where one is available. In
numerous experiments conducted as part of a comprehensive evaluation, the new
algorithms achieved not only much shorter runtimes but also substantially lower
memory usage than established approaches. Overall, the proposed algorithms are
considerably more efficient than conventional approaches, particularly when a
taxonomy over the data is available.
Closely related to the efficiency of the algorithms is the integration of rule
generation into the knowledge discovery process. An iterative, interactive
process requires short response times, which the algorithms cannot achieve on
very large datasets. To address this often-neglected problem, an extended rule
cache is proposed. This cache remains valid even for many mining queries that
select subsets of the underlying data, so it does not need to be reinitialized
for such queries.