Another extension is to discover rules where items may be annotated with quantities in sequences and each item may have a unit profit. Discovering high-utility sequential rules.For example, a user may want to find rules appearing whithin three consecutive itemsets in sequences. This is interesting for example for analyzing sequence of web clicks. An algorithm for this task is TRuleGrowth. This algorithm let the user find rules of the form X -> Y where X and Y must be close to each other with respect to time. D iscovering sequential rules with a window size constraint.Some algorithms for this task are TopSeqRules and TNS. For example, a user may specify that he wants to find the top 1000 rules having a confidence of at least 75 %. The idea is to discover the k most frequent rules in a dataset having at least a confidence no less than minconf. Discovering the top-k sequential rules.These extensions have been proposed to address specific needs. I will provide a brief overview of a few extensions. But note there also exists several extensions of the problem of sequential rule mining. In the previous paragraphs, I have introduced the topic of sequential rule mining. The reason is that sequential rules consider the probability (confidence), while sequential patterns do not.Įxtensions of the task of sequential rule mining In that study, we found sequential rules can provide a much higher prediction accuracy than sequential patterns when the patterns are used for sequence prediction. In the past, I have carried a study with my student to compare the prediction accuracy of sequential patterns and sequential rules. It indicates that customers who bought have a confidence of 100 %. For example, the sequential pattern appears in the two first sequences of our database. Then, I will explain some of their limitations and then discuss sequential rules.Ī sequential pattern is a subsequence that appear in several sequences of a database. I will first discuss sequential patterns. In the following, Iwill discuss two types of patterns that can be found. There have been a lot of research on this topic in the field of data mining and various algorithms have been proposed. the order of nucleotides in a DNA sequence).ĭiscovering sequential patterns in sequencesĪn important data mining problem is to design algorithm for discovering hidden patterns in sequences. It is to be noted that sequence can be ordered by time or other properties (e.g. Sequences are a very common type of data structures that can be found in many domains such as bioinformatics (DNA sequence), sequences of clicks on websites, the behavior of learners in e-learning, sequences of what customers buy in retail stores, sentences of words in a text, etc. This sequence indicates that the second customer bought items “a” and “d” together, than bought item “c”, then bought “b”, and then bought “a”, “b”, “e” and “f” together. For example, consider the second sequence “seq2”. For our example, we will assume that each sequence represents what a customer has bought in our supermarket over time. Now, a sequence is an ordered list of sets of items. For example, “a” could represent an “apple”, “b” could be some “bread”, “c” could denote “cake”, etc. For our example, consider that the symbols “a”, “b”, “c”, d”, “e”, “f”, “g” and “h” respectively represents some items sold in a supermarket. This database contains four sequences named seq1, seq2, seq3 and seq4. A sequence database containing four sequences
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |