Data Mining by Mehmed Kantardzic (good book recommendations TXT) π
Read free book Β«Data Mining by Mehmed Kantardzic (good book recommendations TXT) πΒ» - read online or download for free at americanlibrarybooks.com
- Author: Mehmed Kantardzic
Read book online Β«Data Mining by Mehmed Kantardzic (good book recommendations TXT) πΒ». Author - Mehmed Kantardzic
12.2.3 Temporal Data Modeling
A model is a global, high-level, and often abstract representation of data. Typically, models are specified by a collection of model parameters that can be estimated from a given data set. It is possible to classify models as predictive or descriptive depending on the task they are performing. In contrast to the (global) model structure, a temporal pattern is a local model that makes a specific statement about a few data samples in time. Spikes, for example, are patterns in a real-valued time series that may be of interest. Similarly, in symbolic sequences, regular expressions represent well-defined patterns. In bioinformatics, genes are known to appear as local patterns interspersed between chunks of noncoding DNA. Matching and discovery of such patterns are very useful in many applications, not only in bioinformatics. Due to their readily interpretable structure, patterns play a particularly dominant role in data mining. There have been many techniques used to model global or local temporal events. We will introduce only some of the most popular modeling techniques.
Finite State Machine (FSM) has a set of states and a set of transitions. A state may have transitions to other states that are caused by fulfilling some conditions within the state. An FSM must have an initial state, usually drawn with an arrow, and it is a state that provides a starting point of the model. Inputs to the states, in our case representing symbols in a sequence, act as triggers for the transition from one state to another state. An accept state, which is also known as final state, is usually represented by a double circle in a graph representation. The machine reaches the final state when it has performed the procedure successfully, or in our case recognized a sequence pattern. An FSM can be represented using a state-transition table or state-transition diagram. Figures 12.20a,b shows both of these representations for a modeling recognition of a binary number with an even number of ones. FSM does not work very well when the transitions are not precise and does not scale well when the set of symbols for sequence representation is large.
Figure 12.20. Finite-state machine. (a) State-transition table; (b) state-transition diagram.
Markov Model (MM) extends the basic idea behind FSM. Both FSM and MM are directed graphs. As with FSM, MM always has a current state. Start and end nodes are drawn for illustrative purposes and need not be present. Unlike in FSM, transitions are not associated with specific input values. Arcs carry a probability value for transition from one state to another. For example, the probability that transition from state βStartβ to βS1β is 0.4 and the probability of staying in the βStartβ state is 0.6. The sum of the probability values coming out of each node should be 1. MM shows only transitions with probability greater than 0. If a transition is not shown, it is assumed to have a probability of 0. The probabilities are combined to determine the final probability of the pattern produced by the MM. For example, with the MM shown in Figure 12.21, the probability that the MM takes the horizontal path from starting node to S2 is 0.4 Γ 0.7 = 0.28.
Figure 12.21. A simple Markov Model.
MM is derived based on the memoryless assumption. It states that given the current state of the system, the future evolution of the system is independent of its history. MMs have been used widely in speech recognition and natural language processing.
Hidden Markov Model (HMM) is an extension to MM. Similar to MM, HMM consists of a set of states and transition probabilities. In a regular MM, the states are visible to the observer, and the state-transition probabilities are the only parameters. In HMM, each state is associated with a state-probability distribution. For example, assume that we were given a sequence of events in a coin toss: O = (HTTHTHH), where H = Head and T = Tail. But additional information is necessary. What is not given is the sequence generated with one or two coins. According to the above definitions, Figure 12.22 shows two possible models. Figure 12.22a assumes that only one coin was tossed. We can model this system as an MM with a two-state model, where Head and Tail are the two states with the same initial probabilities. The probability of the sequence O is P(O) = 0.5 Γ 0.7 Γ 0.3
Comments (0)