Monday, May 29, 2023

Ch 1 - What is Data Mining?

What is Data Mining?

- extracting knowledge from large amounts of data
- the process involves: data cleaning, integration, selection, transformation, mining/processing, pattern evaluation, and presentation

- - - - -

What kind of patterns can be mined/found by data mining techniques?

1. Characterization and Discrimination
2. Frequent patterns, Associations, and Correlations
3. Classification and Prediction
4. Cluster analysis
5. Outlier analysis
6. Evolution analysis

Give examples of each of the following

Characterization and Discrimination

- Characteristics of customers who buy a certain kind of product
- Customers who buy product A vs customers who buy another product B

Data Characterization: Summarizing the general features of the data in the class under study. The class under study is called the target class.

Data Discrimination: Comparing the general features of the target class against those of one or more contrasting classes.

- - - - -
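
The two ideas above can be sketched in a few lines of Python. This is only an illustration: the customer records, the field layout, and the helper names `characterize` and `discriminate` are made up for the example and do not come from any library.

```python
from statistics import mean

# Hypothetical customer records: (product bought, age, monthly spend).
customers = [
    ("A", 25, 40.0),
    ("A", 31, 55.0),
    ("A", 28, 45.0),
    ("B", 52, 120.0),
    ("B", 47, 100.0),
    ("B", 60, 140.0),
]

def characterize(records, product):
    """Summarize the target class: customers who bought `product`."""
    rows = [r for r in records if r[0] == product]
    return {
        "count": len(rows),
        "mean_age": mean(r[1] for r in rows),
        "mean_spend": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Compare the target class against a contrasting class, feature by feature."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: t[key] - c[key] for key in ("mean_age", "mean_spend")}

profile_a = characterize(customers, "A")       # characterization of class A
difference = discriminate(customers, "A", "B")  # A relative to B
print(profile_a)
print(difference)
```

In this toy data, buyers of product A come out younger and spend less than buyers of product B, which is the kind of statement a discrimination task produces.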

Frequent patterns, Associations, and Correlations

- What kinds of products do customers buy together? (example of association mining)
- If a customer buys product A, what is the chance that he/she will buy product B as well? (example of association mining)

Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with a frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset.

- - - - -
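
As a rough sketch of the association-mining questions above, the snippet below computes support (how often an itemset appears) and confidence (the chance of B given A) over a handful of made-up transactions. The data, the 0.5 threshold, and the function names are assumptions for illustration. It checks every pair by brute force, which is fine for a toy example but is not how a real algorithm such as Apriori prunes the search.

```python
from itertools import combinations

# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated chance of buying `consequent` given `antecedent` was bought."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support is no less than `min_support`."""
    items = sorted(set().union(*transactions))
    return [
        pair for pair in combinations(items, 2)
        if support(pair, transactions) >= min_support
    ]

print(frequent_pairs(transactions, min_support=0.5))
print(confidence({"milk"}, {"bread"}, transactions))
```

Here milk and bread appear together in three of the five transactions, so they form the only frequent pair at a 0.5 threshold, and three of the four milk buyers also bought bread.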

Classification and Prediction

- Classify the sales of various products into different classes
- Predict the sales of a product

Classification: Imagine a T-shirt store. As a customer, you tell the salesman your age, weight, and height, and the salesman shows you T-shirts of the appropriate size: small, medium, or large.

Prediction: A forecast of a numeric value, such as how many T-shirts the store might sell this month.

- - - - -
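
The T-shirt example above can be sketched as follows. The classifier is a 1-nearest-neighbor lookup and the forecast is a straight-line least-squares fit. Both are just one simple choice among many, and the training examples, sales figures, and function names are invented for illustration (age is left out to keep the sketch short).

```python
import math

# Hypothetical labelled examples: (height in cm, weight in kg) -> size.
training = [
    ((160, 55), "small"),
    ((165, 60), "small"),
    ((172, 70), "medium"),
    ((176, 75), "medium"),
    ((185, 90), "large"),
    ((190, 95), "large"),
]

def classify(height, weight):
    """Classification: return the label of the nearest labelled example."""
    nearest = min(training, key=lambda ex: math.dist(ex[0], (height, weight)))
    return nearest[1]

def forecast_next(sales):
    """Prediction: fit a least-squares line to past sales and extend it one step."""
    n = len(sales)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(sales) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n

print(classify(174, 72))                    # a discrete class label
print(forecast_next([100, 110, 120, 130]))  # a continuous value
```

The contrast to notice is in the outputs: classification returns one of a fixed set of labels, while prediction returns a number.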

Cluster analysis

- Divide the data into different groups of similar items
- The number of clusters is not known a priori

Suppose we want to open a T-shirt store, but we do not know which sizes to stock. We collect data such as customers' weight and height and then have to group them into classes like small, medium, and large. The question is: is this list of three sizes exhaustive, or could there be more sizes, like XL or XXL? Who would tell us which sizes there should be?

- - - - -
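
One common way to approach the sizing question above is k-means clustering. The sketch below is a minimal one-dimensional version that uses height only. Note that k-means itself needs the number of clusters k as input, so a usual workaround is to try several values of k and compare how tightly each grouping fits. The heights, the evenly spaced starting centers, and the helper names are assumptions made for this example.

```python
def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means with a deterministic start."""
    data = sorted(values)
    # Start from k evenly spaced points of the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(data, centers)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, assign(data, centers)

def inertia(centers, clusters):
    """Sum of squared distances from each value to its cluster center."""
    return sum((v - c) ** 2 for c, cluster in zip(centers, clusters) for v in cluster)

# Hypothetical customer heights in cm.
heights = [150, 152, 155, 170, 172, 174, 188, 190, 193]

for k in range(1, 5):
    centers, clusters = kmeans_1d(heights, k)
    print(k, [round(c, 1) for c in centers], round(inertia(centers, clusters), 1))
```

With this data the inertia drops sharply up to k = 3 and only slightly afterwards, which suggests that three sizes describe these customers well. With different data the same comparison could just as well point to four or five sizes.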

Outlier analysis

- What is an outlier? Data that deviates from what is expected

Example: a fraudulent transaction. The amount might be very high compared with the user's routine transactions, because the criminal may try to withdraw as much money as possible before the stolen card or hacked account is blocked.

- - - - -
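
A minimal sketch of the fraud example above: flag any amount that lies far from the user's typical amount. It measures distance from the median in units of the median absolute deviation (MAD), which a single huge value cannot distort the way it distorts a mean. The amounts, the threshold of 5, and the function name are assumptions chosen for illustration.

```python
from statistics import median

def find_outliers(amounts, threshold=5.0):
    """Return amounts whose distance from the median exceeds `threshold` MADs."""
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # At least half the values equal the median; this sketch flags nothing.
        return []
    return [a for a in amounts if abs(a - center) / mad > threshold]

# Hypothetical card transactions for one user; the last one is suspicious.
history = [20, 35, 25, 40, 30, 28, 2500]
print(find_outliers(history))
```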

Evolution analysis

- Time series analysis of data

Example: Identifying the current trend in the stock market: whether it is saturated, bullish, or bearish.
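
As a very rough sketch of the stock-market example, the snippet below compares a short moving average with a longer one and labels the trend accordingly. The window sizes, the 1% tolerance, the prices, and the function name are all assumptions made for this example. It is not a trading rule.

```python
from statistics import mean

def trend(prices, short=3, long=6, tolerance=0.01):
    """Label the recent trend by comparing short and long moving averages."""
    if len(prices) < long:
        raise ValueError("need at least `long` prices")
    short_avg = mean(prices[-short:])
    long_avg = mean(prices[-long:])
    if short_avg > long_avg * (1 + tolerance):
        return "bullish"
    if short_avg < long_avg * (1 - tolerance):
        return "bearish"
    return "saturated"

# Hypothetical closing prices, oldest first.
print(trend([100, 102, 104, 106, 108, 110]))
print(trend([110, 108, 106, 104, 102, 100]))
print(trend([100, 100.5, 100, 99.5, 100, 100.2]))
```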

Discuss whether or not each of the following activities is a data mining task.

Q: Dividing the customers of a company according to their gender.
A: No. This is a simple database query.

Q: Dividing the customers of a company according to their profitability.
A: No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

Q: Predicting the outcomes of tossing a (fair) pair of dice.
A: No. Since the dice are fair, this is a probability calculation.

Q: Predicting the future stock price of a company using historical records.
A: Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modeling.

Q: Monitoring seismic waves for earthquake activities.
A: Yes. In this case, we would build a model of the different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these types of seismic activity was observed. This is an example of the area of data mining known as classification.
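
To see why the dice question is a probability calculation rather than data mining, note that the full distribution can be written down without any observed data. The sketch below enumerates all 36 equally likely outcomes of two fair dice. The function name is made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(sides=6):
    """Exact probability of each possible sum of two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    total = sides ** 2
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

dist = sum_distribution()
for total, probability in dist.items():
    print(total, probability)
```

No model is learned here: the probabilities follow directly from the assumption that the dice are fair.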
Tags: Data Analytics, Technology
