What is Data Mining?
- Extracting knowledge from large amounts of data
- Steps: cleaning, integration, selection, transformation, mining/processing, pattern evaluation, presentation
- - - - -
What kinds of patterns can be found by data mining techniques?
1. Characterization and Discrimination
2. Frequent patterns, Associations, and Correlations
3. Classification and Prediction
4. Cluster analysis
5. Outlier analysis
6. Evolution analysis
Give examples of each of the following:
Characterization and Discrimination
- Characteristics of customers who buy a certain kind of product
- Customers who buy product A vs. customers who buy another product B
Data Characterization: Summarizing the data of the class under study, which is called the target class.
Data Discrimination: Comparing the features of the target class with those of some predefined contrasting group or class.
- - - - -
Frequent patterns, Associations, and Correlations
- What kinds of products do customers buy together? (Example of association mining)
- If a customer buys product A, what is the chance that they will also buy product B? (Example of association mining)
Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with a frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear together frequently in a transaction data set is a frequent itemset.
- - - - -
Classification and Prediction
- Classify the sales of various products into different classes
- Predict the sales of a product
Classification: Imagine a T-shirt store. As a customer, you tell the salesperson your age, weight, and height, and the salesperson shows you T-shirts of the appropriate size: small, medium, or large.
Prediction: A forecast of a numeric value, for example how many T-shirts the store might sell this month.
- - - - -
Cluster analysis
- Divide the data into groups of similar items
- The number of clusters is not known a priori
Suppose we want to open a T-shirt store but do not know which sizes to stock. We collect data such as customers' weight and height, and then group the customers into classes like small, medium, and large. The question is whether this list of three sizes is exhaustive, or whether there should be more sizes such as XL or XXL. Nobody tells us in advance which sizes there should be; in cluster analysis the groups have to be discovered from the data itself.
- - - - -
Outlier analysis
- An outlier is a data point that deviates from what is expected
Example: a fraudulent transaction. Its amount might be very high compared with the user's routine transactions, because the criminal may try to withdraw as much money as possible before the stolen card or hacked account is blocked.
- - - - -
Evolution analysis
- Time series analysis of data
Example: Identifying the current trend in the stock market, that is, whether it is saturated, bullish, or bearish.
Discuss whether or not each of the following activities is a data mining task.
Q: Dividing the customers of a company according to their gender.
A: No. This is a simple database query.
Q: Dividing the customers of a company according to their profitability.
A: No. This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.
Q: Predicting the outcomes of tossing a (fair) pair of dice.
A: No. Since the dice are fair, this is a probability calculation.
Q: Predicting the future stock price of a company using historical records.
A: Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modeling.
Q: Monitoring seismic waves for earthquake activities.
A: Yes. In this case, we would build a model of the different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these types of seismic activity is observed. This is an example of the area of data mining known as classification.
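To illustrate why the dice question above is a probability calculation rather than a mining task, here is a minimal sketch that enumerates every outcome of two fair six-sided dice and derives the exact probability of each sum. No data is collected or modeled. The variable names are illustrative and not from the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each possible sum.
sum_counts = Counter(a + b for a, b in outcomes)

# Exact probability of each sum: favourable outcomes / total outcomes.
sum_probs = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(sum_counts.items())
}

for total, prob in sum_probs.items():
    print(total, prob)
```

By contrast, the stock price and seismic questions have no such closed-form answer, so a model has to be learned from historical observations.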
Monday, May 29, 2023
Ch 1 - What is Data Mining?
Sunday, May 28, 2023
Data Analytics Books (May 2023)
1. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (Tom Fawcett, 2013)
2. Big Data: A Revolution That Will Transform How We Live, Work, and Think (Viktor Mayer-Schönberger, 2013)
3. Storytelling With Data: A Data Visualization Guide for Business Professionals (Cole Nussbaumer Knaflic, 2015)
4. Python for Data Analysis (Wes McKinney, 2011)
5. Naked Statistics: Stripping the Dread from the Data (Charles Wheelan, 2012)
6. Business unIntelligence: Insight and Innovation beyond Analytics and Big Data (Barry Devlin, 2013)
7. Too Big to Ignore: The Business Case for Big Data (Phil Simon, 2013)
8. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data (2014)
9. Lean Analytics: Use Data to Build a Better Startup Faster (Benjamin Yoskovitz, 2013)
10. Artificial Intelligence: A Guide for Thinking Humans (Melanie Mitchell, 2019)
11. Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things (Bernard Marr, 2017)
12. The Hundred-Page Machine Learning Book (Andriy Burkov, 2019)
13. Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python (Peter Bruce, 2017)
14. Learning R: A Step-by-Step Function Guide to Data Analysis (Richard Cotton, 2013)
15. The Art of Statistics: How to Learn from Data (David Spiegelhalter, 2019)
16. Developing Analytic Talent: Becoming a Data Scientist (Vincent Granville, 2014)
17. Data Smart: Using Data Science to Transform Information into Insight (John W. Foreman, 2013)
18. R for Data Science (Hadley Wickham, 2016)
19. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, Or Die (Eric Siegel, 2013)
20. Now You See it: Simple Visualization Techniques for Quantitative Analysis (Stephen Few, 2009)
21. Predictive Analytics For Dummies (Anasse Bari, 2013)
22. Data Analytics: Become a Master in Data Analytics (Richard Dorsey, 2017)
23. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (Eric Topol, 2019)
24. Creating Value with Social Media Analytics: Managing, Aligning, and Mining Social Media Text, Networks, Actions, Location, Apps, Hyperlinks, Multimedia, and Search Engines Data (Gohar F. Khan, 2018)
25. The Quick Python Book (Kenneth McDonald, 1999)
26. Numsense! Data Science for the Layman: No Math Added (Annalyn Ng, 2017)
27. Weapons of Math Destruction (Cathy O'Neil, 2016)
28. Business Analytics: Data Analysis & Decision Making (Wayne L. Winston, 2014)
29. Microsoft Excel Data Analysis and Business Modeling (Wayne L. Winston, 2004)
30. A Practitioner's Guide to Business Analytics: Using Data Analysis Tools to Improve Your Organization's Decision Making and Strategy (Randy Bartlett, 2012)
31. Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions (Matt Taddy, 2019)
32. Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are (Seth Stephens-Davidowitz, 2017)
33. Rebooting AI: Building Artificial Intelligence We Can Trust (Ernest Davis, 2019)
34. Data Analysis Using SQL and Excel (Gordon Linoff, 2007)
35. An Introduction to Statistical Methods and Data Analysis (Lyman Ott, 1977)
36. Doing Data Science: Straight Talk from the Frontline (Cathy O'Neil, 2013)
37. SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL (Walter Shields, 2015)
38. Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results (Bernard Marr, 2016)
39. Data Analytics for Beginners: Basic Guide to Master Data Analytics (Paul Kinley, 2016)
40. Data Strategy: How to Profit from a World of Big Data, Analytics and Artificial Intelligence (Bernard Marr, 2021)
41. SQL for Data Analytics: Perform Fast and Efficient Data Analysis with the Power of SQL (Upom Malik, 2019)
42. The Data Detective: Ten Easy Rules to Make Sense of Statistics (Tim Harford, 2021)
43. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (John D. Kelleher, 2015)
44. From Big Data to Big Profits: Success with Data and Analytics (Russell Walker, 2015)
45. Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications (Bart Baesens, 2014)
46. Competing on Analytics: The New Science of Winning (Thomas H. Davenport, 2007)
47. The Elements of Statistical Learning (Trevor Hastie, 2001)
48. Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses (Michael Minelli, 2012)
49. Marketing Analytics: Data-Driven Techniques with Microsoft Excel (Wayne L. Winston, 2014)
50. Head First Statistics (Dawn Griffiths, 2008)
Tags: List of Books, Technology, Python, Machine Learning
Python Quiz (13 Questions, May 2023)
Q1: What will be the output of:
    >>> s = 'malayalam'
    >>> s.strip('mal')

Q2: Which of these are valid variable names?
    a. &code = 'abc'
    b. discount% = 90
    c. _ = "Alpha"
    d. string = "Beta"

Q3: What will be the output of:
    >>> s = " Python Program "
    >>> s.lstrip("P")

Q4: What will be the output of:
    >>> l = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']
    >>> l[-2][2]

Q5: What will be the output of:
    var = 0
    if var:
        print("In If")
    elif (var == 0):
        print("In Elif 1")
    elif (var == 0):
        print("In Elif 2")
    else:
        print("In Else")

Q6: What will be the output of:
    for i in range(10):
        if(i == 5):
            break
        else:
            print(i, sep = " ")
    else:
        print("In Else 2")

Q7: What will be the output of:
    for x in range(6):
        print(x)
    else:
        print("Finally finished!")

Q8: A riddle.
    >>> t = (1, 2, [30, 40])
    >>> t[2] += [50, 60]
What happens next? Choose the best answer:
    a) t becomes (1, 2, [30, 40, 50, 60]).
    b) TypeError is raised with the message 'tuple' object does not support item assignment.
    c) Neither.
    d) Both a and b.

Q9: Which of these gives you back a dict?
    a) a = dict(one=1, two=2, three=3)
    b) b = {'one': 1, 'two': 2, 'three': 3}
    c) c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
    d) d = dict([('two', 2), ('one', 1), ('three', 3)])
    e) e = dict({'three': 3, 'one': 1, 'two': 2})

Q10: What is the output of:
    i = 01
    print(i + 5)

Q11: What is the output of:
    import re
    s = "Malaaavikaa"
    s = re.sub("a{2}", "*", s)
    print(s)

Q12: What is the output of:
    class Person():
        def __init__(self, pid):
            self.pid = pid

    obama = Person(100)
    obama.age = 49
    print(obama.age + 2)

Q13: l = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
Sort this list based on string length in one line.

Tags: Python, Technology
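Several of the quiz questions above can be checked by running them. The sketch below covers Q1, Q8, Q11, and Q13. The variable names (`q1`, `q8_error`, `words`, and so on) are illustrative and not part of the original quiz. Running it reveals the answers, so attempt the questions first.

```python
import re

# Q1: strip() treats its argument as a set of characters to remove from
# both ends of the string, not as a substring.
q1 = "malayalam".strip("mal")

# Q8: += on a list inside a tuple first extends the list in place, then
# tries to assign the result back into the tuple, which fails.
t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
    q8_error = None
except TypeError as exc:
    q8_error = exc

# Q11: "a{2}" matches exactly two consecutive "a" characters, scanning
# left to right without overlapping matches.
q11 = re.sub("a{2}", "*", "Malaaavikaa")

# Q13: sorted() with key=len sorts by string length in one line; the sort
# is stable, so equal-length strings keep their original order.
words = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
q13 = sorted(words, key=len)

print(repr(q1))
print(t, repr(q8_error))
print(q11)
print(q13)
```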
Wednesday, May 24, 2023
Find the count of occurrences of each letter in a sentence
In [4]:
s = """Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
Python consistently ranks as one of the most popular programming languages."""
In [6]:
s.count("a")  # run before lowercasing, so the "A" in "ABC" is not counted
Out[6]:
64
In [7]:
s = s.lower()
In [8]:
ord('a')
Out[8]:
97
In [9]:
chr(97)
Out[9]:
'a'
In [13]:
a2z = [chr(i) for i in range(97, 123)]
In [15]:
for i in a2z:
    print(i, s.count(i))
a 65
b 10
c 21
d 29
e 71
f 9
g 29
h 23
i 55
j 2
k 3
l 36
m 20
n 54
o 48
p 31
q 0
r 47
s 49
t 51
u 21
v 6
w 7
x 0
y 18
z 1
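The loop above calls `s.count` once per letter, which scans the string 26 times. As an alternative, here is a minimal sketch that counts every character in a single pass with `collections.Counter` and then reads off the 26 letters. The helper name `letter_counts` and the short `sample` string are illustrative and not from the original notebook.

```python
from collections import Counter
import string


def letter_counts(text):
    """Count each lowercase ASCII letter in text, ignoring case."""
    # One pass over the lowercased text counts every character.
    counts = Counter(text.lower())
    # A Counter returns 0 for missing keys, so absent letters appear as 0.
    return {letter: counts[letter] for letter in string.ascii_lowercase}


sample = "Python is a high-level, general-purpose programming language."
counts = letter_counts(sample)
for letter, count in counts.items():
    print(letter, count)
```

Passing the notebook's full string `s` to `letter_counts` should give the same per-letter counts as the loop, since both lowercase the text before counting.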