General introduction Marco Saerens (UCL), with Christine Decaestecker (ULB) 2 Table of contents (1/6) Feature reduction: a short review of subspace ...
12 downloads
21 Views
4MB Size
General introduction
Marco Saerens (UCL), with Christine Decaestecker (ULB)
Table of contents (1/6) Feature
reduction: a short review of subspace projection methods (15%) – Feature selection and extraction – Principal components analysis – Multidimensional scaling – Canonical correlation analysis – Correspondence analysis – Multiple correspondence analysis – Discriminant analysis
2
Table of contents (2/6)
Standard clustering methods (15%) – – – – – – – – –
General introduction K-means Fuzzy k-means Gaussian mixtures Hierarchical clustering The kernel trick for clustering: kernel k-means Spectral clustering Comparing and assessing clustering solutions A case study
3
Table of contents (3/6)
Supervised classification (15%) – General introduction – Comparing and assessing classification models + bias/variance decomposition – Nearest neighbors classifiers – Naive Bayes classifier – Logistic regression – Multilayer neural networks – Support vector machines and the kernel trick for classification – Combination of classifiers (ensemble methods) – Estimation of a posteriori probabilities 4 – Change of a priori probabilities
Table of contents (4/6) Classification
and modelling of sequences (15%) – General introduction – Dynamic programming – Edit-distances – Dynamic time warping – Modelling by Markov models – Modelling by hidden Markov models 5
Table of contents (5/6) Decision
analysis (25%)
– Individual and static decisions based on data: • Bayes decision theory for supervised classification
– Individual and static decisions based on the knowledge of the probability distribution of the variables: • Bayesian networks and decision graphs
– Individual and static decisions based on preferences modelling: • Multicriterion decision methods (Promethee)
6
Table of contents (5/6) Decision
analysis (25%)
– Individual and dynamic decisions: • Markov decision processes and reinforcement learning
– Multiperson and static decisions: • Theory of voting
– Multiperson, non-cooperative dynamic decisions: • Two-person competitive games 7
Table of contents (6/6)
Information retrieval and web mining techniques (15%) – General introduction – Content-based information retrieval • Probabilistic methods • Vector-space methods
– Web link analysis • PageRank • Kleinberg’s hubs and authorities model
– Collaborative recommendation methods • Standard nearest neighbours rule • Latent semantic models • ItemRank
8
General introduction
9
Slides references
Many slides and figures have been adapted from the slides associated to the following books: – Alpaydin (2004), Introduction to machine learning. The MIT Press. – Duda, Hart & Stork (2000), Pattern classification, John Wiley & Sons. – Han & Kamber (2006), Data mining: concepts and techniques, 2nd ed. Morgan Kaufmann. – Tan, Steinbach & Kumar (2006), Introduction to data mining. Addison Wesley.
As well as from Wikipedia, the free online encyclopedia I really thank these authors for their contribution to machine learning and data mining teaching
10
Machine learning and data mining Interplay
of these two related fields:
– Data mining – Machine learning
11
Why “Machine learning” ?
Machine learning is programming computers to optimize a performance criterion using example data or past experience – There is no need to “learn” to calculate payroll
Learning is used when: – Human expertise does not exist (navigating on Mars), – Humans are unable to explain their expertise accurately (speech recognition) – Solution changes in time (routing on a computer network) – Solution needs to be adapted to particular cases (user biometrics)
12
What we talk about when we talk about “Learning”
Learning general models from a data of particular examples – Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce
Example in retail: Customer transactions to consumer behavior: People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)
Build a model that is a good and useful approximation to the data
13
Data mining is related to machine learning
Related to knowledge extraction from data – Retail: Market basket analysis, Customer relationship management (CRM) – Finance: Credit scoring, fraud detection – Manufacturing: Optimization, troubleshooting – Medicine: Medical diagnosis – Telecommunications: Quality of service optimization – Bioinformatics: Motifs, alignment – Web mining: Search engines
In order to take well-founded decisions
14
What is machine learning?
Optimize a performance criterion using example data or past experience. – Role of Statistics: Inference from a sample – Role of Engineering: • Efficient extraction of features from images, camera, measurement devices • Finding the best strategy – similar to control theory
– Role of Computer science: Efficient algorithms to • Solve the optimization problem • Representing and evaluating the model for inference 15
Brief sketch of some applications Association
rules Supervised Learning – Classification – Regression Clustering Reinforcement
Learning Collaborative recommendation Etc...
16
Learning associations Basket
analysis: P(Y | X) probability that somebody who buys X also buys Y where X and Y are products/services. Example: P(chips | beer) = 0.7
17
Classification
Example: Credit scoring Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk 18
Classification: Applications
Aka Pattern recognition Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style Character recognition: Different handwriting styles Speech recognition: Temporal dependency – Use of a dictionary or the syntax of the language – Sensor fusion: Combine multiple modalities; eg, visual (lip image) and acoustic for speech
Medical diagnosis: From symptoms to illnesses
19
Classification: Face recognition Training examples of a person
Test images
AT&T Laboratories, Cambridge UK http://www.uk.research.att.com/facedatabase.html
20
Regression
Example: Price of a used car y = wx + w0
x: car features y: price y = g(x, w): model w: model parameters
21
Unsupervised learning and clustering Learning
“what normally happens” No output (= unsupervised) Clustering: Grouping similar instances Example applications – Customer segmentation in CRM – Image compression: Color quantization – Bioinformatics: Learning motifs
22
Reinforcement learning Learning
a policy: A sequence of
outputs – No supervised output but delayed reward Credit
assignment problem Game playing Robot in a maze Multiple agents, partial observability, ... 23
Introduction to data mining
24
Why data mining?
The Explosive Growth of Data: from terabytes to petabytes – Data collection and data availability • Automated data collection tools, database systems, Web, computerized society
– Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: Remote sensing, bioinformatics, scientific simulation, … • Society and everyone: news, digital cameras,
We are drowning in data, but starving for knowledge!25
Evolution of database technology
1960s: – Data collection, database creation, IMS and network DBMS
1970s: – Relational data model, relational DBMS implementation
1980s: – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) – Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s: – Data mining, data warehousing, multimedia databases, and Web databases
2000s – Stream data management and mining – Data mining and its applications – Web technology (XML, data integration) and global information systems
26
What Is Data Mining?
Data mining (knowledge discovery from data) – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from large amount of data – Data mining: a misnomer?
Alternative names – Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Watch out: Is everything “data mining”? – No: Simple search and query processing – (Deductive) expert systems 27
Why data mining?– potential applications
Data analysis and decision support – Market analysis and management • Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
– Risk analysis and management • Forecasting, customer retention, improved underwriting, quality control, competitive analysis
– Fraud detection and detection of unusual patterns (outliers)
Other Applications – Text mining (news group, email, documents) – Web mining – Stream data mining – Bioinformatics and bio-data analysis
28
Knowledge discovery process – Data mining—core of knowledge discovery process
Performances Evaluation
Data Mining Task-relevant Data Data Warehouse
Selection
Data Cleaning Data Integration Databases
29
KDD methodology CRISP-DM (1999) Cross-Industry Standard Process Model
30
KDD methodology Business Knowledge
Specification
Extraction
Audit
Transformation
Modelling
Evaluation
Knowledge Deployment Presentation
-Map issues to data, gap analysis -Model data design (Matched modelling techinques, Taregt definition -Sampling: data history, time period, volume that models can be built and used!) -Extraction/transformation specification -Data quality audit -Data exploration (understanding, first insight for variable selection) -Data Treatment (missing values, outliers, transformation spec…) Data consolidation, aggregations suitable format for modelling Construct and evaluate as-simple-as-possible models that render the highest amount of explanatory power Evaluate the model and its impact on your business, an ROI analysis
Data Preparation can take 60 – 80% of the time!
-Model implementation31 -Real time repeatable applications
How should we act to derive value?
KDD process How do we evaluate what we find?
How do we extract appropriate data?
How ought we deliver discovered information?
Interpretation/Evaluation
Data Mining
Knowledge
Transformation Patterns Preprocessing
How do we specify what we need?
Selection
Preprocess ed Data
Transform ed Data
Target Data Data
32
KDD process: several key steps
Learning the application domain – relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing
Data reduction and transformation – Find useful features, dimensionality/variable reduction, invariant representation
Choosing functions of data mining – summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Performance evaluation and knowledge presentation – visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
33
Data mining and business intelligence Increasing potential to support business decisions
End User
Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical exploration, Querying, and Reporting
Business Analyst
Data Analyst
Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems
DBA 34
Confluence of multiple disciplines Database technology
Machine learning Pattern recognition
Statistics
Data Mining
Algorithmics
Visualization
Other disciplines
35
Data mining vs statistics Roughly
speaking, data mining is statistical analysis of data performed by computer science people In order to be able to support a wide variety of fast-growing, innovative, computer science applications
36
Why not traditional data analysis?
Tremendous amount of data – Algorithms must be highly scalable to handle such as terabytes of data
High-dimensionality of data – Micro-array may have tens of thousands of dimensions
High complexity of data – Data streams and sensor data – Time-series data, temporal data, sequence data – Structure data, graphs, social networks and multi-linked data – Heterogeneous databases and legacy databases – Spatial, spatiotemporal, multimedia, text and web data – Software programs, virtual worlds
New and sophisticated applications – Small gap between development of techniques and embedding in software applications
37
Multi-dimensional view of data mining
Data to be mined – Relational, data warehouse, transactional, stream, objectoriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
Knowledge to be mined – Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. – Multiple/integrated functions and mining at multiple levels
Techniques utilized – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc
Applications tackled – Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc
38
Data mining: classification schemes
General functionality – Descriptive data mining – Predictive data mining
Different views lead to different classifications – Data view: kinds of data to be mined – Knowledge view: kinds of knowledge to be discovered – Method view: kinds of techniques utilized – Application view: kinds of applications adapted
39
Data mining: what kinds of data?
Database-oriented data sets and applications – Relational database, data warehouse, transactional database
Advanced data sets and advanced applications – Data streams and sensor data – Time-series data, temporal data, sequence data (incl. biosequences) – Structured data, graphs, social networks and multi-linked data – Object-relational databases – Heterogeneous databases and legacy databases – Spatial data and spatiotemporal data – Multimedia database – Text databases – The World-Wide Web – Virtual worlds
40
Data mining functionalities
Multidimensional concept description: Characterization and discrimination – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Frequent patterns, association, correlation vs. causality – Diaper Beer [0.5%, 75%] (Correlation or causality?)
Classification and prediction – Construct models (functions) that describe and distinguish classes or concepts for future prediction • E.g., classify countries based on (climate), or classify cars based on (gas mileage)
– Predict some unknown or missing values
41
Data Mining Functionalities (2)
Cluster analysis – Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns – Maximizing intra-class similarity & minimizing interclass similarity
Outlier analysis – Outlier: Data object that does not comply with the general behavior of the data – Noise or exception? Useful in fraud detection, rare events analysis
Trend and evolution analysis – Trend and deviation: e.g., regression analysis – Sequential pattern mining: e.g., digital camera large SD memory – Periodicity analysis – Similarity-based analysis
Etc
42
Are the “discovered” patterns interesting?
Data mining may generate thousands of patterns: Not all of them are interesting – Suggested approach: Human-centered, query-based, focused mining
Interestingness measures – A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
Objective vs. subjective interestingness measures – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. – Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
43
Some other pattern mining issues
Precise patterns vs. approximate patterns – Association and correlation mining: possible find sets of precise patterns • But approximate patterns can be more compact and sufficient • How to find high quality approximate patterns??
– Gene sequence mining: approximate patterns are inherent • How to derive efficient approximate pattern mining algorithms??
Constrained vs. non-constrained patterns – Why constraint-based mining? – What are the possible kinds of constraints? How to push constraints into the mining process? 44
Architecture: typical data mining system Graphical User Interface Pattern Evaluation Knowle dgeBase
Data Mining Engine Database or Data Warehouse Server Data cleaning, transformation, integration, and selection Database
Data warehouse
World-wide web
Other info repositories
45
Some issues in data mining
Mining methodology – Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web – Performance: efficiency, effectiveness, and scalability – Pattern evaluation: the interestingness problem – Incorporation of background knowledge – Handling noise and incomplete data – Parallel, distributed and incremental mining methods – Integration of the discovered knowledge with existing one: knowledge fusion
User interaction – Data mining query languages and ad-hoc mining – Expression and visualization of data mining results – Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts – Domain-specific data mining & invisible data mining – Protection of data security, integrity, and privacy
46
Summary
Data mining: Discovering interesting patterns from large amounts of data
A natural evolution of database technology + statistics, in great demand, with wide applications
A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
Data mining systems and architectures
Major issues in data mining 47
Some practical information
48
Conferences
International conferences KDD,
PKDD Knowledge Discovery in Databases KDD is organised by ACM
50
International conferences ICDM IEEE
International Conference on Data Mining IEEE Computer Society
51
International conferences Closely
related to « Machine Learning » ICML, ECML International Conference on Machine Learning
52
International conferences Closely
related to « pattern recognition » (from a engineering perspective) ICPR International Conference on Pattern Recognition Organised by IAPR 53
International conferences Closely
related to « neural information processing » NIPS Neural Information Processing Systems Conference
54
Some periodicals
Periodicals
Researchers of the data mining community typically publish their work in – – – – – – –
ACM Transactions on Knowledge Discovery form Data Machine Learning Journal of Machine Learning Research Neural Computation IEEE Transactions on Neural Networks Pattern Recognition IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Transactions on Knowledge and Data Engineering – Etc…
56
Some books
Books
Hastie, Tibshirani & Friedman (2001) “The elements of statistical learning”.
Mitchell (1997) “Machine learning”.
Webb (2002) “Statistical pattern recognition, 2nd ed”. 58
Books
Bishop (1995) “Neural networks for pattern recognition”.
Theodoridis & Koutroumbas (2005) “Pattern recognition, 3nd ed”
Cornuéjols & Miclet (2002) “Apprentissage artificiel”. 59
Books
Duda, Hart & Stork (2001) “Pattern classification”.
Ripley (1996) “Pattern recognition and neural networks”.
Bishop (2006) “Pattern recognition and machine learning”.
60
Books
Alpaydin (2004) “Introduction to machine learning”.
Han & Kamber (2006) “Data mining. Concepts and techniques, 2nd ed”.
Tan, Steinbach & Kumar (2005) “Introduction to data mining”.
61
More books… Alpaydin (2004), “Introduction to machine learning”. MIT Press. Bardos (2001), "Analyse discriminante. Application au risque et scoring financier. Dunod. Bishop (1995), "Neural networks for pattern recognition". Clarendon Press. Bishop (2006), “Pattern recognition and machine learning”. Springer-Verlag. Bouroche & Saporta (1983), "L'analyse des données". Que Sais-je. Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles. Duda, Hart & Stork (2001), "Pattern classification, 2nd ed". John Wiley & Sons. Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall. Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press. Han & Kamber (2006), "Data mining: Concepts and techniques, 2nd ed". Morgan Kaufmann. Hand (1981), "Discrimination and classification". John Wiley & Sons. Hardle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag. Disponible à http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag. Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed". Prentice-Hall. Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod. Mitchell (1997), "Machine learning". McGraw-Hill. Naim, Wuillemin, Leray, Pourret & Becker (2004), “Réseaux bayesiens”. Editions Eyrolles. Nilsson (1998), “Artificial intelligence: A new synthesis”. Morgan Kaufmann. Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press. Rosner (1995), "Fundamentals of biostatistics, 4th ed".Wadsworth Publishing Company. Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip. Tan, Steinbach & Kumar (2005), “Introduction to data mining”. Addison Wesley. Theodoridis & Koutroumbas (2006), "Pattern recognition, 3nd ed". Academic Press. Therrien (1989), "Decision, estimation and classification". Wiley & Sons. Venables & Ripley (2002), "Modern applied statistics with S. Springer-Verlag. Wasserman (2004), “All of statistics”. Springer. Webb (2002), "Statistical pattern recognition, 2nd ed". John Wiley and Sons.
62
Some softwares
Softwares SAS/Enterprise
Miner You benefit from the full power of SAS – SAS/Stat, Base, Graph, etc Highly
scalable Quite complete, but expensive
64
Softwares/SAS EM
65
Softwares
SPSS/Clementine One of the first specific data mining software – Less scalable than SAS – Less expensive
Same look and feel as SAS/EM – Palette with icons – Drag & drop of models – Data flux
66
Softwares/SPSS
67
Softwares StatSoft/DataMiner Released
recently Relies on Statistica Same look and feel
68
Softwares/StatSoft
69
Softwares
Insightful Miner Relies on S-Plus Same look and feel
70
Softwares/Insightful
71
Softwares
IBM/DB2 Intelligent Miner – One of the first on the market – Quite impressive for data visualisation
KXEN – A newcomer – Quite controversial – A bit “Black Box” 72
Softwares
Of course, you do not need a specific data mining product in order to do data mining Statistical softwares integrate data mining techniques as well – – – – –
S-Plus R SAS/JMP Stata Etc… 73
Open source softwares
WEKA – www.cs.waikato.ac.nz/~ml/weka/ – Data mining techniques implemented in Java
Pentaho (integrates WEKA) – www.pentaho.com – BI open source suite
R – www.r-project.org – Statistical software including data mining techniques
Orange – http://www.ailab.si/orange – Data mining tool in Python
Rapid Miner – http://rapid-i.com
74