General introduction Marco Saerens (UCL), with Christine Decaestecker (ULB) 2 Table of contents (1/6) Feature reduction: a short review of subspace ...

12 downloads 21 Views 4MB Size

General introduction

Marco Saerens (UCL), with Christine Decaestecker (ULB)

Table of contents (1/6) Feature

reduction: a short review of subspace projection methods (15%) – Feature selection and extraction – Principal components analysis – Multidimensional scaling – Canonical correlation analysis – Correspondence analysis – Multiple correspondence analysis – Discriminant analysis

2

Table of contents (2/6)

Standard clustering methods (15%) – – – – – – – – –

General introduction K-means Fuzzy k-means Gaussian mixtures Hierarchical clustering The kernel trick for clustering: kernel k-means Spectral clustering Comparing and assessing clustering solutions A case study

3

Table of contents (3/6)

Supervised classification (15%) – General introduction – Comparing and assessing classification models + bias/variance decomposition – Nearest neighbors classifiers – Naive Bayes classifier – Logistic regression – Multilayer neural networks – Support vector machines and the kernel trick for classification – Combination of classifiers (ensemble methods) – Estimation of a posteriori probabilities 4 – Change of a priori probabilities

Table of contents (4/6) Classification

and modelling of sequences (15%) – General introduction – Dynamic programming – Edit-distances – Dynamic time warping – Modelling by Markov models – Modelling by hidden Markov models 5

Table of contents (5/6) Decision

analysis (25%)

– Individual and static decisions based on data: • Bayes decision theory for supervised classification

– Individual and static decisions based on the knowledge of the probability distribution of the variables: • Bayesian networks and decision graphs

– Individual and static decisions based on preferences modelling: • Multicriterion decision methods (Promethee)

6

Table of contents (5/6) Decision

analysis (25%)

– Individual and dynamic decisions: • Markov decision processes and reinforcement learning

– Multiperson and static decisions: • Theory of voting

– Multiperson, non-cooperative dynamic decisions: • Two-person competitive games 7

Table of contents (6/6)

Information retrieval and web mining techniques (15%) – General introduction – Content-based information retrieval • Probabilistic methods • Vector-space methods

– Web link analysis • PageRank • Kleinberg’s hubs and authorities model

– Collaborative recommendation methods • Standard nearest neighbours rule • Latent semantic models • ItemRank

8

General introduction

9

Slides references

Many slides and figures have been adapted from the slides associated to the following books: – Alpaydin (2004), Introduction to machine learning. The MIT Press. – Duda, Hart & Stork (2000), Pattern classification, John Wiley & Sons. – Han & Kamber (2006), Data mining: concepts and techniques, 2nd ed. Morgan Kaufmann. – Tan, Steinbach & Kumar (2006), Introduction to data mining. Addison Wesley.

As well as from Wikipedia, the free online encyclopedia I really thank these authors for their contribution to machine learning and data mining teaching

10

Machine learning and data mining Interplay

of these two related fields:

– Data mining – Machine learning

11

Why “Machine learning” ?

Machine learning is programming computers to optimize a performance criterion using example data or past experience – There is no need to “learn” to calculate payroll

Learning is used when: – Human expertise does not exist (navigating on Mars), – Humans are unable to explain their expertise accurately (speech recognition) – Solution changes in time (routing on a computer network) – Solution needs to be adapted to particular cases (user biometrics)

12

What we talk about when we talk about “Learning”

Learning general models from a data of particular examples – Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce

Example in retail: Customer transactions to consumer behavior: People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)

Build a model that is a good and useful approximation to the data

13

Data mining is related to machine learning

Related to knowledge extraction from data – Retail: Market basket analysis, Customer relationship management (CRM) – Finance: Credit scoring, fraud detection – Manufacturing: Optimization, troubleshooting – Medicine: Medical diagnosis – Telecommunications: Quality of service optimization – Bioinformatics: Motifs, alignment – Web mining: Search engines

In order to take well-founded decisions

14

What is machine learning?

Optimize a performance criterion using example data or past experience. – Role of Statistics: Inference from a sample – Role of Engineering: • Efficient extraction of features from images, camera, measurement devices • Finding the best strategy – similar to control theory

– Role of Computer science: Efficient algorithms to • Solve the optimization problem • Representing and evaluating the model for inference 15

Brief sketch of some applications Association

rules Supervised Learning – Classification – Regression Clustering Reinforcement

Learning Collaborative recommendation Etc...

16

Learning associations Basket

analysis: P(Y | X) probability that somebody who buys X also buys Y where X and Y are products/services. Example: P(chips | beer) = 0.7

17

Classification

Example: Credit scoring Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk 18

Classification: Applications

Aka Pattern recognition Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style Character recognition: Different handwriting styles Speech recognition: Temporal dependency – Use of a dictionary or the syntax of the language – Sensor fusion: Combine multiple modalities; eg, visual (lip image) and acoustic for speech

Medical diagnosis: From symptoms to illnesses

19

Classification: Face recognition Training examples of a person

Test images

AT&T Laboratories, Cambridge UK http://www.uk.research.att.com/facedatabase.html

20

Regression

Example: Price of a used car y = wx + w0

x: car features y: price y = g(x, w): model w: model parameters

21

Unsupervised learning and clustering Learning

“what normally happens” No output (= unsupervised) Clustering: Grouping similar instances Example applications – Customer segmentation in CRM – Image compression: Color quantization – Bioinformatics: Learning motifs

22

Reinforcement learning Learning

a policy: A sequence of

outputs – No supervised output but delayed reward Credit

assignment problem Game playing Robot in a maze Multiple agents, partial observability, ... 23

Introduction to data mining

24

Why data mining?

The Explosive Growth of Data: from terabytes to petabytes – Data collection and data availability • Automated data collection tools, database systems, Web, computerized society

– Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: Remote sensing, bioinformatics, scientific simulation, … • Society and everyone: news, digital cameras,

We are drowning in data, but starving for knowledge!25

Evolution of database technology

1960s: – Data collection, database creation, IMS and network DBMS

1970s: – Relational data model, relational DBMS implementation

1980s: – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) – Application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s: – Data mining, data warehousing, multimedia databases, and Web databases

2000s – Stream data management and mining – Data mining and its applications – Web technology (XML, data integration) and global information systems

26

What Is Data Mining?

Data mining (knowledge discovery from data) – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from large amount of data – Data mining: a misnomer?

Alternative names – Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: Is everything “data mining”? – No: Simple search and query processing – (Deductive) expert systems 27

Why data mining?– potential applications

Data analysis and decision support – Market analysis and management • Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation

– Risk analysis and management • Forecasting, customer retention, improved underwriting, quality control, competitive analysis

– Fraud detection and detection of unusual patterns (outliers)

Other Applications – Text mining (news group, email, documents) – Web mining – Stream data mining – Bioinformatics and bio-data analysis

28

Knowledge discovery process – Data mining—core of knowledge discovery process

Performances Evaluation

Data Mining Task-relevant Data Data Warehouse

Selection

Data Cleaning Data Integration Databases

29

KDD methodology CRISP-DM (1999) Cross-Industry Standard Process Model

30

KDD methodology Business Knowledge

Specification

Extraction

Audit

Transformation

Modelling

Evaluation

Knowledge Deployment Presentation

-Map issues to data, gap analysis -Model data design (Matched modelling techinques, Taregt definition -Sampling: data history, time period, volume that models can be built and used!) -Extraction/transformation specification -Data quality audit -Data exploration (understanding, first insight for variable selection) -Data Treatment (missing values, outliers, transformation spec…) Data consolidation, aggregations suitable format for modelling Construct and evaluate as-simple-as-possible models that render the highest amount of explanatory power Evaluate the model and its impact on your business, an ROI analysis

Data Preparation can take 60 – 80% of the time!

-Model implementation31 -Real time repeatable applications

How should we act to derive value?

KDD process How do we evaluate what we find?

How do we extract appropriate data?

How ought we deliver discovered information?

Interpretation/Evaluation

Data Mining

Knowledge

Transformation Patterns Preprocessing

How do we specify what we need?

Selection

Preprocess ed Data

Transform ed Data

Target Data Data

32

KDD process: several key steps

Learning the application domain – relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing

Data reduction and transformation – Find useful features, dimensionality/variable reduction, invariant representation

Choosing functions of data mining – summarization, classification, regression, association, clustering

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Performance evaluation and knowledge presentation – visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge

33

Data mining and business intelligence Increasing potential to support business decisions

End User

Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical exploration, Querying, and Reporting

Business Analyst

Data Analyst

Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems

DBA 34

Confluence of multiple disciplines Database technology

Machine learning Pattern recognition

Statistics

Data Mining

Algorithmics

Visualization

Other disciplines

35

Data mining vs statistics Roughly

speaking, data mining is statistical analysis of data performed by computer science people In order to be able to support a wide variety of fast-growing, innovative, computer science applications

36

Why not traditional data analysis?

Tremendous amount of data – Algorithms must be highly scalable to handle such as terabytes of data

High-dimensionality of data – Micro-array may have tens of thousands of dimensions

High complexity of data – Data streams and sensor data – Time-series data, temporal data, sequence data – Structure data, graphs, social networks and multi-linked data – Heterogeneous databases and legacy databases – Spatial, spatiotemporal, multimedia, text and web data – Software programs, virtual worlds

New and sophisticated applications – Small gap between development of techniques and embedding in software applications

37

Multi-dimensional view of data mining

Data to be mined – Relational, data warehouse, transactional, stream, objectoriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW

Knowledge to be mined – Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. – Multiple/integrated functions and mining at multiple levels

Techniques utilized – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc

Applications tackled – Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc

38

Data mining: classification schemes

General functionality – Descriptive data mining – Predictive data mining

Different views lead to different classifications – Data view: kinds of data to be mined – Knowledge view: kinds of knowledge to be discovered – Method view: kinds of techniques utilized – Application view: kinds of applications adapted

39

Data mining: what kinds of data?

Database-oriented data sets and applications – Relational database, data warehouse, transactional database

Advanced data sets and advanced applications – Data streams and sensor data – Time-series data, temporal data, sequence data (incl. biosequences) – Structured data, graphs, social networks and multi-linked data – Object-relational databases – Heterogeneous databases and legacy databases – Spatial data and spatiotemporal data – Multimedia database – Text databases – The World-Wide Web – Virtual worlds

40

Data mining functionalities

Multidimensional concept description: Characterization and discrimination – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

Frequent patterns, association, correlation vs. causality – Diaper Beer [0.5%, 75%] (Correlation or causality?)

Classification and prediction – Construct models (functions) that describe and distinguish classes or concepts for future prediction • E.g., classify countries based on (climate), or classify cars based on (gas mileage)

– Predict some unknown or missing values

41

Data Mining Functionalities (2)

Cluster analysis – Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns – Maximizing intra-class similarity & minimizing interclass similarity

Outlier analysis – Outlier: Data object that does not comply with the general behavior of the data – Noise or exception? Useful in fraud detection, rare events analysis

Trend and evolution analysis – Trend and deviation: e.g., regression analysis – Sequential pattern mining: e.g., digital camera large SD memory – Periodicity analysis – Similarity-based analysis

Etc

42

Are the “discovered” patterns interesting?

Data mining may generate thousands of patterns: Not all of them are interesting – Suggested approach: Human-centered, query-based, focused mining

Interestingness measures – A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm

Objective vs. subjective interestingness measures – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. – Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.

43

Some other pattern mining issues

Precise patterns vs. approximate patterns – Association and correlation mining: possible find sets of precise patterns • But approximate patterns can be more compact and sufficient • How to find high quality approximate patterns??

– Gene sequence mining: approximate patterns are inherent • How to derive efficient approximate pattern mining algorithms??

Constrained vs. non-constrained patterns – Why constraint-based mining? – What are the possible kinds of constraints? How to push constraints into the mining process? 44

Architecture: typical data mining system Graphical User Interface Pattern Evaluation Knowle dgeBase

Data Mining Engine Database or Data Warehouse Server Data cleaning, transformation, integration, and selection Database

Data warehouse

World-wide web

Other info repositories

45

Some issues in data mining

Mining methodology – Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web – Performance: efficiency, effectiveness, and scalability – Pattern evaluation: the interestingness problem – Incorporation of background knowledge – Handling noise and incomplete data – Parallel, distributed and incremental mining methods – Integration of the discovered knowledge with existing one: knowledge fusion

User interaction – Data mining query languages and ad-hoc mining – Expression and visualization of data mining results – Interactive mining of knowledge at multiple levels of abstraction

Applications and social impacts – Domain-specific data mining & invisible data mining – Protection of data security, integrity, and privacy

46

Summary

Data mining: Discovering interesting patterns from large amounts of data

A natural evolution of database technology + statistics, in great demand, with wide applications

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

Mining can be performed in a variety of information repositories

Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

Data mining systems and architectures

Major issues in data mining 47

Some practical information

48

Conferences

International conferences KDD,

PKDD Knowledge Discovery in Databases KDD is organised by ACM

50

International conferences ICDM IEEE

International Conference on Data Mining IEEE Computer Society

51

International conferences Closely

related to « Machine Learning » ICML, ECML International Conference on Machine Learning

52

International conferences Closely

related to « pattern recognition » (from a engineering perspective) ICPR International Conference on Pattern Recognition Organised by IAPR 53

International conferences Closely

related to « neural information processing » NIPS Neural Information Processing Systems Conference

54

Some periodicals

Periodicals

Researchers of the data mining community typically publish their work in – – – – – – –

ACM Transactions on Knowledge Discovery form Data Machine Learning Journal of Machine Learning Research Neural Computation IEEE Transactions on Neural Networks Pattern Recognition IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Transactions on Knowledge and Data Engineering – Etc…

56

Some books

Books

Hastie, Tibshirani & Friedman (2001) “The elements of statistical learning”.

Mitchell (1997) “Machine learning”.

Webb (2002) “Statistical pattern recognition, 2nd ed”. 58

Books

Bishop (1995) “Neural networks for pattern recognition”.

Theodoridis & Koutroumbas (2005) “Pattern recognition, 3nd ed”

Cornuéjols & Miclet (2002) “Apprentissage artificiel”. 59

Books

Duda, Hart & Stork (2001) “Pattern classification”.

Ripley (1996) “Pattern recognition and neural networks”.

Bishop (2006) “Pattern recognition and machine learning”.

60

Books

Alpaydin (2004) “Introduction to machine learning”.

Han & Kamber (2006) “Data mining. Concepts and techniques, 2nd ed”.

Tan, Steinbach & Kumar (2005) “Introduction to data mining”.

61

More books… Alpaydin (2004), “Introduction to machine learning”. MIT Press. Bardos (2001), "Analyse discriminante. Application au risque et scoring financier. Dunod. Bishop (1995), "Neural networks for pattern recognition". Clarendon Press. Bishop (2006), “Pattern recognition and machine learning”. Springer-Verlag. Bouroche & Saporta (1983), "L'analyse des données". Que Sais-je. Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles. Duda, Hart & Stork (2001), "Pattern classification, 2nd ed". John Wiley & Sons. Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall. Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press. Han & Kamber (2006), "Data mining: Concepts and techniques, 2nd ed". Morgan Kaufmann. Hand (1981), "Discrimination and classification". John Wiley & Sons. Hardle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag. Disponible à http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag. Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed". Prentice-Hall. Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod. Mitchell (1997), "Machine learning". McGraw-Hill. Naim, Wuillemin, Leray, Pourret & Becker (2004), “Réseaux bayesiens”. Editions Eyrolles. Nilsson (1998), “Artificial intelligence: A new synthesis”. Morgan Kaufmann. Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press. Rosner (1995), "Fundamentals of biostatistics, 4th ed".Wadsworth Publishing Company. Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip. Tan, Steinbach & Kumar (2005), “Introduction to data mining”. Addison Wesley. Theodoridis & Koutroumbas (2006), "Pattern recognition, 3nd ed". Academic Press. Therrien (1989), "Decision, estimation and classification". Wiley & Sons. Venables & Ripley (2002), "Modern applied statistics with S. Springer-Verlag. Wasserman (2004), “All of statistics”. Springer. Webb (2002), "Statistical pattern recognition, 2nd ed". John Wiley and Sons.

62

Some softwares

Softwares SAS/Enterprise

Miner You benefit from the full power of SAS – SAS/Stat, Base, Graph, etc Highly

scalable Quite complete, but expensive

64

Softwares/SAS EM

65

Softwares

SPSS/Clementine One of the first specific data mining software – Less scalable than SAS – Less expensive

Same look and feel as SAS/EM – Palette with icons – Drag & drop of models – Data flux

66

Softwares/SPSS

67

Softwares StatSoft/DataMiner Released

recently Relies on Statistica Same look and feel

68

Softwares/StatSoft

69

Softwares

Insightful Miner Relies on S-Plus Same look and feel

70

Softwares/Insightful

71

Softwares

IBM/DB2 Intelligent Miner – One of the first on the market – Quite impressive for data visualisation

KXEN – A newcomer – Quite controversial – A bit “Black Box” 72

Softwares

Of course, you do not need a specific data mining product in order to do data mining Statistical softwares integrate data mining techniques as well – – – – –

S-Plus R SAS/JMP Stata Etc… 73

Open source softwares

WEKA – www.cs.waikato.ac.nz/~ml/weka/ – Data mining techniques implemented in Java

Pentaho (integrates WEKA) – www.pentaho.com – BI open source suite

R – www.r-project.org – Statistical software including data mining techniques

Orange – http://www.ailab.si/orange – Data mining tool in Python

Rapid Miner – http://rapid-i.com

74

Marco Saerens (UCL), with Christine Decaestecker (ULB)

Table of contents (1/6) Feature

reduction: a short review of subspace projection methods (15%) – Feature selection and extraction – Principal components analysis – Multidimensional scaling – Canonical correlation analysis – Correspondence analysis – Multiple correspondence analysis – Discriminant analysis

2

Table of contents (2/6)

Standard clustering methods (15%) – – – – – – – – –

General introduction K-means Fuzzy k-means Gaussian mixtures Hierarchical clustering The kernel trick for clustering: kernel k-means Spectral clustering Comparing and assessing clustering solutions A case study

3

Table of contents (3/6)

Supervised classification (15%) – General introduction – Comparing and assessing classification models + bias/variance decomposition – Nearest neighbors classifiers – Naive Bayes classifier – Logistic regression – Multilayer neural networks – Support vector machines and the kernel trick for classification – Combination of classifiers (ensemble methods) – Estimation of a posteriori probabilities 4 – Change of a priori probabilities

Table of contents (4/6) Classification

and modelling of sequences (15%) – General introduction – Dynamic programming – Edit-distances – Dynamic time warping – Modelling by Markov models – Modelling by hidden Markov models 5

Table of contents (5/6) Decision

analysis (25%)

– Individual and static decisions based on data: • Bayes decision theory for supervised classification

– Individual and static decisions based on the knowledge of the probability distribution of the variables: • Bayesian networks and decision graphs

– Individual and static decisions based on preferences modelling: • Multicriterion decision methods (Promethee)

6

Table of contents (5/6) Decision

analysis (25%)

– Individual and dynamic decisions: • Markov decision processes and reinforcement learning

– Multiperson and static decisions: • Theory of voting

– Multiperson, non-cooperative dynamic decisions: • Two-person competitive games 7

Table of contents (6/6)

Information retrieval and web mining techniques (15%) – General introduction – Content-based information retrieval • Probabilistic methods • Vector-space methods

– Web link analysis • PageRank • Kleinberg’s hubs and authorities model

– Collaborative recommendation methods • Standard nearest neighbours rule • Latent semantic models • ItemRank

8

General introduction

9

Slides references

Many slides and figures have been adapted from the slides associated to the following books: – Alpaydin (2004), Introduction to machine learning. The MIT Press. – Duda, Hart & Stork (2000), Pattern classification, John Wiley & Sons. – Han & Kamber (2006), Data mining: concepts and techniques, 2nd ed. Morgan Kaufmann. – Tan, Steinbach & Kumar (2006), Introduction to data mining. Addison Wesley.

As well as from Wikipedia, the free online encyclopedia I really thank these authors for their contribution to machine learning and data mining teaching

10

Machine learning and data mining Interplay

of these two related fields:

– Data mining – Machine learning

11

Why “Machine learning” ?

Machine learning is programming computers to optimize a performance criterion using example data or past experience – There is no need to “learn” to calculate payroll

Learning is used when: – Human expertise does not exist (navigating on Mars), – Humans are unable to explain their expertise accurately (speech recognition) – Solution changes in time (routing on a computer network) – Solution needs to be adapted to particular cases (user biometrics)

12

What we talk about when we talk about “Learning”

Learning general models from a data of particular examples – Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce

Example in retail: Customer transactions to consumer behavior: People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)

Build a model that is a good and useful approximation to the data

13

Data mining is related to machine learning

Related to knowledge extraction from data – Retail: Market basket analysis, Customer relationship management (CRM) – Finance: Credit scoring, fraud detection – Manufacturing: Optimization, troubleshooting – Medicine: Medical diagnosis – Telecommunications: Quality of service optimization – Bioinformatics: Motifs, alignment – Web mining: Search engines

In order to take well-founded decisions

14

What is machine learning?

Optimize a performance criterion using example data or past experience. – Role of Statistics: Inference from a sample – Role of Engineering: • Efficient extraction of features from images, camera, measurement devices • Finding the best strategy – similar to control theory

– Role of Computer science: Efficient algorithms to • Solve the optimization problem • Representing and evaluating the model for inference 15

Brief sketch of some applications Association

rules Supervised Learning – Classification – Regression Clustering Reinforcement

Learning Collaborative recommendation Etc...

16

Learning associations Basket

analysis: P(Y | X) probability that somebody who buys X also buys Y where X and Y are products/services. Example: P(chips | beer) = 0.7

17

Classification

Example: Credit scoring Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk 18

Classification: Applications

Aka Pattern recognition Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style Character recognition: Different handwriting styles Speech recognition: Temporal dependency – Use of a dictionary or the syntax of the language – Sensor fusion: Combine multiple modalities; eg, visual (lip image) and acoustic for speech

Medical diagnosis: From symptoms to illnesses

19

Classification: Face recognition Training examples of a person

Test images

AT&T Laboratories, Cambridge UK http://www.uk.research.att.com/facedatabase.html

20

Regression

Example: Price of a used car y = wx + w0

x: car features y: price y = g(x, w): model w: model parameters

21

Unsupervised learning and clustering Learning

“what normally happens” No output (= unsupervised) Clustering: Grouping similar instances Example applications – Customer segmentation in CRM – Image compression: Color quantization – Bioinformatics: Learning motifs

22

Reinforcement learning Learning

a policy: A sequence of

outputs – No supervised output but delayed reward Credit

assignment problem Game playing Robot in a maze Multiple agents, partial observability, ... 23

Introduction to data mining

24

Why data mining?

The Explosive Growth of Data: from terabytes to petabytes – Data collection and data availability • Automated data collection tools, database systems, Web, computerized society

– Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: Remote sensing, bioinformatics, scientific simulation, … • Society and everyone: news, digital cameras,

We are drowning in data, but starving for knowledge!25

Evolution of database technology

1960s: – Data collection, database creation, IMS and network DBMS

1970s: – Relational data model, relational DBMS implementation

1980s: – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) – Application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s: – Data mining, data warehousing, multimedia databases, and Web databases

2000s – Stream data management and mining – Data mining and its applications – Web technology (XML, data integration) and global information systems

26

What Is Data Mining?

Data mining (knowledge discovery from data) – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from large amount of data – Data mining: a misnomer?

Alternative names – Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: Is everything “data mining”? – No: Simple search and query processing – (Deductive) expert systems 27

Why data mining?– potential applications

Data analysis and decision support – Market analysis and management • Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation

– Risk analysis and management • Forecasting, customer retention, improved underwriting, quality control, competitive analysis

– Fraud detection and detection of unusual patterns (outliers)

Other Applications – Text mining (news group, email, documents) – Web mining – Stream data mining – Bioinformatics and bio-data analysis

28

Knowledge discovery process – Data mining—core of knowledge discovery process

Performances Evaluation

Data Mining Task-relevant Data Data Warehouse

Selection

Data Cleaning Data Integration Databases

29

KDD methodology CRISP-DM (1999) Cross-Industry Standard Process Model

30

KDD methodology Business Knowledge

Specification

Extraction

Audit

Transformation

Modelling

Evaluation

Knowledge Deployment Presentation

-Map issues to data, gap analysis -Model data design (Matched modelling techinques, Taregt definition -Sampling: data history, time period, volume that models can be built and used!) -Extraction/transformation specification -Data quality audit -Data exploration (understanding, first insight for variable selection) -Data Treatment (missing values, outliers, transformation spec…) Data consolidation, aggregations suitable format for modelling Construct and evaluate as-simple-as-possible models that render the highest amount of explanatory power Evaluate the model and its impact on your business, an ROI analysis

Data Preparation can take 60 – 80% of the time!

-Model implementation31 -Real time repeatable applications

How should we act to derive value?

KDD process How do we evaluate what we find?

How do we extract appropriate data?

How ought we deliver discovered information?

Interpretation/Evaluation

Data Mining

Knowledge

Transformation Patterns Preprocessing

How do we specify what we need?

Selection

Preprocess ed Data

Transform ed Data

Target Data Data

32

KDD process: several key steps

Learning the application domain – relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing

Data reduction and transformation – Find useful features, dimensionality/variable reduction, invariant representation

Choosing functions of data mining – summarization, classification, regression, association, clustering

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Performance evaluation and knowledge presentation – visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge

33

Data mining and business intelligence Increasing potential to support business decisions

End User

Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical exploration, Querying, and Reporting

Business Analyst

Data Analyst

Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems

DBA 34

Confluence of multiple disciplines Database technology

Machine learning Pattern recognition

Statistics

Data Mining

Algorithmics

Visualization

Other disciplines

35

Data mining vs statistics Roughly

speaking, data mining is statistical analysis of data performed by computer science people In order to be able to support a wide variety of fast-growing, innovative, computer science applications

36

Why not traditional data analysis?

Tremendous amount of data – Algorithms must be highly scalable to handle such as terabytes of data

High-dimensionality of data – Micro-array may have tens of thousands of dimensions

High complexity of data – Data streams and sensor data – Time-series data, temporal data, sequence data – Structure data, graphs, social networks and multi-linked data – Heterogeneous databases and legacy databases – Spatial, spatiotemporal, multimedia, text and web data – Software programs, virtual worlds

New and sophisticated applications – Small gap between development of techniques and embedding in software applications

37

Multi-dimensional view of data mining

Data to be mined – Relational, data warehouse, transactional, stream, objectoriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW

Knowledge to be mined – Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. – Multiple/integrated functions and mining at multiple levels

Techniques utilized – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc

Applications tackled – Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc

38

Data mining: classification schemes

General functionality – Descriptive data mining – Predictive data mining

Different views lead to different classifications – Data view: kinds of data to be mined – Knowledge view: kinds of knowledge to be discovered – Method view: kinds of techniques utilized – Application view: kinds of applications adapted

39

Data mining: what kinds of data?

Database-oriented data sets and applications – Relational database, data warehouse, transactional database

Advanced data sets and advanced applications – Data streams and sensor data – Time-series data, temporal data, sequence data (incl. biosequences) – Structured data, graphs, social networks and multi-linked data – Object-relational databases – Heterogeneous databases and legacy databases – Spatial data and spatiotemporal data – Multimedia database – Text databases – The World-Wide Web – Virtual worlds

40

Data mining functionalities

Multidimensional concept description: Characterization and discrimination – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

Frequent patterns, association, correlation vs. causality – Diaper Beer [0.5%, 75%] (Correlation or causality?)

Classification and prediction – Construct models (functions) that describe and distinguish classes or concepts for future prediction • E.g., classify countries based on (climate), or classify cars based on (gas mileage)

– Predict some unknown or missing values

41

Data Mining Functionalities (2)

Cluster analysis – Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns – Maximizing intra-class similarity & minimizing interclass similarity

Outlier analysis – Outlier: Data object that does not comply with the general behavior of the data – Noise or exception? Useful in fraud detection, rare events analysis

Trend and evolution analysis – Trend and deviation: e.g., regression analysis – Sequential pattern mining: e.g., digital camera large SD memory – Periodicity analysis – Similarity-based analysis

Etc

42

Are the “discovered” patterns interesting?

Data mining may generate thousands of patterns: Not all of them are interesting – Suggested approach: Human-centered, query-based, focused mining

Interestingness measures – A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm

Objective vs. subjective interestingness measures – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. – Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.

43

Some other pattern mining issues

Precise patterns vs. approximate patterns – Association and correlation mining: possible find sets of precise patterns • But approximate patterns can be more compact and sufficient • How to find high quality approximate patterns??

– Gene sequence mining: approximate patterns are inherent • How to derive efficient approximate pattern mining algorithms??

Constrained vs. non-constrained patterns – Why constraint-based mining? – What are the possible kinds of constraints? How to push constraints into the mining process? 44

Architecture: typical data mining system Graphical User Interface Pattern Evaluation Knowle dgeBase

Data Mining Engine Database or Data Warehouse Server Data cleaning, transformation, integration, and selection Database

Data warehouse

World-wide web

Other info repositories

45

Some issues in data mining

Mining methodology – Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web – Performance: efficiency, effectiveness, and scalability – Pattern evaluation: the interestingness problem – Incorporation of background knowledge – Handling noise and incomplete data – Parallel, distributed and incremental mining methods – Integration of the discovered knowledge with existing one: knowledge fusion

User interaction – Data mining query languages and ad-hoc mining – Expression and visualization of data mining results – Interactive mining of knowledge at multiple levels of abstraction

Applications and social impacts – Domain-specific data mining & invisible data mining – Protection of data security, integrity, and privacy

46

Summary

Data mining: Discovering interesting patterns from large amounts of data

A natural evolution of database technology + statistics, in great demand, with wide applications

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

Mining can be performed in a variety of information repositories

Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

Data mining systems and architectures

Major issues in data mining 47

Some practical information

48

Conferences

International conferences KDD,

PKDD Knowledge Discovery in Databases KDD is organised by ACM

50

International conferences ICDM IEEE

International Conference on Data Mining IEEE Computer Society

51

International conferences Closely

related to « Machine Learning » ICML, ECML International Conference on Machine Learning

52

International conferences Closely

related to « pattern recognition » (from a engineering perspective) ICPR International Conference on Pattern Recognition Organised by IAPR 53

International conferences Closely

related to « neural information processing » NIPS Neural Information Processing Systems Conference

54

Some periodicals

Periodicals

Researchers of the data mining community typically publish their work in – – – – – – –

ACM Transactions on Knowledge Discovery form Data Machine Learning Journal of Machine Learning Research Neural Computation IEEE Transactions on Neural Networks Pattern Recognition IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Transactions on Knowledge and Data Engineering – Etc…

56

Some books

Books

Hastie, Tibshirani & Friedman (2001) “The elements of statistical learning”.

Mitchell (1997) “Machine learning”.

Webb (2002) “Statistical pattern recognition, 2nd ed”. 58

Books

Bishop (1995) “Neural networks for pattern recognition”.

Theodoridis & Koutroumbas (2005) “Pattern recognition, 3nd ed”

Cornuéjols & Miclet (2002) “Apprentissage artificiel”. 59

Books

Duda, Hart & Stork (2001) “Pattern classification”.

Ripley (1996) “Pattern recognition and neural networks”.

Bishop (2006) “Pattern recognition and machine learning”.

60

Books

Alpaydin (2004) “Introduction to machine learning”.

Han & Kamber (2006) “Data mining. Concepts and techniques, 2nd ed”.

Tan, Steinbach & Kumar (2005) “Introduction to data mining”.

61

More books… Alpaydin (2004), “Introduction to machine learning”. MIT Press. Bardos (2001), "Analyse discriminante. Application au risque et scoring financier. Dunod. Bishop (1995), "Neural networks for pattern recognition". Clarendon Press. Bishop (2006), “Pattern recognition and machine learning”. Springer-Verlag. Bouroche & Saporta (1983), "L'analyse des données". Que Sais-je. Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles. Duda, Hart & Stork (2001), "Pattern classification, 2nd ed". John Wiley & Sons. Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall. Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press. Han & Kamber (2006), "Data mining: Concepts and techniques, 2nd ed". Morgan Kaufmann. Hand (1981), "Discrimination and classification". John Wiley & Sons. Hardle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag. Disponible à http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag. Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed". Prentice-Hall. Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod. Mitchell (1997), "Machine learning". McGraw-Hill. Naim, Wuillemin, Leray, Pourret & Becker (2004), “Réseaux bayesiens”. Editions Eyrolles. Nilsson (1998), “Artificial intelligence: A new synthesis”. Morgan Kaufmann. Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press. Rosner (1995), "Fundamentals of biostatistics, 4th ed".Wadsworth Publishing Company. Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip. Tan, Steinbach & Kumar (2005), “Introduction to data mining”. Addison Wesley. Theodoridis & Koutroumbas (2006), "Pattern recognition, 3nd ed". Academic Press. Therrien (1989), "Decision, estimation and classification". Wiley & Sons. Venables & Ripley (2002), "Modern applied statistics with S. Springer-Verlag. Wasserman (2004), “All of statistics”. Springer. Webb (2002), "Statistical pattern recognition, 2nd ed". John Wiley and Sons.

62

Some softwares

Softwares SAS/Enterprise

Miner You benefit from the full power of SAS – SAS/Stat, Base, Graph, etc Highly

scalable Quite complete, but expensive

64

Softwares/SAS EM

65

Softwares

SPSS/Clementine One of the first specific data mining software – Less scalable than SAS – Less expensive

Same look and feel as SAS/EM – Palette with icons – Drag & drop of models – Data flux

66

Softwares/SPSS

67

Softwares StatSoft/DataMiner Released

recently Relies on Statistica Same look and feel

68

Softwares/StatSoft

69

Softwares

Insightful Miner Relies on S-Plus Same look and feel

70

Softwares/Insightful

71

Softwares

IBM/DB2 Intelligent Miner – One of the first on the market – Quite impressive for data visualisation

KXEN – A newcomer – Quite controversial – A bit “Black Box” 72

Softwares

Of course, you do not need a specific data mining product in order to do data mining Statistical softwares integrate data mining techniques as well – – – – –

S-Plus R SAS/JMP Stata Etc… 73

Open source softwares

WEKA – www.cs.waikato.ac.nz/~ml/weka/ – Data mining techniques implemented in Java

Pentaho (integrates WEKA) – www.pentaho.com – BI open source suite

R – www.r-project.org – Statistical software including data mining techniques

Orange – http://www.ailab.si/orange – Data mining tool in Python

Rapid Miner – http://rapid-i.com

74