NCI Courses - H9DMML1 - Data Mining and Machine Learning I

Module Code:	H9DMML1
Long Title:	Data Mining and Machine Learning I
Title:	Data Mining and Machine Learning I
Module Level:	LEVEL 9
EQF Level:	7
EHEA Level:	Second Cycle

Credits:	5

Module Coordinator:	MICHAEL BRADFORD

Module Author:	Margarete Silva

Departments:	School of Computing

Specifications of the qualifications and experience required of staff	MSc/PhD in a computing or cognate discipline. May have industry experience also.

Learning Outcomes
On successful completion of this module the learner will be able to:
#	Learning Outcome Description
LO1	Critically analyse fundamental data mining and knowledge discovery methodologies in order to assess best practice guidance when applied to data mining problems in specific contexts.
LO2	Extract, transform, explore, and clean data in preparation for data mining and machine learning.
LO3	Build and evaluate data mining and machine learning models on various datasets and problem domains.
LO4	Extract, interpret and evaluate information and knowledge from various datasets.
LO5	Critically review current data mining research and assess research methods applied in the field.

Dependencies
Module Recommendations
This is prior learning (or a practical skill) that is required before enrolment on this module. While the prior learning is expressed as named NCI module(s), it also allows for learning (in another module or modules) that is equivalent to the learning specified in the named module(s).
No recommendations listed
Co-requisite Modules
No Co-requisite modules listed

Entry requirements	A level 8 degree or its equivalent in any discipline

Module Content & Assessment

Indicative Content
Overview of Data Mining and Machine Learning: History and Evolution. Revision of data science methodologies (KDD, CRISP-DM). Data Security. Taxonomy and overview of data mining and machine learning techniques.
General data pre-processing and transformation strategies: Intro to prediction. Identifying and Handling Missing Values. Looking for Outliers. Transformations for Single/Multiple Predictors. Adding/removing predictors. Binning. Feature Selection.
Prediction models evaluation: Data Splitting and Sampling Methods (Holdout, k-fold Cross-Validation, Stratification, etc.). Model Tuning and Overfitting. Determining the best model.
Regression Models: Quantitative Methods of Performance. The Variance/Bias Trade-off. Linear Regression.
Regression Models: Partial Least Squares Regression. K-Nearest Neighbours Regression.
Regression Models: Regression Trees. Model-based Regression Trees.
Regression Models: Rule-based Models. Model Tuning via LASSO, Elastic Net, and similar. Computing Considerations.
Classification Models: Logistic Regression. Linear Discriminant Analysis.
Classification Models: K-Nearest Neighbours. Naïve Bayes.
Classification Models: Decision Trees (e.g., C5.0, Random Forests, etc.).
Unsupervised Machine Learning: Notions of distance and similarity. Euclidean vs. non-Euclidean spaces. Clustering: k-means, k-medoids.
Unsupervised Machine Learning: Clustering for outlier detection. Plotting and understanding clusters. Cluster evaluation measures: Davies-Bouldin index (DBIndex), within-set sum of squared errors (WSSSE), scree plots.
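
As an illustration of the WSSSE cluster evaluation measure named in the indicative content, the following is a minimal Python sketch, not part of the official module material. The toy points, the cluster assignments, and the helper names `wssse` and `centroid` are all hypothetical and chosen only for this example. It computes the sum of squared Euclidean distances from each point to its assigned centroid for k = 1 and k = 2.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(cluster):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def wssse(points, assignments, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# k = 1: every point belongs to a single cluster.
assign_k1 = [0] * len(points)
centroids_k1 = [centroid(points)]

# k = 2: assignments fixed by hand here (assumed, not produced by k-means).
assign_k2 = [0, 0, 0, 1, 1, 1]
centroids_k2 = [centroid(points[:3]), centroid(points[3:])]

wssse_k1 = wssse(points, assign_k1, centroids_k1)
wssse_k2 = wssse(points, assign_k2, centroids_k2)

print(f"k=1 WSSSE: {wssse_k1:.3f}")
print(f"k=2 WSSSE: {wssse_k2:.3f}")
```

Plotting WSSSE against k gives a scree (elbow) plot; a sharp drop followed by a plateau is commonly read as a suggestion for a reasonable number of clusters.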

Assessment Breakdown	%
Coursework	100.00%

Assessments

Full Time

Coursework

Assessment Type:	Formative Assessment	% of total:	Non-Marked
Assessment Date:	n/a	Outcome addressed:	1,2,3,4,5
Non-Marked:	Yes
Assessment Description: Formative assessment will be provided on in-class individual or group activities. Feedback will be provided in written or oral format, or online through Moodle. In addition, in-class discussions will be undertaken as part of the practical approach to learning.

Assessment Type:	Project	% of total:	100
Assessment Date:	n/a	Outcome addressed:	1,2,3,4,5
Non-Marked:	No
Assessment Description: Produce a portfolio of studies that critically compare the performance of different machine learning methods applied to at least 3 related large datasets.

No End of Module Assessment

No Workplace Assessment

Reassessment Requirement
Coursework Only: This module is reassessed solely on the basis of re-submitted coursework. There is no repeat written examination.
Reassessment Description: The repeat strategy for this module is to repeat the project. Learners may build upon previous submissions.

NCIRL reserves the right to alter the nature and timings of assessment

Module Workload

Module Target Workload Hours 0 Hours

Workload: Full Time
Workload Type	Workload Description	Hours	Frequency	Average Weekly Learner Workload
Lecture	Classroom & Demonstrations (hours)	24	Every Week	24.00
Tutorial	Other hours (Practical/Tutorial)	24	Every Week	24.00
Independent Learning	Independent learning (hours)	77	Every Week	77.00
Total Weekly Contact Hours				48.00

Module Resources

Recommended Book Resources
Witten, I. H., Frank, E., Hall, M. A. & Pal, C. J. (2016), Data Mining: Practical machine learning tools and techniques (4th ed), Morgan Kaufmann.
Lantz, B. (2015), Machine learning with R (2nd ed), Packt Publishing Ltd.
Kelleher, J. D., Mac Namee, B. & D'Arcy, A. (2015), Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies, MIT Press.
Supplementary Book Resources
Mueller, A. C. (2016), Introduction to machine learning with Python, O’Reilly.
Hofmann, M. & Klinkenberg, R. (2013), RapidMiner: Data Mining Use Cases and Business Analytics Applications, CRC Press.
Han, J., Pei, J. & Kamber, M. (2011), Data mining: concepts and techniques (3rd ed), Elsevier.
Berthold, M. & Hand, D. J. (2003), Intelligent data analysis: an introduction, Springer Science & Business Media.
This module does not have any article/paper resources
Other Resources
[website], UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/
[website], Kaggle platform for predictive modelling competitions, https://www.kaggle.com/
[website], Datasets for Data Mining and Data Science, http://www.kdnuggets.com/datasets/index.html
[website], Datacamp, http://www.datacamp.com
[website], Bloomberg, https://www.bloomberg.com/europe
[website], Yahoo! Finance, https://uk.finance.yahoo.com
[website], Google Finance, https://www.google.com/finance
[website], Central Statistics Office, http://www.cso.ie
[website], Eurostat, http://ec.europa.eu/eurostat
[website], Data.gov, https://www.data.gov
[website], Amazon Web Services Public Datasets, https://aws.amazon.com/datasets
[website], DataMarket, https://datamarket.com
[website], The Pew Research Centre, http://www.pewresearch.org/data
[website], The Fama-French Data Library, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
[website], Federal Reserve Economic Data (FRED), https://fred.stlouisfed.org
