Module Code: H9DAI
Long Title: Data Analytics for Artificial Intelligence
Title: Data Analytics for Artificial Intelligence
Module Level: LEVEL 9
EQF Level: 7
EHEA Level: Second Cycle
Module Coordinator: Rejwanul Haque
Module Author: Shauni Hegarty
Departments: School of Computing

Specifications of the qualifications and experience required of staff 
PhD/Master’s degree in a computing or cognate discipline. May have industry experience also.

Learning Outcomes 
On successful completion of this module the learner will be able to: 
# 
Learning Outcome Description 
LO1 
Retrieve, extract, manipulate, synthesise, explore, and visualise data in preparation for data analysis and machine learning.
LO2 
Demonstrate expert knowledge of the theory, concepts and methods associated with the analysis of data using numerical and statistical techniques to assist in decision-making.
LO3 
Use fundamental machine learning concepts and techniques to build and evaluate machine learning models on various problem domains. 
LO4 
Evaluate and employ graphical tools for building comprehensive analytics processes and dashboards. 
LO5 
Critically analyse, compare, summarise, and present results to support decision-making and address requirements in real-world problems.
Dependencies 
Module Recommendations
This is prior learning (or a practical skill) that is required before enrolment on this module. While the prior learning is expressed as named NCI module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).

No recommendations listed 
Corequisite Modules

No Corequisite modules listed 
Entry requirements 
Applicants are required to hold a minimum of a Level 8 honours qualification (2.2 or higher) or equivalent on the National Qualifications Framework in either a STEM discipline (e.g., Information Management Systems, Information Technologies, Computer Science, Computer Engineering) or a Business discipline (e.g., Business Information Systems, Business Administration, Economics), and a minimum of three years of relevant work experience in industry, ideally but not necessarily in management. Previous numerical and computer proficiencies should be part of their work experience or formal training. Graduates from disciplines which do not have technical or mathematical problem-solving skills embedded in their programme will need to demonstrate technical or mathematical problem-solving skills in addition to their Level 8 programme qualifications (certifications, additional qualifications, certified experience and assessment tests). All applicants for the programme must provide evidence of prior Mathematics and Computing module experience (e.g., via academic transcripts or recognised certification), as demonstrated by one mathematics/statistics module and one computing module; otherwise, the statement of purpose must specify numerical and computing work experience.
NCI also operates a prior experiential learning policy under which applicants with lower or no formal qualifications who are currently working in a relevant field may be considered for the programme.
Applicants must also have their own laptop that meets the minimum required specification, which will be communicated to each applicant through both the admissions and marketing departments.

Module Content & Assessment
Indicative Content 
Introduction to data analytics, nature of data
Introduction to data analytics, nature of data, data analysis process/spectrum (descriptive, diagnostic, predictive, prescriptive).
Measures of central tendency (mode, median, mean). Measures of dispersion (range, variance, standard deviation).
Data mining methodologies (e.g., CRISP-DM, KDD).
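
As an illustration of the descriptive measures listed above, the following is a minimal sketch using only Python's standard `statistics` module. The sample values and the variable names (`scores`, `central_tendency`, `dispersion`) are hypothetical and chosen for illustration; the sketch assumes population (not sample) variance and standard deviation.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
central_tendency = {
    "mode": statistics.mode(scores),      # most frequent value
    "median": statistics.median(scores),  # middle value of the sorted data
    "mean": statistics.mean(scores),      # arithmetic average
}

# Measures of dispersion (population versions are an assumption here).
dispersion = {
    "range": max(scores) - min(scores),
    "variance": statistics.pvariance(scores),
    "std_dev": statistics.pstdev(scores),
}

print(central_tendency)
print(dispersion)
```

Swapping `pvariance`/`pstdev` for `variance`/`stdev` would give the sample versions, which divide by n - 1 instead of n.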

Data Collection and Data Manipulation
Sources of data, data repositories, gathering and importing data.
File formats, relational and non-relational databases, APIs, web scraping.
Selecting columns and rows, grouping, aggregation, filtering, joining datasets, removing duplicates, string manipulation, regular expressions, data cleaning.

Data Preprocessing and Transformation Strategies
Discretisation and binning, feature normalisation, filtering outliers, handling missing values, handling class imbalance, handling categorical data, scaling, feature selection techniques.

Data Presentation (Visualisations) Reporting
Communicating and sharing data analysis findings. Understanding trends, outliers, and patterns in data through appropriate visualisations such as scatter plots, histograms, boxplots, pie charts, bar charts, overlaid bar charts, clustered bar charts, line charts, etc.

Statistical Analysis Hypothesis & Inference
Statistical analysis, different kinds of hypothesis tests, standard errors, hypothesis testing, parametric tests (e.g., t-test, ANOVA, regression), non-parametric tests (e.g., chi-square tests), correlation, Z-statistic, distributions, sample size, confidence intervals, significance levels, p-values, effect size.

Dimension Reduction methods
Need for dimension reduction, Principal Component Analysis, Singular Value Decomposition, eigenvalue criterion, factor analysis, backward feature elimination, cross-correlation.

Prediction (Regression)
Simple Linear Regression, p-value, F-statistic, residual standard error, Multiple Linear Regression, Logistic Regression, Forecasting.

Classification
Binary Classification, Multi-Class Classification, Multi-Label Classification, k-Nearest Neighbour, choosing k, Decision Trees, Random Forests, SVM, Logistic Regression.

Clustering
What is clustering, distances (e.g., Euclidean, Manhattan, Minkowski).
Normalising distances
Hierarchical clustering methods, K-Means, K-Means++, distortion cost function, choosing the value of k, density-based clustering (DBSCAN).
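
The distance measures named above can be sketched as follows. This is an illustrative example only: the function names (`minkowski`, `manhattan`, `euclidean`) and the two sample points are assumptions, not part of the module material. It relies on the fact that Manhattan and Euclidean distance are the Minkowski distance of order 1 and 2 respectively.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


# Hypothetical points, chosen only for illustration.
point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

print(manhattan(point_a, point_b))
print(euclidean(point_a, point_b))
print(minkowski(point_a, point_b, 3))
```

Because these distances are sensitive to the scale of each feature, features are commonly rescaled to a comparable range before clustering.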

Modelling, Evaluation
Splitting a dataset, training, testing and validation, cross-validation.
Resampling methods.
Confusion matrix, accuracy, precision, recall, F1 score, ROC curve. Sample size. Sampling methods (e.g., random, cluster).
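
The relationship between the confusion matrix and the evaluation metrics listed above can be sketched in plain Python. The helper name `binary_metrics`, the label encoding (1 = positive, 0 = negative) and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, chosen only for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```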

Time series Analysis
Smoothing data, analysing time series, curve fitting, seasonality. Moving averages, ARIMA (seasonal, non-seasonal).

Content analysis
Document classification, entity extraction, tokenizing, filtering of tokens, topic modelling, language modelling, term frequencies, document frequencies.
Bayesian classification.
Handling unstructured data, stemming, syntax and semantics, word-embedding vectors.

Assessment Breakdown  % 
Coursework  100.00% 
Assessments: Full Time
Coursework 
Assessment Type: Formative Assessment
% of total: Non-Marked
Assessment Date: n/a
Outcome addressed: 1,2,3,4,5
Non-Marked: Yes
Assessment Description: Formative assessment will be provided on the in-class individual or group activities. Feedback will be provided in written or oral format, or online through Moodle. In addition, in-class discussions will be undertaken as part of the practical approach to learning.

Assessment Type: Continuous Assessment
% of total: 30
Assessment Date: n/a
Outcome addressed: 1,2,4
Non-Marked: No
Assessment Description: Assessment will be through an in-class, open-book test that will require learners to retrieve, extract, manipulate and present data. Learners will also be asked to make statistical inferences and draw conclusions about a population.

Assessment Type: Project
% of total: 70
Assessment Date: n/a
Outcome addressed: 1,2,3,4,5
Non-Marked: No
Assessment Description: The terminal assessment will consist of a project that will evaluate all learning outcomes. Learners will have to identify and carry out a series of analytic tasks upon a large dataset (or a collection of datasets that are related or complement each other), utilising appropriate tools and techniques for data extraction, processing, analysis and critical evaluation. The final submission will consist of a report in the style of an academic research paper as well as the implemented data analytics artefact. Students are also expected to present and communicate the results/insights of their study.

No End of Module Assessment 
Reassessment Requirement 
Repeat examination
Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.

NCIRL reserves the right to alter the nature and timings of assessment
Module Workload
Module Target Workload Hours 0 Hours 
Workload: Full Time 
Workload Type | Workload Description | Hours | Frequency | Average Weekly Learner Workload
Lecture | Lectures | 24 | Per Semester | 2.00
Independent Learning | Independent Learning | 202 | Per Semester | 16.83
Tutorial | Tutorials/Practicals | 24 | Per Semester | 2.00
Total Weekly Contact Hours: 4.00
Module Resources
Recommended Book Resources 


McClave, J. T. & Sincich, T. (2017). Statistics (13th ed.). Pearson. [ISBN: 9780134080215].

Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists (2nd ed.). O’Reilly Media. [ISBN: 9781492072942].

Han, J., Pei, J., & Kamber, M. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann. [ISBN: 9780123814791].

Alpaydin, E. (2020). Introduction to Machine Learning. The MIT Press. [ISBN: 9780262043793].
 Supplementary Book Resources 


Shalev-Shwartz, S. & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. [ISBN: 9781107057135].

Runkler, T. A. (2012). Data Analytics: Models and Algorithms for Intelligent Data Analysis. Springer. [ISBN: 9783834825889].

Davies, A. (2017). Understanding Statistics: An Introduction. Cato Institute. [ISBN: 9781944424350].

Kranzler, J. H. (2017). Statistics for the Terrified (6th ed.). Rowman & Littlefield Publishers. [ISBN: 9781538100288].

Kelleher, J. D., MacNamee, B., & D’Arcy, A. (2020). Fundamentals of Machine Learning for Predictive Data Analytics (2nd ed.). The MIT Press. [ISBN: 9780262044691].

Marz, N. & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications. [ISBN: 9781617290343].

Hofmann, M. & Klinkenberg, R. (2013). RapidMiner: Data Mining Use Cases and Business Analytics Applications. CRC Press. [ISBN: 9781482205497].
 This module does not have any article/paper resources 

Other Resources 


DataCamp, Learn R, Python & Data Science Online (https://www.datacamp.com/).

Machine Learning, Stanford (https://www.coursera.org/course/ml).

UCI Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html).

DataCamp (www.datacamp.com).

RapidMiner (https://rapidminer.com/).

Azure Machine Learning (https://azure.microsoft.com/en-in/services/machine-learning/).

Kaggle Competitions (https://www.kaggle.com/competitions).

MySQL Tutorial (https://www.mysqltutorial.org).

MongoDB Tutorial (https://www.mongodb.com/nosql-explained).

JSON (https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON).

