NCI Courses

Module Code:	H9DAI
Long Title	Data Analytics for Artificial Intelligence
Title	Data Analytics for Artificial Intelligence
Module Level:	LEVEL 9
EQF Level:	7
EHEA Level:	Second Cycle

Credits:	10

Module Coordinator:	Rejwanul Haque

Module Author:	Shauni Hegarty

Departments:	School of Computing

Specifications of the qualifications and experience required of staff	PhD/Master’s degree in a computing or cognate discipline. May have industry experience also.

Learning Outcomes
On successful completion of this module the learner will be able to:
#	Learning Outcome Description
LO1	Retrieve, extract, manipulate, synthesise, explore, and visualise data in preparation for data analysis and machine learning.
LO2	Demonstrate expert knowledge of the theory, concepts and methods associated with the analysis of data using numerical and statistical techniques to assist in decision-making.
LO3	Use fundamental machine learning concepts and techniques to build and evaluate machine learning models on various problem domains.
LO4	Evaluate and employ graphical tools for building comprehensive analytics processes and dashboards.
LO5	Critically analyse, compare, summarise, and present results to support decision making and address requirements in real-world problems.

Dependencies
Module Recommendations
This is prior learning (or a practical skill) that is required before enrolment on this module. While the prior learning is expressed as named NCI module(s), it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
No recommendations listed
Co-requisite Modules
No Co-requisite modules listed

Entry requirements

Applicants are required to hold a minimum of a Level 8 honours qualification (2.2 or higher), or equivalent on the National Qualifications Framework, in either a STEM discipline (e.g., Information Management Systems, Information Technologies, Computer Science, Computer Engineering) or a Business discipline (e.g., Business Information Systems, Business Administration, Economics), and to have a minimum of three years of relevant work experience in industry, ideally but not necessarily in management. Previous numerical and computer proficiencies should be part of their work experience or formal training. Graduates from disciplines which do not have technical or mathematical problem-solving skills embedded in their programme will need to demonstrate such skills in addition to their Level 8 programme qualifications (through certifications, additional qualifications, certified experience and assessment tests). All applicants for the programme must provide evidence of prior Mathematics and Computing module experience (e.g., via academic transcripts or recognised certification), demonstrated by one mathematics/statistics module and one computing module; alternatively, the statement of purpose must specify numerical and computing work experience.

NCI also operates a prior experiential learning policy where graduates with lower, or no formal qualifications, currently working in a relevant field, may be considered for the programme.

Applicants must also have their own laptop meeting the minimum required specification, which will be communicated to each applicant through both the admissions and marketing departments.

Module Content & Assessment

Indicative Content
Introduction to data analytics, nature of data: Data analysis process/spectrum (descriptive, diagnostic, predictive, prescriptive). Measures of central tendency (mode, median, mean). Measures of dispersion (range, variance, standard deviation). Data mining methodologies (e.g., CRISP-DM, KDD).
Data Collection and Data Manipulation: Sources of data, data repositories, gathering and importing data. Different file formats, relational and non-relational databases, APIs, web scraping. Selecting columns and rows, grouping, aggregation, filtering, joining datasets, removing duplicates, string manipulation, regular expressions, data cleaning.
Data Preprocessing and Transformation Strategies: Discretisation and binning, feature normalisation, filtering outliers, handling missing values, handling class imbalance, handling categorical data, scaling, feature selection techniques.
Data Presentation (Visualisations) - Reporting: Communicating and sharing data analysis findings. Understanding trends, outliers, and patterns in data through appropriate visualisations such as scatter plots, histograms, boxplots, pie charts, bar charts, overlaid bar charts, clustered bar charts, line charts, etc.
Statistical Analysis - Hypothesis & Inference: Statistical analysis, different kinds of hypothesis tests, standard errors, hypothesis testing, parametric tests (e.g., t-test, ANOVA, regression), non-parametric tests (e.g., chi-square tests), correlation, z-statistic, distributions, sample size, confidence intervals, significance levels, p-values, effect size.
Dimension Reduction Methods: Need for dimension reduction, Principal Component Analysis, Singular Value Decomposition, eigenvalue criterion, factor analysis, backward feature elimination, cross-correlation.
Prediction (Regression): Simple linear regression, p-value, F-statistic, residual standard error, multiple linear regression, logistic regression, forecasting.
Classification: Binary classification, multi-class classification, multi-label classification, k-Nearest Neighbour, choosing k, decision trees, random forests, SVM, logistic regression.
Clustering: What is clustering, distances (e.g., Euclidean, Manhattan, Minkowski). Normalising distances. Hierarchical clustering methods, K-Means, K-Means++, distortion cost function, choosing the value of k, density-based clustering (DBSCAN).
Modelling, Evaluation: Splitting a dataset, training, testing and validation, cross-validation. Resampling methods. Confusion matrix, accuracy, precision, recall, F1 score, ROC curve. Sample size. Sampling methods (e.g., random, cluster).
Time Series Analysis: Smoothing data, analysing time series, curve fitting, seasonality. Moving averages, ARIMA (seasonal, non-seasonal).
Content Analysis: Document classification, entity extraction, tokenizing, filtering of tokens, topic modelling, language modelling, term frequencies, document frequencies. Bayesian classification. Handling unstructured data, stemming, syntax and semantics, word-embedding vectors.

Assessment Breakdown	%
Coursework	100.00%

Assessments

Full Time

Coursework

Assessment Type:	Formative Assessment	% of total:	Non-Marked
Assessment Date:	n/a	Outcome addressed:	1,2,3,4,5
Non-Marked:	Yes
Assessment Description: Formative assessment will be provided on the in-class individual or group activities. Feedback will be provided in written or oral format, or online through Moodle. In addition, in-class discussions will be undertaken as part of the practical approach to learning.

Assessment Type:	Continuous Assessment	% of total:	30
Assessment Date:	n/a	Outcome addressed:	1,2,4
Non-Marked:	No
Assessment Description: Assessment will be through an in-class, open-book test that will require learners to retrieve, extract, manipulate and present data. Learners will also be asked to make statistical inferences and draw conclusions about a population.

Assessment Type:	Project	% of total:	70
Assessment Date:	n/a	Outcome addressed:	1,2,3,4,5
Non-Marked:	No
Assessment Description: The terminal assessment will consist of a project that will evaluate all learning outcomes. Learners will have to identify and carry out a series of analytic tasks upon a large dataset (or a collection of datasets that are related to or complement each other), utilising appropriate tools and techniques for data extraction, processing, analysis and critical evaluation. The final submission will consist of a report in the style of an academic research paper as well as the implemented data analytics artefact. Students are also expected to present and communicate the results and insights of their study.

No End of Module Assessment

No Workplace Assessment

Reassessment Requirement
Repeat examination
Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.

NCIRL reserves the right to alter the nature and timings of assessment

Module Workload

Module Target Workload Hours 0 Hours

Workload: Full Time
Workload Type	Workload Description	Hours	Frequency	Average Weekly Learner Workload
Lecture	Lectures	24	Per Semester	2.00
Independent Learning	Independent Learning	202	Per Semester	16.83
Tutorial	Tutorials/Practicals	24	Per Semester	2.00
Total Weekly Contact Hours				4.00

Module Resources

Recommended Book Resources
McClave, J. T. & Sincich, T. (2017). Statistics (13th ed.). Pearson. [ISBN: 978-0134080215].
Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists (2nd ed.). O’Reilly Media. [ISBN: 978-1492072942].
Han, J., Pei, J., & Kamber, M. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann. [ISBN: 978-0123814791].
Alpaydin, E. (2020). Introduction to Machine Learning. The MIT Press. [ISBN: 978-0262043793].
Supplementary Book Resources
Shalev-Shwartz, S. & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. [ISBN: 978-1107057135].
Runkler, T. A. (2012). Data Analytics: Models and Algorithms for Intelligent Data Analysis. Springer. [ISBN: 978-3834825889].
Davies, A. (2017). Understanding Statistics: An Introduction. Cato Institute. [ISBN: 978-1944424350].
Kranzler, J. H. (2017). Statistics for the Terrified (6th ed.). Rowman & Littlefield Publishers. [ISBN: 978-1538100288].
Kelleher, J. D., MacNamee, B., & D’Arcy, A. (2020). Fundamentals of Machine Learning for Predictive Data Analytics (2nd ed.). The MIT Press. [ISBN: 978-0262044691].
Marz, N. & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications. [ISBN: 978-1617290343].
Hofmann, M. & Klinkenberg, R. (2013). RapidMiner: Data Mining Use Cases and Business Analytics Applications. CRC Press. [ISBN: 978-1482205497].
This module does not have any article/paper resources
Other Resources
DataCamp, Learn R, Python & Data Science Online (https://www.datacamp.com/).
Machine Learning Stanford (https://www.coursera.org/course/ml).
UCI Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html).
DataCamp (www.datacamp.com).
RapidMiner (https://rapidminer.com/).
Azure Machine Learning (https://azure.microsoft.com/en-in/services/machine-learning/).
Kaggle Competitions (https://www.kaggle.com/competitions).
MySQL Tutorial (https://www.mysqltutorial.org).
MongoDB Tutorial (https://www.mongodb.com/nosql-explained).
JSON (https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON).


http://courses.ncirl.ie/

National College of Ireland