Module Code: |
H9DAI |
Long Title
|
Data Analytics for Artificial Intelligence
|
Title
|
Data Analytics for Artificial Intelligence
|
Module Level: |
LEVEL 9 |
EQF Level: |
7 |
EHEA Level: |
Second Cycle |
Module Coordinator: |
Rejwanul Haque |
Module Author: |
Shauni Hegarty |
Departments: |
School of Computing
|
Specifications of the qualifications and experience required of staff |
PhD/Master’s degree in a computing or cognate discipline. May have industry experience also.
|
Learning Outcomes |
On successful completion of this module the learner will be able to: |
# |
Learning Outcome Description |
LO1 |
Retrieve, extract, manipulate, synthesise, explore, and visualise data in preparation for data analysis and machine learning. |
LO2 |
Demonstrate expert knowledge of the theory, concepts and methods associated with the analysis of data using numerical and statistical techniques to assist in decision-making. |
LO3 |
Use fundamental machine learning concepts and techniques to build and evaluate machine learning models on various problem domains. |
LO4 |
Evaluate and employ graphical tools for building comprehensive analytics processes and dashboards. |
LO5 |
Critically analyse, compare, summarise, and present results to support decision making and address requirements in real-world problems. |
Dependencies |
Module Recommendations
This is prior learning (or a practical skill) that is required before enrolment on this module. While the prior learning is expressed as named NCI module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
|
No recommendations listed |
Co-requisite Modules
|
No Co-requisite modules listed |
Entry requirements |
Applicants are required to hold a minimum of a Level 8 honours qualification (2.2 or higher) or equivalent on the National Qualifications Framework in either a STEM (e.g., Information Management Systems, Information Technologies, Computer Science, Computer Engineering) or a Business (e.g., Business Information Systems, Business Administration, Economics) discipline, and a minimum of three years of relevant work experience in industry, ideally but not necessarily in management. Previous numerical and computer proficiencies should be part of their work experience or formal training. Graduates from disciplines which do not have technical or mathematical problem-solving skills embedded in their programme will need to demonstrate such skills in addition to their Level 8 programme qualifications (certifications, additional qualifications, certified experience and assessment tests). All applicants for the programme must provide evidence of prior Mathematics and Computing module experience (e.g., via academic transcripts or recognised certification), demonstrated by one mathematics/statistics module and one computing module; otherwise, their statement of purpose must specify numerical and computing work experience.
NCI also operates a prior experiential learning policy where graduates with lower, or no formal qualifications, currently working in a relevant field, may be considered for the programme.
Applicants must also have their own laptop that meets the minimum required specification, which will be communicated to each applicant through both the admissions and marketing departments.
|
Module Content & Assessment
Indicative Content |
Introduction to data analytics, nature of data
Introduction to data analytics, nature of data, data analysis process/spectrum (descriptive, diagnostic, predictive, prescriptive).
Measures of central tendency (mode, median, mean). Measures of dispersion (range, variance, standard deviation).
Data mining methodologies (e.g., CRISP-DM, KDD)
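The measures of central tendency and dispersion listed above can be illustrated with a minimal sketch using Python's standard `statistics` module. The sample values and variable names are hypothetical and are not part of the module specification.

```python
import statistics

# Hypothetical sample: number of support tickets logged on nine days.
tickets = [12, 15, 12, 18, 20, 12, 15, 22, 18]

# Measures of central tendency
mean = statistics.mean(tickets)
median = statistics.median(tickets)
mode = statistics.mode(tickets)

# Measures of dispersion (sample variance and sample standard deviation)
value_range = max(tickets) - min(tickets)
variance = statistics.variance(tickets)
std_dev = statistics.stdev(tickets)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std_dev={std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator; `pvariance` and `pstdev` give the population versions.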
|
Data Collection and Data Manipulation
Sources of data, data repositories, gathering and importing data.
File formats, relational and non-relational databases, APIs, web scraping.
Selecting columns and rows, grouping, aggregation, filtering, joining datasets, removing duplicates, string manipulation, regular expressions, data cleaning.
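Several of the manipulation steps above (string cleaning, a regular expression check, filtering, removing duplicates, grouping and aggregation) can be sketched with the Python standard library alone. The records, field names and helper functions below are hypothetical; in practice a data-frame library would typically be used for the same steps.

```python
import re
from collections import defaultdict

# Hypothetical raw records; the field names are illustrative only.
raw_rows = [
    {"customer": " Alice ", "region": "north", "amount": "120.50"},
    {"customer": "Bob", "region": "South", "amount": "80"},
    {"customer": " Alice ", "region": "north", "amount": "120.50"},  # duplicate
    {"customer": "Carol", "region": "NORTH", "amount": "n/a"},       # invalid amount
    {"customer": "Dan", "region": "south", "amount": "45.25"},
]

# Regular expression accepting plain non-negative decimal numbers.
NUMERIC = re.compile(r"^\d+(\.\d+)?$")


def clean(rows):
    """Trim strings, normalise case, drop invalid amounts and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        customer = row["customer"].strip()
        region = row["region"].strip().lower()
        amount = row["amount"].strip()
        if not NUMERIC.match(amount):
            continue  # filter out rows whose amount is not numeric
        key = (customer, region, amount)
        if key in seen:
            continue  # remove exact duplicates
        seen.add(key)
        cleaned.append(
            {"customer": customer, "region": region, "amount": float(amount)}
        )
    return cleaned


def total_by_region(rows):
    """Group rows by region and sum the amounts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


rows = clean(raw_rows)
totals = total_by_region(rows)
print(totals)
```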
|
Data Preprocessing and Transformation Strategies
Discretisation and binning, feature normalisation, filtering outliers, handling missing values, handling class imbalance, handling categorical data, scaling, feature selection techniques.
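Three of the preprocessing steps above (handling missing values, scaling, and binning) can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are encoded as `None`, median imputation is one of several possible strategies, and the column values and function names are hypothetical.

```python
import statistics

# Hypothetical feature column with missing values encoded as None.
ages = [22, None, 35, 41, None, 58, 29]


def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values to [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


imputed = impute_median(ages)
scaled = min_max_scale(imputed)
bins = equal_width_bins(imputed, 3)
print(imputed, bins)
```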
|
Data Presentation (Visualisations) - Reporting
Communicating and sharing data analysis findings. Understanding trends, outliers, and patterns in data through appropriate visualisations such as scatter plots, histograms, boxplots, pie charts, bar charts, overlaid bar charts, clustered bar charts, line charts, etc.
|
Statistical Analysis - Hypothesis & Inference
Statistical analysis, different kinds of hypothesis tests, standard errors, hypothesis testing, parametric tests (e.g., t-test, ANOVA, regression), non-parametric tests (e.g., chi-square tests), correlation, z-statistic, distributions, sample size, confidence intervals, significance levels, p-values, effect size.
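As one worked instance of the concepts above (standard error, z-statistic, p-value, significance level and confidence interval), the sketch below runs a two-sided one-sample z-test. It assumes the population standard deviation is known, which is rarely true in practice (a t-test would then be used instead); the sample, `mu_0`, `sigma` and `alpha` are hypothetical values.

```python
import math
from statistics import NormalDist, mean

# Hypothetical sample of ten measurements.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 51]
mu_0 = 50.0   # hypothesised population mean under H0
sigma = 3.0   # ASSUMED known population standard deviation
alpha = 0.05  # significance level

n = len(sample)
x_bar = mean(sample)
std_error = sigma / math.sqrt(n)

# z-statistic and two-sided p-value under the standard normal distribution.
z = (x_bar - mu_0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# (1 - alpha) confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (x_bar - z_crit * std_error, x_bar + z_crit * std_error)

reject_h0 = p_value < alpha
print(f"z={z:.3f}, p={p_value:.3f}, CI=({ci[0]:.2f}, {ci[1]:.2f}), reject={reject_h0}")
```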
|
Dimension Reduction methods
Need for dimension reduction, Principal Component Analysis, Singular Value Decomposition, Eigenvalues Criterion, Factor analysis, Backward Feature Elimination, Cross correlation
|
Prediction (Regression)
Simple Linear regression, p-value, F-statistics, residual standard error, Multiple Linear Regression, Logistic Regression, Forecasting
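Simple linear regression and its residual standard error can be illustrated with an ordinary least squares fit written out by hand. This is a sketch on hypothetical data; the function name is illustrative, and a statistics package would normally also report the p-value and F-statistic.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Goodness of fit: R^2 and residual standard error (n - 2 degrees of freedom).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - sse / sst
    rse = math.sqrt(sse / (n - 2))
    return slope, intercept, r_squared, rse


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r_squared, rse = fit_simple_linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2={r_squared:.2f}, RSE={rse:.3f}")
```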
|
Classification
Binary Classification, Multi-Class Classification, Multi-Label Classification, k-Nearest Neighbour, choosing k, Decision Trees, Random Forests, SVM, Logistic Regression
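Of the classifiers listed above, k-Nearest Neighbour is the simplest to sketch: a query point takes the majority label of its k closest training points. The training data, labels and the function name below are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; Euclidean distance is assumed.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set with two numeric features.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]

pred_near_a = knn_predict(train, (1.1, 1.0), k=3)
pred_near_b = knn_predict(train, (4.0, 4.0), k=3)
print(pred_near_a, pred_near_b)
```

An odd k avoids tied votes in binary problems; in practice k is usually chosen by cross-validation.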
|
Clustering
What is clustering, distances (e.g., Euclidean, Manhattan, Minkowski).
Normalising distances
Hierarchical clustering methods, K-Means, K-Means++, distortion cost function, choosing the value of k, Density-Based Clustering (DBSCAN)
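The distance measures named above are related: Manhattan and Euclidean distance are the Minkowski distance with p = 1 and p = 2 respectively. A minimal sketch, with hypothetical points and an illustrative function name:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Hypothetical two-dimensional points.
point_a = (1, 2)
point_b = (4, 6)

manhattan = minkowski(point_a, point_b, 1)  # |1-4| + |2-6|
euclidean = minkowski(point_a, point_b, 2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

Because these distances depend on the units of each feature, features are usually normalised before clustering.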
|
Modelling, Evaluation
Splitting a dataset, training, testing and validation, cross validation.
Resampling methods.
Confusion matrix, Accuracy, Precision, Recall, F1 score, ROC curve. Sample size. Sampling methods (e.g., random, cluster)
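The evaluation metrics above all derive from the four cells of a binary confusion matrix. The sketch below computes them for hypothetical labels, treating 1 as the positive class; the variable names are illustrative.

```python
# Hypothetical ground-truth and predicted labels (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Cells of the binary confusion matrix.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```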
|
Time series Analysis
Smoothing data, Analysing time series, curve fitting, seasonality. Moving averages, ARIMA (Seasonal, Non-seasonal)
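Smoothing with a moving average, mentioned above, can be sketched as a simple trailing-window mean. The series and the function name are hypothetical, and the window length is an assumed parameter that would be tuned to the data.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly observations.
series = [10, 12, 14, 13, 15, 17]
smoothed = moving_average(series, 3)
print(smoothed)
```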
|
Content analysis
Document classification, entity extraction, tokenising, filtering of tokens, topic modelling, language modelling, term frequencies, document frequencies.
Bayesian classification.
Handling unstructured data, stemming, syntax and semantics, word-embedding vectors.
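Tokenising, term frequencies and document frequencies, listed above, can be illustrated on a tiny hypothetical corpus. The tokeniser here is a deliberately naive assumption (lowercase alphabetic runs only); real pipelines usually add stop-word filtering and stemming.

```python
import re
from collections import Counter

# Hypothetical three-document corpus.
docs = [
    "Data analytics supports decision making.",
    "Machine learning models learn from data.",
    "Analytics dashboards present data visually.",
]


def tokenise(text):
    """Naive tokeniser: lowercase the text and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


# Term frequency: how often each token occurs within one document.
term_freqs = [Counter(tokenise(doc)) for doc in docs]

# Document frequency: in how many documents each token occurs at least once.
doc_freq = Counter()
for tf in term_freqs:
    doc_freq.update(tf.keys())

print(term_freqs[0])
print(doc_freq.most_common(3))
```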
|
Assessment Breakdown | % |
Coursework | 100.00% |
Assessments: Full Time
Coursework |
Assessment Type: |
Formative Assessment |
% of total: |
Non-Marked |
Assessment Date: |
n/a |
Outcome addressed: |
1,2,3,4,5 |
Non-Marked: |
Yes |
Assessment Description: Formative assessment will be provided on in-class individual or group activities. Feedback will be provided in written or oral format, or online through Moodle. In addition, in-class discussions will be undertaken as part of the practical approach to learning. |
|
Assessment Type: |
Continuous Assessment |
% of total: |
30 |
Assessment Date: |
n/a |
Outcome addressed: |
1,2,4 |
Non-Marked: |
No |
Assessment Description: Assessment will be through an in-class, open-book test that will require learners to retrieve, extract, manipulate and present data. Learners will also be asked to make statistical inferences and draw conclusions about a population. |
|
Assessment Type: |
Project |
% of total: |
70 |
Assessment Date: |
n/a |
Outcome addressed: |
1,2,3,4,5 |
Non-Marked: |
No |
Assessment Description: The terminal assessment will consist of a project that will evaluate all learning outcomes. Learners will have to identify and carry out a series of analytic tasks on a large dataset (or a collection of datasets that are related or complement each other), utilising appropriate tools and techniques for data extraction, processing, analysis and critical evaluation. The final submission will consist of a report in the style of an academic research paper, together with the implemented data analytics artefact. Students are also expected to present and communicate the results and insights of their study. |
|
No End of Module Assessment |
Reassessment Requirement |
Repeat examination
Reassessment of this module will consist of a repeat examination. It is possible that there will also be a requirement to be reassessed in a coursework element.
|
NCIRL reserves the right to alter the nature and timings of assessment
Module Workload
Module Target Workload Hours 0 Hours |
Workload: Full Time |
Workload Type |
Workload Description |
Hours |
Frequency |
Average Weekly Learner Workload |
Lecture |
Lectures |
24 |
Per Semester |
2.00 |
Independent Learning |
Independent Learning |
202 |
Per Semester |
16.83 |
Tutorial |
Tutorials/Practicals |
24 |
Per Semester |
2.00 |
Total Weekly Contact Hours |
4.00 |
Module Resources
Recommended Book Resources |
---|
- McClave, J. T. & Sincich, T. (2017). Statistics (13th ed.). Pearson. [ISBN: 978-0134080215].
- Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists (2nd ed.). O’Reilly Media. [ISBN: 978-1492072942].
- Han, J., Pei, J., & Kamber, M. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann. [ISBN: 978-0123814791].
- Alpaydin, E. (2020). Introduction to Machine Learning. The MIT Press. [ISBN: 978-0262043793].
| Supplementary Book Resources |
---|
- Shalev-Shwartz, S. & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. [ISBN: 978-1107057135].
- Runkler, T. A. (2012). Data Analytics: Models and Algorithms for Intelligent Data Analysis. Springer. [ISBN: 978-3834825889].
- Davies, A. (2017). Understanding Statistics: An Introduction. Cato Institute. [ISBN: 978-1944424350].
- Kranzler, J. H. (2017). Statistics for the Terrified (6th ed.). Rowman & Littlefield Publishers. [ISBN: 978-1538100288].
- Kelleher, J. D., MacNamee, B., & D’Arcy, A. (2020). Fundamentals of Machine Learning for Predictive Data Analytics (2nd ed.). The MIT Press. [ISBN: 978-0262044691].
- Marz, N. & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications. [ISBN: 978-1617290343].
- Hofmann, M. & Klinkenberg, R. (2013). RapidMiner: Data Mining Use Cases and Business Analytics Applications. CRC Press. [ISBN: 978-1482205497].
| This module does not have any article/paper resources |
---|
Other Resources |
---|
- DataCamp, Learn R, Python & Data Science Online (https://www.datacamp.com/).
- Machine Learning, Stanford (https://www.coursera.org/course/ml).
- UCI Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html).
- DataCamp (www.datacamp.com).
- RapidMiner (https://rapidminer.com/).
- Azure Machine Learning (https://azure.microsoft.com/en-in/services/machine-learning/).
- Kaggle Competitions (https://www.kaggle.com/competitions).
- MySQL Tutorial (https://www.mysqltutorial.org).
- MongoDB Tutorial (https://www.mongodb.com/nosql-explained).
- JSON (https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON).
|
|