Teaching Responsibility
LJMU Schools involved in Delivery:
Computer Science and Mathematics
Learning Methods
Lecture
Practical
Module Offerings
6020DACOMP-SEP-CTY
Aims
The aim of this module is to develop the knowledge and skills needed to work effectively with the large-scale data storage and processing frameworks that underpin data science.
Learning Outcomes
1. Differentiate between the functions of the components of big data storage and processing frameworks
2. Appraise emerging trends in large-scale data storage and processing
3. Formulate a machine learning/analytics exercise for a given subject area
Module Content
Outline Syllabus: Big Data
Volume – the sheer quantity of data generated
Velocity – the speed at which data arrives, often in real time
Variety – heterogeneous formats: text, images, audio, video
Big Data Difficulties
Variability – inconsistency of data over time and across sources
Veracity – the quality and trustworthiness of data
Complexity – the data management issues that arise when linking and transforming data from multiple sources
Big Data Storage and Analysis Tools
Apache Hadoop
Hadoop provenance
Apache Hadoop Framework
Hadoop Common
Hadoop Distributed File System (HDFS)
YARN
MapReduce
JobTracker
TaskTracker
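The MapReduce model listed above splits a job into a map phase that emits key/value pairs and a reduce phase that aggregates them per key. A minimal local sketch of the classic word-count job, written in plain Python (under Hadoop Streaming the mapper and reducer would be separate scripts reading stdin and writing tab-separated pairs; here they run in-process for illustration):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum counts per word.
    Hadoop delivers pairs to the reducer grouped by key after the
    shuffle/sort; sorting here emulates that step locally."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    sample = ["big data big ideas", "data pipelines"]
    print(dict(reducer(mapper(sample))))
```

The same map/shuffle/reduce shape underlies every MapReduce job; only the pair-emitting and aggregation logic change.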
Apache Hadoop Tools
Pig (Pig Latin, ETL)
Hive (data warehousing + SQL) in detail
Apache Spark (in-memory analytics) in detail
Apache Mahout (machine learning system) in detail
Apache Solr (scalable search tool)
Hadoop in the Cloud – Amazon EC2/S3 services
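Spark's advantage over disk-based MapReduce comes from chaining transformations over data held in memory. A pure-Python sketch of the canonical PySpark word-count idiom, so no Spark installation is needed; the comments name the PySpark operations each step stands in for (the function itself is a local illustration, not part of any Spark API):

```python
from collections import Counter

def spark_style_word_count(lines):
    """Mimic the PySpark chain:
    rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
    All intermediate data stays in memory, which is the essence of
    Spark's speed advantage for iterative analytics."""
    words = (w.lower() for line in lines for w in line.split())  # flatMap
    return Counter(words)  # map to (word, 1) + reduceByKey, fused

if __name__ == "__main__":
    print(spark_style_word_count(["Spark keeps data in memory",
                                  "memory access is fast"]))
```

In real PySpark the chain is lazy: nothing executes until an action such as `collect()` or `count()` is called, which lets Spark plan and pipeline the whole transformation graph.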
Emerging Trends in Big Data Storage and Processing
Additional Information: This module provides both theoretical and practical experience of large-scale data storage considerations and the development of tools to support the processing of that data.