Teaching Responsibility
LJMU Schools involved in Delivery:
Computer Science and Mathematics
Learning Methods
Lecture
Practical
Module Offerings
6020DACOMP-SEP-CTY
Aims
The aim of this module is to develop the knowledge and skills needed to work effectively with the large-scale data storage and processing frameworks that underpin data science.
Learning Outcomes
1. Differentiate between the functions of the components of big data storage and processing frameworks
2. Appraise emerging trends in large-scale data storage and processing
3. Formulate a machine learning/analytics exercise for a given subject area
Module Content
Outline Syllabus: Big Data
Volume – the sheer quantity of data generated
Velocity – the speed at which data arrives, often in real time
Variety – heterogeneous formats: text, images, audio, video
Big Data Difficulties
Variability – inconsistency of data over time and across sources
Veracity – the quality and trustworthiness of data
Complexity – the data management issues that arise when linking and transforming data from multiple sources
Big Data Storage and Analysis Tools
Apache Hadoop
Hadoop provenance
Apache Hadoop Framework
Hadoop Common
Hadoop Distributed File System (HDFS)
YARN
MapReduce
JobTracker
TaskTracker
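The MapReduce model listed above splits a job into a map phase that emits key/value pairs and a reduce phase that aggregates them per key. A minimal local sketch of the classic word-count job, written in plain Python (under Hadoop Streaming the mapper and reducer would be separate scripts reading stdin and writing tab-separated pairs; here they run in-process for illustration):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum counts per word.
    Hadoop delivers pairs to the reducer grouped by key after the
    shuffle/sort; sorting here emulates that step locally."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    sample = ["big data big ideas", "data pipelines"]
    print(dict(reducer(mapper(sample))))
```

The same map/shuffle/reduce shape underlies every MapReduce job; only the pair-emitting and aggregation logic change.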
Apache Hadoop Tools
Pig (Pig Latin, ETL)
Hive (data warehousing + SQL) in detail
Apache Spark (in-memory analytics) in detail
Apache Mahout (machine learning system) in detail
Apache Solr (scalable search tool)
Hadoop in the Cloud – Amazon EC2/S3 services
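Spark's advantage over disk-based MapReduce comes from chaining transformations over data held in memory. A pure-Python sketch of the canonical PySpark word-count idiom, so no Spark installation is needed; the comments name the PySpark operations each step stands in for (the function itself is a local illustration, not part of any Spark API):

```python
from collections import Counter

def spark_style_word_count(lines):
    """Mimic the PySpark chain:
    rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
    All intermediate data stays in memory, which is the essence of
    Spark's speed advantage for iterative analytics."""
    words = (w.lower() for line in lines for w in line.split())  # flatMap
    return Counter(words)  # map to (word, 1) + reduceByKey, fused

if __name__ == "__main__":
    print(spark_style_word_count(["Spark keeps data in memory",
                                  "memory access is fast"]))
```

In real PySpark the chain is lazy: nothing executes until an action such as `collect()` or `count()` is called, which lets Spark plan and pipeline the whole transformation graph.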
Emerging Trends in Big Data Storage and Processing
Additional Information: This module provides both theoretical and practical experience of large-scale data storage considerations and the development of tools to support the processing of that data.