Examination

Machine Learning Exercise

Big Data
Volume – tracks what happens
Velocity – real-time
Variety – text, images, audio, video
Big Data Difficulties
Variability – inconsistency of data
Veracity – quality of data
Complexity – complex data management issues
Big Data storage and Analysis Tools
Apache Hadoop
Hadoop provenance
Apache Hadoop Framework
Common
Distributed File System (HDFS)
YARN
MapReduce
Job Tracker
Task Tracker
Apache Hadoop Tools
Pig (Pig Latin, ETL)
Hive (data warehousing &#43; SQL) in detail
Apache Spark (in-memory analytics) in detail
Apache Mahout (machine learning system) in detail
Apache SOLR (scalable search tool)
Hadoop in the Cloud - Amazon EC2/S3 Services
Emerging Trends in Big Data storage and processing

Big Data, Tools and Analysis - 6124COMP

Big Data, Tools and Analysis

Appraise emerging trends in large scale data storage and processing

Formulate a machine learning/analytics exercise for a given subject area

Differentiate between the functions of the components of big data storage and processing frameworks