
Big Data Testing: Meaning, Need, and Challenges


This blog is meant for techies who want to learn about the Big Data ecosystem, or who want to know which cases we test while doing Big Data Testing. The topics covered in this blog are:

  • What is the meaning of Big Data?
  • Characteristics of Big Data (10 V’s of Big Data)
  • Need for Big Data Testing
  • How to Perform Big Data Testing?
  • Performance Testing in Big Data
  • Big Data Testing Environment
  • Challenges we Face while Testing Big Data
  • Tools we use in Big Data Testing

Meaning of Big Data

Big data is a relatively new term in the software industry, driven by the enormous amounts of data now being generated.

Big Data refers to huge and complex data sets coming from new data sources. These data sets are so large and complex that traditional data processing software cannot store or process them.

Sources of Big Data – data collected from:

  • Sensors
  • Devices
  • Video/audio
  • Networks
  • Log files
  • Transactional applications
  • Web
  • Social media

The types of big data are:

  1. Structured: Data organized in a fixed format. Ex: RDBMS
  2. Semi-Structured: Partially organized data that does not have a fixed format. Ex: XML, JSON
  3. Unstructured: Data with an unknown format or structure. Ex: audio and video files, etc.

Characteristics of Big Data (10 V’s of Big Data)

(Figure: the 10 V’s of Big Data, commonly listed as Volume, Velocity, Variety, Variability, Veracity, Validity, Vulnerability, Volatility, Visualization, and Value)

Need for Big Data Testing

In Big Data systems, data plays a crucial role. If a Big Data system is not tested correctly, the business will be affected: it becomes difficult to understand the error, its cause, and where it occurred, and finding a solution also becomes difficult.


How to Perform Big Data Testing?

The testing process is divided into the following three phases:

  • Data Ingestion
  • Data Processing
  • Validation of the Output 

Data Ingestion: In this phase, we load data from various sources into the Big Data system using extraction tools. The Hadoop Distributed File System (HDFS), MongoDB, etc. are examples of storage.

Then, we test the loaded data for errors and for corrupt or missing records.
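As a rough illustration (the JSON schema, field names, and sample batch below are assumptions for the sketch, not part of any real pipeline), such an ingestion check could classify each record as valid, corrupt, or incomplete:

```python
import json

REQUIRED_FIELDS = {"id", "timestamp", "payload"}  # illustrative schema

def validate_ingested(lines):
    """Classify each ingested record as valid, corrupt, or incomplete."""
    valid = corrupt = incomplete = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            corrupt += 1              # record could not be parsed at all
            continue
        if REQUIRED_FIELDS - record.keys():
            incomplete += 1           # one or more required fields missing
        else:
            valid += 1
    return valid, corrupt, incomplete

# Sample batch standing in for data pulled from the ingestion layer.
batch = [
    '{"id": 1, "timestamp": "2021-01-01T00:00:00Z", "payload": "ok"}',
    '{"id": 2, "timestamp": "2021-01-01T00:00:01Z"}',   # missing payload
    'not valid json at all',                            # corrupt record
]
print(validate_ingested(batch))  # -> (1, 1, 1)
```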

Data Processing: In this phase, also known as “MapReduce validation”, key-value pairs are generated for the data. MapReduce is applied across the various nodes, and we check whether the algorithms work as expected.

A data validation step is performed here to check whether the generated data meets expectations.
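The idea can be sketched with a pure-Python stand-in for a MapReduce job (the word-count logic and sample input are illustrative assumptions, not the actual system under test): run the map and reduce steps on a known input and assert the key-value output.

```python
from collections import defaultdict

def map_phase(lines):
    # Emit a (word, 1) key-value pair for every word, as a mapper would.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Aggregate the counts per key, as a reducer would.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

sample_input = ["big data testing", "big data"]
result = reduce_phase(map_phase(sample_input))
# The expected output is known for the controlled input,
# so the job's logic can be asserted directly.
assert result == {"big": 2, "data": 2, "testing": 1}, result
```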

Validation of the Output (Data Storage): The final step of Big Data Testing is to store the output data in HDFS or another storage system (such as a data warehouse).

In this phase, the transformation logic and data integrity are verified, and the key-value pairs are validated for accuracy.
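One common way to validate the stored output is to reconcile it against the source, e.g. by comparing record counts and an order-independent checksum. A minimal sketch (the `fetch_source`/`fetch_target` helpers are hypothetical stand-ins for queries against the source system and the warehouse):

```python
import hashlib

def checksum(rows):
    """Order-independent checksum over stringified rows."""
    digest = 0
    for row in rows:
        h = hashlib.sha256(repr(row).encode()).hexdigest()
        digest ^= int(h, 16)  # XOR keeps the result order-independent
    return digest

# Hypothetical stand-ins: in practice these would query the
# source system and the warehouse / HDFS output respectively.
def fetch_source():
    return [(1, "a"), (2, "b"), (3, "c")]

def fetch_target():
    return [(3, "c"), (1, "a"), (2, "b")]  # same rows, different order

source_rows, target_rows = fetch_source(), fetch_target()
assert len(source_rows) == len(target_rows), "row counts differ"
assert checksum(source_rows) == checksum(target_rows), "content differs"
```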

(Source: http://www.csueastbay.edu)

Performance Testing Approach


In a very short span of time, a Big Data system processes a huge amount of structured and unstructured data. This can lead to performance issues.

So, in Big Data, it is essential to perform performance testing to avoid such bottlenecks. We focus on the points below while doing performance testing of a Big Data system:

Data Loading and Throughput: Here we test how quickly data can be consumed from the various data sources and the rate at which data is created in the data store.

Data Processing Speed: The rate at which MapReduce jobs are executed is measured at this stage.

Sub-System Performance: Since the system consists of multiple components, it is necessary to test each component in isolation.
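As a toy illustration of measuring data loading and throughput (the record generator and the in-memory sink are assumptions; a real test would drive the actual ingestion path of the system under test):

```python
import time

def generate_records(n):
    # Stand-in for records arriving from a data source (assumption).
    return ({"id": i, "value": i * 2} for i in range(n))

def measure_throughput(consume, n=100_000):
    """Return records/second achieved by the consume(record) callable."""
    start = time.perf_counter()
    for record in generate_records(n):
        consume(record)
    elapsed = time.perf_counter() - start
    return n / elapsed

sink = []                              # toy sink standing in for the store
rate = measure_throughput(sink.append)
print(f"throughput: {rate:,.0f} records/sec")
```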

Parameters for Performance Testing

  • Data Storage: How data is stored across the different nodes.
  • Commit logs: The maximum size the commit log is allowed to grow to.
  • Concurrency: How many threads can perform read and write operations (see the sketch after this list).
  • Caching: Tuning of the cache settings “row cache” and “key cache”.
  • Timeouts: Values for connection timeout, query timeout, etc.
  • JVM Parameters: Heap size, GC algorithms, etc.
  • MapReduce performance: Sort, merge, etc.
  • Message queue: Message rate, size, etc.
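Here is a toy sketch of the concurrency parameter (the in-memory store, lock, and operation counts are assumptions; a real test would issue reads and writes against the cluster’s client API and watch how latency scales with the thread count):

```python
import threading
import time

store = {}                       # toy in-memory store (assumption)
lock = threading.Lock()

def worker(thread_id, ops=10_000):
    # Each thread performs interleaved write and read operations.
    for i in range(ops):
        key = f"{thread_id}-{i}"
        with lock:
            store[key] = i       # write
            _ = store.get(key)   # read back

def run(num_threads):
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

for n in (1, 2, 4, 8):
    print(f"{n:2d} threads: {run(n):.2f}s for {n * 10_000} read/write pairs")
```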

Big Data Testing Environment

Below are the basic requirements for setting up a Big Data testing environment:

  • It should have enough space for storing, processing, and validating huge volumes of data.
  • It should have a responsive cluster with distributed nodes and data.
  • It should have powerful CPUs and memory to keep performance high.

Challenges we Face while Testing

(Figure: Challenges in Big Data Testing)

Tools we use in Big Data Testing

  • Data Ingestion: Zookeeper, Kafka, Sqoop
  • Data Processing: MapR, Hive, Pig
  • Data Storage: Amazon S3, HDFS
  • Data Migration: Talend, Kettle, CloverDX
