With every search on the Internet, every like on social networks, every online purchase and every ticket ordered, you generate data about yourself. And there are more than 4 billion active internet users like you. Imagine the volumes of information!

So where does big data, or Big Data, come in?

In the last two years alone, 90% of the world’s data has been generated. 2.5×10^18 bytes of information are created daily. This information is so complex and extensive that it is difficult to process with a relational database. Therefore, a separate term, Big Data, was introduced, with its own characteristics and tools.

What is Big Data?

Big Data is still data, but data that is too diverse and extensive for conventional technologies to handle. Clifford Lynch is often credited with introducing the term “big data” in 2008 in an article for the journal Nature. Such data sets are not just large: they keep growing at an accelerating, roughly exponential rate.

For example, at the Techonomy conference in California in 2010, Eric Schmidt reported that from the beginning of time until 2003, a total of 5 exabytes (5×10^18 bytes) of data had been created. He may not have suspected that by 2016 the same amount of information would be generated every two days.

How “big” is big data?

Data sets of around 10^15 bytes (1 petabyte) or more are usually called big.

Back in 2008, the world held 0.18 zettabytes of data. By 2015 the volume had grown to 7.4 zettabytes and by 2020 to 40–44 zettabytes, and by 2025 it is predicted to grow another tenfold.

1 ZB (zettabyte) = 1000 exabytes, where 1 exabyte = 10^18 bytes
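
The figures quoted so far can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units throughout; the `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
}

def convert(value, from_unit, to_unit):
    """Convert a data volume between two of the units above."""
    return value * UNITS[from_unit] / UNITS[to_unit]

# 2.5×10^18 bytes created per day, expressed in exabytes.
daily_eb = 2.5e18 / UNITS["EB"]
# The same rate over two days, to compare with the 5 EB figure.
two_day_eb = 2 * daily_eb

# Growth factor between the 2008 figure (0.18 ZB) and the lower
# 2020 estimate (40 ZB) quoted in the text.
growth_2008_2020 = 40 / 0.18

print(convert(1, "ZB", "EB"))   # exabytes in one zettabyte
print(two_day_eb)               # exabytes generated in two days
print(round(growth_2008_2020))  # approximate growth factor
```

At 2.5 exabytes a day, two days of data come to 5 exabytes, which matches the "same amount every two days" comparison above.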

What types of big data are there?

There are 3 types of Big Data in analytics:

  • Structured data

This is the name for easily accessible data stored in a fixed format. It is convenient to work with because it is easy to store, sort, analyze and process. Structured data has clearly defined fields with known types and sizes. Because of the fixed format, each field can be identified and extracted individually or in combination with data from other fields.

Table of cities – an example of structured data
  • Unstructured data

This is data with no defined structure, which makes it hard to process and to extract value from. The photos we post on Instagram or Facebook, the videos we watch on streaming platforms and Google search results are all examples of unstructured data. Although organizations have access to a large amount of such information, they often struggle to extract anything useful from it, because the data is in its raw form.

Google search – an example of data without structure
  • Semi-structured data

This is a mixture of structured and unstructured data. It has no fixed schema and does not fit neatly into relational tables, but it does carry tags or keys that describe its contents. JSON and XML documents are typical examples.

Semi-structured data example
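
To make the three types concrete, here is a minimal sketch using only the Python standard library. All values (city names, users, the review text) are made-up placeholders rather than real data.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured_csv = "city,country\nCity A,Country X\nCity B,Country Y\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
countries = [row["country"] for row in rows]

# Semi-structured: keys describe the content, but records differ in shape.
semi_structured = (
    '[{"user": "u1", "tags": ["a", "b"]},'
    ' {"user": "u2", "location": {"city": "City A"}}]'
)
records = json.loads(semi_structured)
# Fields are optional, so access has to tolerate missing keys.
cities = [r.get("location", {}).get("city") for r in records]

# Unstructured: free text with no fields at all. Even a simple question
# such as "which words appear?" needs ad-hoc processing.
unstructured = "Loved the trip to City A, would visit again!"
words = [w.strip(",.!").lower() for w in unstructured.split()]

print(countries)
print(cities)
print(words)
```

The structured rows can be queried by column name straight away, the JSON records need defensive access, and the free text has to be tokenized before any question can be asked of it.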

The 3V and 5V properties of Big Data

In 2001, META Group identified three main characteristics (the 3Vs) of big data: volume, velocity and variety.

3V big data
  1. Volume

The name Big Data already implies that the volumes of information are too large to manage with conventional software.

  2. Velocity (speed of generation)

The rate of data growth. Velocity determines how quickly data is generated and how quickly it has to be processed to realize its potential. The data flow is massive and continuous.

  3. Variety

The range of formats in which information arrives. Data of different formats has to be organized and processed efficiently.

There are two more characteristics of big data: veracity and value. Together with volume, velocity and variety they form the 5V concept of Big Data.

5V big data
  4. Veracity

Determines the quality and validity of the data. Veracity is the level of confidence in the information collected. It matters because large amounts of information can sometimes cause more confusion than understanding.

  5. Value

Data is useful only if valuable information can be extracted from it. When working with Big Data, organizations can use standard collection and analysis tools. What has to be specific to each organization is the way it extracts value from the data.

Where does big data come from?

Social media data
  • Documentation. Documents in any format, such as HTML, CSV, PDF, XLS, Word, XML, and so on.
  • Media. Images, video, audio, live streams, podcasts.
  • Social networks. Companies such as Facebook and Google receive data about almost any activity we perform online. Other examples are YouTube, Twitter, LinkedIn, blogs, Instagram, WordPress, Jive, and more.
  • Public websites. This data comes from Wikipedia, health services, the World Bank, government, weather and traffic services.
  • Archives. Stored records such as medical records, customer correspondence, insurance forms.
  • Data storage. Databases and file systems.
  • Machine log data. Data from servers, application logs, audit logs, call detail records (CDRs), mobile location data, and more.
  • Sensor data. Data from medical device sensors, road cameras, satellites.

Why use Big Data? Benefits of big data technology

Despite the difficulties, 94% of business representatives consider the implementation of Big Data a necessity for growth, and 59% of organizations are already using big data analytics.

The use of Big Data helps organizations:

  • Understand where, when and why customers buy
  • Optimize operations and workforce planning
  • Anticipate market trends and future needs
  • Become more innovative and competitive
  • Open new sources of income
  • Retain their customer base

What is big data analytics?

With the explosive growth of Big Data, dedicated tools have emerged for working with it, such as the Hadoop and Spark frameworks, alongside databases such as MySQL. Most large online services now rely on technology of this kind. It keeps information compact and up to date and greatly simplifies work with large volumes of data. KLONA provides database development, configuration and modification services. Thanks to many years of experience, KLONA knows which database and which tools your business needs.

Big Data analytics consists of 4 steps: collection, processing, cleaning and analysis.

Stages of working with Big Data
  1. Collection

This stage is different for every organization. Thanks to technology, organizations can collect data from mobile apps, cloud storage and even in-store IoT sensors. Some data resides in data warehouses, where analysts can easily access it.

  2. Processing

Data, especially unstructured data, must be properly organized. One option is batch processing, which handles large blocks of data accumulated over time. Batch processing is useful when there is enough time between data collection and analysis. Stream processing handles small portions of data as they arrive, which allows faster decision making. Stream processing is more complex and often more expensive.

  3. Cleaning

Raw data can be misleading and lead to wrong conclusions. Data sets need cleaning to improve their quality and obtain more accurate results. All data must be properly formatted, and any duplicate or irrelevant parts must be removed.

  4. Analysis

Preparing large data sets for use takes time. Once they are ready, advanced analytics processes can turn big data into big insights.
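
The cleaning step and the difference between batch and stream processing can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Hadoop or Spark work internally: the event records, field names and function names below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events, e.g. purchases collected from a mobile app.
raw_events = [
    {"id": 1, "store": " Berlin ", "amount": "19.50"},
    {"id": 2, "store": "berlin", "amount": "5.00"},
    {"id": 2, "store": "berlin", "amount": "5.00"},  # duplicate record
    {"id": 3, "store": "Paris", "amount": None},     # missing value
    {"id": 4, "store": "PARIS", "amount": "12.50"},
]

def clean(events):
    """Cleaning: normalize formats, drop duplicates and incomplete records."""
    seen = set()
    for e in events:
        if e["amount"] is None or e["id"] in seen:
            continue
        seen.add(e["id"])
        yield {
            "id": e["id"],
            "store": e["store"].strip().lower(),
            "amount": float(e["amount"]),
        }

def batch_totals(events):
    """Batch processing: take the whole block, compute the result once."""
    totals = defaultdict(float)
    for e in events:
        totals[e["store"]] += e["amount"]
    return dict(totals)

def stream_totals(events):
    """Stream processing: emit an updated running total after every event."""
    totals = defaultdict(float)
    for e in events:
        totals[e["store"]] += e["amount"]
        yield e["store"], totals[e["store"]]

cleaned = list(clean(raw_events))
print(batch_totals(cleaned))
for store, running_total in stream_totals(cleaned):
    print(store, running_total)
```

The batch function answers only after it has seen everything, while the streaming generator produces an up-to-date answer after each event, which is why streaming suits faster decisions but is harder to operate at scale.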

How to analyze big data?

There are three main methods of big data analysis: data mining, machine learning and statistical learning.

  1. Data mining

Data mining methods can be organized into two main classes: supervised and unsupervised methods.

In supervised learning, there is a known outcome of interest, and the goal is to build a model that predicts this outcome.

In unsupervised learning, there is no outcome variable to predict. The goal is to group variables or pieces of data based on their degree of similarity. Unsupervised learning is commonly used, for example, in psychological research.

  2. Machine learning

This is a well-known method from the field of artificial intelligence. Originating in computer science, machine learning uses algorithms to make predictions from data. It can provide predictions that would be impractical for human analysts to produce.

  3. Statistical learning

Uses an organization’s historical data to predict the future and identify upcoming risks and opportunities.
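
The supervised and unsupervised classes described above can be illustrated with two very small models written in plain Python. This is a sketch with toy numbers: a least-squares line stands in for "a model that predicts a known outcome", and a one-dimensional two-cluster k-means stands in for "grouping by similarity". Real projects would normally use a dedicated library, and the function names here are illustrative.

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: learn y = a*x + b from examples with known outcomes."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two groups by similarity.

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(group_lo), mean(group_hi)
    return sorted(group_lo), sorted(group_hi)

# Supervised: the outcome (sales) is known for past months.
months = [1, 2, 3, 4]
sales = [10, 12, 14, 16]
a, b = fit_line(months, sales)
forecast_month_5 = a * 5 + b

# Unsupervised: no outcome variable, just group similar order values.
orders = [5, 6, 7, 50, 52, 55]
small, large = two_means(orders)

print(forecast_month_5)
print(small, large)
```

The first model needs labelled history (months with known sales) to make a forecast, whereas the second receives only raw values and discovers the "small orders" and "large orders" groups on its own.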

What are the disadvantages of Big Data?

Alongside its advantages, Big Data also has disadvantages.

  • Confidentiality

The biggest disadvantage of Big Data is its exposure to cyberattacks. Even giant companies have suffered massive data leaks. However, since the GDPR came into force, businesses have been investing more in infrastructure to support Big Data.

  • Systems overload

Big data can create congestion and noise, reducing its own usefulness. Companies have to process information arrays, identify noise and filter out the excess.

  • The need for preprocessing before use

Structured data is easy to store and sort. Unstructured data, such as emails, videos and text documents, requires sophisticated techniques before it becomes useful.

  • The need for a good technical base

Working with big data requires a high level of technical proficiency. That is why Big Data analysts are among the highest-paid specialists in IT.

Big Data: examples and applications of technology

  • Government and public administration
  • Healthcare
  • Cybersecurity
  • Transport

The automotive industry has long embraced big data. It is used to produce better components, improve driver safety and increase car sales. Automotive manufacturers such as BMW benefit from the analysis of extensive datasets, for example in predictive maintenance. This is how they create customized customer solutions and the cars of tomorrow.

  • Marketing

Consumer behavior analysis takes working with data to a completely new level. Thanks to information collected from GPS, social networks and the Internet (for example, purchase history or published opinions), companies can now analyze the reactions of not only selected customer groups, but even specific individuals.

  • Medicine

Big data analytics in this sector can help improve patient care, support clinical research, monitor health security, and build management and control systems that counter epidemics and other threats.

The use of Big Data in healthcare also contributes to better telemedicine, staffing-level planning and disease research.

Susan Etlinger – How to deal with big data?

The future of big data. Where are we heading?

In the digital age, data is probably our most valuable resource and product at the same time.

The use of Big Data helps to create new services, develop new business models and sell products. It is not only large corporations that analyze data, but also small businesses and even creative ones.

In the world as we know it, the amount of information generated will continue to grow. Companies and government agencies need to create a culture of data science by incorporating it into their structures.