The term big data refers to the rapid growth of data in today's organizations and societies. Deep learning is a recent and powerful tool that organizations and cities use to obtain value from big data. In this page, we'll explain what big data is and outline four applications of deep learning for big data, including text classification, automatic tagging, and automatic image caption generation.
What is Big Data?
The term big data refers to the proliferation of data in modern society and business, along three dimensions:
Volume—growing quantities of data, both structured and unstructured. Unstructured data is the fastest growing, and can take the shape of social media feeds, web analytics data, sensor data or documents generated by humans in an organization.
Velocity—data is flowing at an increasing rate. There is a need to ingest, store and process massive data streams, either in real time as they are received, or in retrospect.
Variety—traditional data types were structured and controlled by organizations. Today the number of data types is growing rapidly, from structured to semi-structured and unstructured data, including text, audio, and video from millions of possible sources.
Big data is the fuel of the modern economy. It is used by the world’s biggest companies to provide products and services that are changing the face of society. The digital economy, digital lifestyle, mobile devices and applications that have become an inseparable part of daily life, are all driven by big data.
But raw data, like crude oil, isn't useful on its own. Over the past two decades, the industry has developed technologies that can quickly ingest, store, process, analyze, and derive value from big datasets. Still, some datasets are a harder nut to crack: complex, unstructured data is difficult to comprehend using traditional analytics techniques. This is where deep learning comes in: it can help make sense of big data, often through the use of a deep learning platform.
Four Applications of Deep Learning for Big Data
Deep learning, with artificial neural networks at its core, is a new and powerful tool that can be used to derive value from big data. Most of the data today is unstructured, and deep learning algorithms are very effective at learning from, and generating predictions for, highly unstructured data. Following are several ways deep learning is being applied to big datasets.
1. Text Classification and Automatic Tagging
Deep learning architectures, including Recurrent Neural Networks and Convolutional Neural Networks, are used to process free text, perform sentiment analysis, and identify which categories or types a piece of text belongs to. This can help search through, organize, and make use of huge unstructured datasets.
See our in-depth guide on https://indiantechwarrior.com/sentence-classification-using-convolutional-neural-networks/
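To make the idea of text classification concrete, here is a minimal sketch of a text classifier on a tiny invented sentiment dataset. For brevity it uses a single-layer logistic model over bag-of-words features trained by gradient descent rather than the CNN or RNN architectures mentioned above; the corpus, labels, and hyperparameters are all illustrative assumptions, not from the original article.

```python
import numpy as np

# Toy corpus (illustrative): label 1 = positive sentiment, 0 = negative.
docs = ["good great film", "great acting", "bad boring film", "boring bad plot"]
labels = np.array([1, 1, 0, 0], dtype=float)

# Build a vocabulary and bag-of-words count vectors.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

X = np.array([vectorize(d) for d in docs])

# Logistic-regression classifier trained with gradient descent
# on the cross-entropy loss.
w = np.zeros(len(vocab))
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    grad = p - labels                        # cross-entropy gradient
    w -= 0.5 * (X.T @ grad) / len(docs)
    b -= 0.5 * grad.mean()

def classify(text):
    p = 1.0 / (1.0 + np.exp(-(vectorize(text) @ w + b)))
    return "positive" if p > 0.5 else "negative"

print(classify("great film"))   # positive
print(classify("boring plot"))  # negative
```

A production system would replace the bag-of-words features with learned embeddings and the linear layer with a deep network, but the training loop and the classify-by-probability step follow the same pattern.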
2. Automatic Image Caption Generation
Deep learning is used to identify the contents of an image and automatically generate descriptive text, turning images from unstructured into structured and searchable content. This typically involves using Convolutional Neural Networks, in particular very large networks like ResNet, to perform object detection, and then using Recurrent Neural Networks to write coherent sentences based on the detected objects.
See our in-depth guide on https://indiantechwarrior.com/image-segmentation-in-deep-learning/
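The encoder-decoder wiring behind caption generation can be sketched in a few lines. This is a structural sketch only: the "image features" stand in for the output of a pretrained CNN such as ResNet, the decoder weights are random and untrained, and the five-word vocabulary is invented, so the emitted words are arbitrary. What it shows is the data flow of greedy decoding: features seed the hidden state, and the model emits one word per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: in a real system, image_features would come from a
# pretrained CNN (e.g. ResNet); here a random vector stands in for it.
feature_dim, hidden_dim = 8, 8
vocab = ["<start>", "a", "dog", "on", "grass", "<end>"]
image_features = rng.normal(size=feature_dim)

# Untrained decoder weights; a trained model learns these from captioned images.
W_img = rng.normal(size=(feature_dim, hidden_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_x = rng.normal(size=(len(vocab), hidden_dim)) * 0.1
W_out = rng.normal(size=(hidden_dim, len(vocab))) * 0.1

def one_hot(i):
    v = np.zeros(len(vocab))
    v[i] = 1.0
    return v

def decode(max_len=5):
    """Greedy decoding: the image features initialize the hidden state,
    then the RNN emits one word per step until <end> or max_len."""
    h = np.tanh(image_features @ W_img)
    word = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(h @ W_h + one_hot(word) @ W_x)  # recurrent update
        word = int(np.argmax(h @ W_out))            # most likely next word
        if vocab[word] == "<end>":
            break
        caption.append(vocab[word])
    return caption

print(decode())  # with untrained weights, the word sequence is arbitrary
```

Real captioning models add an embedding layer, beam search instead of greedy argmax, and often attention over CNN feature maps, but the loop above is the core decoding skeleton.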
3. Deep Learning in Finance
Today, most financial transactions are electronic. Stock markets generate huge volumes of data reflecting buy and sell actions, and the resulting financial metrics such as stock prices. Deep learning can ingest these huge data volumes, understand the current market position and create an accurate model of the probabilities of future price movements.
However, deep learning is mainly used for analyzing macro trends or making one-time decisions, such as assessing the likelihood of a company's bankruptcy; it is still limited in its ability to drive real-time buying decisions.
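The core step in such models is framing a price history as a supervised learning problem: predict the next price from a sliding window of recent prices. The toy sketch below does this with synthetic prices and a single linear layer trained by gradient descent; the series, window size, and learning rate are all invented for illustration, and a real system would use a deep architecture such as an LSTM on genuine market data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic price series (a stand-in for real market data): trend plus noise.
t = np.arange(200)
prices = 100 + 0.1 * t + rng.normal(scale=0.5, size=t.size)

# Supervised framing: predict the next price from the last `window` prices.
window = 5
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]

# One linear layer trained by gradient descent on squared error.
# (A deep model would stack more layers, but the data framing is the same.)
w = np.zeros(window)
b = float(prices.mean())
lr = 1e-5
for _ in range(2000):
    err = X @ w + b - y
    w -= lr * (X.T @ err) / len(y)
    b -= lr * err.mean()

forecast = prices[-window:] @ w + b
print("next-price forecast:", forecast)
```

The window-to-target construction of `X` and `y` is the same one used to feed recurrent networks; only the model in the middle changes.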
4. Deep Learning in Healthcare
Deep learning, particularly computer vision, is used to help diagnose and treat patients. Deep learning algorithms analyze blood samples, track glucose levels in diabetic patients, detect heart problems, analyze medical images to detect tumors and diagnose cancer, and can detect osteoarthritis from an MRI scan before damage is caused to bone structures.
See our in-depth guide on https://indiantechwarrior.com/deep-learning-in-healthcare/
Key Challenges of Deep Learning and Big Data
While deep learning has tremendous potential to help derive more value from big data, it is still in its infancy, and there are significant challenges facing researchers and practitioners. Some of these challenges are:
1. Deep learning needs enough quality data—as a general rule, neural networks need more data to make more powerful abstractions. While big data scenarios have abundant data, the data is not always correct or of sufficiently high quality to enable training. Small variations or unexpected features of the input data can completely throw off neural network models.
2. Deep learning has difficulty with changing context—a neural network model trained on a certain problem will find it difficult to answer very similar problems presented in a different context. For example, deep learning systems that can effectively detect a set of images can be stumped when presented with the same images rotated or with different characteristics (grayscale vs. color, different resolution, etc.).
3. Security concerns—deep learning needs to train and retrain on massive, realistic datasets, and during the process of developing an algorithm, that data needs to be transferred, stored, and handled securely. When a deep learning algorithm is deployed in a mission-critical environment, attackers can affect the output of the neural network by making small, malicious changes to inputs. This could change financial outcomes, result in wrong patient diagnosis, or crash a self-driving car.
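The "small, malicious changes to inputs" are known as adversarial examples. The sketch below demonstrates the idea with the Fast Gradient Sign Method (FGSM) on a toy logistic model with hand-picked weights standing in for a trained network; the weights, input, and epsilon are illustrative assumptions, but the attack, nudging each input feature by epsilon against the gradient, is the same one used against deep models.

```python
import numpy as np

# Toy "model": logistic regression with fixed weights, standing in for a
# trained network (the FGSM attack below applies equally to deep models).
w = np.array([2.0, -1.5, 0.5, 1.0])
b = -0.25

def predict(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))  # probability of class 1

x = np.array([0.3, 0.1, 0.2, 0.1])
print("clean prediction:", predict(x))         # ~0.60 -> class 1

# FGSM: move each input feature by epsilon in the direction that most
# decreases the predicted probability.
epsilon = 0.3
grad_x = predict(x) * (1 - predict(x)) * w     # gradient of output w.r.t. input
x_adv = x - epsilon * np.sign(grad_x)
print("adversarial prediction:", predict(x_adv))  # ~0.25 -> class 0
```

A perturbation of at most 0.3 per feature flips the predicted class, which illustrates why deployed models in mission-critical settings need input validation and adversarial-robustness testing.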
4. Real-time decisions—much of the world’s big data is streamed in real time, and real-time data analytics is growing in importance. Deep learning is difficult to use for real-time data analysis because it is very computationally intensive. For example, computer vision algorithms went through several generations over the course of two decades before they became fast enough to detect objects in a live video stream.
5. Neural networks are black boxes—organizations that deal with big data need more than just good answers; they need to justify those answers and understand why they are correct. Deep learning models rely on millions of parameters to reach decisions, and it is often impossible to explain why the neural network selected one label over another. This opacity limits the ability to use deep learning for critical decisions such as patient treatment in healthcare or large financial investments.