Big Data
As of mid-2018, Wikipedia gave the following definition of Big Data:
“Big data is a designation for structured and unstructured data of enormous volume and significant variety, effectively processed by horizontally scalable software tools that emerged in the late 2000s as an alternative to traditional database management systems and Business Intelligence solutions.”
As you can see, this definition relies on vague terms such as “enormous”, “significant”, “effective” and “alternative”. Even the name itself is subjective: are 4 terabytes (the capacity of a modern external laptop hard drive) big data or not? To this definition Wikipedia adds: “In the broad sense, ‘big data’ also refers to a socio-economic phenomenon arising from the emergence of technological capabilities to analyze enormous amounts of data (in some areas, the entire world’s volume of data) and the resulting transformational consequences.”
Analysts at IBS estimated “the entire world’s data volume” at the following values:
2003 — 5 exabytes of data (1 EB = 1 billion gigabytes)
2008 — 0.18 zettabytes (1 ZB = 1024 exabytes)
2015 — more than 6.5 zettabytes
2020 — 40-44 zettabytes (forecast)
2025 — the volume is forecast to grow another tenfold.
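The unit conversions and growth factors in the list above are easy to check with a few lines of Python (decimal prefixes are used here for simplicity; the “1 ZB = 1024 EB” in the list uses binary prefixes):

```python
# Rough check of the growth figures quoted above.
EB = 10**18  # exabyte, in bytes (decimal prefix)
ZB = 10**21  # zettabyte, in bytes (decimal prefix)

volumes = {2003: 5 * EB, 2008: 0.18 * ZB, 2015: 6.5 * ZB, 2020: 40 * ZB}

# Growth factor between consecutive estimates
years = sorted(volumes)
for a, b in zip(years, years[1:]):
    factor = volumes[b] / volumes[a]
    print(f"{a} -> {b}: x{factor:.0f}")
```

The 2003-to-2008 step alone is a 36-fold increase, which makes the “tenfold by 2025” forecast look conservative by comparison.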
The report also notes that most of the data is generated not by ordinary consumers but by enterprises1 (recall the Industrial Internet of Things).
A simpler definition, consistent with the established view of journalists and marketers, can also be used:
“Big data is a set of technologies designed to perform three operations:
These “skills” are believed to reveal hidden patterns that escape limited human perception. This offers unprecedented opportunities to optimize many areas of our lives: public administration, medicine, telecommunications, finance, transport, manufacturing, and so on. It is not surprising that journalists and marketers have used the phrase Big Data so often that many experts consider the term discredited and propose abandoning it.3
Moreover, in October 2015 Gartner excluded Big Data from its list of popular trends. Its analysts explained the decision by the fact that the concept of “big data” covers a large number of technologies that are already actively used in enterprises, partially belong to other popular areas and trends, and have become everyday working tools.4
In any case, the term Big Data is still widely used, as this article itself exemplifies.
The defining characteristics of big data include, beyond sheer physical volume, other properties that emphasize the complexity of processing and analyzing the data. The “VVV” set of characteristics (volume, velocity, variety: physical volume, the rate of data growth together with the need for fast processing, and the ability to simultaneously process data of different types) was developed by Meta Group in 2001 to point out the equal importance of data management in all three aspects.
Later, interpretations appeared with four Vs (adding veracity), with five Vs (adding viability and value), and with seven Vs (adding variability and visualization). IDC, for example, interprets the fourth V as value, emphasizing the economic feasibility of processing large volumes of data under appropriate conditions.5
Based on the above definitions, the basic principles of big data are:
These principles differ from those typical of traditional, centralized, vertical models for storing well-structured data. Accordingly, new approaches and technologies are being developed for working with big data.
Initially, the set of approaches and technologies included tools for massively parallel processing of loosely structured data, such as NoSQL DBMSs, MapReduce, and Hadoop. Later, other solutions providing similar capabilities for processing ultra-large data sets, as well as some hardware, also came to be classified as big data technologies.
McKinsey, in addition to the technologies considered by most analysts (NoSQL, MapReduce, Hadoop, R), also includes Business Intelligence technologies and relational database management systems supporting SQL in the context of big data applicability.
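The MapReduce model mentioned above can be illustrated with a single-machine sketch in plain Python. Real Hadoop distributes the map and reduce tasks across a cluster, but the dataflow is the same three steps (map, shuffle, reduce); the word-count task below is the classic textbook example, not any specific Hadoop API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data big tools", "data tools"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle(pairs))
print(counts)  # -> {'big': 2, 'data': 2, 'tools': 2}
```

The horizontal scalability of the model comes from the fact that map calls are independent per document and reduce calls are independent per key, so both can be spread across many machines.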
McKinsey, an international consulting company specializing in strategic management problems, highlights 11 methods and techniques of analysis applicable to big data.
• Data Mining methods — a set of methods for discovering previously unknown, non-trivial, practically useful knowledge in data, needed for decision-making. Such methods include association rule learning, classification (division into categories), cluster analysis, regression analysis, detection and analysis of deviations, etc.
• Crowdsourcing — classification and enrichment of data by a broad, indefinite circle of people who perform the work without entering into an employment relationship
• Data fusion and integration — a set of techniques for integrating heterogeneous data from a variety of sources to enable in-depth analysis (for example, digital signal processing and natural language processing, including sentiment analysis)
• Machine learning, including supervised and unsupervised learning — the use of models built on statistical analysis or machine learning to produce comprehensive forecasts based on base models
• Artificial neural networks, network analysis, and optimization, including genetic algorithms (heuristic search algorithms for optimization and modelling problems that randomly select, combine, and vary candidate parameters using mechanisms similar to natural selection)
• Pattern recognition
• Predictive Analytics
• Simulation — a method for building models that describe processes as they would actually unfold. Simulation can be regarded as a kind of experimental testing
• Spatial analysis — a class of methods that use topological, geometric, and geographic information extracted from the data
• Statistical analysis — time series analysis, A/B testing (split testing, a marketing research method in which a control group of elements is compared with a set of test groups in which one or more indicators have been changed, in order to find out which changes improve the target metric)
• Visualization of analytical data — presentation of information in the form of figures and diagrams, using interactive features and animation, both to present results and as input for further analysis. This is a very important step in big data analysis, allowing the most important findings to be presented in the most readable form.7
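As an illustration of the A/B testing item above, here is a minimal two-proportion z-test in plain Python. The visitor and conversion numbers are invented for the example:

```python
import math

# Hypothetical A/B test: did a changed page variant improve conversion?
visitors_a, conversions_a = 10_000, 520   # control group
visitors_b, conversions_b = 10_000, 590   # test group with one changed element

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Two-proportion z-test with a pooled conversion rate
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se

print(f"control {p_a:.1%}, variant {p_b:.1%}, z = {z:.2f}")
# |z| > 1.96 would indicate significance at the 5% level
```

With these numbers the variant converts at 5.9% against 5.2% for the control, and the z statistic exceeds 1.96, so the change would be judged a significant improvement at the 5% level.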
According to the McKinsey Global Institute report “Big data: The next frontier for innovation, competition, and productivity”, data has become as important a factor of production as labor and production assets. Through the use of big data, companies can gain tangible competitive advantages. Big Data technologies can be useful for solving the following tasks:
Industrial enterprises also generate large volumes of data due to the introduction of Industrial Internet of Things technologies. In this process, the main components and parts of machines are equipped with sensors, actuators, controllers and, sometimes, inexpensive processors capable of performing edge (fog) computing. During the production process, data is continuously collected and possibly pre-processed (e.g., filtered). Analytical platforms process these volumes of data in real time, present the results in the most readable form, and store them for further use. Based on the analysis of the obtained data, conclusions are drawn about the condition of the equipment, its performance, product quality, the need for process changes, etc.
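The pre-processing (filtering) step mentioned above can be sketched as follows: smooth a noisy sensor stream with a moving average so that brief spikes are suppressed, and forward only readings that leave the normal operating band. The window size and the band limits below are invented for illustration:

```python
from collections import deque

def smooth(readings, window=5):
    """Yield the moving average of the last `window` readings."""
    buf = deque(maxlen=window)
    for r in readings:
        buf.append(r)
        yield sum(buf) / len(buf)

def anomalies(readings, low=10.0, high=90.0, window=5):
    """Forward only smoothed readings outside the [low, high] band."""
    return [round(v, 2) for v in smooth(readings, window) if not low <= v <= high]

# A short spike (95..99) in an otherwise normal stream around 50
stream = [50, 51, 49, 95, 96, 97, 99, 98, 52, 50]
print(anomalies(stream))  # -> [97.0]
```

Only the sustained middle of the excursion crosses the threshold after smoothing, so a single noisy sample would not be forwarded at all; this is the kind of volume reduction that makes edge pre-processing worthwhile before data reaches the analytical platform.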
By monitoring information in real time, plant personnel can:
The last point is particularly important. For example, operators in the petrochemical industry receive on average approximately 1,500 alarms per day, i.e. more than one message per minute. This leads to increased fatigue in operators, who must constantly make instant decisions on how to respond to each signal. An analytical platform, however, can filter out non-essential information, allowing operators to focus primarily on critical situations. This enables them to identify and prevent incidents and possible accidents more effectively, which improves production reliability, safety, equipment availability, and regulatory compliance.10
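The alarm-filtering idea can be sketched in a few lines of Python: keep every critical alarm, and suppress repeats of the same non-critical alarm raised again within a short window. The tags, severity levels, and five-minute window below are illustrative assumptions, not part of any cited platform:

```python
def filter_alarms(alarms, window_s=300):
    """Keep all critical alarms; drop repeats of a non-critical alarm
    raised again within `window_s` seconds of its last occurrence."""
    last_seen = {}
    shown = []
    for t, tag, severity in alarms:  # (timestamp in seconds, tag, severity)
        if severity == "critical":
            shown.append((t, tag, severity))
            continue
        if tag not in last_seen or t - last_seen[tag] >= window_s:
            shown.append((t, tag, severity))
        last_seen[tag] = t
    return shown

alarms = [
    (0,   "P-101 high pressure", "warning"),
    (60,  "P-101 high pressure", "warning"),   # repeat within window: suppressed
    (120, "T-203 overtemp",      "critical"),  # critical: always shown
    (400, "P-101 high pressure", "warning"),   # window elapsed: shown again
]
for alarm in filter_alarms(alarms):
    print(alarm)
```

Of the four raw alarms, three reach the operator; note that `last_seen` is updated even for suppressed alarms, so a continuously re-raised warning stays quiet until it pauses for the full window.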
In addition, based on the results of big data analysis, one can calculate payback periods, assess the prospects of changing technological regimes, and plan staff reductions or reallocations, that is, make strategic decisions about the further development of the enterprise.11
Links:
1. https://rb.ru/howto/chto-takoe-big-data/
2. https://postnauka.ru/faq/46974
3. https://www.datacenterknowledge.com/archives/2015/03/30/big-data-bubble-set-burst
4. https://www.tadviser.ru/index.php/Статья:Большие_данные(Big_Data)
5. https://ru.wikipedia.org/wiki/Большие_данные
6. https://intellect.ml/big-data-6821
7. https://sewiki.ru/index.php?title=Большие_данные&oldid=3075
8. https://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
9. https://engjournal.ru/articles/1228/1228.pdf
10. https://www.crn.ru/news/detail.php?ID=117807
11. https://www.ogcs.com.ua/index.php/articles/121-big-data-v-promyshlennosti-innovatsii-k-kotorym-pridetsya-privykat