Is Dataquest useful for learning machine learning

Data scientist: a specialist in processing, analyzing and storing large amounts of data. What is data science and how does it work?

Ever wanted to figure out how to become a data analyst, study data science, but didn't know where to start? Then this article is for you.

Who among us has not heard of "Big Data"? Hardly anyone. Interest in working with data has increased significantly in recent years, as large IT companies have to develop more and more solutions for analyzing, processing, and then using data. Some even run educational programs in collaboration with universities. However, most people have little idea of who data analysts actually are. If you are one of them and want to become a data analyst, read on. We have selected free learning resources that you can use anywhere.

Data analysts work with information and analyze it in order to obtain clear results that humans can read. The term usually covers specialists in big data, data mining, machine learning, and systems analysis, as well as business analysts.

Lectures from the Yandex School of Data Analysis

SHAD - courses on data analysis taught by Yandex employees. Admission is quite difficult. The minimum required of applicants is the basics of higher algebra, mathematical analysis, combinatorics, probability theory, and programming. Fortunately, the courses are recorded, so anyone can learn from the video lectures.

Machine learning course

The course teaches you how to apply probability theory and statistics, and covers the basics of machine learning and how to build algorithms.

Course "Algorithms and Data Structures for Search"

The lectures cover algorithms for searching and sorting large amounts of data, string algorithms, graph algorithms, and the construction and analysis of data structures.

Parallel and distributed computing course

For those who have been familiar with multithreading and parallel programming as well as MapReduce for a long time.

Course "Discrete Analysis and Probability Theory"

The course examines the basic concepts and methods of combinatorial, discrete and asymptotic analysis, probability theory and statistics and demonstrates their application.

Computational Complexity Course

In this course, you will learn about probabilistic complexity classes and the basic techniques of analysis and data construction.

Technostream lectures from Mail.ru Group

The course programs are aimed at students from several Moscow universities, but are available to everyone. We recommend the following collections of lectures for future analysts:

Lectures at Big Data University

Big Data University is an online course created in collaboration with IBM for beginners and those with no math education. The lectures, designed to help you understand the basics of working with data, are written in plain English.

Welch Labs

This channel contains lectures on math, computer science, programming and machine learning, with examples of how the material applies in real life. The lectures are in English but have excellent Russian subtitles.

Course "Structured Data Training: An Introduction to Probabilistic Representation Models" Faculty of Computer Science, National Research University Higher School of Economics

The course is an in-depth introduction to the theory and application of one of the most popular approaches to learning from structured data - discrete probabilistic graphical models. The language of the course is English.

Sentdex channel

The channel is entirely dedicated to working with data, and not only those who are interested in math will find something useful here. There are videos on analysis and programming for financial analysts and on robotics with the Raspberry Pi.

Siraj Raval channel

The author talks about modern technologies and how to work with them. His courses on deep learning, data science, and machine learning teach you how to work with data.

Data School channel

If you have only just heard about machine learning but are already interested, this channel is for you. The author explains what it is, how it works and where it is used, at an accessible level and with examples.

For those who are not sure they are ready to study entirely on their own from lectures, there are online courses with reviewed assignments.

Coursera data science courses

There is no need to explain what the platform is. You need to choose a course and start studying.

Stepik.org

Analyze data in R.

The first part covers all the main phases of statistical analysis in R: reading data, preprocessing it, applying basic statistical methods and visualizing the results. Students get to know the basic elements of programming in R, which lets them quickly and efficiently solve a wide variety of data processing tasks.

The second part covers several advanced topics that were not covered in the first: data preprocessing with the data.table and dplyr packages, advanced visualization techniques, working in R Markdown.

Introduction to databases

Immerse yourself in the DBMS

A course for those who have experience with relational DBMSs and want to learn more about how they work. The course includes:

  • design of a database schema;
  • transaction management;
  • optimization of queries;
  • new features of relational DBMSs.

Hadoop. A system for processing large amounts of data

The course focuses on methods of processing large amounts of data with the Hadoop system. After completing the course, you will acquire knowledge of basic storage methods and methods for processing large amounts of data, understand the principles of distributed systems in the context of the Hadoop framework, and acquire practical knowledge of application development using the MapReduce programming model.
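To make the MapReduce programming model mentioned above a little more concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop; the function names `map_phase`, `shuffle` and `reduce_phase` and the sample documents are invented for illustration, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents):
    pairs = []
    for doc in documents:  # on a cluster, each document could be mapped on a different node
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


# Hypothetical input, standing in for a large distributed data set.
documents = ["big data is big", "data science uses data"]
counts = word_count(documents)
print(counts)
```

The point of the split is that the map and reduce functions contain no shared state, so a framework is free to distribute them across machines.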

There are different specializations in the IT world. Some people work in administration, others in development or testing. Courses are being created to train system administrators, programmers and testers. This article covers a special program - Data Scientist - aimed specifically at developers, analysts, and product managers.

Who is a data scientist?

There are many myths surrounding the data scientist profession, and many people don't really understand what it is. Some think that a data scientist or data analyst is something like a programmer (on the principle that if you know how to program, you know how to work with data), some see the profession as similar to a database administrator, and some have no idea what it is at all.

Looking ahead, it should be said right away that a data analyst is not a programmer and certainly not a database administrator, although programming skills are required.

A data scientist is someone with three skills:

  • mathematics and statistics;
  • IT skills, including programming;
  • understanding of business processes in a particular area.

The job is not always called "data scientist". Very often other titles are used: programmer analyst, big data analyst, systems analysis manager, big data architect, business analyst, and others.
The responsibilities of a data scientist include:

  • collect large amounts of data and bring them into a convenient format;
  • program in Python, R and SAS;
  • solve business problems using data processing methods;
  • look for hidden relationships and patterns in data;
  • conduct statistical tests.

The data scientist needs to understand the business needs of the organization and be familiar with analytical tools such as machine learning and text analysis.
According to the consulting firm McKinsey Global Institute, the United States (just the United States, not the whole world!) will need a whole army of data specialists over the next year - from 140,000 to 190,000.

How Much Does a Data Scientist Make?

In the United States, the average salary for a data scientist is over $138,000 per year. In Russia, you can expect a salary of 120,000 rubles per month (more than $26,000 per year).

For comparison, the average salary of an ordinary programmer is $65,000-80,000 a year in the US and 60,000 rubles a month, or $13,000 a year, in Russia.

Either way, you can earn more as a data scientist.

As you can see, data science is a very promising career. First, the salary is higher than that of an ordinary programmer. Second, there are not that many data specialists, and the market is short of them, not just in Russia but around the world.

You can train as a data scientist at a university for professional training and continuing education.

What does the data scientist training course offer?

THE INFORMATION

  • Months of training: 5
  • Hours per week: 9
  • Experts: 13
  • Practice hours: 100+

Requirements for the students

Students must be proficient in at least one programming language (it is better if it is Python).
Students should master mathematics at the high school level: functions, derivatives, vector and matrix algebra, trigonometry.

Preparation course

If you do not have the necessary knowledge, a free preparatory course is offered, which opens immediately after you pay for the main course. It consists of 11 recorded video lectures with homework. It covers loops, data types and functions, and teaches you how to work with HTTP requests, different data formats and much more.

How much it costs

The base price is 180,000 rubles, but until June 15 it is reduced to 165,000 rubles. An interest-free installment plan for 6 months is also available, which works out to 27,500 rubles per month.

What is the result

The student receives a state diploma of professional retraining in the field of "Data Analyst / Machine Learning Specialist". With it you can apply for positions such as "Data Analyst" or "Big Data Developer" with a salary of 120,000 rubles per month.

Please note that what is issued after the training is not some kind of "certificate" but a state diploma.


Data Science, Machine Learning - you've probably heard these big words, but how clear was their meaning to you? For some, they are just attractive bait. Some believe that data science is magic that makes a machine do whatever is needed for free. Others believe it is an easy way to make big money. Nikita Nikitinsky, Head of Research and Development at IRELA, and Polina Kazakova, Data Scientist, explain what it is in simple and understandable language.

I work in the field of automatic natural language processing, one of the applications of data science, and I often see people use these terms incorrectly. So I wanted to clarify things a little. This article is for people who have only a vague idea of what data science is and want to understand the concepts.

Let's define the terminology

First of all, nobody knows exactly what data science is, and there is no strict definition - this is a very broad and interdisciplinary concept. That's why I'm going to share my vision here, which doesn't necessarily coincide with the opinions of others.

In Russian, the term data science is rendered as "the science of data", and in professional circles it is often simply transliterated. Formally, it is a collection of interconnected disciplines and methods from the fields of computer science and mathematics. Sounds too abstract, doesn't it? Let's find out.

Part one: data

Indeed, the first component of data science, without which the whole further process is impossible, is the data itself: how it is collected, stored and processed, and how useful information is extracted from the overall mass of data. It is precisely to cleaning data and bringing it into the desired form that specialists devote up to 80% of their working time.

An important part of this is handling data for which standard storage and processing methods are not suitable because of its large volume and/or its diversity - the so-called big data. By the way, don't confuse the two: big data and data science are not synonyms; the former is a subfield of the latter. At the same time, data analysts don't always have to work with big data in practice - small data can be useful too.

We collect data

Imagine we are interested in whether there is a connection between how much coffee your colleagues drink in a day and how much they slept the night before. Let's write down the information we have: say your colleague Grigory slept only 4 hours, so today he had to drink 3 cups of coffee; Ellina slept 9 hours and didn't drink any coffee at all; and Polina slept a full 10 hours but still drank 2.5 cups of coffee - and so on.

Let's show the collected data on a graph (visualization is also an important element of any data science project). We put sleep time in hours on the X-axis and coffee in milliliters on the Y-axis. We get something like this:

Part two: science

We have data; what can we do with it now? Analyze it properly, extract useful patterns and put them to use. Disciplines such as statistics, machine learning and optimization help us here.

They form the next and perhaps most important part of data science - data analysis. Machine learning enables you to find patterns in existing data and then predict the information you need for new objects.

Let's analyze the data

Let's go back to our example. By eye, the two parameters seem to be related: the less a person slept, the more coffee they drink the next day. At the same time, we have an example that stands out from this trend - Polina, who likes both sleeping and drinking coffee. Still, we can try to approximate the pattern with a single straight line that comes as close as possible to all the points:

The green line is our machine learning model: it summarizes the data and can be described mathematically. Now we can use it to determine values for new objects. If we want to predict how much coffee Nikita, who has just entered the office, will drink today, we ask how much he slept. Having received the answer of 7.5 hours, we plug it into the model - it corresponds to slightly less than 300 ml of coffee. The red point represents our prediction.
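The straight-line fit described above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual model: the three named colleagues come from the text (converted at an assumed 200 ml per cup, which matches 2.5 cups being 500 ml), while the other four data points and the helper names `fit_line` and `predict` are invented for the example. The fit uses ordinary least squares, one common way to choose a line that comes as close as possible to all points.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# (hours slept, coffee in ml). Grigory, Ellina and Polina are from the text,
# assuming 200 ml per cup; the unnamed points are hypothetical colleagues.
observations = [
    (4, 600),   # Grigory: 3 cups
    (5, 500),   # hypothetical
    (6, 400),   # hypothetical
    (7, 300),   # hypothetical
    (8, 200),   # hypothetical
    (9, 0),     # Ellina: no coffee
    (10, 500),  # Polina: 2.5 cups
]
hours = [h for h, _ in observations]
coffee = [ml for _, ml in observations]

slope, intercept = fit_line(hours, coffee)


def predict(hours_slept):
    """Predicted coffee volume in ml for a new person."""
    return slope * hours_slept + intercept


nikita_ml = predict(7.5)  # Nikita said he slept 7.5 hours
print(f"slope={slope:.1f} ml/hour, prediction for 7.5 h: {nikita_ml:.0f} ml")
```

With these made-up points the slope comes out negative (less sleep, more coffee); the exact prediction depends entirely on the sample data and need not match the figure in the article.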

This is how machine learning works, and the idea is very simple: find a pattern and extend it to new data. In fact, machine learning also distinguishes another class of problems, where you don't need to predict a value as in our example but instead need to divide the data into groups. But we'll talk about this in more detail another time.

Let's apply the result

In my opinion, however, data science doesn't end with identifying patterns in data. Any data science project is applied research where it is important not to forget things like formulating a hypothesis, planning an experiment, and of course evaluating the result and its suitability for solving a particular case.

The latter is very important in real business problems, when you need to understand whether the data science solution you found will benefit your project or not. What could the model from our example be used for? Maybe with its help we could optimize the delivery of coffee to the office. At the same time, we need to assess the risks and determine whether our model can do the job better than the existing solution - office manager Mikhail, who is responsible for purchasing the product.

Find exceptions

Of course, our example is simplified as much as possible. In reality, a more complex model could be built that takes other factors into account, e.g. whether a person likes coffee in general. Or the model could find relationships more complex than those represented by a straight line.

We might also first look for outliers in our data - objects that, like Polina, are very different from most others. In real work, such examples can harm the process of building a model and its quality, so it makes sense to handle them separately. Sometimes, though, such objects are of primary interest, for example when detecting abnormal banking activity to prevent fraud.

In addition, Polina illustrates another important idea - the imperfection of machine learning algorithms. Our model predicts only 100 ml of coffee for a person who slept for 10 hours, while Polina actually drank 500 ml. Customers of data science solutions will never believe this, but it is still impossible to train a machine to predict everything in the world perfectly: no matter how well we isolate patterns in the data, there are always elements that cannot be predicted.
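One simple way to see both ideas, outliers and prediction error, is to compare each observation with what the line predicts. The sketch below is hypothetical: it assumes a line passing through the two predictions quoted in the text (about 300 ml at 7.5 hours and 100 ml at 10 hours), a cup size of 200 ml, and an arbitrary 300 ml threshold for calling a point an outlier. None of these values come from the authors' actual model.

```python
# Line through the two predictions quoted in the text (an assumption):
# roughly 300 ml at 7.5 hours of sleep and 100 ml at 10 hours.
slope = (100.0 - 300.0) / (10.0 - 7.5)   # -80 ml per extra hour of sleep
intercept = 100.0 - slope * 10.0         # 900 ml


def predict(hours_slept):
    return slope * hours_slept + intercept


# Colleagues named in the text, with cups converted at an assumed 200 ml each.
observed = {
    "Grigory": (4, 600),
    "Ellina": (9, 0),
    "Polina": (10, 500),
}

THRESHOLD_ML = 300  # hypothetical cut-off for "very different from the model"

# Residual = what the person actually drank minus what the model predicted.
residuals = {
    name: ml - predict(hours) for name, (hours, ml) in observed.items()
}
outliers = [name for name, r in residuals.items() if abs(r) > THRESHOLD_ML]

print(residuals)
print("outliers:", outliers)
```

Under these assumptions Polina's residual is 500 - 100 = 400 ml, so she is the only point flagged. In practice the threshold is usually chosen from the spread of the residuals rather than fixed by hand.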

Let's continue the story

So data science is a set of methods for processing data, analyzing it and applying it to practical problems. At the same time, it should be clear that each specialist has his own opinion on this area, and opinions may be different.

Data science is based on fairly simple ideas, but in practice many non-obvious subtleties come up. How data science surrounds us in everyday life, what methods of data analysis exist, who makes up a data science team and what difficulties can arise in the research process - we will talk about all this in the following articles.

We are continuing a series of analytical studies on the demand for skills in the labor market. This time, thanks to Pavel Surmenk Sharky, we will consider a new profession - data scientist.

In recent years, the term data science has grown in popularity. People write a lot about it and talk about it at conferences. Some companies even hire people for the grandly titled position of data scientist. What is data science? And who are data scientists?

If you ask a San Francisco resident this question, the answer will be that a data scientist is a statistician who lives in San Francisco. Funny, if not very encouraging for those outside San Francisco, isn't it? Okay, then one more definition: a data scientist is someone who understands statistics better than any programmer and programming better than any statistician. This version is closer to the truth. The data scientist is a mixture of a statistician and a programmer. Moreover, statisticians and programmers themselves vary widely. Hence, it is better to think of this profession as a broad spectrum from pure statisticians to pure programmers.

Robert Chang, a data scientist at Twitter, divides representatives of his profession into two groups: Type A data scientists vs. Type B data scientists.

Type A, where A is for analysis. These people are mainly concerned with extracting meaning from static data. They are very much like statisticians; they may even be statisticians who simply changed their job title to data scientist, and as we know, just changing the job title can bring a significant raise, honor and respect. In addition to statistics, they also know the practical side: how to clean data, how to work with large amounts of data, how to visualize data and how to describe the results of their work.

Type B, where B is for building. They also have statistical knowledge, but they are strong and experienced programmers. They are more interested in applying data in real systems. They often build models that interact with users, e.g. systems for recommending products, films and advertising.

Data science also overlaps somewhat with areas such as machine learning and artificial intelligence. Representatives of these areas are close to Type B data scientists.

What should those who want to become a data scientist learn? What skills are required? Let's take a look at the requirements American employers set for candidates for positions in data science and machine learning.

Hard skills of data scientist

Let's start with an analysis of the hard skill requirements.

As the ranking shows, basic math, statistics, computer science and machine learning are most in demand. In addition to theoretical knowledge, the data scientist must be able to break down, clean, model and visualize data. Experience in software development and quality management is also important.

Data science tools and technologies

A data scientist's main tools are Python and R.

R is a specialized programming language for statistical computing, which is why it is so loved by statisticians and data scientists. You can use it to quickly load a data set, calculate the most important statistical characteristics, visualize data and create data models.

Python, by contrast, is a general-purpose programming language, but it has a variety of high-quality libraries and platforms for data science and machine learning.
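As a rough illustration of the "load a data set and calculate its main statistical characteristics" workflow, here is a sketch that uses only Python's standard library; in practice libraries such as pandas or NumPy would be more typical. The inline CSV text and its column names are invented sample data, not figures from any survey mentioned here.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real project would read this from a file.
raw = """name,sleep_hours,coffee_ml
Grigory,4,600
Ellina,9,0
Polina,10,500
"""

# Load the data set into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(raw)))
sleep = [float(r["sleep_hours"]) for r in rows]
coffee = [float(r["coffee_ml"]) for r in rows]

# Calculate a few basic statistical characteristics.
summary = {
    "count": len(rows),
    "mean_sleep": statistics.mean(sleep),
    "median_coffee": statistics.median(coffee),
    "stdev_coffee": statistics.stdev(coffee),  # sample standard deviation
}
print(summary)
```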

Notably, 39% of jobs require knowledge of both R and Python. Hence, it is better to learn both languages than to try to choose one of them.

For big data, employers prefer Hadoop and Spark. Among databases, MySQL and MongoDB are popular.

Data Scientist Soft Skills

General skills (soft skills) are less in demand than professional skills: they are mentioned less than half as often in vacancies. The average salaries for vacancies that require soft skills are around 20% as high as salaries for vacancies that require hard skills and technological knowledge.

However, among the soft skills encountered, the following are the most important: communication, data visualization, presentations, effective writing and speaking. Teamwork, management, and problem-solving skills are also helpful.

Data Scientist Domain Knowledge

Some open positions require knowledge of a subject area, ranging from physics and biology to real estate and hospitality. Business, marketing, and medicine lead the list.

Specializations of data scientists

Before starting the research, we wanted to identify the sub-specializations of the data scientist profession, for example to separate those who deal primarily with data analysis and visualization from those who build models for predictive analysis or machine learning algorithms. However, as the analysis of the data showed, the requirements for most vacancies are fairly uniform and there is no clear division into specializations.

Some patterns do look interesting, though. For example, if a job requires knowledge of Python or C++, it is unlikely that communication and management skills will be required, and vice versa.

The O'Reilly 2015 Data Science Salary Survey helps us see the job market from the other side. This study is based on a survey of 600 data scientists. The data collected includes salary levels, demographic information, and the time specialists spend on different types of tasks. The main results of this study are:
  • SQL, Excel, R, Python are key tools, and this list hasn't changed in 3 years.
  • Spark and Scala are becoming increasingly popular.
  • The focus of those who used previously specialized commercial tools is shifting to the use of R.
  • But those who have previously used R are switching to Python. Python leads the way.
  • Of all industries, the highest salaries are in software development.
  • Cloud computing is still in demand.
We encourage you to read the entire report. Among other things, it describes a mathematical model of how a data scientist's salary depends on place of residence, education and tasks. For example, data scientists who spend more time in meetings earn more. Those who study data for more than 4 hours a day earn less.

Many online courses on this topic have appeared in recent years. And that's a very good start!

If you're more interested in data science, Coursera: Start Your Career In Data Science is a great option. Getting the specialization isn't free, but if you don't need a certificate, all of these courses can be taken for free: just look up each course by its name.

For those interested in machine learning, we recommend the course by Andrew Ng, Chief Scientist at Baidu Research, faculty member at Stanford and co-founder of Coursera: Machine Learning.

Data science is a new field, so the requirements for data scientists are not yet fully settled. Given the dynamics of our time, it is possible that data science will never become an independent profession taught in universities, but will rather remain a set of practices and skills. But it is precisely these practices and skills that will be in great demand in the years to come.

Data scientist - a specialist in the processing, analysis and storage of large amounts of data, the so-called "big data". The profession is suitable for those who are interested in physics, mathematics and computer science.

Data science is the science of data at the intersection of several disciplines: mathematics and statistics; informatics and computer science; business and economics.

(S. Maltseva, V. Kornilov, National Research University "Higher School of Economics")

The profession is new and relevant. The term "Big Data" itself appeared in 2008, and data science was officially recognized as an academic and interdisciplinary field in early 2010. The first mention of the term "data science" was in Peter Naur's book from 1974, but in a different context.

The need for such a profession arose because big data sets are too large to be processed by the standard means of mathematical statistics. Every day, thousands of petabytes of information (1 petabyte = 10^15 bytes, or 1,000 terabytes) pass through the servers of companies around the world. In addition to the sheer volume, the heterogeneity of the data and its high update rate complicate the problem.

Data arrays are divided into three types:

  • structured (e.g. data from cash registers in retail);
  • semi-structured (email messages);
  • unstructured (video files, images, photos).

Most big data is unstructured, which makes it very difficult to process.

A statistician, systems analyst, or business analyst alone cannot solve problems with such amounts of data. This requires a person with an interdisciplinary education who is proficient in math and statistics, economics and business, computer science and computer technology.

The main task of the data scientist is to extract the required information from a variety of sources using real-time information flows, to identify hidden patterns in data sets and to analyze them statistically in order to make smart business decisions. The workplace of such a specialist is not one computer or even one server, but a cluster of servers.

Characteristics of the profession

A data scientist uses different methods to work with data:

  • statistical methods;
  • database modeling;
  • data mining methods;
  • artificial intelligence applications for working with data;
  • methods of design and development of databases.

The roles of a data scientist vary depending on their area of activity, but the general list of roles is as follows:

  • collection of data from various sources for later operational processing;
  • analysis of consumer behavior;
  • customer base modeling and product personalization;
  • analysis of the effectiveness of internal processes of the base;
  • analysis of various risks;
  • detection of possible fraud by examining suspicious transactions;
  • preparation of regular reports with forecasts and data presentation.

Like a real scientist, a data scientist not only collects and analyzes data, but also examines it in different contexts and from different perspectives, questioning assumptions. The most important quality of a data scientist is the ability to recognize logical connections in the system of collected information and to develop effective business solutions based on quantitative analysis. In today's highly competitive and rapidly changing world, in an ever-increasing flow of information, the data scientist is essential to support you in making the right business decisions.

Advantages and disadvantages of the profession

Pros

  • The profession is not only very popular, there is also an acute shortage of professionals at this level. According to the McKinsey Global Institute, more than 190,000 data scientists will be needed in the US alone by 2018. That is why the faculties of the most renowned universities for the training of data specialists are funded and further developed so quickly and comprehensively. The demand for data scientists is also growing in Russia.
  • Highly paid job.
  • The need to constantly grow, to keep pace with the development of IT and to devise new methods of processing, analyzing and storing data on one's own.

Cons

  • Not everyone can master this profession: a special way of thinking is required.
  • In the course of the work, known methods and more than 60% of ideas may not work. Many solutions fail, and it takes a lot of patience to achieve satisfactory results. A scientist has no right to say "No!" to a problem and has to find a way to solve it.

Workplace

Data scientists hold key positions in the following areas:

  • technological industries (car navigation systems, drug manufacturing, etc.);
  • IT sphere (search engine optimization, spam filters, message systematization, automatic text translations and much more);
  • medicine (automatic diagnosis of diseases);
  • financial institutions (decisions on granting loans), etc.;
  • TV companies;
  • large retail chains;
  • election campaigns.

Important qualities

  • analytical mind;
  • hard work;
  • persistence;
  • scrupulousness, accuracy, attentiveness;
  • the ability to pursue research despite poor interim results;
  • sociability;
  • the ability to explain complex things in simple terms;
  • business intuition.

Expertise and skills:

  • knowledge of mathematics, mathematical analysis, mathematical statistics, probability theory;
  • knowledge of English;
  • knowledge of the most important programming languages that have components for working with large data sets: Java (Hadoop), C++ (BigARTM, Vowpal Wabbit, XGBoost), Python (Matplotlib, NumPy, Scikit, SciPy);
  • knowledge of statistical tools - SPSS, R, MATLAB, SAS Data Miner, Tableau;
  • solid knowledge of the industry the data scientist works in; in the pharmaceutical industry, for example, knowledge of the main production processes and drug components is required;
  • the core skill of a data scientist is organizing and managing clustered storage systems for large data sets;
  • knowledge of the laws of business development;
  • economic knowledge.

Universities

  • Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, the Mail.Ru Group special educational program "Technosphere", with training in methods of mining large amounts of data, programming in C++, multithreaded programming and the technology of building information retrieval systems.
  • MIPT, data analysis department.
  • The Faculty of Information Systems at the Higher School of Economics trains systems analysts, designers and implementers of complex information systems as well as organizers of the management of corporate information systems.
  • Yandex School of Data Analysis.
  • Innopolis University, Dundee University, Southern California University, Oakland University, Washington University: Masters in Big Data.
  • Imperial College London Business School, MSc in Data Science and Management.

As in any profession, self-education is also important here, and the following resources are of undoubted benefit:

  • online courses from the world's leading universities on Coursera;
  • the machine learning channel MACHINE LEARNING;
  • a selection of edX courses;
  • Udacity courses;
  • Dataquest courses, where you can become a real professional in data science;
  • 6-level DataCamp courses;
  • O'Reilly training videos;
  • Data Origami screencasts for beginners and advanced users;
  • the quarterly Moscow Data Science Meetup conference for specialists;
  • data analysis competitions on kaggle.com.

Salary

Salary as of 07/04/2019

Russia 50,000-200,000 ₽

Moscow 60,000-300,000 ₽

The data scientist profession is one of the highest paid. According to the website hh.ru, the monthly salary ranges from $8.5 thousand to $9 thousand. In the United States, such a specialist earns $110,000 to $140,000 per year.

According to a survey by the Superjob Research Center, data scientist salaries vary with work experience, role and region. A beginner can count on 70,000 rubles in Moscow and 57,000 rubles in Saint Petersburg. With up to 3 years of professional experience, the salary rises to 110,000 rubles in Moscow and 90,000 rubles in Saint Petersburg. For experienced specialists with scientific publications, the salary can reach 220,000 rubles in Moscow and 180,000 rubles in Saint Petersburg.

Career steps and prospects

Becoming a data scientist is a great achievement in itself and requires sound theoretical knowledge and practical experience in several professions. Such a specialist is a key figure in any organization. To reach this level, you need to work hard and purposefully, and to keep improving in all the areas that form the basis of the profession.

There is a joke about the data scientist: this is a generalist who programs better than any statistician and knows statistics better than any programmer. And who understands business processes better than the head of the company.

What is "Big Data" in real numbers?

  1. Every two days the amount of data increases by as much information as mankind created from the birth of Christ until 2003.
  2. 90% of all data available today appeared in the last 2 years.
  3. By 2020 the amount of information will increase from 3.2 to 40 zettabytes. 1 zettabyte = 10^21 bytes.
  4. Every minute, 200,000 photos are uploaded to Facebook, 205 million emails are sent and 1.8 million likes are posted.
  5. Google processes 40,000 search queries every second.
  6. The total volume of data for each industry doubles every 1.2 years.
  7. By 2020, the market for Hadoop services will grow to $50 billion.
  8. In 2015, 1.9 million jobs were created in the United States for specialists working on big data projects.
  9. Big data technologies increase retail chain profits by 60% per year.
  10. The big data market is predicted to grow to $68.7 billion in 2020, compared to $28.5 billion in 2014.
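
To get a feel for what the figures in the list imply, they can be converted into growth rates. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes steady compound growth between the quoted values, which the list itself does not claim.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by growing from start to end over `years` years."""
    return (end / start) ** (1 / years) - 1


def yearly_factor_from_doubling(doubling_years):
    """Yearly growth factor implied by a fixed doubling period (in years)."""
    return 2 ** (1 / doubling_years)


# Item 10: market of $28.5 billion in 2014, forecast of $68.7 billion in 2020.
market_cagr = cagr(28.5, 68.7, 2020 - 2014)

# Item 3: data volume forecast to grow from 3.2 to 40 zettabytes.
volume_fold = 40 / 3.2

# Item 6: data volume in each industry doubles every 1.2 years.
industry_yearly_factor = yearly_factor_from_doubling(1.2)

print(f"Market growth: about {market_cagr:.1%} per year")
print(f"Data volume: {volume_fold:.1f}-fold increase")
print(f"Industry data: about x{industry_yearly_factor:.2f} per year")
```

Under these assumptions the market forecast corresponds to roughly 16% growth per year, while a doubling period of 1.2 years means industry data grows by a factor of about 1.8 every year.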

Despite these positive growth indicators, forecasting errors also occur. One of the most famous mistakes of 2016: the predictions about the US presidential election did not come true. Renowned US data scientists Nate Silver, Kirk Borne and Bill Schmarzo predicted a win for Hillary Clinton. In previous election campaigns, their predictions had been accurate and they had never been wrong.

That year, for example, Nate Silver gave an accurate prediction for 41 states but was wrong about 9, and those 9 states decided the election in Trump's favor. After analyzing the reasons for the 2016 errors, they concluded that:

  1. Mathematical models objectively reflect the situation at the time they were created. But they have a half-life, by the end of which the situation can change dramatically. The predictive quality of a model deteriorates over time. In this case, for example, misconduct, income inequality and other social upheavals played a role. The model therefore needs to be adjusted regularly to reflect new data. This was not done.
  2. It is necessary to look for and take into account additional data that can have a significant impact on the forecasts. Videos of the rallies held during the Clinton and Trump campaigns were available, but the number of people attending the rallies was not taken into account. Attendance was in the hundreds: it turned out that 400 to 600 people came to a rally for Trump and only 150 to 200 to a rally for Clinton, which was reflected in the results.
  3. Mathematical models in election campaigns are based on demographic data: age, race, gender, income, status in society, etc. The weight of each group depends on how it voted in the last election. Such a forecast has an error of 3-4% and works reliably when there is a large gap between the candidates. In this case, however, the gap between Clinton and Trump was narrow, so the error had a significant impact on the predicted result.
  4. The irrational behavior of people was not taken into account. Opinion polls create the illusion that people vote the way they answered in the polls. But sometimes they do the opposite. In such cases, additional facial and speech analysis should be performed to detect insincere answers about the vote.

Overall, it turned out that the prediction was wrong because of the small gap between the candidates. With a large gap, these errors would not have been as critical.
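
The point about forecast error and a narrow gap can be illustrated with a small check. This is a simplified Python sketch, not the method used by the analysts mentioned above: the function name and the vote shares are hypothetical, and it assumes the 3-4% error applies to each candidate's share separately, so that the gap can be off by up to twice the error in the worst case.

```python
def lead_is_conclusive(share_a, share_b, error):
    """Return True if the gap between two forecast shares exceeds the worst-case error on the gap.

    Assumption: `error` applies to each share separately, so the gap
    between the two shares can be off by as much as 2 * error.
    """
    return abs(share_a - share_b) > 2 * error


# Hypothetical forecasts, chosen only to illustrate the two situations.
wide_gap = lead_is_conclusive(0.56, 0.44, error=0.04)    # gap of 12 points vs. 8 points of possible error
narrow_gap = lead_is_conclusive(0.48, 0.46, error=0.04)  # gap of 2 points vs. 8 points of possible error

print("Wide gap conclusive:", wide_gap)
print("Narrow gap conclusive:", narrow_gap)
```

With a wide gap the forecast survives the error; with a narrow gap the same error is larger than the lead, so the forecast cannot tell the candidates apart.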

Video: New Big Data Specialization - Mikhail Levin


