AutoML

Automated machine learning


AutoML is surrounded by hype: it promises an enormous increase in efficiency for the machine learning process. But how much potential does the automated machine learning approach really have? In this article, we discuss how AutoML can automate your ML process and what benefits you can gain from it.

Machine learning is usually carried out by data science teams, guided by experience or previous exploratory analyses. This manual process is lengthy and difficult, because analysts spend a lot of time preparing the data and testing model parameters. The aim of AutoML is to create models automatically and thereby simplify the ML process.

What is AutoML?

AutoML is the automation of machine learning. The aim is to reduce the human working time spent in the data science process. AutoML eliminates the manual steps of the classic, iterative process: the user only has to supply the prepared training data as input, and an optimized model is created.

Large companies such as Google and Amazon are banking heavily on the development of AutoML and hope to open up significantly larger customer groups. Machine learning is often an expensive field of development for companies: data science employees are extremely expensive and the analytical infrastructure is complex. This makes it an interesting use case for cloud providers.

AutoML - automate machine learning

Machine learning enables the extraction of knowledge from data: a computer program learns patterns from sample data and uses them as a basis for forecasts about the future.

Machine learning is very useful for companies - for example to assess the success of new products or to identify the risks of certain business processes. Although a computer is responsible for generating knowledge in machine learning, this process is not autonomous.

ML process

Humans do a large share of the work in this iterative process manually. A classic ML process usually runs as follows:

  • Data collection
  • Data exploration
  • Data preparation
  • Feature engineering
  • Selection of the appropriate machine learning model and features
  • Training of the model (including hyperparameter search)
  • Prediction by the model
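
To make the manual effort concrete, the steps above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the data set is synthetic, the model is a simple k-nearest-neighbour classifier, and all function names (`collect_data`, `standardize`, `add_radius`, `knn_predict`, `accuracy`) are hypothetical and do not come from any particular library.

```python
import random
from statistics import mean, pstdev

def collect_data(n=200, seed=0):
    """Data collection: synthetic points, label 1 if the point lies inside a circle."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append(((x, y), int(x * x + y * y < 0.5)))
    return rows

def standardize(rows):
    """Data preparation: scale every column to zero mean and unit variance.
    For brevity the statistics are computed on all rows, not only the training rows."""
    cols = list(zip(*(features for features, _ in rows)))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [(tuple((v - m) / s for v, (m, s) in zip(f, stats)), label)
            for f, label in rows]

def add_radius(rows):
    """Feature engineering: append the squared distance from the origin."""
    return [(f + (sum(v * v for v in f),), label) for f, label in rows]

def knn_predict(train, point, k):
    """Prediction: majority vote among the k nearest training rows."""
    nearest = sorted(
        train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], point))
    )[:k]
    return int(2 * sum(label for _, label in nearest) > k)

def accuracy(train, test, k):
    """Share of test rows that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, f, k) == label for f, label in test)
    return hits / len(test)

raw = collect_data()
# Data exploration: a quick look at the class balance.
positives = sum(label for _, label in raw)
print(f"{positives} of {len(raw)} rows are positive")

data = add_radius(standardize(raw))
train, valid, test = data[:120], data[120:160], data[160:]

# Hyperparameter search: pick the k with the best validation accuracy.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)

# Final evaluation on rows that were not used for the search.
test_accuracy = accuracy(train + valid, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Every function here stands for a decision that a person makes by hand in the classic process: which scaling to use, which feature to add, which model and which values of `k` to try.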

So far, all of these steps have been carried out separately. The goal of AutoML is to **automatically execute** all of these individual blocks. As a user, it is your job to provide the relevant data and, at the end of the process, to evaluate the predictions and make them usable in the business processes.

AutoML process according to Olson. Source: R. Olson et al. (2016), "Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science."

In the figure by [R. Olson et al.](https://arxiv.org/abs/1603.06212), the entire process from the raw data to the creation of the model is automated (within the box).

All other steps run automatically with AutoML - without any human intervention.
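
The idea of an automated search can be illustrated with a small sketch. It is not how any specific AutoML product works; it only assumes that "automation" means trying every combination of a few candidate transformations and models, scoring each with cross-validation, and returning the best one. The names `auto_ml`, `cv_error`, `TRANSFORMS` and `MODELS` are hypothetical, and the data is synthetic.

```python
import random
from itertools import product

def mean_model():
    """Baseline: always predict the average of the training targets."""
    def fit(X, y):
        avg = sum(y) / len(y)
        return lambda row: avg
    return fit

def knn_model(k):
    """k-nearest-neighbour regression: average the targets of the k closest rows."""
    def fit(X, y):
        def predict(row):
            nearest = sorted(
                zip(X, y),
                key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row)),
            )[:k]
            return sum(t for _, t in nearest) / len(nearest)
        return predict
    return fit

# Candidate transformations and models form the search space.
TRANSFORMS = {
    "raw": lambda row: tuple(row),
    "raw+squares": lambda row: tuple(row) + tuple(v * v for v in row),
}
MODELS = {"mean": mean_model(), **{f"knn(k={k})": knn_model(k) for k in (1, 3, 5, 9)}}

def cv_error(fit, X, y, folds=4):
    """Mean squared error, where each row is predicted by a model that never saw it."""
    total = 0.0
    for i in range(folds):
        test_idx = set(range(i, len(X), folds))
        X_train = [x for j, x in enumerate(X) if j not in test_idx]
        y_train = [t for j, t in enumerate(y) if j not in test_idx]
        predict = fit(X_train, y_train)
        total += sum((predict(X[j]) - y[j]) ** 2 for j in test_idx)
    return total / len(X)

def auto_ml(X, y):
    """Score every transform/model pair and return the best one, refit on all data."""
    results = {}
    for (t_name, transform), (m_name, fit) in product(TRANSFORMS.items(), MODELS.items()):
        results[(t_name, m_name)] = cv_error(fit, [transform(r) for r in X], y)
    best = min(results, key=results.get)
    transform, fit = TRANSFORMS[best[0]], MODELS[best[1]]
    predict = fit([transform(r) for r in X], y)
    return best, results[best], lambda row: predict(transform(row))

# The user only supplies prepared training data.
rng = random.Random(42)
X = [(rng.uniform(-2, 2),) for _ in range(80)]
y = [x[0] ** 2 + rng.gauss(0, 0.1) for x in X]

best, error, model = auto_ml(X, y)
print("chosen pipeline:", best)
print("cross-validated MSE:", round(error, 3))
print("prediction for x = 1.5:", round(model((1.5,)), 2))
```

Real AutoML systems search far larger spaces with smarter strategies than this exhaustive loop, but the division of labor is the same: the user provides data and judges the result, and the search in between runs without intervention.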

Feature engineering automation

The AutoML process even goes so far as to automate parts of the **feature engineering**. Until now, this complex task has been done by trained data engineering and data science experts.

Data is often in tabular form, but images, texts and videos in particular have to be preprocessed, for example to make artificial neural networks more robust.

In machine learning, the input variables are referred to as features. The better the features describe the relationship to the target variable, the easier it is to draw conclusions about future events.

Before the data can be evaluated, feature engineering is necessary: new variables are calculated that best explain the relationship to the target variable.

Due to the complexity of feature engineering, human labor has always been necessary up to now. AutoML can also automate this task.
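
A very reduced sketch of what automated feature engineering can look like: candidate features are generated mechanically from the original columns and then ranked by how strongly they correlate with the target. The approach (squares and pairwise products, ranked by absolute Pearson correlation) is an assumption chosen for illustration, and the column names, the synthetic `price` target and the helper functions are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0

def generate_features(columns):
    """Derive candidates: the original columns, their squares and pairwise products."""
    candidates = dict(columns)
    for name, values in columns.items():
        candidates[f"{name}^2"] = [v * v for v in values]
    for (n1, v1), (n2, v2) in combinations(columns.items(), 2):
        candidates[f"{n1}*{n2}"] = [a * b for a, b in zip(v1, v2)]
    return candidates

def select_features(candidates, target, top=2):
    """Keep the candidates with the strongest absolute correlation to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:top]

# Synthetic example: the target depends on the product of two columns.
rng = random.Random(1)
columns = {
    "width": [rng.uniform(1, 5) for _ in range(100)],
    "height": [rng.uniform(1, 5) for _ in range(100)],
    "noise": [rng.uniform(1, 5) for _ in range(100)],
}
price = [w * h + rng.gauss(0, 0.2) for w, h in zip(columns["width"], columns["height"])]

selected = select_features(generate_features(columns), price)
print(selected)
```

The sketch finds the product of `width` and `height` without being told that area matters. What it cannot do is decide which raw columns are worth collecting in the first place, which is where specialist knowledge remains necessary.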

Classic machine learning and AutoML in comparison

When comparing it with classic ML processes, it becomes clear that AutoML has one advantage: the significant reduction in human labor.

The classic ML process not only requires a lot of working time, but also workers with special knowledge. The costs for skilled workers with a high level of training are considerable and it is difficult to put together a suitable team.

However, the AutoML process alone does not add value for the company. Value is created only when the results of the models are used in the business processes to support decision-making. AutoML cannot make this transfer, so humans will continue to play a crucial role.

Specialist knowledge also plays an important role in modeling, and AutoML cannot provide it either.

Example of AutoML on the Google Cloud based on a Kaggle Data Science Challenge.

But what can we use AutoML for? In my view, a crucial point is that small companies can also use machine learning sensibly without employing an expensive team of specialists.

In the past, the cost of ML projects has been a major barrier to entry for smaller businesses. AutoML is supposed to change this, because significantly less specialist knowledge is required. This makes it much easier to apply machine learning.

Advantages of AutoML at a glance

  • Faster results through automation
  • Lower costs due to less labor
  • Less susceptibility to errors
  • Also usable by smaller businesses

AutoML Frameworks

Here is a list of the most important AutoML frameworks and platforms:

AutoML Platform History. (Source: KDnuggets)

Of course, there are many other providers and solutions, but those mentioned above are the important players in the AutoML field.

Are data scientists obsolete now?

According to KDnuggets, AutoML frameworks are capable of building models well, but they cannot take over the most important tasks of a data scientist. A data scientist is still needed to mediate between the technology and the business domain, to bring important specialist knowledge to the feature engineering, and to define the actual problem that the model is supposed to solve.