python-course.eu

1. Machine Learning with Python

By Bernd Klein. Last modified: 30 Nov 2021.


Machine learning is a subfield of Artificial Intelligence (AI). So what is Artificial Intelligence?

Andrew Moore, former Dean of the School of Computer Science at Carnegie Mellon University, defined it as follows: "Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence."

The question "What is artificial intelligence?" depends on the answer to a more general question: "What is intelligence?"

This question turns out to be extremely hard to answer.

To get closer to an answer, we can divide AI into two categories:

weak AI and strong AI

weak AI:

strong AI:

Now we know about Artificial Intelligence as well as weak and strong AI, but what about Machine Learning?

Let's start with a very "old" attempt at a definition by Arthur Samuel, an IBM pioneer:

"Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed."

A good attempt, but many questions remain unanswered. Almost 40 years later, in 1998, Tom Mitchell defined a "well-posed learning problem" as follows:

"Well posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."

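Note: A mathematical problem is called well-posed (also correctly posed or properly posed) if it meets the following conditions, due to Jacques Hadamard: a solution exists, the solution is unique, and the solution depends continuously on the input data.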
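Mitchell's definition can be made concrete with a toy example. The following is a minimal sketch, not taken from the tutorial: all function names and data are hypothetical. The task T is to label numbers in [0, 1) as 1 if they are at least 0.5 and as 0 otherwise, the experience E is a growing list of labeled examples, and the performance measure P is the accuracy on a fixed set of test points. The "learner" simply places its threshold halfway between the largest negative and the smallest positive example it has seen.

```python
# Illustrative sketch: task T, experience E and performance P are hypothetical.
import random


def true_label(x):
    """Hidden rule the learner has to discover (assumed for this sketch)."""
    return 1 if x >= 0.5 else 0


def train_threshold(examples):
    """Learn a threshold from (x, label) pairs: the midpoint between the
    largest negative and the smallest positive example seen so far."""
    negatives = [x for x, label in examples if label == 0]
    positives = [x for x, label in examples if label == 1]
    lower = max(negatives, default=0.0)
    upper = min(positives, default=1.0)
    return (lower + upper) / 2


def accuracy(threshold, test_points):
    """Performance measure P: fraction of test points labeled correctly."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_points)
    return correct / len(test_points)


rng = random.Random(0)  # fixed seed so that every run uses the same experience
experience = [(x, true_label(x)) for x in (rng.random() for _ in range(1000))]
test_points = [i / 1000 for i in range(1000)]

# Performance P on task T, measured after increasing amounts of experience E
for n in (2, 10, 100, 1000):
    threshold = train_threshold(experience[:n])
    print(f"E = {n:4d} examples -> threshold {threshold:.3f}, P = {accuracy(threshold, test_points):.3f}")
```

The accuracy is not guaranteed to rise with every single additional example, but the gap between the largest negative and the smallest positive example can only shrink as E grows, so the learned threshold is pulled toward the true value 0.5.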

Machine Learning:

Machine learning means that an algorithm (the machine) learns automatically, i.e. it extracts the necessary knowledge from given data by itself. The goal is to make predictions on new, unseen data. To put it another way: in traditional heuristic decision-making algorithms, the programmers set the rules according to which the decisions are made. With machine learning, the program derives these rules from the data itself, without a human having to specify them!

Who makes the rules?
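The difference can be illustrated with a deliberately tiny spam example. This is a sketch with made-up emails and hypothetical function names, not a realistic spam filter: in `is_spam_handwritten` a programmer fixes the rule by hand, while `learn_spam_words` lets the program derive its own rule (a set of "suspicious" words) from emails that were labeled as spam or not spam.

```python
# Illustrative sketch: function names, word lists and emails are hypothetical.


def is_spam_handwritten(text):
    """Traditional approach: the programmer decides which words indicate spam."""
    suspicious = {"lottery", "winner", "prize"}
    return any(word in suspicious for word in text.lower().split())


def learn_spam_words(labeled_emails):
    """Machine learning approach (very simplified): collect the words that
    occur in spam examples but never in non-spam examples."""
    spam_words, ham_words = set(), set()
    for text, is_spam in labeled_emails:
        words = set(text.lower().split())
        if is_spam:
            spam_words |= words
        else:
            ham_words |= words
    return spam_words - ham_words


def is_spam_learned(text, learned_words):
    """Apply the rule that was derived from the labeled data."""
    return any(word in learned_words for word in text.lower().split())


training_emails = [
    ("cheap pills online now", True),
    ("buy cheap watches now", True),
    ("meeting moved to monday", False),
    ("lunch now or later", False),
]

rules = learn_spam_words(training_emails)
print(sorted(rules))
print(is_spam_handwritten("You are a lottery winner"))
print(is_spam_learned("cheap flights", rules))
print(is_spam_learned("see you on monday", rules))
```

The learned rule is crude (it flags "cheap flights" as spam), which is why real classifiers typically weigh the evidence statistically and are trained on far more examples.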

Machine learning taxonomy

There are two different approaches to Machine Learning:

We will cover only "supervised learning" in this tutorial.


Examples of machine learning:

As already mentioned, a spam filter could be implemented using a classifier based on machine learning.

At the heart of machine learning is the concept of automating decision making from data without the user specifying explicit rules on how to make that decision. In the case of emails, the user does not provide a list of words or features that make an email spam. Instead, the user provides examples of spam and non-spam emails that are labeled as such. This is the so-called learning set (also called training set).

The goal of a machine learning model is to make predictions on new, previously unseen data. In a real application, we are not interested in classifying an email that has already been labeled as spam or not. Instead, we want to make life easier for users by automatically classifying new incoming emails.

The algorithm then learns from these labeled examples, i.e. it is trained on them:

Supervised learning training phase
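What the training phase produces can be seen in a minimal nearest-centroid classifier. It is used here only as one simple example of a supervised learner (the tutorial does not prescribe this algorithm), and the data points and names are hypothetical. Training consists of computing the mean feature vector of each class; these means are the learned model, which can then be applied to a new point.

```python
# Illustrative sketch: nearest-centroid classifier with hypothetical data.
from collections import defaultdict


def fit_centroids(samples, labels):
    """Training phase: compute the mean feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for vector, label in zip(samples, labels):
        grouped[label].append(vector)
    # zip(*vectors) yields the columns, i.e. one feature at a time
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()}


def predict(centroids, vector):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def distance(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Labeled learning set: feature vectors and their labels
X_train = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [4.3, 3.9], [3.8, 4.1]]
y_train = ["A", "A", "A", "B", "B", "B"]

model = fit_centroids(X_train, y_train)
print(model)
print(predict(model, [1.0, 1.0]), predict(model, [4.0, 4.0]))
```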

After the learning phase, we have to evaluate the classifier. We test it both on the labeled training data and on labeled test data that was not used for training:

Supervised learning evaluation phase
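Why both kinds of data are needed can be shown with a 1-nearest-neighbor classifier, chosen for this sketch because it simply memorizes its training examples; the points, labels and function names are hypothetical. One training point is deliberately mislabeled. On the data it has learned, the classifier is perfect, so only the accuracy on the held-out test data reveals how well it generalizes.

```python
# Illustrative sketch: hypothetical data and a 1-nearest-neighbor classifier.


def nearest_neighbor_predict(train_X, train_y, vector):
    """Return the label of the training sample closest to `vector`."""
    def distance(sample):
        return sum((a - b) ** 2 for a, b in zip(sample, vector))
    closest = min(range(len(train_X)), key=lambda i: distance(train_X[i]))
    return train_y[closest]


def accuracy(train_X, train_y, X, y):
    """Fraction of the samples in (X, y) that are classified correctly."""
    hits = sum(nearest_neighbor_predict(train_X, train_y, v) == label
               for v, label in zip(X, y))
    return hits / len(y)


# Learning (training) set; the last label is deliberately wrong (noise)
train_X = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]]
train_y = [0, 0, 1, 1, 1, 0]

# Labeled test set that is NOT used for training
test_X = [[1.2, 1.5], [4.6, 5.2], [3.9, 6.0], [2.0, 2.0]]
test_y = [0, 1, 1, 0]

print("training accuracy:", accuracy(train_X, train_y, train_X, train_y))
print("test accuracy:    ", accuracy(train_X, train_y, test_X, test_y))
```

The training accuracy is 1.0 because every training point is its own nearest neighbor, while the test point next to the mislabeled example is misclassified, giving a test accuracy of 0.75.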

If we are satisfied with the results, the classifier is ready to classify completely new documents:

Supervised learning prediction

The data is usually presented to the algorithm as a two-dimensional array (or matrix) of numbers. Each data point (also known as a sample or training instance) that we want to either learn from or make a decision on is represented as a list of numbers, a so-called feature vector, whose entries (the features) represent the properties of this point. In the array, each row is one sample and each column one feature.
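As a sketch of this representation, the following code turns a few made-up emails into feature vectors by counting how often each vocabulary word occurs. This simple bag-of-words encoding is an assumption chosen for illustration; the tutorial has not yet said which features to use. Each row of the resulting list of lists is one sample and each column one feature; in practice such a table is usually stored as a two-dimensional NumPy array.

```python
# Illustrative sketch: hypothetical emails encoded as bag-of-words feature vectors.
emails = [
    "win money now",
    "meeting at noon",
    "win a free prize now now",
]

# Vocabulary: every distinct word in a fixed (sorted) order, one column per word
vocabulary = sorted({word for text in emails for word in text.split()})

# Feature matrix: one row (feature vector) per email, one column per vocabulary word
X = [[text.split().count(word) for word in vocabulary] for text in emails]

print(vocabulary)
for row in X:
    print(row)
print("shape:", (len(X), len(vocabulary)))  # (number of samples, number of features)
```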