Why is data important? Nowadays, information is considered an increasingly useful and precious resource, particularly in the field of artificial intelligence.

Indeed, the quality and quantity of your information are the main factors to consider when approaching a problem with machine learning methods.

The simplest situation to manage is the one in which a large amount of annotated data is available.

Such data consists of a series of example cases for which you already know the desired answer, and which the machine can “study” to learn how to reach the right answer, even in new cases it has never seen before.

Example data could be a set of recordings with transcriptions for a speech recognizer, or photos with descriptions for an image recognizer.
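As a rough illustration of learning from annotated examples, here is a minimal sketch of one of the simplest possible approaches: a nearest-neighbor classifier that answers a new case by copying the label of the most similar known example. The animals, the measurements, and the names `labeled_examples` and `predict` are all made up for this illustration; real systems use far richer data and models.

```python
import math

# Hypothetical annotated data: each example pairs made-up measurements
# (height in cm, weight in kg) with the answer we already know.
labeled_examples = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((27.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
    ((65.0, 35.0), "dog"),
]


def predict(features, examples):
    """Return the label of the annotated example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# Two new cases that do not appear in the annotated data.
print(predict((24.0, 4.2), labeled_examples))    # cat
print(predict((58.0, 28.0), labeled_examples))   # dog
```

With only six examples the "study" step is trivial, but the principle is the same at scale: the more varied the annotated cases, the more likely a new case has a close match.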

The harder the problem, the more data you need.


This information can be collected by making direct field observations, or by applying human intelligence to the same problem.

If sufficiently varied and complete information is provided, the machine will find its own particular method to reach the solution, perhaps different from how a human would do it, but often with equally correct results and in far less time.

What can we do if we don’t have such examples of problems with solutions? Even if we only have unannotated data, there is still a lot to learn.

One thing the machine can do is find clusters of data which share similar characteristics.

For example, in a messy catalog of photos, those that show similar subjects could be grouped together, even if the machine is not able to assign a description to them by itself.

In addition, this kind of grouping can be a great way to assist people in data collection and annotation work, and the annotated data can then be fed to other algorithms for further analysis.
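The grouping of unannotated data described above can be sketched with k-means, one common clustering technique among many (the text does not prescribe a specific one). The example below is a simplified, illustrative version: the two-number "photo features", the function name `k_means`, and the naive choice of the first `k` points as starting centroids are all assumptions made to keep the sketch short and deterministic.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (assignments, centroids)."""
    # Naive, deterministic start: use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return assignments, centroids


# Hypothetical unannotated "photos", each reduced to two made-up features.
points = [
    (0.90, 0.80), (0.20, 0.10), (0.85, 0.75),
    (0.15, 0.20), (0.95, 0.70), (0.10, 0.15),
]

assignments, centroids = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The algorithm never learns what the two groups mean; it only reports that photos 0, 2, and 4 resemble each other, as do photos 1, 3, and 5. A person can then label each group once instead of labeling every photo.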

In conclusion, you never know what valuable secrets are hidden inside your data until you start digging.