Bài học 4 trong số 7

Data preparation

5 phút để hoàn thành

Data preparation

Assess your use case, source and prepare your data

What is training data?

If you have properly set up your Google Cloud account, you are now ready for the exercise. In this lesson, you will learn what questions you should ask while gathering the training data and how to prepare it to be used by AutoML Vision.

With training data, what we mean is examples of what we want our ML model to be able to recognise and categorise. In our case, this means providing a set of satellite images and telling the algorithm which ones are examples of amber mining and which are not.

Start with your use case

While putting together the dataset, always start from the problem you are asking ML to help you solve. Consider the following questions:

What is the outcome you’re trying to achieve?
What kinds of categories would you need to recognise to achieve this outcome?
Is it possible for humans to recognise those categories? Although AutoML Vision can handle many more images and categories than humans can, if a human cannot recognise a specific category, then AutoML Vision will have a hard time as well.
What kinds of examples would best reflect the type and range of data your system will classify?

Think about a story you are working on. How do the answers to those questions change your approach to the story and whether you need Machine Learning for it?

Assess your use case

In our case, these could be our answers:

We want our model to be able to recognise instances of amber mining in satellite images we will present to it.
We only need two categories: "YES: this image includes elements consistent with patterns that usually show amber mining activity" and "NO: this image doesn't include elements that suggest amber mining".
Mostly yes: instances of amber mining are quite recognisable in satellite images because of the distinctive pockmark-like pattern of holes in the ground. But we'll see in the testing phase that it might not always be as easy as we think.
Different background, different density of the holes, different colours. The more diverse the examples in our dataset, the better the algorithm will learn.

Source your data

Once you’ve established what data you need, the next step is to find a way to source it. In our case, we already have the dataset provided by Texty. But think of what might be your own use case: How and where can you find the images you need?

You might be able to source them from what your organisation collects or from third-parties. In both cases, make sure to review regulations about data protection in your region and the locations your application will serve.

No training data will ever be perfectly "unbiased", but you can improve your chances of building a "fair" ML model if you carefully consider potential sources of bias in your data and take steps to address them. Review our Introduction to Machine Learning to find out more about it.

Prepare your data

There are a few more things to keep in mind as you put together the training data:

Include enough labelled examples in each category: The minimum required by AutoML Vision is 100 examples per label. In general, the more labelled images you can bring to the training process, the better your model will be.

It’s important to include roughly similar amounts of training examples for each category. If you have an abundance of data for one label, use only part of it to avoid having a widely different amount of examples per category.

Find images that are visually similar to what you’re planning to ask the model to categorise. Ideally, your training examples are real-world data drawn from the same dataset you're planning to use the model to classify.

Xin chúc mừng! Bạn đã hoàn thành Data preparation Rồi, tôi đang thực hiện

Đề xuất cho bạn

open_in_new

Tìm hiểu thêm về trình chặn quảng cáo

Bài học

Lấy lại doanh thu từ quảng cáo đã mất do trình chặn quảng cáo

Bắt đầu

Xóa khỏi tài khoản

Lưu vào tài khoản của bạn

None
open_in_new

2-Step Verification: Stronger security for your Google account.

Bài học

Add an extra layer of protection beyond your password.

Bắt đầu

Xóa khỏi tài khoản

Lưu vào tài khoản của bạn

None
open_in_new

Data Studio: Make interactive data visualizations

Bài học

Give life to your datasets by creating powerful interactive visualizations with an easy-to-use studio.

Bắt đầu

Xóa khỏi tài khoản

Lưu vào tài khoản của bạn

None

Bạn đánh giá như thế nào về bài học này?

Ý kiến phản hồi của bạn sẽ giúp chúng tôi không ngừng cải thiện các bài học!

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

TITLE

Data preparation

What is training data?

Start with your use case

Assess your use case

Source your data

Prepare your data

Tìm hiểu thêm về trình chặn quảng cáo

2-Step Verification: Stronger security for your Google account.

Data Studio: Make interactive data visualizations

Tôi đang tìm kiếm tài nguyên trong

Data preparation

What is training data?

Start with your use case

Assess your use case

Source your data

Prepare your data

Tìm hiểu thêm về trình chặn quảng cáo

2-Step Verification: Stronger security for your Google account.

Data Studio: Make interactive data visualizations