An Introduction to Active Learning

by Louis Bouchard
@whatsai

I explain Artificial Intelligence terms and news to non-experts.

3 min read, 2023/06/18

Too Long; Didn't Read

Active learning aims to optimize the annotation of your dataset and train the best possible model using the least amount of training data. It is a supervised learning approach that iterates between your model's predictions and your data. Overall, by annotating fewer images, you save time and money while still obtaining an optimized model.

In today's world, we have access to enormous amounts of data thanks to powerful AI models such as ChatGPT, as well as vision models and other similar technologies. However, what these models rely on is not only the quantity of data but also its quality. Building a good dataset quickly and at scale can be a challenging and costly task.


This is where active learning comes in.

Simply put, active learning aims to optimize the annotation of your dataset and train the best possible model using the least amount of training data.


It is a supervised learning approach that involves an iterative process between your model's predictions and your data. Instead of waiting for a complete dataset, you can start with a small batch of curated, annotated data and train your model on it.


Then, using active learning, you can run your model on unseen data, evaluate how accurate its predictions are, and use acquisition functions to select the next batch of data to send for annotation.
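The loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method used in the video: the one-dimensional data, the `oracle` function standing in for a human annotator, the toy threshold "model", and the distance-based `acquisition` function are all hypothetical.

```python
import random

def oracle(x):
    """Stand-in for a human annotator (assumption: the true boundary is 0.5)."""
    return int(x > 0.5)

def fit_threshold(labeled):
    """Toy 'model': midpoint between the largest 0-labeled and smallest 1-labeled point."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2

def acquisition(x, threshold):
    """Higher score = less confident prediction (closer to the decision boundary)."""
    return -abs(x - threshold)

def active_learning(pool, seed_set, rounds=5, batch_size=2):
    # Start from a small curated, annotated batch.
    labeled = [(x, oracle(x)) for x in seed_set]
    unlabeled = [x for x in pool if x not in seed_set]
    threshold = fit_threshold(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Rank the unlabeled pool with the acquisition function, take the top batch.
        unlabeled.sort(key=lambda x: acquisition(x, threshold), reverse=True)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # Ask the annotator for labels, then retrain on the enlarged labeled set.
        labeled += [(x, oracle(x)) for x in batch]
        threshold = fit_threshold(labeled)
    return threshold, labeled

random.seed(0)
pool = [random.random() for _ in range(200)]
seed_set = [min(pool), max(pool)]  # one example of each class to start from
threshold, labeled = active_learning(pool, seed_set)
print(f"labels used: {len(labeled)} of {len(pool)}, learned threshold: {threshold:.3f}")
```

Only 12 of the 200 points are ever labeled here; the rest of the pool is never sent to the annotator. A real project would replace the threshold model with an actual classifier and the oracle with a labeling tool.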


One advantage of active learning is that you can analyze the confidence level of your model's predictions.


If a prediction has low confidence, the model will request that additional images of that type be labeled. High-confidence predictions, on the other hand, require no additional data. Overall, by annotating fewer images, you save time and money while still obtaining an optimized model. Active learning is a highly promising approach for working with large-scale datasets.


A representation of active learning. Image from Kumar et al.



There are a few important points to remember about active learning.

First, it involves human annotation, which gives you control over the quality of your model's predictions. It is not a black box trained on millions of images. You actively take part in its development and help improve its performance. This makes active learning important and interesting, even though it increases costs compared with unsupervised approaches. However, the time saved in training and deploying the model often outweighs these costs.


In addition, you can use automatic annotation tools and correct their output manually, reducing costs even further.


In active learning, you have a labeled dataset on which your model is trained, while the unlabeled set contains potential data that has not yet been annotated. A key concept is query strategies, which determine which data to label. There are various approaches to finding the most informative subsets in a large pool of unlabeled data. For example, uncertainty sampling involves running your model on unlabeled data and selecting the least confidently classified examples for annotation.
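As a rough sketch of uncertainty sampling, the snippet below ranks unlabeled samples by how unsure the model is about them. The article only mentions picking the least confidently classified examples; the margin and entropy scores are common alternatives added here for comparison. The image ids and class probabilities are made up for illustration, and in practice they would come from your own model's predicted probabilities.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the predicted class."""
    return 1.0 - max(probs)

def margin_score(probs):
    """Small gap between the two most likely classes means high uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy(probs):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k, score=least_confidence):
    """Return the ids of the k samples whose predictions score as most uncertain."""
    ranked = sorted(predictions, key=lambda sid: score(predictions[sid]), reverse=True)
    return ranked[:k]

# Hypothetical model outputs on four unlabeled images (three classes each).
predictions = {
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.55, 0.44, 0.01],
    "img_004": [0.80, 0.15, 0.05],
}

to_annotate = select_most_uncertain(predictions, k=2)
print(to_annotate)
```

With these numbers all three scores agree that `img_002` and `img_003` should be annotated first, but on real data the scores can rank samples differently, so the choice of score is itself a design decision.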


A representation of active learning with the Query by Committee approach. Image from Kumar et al.



Another technique in active learning is Query by Committee (QBC), in which multiple models, each trained on a different subset of the labeled data, form a committee. Just as people with different experiences understand certain concepts differently, these models have different perspectives on the classification problem. The data to annotate is selected based on the disagreement among the committee models, which signals that a sample is hard to classify. This iterative process continues as the selected data is annotated and added to the training set.
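A minimal sketch of Query by Committee follows, assuming a toy setup that is not from the article: one-dimensional samples, committee members that are simple threshold models, and vote entropy as the disagreement measure (one common choice among several). The names `train_committee`, `vote_entropy`, and `query_by_committee` are hypothetical helpers.

```python
import math
import random
from collections import Counter

def train_committee(labeled, n_members=5, rng=None):
    """Train each member on a different random half of each class."""
    rng = rng or random.Random(0)
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    committee = []
    for _ in range(n_members):
        z = rng.sample(zeros, max(1, len(zeros) // 2))
        o = rng.sample(ones, max(1, len(ones) // 2))
        # Toy model: a decision threshold halfway between the two classes.
        committee.append((max(z) + min(o)) / 2)
    return committee

def vote_entropy(votes):
    """Zero when all members agree; grows as the votes split more evenly."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def query_by_committee(committee, unlabeled, k):
    """Return the k unlabeled samples the committee disagrees on most."""
    def disagreement(x):
        return vote_entropy([int(x > t) for t in committee])
    return sorted(unlabeled, key=disagreement, reverse=True)[:k]

labeled = [(0.05, 0), (0.15, 0), (0.25, 0), (0.35, 0),
           (0.65, 1), (0.75, 1), (0.85, 1), (0.95, 1)]
unlabeled = [0.1, 0.3, 0.45, 0.5, 0.55, 0.7, 0.9]

committee = train_committee(labeled)
selected = query_by_committee(committee, unlabeled, k=2)
print("thresholds:", committee)
print("send for annotation:", selected)
```

Samples far from every member's threshold receive unanimous votes and a disagreement of zero, so only samples in the contested region between the classes can score above zero. If the committee happens to agree on everything, the ranking falls back to the original order of the pool.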


This is only a basic explanation of active learning, showing one example of a query strategy.

If you are interested, I can provide more information or videos on other machine learning strategies. A real-life example of active learning is answering captchas on Google. By doing so, you help them identify complex images and build datasets from the combined input of many users, which provides both dataset quality and human verification. So the next time you come across a captcha, remember that you are contributing to the progress of AI models!


To learn more and see a practical example using a great tool developed by my friends at Encord, check out the video:


[00:00:00] : [00:00:03]

[Music]

[00:00:03] : [00:00:10]

amounts of data thanks to the

[00:00:10] : [00:00:13]

superpowers of large models including

[00:00:13] : [00:00:16]

the famous ChatGPT but also vision

[00:00:16] : [00:00:18]

models and all other types you may be

[00:00:18] : [00:00:21]

working with indeed the secrets behind

[00:00:21] : [00:00:23]

those models is not only the large

[00:00:23] : [00:00:25]

amount of data they are being trained on

[00:00:25] : [00:00:28]

but also the quality of that data but

[00:00:28] : [00:00:31]

what does this mean it means we need

[00:00:31] : [00:00:34]

lots of very good, balanced and varied

[00:00:34] : [00:00:37]

data and as data scientists we all know

[00:00:37] : [00:00:40]

how complicated and painful it can be to

[00:00:40] : [00:00:43]

build such a good data set fast and at

[00:00:43] : [00:00:45]

large scale and maybe with a limited

[00:00:45] : [00:00:47]

budget what if we could have help

[00:00:47] : [00:00:50]

build that or even have automated help

[00:00:50] : [00:00:53]

well that is where Active Learning comes

[00:00:53] : [00:00:56]

in in one sentence the goal of active

[00:00:56] : [00:00:58]

learning is to use the least amount of

[00:00:58] : [00:01:00]

training data to optimize The annotation

[00:01:00] : [00:01:03]

of your whole data set and train the

[00:01:03] : [00:01:05]

best possible model it's a supervised

[00:01:05] : [00:01:06]

learning approach that will go back and

[00:01:06] : [00:01:09]

forth between your model's predictions

[00:01:09] : [00:01:11]

and your data what I mean here is that

[00:01:11] : [00:01:13]

you may start with a small batch of

[00:01:13] : [00:01:16]

curated annotated data and train your

[00:01:16] : [00:01:18]

model with it you don't have to wait for

[00:01:18] : [00:01:21]

your whole millions-of-images data

[00:01:21] : [00:01:23]

set to be ready just push it out there

[00:01:23] : [00:01:25]

then using Active Learning you can use

[00:01:25] : [00:01:28]

your model on your unseen data and get

[00:01:28] : [00:01:31]

human annotators to label it but that is

[00:01:31] : [00:01:34]

not all we can also evaluate how

[00:01:34] : [00:01:36]

accurate the predictions are and using a

[00:01:36] : [00:01:38]

variety of acquisition functions which

[00:01:38] : [00:01:41]

are functions used to select the next

[00:01:41] : [00:01:43]

unseen data to annotate we can quantify

[00:01:43] : [00:01:46]

the impact of labeling a larger data set

[00:01:46] : [00:01:49]

volume or improving the accuracy of the

[00:01:49] : [00:01:52]

labels generated to improve the model's

[00:01:52] : [00:01:54]

performance thanks to how you train the

[00:01:54] : [00:01:56]

models you can analyze the confidence

[00:01:56] : [00:01:58]

they have in their predictions

[00:01:58] : [00:02:00]

predictions with low confidence will

[00:02:00] : [00:02:02]

automatically request additional images

[00:02:02] : [00:02:05]

of this type to be labeled and

[00:02:05] : [00:02:07]

predictions with high confidence won't

[00:02:07] : [00:02:09]

need additional data so you will

[00:02:09] : [00:02:11]

basically save a lot of time and money

[00:02:11] : [00:02:14]

by having to annotate fewer images in

[00:02:14] : [00:02:16]

the end and have the most optimized

[00:02:16] : [00:02:20]

model possible how cool is that Active

[00:02:20] : [00:02:22]

Learning is one of the most promising

[00:02:22] : [00:02:24]

approaches to working with large-scale

[00:02:24] : [00:02:26]

data sets and there are a few important

[00:02:26] : [00:02:28]

key Notions to remember with active

[00:02:28] : [00:02:30]

learning the most important is that it

[00:02:30] : [00:02:33]

uses humans which you can clearly see

[00:02:33] : [00:02:34]

here in the middle of this great

[00:02:34] : [00:02:37]

presentation of active learning it will

[00:02:37] : [00:02:40]

still require humans to annotate data

[00:02:40] : [00:02:42]

which has the plus side to give you full

[00:02:42] : [00:02:44]

control over the quality of your model's

[00:02:44] : [00:02:47]

prediction it's not a complete Black Box

[00:02:47] : [00:02:49]

that was trained with millions of images

[00:02:49] : [00:02:51]

anymore you iteratively follow its

[00:02:51] : [00:02:54]

development and help it get better when

[00:02:54] : [00:02:56]

it fails of course it does have the

[00:02:56] : [00:02:58]

downside of increasing costs versus

[00:02:58] : [00:03:01]

unsupervised approaches where you don't

[00:03:01] : [00:03:03]

need anyone but it allows you to limit

[00:03:03] : [00:03:06]

those costs by only training where the

[00:03:06] : [00:03:08]

models need it instead of feeding it as

[00:03:08] : [00:03:11]

much data as possible and hoping for the

[00:03:11] : [00:03:13]

best moreover the reduction in time

[00:03:13] : [00:03:16]

taken to train the model and put it into

[00:03:16] : [00:03:18]

production often outweighs these costs

[00:03:18] : [00:03:20]

and you can use some automatic

[00:03:20] : [00:03:22]

annotation tools and manually correct it

[00:03:22] : [00:03:25]

after again reducing the costs then

[00:03:25] : [00:03:27]

obviously you will have your labeled

[00:03:27] : [00:03:29]

data set the labeled set of data is what

[00:03:29] : [00:03:31]

your current model is being trained on

[00:03:31] : [00:03:34]

and the unlabeled set is the data you

[00:03:34] : [00:03:36]

could potentially use but hasn't

[00:03:36] : [00:03:39]

been annotated yet another key notion is

[00:03:39] : [00:03:40]

actually the answer to the most

[00:03:40] : [00:03:43]

important question you may already have

[00:03:43] : [00:03:46]

in mind how do you find the bad data to

[00:03:46] : [00:03:49]

annotate and add to the training set

[00:03:49] : [00:03:51]

the solution here is called query

[00:03:51] : [00:03:54]

strategies and they are essential to any

[00:03:54] : [00:03:57]

Active Learning algorithm deciding which

[00:03:57] : [00:04:00]

data to label and which not to there are

[00:04:00] : [00:04:02]

multiple possible approaches to finding

[00:04:02] : [00:04:05]

the most informative subsets in our

[00:04:05] : [00:04:07]

large pool of unlabeled data that will

[00:04:07] : [00:04:10]

most help our model by being annotated

[00:04:10] : [00:04:13]

like uncertainty sampling where you test

[00:04:13] : [00:04:15]

your current model on your unlabeled

[00:04:15] : [00:04:17]

data and draw the least confident

[00:04:17] : [00:04:20]

classified examples to annotate another

[00:04:20] : [00:04:22]

technique shown here is the query by

[00:04:22] : [00:04:25]

committee or QBC approach here we have

[00:04:25] : [00:04:27]

multiple models our committee models

[00:04:27] : [00:04:29]

they will all be trained on a different

[00:04:29] : [00:04:32]

subset of our labeled data and thus have a

[00:04:32] : [00:04:34]

different understanding of our problem

[00:04:34] : [00:04:37]

these models will each have a hypothesis

[00:04:37] : [00:04:39]

on the classification of our unlabeled

[00:04:39] : [00:04:43]

data that should be somewhat similar but

[00:04:43] : [00:04:45]

still different because they basically

[00:04:45] : [00:04:47]

see the world differently just like us

[00:04:47] : [00:04:50]

who have different life experiences and

[00:04:50] : [00:04:52]

have seen different animals in our lives

[00:04:52] : [00:04:54]

but still have the same concepts of a

[00:04:54] : [00:04:57]

cat and a dog then it's easy the data to

[00:04:57] : [00:04:59]

be annotated is simply the ones our

[00:04:59] : [00:05:02]

models most disagree on which means it

[00:05:02] : [00:05:05]

is complicated to understand and we

[00:05:05] : [00:05:07]

start over by feeding the selected data

[00:05:07] : [00:05:10]

to our experts for annotation this is of

[00:05:10] : [00:05:12]

course a basic explanation of active

[00:05:12] : [00:05:15]

learning with only one example of a

[00:05:15] : [00:05:17]

query strategy let me know if you'd like

[00:05:17] : [00:05:19]

more videos on other machine learning

[00:05:19] : [00:05:21]

strategies like this here A clear

[00:05:21] : [00:05:23]

example of the active learning process

[00:05:23] : [00:05:26]

is when you answer captchas on Google it

[00:05:26] : [00:05:29]

helps them identify complex images and

[00:05:29] : [00:05:32]

build data sets using you and many other

[00:05:32] : [00:05:35]

people as a committee jury for

[00:05:35] : [00:05:36]

annotation

[00:05:36] : [00:05:39]

building cheap and great data sets while

[00:05:39] : [00:05:41]

ensuring you are a human, serving two

[00:05:41] : [00:05:44]

purposes so next time you are annoyed by

[00:05:44] : [00:05:46]

a captcha just think that you are

[00:05:46] : [00:05:49]

helping AI models progress but we have

[00:05:49] : [00:05:51]

enough theory for now I thought it would

[00:05:51] : [00:05:52]

be great to partner with some friends

[00:05:52] : [00:05:55]

from Encord, a great company I have known

[00:05:55] : [00:05:58]

for a while now to Showcase a real

[00:05:58] : [00:06:00]

example of active learning since we are

[00:06:00] : [00:06:02]

on this theme it's for sure the best

[00:06:02] : [00:06:04]

platform I have seen yet for active

[00:06:04] : [00:06:07]

learning and the team is amazing before

[00:06:07] : [00:06:09]

diving into a short practical example I

[00:06:09] : [00:06:11]

just wanted to mention that I will be at

[00:06:11] : [00:06:14]

CVPR in person this year and so will

[00:06:14] : [00:06:16]

Encord. If you are attending in person too,

[00:06:16] : [00:06:19]

let me know and go check out their Booth

[00:06:19] : [00:06:22]

it's Booth 1310. here's a quick demo we

[00:06:22] : [00:06:23]

put together for exploring one of

[00:06:23] : [00:06:26]

Encord's products that perfectly fits

[00:06:26] : [00:06:29]

this episode: Encord Active. It is

[00:06:29] : [00:06:30]

basically an active learning platform

[00:06:30] : [00:06:32]

where you can perform everything we

[00:06:32] : [00:06:35]

talked about in this video without any

[00:06:35] : [00:06:37]

coding with a great visual interface

[00:06:37] : [00:06:39]

here's what you would see in a classic

[00:06:39] : [00:06:41]

visual task like segmentation once you

[00:06:41] : [00:06:44]

open up your project you directly have

[00:06:44] : [00:06:46]

relevant information and statistics

[00:06:46] : [00:06:48]

about your data you'll see all the

[00:06:48] : [00:06:50]

outlier characteristics of your data

[00:06:50] : [00:06:52]

which will help you figure out what

[00:06:52] : [00:06:55]

causes the issues in your task for

[00:06:55] : [00:06:57]

example here we see that blur is one of

[00:06:57] : [00:06:59]

those outliers that has been

[00:06:59] : [00:07:01]

automatically identified if we check out

[00:07:01] : [00:07:03]

the worst images for that category we

[00:07:03] : [00:07:05]

can easily find some problematic images

[00:07:05] : [00:07:08]

and tag them for review like here where

[00:07:08] : [00:07:10]

the image is super saturated you can

[00:07:10] : [00:07:13]

also visualize groups of data thanks to

[00:07:13] : [00:07:14]

their embeddings just like CLIP

[00:07:14] : [00:07:16]

embeddings that you might have heard a

[00:07:16] : [00:07:18]

lot these days and those embeddings can

[00:07:18] : [00:07:21]

easily be compared together and grouped

[00:07:21] : [00:07:23]

when similar helping you find

[00:07:23] : [00:07:25]

problematic groups all at once instead

[00:07:25] : [00:07:27]

of going through your data one by one

[00:07:27] : [00:07:29]

then once you are satisfied with your

[00:07:29] : [00:07:32]

identified images to review you can

[00:07:32] : [00:07:34]

simply export them to the Encord

[00:07:34] : [00:07:35]

platform where you can do your

[00:07:35] : [00:07:38]

annotation directly when you have your

[00:07:38] : [00:07:40]

annotations and you get back on the

[00:07:40] : [00:07:42]

Encord Active platform you can now

[00:07:42] : [00:07:44]

visualize what it looks like with labels

[00:07:44] : [00:07:47]

you can see how the embedding plots have

[00:07:47] : [00:07:49]

changed now with the different classes

[00:07:49] : [00:07:51]

attached here again you can look at

[00:07:51] : [00:07:53]

different subgroups of data to find

[00:07:53] : [00:07:56]

problematic ones for example you can

[00:07:56] : [00:07:58]

look at images containing school buses

[00:07:58] : [00:08:00]

this can be done using natural language

[00:08:00] : [00:08:03]

to look for any information in images

[00:08:03] : [00:08:06]

metadata or classes something quite

[00:08:06] : [00:08:07]

necessary these days if you want to say

[00:08:07] : [00:08:09]

that you are working in AI when you

[00:08:09] : [00:08:11]

cannot find any more problems easily

[00:08:11] : [00:08:14]

with your data you train your model and

[00:08:14] : [00:08:16]

come back to the platform to analyze its

[00:08:16] : [00:08:19]

performance once again you have access

[00:08:19] : [00:08:22]

to a ton of valuable information about

[00:08:22] : [00:08:25]

how well your model is performing for

[00:08:25] : [00:08:27]

example if we take a look at the object

[00:08:27] : [00:08:30]

area where we see that small images seem

[00:08:30] : [00:08:32]

problematic we can easily filter them

[00:08:32] : [00:08:35]

out and create a new sub data set using

[00:08:35] : [00:08:38]

only our problematic small object images

[00:08:38] : [00:08:41]

the project is created in your Encord

[00:08:41] : [00:08:43]

Active dashboard with all the same

[00:08:43] : [00:08:45]

statistics you had but for only this set

[00:08:45] : [00:08:48]

of data if you want to have a closer

[00:08:48] : [00:08:50]

look or run experiments with this more

[00:08:50] : [00:08:53]

complicated part of the data like using

[00:08:53] : [00:08:55]

it for training one of your committee

[00:08:55] : [00:08:58]

models and you repeat this Loop over and

[00:08:58] : [00:09:00]

over, annotating problematic data

[00:09:00] : [00:09:03]

and improving your model as efficiently

[00:09:03] : [00:09:06]

as possible it will both reduce the need

[00:09:06] : [00:09:08]

for paying expert annotators, especially

[00:09:08] : [00:09:11]

if you work with medical applications as

[00:09:11] : [00:09:13]

I do or other applications where experts

[00:09:13] : [00:09:15]

are quite expensive and maximize the

[00:09:15] : [00:09:18]

results of your model I hope you can now

[00:09:18] : [00:09:20]

see how valuable Active Learning can be

[00:09:20] : [00:09:23]

and maybe even try it out with your own

[00:09:23] : [00:09:25]

application and it can all be done with

[00:09:25] : [00:09:27]

a single product if you want to let me

[00:09:27] : [00:09:29]

know if you do so

[00:09:29] : [00:09:31]

but before ending this video I just

[00:09:31] : [00:09:33]

wanted to thank Encord for sponsoring

[00:09:33] : [00:09:35]

this week's episode with a great example

[00:09:35] : [00:09:37]

of active learning and an amazing

[00:09:37] : [00:09:39]

product I also wanted to point out that

[00:09:39] : [00:09:42]

they had a webinar on June 14th on how

[00:09:42] : [00:09:44]

to build a semantic search for visual

[00:09:44] : [00:09:47]

data using ChatGPT and CLIP that is

[00:09:47] : [00:09:50]

hosted on Encord Active, with a recording

[00:09:50] : [00:09:52]

available if you want to check it out

[00:09:52] : [00:09:54]

it's definitely worthwhile and super

[00:09:54] : [00:09:57]

interesting I hope you enjoyed this

[00:09:57] : [00:09:59]

episode format as much as I enjoyed

[00:09:59] : [00:10:03]

making it thank you for watching


