Multi-task learning, transfer and domain adaptation

Transcription

Multi-task learning, transfer and domain adaptation
Introduction

Antoine Cornuéjols
AgroParisTech - INRA MIA 518
antoine.cornuejols@agroparistech.fr

A taxonomy
[Figure from Pan & Yang, "A Survey on Transfer Learning", IEEE TKDE, 2010, p. 1349]

Multi-task learning, transfer and domain adaptation (A. Cornuéjols)
Transfer Learning

Domain adaptation

•  Definition [Pan, TL-IJCAI'13 tutorial]
  –  Ability of a system to recognize and apply knowledge and skills learned in
     previous domains/tasks to novel domains/tasks
•  An example
  –  We have labeled images from a Web image corpus
  –  Novel task: is there a Person in unlabeled images from a Video corpus?

[Fig. 2. An overview of different settings of transfer. From "A Survey on
Transfer Learning", Pan & Yang, TKDE, 2010]
[Figure: Web images labeled "Person" / "no Person"; unlabeled video frames:
"Is there a Person?"]

(LaHC)    Domain Adaptation - EPAT'14
Examples: transfer learning in vision

•  Hard to predict what will change in the new domain
   [Xu, Saenko, Tsang, "Domain Transfer" tutorial - CVPR'12]

Natural Language Processing

•  Texts are represented by "words" (e.g. Bag of Words)
•  Tasks
  –  Part of Speech Tagging: Adapt a tagger learned from medical papers to a
     journal (Wall Street Journal) - Newsgroup
  –  Spam detection: Adapt a classifier from one mailbox to another
  –  Sentiment analysis
Domain adaptation for sentiment analysis
[Pan, TL-IJCAI'13 tutorial]
Electronics:
(1) Compact; easy to operate; very good picture quality; looks sharp!
(3) I purchased this unit from Circuit City and I was very excited about the
quality of the picture. It is really nice and sharp.
(5) It is also quite blurry in very dark settings. I will never buy HP again.

Video games:
(2) A very good game! It is action packed and full of excitement. I am very
much hooked on this game.
(4) Very realistic shooting action and good plots. We played this and were
hooked.
(6) It is so boring. I am extremely unhappy and will probably never buy
UbiSoft again.
Source specific: compact, sharp, blurry.
Target specific: hooked, realistic, boring.
Domain independent: good, excited, nice, never buy, unhappy.
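The pivot idea above lends itself to a tiny illustration. The sketch below is my own toy construction (not from the tutorial): it counts how often a domain-specific word co-occurs with the domain-independent "pivot" words, so that source-specific and target-specific sentiment words with similar pivot profiles can be aligned. The word lists, the reviews and the `never_buy` tokenization are invented for the example.

```python
# Toy sketch: align domain-specific sentiment words through their
# co-occurrence with domain-independent pivot words.
from collections import Counter

pivots = ["good", "excited", "nice", "never_buy", "unhappy"]

reviews = [
    "compact and sharp picture , good quality".split(),
    "blurry in dark settings , unhappy , never_buy".split(),
    "realistic shooting , good plots , got hooked".split(),
    "so boring , unhappy , never_buy ubisoft".split(),
]

def pivot_profile(word):
    """Count how often `word` co-occurs with each pivot in the same review."""
    profile = Counter()
    for review in reviews:
        if word in review:
            for p in pivots:
                if p in review:
                    profile[p] += 1
    return [profile[p] for p in pivots]

# "sharp" (source-specific) co-occurs with the positive pivot "good",
# while "boring" (target-specific) co-occurs with the negative pivots,
# so words of the same polarity end up with similar profiles.
sharp_profile = pivot_profile("sharp")
boring_profile = pivot_profile("boring")
```

In structural correspondence learning the same intuition is implemented by learning predictors for the pivots and projecting all features onto their shared low-dimensional subspace.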
Co-training

•  Semi-supervised learning
  –  A small part of the training data is labeled
  –  A large part is unlabeled
•  "Disagreement-based" methods

Co-Training
  –  Two classifiers are trained separately
  –  On different subsets of the descriptive features (two views)
•  Applications
  –  Natural language processing
  –  Image retrieval
Co-training and self-consistency
[Blum-Mitchell'98]

•  Agreement between two parts: co-training [Blum-Mitchell'98]
  –  Examples contain two sufficient sets of features, x = (x1, x2)
  –  Belief: the parts are consistent, i.e. there exist c1, c2 s.t.
     c1(x1) = c2(x2) = c*(x)
•  For example, if we want to classify web pages as faculty member home page
   or not: x = (x1, x2)
  –  x1 = text info, x2 = link info
  –  E.g. "my advisor" pointing to a page is a good indicator that this page
     is a faculty home page.
  –  E.g. "I am teaching" on a page is a good indicator it is a faculty home
     page.
•  Idea: use small labeled sample to learn initial rules
Iterative Co-training
[Using slides of Maria-Florina Balcan: SSL-co-training-Balcan-19_ssl_03-30-2015.pdf]

•  Idea: then use unlabeled data to propagate learned information
Iterative Co-training

First, use small labeled sample to learn initial rules.
  –  E.g., "my advisor" pointing to a page is a good indicator it is a faculty
     home page.
  –  E.g., "I am teaching" on a page is a good indicator it is a faculty home
     page.

Then use unlabeled data to propagate learned information.
  –  Look for unlabeled examples where one rule is confident and the other is
     not. Have it label the example for the other.
  –  Training 2 classifiers, one on each type of info. Using each to help
     train the other.

The co-training loop:
•  Have learning algos h1, h2 on each of the two views.
•  Use labeled data to learn two initial hypotheses h1, h2.
•  Repeat:
  –  Look through unlabeled data to find examples where one of the hi is
     confident but the other is not.
  –  Have the confident hi label it for the other algorithm.
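The loop above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions of mine: scikit-learn is available, the two views are given as separate feature matrices, and confidence is a fixed probability threshold; it is not Blum and Mitchell's exact procedure (which grows the pools by a fixed number of positives and negatives per round).

```python
# Minimal co-training sketch: each view's confident predictions extend the
# other view's training pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(X1, X2, y, labeled_idx, unlabeled_idx, rounds=10, conf=0.9):
    """y holds true labels only at labeled_idx; unlabeled entries are
    overwritten with pseudo-labels as they are adopted."""
    L1, L2 = list(labeled_idx), list(labeled_idx)
    U = set(unlabeled_idx)
    y = y.copy()
    h1, h2 = LogisticRegression(), LogisticRegression()
    for _ in range(rounds):
        h1.fit(X1[L1], y[L1])
        h2.fit(X2[L2], y[L2])
        moved = set()
        for i in sorted(U):
            p1 = h1.predict_proba(X1[i:i + 1])[0]
            p2 = h2.predict_proba(X2[i:i + 1])[0]
            if p1.max() >= conf and p2.max() < conf:
                y[i] = h1.classes_[p1.argmax()]   # h1 teaches h2
                L2.append(i); moved.add(i)
            elif p2.max() >= conf and p1.max() < conf:
                y[i] = h2.classes_[p2.argmax()]   # h2 teaches h1
                L1.append(i); moved.add(i)
        U -= moved
        if not moved:
            break
    return h1, h2

# toy data: two redundant, individually sufficient views
rng = np.random.default_rng(0)
n = 80
y_true = np.array([0, 1] * (n // 2))
X1 = (2 * y_true - 1)[:, None] + 0.4 * rng.normal(size=(n, 2))
X2 = (2 * y_true - 1)[:, None] + 0.4 * rng.normal(size=(n, 2))
h1, h2 = co_training(X1, X2, y_true, list(range(6)), list(range(6, n)))
```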
Iterative Co-training

Original application: Webpage classification

A simple example: learning intervals
[Figure: labeled and unlabeled examples on the real line; use labeled data to
learn initial hypotheses, use unlabeled data to bootstrap]
Expansion: learning intervals

•  Consistency: zero probability mass in the regions between h1 and h2
•  Expanding distribution vs. non-expanding (non-helpful) distribution

Co-training: Theoretical guarantees

What properties do we need for co-training to work well?
•  We need assumptions about:
  1.  The underlying data distribution
  2.  The learning algos on the two sides

[Blum & Mitchell, COLT'98]
  1.  Independence given the label
  2.  Algo. for learning from random noise

[Balcan, Blum & Yang, NIPS 2004]
  1.  Distributional expansion
  2.  Algo. for learning from positive data only
Co-training: Theoretical guarantees (2)

•  [Balcan et al. (2005)]
  –  If the classifiers are never confident but wrong, the expansion
     assumption can guarantee the success of co-training
•  [Wang & Zhou (2007)]
  –  If the two classifiers have large diversity, co-training style
     algorithms can succeed
•  All these analyses assume that each view is sufficient to learn well

Another approach: co-regularization

•  No iterative procedure (no pseudo-labels are assigned to unlabeled data)
•  Directly minimizes
  –  the error rate on labeled data
  –  and the disagreement over unlabeled data
•  Compatibility of a pair of hypotheses f1, f2
Co-training / Multi-view SSL: Direct optimization of agreement

Optimization of Agreement

Input: L = {(x_1, y_1), ..., (x_{m_l}, y_{m_l})},  U = {x_1, ..., x_{m_u}}

  argmin_{h_1, h_2}  sum_{l=1}^{2} sum_{i=1}^{m_l} loss(h_l(x_i), y_i)
                   + sum_{i=1}^{m_u} agreement(h_1(x_i), h_2(x_i))

•  Loss function: each of them has small labeled error
  –  E.g., square loss: loss(h(x_i), y_i) = (y_i - h(x_i))^2
  –  E.g., 0/1 loss: loss(h(x_i), y_i) = 1_{y_i != h(x_i)}
•  Regularizer to encourage agreement over unlabeled data

[P. Bartlett, D. Rosenberg, AISTATS 2007;  K. Sridharan, S. Kakade, COLT 2008]

Co-training with insufficient views
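The objective above can be made concrete with a small numerical sketch. This is my own toy instantiation, not an algorithm from the slides: two linear hypotheses, square loss on the labeled views, squared disagreement on unlabeled points, optimized by plain gradient descent; the function name and step sizes are invented.

```python
# Toy direct optimization of labeled loss + disagreement regularizer.
import numpy as np

def fit_with_agreement(X1, X2, y, U1, U2, lam=1.0, lr=0.01, steps=2000):
    """Gradient descent on
    mean||X1 w1 - y||^2 + mean||X2 w2 - y||^2 + lam * mean||U1 w1 - U2 w2||^2."""
    w1 = np.zeros(X1.shape[1])
    w2 = np.zeros(X2.shape[1])
    for _ in range(steps):
        r1 = X1 @ w1 - y                      # labeled residuals, view 1
        r2 = X2 @ w2 - y                      # labeled residuals, view 2
        d = U1 @ w1 - U2 @ w2                 # disagreement on unlabeled data
        g1 = X1.T @ r1 / len(y) + lam * U1.T @ d / len(d)
        g2 = X2.T @ r2 / len(y) - lam * U2.T @ d / len(d)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

# toy data: two nearly identical views of the same linear concept
rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 2))
X2 = X1 + 0.05 * rng.normal(size=(30, 2))
y = X1 @ np.array([1.0, -1.0])
U1 = rng.normal(size=(50, 2))
U2 = U1.copy()
w1, w2 = fit_with_agreement(X1, X2, y, U1, U2)
```

The agreement term plays exactly the role of the regularizer on the slide: it couples the two hypotheses through the unlabeled pool while each one fits its own labeled view.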
Co-training with insufficient views (2)

[Wei Wang & Zhi-Hua Zhou (2013). "Co-training with insufficient views".
JMLR W&CP, 29:467-482, 2013]

•  Previous studies assume that each view is sufficient
•  Two potential problems for co-training
  –  Label noise: one weak learner does not label according to the target
     concept
  –  Sample bias: the pseudo-labeled data might not be i.i.d.
•  Notions of
  –  Insufficiency of views
  –  Diversity of views (based on the margins of the two classifiers)
•  If no sampling bias:
  –  If the diversity with margins gamma_1 and gamma_2 between the two views
     is d (every unlabeled instance could be labeled with large margin by one
     of the two views), co-training could output a near-good hypothesis. But
     there can be a large difference between the two optimal classifiers in
     the two views.
  –  The notion of information (sufficiency) must be introduced: if one view
     can provide good information, and if the diversity of classifiers
     trained on the initial labeled data with margins gamma_1 and gamma_2 is
     d, then co-training could output an approximation of the optimal
     classifier.
•  With sampling bias:
  –  If the diversity with margins gamma_1 and gamma_2 between classifiers
     trained on initial labeled data is large, co-training could improve the
     performance of weak hypotheses by exploiting unlabeled data until the
     diversity between the two views becomes 0.
Assessment

•  Co-training: a nice idea
•  Has been found useful in several applications
•  But the theoretical analysis is still incomplete
  –  Links with semi-supervised learning
  –  What about totally unsupervised contexts?

Multi-task learning

Assumption behind MTL
Possible relations between tasks

•  The combined learning of multiple related tasks can outperform learning
   each task in isolation
•  MTL allows for common information shared between the tasks to be used in
   the learning process, which leads to better generalization if the tasks
   are related
•  E.g. learning to predict the ratings for several different critics (in
   different countries) can lead to better performances for each separate
   task (predict the restaurant ratings for a specific critic)
•  All functions to be learned are close to each other in some norm
  –  E.g. functions capturing preferences in users' modeling problems
•  Tasks that share a common underlying representation
  –  E.g. in early vision, all tasks use the same set of features learnt in
     the first stages of the visual system (e.g. local filters similar to
     wavelets)
  –  Users may also evaluate different types of things (e.g. books, movies,
     music) based on the same set of features or score functions
•  Learning to recognize a face and the expression (fear, disgust, anger, ...)
•  Multi-modality learning: e.g. vision and proprioception
Question

How do we choose to model the shared information between the tasks?
•  Some shared underlying constraint
  –  E.g. a low dimensional representation shared across multiple related
     tasks
•  By way of a shared hidden layer in a neural network
•  By explicitly constraining the dimensionality of a shared representation

Regularization-based approach: multi-task learning
•  T binary classification tasks defined on X x Y
•  Linear hypotheses

Multi-task feature learning
Multi-task feature learning (2)

•  Suppose only one task, with feature functions h_i fixed a priori
•  Suppose that each regression function f_t is linear in the feature
   functions h_i
•  The feature functions are supposed to be linear, and the h_i are supposed
   to be orthonormal
•  Learning task: learn the parameter vector a_t in R^d from the data set
•  We want to bound the number of non-zero components of a_t

[A. Argyriou & T. Evgeniou (2006). "Multi-task feature learning". NIPS-2006]
Multi-task feature learning (3)

•  Suppose multiple tasks, with feature functions h_i to be learned
•  Learning task: learn the parameter vectors a_t in R^d and the features h_i
   from the data set
•  We want to bound the number of non-zero components of a_t, and have A be a
   low-rank matrix
•  But this is a non-convex problem, and the norm ||A||_{2,1} is non-smooth
   => Alternate minimization of the loss w.r.t. A and U, and the computation
   of the w_t

Multi-task feature learning (4)

•  Experiments (synthetic data): 200 tasks
•  with w_t drawn from a 5-dimensional Gaussian (given mean and covariance),
   plus up to 20 irrelevant dimensions
•  Each task: 5 or 10 examples

[Figure 1: Number of features learned vs. the regularization parameter gamma.]
[Figure 2: Test error (left) and residual of learned features (right) vs.
dimensionality of the input, for T = 10, 25, 100, 200 tasks and for
independent ridge regressions.]

From the paper: "since the trace norm is nonsmooth, we have opted for the
above alternating minimization strategy, which is simple to implement and has
a natural interpretation. Indeed, Algorithm 1 alternately performs a
supervised and an unsupervised step, where in the latter step we learn common
representations across the tasks and in the former step we learn
task-specific functions using these representations."
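The alternating scheme quoted above can be sketched numerically. The code below is a hedged reconstruction of that idea under assumptions of mine: square loss, a per-task ridge solve as the supervised step, and the update D = (W W^T)^(1/2) / trace((W W^T)^(1/2)) as the unsupervised step; the function name and the small `eps` jitter are invented for the sketch.

```python
# Alternating minimization sketch for multi-task feature learning:
# supervised step (solve each task given D), unsupervised step (update D).
import numpy as np

def multi_task_feature_learning(Xs, ys, gamma=0.1, iters=20, eps=1e-6):
    d, T = Xs[0].shape[1], len(Xs)
    D = np.eye(d) / d                          # shared feature matrix
    W = np.zeros((d, T))                       # one weight vector per task
    for _ in range(iters):
        Dinv = np.linalg.pinv(D)
        for t in range(T):                     # supervised step: per-task
            A = Xs[t].T @ Xs[t] + gamma * Dinv  # generalized ridge system
            W[:, t] = np.linalg.solve(A, Xs[t].T @ ys[t])
        # unsupervised step: D = (W W^T)^(1/2) / trace((W W^T)^(1/2))
        C = W @ W.T + eps * np.eye(d)
        vals, vecs = np.linalg.eigh(C)
        S = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
        D = S / np.trace(S)
    return W, D

# toy data: eight tasks sharing a single relevant direction (feature 0)
rng = np.random.default_rng(2)
d, T, n = 6, 8, 30
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (rng.normal() * np.eye(d)[0]) for X in Xs]
W, D = multi_task_feature_learning(Xs, ys)
```

After a few iterations D concentrates its mass on the direction shared by all tasks, which is exactly the "learn common representations" step described in the quoted passage.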
[Paper excerpt (Argyriou & Evgeniou): "We conclude this section by noting that
when matrix D in problem (3.2) is additionally constrained to be diagonal,
problem (3.2) reduces to problem (2.5). Formally, we have the following
corollary.

Corollary 4.3. Problem (2.5) is equivalent to the problem

  min { R(W, Diag(lambda)) : W in R^{d x T}, lambda in R_+^d,
        sum_{i=1}^{d} lambda_i <= 1, lambda_i != 0 when w^i != 0 }     (4.4)

and the optimal lambda is given by

  lambda_i = ||w^i|| / ||W||_{2,1},   i in {1, ..., d}."]               (4.5)

[Figure 3: Test error vs. number of tasks (left) for the computer survey data
set; significance of features learned (middle) and attributes learned by the
most important feature (right). Attributes: TE, RAM, SC, CPU, HD, CD, CA, CO,
AV, WA, SW, GU, PR.]
Multi-task learning with deep neural networks

[Figure: Multi-task DNN architecture (Liu et al., 2015).
Shared layers: X = Bag-of-Words input (500k) -> l1: Letter 3-gram (50k) ->
W1 -> l2: Semantic Representation (300).
Task-specific layers: W2(t=C1), W2(t=C2), W2(t=Sq), W2(t=Sd) ->
l3: Task-Specific Representation (128) -> W3(t=C1), W3(t=C2).
Query Classification: posterior probability P(C1|Q), P(C2|Q) computed by
sigmoid. Web Search: posterior probability computed by softmax; relevance of
D1, D2 measured by cosine similarity.]
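The shared-layers / task-specific-heads idea in the figure can be illustrated with a tiny numpy network. This is a sketch of the architectural principle only, not Liu et al.'s model: one shared trunk, one regression head per task, trained jointly so that gradients from every task shape the shared representation. All sizes, targets and the training schedule are invented.

```python
# Minimal shared-trunk, multi-head network trained jointly on two tasks.
import numpy as np

rng = np.random.default_rng(3)
W_shared = 0.1 * rng.normal(size=(4, 16))              # shared trunk (like l2)
heads = [0.1 * rng.normal(size=16) for _ in range(2)]  # task-specific layers

def forward(X, head):
    H = np.tanh(X @ W_shared)       # shared representation, seen by all tasks
    return H @ head, H

def train_step(X, y, t, lr=0.05):
    global W_shared
    pred, H = forward(X, heads[t])
    err = pred - y
    dH = np.outer(err, heads[t]) * (1.0 - H ** 2)      # backprop through tanh
    heads[t] -= lr * H.T @ err / len(y)                # task-specific update
    W_shared -= lr * X.T @ dH / len(y)                 # shared update

# two related regression tasks on the same inputs
X = rng.normal(size=(64, 4))
targets = [np.tanh(X @ np.array([1.0, -1.0, 0.0, 0.0])),
           np.tanh(X @ np.array([1.0, 1.0, 0.0, 0.0]))]
for epoch in range(3000):
    for t in (0, 1):
        train_step(X, targets[t], t)
```

Because both tasks depend on the same two input coordinates, the shared trunk learns features useful to both, which is the level-of-sharing point made by the slide.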
On On
the
right,
right,
we have
we have
plotted
plotted
a residual
a residual
measure
measure
of how
of how
wellwell
the the
learned
learned
features
features
approximate
approximate
i the
$wthe
$2actual
used
to
generate
to generate
the data.
the data.
More
More
specifically,
specifically,
we depict
the Frobenius
the Frobenius
norm
norm
of the
of the
λi = the actual
, onesones
iused
∈ IN
(4.5) we depict
d.
$W
$
difference
difference
of CNN
the
of learned
theUsing
learned
andTransfer
and
actual
actual
D’s
D’s
versus
versus
thefrom
input
the input
dimensionality.
dimensionality.
We We
observe
observe
that that
adding
adding
Training
Learning
Pseudo-Tasks
73
=.1DE(52F"1+5-'*'3"b*(8"G++B"'+.-51"'+(b)-F2"
moremore
taskstasks
leadsleads
to better
to better
estimates
estimates
of the
of underlying
the underlying
features.
features.
2,1
Using this corollary we can make a simple modification to Algorithm 1 in order to use it for variable
Conjoint
Conjoint
analysis
analysis
experiment.
experiment.
We
We
then
tested
tested
the 1)
the
method
using
using
a real
a real
datadata
set about
set about
people’s
people’s
selection. That is, we modify the computation
of the
matrix
D (penultimate
line
inthen
Algorithm
asmethod
products
of products
fromfrom
[13].[13].
TheThe
datadata
waswas
taken
taken
fromfrom
a survey
a survey
of 180
of 180
persons
persons
whowho
ratedrated
the the
D = Diag(λ), where the vector λ = (λ1ratings
, . . ratings
. , λof
d ) is computed using equation (4.5).
likelihood
likelihood
of purchasing
of purchasing
one one
of 20
of different
20 different
personal
personal
computers.
computers.
HereHere
the persons
the persons
correspond
correspond
to to
taskstasks
and and
the PC
the models
PC models
to examples.
to examples.
TheThe
inputinput
is represented
is represented
by the
by following
the following
13 binary
13 binary
attributes:
attributes:
5 Experiments
telephone
telephone
hot hot
line line
(TE),
(TE),
amount
amount
of memory
of memory
(RAM),
(RAM),
screen
screen
sizesize
(SC),
(SC),
CPUCPU
speed
speed
(CPU),
(CPU),
hardhard
In this section, we present experiments on
a disk
synthetic
and
a real data set. In
all(CD),
ofcache
ourcache
experiments,
disk
(HD),
(HD),
CD-ROM/multimedia
CD-ROM/multimedia
(CD),
(CA),
(CA),
Color
Color
(CO),
(CO),
availability
availability
(AV),
(AV),
warranty
warranty
(WA),
(WA),
we used the square loss function and automatically
tuned
theguarantee
regularization
γ(PR).
with
leavesoftware
software
(SW),
(SW),
guarantee
(GU)
(GU)
andparameter
and
priceprice
(PR).
We We
also
also
added
added
an input
an input
component
component
accounting
accounting
for for
one-out cross validation.
the bias
the bias
term.
term.
TheThe
output
output
is anisinteger
an integer
rating
rating
on the
on scale
the scale
0−10.
0−10.
Following
Following
[13],[13],
we used
we used
4 examples
4 examples
per task
per task
asdata
the
as sets
test
the test
data
and and
8 examples
8 examples
task
pertask
task
as the
as training
the training
data.data.
Synthetic Experiments. We created synthetic
by data
generating
T =per
200
parameters wt from a 5-dimensional Gaussian
mean
and covariance
equal
to
As distribution
shown
As shown
in Figure
inwith
Figure
3,zero
the
3, performance
the
performance
of our
of our
algorithm
algorithm
improves
improves
withwith
the number
the number
of tasks.
of tasks.
It also
It also
Diag(1, 0.25, 0.1, 0.05, 0.01). These areperforms
theperforms
relevant
dimensions
we
wish
to learn.
To regressions,
these wewhose
much
much
better
better
thanthan
independent
independent
ridgeridge
regressions,
whose
test test
errorerror
is equal
is equal
to 16.53.
to 16.53.
In this
In this
kept adding up to 20 irrelevant dimensions
which
are
exactly
training
setswhich
were
particular
particular
problem,
problem,
it zero.
is italso
is The
also
important
important
toand
investigate
to test
investigate
which
features
features
are significant
are significant
to alltoconsumers
all consumers
selected randomly from [0, 1]25 and contained
5 and
10they
examples
respectively.
The We
outputs
and and
how
how
they
weight
weight
theper
13
thetask
computer
13 computer
attributes.
attributes.
We
demonstrate
demonstrate
the results
the results
in the
in two
the two
adjacent
adjacent
yti were computed from the wt and xti asplots,
ytiplots,
=which
%wwhich
ν, where
νwith
is with
zero-mean
Gaussian
were
obtained
obtained
the data
the data
for
all
for 180
all noise
180
tasks.
tasks.
In the
In middle,
the middle,
the distribution
the distribution
of the
of the
t , xwere
ti & +
with standard deviation equal to 0.1.(a) eigenvalues
eigenvalues
of DofisDdepicted,
is depicted,
indicating
indicating
that that
therethere
is a(b)
is
single
a single
mostmost
important
important
feature
feature
which
which
is shared
is shared
by all
bypersons.
all persons.
The
The
plotplot
on the
on right
the
right
shows
shows
the
weight
the weight
of each
of each
inputinput
dimension
dimension
in this
in this
mostmost
important
important
We first present, in Figure 1, the number
of
features
learned
by
our
algorithm,
as measured
by
feature.
Thisset
This
feature
seems
to
weight
to25
weight
the technical
the
technical
characteristics
characteristics
of aof
computer
a computer
(RAM,
(RAM,
CPUCPU
and and
rank(D). Fig.
The plot
on the left corresponds
tofeature.
a data
offeature
200seems
tasks
with
input
dimensions
and
1. Illustrating
the mechanism
of
transfer
learning.
(a)
Functional
view:
tasks
represented
asdiscern
CD-ROM)
CD-ROM)
against
against
its
price.
its
price.
Therefore,
Therefore,
in
this
in
this
application
application
our
algorithm
our
algorithm
is
able
is
able
to
discern
to
interesting
interesting
that on the right to a real data set of 180 tasks described in the next subsection. As expected, the
functional
mapping
share
stochastic
characteristics.
(b)
Transfer
learning
in
neural
networks,
the
patterns
patterns
in
people’s
in
people’s
decision
decision
process.
process.
number of features decreases with γ.
hidden layer represents the level of sharing between all the task.
School
School
data.
experiments
experiments
withwith
the
school
thewith
school
data
usedused
in [3]
in achieved
[3] achieved
explained
explained
variance
variance
Figure 2 depicts the performance of our algorithm
fordata.
TPreliminary
=Preliminary
10, 25, 100
and 200 tasks
along
the data
37.1%
compared
compared
toon29.5%
to
in that
in that
paper.
These
results
results
will
be reported
be reported
in future
in future
work.
work.
performance of 200 independent standard37.1%
ridge
regressions
the29.5%
data.
For
T =paper.
10,
25These
and
100,
wewill
averaged the performance metrics over runs on all the data so that our estimates have comparable
3.3agreement
A Bayesian
Perspective
6 and
6Conclusion
Conclusion
variance. In
with past empirical
theoretical
evidence (see e.g. [4]), learning multiple
"V&@"&8H+GK"Y@"X.K"d@"gpK"X@"})'3"5'G"[@"g*'3"M%$$?N@"I"9,#/*/*0%"/),#,-"/-#A%+))&.+',(#,&%
tasks together
significantly improves on learning the tasks independently. Moreover, the perforWe
have
We
have
presented
presented
an algorithm
anThis
algorithm
which
which
learns
learns
common
sparse
function
function
representations
representations
across
across
a pool
a pool
In
this
section
we
give
a
Bayesian
perspective
to our
the
transfer
learning
problem
formumance of the
algorithm
improves
when
more
tasks
are
available.
improvement
is common
moderate
for sparse
6/34#A%,)-'0*/='*%7'&)A3%43/*0%$,#*3+),%A)#,*/*0%+,'7%83)4&'.$#3B3L@"4-)7@"[,,iE%$$?\""
of
related
of related
tasks.
tasks.
To our
To
knowledge,
our knowledge,
approach
our approach
provides
provides
the first
the first
convex
convex
optimization
optimization
formulation
formulation
low dimensionalities
but
increases
as
the
number
of
irrelevant
dimensions
increases.
lated in Section 3.2. While
(1, 2)feature
are
all
whatAlthough
isAlthough
needed
tooptimization
implement
the
proposed
for multi-task
for multi-task
feature
learning.
learning.
convex
convex
optimization
methods
methods
havehave
beenbeen
derived
derived
for the
for the
"Vg@"T*.K"_@"}5)K"g@3"u+K"T@"Z+'3K"Y@"Z.8"5'G"X+EX*"d5'3"M%$#>N@"I"<)8,)3)*$#='*%>)#,*/*0%?3/*0%@4A=.9#3B%
ure 1: Architecture
of the Multi-task Deep Neural Network (DNN) for Representation Learning:
e lower layers areC))8%D)4,#A%D)$(',B3%+',%E)7#*=-%2A#33/F-#='*%#*&%G*+',7#='*%<)$,/)6#A"L@"4-)7@";&&,TK"=5P"%$#>\""
shared across all tasks, while top layers are task-specific. The input X (either a query or
cument, with vocabulary size 500k)
is first represented as a bag of words, then hashed
into letter 3-grams
]>" !"#$%"
I"&BB-+'D2253+"H.1DE(J78+2K"(-5'2C+-("+("5G5B(5D)'"G+"G)H5*'+"L""""""M&@",)-'./0)12N""
Non-linear projection W1 generates the shared semantic representation, a vector l2 (dimension 300) that
rained to capture the essential characteristics of queries and documents. Finally, for each task, additional
n-linear projections W2t generate task-specific representations l3 (dimension 128), followed by operations
approach, the sole purpose
of this section is to give more insight to the role of]k"the!"#$%"
I"&BB-+'D2253+"H.1DE(J78+2K"(-5'2C+-("+("5G5B(5D)'"G+"G)H5*'+"L""""""M&@",)-'./0)12N""
pseudo tasks and to formalize the claims we made near the end of Section 3.1.
In Section 3.2, we hypothesized that the pseudo tasks are realizable as a linear projection from the feature mapping layer output, Φ(x; θ), that is:
=.1DE(52F"1+5-'*'3"b*(8"G++B"'+.-51"'+(b)-F2"
Training CNN Using Transfer Learning from Pseudo-Tasks
75
Z)H5*'"5G5B(5D)'"5'G"(-5'2C+-"1+5-'*'3"
Fig. 2. Joint training using transfer-learning from pseudo-tasks
"V&@"&8H+GK"Y@"X.K"d@"gpK"X@"})'3"5'G"[@"g*'3"M%$$?N@"I"9,#/*/*0%"/),#,-"/-#A%+))&.+',(#,&%
5 × 5 neighborhood; (4) C2 layer: 256 filters of size 6 × 6, connections with sparsity2
6/34#A%,)-'0*/='*%7'&)A3%43/*0%$,#*3+),%A)#,*/*0%+,'7%83)4&'.$#3B3L@"4-)7@"[,,iE%$$?\""
0.5 between the 16 dimensions of P1 layer and the 256 dimensions of C2 layer; (5) P2
]l" !"#$%"
layer: max pooling over I"&BB-+'D2253+"H.1DE(J78+2K"(-5'2C+-("+("5G5B(5D)'"G+"G)H5*'+"L""""""M&@",)-'./0)12N""
each 5 × 5 neighborhood; (6) output layer: full connections
between 256 × 4 × 4 P2 features and outputs. Moreover, we used least square loss for
pseudo tasks and hinge loss for classification tasks. Every convolution filter is a linear
function followed by a sigmoid transformation (see [15] for more details).
It is interesting to contrast our approach with the layer-wise training one in [20]. In [20], each feature extraction layer is trained to model its input in a layer-wise fashion: the first layer is trained on the raw images and then used to produce the input to the second feature extraction layer. The whole resulting architecture is then used as a multilayered feature extractor over labeled data, and the resulting representation is then used to feed an SVM classifier. In contrast, in our approach, we jointly train the classifier and the feature extraction layers, so the feature extraction layer training is guided by the pseudo-tasks as well as by the labeled information simultaneously. Moreover, we believe that the two approaches are orthogonal, as we might first pre-train the network using the method in [20] and then use the result as a starting point for our method. We leave this exploration for future work.

Domain adaptation
•  Objective
  –  Improve a target prediction function in the target domain using knowledge from the source domain
•  The training and test sets can be from the same domain, but with different probability distributions
  –  Co-variate shift
  –  Concept drift
•  Or they can be from different domains
  –  Transfer

5 Generating Pseudo Tasks

We use a set of pseudo tasks to incorporate prior knowledge into the training of recognition models. Therefore, these tasks need to be 1) automatically computable based on unlabeled images, and 2) relevant to the specific recognition task at hand; in other words, it is highly likely that two semantically similar images would be assigned similar outputs under a pseudo task.

A simple approach to construct pseudo tasks is depicted in Fig. 4. In this figure, the pseudo-task is constructed by sampling a random 2D patch and using it as a template to form a local 2D filter that operates on every training image. The value assigned to an image under this task is taken to be the maximum over the result of this 2D convolution
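The patch-based construction above can be sketched in a few lines of numpy. This is an illustrative toy (function names and the brute-force convolution loop are my own, not the paper's implementation): sample a random patch, slide it over an image as a filter, and return the maximum response.

```python
import numpy as np

def make_pseudo_task(images, patch_size=5, rng=None):
    """Build one pseudo-task: sample a random patch from a random image and
    return a function that maps any image to the maximum response of that
    patch used as a local 2D filter (valid cross-correlation)."""
    rng = np.random.default_rng(rng)
    src = images[rng.integers(len(images))]
    i = rng.integers(src.shape[0] - patch_size + 1)
    j = rng.integers(src.shape[1] - patch_size + 1)
    patch = src[i:i + patch_size, j:j + patch_size].astype(float)

    def response(img):
        h, w = img.shape
        p = patch_size
        best = -np.inf
        for r in range(h - p + 1):          # brute-force sliding window
            for c in range(w - p + 1):
                best = max(best, float(np.sum(img[r:r + p, c:c + p] * patch)))
        return best

    return response

# Identical images get identical pseudo-task outputs, as required.
rng = np.random.default_rng(0)
imgs = [rng.random((12, 12)) for _ in range(3)]
task = make_pseudo_task(imgs, patch_size=5, rng=1)
print(task(imgs[0]), task(imgs[1]))
```

In practice one would generate many such tasks (different patches and scales) and use their outputs as auxiliary regression targets during training.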
Much recent research: correcting sampling bias. Three main classes of algorithms:

•  Reweighting / Instance-based methods
  –  Correct a sample bias by reweighting source labeled data: source instances "close" to target instances are more important
  [Shimodaira, '00] [Sethy et al., '06, '09] [Huang et al., Bickel et al., '07] [Sugiyama et al., '08] [Pan et al., '09] [Gong et al., '12] [Chen et al., '12] [Muandet et al., '13]

•  Feature-based methods / Find new representation spaces
  –  Find a common space where source and target are close (projection, new features, etc.)
  [Evgeniou and Pontil, '05] [Blitzer et al., '06] [Daumé III, '07] [Argyriou et al., '08] [Duan et al., '09] [Gopalan et al., '11]

•  Adjusting / Iterative methods
  –  Adjust mismatched models: modify the model by incorporating pseudo-labeled information
  [Duan et al., '09] [Daumé III et al., Saenko et al., '10] [Kulis et al., Chen et al., '11]
Reweighting methods: co-variate shift

Non-stationary environment
•  The input distribution changes
  –  Co-variate shift
  –  Virtual drift
•  The functional relation remains unchanged
  –  Non i.i.d.
•  Concept change
  –  Concept drift
  –  Non i.i.d. ~ non-stationary

Examples of covariate shift: (weak) extrapolation, i.e. predict output values outside the training region (figure: training samples vs. test samples).
Principle - Co-variate shift
•  We know that ERM is consistent if the test distribution is the same as the training distribution p_train(x)
A first analysis

R_PT(h) = E_{(x^t,y^t)∼P_T} I[h(x^t) ≠ y^t]
        = E_{(x^t,y^t)∼P_T} [ (P_S(x^t,y^t) / P_S(x^t,y^t)) · I[h(x^t) ≠ y^t] ]
        = E_{(x^t,y^t)∼P_S} [ (P_T(x^t,y^t) / P_S(x^t,y^t)) · I[h(x^t) ≠ y^t] ]

⇒ Assume similar tasks, P_S(y|x) = P_T(y|x); then:

        = E_{(x^t,y^t)∼P_S} [ (D_T(x^t) P_T(y^t|x^t)) / (D_S(x^t) P_S(y^t|x^t)) · I[h(x^t) ≠ y^t] ]
        = E_{(x^t,y^t)∼P_S} [ (D_T(x^t) / D_S(x^t)) · I[h(x^t) ≠ y^t] ]
        = E_{x^t∼D_S} E_{y^t∼P_S(y^t|x^t)} [ (D_T(x^t) / D_S(x^t)) · I[h(x^t) ≠ y^t] ]

⇒ a weighted error on the source domain, with ω(x^t) = D_T(x^t) / D_S(x^t)

Idea: reweight labeled source data according to an estimate of ω(x^t):
        E_{(x^t,y^t)∼P_S} [ ω(x^t) · I[h(x^t) ≠ y^t] ]

Illustration
•  No bias:   D_S(x) = D_T(x) ⇒ ω(x) = 1
•  With bias: D_S(x) ≠ D_T(x) ⇒ ω(x) ≠ 1, covariate shift [Shimodaira, '00]
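The identity above (target risk = ω-weighted source risk) can be checked numerically. The toy below is my own illustration, with known 1D Gaussian marginals so the true ratio ω(x) = D_T(x)/D_S(x) is available in closed form, a fixed hypothesis h, and a shared labeling function playing the role of P_S(y|x) = P_T(y|x):

```python
import numpy as np

# Source: x ~ N(0,1); target: x ~ N(1,1); shared deterministic labeling.
rng = np.random.default_rng(0)

def dens(x, mu):                          # Gaussian density, unit variance
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

h = lambda x: (x > 0.0).astype(int)       # a fixed hypothesis
label = lambda x: (x > 0.5).astype(int)   # P_S(y|x) = P_T(y|x)

xs = rng.normal(0.0, 1.0, 200_000)        # source sample
xt = rng.normal(1.0, 1.0, 200_000)        # target sample

target_risk = np.mean(h(xt) != label(xt))
omega = dens(xs, 1.0) / dens(xs, 0.0)     # true ratio ω(x) = D_T(x)/D_S(x)
weighted_source_risk = np.mean(omega * (h(xs) != label(xs)))

print(round(target_risk, 3), round(weighted_source_risk, 3))
```

The two estimates agree up to Monte Carlo noise; the whole difficulty in practice is that ω must be estimated, which is the topic of the following slides.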
Difficult case
•  No shared support: ∃x, D_S(x) = 0 and D_T(x) ≠ 0
•  Shared support: D_S(x) = 0 if and only if D_T(x) = 0

Intuition: the quality of the adaptation depends on the magnitude of the weights.
Some existing approaches (1/2)

Principle
•  Law of large numbers
  –  Sample averages converge to the population mean

Density estimators
Build density estimators for the source and target domains and estimate the ratio between them. Ex. [Sugiyama et al., NIPS'07]:
    ω̂(x) = Σ_{l=1}^{b} α_l ψ_l(x)
    Learning: argmin_α KL(ω̂ D_S, D_T)

Learn the weights discriminatively [Bickel et al., ICML'07]
    Assume D_T(x_i) / D_S(x_i) ∝ 1 / p(q = 1 | x, θ)
    Label source examples with label 1, target examples with label 0, and train a classifier (θ̂) to classify examples as 1 or 0 (e.g. with logistic regression)
    Compute the new weights ω̂(x_i^s) = 1 / p(q = 1 | x_i^s; θ̂)
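A minimal numpy sketch of the discriminative recipe above (in the spirit of [Bickel et al., ICML'07]; all data and constants are illustrative). One caveat: the slide writes ω̂ = 1/p(q=1|x); the closely related odds form (1−p)/p is used here, which equals the density ratio when the two samples have equal size, so the toy check comes out exact:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, (3000, 1))          # source sample, q = 1
xt = rng.normal(1.0, 1.0, (3000, 1))          # target sample, q = 0

X = np.hstack([np.vstack([xs, xt]), np.ones((6000, 1))])  # add bias feature
q = np.r_[np.ones(3000), np.zeros(3000)]

theta = np.zeros(2)
for _ in range(2000):                          # gradient descent on log-loss
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    theta -= 0.1 * X.T @ (p - q) / len(X)

p_src = 1.0 / (1.0 + np.exp(-X[:3000] @ theta))
w = (1.0 - p_src) / p_src                      # ∝ D_T(x) / D_S(x)
w /= w.mean()                                  # normalize to mean 1

# The reweighted source sample mimics the target distribution:
print(xs.mean(), (w * xs[:, 0]).mean(), xt.mean())
```

The weighted source mean moves from the source mean (≈0) to the target mean (≈1), which is exactly what importance weighting is supposed to achieve.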
But how to estimate ω̂?

Importance weighting
•  A naïve estimation of the ratio (estimating D_T and D_S separately) does not work
  –  Density estimation is too crude in a high-dimensional space (and with few known test instances)
•  Idea of Sugiyama:
  –  Learn a parametric model of ω directly

Covariate shift in regression (illustration)
Covariate shift in classification (illustration)
But…

Bad news
•  DA is hard, even under covariate shift [Ben-David et al., ALT'12]
  ⇒ To learn a classifier, the number of examples depends on |H| (finite) or exponentially on the dimension of X
•  The covariate shift assumption may fail: tasks are not similar in general, P_S(y|x) ≠ P_T(y|x)
•  We did not consider the hypothesis space
•  Can we define a general theory for DA?
The H∆H divergence [Ben-David et al., NIPS'06; MLJ'10]
A first attempt at a theoretical framework for domain adaptation (Shai Ben-David and colleagues)

Definition
    d_H∆H(D_S, D_T) = 2 sup_{(h,h')∈H²} | R_{D_T}(h, h') − R_{D_S}(h, h') |
                    = 2 sup_{(h,h')∈H²} | E_{x^t∼D_T} I[h(x^t) ≠ h'(x^t)] − E_{x^s∼D_S} I[h(x^s) ≠ h'(x^s)] |

Illustration with only 2 hypotheses in H: h and h'
The H∆H divergence [Ben-David et al., NIPS'06; MLJ'10]: computable from samples

Consider two samples S, T of size m from D_S and D_T:
    d_H∆H(D_S, D_T) ≤ d_H∆H(S, T) + O(complexity(H) √(log(m)/m))
complexity(H): VC-dimension [Ben-David et al., '06; '10], Rademacher complexity [Mansour et al., '09]

Empirical estimation:
    d̂_H∆H(S, T) = 2 ( 1 − min_{h∈H} [ (1/m) Σ_{x: h(x)=−1} I[x ∈ S] + (1/m) Σ_{x: h(x)=1} I[x ∈ T] ] )

⇒ Already seen: label source examples as −1, target ones as +1, and try to learn a classifier in H minimizing the associated empirical error.
Note: with a larger H, the distance will be high, since we can easily find two hypotheses able to distinguish the two domains.
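The "train a domain classifier" recipe above is easy to try out. The sketch below is an illustrative toy (nearest-centroid stands in for the hypothesis class H, and the 2(1 − 2·err) form of the proxy distance is used): when source and target are drawn from the same distribution the estimate is near 0, and when they are easily separable it is near 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def proxy_distance(S, T):
    """Proxy for d_H∆H: fit a domain classifier (here nearest-centroid),
    measure its error at separating S (label -1) from T (label +1)."""
    X = np.vstack([S, T])
    y = np.r_[-np.ones(len(S)), np.ones(len(T))]
    c_neg, c_pos = S.mean(axis=0), T.mean(axis=0)
    pred = np.where(np.linalg.norm(X - c_neg, axis=1)
                    <= np.linalg.norm(X - c_pos, axis=1), -1.0, 1.0)
    err = np.mean(pred != y)
    return 2.0 * (1.0 - 2.0 * err)

near = rng.normal(0.0, 1.0, (1000, 2))
same = rng.normal(0.0, 1.0, (1000, 2))   # same distribution as `near`
far = rng.normal(4.0, 1.0, (1000, 2))    # well-separated domain

print(proxy_distance(near, same))        # domains indistinguishable: near 0
print(proxy_distance(near, far))         # domains easy to separate: near 2
```

With a richer hypothesis class the estimate only goes up, which is the "larger H" caveat noted above.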
A first bound: going to a generalization bound

Preliminaries
    R_PT(h, h') = E_{(x,y)∼P_T} I[h(x) ≠ h'(x)] = E_{x∼D_T} I[h(x) ≠ h'(x)]
    R_PT (and R_PS) fulfills the triangle inequality
    |R_PT(h, h') − R_PS(h, h')| ≤ ½ d_H∆H(D_S, D_T),
        since d_H∆H(D_S, D_T) = 2 sup_{(h,h')∈H²} |R_{D_T}(h, h') − R_{D_S}(h, h')|
    h_S* = argmin_{h∈H} R_PS(h): best on source
    h_T* = argmin_{h∈H} R_PT(h): best on target
    Ideal joint hypothesis: h* = argmin_{h∈H} R_PS(h) + R_PT(h);  λ = R_PS(h*) + R_PT(h*)

Derivation
    R_PT(h) ≤ R_PT(h*) + R_PT(h, h*)
            ≤ R_PT(h*) + R_PS(h, h*) + R_PT(h, h*) − R_PS(h, h*)
            ≤ R_PT(h*) + R_PS(h, h*) + |R_PT(h, h*) − R_PS(h, h*)|
            ≤ R_PT(h*) + R_PS(h, h*) + ½ d_H∆H(D_S, D_T)
            ≤ R_PT(h*) + R_PS(h) + R_PS(h*) + ½ d_H∆H(D_S, D_T)
            ≤ R_PS(h) + ½ d_H∆H(D_S, D_T) + λ
            ≤ R_S(h) + ½ d_H∆H(S, T) + O(complexity(H) √(log(m)/m)) + λ
Main theoretical bound

The classical theory of domain adaptation: the results of S. Ben-David et al. and Mansour et al.

Theorem [Ben-David et al., 2010; Mansour et al., 2009a]
Let H be a hypothesis space. If D_S and D_T are two distributions over X, then:

    ∀h ∈ H,   R_PT(h)   ≤   R_PS(h)   +   ½ d_H(D_S, D_T)   +   ν
            (target error)  (source error)    (divergences)

•  R_PS(h): the classical error on the source domain
   Minimizable via a supervised classification method, without adaptation
•  ½ d_H(D_S, D_T): the H-divergence between D_S and D_T,
       d_H(D_S, D_T) = 2 sup_{(h,h')∈H²} |R_{D_T}(h, h') − R_{D_S}(h, h')|
                     = 2 sup_{(h,h')∈H²} | E_{x^t∼D_T} I[h(x^t) ≠ h'(x^t)] − E_{x^s∼D_S} I[h(x^s) ≠ h'(x^s)] |
•  ν: the divergence between the labelings,
       ν = inf_{h'∈H} ( R_PS(h') + R_PT(h') ), the optimal joint error [Ben-David et al., 2010]
       or ν = R_PT(h_T*) + R_PT(h_T*, h_S*), where h_X* is the best hypothesis on domain X [Mansour et al., 2009a]

Formalizes a natural approach: move the two distributions closer while ensuring a low error on the source domain.
Justifies many algorithms: reweighting methods, feature-based methods, adjusting/iterative methods.

A pioneering theoretical formulation [Ben-David et al., 2010]

Theorem [Ben-David et al., MLJ'10; NIPS'06]
Let H be a symmetric hypothesis space. If D_S and D_T are respectively the marginal distributions of source and target instances, then for all δ ∈ (0, 1], with probability at least 1 − δ:
    ∀h ∈ H,   R_PT(h) ≤ R_PS(h) + ½ d_H(D_S, D_T) + ν

Domain adaptation: construct a new projection space in which the two distributions are close, while keeping a good performance on the source domain.

[Slides: Emilie Morvant (LIF-Qarma), Apprentissage de vote de majorité, 18 septembre 2013]
Principle - Adjusting / Iterative methods
•  Integrate some information about the target examples iteratively
   ⇒ use pseudo-labels
•  Remove / add some instances so as to move the source distribution towards the target distribution
•  Repeat the process until convergence or no remaining instances
DASVM [Bruzzone et al., '10]

A brief recap on SVM
    Learning sample LS = {(x_i, y_i)}_{i=1}^n
    Learn a classifier h(x) = ⟨w, x⟩ + b
    Formulation:   min_{w,b,ξ}  ½‖w‖² + C Σ_{i=1}^n ξ_i
    subject to   y_i(⟨w, x_i⟩ + b) ≥ 1 − ξ_i,  1 ≤ i ≤ n;  ξ ≥ 0

DASVM: algorithm
1.  Initialize the training set with LS
2.  Learn a classifier h_0 from the learning sample LS
3.  Repeat until a stopping criterion is met:
  –  Select the first p target examples s.t. 0 ≤ h(x) ≤ 1 with highest margin, and the first p target examples s.t. −1 ≤ h(x) ≤ 0 with highest margin, and assign them the corresponding pseudo-labels ±1
  –  Add these 2p (pseudo-labeled) examples to the training set
  –  Remove from the training set the first p positive and p negative source instances with highest margin
4.  Output the last classifier

The algorithm stops when the number of selected instances at each step falls below a threshold.
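The self-labeling loop can be sketched compactly. The toy below is an illustrative stand-in, not Bruzzone et al.'s exact procedure: a least-squares linear classifier replaces the SVM, target points are pseudo-labeled in batches of 2p by confidence, and the progressive removal of source points is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):                       # least-squares linear classifier
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Source: two Gaussian classes; target: the same classes, shifted.
Xs = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
ys = np.r_[-np.ones(100), np.ones(100)]
Xt = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))]) + 1.5
yt = np.r_[-np.ones(100), np.ones(100)]   # ground truth, evaluation only

X_train, y_train, pool, p = Xs.copy(), ys.copy(), Xt.copy(), 10
while len(pool):
    w = fit(X_train, y_train)
    scores = predict(w, pool)
    order = np.argsort(-np.abs(scores))[:2 * p]       # most confident points
    X_train = np.vstack([X_train, pool[order]])
    y_train = np.r_[y_train, np.sign(scores[order])]  # pseudo-labels
    pool = np.delete(pool, order, axis=0)

w = fit(X_train, y_train)
acc = np.mean(np.sign(predict(w, Xt)) == yt)
print("target accuracy:", acc)
```

The classifier trained on the source alone is progressively pulled towards the target cloud as pseudo-labeled target points enter the training set.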
DASVM - graphical illustration (sequence of figures showing the decision boundary migrating from the source to the target sample)
DASVM
•  There are theoretical studies
  –  Based on the notion of weak learners
  –  In order to determine the conditions and guarantees of DASVM
•  There are applications
  –  E.g. in the domain of character recognition
Idea
•  Change the feature representation X to better represent shared characteristics between the two domains:
  –  some features are domain-specific,
  –  others are generalizable,
  –  or there exist mappings from the original space
⇒ Make source and target domains explicitly similar
⇒ Learn a new feature space by embedding or projection

Feature / Projection based approaches
Find latent spaces - Structural Correspondence Learning [Blitzer et al., '07]

Identify shared features (example: sentiment analysis, bag of words with bigrams):
•  Choose K pivot features (words frequent in both domains and highly correlated with the labels)
•  Learn K classifiers to predict the pivot features from the remaining features
•  For each feature, add K new features
•  Represent source and target data with these features
•  Apply PCA to the source+target new features to get a low-rank latent representation
•  Learn a classifier in the new projection space defined by PCA
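The steps above can be sketched with numpy on toy data. This is a minimal illustration in the spirit of SCL, not Blitzer et al.'s code: one least-squares predictor per pivot feature, then an SVD of the stacked predictor weights gives the low-rank shared projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, rank = 500, 30, 5, 3

X = rng.random((n, d))                 # unlabeled source+target rows (toy)
pivots = np.arange(K)                  # indices of the pivot features
rest = np.arange(K, d)                 # remaining features

W = []
for j in pivots:                       # one linear predictor per pivot
    w, *_ = np.linalg.lstsq(X[:, rest], X[:, j], rcond=None)
    W.append(w)
W = np.array(W).T                      # (d-K) x K matrix of pivot predictors

U, _, _ = np.linalg.svd(W, full_matrices=False)
theta = U[:, :rank]                    # low-rank shared projection

Z = X[:, rest] @ theta                 # latent shared features
X_aug = np.hstack([X, Z])              # original + shared features
print(Z.shape, X_aug.shape)
```

A downstream classifier is then trained on `X_aug` for the labeled source data and applied to the target.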
Illustration: Manifold-based methods [Gopalan et al., '10]

Assume X ⊆ R^N
•  Apply PCA on the source data ⇒ matrix S1 of rank d
•  Apply PCA on the target data ⇒ matrix S2 of rank d
•  Geodesic path on the Grassmann manifold G_{N,d} (d-dimensional vector subspaces ⊂ R^N) between S1 and S2
•  Use of an exponential flow ψ(t′) = Q exp(t′ B) J,
   with Q an N × N matrix with determinant 1 s.t. Q^T S1 = J and J^T = [I_d  0_{N−d,d}];
   intermediate subspaces are obtained by computing B (a skew block-diagonal matrix) and varying t′ between 0 and 1
•  Take a collection S′ of l subspaces between S1 and S2 on the manifold
•  Project the data on S′ and learn in that new space
A simpler approach - Subspace alignment [Fernando et al., ICCV'13]

Subspace alignment algorithm

Algorithm 1: Subspace alignment DA algorithm
Data: Source data S, target data T, source labels Y_S, subspace dimension d
Result: Predicted target labels Y_T
    S1 ← PCA(S, d)    (source subspace defined by the first d eigenvectors)
    S2 ← PCA(T, d)    (target subspace defined by the first d eigenvectors)
    X_a ← S1 S1′ S2   (operator aligning the source subspace to the target one)
    S_a = S X_a       (new source data in the aligned space)
    T_T = T S2        (new target data in the aligned space)
    Y_T ← Classifier(S_a, T_T, Y_S)

M* = S1′ S2 corresponds to the "subspace alignment matrix": M* = argmin_M ‖S1 M − S2‖
X_a = S1 S1′ S2 = S1 M* projects the source data to the target subspace
A natural similarity: Sim(x_s, x_t) = x_s S1 M* S2′ x_t′ = x_s A x_t′
•  Moves PCA-based representations closer
•  Totally unsupervised
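Algorithm 1 fits in a few lines of numpy. The sketch below follows the steps above on toy data (the target is a rotated copy of the source); the nearest-centroid classifier at the end is an illustrative stand-in for the final `Classifier` step.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_basis(X, d):
    """Columns = first d eigenvectors of the covariance (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

# Toy domains: target is a rotated copy of the source.
Xs = np.vstack([rng.normal(-2, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
ys = np.r_[np.zeros(100), np.ones(100)]
a = 0.4
R = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Xt, yt = Xs @ R.T, ys

d = 2
S1 = pca_basis(Xs, d)                  # source subspace
S2 = pca_basis(Xt, d)                  # target subspace
M = S1.T @ S2                          # alignment matrix M* = S1' S2
Sa = Xs @ (S1 @ M)                     # source data in the aligned space
Tt = Xt @ S2                           # target data in the target subspace

# Nearest class centroid learned on the aligned source, applied to the target:
c0, c1 = Sa[ys == 0].mean(axis=0), Sa[ys == 1].mean(axis=0)
pred = np.linalg.norm(Tt - c1, axis=1) < np.linalg.norm(Tt - c0, axis=1)
print("target accuracy:", np.mean(pred == yt))
```

Without the alignment step (i.e. projecting source on S1 and target on S2 independently), the arbitrary orientation of the two PCA bases would make the two projected clouds incomparable; M* fixes exactly that.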
Feature-based methods
•  … are very popular
•  Hot topic right now
•  One central question:
  –  Define a similarity map

References
References on Domain Adaptation and Transfer

•  List of transfer learning papers: http://www1.i2r.a-star.edu.sg/~jspan/conferenceTL.html
•  List of available softwares: http://www.cse.ust.hk/TL/index.html
•  Surveys
  –  Patel, Gopalan, Chellappa. Visual Domain Adaptation: An Overview of Recent Advances. Tech report, 2014.
  –  Margolis. A Literature Review of Domain Adaptation with Unlabeled Data. Tech report, 2011.
  –  Qi Li. Literature Survey: Domain Adaptation Algorithms for Natural Language Processing. Tech report, 2012.
  –  Pan and Yang. A Survey on Transfer Learning. TKDE, 2010.
  –  J. Quionero-Candela, M. Sugiyama, A. Schwaighofer and N.D. Lawrence (Eds). Dataset Shift in Machine Learning. MIT Press, 2009.
•  Additional references
  –  S. Ben-David. Towards theoretical understanding of domain adaptation learning. Workshop LNIID at ECML-09.
  –  A. Habrard. An Introduction to Transfer Learning and Domain Adaptation. École d'été EPAT-2014.
  –  J. Blitzer and H. Daumé III. Domain Adaptation. Tutorial, ICML 2010.
  –  S. Pan, Q. Yang and W. Fan. Tutorial: Transfer Learning with Applications. IJCAI'13.
  –  K. Grauman. Adaptation for objects and attributes. Workshop VisDA at ICCV'13.
  –  A. Habrard, J.-P. Peyrache and M. Sebban. Iterative self-labeling Domain Adaptation for Linear Structured Image Classification. IJAIT-2013.
  –  F. Sha and B. Kingsbury. Domain Adaptation in Machine Learning and Speech Recognition. Tutorial, Interspeech 2012.
  –  D. Xu, K. Saenko and I. Tang. Tutorial on Domain Transfer Learning for Vision Applications. CVPR'12.
  –  A. Habrard, J.-P. Peyrache and M. Sebban. Boosting for unsupervised domain adaptation. ECML-2013.
Domain adaptation & analogy
•  Learn at the same time:
  –  A good representation
     •  of the source domain
     •  of the target domain
  –  A good transformation rule

An approach to analogy: using Kolmogorov complexity [Cornuéjols, 1996, 1997, 1998, 2016]
(Figure: dictionaries C_S and C_T link source and target descriptions x, x′, via a labeling function f_t and a transformation M_t; source descriptors: letter, succ-letter, …; target descriptors: group (construction rule), succ-group, …; example strings: abc, abd, aababc.)
(is the rule R_S to transform all the c's into d's?), as well as the transfer from the source to the target (how should iijjkk be perceived?, and which rule R_C is adequate?).

4.2  Domain theory and description lengths

The domain theory, which makes it possible to describe the various aspects of the objects of the world, includes representation primitives as well as basic structures. Table 1 below gives the list of those we defined for this work.
&'"5BB-)578"()"5'51)3Pe".2*'3"Y)1H)3)-)S"7)HB1+O*(P"
•
impératif
respecter
particulier
impératifde
respecterles
lescontraintes
contraintesdu
ducalcul
calculdes
desprobabilités,
probabilités,c'est-à-dire
c'est-à-direen
particulierque
que
impératif
dederespecter
les
contraintes
du
calcul
des
probabilités,
c'est-à-dire
enenparticulier
que
Descripteurs utilisés dans la définition des structures :
- orientation (-> / <-)
- cardinalité ou nombre d'éléments : n
1 bit
log2(n) + 1 bits
- type d'éléments
- longueur : l
•
•
•
•
xt+1
xt
ft
yt
Mt
Mt+1
•
•
ft+1
yt+1
•
•
&'"5BB-)578"()"5'51)3Pe".2*'3"Y)1H)3)-)S"7)HB1+O*(P"
la
V,)-'./0)12K"#mmkK"#mmlK"#mm?K"%$#k\"
sommedes
desprobabilités
probabilitésd'événements
d'événementsexhaustifs
exhaustifset
mutuellementexclusifs
exclusifségale
égale1.
lalasomme
somme
des
probabilités
d'événements
exhaustifs
etetmutuellement
mutuellement
exclusifs
égale
1.1.
Ainsi
Ainsil'objet
l'objet''abc
'abc'' pourrait
' pourraitêtre
êtrereprésenté
représentépar
par:: :
Ainsi
l'objet
abc
pourrait
être
représenté
par
(voir en-dessous)
log2(l) + 1 bits
- commençant ou se terminant par l'élément = x
L(x) bits
Letter: (1/2) -> 1 bit
A particular letter (e.g. 'd'): (1/(2·26)) -> 6 bits
String (orientation, elements): (1/8) -> 3 bits
  L = 3 + L(orientation) + Σ L(elements)
  e.g. L('abd' with orientation ->) = 3 + 1 + 18 + 3 = 25 bits (three letters at 6 bits each, plus length 3)
Set (type of elements, cardinality, elements): (1/8) -> 3 bits
  L = 3 + L(type) + L(cardinality) + Σ L(elements)
Group (type of elements, number of elements, elements): (1/8) -> 3 bits
  L = 3 + L(type) + L(nb of el.) + Σ L(elements)
Sequence (orientation, type of elements, succession law or number of elements, length, starting or ending with): (1/8) -> 3 bits
  L = 3 + L(orient.) + L(type) + L(law) or L(nb of el.) + L(length) + L(start/end)
Description and length of a succession law:
succ(type-of-el., n, x) → the nth successor of the element x of type type-of-el.
  L = L(type) + L(n (see below)) + L(x)
  L(n) = L(1/6)            if n = 1 or -1   (first successor or predecessor)
       = L(1/3)            if n = 0          (same element)
       = L((1/3)·(1/2)^p)  otherwise         (with p = n if n ≥ 0, p = -n otherwise)
First / Last (relative to the defined orientation): 1 bit
nth: n bits
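As a reading aid (not from the paper), the coding scheme above can be sketched in a few lines of Python. The function names and the rounding of ideal code lengths up to whole bits are our assumptions; the a priori probabilities are those of Table 1:

```python
import math

def bits(p):
    # ideal code length L = -log2(P), rounded up to whole bits (our convention)
    return math.ceil(-math.log2(p))

# Primitives from Table 1 (probabilities are the paper's a priori estimates)
L_ORIENT = bits(1 / 2)          # orientation -> or <-          : 1 bit
L_LETTER = bits(1 / (2 * 26))   # a particular letter, e.g. 'd' : 6 bits
L_STRUCT = bits(1 / 8)          # String/Set/Group/Sequence tag : 3 bits

def L_length(l):
    # length l : log2(l) + 1 bits
    return math.ceil(math.log2(l)) + 1

def L_string(s):
    # String(orientation, elements), each element a particular letter
    return L_STRUCT + L_ORIENT + len(s) * L_LETTER + L_length(len(s))

print(L_string("abd"))  # 3 + 1 + 18 + 3 = 25 bits, as in the worked example
```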
'abc' → String (1/8)
  orientation: -> (1/2)
  3 elements: 1st = 'A', 2nd = 'B', 3rd = 'C' (1/(4·26))
  TOTAL (length): 21 bits
or else by:
'abc' → Set {'A', 'B', 'C'}
'abc' → Sequence (1/8)
  orientation: -> (1/2)
  type of elements = letters (1/2)
  succession law: l(1/12) = 4 bits
  length = 3: 3 bits
  starting with the element (letter = 'A'): (1/26)
  TOTAL: 17 bits
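A quick arithmetic check of this sequence description (a hedged sketch: the 1-bit cost for the element-type tag is our reading of the figure; the other costs are those shown above):

```python
import math

def b(p):
    # ideal code length in whole bits
    return math.ceil(-math.log2(p))

# 'abc' as a Sequence: structure tag + orientation + element type
# + succession law (l(1/12) = 4 bits) + length (3 bits) + first element 'A'
total = b(1/8) + b(1/2) + b(1/2) + 4 + 3 + b(1/26)
print(total)  # 17 bits
```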
Problem 1: abc => abd ; iijjkk => ?

Solution 1: "Replace the rightmost group by its successor"    iijjkk => iijjll
Solution 2: "Replace the rightmost letter by its successor"   iijjkk => iijjkl
Solution 3: "Replace the rightmost letter by d"               iijjkk => iijjkd
Solution 4: "Replace the 3rd letter by its successor"         iijjkk => iikjkk
Solution 5: "Replace the c's by d's"                          iijjkk => iijjkk
Solution 6: "Replace the rightmost group by the letter d"     iijjkk => iijjd
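The six candidate rules can be written out as plain string transformations. This is an illustrative sketch (the function names s1..s6 are ours, and the docstring-style comments paraphrase the paper's quoted rules), not the paper's own representation:

```python
def rightmost_group_start(s):
    # index where the final run of identical letters begins
    i = len(s) - 1
    while i > 0 and s[i - 1] == s[i]:
        i -= 1
    return i

def s1(s):  # replace the rightmost group by its successor
    i = rightmost_group_start(s)
    return s[:i] + chr(ord(s[i]) + 1) * (len(s) - i)

def s2(s):  # replace the rightmost letter by its successor
    return s[:-1] + chr(ord(s[-1]) + 1)

def s3(s):  # replace the rightmost letter by d
    return s[:-1] + "d"

def s4(s):  # replace the 3rd letter by its successor
    return s[:2] + chr(ord(s[2]) + 1) + s[3:]

def s5(s):  # replace the c's by d's (iijjkk has none, so it is unchanged)
    return s.replace("c", "d")

def s6(s):  # replace the rightmost group by the letter d
    return s[:rightmost_group_start(s)] + "d"

for f in (s1, s2, s3, s4, s5, s6):
    print(f("iijjkk"))
```

Applied to the source pair, s1 and s2 both reproduce abc => abd; they only come apart on the target iijjkk, which is what makes the problem interesting.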
Table 2 (problem 1). [Component columns L(MS), L(SS|MS), L(MC|MS), L(SC|MC) and the corresponding rule terms: individual cell values not recoverable from the extraction.] The summary columns read:

Solution   Total-1 (bits)   Total-2 (bits)   Rank   Cost (bits)   Rank
P1;S1            41               35           1         19         5
P1;S2            71               67           3         13         1
P1;S3            71               68           4         14         2
P1;S4            79               72           5         18         3
P1;S5            93               85           6         20         4
P1;S6            65               62           2         31         6
4.3  Experiments
The experiments, carried out by hand, consisted in taking a series of tests with the different solutions presented in [Mitchell, 93], as well as others, and in computing, for each problem and each proposed solution, the algorithmic-complexity values of formulas (1) and (2) of section 3. The limited space allows us neither to provide the exhaustive list of the trials carried out, nor to give the details of the computations (see [Cornuéjols, 96, in preparation], [Khedoucci, 94]).
Briefly, the method is the following. For each problem (e.g. abc => abd ; iijjkk => ?) and for each proposed solution (e.g. iijjkk => iijjll), the perception, and hence the associated description, are conjectured. Thus, for example, the model MS below corresponds to the perception of the object 'abc' as a sequence with a specific succession law. For each of the descriptions defined in this way, the associated description lengths, following formulas (1) and (2), are computed. One can then compare the value of each solution according to the measures defined in section 3.
In this example, the last representation is the most economical, even though it describes the structure of 'abc' more completely than, for example, the second description, which retains only the perception of a set of the three letters 'a', 'b' and 'c'.
successeur(élt(lettre=x)) = élt(succ(lettre,1,x))   (1/2)
  L = L(lettre) + L(1er succ) + L(x) = L(1/2 · 1/6 · 1) = 4 bits
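Numerically (a sketch, not the paper's code; L here is the exact, unrounded code length, rounded up only at the end):

```python
import math

def L(p):
    # exact ideal code length in bits
    return -math.log2(p)

def L_n(n):
    # cost of n in a succession law succ(type, n, x), per the scheme above
    if n in (1, -1):
        return L(1/6)                 # first successor or predecessor
    if n == 0:
        return L(1/3)                 # same element
    return L((1/3) * (1/2) ** abs(n)) # otherwise

# L(lettre) + L(1er succ) + L(x) = L(1/2 * 1/6 * 1) = L(1/12)
law = L(1/2) + L_n(1) + L(1)
print(math.ceil(law))  # 4 bits, i.e. l(1/12) rounded up
```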
In order to be able to compute the algorithmic complexities associated with the formulas defined in section 3, it is necessary to define the description length associated with each representation primitive. The choice of these lengths is arbitrary and should normally reflect the agent's prior knowledge of the domain. There is therefore an opportunity here for learning, and for testing various biases corresponding to different contexts or bodies of knowledge. Certain constraints nevertheless bear on this choice. Indeed, the description length L associated with a concept should ideally correspond to its a priori probability P through the formula L = -log2(P) (thus, for example, the description length of the String concept below is 3 bits because its a priori probability is estimated at 1/8). It is then imperative to respect the constraints of the calculus of probabilities, in particular that the sum of the probabilities of exhaustive and mutually exclusive events equals 1.
or again by:
Table 1: List of the representation primitives and their associated description lengths.
MS → Sequence (1/8)
  orientation: -> (1/2)
  type of elements = letters
  succession law: succes.(élt(lettre=x)) = élt(succ(lettre,1,x))
  Last
  TOTAL: 20 bits
Table 2: The complexities associated with formulas (1) and (2) for each solution of problem 1 are reported here. One will note that, for this problem, the two formulas lead to the same ranking, and that the best analogy, according to the economy principle defined, corresponds to solution 1, which is confirmed by experiments on human subjects asked to rank the solutions above. The "costs" sub-table is explained in section 5.
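The ranking argument can be replayed from the Total-1 column of Table 2 (a sketch for illustration; the dictionary below simply re-enters those values):

```python
# Total-1 values (bits) for the six solutions of problem 1, read off Table 2
totals = {"S1": 41, "S2": 71, "S3": 71, "S4": 79, "S5": 93, "S6": 65}

# Economy principle: the best analogy is the one with the shortest total description
ranking = sorted(totals, key=totals.get)
print(ranking)  # solution 1 comes first, matching the human-subject rankings
```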
The results obtained on these examples show, on the one hand, that the second part of the analogical hypothesis (the coincidence of the optima of formulas (1) and (2)) seems justified
Transfer and sequence effects
Analogy, transfer and sequence effects [Cornuéjols & Murena, 2016]
[Slide figures: sequences of letter-string analogy problems (abc => abd ; aababc => ? ; efg => efh ; ijk, ijjkkk), with the successive models Mt-1, Mt, Mt+1 and rules ft-1, ft, ft+1]
