Multi-task learning, transfer and domain adaptation
Transcription
Multi-task learning, transfer and domain adaptation: Introduction
Antoine Cornuéjols (AgroParisTech & INRA, MIA 518), antoine.cornuejols@agroparistech.fr

Transfer learning: a taxonomy
[Fig. 2 of "A survey on transfer learning" (Pan & Yang, TKDE, 2010): an overview of the different settings of transfer.]
Excerpt from the survey: "The intuition behind this context is that some relationship among the data in the source and target domains is similar. Thus, the knowledge to be transferred is the relationship among the data. Recently, statistical relational learning techniques dominate this context [51], [52]." The inductive transfer learning setting aims at learning f_T(·) in D_T using the knowledge in D_S and T_S, where T_S ≠ T_T; based on this definition, a few labeled data in the target domain are required.

Definition [Pan, TL-IJCAI'13 tutorial]
• Ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel domains/tasks.

An example
• We have labeled images from a Web image corpus.
• Novel task: is there a Person in unlabeled images from a Video corpus?
[Domain Adaptation tutorial, EPAT'14 (LaHC)]

Examples: transfer learning in vision
• Hard to predict what will change in the new domain.
[Xu, Saenko, Tsang, "Domain Transfer" tutorial, CVPR'12]

Natural Language Processing
• Texts are represented by "words" (e.g. Bag of Words).
• Tasks:
  – Part-of-Speech tagging: adapt a tagger learned from medical papers to a journal (Wall Street Journal), then to newsgroups.
  – Spam detection: adapt a classifier from one mailbox to another.
  – Sentiment analysis.

Domain adaptation for sentiment analysis [Pan, TL-IJCAI'13 tutorial]
(1) Compact; easy to operate; very good picture quality; looks sharp! (Electronics)
(2) A very good game! It is action packed and full of excitement. I am very much hooked on this game. (Video games)
(3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp. (Electronics)
(4) Very realistic shooting action and good plots. We played this and were hooked. (Video games)
(5) It is also quite blurry in very dark settings. I will never buy HP again. (Electronics)
(6) It is so boring. I am extremely unhappy and will probably never buy UbiSoft again. (Video games)
• Source specific: compact, sharp, blurry.
• Target specific: hooked, realistic, boring.
• Domain independent: good, excited, nice, never buy, unhappy.

Co-training
• Semi-supervised learning:
  – a small part of the training data is labeled,
  – a large part is unlabeled.
• "Disagreement-based" methods / co-training:
  – two classifiers are trained separately,
  – on different subsets of the descriptive features (two views).
• Applications:
  – natural language processing,
  – image retrieval.

Agreement between two parts: co-training [Blum & Mitchell '98]
• Examples contain two sufficient sets of features: x = ⟨x1, x2⟩.
• Belief: the parts are consistent, i.e. ∃ c1, c2 s.t. c1(x1) = c2(x2) = c(x).
• For example, if we want to classify web pages: x = ⟨x1, x2⟩.
• Idea: first, use the small labeled sample to learn initial rules.
  – E.g. "my advisor" pointing to a page is a good indicator that it is a faculty home page.
  – E.g. "I am teaching" on a page is a good indicator that it is a faculty home page.
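To see why the domain-independent words of the earlier sentiment slide transfer while the source-specific ones do not, one can restrict a classifier to that pivot vocabulary. A minimal sketch: the word lists come from the slide (with the bigram "never buy" split into unigrams); the toy reviews and all code names are mine.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Domain-independent ("pivot") words from the slide; source-specific words
# such as "compact", "sharp", "blurry" are deliberately absent.
pivots = ["good", "excited", "nice", "never", "buy", "unhappy"]

# Toy Electronics (source) reviews, labeled 1 = positive, 0 = negative.
src = ["compact and sharp, very good picture", "nice picture, I was excited",
       "quite blurry, never buy HP again", "blurry and unhappy, never buy"]
src_y = [1, 1, 0, 0]
# Toy Video-games (target) reviews, unlabeled at training time.
tgt = ["very good game, hooked and excited", "so boring, unhappy, never buy UbiSoft"]

vec = CountVectorizer(vocabulary=pivots)       # features restricted to pivots
clf = LogisticRegression().fit(vec.transform(src), src_y)
print(clf.predict(vec.transform(tgt)))         # classify target reviews via pivots only
```

Because only pivot features are used, the target-specific words ("hooked", "boring") are simply ignored and the source-trained weights still apply.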
• Idea: then use the unlabeled data to propagate the learned information.
[Classifying webpages as faculty-member home page or not, with x = ⟨x1, x2⟩, x1 = text info, x2 = link info: the webpage-classification example of Blum & Mitchell.]

Iterative co-training: co-training and self-consistency
[Using slides of Maria-Florina Balcan]
• First, use the small labeled sample to learn initial rules.
  – E.g. "my advisor" pointing to a page is a good indicator it is a faculty home page.
• Then use the unlabeled data to propagate the learned information:
  – look for unlabeled examples where one rule is confident and the other is not;
  – have it label the example for the other.
• Training two classifiers, one on each type of information; using each to help train the other.
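The iterative procedure just described (learn initial hypotheses from the small labeled sample, then let each view label its most confident unlabeled examples for the other) can be written down in a few lines. This is a simplified sketch, not Blum & Mitchell's exact algorithm; the function name, the per-round counts and the choice of Gaussian naive Bayes as base learner are mine.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, X1_u, X2_u, rounds=10, per_round=2):
    """Sketch of co-training: one classifier per view; each round, the most
    confidently predicted unlabeled examples are pseudo-labeled for both."""
    h1, h2 = GaussianNB(), GaussianNB()
    X1_l, X2_l, y_l = X1.copy(), X2.copy(), y.copy()
    for _ in range(rounds):
        if len(X1_u) == 0:
            break
        h1.fit(X1_l, y_l)
        h2.fit(X2_l, y_l)
        # confidence of each view on the unlabeled pool
        p1 = h1.predict_proba(X1_u).max(axis=1)
        p2 = h2.predict_proba(X2_u).max(axis=1)
        picked = np.unique(np.concatenate([np.argsort(-p1)[:per_round],
                                           np.argsort(-p2)[:per_round]]))
        # the more confident view provides the pseudo-label
        pseudo = np.where(p1[picked] >= p2[picked],
                          h1.predict(X1_u[picked]), h2.predict(X2_u[picked]))
        X1_l = np.vstack([X1_l, X1_u[picked]])
        X2_l = np.vstack([X2_l, X2_u[picked]])
        y_l = np.concatenate([y_l, pseudo])
        keep = np.setdiff1d(np.arange(len(X1_u)), picked)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1.fit(X1_l, y_l), h2.fit(X2_l, y_l)
```

On well-separated two-view data, a handful of labeled seeds plus the unlabeled pool is enough to recover accurate classifiers in both views.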
Co-training algorithm [Blum & Mitchell '98]
• Use the labeled data to learn two initial hypotheses h1, h2, training a learning algorithm on each of the two views.
• Repeat:
  – look through the unlabeled data to find examples where one of the h_i is confident but the other is not;
  – have the confident h_i label it for the other algorithm.

Original application: webpage classification
• 12 labeled examples, 1,000 unlabeled.

A simple example: learning intervals
• Use the labeled data to learn two initial intervals; use the unlabeled data to bootstrap.

Expansion: learning intervals
• Co-training helps when the distribution "expands" (there is probability mass in the regions where one view is confident and the other is not); a non-expanding distribution is unhelpful.

Co-training: theoretical guarantees
• What properties do we need for co-training to work well?
• We need assumptions about:
  – the underlying data distribution,
  – the learning algorithms on the two sides.
• Independence given the label [Blum & Mitchell, COLT'98]
  – (also for learning from random noise)
• Distributional expansion [Balcan, Blum & Yang, NIPS 2004]
  – (also for learning from positive data only)

Co-training: theoretical guarantees (continued)
• If the classifiers are never "confident but wrong", the expansion assumption can guarantee the success of co-training.
• [Wang & Zhou (2007)]: if the two classifiers have large diversity, co-training-style algorithms can succeed.
• All these analyses assume that each view is sufficient to learn well.

Another approach: co-regularization
• No iterative procedure (no pseudo-labels are assigned to the unlabeled data).
• Compatibility of a pair of hypotheses f1, f2, one per view [Balcan et al. (2005)].
• Directly minimizes:
  – the error rate on the labeled data,
  – and the disagreement over the unlabeled data.

Co-training / multi-view SSL: direct optimization of agreement
• Input: labeled data S_l = {(x_1, y_1), ..., (x_{m_l}, y_{m_l})} and unlabeled data S_u = {x_1, ..., x_{m_u}}.
• Solve

    min_{h1,h2}  Σ_{i=1}^{m_l} [ ℓ(h1(x_i), y_i) + ℓ(h2(x_i), y_i) ]  +  λ Σ_{i=1}^{m_u} ℓ(h1(x_i), h2(x_i))

  – The loss ℓ makes each classifier have small labeled error (e.g. square loss, ℓ(h(x_i), y_i) = (y_i − h(x_i))²).
  – The last term is a regularizer that encourages agreement over the unlabeled data.
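With linear hypotheses and the square loss, an objective of this kind (labeled loss for each view plus a disagreement penalty on unlabeled points) can be minimized by plain gradient descent. A minimal sketch; the function name, defaults and data layout are my own choices, not taken from the slides.

```python
import numpy as np

def coreg_fit(X1_l, X2_l, y, X1_u, X2_u, lam=1.0, lr=0.01, steps=2000):
    """Co-regularization sketch: two linear predictors, squared labeled loss
    plus lam * squared disagreement on the unlabeled points."""
    w1 = np.zeros(X1_l.shape[1])
    w2 = np.zeros(X2_l.shape[1])
    for _ in range(steps):
        r1 = X1_l @ w1 - y                     # labeled residual, view 1
        r2 = X2_l @ w2 - y                     # labeled residual, view 2
        dis = X1_u @ w1 - X2_u @ w2            # disagreement on unlabeled points
        g1 = X1_l.T @ r1 + lam * X1_u.T @ dis
        g2 = X2_l.T @ r2 - lam * X2_u.T @ dis
        w1 -= lr * g1 / len(y)
        w2 -= lr * g2 / len(y)
    return w1, w2
```

The disagreement term couples the two views: gradients push w1 and w2 toward predictors that agree on unlabeled data while each fits the labels.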
[P. Bartlett & D. Rosenberg, AISTATS 2007; K. Sridharan & S. Kakade, COLT 2008]
• Variant: 0/1 loss, ℓ(h(x_i), y_i) = 1_{y_i ≠ h(x_i)}.

Co-training with insufficient views
[Wei Wang & Zhi-Hua Zhou (2013). "Co-training with insufficient views". JMLR W&CP, 29:467-482]
• Previous studies assume that each view is sufficient.
• Two potential problems for co-training:
  – Label noise: one weak learner may not label according to the target concept.
• If there is no sampling bias: if the diversity with margins γ1 and γ2 between the two views is 0 (every unlabeled instance could be labeled with a large margin by one of the two views), co-training could output a near-good hypothesis. But there can be a large difference between the two optimal classifiers in the two views.
• The notion of information (sufficiency) must then be introduced: if one view can provide good information, and if the diversity of the classifiers trained on the initial labeled data with margins γ1 and γ2 is 0, then co-training could output a good approximation of the optimal classifier.
  – Sampling bias: the pseudo-labeled data might not be i.i.d.
• With sampling bias: if the diversity with margins γ1 and γ2 between the classifiers trained on the initial labeled data is large, co-training could improve the performance of weak hypotheses by exploiting unlabeled data, until the diversity between the two views becomes 0.
• Notions introduced:
  – insufficiency of views,
  – diversity of views (based on the margins of the two classifiers).

Assessment
• Co-training: a nice idea.
• It has been found useful in several applications.
• But the theoretical analysis is still incomplete:
  – links with semi-supervised learning;
  – what about a totally unsupervised context?

Multi-task learning

Assumption behind MTL
• The combined learning of multiple related tasks can outperform learning each task in isolation.
• MTL allows common information shared between the tasks to be used in the learning process, which leads to better generalization if the tasks are related.
• E.g. learning to predict the ratings of several different critics (in different countries) can lead to better performance for each separate task (predicting the restaurant ratings of one specific critic).

Possible relations between tasks
• All the functions to be learned are close to each other in some norm.
  – E.g. functions capturing preferences in user-modeling problems.
• Tasks that share a common underlying representation.
  – E.g. in machine vision, all tasks use the same set of features learnt in the first stages of the visual system (e.g. local filters similar to wavelets).
  – Users may also rate different types of things (e.g. books, movies, music) based on the same set of features or score functions.
• Learning to recognize a face and the expression (fear, disgust, anger, ...).
• Multi-modality learning: e.g. vision and proprioception.

Question
• How do we choose to model the shared information between the tasks?
• Some shared underlying constraint:
  – e.g. a low-dimensional representation shared across multiple related tasks,
  – by way of a shared hidden layer in a neural network,
  – by explicitly constraining the dimensionality of a shared representation.

Regularization approach: multi-task learning
• T binary classification tasks defined over X × Y.
• Linear hypotheses.

Multi-task feature learning
[A. Argyriou & T. Evgeniou (2006). "Multi-task feature learning". NIPS-2006]
• Suppose that each regression function f_t is linear in feature functions h_i.
• Suppose first only one task, with the features h_i fixed a priori.
• Learning task: learn the parameter vector a_t ∈ R^d from the data set.
• The feature functions are supposed to be linear, and the h_i are supposed to be orthonormal.
• We want to bound the number of non-zero components of a_t.
Multi-task feature learning (continued)
• Now suppose multiple tasks, with the features h_i to be learned.
• Learning task: learn the parameter vectors a_t ∈ R^d and the features h_i from the data set.
• We want to bound the number of non-zero components of the a_t, and to have U be a low-rank matrix.
• But this is a non-convex problem, and the norm ||A||_{2,1} is non-smooth.
  ⇒ alternating minimization of the loss w.r.t. A and U, and computation of the w_t.

Experiments (synthetic data)
• 200 tasks, with w_t drawn from a 5-dimensional Gaussian (given mean and covariance), plus 20 irrelevant dimensions.
• Each task: 5 or 10 examples.
[Figure 1: number of features learned versus the regularization parameter γ (see text for description).]
[Figure 2: test error (left) and residual of learned features (right) vs. dimensionality of the input.]

From the paper: "However, since the trace norm is nonsmooth, we have opted for the above alternating minimization strategy, which is simple to implement and has a natural interpretation. Indeed, Algorithm 1 alternately performs a supervised and an unsupervised step: in the latter step we learn common representations across the tasks, and in the former step we learn task-specific functions using these representations."
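The alternating scheme (a supervised step that learns task-specific weights for a fixed shared matrix D, then a step that recomputes D from the stacked weight vectors) can be sketched in a few lines. This is my own minimal rendition, not the paper's Algorithm 1; it uses the closed-form update D = (WWᵀ)^{1/2} / tr((WWᵀ)^{1/2}) from the convex multi-task feature learning literature, and all names and defaults are mine.

```python
import numpy as np

def mtl_feature_learning(Xs, ys, gamma=0.1, iters=20, eps=1e-6):
    """Alternating minimization sketch for shared-feature multi-task learning.
    Xs, ys: per-task design matrices and targets over the same d features."""
    d = Xs[0].shape[1]
    D = np.eye(d) / d                               # shared feature matrix
    for _ in range(iters):
        Dinv = np.linalg.inv(D + eps * np.eye(d))
        # supervised step: per-task generalized ridge regression
        W = np.column_stack([
            np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
            for X, y in zip(Xs, ys)])               # W is d x T
        # shared step: D = (W W^T)^{1/2} / tr((W W^T)^{1/2})
        U, s, _ = np.linalg.svd(W, full_matrices=False)
        sq = U @ np.diag(s) @ U.T
        D = sq / max(np.trace(sq), eps)
    return W, D
```

When the tasks truly share a few relevant features, the learned D concentrates its mass on them, which is the behavior the synthetic experiments below illustrate.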
From the NIPS-2006 paper: "We conclude this section by noting that when matrix D in problem (3.2) is additionally constrained to be diagonal, problem (3.2) reduces to problem (2.5). Formally, we have the following corollary.

Corollary 4.3. Problem (2.5) is equivalent to the problem

    min { R(W, Diag(λ)) : W ∈ R^{d×T}, λ ∈ R^d_+, Σ_{i=1}^d λ_i ≤ 1, λ_i ≠ 0 when w^i ≠ 0 }    (4.4)

and the optimal λ is given by

    λ_i = ||w^i||_2 / ||W||_{2,1},    i = 1, ..., d.    (4.5)

Using this corollary we can make a simple modification to Algorithm 1 in order to use it for variable selection. That is, we modify the computation of the matrix D (penultimate line in Algorithm 1) as D = Diag(λ), where the vector λ = (λ1, ..., λd) is computed using equation (4.5).

5 Experiments. In this section, we present experiments on a synthetic and a real data set. In all of our experiments, we used the square loss function and automatically tuned the regularization parameter γ with leave-one-out cross validation.

Synthetic experiments. We created synthetic data by generating T = 200 parameters wt from a 5-dimensional Gaussian distribution with zero mean and covariance equal to Diag(1, 0.25, 0.1, 0.05, 0.01). These are the relevant dimensions we wish to learn. To these we kept adding up to 20 irrelevant dimensions which are exactly zero. The training and test sets were selected randomly from [0, 1]^25 and contained 5 and 10 examples per task respectively. The outputs yti were computed from the wt and xti as yti = ⟨wt, xti⟩ + ν, where ν is zero-mean Gaussian noise with standard deviation equal to 0.1.

We first present, in Figure 1, the number of features learned by our algorithm, as measured by rank(D). The plot on the left corresponds to a data set of 200 tasks with 25 input dimensions, and that on the right to a real data set of 180 tasks described in the next subsection. As expected, the number of features decreases with γ.

On the right, we have plotted a residual measure of how well the learned features approximate the actual ones used to generate the data. More specifically, we depict the Frobenius norm of the difference of the learned and actual D's versus the input dimensionality. We observe that adding more tasks leads to better estimates of the underlying features.

Conjoint analysis experiment. We then tested the method using a real data set about people's ratings of products from [13]. The data was taken from a survey of 180 persons who rated the likelihood of purchasing one of 20 different personal computers. Here the persons correspond to tasks and the PC models to examples. The input is represented by the following 13 binary attributes: telephone hot line (TE), amount of memory (RAM), screen size (SC), CPU speed (CPU), hard disk (HD), CD-ROM/multimedia (CD), cache (CA), Color (CO), availability (AV), warranty (WA), software (SW), guarantee (GU) and price (PR). We also added an input component accounting for the bias term. The output is an integer rating on the scale 0-10. Following [13], we used 4 examples per task as the test data and 8 examples per task as the training data.

As shown in Figure 3, the performance of our algorithm improves with the number of tasks. It also performs much better than independent ridge regressions, whose test error is equal to 16.53. In this particular problem, it is also important to investigate which features are significant to all consumers and how they weight the 13 computer attributes. We demonstrate the results in the two adjacent plots, which were obtained with the data for all 180 tasks. In the middle, the distribution of the eigenvalues of D is depicted, indicating that there is a single most important feature which is shared by all persons. The plot on the right shows the weight of each input dimension in this most important feature. This feature seems to weight the technical characteristics of a computer (RAM, CPU and CD-ROM) against its price. Therefore, in this application our algorithm is able to discern interesting patterns in people's decision process."

[Figure 3: test error vs. number of tasks (left) for the computer survey data set; distribution of the eigenvalues of D (middle); attributes weighted by the most important feature (right).]

Multi-task learning with deep neural networks
[Figure residue: the Multi-task DNN, with query classification (posterior probability computed by a sigmoid) and web search (relevance measured by cosine similarity, posterior probability computed by a softmax) on top of shared layers: bag-of-words input X (500k), letter-3-gram layer l1 (50k), shared semantic representation l2 (300), task-specific representations l3 (128).]

Fig. 1 of the ECCV-2008 paper: "Illustrating the mechanism of transfer learning. (a) Functional view: tasks represented as functional mappings share stochastic characteristics. (b) Transfer learning in neural networks: the hidden layer represents the level of sharing between all the tasks."
School data. "Preliminary experiments with the school data used in [3] achieved an explained variance of 37.1%, compared to 29.5% in that paper. These results will be reported in future work.

6 Conclusion. We have presented an algorithm which learns common sparse function representations across a pool of related tasks. To our knowledge, our approach provides the first convex optimization formulation for multi-task feature learning. Although convex optimization methods have been derived for the [...]"

[A. Ahmed, K. Yu, W. Xu, Y. Gong and E. Xing (2008). "Training hierarchical feedforward visual recognition models using transfer learning from pseudo-tasks". Proc. ECCV-2008]

From the ECCV-2008 paper: "3.3 A Bayesian Perspective. In this section we give a Bayesian perspective to our transfer learning problem formulated in Section 3.2. While (1), (2) are all what is needed to implement the proposed [...]"
convex convex optimization methods methods havehave beenbeen derived derived for the for the "Vg@"T*.K"_@"}5)K"g@3"u+K"T@"Z+'3K"Y@"Z.8"5'G"X+EX*"d5'3"M%$#>N@"I"<)8,)3)*$#='*%>)#,*/*0%?3/*0%@4A=.9#3B% ure 1: Architecture of the Multi-task Deep Neural Network (DNN) for Representation Learning: e lower layers areC))8%D)4,#A%D)$(',B3%+',%E)7#*=-%2A#33/F-#='*%#*&%G*+',7#='*%<)$,/)6#A"L@"4-)7@";&&,TK"=5P"%$#>\"" shared across all tasks, while top layers are task-specific. The input X (either a query or cument, with vocabulary size 500k) is first represented as a bag of words, then hashed into letter 3-grams ]>" !"#$%" I"&BB-+'D2253+"H.1DE(J78+2K"(-5'2C+-("+("5G5B(5D)'"G+"G)H5*'+"L""""""M&@",)-'./0)12N"" Non-linear projection W1 generates the shared semantic representation, a vector l2 (dimension 300) that rained to capture the essential characteristics of queries and documents. Finally, for each task, additional n-linear projections W2t generate task-specific representations l3 (dimension 128), followed by operations approach, the sole purpose of this section is to give more insight to the role of]k"the!"#$%" I"&BB-+'D2253+"H.1DE(J78+2K"(-5'2C+-("+("5G5B(5D)'"G+"G)H5*'+"L""""""M&@",)-'./0)12N"" pseudo tasks and to formalize the claims we made near the end of Section 3.1. In Section 3.2, we hypothesized that the pseudo tasks are realizable as a linear projection from the feature mapping layer output, Φ(x; θ), that is: =.1DE(52F"1+5-'*'3"b*(8"G++B"'+.-51"'+(b)-F2" Training CNN Using Transfer Learning from Pseudo-Tasks 75 Z)H5*'"5G5B(5D)'"5'G"(-5'2C+-"1+5-'*'3" Fig. 2. 
"Joint training using transfer-learning from pseudo-tasks."

ECCV-2008 paper, continued: "[...] 5 × 5 neighborhood; (4) C2 layer: 256 filters of size 6 × 6, connections with sparsity 0.5 between the 16 dimensions of the P1 layer and the 256 dimensions of the C2 layer; (5) P2 layer: max pooling over each 5 × 5 neighborhood; (6) output layer: full connections between 256 × 4 × 4 P2 features and outputs. Moreover, we used least square loss for pseudo tasks and hinge loss for classification tasks. Every convolution filter is a linear function followed by a sigmoid transformation (see [15] for more details).

It is interesting to contrast our approach with the layer-wise training one in [20]. In [20], each feature extraction layer is trained to model its input in a layer-wise fashion: the first layer is trained on the raw images and then used to produce the input to the second feature extraction layer. The whole resulting architecture is then used as a multilayered feature extractor over labeled data, and the resulting representation is then used to feed an SVM classifier. In contrast, in our approach we jointly train the classifier and the feature extraction layers; thus the feature-extraction-layer training is guided by the pseudo-tasks as well as by the labeled information simultaneously. Moreover, we believe that the two approaches are orthogonal, as we might first pre-train the network using the method in [20] and then use the result as a starting point for our method. We leave this exploration for future work."

Domain adaptation
• Improve a target prediction function in the target domain using knowledge from the source domain.
• The training and test sets can be from the same domain, but with different
probability distributions:
  – covariate shift,
  – concept drift.
• Or they can be from different domains: transfer.

5 Generating Pseudo Tasks (ECCV-2008 paper). "We use a set of pseudo tasks to incorporate prior knowledge into the training of recognition models. Therefore, these tasks need to be 1) automatically computable based on unlabeled images, and 2) relevant to the specific recognition task at hand; in other words, it is likely that two semantically similar images would be assigned similar outputs under a pseudo task. A simple approach to construct pseudo tasks is depicted in Fig. 4. In this figure, the pseudo-task is constructed by sampling a random 2D patch and using it as a template to form a local 2D filter that operates on every training image. The value assigned to an image under this task is taken to be the maximum over the result of this 2D convolution."
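Both architectures discussed above (the MT-DNN and the jointly trained pseudo-task CNN) share the same structural idea: a shared trunk that receives gradients from several task-specific heads. A minimal dense-network sketch of that idea; this is neither paper's model, and all names, sizes and the training loop are my own.

```python
import numpy as np

def train_shared_mtl(tasks, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Joint training of one shared hidden layer ("trunk") with a linear
    head per task; squared loss, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    d = tasks[0][0].shape[1]
    W = 0.1 * rng.standard_normal((d, hidden))           # shared layer
    heads = [0.1 * rng.standard_normal(hidden) for _ in tasks]
    for _ in range(epochs):
        for t, (X, y) in enumerate(tasks):
            H = np.tanh(X @ W)                           # shared representation
            err = H @ heads[t] - y                       # residual for task t
            g_head = H.T @ err / len(y)
            g_W = X.T @ (np.outer(err, heads[t]) * (1 - H ** 2)) / len(y)
            heads[t] -= lr * g_head
            W -= lr * g_W        # the trunk gets gradient signal from every task
    return W, heads
```

Because the trunk is updated by every task, its representation is pulled toward features that are useful across tasks, which is the level of sharing the slide deck attributes to the hidden layer.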
Correcting sampling bias — 3 main classes of algorithms

(The slide shows a map of the literature; citations below are grouped by class.)

• Reweighting / instance-based methods [Shimodaira, ’00] [Huang et al., Bickel et al., ’07] [Sugiyama et al., ’08] [Sethy et al., ’06, ’09]
– Correct a sample bias by reweighting source labeled data: source instances "close" to target instances are more important.
• Feature-based methods / find new representation spaces [Evgeniou and Pontil, ’05] [Blitzer et al., ’06] [Daumé III, ’07] [Argyriou et al., ’08] [Pan et al., ’09] [Gopalan et al., ’11] [Gong et al., ’12] [Chen et al., ’12] [Muandet et al., ’13]
– Find a common space where source and target are close
(projection, new features, etc.)
• Adjusting / iterative methods [Duan et al., ’09] [Daumé III et al., Saenko et al., ’10] [Kulis et al., Chen et al., ’11] [This work]
– Modify the model by incorporating pseudo-labeled information
Covariate shift — Reweighting methods

Non-stationary environment
• The input distribution changes
– "Virtual drift": the functional relation remains unchanged
– Non i.i.d.
• Concept change
– Non i.i.d. ~ non-stationary

Examples of covariate shift
• (Weak) extrapolation: predict output values outside the training region (figure: training samples vs. test samples)

Principle
• We know that ERM is consistent if the test distribution is the same as the training distribution p_train(x)

First analysis

R_{P_T}(h) = E_{(x^t,y^t)∼P_T} I[h(x^t) ≠ y^t]
           = E_{(x^t,y^t)∼P_T} (P_S(x^t,y^t) / P_S(x^t,y^t)) I[h(x^t) ≠ y^t]
           = E_{(x^t,y^t)∼P_S} (P_T(x^t,y^t) / P_S(x^t,y^t)) I[h(x^t) ≠ y^t]

⇒ Assume similar tasks, P_S(y|x) = P_T(y|x); then:

           = E_{(x^t,y^t)∼P_S} (D_T(x^t) P_T(y^t|x^t)) / (D_S(x^t) P_S(y^t|x^t)) I[h(x^t) ≠ y^t]
           = E_{(x^t,y^t)∼P_S} (D_T(x^t) / D_S(x^t)) I[h(x^t) ≠ y^t]

⇒ A weighted error on the source domain, with weight ω(x^t) = D_T(x^t) / D_S(x^t).

Covariate shift [Shimodaira, ’00]
Idea: reweight the labeled source data according to an estimate of ω(x^t):
E_{(x^t,y^t)∼P_S} ω(x^t) I[h(x^t) ≠ y^t]

Illustration
• No bias: D_S(x) = D_T(x) ⇒ ω(x) = 1
• With bias: D_S(x) ≠ D_T(x) ⇒ ω(x) ≠ 1
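The reweighting identity — under covariate shift, the target risk equals the ω-weighted source risk — can be checked numerically on a toy 1D problem where both Gaussian densities are known. All the numbers below (means, thresholds) are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
mu_s, mu_t = 0.0, 1.5                   # source / target means (illustrative)
xs = rng.normal(mu_s, 1.0, n)           # sample from D_S
xt = rng.normal(mu_t, 1.0, n)           # sample from D_T
label = lambda x: x > 0.0               # shared labeling rule: P_S(y|x) = P_T(y|x)
ys, yt = label(xs), label(xt)

def gauss(x, mu):                        # density of N(mu, 1)
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

w = gauss(xs, mu_t) / gauss(xs, mu_s)    # omega(x) = D_T(x) / D_S(x)

h = lambda x: x > 1.0                    # some fixed hypothesis
err_target = np.mean(h(xt) != yt)                 # R_{P_T}(h)
err_source = np.mean(h(xs) != ys)                 # unweighted source error
err_weighted = np.mean(w * (h(xs) != ys))         # omega-weighted source error
```

The weighted source error matches the target error closely, while the unweighted source error is visibly off — which is exactly why the reweighting idea works when ω is known (or well estimated).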
Illustration — difficult case
• Shared support: D_S(x) = 0 if and only if D_T(x) = 0
• No shared support: ∃x, D_S(x) = 0 and D_T(x) ≠ 0

Intuition: the quality of the adaptation depends on the magnitude of the weights.

Some existing approaches (1/2)
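One of the approaches presented next — learning the weights discriminatively with a domain classifier, in the spirit of [Bickel et al., ICML’07] — can be sketched as follows. Assumptions of this sketch: equal sample sizes, so that p(q=1|x)/p(q=0|x) approximates D_T(x)/D_S(x) (the course states the weight as 1/p(q=1|x, θ) under its own sampling convention, an equivalent form up to constants), and a plain gradient-ascent logistic regression stands in for any probabilistic classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, (1000, 1))     # source sample
xt = rng.normal(1.0, 1.0, (1000, 1))     # target sample

# Domain classifier: q = 1 marks the target, q = 0 the source.
X = np.hstack([np.vstack([xs, xt]), np.ones((2000, 1))])   # add a bias column
q = np.r_[np.zeros(1000), np.ones(1000)]

theta = np.zeros(2)
for _ in range(3000):                    # gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    theta += 0.1 * X.T @ (q - p) / len(X)

# With equal sample sizes, D_T(x)/D_S(x) ≈ p(q=1|x) / p(q=0|x).
Xs = np.hstack([xs, np.ones((1000, 1))])
ps = 1.0 / (1.0 + np.exp(-Xs @ theta))
w_hat = ps / (1.0 - ps)                  # estimated importance weights
```

For two unit-variance Gaussians the true log-ratio is linear, log ω(x) = x − 0.5, so the fitted θ should be close to (1, −0.5): the classifier estimates the density ratio without ever estimating a density.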
Principle
• Law of large numbers
– Sample averages converge to the population mean
– But how to estimate ω?

Density estimators
Build density estimators for the source and target domains and estimate the ratio between them. Ex [Sugiyama et al., NIPS’07]:
ω̂(x) = Σ_{l=1}^{b} α_l ψ_l(x)
Learning: argmin_α KL(ω̂ D_S, D_T)

Learn the weights discriminatively [Bickel et al., ICML’07]
• Assume D_T(x_i)/D_S(x_i) ∝ 1/p(q=1|x, θ)
• Label the source with label 1, the target with label 0, and train a classifier θ̂ to classify examples as 1 or 0 (e.g., with logistic regression)
• Compute the new weights ω̂(x_i^s) = 1/p(q=1|x_i^s; θ̂)

Importance weighting
• A naïve estimation of ω does not work
– Density estimation is too crude in a high-dimensional space (and with few known test instances)
• Idea of Sugiyama:
– Learn a parametric model of … and …

Covariate shift in regression — covariate shift in classification (illustrations)

Bad news
DA is hard, even under covariate shift [Ben-David et al., ALT’12]
⇒ To learn a classifier, the number of examples depends on
|H| (finite) or exponentially on the dimension of X.
The covariate shift assumption may fail: tasks are not similar in general, P_S(y|x) ≠ P_T(y|x).
We did not consider the hypothesis space. Can we define a general theory of DA?

The H∆H-divergence [Ben-David et al., NIPS’06; MLJ’10]
A first attempt at a theoretical framework for domain adaptation (Shai Ben-David and colleagues).

Definition
d_{H∆H}(D_S, D_T) = sup_{(h,h′)∈H²} | R_{D_T}(h, h′) − R_{D_S}(h, h′) |
                  = sup_{(h,h′)∈H²} | E_{x^t∼D_T} I[h(x^t) ≠ h′(x^t)] − E_{x^s∼D_S} I[h(x^s) ≠ h′(x^s)] |
(Illustration with only two hypotheses h and h′ in H.)

Computable from samples
Consider two samples S, T of size m from D_S and D_T:
d_{H∆H}(D_S, D_T) ≤ d_{H∆H}(S, T) + O(complexity(H) √(log(m)/m))
complexity(H): VC dimension [Ben-David et al., ’06, ’10], Rademacher complexity [Mansour et al., ’09]

Empirical estimation
d̂_{H∆H}(S, T) = 2 (1 − min_{h∈H} [ (1/m) Σ_{x: h(x)=1} I[x ∈ S] + (1/m) Σ_{x: h(x)=−1} I[x ∈ T] ])
⇒ Already seen: label the source examples as −1, the target ones as +1, and try to learn a classifier in H minimizing the associated empirical error.
Note: with a larger H, the distance will be high, since we can easily find two hypotheses able to distinguish the two domains.
A first bound — going to a generalization bound

Preliminaries
R_P(h, h′) = E_{(x,y)∼P} I[h(x) ≠ h′(x)] = E_{x∼D} I[h(x) ≠ h′(x)]
R_{P_T} and R_{P_S} fulfill the triangle inequality.
|R_{P_T}(h, h′) − R_{P_S}(h, h′)| ≤ ½ d_{H∆H}(D_S, D_T), since d_{H∆H}(D_S, D_T) = 2 sup_{(h,h′)∈H²} |R_{D_T}(h, h′) − R_{D_S}(h, h′)|
h_S* = argmin_{h∈H} R_{P_S}(h): best on the source; h_T* = argmin_{h∈H} R_{P_T}(h): best on the target
Ideal joint hypothesis: h* = argmin_{h∈H} [R_{P_S}(h) + R_{P_T}(h)]; λ = R_{P_S}(h*) + R_{P_T}(h*)

R_{P_T}(h) ≤ R_{P_T}(h*) + R_{P_T}(h, h*)
          ≤ R_{P_T}(h*) + R_{P_S}(h, h*) + R_{P_T}(h, h*) − R_{P_S}(h, h*)
          ≤ R_{P_T}(h*) + R_{P_S}(h, h*) + |R_{P_T}(h, h*) − R_{P_S}(h, h*)|
          ≤ R_{P_T}(h*) + R_{P_S}(h, h*) + ½ d_{H∆H}(D_S, D_T)
          ≤ R_{P_T}(h*) + R_{P_S}(h) + R_{P_S}(h*) + ½ d_{H∆H}(D_S, D_T)
          ≤ R_{P_S}(h) + ½ d_{H∆H}(D_S, D_T) + λ
          ≤ R_S(h) + ½ d_{H∆H}(S, T) + O(complexity(H) √(log(m)/m)) + λ

Main theoretical bound
R_{P_T}(h) ≤ R_{P_S}(h) + ½ d_{H∆H}(D_S, D_T) + λ

The classical theory of domain adaptation — the results of S. Ben-David et al. and Mansour et al.

Classical theorem [Ben-David et al., 2010; Mansour et al., 2009a]
Let H be a hypothesis space.
If D_S and D_T are two distributions over X, then:
∀h ∈ H,  R_{P_T}(h) ≤ R_{P_S}(h) + ½ d_H(D_S, D_T) + ν
          (target error ≤ source error + divergence + ν)

R_{P_S}(h): the classical error on the source domain — minimizable via a supervised classification method, without adaptation.
½ d_H(D_S, D_T): the H-divergence between D_S and D_T:
d_H(D_S, D_T) = 2 sup_{(h,h′)∈H²} | R_{D_T}(h, h′) − R_{D_S}(h, h′) |
              = 2 sup_{(h,h′)∈H²} | E_{x^t∼D_T} I[h(x^t) ≠ h′(x^t)] − E_{x^s∼D_S} I[h(x^s) ≠ h′(x^s)] |

ν: the divergence between the labelings
ν = inf_{h′∈H} [R_{P_S}(h′) + R_{P_T}(h′)], the optimal joint error [Ben-David et al., 2010]
or ν = R_{P_T}(h_T*) + R_{P_T}(h_T*, h_S*), where h_X* is the best hypothesis on domain X [Mansour et al., 2009a]

This formalizes a natural approach — move the two distributions closer while ensuring a low error on the source domain — and justifies many algorithms: reweighting methods, feature-based methods, adjusting/iterative methods.

(These slides: Emilie Morvant, LIF-Qarma, "Apprentissage de vote de majorité", 18 September 2013.)

A pioneering theorization [Ben-David et al., 2010]

Theorem [Ben-David et al., MLJ’10, NIPS’06]
Let H be a symmetric hypothesis space.
If D_S and D_T are respectively the marginal distributions of the source and target instances, then for all δ ∈ (0, 1], with probability at least 1 − δ:
∀h ∈ H,  R_{P_T}(h) ≤ R_S(h) + ½ d_{H∆H}(S, T) + O(complexity(H) √(log(m)/m)) + λ

Domain adaptation
Idea: construct a new projection space in which the two distributions are close, while keeping a good performance on the source domain.

Principle
• Integrate some information about the target examples iteratively
• Remove / add some instances so as to move the source distribution towards the target distribution
• Adjusting / iterative methods
• Repeat the process until convergence or until no instances remain

DASVM [Bruzzone et al., ’10]

DASVM: algorithm — a brief recap on SVM
1. Learning sample LS = {(x_i, y_i)}_{i=1}^n
2. Learn a classifier h_0 from the learning sample LS: h(x) = ⟨w, x⟩ + b
Formulation: min_{w,b,ξ} ½ ‖w‖² + C Σ_{i=1}^n ξ_i subject to y_i(⟨w, x_i⟩ + b) ≥ 1 − ξ_i, 1 ≤ i ≤ n, ξ ≥ 0
3. Repeat until a stopping criterion is met:
– Select the first p target examples s.t. 0 ≤ h(x^t) ≤ 1 with highest margin and assign them the pseudo-label +1
– Select the first p target examples s.t. −1 ≤ h(x^t) ≤ 0 with highest margin
and assign them the pseudo-label −1
– Add these 2p pseudo-labeled examples to the training set TS
– Remove from TS the first p positive and p negative source instances with highest margin
4. Output the last classifier

The algorithm stops when the number of instances selected at each step falls below a threshold.

DASVM: graphical illustration
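The mechanics of this progressive self-labeling loop — add confident pseudo-labeled target points, erode the source set, retrain — can be sketched with a toy stand-in. Caveats: a nearest-centroid rule replaces the SVM, and target points are picked by confidence rather than by DASVM's margin-band criterion, so this is self-training in the spirit of DASVM, not Bruzzone et al.'s actual algorithm:

```python
import numpy as np

def centroid_classifier(X, y):
    """Stand-in for the SVM: signed nearest-centroid score (positive => +1)."""
    c_pos, c_neg = X[y == 1].mean(0), X[y == -1].mean(0)
    return lambda Z: ((Z - c_neg) ** 2).sum(1) - ((Z - c_pos) ** 2).sum(1)

def self_labeling_da(Xs, ys, Xt, p=10, max_iter=50):
    """Progressively swap source points for pseudo-labeled target points.
    Source rows stay at the front of the working set so they are removed
    first, mirroring DASVM's gradual erosion of the source sample."""
    Xl, yl = Xs.copy(), ys.astype(float).copy()
    n_src = len(Xs)
    for _ in range(max_iter):
        f = centroid_classifier(Xl, yl)
        scores = f(Xt)
        take = np.argsort(-np.abs(scores))[:2 * p]   # most confident targets
        if take.size == 0:
            break
        Xl = np.vstack([Xl, Xt[take]])               # add pseudo-labeled targets
        yl = np.concatenate([yl, np.sign(scores[take])])
        Xt = np.delete(Xt, take, axis=0)
        drop = min(2 * p, n_src)                     # drop as many source points
        Xl, yl, n_src = Xl[drop:], yl[drop:], n_src - drop
        if len(Xt) < 2 * p:                          # stopping criterion
            break
    return centroid_classifier(Xl, yl)

rng = np.random.default_rng(2)
mk = lambda c, n: rng.normal(c, 0.3, (n, 2))
Xs = np.vstack([mk([-1.0, 0.0], 50), mk([1.0, 0.0], 50)])   # labeled source
ys = np.r_[-np.ones(50), np.ones(50)]
Xt = np.vstack([mk([-0.5, 0.5], 50), mk([1.5, 0.5], 50)])   # unlabeled, shifted target
f = self_labeling_da(Xs, ys, Xt)
Xtest = np.vstack([mk([-0.5, 0.5], 100), mk([1.5, 0.5], 100)])
ytest = np.r_[-np.ones(100), np.ones(100)]
acc = np.mean(np.sign(f(Xtest)) == ytest)
```

By the time the loop stops, the working set contains only pseudo-labeled target points, so the final classifier is tuned to the target distribution.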
DASVM
• There are theoretical studies
– based on the notion of weak learners,
– in order to determine the conditions and guarantees of DASVM
• There are applications
– e.g., in the domain of character recognition

Idea
• Change the feature representation X to better represent the shared characteristics between the two domains:
– some features are domain-specific,
– others are generalizable,
– or there exist mappings from the original space
⇒ Make the source and target domains explicitly similar
⇒ Learn a new feature space by embedding or projection

Feature / projection based approaches

Find latent spaces — Structural Correspondence Learning [Blitzer et al., ’07]
Illustration (sentiment analysis, bag of words with bigrams):
• Identify shared features
• Choose K pivot features (words frequent in both domains and highly correlated with the labels)
• Learn K classifiers to predict the pivot features from the remaining features
• For each feature, add K new features
• Represent the source and target data with these features
• Apply PCA on the source + target new features to get a low-rank latent representation
• Learn a classifier in the new projection space defined by PCA

Illustration: manifold-based methods [Gopalan et al., ’10]
Assume X ⊆ R^N
• Apply PCA on the source data ⇒ matrix S1 of rank d
• Apply PCA on the target data ⇒ matrix S2 of rank d
• Geodesic path on the Grassmann manifold G_{N,d} (the d-dimensional vector subspaces of R^N) between S1 and S2
• Use an exponential flow ψ(t′) = Q exp(t′B) J, with Q an N × N matrix with determinant 1 s.t.
Q^T S1 = J and J^T = [I_d 0_{N−d,d}]; the intermediate subspaces are obtained by computing B (a skew block-diagonal matrix) and varying t′ between 0 and 1
• Take a collection S′ of l subspaces between S1 and S2 on the manifold
• Project the data on S′ and learn in that new space

A simpler approach — subspace alignment [Fernando et al., ICCV’13]

Subspace alignment algorithm
Algorithm 1: Subspace alignment DA algorithm
Data: source data S, target data T, source labels Y_S, subspace dimension d
Result: predicted target labels Y_T
S1 ← PCA(S, d) (source subspace defined by the first d eigenvectors);
S2 ← PCA(T, d) (target subspace defined by the first d eigenvectors);
X_a ← S1 S1ᵀ S2 (operator aligning the source subspace with the target one);
S_a = S X_a (new source data in the aligned space);
T_T = T S2 (new target data in the aligned space);
Y_T ← Classifier(S_a, T_T, Y_S);

M* = S1ᵀ S2 corresponds to the "subspace alignment matrix": M* = argmin_M ‖S1 M − S2‖
X_a = S1 S1ᵀ S2 = S1 M* projects the source data onto the target subspace
• Moves PCA-based representations closer
• Totally unsupervised
A natural similarity: Sim(x_s, x_t) = x_s S1 M* S2ᵀ x_tᵀ = x_s A x_tᵀ
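Algorithm 1 is short enough to transcribe directly in NumPy. This is a minimal, unoptimized sketch (PCA via SVD; the toy data, embeddings, and tilt angle are illustrative choices):

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions (as columns), via SVD of the centered data."""
    U, s, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return Vt[:d].T                                 # N x d

def subspace_align(S, T, d):
    S1, S2 = pca_basis(S, d), pca_basis(T, d)
    M = S1.T @ S2               # closed-form argmin_M ||S1 M - S2||_F
    return S @ (S1 @ M), T @ S2                     # aligned source / target coords

# Toy check: the same 2-class latent data, embedded in two 2D subspaces
# of R^5 that are tilted 30 degrees apart.
rng = np.random.default_rng(3)
z = lambda c, n: rng.normal(c, 0.5, (n, 2))
A = np.zeros((2, 5)); A[0, 0] = A[1, 1] = 1.0       # source embedding
th = np.deg2rad(30)
B = np.zeros((2, 5))                                 # target embedding, tilted
B[0, 0] = B[1, 1] = np.cos(th); B[0, 2] = B[1, 3] = np.sin(th)

S = np.vstack([z([0, 0], 200), z([4, 0], 200)]) @ A
T = np.vstack([z([0, 0], 200), z([4, 0], 200)]) @ B
ys = np.r_[np.zeros(200), np.ones(200)]
yt = np.r_[np.zeros(200), np.ones(200)]

Sa, Tt = subspace_align(S, T, d=2)
c0, c1 = Sa[ys == 0].mean(0), Sa[ys == 1].mean(0)    # centroids in aligned space
pred = np.linalg.norm(Tt - c1, axis=1) < np.linalg.norm(Tt - c0, axis=1)
acc = np.mean(pred == (yt == 1))
```

The aligned coordinates put source and target in the same frame (up to the cosines of the principal angles between the subspaces), so even a nearest-centroid classifier trained on the aligned source transfers to the target.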
Feature-based methods
• … are very popular
• A hot topic right now
• One central question: how to define a similarity map?

References on Domain Adaptation and Transfer

List of transfer learning papers:
http://www1.i2r.a-star.edu.sg/~jspan/conferenceTL.html
List of available software:
http://www.cse.ust.hk/TL/index.html

Surveys
– Patel, Gopalan, Chellappa. Visual Domain Adaptation: An Overview of Recent Advances. Tech report, 2014.
– Margolis. A Literature Review of Domain Adaptation with Unlabeled Data. Tech report, 2011.

Additional references
– Qi Li. Literature Survey: Domain Adaptation Algorithms for Natural Language Processing. Tech report, 2012.
– Pan and Yang. A Survey on Transfer Learning. TKDE, 2010.
– J. Quionero-Candela, M. Sugiyama, A. Schwaighofer and N.D. Lawrence (Eds). Dataset Shift in Machine Learning. MIT Press, 2009.

Tutorials and workshops
– S. Ben-David. Towards theoretical understanding of domain adaptation learning. Workshop LNIID at ECML’09.
– A. Habrard. An Introduction to Transfer Learning and Domain Adaptation. École d’été EPAT-2014.
– J. Blitzer and H. Daumé III. Domain Adaptation. Tutorial, ICML 2010.
– S. Pan, Q. Yang and W. Fan. Tutorial: Transfer Learning with Applications. IJCAI’13.
– K. Grauman. Adaptation for objects and attributes. Workshop VisDA at ICCV’13.
– A. Habrard, J-P. Peyrache and M. Sebban. Iterative self-labeling Domain Adaptation for Linear Structured Image Classification. IJAIT-2013.
– F. Sha and B. Kingsbury. Domain Adaptation in Machine Learning and Speech Recognition. Tutorial, Interspeech 2012.
– D. Xu, K. Saenko and I. Tsang. Tutorial on Domain Transfer Learning for Vision Applications. CVPR’12.
– A. Habrard, J-P. Peyrache and M. Sebban. Boosting for unsupervised domain adaptation. ECML-2013.

Domain adaptation and analogy
• Learn at the same time:
An approach to analogy: using Kolmogorov complexity [Cornuéjols, 1996, 1997, 1998, 2016]

Learn at the same time:
• a good representation (dictionaries)
– of the source domain (C_S)
– of the target domain (C_T)
• a good transformation rule (source → target)

Example dictionaries for letter strings such as 'aababc', 'abc', 'abd':
– Source: letter, successor-of-letter, …
– Target: group (construction rule), successor-of-group, …

(Figure: source/target diagram with x, x′, h_x, x_t, x_{t+1}, f_t, f_{t+1}, M_t, M_{t+1}, y_t, y_{t+1}.)

… e.g., is the rule !_S to transform all the c's into d's? — as well as the transfer from the source to the target (how should 'iijjkk' be perceived, and what is the appropriate rule !_C?).

4.2 Domain theory and description lengths

The domain theory that describes the different aspects of the objects of the world includes representation primitives as well as basic structures. Table 1 below lists those defined for this work.

It is imperative to respect the constraints of probability calculus, in particular that the sum of the probabilities of exhaustive and mutually exclusive events equals 1.

Descriptors used in the definition of the structures:
– orientation (→ / ←): 1 bit
– cardinality or number of elements n: log2(n) + 1 bits
– type of elements
– length l: log2(l) + 1 bits
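The two-part MDL comparison used here — the same string costs fewer bits under a more structured description — can be sketched with a small cost model. The bit costs below loosely follow the course's table (structure tag 3 bits for probability 1/8, orientation 1 bit, explicit letter log2(2·26) bits, succession law l(1/12) bits, length n at log2(n)+1 bits, starting letter log2(26) bits); the slides' own rounded accounting differs slightly, so this is an illustrative model, not Cornuéjols' original:

```python
import math

def chain_bits(s):
    """Chain = structure tag + orientation + one explicit letter per position."""
    return 3 + 1 + len(s) * math.log2(2 * 26)

def sequence_bits(s):
    """Sequence = tag + orientation + succession law + length + first letter."""
    return 3 + 1 + math.log2(12) + (math.log2(len(s)) + 1) + math.log2(26)

L_chain, L_seq = chain_bits("abc"), sequence_bits("abc")
```

Even with this simplified accounting, the regular string 'abc' is cheaper to encode as a Sequence than as a Chain — which is exactly what lets the MDL criterion prefer structured perceptions of a string.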
– starting or ending with the element x: L(x) bits

Structures:
– Letter (1/2) → 1 bit; a particular letter (e.g., 'd') (1/2.26) → 6 bits
– Chain (orientation, elements) (1/8) → 3 bits; L = 3 + L(orientation) + Σ L(elements)
  e.g., L('a3bd' with orientation →) = 3 + 1 + log2((1/2.26)³) + L(3) = 3 + 1 + 18 + 3 = 25 bits
– Set (type of elements, cardinality, elements) (1/8) → 3 bits; L = 3 + L(type) + L(cardinality) + Σ L(elements)
– Group (type of elements, number of elements, elements) (1/8) → 3 bits; L = 3 + L(type) + L(nb el.) + Σ L(elements)
– Sequence (orientation, type of elements, law of succession or number of elements, length, starting or ending element) (1/8) → 3 bits; L = 3 + L(orient.) + L(type) + L(law) or L(nb el.) + L(length) + L(start/end)

Description and length of a law of succession:
succ(type-of-el., n, x) — the n-th successor of the element x of type type-of-el.
L = L(type) + L(n) + L(x), with
L(n) = L(1/6) if n = 1 or −1 (first successor or predecessor); L(1/3) if n = 0 (same element); L((1/3)·(1/2)^p) otherwise (with p = n if n ≥ 0, p = −n otherwise)
First / last (with respect to the defined orientation): 1 bit; n-th: n bits

Thus the object 'abc' could be represented as:

'abc' → Chain (1/8)
  orientation: → (1/2)
  3 letters: 1st = 'A', 2nd = 'B', 3rd = 'C' (1/4.26)
  TOTAL (length): 21 bits

or as:

'abc' → Set
  {'A', 'B', 'C'}

or as a Sequence:
    orientation: -> (1/2)
    type of elements = letters (1/2)
    succession law: successor(elt(lettre = x)) = elt(succ(lettre, 1, x))
        L = L(lettre) + L(1st succ) + L(x) = L(1/2 . 1/6 . 1) = L(1/12) = 4 bits
    length = 3: 3 bits
    beginning with the element (lettre = 'A') (1/26)
    TOTAL: 17 bits

In this example, the last representation is the most economical, even though it describes the structure of 'abc' more completely than, for instance, the second description, which retains only the perception of a set of the three letters 'a', 'b' and 'c'.

4.3 Experiments

The experiments, carried out by hand, consisted in taking a series of tests with the various solutions given in [Mitchell, 93], as well as others, and in computing, for each problem and each proposed solution, the algorithmic-complexity values of formulas (1) and (2) of Section 3. The limited space allows us neither to give the exhaustive list of the trials performed nor the detail of the computations (see [Cornuéjols, 96, in preparation], [Khedoucci, 94]).

Briefly, the method is as follows. For each problem (e.g. abc => abd ; iijjkk => ?) and for each proposed solution (e.g. iijjkk => iijjll), the perception, and hence the description, associated with them are conjectured. Thus, for example, the model M_S below corresponds to perceiving the object 'abc' as a sequence with a specific succession law. For each of the descriptions thus defined, the associated description lengths, following formulas (1) and (2), are computed. One can then compare the value of each solution according to the measures defined in Section 3.

Problem 1: abc => abd ; iijjkk => ?

Solution 1: "Replace the rightmost group by its successor"   iijjkk => iijjll
Solution 2: "Replace the rightmost letter by its successor"  iijjkk => iijjkl
Solution 3: "Replace the rightmost letter by d"              iijjkk => iijjkd
Solution 4: "Replace the 3rd letter by its successor"        iijjkk => iikjkk
Solution 5: "Replace the c's by d's"                         iijjkk => iijjkk
Solution 6: "Replace the rightmost group by the letter d"    iijjkk => iijjd

In order to compute the algorithmic complexities associated with the formulas defined in Section 3, it is necessary to define the description length associated with each representation primitive. The choice of these lengths is arbitrary and should normally reflect the agent's a priori knowledge of the domain.
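The six candidate transformations of Problem 1 can be sketched as plain string rewrites; a minimal sketch (helper names are ours), reproducing the outputs listed above:

```python
from itertools import groupby

def succ(c):
    # alphabetic successor of a single lowercase letter
    return chr(ord(c) + 1)

def runs(s):
    # maximal groups of identical letters: "iijjkk" -> ["ii", "jj", "kk"]
    return ["".join(g) for _, g in groupby(s)]

def s1(s):  # Solution 1: replace the rightmost group by its successor
    *head, last = runs(s)
    return "".join(head) + succ(last[0]) * len(last)

def s2(s):  # Solution 2: replace the rightmost letter by its successor
    return s[:-1] + succ(s[-1])

def s3(s):  # Solution 3: replace the rightmost letter by d
    return s[:-1] + "d"

def s4(s):  # Solution 4: replace the 3rd letter by its successor
    return s[:2] + succ(s[2]) + s[3:]

def s5(s):  # Solution 5: replace the c's by d's
    return s.replace("c", "d")

def s6(s):  # Solution 6: replace the rightmost group by the letter d
    *head, _ = runs(s)
    return "".join(head) + "d"

print([f("iijjkk") for f in (s1, s2, s3, s4, s5, s6)])
# -> ['iijjll', 'iijjkl', 'iijjkd', 'iikjkk', 'iijjkk', 'iijjd']
```

Note how Solution 5 leaves iijjkk unchanged: the rule learned on the source string (replace the c's) simply does not fire on the target string, which is precisely why its perception-plus-rule description is penalized by the economy principle.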
There is thus an opportunity here for learning, and for testing, various biases corresponding to different contexts or bodies of knowledge. Certain constraints nevertheless bear on this choice. Indeed, the description length L associated with a concept should ideally correspond to its a priori probability P through the formula L = -log2(P) (thus, for example, the description length of the Chaine concept above is 3 bits, since its a priori probability is estimated at 1/8). It is then imperative to respect the constraints of the calculus of probabilities, in particular that the probabilities of exhaustive and mutually exclusive events sum to 1.

Table 1: List of the representation primitives and their associated description lengths.

Or again by:

M_S -> Sequence
    orientation: ->
    type of elements = letters
    succession law: successor(elt(lettre = x)) = elt(succ(lettre, 1, x))
    Dernier (last element)
    TOTAL: 20 bits

                 P1;S1  P1;S2  P1;S3  P1;S4  P1;S5  P1;S6
Total-1 (bits)     41     71     71     79     93     65
Total-2 (bits)     35     67     68     72     85     62
Rank                1      3      4      5      6      2
Cost (bits)        19     13     14     18     20     31
Rank                5      1      2      3      4      6

Table 2: The complexities associated with formulas (1) and (2) for each solution of problem 1 are reported here; the component lengths L(M_S), L(S_S|M_S), L(pi_S|M_S), L(M_C|M_S), L(S_C|M_C) and L(pi_C|M_C) sum to the reported totals. Note that, for this problem, the two formulas lead to the same ranking, and that the best analogy, according to the economy principle defined here, corresponds to solution 1, which is confirmed by experiments with human subjects asked to rank the solutions above. The "costs" sub-table is explained in Section 5.

The results obtained on these examples show, on the one hand, that the second part of the analogical hypothesis (the coincidence of the optima of formulas (1) and (2)) seems justified

[Slides: "Transfer and sequence effects"; "Analogy, transfer and sequence effects" [Cornuéjols & Murena, 2016], illustrated with the letter-string problems abc => abd, aababc => ?; abc => abd, ijk => ?; efg => efh, abc => ?]
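The economy principle then reduces to an argmin over total description lengths. A minimal sketch, using the Total-1 values of Table 2 as recovered here (the numbers are transcriptions from the table, not recomputed from the primitives):

```python
# Total-1 description lengths (bits) for problem 1, solutions S1..S6,
# transcribed from Table 2
total_1 = {"S1": 41, "S2": 71, "S3": 71, "S4": 79, "S5": 93, "S6": 65}

# rank solutions by increasing total description length
ranking = sorted(total_1, key=total_1.get)
print(ranking[0])
# -> S1: the economy principle prefers solution 1, as the text reports
```

This matches the human-subject experiments mentioned above, where solution 1 ("replace the rightmost group by its successor") is also judged the best analogy.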