A help for end-users: Software for Analysis of Survey

Gerard HATABIAN
Christine JOUAN
Electricite de France - Direction des Etudes et Recherches
1, avenue du General de Gaulle - 92141 Clamart - France

I. INTRODUCTION

Analysing a survey calls for many techniques: reading the data, coding them, sometimes excluding aberrant individuals, and so on, then computing the so-called classical statistics and finally applying the techniques of multivariate data analysis. Specific programs and software packages, written in various languages, exist to make this processing easier. However, they prove impossible to use without a good knowledge of a computer language.

Yet many people interested in such surveys understand statistics well enough to choose the right methods, but do not know any software and so cannot carry out the survey analysis themselves. The application we have written tries to improve this situation. It takes the form of a logical series of menu screens, program screens and help screens, as described in the following sections.

II. BASIC PRINCIPLES

1. A software for statisticians unused to programming languages

Our application is aimed at end-users with no computer knowledge and, in particular, no knowledge of SAS. The end-user never needs to program; he only has to choose explicitly the methods, options and variables to study. These choices are offered on screens which appear in a logical and recursive way.

The most delicate point is building the initial data set, because the data are generally on an external medium. This is the only step that requires some knowledge of reading conventions; a help screen provides these few conventions.

2. Specificities of the three kinds of screens

Each kind of screen has a specific use.

the menu screens
They allow the end-user to choose a treatment or a class of treatments. The end-user is then taken to a program screen or to another menu screen.
the program screens
Once he has chosen a treatment, the end-user must supply the parameters needed for its execution. These screens try to be precise enough for the end-user to fill in the fields without any ambiguity. After the execution, the initial menu, or sometimes another program screen, comes back.

the help screens
They give reminders about the statistical techniques. The end-user is given a real lecture on the chosen methods, with reminders about the meaning of the parameters and the interpretation of the results.

3. The referring data set

The whole application is written first to build a referring data set (if it does not exist), then to analyse the contents of this referring data set.

III. DESCRIPTION OF EXISTING SCREENS

We are going to describe the chaining and the aims of the most important screens.

1. The basic menu

    SURVEY ANALYSIS

      1  Building the data set
      2  Descriptive statistics
      3  Principal component analysis
      4  Multiple correspondence analysis
      5  Discriminant analysis
      6  Clustering

This menu gives the different steps of survey analysis. After building the referring data set (option 1), we must analyse its contents. These analyses can be simple statistics (option 2) or multidimensional analyses (options 3 to 6). We have deliberately centred the application on the possibilities offered by multidimensional analysis.

2. Option 1 of the basic menu: building the data set

    BUILDING THE DATA SET
    Select Option ===>                     Press FORWARD for more.

      1  Reading the initial data
      2  Writing missing values
      3  Recoding
      4  Excluding individuals
      5  Excluding variables
      6  Computing new variables
      7  Labels and formats
      8  Controls

    A help screen gives the few conventions of the SAS language (data
    set, types of variables, formats) that must be known to use this
    application. It is available by pressing the p1 (or p13) key.

The data are usually on an external medium and have to be read to build an initial referring data set (option 1). Then we can check the contents of this data set (options 2 and 8) or modify them (options 3 to 6). Finally, we can associate labels and formats with each variable (option 7). As mentioned above, this step requires a few conventions, so a help screen can be called from this menu screen.

i) Reading initial data: option 1

This option leads to a first screen which asks for a choice between ...

    READING THE INITIAL DATA
    Command ===>

    This step reads external data and builds a data set that conforms
    to the system. The data may be in a TSO file or be entered at the
    screen.

    Indicate your case ===>
      1  Your data are in a TSO file
      2  You must enter your data at the screen

    What name do you want to give to the data set that will be
    created: ===>

    Press the p3 (or p15) key to continue.

... reading from a TSO file or ...

    READING THE DATA FROM A TSO FILE
    Command ===>

    What is the name of the TSO file containing the data
    ===> 'H6BHH4S.ZZDGNNEE.DATA'

    Indicate the reading format of these data
    ===> V1 V2 V3 $ V4

... reading at the screen

    READING THE DATA AT THE SCREEN

    Give the names of the variables, separated by a blank (followed by
    the $ symbol for character variables).
    ===> VAR1 1 VAR2 3-4 VAR3

    Enter your data. You must put in sequence, separated by a blank,
    the values of the variables of the first individual, then of the
    second. Put a '.' for missing data.
    ===> 1 22 7 3 33 6 9 42 8

ii) Modifications of the data (options 3 to 6)

These are classical modifications of the initial data. At each step, we can either irreversibly modify the initial data set or build a new one containing the modifications. For instance, we can exclude variables (option 5):

    EXCLUDING VARIABLES
    Command ===>

    This menu lets you permanently remove certain variables from your
    data set or create, from the initial data set, a new data set that
    does not contain certain variables.

    What is the name of your data set ===> TABLEAU1

    Indicate the chosen option: ===> 2
      1  You keep the same data set
      2  Another data set must be created

    In the second case, give the name of the new data set  tableau2

    Give the names of the variables you want to remove
    v1 v2

Or exclude individuals (option 4):

    This menu lets you permanently remove certain individuals from
    your data set or create, from the initial data set, a new data set
    that does not contain certain individuals.

    What is the name of your data set ===> TABLEAU1

    Indicate the chosen option: ===> 2
      1  You keep the same data set
      2  Another data set must be created

    In the second case, give the name of the new data set  tableau2

    Turn the page

With a filter on a variable:

    SECOND CASE
    Command ===>

    Give the name of the variable: V1

    Indicate the values concerned for this variable (put the
    categories of character variables in quotation marks).

    Individuals whose value of the variable lies between the 'start
    value' and the 'end value' given will be removed. Several pairs of
    values may be given. If the 'end value' is left blank, only the
    individuals having the 'start value' are removed.

        start value  *  end value
        *************************
        1            *
        5            *  7
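The rule stated on the filter screen above (pairs of a start value and an end value, a blank end value meaning an exact match) can be illustrated with a short sketch. The Python below is not the SAS code behind the application. The function names `is_excluded` and `exclude_by_ranges`, the representation of individuals as dictionaries, the two extra individuals, and the choice to treat both bounds as inclusive are all assumptions made for the illustration.

```python
from typing import List, Optional, Tuple

# One filter pair: (start value, end value); an end value of None stands
# for an 'end value' field left blank on the screen.
ValuePair = Tuple[float, Optional[float]]


def is_excluded(value: float, pairs: List[ValuePair]) -> bool:
    """Return True if `value` matches at least one (start, end) pair."""
    for start, end in pairs:
        if end is None:
            # Blank end value: only an exact match on the start value counts.
            if value == start:
                return True
        elif start <= value <= end:  # assumption: both bounds are inclusive
            return True
    return False


def exclude_by_ranges(rows, variable, pairs):
    """Build a new list of individuals; the original list is left untouched."""
    return [row for row in rows if not is_excluded(row[variable], pairs)]


# Hypothetical data set: the three individuals typed in on the data-entry
# screen, plus two more invented for this illustration.
tableau1 = [
    {"V1": 1, "V2": 22, "V3": 7},
    {"V1": 3, "V2": 33, "V3": 6},
    {"V1": 9, "V2": 42, "V3": 8},
    {"V1": 5, "V2": 10, "V3": 2},
    {"V1": 7, "V2": 15, "V3": 4},
]

# Same pairs as on the screen: V1 equal to 1, or V1 between 5 and 7.
tableau2 = exclude_by_ranges(tableau1, "V1", [(1, None), (5, 7)])
print([row["V1"] for row in tableau2])
```

Returning a new list mirrors the second option offered on the screens (create another data set); overwriting `tableau1` with the result would correspond to the first option (keep the same data set).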
Individuals can also be excluded by an explicit list of their numbers:

    EXCLUDING INDIVIDUALS
    Command ===>

    You can remove one or more individuals from your data set.
    Two cases are possible:
      1  The individuals to be removed are identified by their number
         in the file.
      2  You want to remove the individuals corresponding to certain
         values of a variable.

    Indicate your case: ===> 1

    Press the p3 (or p15) key to continue.

    FIRST CASE
    Command ===>

    ===> 1  4

iii) Controls of the data (options 2 and 8)

These options allow us to check the contents of a referring data set. They can be used at any time during the application. The functions provided by option 8 are the following:

    This menu lets you check the contents of your data set at any
    time. You can ask for a listing of the variables of the data set,
    the frequencies of these variables, or a print of the whole data
    set.

    What is the name of your data set ===> TABLEAU1

    Indicate your option: ===>
      1  Listing the variables
      2  Computing frequencies
      3  Printing the data

Option 2 lists all the observations with missing values.

iv) Formats and labels

This option displays the variables one by one, on successive screens, so that a name (the label) can be associated with each of them. If necessary, all the categories of a variable can be listed, and a name (the format) can be associated with each category.

3. Option 2 of the basic menu: descriptive statistics

This option is often the only one used in survey analyses; indeed, for a long time these were reduced to cross-tabulations, frequencies, univariate statistics and correlations computed on the initial data. Of course, this first phase is always necessary.
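As a rough illustration of the four kinds of summaries just named (frequencies, univariate statistics, cross-tabulations and correlations), here is a minimal Python sketch using only the standard library. It is not the SAS code run by the application; the data values, the variable names `sex`, `region`, `age` and `score`, and the function names are invented for the example.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev

# Hypothetical survey data: one dictionary per individual.
survey = [
    {"sex": "F", "region": "N", "age": 30, "score": 12.0},
    {"sex": "M", "region": "N", "age": 40, "score": 15.0},
    {"sex": "F", "region": "S", "age": 50, "score": 18.0},
    {"sex": "M", "region": "S", "age": 60, "score": 21.0},
]


def frequencies(rows, variable):
    """Count how many individuals take each value of `variable`."""
    return Counter(row[variable] for row in rows)


def univariate(rows, variable):
    """A few simple statistics describing a numeric variable."""
    values = [row[variable] for row in rows]
    return {
        "N": len(values),
        "MEAN": mean(values),
        "STD": stdev(values),  # sample standard deviation (n - 1 divisor)
        "MIN": min(values),
        "MAX": max(values),
        "SUM": sum(values),
    }


def crosstab(rows, row_var, col_var):
    """Count individuals for each pair of values of two variables."""
    return Counter((row[row_var], row[col_var]) for row in rows)


def correlation(rows, x_var, y_var):
    """Pearson correlation coefficient between two numeric variables."""
    xs = [row[x_var] for row in rows]
    ys = [row[y_var] for row in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(frequencies(survey, "sex"))
print(univariate(survey, "age"))
print(crosstab(survey, "sex", "region"))
print(correlation(survey, "age", "score"))
```

Each function takes the data set and the variable names as arguments, which loosely mirrors the way the program screens ask for a data set name and a list of variables before running a treatment.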
This option can also be used selectively, with variables chosen according to the results of the multivariate analysis. The menu is the following:

    DESCRIPTIVE STATISTICS
    Select Option ===>                     Press FORWARD for more.

      1  Frequencies and charts
      2  Univariate statistics
      3  Cross-tabulations
      4  Correlations

Statistics may be computed in the standard way, or with a weight given to each individual, or within each group of individuals defined by a group variable. Furthermore, a help screen is associated with each of these options. This help screen is a reminder of theoretical notions (chi-square, significance probability, median, ...).

We give some examples. The univariate statistics:

    UNIVARIATE STATISTICS
    Command ===>
    Type an 'x' to read the help screen associated with this
    option: ===>

    This menu describes one or more variables by means of simple
    statistics and tests.
    Separate analyses can be obtained if your data are partitioned by
    a group variable.
    Observations can be given a weight.

    What is the name of your data set ===> TABLEAU1

    Type an 'x' if you want the description of all the
    variables: ===> x
    Otherwise, give the name(s) of the variables you want described

    Group variable   ===> ggg
    Weight variable  ===> ppp

    UNIVARIATE STATISTICS
    Command ===>
    Choose your option ===>

    1  A few simple statistics describing the observations
         N      = the number of observations
         MEAN   = the mean
         STD    = the empirical standard deviation
         MIN    = the smallest value
         MAX    = the largest value
         SUM    = the sum of all the values
         VAR    = the empirical variance
         CV     = the coefficient of variation
         T      = Student's t testing the hypothesis that the mean is
                  zero
         PRT    = the probability of a larger value of Student's t

    2  A more complete description of the observations. Besides the
       statistics printed in the first case, you will find the
       following statistics:
         RANGE  = the difference between the smallest and the largest
                  value
         MODE   = the mode
         Q3     = the 75% quantile
         MEDIAN = the median
         Q1     = the 25% quantile
         P1     = the 1% quantile
         P5     = the 5% quantile
         ...
         P99    = the 99% quantile
         D      = the Kolmogorov statistic for testing whether the
                  observations are normally distributed (if the sample
                  size is smaller than 51, it is W, the Shapiro-Wilk
                  statistic)
         PRD    = the probability of a larger value of the Kolmogorov
                  D statistic

and the cross-tabulations:

    CROSS-TABULATIONS
    Command ===>
    Type an 'x' to read the help screen associated with this
    option: ===>

    This menu produces the tables crossing certain variables of your
    data set with one another.
    Separate cross-tabulations can be made if your data are
    partitioned by a group variable.
    Observations can be given a weight.

    What is the name of your data set ===> TABLEAU1

    Cross the variables  v1
    with the variables   v2

    Type an 'x' to obtain the chi-square test of
    independence: ===> x

    Group variable   ===> ggg
    Weight variable  ===> ppp

IV. CONCLUSION

Obviously, this paper describes the current state of our application and only gives an idea of what the final product will be; for instance, nothing about multidimensional techniques has been shown. Modifications will be made when the application is tested by the future end-users, who will also be our final judges.

In any case, it was necessary to study the feasibility of such an application using the SAS/AF product.
And the great ease with which we were able to build our software convinces us to go further.