A help for end-users: Software for Analysis of Survey
Gerard HATABIAN
Christine JOUAN
Electricite de France - Direction des Etudes et Recherches
1, avenue du General de Gaulle - 92141 Clamart - France
I. INTRODUCTION
Analysing a survey requires many techniques: reading the data, coding them, sometimes excluding
aberrant individuals, then computing so-called classical statistics and finally applying the
techniques of multivariate data analysis. Specific programs or software packages, written in
different languages, have been developed to make this processing easier.
However, it proves impossible to use them without a good knowledge of a computer language. Yet many
people interested in such surveys understand statistics well enough to choose the right methods,
but do not know any software and so cannot carry out the survey analysis themselves.
The application we have written tries to improve this situation. It takes the form of a logical
series of menu screens, program screens and help screens. We are going to see how.
II. BASIC PRINCIPLES
1. Software for statisticians unused to programming languages
Our application is aimed at end-users with no computer knowledge and, in particular, no SAS
knowledge. The end-user never needs to program; he only has to choose explicitly the methods,
options and variables to study. These choices are offered on screens which appear in a logical and
recursive way.
The most delicate point is building the initial data set, because the data are generally on an
external support. This is the only step that requires some knowledge of reading conventions; a help
screen provides these few conventions.
2. Specificities of the three kinds of screens
Each kind of screen has a specific use:
the menu screens
They allow the end-user to choose a treatment or a class of treatments. The end-user is then taken
to a program screen or another menu screen.
the program screens
Once he has chosen a treatment, the end-user must give the parameters necessary for its
execution. These screens try to be precise enough to allow the end-user to fill in the fields
without any ambiguity. After the execution, the initial menu or sometimes another program
screen comes back.
the help screens
They give reminders about the statistical techniques. The end-user is given a real lecture on
the chosen methods, with reminders about the meanings of the parameters and the interpretation
of the results.
3. The reference data set
The whole application is written
first to build a reference data set (if it does not exist)
then to analyse the contents of this reference data set.
III. DESCRIPTION OF EXISTING SCREENS
We are going to describe the chaining and the aims of the most important screens.
1. The basic menu

    DEPOUILLEMENT D'UNE ENQUETE (Survey analysis)

    1  Préparation du tableau des données        1. Building data set
    2  Statistiques descriptives des données     2. Descriptive statistics
    3  Analyse en Composantes Principales        3. Principal component analysis
    4  Analyse des Correspondances Multiples     4. Correspondence Analysis
    5  Analyse Discriminante                     5. Discriminant Analysis
    6  Classification                            6. Clustering

This menu gives the different steps of survey analysis.
After building the reference data set (option 1), we must analyse its contents. These analyses can
be simple statistics (option 2) or multidimensional analyses (options 3 to 6).
We have deliberately centred the application on the possibilities offered by multidimensional
analysis.
2. Option 1 of the basic menu: building the data set

    PREPARATION DU TABLEAU DES DONNEES (Building the data set)
    Select Option ===>                               Press FORWARD for more.

    1  Lecture des données initiales                 1. Reading the initial data
    2  Impression des données manquantes             2. Writing missing values
    3  Recodage                                      3. Recoding
    4  Suppression d'individus                       4. Excluding individuals
    5  Suppression de variables                      5. Excluding variables
    6  Calcul de nouvelles variables                 6. Computing new variables
    7  Définition des libellés en clair              7. Labels and formats
    8  Contrôle du contenu du tableau des données    8. Controls

    A help screen gives the few conventions of the SAS language (data set, types of
    variables, formats) that must be known in order to use this application. It is
    available by pressing the p1 (or p13) key.

The data are usually on an external support and have to be read to build an initial reference data
set (option 1). Then we can check the contents of this data set (options 2 and 8) or modify these
contents (options 3 to 6).
Finally, we can associate labels and formats with each variable (option 7).
As mentioned above, this step requires a few conventions, so a help screen can be read from this
menu screen.
i) Reading initial data: option 1
This option leads to a first screen which asks for a choice between ...

    LECTURE DES DONNEES INITIALES (Reading the initial data)
    Command ===>

    This step reads the external data and builds a data set that conforms to the
    system. These data may be in a TSO file or be entered at the screen.

    Indicate your case ===>
      1  Your data are in a TSO file
      2  You must enter your data at the screen

    What name do you want to give to the data set that will be created: ===>

    Press the p3 (or p15) key to continue.

... reading from a TSO file or ...

    LECTURE DES DONNEES SUR UN FICHIER TSO (Reading the data from a TSO file)
    Command ===>

    What is the name of the TSO file containing the data
    ===> 'H6BHH4S.ZZDGNNEE.DATA'

    Indicate the format for reading these data
    ===> V1 V2 V3 $ V4

... reading at the screen

    LECTURE DES DONNEES A L'ECRAN (Reading the data at the screen)

    Give the names of the variables, separated by a blank (followed by the symbol $
    for character variables).
    ===> VAR1 1 VAR2 3-4 VAR3

    Enter your data.
    You must give, one after the other and separated by a blank, the values of the
    variables of the first individual, then of the second, and so on.
    Put a '.' for missing data.
    ===> 1 22 7
         3 33 6
         9 42 8

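The reading conventions shown on these two screens (blank-separated variable names, a `$` after a character variable, a `.` for a missing value) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the application itself relies on SAS, the function names `parse_variable_spec` and `read_records` are hypothetical, and column specifications such as `3-4` are not handled.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[float, str, None]


def parse_variable_spec(spec: str) -> List[Tuple[str, bool]]:
    """Return (name, is_character) pairs; a '$' marks the preceding name as character."""
    variables: List[Tuple[str, bool]] = []
    for token in spec.split():
        if token == "$":
            if not variables:
                raise ValueError("'$' must follow a variable name")
            name, _ = variables[-1]
            variables[-1] = (name, True)
        else:
            variables.append((token, False))
    return variables


def read_records(spec: str, text: str) -> List[Dict[str, Value]]:
    """Read blank-separated values, individual after individual ('.' means missing)."""
    variables = parse_variable_spec(spec)
    tokens = text.split()
    if len(tokens) % len(variables) != 0:
        raise ValueError("the number of values is not a multiple of the number of variables")
    records: List[Dict[str, Value]] = []
    for start in range(0, len(tokens), len(variables)):
        record: Dict[str, Value] = {}
        for (name, is_char), raw in zip(variables, tokens[start:start + len(variables)]):
            if raw == ".":
                record[name] = None          # missing value
            elif is_char:
                record[name] = raw           # character variable, kept as text
            else:
                record[name] = float(raw)    # numeric variable
        records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical data: V2 is a character variable, one value is missing.
    for row in read_records("V1 V2 $ V3", "1 a 7 3 . 6"):
        print(row)
```

Representing each individual as a dictionary keeps the sketch independent of any statistics library; a real data set would of course be stored by SAS.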
ii) Modifications of the data (options 3 to 6)
These are classical modifications of the initial data. At each step, we can either irreversibly
modify the initial data set or build a new one containing the modifications.
For instance, we can exclude variables (option 5)

    SUPPRESSION DE VARIABLES (Excluding variables)
    Command ===>

    This menu lets you permanently delete certain variables from your data set or
    create, from the initial data set, a new data set that does not contain certain
    variables.

    What is the name of your data set ===> TABLEAU1

    Indicate the chosen option: ===> 2
      1  You keep the same data set
      2  Another data set must be created

    In the second case, give the name of the new data set
    tableau2

    Give the names of the variables you want to delete
    v1 v2

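The two options on this screen (overwrite the same data set, or build a new one) can be sketched as follows. This is an assumed Python illustration, not the application's SAS code; the function `drop_variables` and the dictionary of named tables are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Table = List[dict]


def drop_variables(
    tables: Dict[str, Table],
    source: str,
    names: Iterable[str],
    target: Optional[str] = None,
) -> None:
    """Remove the given variables from `source`.

    target=None  -> option 1: the same data set is kept and overwritten (irreversible).
    target=name  -> option 2: a new data set is created, the source is left untouched.
    """
    to_drop = set(names)
    result = [
        {key: value for key, value in row.items() if key not in to_drop}
        for row in tables[source]
    ]
    tables[target if target is not None else source] = result


if __name__ == "__main__":
    tables = {"TABLEAU1": [{"v1": 1, "v2": 2, "v3": 3}]}
    # Option 2 of the screen: create tableau2 without v1 and v2.
    drop_variables(tables, "TABLEAU1", ["v1", "v2"], target="tableau2")
    print(tables)
```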
Or exclude individuals (option 4)

    SUPPRESSION D'INDIVIDUS (Excluding individuals)

    This menu lets you permanently delete certain individuals from your data set or
    create, from the initial data set, a new data set that does not contain certain
    individuals.

    What is the name of your data set ===> TABLEAU1

    Indicate the chosen option ===> 2
      1  You keep the same data set
      2  Another data set must be created

    In the second case, give the name of the new data set
    tableau2

    Turn the page

With a filter on a variable

    DEUXIEME CAS (Second case)
    Command ===>

    Give the name of the variable: V1

    Indicate the values concerned for this variable (put the modalities of character
    variables between quotation marks).
    The individuals whose value of the variable lies between the 'start value' and
    the 'end value' given will be deleted. Several pairs of values can be given. If
    the 'end value' is left blank, only the individuals having the 'start value'
    will be deleted.

      start value  *  end value
      ***************************
      1            *
      5            *  7

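The filter described on this screen can be sketched in a few lines of Python. The sketch assumes that both bounds of a pair are inclusive, which the screen text suggests but does not state explicitly; the names `in_any_range` and `exclude_individuals` are hypothetical, and the application itself is built on SAS.

```python
from typing import List, Optional, Sequence, Tuple

# A pair (start, end); end=None means "only the individuals equal to start".
Range = Tuple[float, Optional[float]]


def in_any_range(value: Optional[float], ranges: Sequence[Range]) -> bool:
    """True if the value matches one of the (start, end) pairs (bounds assumed inclusive)."""
    if value is None:
        return False  # a missing value never matches a pair
    for start, end in ranges:
        if end is None:
            if value == start:
                return True
        elif start <= value <= end:
            return True
    return False


def exclude_individuals(rows: List[dict], variable: str, ranges: Sequence[Range]) -> List[dict]:
    """Return the individuals that are kept, i.e. those matching none of the pairs."""
    return [row for row in rows if not in_any_range(row.get(variable), ranges)]


if __name__ == "__main__":
    rows = [{"V1": v} for v in (1, 3, 5, 6, 7, 9)]
    # The pairs entered on the screen: 1 alone, then 5 to 7.
    print(exclude_individuals(rows, "V1", [(1, None), (5, 7)]))
```

Returning a new list rather than modifying `rows` in place corresponds to the "another data set must be created" option; assigning the result back to the same name would correspond to keeping the same data set.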
Or with an explicit list of the numbers of the individuals to exclude

    SUPPRESSION D'INDIVIDUS (Excluding individuals)
    Command ===>

    You can delete one or more individuals from your data set.
    Two cases are possible:
      1  The individuals to be deleted are identified by their number in the file.
      2  You want to delete the individuals corresponding [...]

    Indicate your case and press the p3 (or p15) key to continue: ===> 1

    PREMIER CAS (First case)
    Command ===>

    ===> 1 _____ 4 _____

iii) Controls of the data (options 2 and 8)
These options allow us to check the contents of a reference data set. They can be used
at any time during the application.
The functions provided by option 8 are the following:

    This menu lets you check the contents of your data set at any time.
    You can ask for the listing of the variables of the data set, the frequencies of
    these variables, or the printing of the whole data set.

    What is the name of your data set ===> TABLEAU1

    Indicate your option: ===>
      1  Listage des variables         1. listing the variables
      2  Tri à plat des variables      2. computing frequencies
      3  Impression du tableau         3. printing the data

Option 2 lists all the observations with missing values.
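As a rough illustration of these controls, the following Python sketch computes the frequencies of one variable and lists the observations that contain a missing value. It assumes individuals stored as dictionaries with `None` for a missing value; the function names are hypothetical and do not come from the application.

```python
from collections import Counter
from typing import List, Tuple


def frequencies(rows: List[dict], variable: str) -> Counter:
    """Count each observed modality of a variable, ignoring missing values."""
    return Counter(
        row.get(variable) for row in rows if row.get(variable) is not None
    )


def observations_with_missing(rows: List[dict]) -> List[Tuple[int, dict]]:
    """Return (number, individual) pairs for individuals with at least one missing value.

    Individuals are numbered from 1, in the order of the data set.
    """
    return [
        (number, row)
        for number, row in enumerate(rows, start=1)
        if any(value is None for value in row.values())
    ]


if __name__ == "__main__":
    rows = [
        {"V1": 1, "V2": None},
        {"V1": 1, "V2": "a"},
        {"V1": 2, "V2": "a"},
    ]
    print(frequencies(rows, "V1"))
    print(observations_with_missing(rows))
```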
iv) Formats and labels
This option shows the variables one by one, on successive screens, and lets the user
associate a name (the label) with each variable. If necessary, all the modalities of
a variable can be listed, and a name (the format) can be associated with each of these
modalities.
3. Option 2 of the basic menu: descriptive statistics
This option is often the only one used in survey analyses; indeed, for a long time these have
been reduced to cross-tabulations, frequencies, univariate statistics and correlations computed
on the initial data.
Of course this first phase is always necessary. But this option can also be used selectively, with
variables chosen according to the results of the multivariate analysis.
The menu is the following.

    STATISTIQUES DESCRIPTIVES DES DONNEES (Descriptive statistics)
    Select Option ===>                 Press FORWARD for More.

    1  Tris à plat - histogrammes      1. frequencies and charts
    2  Statistiques univariées         2. univariate statistics
    3  Tris croisés                    3. crosstabulation
    4  Corrélations                    4. correlations

Statistics may be computed in the standard way, or with a weight given to each individual, or
within each group of individuals defined by a group variable.
Furthermore, a help screen is associated with each of these options. This help screen is a reminder
of theoretical notions (chi-square, significance probability, median, ...).
We give some examples.
The univariate statistics

    STATISTIQUES UNIVARIEES (Univariate statistics)
    Command ===>

    Type an 'x' to read the help screen associated with this option: ===>

    This menu describes one or more variables by means of simple statistics and tests.
    Separate analyses can be obtained if your data are partitioned by a group variable.
    The observations can be given a weight.

    What is the name of your data set ===> TABLEAU1

    Type an 'x' if you want the description of all the variables: ===> x
    Otherwise, give the name or names of the variables you want described
    ______________________________________________________________________

    Group variable   ===> ggg
    Weight variable  ===> ppp

    STATISTIQUES UNIVARIEES (Univariate statistics)
    Command ===>

    Choose your option ===>

    1  A few simple statistics describing the observations
         N     = the number of observations
         MEAN  = the mean
         STD   = the empirical standard deviation
         MIN   = the smallest value
         MAX   = the largest value
         SUM   = the sum of all the values
         VAR   = the empirical variance
         CV    = the coefficient of variation
         T     = Student's t testing the hypothesis that the mean is zero
         PRT   = the probability of a greater value of Student's t

    2  A more complete description of the observations. In addition to the statistics
       printed in the first case, you will find the following statistics:
         RANGE  = the difference between the smallest and the largest value
         MODE   = the mode
         Q3 = the 75% quantile   MEDIAN = the median   Q1 = the 25% quantile
         P1 = the 1% quantile    P5 = the 5% quantile  ...  P99 = the 99% quantile
         D      = the Kolmogorov statistic, used to test whether the observations
                  are normally distributed (if the sample size is smaller than 51,
                  it is W, the Shapiro-Wilk statistic)
         PRD    = the probability of a greater value of the Kolmogorov D statistic

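To make the first option concrete, here is a minimal Python sketch of the simple statistics listed on the screen. It is an illustration under assumptions, not the application's SAS code: the variance uses the n - 1 divisor, CV is expressed as a percentage of the mean, weights and group variables are ignored, and PRT is left out because the Student distribution is not available in the Python standard library. The function name `simple_statistics` is hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable, Optional


def simple_statistics(values: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute N, MEAN, STD, MIN, MAX, SUM, VAR, CV and T, ignoring missing values."""
    data = [float(v) for v in values if v is not None]
    n = len(data)
    if n < 2:
        raise ValueError("at least two non-missing observations are required")
    mean = statistics.fmean(data)
    var = statistics.variance(data)   # assumption: divisor n - 1
    std = math.sqrt(var)
    return {
        "N": n,
        "MEAN": mean,
        "STD": std,
        "MIN": min(data),
        "MAX": max(data),
        "SUM": math.fsum(data),
        "VAR": var,
        # assumption: CV as a percentage of the mean; undefined when the mean is zero
        "CV": 100.0 * std / mean if mean != 0 else None,
        # Student's t for the hypothesis "the mean is zero"
        "T": mean / (std / math.sqrt(n)) if std > 0 else None,
    }


if __name__ == "__main__":
    for name, value in simple_statistics([1, 2, None, 3, 4]).items():
        print(name, value)
```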
and the cross-tabulations

    TRIS CROISES (Cross-tabulations)
    Command ===>

    Type an 'x' to read the help screen associated with this option: ===>

    This menu produces the tables that cross certain variables of your data set with
    one another.
    Separate cross-tabulations can be made if your data are partitioned by a group
    variable.
    The observations can be given a weight.

    What is the name of your data set ===> TABLEAU1

    v1
    with the variables
    v2

    Type an 'x' to obtain the Chi2 test of independence: ===> x

    Group variable   ===> ggg
    Weight variable  ===> ppp

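The cross-tabulation and the Chi2 statistic requested on this screen can be sketched as follows. This is an assumed Python illustration with hypothetical function names; it ignores weights and group variables, and it returns only the statistic and its degrees of freedom, not the significance probability.

```python
from collections import Counter
from typing import List, Tuple


def crosstab(rows: List[dict], row_var: str, col_var: str) -> Counter:
    """Count the individuals for each (row modality, column modality) pair.

    Individuals with a missing value on either variable are skipped.
    """
    return Counter(
        (row.get(row_var), row.get(col_var))
        for row in rows
        if row.get(row_var) is not None and row.get(col_var) is not None
    )


def chi_square(counts: Counter) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals: Counter = Counter()
    col_totals: Counter = Counter()
    for (a, b), n in counts.items():
        row_totals[a] += n
        col_totals[b] += n
    total = sum(counts.values())
    statistic = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / total
            observed = counts.get((a, b), 0)
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical survey: v1 and v2 each take two modalities.
    rows = (
        [{"v1": "a", "v2": "x"}] * 10 + [{"v1": "a", "v2": "y"}] * 20
        + [{"v1": "b", "v2": "x"}] * 20 + [{"v1": "b", "v2": "y"}] * 10
    )
    table = crosstab(rows, "v1", "v2")
    print(dict(table))
    print(chi_square(table))
```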
IV. CONCLUSION
Obviously, this paper describes the current state of our application and only gives an idea of what
the final product will be; for instance, nothing about multidimensional techniques has been shown.
Modifications will be made when the application is tested by its future end-users, who will also be
our final judges.
In any case, studying the feasibility of such an application, using the SAS/AF product, was
necessary. And the great ease with which we were able to build our software convinces us to go
further.