Data consistency in the context of INSPIRE

INSPIRE Drafting Team
DT “Data specifications”
8 November 2006
Anne Ruas
Expert for IGN France
CONTEXT OF THE DT: SPECIFICATION WORKFLOW
[Workflow diagram. Labels: needs, CT, Themes, D2.3, Modelling rules, D2.5, D2.6, DT, Specifications by theme, D2.7]
COMPONENTS OF HARMONISATION
1. INSPIRE Information Model
– 1.1 INSPIRE Principles
– 1.2 Reference model
– 1.3 Application Schemas
– 1.4 ISO 19100 Profile
– 1.5 Multi-lingual text and cultural adaptability
– 1.6 Coordinate referencing and units model
– 1.7 Object referencing modelling
– 1.8 Data translation model/guidelines
– 1.9 Portrayal model
2. Operational components/registers
– 2.1 Identifier Management
– 2.2 Terminology
– 2.3 Feature catalogues
– 2.4 Dictionaries
– 2.5 Conformance
3. Guidelines & Best Practice
– 3.1 Metadata
– 3.2 Maintenance
– 3.3 Quality
– 3.4 Data Transfer
– 3.5 Derived reporting & multiple representations
– 3.6 Consistency between data
– 3.7 Data capturing
White: proprietary components
Extract of INSPIRE requirement
• “The implementing rules shall be designed to
ensure consistency as between items of the
information which refer to the same location or
between items of information which refer to the
same object represented at different scales.”
(article 13.3, p19)
• “In order to ensure that spatial data relating to a
spatial feature the location of which spans the
frontier between two Member States are coherent,
Member States shall, where appropriate, decide
by mutual consent on the depiction and position of
such common features” (article 16.2, p19)
• INSPIRE intends to allow, as far as possible, access to data from different sources all over Europe for environmental studies
• To allow certain uses and analyses, data should be coherent with one another:
– How? (which methods?)
– Who? (the data providers or the server?)
– Where are the data? (physically)
– When? (when do we make the data consistent?)
– For which data?
– … at what cost?
In order to fit data together …
• First step:
– To have the same reference system
– To have the same data schema
• Including the same ‘rules of selection’ (not only the same class names and attribute names)
• Second step: control the consistency between the representations
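A minimal sketch of the first step, checking that two datasets share a reference system and a schema (including selection rules) before any geometric comparison is attempted. The `DatasetSpec` structure, its fields and the sample values are assumptions made for illustration only; they are not defined by INSPIRE.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSpec:
    """Hypothetical, simplified description of a dataset specification."""
    crs: str                                      # reference system identifier
    classes: dict = field(default_factory=dict)   # class name -> selection rule text


def fit_problems(a: DatasetSpec, b: DatasetSpec) -> list:
    """List the reasons why two datasets cannot be fitted together directly."""
    problems = []
    if a.crs != b.crs:
        problems.append(f"reference system differs: {a.crs} vs {b.crs}")
    for name in sorted(set(a.classes) | set(b.classes)):
        if name not in a.classes or name not in b.classes:
            problems.append(f"class '{name}' is missing on one side")
        elif a.classes[name] != b.classes[name]:
            # Same class name, but different rules of selection.
            problems.append(f"class '{name}' has different selection rules")
    return problems


# Illustrative values only.
left = DatasetSpec("EPSG:4258", {"Watercourse": "width > 7.5 m"})
right = DatasetSpec("EPSG:4258", {"Watercourse": "all named watercourses"})
print(fit_problems(left, right))
```

Here the two datasets agree on the reference system and on the class name, yet the check still reports a problem because the selection rules differ.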
Correctness: does a database depict reality well?
• For each object, does it represent its entity(ies) well?
– geometric and attribute accuracy,
– shape and size accuracy (which complement the geometric accuracy)
– attribute up-to-dateness, correctness and completeness
• For each class (theme), does the collection of objects of a type (e.g. the hydrographic class) represent the set of entities (e.g. the hydrographic network) well?
– appropriate selection, appropriate distribution,
– up-to-dateness, completeness
• All together, do the objects have relationships coherent with the relationships of the entities they represent?
– topology (connectivity, adherence, inclusion)
– overlapping (in case of 2.5D or 3D data)
Consistency
• Egenhofer 1994
– logical consistency: the data are coherent with the model
– inter-representation consistency: the different representations do not contradict one another
– “Consistency refers to the lack of any logical
contradiction within a model of reality. This must not be
confused with correctness, which excludes any
contradiction with reality. […]
– In itself, each individual level may be consistent,
however, when integrating and comparing the different
levels, inconsistencies may be detected if the
representations contradict.”
Level of Detail
• LoD is defined by :
– type of information (the class and the attributes)
– selection rules (that explain which entities of the real
world will be represented in the data base)
– accuracy of the attributes
– type of geometry (3D, 2.5D, 2D; volume, polygon, line or point)
– accuracy of the geometry
• The number that defines the LoD (e.g. 1 metre) summarises all this information, whereas the LoD is accurately defined by the database specifications.
Consistency in the context of
INSPIRE
• Consistency between different themes at the same level of detail,
• Consistency of a theme at two different levels of detail,
• Consistency of objects along a boundary.
Within the same theme, at the same Level of detail
• Consistency between objects of the same theme
– E.g.: topology
– Under the responsibility of the data provider (?)
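One topological check of this kind can be sketched as follows: within a single network theme, segment endpoints that lie very close to each other without coinciding usually indicate a connectivity error. The function name, the segment representation and the tolerance are assumptions for illustration.

```python
import math
from itertools import combinations


def near_misses(segments, tolerance):
    """Return pairs of segment endpoints that are close but not identical.

    Each segment is a list of (x, y) vertices; only first and last are used.
    """
    endpoints = {p for seg in segments for p in (seg[0], seg[-1])}
    return sorted((p, q) for p, q in combinations(sorted(endpoints), 2)
                  if 0 < math.dist(p, q) <= tolerance)


# Illustrative network: the second segment starts 5 cm away from the first one's end.
segments = [[(0, 0), (5, 0)], [(5, 0.05), (9, 3)], [(9, 3), (12, 3)]]
print(near_misses(segments, tolerance=0.1))
```

Endpoints shared exactly by two segments are not reported; only the near miss between the first and second segments is.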
Between different themes at the same Level of detail
• Before starting …:
– Checking geometrical consistency between objects of different themes that have different LoDs is certainly useless
– Some themes do not share any coherence constraint with one another.
• For example, although one can study the interactions between roads and risk or population areas, there is a priori no constraint between these themes.
Simple inconsistencies
• Simple inconsistencies exist when
objects are supposed to have the same
geometry
– These inconsistencies can be detected and
corrected by appropriate data matching
algorithms
– They can be checked only if some information is redundant or if hypotheses exist (DTM)
Identifying constraints
• Checking the consistency between themes
requires checking theme by theme if the data
share specific constraints.
– Topographic data (that describe the landscape) are
certainly the most constrained.
– As an example, relief and rivers should be coherent with one another. In the same way, roads should lie on the DTM.
• In the case of underground data, some
consistency rules might be checked when a
geological layer touches the ground.
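The road/DTM constraint can be sketched as a comparison between the height of each road vertex and the terrain height interpolated at the same position. This sketch assumes a regular grid with its origin at (0, 0), non-negative coordinates inside the grid extent, and bilinear interpolation; the function names and the tolerance are hypothetical.

```python
def dtm_height(dtm, cell_size, x, y):
    """Bilinear interpolation; dtm[row][col] holds z at (col * cell_size, row * cell_size)."""
    col, row = x / cell_size, y / cell_size
    c0, r0 = int(col), int(row)
    c1, r1 = min(c0 + 1, len(dtm[0]) - 1), min(r0 + 1, len(dtm) - 1)
    fx, fy = col - c0, row - r0
    top = dtm[r0][c0] * (1 - fx) + dtm[r0][c1] * fx
    bottom = dtm[r1][c0] * (1 - fx) + dtm[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


def vertices_off_dtm(road, dtm, cell_size, tolerance):
    """Return road vertices (x, y, z) whose z departs from the DTM by more than tolerance."""
    return [v for v in road
            if abs(v[2] - dtm_height(dtm, cell_size, v[0], v[1])) > tolerance]


# Illustrative 2 x 2 grid with 10 m cells and a three-vertex road.
dtm = [[100, 110], [100, 110]]
road = [(0, 0, 100), (5, 5, 105.2), (10, 10, 125)]
print(vertices_off_dtm(road, dtm, cell_size=10, tolerance=1.0))
```

A flagged vertex is not necessarily an error in the road: bridges and tunnels legitimately leave the terrain, so the tolerance and the exceptions belong in the specifications.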
Complex inconsistencies
• More complex inconsistencies concern the
relationships that look coherent without
external information.
– For example we could have a building that is
represented inside a forest whereas in reality it
is outside the forest.
• In such a case, the representation looks
coherent but the information is false.
External information is needed to detect
such errors.
Consistency between LoDs
• Sheeren 2005 distinguishes:
– differences, which are due to the database specifications (LoD),
from
– inconsistencies, which are differences that are not explained by the specifications.
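The distinction can be illustrated with a single selection rule. In this sketch the coarser database is assumed to keep only watercourses wider than a threshold, and objects are assumed to have been matched by identifier already. The function name, the labels and the 7.5 m threshold are illustrative assumptions, not Sheeren's method.

```python
def classify_objects(detailed, coarse_ids, min_width_m):
    """Compare what the specification prescribes with what the coarse database holds.

    detailed: object id -> width in metres at the detailed LoD.
    coarse_ids: ids of the objects present at the coarser LoD.
    """
    report = {}
    for oid, width in detailed.items():
        selected = width > min_width_m    # what the specification prescribes
        present = oid in coarse_ids       # what the coarse database contains
        if selected and present:
            report[oid] = "consistent"
        elif not selected and not present:
            report[oid] = "difference"    # absence explained by the selection rule
        else:
            report[oid] = "inconsistency"  # not explained by the specification
    return report


detailed = {"a": 12.0, "b": 3.0, "c": 9.0, "d": 2.0}
print(classify_objects(detailed, coarse_ids={"a", "d"}, min_width_m=7.5))
```

Object "b" is missing at the coarser level for a documented reason, whereas "c" is missing and "d" is present against the rule, so only those two call for investigation.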
Example 1
• The river flow is the same (in terms of network) even if the geometry is simplified from polygons to lines.
Example 2
• The built-up area is coherent with the distribution of buildings, even if small building extensions are not included in these areas.
Example 3
• Even when very simplified, the logic of car navigation is respected.
• Checking the coherence between LoDs requires:
– identifying, class by class, the relationships between classes of both LoDs (aggregation, generalisation, selection, simplification);
– checking whether the main properties at one level of detail are well maintained at the other level of detail (e.g. the built-up area and the buildings).
1. Checking relationships between LoDs
• Aggregation: an object at LoD2 is an aggregate of objects at LoD1 (e.g. the built-up area at LoD2 is composed of buildings at LoD1).
• Generalisation (in the DB meaning): an object at LoD2 is represented in LoD1 by several spatially connected objects from more specific classes (e.g. a forest in LoD2 and coniferous and deciduous tree areas in LoD1). This class relation is also called classification hierarchy by some authors.
• Object selection: a set of objects of a class in LoD2 represents a selection of the main objects of a larger set at LoD1 (e.g. the road or river network).
• Reduction of geometric dimension: an object represented by an area at LoD1 is represented by a line or point at LoD2 (e.g. a river from polygon to line or a building from polygon to point).
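As a sketch of checking the aggregation relationship, the code below tests whether each built-up area at LoD2 contains at least one building at LoD1, and lists the buildings that fall outside every area. It assumes each building is reduced to a representative point and uses a standard ray-casting point-in-polygon test; the function names and data structures are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, not closed."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # The edge crosses the horizontal line through the point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_aggregation(areas, buildings):
    """Return (areas containing no building, buildings outside every area)."""
    empty_areas = [aid for aid, poly in areas.items()
                   if not any(point_in_polygon(c, poly) for c in buildings.values())]
    orphans = [bid for bid, c in buildings.items()
               if not any(point_in_polygon(c, poly) for poly in areas.values())]
    return empty_areas, orphans


areas = {"A": [(0, 0), (10, 0), (10, 10), (0, 10)],
         "B": [(20, 0), (30, 0), (30, 10), (20, 10)]}
buildings = {"b1": (2, 3), "b2": (8, 8), "b3": (15, 5)}
print(check_aggregation(areas, buildings))
```

An area without buildings is a strong candidate for an inconsistency. A building outside every area may simply be an isolated building, which the specifications can explain.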
2. Are the main properties well maintained?
Example from IGN: BDTopo - BDCarto
• Gesbert, Sheeren and Mustière (Gesbert, 2004, 2005; Sheeren et al., 2004; Sheeren, 2005) argue that the analysis of consistency between levels of detail requires an accurate description of database specifications by means of a formal description model.
Formal description of DB specifications (Gesbert 2005)
[Figure: UML diagram relating an ontology of geographic entities to the database objects of BD Topo Pays.
Geographic entities («Ent. géog.»): hydrographic network element, watercourse (with a "tributary of" association), ditch, canal, river, aqueduct (pipeline), Acc. parcours, network node, source, dam, waterfall, confluence, lock, diffluence, mouth, Perte (sink).
Database objects («Objet de la base»): Surface d’eau (water surface), Cours d'eau nommé (named watercourse), Tronçon cours d’eau (watercourse section), Point d’eau (water point).
Annotations legible in the diagram: selection "width > 7.5 m"; selection "has a toponym"; modelling "Nature = cascade"; modelling "attribute Nature = source"; modelling "axis at 2.5 m resolution, artificial = yes, split into sections where attributes change".]
Consistency across boundaries
• Rubber sheeting is a well-known method for stretching one dataset to fit another (see for example Laurini 1996; Haunert 2005).
• Which side could be changed?
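The idea of rubber sheeting can be sketched with an inverse-distance-weighted blend of control displacements. This is a simple variant chosen for illustration; it is not the specific method of Laurini or Haunert. The function name, the `links` structure and the `power` parameter are assumptions.

```python
import math


def rubber_sheet(points, links, power=2.0):
    """Move points by an inverse-distance-weighted blend of control displacements.

    links is a non-empty list of ((x, y) source, (x, y) target) control pairs.
    """
    moved = []
    for x, y in points:
        sum_w = dx = dy = 0.0
        for (sx, sy), (tx, ty) in links:
            d = math.hypot(x - sx, y - sy)
            if d == 0.0:
                # The point is a control point: it lands exactly on its target.
                sum_w, dx, dy = 1.0, tx - sx, ty - sy
                break
            w = 1.0 / d ** power
            sum_w += w
            dx += w * (tx - sx)
            dy += w * (ty - sy)
        moved.append((x + dx / sum_w, y + dy / sum_w))
    return moved


# Two control links along a boundary: one end moves up, the other down.
links = [((0, 0), (0, 1)), ((10, 0), (10, -1))]
print(rubber_sheet([(0, 0), (5, 0), (10, 0)], links))
```

The choice of source and target in each link encodes the answer to the question of which side is changed: the source side is stretched while the target side stays fixed. Moving both sides halfway would require links towards midpoints instead.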
Consistency for INSPIRE
– How?
• Which methods? Web matching service?
– Who?
• The data providers or the server?
– Where are the coherent data?
• On the provider side or on the server side?
– When?
• On demand or as soon as available?
– For which data?
– … at what cost?
Open questions
• Can we identify how far we can go today in the automation of database integration?
• Are there good practices that would simplify this integration process?
• Are there any weak points on which studies or research should be carried out?
– INSPIRE starts around 2009
Some papers on levels of detail, multiple representation and database specifications
• Chaudhry O., and W. Mackaness, 2006a, Modelling Geographic Phenomena at Different Levels of Detail. In Proceedings of Autocarto 2006, USA.
• Chaudhry O., and W. Mackaness, 2006b, Creation of Fiat Boundaries in Higher Order Phenomenon. ICA workshop on Generalisation and Multiple Representation, Portland 2006, aci.ign.fr/Portland/paper/ICA2006-ChaudhryMackaness.pdf
• Gesbert N., 2004, Formalisation of Geographical Database Specifications. In Proceedings of the Conference on Advances in Databases and Information Systems (ADBIS), September 2004, Budapest, pp. 202-211.
• Gesbert N., 2005, « Etude de la formalisation des spécifications de bases de données géographiques en vue de leur intégration ». PhD thesis, Université de Marne-la-Vallée, online at ftp://ftp.ign.fr/ign/COGIT/THESES
• Mustière S. and van Smaalen J., 2007, Databases Requirements for Generalisation and Multiple Representations. To be published in “Generalisation of Geographic Information: Cartographic Modelling and Applications”, W. Mackaness, A. Ruas and T. Sarjakoski (eds), Elsevier.
• Racine J.B., 1981, "Problématiques et méthodologie : de l'implicite à l'explicite", in H. Isnard, J.B. Racine and H. Reymond (eds.), Problématiques de la géographie, Paris, PUF, Le géographe.
• Ruas A., 2004, « Le changement de niveau de détail dans la représentation de l'information géographique ». HDR, University of Marne-la-Vallée (online at ftp://ftp.ign.fr/ign/COGIT/HDR/)
• Ruas A., Bianchin A., 2002, "Echelle et Niveau de détail", in A. Ruas (ed.), Généralisation et représentation multiple, Paris, Hermes Lavoisier, Chapter 1, pp. 25-44.
• Sarjakoski L. T., 2007, “Conceptual Models of Generalisation and Multiple Representation”. To be published in “Generalisation of Geographic Information: Cartographic Modelling and Applications”, W. Mackaness, A. Ruas and T. Sarjakoski (eds), Elsevier.
• Smith B., and A. C. Varzi, 2000, Fiat and Bona Fide Boundaries. Philosophy and Phenomenological Research 60: 401-420.
• Vangenot C., Parent C., Spaccapietra S., 2002, Modelling and manipulating multiple representations of spatial data. Proceedings of the 10th International Symposium on Spatial Data Handling, pp. 81-93.
Some papers on consistency between levels of detail
• Egenhofer M.J., Clementini E. and Di Felice P., 1994, Evaluating inconsistencies among multiple representations. In Proceedings of the 6th International Symposium on Spatial Data Handling (SDH’94), pp. 901-920.
• Paiva J.A., 1998, Topological equivalence and similarity in multi-representation geographic databases. PhD Thesis in Spatial Information Science and Engineering, University of Maine, 188 p.
• Sheeren D., Mustière S., Zucker J.-D., 2004, Consistency Assessment Between Multiple Representations of Geographical Databases: a Specification-Based Approach. Proc. of the 11th International Symposium on Spatial Data Handling, Leicester, UK.
• Sheeren D., 2005, « Méthodologie d'évaluation de la cohérence inter-représentations pour l'intégration de bases de données spatiales. Une approche combinant l'utilisation de métadonnées et l'apprentissage automatique. » PhD Thesis, University of Paris 6, 292 p., online at ftp://ftp.ign.fr/ign/COGIT/THESES
Some papers on geometric solutions for data matching or data integration (possibly between LoDs)
• Beeri C., Kanza Y., Safra E., Sagiv Y., 2004, Object fusion in Geographic Information Systems. Proceedings of the 30th VLDB Conference, Toronto, Canada.
• Gomboši M., Žalik B., Krivograd S., 2003, Comparing two sets of polygons. International Journal of Geographical Information Science, 17(5), pp. 431-443.
• Haunert J.-H., 2005, Link based Conflation of Geographic Datasets. 8th ICA workshop on generalisation and multiple representation, A Coruña, Spain, July 2005.
• Lamine K. and S. Mustière, 2005, “Intégration de données transfrontalières relatives à la randonnée pédestre”. Laboratoire COGIT, Paris 6. European Project WoW: Walk on Web; IST-2-004688-STP.
• Langlois P., 1994, ‘Une transformation élastique du plan basée sur un modèle d’interaction spatiale, applications à la géomatique’. Les journées de la recherche SIG-CASSINI, Lyon, 13-14 October 1994.
• Laurini R., 1996, ‘Raccordement géométrique de bases de données géographiques fédérées’. Revue internationale de géomatique, Volume 4, n°3/1996, pp. 361-388.
• Mustière S., 2006, Results on experiments on automated matching of networks. Proceedings of the ISPRS Workshop on Multiple Representation and Interoperability of Spatial Data, Hanover, pp. 92-100.
• Rousseaux F., Bonin O., 2003, Toward a coherent integration of 2D linear data into a DTM. International Cartographic Conference 2003 (ACI), Durban, South Africa.
• Sester M., Anders K.-A. and Walter V., 1998, Linking objects of different spatial data sets by integration and aggregation. GeoInformatica, 2(4), pp. 335-358.
• Volz S., 2006, An iterative approach for matching multiple representations of street data. Proceedings of the ISPRS Workshop on Multiple Representation and Interoperability of Spatial Data, Hanover, pp. 101-110.
• Walter V. and Fritsch D., 1999, Matching Spatial Data Sets: a Statistical Approach. International Journal of Geographical Information Science, 13(5), pp. 445-473.
