Data consistency in the context of INSPIRE
INSPIRE Drafting Team (DT) “Data specifications”
8 November 2006
Anne Ruas, expert for IGN France

CONTEXT OF THE DT SPECIFICATION WORKFLOW
[Diagram; surviving labels: Modelling rules, D2.5, D2.6, DT Specifications, D2.7, by theme, Themes, D2.3, needs, CT]

COMPONENTS OF HARMONISATION
1. INSPIRE Information Model
  1.1 INSPIRE Principles
  1.2 Reference model
  1.3 Application Schemas
  1.4 ISO 19100 Profile
  1.5 Multi-lingual text and cultural adaptability
  1.6 Coordinate referencing and units model
  1.7 Object referencing modelling
  1.8 Data translation model/guidelines
  1.9 Portrayal model
2. Operational components/registers
  2.1 Identifier Management
  2.2 Terminology
  2.3 Feature catalogues
  2.4 Dictionaries
  2.5 Conformance
3. Guidelines & Best Practice
  3.1 Metadata
  3.2 Maintenance
  3.3 Quality
  3.4 Data Transfer
  3.5 Derived reporting & multiple representations
  3.6 Consistency between data
  3.7 Data capturing
Legend: white = proprietary components

Extract of INSPIRE requirement
• “The implementing rules shall be designed to ensure consistency as between items of the information which refer to the same location or between items of information which refer to the same object represented at different scales.” (Article 13.3, p. 19)
• “In order to ensure that spatial data relating to a spatial feature the location of which spans the frontier between two Member States are coherent, Member States shall, where appropriate, decide by mutual consent on the depiction and position of such common features” (Article 16.2, p. 19)

• INSPIRE intends to allow, as far as possible, access to data from different sources all over Europe for environmental studies
• To allow certain uses and analyses, data should be coherent with one another:
  – How? (which methods?)
  – Who? (the data providers or the server?)
  – Where are the data (physically)?
  – When (do we make the data consistent)?
  – For which data?
  – … at what cost?

In order to fit data together …
• First step:
  – Use the same reference system
  – Use the same data schema, including the same ‘selection rules’ (not only the same class names and attribute names)
• Second step: check the consistency between the representations

Correctness: does a database depict reality well?
• For each object: does it represent its entity (or entities) well?
  – geometric and attribute accuracy
  – shape and size accuracy (which complement geometric accuracy)
  – attribute up-to-dateness, correctness and completeness
• For each class (theme): does the collection of objects of a type (e.g. the hydrographic class) represent the set of entities (e.g. the hydrographic network) well?
  – appropriate selection, appropriate distribution
  – up-to-dateness, completeness
• All together: do the objects have relationships that are coherent with the relationships of the entities they represent?
  – topology (connectivity, adherence, inclusion)
  – overlapping (in the case of 2.5D or 3D data)

Consistency
• Egenhofer 1994
  – logical consistency: the data are coherent with the model
  – inter-representation consistency: the different representations do not contradict one another
  – “Consistency refers to the lack of any logical contradiction within a model of reality. This must not be confused with correctness, which excludes any contradiction with reality.
    […] In itself, each individual level may be consistent, however, when integrating and comparing the different levels, inconsistencies may be detected if the representations contradict”

Level of Detail
• LoD is defined by:
  – type of information (the classes and the attributes)
  – selection rules (which explain which real-world entities will be represented in the database)
  – accuracy of the attributes
  – type of geometry (3D, 2.5D, 2D; volume, polygon, line or point)
  – accuracy of the geometry
• The number that defines the LoD (e.g. 1 metre) sums up all this information, while the LoD is accurately defined by the database specifications.

Consistency in the context of INSPIRE
• Consistency between different themes at the same level of detail
• Consistency of a theme at two different levels of detail
• Consistency of objects along a boundary

Between [objects of the same theme] at the same level of detail
• Consistency between objects of the same theme
  – E.g. topology
  – Under the responsibility of the data provider (?)

Between different themes at the same level of detail
• Before starting …:
  – Checking geometrical consistency between objects of different themes that have different LoDs is certainly useless
  – Some themes do not share any coherence constraint with one another
• For example, although one can study the interactions between roads and risk or population areas, there is a priori no constraint between these themes

Simple inconsistencies
• Simple inconsistencies exist when objects are supposed to have the same geometry
  – These inconsistencies can be detected and corrected by appropriate data matching algorithms
  – This can be checked only if some information is redundant or if a hypothesis exists (DTM)

Identifying constraints
• Checking the consistency between themes requires checking, theme by theme, whether the data share specific constraints.
  – Topographic data (which describe the landscape) are certainly the most constrained.
  – For example, relief and rivers should be coherent with one another. In the same way, roads should lie on the DTM.
• In the case of underground data, some consistency rules might be checked when a geological layer touches the ground.

Complex inconsistencies
• More complex inconsistencies concern relationships that look coherent in the absence of external information.
  – For example, a building could be represented inside a forest whereas in reality it is outside the forest.
• In such a case, the representation looks coherent but the information is false. External information is needed to detect such errors.

Consistency between LoDs
• Sheeren 2005 distinguishes:
  – differences, which are due to the database specifications (LoD), from
  – inconsistencies, which are differences that are not explained by the specifications.

Example 1
• The river flow is the same (in terms of network) even if the geometry is simplified from polygons to lines.

Example 2
• The built-up area is coherent with the distribution of buildings, even if small building extensions are not included in these areas.

Example 3
• Even when very simplified, the logic of car navigation is respected.

• Checking the coherence between LoDs requires:
  – identifying, class by class, the relationships between classes of both LoDs (aggregation, generalisation, selection, simplification);
  – checking whether the main properties at one level of detail are well maintained at the other level of detail (e.g. the built-up area and the buildings).

1. Checking relationships between LoDs
• Aggregation: an object at LoD2 is an aggregate of objects at LoD1 (e.g. the built-up area at LoD2 is composed of buildings at LoD1).
• Generalisation (in the database sense): an object at LoD2 is represented in LoD1 by several spatially connected objects from more specific classes (e.g. a forest in LoD2 and coniferous and broad-leaved tree areas in LoD1). This class relation is also called a classification hierarchy by some authors.
• Object selection: a set of objects of a class in LoD2 represents a selection of the main objects of a larger set at LoD1 (e.g. the road or river network).
• Reduction of geometric dimension: an object represented by an area at LoD1 is represented by a line or point at LoD2 (e.g. a river from polygon to line, or a building from polygon to point).

2. Are the main properties well maintained?
Example from IGN: BDTopo - BDCarto

• Gesbert, Sheeren and Mustière (Gesbert, 2004, 2005; Sheeren et al., 2004; Sheeren, 2005) argue that the analysis of consistency between levels of detail requires an accurate description of database specifications by means of a formal description model.

Formal description of DB specifications (Gesbert 2005)
[Diagram, labels translated from French: an ontology of geographic entities shown alongside objects of the BD Topo Pays database, annotated with selection and modelling rules]
• Geographic entities («Ent. géog.»): hydrographic network element, watercourse, ditch, canal, river, aqueduct, pipe (canalisation), course feature (Acc. parcours), network node, spring, dam, waterfall, confluence, diffluence, lock, river mouth, loss (Perte); relationship “tributary of” (affluent de); multiplicities 0..1 and *
• Database objects («Objet de la base»): Surface d’eau (water surface), Cours d’eau nommé (named watercourse), Tronçon cours d’eau (watercourse section), Point d’eau (water point)
• Rules shown on the diagram:
  – Selection: width > 7.5 m
  – Selection: has a toponym
  – Modelling: Nature = cascade (waterfall)
  – Modelling: attribute Nature = source (spring)
  – Modelling: centreline at 2.5 m resolution; artificial = yes; split into sections at attribute changes

Consistency across boundaries
• Rubber sheeting is a well-known method for stretching one dataset to fit another (see for example Laurini 1996; Haunert 2005).
• Which side could be changed?

Consistency for INSPIRE
– How?
  • Which methods? Web matching service?
– Who?
  • The data providers or the server?
– Where are the coherent data?
  • On the provider side or on the server side
– When?
  • On demand or as soon as available?
– For which data?
– … at what cost?

Open questions
• Can we identify how far we can go today in the automation of database integration?
• Are there good practices that would simplify this integration process?
• Are there any weak points on which studies or research should be carried out?
  – INSPIRE starts around 2009

Some papers on levels of detail, multiple representation and database specification
• Chaudhry O. and Mackaness W., 2006a. Modelling Geographic Phenomena at Different Levels of Detail. In Proceedings of Autocarto 2006, USA.
• Chaudhry O. and Mackaness W., 2006b. Creation of Fiat Boundaries in Higher Order Phenomenon. ICA Workshop on Generalisation and Multiple Representation, Portland, 2006. aci.ign.fr/Portland/paper/ICA2006-ChaudhryMackaness.pdf
• Gesbert N., 2004. Formalisation of Geographical Database Specifications. In Proceedings of the Conference on Advances in Databases and Information Systems (ADBIS), September 2004, Budapest, pp. 202-211.
• Gesbert N., 2005. « Etude de la formalisation des spécifications de bases de données géographiques en vue de leur intégration ». PhD thesis, Université de Marne-la-Vallée. Online at ftp://ftp.ign.fr/ign/COGIT/THESES
• Mustière S. and van Smaalen J., 2007. Databases Requirements for Generalisation and Multiple Representations.
  To be published in “Generalisation of Geographic Information: Cartographic Modelling and Applications”, W. Mackaness, A. Ruas and T. Sarjakoski (eds), Elsevier.
• Racine J.B., 1981. « Problématiques et méthodologie : de l'implicite à l'explicite ». In H. Isuard, J.B. Racine and H. Raynard (eds), Problématiques de la géographie, Paris, PUF, Le géographe.
• Ruas A., 2004. « Le changement de niveau de détail dans la représentation de l'information géographique ». HDR, University of Marne-la-Vallée. Online at ftp://ftp.ign.fr/ign/COGIT/HDR/
• Ruas A. and Bianchin A., 2002. « Echelle et Niveau de détail ». In A. Ruas (ed.), Généralisation et représentation multiple, Paris, Hermes Lavoisier, Chapter 1, pp. 25-44.
• Sarjakoski L.T., 2007. Conceptual Models of Generalisation and Multiple Representation. To be published in “Generalisation of Geographic Information: Cartographic Modelling and Applications”, W. Mackaness, A. Ruas and T. Sarjakoski (eds), Elsevier.
• Smith B. and Varzi A.C., 2000. Fiat and Bona Fide Boundaries. Philosophy and Phenomenological Research, 60, pp. 401-420.
• Vangenot C., Parent C. and Spaccapietra S., 2002. Modelling and manipulating multiple representations of spatial data. Proceedings of the 10th International Symposium on Spatial Data Handling, pp. 81-93.

Some papers on consistency between levels of detail
• Egenhofer M.J., Clementini E. and Di Felice P., 1994. Evaluating inconsistencies among multiple representations. In Proceedings of the 6th International Symposium on Spatial Data Handling (SDH’94), pp. 901-920.
• Paiva J.A., 1998. Topological equivalence and similarity in multirepresentation geographic databases. PhD thesis in Spatial Information Science and Engineering, University of Maine, 188 p.
• Sheeren D., Mustière S. and Zucker J.-D., 2004. Consistency Assessment Between Multiple Representations of Geographical Databases: a Specification-Based Approach. Proceedings of the 11th International Symposium on Spatial Data Handling, Leicester, UK.
• Sheeren D., 2005. « Méthodologie d'évaluation de la cohérence inter-représentations pour l'intégration de bases de données spatiales. Une approche combinant l'utilisation de métadonnées et l'apprentissage automatique. » PhD thesis, University of Paris 6, 292 p. Online at ftp://ftp.ign.fr/ign/COGIT/THESES

Some papers on geometric solutions for data matching or data integration (possibly between LoDs)
• Beeri C., Kanza Y., Safra E. and Sagiv Y., 2004. Object fusion in Geographic Information Systems. Proceedings of the 30th VLDB Conference, Toronto, Canada.
• Gomboši M., Žalik B. and Krivograd S., 2003. Comparing two sets of polygons. International Journal of Geographical Information Science, 17(5), pp. 431-443.
• Haunert J.-H., 2005. Link based Conflation of Geographic Datasets. 8th ICA Workshop on Generalisation and Multiple Representation, A Coruña, Spain, July 2005.
• Lamine K. and Mustière S., 2005. « Intégration de données transfrontalières relatives à la randonnée pédestre ». Laboratoire COGIT, Paris 6. European Project WoW: Walk on Web; IST-2-004688-STP.
• Langlois P., 1994. « Une transformation élastique du plan basée sur un modèle d’interaction spatiale, applications à la géomatique ». Les journées de la recherche SIG-CASSINI, Lyon, 13-14 October 1994.
• Laurini R., 1996. « Raccordement géométrique de bases de données géographiques fédérées ». Revue internationale de géomatique, Volume 4, n° 3/1996, pp. 361-388.
• Mustière S., 2006. Results on experiments on automated matching of networks. Proceedings of the ISPRS Workshop on Multiple Representation and Interoperability of Spatial Data, Hanover, pp. 92-100.
• Rousseaux F. and Bonin O., 2003. Toward a coherent integration of 2D linear data into a DTM. International Cartographic Conference 2003 (ACI), Durban, South Africa.
• Sester M., Anders K.-A. and Walter V., 1998. Linking objects of different spatial data sets by integration and aggregation. GeoInformatica, 2(4), pp. 335-358.
• Volz S., 2006. An iterative approach for matching multiple representations of street data. Proceedings of the ISPRS Workshop on Multiple Representation and Interoperability of Spatial Data, Hanover, pp. 101-110.
• Walter V. and Fritsch D., 1999. Matching Spatial Data Sets: a Statistical Approach. International Journal of Geographical Information Science, 13(5), pp. 445-473.
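
Illustrative sketch (not part of the original slides)
The “Consistency between LoDs” slide separates differences that the database specifications explain from inconsistencies that they do not. The Python below is only a minimal sketch of that idea, under several assumptions: rivers in both databases are assumed to share an identifier (in practice a matching step would be needed), the less detailed database is assumed to keep a river only when it is wider than 7.5 m (a threshold borrowed from the selection rule visible on the specification diagram and used here purely as an example), and the names `River`, `classify` and `MIN_WIDTH_M` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical selection rule of the less detailed database: a river is kept
# only if it is wider than this threshold (illustrative value).
MIN_WIDTH_M = 7.5


@dataclass(frozen=True)
class River:
    river_id: str   # hypothetical identifier shared by both databases
    width_m: float  # width in metres, as recorded in the detailed database


def classify(detailed, generalised_ids, min_width_m=MIN_WIDTH_M):
    """Label each detailed river by comparing it with the less detailed database."""
    report = {}
    for river in detailed:
        expected = river.width_m > min_width_m        # what the specification predicts
        present = river.river_id in generalised_ids   # what the data actually contain
        if expected == present:
            # Kept as predicted, or dropped as predicted by the selection rule.
            report[river.river_id] = (
                "consistent" if present else "difference explained by specification"
            )
        else:
            # Missing although it should be kept, or kept although it should be dropped.
            report[river.river_id] = "inconsistency"
    return report


rivers = [River("r1", 12.0), River("r2", 3.0), River("r3", 9.0), River("r4", 2.0)]
report = classify(rivers, generalised_ids={"r1", "r4"})
for river_id, label in sorted(report.items()):
    print(river_id, label)
```

In this toy data set, r2 is absent from the less detailed database because it is too narrow, which the specification explains, whereas r3 (wide but missing) and r4 (narrow but present) are flagged for inspection. A real check would also have to handle the aggregation, generalisation and dimension-reduction relationships listed on the “Checking relationships between LoDs” slide.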