Genio Suite™ A Technical Overview

Transcription

Genio Suite™ A Technical Overview
Genio Suite™
A Technical Overview
While every attempt has been made to ensure the accuracy and completeness of the
information in this document, some typographical or technical errors may exist.
Hummingbird Connectivity – a division of Open Text cannot accept responsibility for
customers’ losses resulting from the use of this document. The information contained
in this document is subject to change without notice.
This document contains proprietary information that is protected by copyright. This
document, in whole or in part, may not be photocopied, reproduced, or translated into
another language without prior written consent from Hummingbird Connectivity.
This edition published December 2007
www.hummingbird.com
2
Contents
Abstract
Introduction
Data Integration for the Enterprise
Genio Suite™
Genio Suite Architecture
Genio Suite Components
Genio Engine
Genio Repository
Genio Designer
Genio Scheduler
Genio MetaLinks
Genio DataLinks
Genio Suite Features
Environment Neutral
Extraction
Middleware and Standards Support
Incremental Extraction
Transformation Functions
Support of Stored Procedures and SQL Functions
Support for External Functions
National Language and Unicode Support
Mapping
Joins, Aggregation, Distinction, Filter and Sorting
Transaction and Nested Transaction Support
Validation against Metadata
Loading
Scheduling
Failover Capabilities
Performance Measurement
Process Optimization and Tuning
Data Quality Management
Error Handling
Audit and Monitoring
Parallelism and Process Slicing
Genio — Unique Processing Methodology
Use of the Genio Engine Exclusively
www.hummingbird.com
5
6
7
7
7
9
9
9
10
11
11
12
13
13
13
15
15
15
17
18
18
18
18
19
19
19
21
22
22
22
23
23
23
24
25
25
3
Use of the Genio Engine with Processing on the Remote Database
Exclusive Use of the Transformation Capabilities of Remote Databases
Genio — Superior Productivity
Procedural Scripting Language
Reusability
Tracking Changes
Dynamic Impact Analysis
Auto-Documentation
Versioning
Metadata Management
Looking Forward
What is the Hummingbird Connectivity Vision?
Genio Suite Development Directions
Conclusion
www.hummingbird.com
25
26
27
27
27
27
28
28
28
29
30
30
30
32
4
Abstract
This paper is intended for IT professionals interested in understanding and learning
about Genio Suite™. It begins with an introduction to the product, and then it
explores the product architecture and unique features, and concludes with an
indication of future product directions.
www.hummingbird.com
5
Introduction
The challenge of managing and leveraging all of the data within an enterprise grows
increasingly complex. More and more applications, such as Customer Relationship
Management (CRM), Enterprise Resource Planning (ERP), and Supply Chain
Management (SCM), have become embedded in the enterprise’s daily business, and
combined with Web applications and legacy systems, they have created an elaborate
and complicated IT environment. Many of these applications represent large
investments by the company and yet the data contained in these systems is often
isolated and not easily accessible.
However, in today’s competitive and demanding business environment, organizations
are recognizing the value of analyzing all enterprise data to gain a ‘single version of
the truth’—truths about customer relationships, business performance, and supplier
capabilities. And, this analysis is starting to take place in ‘real time,’ as businesses
operate on 24x7 requirements.
The first, and most challenging, step in this analysis process is Data Integration—
accessing and consolidating the disparate data and systems to feed data
warehouses, operational data stores and analytic applications—which is the basis for
analysis of the entire enterprise. Moreover, to enable faster implementation of
business processes, organizations need a solution that will provide the ability to
interchange data between all systems in their IT environment.
www.hummingbird.com
6
Data Integration for the Enterprise
Genio is Hummingbird Connectivity universal data integration solution that accesses,
transforms, enriches, cleanses and directs information from across the entire
spectrum of enterprise systems and applications. This powerful solution spans the
functional areas of data extract, load, and transformation (ETL) and enterprise
application integration (EAI) to handle today’s large data volumes and ‘near real-time’
requirements. It provides accurate, consistent, and timely information by connecting
any data source to any target system throughout the enterprise. Genio allows
organizations to make more informed, strategic decisions based on a single, accurate
version of the truth.
Genio Suite™
Genio extends an organization’s existing investment in technology and human
resources by seamlessly integrating its IT infrastructure. Genio distributes data
extraction, transformation, and load processes across computing resources and is
platform- and database-independent. This allows organizations to select the
operating system and RDBMS of their choice to store the Genio repository rather
than locking users into a single vendor or proprietary solution.
Adding to this capability is a powerful component of Genio that uses native
capabilities of source and target relational databases to delegate transformation
processes, leveraging existing technology and minimizing network traffic.
Genio Suite Architecture
The Genio architecture is built on an extensible, component-based, hub-and-spoke
design. It features a centralized engine and metadata repository (the hub) and data
sources and targets (the spokes) between which the hub exchanges data. Unlike
other hub-and-spoke architecture products, Genio can optimize data management
processes, avoid bottlenecks and reduce network traffic by leveraging the local
database capabilities.
www.hummingbird.com
7
Figure 1
Genio Architecture
The benefit of a hub-and-spoke architecture with a centralized and open repository is
that organizations can maintain full control of all data exchange processes, business
rules, and metadata that make up any and all projects within the enterprise. This
enhances environment management and empowers knowledge workers to make
better, more efficient use of business intelligence and analytical applications.
Since its initial development, Genio has followed an open and extensible design
concept in order to provide a solid platform for future development, simplifying
development of additional functionality and unifying the look and feel of different
applications in the suite.
This structural architecture has enabled Hummingbird Connectivity to develop a
procedural approach to data transformation and exchange processing that gives
users unlimited capabilities to transform and process all types of data. With this
approach, users are not limited to the functions provided by the tool. Rather, they are
free to develop their own re-usable code transformation to any degree of complexity.
Genio is built on client/server architecture, and incorporates a centralized and open
metadata repository. It can be implemented within a distributed deployment model,
allowing multiple developers to work on projects simultaneously with complete
version control and customized access privileges.
www.hummingbird.com
8
Genio Suite Components
Figure 2
Genio Components
Genio Suite offers an integrated set of components that allows organizations to
design, deploy and maintain data transformation and exchange processes. Genio
Suite main components include Genio Engine, Genio Repository, Genio Designer,
Genio Scheduler, Genio MetaLinks and Genio DataLinks.
Genio Engine
Genio harnesses the power of a scalable, multi-threaded, transformation engine that
brokers information from any source to any target. The Genio architecture supports
distribution and synchronization of data transformation and exchange processes over
multiple Genio engines. This is crucial as data volumes increase in size and as
transformation processes increase in complexity. It allows Genio to leverage the
power of existing distributed computing resources. The Genio Engine supports
Windows®, UNIX (Sun® Solaris, IBM AIX and HP/UX) and popular Linux platforms
(SUSE and Red Hat).
Genio Repository
The Genio Repository stores and manages all aspects of data transformation and
exchange process metadata. All technical metadata (e.g. data structures,
transformation rules…), business metadata (e.g. business rules, Data Flows…) and
production metadata (e.g. programs, logs…) are stored in this repository. This
repository is database-neutral and completely open. It can reside on Oracle,
www.hummingbird.com
9
Microsoft® SQL Server, Sybase Adaptive Server, Sybase SQL Anywhere, IBM UDB,
Informix or MySQL and on any platform supported by these databases.
Each component of a data transformation and exchange process is created as an
object and stored in this repository. Genio also automatically maintains relationships
between those objects and delivers a comprehensive set of dependency
management features. Moreover, Genio automatically identifies any change made to
metadata, and it leverages its dependency management capabilities to deliver a oneof-a-kind dynamic impact analysis. Therefore, every dependent objects impacted by a
change (internally or externally) will be automatically indentified even before the next
data transformation and exchange process is executed. This ensures information
quality and consistency, and reduces the time required to maintain data integration
processes.
Genio Designer
Genio Designer is a multi-user graphical environment for designing data
transformation and exchange processes. Data structures can be imported directly
from source and target systems or using metadata bridges (Genio MetaLinks). Userdefined business rules, functions, and procedures created in Genio Designer are
stored as objects within the Genio Repository, and are completely reusable from
project to project. Genio also incorporates a graphical interface that provides a
complete and powerful graphical procedural scripting environment for designing data
transformation processes of any complexity.
www.hummingbird.com
10
Figure 3
Genio Designer
Genio Scheduler
Genio Scheduler provides the ability to program processes execution either on
calendar basis or on event basis. Genio Scheduler also provides monitoring of
process executions as well as full history and audit-trail reporting. If desired, Genio
Scheduler can work alongside external schedulers like IBM Tivoli® or CA-Unicenter.
The substitution process is straightforward, as it can be implemented using standard
API or command line interface calls to the underlying architecture.
Genio MetaLinks
Genio MetaLinks are ‘pluggable’ metadata bridges embedded in Genio Designer that
enable the import of data structures from CASE tools, ERP systems, XML Schema or
Web Services Description Language documents.
Genio Suite currently provides the following MetaLinks:
•
Genio MetaLink for Web Services
•
Genio MetaLink for XML Schema
www.hummingbird.com
11
•
Genio MetaLink for SAP R/3
•
Genio MetaLink for SAP BW
•
Genio MetaLink for SAP Idoc
•
Genio MetaLink for Sybase PowerDesigner
•
Genio MetaLink for Computer Associates Erwin
Genio DataLinks
Genio DataLinks provide native connectivity to most relational and multi-dimensional
database systems and flat files. Data access is achieved through native APIs,
ADO/OLEDB, pass-through ODBC or generic ODBC. Genio has specific grammar for
all connectivities, which is used to generate native SQL statements. Genio is also
able to natively access files, such as Fixed Length Files (including mainframe flat
files), Delimited Files (CSV…) or XML files, and allows processing of any complex
files (EDI, IDoc, WebLogs, etc.).
Genio also offers the native population of multi-dimensional databases, such as
Hyperion Essbase. With this functionality, users can directly create all hierarchies or
members, set all necessary attributes, and load or refresh cubes. Through native
access, users do not require an additional staging area or complex multi-layer tools
from multiple vendors. There are two advantages to this approach — namely much
better performance due to the elimination of any staging area, and maintenance of
programmatic control of multi-dimensional cubes within the transformation logic.
Genio also gives organizations the ability to access information stored in SAP
applications, combine it with data from other sources, and then share it with other
systems throughout the enterprise. It delivers native connectivity to extract SAP data,
supports bi-directional data interchange with SAP through the SAP IDoc format, and
populates SAP BW with data from other systems.
Genio is certified by SAP for both CA-ALE and BW-STA interfaces.
Genio can also interface with thousands of other systems, by leveraging its Web
Services connectivity. This Universal and comprehensive Web Services support allow
Genio to interfaces with all internal and external Web Services compliant
applications. It also allows Genio to participate in Service Oriented Architectures
(SOA), and integrate other application function in Data Integration processes.
www.hummingbird.com
12
Genio Suite Features
Environment Neutral
Genio is completely platform and database neutral.
All interfaces developed with Genio leverage Genio graphical interface and Genio
procedural language. These features allow Genio users to develop generic business
rules without binding them to any specific environment. Objects created in Genio
Designer are stored in the centralized metadata repository. This centralized
development model eliminates the need to re-code business rules, lookup tables, and
custom functions for each new transformation project. At execution time, the Genio
Engine reloads those metadata driven processes and generates the appropriate code
for the target environment.
This allows the portability of these interfaces on virtually any platform.
Currently Genio supports natively, either in 32 bit mode or in 64 bit mode, 6 main
platforms (Windows, Sun Solaris, IBM AIX, HP/UX, SUSE Linux and Red Hat Linux).
Extraction
Genio extracts from the source databases using native SQL grammar, making it
possible to optimize the use of source database power and minimize network traffic.
This is done by accessing only the source rows that are pertinent to the
transformation work, thereby avoiding loading of all the data into a staging area.
When working with text sources, Genio has a variety of tools to help manage
complex structures like hierarchical data dumps from mainframes or EDI files. Even
when dealing with these kinds of files, the Genio function is still the same, and makes
no distinction if the source or target is a text file or database table.
Supported Sources
File Formats
•
AS-400 Flat Files
•
Delimited Text files (CSV…)
•
Fixed Text files
•
SAP IDoc
•
Standard Flat Files (Cobol)
•
Palm Pilot JFile, JfilePro
www.hummingbird.com
13
•
XML files
RDBMS
•
Microsoft Access 97, 2000, 2002, 2003 (Native, ODBC)
•
Microsoft SQL Server 6.5, 7, 2000, 2005 (ADO/OLEDB, ODBC)
•
Oracle 7, 8, 8i, 9i, 10g (Native, ADO/OLEDB, ODBC)
•
Informix 7.3x, 9.x (ADO/OLEDB, ODBC)
•
Sybase Adaptive Server 11.x, 12.x , 15 (Native, ADO/OLEDB, ODBC)
•
Sybase IQ 12.x (Native SQL, ODBC)
•
Sybase SQL Anywhere 5.5, 6.x, 7.x, 8.x, 9.x (Native SQL, ODBC)
•
Teradata V2R3.x, V2R4.x, V2R5.x, V2R6.x (Native SQL, ODBC)
•
IBM UDB 5.x, 6.x, 7.x, 8.x (ADO/OLEDB, ODBC)
•
MySQL 4.1.x (Native SQL, ODBC)
•
Generic ODBC
Mainframe Connectivity (Native, ODBC)
•
IBM DB2 AS400
•
IBM DB2 MVS, OS/930, Z/OS
•
IBM IMS
•
Adabas
•
VSAM
•
Natural
•
CICS
Other Systems and Standards
•
SAP R/3 (Native)
•
Web Services (Native)
www.hummingbird.com
14
Middleware and Standards Support
Genio supports all middleware providing ODBC, Web Services or command line
interfaces. It also natively supports message-oriented middleware (MOM), such as
IBM WebSphere MQ. Also, Genio can use FTP protocol, MAPI, RSH or any external
application to get access to data or push data to the target environment.
By leveraging these capabilities, Genio provide access to virtually any legacy system,
ERP, CRM, SCM systems and custom applications.
Incremental Extraction
Genio offers multiple strategies to perform incremental extraction. Simple
approaches, like selection limits based on time stamps, use of database log tables
(i.e. Oracle Snapshot) or use of database triggers to capture changes, can easily be
implemented to enable incremental extraction. These techniques are environmentneutral and can be implemented without any additional software investments.
Genio can also leverage its Web Services connectivity to capture data changes in
most applications. By accessing the application layer of operational systems through
Web Services, Genio can get access to any business transactions that occurred in
these systems during a certain period of time (e.g., new purchase orders, updated
product records, recently edited documents…)
Genio also provides specific DataLinks that can automatically capture changes from
legacy databases or VSAM files.
Finally, Genio can also interface with products like Sybase Replication Server and
directly process deltas created by this external middleware. This can also be done by
leveraging MOM solutions or any message-based environment.
Transformation Functions
Genio has a complete set of transformation functions making it as capable as a
programming language, but providing a graphical and optimized user interface to
make the design of transformation routines more productive.
Genio offers roughly 150 generic functions that can be used to build complex
expressions or custom functions. These functions cover the entire spectrum of string,
dates, number or boolean manipulation. Complex clauses—such as IF, THEN, ELSE
or CASE—can also be written in expressions. These functions can be processed
inside the Genio Engine on Windows, UNIX or Linux, but they can also be
automatically translated in native SQL functions in order to execute them on the
database engine side.
Standard Functions
www.hummingbird.com
15
String Functions
ASCII, CHR, CONVERTTO, FILL, FORMAT (from Binary), FORMAT (from Boolean),
FORMAT (from Date), FORMAT (from Numbers), FORMAT (from String), LEFT,
LENGTH, LOCATE, LOCATE (with start), LOWER, LTRIM, MID, MID (with length),
REPLACE, RIGHT, RTRIM, SOUNDEX, STRING (from Date), STRING (from
Numbers), TRIM, UCHAR, UNICODE, UPPER
Date Functions
ADDMONTH, DATE (from three numbers), DATE (from six numbers), DATE (from
seven numbers), DATE (from string), DATE (from string with format), DATEADD
(number of days), DATEADD (specified period of time), DATEDEC, DATEDIFF
(number of days), DATEDIFF (specified period of time), DATEINC, DAY,
DAYOFWEEK, DAYOFYEAR, HOUR, LASTDAYOFMONTH, MINUTE, MONTH,
NANOSECOND, SECOND, TIMESTAMP, TODAY, WEEK, WEEK3, YEAR
Mathematical and Trigonometric Functions
ACOS, ASIN, ATAN, COS, COT, EXP, LOG, MOD, SIN, SQRT, TAN
Numeric Functions
ABS, CEILING, CEILING (with decimal), DTOR, FLOOR, FLOOR (with decimal),
NULLIFZERO, POWER, RND, ROUND, ROUND (with decimal), RTOD, SIGN,
TRUNCATE, VAL, VAL (with decimal separator), VAL (with format), ZEROIFNULL
XML Functions
GetNodeName, GetNodeVale, GetXPath
Aggregate Functions
AVG,
AVG_DISTINCT,
AVG_OVER,
CORR,
COUNT,
COUNT_ALL,
COUNT_DISTINCT, COUNT_OVER, COVAR_POP, COVAR_SAMP, KURTOSIS,
MAX, MIN, PERCENT_RANK, RANK_OVER, REGR_AVGX, REGR_AVGY,
REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX,
REGR_SXY, REGR_SYY, SKEW, SKEW_DISTINCT, STDDEV, STDDEV_DISTINCT,
STDDEV_DISTINCT OVER_1, STDDEV_DISTINCT OVER_3, STDDEV_OVER_1,
STDDEV_OVER_3, STDDEV_POP, STDDEV_POP_DISTINCT, STDDEV_SAMP,
STDDEV_SAMP_DISTINCT, SUM, SUM_OVER_1, SUM_OVER_2, SUM_OVER_3,
VAR, VAR_POP, VAR_POP_DISTINCT, VAR_SAMP, VAR_SAMP_DISTINCT
www.hummingbird.com
16
Miscellaneous Functions
BINARY (from string), BOOLEAN (from string), CASE, GETREGISTRYVALUE,
GETINIFILEVALUE, GENERATEFILENAMEFORDM, IFTHEN, IFTHENELSE,
ISDATE, ISEMPTY, ISNULL, ISNUMERIC, MAX, MIN, NBEXCEPTIONS, SCASE,
SLEEP, SYSVAR, THREADID, VALUEOF, NVL
Operators
Arithmetic Operators
+, –, *, /
Boolean Operators
NOT, AND, OR
Relational Operators
= <>, <, >, <=, >=
Additional Operators
BETWEEN, BITWISEAND, BITWISEOR, BITWISEXOR, Concatenation (&), IN,
LIKE, NOT IN, NOT LIKE
Using these standard functions, Genio users can create their own macros to describe
business rules. For example, a function ‘Discount’ can be calculated from a given
sales amount and used everywhere across the Genio transformations, being
processed inside the Genio Engine or on a remote database.
Support of Stored Procedures and SQL Functions
Genio can also invoke stored procedures or any piece of SQL code that can be
executed on databases. These SQL scripts can be declared in Genio’s repository to
guarantee the reusability of existing code defined within relational databases, either
source or target. These stored procedures and SQL Functions can either be used to
retrieve data, either to extend Genio transformation feature set or simply to trigger
external processing on the database side. This also enables better distribution of
www.hummingbird.com
17
processing by allowing the use of remote databases transformation functions within
data integration processes. For example, Oracle sequences can be reused this way.
Support for External Functions
To extend the processing capabilities of Genio, it is also possible to use any legacy
function in a DLL written in C++ or any other language. These external functions are
declared once in Genio Designer, and can be used seamlessly in all Genio
transformations, thereby preserving legacy investments.
Also, Genio can also call any Web Service, executable, external batch or shell script
for specialized transformation needs.
National Language and Unicode Support
Genio delivers a comprehensive National Language Support and Unicode support.
It allows simultaneous connections to multiple systems encoded in different character
sets and exchanges data between these systems. Genio supports most single byte,
double-byte and multi-byte character sets as well as Unicode. Whenever it’s possible,
Genio can convert data from one character set to another and simultaneously
manipulate strings encoded in different code page.
Genio user interface also fully supports Unicode. It allows manipulation of metadata
encoded in different character sets, and delivers support for international
development teams.
Mapping
Genio provides different ways to define mapping. Whenever possible, the tools can
automatically detect mapping based on field names, field order or any custom
algorithm. Also, simple graphical mapping from the source to target can be manually
defined using drag and drop functionality, and more complex mapping can be done
using the Genio graphical procedural language.
Joins, Aggregation, Distinction, Filter and Sorting
When multiple sources, heterogeneous or otherwise, are required, users are able to
define datasets (logical view on the information system) to de-normalize, join,
aggregate, sort and distinguish data from the various source systems.
These datasets can combine multiple objects from heterogeneous systems, and can
also be used in other datasets.
www.hummingbird.com
18
These operations are defined graphically inside Genio, with no need to write SQL
code. Nevertheless, they cover the entire functional spectrum of relational database
features. One can define regular joins, external left, right joins, full outer joins,
calculated joins, or recursive joins involving the same table or view several times,
through aliases that Genio manipulates transparently. Filters are transformed in
WHERE or HAVING clauses, and sorting becomes ORDER BY. Genio recognizes
each SQL grammar, thus adapting itself to the source or target DBMS.
Transaction and Nested Transaction Support
Genio uses the traditional transactional mechanisms of relational databases,
COMMIT/ROLLBACK, on one or more databases. Transactions can be distributed on
several systems. They can occur at any moment during the execution of the
interface, for each single treatment or for each important step inside a single
treatment, for each functional unit. For example, it is possible to validate a transaction
for each client change when processing a single table containing invoices. The
COMMIT/ROLLBACK mechanisms can be triggered automatically or conditionally,
based on data test results on previous interface executions. This is typically used to
add failover capabilities to existing interfaces.
Genio can do conditional COMMIT/ROLLBACK for any module within a process. This
means that Genio Designer can create a complex condition for loading and data
quality checking, and decide whether to commit or rollback the load.
Validation against Metadata
Genio is completely metadata-driven and always validates the business rules and
changes against the metadata and operational sources and targets. While it is
possible to write logically incorrect statements, their physical grammar is
automatically tested and validated by Genio.
Metadata can be defined using Genio Designer, either by manually editing it, or by
importing it. Metadata can be imported from database schema, using sample files or
using Genio MetaLinks. By using third party solutions, such as MetaIntegration, it’s
also possible to implemented bi-directional metadata exchange with complementary
software such as BI tools (BO, Cognos…), metadata repositories or industry
standards (CWM…)
Loading
Genio has multiple loading strategies—single row, packet and bulk. In certain cases,
the loading can be done by the source database directly, when the developer has
decided to bypass the engine altogether.
www.hummingbird.com
19
Explained later in ‘Genio Unique Processing Methodology’.
Genio’s design environment does not impose a pre-defined methodology to
implement data loading processes. It has been designed to be highly productive and
generic to support different kinds of needs. Genio provides a comprehensive and
flexible solution that supports both a full data refresh strategy as well as data updates
when needed.
Delete, Insert, and Update strategy are all supported natively. Genio also provides
high level, user-friendly commands such as SmartInsert and SmartUpdate, designed
to simply perform row additions or updates in tables (database Merge).
Supported Targets
File Formats
•
AS-400 Flat Files
•
Delimited Text files (CSV…)
•
Fixed Text files
•
SAP IDoc
•
Standard Flat Files (Cobol)
•
Palm Pilot JFile, JfilePro
•
XML files
RDBMS
•
Microsoft Access 97, 2000, 2002, 2003 (Native, ODBC)
•
Microsoft SQL Server 6.5, 7, 2000, 2005 (ADO/OLEDB, ODBC)
•
Oracle 7, 8, 8i, 9i, 10g (Native, ADO/OLEDB, ODBC)
•
Informix 7.3x, 9.x (ADO/OLEDB, ODBC)
•
Sybase Adaptive Server 11.x, 12.x , 15 (Native, ADO/OLEDB, ODBC)
•
Sybase IQ 12.x (Native SQL, ODBC)
•
Sybase SQL Anywhere 5.5, 6.x, 7.x, 8.x, 9.x (Native SQL, ODBC)
•
Teradata V2R3.x, V2R4.x, V2R5.x, V2R6.x (Native SQL, ODBC)
•
IBM UDB 5.x, 6.x, 7.x, 8.x (ADO/OLEDB, ODBC)
•
MySQL 4.1.x (Native SQL, ODBC)
•
Generic ODBC
www.hummingbird.com
20
Mainframe Connectivity (Native, ODBC)
•
IBM DB2 AS400
•
IBM DB2 MVS, OS/930, Z/OS
•
IBM IMS
•
Adabas
•
VSAM
•
Natural
•
CICS
Multi-dimensional Databases
•
Hyperion Essbase 4.x, 5.x, 6.x, 7.x
•
Oracle Express 5.x, 6.x
Other Systems and Standards
•
SAP R/3 (Native)
•
SAP BW (1.2, 2.0, 3.0) (Native)
•
Web Services (Native)
Scheduling
Genio includes a complete scheduling facility, making it possible to schedule process
execution at a fixed time, periodically (daily, weekly, monthly), triggered by outside
events or from the polling service (files based events). Data Integration processes
can also be triggered by external events or Message Oriented Middleware (MOM)
such as IBM WebSphere MQ. Combining these functions, the Genio developer can
build as complex scheduling rules as necessary.
Genio Scheduler is not always required, as there is support for external applications
setting Genio variables, launching processes and receiving the result of the process.
This makes it very easy to implement use of system management applications like
IBM Tivoli or CA Unicenter. The substitution process is straightforward, and it can be
implemented on UNIX, Linux or Windows using standard API calls or command line
utilities.
www.hummingbird.com
21
Failover Capabilities
Genio does not impose any methodology on failover functionality. The open
architecture enables the developer to use any preferred technology for failover
systems—including “power-off” re-starts or complex rules on continuity.
To permit such implementations, Genio provides the basic features required to
implement complex failover strategy. When triggering a process execution, Genio
users are able to define the list of Genio engines and timeout for each one.
If the process execution fails, Genio Scheduler can automatically trigger a fail
process that will implement desired failover strategy or restart the same process.
Within each process, users can define restart points and therefore automate process
restart.
Hummingbird Connectivity training and best practices routinely teach these different
approaches, and can help Genio developers find the best approach for each specific
situation.
Performance Measurement
Genio has a unique performance meter inside its logs. All the different tasks are
timed, including the module coherence tests and SQL statement performance, as
well as the load processes. Also the volume of impacted data on every single target
systems is readably available in these logs. As a result, the Data Integration process
administrators can easily spot any potential performance problems.
Genio can automatically email this report to the developer or the system
administrator after each execution as well as keep it in the repository, making it
possible to analyze the performance.
It is also possible to use performance measurement tools to detect and isolate
networks, machines or any other potential bottlenecks.
Process Optimization and Tuning
Once again, Genio openness delivers multiple ways of optimizing process
executions.
Genio offers ways of offloading part of, or the entire process execution on the source
or the target system.
For more information, see ‘Genio Unique Processing Methodology’ section.
Also, Genio provides multiple reading and writing strategies (single, packet, and bulk)
that enable Genio users to optimize data movements based on the topology of their
information system.
www.hummingbird.com
22
When loading needs to be optimized, indexes can be dropped and restored on the
target so that the database engine can accept rows at the speed they are sent from
the Genio Engine. This is very simple to achieve with Genio’s SQL procedures and
SQL functions.
Also, when required, SQL statements can be tuned using the database SQL analyzer
or database hints that will ensure maximum utilization of resources. The tuning of the
target is usually based on the performance of the database and the type of end-user
tools used, and how they are used. Typically, the database environment provides
tools to assist tuning by showing statistics on the use of disk, indexes and processor
time for queries.
Data Quality Management
Genio delivers data cleansing capabilities through its own functions or through
partner products. Using Genio built-in functions (String functions, Soundex…), and
Genio procedural language, it’s possible to implement a basic cleansing process.
Also, by leveraging third party products, such as Harte-Hanks Trillium Software or
SPAD DQM…, it’s possible to implement a more complex cleansing process
involving address cleansing, pattern matching, and etc.
Error Handling
Genio, through its graphical procedural language, also delivers exceptional error
handling capabilities. Genio automatically reports all errors and anomalies in its log.
Technical exceptions (Datatype issues, Constraint violation…) are automatically
handled by the tool. While other exceptions types, such as business rules driven
exceptions, can be handle through user-defined exceptions. The Genio user can then
implement various exception handling strategies and decide if the execution should
be stopped after certain number of exception, or if incriminated data should be output
into rejection files. All of this can be done using Genio procedural language, which
provides the Genio user with an easy and comprehensive mechanism for error
handling. It is possible to define any logical test and implement virtually any type of
processing according to organization business rules.
Audit and Monitoring
All process execution information is logged into the Genio repository. Genio logs
everything from used SQL statements to all timing information and anomalies —
either system or designer defined exceptions. This log information can be accessed
from the Genio Scheduler client, or by using reporting tools to query Genio’s open
repository. It is also possible to use email as a delivery vehicle to inform DBAs or
www.hummingbird.com
23
system administrator of processing results. Genio can also package the logs into
different formats (Text, XML…). These can be sent as SMS messages to mobile
phones or via email.
Parallelism and Process Slicing
Genio has native multi-threading capabilities that enable processing tasks to be
performed in parallel—it communicates through global variables and events between
the processes. Increased performance can be achieved by splitting the load work
across multiple CPUs or across multiple physical servers, depending on where the
bottleneck resides.
Genio has a simple facility to perform source slicing based on the RDBMS sources.
Again this architecture is open and does not impose a foreign methodology on the
developer. Also, Genio gives users an easy way of defining multiple execution
contexts that will handle a subdivision of the entire process with its ‘Running Context’.
www.hummingbird.com
24
Genio — Unique Processing Methodology
Genio offers a unique methodology that distributes transformation workload by
offloading certain tasks to idle database engines during off-peak hours to maximize
efficiency and system performance. The following graphics and descriptions depict
the three modes of transformation processing that Genio offers:
Use of the Genio Engine Exclusively
The first mode of transformation processing involves Genio extracting data from the
source database, transforming it within its own engine, and then populating target
databases directly. This is the standard ETL approach. This architectural schema is
recommended when sources and targets are heterogeneous, or when the required
transformations cannot be performed by remote source or target databases.
Figure 4
Data Access Method One
Use of the Genio Engine with Processing on the Remote
Database
The second Genio data access mode involves extracting data and transforming it on
the source database platform. The resulting data is then transmitted through the
Genio platform, and loaded into the target database where any further transformation
is carried out. This is the ELT or ETLT approach. If possible, aggregations are
processed on the source, thereby reducing the volume of data transmitted between
source and target and, as a result, the required bandwidth to move data through the
network. In this mode, data is transported through Genio and copied without
transformations to the target.
Figure 5
Data Access Method Two
www.hummingbird.com
25
Exclusive Use of the Transformation Capabilities of Remote
Databases
With the third model, the source and target are on either part of the same logical
server or visible to each other (using database link). In this situation, it is not
necessary for the data to leave the server or to transport the data through a
communication layers (Network). In this context Genio can operate as a dynamic
code generator, and only send SQL orders that have been adapted to the relational
database. The RDBMS manages the extractions, transformations, and insertions (or
updates). As a result of this architecture, this model outperforms the two previous
ones. This is because no network bandwidth is used and it fully exploits the
processing capabilities of this database platform, including Massive Parallel
Processing (MPP) architectures.
Each data access mode is accessible through a common user interface and data
integration processes. Leveraging these various modes are defined using the same
graphical metaphor and programming methodology. By maximizing user control over
data flow, the Genio data access architecture enables users to significantly improve
the performance of their data exchange processes. Being able to select, manage and
summarize only relevant data and control, the platform on which work is executed will
vastly improve performance.
Regardless of which data access model is chosen, Genio impact analysis capabilities
are maintained, ensuring that if changes are made to any element of the data
exchange process, administrators are notified prior to the next scheduled execution.
Figure 6
Data Access Method Three
www.hummingbird.com
26
Genio — Superior Productivity
Procedural Scripting Language
Genio provides a graphical scripting language, designed specifically for simple and
complex data transformation, that supplies the transformation power typically found
only in traditional 4GLs. The language permits the use of loops, tests, variables, as
well as IF, ELSE, THEN and CASE statements.
Procedural transformations can be performed on individual rows of data, on a set of
rows, or on an entire table. The ability of Genio to manipulate data on a row-by-row
basis allows the development of extremely complex transformations. The ability to
support complex transformations directly within the Genio procedural language is
critical as it permits the full power of impact analysis.
Reusability
Objects created in Genio Designer are stored in the centralized metadata repository,
making them available for re-use. Objects are developed once and deployed many
times, providing consistency across all data integration processes. The centralized
development model of Genio eliminates the need to re-code business rules, lookup
tables, and custom functions for each new data integration project.
Also, Genio allows the reuse of external code by providing native support for Stored
Procedure, SQL Function, DLL function, Shell commands and Web Services.
This enables organizations to capitalize on existing investments and simplifies the
reuse of existing IT developments within Genio processes.
Tracking Changes
Genio provides the ability to track differences between an object definition stored in
the Genio Repository and the state of the same object, as it exists in the remote
system (physical object in the source or target). By utilizing the ‘Track Changes’
wizard, Genio users can automatically detect and import changes into the Genio
repository. Every change made to an object, whether it’s located in a remote
database or in Genio, is also stored and available for documentation purposes. This
means that Genio is always consistent with data structures, as they exist on remote
sources and targets, ensuring data accuracy and consistency in every data
transformation and exchange process.
www.hummingbird.com
27
Dynamic Impact Analysis
Genio provides a unique one-of-a-kind Dynamic Impact Analysis solution. The
consistency and status of objects are automatically maintained in real-time within the
design environment. Genio Designer provides two impact analysis modes—
immediate and deferred.
In the immediate impact analysis mode, Genio immediately checks the impact of a
change on an object and on all dependent objects. If the change impacts the integrity
of an object, its status is automatically changed to ‘Invalid Object’.
When an object is modified in the deferred impact analysis mode, Genio changes its
status to ‘Undefined Object’. Objects with this status can be verified at a later date.
This impact analysis is triggered either by changes made by developer within the
Genio environment or by the ‘Track Changes’ feature. For example, if a source data
structure changes, the ‘Track Changes’ feature detects it, and the impact analysis
identifies the effect the change on all related interfaces. Genio’s impact analysis
eliminates the need for developers to spend time manually tracking down
dependencies whenever a change is made. Genio provides a persistent list of invalid
and undefined objects, allowing developers to know the exact state of their metadata,
and the immediate consequence of making a change to it. This dynamic impact
analysis helps developers fix impacted objects by providing a thorough description of
required changes, and even auto-correction mechanisms. This decreases the
maintenance cycle and increases developer productivity.
Auto-Documentation
Genio Designer automatically manages the documentation of Genio projects,
including dependencies between objects, modification history and comments.
Genio users can automatically print or generate HTML documentation, dependency
graphs or dataflow graphs at any time. This significantly reduces documentation
efforts, and ensures the accuracy of project documentation.
Versioning
Genio is built on client/server architecture and leverages an open metadata
repository. It can be implemented in a centralized or distributed deployment model,
allowing multiple developers to work on projects simultaneously with complete
version control and customized access privileges.
Genio natively supports version management and status management
(Development, Test, Production…). All versions of data integration projects are
independent from each other and can be used in parallel. All objects in theses
www.hummingbird.com
28
projects have timestamps for creation and modification as well as user information
and comments. History of data structure modifications is also maintained
automatically by the tools.
Metadata Management
With the vast amount of information that organizations currently have at their
disposal, there is an ever-increasing need to collect, manage and reuse that
information. Organizations want to know what information they possess, its location,
its origin, and its size. This ‘data about data’ is called metadata. It can describe any
characteristic of the data—such as the content, its structure, its quality, or any
attributes related to its processing or changes.
Quite simply, metadata is an important catalog of information from any number of
sources—data warehouses, data exchange tools, business intelligence tools, ERP
CRM, SCM, business process modeling, workflow, data quality tools, ECM systems,
or any other application dealing with data.
Metadata secures the lineage of data, enabling knowledge workers to gain access to
business rules, and to understand where the data came from and how it has been
handled to date. This makes the time they spend on query and analysis activity more
productive.
Metadata management provides critical access for both business users and technical
users working with the data. Depending on the type of user, metadata can serve
either as a blueprint to the inner technical workings of the warehouse, or as a
roadmap to assist in navigating the warehouse and locating useful information.
Metadata also delivers valuable help to organizations when it comes to their
compliance with regulatory rules.
Genio repository contains all metadata used by data integration processes. This
metadata is made available to users through Genio tools—either by querying Genio’s
open database repository or though XML datagrams.
www.hummingbird.com
29
Looking Forward
Today, the data integration market includes several tools. The 3 main categories that
are used to implement data exchange are ETL (Extract, Transform and Load), EAI
(Enterprise Application Integration) and Enterprise Information Integration (EII) tools.
ETL, EAI and EII architectures, share many common concepts—although they
address different business requirements and provide specific functionalities.
Generally speaking, ETL tools are primarily batch data exchange solutions, focusing
on ‘Technical’ data and working mainly with RDBMS.
On the other hand, EAI tools are considered real-time data exchange solutions,
focusing on ‘Business’ data and working mainly with message-oriented middleware.
However, both product categories often deal with the same data, and implement the
same business rules to process this data.
EII tools provide data abstraction to address the data access challenges associated
with data heterogeneity and data contextualization. They provide a dynamic view on
multiple source systems and enable unified access to the data residing in those
various systems.
Today, organizations that want to implement these 3 types of data exchange
solutions typically have to use different technologies and products, resulting in
increased total cost of ownership of the data integration platform.
What is the Hummingbird Connectivity Vision?
To increase their competitiveness, organizations have to make more informed,
strategic decisions based on a single and accurate version of the truth. This means
that they require a single solution that will deliver them accurate, consistent and
timely information by connecting their data sources to any target system throughout
the enterprise.
To enable organizations to implement this vision, Hummingbird Connectivity already
delivers a comprehensive data integration solution that offers a strong ETL package,
with some extensions into the EAI world (XML connectivity, support for IBM
WebSphere® MQ, Support for Web Services...).
Genio Suite Development Directions
To further incorporate this model, Hummingbird Connectivity intends to develop and
extend its data integration product line to deliver a single solution that will address all
business needs, in a comprehensive metadata driven environment that operates as
an enterprise wide information broker.
www.hummingbird.com
30
Hummingbird Connectivity will continue to extend and enhance existing
functionalities of the product—including productivity, openness and ease of
maintenance. Multiple improvements will be introduced to the user interface to
ensure even better productivity. Interoperability and connectivity will be extended to
guarantee maximum integration with IT environments. Finally, impact analysis,
dependency management and auto-documentation will be enhanced to guarantee
minimum maintenance time.
In order to respond to the increasing demand for data processing; performance,
scalability and high availability will also receive specific attention. The Genio Engine
and Genio DataLinks will be further enhanced and improved to guarantee maximum
usage of resources. To enhance scalability, Hummingbird Connectivity will work on
enhancing workload repartition on multiple CPUs or clustered environments, and the
failover capabilities will be strengthened to guarantee 24/7 availability.
www.hummingbird.com
31
Conclusion
It would be accurate to say that data just might represent one of the last great
competitive advantages for today’s businesses. If this is true, then companies taking
advantage of a robust, flexible, and fully functional data integration solution are going
to be leaders in their market.
Because they address the complex requirements of sophisticated IT environments,
data integration solutions are at the technology forefront for many organizations. With
its advanced architecture, support for the most complete range of database and
hardware systems, and its near real-time functionality, Genio Suite offers companies
around the world a best-of-breed solution for today and for the future.
www.hummingbird.com
32
Sales & Support
Canada
38 Leek Crescent,
Richmond Hill, Ontario,
L4B 4N8
Phone: +1 905-762-6400
Fax: +1 905-762-6407
Toll Free: 1 877-359-4866
Corporate Head Office
www.hummingbird.com
[email protected]
[email protected]
www.opentext.com
[email protected]
North America Support
North America Sales
1 800-486-0095
1 800-499-6544
International Sales
Worldwide Support
+800-4996-5440
+1 905-762-6400
If you’re a Hummingbird Connectivity partner or customer, visit www.hummingbird.com or online.opentext.com
for more information about this and other Open Text solutions.
Open Text is a publicly traded company on the NASDAQ (OTEX) and the TSX (OTC).
Copyright © 2007 Hummingbird Ltd. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved.
Hummingbird Connectivity is a Division of Open Text Corporation. Printed in Canada. WP-03-00-EN-142.12/07
www.hummingbird.com
33

Documents pareils