Towards a new automatic data warehouse design method, Nawfal El Moukhi, Ikram El Azami, Abdelaaziz Mouloudi & Abdelali Elmounadi
e-TI, Revue électronique des Technologies de l'Information. http://www.revue-eti.net, Numéro 11. 2018. ISSN 1114-8802
Towards a new automatic data warehouse design method
Vers une nouvelle méthode automatique de conception des entrepôts de données
Nawfal El Moukhi
MISC Laboratory, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
elmoukhi.nawfal@gmail.com
Ikram El Azami
MISC Laboratory, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
akram_elazami@yahoo.fr
Abdelaaziz Mouloudi
MISC Laboratory, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
mouloudi_aziz@hotmail.com
Abdelali Elmounadi
LASTIMI, Mohammadia School of Engineers, Mohammed V University, Rabat, Morocco
a.elmounadi@gmail.com
Résumé
Les entrepôts de données sont actuellement reconnus comme étant un composant essentiel
des systèmes d'aide à la décision dans la mesure où ils offrent la meilleure réponse aux
problèmes de prise de décision des différents domaines fonctionnels des organisations.
Cependant, la conception et la construction des entrepôts de données demeurent une tâche
très complexe, difficile à accomplir. Cette complexité est essentiellement la conséquence de
l'absence de techniques et de méthodes reconnues dans le domaine. Dans ce contexte, cet
article identifie différentes règles pour la conception des entrepôts de données à partir de
données relationnelles et introduit une nouvelle méthode pour l'automatisation de ce
processus en se fondant sur les technologies MDA et XML.
Abstract
The data warehouse is now recognized as an essential component of decision support
systems, since it helps answer the decision problems of the different functional areas
of an organization. However, designing and building a data warehouse remains a very
complex task. This complexity is mainly due to the absence of widely recognized
techniques and methods in the field. This paper therefore identifies a set of rules for
designing a data warehouse from relational data and introduces a new method that aims
to automate this process using MDA techniques and XML.
Mots clés
Entrepôt de données, modèle relationnel, modèle multidimensionnel,
conception d'entrepôts de données, Architecture orientée modèle
Keywords
Data warehouse, Relational model, Multidimensional model,
designing data warehouses, Model Driven Architecture.
1. Introduction
The design of data warehouses is one of the most complicated issues in business intelligence.
Its complexity is due to the proliferation of data types on the one hand and, on the other, to
the abundance of information that gave rise to the era of Big Data. Furthermore, data
warehouse design is the first and most important step in the decision-making process, since
all other steps of the process (data transformation, data analysis, information extraction,
online analytical processing (OLAP), etc.) depend largely on the quality of the designed and
adopted model. For these reasons, the issue has attracted researchers' interest since the
1990s, when the first research work on the structure of data warehouses appeared.
Much effort has been devoted to data warehouse design, and several methods that automate
data warehouse modeling have been developed, but none of them has achieved consensus (Gosain
and Singh, 2015). Despite the lack of a standard model, it is widely assumed that data
warehouse design must follow the multidimensional paradigm (Kumari and Yadav, 2015) and
must be derived from the data sources, since a data warehouse is the result of homogenizing
and integrating the relevant data of the organization in a single, detailed view (Taniar and
Chen, 2011). Other research considers user requirement analysis crucial in data warehouse
design (Abai et al., 2013), and some experts have therefore developed a method that supports
both approaches (Battaglia et al., 2011).
In this paper, we propose a new method that follows the data-driven paradigm to design a data
warehouse from relational data sources. We opted for this approach because it saves
significant time: the data warehouse design project can start as soon as the transactional
data are available. The choice of relational data is explained by their widespread use in all
types of organizations (Ghosh, 2010).
The rest of this paper is organized as follows: Section 2 reviews related work. Section 3
presents a set of rules for performing the transformation from a relational data model to a
data warehouse model. Section 4 describes the transformation method, which applies this set
of rules, and then the transformation engine. The final section concludes and describes the
perspectives of this work.
2. Related work
The design of data warehouses has been the subject of several research projects. Generally,
existing approaches fall into three categories: bottom-up, top-down and mixed approaches.
Bottom-up approaches start from a detailed analysis of the data sources but overlook the
decision makers' needs. The works of (Golfarelli and Rizzi, 1998), (Moody and Kortink, 2000),
(Vrdoljak et al., 2003), (Varga, 2002) and (Sehgal and Ranga, 2016) present different
approaches for generating the data warehouse schema from Entity/Association diagrams of the
data sources. The methods proposed by (Romero and Abelló, 2010) and (Jensen et al., 2004)
exploit data mining techniques and ontologies to generate the multidimensional schema. There
are also solutions from large companies such as Oracle, which proposed a set of tools that
transform a logical structure into a relational structure and then into a multidimensional
warehouse model in a star or snowflake schema (Drzymala et al., 2012).
Top-down approaches (Winter and Strauch, 2003), (Annoni et al., 2006) and (Jovanovic et al.,
2014) build the data warehouse schema from a detailed analysis of the decision makers' needs.
Verification of the correspondence between these needs and the data sources is done a
posteriori.
Mixed approaches (Phipps and Davis, 2002), (Giorgini et al., 2005) and (Abdelhedi and
Zurfluh, 2013) consider both the needs of decision makers and the source data. Therefore,
they have the advantage of designing multidimensional schemas that respect the data source
structure.
If we analyze these different methods, especially those following the data-driven (bottom-up)
approach, we can see that they are all semi-automatic, and some of them only provide
guidelines and recommendations for obtaining suitable multidimensional schemas. In this
context, our work consists of developing a new, fully automatic method called X-ETL. This
method is intended to transform a relational model into a multidimensional model without any
human intervention.
3. Rules for data warehouse design from relational data
This section presents the set of rules that we have developed from previous work (Khouri et
al., 2014) (Elmoukhi et al., 2015) (Khnaisser et al., 2015) (Santos et al., 2016) (Dahlan and
Wibowo, 2016). These rules form the foundation of our solution to standardize data warehouse
design.
3.1 Rules for Facts and Measures
• The fact tables are the concepts of main interest for the decision-making
process. They correspond to events that occur continually in the organization
or company (Chandwani and Uppal, 2015);
• The measures of the fact table should be numeric and additive (at worst
semi-additive) (Akbar et al., 2013);
• The data of a fact table are fixed and cannot be changed (Bliujute et al.,
1998);
• A fact table always represents a particular activity and should be
queried from a particular context (one or a few dimensions);
• No row of the fact table can contain an empty value;
• A fact table contains only the foreign keys which represent the primary keys
of the dimensions, and these keys must be numeric so that the fact table is
more efficient (Rudra and Nimmagadda, 2005);
• Each combination of dimension values defines an instance of the fact table,
which is characterized by one and only one value for each measure.
Below are the mathematical representations of the rules for facts and measures:
Let $T_F$ be a fact table, $M_{T_F}$ the set of measures of the fact table, $D_i$ a dimension
of the fact table, and $m$ an instance of $M_{T_F}$.
• $T_F = P(E_v)$
with:
$P$: main interests
$E_v$: company events;
• Let $m_1$ and $m_2$ be two instances of $M_{T_F}$. If $m_1$ and $m_2$ are additive:
$\exists\, m_3 = m_1 + m_2$
with $m_3$ an instance of the same measure $M_{T_F}$;
• Suppose that $f$ is a change function on $T_F$:
$\forall m \in M_{T_F},\ f(m) = \alpha$
with $\alpha$ a constant;
• Let $F$ be the set of fact tables and $A$ a particular activity of the organization.
For each $T_F \in F$ we have:
$T_F = A$;
• $\forall\, T_F$, there is at least one function $f$ which applies at least one dimension
$D_i$ on $T_F$;
• Let $L_{T_F}$ be the set of rows of a fact table and $l$ a row of $L_{T_F}$:
$\forall l \in L_{T_F},\ l \neq \emptyset$;
• Let $f$ be a foreign key and $p$ a primary key. We have:
$\{f_1, f_2, \dots, f_n\} = \{p_1, p_2, \dots, p_n\}$
$\forall k \in \{1, 2, \dots, n\}$, with $f_k \in T_F$ and $p_k \in D_i$,
and $f_k$ and $p_k$ of type Integer;
• Let $C$ be the set of combinations of dimension values, $c$ a combination of $C$,
and $f$ a function on $M_{T_F}$:
- For each instance $m$ of $M_{T_F}$, the combination $f(m) \in C$.
- For each combination $c$ of $C$, the equation $f(m) = c$ admits a unique
solution (any combination $c$ of $C$ admits a unique antecedent in $M_{T_F}$):
$f$ is bijective.
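As an illustration, several of the fact-table rules above can be checked mechanically on sample rows. The following Python sketch is not part of the X-ETL implementation; the function name check_fact_rows, the row layout (one dictionary per row) and the column names used in the usage note are assumptions made for this example. It checks that no value is empty, that foreign keys are integers, that measures are numeric, and that each combination of dimension keys identifies a single fact instance.

```python
def _is_integer(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_fact_rows(rows, foreign_keys, measures):
    """Return a list of rule violations found in the given fact rows.

    rows         -- list of dicts, one per fact row (hypothetical layout)
    foreign_keys -- names of the columns that reference dimensions
    measures     -- names of the measure columns
    """
    foreign_keys = list(foreign_keys)
    measures = list(measures)
    violations = []
    seen_combinations = set()
    for index, row in enumerate(rows):
        # Rule: no row of the fact table can contain an empty value.
        if any(row.get(column) is None for column in foreign_keys + measures):
            violations.append(f"row {index}: empty value")
            continue
        # Rule: foreign keys must be numeric (integers here).
        if not all(_is_integer(row[fk]) for fk in foreign_keys):
            violations.append(f"row {index}: non-integer foreign key")
            continue
        # Rule: measures should be numeric.
        if not all(_is_number(row[m]) for m in measures):
            violations.append(f"row {index}: non-numeric measure")
        # Rule: one combination of dimension values, one fact instance.
        combination = tuple(row[fk] for fk in foreign_keys)
        if combination in seen_combinations:
            violations.append(f"row {index}: duplicate dimension combination")
        seen_combinations.add(combination)
    return violations
```

For example, calling check_fact_rows with foreign keys FKPlace, FKDate, FKCustomer and FKProduct and the measure Quantity returns an empty list when every row is complete and every key combination is distinct.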
3.2 Rules for Dimensions and Attributes
• The dimensions determine how fact instances can be meaningfully
aggregated for the decision-making process;
• A fact table must always include the time dimension;
• The dimensions should have numeric primary keys;
• The primary key of each dimension table should be unique (preferably
auto-increment), and fields should hold atomic (non-compound) values;
• The dimension hierarchies should preferably have a simple 1-n form
and avoid n-n relationships;
• A non-dimensional attribute contains additional information on an attribute
of the hierarchy and is linked to it by a to-one relationship (Golfarelli et al.,
1998);
• The non-dimensional attributes cannot be used for aggregation (Golfarelli et
al., 1998);
• The relationship between a fact table and a dimension is always
many-to-one (Cavalheiro and Carreira, 2016).
Below are the mathematical representations of the rules for dimensions and attributes:
Let $T_F$ be a fact table and $D_i$ a dimension of the fact table.
• Let $f$ be an aggregation function on $T_F$:
$f$ is significant if and only if $f$ applies one or more dimensions on the
instances of $T_F$;
• $\forall\, T_F:\ T_F \ni D_{it}$, with $D_{it}$ a time dimension;
• Let $C_p$ be the set of primary keys of $D_i$:
$\forall D_i,\ C_p \subseteq \mathbb{N}$;
• Let $C_p$ be the set of primary keys of $D_i$, and $p_1$ and $p_2$ two distinct instances of $C_p$:
$\forall p_1, p_2,\ p_1 \neq p_2$;
• Let $R$ be a relationship between two dimensions:
$\forall R,\ R \neq (n, n)$;
• Let $f$ be an aggregation function, $A$ the set of its attributes, and $a_n$ a
non-dimensional attribute.
For any non-dimensional attribute $a_n$ we have:
$a_n \notin A$;
• Let $R$ be a relationship between $T_F$ and $D_i$:
$\forall\, T_F, D_i,\ R = (n, 1)$.
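Some of the dimension rules above also lend themselves to simple automated checks. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, natural numbers are taken to be non-negative integers, and a fact table is described by a mapping from foreign-key column to the name of the referenced dimension.

```python
def check_dimension_keys(keys):
    """Primary keys of a dimension must be natural numbers and pairwise distinct."""
    keys = list(keys)
    all_natural = all(
        isinstance(key, int) and not isinstance(key, bool) and key >= 0
        for key in keys
    )
    # The uniqueness test only runs once every key is known to be an integer.
    return all_natural and len(set(keys)) == len(keys)


def has_time_dimension(fact_foreign_keys, time_dimensions):
    """A fact table must reference at least one time dimension.

    fact_foreign_keys -- dict mapping foreign-key column to dimension name
    time_dimensions   -- set of dimension names known to be time dimensions
    """
    return any(target in time_dimensions for target in fact_foreign_keys.values())


def is_allowed_hierarchy_link(multiplicity):
    """Links between dimensions must not be many-to-many, i.e. R != (n, n)."""
    return tuple(multiplicity) != ("n", "n")
```

A schema generator could run such checks on its output before accepting a candidate dimension.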
3.3 Application of Rules
We take as an example the model below. It represents the sales activity of a set of products in
a chain of stores located in several cities and countries. The model is composed of a purchase
table related to three tables:
- The product table, containing the reference, the description, the price and the type of the
product sold. We assume that a purchase concerns one and only one product, but a
product may appear in several purchases;
- The customer table, which contains the customers' information;
- The city table, which identifies where the store is located (a city can contain only a single store).
The city table is in turn related to the department table (a department groups a set of cities),
the department table to the region table, and the region table to the country table.
Department(#Number, Name, FKRegion)
Customer(#CustomerCode, LastName, FirstName, BirthDate, Gender, FKCityPC, FKCityCity)
City(#PostalCode, City, FKDepartment)
Purchase(#IdPurchase, Quantity, Timestamp, FKCityPC, FKCityCity, FKCustomer, FKProduct)
Product(#Reference, Description, Price)
Figure 1. Example of a sales transactional model
By following the rules presented in the previous sections, the fact table will be the Purchase
table. It represents a particular activity of the company and the main interest for decision
makers. It contains a numeric and additive field (Quantity), which can be considered as the
main measure of the fact table. In addition, this table has the highest number of many-to-one
relationships (the highest number of foreign keys), which is the privileged type for
relationships between fact tables and dimension tables (the last rule in Section 3.2).
The dimension tables will be the Product table, the Customer table, and the City table, since
they are directly related to the fact table by a one-to-many relationship. These tables
represent the analysis contexts of the fact table and determine how fact instances can be
meaningfully aggregated for the decision-making process. The City table is related to a tree
of tables representing the location, so they can all be grouped together in a single table
(Dim_Place). Finally, our example does not contain any time table, so it is necessary to add
one to represent the time dimension (rule 2 in Section 3.2).
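The grouping of the location tables into a single Dim_Place table amounts to following the chain of foreign keys from City up to Country. The Python sketch below illustrates this idea only; it is not the authors' implementation. The function name flatten_place is hypothetical, and the Region and Country tables are assumed to have a Name column and (for Region) an FKCountry column, since their structure is not given in the text.

```python
def flatten_place(cities, departments, regions, countries):
    """Denormalize City -> Department -> Region -> Country into Dim_Place rows.

    cities      -- list of dicts with PostalCode and FKDepartment
    departments -- dict: department number -> dict with Name and FKRegion
    regions     -- dict: region id -> dict with Name and FKCountry (assumed)
    countries   -- dict: country id -> dict with Name (assumed)
    """
    rows = []
    for city in cities:
        # Follow the foreign keys one level at a time.
        department = departments[city["FKDepartment"]]
        region = regions[department["FKRegion"]]
        country = countries[region["FKCountry"]]
        rows.append({
            "PostalCode": city["PostalCode"],
            "Department": department["Name"],
            "Region": region["Name"],
            "Country": country["Name"],
        })
    return rows
```

Each resulting row carries the whole location hierarchy, which is why the star schema needs no separate Department, Region or Country table.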
The multidimensional model below is the final result of this transformation:
Figure 2. Sales transactional model example after rules application
Thus, we chose the Purchase table as the fact table, as it represents a particular business
activity and a main interest for the decision-making process. This fact table contains a
numeric and additive measure, and the foreign keys of the dimensions, which are also numeric.
We also added a dimension table that contains the different granularities of time, and we
tried to choose the dimensions and attributes that enable a meaningful aggregation of fact
instances. In this sense, the attribute Description of the Product table was not retained as
an attribute of the dimension, since it contains only a description of the product and no data
that can determine how instances of the fact table can be aggregated (cf. rule 1 for
dimensions and attributes).
These rules should facilitate the identification of facts, measures, dimensions and attributes
from relational data. We therefore have the essential components for constructing our own
method using Model-Driven Architecture (MDA) techniques. Once our method is developed, we
will apply it to build a data warehouse for the National Library of Morocco from its
relational data.
Tables shown in Figure 2:
Fact_Purchase(#IdPurchase, Quantity, FKPlace, FKDate, FKCustomer, FKProduct)
Dim_Place(#PostalCode, Department, Region, Country)
Dim_Product(#Reference, Price, Type)
Dim_Date(#Timestamp, Hour, DayOfWeek, DayOfYear, Week, Month, Quarter, Semester, Year)
Dim_Customer(#CustomerCode, Age, Gender, PostalCode, Department, Region, Country)
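The attributes of the added time dimension can all be derived from the Timestamp of a purchase. The following Python sketch shows one possible derivation; it is an illustration, not the authors' code. The function name date_dimension_row is hypothetical, and the conventions used (ISO 8601 week numbers, Monday as day 1, two semesters of six months) are assumptions, since the paper does not specify them.

```python
from datetime import datetime


def date_dimension_row(timestamp: datetime) -> dict:
    """Build one Dim_Date row from a purchase timestamp."""
    return {
        "Timestamp": timestamp.isoformat(),
        "Hour": timestamp.hour,
        "DayOfWeek": timestamp.isoweekday(),        # 1 = Monday ... 7 = Sunday
        "DayOfYear": timestamp.timetuple().tm_yday,
        "Week": timestamp.isocalendar()[1],         # ISO 8601 week number
        "Month": timestamp.month,
        "Quarter": (timestamp.month - 1) // 3 + 1,  # 1..4
        "Semester": (timestamp.month - 1) // 6 + 1, # 1..2
        "Year": timestamp.year,
    }
```

Because every attribute is a function of the timestamp, the time dimension can be populated automatically even though the source model has no time table.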
4. A new method for transforming a relational model into a multidimensional model
4.1 Model Driven Architecture
MDA (Model-Driven Architecture) is a standard of the OMG (OMG, 2001) based on MDE
(Model-Driven Engineering). It provides a set of guidelines and an architecture for the
design of software systems. The MDA approach makes it possible to understand complex systems
and the real world through abstraction. This abstract view of the system is elaborated in a
conceptual framework together with a number of standards provided by the OMG. These standards
define the models, their relations and their transformations (for example UML, the Unified
Modeling Language; MOF, the Meta-Object Facility; and XMI, XML Metadata Interchange). To
represent the MDA approach visually, the OMG has set up a framework composed of several types
of models. Figure 3 shows the Y-shaped development cycle, which implements these models and
their relationships:
Figure 3. The MDA process (OMG, 2001)
• CIM (Computation Independent Model): The CIM provides a view of
the system and its environment while hiding the details of structure
and implementation. Models of the CIM level help narrow the gap
between domain experts and designers. As a result, a CIM model is
sometimes called a domain model;
• PIM (Platform Independent Model): Models of the PIM level
represent a view of system analysis and design, independently of
any technological details concerning the platform (operating system,
programming language, hardware, network performance, etc.);
• PSM (Platform Specific Model): The PSM level presents a
projection of PIM level models onto a specific platform. These models
combine PIM specifications with platform-specific details;
• PDM (Platform Description Model): These models describe the
platform on which the system will be executed, by providing a set of
technical data regarding the functionality and use of the platform.
4.2 Models transformation
Model transformation in the MDA context consists of transforming PIM models into PSM models.
This process is performed by a transformation engine that applies a set of rules to the PIM to
generate the PSM.
The concept of meta-model is omnipresent in this case: each model (PIM or PSM) is based on a
meta-model used to describe it. When both models use the same meta-model, the transformation
is called "endogenous"; otherwise, it is called "exogenous".
There are mainly two types of transformations:
• M2M (Model to Model) transformations: used to transform models
into other models, these transformations cover all the tasks to be
executed in order to obtain a model that respects the technical
specifications of the target environment. The standard that
technically represents this type of transformation in the MDA
approach is MOF 2.0 QVT;
• M2T (Model to Text) transformations: used to generate code or
documentation. M2T transformations are covered by MOFM2T
(MOF Model to Text Transformation Language), which is one of the
many parts of the MDA initiative.
In our case, we are interested in M2M transformation since we aim to transform a relational
model to a multidimensional model.
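To make the distinction between the two kinds of transformation concrete, the Python sketch below shows a toy M2M step (a relational table mapped to a dimension) next to a toy M2T step (a dimension rendered as text). It does not use QVT or MOFM2T and is not the authors' engine; the class names, the function names and the placeholder column type VARCHAR(255) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    """Element of the source (relational) model."""
    name: str
    columns: List[str]


@dataclass
class Dimension:
    """Element of the target (multidimensional) model."""
    name: str
    attributes: List[str]


def table_to_dimension(table: Table) -> Dimension:
    """M2M: map a model element to an element of another model."""
    return Dimension(name=f"Dim_{table.name}", attributes=list(table.columns))


def dimension_to_ddl(dimension: Dimension) -> str:
    """M2T: render a model element as text (a SQL-like DDL string).

    Every column gets the same placeholder type, since the sketch
    carries no type information.
    """
    columns = ",\n".join(f"  {name} VARCHAR(255)" for name in dimension.attributes)
    return f"CREATE TABLE {dimension.name} (\n{columns}\n);"
```

The M2M step returns another model object that can be transformed further, whereas the M2T step ends the chain with plain text.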
The main steps to complete the transformation are as follows:
1. Specify the source meta-model: first, we have to specify the
source meta-model, which in our case is the meta-model of the
relational schema;
2. Specify the target meta-model: we must also specify the
meta-model representing the decisional concept;
3. Build the transformation engine: this step is based on the
rules presented in Section 3.
The figure below illustrates these different steps:
[Figure 4 diagram; recoverable labels: source meta-model (relational meta-model), target meta-model (multidimensional meta-model), "conforms to", "input"]
Figure 4. Transformation of a relational model into a multidimensional model (Blanc and
Salvatori, 2005)
4.3 Eclipse Modeling Framework
EMF is a modeling and code generation platform that facilitates the construction of tools. It
is a set of development tools integrated into the Eclipse environment as plug-ins, including
the Ecore meta-model, the EMF.Edit editor, and the GenModel generation model. EMF was designed
to open Eclipse to model-driven development. Its approach is based on a simplified version of
MOF. It makes it possible to define meta-models and then derive a Java implementation to
build instance models (Budinsky et al., 2009).
Figure 5 shows the architecture of the EMF framework. Its main role is to accept models or
files as input and to generate the code of tools (plug-ins) that manipulate the input data.
Figure 5. EMF Architecture (Budinsky et al., 2009)
4.4 Defining the source and target meta-models
We started by defining our two meta-models (source and target) using Ecore, which is itself
an EMF (Eclipse Modeling Framework) model. All the meta-models presented in this section are
our own proposal: at the ISCRAM-med conference (El Moukhi et al., 2016), we presented the
relational metamodel and a multidimensional metamodel that covers all multidimensional
models.
The relational meta-model consists of three essential elements: a database contains tables,
which in turn contain columns. The figure below summarizes these components
(El Moukhi et al., 2016):
Figure 6. The relational meta-model
The multidimensional meta-model consists of a multidimensional schema that contains facts and
dimensions. Performing a multidimensional analysis requires at least two dimensions. Each
fact contains measures, and each dimension contains a hierarchy of attributes. Figure 7 below
summarizes these components and describes our multidimensional (target) meta-model
(El Moukhi et al., 2016):
Figure 7. The multidimensional meta-model
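The containment structure of the two meta-models can be mirrored with plain data classes. The Python sketch below is only an approximation of the Ecore meta-models described above, written for illustration: the class and field names are assumptions (the column flags echo the isPk and isFk attributes of the XSD files presented later), and the only constraint encoded is the requirement of at least two dimensions.

```python
from dataclasses import dataclass, field
from typing import List


# Source side: a database contains tables, which contain columns.
@dataclass
class Column:
    name: str
    type: str = "string"
    is_pk: bool = False
    is_fk: bool = False


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)


# Target side: a schema contains facts (with measures) and
# dimensions (each with a hierarchy of attributes).
@dataclass
class Fact:
    name: str
    measures: List[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    hierarchy: List[str] = field(default_factory=list)


@dataclass
class MultidimensionalSchema:
    name: str
    facts: List[Fact] = field(default_factory=list)
    dimensions: List[Dimension] = field(default_factory=list)

    def supports_multidimensional_analysis(self) -> bool:
        # At least two dimensions are needed for a multidimensional analysis.
        return len(self.dimensions) >= 2
```

In an EMF setting these classes would be generated from the Ecore definitions rather than written by hand.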
Unlike that previous work, this paper deals in detail with the metamodels of the various
types of multidimensional model (Figures 8, 10 and 12) and introduces a new method for
transforming the relational model into a multidimensional one. Thus, we have:
• The star schema, which contains a single fact table directly linked to
dimensions (see the example in Figure 2). No dimension is related to
another, which is why we removed the reflexive link on the dimension.
Its meta-model is described below (Figure 8):
Figure 8. The multidimensional meta-model for star schema
• The snowflake schema, which contains a single fact table with
dimensions that may be linked to other dimensions. The example
below (Figure 9) illustrates this type of model.
Figure 9. The snowflake schema for sales example
The meta-model corresponding to the snowflake schema is shown in Figure 10:
Tables shown in Figure 9:
Fact_Purchase(#IdPurchase, Quantity, PurchasePrice, FKPlace, FKDate, FKCustomer, FKProduct)
Dim_Place(#PostalCode, Department, Region, FKCountry)
Dim_Product(#Reference, Name, Price, FKCategory)
Dim_Date(#Timestamp, Day, FKMonth)
Dim_Customer(#CustomerCode, Age, Gender, PostalCode, Department, Region, Country)
Dim_Category(#IdCategory, Type)
Dim_Month(#IdMonth, Month, FKYear)
Dim_Country(#IdCountry, Country)
Figure 10. The multidimensional meta-model for snowflake schema
• The constellation schema, which is the most complex. It may contain
two or more fact tables with shared dimensions, as shown in the
example below (Figure 11).
Figure 11. The constellation schema for sales example
Tables shown in Figure 11:
Fact_Purchase(#IdPurchase, Quantity, PurchasePrice, FKPlace, FKDate, FKCustomer, FKProduct)
Dim_Place(#PostalCode, Department, Region, Country)
Dim_Product(#Reference, Name, Price)
Dim_Date(#Timestamp, Hour, DayOfWeek, DayOfYear, Week, Month, Quarter, Semester, Year)
Dim_Customer(#CustomerCode, Age, Gender, PostalCode, Department, Region, Country)
Fact_Delivery(Amount, Volume, FKProduct, FKDate, FKSupplier)
Dim_Supplier(#IdSupplier, SupplierName, Adress, FKCountry)
Dim_Country(#IdCountry, CountryName, CodeName)
Figure 12. The multidimensional meta-model for constellation schema
In order to verify the conformity of our models to these meta-models, we created two other
files in the XSD language, in which we described the structure a model must have to comply
with its meta-model. Thus, we obtained one XSD file to validate the relational model and
another to validate the multidimensional model. The code fragments below correspond to these
two files.
The XML Schema Definition for relational models:
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
  <xs:element name="relationalSchema" type="relationalSchemaType"/>
  <xs:complexType name="columnType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="name" use="optional"/>
        <xs:attribute type="xs:string" name="isPk" use="optional"/>
        <xs:attribute type="xs:string" name="type" use="optional"/>
        <xs:attribute type="xs:string" name="isFk" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="columnsType">
    <xs:sequence>
      <xs:element type="columnType" name="column" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="associationType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="multiplicity" use="optional"/>
        <xs:attribute type="xs:string" name="target" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="associationsType">
    <xs:sequence>
      <xs:element type="associationType" name="association" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="tableType">
    <xs:sequence>
      ...
The XML Schema Definition for multidimensional models:
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
  <xs:element name="multidimensionalSchema" type="multidimensionalSchemaType"/>
  <xs:complexType name="measureType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="name"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="foreignkeyType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="name" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="fieldsType">
    <xs:sequence>
      <xs:element type="measureType" name="measure"/>
      <xs:element type="foreignkeyType" name="foreignkey" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="associationType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="multiplicity" use="optional"/>
        <xs:attribute type="xs:string" name="target" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="associationsType">
    <xs:sequence>
      <xs:element type="associationType" name="association" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="factType">
    <xs:sequence>
      <xs:element type="fieldsType" name="fields"/>
      <xs:element type="associationsType" name="associations"/>
    </xs:sequence>
    <xs:attribute type="xs:string" name="name"/>
  </xs:complexType>
  <xs:complexType name="attributeType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="name" use="optional"/>
        <xs:attribute type="xs:string" name="isPk" use="optional"/>
        <xs:attribute type="xs:string" name="type" use="optional"/>
        <xs:attribute type="xs:string" name="isFk" use="optional"/>
        ...
Many other meta-models have been proposed in the literature. Examples include
(Hachaichi and Feki, 2013) (Srai et al., 2017) (Sapia et al., 1999) (Choura and Feki, 2011).
The relational meta-model is a well-known standard that is repeated in almost all research
with only slight differences (Lämmel, 2005) (Chang et al., 2003) (Inria, 2005). As for the
multidimensional meta-model, each existing research work presents a new schema, but what
these works have in common is that their meta-models remain too general and do not treat each
type of multidimensional model separately. We take as an example the CWM schema (Figure 13),
which is the best known and most widely used multidimensional meta-model in the field.
Figure 13. Multidimensional Metamodel of CWM (OMG, 2003)
As the figure shows, this meta-model covers all types of multidimensional models and
therefore imposes fewer restrictions when we choose to work with a specific type. In this
context, our work complements this research by treating each type of multidimensional
model (star, snowflake, constellation) separately and presenting its own meta-model.
Thus, for the snowflake schema (Figure 10), we obtained a cardinality of three for the
dimension table, since at least one dimension must be related to another dimension. For the
constellation schema (Figure 12), at least two fact tables must be related by dimensions,
which explains the cardinality of two for the fact table.
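The two structural conditions stated above can be illustrated with a small Java sketch. It is not taken from the X-ETL engine: the class name, method names and the map-based representation are assumptions made for this illustration, and the sketch encodes only the conditions expressed in the prose (a constellation needs at least two fact tables related by a dimension; a snowflake needs at least one dimension related to another), not the exact cardinalities of the meta-model figures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: checks the structural conditions described in the text.
class SchemaConstraints {

    /** Constellation: at least two fact tables, and some pair of them shares a dimension. */
    static boolean satisfiesConstellation(Map<String, Set<String>> dimensionsByFact) {
        List<String> facts = new ArrayList<>(dimensionsByFact.keySet());
        for (int i = 0; i < facts.size(); i++) {
            for (int j = i + 1; j < facts.size(); j++) {
                // Intersect the dimension sets of the two fact tables.
                Set<String> shared = new HashSet<>(dimensionsByFact.get(facts.get(i)));
                shared.retainAll(dimensionsByFact.get(facts.get(j)));
                if (!shared.isEmpty()) {
                    return true;
                }
            }
        }
        return false; // fewer than two fact tables, or no shared dimension
    }

    /** Snowflake: at least one dimension is related to another dimension. */
    static boolean satisfiesSnowflake(Map<String, Set<String>> subDimensionsByDimension) {
        return subDimensionsByDimension.values().stream()
                .anyMatch(subDimensions -> !subDimensions.isEmpty());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> facts = Map.of(
                "Sale", Set.of("Customer", "Date"),
                "Shipment", Set.of("Date", "Carrier"));
        Map<String, Set<String>> dimensions = Map.of(
                "Product", Set.of("Category"),
                "Customer", Set.<String>of());
        System.out.println("Constellation condition holds: " + satisfiesConstellation(facts));
        System.out.println("Snowflake condition holds: " + satisfiesSnowflake(dimensions));
    }
}
```

A schema may satisfy both conditions at once, so the two checks are kept independent rather than merged into a single classification.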
4.5 The transformation engine
After defining our meta-models, we built the transformation engine based on the rules
presented in section 2 and the principles of MDA. First, we developed a Java program that
counts the foreign keys in each table of the relational model, which allowed us
to detect the (potential) fact tables.
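The foreign-key count can be sketched with the standard Java DOM API, as follows. This is an illustration under stated assumptions, not the authors' code: the element names `relationalModel` and `table`, the encoding of `isFk` as the string "true", and the `minForeignKeys` threshold are hypothetical; only the `attribute` element with its `name`, `isPk` and `isFk` attributes echoes the schema fragment shown earlier. The actual rule for deciding that a table is a potential fact table is the one given in section 2.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the first step: count foreign keys per table.
class ForeignKeyCounter {

    // Assumed XML layout of a relational model (element names are illustrative).
    static final String SAMPLE_MODEL =
            "<relationalModel>"
          + "<table name='Customer'><attribute name='id' isPk='true' isFk='false'/></table>"
          + "<table name='Product'><attribute name='id' isPk='true' isFk='false'/></table>"
          + "<table name='Sale'>"
          + "<attribute name='id' isPk='true' isFk='false'/>"
          + "<attribute name='customer_id' isPk='false' isFk='true'/>"
          + "<attribute name='product_id' isPk='false' isFk='true'/>"
          + "</table>"
          + "</relationalModel>";

    /** Returns, for each table, the number of attributes flagged isFk="true". */
    static Map<String, Integer> countForeignKeys(String relationalXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(relationalXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse relational model", e);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        NodeList tables = doc.getElementsByTagName("table");
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            NodeList attributes = table.getElementsByTagName("attribute");
            int foreignKeys = 0;
            for (int j = 0; j < attributes.getLength(); j++) {
                Element attribute = (Element) attributes.item(j);
                if ("true".equals(attribute.getAttribute("isFk"))) {
                    foreignKeys++;
                }
            }
            counts.put(table.getAttribute("name"), foreignKeys);
        }
        return counts;
    }

    /** Keeps the tables whose count reaches an assumed threshold (not the paper's rule). */
    static List<String> potentialFactTables(Map<String, Integer> counts, int minForeignKeys) {
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() >= minForeignKeys)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countForeignKeys(SAMPLE_MODEL);
        System.out.println("Foreign keys per table: " + counts);
        System.out.println("Potential fact tables: " + potentialFactTables(counts, 2));
    }
}
```

The threshold is passed as a parameter so that the selection rule can be changed without touching the parsing code.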
Once the fact tables are identified, we generate a multidimensional model that contains only
the fact table and the dimension tables that are directly related to it in the relational model.
Indirectly related dimensions will be included in future work.
Figure 14 shows the architecture of the X-ETL project.
Figure 14. X-ETL project architecture
The most important part of the project is the file that parses the tables in order to
count the foreign keys and then identify potential fact tables. Figure 15
shows the edit window for this file.
Figure 15. DOM parsing of the table element
Once the fact tables are detected, the second step consists of detecting the dimensions by
parsing the associations of the relational model to identify many-to-one relationships. Figure
16 shows the DOM parsing of the association element.
Figure 16. DOM parsing of the association element
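The second step can be sketched in the same way. The code below is a simplified illustration, not a reproduction of the file shown in Figure 16: the `table` element name and the value "many-to-one" used to encode the multiplicity are assumptions, while the `association` element and its `multiplicity` and `target` attributes follow the schema fragment shown earlier. The real engine performs more involved calculations on the cardinalities than this equality test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the second step: find dimensions directly related to a fact table.
class DimensionDetector {

    // Assumed encoding of a many-to-one multiplicity in the XML file.
    static final String MANY_TO_ONE = "many-to-one";

    // Assumed XML layout of a relational model (element names are illustrative).
    static final String SAMPLE_MODEL =
            "<relationalModel>"
          + "<table name='Customer'/>"
          + "<table name='Product'/>"
          + "<table name='Invoice'/>"
          + "<table name='Sale'>"
          + "<association multiplicity='many-to-one' target='Customer'/>"
          + "<association multiplicity='many-to-one' target='Product'/>"
          + "<association multiplicity='one-to-one' target='Invoice'/>"
          + "</table>"
          + "</relationalModel>";

    /** Returns the targets of the many-to-one associations declared by the given fact table. */
    static List<String> directDimensions(String relationalXml, String factTable) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(relationalXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse relational model", e);
        }
        List<String> dimensions = new ArrayList<>();
        NodeList tables = doc.getElementsByTagName("table");
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            if (!factTable.equals(table.getAttribute("name"))) {
                continue; // only the associations of the fact table are inspected
            }
            NodeList associations = table.getElementsByTagName("association");
            for (int j = 0; j < associations.getLength(); j++) {
                Element association = (Element) associations.item(j);
                if (MANY_TO_ONE.equals(association.getAttribute("multiplicity"))) {
                    dimensions.add(association.getAttribute("target"));
                }
            }
        }
        return dimensions;
    }

    public static void main(String[] args) {
        System.out.println("Dimensions of Sale: " + directDimensions(SAMPLE_MODEL, "Sale"));
    }
}
```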
The complex calculations performed on the cardinalities to detect the dimensions justify
the choice of the Java language, since other transformation languages such as ATL
(Atlas Transformation Language) or QVT (Query/View/Transformation) do not allow
this kind of complicated calculation.
After the dimension tables are detected, the file shown in Figure 17 creates them in
the target multidimensional model.
Figure 17. Creating a new dimension
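Creating the detected dimensions in the target model can be sketched with the DOM API as well. The element names `multidimensionalModel`, `fact` and `dimension`, and the class and method names, are hypothetical; the structure of the real target file is the one defined by the meta-models of section 4, which this sketch does not try to reproduce.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: build a target document holding one fact and its direct dimensions.
class StarModelBuilder {

    static Document buildStar(String factName, List<String> dimensionNames) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement("multidimensionalModel");
        doc.appendChild(root);

        // The fact table identified in the first step.
        Element fact = doc.createElement("fact");
        fact.setAttribute("name", factName);
        root.appendChild(fact);

        // One element per dimension detected in the second step.
        for (String dimensionName : dimensionNames) {
            Element dimension = doc.createElement("dimension");
            dimension.setAttribute("name", dimensionName);
            root.appendChild(dimension);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document star = buildStar("Sale", List.of("Customer", "Product"));
        int count = star.getElementsByTagName("dimension").getLength();
        System.out.println("Dimensions created: " + count);
    }
}
```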
The diagram below illustrates all the steps of the X-ETL transformation.
Figure 18. The steps of the X-ETL transformation
Below is a screenshot of the X-ETL application.
Figure 19. Screenshot of the X-ETL engine
5. Conclusion
In this paper, we presented a new data-driven method for designing a multidimensional model
from a relational model. The method relies on a list of rules that identify the different
elements of the multidimensional schema, and it consists of two steps. The first
identifies fact tables by counting the foreign keys in each table of the relational
model; the second identifies the dimensions that are directly related to the fact
table by analyzing the cardinalities of the relationships. In the end, several multidimensional
models are generated automatically. Note that the quality of the
generated models depends greatly on the quality of the source model, so it is
important to verify the relational source model before using the X-ETL engine. Our future
work will consist of finding a method to select dimensions from tables that are indirectly
related to the fact table in order to generate a complete multidimensional model.
References
Abai, N.H.Z., Yahaya, J.H., Deraman, A., (2013). User Requirement Analysis in Data
Warehouse Design: A Review. Procedia Technology 11, pp. 801–806.
doi:10.1016/j.protcy.2013.12.261
Abdelhédi, F., Zurfluh, G., (2013). User support system for designing decisional database, in:
Proceedings of the 6th International Conference on Advances in Computer-Human
Interactions. Nice, France, pp. 377–382.
Akbar, K., Krishna, S.M., Reddy, T.V.S., (2013). ETL process modeling in DWH using
enhanced quality techniques. International Journal of Database Theory & Application 6,
pp. 179–197.
Annoni, E., Ravat, F., Teste, O., Zurfluh, G., (2006). Towards multidimensional requirement
design, in: Proceedings of the 8th International Conference on Data Warehousing and
Knowledge Discovery. Springer-Verlag, Krakow, Poland, pp. 75–84.
Battaglia, A., Golfarelli, M., Rizzi, S., (2011). Qbx: a case tool for data mart design, in:
International Conference on Conceptual Modeling. Springer, pp. 358–363.
Blanc, X., Salvatori, O., (2005). MDA en action: ingénierie logicielle guidée par les modèles.
Eyrolles.
Bliujute, R., Saltenis, S., Slivinskas, G., Jensen, G., (1998). Systematic change management
in dimensional data warehousing. Citeseer.
Budinsky, F., Merks, E., Paternostro, M., Steinberg, D., (2009). EMF: Eclipse Modeling
Framework. Addison-Wesley.
Cavalheiro, J., Carreira, P., (2016). A multidimensional data model design for building energy
management. Advanced Engineering Informatics, Vol. 30, issue 4, pp. 619–632. doi:
10.1016/j.aei.2016.08.001
Chandwani, G., Uppal, V., (2015). Implementation of Star Schemas from ER Model.
International Journal of Database Theory and Application 8, pp. 111–130.
doi:10.14257/ijdta.2015.8.3.10
Chang, D., Mellor, D., Poole, J., Tolbert, D., (2003). Common Warehouse Metamodel:
Developer's Guide. Wiley, p. 86.
Choura, H., Feki, J., (2011). MDA Compliant Approach for Data Mart Schemas Generation,
in: Proceedings of the First international conference on Model and data engineering.
Springer, pp. 262–269.
Dahlan, A., Wibowo, F. W., (2016). Design of Library Data Warehouse Using SnowFlake
Scheme Method, in: 7th International Conference on Intelligent Systems, Modelling and
Simulation (ISMS). IEEE, pp. 318–322. doi: 10.1109/ISMS.2016.71
El Moukhi, N., El Azami, I., Mouloudi, A., (2016). X-ETL Engine: from relational model to a
multidimensional model, in: 3rd International Conference on Information Systems for
Crisis Response and Management in Mediterranean Countries (ISCRAM). Madrid,
Spain, pp. 39–42.
El Moukhi, N., El Azami, I., Mouloudi, A., (2015). Data warehouse state of the art and future
challenges, in: International Conference on Cloud Technologies and Applications
(CloudTech). IEEE. doi: 10.1109/CloudTech.2015.7337004
Ghosh, D., (2010). Multiparadigm data storage for enterprise applications. IEEE software 27,
pp. 57–60.
Giorgini, P., Rizzi, S., Garzetti, M., (2005). Goal-oriented requirement analysis for data
warehouse design, in: Proceedings of the 8th ACM International Workshop on Data
Warehousing and OLAP. ACM, Bremen, Germany, pp. 47–56.
Golfarelli, M., Maio, D., Rizzi, S., (1998). Conceptual design of data warehouses from E/R
schemes, in: Proceedings of the Thirty-First Hawaii International Conference on System
Sciences. IEEE, pp. 334–343.
Gosain, A., Singh, J., (2015). Conceptual Multidimensional Modeling for Data Warehouses:
A Survey, in: Proceedings of the 3rd International Conference on Frontiers of Intelligent
Computing: Theory and Applications (FICTA). Springer International Publishing, pp.
305–316.
Hachaichi, Y., Feki, J., (2013). An automatic method for the design of multidimensional
schemas from object oriented databases. International Journal of Information Technology
& Decision Making, Vol. 12, issue 6, pp. 1223–1259.
Inria, (2005). ATL transformation example: class to relational.
https://www.eclipse.org/atl/atlTransformations/Class2Relational/ExampleClass2Relational[v00.01].pdf
Jensen, M.R., Holmgren, T., Pedersen, T.B., (2004). Discovering Multidimensional Structure
in Relational Data, in: Proceedings of the 6th International Conference on Data
Warehousing and Knowledge Discovery. Springer Berlin Heidelberg, Berlin, Heidelberg,
pp. 138–148.
Jovanovic, P., Romero, O., Simitsis, A., Abelló, A., Mayorova, D., (2014). A requirement-
driven approach to the design and evolution of data warehouses. Information Systems,
Vol. 44, pp. 94–119. https://doi.org/10.1016/j.is.2014.01.004
Khnaisser, C., Lavoie, L., Diab, H., Ethier, J.-F., (2015). Data Warehouse Design Methods
Review: Trends, Challenges and Future Directions for the Healthcare Domain, in: East
European Conference on Advances in Databases and Information Systems (ADBIS).
Springer, pp. 76–87. doi: 10.1007/978-3-319-23201-0_10
Khouri, S., Bellatreche, L., Jean, S., Ait-Ameur, Y., (2014). Requirements Driven Data
Warehouse Design: We Can Go Further, in: International Symposium On Leveraging
Applications of Formal Methods, Verification and Validation (ISoLA). Springer, pp. 588–
603.
Kumari, S., Yadav, P., (2015). Study of Influence of Data Mining & Data Warehousing, in:
Proceedings of National Conference on Innovative Trends in Computer Science
Engineering (ITCSE). IJRRA, pp. 138–140.
Lämmel, R., Saraiva, J., Visser, J., (2005). Generative and Transformational Techniques in
Software Engineering. Springer, p. 41.
Moody, D.L., Kortink, M.A.R., (2000). From enterprise models to dimensional models: a
methodology for data warehouse and data mart design, in: Proceedings of the
International Workshop on Design and Management of Data Warehouses. Stockholm,
Sweden, p. 5.
OMG, (2003). Common Warehouse Metamodel (CWM) Specification. Vol. 1, Version 1.1,
pp. 8-3.
OMG, (2001). MDA Specifications. https://www.omg.org/mda/specs.htm
Phipps, C., Davis, K.C., (2002). Automating data warehouse conceptual schema design and
evaluation, in: Proceedings of the 4th International Conference on Design and
Management of Data Warehouses. Toronto, Canada, pp. 23–32.
Romero, O., Abelló, A., (2010). A framework for multidimensional design of data
warehouses from ontologies. Data & Knowledge Engineering, Vol. 69, pp. 1138–1157.
https://doi.org/10.1016/j.datak.2010.07.007
Rudra, A., Nimmagadda, S.L., (2005). Roles of multidimensionality and granularity in
warehousing Australian resources data, in: Proceedings of the 38th Annual Hawaii
International Conference on System Sciences. IEEE, pp. 216b–216b.
Santos, M. Y., Oliveira e Sá, J., (2016). A Data Warehouse Model for Business Processes
Data Analytics, in: International Conference on Computational Science and Its
Applications (ICCSA). Springer, pp. 241–256. doi: 10.1007/978-3-319-42092-9_19
Sapia, C., Blaschka, M., Höfling, G., Dinter, B., (1999). Extending the E/R Model for the
Multidimensional Paradigm, in: Proceedings of the Workshops on Data Warehousing
and Data Mining: Advances in Database Technologies. Springer, pp. 105–116.
Sehgal, S., Ranga, K. K., (2016). Translation of Entity Relational Model to Dimensional
Model. International Journal of Computer Science and Mobile Computing, Vol. 5, issue
5, pp. 439–447.
Srai, A., Guerouate, F., Berbiche, N., Drissi, H., (2017). An MDA approach for the
development of data warehouses from Relational Databases Using ATL Transformation
Language. International Journal of Applied Engineering Research, Vol. 12, issue 12, pp.
3532–3538.
Taniar, D., Chen, L. (Eds.), (2011). Integrations of Data Warehousing, Data Mining and
Database Technologies: Innovative Approaches. IGI Global.
Varga, M., (2002). A Procedure of Conversion of Relational into Multidimensional Database
Schema. Journal of Computing and Information Technology, Vol. 10, pp. 69–84.
Vrdoljak, B., Banek, M., Rizzi, S., (2003). Designing web warehouses from XML schemas,
in: International Conference on Data Warehousing and Knowledge Discovery. Springer,
pp. 89–98.
Wiak, S., Drzymala, P., Welfle, H., (2012). Using ORACLE tools to generate
Multidimensional Model in Warehouse. Przegląd Elektrotechniczny, pp. 257–262.
Winter, R., Strauch, B., (2003). A method for demand-driven information requirements
analysis in data warehousing projects, in: Proceedings of the 36th Annual Hawaii
International Conference on System Sciences. IEEE.
Source: https://www.researchgate.net/publication/331594346_Towards_a_new_automatic_data_warehouse_design_method