SOLUTION OF EXTRACTION STAGE:
Meeting the requirements of a larger-scale environment such as a data warehouse calls for an enhanced ETL framework that can integrate heterogeneous data sources. The enhanced ETL solution has two core components: an ODS and a two-step ETL subcomponent. The ODS integrates the various operational data sources (ODSS) into a simple relational database, while the corresponding ETL tool takes two steps to dynamically integrate the ODSS into the detailed data warehouse (DCDW). Because the data sources are heterogeneous and come in many formats (e.g. Excel, Oracle, and SQL Server), the proposed enhanced ETL system was designed on Eclipse Rich Client Platform (RCP) plug-in technology. The ETL component is described below, starting with the ODS implementation.
OPERATIONAL DATA STORE (ODS) DATA MODEL
An ODS is a subject-oriented, integrated, variable data set. It is used to store heterogeneous data, and its data model makes it easy to integrate data from different sources.
In an ODS, the data warehouse architecture treats the data, together with its type, content, and level of detail, as the subject domain, and uses it as the subject model for the main activity content. The data is then divided into tables according to its content, so that each table groups information sharing some distinguishing characteristic. All tables in the ODS are related to one another, as in a relational system. Every record in every table has a unique identifier, so data can be extracted easily without any extra operation. With the support of the ODS data model, the ETL functionality can extract data records directly from the original operational data sources into the ODS database. The data stored in the ODS database can then be transformed further into the DCDW by specialized procedures. The ODS database also gives direct access to the original source data.
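The idea of related tables with a unique identifier per record can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the ODS; the table names (source_system, ods_record), the column names, and the helper functions are hypothetical and are not taken from the proposed system.

```python
import sqlite3

def build_ods(conn):
    """Create two related tables; every row gets a unique integer identifier."""
    conn.executescript("""
        CREATE TABLE source_system (
            source_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE ods_record (
            record_id INTEGER PRIMARY KEY,
            source_id INTEGER NOT NULL REFERENCES source_system(source_id),
            subject   TEXT NOT NULL,
            payload   TEXT NOT NULL
        );
    """)

def load_source(conn, source_name, rows):
    """Copy (subject, payload) rows from one operational source into the ODS."""
    conn.execute(
        "INSERT OR IGNORE INTO source_system(name) VALUES (?)", (source_name,)
    )
    (source_id,) = conn.execute(
        "SELECT source_id FROM source_system WHERE name = ?", (source_name,)
    ).fetchone()
    record_ids = []
    for subject, payload in rows:
        cur = conn.execute(
            "INSERT INTO ods_record(source_id, subject, payload) VALUES (?, ?, ?)",
            (source_id, subject, payload),
        )
        record_ids.append(cur.lastrowid)  # unique identifier of the new record
    return record_ids

def fetch_record(conn, record_id):
    """Look a record up directly by its identifier, including its origin."""
    return conn.execute(
        """SELECT r.record_id, s.name, r.subject, r.payload
           FROM ods_record AS r
           JOIN source_system AS s ON s.source_id = r.source_id
           WHERE r.record_id = ?""",
        (record_id,),
    ).fetchone()
```

For example, after build_ods(conn), calling load_source(conn, "excel", [("customer", "Alice")]) returns the identifiers of the inserted records, and fetch_record(conn, identifier) retrieves a record together with the name of the source it came from.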
To avoid the data source overloading problem, the warehouse needs an accurate mapping between the ODS and the user requests so that operational activities are managed properly.
The read-operation queue should be managed properly by using a change data capture (CDC) processor to perform the transactions. Limiting the thread pool can overcome this problem: the pool places requests in different queues, which makes them easier to manage. By limiting the HTTP sessions across the different queues on the basis of equal performance, the load on the data source can be kept to a minimum. Because requests of the same operational type are handled together in the thread pool, the processors do not have to handle different types of request simultaneously.
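The per-type queueing described above can be sketched as follows. This is only an illustration under stated assumptions: requests are modelled as (type, payload) pairs, the function names group_by_type and drain are hypothetical, and neither the CDC processor nor HTTP session limiting is modelled.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor

def group_by_type(requests):
    """Place each (request_type, payload) pair in the queue for its type."""
    queues = defaultdict(deque)
    for request_type, payload in requests:
        queues[request_type].append(payload)
    return queues

def _run_queue(handler, queue):
    """Process one queue in order; a worker only ever sees one request type."""
    return [handler(payload) for payload in queue]

def drain(queues, handlers, max_workers=2):
    """Process every queue on a pool whose size caps the concurrent load."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            request_type: pool.submit(_run_queue, handlers[request_type], queue)
            for request_type, queue in queues.items()
        }
        return {t: future.result() for t, future in futures.items()}
```

The max_workers value is the limit on the thread pool: at most that many queues are processed against the data source at the same time, and each worker deals with a single request type while it runs.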
To avoid stalling when the data source is overloaded, the data warehouse needs to manage the thread pool. The thread pool may hold many requests of the same type that demand the same operation at the same time. To keep the pool from stalling, a request should be terminated after a specific time, and an exit on an out-of-memory exception should raise an alert.
The out-of-memory handling must reject requests that try to enter the thread pool when memory is full.
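A minimal sketch of such a managed pool is given below. It rests on several assumptions: a fixed number of pending slots stands in for the memory check, the names BoundedPool and PoolFull are hypothetical, and a timed-out request is abandoned by the caller rather than forcibly killed, because Python threads cannot be terminated from outside.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

class PoolFull(Exception):
    """Raised when a request is rejected because the pool is at capacity."""

class BoundedPool:
    def __init__(self, max_workers, max_pending):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # Each accepted request holds one slot until it finishes.
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args):
        """Accept the request only if a slot is free; otherwise reject it."""
        if not self._slots.acquire(blocking=False):
            raise PoolFull("request rejected: pool is at capacity")
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the request completes, fails, or is cancelled.
        future.add_done_callback(lambda _future: self._slots.release())
        return future

    def run(self, fn, *args, timeout):
        """Wait at most `timeout` seconds for the result of a request."""
        future = self.submit(fn, *args)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            future.cancel()  # only takes effect if the request has not started
            raise

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A caller would catch PoolFull to raise the alert and retry later, and catch the timeout error to stop waiting on a request that has taken longer than the allowed time.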
SOLUTION OF TRANSFORMATION STAGE:
The main problem in the transformation stage is master data overhead. The data falls into two types: master data, which names the entities held in storage, and transactional data. Master data remains relatively stable compared with transactional data. Data management is therefore one of the most important and complex operations facing any data warehouse. Because manually assembling inconsistent, redundant, and outdated data leads to integrity and accuracy problems, many organizations are seeking a new data management solution that seamlessly converts hundreds of data sources into powerful data assets that can be shared across the data warehouse.
Master Data Management (MDM) would be the best solution to this problem, and it also keeps the aggregation of the data under control. Master views are created by integrating data from a variety of internal data sources, such as enterprise resource planning (ERP) systems. MDM sets up the data sources virtually and performs its operations on them. MDM is a mechanism that divides the data into three core things: