Getting Unstructured Financial Data into Data Warehouses

Print Version

Many complex businesses need to draw data from a variety of sources to support central risk and financial management. This requirement is particularly important in the insurance community with the advent of Solvency II. It commonly leads to the implementation of data warehouse projects to deliver one central version of the truth. In so doing, it is likely that many ‘operational’ spreadsheets that previously manipulated and reported data will be eliminated.

However, even at the end of these projects, many spreadsheets and other ‘unstructured’ sources such as .CSV files and MS Access databases still remain as the source of specific items for the data warehouse. Architects face a choice: rely on a manual ETL (Extract, Transform, Load) process to uplift this data into the warehouse or automate a ‘hard-wired’ solution. Unfortunately, the reason the data originates from an unstructured source is commonly that the exact structure, timeliness and volume of data will vary over time. Manual intervention can accommodate these variations but will be inconsistent and error prone. On the other hand ‘hard-wiring’ may need reconfiguration every time the business changes or, worse, may easily uplift the wrong or inadequate data without being detected.

ClusterSeven provides a solution to these problems for your unstructured financial data, whether in spreadsheets, Access databases or other file types such as .CSV, .TXT, .DAT etc. This ensures that data is automatically verified for timeliness, structure, content before automatically uplift into your warehouse. Potential anomalies are alerted by message or dashboard notifications to ensure remediation before uplift. Follow this link for a Case Study of what this means in practice.

The following table provides examples of how common operational problems are now eliminated. If you have other challenges to solve please do not hesitate to ask us.

Challenge ClusterSeven Solution

Changes to data location in source file invalidates planned data uplift

Source data is indexed and tracked so that common user changes such as moving cell locations or sorting data have no impact on uplift

Updates to source data may be erratic in timing

Warehouse can be updated when source updates occur, whether or not they fit a fixed timetable. Notifications available if updates occur or if updates do not occur in specified time periods.

Updates to source data may be incomplete or outside tolerance levels

Users may be notified if data values change outside expected tolerances, or do not change at all - at a single cell level or across a large range.

Alterations to structure of source data file may corrupt planned uplift

Users may be notified if source data format changes e.g. addition of new spreadsheet data columns within source data

Specific source data items are changed, created or deleted

Notifications available if data conditions are broken e.g. transaction maturation dates do not match today’s date. Management information on new/deleted/changed items is also available.

Print Version

“When ClusterSeven approached us, it was the first time we had seen such a product. We immediately realized its value to us as a company.”

George Flynn, business analyst of product control, European Credit Management (ClusterSeven client)

“Adopting ClusterSeven gives us central visibility of activity in our business critical spreadsheets and helps protect against operational loss.”

Pete Coleman, Head of IT Development at Nationwide