Sunday, May 26, 2013

How to assign Data Stewards to logical pieces of an org's data | LinkedIn Group: DAMA International

Follow the LinkedIn discussion

My comment

As a first step, an organization needs a high-level logical data model that represents the target enterprise data architecture. Using that model, you can assign each subject area / entity to a target organizational unit that is typically responsible for creating/updating that entity as its owner. (I use the term "target" because the current enterprise data architecture - if documented at all - may be siloed, and because the current organizational structure may not be optimized for future needs.)

Each owning organizational unit should be represented by one or more Data Stewards.

Applying the above principle, the subject areas / domains of entities assigned to a given organizational unit can easily be delineated, as the vast majority of entities have (more precisely: in the future should have) only one natural owner, which is typically not only the creator of the data but usually also its main consumer.
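As a minimal sketch in Python (all subject areas, unit names and steward addresses are hypothetical examples), such an ownership mapping can be kept explicit and queried for the single natural owner:

    # All subject areas, unit names and steward addresses below are
    # hypothetical examples.
    ownership = {
        # subject area / entity: (target organizational unit, Data Stewards)
        "Order":    ("Sales",           ["steward.sales@example.com"]),
        "Invoice":  ("Finance",         ["steward.finance@example.com"]),
        "Employee": ("Human Resources", ["steward.hr@example.com"]),
    }

    def owner_of(entity: str) -> str:
        """Return the single natural owner of an entity; raises KeyError if unassigned."""
        unit, _stewards = ownership[entity]
        return unit

    print(owner_of("Order"))  # -> Sales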

However, particular attention must be paid to the Master Data domains, especially Party. The objects of Party occur in many different roles (customer, supplier, employee etc.) and therefore have either none or many potential "owners" (Sales, Customer Service, Purchasing, Human Resources etc.).

I recommend creating a separate central unit that takes ownership of, for example, all matters related to Party and that represents the interest of the organization as a whole rather than of one department (the latter being one of the reasons for a siloed structure in the first place). This central unit holds "the license" to define the entities/attributes related to Party, the business rules, the data governance measures etc. Other organizational units ("licensees") embed this "one and only" way of creating/updating objects of Party into their processes and supporting applications. Certainly, the Data Steward(s) representing the licensor need to consult the Data Stewards representing the licensees to make sure that all the requirements of the latter units are taken into consideration.

This being said, there is no reason to "boil the ocean", i.e. there is no need for complete coverage of the whole organization before any fruitful work can be started.

I agree with a previous comment to focus first on areas with problems, i.e. where engaging in Data Stewardship can be justified by the ROI and/or where the COI (cost of inaction) indicates an ongoing loss of money or the risk of fines (for industries that need to comply with the requirements of regulatory authorities). Data Stewards for prioritized areas can start their work as a project task force and may later evolve into a formal, permanent role.

Where to store data quality standards? | LinkedIn Group: Data Governance & Data Quality

Follow the LinkedIn discussion

My comment

The metadata repository is the right place to store data quality standards: both those that can be automatically transformed into database constraints (referential integrity, data types, nullability, value domains etc.) and, more importantly, those business rules that require human interaction.
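As a minimal sketch of this split (all table, column and rule names are hypothetical, and the generated DDL assumes PostgreSQL-style ALTER TABLE syntax): machine-enforceable standards become constraint DDL, while business rules are routed to a Data Steward:

    # All table, column and rule entries are hypothetical examples.
    standards = [
        {"table": "customer", "column": "email", "kind": "not_null"},
        {"table": "sales_order", "column": "customer_id",
         "kind": "foreign_key", "references": "customer(id)"},
        {"table": "customer", "column": "name",
         "kind": "business_rule", "text": "Name must match the legal register."},
    ]

    def to_ddl(rule):
        # Standards that map to database constraints become DDL fragments
        # (PostgreSQL-style syntax assumed); the rest need human interaction.
        if rule["kind"] == "not_null":
            return (f'ALTER TABLE {rule["table"]} '
                    f'ALTER COLUMN {rule["column"]} SET NOT NULL;')
        if rule["kind"] == "foreign_key":
            return (f'ALTER TABLE {rule["table"]} ADD FOREIGN KEY '
                    f'({rule["column"]}) REFERENCES {rule["references"]};')
        return None

    for rule in standards:
        ddl = to_ddl(rule)
        print(ddl if ddl else f'-> route to Data Steward: {rule["text"]}')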

The hardest part is the perseverance and discipline necessary to maintain the data quality standards, and also to instruct and monitor users so that the standards are consistently applied.

My additional comment

To answer the question .. related to my comment "The hardest part is the perseverance and discipline necessary to maintain the data quality standards, and also to instruct and monitor users so that the standards are consistently applied.":

The weakest element in the integrated system of people - processes - tools is undoubtedly the human factor. Users who enter data not only need to be trained and monitored; the organization also has to create a cultural climate that rewards high data quality.

Example: If the people who enter data are paid by the number of correctly and completely created/updated objects (persons, addresses, products, orders etc.), the resulting data quality will naturally be higher than if those people are paid by time.

In general, there needs to be a system of incentives that makes it attractive for users to contribute to data quality. A simple but important way to increase their motivation is also to ask users on a regular basis for feedback about difficulties and possible improvements of the process.

MDM vs DWH vs CRM | LinkedIn Group: MDM - Master Data Management


My comment

The subject of MDM is the management of all master entities and the relationships among them, as well as with other non-master entities, since the commercial value of MDM is derived from the relationships (roles). ...

An insurance company like your organization that covers Life, General and Health Insurance will need to consider the following master data entities:

* Party (natural person or legal entity, but also social group such as household) with their roles "policy owner", "insured person", "injured person" etc.

* Thing (any tangible or non-tangible object, e.g. your products, but also cars, houses) with their roles "policy product", "insured object", "claim object" etc.

* Location (physical or virtual place) with their roles "insured address", "claim location" etc.

The above examples show that MDM in an insurance company is quite complex if you want to profit from it to the maximum extent. Since MDM in any established insurance company is a multi-year integration endeavor that will affect (almost) each and every department, I recommend developing a plan for the economically optimal approach (cost of inaction, return on investment) to cover your organization's application landscape step by step.
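As a minimal sketch with hypothetical names: the master entities themselves carry no roles; the roles live in the relationships (here, on a policy), which is where the commercial value comes from:

    from dataclasses import dataclass, field

    @dataclass
    class Party:                 # natural person, legal entity or social group
        party_id: str
        name: str

    @dataclass
    class Thing:                 # tangible or non-tangible object
        thing_id: str
        description: str

    @dataclass
    class Location:              # physical or virtual place
        location_id: str
        address: str

    @dataclass
    class Policy:
        policy_id: str
        # role name -> master entity, e.g. "policy owner", "insured object"
        party_roles: dict = field(default_factory=dict)
        thing_roles: dict = field(default_factory=dict)
        location_roles: dict = field(default_factory=dict)

    p = Policy("POL-1")
    p.party_roles["policy owner"] = Party("P-1", "Jane Doe")
    p.thing_roles["insured object"] = Thing("T-1", "Family home")
    p.location_roles["insured address"] = Location("L-1", "1 Main St, Anytown")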

My additional comment

The duplication of master data should of course be avoided.

In the ideal case, master data are managed by a central application, and all operational applications directly create and update master data using an API of that central MDM application (hub architecture style: "Transaction"), but this is also the most ambitious solution.
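To illustrate the "Transaction" style, here is a minimal sketch with invented class and method names (not any specific product's API): operational applications never write master data locally but call the central hub instead:

    class MDMHub:
        """Single system of entry for master data ("Transaction" hub style)."""

        def __init__(self):
            self._parties = {}
            self._next_id = 1

        def create_party(self, name: str) -> str:
            party_id = f"P-{self._next_id}"
            self._next_id += 1
            self._parties[party_id] = {"name": name}
            return party_id

        def update_party(self, party_id: str, **changes) -> None:
            self._parties[party_id].update(changes)

    hub = MDMHub()

    # An operational application (e.g. policy administration) keeps only the
    # key returned by the hub, not a redundant copy of the master record.
    owner_id = hub.create_party("Jane Doe")
    hub.update_party(owner_id, name="Jane Doe-Smith")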

Since it will not be possible (and is not recommended) to reorganize all operational applications in one project, your organization will need to develop the already mentioned stepwise approach. During the interim period, it may be necessary to keep master data redundantly in both the legacy applications and the evolving central MDM application.

However, the most economical solution (i.e. the hub architecture style that will best resonate with your organization) can only be found via an individual analysis and business modeling (data, data flows and processes) of existing and potential future applications. For a quick overview of the principal architecture styles for the MDM hub, I recommend checking out this page: http://datamanagement.manjeetss.com/page/2

Data Governance as a part of the SDLC | LinkedIn Group: Data Governance & Stewardship


My comment

I agree with most of what has already been said in the preceding comments: standardization of the SDLC and of related artifacts such as data models, process models and data flow diagrams definitely contributes to transparency, which is a basic requirement of any Data Governance endeavor, regardless of industry-specific compliance requirements.

However, Data Governance primarily demands traceability of the production data itself, i.e. transparent data lineage is a major prerequisite so that consuming applications / users can judge the reliability and trustworthiness of the data.

Consequently, the SDLC of applications that create, update or delete governance-sensitive data will need to include logbook tables in the application data models and subsequently in the application databases. Such logbook tables record, for each modification event of an application database row (and possibly even of an individual column), e.g. the following information (captured via related trigger functions):
  • Timestamp
  • Actor (e.g. staff member, batch process, third-party)
  • Physical source (e.g. third-party self-service (Web) application form, postal code verification from external reference, MDM hub, migrated database, merger / acquisition database)
  • Status (e.g. active, inactive because customer passed away, inactive as being a duplicate entry)
  • Quality indicator, i.e. flagged if incomplete and/or incorrect (NOT NULL columns empty, filled with semantically incorrect values or meaningless defaults), or if referential integrity is violated (e.g. not every customer has an address).
Data extraction mechanisms for data warehouse / BI purposes will need to be able to filter data based on its logbook information (and to trace the data lineage backwards) to make sure that only reliable data contributes to a decision process (or that the user is warned accordingly about the related risk).
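To make this concrete, here is a minimal sketch using Python's sqlite3 module; the logbook columns follow the list above, while the table and column names are hypothetical, and the logging is done at application level rather than via trigger functions:

    import sqlite3
    from datetime import datetime, timezone

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("""
        CREATE TABLE customer_logbook (
            customer_id INTEGER,
            modified_at TEXT,      -- timestamp of the modification event
            actor TEXT,            -- staff member, batch process, third party
            physical_source TEXT,  -- e.g. Web self-service form, MDM hub
            status TEXT,           -- e.g. active, inactive (duplicate)
            quality_flag TEXT      -- e.g. incomplete, RI violated, ok
        )""")

    def update_customer(customer_id, name, actor, source, status, quality):
        # Every modification writes both the row and its logbook entry.
        con.execute("UPDATE customer SET name = ? WHERE id = ?", (name, customer_id))
        con.execute("INSERT INTO customer_logbook VALUES (?, ?, ?, ?, ?, ?)",
                    (customer_id, datetime.now(timezone.utc).isoformat(),
                     actor, source, status, quality))

    con.execute("INSERT INTO customer VALUES (1, 'Jane Doe')")
    update_customer(1, "Jane Doe-Smith", "staff:jdupont", "Web form", "active", "ok")

    # A BI extraction can now filter on logbook information:
    rows = con.execute("""SELECT customer_id FROM customer_logbook
                          WHERE quality_flag = 'ok'""").fetchall()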

MDM and map projection | LinkedIn Group: MDM - Master Data Management

See the triggering post

Follow the LinkedIn discussion

My comment


Surprisingly, many organizations have not yet realized that they are part of a global economy. At least those with a Web presence should consider that they have an audience (and maybe even clients) outside of their geographical, political and cultural area (usually their country).

On a daily basis, we can observe that Web publications, even those of internationally renowned businesses, show a lack of awareness of and sensitivity toward their foreign visitors. A simple example is the date format:

Numerous organizations continue to use their "local" way of displaying a date like "02/04/2012", which leaves their audience second-guessing whether this is "February 4, 2012" or "April 2, 2012" (the latter is how most Europeans will interpret it). With all due respect to cultural identity and geographical habits, the Web is a worldwide forum, and time-related information such as a date needs to be displayed in an unambiguous manner.

My thoughts regarding MDM (certainly not exhaustive, but a start):
  • Think as a cosmopolitan; make the world your universe.
  • Make sure that the descriptors that comprise an address identify the place uniquely worldwide, i.e. so that a foreign postal service can deliver successfully. (Using a geographic coordinate system based on longitude and latitude in addition to the postal address system is already under discussion.)
  • Apply international standards. If there is no globally accepted standard, the minimum "standard" is non-ambiguity. In the above example, a date could be written with two digits for the day, (at least) three letters for the month and four digits for the year. Depending on cultural background, personal taste etc., the date could then be displayed as "4. Feb 2012", "Feb 4, 2012" or even "4 2012 Feb" and will not leave any room for interpretation (see the sketch after this list).
  • Mention the unit system, e.g. a temperature of "40 degrees" can be either very warm (if related to "degrees Celsius (Centigrade)") or pretty cold (if related to "degrees Fahrenheit"); the same of course applies to length, weight etc. Certainly the metric system is recommended.
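As a minimal sketch in Python (note that the %b month abbreviation depends on the active locale; the default C locale yields English abbreviations), both a lettered-month format and the ISO 8601 standard avoid the ambiguity:

    from datetime import date

    d = date(2012, 2, 4)
    print(d.strftime("%d %b %Y"))  # -> 04 Feb 2012 (not mistakable for April 2)
    print(d.isoformat())           # -> 2012-02-04 (ISO 8601)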
Also, avoid derived data as part of MDM: the age of a person is not Master Data; the birth date is. (The Mercator projection coordinates of the outline of Greenland are not Master Data; the coordinates in longitude and latitude are.)

With the above suggestions, MDM should have a solid basis - until we conquer other planets or fall victim to a "merger" after being invaded from a distant galaxy...

What domains are people managing with MDM? | LinkedIn Group: Multi-Domain MDM

Follow the LinkedIn discussion

My comment


There are three main domains to be considered for Master Data Management: Party, Location and Thing.

Party is any natural person or legal entity that is relevant for your business. Depending on your industry, you may also want to consider Household as a sub-domain of Party. A party can take multiple roles e.g. be a customer and/or a supplier and/or an employee and/or a subcontractor.

Location is a physical or virtual place that is relevant for your business. Traditional examples are Postal Address and Phone Number. With the rise of social media and alternative ways of communication, you may want to consider e.g. Twitter Handle, LinkedIn Account or Skype Id as (virtual) locations.

Thing is any tangible or non-tangible object that is relevant for your business. As opposed to Party and Location, the range of sub-domains for Thing will vary from industry to industry. Typical examples are Product, Service, Material, Item, Part, Store, Machine, Tool.
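As a minimal sketch, the three domains could be registered as follows (the sub-domains follow the examples above; the role lists are hypothetical); note how one party may hold several roles at once:

    # Sub-domain and role entries below are illustrative examples.
    domains = {
        "Party":    {"sub_domains": ["Person", "Legal Entity", "Household"],
                     "roles": ["customer", "supplier", "employee", "subcontractor"]},
        "Location": {"sub_domains": ["Postal Address", "Phone Number",
                                     "Twitter Handle", "Skype Id"],
                     "roles": ["billing address", "delivery address"]},
        "Thing":    {"sub_domains": ["Product", "Service", "Material", "Part"],
                     "roles": ["sold item", "purchased item"]},
    }

    # One party, several roles:
    acme = {"domain": "Party", "sub_domain": "Legal Entity",
            "roles": ["customer", "supplier"]}
    assert set(acme["roles"]) <= set(domains["Party"]["roles"])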

How to start MDM? | LinkedIn Group: Multi-Domain MDM

See the triggering post

Follow the LinkedIn discussion

My comment



Your question revives the age-old discussion of whether the development of a new application system requires a detailed analysis of the legacy system, and if so, at which stage of the project and to what extent.

For MDM projects as well as for the development of any application, I recommend the following principal steps in this particular order (which is of course only an extract of the actual activities during a software project):

1. Develop the data model structure with the primary metadata (entities, relationships, keys, main attributes including their names and textual definitions) solely based on the requirements for the future application system to avoid being biased by the legacy system (or by any standard software package considered as candidate for the replacement of the legacy system).

2. Once the structure of the new data model is solid, compare the metadata (data model) of the new application with the metadata of the legacy system to add missing attributes including their names and textual definitions to the future data model (i.e. ensure that at least all existing metadata or their semantic correspondences are included in the new data model).

3. Analyze the current data content to complete the attributes’ descriptors (type, length, nullability, permitted range of values, default values) in the future system's data model (i.e. ensure that all values of the legacy system can be mapped / migrated to the new system).

In other words, the current data content does not need to be examined before the structure of the future data model is solid, but certainly before the (logical) data model can be considered complete and verified.
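Here is a minimal sketch of steps 2 and 3, with hypothetical entity and attribute names: compare the metadata to find attributes missing from the new model, then profile the legacy content to complete the descriptors:

    # Hypothetical entity and attribute names throughout.
    new_model = {"Customer": {"name", "birth_date", "email"}}
    legacy_model = {"KUNDE": {"name", "birth_date", "fax_number"}}

    # Step 2: metadata comparison (assuming KUNDE corresponds to Customer)
    missing = legacy_model["KUNDE"] - new_model["Customer"]
    print("Attributes to review for the new model:", missing)  # {'fax_number'}

    # Step 3: content analysis to derive descriptors (length, nullability)
    legacy_rows = [{"name": "Jane Doe", "birth_date": "1970-02-04"},
                   {"name": "John Roe", "birth_date": None}]
    max_len = max(len(r["name"]) for r in legacy_rows)
    nullable_birth = any(r["birth_date"] is None for r in legacy_rows)
    print(f"name: at least VARCHAR({max_len}); birth_date nullable: {nullable_birth}")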

Technical proficiency of PMs in Canada? | LinkedIn Group: Canadian PM

Follow the LinkedIn discussion

My comment

Glad to read the previous comments.

For the last few decades, since hardware and software have conquered organizations for better or worse, there has been a widespread myth about the power of the "IT silver bullet": it just requires a technically correct implementation, and the magic will happen. And if not, the next technology will do it.

Accordingly, too many organizations believe they need IT project managers who are experts in the xyz software, database etc. <begin of snark>Of course, in case a project fails, the stakeholders of those organizations will be exonerated, as they have done "everything" to choose the right project manager.<end of snark>

Let's face it - if we look at the real reasons why IT projects have failed (and continue to fail), we find organizational issues such as
  • Insufficient involvement of stakeholders
  • Undefined project scope
  • Lack of resources
  • Insufficient communication (not appropriate to the problem)
  • Poor planning
  • Inadequate budgets
  • Missing methodology and/or tools
and "the project manager did not know the technology" is none of them (Independent sources can be easily googled).

My short answer: A seasoned IT project manager's success will not depend on his knowledge of any specific technology (and I am not citing from a book, but reflecting my own experience!). Does it hurt to be a technical specialist? Yes, it can, because it may turn the focus too far away from the real business requirements and from "managing the project".

My additional comment 

This last question .. [What if the PM does not look at solutions outside his current skill set? by Patrick Richard ing., PMP] .. is actually one of my major concerns, as the history of IT projects continues to repeat itself: too many client organizations have a biased fixation on a particular target technology and, instead of evaluating a solution based on documented business process and data models, twist a prematurely selected technical environment until it "approximately" matches the requirements.

Returning to the original question - my advice for Human Resources departments (and recruiters acting on their behalf): pay attention to the soft skills and don't look for the technical expert; otherwise you may end up with only the second-best project manager!

Data Governance Management. Is it a Program or a Project? | LinkedIn Group: Data Governance & Data Quality

Follow the LinkedIn discussion

My comment 

In a nutshell: Data Governance Management starts as a Project with the purpose of setting up the roles / responsibilities, procedures and technology to ensure regulatory compliance, data quality and data security.

It turns into a Program when operational units exercise their responsibilities - as agreed in the initial Data Governance Project - and use the defined procedures and technology on a daily basis. Operational units should report issues with the Program to a Data Governance Committee, which may trigger follow-up Projects that adjust responsibilities, procedures and technology to improve the existing Program.

It is the task of Internal Audit to check on a regular (and/or random) basis that operational units meet their obligations as defined in the Program.

Tuesday, May 21, 2013