Chief Data Officer Melbourne – Day 3

Today’s session opened with a fantastic presentation by Ravi Vijayaraghavan, VP at Flipkart.

It was an amazing start to learn that Flipkart is “The Amazon of India”. It reminded me of a presentation at the Databricks AI summit, where analytics and artificial intelligence were used to solve real-world business problems (eg: supply/demand forecasting, fraud detection, optimising conversion via automated image recognition).

Presentation by Ravi Vijayaraghavan, VP at Flipkart

Key takeaways:

  • Flipkart is the “Amazon” of India
  • Flipkart has nearly 120m customers, with over 80m product reviews
  • They have a team of 200 analysts (comprising data analysts, business analysts who build predictive models, and a small core of data scientists).
  • Business analysts can build predictive models and have a good understanding of statistics.
  • Scale of data matters and can give a lot of insights
    • With over 120m users, they really have a detailed understanding of customers
    • Mobile phones (not laptops) are transforming India – penetration of mobile phones reigns supreme.

Scale of Flipkart data platform

  • Leveraging the data platform and machine learning platform at scale
  • Distinguish between data-assisted human decisions (eg: operations planning/strategy) vs data-assisted machine decisions (listing quality, selection quality, fraud, personalisation)

A culture of continual experimentation: there is a need to continually experiment, with around 50 launches a week each going through an A/B test.
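As a rough illustration of how a single launch might be read out of an A/B test, here is a minimal sketch of a two-proportion z-test on conversion counts. The function name and the sample numbers are hypothetical, and the talk did not say which statistical test Flipkart actually uses.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the variant.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 10% conversion in control vs 15% in the variant.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z={z:.2f}, p={p:.4f}")
```

With 50 or so launches a week, a team would also need to think about multiple comparisons, which this sketch does not handle.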

Data translates to business decisions across nearly all areas, including product insights, user insights, traffic, price elasticity modelling, supply chain optimisation etc.

Data science vs Measurement science

Measurement science is as important as data science

  • Understand the company strategy, identify the areas of focus, and frame them as Level 0 (L0) metrics.
  • Articulate these L0 metrics, then build out a statistical model for the underlying driver metrics
  • Assign each of the drivers to the relevant business unit head to ensure accountability

Business problem: Flipkart net promoter scores were dropping and they wanted to turn it around.

  • Flipkart felt that they lost ground in net promoter scores compared to a few years ago.
  • The execs and Ravi set an ambitious goal to improve net promoter scores by 10pts in a year.
  • Ravi’s team created a statistical model that looked at the relative importance of each driver and its sub-drivers, and assigned accountability so that ownership is directly attributed for each of the sub-drivers.
  • The key here is really assigning KPI accountabilities to the relevant business area and holding them to account
    • Pricing index assigned to the business category
    • Product category assigned to the consumer platform, etc.

Business problem: How can we help customers purchase better through the use of online reviews?

  • Over 80m reviews, so it can’t be done manually.
  • The initial simplistic algorithm was based on ‘proportion of up-votes’ and recency of review; however, they found this was not working well

Applying machine learning to incorporate natural language ‘sentiment’ together with other metrics to inform a more balanced view for customers
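To make the simple baseline concrete, here is a minimal sketch of a review score built only from the proportion of up-votes and recency. The `Review` fields, the exponential decay and the 180-day half-life are my own assumptions for illustration, not Flipkart's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Review:
    up_votes: int
    down_votes: int
    age_days: float


def baseline_score(review, half_life_days=180.0):
    """Score = proportion of up-votes, discounted by the age of the review.

    half_life_days is a hypothetical tuning knob: a review loses half of
    its weight every half_life_days.
    """
    total = review.up_votes + review.down_votes
    ratio = review.up_votes / total if total else 0.0
    recency = 0.5 ** (review.age_days / half_life_days)
    return ratio * recency


# A brand-new review with a single up-vote outranks one with 90 of 100.
print(baseline_score(Review(1, 0, 0)), baseline_score(Review(90, 10, 0)))
```

The talk did not say why the baseline underperformed. One well-known weakness of a raw proportion is that a review with very few votes can outrank a review with many, and the score ignores what the review text actually says.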

Business problem solved by machine learning: “How much inventory of each item should Flipkart stock?”

Leveraging deep learning and computer vision to improve purchase conversion rates and reduce fraud

  • Customers are sensitive to the quality of images and they don’t seem to purchase/convert if it’s a poor quality image. Leverage computer vision to identify and classify images by extracting key features
  • Auto categorisation of images that do not comply with standards (eg: images with a shadow)
  • Using image recognition to correctly scan product items to find their “Real retail price”

Presentation by Yellowfin – Future of analytics departments

  • As an industry, there is a need to shift to using BI to create value for the business
  • The majority of companies are still struggling with preparing data
  • Hire people for their techniques, not for their skills with particular tools
  • We need to run analytics like a business

Round table Panel discussion: Aligning your data strategy to support AI

  • Focus on business value and not Hollywood AI.
  • Need to assign a business owner to each analytics initiative.
  • Focus on educating and applying the solution in the business.
  • Most companies are still struggling with data prep and acquisition. AI is not the real problem
  • Bring use cases, quick POCs and a minimum viable product. Showcase them to business and leadership teams

Presentation by Agustinus Nalwan, Head of AI at

Data quality is critical for AI to work effectively

  • Rubbish data will generate rubbish AI outcomes, eg: see the example of generation of baby names

Cyclops 2.0 is car-sales image recognition software built in-house using the TensorFlow framework

  • Incredible accuracy
  • 1st version was built in a few months and the 2nd version in 5 months (eg: spend 1 day a week for the next 3 months to build out AI)
  • Key learnings:
    • AI enables competitive advantage; however, user interface design and the way customers use the product are critical for adoption to occur.
    • Use of transfer learning saved a lot of computation time (4 weeks vs 8 months)

Using AI to do comparisons on cars

Leveraging AI to build features – TensorFlow

Remember to incorporate customer behaviour

Roundtable: Future proofing for GDPR discussion

Key learnings include:

  • The impact of a data breach can be significant, eg: Equifax. This can be used to educate your board to invest in data and analytics.
  • Run a scenario with your board for data breaches and the potential costs and impact.
  • Take the opportunity to re-architect IT to support the business outcomes.
  • Make the effort to identify data owners within the company.
  • Data stewards/data owners can be segmented by:
    • geography
    • department
    • data domain (can be multiple) – don’t look for perfection
  • Identify process owners, application owners, data owners, and IT owners.
  • Data stewards also funnel things upwards via management reports. Analytics can be used to influence data stewards, eg: inconsistent management reports
  • Set up a support function – leverage an existing IT platform, eg: ServiceNow, to align tickets. A ticket can go to legal for GDPR deletion; only the end user can close the query.

Chief Data Officer Melbourne – Day 2

Another fantastic day in Melbourne for all analytics professionals attending the Chief Data Officer Melbourne event at the Park Hyatt.


Presentation by Anwar Mirza, Chief Data Officer TNT Global

Anwar is an amazing speaker, sharing his 30 years of data governance expertise with the audience. After all, it’s all about understanding and levelling the playing field with analytics.

Key takeaways are :

  • Shift analytics from compliance to enabling digital transformation. We are here not just to deliver data but to support business outcomes
  • Define what comprises data governance, master data management and business outcomes.
  • Data governance is the control and support of
    • Business definitions
    • Business rules
    • Master data management
  • All of us own the data and not just the chief data officer
    • There is an opportunity to set up a business analytics support desk whereby anyone in the business can call up, just like an IT helpdesk.
  • Put a tangible value on the actual data itself
    • Data quality index
    • Completeness
  • Typical cross-functional processes may reveal
    • Incorrect or absent business rules, which can then be used to derive a cost of data quality. A fantastic example was given on the unit cost of delivery
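As one possible building block for a data quality index, completeness can be measured as the share of required fields that are actually filled in. This is a minimal sketch; the function name, the field names and the rule that `None` and empty strings count as missing are my own assumptions, not something Anwar presented.

```python
def completeness(records, required_fields):
    """Share of required fields that are populated across all records.

    A field counts as missing when it is absent, None, or an empty string.
    Returns None when there is nothing to measure.
    """
    total = len(records) * len(required_fields)
    if total == 0:
        return None
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical shipment records with one missing postcode.
shipments = [
    {"name": "A", "postcode": "3000"},
    {"name": "B", "postcode": ""},
]
print(completeness(shipments, ["name", "postcode"]))
```

A score like this could be tracked per data domain and combined with other dimensions (validity, timeliness and so on) into an overall index.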



Quick poll of key challenges facing analytics professionals

There remain large challenges around data governance and modernising legacy systems


Presentation by Bala Ayyar – Chief data officer Société Générale

In any crisis lies an opportunity. During the GFC period, there was a need to run analytics in real time and intraday, especially in a crisis scenario, eg: the Lehman Brothers collapse


Building a strong defence is key to enabling the enterprise


Don’t be the next KODAK – Leverage your data to enable value creation, such as improving the customer experience, driving operational efficiency and insights, as well as monetising data.


Does your data strategy deliver a Google-like experience? Why do I need to get a data scientist to call an IT database guy?

  • Data is still viewed as a departmental process not an enterprise capability
  • Legacy processes and systems
  • Cultural willingness to change and drive decision making.

Data lineage

  • The start and end points appear to be more important than the middleware
  • Experiment with different tools to see what works

Presentation by Marklogic : Moving from just observing the business to running the business

Presentation by Mario Vinasco, Data science & Analytics at Uber

A few breakthroughs in AI came from image classification and from the availability of cheap computing resources to train neural networks at scale.

Business question: how do we identify those drivers or customers who will churn?

  • Most important is the way a classifier works, eg: predicting driver churn, riders who order Uber Eats, people who unsubscribe
  • Similar to classifying cats and dogs: collect many data points, eg: driver history, how far from home.
  • Tips for presenting churn outputs to the business
    • Setting a threshold
    • Evaluate ratio of true positives to the rest
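The two tips above can be sketched in a few lines. I am reading "ratio of true positives to the rest" as precision, that is, the share of flagged drivers who really churned; that reading, the function name and the sample scores are my assumptions rather than anything shown in the talk.

```python
def precision_at_threshold(scores, labels, threshold):
    """Precision of a churn classifier at a given score threshold.

    scores: predicted churn probabilities, labels: 1 if the driver churned.
    Returns None when nobody is flagged at this threshold.
    """
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Hypothetical model scores and actual churn outcomes.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
for threshold in (0.5, 0.85):
    print(threshold, precision_at_threshold(scores, labels, threshold))
```

Raising the threshold usually flags fewer people with higher precision, so the business has to decide how many missed churners it can tolerate.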

Business problem: The business asks ‘how many emails should we send to our riders and drivers?’

Uber has a history of running other models to solve similar business problems.

  • No model is 100% accurate
  • Split things into deciles and compare model output
  • Triage things into simple high, medium and low (email open probability vs unsubscribe frequency)
  • Combine business intuition and the model
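Splitting into deciles and comparing against the model output could look something like the sketch below: rank users by score, cut them into ten equal groups, and put the average predicted score next to the observed rate for each group. The function name and the synthetic data are hypothetical.

```python
def decile_table(scores, outcomes):
    """Compare mean model score with the observed rate in each score decile.

    Decile 1 holds the highest-scoring 10% of users. outcomes are 0/1.
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rows = []
    for d in range(10):
        bucket = ranked[d * n // 10:(d + 1) * n // 10]
        if not bucket:
            continue  # fewer than ten users: skip empty groups
        rows.append({
            "decile": d + 1,
            "mean_score": sum(s for s, _ in bucket) / len(bucket),
            "actual_rate": sum(o for _, o in bucket) / len(bucket),
        })
    return rows


# Synthetic example: 100 users, the top-scoring half actually opened the email.
scores = [i / 100 for i in range(100)]
outcomes = [1 if i >= 50 else 0 for i in range(100)]
for row in decile_table(scores, outcomes):
    print(row)
```

If the observed rate does not fall steadily from decile 1 to decile 10, that is a hint the model is not ranking people well, which is where business intuition comes back in.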


Business model problem: how many Uber customers will want to get an Uber Eats account?

  • Focus on people who may need help with converting, ie: the middle of the pack
    • Don’t worry about those on the left, who fail to convert.
    • Those on the right will convert at high rates.

Example of optimisation of a key performance indicator eg: Marketing $

  • Marketing mix models
  • Top down attribution via mixed models
    • Smart marketers can just use a handful of curves and spend until the marginal return diminishes. Estimating the curves is hard

How much extra marketing leads to incremental trips?

Learns the parameters by iterating through
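To illustrate the idea of spending along a handful of curves until the marginal return diminishes, here is a minimal sketch that allocates a budget one step at a time to whichever channel currently offers the largest incremental gain. The logarithmic response curve, the channel names and the parameter values are all assumptions of mine; the talk stressed that estimating the real curves is the hard part, and this sketch takes them as given.

```python
import math


def response(spend, scale, saturation):
    """Hypothetical concave response curve: incremental trips for a spend level."""
    return scale * math.log1p(spend / saturation)


def allocate(budget, channels, step=1.0):
    """Greedy allocation: give each budget step to the channel with the
    largest marginal gain. channels maps name -> (scale, saturation)."""
    spend = {name: 0.0 for name in channels}

    def marginal_gain(name):
        scale, saturation = channels[name]
        return (response(spend[name] + step, scale, saturation)
                - response(spend[name], scale, saturation))

    for _ in range(int(budget // step)):
        best = max(channels, key=marginal_gain)
        spend[best] += step
    return spend


# Two hypothetical channels with identical curves split the budget evenly.
print(allocate(10, {"search": (100.0, 50.0), "social": (100.0, 50.0)}))
```

Because each curve flattens as spend grows, every extra dollar on a channel buys less than the previous one, so the greedy step naturally moves money to the next-best channel.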

Roundtable discussion on centralised vs decentralised analytic models: 

Carsales operates a hybrid model ensuring skills and integrity of data and definitions are maintained

  • Centralised model:
    • A central team may help to grow the capability, given the small pool of deep capability and expertise. Like-minded people come together, challenge each other and learn
    • Creating a community of practice is helpful.
    • Consistency in tools and skills will enable users to tap into various business data sources
  • De-centralised model:
    • Evaluate if there is enough data literacy in the business before embarking on a decentralised model
    • Focus on building reuseable data assets and reduce duplication of data.
    • Decentralised doesn’t mean that teams don’t talk to each other. How do we co-locate analysts so that people of the same mindset talk to each other?
  • Hybrid model:
    • An alternative model is to have BI functions in the business and centralise the data science function to get scale.
    • Combine teams with diverse skill sets (engineering, data science, IT) to get maximum business benefit.
  • Focus on developing productive relationships with IT, be the conduit between technology, business and other stakeholders

Chief data officer forum – Melbourne Focus day 1

Today I had the opportunity to attend the chief data officer conference in Melbourne on 3rd Sep 2018. It was the second time that I have attended the chief data officer summit and it was such a fantastic learning opportunity for analytics practitioners.

The day started with a fantastic keynote by Chris Butler, Chief data officer HSBC APAC.

Some of the key learnings include:

  • Data governance comprises policy and standards, measurement, quality and change management.
  • Data is generated by, and dependent on, processes and other parts of the organisation. Processes create data, so fix the root cause instead of fixing data quality issues downstream. Take, for example, a user interface that is not properly designed, eg: include country for SWIFT funds transfers. Address the root cause.
  • Measure the effectiveness of the process, eg: how many attempts it takes to get the right address
  • Talk in business language and establish ownership
  • Dashboards must be intuitive, have a target and people must do something with it.
  • Embed data quality metrics in executive scorecards. CEOs should have a data quality measurement (fully met, largely met, partially met). Don’t give just numbers

This was followed by a roundtable of CDOs on data governance, which I hosted. Key learnings are:

  • Data governance requires continual effort and focus to fix at the source.
  • Data governance is no longer a nice-to-have but is instead critical for the growth of both new and iconic companies
  • Developing a vision and aligning data governance to a business problem is a fantastic way to sell it to senior management

Presentation by ANZ bank – Paul Davies, head of data governance

  • Distinguish data quality issues at source from those at target. Make a conscious decision around remediation and risk acceptance.
  • Data quality is an operational risk and it’s everyone’s problem
  • Align data ownership assignment with process ownership
  • Win the hearts and minds of the creators of data
  • Process is as important as data quality because it links to data. Raise it as a data quality issue
  • Distinguish data quality issues from system enhancements (people raise system enhancements as data quality issues). Pre-existing requirement: data quality vs data gap

Embedding data into the DNA of your company

Data governance by AiHua Kam, Standard chartered bank

Have a data requirements document, the standards and what it means. Data acceptance testing – profiling and testing.

Presentation from QBE head of data

Link business risks to data issues. One can inform the other.

An opportunity to align both offensive and defensive strategies together.

Realising the benefits of analytics requires a strong foundation: data

It is important to include data ethics given the changing dynamics

  • Build quality in and don’t just inspect quality coming out.

Data stewards – go beyond traditional KPIs and measurement

Recognise what people do. Creation of data ninjas – those that champion data in the business eg: governance

Presentation by GM information management Medibank

Opportunity to simplify and centralise 4 data warehouses into 1. Currently 2 are in the cloud and 2 on premise. There are 7 BI tools, which they are looking to reduce to 2-3.

Nearly 140 FTEs across analytics

Link a business glossary ( with appropriate business owners) to the physical data assets

Leverage the same metric from the global library of the metric owners.