The rise of technologies such as AWS, Alteryx, Tableau, Databricks and DataRobot has democratised analytics for everyone.
Democratisation is about giving business users the power and capability to do advanced analytics without coding, just as we use Windows instead of DOS.
What this democratisation is really about is people’s battle for relevance: the battle between people within companies who don’t wish to embrace change and those who do. The resistance often comes from a fear of no longer being useful, important or relevant to the organisation.
AWS has democratised infrastructure, enabling companies to focus on business outcomes
Democratising and modernising analytics platforms
Anyone with a little knowledge can easily spin up an AWS EMR cluster and get nearly 100 cores of Spark processing without breaking a sweat. That kind of processing power is incredible compared with what is accessible on premises. In addition, it costs only $4 an hour to run. It’s almost crazy cheap! It’s certainly cheaper than running a server at home.
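To give a feel for how little is involved, here is a minimal sketch that uses the boto3 SDK’s EMR `run_job_flow` call to request a Spark cluster with 100 vCPUs (one m5.xlarge master plus six m5.4xlarge core nodes). The region, release label, instance types and the default IAM role names (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) are assumptions to adapt to your own account, and the helper names are hypothetical. Nothing is sent to AWS until `launch` is called, which needs boto3 installed and AWS credentials configured.

```python
"""Sketch: request a ~100-vCPU Spark cluster on Amazon EMR with boto3.

Assumptions (adjust for your account): region, release label,
instance types/counts and the default EMR IAM role names.
"""
from typing import Any, Dict

# vCPU counts for the two instance types used below.
VCPUS = {"m5.xlarge": 4, "m5.4xlarge": 16}


def build_emr_request(name: str = "adhoc-spark", core_nodes: int = 6,
                      core_type: str = "m5.4xlarge",
                      master_type: str = "m5.xlarge") -> Dict[str, Any]:
    """Return keyword arguments for the EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any current EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "Market": "ON_DEMAND", "InstanceRole": "MASTER",
                 "InstanceType": master_type, "InstanceCount": 1},
                {"Name": "core", "Market": "ON_DEMAND", "InstanceRole": "CORE",
                 "InstanceType": core_type, "InstanceCount": core_nodes},
            ],
            # Keep the cluster up for interactive work; terminate it when done.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def total_vcpus(request: Dict[str, Any]) -> int:
    """Sum vCPUs across all instance groups in the request."""
    return sum(VCPUS[g["InstanceType"]] * g["InstanceCount"]
               for g in request["Instances"]["InstanceGroups"])


def launch(request: Dict[str, Any], region: str = "ap-southeast-2") -> str:
    """Submit the request (needs boto3 and AWS credentials); return the cluster id."""
    import boto3  # imported lazily so the request can be inspected offline
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    req = build_emr_request()
    print(total_vcpus(req))  # 100
```

Because `KeepJobFlowAliveWhenNoSteps` is set, the cluster keeps running (and billing) until you stop it, for example with the EMR client’s `terminate_job_flows(JobFlowIds=[cluster_id])`.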
One of the most recent advances in AWS is Amazon Redshift Spectrum. It is a really fast and easy way to query data stored in S3 without going through the traditional process of loading it into a warehouse first. This is definitely a game changer, as it enables users to directly access all their data, not just the data stored within databases such as Redshift or Aurora.
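As a minimal sketch of the Spectrum workflow, the helper below returns the three SQL statements involved: create an external schema backed by the AWS Glue Data Catalog, define an external table over Parquet files that stay in S3, and query that table like any other. The schema, catalog database, table, columns, bucket and IAM role ARN are all hypothetical placeholders, and the role is assumed to be associated with the cluster. The statements would be run from any SQL client connected to Redshift.

```python
"""Sketch: the SQL needed to query S3 files from Redshift via Spectrum.

All names (schema, Glue database, table, columns, bucket, IAM role ARN)
are hypothetical placeholders.
"""
from typing import List


def spectrum_statements(iam_role_arn: str, s3_location: str) -> List[str]:
    """Return SQL that registers an S3 folder as an external table, then queries it."""
    return [
        # External schema backed by the AWS Glue Data Catalog.
        "CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum "
        "FROM DATA CATALOG DATABASE 'spectrumdb' "
        f"IAM_ROLE '{iam_role_arn}' "
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;",
        # Table definition only: the Parquet files stay in S3, nothing is loaded.
        "CREATE EXTERNAL TABLE spectrum.events ("
        "event_id BIGINT, event_type VARCHAR(64), event_ts TIMESTAMP) "
        f"STORED AS PARQUET LOCATION '{s3_location}';",
        # Queried like a normal table; Spectrum scans S3 at query time.
        "SELECT event_type, COUNT(*) AS n FROM spectrum.events "
        "GROUP BY event_type ORDER BY n DESC;",
    ]


if __name__ == "__main__":
    for sql in spectrum_statements(
            "arn:aws:iam::123456789012:role/SpectrumRole",  # hypothetical ARN
            "s3://example-bucket/events/"):                 # hypothetical bucket
        print(sql)
```

The key design point is that the external table is just metadata pointing at an S3 prefix, so new files dropped into that prefix are visible to the next query without any load step.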
Big data is relative
It’s funny how everyone thinks their data is ‘big’. Read a few of the AWS case studies: Netflix has 60 petabytes in S3, accounts for nearly a third of all internet traffic in the US, and lands over 760 billion events into S3 daily.
The Weather Company produces 15 billion predictions a day, and NTT Docomo has over 68 million customers. By comparison, the data in most companies is really small. It’s really funny when you read about people using Redshift for petabyte-scale queries; that’s unreal. Check out their video and SlideShare presentation.
A few interesting insights:
- The data team is relatively large, with over 100 staff out of 2,000 in the whole company.
- S3 is the main data lake, where everything is stored before downstream processing. The real benefits are virtually unlimited scaling and the ability to collect data at whatever granularity you want. I was lucky enough to use this approach at Amaysim, and it’s fantastic because it makes it easy to adopt other database technologies in the future; you are not locked into one technology.
- The Hive metastore stores metadata (such as table schemas and partition locations) about the files in S3.
- Presto is used for interactive queries.
- Redshift and Teradata are used for data warehousing, and Tableau for visualisation.
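To illustrate what the Hive metastore contributes, here is a toy, in-memory model. It is not how Hive itself is implemented (the real metastore keeps its catalog in a relational database behind a service). It groups S3 object keys laid out in Hive’s `column=value` directory convention by table and partition, then returns only the files a filtered query needs to read, which is the partition pruning that engines such as Presto rely on the metastore for. The key layout, table and function names are hypothetical, and keys are assumed to be relative to a single bucket.

```python
"""Toy model of a Hive-style metastore over S3 object keys.

It records which files belong to which table partition so a query engine
can skip files that cannot match a filter (partition pruning).
Key layout and names are hypothetical.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Partition = Tuple[Tuple[str, str], ...]


def build_catalog(keys: List[str]) -> Dict[str, Dict[Partition, List[str]]]:
    """Group keys like 'events/dt=2017-01-01/part-0.parquet' by table and partition."""
    catalog: Dict[str, Dict[Partition, List[str]]] = defaultdict(lambda: defaultdict(list))
    for key in keys:
        table, *dirs, _filename = key.split("/")
        # Hive encodes partition values in directory names as column=value.
        partition = tuple(tuple(d.split("=", 1)) for d in dirs if "=" in d)
        catalog[table][partition].append(key)
    return catalog


def files_to_scan(catalog: Dict[str, Dict[Partition, List[str]]],
                  table: str, **filters: str) -> List[str]:
    """Return only the files whose partition values match every filter."""
    selected: List[str] = []
    for partition, files in catalog.get(table, {}).items():
        values = dict(partition)
        if all(values.get(col) == val for col, val in filters.items()):
            selected.extend(files)
    return sorted(selected)


if __name__ == "__main__":
    keys = [
        "events/dt=2017-01-01/country=AU/part-0.parquet",
        "events/dt=2017-01-01/country=US/part-0.parquet",
        "events/dt=2017-01-02/country=AU/part-0.parquet",
    ]
    catalog = build_catalog(keys)
    # Only the two AU files are read; the US partition is pruned.
    print(files_to_scan(catalog, "events", country="AU"))
```

Collecting data at a fine granularity stays cheap to query under this layout, because a filter on partition columns touches only the matching prefixes rather than the whole lake.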
Alteryx enables anyone to do advanced analytics without coding
Alteryx has revolutionised the analytics market by making it easy for a business user to simply download the product, combine data from many different sources, and then perform ETL, spatial, predictive and optimisation analytics in one go.
This kind of reach and power is completely unreal, as it shifts power from the analyst directly to the business. It also enables anyone to become an analyst, asking and answering questions of their own data. Alteryx turns every data worker into the discoverer of marginal profitability!
To top it off, Alteryx Server is also available as an AMI on Amazon; just click ‘Start free trial’.
Tableau: the king of visualisation
Helping people see and understand their data is clearly Tableau’s mission, and the company takes it really seriously. Just look at the number of customer stories and you can tell it’s a successful company. It’s also fantastic that it’s available on Amazon.