March 18, 2020
As the fuel that powers their ongoing digital transformation efforts, businesses everywhere are looking for ways to derive as much insight as possible from their data. The accompanying increased demand for advanced predictive and prescriptive analytics has, in turn, led to a call for more data scientists proficient with the latest artificial intelligence (AI) and machine learning (ML) tools.
But such highly-skilled data scientists are expensive and in short supply. In fact, they’re such a precious resource that the phenomenon of the “citizen data scientist” has recently arisen to help close the skills gap. A complementary role, rather than a direct replacement, citizen data scientists lack specific advanced data science expertise. However, they are capable of generating models using state-of-the-art diagnostic and predictive analytics. And this capability is partly due to the advent of accessible new technologies such as “automated machine learning” (AutoML) that now automate many of the tasks once performed by data scientists.
Algorithms and automation
According to a recent Harvard Business Review article, “Organisations have shifted towards amplifying predictive power by coupling big data with complex automated machine learning. AutoML, which uses machine learning to generate better machine learning, is advertised as affording opportunities to “democratise machine learning” by allowing firms with limited data science expertise to develop analytical pipelines capable of solving sophisticated business problems.”
Comprising a set of algorithms that automate the writing of other ML algorithms, AutoML automates the end-to-end process of applying ML to real-world problems. By way of illustration, a standard ML pipeline is made up of the following: data pre-processing, feature extraction, feature selection, feature engineering, algorithm selection, and hyper-parameter tuning. But the considerable expertise and time it takes to implement these steps means there’s a high barrier to entry.
AutoML removes some of these constraints. Not only does it significantly reduce the time it would typically take to implement an ML process under human supervision, it can also often improve the accuracy of the model in comparison to hand-crafted models, trained and deployed by humans. In doing so, it offers organisations a gateway into ML, as well as freeing up the time of ML engineers and data practitioners, allowing them to focus on higher-order challenges.
Overcoming scalability problems
The trend for combining ML with Big Data for advanced data analytics began back in 2012, when “deep learning” became the dominant approach to solving ML problems. This approach heralded the generation of a wealth of new software, tooling, and techniques that altered both the workload and the workflow associated with ML on a large scale. Entirely new ML toolsets, such as TensorFlow and PyTorch were created, and people increasingly began to engage more with graphics processing units (GPUs) to accelerate their work.
Until this point, companies’ efforts had been hindered by the scalability problems associated with running ML algorithms on huge datasets. Now, though, they were able to overcome these issues. By quickly developing sophisticated internal tooling capable of building world-class AI applications, the BigTech powerhouses soon overtook their Fortune 500 peers when it came to realising the benefits of smarter data-driven decision-making and applications.
Insight, innovation and data-driven decisions
AutoML represents the next stage in ML’s evolution, promising to help non-tech companies access the capabilities they need to quickly and cheaply build ML applications.
In 2018, for example, Google launched its Cloud AutoML. Based on Neural Architecture Search (NAS) and transfer learning, it was described by Google executives as having the potential to “make AI experts even more productive, advance new fields in AI, and help less-skilled engineers build powerful AI systems they previously only dreamed of.”
The one downside to Google’s AutoML is that it’s a proprietary algorithm. There are, however, a number of alternative open-source AutoML libraries such as AutoKeras, developed by researchers at Texas University and used to power the NAS algorithm.
Technological breakthroughs such as these have given companies the capability to easily build production-ready models without the need for expensive human resources. By leveraging AI, ML, and deep learning capabilities, AutoML gives businesses across all industries the opportunity to benefit from data-driven applications powered by statistical models - even when advanced data science expertise is scarce.
With organisations increasingly reliant on civilian data scientists, 2020 is likely to be the year that enterprise adoption of AutoML will start to become mainstream. Its ease of access will compel business leaders to finally open the “black box” of ML, thereby elevating their knowledge of its processes and capabilities. AI and ML tools and practices will become ever more ingrained in businesses’ everyday thinking and operations as they become more empowered to identify those projects whose invaluable insight will drive better decision-making and innovation.
Tags: #ML #AI #deeplearning #TensorFlow #PyTorch #bigdata #autoML #MLblackbox #algorithms #datascientist #automation #scalability #GPU #innovation #datadrivendecisions #NeuralArchitectureSearch #AutoKeras