What is TensorFlow? An Intro to The Most Popular Machine Learning Framework
In the last few years, we’ve seen an amazing rise of software tools for Machine Learning (ML): ML-specialized libraries in modern programming languages such as Python, Scala and R; cloud-based data engineering and analytics platforms such as AWS, GCP and Azure; and software frameworks for developing large-scale ML solutions, such as PyTorch, Watson, CNTK and TensorFlow. We’re especially interested in the latter. Google’s TensorFlow is currently “the king” of all ML frameworks, judging by GitHub stars and Google searches.
What makes TensorFlow so special and successful? What are its downsides compared to other competitors, and will it manage to stay on top? To explain this, we’ll have to go way back to the basics.
A History of Machine Learning
Machine Learning, as a tool for Artificial Intelligence, is not a new thing. The term was coined back in 1959 by Arthur Samuel at IBM. Both terms, ML and AI, have in fact gained and lost popularity a couple of times over the last few decades.
Today it finally seems we’re nearing a stage where learning machines can become a reality. This is not only because of the software tools we have today, but also because of recent advances in hardware and algorithms. And of course, AI funding has never before been seen on such a scale.
Chart 1: ML and AI – Google Trends search interest, relative to the highest point in the chart.
Neural networks have regained popularity after a decade or two of skepticism about their prospects. So let’s look at algorithms first.
Algorithms: Deep Neural Nets and Reinforcement Learning Algorithms
Today we have deep neural nets, which have become the state-of-the-art solution in the field. Deep neural nets have pushed machine performance to the human level, and beyond; self-driving cars, computer vision, speech recognition and bioinformatics are a few example use cases. Deep neural nets need lots of historical data to learn from, which is not a problem in most digitized application areas.
Compare that to reinforcement learning algorithms, which can learn from experience instead of big data. This was demonstrated in the board game Go in 2016 and 2017. A machine called AlphaGo beat 18-time world champion Lee Sedol by 4 to 1. And that was just the start. Later on, its successor AlphaGo Zero reached a superhuman level of play and beat AlphaGo by 100 to 0. Reinforcement learning, together with unsupervised learning, is regarded as the future direction for achieving true artificial intelligence.
Machine Learning Hardware: CPUs vs. GPUs
ML algorithms today rely heavily on matrix multiplication, especially in deep neural nets. This operation is not well suited to traditional central processing units (CPUs); it runs much faster on graphics processing units (GPUs), and faster still on Google’s tensor processing units (TPUs). Google’s comparison of the three showed an 83x advantage of TPUs over CPUs, and a 29x advantage of TPUs over GPUs, measured in performance per watt.
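To see why matrix multiplication dominates, note that the forward pass of a single dense neural-network layer is essentially one matrix multiply plus a bias. A minimal NumPy sketch (the shapes here are illustrative, not from the article):

```python
import numpy as np

# A dense layer's forward pass: one matrix multiplication plus a bias.
# Illustrative shapes: a batch of 64 inputs with 128 features each,
# mapped to 256 output units.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128))   # batch of inputs
W = rng.standard_normal((128, 256))  # layer weights
b = np.zeros(256)                    # layer bias

y = x @ W + b                        # the matmul that GPUs and TPUs accelerate
print(y.shape)                       # (64, 256)
```

Each of the 64 × 256 output elements needs 128 multiply-adds, and deep nets stack many such layers; this is exactly the kind of massively parallel arithmetic that GPUs and TPUs are built for.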
So back to our original question: why is TensorFlow so special? The short answer is timing. TensorFlow showed a huge leap in innovation over its competitors when released in November 2015. It was a near-perfect fit for advances in both areas, hardware and algorithms. PyTorch, its biggest competitor at the moment, was released around a year later and attracted less attention. This was probably also due to differences in marketing budgets.
Chart 2: ML frameworks – Google Trends search interest, relative to the highest point in the chart.
In the hardware domain, most solutions utilized the performance gains of GPUs over CPUs. Google went a step further: shortly after TensorFlow’s release, it unveiled its TPUs, which were not commercially available at the time. Nowadays, the hardware behind TensorFlow still seems to hold the winning cards, although NVIDIA’s GPUs are improving fast.
TensorFlow Building Blocks
TensorFlow has a broad range of applications, but is most suited for deep learning. And deep learning is currently the most promising field of ML.
- Tensors represent multidimensional data arrays, suitable for carrying matrix-like data in machine learning.
- Graphs are simply a set of operations performed on tensors. Once a graph is constructed, data can flow through it via a Session. Sessions (and Placeholders) are the connection between TensorFlow’s internals and the outside world of data. This design suits the machine learning workflow and works well for most ML use cases.
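In code, the graph-plus-Session workflow described above looks roughly like this. The sketch below uses TensorFlow 1.x-era API names (in TensorFlow 1.x itself, a plain `import tensorflow as tf` sufficed; the `tf.compat.v1` namespace lets the same code run under TensorFlow 2.x):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use the 1.x static-graph behavior

# 1. Build the graph: a Placeholder is the entry point for outside data,
#    and every operation becomes a node in the graph.
x = tf.placeholder(tf.float32, shape=(None,), name="x")
y = tf.reduce_sum(x * 2.0)  # nothing is computed yet

# 2. Run data through the finished graph via a Session.
with tf.Session() as sess:
    result = sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]})
print(result)  # 12.0
```

Note the two distinct phases: defining `y` builds graph nodes without computing anything, and only `sess.run` feeds real data through the pipes.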
One downside of TensorFlow lies in its computational graphs: a graph needs to be fully defined before data can run through it, which makes it static.
In contrast, competitors such as PyTorch went a step further with dynamic graphs. This means you can define, change and execute the graph’s nodes as data flows; there’s no need to fix the structure before opening the pipes. This feature is crucial for modern sequence models, such as those used in natural language processing (NLP). PyTorch has a few more advantages compared to TensorFlow. One is ease of use: PyTorch code integrates more naturally with Python. Another is its openness: there’s no need for special Sessions or Placeholders to communicate with the outside world.
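The static-versus-dynamic distinction can be sketched in plain Python. The toy functions below are hypothetical illustrations of the two workflows, not either framework’s real API:

```python
# Static style (TensorFlow 1.x-like): describe the whole computation
# up front, then execute it later with concrete data.
def build_static_graph():
    ops = [lambda v: v * 2, lambda v: v + 1]  # fixed structure, defined in advance
    def run(value):
        for op in ops:
            value = op(value)
        return value
    return run

run_graph = build_static_graph()
print(run_graph(3))  # 7

# Dynamic style (PyTorch-like): the "graph" is just ordinary code, so
# control flow can depend on the data itself -- handy for the
# variable-length sequences common in NLP.
def run_dynamic(tokens):
    state = 0
    for t in tokens:          # loop length decided by the input, at run time
        state = state * 2 + t
    return state

print(run_dynamic([1, 0, 1]))  # 5
```

In the static case the structure is frozen before any data arrives; in the dynamic case each input can take a different path through the computation.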
Apart from the timing, another thing that makes TensorFlow stand out is its broad use. TensorFlow covers a very wide spectrum of use cases compared to other solutions on the market. It targets ML projects of any size, in both research and industry, and deployment both in the cloud and on the edge. PyTorch, on the other hand, has a narrower focus, targeting research and smaller-sized projects.
There’s also TensorFlow’s big community contributing to the available knowledge. Many programming APIs are available, thus attracting developers from different backgrounds.
From a programming language perspective, TensorFlow provides a Python API, as well as C++, Haskell, Java, Go, and Rust APIs. Third-party packages are available for C#, Julia, R, and Scala. There are also APIs at different abstraction levels. The lowest, TensorFlow Core, provides full control over modeling details, making it suitable for scientific research. Higher-level APIs are built on top of TensorFlow Core and make app development easier and more automated. One of these is Keras, which recently received official support from Google. These higher-level APIs are better suited to enterprise-level production.
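To give a feel for the abstraction gap, here is how compactly a small classifier can be defined in Keras. The layer sizes are illustrative, chosen only for the example:

```python
from tensorflow import keras

# A minimal classifier in Keras, TensorFlow's high-level API.
# Sizes are illustrative: 16 input features, 10 output classes.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

model.summary()  # a two-layer network, ready to train with model.fit(...)
```

The same model written against low-level TensorFlow Core would require manually declaring weight variables, the matrix multiplications, the loss, and the training loop, which is exactly the control a researcher may want and a production team usually does not.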
Bottom line: TensorFlow is a framework well fitted to today’s biggest machine learning advances, from both a hardware and an algorithmic perspective. It arrived in the right place at the right time. Its one-size-fits-all philosophy makes it suitable for all kinds of ML problems, but it might also prove a slightly risky choice, making TensorFlow “fatter” than other solutions. Competitors have a narrower, more focused approach, which gives them an advantage in certain situations. With the current pace of innovation in AI, it’s hard to predict which framework will be the “winner” in a few years. But for now, TensorFlow is the leading ML framework out there.
And so if you’re extracting data from, say, Amazon Redshift to feed your TensorFlow apps, perhaps with a scheduler like Apache Airflow, open up a chat window on this page. We’d love to work with you to make sure your algorithms always have reliable data pipelines feeding them fresh training data.