Titans of Data with Florian Leibert – CEO Mesosphere

Florian Leibert is a co-founder and the CEO of Mesosphere. Before founding Mesosphere, Florian was a tech lead at AirBnB, in charge of the data infrastructure team. While at AirBnB, Florian’s team built and open-sourced Chronos, a distributed and fault-tolerant scheduler which runs on top of Mesos. In this interview, we talked with Florian about the rise of “data-intensive applications”.

Q: Can you tell us a little bit about Mesosphere? You’ve been around since 2013. What do you guys do? What has changed since then?

The mission of Mesosphere is to make it insanely easy to build and scale world-changing technology. The way we want to do this is to offer public, cloud-like services that are composed of an open ecosystem and which can run on any infrastructure.

Recently, the industry has seen an increase in edge computing, thanks to things like IoT, connected cars, and sensor networks. Edge computing is kind of the antithesis of the centralized cloud, because you need processing power at the edge.

For example, the cars from auto manufacturers that use our platform to drive their autonomous car projects produce about four terabytes of data for every eight hours of driving, per car. Multiply this by 100 thousand cars, or by a million or 10 million cars, and it becomes evident very quickly that there’s no way all this data can go to a central data center in Frankfurt.
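To make the scale concrete, here is a rough back-of-the-envelope calculation. It assumes the figure quoted above (about 4 TB per car per eight hours of driving), decimal units (1 PB = 1000 TB), and every car in the fleet driving one eight-hour stretch; the function names are illustrative only.

```python
# Assumption from the interview: roughly 4 TB per car per 8 hours of driving.
TB_PER_CAR_PER_8H = 4


def fleet_volume_tb(cars: int, tb_per_car: float = TB_PER_CAR_PER_8H) -> float:
    """Total data (in TB) if every car in the fleet drives one 8-hour stretch."""
    return cars * tb_per_car


def human(tb: float) -> str:
    """Format a TB figure using decimal units (1 PB = 1000 TB, and so on)."""
    units = ["TB", "PB", "EB", "ZB"]
    value = float(tb)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    for cars in (100_000, 1_000_000, 10_000_000):
        print(f"{cars:>10,} cars: {human(fleet_volume_tb(cars))}")
```

Under these assumptions, even the smallest fleet in the example produces hundreds of petabytes per eight hours of driving, which is why processing close to the edge matters.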

This is where Mesosphere comes in, to create a cloud-like service that spans other clouds and data centers.

Q: Can you elaborate on the example of a car manufacturer, with its three types of “infrastructures”? Mesosphere is an abstraction layer on top of that?

Yes, of course. On the car itself, you need a machine-learning model, along with all of the car’s sensors. The main sensors are HD cameras, LiDAR for the three-dimensional map, radar, sonar, and GPS.

All this data needs to be processed, and the car needs to act right away with very low latency. What you need to do to make this happen is to collectively send this data out into either a centralized or a de-centralized cloud.

The machine-learning models are continuously trained and improved at the edge data centers, and then they are served back to the cars to enable them to make better decisions. That’s the play of this modern distributed application.

Q: When I go to Mesosphere’s website, it says, “It’s a platform for containerized data-intensive application.” Can you explain this concept in a few words?

Containers are a way of packaging an application and making sure that the application can run anywhere. You can run it the same way on your laptop in a container as you can run it on a cloud.

A container provides portability via packaging. The idea goes all the way back to Linux and Solaris. Solaris has a concept called Zones, which were introduced as a lightweight form of virtualization. To run many things on one machine and take advantage of the increase in CPU, memory, and disk, you need to isolate processes from one another. There are two ways to do this.

We are not about the standard of containers; that has been standardized and commoditized. It’s pretty much like the MP3, which is a standard that you can play anywhere. What we are building is the iTunes analogy for it. We allow you to run these containers next to each other, and if one of them fails, it is restarted elsewhere.

We also provide data services alongside application lifecycle management, which are a necessity when transporting data. Data transport happens via a message queue. The leading message queue, and the one in our stack, is Apache Kafka. Cloud service providers often have their own proprietary implementations of these message queues. Amazon, for example, has a message queue called Kinesis and one called SQS for transport, and then other services to analyze the data.

For example, if a car is driving, you could be plotting on a map where the car is driving based on the stream of GPS coordinates that we are getting. The stream comes through via the message queue and then you have an analysis tool like Spark that’s analyzing it and doing transformations on it.
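The pattern described here, a stream of GPS fixes arriving on a message queue and being transformed by an analysis step, can be sketched in a few lines. This is only an illustration: Python’s in-process `queue.Queue` stands in for a real message queue such as Kafka, a plain function stands in for an analysis engine such as Spark, and the coordinates and the `None` end-of-stream marker are made up for the example.

```python
import math
import queue

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def consume_track(q):
    """Drain GPS fixes from the queue; return the plotted track and its length in km."""
    track, distance = [], 0.0
    while True:
        fix = q.get()
        if fix is None:  # hypothetical end-of-stream marker
            break
        if track:
            distance += haversine_km(track[-1], fix)
        track.append(fix)
    return track, distance


# Stand-in for the car publishing GPS coordinates (values are invented).
gps_queue = queue.Queue()
for fix in [(50.1109, 8.6821), (50.1120, 8.6850), (50.1135, 8.6900)]:
    gps_queue.put(fix)
gps_queue.put(None)

track, km = consume_track(gps_queue)
print(f"{len(track)} points plotted, {km:.2f} km driven")
```

In a real deployment the producer and consumer would be separate services, and the queue would be durable and distributed; the sketch only shows the shape of the data flow.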

We store either structured, unstructured, or semi-structured data. For that, you have NoSQL databases, which are horizontally scalable databases that do not implement the same transactional logic as a traditional database but allow you to store much more data. What our platform provides is a set of all these components, packaged and installable in your own data center or across clouds, so that in the end you’re not locked into a single cloud and you have more cost control.

Q: Can Mesosphere allow a global brand to operate globally in whatever infrastructure?

Yes, it can. You have the same APIs in your own data centers or if you use the infrastructure layer from Amazon, you can use our software for that as well.

Here’s the key point: why use Mesosphere? Why not just go with a cloud service provider’s high-level APIs? The answer is that you have very little cost control when you go with a cloud service provider, because the provider charges you for each transaction or for each message that is sent into the message queue.

Imagine you have a thousand developers or ten thousand developers that are all interacting with hundreds or even thousands of APIs that are all individually priced. There is no way for a CIO to predict the cost of this development effort.

For example, Snap Inc. tied themselves to two cloud service providers. They are currently paying one of them 400 million dollars a year and the other one, I think, should be north of 100 million a year. That’s half a billion dollars a year they are paying to cloud service providers. On top of that, if they ever wanted to, they couldn’t easily get out of that.

Q: Mesosphere gives you the capability to deal with massive amounts of data, but it also gives you tons of flexibility because you do not have any lock-in with any of these vendors. Is this a correct statement?

Absolutely, yes. Some of our customers run on Azure and Google Cloud, and some run in their own data centers and use the cloud for bursting. When they have high demand on their website, for example, or a high data volume, they add infrastructure as a service from a cloud service provider to augment the capacity they have on premises.

Q: Can you share a few stories about your customers? Maybe some innovative transformational use cases you have seen with your customers that are using Mesosphere?

Royal Caribbean Cruise Lines has a lot of cruise ships, which offer all sorts of services to the people on a cruise. Royal Caribbean wanted to offer customers a much better experience on its ships via personalized applications that can be accessed from a cellphone and that let you ask, for example, “Hey. Where’s the latest Happy Hour?”

The big problem with a cruise ship is that it’s most of the time at sea. The only way you get Internet at sea is if you go via satellite. Now, that makes for a bad user experience if you must have many seconds of refresh time when writing a message.

So what Royal Caribbean Cruise Lines has done is they’ve created data centers on each of the ships that run DC/OS and a centralized data center or cloud where they’re running the same high-level API. They’re running DC/OS on both. That allows them to give their customers an amazing service while keeping low latency and leveraging this hybrid cloud set up.

Q: DC/OS, that’s a Mesosphere product, correct?

Mesosphere DC/OS refers to the Data Center/Operating System. Some people call it the Distributed Cloud Operating System, but we just call it DC/OS.

Q: When a customer on the cruise line opens their phone, where does that data go? What happens?

When someone writes a mobile app, this app needs to communicate with the back end. Maybe the back end serves the latest food menu at a local restaurant and the pricing. That back-end service is maybe written as a microservice; it might use some sort of database as a back end, and it needs to be deployed. It’s packaged into a container, and the container is deployed two or three times on different servers, so even in case of an outage or a reboot, the user can still communicate with the system.
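The idea of running several copies of a container on different servers, and moving a copy elsewhere when a server goes down, can be illustrated with a toy placement routine. This is a simplified sketch and not how DC/OS schedules containers; the server names and both functions are hypothetical.

```python
def place_replicas(servers, count):
    """Pick `count` distinct servers so that each replica runs on its own server."""
    if count > len(servers):
        raise ValueError("not enough servers for one replica per server")
    return list(servers[:count])


def handle_failure(placement, failed, all_servers):
    """Drop the replica on the failed server and restart it on a spare server, if any."""
    survivors = [s for s in placement if s != failed]
    spares = [s for s in all_servers if s != failed and s not in survivors]
    if len(survivors) < len(placement) and spares:
        survivors.append(spares[0])
    return survivors


servers = ["node-a", "node-b", "node-c", "node-d"]  # made-up server names
placement = place_replicas(servers, 3)
after_outage = handle_failure(placement, "node-b", servers)
print(placement)
print(after_outage)
```

Because two replicas keep running while the third is restarted on a spare server, the user can keep talking to the service during the outage.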

For example, when a user places an order or says, “Hey, I want to have a seat reservation at this restaurant tonight,” the data the user sends is placed on a message queue, and then maybe it is processed and some back-end transactions happen. Another API is called to say, “Oh. Maybe they’re integrated with some third-party reservation provider.” That API is triggered. Then the user is, at some point, served back a response saying something like, “You successfully reserved the table at Restaurant X.” All of this requires a couple of components, like a SQL database, a message queue, and maybe a container orchestrator that places the containers that the microservice is packaged in.

What DC/OS does is it encompasses all these components in the form of a hybrid cloud. It has all these components built in. You only need DC/OS and you get all these pieces of software that are very complex to set up and manage.

Q: An industry like cruise lines, they’re investing into this because it helps them to differentiate the customer experience and to bring transformational experiences to their cruise ships, correct?

Absolutely, yes. Digital transformation is real. It allows companies to optimize and to reinvent themselves. Going back to the car use case, car companies used to be hardware companies where the developers were interacting with an ASIC chip. They were not cloud-native developers. Now, car companies are becoming software companies. A Tesla, for example, has so many software engineers working on it, and sometimes, when you park your car, the software update is installed while you’re out to dinner.

Q: Can you maybe touch on the two concepts of Kubernetes and TensorFlow and how they relate to what Mesosphere is doing?

As I mentioned earlier, DC/OS is the platform that gives you a lot of higher-level platform services.

Q: What advice can you give these companies when they approach this new world of data? Also, how do you work with customers when they approach you?

I think the number one thing to look at is the fact that the technologies that we’re using today are likely not the technologies we’ll be using in 10 years.

Maybe for some parts, but let’s look at Hadoop for example. Hadoop was sold back then as the panacea for everything. People were implementing machine learning and algorithms on Hadoop. But the model of Hadoop didn’t really lend itself to solving a lot of problems.

To solve these problems, other systems came along. For example, Spark, which allows you to manipulate data and get instant feedback on it. Looking at GitHub, you can see how developers all over the world are now contributing to creating more and more new technologies that solve problems better.

This is one of the reasons why we created Mesosphere in the first place and why we created DC/OS. We wanted to give our customers an evergreen technology platform that takes the best-of-breed open source and proprietary technology and makes it available. We also build in the operational knowledge required for deploying and installing these platform services or new projects into the platform, so companies can try them out.

Q: What excites you the most now with all these trends about Mesosphere’s future as you have been around for about four years?

When we started Mesosphere, the very first thing that we tackled was making Hadoop and containers more easily manageable. Now, what I’m excited about is this evergreen nature of our platform. We are able to embrace new technologies and make them usable by anybody while lowering the barrier of entry for any company and enabling digital transformation.

Q: Apart from working for Mesosphere, what advice can you give people who are trying to get into the data ecosystem? What should they do?

You can sign up for a data science course on any of the leading online education platforms. Alternatively, once you have some fundamental programming experience, you can download a data set from the government. Crime databases, for example, can be downloaded and explored using open source tools like Jupyter Notebooks or Spark. Just start playing with the data, because exploratory data analysis is powerful: it lets you learn more about the underlying data and also strengthens your skill set.

Q: Thank you Florian!
