👯Open Source Contributions Are Awesome

The power of having an open-source framework in a noisy tooling landscape.

Jun 24, 2022

Hey there - today I’m fulfilling my digital nomad dream and writing this all the way from a beach somewhere in southern France. I know this fulfills some sort of weird startup founder stereotype, so I do apologize for your collective eye roll as you continue reading.

I haven’t written this newsletter personally in a while, so this time I find myself hesitating. So much has happened in the last month! I’ll just let the word salad flow and then reorganize everything below. Let’s begin!

🖌️Data Annotation <> MLOps

In the last post, Adam mentioned our latest company retreat, and how we had made some big strategic decisions. One of them has been a deliberate shift towards addressing unstructured data problems. In the vision space, the #1 problem to address is the wild, wild west of data annotation.

We started by simply writing about the problem space here. The blog post noted how hard it is, even today, to include data annotation as part of an overall MLOps workflow, and how we were going to attempt to bridge that gap with upcoming releases of ZenML.

Accompanying this, we released an `awesome-open-data-annotation` GitHub repository, which is a collection of, you guessed it, awesome open-source data annotation tools.

The response was brilliant.

As of writing, the repository has received 241 GitHub stars, all organic, with little push from our side. The launch blog post also went viral and received hundreds of views and many interesting comments.

At ZenML, we take this as encouragement and validation that the problem space is indeed ripe for innovation. Watch out for our data annotation integrations in the coming releases to see how we tackle the problem.

🎸Open Source Contributors are Awesome

Let me share two stories to illustrate why building an open-source, extensible framework is so powerful and fulfilling. As some of you may know, ZenML does not take a hard opinion on where its pipelines are orchestrated. You can either use a growing list of orchestrators (Kubeflow, Airflow, GitHub Actions) or build your own.

We released the ZenML orchestrator extension docs a few weeks ago and within a few days received a pull request from our amazing community member, Gabriel Martín Blázquez, with a brand-new Google Vertex AI orchestrator. He had read the docs on Friday, worked on the weekend, and made a pull request on Monday because he wanted his team to run ZenML pipelines both locally and on Google Vertex AI. Here is the result:

➡️Check out how to run ZenML pipelines on Google Vertex AI in our latest blog post showcasing Gabriel’s integration

Here’s another one: As some of you may know, ZenML has an accompanying GitHub repository called ZenFiles, a collection of end-to-end use cases and fully worked examples of using ZenML in production. Because these examples are so detailed, we never really expected people outside the team to create them (we didn't even have a CONTRIBUTING.md!). To our utmost surprise, a few weeks ago, our community member Lukas Rasocha (@lukyrasocha) created a pull request for an end-to-end example of using ZenML with time series data. He illustrated how to write an MLOps pipeline to pull data from BigQuery and train a model using the Vertex AI ML platform.

➡️Lukas's ZenFile can be found here. Give it a⭐to show Lukas some love!

We didn't know Gabriel or Lukas before, nor did they know us personally. They simply saw an avenue to contribute, so they did. Now everyone else out there in the wild can benefit from Gabriel’s orchestrator and Lukas’s ZenFile!

This is why open-source is incredible, and why we believe it is one of the most powerful disruptions in software.

🎼Orchestrators Galore

Our latest messaging push is all about portability, and orchestration agnosticism is probably the key way to set ZenML apart in the MLOps space. Let me clarify what an orchestrator is:

At a high level, an orchestrator in MLOps is a tool that enables developers to write, schedule, monitor, and manage workflows. ZenML is not an orchestrator, but rather is a tool that lets you write pipelines that can be run on multiple orchestration systems. There are standard orchestrators that ZenML supports out-of-the-box, but you are encouraged to write your own orchestrator in order to gain more control as to exactly how your pipelines are executed. ZenML not only allows you to easily swap orchestrators, but also focuses on adding value within the pipeline itself, with features such as model deployment, metadata tracking, and management of the configuration of your MLOps stack.
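To make the "write once, run on any orchestrator" idea a bit more concrete, here is a minimal sketch in plain Python. To be clear, this is not ZenML's actual API: the `Orchestrator` protocol, the two orchestrator classes, and `run_pipeline` are all hypothetical names I made up for illustration. The point is only that the pipeline definition stays the same while the thing that executes it gets swapped out.

```python
from typing import Callable, List, Protocol

# Illustrative only: a "step" is a function from int to int,
# and a "pipeline" is an ordered list of such steps.
Step = Callable[[int], int]


class Orchestrator(Protocol):
    """Hypothetical interface: anything that can execute a list of steps."""

    def run(self, steps: List[Step], initial: int) -> int:
        ...


class LocalOrchestrator:
    """Runs the steps one after another in the current process."""

    def run(self, steps: List[Step], initial: int) -> int:
        value = initial
        for step in steps:
            value = step(value)
        return value


class LoggingOrchestrator:
    """Stand-in for a different backend: same result, but it also
    records the name of every step it executed."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def run(self, steps: List[Step], initial: int) -> int:
        value = initial
        for step in steps:
            self.log.append(step.__name__)
            value = step(value)
        return value


def double(x: int) -> int:
    return x * 2


def add_one(x: int) -> int:
    return x + 1


def run_pipeline(orchestrator: Orchestrator, initial: int) -> int:
    # The pipeline (double, then add_one) is defined once and does not
    # know or care which orchestrator ends up executing it.
    return orchestrator.run([double, add_one], initial)


if __name__ == "__main__":
    print(run_pipeline(LocalOrchestrator(), 3))
    print(run_pipeline(LoggingOrchestrator(), 3))
```

Writing your own orchestrator, in this toy picture, just means providing another class with a compatible `run` method. A real orchestrator integration obviously has to deal with far more (scheduling, containers, credentials), so treat this as a mental model rather than a recipe.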

To that end, the latest ZenML release, 0.9.0, featured two brand-new orchestrators:

➡️GitHub Actions [EXAMPLE | BLOG]

➡️Vertex AI [EXAMPLE | BLOG]

Running ZenML pipelines in different environments with different MLOps stacks illustrates that ZenML is a framework rather than a platform, something we covered in depth in our launch blog a few weeks ago.

🎙️Don’t forget the latest podcasts

Pipeline Conversations has been on fire recently! We released some amazing episodes in the last few weeks and have some bangers coming up.

First, Ben Wilson graced the pod. Ben works over at Databricks and has also just released a new book called ‘Machine Learning Engineering in Action’. It’s a jam-packed guide to all the lessons that Ben has learned over his years working to help companies get models out into the world and run them in production.

We were then lucky to get to talk to Iva Gumnishka, the founder of Humans in the Loop. They are an organization that provides data annotation and collection services. Their teams are primarily made up of those who have been affected by conflict and now are asylum seekers or refugees.

➡️Ben Wilson on ML Engineering in Action [BLOG | PODCAST]

➡️Iva Gumnishka on data annotation in production [BLOG | PODCAST]

See all the episodes here, and don’t forget to subscribe!

🤓All good things..

All good things must come to a (temporary) end. Our awesome young rockstar, Ayush Singh, ended his internship with us last week. This is his sign-off meeting where we celebrated his accomplishments.

Ayush came into ZenML with little experience in MLOps, and ended up contributing 5 end-to-end production-grade MLOps use-cases (see them all in our ZenFiles repo). Not only that, he is already using his experience with us to train hundreds of other students across the world with his online MLOps course!

Ayush’s farewell meeting on the last day of his internship

Good luck with your future endeavors Ayush, and hopefully you find your way back to your ZenHome soon enough!

🤝Design Partnerships

Ok, this was a LONG newsletter, so let me end by asking for a bit of support from this great community. We are now offering design partnerships with official contracts for companies that want to up their game with ML in production (using ZenML). These design partnerships have a defined scope and timeline that is customized for every use case. We will support these companies in implementing ZenML pipelines in their production settings.

For example, last year we worked with Airbus Defence and Space in such a setting, with two dedicated full-time ZenML engineers. It was really fun to support a rocket science company in their ML pipeline activities. Similarly, if you are part of a company that wants to bring ML into production in a professional setting (or know someone who is), please hit reply and send me a note!

And that’s a wrap. I’ll be back next time with more updates about ZenML. In the meantime, don’t forget to give the GitHub repository a star, and share this newsletter with your friends!

Cheers,

-H

Building an Open-Source Startup in Public.
