
Azure Data Scientist Associate Certification Guide
By :

DevOps is a team mindset that tries to minimize the silos between developers and system operators to shorten the development life cycle of a product. Developers are constantly changing a product to introduce new features and modify existing behaviors. On the other side, system operators need to keep the production systems stable and up and running. In the past, these two groups of people were isolated, and developers were throwing the new piece of software over to the operations team who would try to deploy it in production. As you can imagine, things didn't work that well all the time, causing frictions between those two groups. When it comes to DevOps, one fundamental practice is that a team needs to be autonomous and should contain all required disciplines, both developers and operators.
When it comes to data science, some people refer to the practice as MLOps, but the fundamental ideas remain the same. A team should be self-sufficient, capable of developing all required components for the overall solution, from the data engineering parts that bring in data and the training of the models all the way to operationalizing the model in production. These teams usually work in an agile manner, which embraces an iterative approach, seeking constant improvement based on feedback, as seen in Figure 1.7:
Figure 1.7 – The feedback flow in an agile MLOps team
The MLOps team operates on its backlog and performs the iterative steps you saw in the Working on a data science project section. Once the model is ready, the system administrators, who are part of the team, are aware of what needs to be done to take the model into production. The model is monitored closely, and if a defect or performance degradation is observed, a backlog item is created for the MLOps team to address in their next sprint.
In order to minimize the development and deployment life cycle of new features in production, automation needs to be embraced. The goal of a DevOps team is to minimize the number of human interventions in the deployment process and automate as many repeatable tasks as possible.
Figure 1.8 shows the most frequently used components while developing real-time models using the MLOps mindset:
Figure 1.8 – Components usually seen in MLOps-driven data science projects
Let's analyze those components:
Figure 1.9 – Versioning a notebook and a script file using Git within AzureML
The ability to ensure both code and model quality plays a crucial role in this automation process. In Python, you can use various tools, such as Flake8, Bandit, and Black, to ensure code quality, check for common security issues, and consistently format your code base. You can also use the pytest
framework to write your functional testing, where you will be testing the model results against a golden dataset. With pytest
, you can even perform integration testing to verify that the end-to-end system is working as expected.
Adopting DevOps is a never-ending journey. The team will become better every time you repeat the process. The trick is to build trust in the end-to-end development and deployment process so that everyone is confident to make changes and deploy them in production. When the process fails, understand why it failed and learn from your mistakes. Create the mechanisms that will prevent future failures and move on.