Petr Gusev is an ML expert with over six years of hands-on experience in ML engineering and product management. As an ML Tech Lead at Deliveroo, Gusev developed a proprietary internal experimentation product from scratch as its sole owner. As an ML Engineer at Yandex, he worked on the innovation stream of Yandex Music that added a podcast listening experience to the service, building a podcast recommendation system from scratch and improving target metrics by 15%. Additionally, as Head of Recommendations at SberMarket, his tech-driven roadmap lifted average order value (AOV) by 2% and gross merchandise value (GMV) by 1%.
I led a project to personalize the ranking of products featured on the main page of a grocery delivery service. Our main task was to increase business metrics through personalized product offerings.
To make sure the product would be successful, I:
Conducted interviews with users to understand what is valuable to them and which products they would like to see on the main page. I also invited the ML engineers to sit in on these interviews so that they could absorb the context by listening to users directly.
Checked whether the hypotheses that emerged from the user interviews were confirmed by the data we had gathered.
Defined the metrics we expected to move as a result of building this functionality.
Validated the model before starting full-scale development, to make sure it delivered the necessary quality and that we could reach the target results in an A/B test. This is a very important point, because developing an ML model is a research effort that may not produce the results we expect; if full development has already started, we risk spending the entire team's time without achieving them. A sketch of this kind of feasibility check follows this list.
Assembled a cross-functional team of analysts, developers, and ML engineers and made sure that all people working on the project were on the same page throughout every stage of product development.
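As an illustration of the pre-development feasibility check mentioned above, here is a minimal sketch of estimating how many users an A/B test would need to detect the uplift we hope for. The baseline conversion rate, the expected uplift, and the power settings are illustrative assumptions, not figures from the actual project.

```python
# Hypothetical pre-development feasibility check: given the uplift we expect
# from the new personalized ranking, how many users per variant would an
# A/B test need to detect it? All numbers below are illustrative assumptions.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_cr = 0.040   # assumed current conversion rate on the main page
expected_cr = 0.042   # assumed rate if the model delivers a +5% relative uplift

effect_size = proportion_effectsize(expected_cr, baseline_cr)
users_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,                # significance level
    power=0.8,                 # chance of detecting the effect if it is real
    alternative="two-sided",
)
print(f"Users needed per variant: {users_per_variant:,.0f}")
# If this exceeds the traffic we can realistically route through the test,
# the scope (or the expected uplift) should be revisited before development starts.
```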
A key aspect of managing an ML team is defining the project scope with precision, given the exploratory nature of ML projects. This requires setting clear, structured tasks. To frame them, I often use a template like "Implementing X will impact metric Y by Z." This approach helps us set tangible goals and measure progress accurately.
Talent acquisition and skills development are also vital. Building a team with the right mix of skills is essential, and continuous learning should be encouraged to keep up with the rapidly evolving field of machine learning. This includes nurturing both technical skills and soft skills, such as problem-solving and teamwork.
Cross-functional collaboration is another important element. Especially when integrating and building a new ML model, the ML team needs to work closely with other departments, such as data engineering, product management, and marketing, to ensure that the ML model goes live.
Lastly, in this field, risk management and experimentation are crucial. Managing an ML team involves balancing and mitigating the risks associated with new and untested models while encouraging innovation and experimentation. It's about creating a safe space for trial and error, learning from failures, and continuously improving the models and techniques used.
Setting Clear Objectives and Expectations: For each task or project, I clearly define the objectives, expected outcomes, and timelines. This clarity helps team members understand their roles and responsibilities, enabling them to work more autonomously and efficiently.
Scheduled regular meetings where team members from different functions can update each other on their progress, discuss challenges, and brainstorm solutions.
Ensured all work, whether code, data analysis, or research findings, is well-documented and accessible on shared platforms.
Established and agreed with team members on clear channels for communication specific to different needs – such as urgent queries, brainstorming, or status updates.
Involved all relevant team members in goal-setting and project-planning sessions.
Often, the key factor is the model's execution speed. Initially, it's crucial to understand the specific constraints needed, as these can vary across different segments of a product. For instance, a model might be required to deliver results in less than X milliseconds or operate on less powerful machines.
To manage this effectively, a rapid prototype should be built to evaluate the model's performance in a production-like setting. If meeting the constraints turns out to be impossible, it's important to negotiate with stakeholders over which aspect to prioritize: the model's accuracy, or its execution speed and resource consumption.

Another common challenge in ML teams, particularly those with a stronger focus on science and mathematics than on engineering, is optimizing the model for high performance in a production environment. Teams with a predominantly scientific and mathematical orientation might excel at building robust models yet lack the skills to make them operationally efficient. Incorporating diverse profiles into the team is therefore vital, so that it covers both engineering and scientific backgrounds.
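Returning to the latency constraint above, here is a minimal sketch of how a prototype could be checked against a per-request budget. The model choice, feature count, and the 50 ms budget are assumptions made for the example, not the actual production setup.

```python
# Minimal sketch of checking a prototype model against a per-request latency
# budget before committing to full development. Model choice, feature count,
# and the 50 ms budget are illustrative assumptions.
import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5_000, 40))
y_train = rng.integers(0, 2, size=5_000)
model = GradientBoostingClassifier().fit(X_train, y_train)

latency_budget_ms = 50                     # assumed budget agreed with stakeholders
request = rng.normal(size=(1, 40))         # a single incoming request

timings_ms = []
for _ in range(200):                       # repeat to get a stable estimate
    start = time.perf_counter()
    model.predict_proba(request)
    timings_ms.append((time.perf_counter() - start) * 1000)

p95 = float(np.percentile(timings_ms, 95))
print(f"p95 latency: {p95:.2f} ms (budget: {latency_budget_ms} ms)")
# If the p95 exceeds the budget, that is the point at which to renegotiate
# accuracy versus speed and resource consumption with stakeholders.
```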
Additionally, managing the required accuracy constraints of the models is a critical aspect of the process. Tasks often necessitate careful balancing of precision and recall. For instance, in scenarios where false positives have significant implications, prioritizing precision might be more crucial. Conversely, in cases where missing true positives is costly, focusing on recall becomes essential. Tailoring the model to effectively manage these trade-offs according to the specific requirements of the task is fundamental to the successful deployment of machine learning models.
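As a concrete illustration of tuning that trade-off, the sketch below picks the decision threshold with the highest recall among those that satisfy a precision requirement. The synthetic data and the 0.9 precision floor are assumptions made for the example.

```python
# Illustrative sketch: choose the decision threshold with the best recall among
# those that satisfy a precision requirement (an assumed floor of 0.9 here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
scores = model.predict_proba(X_val)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_val, scores)
min_precision = 0.9                        # assumed business requirement
meets = precision[:-1] >= min_precision    # thresholds has one fewer entry
if meets.any():
    best = int(np.argmax(recall[:-1] * meets))   # best recall among valid points
    print(f"threshold={thresholds[best]:.3f}  "
          f"precision={precision[best]:.3f}  recall={recall[best]:.3f}")
else:
    print("No threshold meets the precision floor; the model needs more work.")
```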
Problem: with a fast-growing user base, our team ran out of resources to store personalized recommendations, which were generated offline.
Short-Term Adaptation: As an immediate measure, I stopped generating personalized recommendations for users with insufficient feedback history. Recognizing that the quality of these recommendations would not be good anyway due to the lack of data, I replaced them with non-personalized recommendations. This approach leaned on popularity-based recommendations produced by a different ML model, and it relieved the immediate storage pressure.
Long-Term Solution: To address the problem sustainably, my team developed an online recommendation system. Unlike the previous approach, the new system did not require storing recommendations for every user; instead, it generated them dynamically, in real time, when a client requested them. This resolved the storage constraints and made the recommendation process more scalable and responsive to user demand.
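A simplified sketch of the short-term fallback described above is shown below; the feedback threshold, helper names, and toy data are hypothetical, not the production implementation.

```python
# Simplified sketch of the short-term fallback: users with too little feedback
# history receive popularity-based recommendations instead of precomputed
# personalized ones. The threshold, helpers, and toy data are hypothetical.
MIN_FEEDBACK_EVENTS = 5    # assumed cut-off for "enough history"

def recommend(user_id: str,
              feedback_counts: dict[str, int],
              personalized_store: dict[str, list[str]],
              popular_items: list[str],
              k: int = 10) -> list[str]:
    """Return top-k items, falling back to popular items for cold users."""
    if feedback_counts.get(user_id, 0) < MIN_FEEDBACK_EVENTS:
        return popular_items[:k]           # non-personalized fallback
    # Personalized lists are only stored for "warm" users, which is what
    # relieved the storage pressure in the short term.
    return personalized_store.get(user_id, popular_items)[:k]

# Toy usage
popular = ["milk", "bananas", "bread", "eggs", "butter"]
store = {"u42": ["oat milk", "granola", "blueberries"]}
print(recommend("u42", {"u42": 37}, store, popular, k=3))   # personalized
print(recommend("u7", {"u7": 2}, store, popular, k=3))      # falls back to popular
```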
Ensuring that the ML team's work aligns with the organization's business goals and objectives involves a strategic and proactive approach:
Regular Business Context Sharing: I have established a process for consistently sharing the relevant business context of our project with the team. This involves frequent updates and discussions about the company's broader goals, market trends, and customer needs, ensuring that the team's work remains aligned with the overarching business objectives.
Setting Quarterly OKRs in Alignment with Company Goals: In collaboration with the team, we set quarterly Objectives and Key Results (OKRs) that are directly derived from and support the company’s own OKRs. This ensures that our projects and initiatives are contributing to the company's strategic objectives.
Monthly Stakeholder Updates: I proactively engage with stakeholders, providing them with monthly progress updates. This not only keeps them informed about the ML team's contributions and advancements but also fosters transparency and trust. It allows stakeholders to provide feedback and ensures that our efforts are in sync with the company's evolving priorities and needs.
Dashboard Monitoring: Post-deployment, I actively monitor a set of dashboards that track both company metrics and proxy ML-model metrics. This includes, but is not limited to, metrics such as the number of purchases driven by the recommendation system or the distribution of travel-time model predictions. Regular monitoring of these dashboards, typically on a weekly basis, allows for a timely assessment of the model's impact on business operations.
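One way such a weekly check on a prediction distribution could look is sketched below, comparing this week's travel-time predictions against last week's with a two-sample Kolmogorov–Smirnov test; the synthetic data and the alert threshold are assumptions.

```python
# Illustrative weekly check on the distribution of travel-time predictions:
# compare this week's predictions against last week's with a two-sample
# Kolmogorov–Smirnov test. The synthetic data and threshold are assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_preds = rng.gamma(shape=2.0, scale=10.0, size=50_000)  # last week
current_preds = rng.gamma(shape=2.2, scale=10.0, size=50_000)    # this week

stat, p_value = ks_2samp(reference_preds, current_preds)
if p_value < 0.01:                          # assumed alerting threshold
    print(f"Prediction distribution shifted (KS={stat:.3f}); worth investigating.")
else:
    print("Prediction distribution looks stable week over week.")
```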
Quarterly Degradation Testing: To further understand the value added by the ML models, I conduct quarterly degradation tests. These involve temporarily disabling the ML models in specific parts of the business flow, or across the entire application, for a small, randomly selected group of users. Observing how business metrics change in the models' absence helps quantify the impact of the ML systems. It is very important, however, to align with all the departments involved, so that everyone is aware of the test and explicitly agrees to run it.
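A minimal sketch of how the small random holdout group for such a degradation test could be assigned deterministically is shown below; the 1% share and the salt string are illustrative assumptions.

```python
# Minimal sketch: deterministically assign a small user holdout that will see
# the product without the ML model during a degradation test. The 1% share
# and the salt string are illustrative assumptions.
import hashlib

HOLDOUT_SHARE = 0.01                        # assumed size of the degradation group
SALT = "degradation-test-q1"                # hypothetical experiment identifier

def in_degradation_holdout(user_id: str) -> bool:
    """Stable per-user assignment so the same users stay in the holdout."""
    digest = hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # map the hash to [0, 1]
    return bucket < HOLDOUT_SHARE

print(in_degradation_holdout("user-12345"))
```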
Maintaining Team Diversity: I prioritize building and maintaining a diverse team. This includes diversity in skills, backgrounds, experiences, and perspectives. A diverse team is more likely to generate innovative ideas and solutions, and it also fosters a richer, more inclusive work culture.
Inclusive Meeting Management: During team meetings, I ensure that everyone has the opportunity to speak and contribute. I actively facilitate discussions so that every member feels comfortable sharing their thoughts. This might involve directly inviting quieter team members to share their views or setting up meeting structures that ensure balanced participation.
Encouraging Team Building and Networking: Organizing team-building activities and informal networking opportunities helps in building rapport and understanding among team members.
Providing Opportunities for Professional Growth: I support the professional development of each team member through opportunities like personal development plans, workshops, and conferences.
Recognizing and Celebrating Contributions: Regularly acknowledging and celebrating the achievements and contributions of team members.
Reading Club: We have established a reading club where team members take turns presenting and discussing a machine learning research paper relevant to our work. This not only keeps the team updated on the latest research but also encourages critical thinking and knowledge sharing.
Biweekly Research and Development Days: Every two weeks, we dedicate a day to R&D, where each team member explores a research project or tests new hypotheses. This initiative is designed to foster innovation and practical application of new ideas that could benefit our daily work.
Quarterly ML Hackathons: To stimulate creativity and team bonding, we organize quarterly ML hackathons. These events span a few days and are focused on working on exciting, out-of-the-box projects. They provide an opportunity for the team to experiment with new technologies, collaborate in different team configurations than usual, and offer a fun and stress-free environment to innovate.
Initially, I conduct one-on-one meetings with the involved parties. This allows me to understand each person's perspective and the core issues at hand; it's important to listen actively and empathetically during these discussions. Following the individual meetings, I arrange a joint meeting with both parties. There, I share my observations, facilitate a constructive discussion, and encourage both individuals to express their views. This step often clears up misunderstandings or mismatched expectations.
Most conflicts are resolved at this stage. However, if the disagreement persists, I involve HR to assist in finding a resolution, ensuring that all parties feel heard and that a fair solution is reached.
For technical disagreements, I advocate for a data-driven approach to conflict resolution. I encourage team members to support their arguments with measurable metrics or empirical evidence. For example, a conflict arose over which algorithm would perform faster for a particular application. To resolve this, I asked both parties to provide benchmarks and empirical data supporting their claims. This approach not only resolved the conflict but also fostered a culture of evidence-based decision-making.
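A toy example of what such an evidence-based comparison could look like is shown below; the two sorting implementations are stand-ins for the algorithms that were actually in dispute, not the real code.

```python
# Toy example of settling a "which is faster?" disagreement with measurements
# rather than opinions. The two sort implementations stand in for the
# algorithms that were actually in dispute.
import random
import timeit

data = [random.random() for _ in range(2_000)]

def candidate_a(xs):
    return sorted(xs)                        # built-in Timsort

def candidate_b(xs):
    ys = list(xs)                            # naive insertion sort
    for i in range(1, len(ys)):
        key, j = ys[i], i - 1
        while j >= 0 and ys[j] > key:
            ys[j + 1] = ys[j]
            j -= 1
        ys[j + 1] = key
    return ys

for name, fn in [("candidate_a", candidate_a), ("candidate_b", candidate_b)]:
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{name}: {seconds / 5:.4f} s per run")
```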
I arrange for experts from other companies within the industry to visit and share their experiences with our team. This exposure to external expertise provides valuable insights into new practices, tools, and methodologies being used in the industry. It's an opportunity for the team to learn from the successes and challenges faced by others in similar fields.
I actively encourage team members to both attend and present at relevant conferences. This serves a dual purpose. Attending conferences keeps them informed about the latest trends and breakthroughs in machine learning while presenting their own work fosters a deeper understanding of their areas of expertise and enhances their professional profiles. This involvement in the professional community not only benefits individual team members but also brings fresh perspectives and ideas back to our team.