MLOps Community
Feature Store at Shopify and Skyscanner // Matt Delacour and Mike Moran // Reading Group #4
MLOps Reading Group meeting on February 11, 2022  
Reading Group Session about Feature Stores with Matt Delacour and Mike Moran  
--------------- ✌️Connect With Us ✌️ ------------- 
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity 
Connect with us on LinkedIn: https://www.linkedin.com/company/mlopscommunity/
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/
Timestamps: 
[00:05] Matt's intro 
[00:26] Mike's intro 
[01:09] Matt’s talk: Feature store system at Shopify 
[01:45] What is Shopify? 
[02:05] Shopify Use Case 
[02:38] Choosing a solution 
[03:19] Managed service vs In-house vs Open-source (Feast) 
[06:01] Why did we choose Feast? 
[11:25] Implementation Strategy (multi-repo vs mono-repo approaches) 
[13:01] Mono-repo approach breakdown 
[14:30] Internal SDK 
[17:01] Q&A: Does Feast scale to meet Shopify's latency requirements for online inference? 
[19:05] Q&A: Do you rely on Feast to serialize data to the online store? 
[20:13] Q&A: Is your mono-repo library a subset of Feast? 
[21:18] Q&A: Did you consider using git submodules for a multi-repo? 
[23:02] Q&A: Are you storing embeddings with Feast? 
[24:30] Q&A: Regarding the mono-repo, which modules are responsible for feature engineering? How do you guarantee that the same feature engineering can be reused across many DS? 
[27:58] Mike’s talk: Feature store at Skyscanner 
[28:08] Kaleidoscope System 
[28:25] Background and context of the Feature store 
[29:30] Initial state of the feature store 
[30:13] How the marketing team also leverages the feature store 
[31:04] Current state of the feature store (marketing & machine learning) 
[31:44] SDK approach of creating schemas with dataframes (easy access) 
[32:16] Reusability across the marketing and DS teams 
[33:06] GDPR constraints 
[33:34] Data updates in the feature store 
[36:09] Q&A: When a DS updates a feature, how are you communicating that across teams? 
[38:25] Q&A: Are you applying different levels of feature engineering to increase the likelihood of a DS going back to a previous checkpoint of processing? 
[40:55] Q&A: In what languages are you implementing the feature store? 
[44:28] Q&A: Performance-wise, how do you decide what code remains in Apache Spark vs SQL? 
[49:00] Wrap-up