ABOUT ME
Chief Data Scientist at Waze (a Google company), leading several groups, including the Data Science Group, the Product Analytics Group and Data Infra,
Faculty Member at Reichman University's Business School (Director of the AI & Big Data MBA Program),
Director of Data Science at Google,
Former Chief Data Scientist at Outbrain,
PhD, Econometrics - Tel Aviv University
I am a hands-on Data Scientist and Google Director. I specialize in economic applications of machine learning and AI. My work combines academic research with data-driven industry products. You may ask yourself: what do theoretical economic models have to do with the machine learning that Data Scientists practice? Well, from my experience: everything. Economic applications of machine learning range from marketplace management and auction optimization in recommendation systems, to efficient transportation and behavioral nudges that get people to carpool, and up to optimizing the decision-making processes of NBA coaches and CEOs. A sample of my work in these domains is detailed below.
If you have any questions, feel free to contact me via LinkedIn or at sasson.roy@runi.ac.il.
Or simply subscribe to my newsletter below to stay up to date.
SELECTED RESEARCH & PRODUCTS
(PRESENTED AT CONFERENCES AND IN ACADEMIC JOURNALS)
ETA PREDICTION MODELS USING DEEP LEARNING AT WAZE
This post describes a recent machine learning model for ETA prediction, based on LSTMs, which is now in production for most of Waze's user base. Beyond the technical details of how the supervision data is created, the model architecture and its implementation, the post describes how all of this relates to my view of building a proper data strategy, one that takes into consideration machine learning, analytics, data engineering and cross-functional work within engineering teams.
THE ECONOMICS AND DATA SCIENCE OF ELIMINATING TRAFFIC ALTOGETHER
This short lecture outlines the work my Data Science team does at Waze: helping people find great carpooling matches for their daily commute. The lecture highlights how our work is influenced by classic economic models (specifically William Vickrey's), combined with modern economic models (such as nudging theories) and machine learning modeling.
URBAN PULL: THE ROLES OF AMENITIES & EMPLOYMENT
This paper leverages a new measurement of neighborhood amenities to demonstrate that housing prices and rents in U.S. cities are determined nearly as much by proximity to amenities as they are by proximity to employment. We develop a revealed preference measure of amenities using navigation data indicating the locations in which people consume leisure. Consumption amenity centers overlap substantially with employment centers but are distinct and have distinct effects on prices. Using the Alonso-Muth-Mills within-city spatial equilibrium framework, we estimate the relative importance of amenities and employment in demand for neighborhoods. The navigations-based amenity measure strongly and positively predicts local and nearby prices with spatial decay. It adds substantial explanatory value relative to observable-venues-based amenity measures as well as to several strictly localized amenities, such as school quality or crime. We show that constraining neighborhood amenities to be consumed only by locals, when in fact people may travel within the city to consume amenities, misses a key feature of cities and biases estimates of both commute costs and the value of amenities. These improvements in amenity measurement increase the estimated importance of amenities relative to employment in location demand and suggest the potential robustness of cities to changes in employment locations.
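To make "spatial decay" concrete, here is a minimal sketch of a distance-discounted amenity measure. It assumes an exponential decay kernel, planar coordinates in kilometers and hypothetical venue navigation counts; the function name `amenity_access` and the `decay` parameter are illustrative only, not the paper's estimation code.

```python
import math
from typing import List, Tuple

# Hypothetical venue record: ((x_km, y_km), number of navigations ending there)
Venue = Tuple[Tuple[float, float], int]


def amenity_access(home: Tuple[float, float], venues: List[Venue], decay: float = 0.5) -> float:
    """Sum venue navigation counts, each discounted by exp(-decay * distance_km)."""
    total = 0.0
    for (vx, vy), visits in venues:
        distance_km = math.hypot(vx - home[0], vy - home[1])
        total += visits * math.exp(-decay * distance_km)
    return total


# Illustrative numbers only: two venues near the center, one farther out
venues = [((0.0, 0.0), 1200), ((1.0, 0.5), 800), ((6.0, 4.0), 300)]
for name, home in [("central", (0.2, 0.1)), ("outer", (8.0, 6.0))]:
    print(name, round(amenity_access(home, venues), 1))
```

A larger `decay` makes the measure more local; pushing it very high roughly approximates the "consumed only by locals" restriction that the abstract argues against.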
Since the 1970s, high-occupancy vehicle (HOV) lanes have been a common policy instrument to mitigate traffic congestion. Yet, their effectiveness remains a controversial topic among researchers, policy makers, and the public. In this debate, a key unknown has been the impact of HOV lanes on commuters' carpooling behaviors. This paper brings a new piece of evidence by offering a data-driven assessment of carpooling intent and adoption, using revealed-preferences data. We partner with Waze, a major carpooling platform, and leverage a natural experiment following the introduction of three HOV lanes in Israel in 2019. Using tailored treatment and control groups coupled with econometric analyses, we derive four main findings. First, HOV lanes bring new users to the carpooling platform, which contributes to alleviating the "cold-start" problem in the marketplace. Second, HOV lanes have a positive impact on carpool intent: the number of carpool offers sent by drivers increases manifold following the introduction of the HOV lanes. Third, HOV lanes have a disparate impact on carpool adoption: carpools increase significantly for two out of three HOV lanes. This result underscores the critical impact of HOV lane design: it seems more beneficial to have round-trip HOV lanes (as opposed to one-way lanes) and two-passenger occupancy requirements (as opposed to three-passenger requirements).
Last, HOV lanes have a broader impact, by increasing carpooling on non-HOV routes and shifting the travel behaviors of non-carpoolers. We conclude by discussing policy implications, highlighting collaboration opportunities between policy makers and digital carpooling platforms to enhance the design and operations of HOV lanes.
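The treatment-and-control logic behind such a natural experiment can be illustrated with a textbook difference-in-differences calculation. This is a minimal sketch, assuming hypothetical per-user weekly carpool offers before and after a lane opens; the paper's actual groups and econometric specifications are richer than this four-means comparison.

```python
from typing import List


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def diff_in_diff(treat_pre: float, treat_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated group minus the change in the control group."""
    return (treat_post - treat_pre) - (control_post - control_pre)


# Hypothetical weekly carpool offers per user (illustrative numbers only)
treated = {"pre": [1.0, 1.2, 0.8], "post": [3.0, 3.4, 2.6]}  # commuters on HOV routes
control = {"pre": [1.1, 0.9, 1.0], "post": [1.3, 1.1, 1.2]}  # comparable non-HOV commuters

effect = diff_in_diff(mean(treated["pre"]), mean(treated["post"]),
                      mean(control["pre"]), mean(control["post"]))
print(f"DiD estimate: {effect:.2f} extra offers per user per week")
```

Subtracting the control group's change nets out city-wide trends (seasonality, app growth) that would otherwise be attributed to the lanes.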
NUDGING COMMUTERS TO CARPOOL: A LARGE FIELD EXPERIMENT WITH WAZE
Traffic congestion is a serious global issue. A potential solution, which requires zero investment in infrastructure, is to convince solo car users to carpool. In this paper, we leverage the Waze Carpool service and run the largest ever digital field experiment to nudge commuters to carpool. We find a strong relationship between the affinity to carpool and the potential time savings through a high-occupancy vehicle (HOV) lane. Specifically, we estimate that mentioning the HOV lane increases the click-through rate and conversion rate by 133-185% and 64-141%, respectively, relative to sending a generic message.
Joint with Maxime Cohen, Michael-David Fiszer and Avia Ratzon.
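The percentages above are relative lifts measured against the generic-message arm. A minimal sketch of that arithmetic, using hypothetical click-through rates (the experiment's actual rates are not stated here); `relative_lift` is an illustrative helper name.

```python
def relative_lift(treatment_rate: float, control_rate: float) -> float:
    """Percentage change of the treatment rate relative to the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate * 100.0


# Hypothetical CTRs: generic message vs. message that mentions the HOV lane
generic_ctr = 0.010
hov_ctr = 0.025
print(f"CTR lift: {relative_lift(hov_ctr, generic_ctr):.0f}%")  # prints "CTR lift: 150%"
```

Note that a lift of 150% means the HOV message's rate is 2.5 times the generic rate, not 1.5 times.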
MIND THE DATA CONFERENCE
This lecture lays out my manifesto about the current role of economists in industry, and how they should change their practice if they want to keep the science of economics in their hands. The most important lesson from this lecture: "Economists should have their skin in the game", meaning they should build products instead of consulting, and stand behind their failures.
WHICH INCENTIVES GET PEOPLE TO CARPOOL?
(WAZE LATAM SUMMIT, MEXICO CITY 2019)
This lecture outlines the analytical work being done at Waze on carpool incentives: subsidies, matching algorithms, supply lock-in and more.
"FOR YOUR EYES ONLY": CONSUMING VS. SHARING CONTENT ON FACEBOOK
The most comprehensive work ever done to compare what people read online vs. what they share on Facebook. The paper analyzes two types of user interactions with online content: (1) private engagement with content, measured by page-views and click-through rate; and (2) social engagement, measured by the number of shares on Facebook as well as share-rate. Based on more than a billion data points across hundreds of publishers worldwide and two time periods, it is shown that the correlation between these signals is generally low. Potential reasons for the low correlation are discussed, and the notion of private-social dissonance is defined. A more in-depth analysis shows that the dissonance between private engagement and social engagement consistently depends on content category. Categories such as Sex, Crime and Celebrities have higher private engagement than social engagement. On the other hand, categories such as Books, Careers and Music have higher social engagement than private engagement. In addition to the offline analysis, a model which utilizes the different signals was trained and deployed on a live recommendation system. The resulting weights ranked the social signal lower than click-through rate. The results are relevant for publishers, content marketers, architects of recommendation systems and researchers who wish to use social signals in order to measure and predict user engagement.
Joint work with Ram Meshulam.
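The core comparison is a correlation between a private signal and a social signal measured per piece of content. A minimal sketch using the Pearson coefficient over hypothetical per-article click-through rates and share rates; the paper's exact correlation measure and data are not reproduced here.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-article signals (illustrative only)
ctr = [0.031, 0.012, 0.025, 0.008, 0.019]          # private engagement
share_rate = [0.002, 0.006, 0.001, 0.005, 0.004]   # social engagement
print(f"CTR vs. share-rate correlation: {pearson(ctr, share_rate):.2f}")
```

A value near zero (or negative, as in this made-up sample) is what "private-social dissonance" looks like numerically: articles people click are not necessarily the ones they share.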
INTRODUCING OUTBRAIN LOOKALIKE AUDIENCES
This is a product that my team at Outbrain developed. A marketer (for example, an online retailer) provides Outbrain with a list of valuable users, such as users who have made a purchase, not necessarily through Outbrain. We use machine learning models, such as logistic regression, decision trees and matrix factorization, to characterize these valuable users' content interests. Such interests (we call them 'features'; there are thousands of them) may include the main content categories they read and are unlikely to read, the publishers they visit and are unlikely to visit, the personas and companies they are interested in, etc. Using these models, we identify in real time users who are not on the marketer's list but are similar to those on it, and recommend that marketer's campaigns to them.
Research led by Moran Gavish.
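One way to picture the logistic-regression variant: treat the marketer's list as positives, a sample of other users as negatives, learn weights over interest features, then score users outside the list. This is a minimal sketch under those assumptions, with hypothetical feature names and users and plain gradient descent; it is not Outbrain's production system.

```python
import math
from typing import List, Set, Tuple


def to_vector(interests: Set[str], vocab: List[str]) -> List[float]:
    """Binary-encode a user's content interests over a fixed feature vocabulary."""
    return [1.0 if f in interests else 0.0 for f in vocab]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[float],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def score(interests: Set[str], vocab: List[str], w: List[float], b: float) -> float:
    """Probability-like similarity of a user to the marketer's seed list."""
    x = to_vector(interests, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features and users (illustrative only)
vocab = ["finance", "travel", "sports", "celebrities", "publisher:example-news"]
seed_users = [{"finance", "travel"}, {"finance", "publisher:example-news"}, {"travel", "finance"}]
other_users = [{"sports"}, {"celebrities", "sports"}, {"celebrities"}]

X = [to_vector(u, vocab) for u in seed_users + other_users]
y = [1.0] * len(seed_users) + [0.0] * len(other_users)
w, b = train_logreg(X, y)

candidates = {"user_a": {"finance", "travel"}, "user_b": {"sports", "celebrities"}}
for user_id, interests in candidates.items():
    print(user_id, round(score(interests, vocab, w, b), 3))
```

Sampling negatives from the general population is a common simplification; with thousands of sparse features a production system would use a sparse, regularized learner rather than this dense loop.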
USER ENGAGEMENT - BEYOND CLICKS
Outbrain serves over 150 billion content recommendations to more than 500 million users every month. Masses of data tell us what's driving the mindset of the crowd at each point in time. But how do you analyze whether an individual user finds real value in recommendations? And why is settling for click-focused metrics dangerous for long-term growth?
This lecture outlines a Data Scientist's experience and challenges when analyzing post-click engagement in the context of content discovery. It shows examples of how relying on click-focused metrics can mislead you in the long run. We share data on how crowd preferences for consuming content differ from individual user preferences. Finally, we suggest a 3-layer framework for Data Scientists to measure and analyze post-click engagement, while considering the perspectives of host publishers, marketers and recommendation providers.
TERMINATION RISK AND AGENCY PROBLEMS: EVIDENCE FROM THE NBA
When organizational structures and contractual arrangements face agents with a significant risk of termination in the short term, such agents may under-invest in projects whose results would be realized only in the long term. We use NBA data to study how risk of termination in the short term affects the decisions of coaches. Because letting a rookie play produces long-term benefits on which coaches with a shorter investment horizon might place lower weight, we hypothesize that higher termination risk might lead to lower rookie participation. Consistent with this hypothesis, we find that, during the period of the NBA's 1999 collective bargaining agreement (CBA) and controlling for the characteristics of rookies and their teams, higher termination risk was associated with lower rookie participation and that this association was driven by important games. We also find that the association does not exist for second-year players and that the identified association disappeared when the 2005 CBA gave team owners stronger incentives to monitor the performance of rookies and preclude their underuse.
Joint with Alma Cohen (Harvard & TAU) and Nadav Levy (IDC).
ACADEMIC COURSES
INTRODUCTION TO ECONOMETRICS
Tel Aviv University,
The Eitan Berglas School of Economics,
Undergraduate program
All lectures are freely available on YouTube
Students' Survey 2018 (Mean = 95/100)
Students' Survey 2019 (Mean = 91/100)
DATA-DRIVEN GROWTH
Reichman University,
Executive Education Program
BIG DATA FOR ECONOMISTS
IDC Herzliya,
Arison School of Business Administration
Graduate program
(Also taught at TAU School of Economics)
PROMOTING THE DATA SCIENCE COMMUNITY AND EDUCATION
Y-DATA TECH TALK – SCALING KNOWLEDGE AT WAZE: THE ROLE OF DATA SCIENCE(S)
In this talk, I discuss the intersection between the Analytical Infra and scientific practices we have embraced at Waze, on top of Google Cloud Platform, and how they fit our view of the evolving roles of Data Scientists.
A/B testing systems have become a mandatory tool for Data Scientists and Product Managers for gaining insights and learning which features work and drive user engagement. In this lecture (Hebrew), I outline the 4 fundamental hazards that rapid-growth startups face when using A/B tests for key learnings, especially in today's marketplace-oriented products.
HEBREW UNIVERSITY'S PPE CONFERENCE
LEARN DATA SCIENCE ONLINE FOR FREE
Even if you can't go to college, you can still become a proficient data scientist almost for free. This is my "greatest hits" list of online classes; together they form a fairly complete survival kit for aspiring data scientists.
CLICK PREDICTION CONTEST ON KAGGLE
Our “Outbrain Challenge” was a call to the research community to analyze our data and model user reading patterns in order to predict individuals' future content choices. The best models were rewarded with cash prizes totaling $25,000. The sheer size of the data we released (100 GB) was unprecedented on Kaggle, the competition's platform, and was considered extraordinary for such competitions in general. Crunching all of the data may be challenging for some participants, though Outbrain does it on a daily basis.
Joint work with Ronny Lempel and Ran Locar.
THE TECHNION'S NEW DATA SCIENCE PROGRAM - A REVIEW
BIG DATA ON THE BAR
A light lecture for prospective undergraduate students (at Dizzy Frishdon, Tel Aviv)