In previous articles (see Grab’s in-house chat platform), we shared how chat has grown to become one of our primary support channels over the last few years.
With chat continuing to grow and a new in-house tool in place, helping our agents become more efficient and productive was key to ensuring faster support for our users and scaling chat even further.
Starting from an analysis of how another third-party tool was used, as well as some shadowing sessions, we realised that building a template-based feature wouldn’t help. We needed to offer personalisation capabilities: our consumer support specialists care about their writing style and tone, and using templates often feels robotic.
We decided to build a machine learning model, called SmartChat, which offers contextual suggestions by leveraging several sources of internal data, helping our chat specialists type much faster and, in turn, serve more consumers.
In this article, we are going to explain the process from problem discovery to design iterations, and share how the model was implemented from both a data science and software engineering perspective.
How SmartChat Works
Diving Deeper into the Problem
Agent productivity became a key part of scaling chat as a support channel.
After splitting chat time into all its components, we noted that agent typing time represented a big portion of the chat support journey, making it the perfect problem to tackle next.
After analysing usage of the third-party chat tool, we found that even with functionalities such as canned messages, 85% of messages were still free-typed.
Hours of shadowing sessions also confirmed that consumer support specialists liked to add their own flair: they would often take a template and adjust it to their style, which took more time than writing the message from scratch. With this in mind, it was obvious that templates wouldn’t be too helpful unless they offered some degree of personalisation.
We needed something that reduces typing time and also:
- Allows some degree of personalisation, so that answers don’t seem robotic and repetitive.
- Works with multiple languages and their nuances; Grab operates in 8 markets, and even the English-speaking markets differ slightly in commonly used words.
- Is contextual to the problem, taking into account the user type, the issue reported, and even the time of day.
- Ideally requires no maintenance effort, such as keeping templates updated whenever policies change.
Considering these constraints, this seemed the perfect candidate for a machine learning-based functionality that predicts sentence completions using all the context about the user, the issue, and even the latest messages exchanged.
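As a minimal sketch of what such a contextual request might look like (every name and field below is hypothetical, not Grab’s actual schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SuggestionRequest:
    """Context a completion model could use for one suggestion (illustrative)."""
    user_type: str                 # e.g. "consumer" or "driver-partner"
    issue_category: str            # the reported issue, e.g. "refund"
    market: str                    # one of the 8 markets, e.g. "SG"
    hour_of_day: int               # coarse time-of-day signal
    recent_messages: List[str] = field(default_factory=list)  # latest chat turns
    typed_prefix: str = ""         # what the agent has typed so far

def suggest_completion(request: SuggestionRequest) -> Optional[str]:
    """Return the most likely completion of the typed prefix, or None (stub)."""
    raise NotImplementedError
```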
Usability is Key
To fulfil the hypothesis, there are a few design considerations:
- Minimising the learning curve for agents.
- Avoiding visual clutter if recommendations are not relevant.
To increase the probability of predicting an agent’s message, one of our design explorations was to let agents select from the top 3 predictions (Design 1). To onboard agents, we designed a quick tip on activating SmartChat using keyboard shortcuts.
Displaying the top 3 recommendations, we learnt, actually slowed agents down: they started reading all the options even when none were helpful. Moreover, because the component was triggered for every piece of recommendable text, it became a distraction that forced agents to pause.
In our next design iteration, we decided to reuse the SmartChat interaction from a platform agents already use daily: Gmail’s Smart Compose. As agents are familiar with Gmail, the learning curve for this feature is less steep. First-time users see a “Press tab” tooltip, which activates the text recommendation; the tooltip disappears after five uses.
To relearn the shortcut, agents can hover over the recommended text.
How We Track Progress
Knowing that this feature would come in multiple iterations, we needed a way to track our progress over time, so we decided to measure the different components of chat time.
We realised that agent typing time is affected by the following (a toy calculation of the first three follows the list):
- Percentage of characters saved. This tells us how much of each message the model predicted correctly, and hence how much typing it saved. This metric should increase as the model improves.
- Model effectiveness. The number of characters an agent has to type before getting the right suggestion, which should decrease as the model learns.
- Acceptance rate. This tells us how many messages were written with the help of the model. It is a good proxy for feature usage and model capabilities.
- Latency. If the suggestion is not shown within about 100-200 ms, the agent will not notice it and will keep typing.
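As a toy illustration of how the first three metrics could be computed from logged suggestion events (the event fields here are assumptions, not our real logging schema):

```python
# Each event logs one agent message and whether a suggestion helped write it.
events = [
    # typed: characters the agent typed; accepted: characters inserted via Tab
    {"typed": 12, "accepted": 34},   # suggestion accepted after 12 keystrokes
    {"typed": 58, "accepted": 0},    # fully free-typed message
    {"typed": 20, "accepted": 25},
]

total_chars = sum(e["typed"] + e["accepted"] for e in events)
saved_pct = 100 * sum(e["accepted"] for e in events) / total_chars
acceptance_rate = 100 * sum(e["accepted"] > 0 for e in events) / len(events)

# Model effectiveness: average prefix length typed before an accepted suggestion.
accepted = [e for e in events if e["accepted"] > 0]
avg_prefix = sum(e["typed"] for e in accepted) / len(accepted)

print(f"{saved_pct:.1f}% of characters saved")             # 39.6%
print(f"{acceptance_rate:.1f}% acceptance rate")           # 66.7%
print(f"avg. {avg_prefix:.1f} chars typed before a hit")   # 16.0
```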
Architecture
The architecture involves support specialists initiating a fetch-suggestion request, which is sent through the API gateway to the machine learning model for evaluation. This ensures that only authenticated requests go through and that proper rate limiting is applied.
We have an internal platform called Catwalk, a microservice that executes machine learning models as an HTTP service. We used the Presto query engine to calculate and analyse the results from the experiment.
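As an illustrative sketch of that flow, a client-side fetch could look like the code below; the endpoint URL, payload, and response shape are assumptions for illustration, not Catwalk’s actual API:

```python
from typing import Optional
import requests

def fetch_suggestion(prefix: str, context: dict, token: str) -> Optional[str]:
    """Call the Catwalk-hosted model via the API gateway (hypothetical endpoint)."""
    resp = requests.post(
        "https://api.internal.example.com/smartchat/v1/suggest",  # assumed URL
        json={"typed_prefix": prefix, **context},
        headers={"Authorization": f"Bearer {token}"},  # gateway authentication
        timeout=0.2,  # stay within the ~200 ms latency budget
    )
    if resp.status_code == 429:  # rate limited by the gateway
        return None
    resp.raise_for_status()
    return resp.json().get("suggestion")
```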
Designing the Machine Learning Model
I am sure all of us can remember an experiment we did in school when we had to catch a falling ruler. For those who have not done this experiment, feel free to try it at home! The purpose of this experiment is to define a ballpark number for typical human reaction time (equations also included in the video link).
Typically, human reaction time ranges from 100 ms to 300 ms, with a median of about 250 ms (read more here). Hence, while deciding on the approach, we set the upper bound for SmartChat’s response time at 200 ms. Otherwise, the experience would feel broken: agents would not notice the suggestion and would simply keep typing.
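For completeness, the arithmetic behind the ruler experiment is standard free-fall kinematics (a textbook formula, not something specific to this article): a ruler falling from rest covers distance d in time t, so

```latex
d = \frac{1}{2} g t^{2}
\quad\Longrightarrow\quad
t = \sqrt{\frac{2d}{g}}
```

Catching the ruler after it drops about 30 cm gives t = √(2 × 0.3 / 9.81) ≈ 0.25 s, consistent with the ~250 ms median reaction time quoted above.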