CIKM 2016 Industry Keynote Speakers
1. Industry Track Keynote, 10/25/2016 Tuesday, 11:00am
Building Industry-specific Knowledge Bases
Shivakumar Vaithyanathan (IBM Research)
Abstract: Building industry-specific knowledge bases relies heavily on collecting and representing domain knowledge over time. Domain knowledge includes: (1) the logical schema, constraints and domain vocabulary of the application, (2) the models and algorithms to populate instances of that schema, and (3) the data necessary to build and maintain those models and algorithms. In IBM Watson we are using an ontology-driven approach for the creation and consumption of industry-specific knowledge bases. The creation of such knowledge bases involves well-known building blocks: natural language processing, entity resolution, data transformation, etc. It is critical that the models and algorithms that implement these building blocks be transparent and optimizable for efficient execution.
In this talk, I will describe the design of domain-specific languages (DSLs) with specialized constructs that serve as target languages for learning these models and algorithms, and the generation of training data for scaling up the learning.
2. Industry Track Keynote, 10/25/2016 Tuesday, 2:00pm
Duer: Intelligent Personal Assistant
Haifeng Wang (Baidu, Inc.)
Abstract: The intelligent personal assistant is widely recognized as a more natural and efficient mode of human-computer interaction, and it has attracted extensive interest from both academia and industry. In this talk, I describe Duer, Baidu’s intelligent personal assistant, focusing on three features. First, Duer comprehensively understands people’s requirements via multiple channels, including not only explicit utterances but also user models and rich contexts. Duer’s user models are learnt from users’ interaction history, and the rich contexts consist of temporal and geographical information as well as the preceding dialogue. Second, Duer meets diverse requirements with a range of instruments, such as chatting, information provision, and reminder services. These instruments are built by mining the big data of web pages, applications, and user logs, and are seamlessly integrated into the dialogue flow. Third, Duer features multi-modal interaction, which allows people to interact with it through text, speech, and images. We believe these features will make Duer a better, more distinctive intelligent assistant for each of you.
3. Industry Track Keynote, 10/26/2016 Wednesday, 10:40am
Using Machine Learning to Improve the Email Experience
Marc Najork (Google Research)
Abstract: Email is an essential communication medium for billions of people, with most users relying on web-based email services. Two recent trends are changing the email experience: smartphones have become the primary tool for accessing online services including email, and machine learning has come of age. Smartphones have a number of compelling properties (they are location-aware, usually with us, and allow us to record and share photos and videos), but they also have a few limitations, notably limited screen size and small and tedious virtual keyboards. Over the past few years, Google researchers and engineers have leveraged machine learning to ameliorate these weaknesses, and in the process created novel experiences. In this talk, I will give three examples of machine learning improving the email experience.
The first example describes how we are improving email search. Displaying the most relevant results as the query is being typed is particularly useful on smartphones due to the aforementioned limitations. Combining hand-crafted and machine-learned rankers is powerful, but training learned rankers requires a relevance-labeled training set. User privacy prohibits us from employing raters to produce relevance labels. Instead, we leverage implicit feedback (namely clicks) provided by the users themselves. Using click logs as training data in a learning-to-rank setting is intriguing, since there is a vast and continuous supply of fresh training data. However, the click stream is biased towards queries that receive more clicks — e.g. queries for which we already return the best result in the top-ranked position. I will summarize our work on neutralizing that bias.
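The abstract does not say how the click bias is neutralized. One standard technique for this kind of selection bias is inverse propensity weighting, in which each logged example is weighted by the inverse of its estimated probability of entering the log. The following is a minimal sketch under that assumption, not the speaker's implementation; the segment names, propensity values, and loss values are hypothetical.

```python
def ipw_mean_loss(samples, propensity):
    """Average loss over logged clicks, reweighted by inverse propensity.

    samples:    list of (segment, loss) pairs, one per clicked query in the log.
    propensity: dict mapping segment -> estimated probability that a query
                from that segment produces a click and enters the log.
    """
    weighted = sum(loss / propensity[seg] for seg, loss in samples)
    norm = sum(1.0 / propensity[seg] for seg, _ in samples)
    return weighted / norm


# Hypothetical log: "easy" queries (already answered well) are clicked far
# more often than "hard" ones, so they dominate the raw click stream.
samples = [("easy", 0.1), ("easy", 0.1), ("easy", 0.1), ("hard", 0.9)]
propensity = {"easy": 0.75, "hard": 0.25}

naive = sum(loss for _, loss in samples) / len(samples)
corrected = ipw_mean_loss(samples, propensity)
print(naive, corrected)
```

In this toy log the unweighted average understates the loss because easy queries are over-represented; the reweighted average gives each segment the influence it would have had without the click bias.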
The second example describes how we extract key information from appointment and reservation emails and surface it at the appropriate time as a reminder on the user’s smartphone. Our basic approach is to learn the templates that were used to generate these emails, use these templates to extract key information such as places, dates and times, store the extracted records in a personal information store, and surface them at the right time, taking contextual information such as estimated transit time into account.
The third example describes Smart Reply, a system that offers a set of three short responses to those incoming emails for which a short response is appropriate, allowing users to respond quickly with just a few taps, without typing or involving voice-to-text transcription. The basic approach is to learn a model of likely short responses to original emails from the corpus, and then to apply the model whenever a new message arrives. Other considerations include offering a set of responses that are all appropriate and yet diverse, and triggering only when sufficiently confident that each response is of high quality and appropriate.
4. Industry Track Keynote, 10/27/2016 Thursday, 10:40am
Large-scale Robust Online Matching and Its Application in E-commerce
Rong Jin (Alibaba)
Abstract: This talk focuses on the large-scale matching problem, which aims to find the optimal assignment of tasks to different agents under linear constraints. Large-scale matching has found numerous applications in e-commerce. A well-known example is budget-aware online advertising. A common practice in online advertising is to find, for each opportunity or user, the advertisements that best fit his or her interests. The main shortcoming of this greedy approach is that it does not take into account the budget limits set by advertisers. Our studies, as well as others, have shown that by carefully taking into account the budget limits of individual advertisers, we can significantly improve the performance of the advertising system.
Despite a rich literature, two important issues are often overlooked in previous studies of the matching/assignment problem. The first issue arises from the fact that most quantities used in the optimization are estimated from historical data and are therefore likely to be inaccurate and unreliable. The second is how to perform online matching: in many e-commerce problems, tasks are created in an online fashion, and the algorithm has to make an assignment decision immediately when each task emerges. We refer to these two issues as the challenges of “robust matching” and “online matching”. To address the first challenge, I will introduce two different techniques for robust matching. The first approach is based on the theory of robust optimization, which takes into account the uncertainties of the estimated quantities when performing optimization. The second approach is based on the theory of two-sided matching, whose result depends only on the partial preference of estimated quantities. To deal with the challenge of online matching, I will discuss two online optimization techniques, one based on the theory of primal-dual online optimization and one based on minimizing dynamic regret under long-term constraints. We verify the effectiveness of all these approaches by applying them to real-world projects developed at Alibaba.
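As a small illustration of why budget awareness matters in the advertising example, the sketch below compares a greedy rule (highest bid wins) with a budget-scaled rule in the style of the Mehta-Saberi-Vazirani-Vazirani algorithm for online ad allocation, which discounts each bid by 1 - exp(f - 1), where f is the fraction of the advertiser's budget already spent. This is an assumed stand-in for illustration, not the method presented in the talk; the advertisers, bids, budgets, and query stream are hypothetical.

```python
import math


def allocate(queries, bids, budgets, budget_aware):
    """Assign each arriving query to one advertiser and return total revenue.

    queries:      sequence of query types, processed one at a time (online).
    bids:         dict advertiser -> dict query type -> bid.
    budgets:      dict advertiser -> total budget.
    budget_aware: if True, discount bids by 1 - exp(spent_fraction - 1);
                  if False, use the raw bid (greedy).
    """
    spent = {a: 0.0 for a in budgets}
    revenue = 0.0
    for q in queries:
        best, best_score = None, 0.0
        for a, budget in budgets.items():
            bid = bids[a].get(q, 0.0)
            # Skip advertisers with no bid or not enough remaining budget.
            if bid <= 0.0 or spent[a] + bid > budget:
                continue
            if budget_aware:
                score = bid * (1.0 - math.exp(spent[a] / budget - 1.0))
            else:
                score = bid
            if score > best_score:
                best, best_score = a, score
        if best is not None:
            spent[best] += bids[best][q]
            revenue += bids[best][q]
    return revenue


# Hypothetical instance: A bids on both query types, B only on q1.
bids = {"A": {"q1": 1.0, "q2": 1.0}, "B": {"q1": 0.9}}
budgets = {"A": 2.0, "B": 2.0}
queries = ["q1", "q1", "q2", "q2"]

greedy_revenue = allocate(queries, bids, budgets, budget_aware=False)
aware_revenue = allocate(queries, bids, budgets, budget_aware=True)
print(greedy_revenue, aware_revenue)
```

In this instance the greedy rule exhausts advertiser A's budget on the first two queries and leaves both q2 queries unserved, while the budget-scaled rule shifts one q1 query to B and preserves part of A's budget for q2.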