The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm – from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn’s answer.

CONQUER: An RL Method for Learning from User Feedback in the Form of Reformulations

We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the knowledge graph:

Figure: Answering process in CONQUER. KG excerpt required for answering "When was Avengers: Endgame released in Germany?" and "What was the next from Marvel?". Agents are shown with possible walk directions. The colored box ("Spider-Man: Far from Home") is the correct answer. Entities, literals and types are shown in red, predicates in blue.

To answer a question (e.g. "What was the next from Marvel?") given its conversational context (e.g. "When was Avengers: Endgame released in Germany?"), CONQUER creates and maintains a set of context entities from the KG that are the most relevant to the conversation so far (Avengers: Endgame, Marvel Cinematic Universe and Captain Marvel; thick-bordered boxes in the figure). From each context entity, an agent starts walking over the KG to entities in its neighborhood, which are candidate answers for the current question. The candidates reached by all agents are aggregated to produce the final answer (Spider-Man: Far from Home, shaded in the figure).

Walking directions are determined by actions selected by a policy network. The input to the policy network consists of the encodings of the question along with the conversational context and the KG paths (actions) involving the context entity. The output is a probability distribution over the available actions, from which the agent samples an action (during training) or takes the top action (at answering time). The policy network is trained via noisy rewards obtained from the reformulation likelihood: -1 if the follow-up question is a reformulation, +1 if it expresses a new intent. A follow-up question is classified as reformulation or new intent by our reformulation predictor, a fine-tuned BERT model.
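The action selection and reward scheme described above can be illustrated with a minimal sketch. It is not the released CONQUER code: the raw action scores stand in for the policy network's output, all function names are hypothetical, and the REINFORCE-style loss term is an assumption, since this page does not spell out the exact update rule.

```python
import math
import random

def softmax(scores):
    """Turn raw action scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def select_action(scores, training, rng=random):
    """Sample an action during training; take the top action at answering time."""
    probs = softmax(scores)
    if training:
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    else:
        index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]

def reward_from_followup(is_reformulation):
    """Noisy reward for the previous answer: -1 for a reformulation, +1 for a new intent."""
    return -1.0 if is_reformulation else 1.0

def policy_gradient_loss(action_prob, reward):
    """REINFORCE-style loss term for one selected action (assumed objective)."""
    return -reward * math.log(action_prob)

# Hypothetical scores for three KG paths leaving one context entity.
scores = [2.0, 0.5, -1.0]
action, prob = select_action(scores, training=False)
loss = policy_gradient_loss(prob, reward_from_followup(is_reformulation=True))
```

Under this assumed objective, minimizing the loss after a reward of -1 lowers the probability of the chosen path, while a reward of +1 raises it. In the actual system, the decision between the two rewards would come from the reformulation predictor rather than from a boolean flag.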

ConvRef: A ConvQA Benchmark with Reformulations

To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over the state-of-the-art system CONVEX.

ConvRef builds upon the publicly available conversational KG-QA benchmark ConvQuestions. We used conversation sessions in ConvQuestions as input to our user study to create our benchmark ConvRef. Study participants interacted with a baseline QA system that was trained using the available paraphrases in ConvQuestions as proxies for reformulations. Users were shown follow-up questions in a given conversation interactively, one after the other, along with the answer coming from the baseline QA system. For wrong answers, the user was prompted to reformulate the question up to four times if needed. In this way, users were able to pose reformulations based on previous wrong answers and the conversation history. We followed the ConvQuestions ratios for the train-dev-test split, leading to 6.7k training conversations and 2.2k each for dev and test sets.
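The collection protocol for a single question can be sketched as a short loop. This is only an illustration of the procedure described above, not the study software: `collect_turns` and its three callback parameters are hypothetical stand-ins for the baseline QA system, the correctness check and the study participant.

```python
from typing import Callable, List, Tuple

MAX_REFORMULATIONS = 4  # from the study description: "up to four times"

def collect_turns(
    question: str,
    answer_fn: Callable[[str], str],
    is_correct_fn: Callable[[str], bool],
    reformulate_fn: Callable[[str, str], str],
    max_reformulations: int = MAX_REFORMULATIONS,
) -> List[Tuple[str, str, bool]]:
    """Return (utterance, system_answer, was_correct) for one information need."""
    turns = []
    utterance = question
    for attempt in range(max_reformulations + 1):
        answer = answer_fn(utterance)
        correct = is_correct_fn(answer)
        turns.append((utterance, answer, correct))
        # Stop on a correct answer or once the reformulation budget is used up.
        if correct or attempt == max_reformulations:
            break
        utterance = reformulate_fn(utterance, answer)
    return turns

# Toy stand-ins (hypothetical): the system only succeeds on the reformulated question.
turns = collect_turns(
    "original question",
    answer_fn=lambda q: "right" if q == "reformulated question" else "wrong",
    is_correct_fn=lambda a: a == "right",
    reformulate_fn=lambda q, a: "reformulated question",
)
```

With the default cap, one information need yields at most five utterances: the original question plus four reformulations.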

Sample Reformulations from ConvRef:

Movies
Question: And who's the actor that plays Rachel McAdams' dad?
Wrong Answer: Frank Ray Perilli
Reformulation: Who's playing Claire Cleary's dad?
Wrong Answer: Michael Sheen
Reformulation: Who's playing William Cleary?
Wrong Answer: no answer
Reformulation: Who's casted as William Cleary?
Wrong Answer: Summer Altice
Reformulation: What actor plays role of William Cleary in Wedding Crashers?
Wrong Answer: Alliance of Canadian Television and Radio Artist

Books
Question: What novel was the final one?
Wrong Answer: 01 July 2007
Reformulation: What was the name of the final novel?
Wrong Answer: Warner Bros. and J. K. Rowling v. RDR Books
Reformulation: What is the final novel of the series?
Wrong Answer: Harry Potter and the Philosopher's Stone (German edition)
Reformulation: What is the final book of the Harry Potter series?
Wrong Answer: Legal disputes over the Harry Potter series
Reformulation: What is the final book of the series?
Wrong Answer: Harry Potter and the Philosopher's Stone

Soccer
Question: To which club did CR7 move in 2018?
Wrong Answer: no answer
Reformulation: Which club Ronaldo joined in 2018?
Wrong Answer: Wikimedia human name disambiguation page
Reformulation: Which team did Ronaldo start playing for in 2018?
Wrong Answer: Portuguese
Reformulation: Where did Ronaldo move to in 2018?
Wrong Answer: +81
Reformulation: Which football club did footballer Ronaldo join in 2018?
Wrong Answer: football at the 10th National Games of the People's Republic of China

Music
Question: Which album was it followed by?
Wrong Answer: pop music
Reformulation: Name the album that followed it?
Wrong Answer: Judas Priest
Reformulation: Which album followed Stoney?
Wrong Answer: music executive
Reformulation: Stoney was followed by which album?
Wrong Answer: no answer
Reformulation: Name the album that followed Stoney?
Wrong Answer: no answer

TV series
Question: Which company produced it?
Wrong Answer: light-year
Reformulation: Which company produced Buzz Lightyear of Star Command?
Wrong Answer: The Walt Disney Company
Reformulation: Which animation company produced this tv series?
Wrong Answer: The Santa Clause 3: The Escape Clause
Reformulation: Name of company that produced this series?
Wrong Answer: Nicole Sullivan
Reformulation: Which production company is behind Buzz Lightyear of Star Command?
Correct Answer: Disney Television Animation

Please refer to our paper for further details.


For more information, please contact: Magdalena Kaiser (mkaiser AT mpi HYPHEN inf DOT mpg DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).




Download ConvRef

Training Set (6720 Conversations)
Dev Set (2240 Conversations)
Test Set (2240 Conversations)

ConvRef is licensed under a Creative Commons Attribution 4.0 International License.

Code on GitHub



"Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs", Magdalena Kaiser, Rishiraj Saha Roy and Gerhard Weikum, in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 (SIGIR'21), Virtual Event, Canada, 11 - 15 July 2021. [Preprint] [Slides] [Video] [Poster] [ACM Badge]