Discovering Essential Code Elements in Informal Documentation

Peter C. Rigby and Martin P. Robillard

McGill University, Canada

Track: Technical Research
Session: Tools
To access the knowledge contained in developer communication, such as forum posts, it is useful to determine automatically the code elements referred to in the discussions. We propose a novel traceability recovery approach to extract the code elements contained in various documents. As opposed to previous work, our approach does not require an index of code elements to find links, which makes it particularly well-suited for the analysis of informal documentation. When evaluated on 188 StackOverflow answer posts containing 993 code elements, the technique performs with average 0.92 precision and 0.90 recall. As a major refinement on traditional traceability approaches, we also propose to detect which of the code elements in a document are salient, or germane, to the topic of the post. To this end we developed a three-feature decision tree classifier that performs with a precision of 0.65-0.74 and recall of 0.30-0.65, depending on the subject of the document.