Last month I attended some sessions of Philly ETE, and I took a bunch of notes from this talk and have procrastinated about putting them into post-form. Videos from the conference are available to attendees now, and will be public online in a few months, but this talk is up online as performed at different conferences if you’re very curious. No guarantees of which version you’re getting though 🙂
This talk was provocatively titled (pointing out how technical debt often isn’t prioritized), with some coverage of techniques and approaches recommended by the speaker, Adam Tornhill. Tornhill has a couple books and a company that does this work, so it seems like reading the books is a natural follow-on for being intrigued by the talk.
- CodeScene (basically, all this analysis as a product. Sounds similar, but different approach, to Code Climate’s quality analysis)
“Reducing” technical debt is like jumping from a better floor
In opening the talk, Tornhill used an analogy to take issue with the common refactoring tactic of “make less code”, saying (paraphrasing from my notes): reducing from 7000 LoC to 5000 LoC is like jumping from the 5th floor instead of the 7th (i.e. still bad). Some interesting points here were:
- What behavior do we reinforce by quantifying technical debt?
Like many “manage to the metric” issues, if you approach technical debt purely quantitatively, the work will be done inefficiently (see point three).
- Quantifying technical debt isn’t generally actionable
Later in the talk, this point is clarified or possibly nullified — Tornhill shows some techniques that do examine the code, and does talk a bit about code analysis
- Technical debt is invisible in the code — remediation cost
You can’t generally see technical debt in the code, because its cost also includes the cost of fixing it (which is why it doesn’t get fixed).
Tornhill referenced a paper that, to me, sounded like it forms the basis of most of his work: Fenton’s Software measurement paper.
Tornhill talked about looking for symptoms of low code health, including: low cohesion, deeply nested logic, bumpy roads, primitive obsession, excess function arguments… A blog post on the CodeScene product site talks about the “Code Health” concept in the context of how it elevates that data: codescene.com/blog/measure-health-of-your-codebase
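To make one of those symptoms concrete for myself, here is a minimal sketch of how “deeply nested logic” could be measured for Python source. This is my own illustration, not CodeScene’s metric or anything shown in the talk: the `max_nesting` name and the choice of which statements count as nesting are assumptions.

```python
import ast

# Assumption: these block statements are what "nesting" means here.
# Async variants and match statements are ignored to keep the sketch short.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting(source: str) -> int:
    """Return the deepest level of nested control-flow blocks in `source`."""

    def depth(node: ast.AST, current: int) -> int:
        if isinstance(node, NESTING_NODES):
            current += 1
        deepest = current
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, current))
        return deepest

    return depth(ast.parse(source), 0)


sample = (
    "def drain(items):\n"
    "    for item in items:\n"
    "        if item:\n"
    "            while item:\n"
    "                item -= 1\n"
)
print(max_nesting(sample))
```

One caveat: Python’s AST represents `elif` as an `if` nested inside the `else` branch, so long `elif` chains score deeper than they look on the page.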
If we find a large area of code, how do we decide what to work on? Tornhill describes using “software design x-rays” (book reference for sure). In the example in the talk I saw, it was a section of 500 lines of code, which is far less than the larger component’s 20k lines, and of course, far more approachable than refactoring the whole codebase.
One of the motivators behind using hotspots is “You don’t have to fix all technical debt.” You optimize improvements in the beginning, and then there should be a long tail, because the hairiest things to tackle have been covered.
The idea is to find parts of the code that are complex, and have been touched often. If you don’t touch it often, why would it be a priority to fix that area?
Quote: “Your best bug fix is time” — code that hasn’t been modified is likely to be bug-free (from the Software Design X-Rays book)
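Here is a rough sketch of how I understand the hotspot idea: count how often each file changes, then weight that by some complexity number. Everything specific here is my assumption rather than Tornhill’s method: the function names are made up, lines of code stands in for complexity, and multiplying the two numbers is just one simple way to combine them. The input is meant to be the text output of `git log --name-only --pretty=format:`.

```python
from collections import Counter


def change_frequency(log_text: str) -> Counter:
    """Count how many times each path appears in the log output.

    Expects one file path per line, with blank lines between commits.
    """
    return Counter(
        line.strip() for line in log_text.splitlines() if line.strip()
    )


def rank_hotspots(frequency, complexity, top=5):
    """Rank files by (number of changes) * (complexity proxy).

    `complexity` maps path -> a number such as lines of code.
    Files missing from `complexity` (e.g. deleted files) are skipped.
    """
    scores = {
        path: frequency[path] * complexity[path]
        for path in frequency
        if path in complexity
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical data: three commits touching three made-up files.
example_log = "a.py\nb.py\n\na.py\n\na.py\nc.py\n"
example_loc = {"a.py": 100, "b.py": 500, "c.py": 10}
ranked = rank_hotspots(change_frequency(example_log), example_loc)
print(ranked)
```

A file that is large but rarely touched, or touched often but tiny, ends up lower in the list than one that is both, which is the prioritization the talk was arguing for.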
Technical debt that wasn’t; or: turns out, people write software
One definition of legacy code is “code lacking in quality”. Another is “code that we didn’t write ourselves”. Tornhill told a story of looking at code complexity for a project he’d been told was very complicated. “Behavioral code analysis” (aka using the git history) showed that the work was largely done by people who were no longer working on the project. This is valuable information for, e.g., a manager to keep track of to mitigate offboarding risk.
knowledge loss + relevance (hotspot) + impact (complexity)
[note from current Pam reading the notes: this formula doesn’t have a = so I am unsure it is written correctly]
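The knowledge loss part is the piece I can picture computing from git history, so here is a minimal sketch. It is an assumption on my part, not the formula from the talk: it takes the output of `git log --format=%an` for a file or directory (one author name per commit) and reports the share of commits made by people who are no longer on the team. The `knowledge_loss` name and the example names are made up.

```python
from collections import Counter


def knowledge_loss(authors_text: str, current_team: set) -> float:
    """Fraction of commits authored by people not in `current_team`.

    `authors_text` is one author name per line, one line per commit.
    Returns 0.0 when there are no commits at all.
    """
    commits = Counter(
        name.strip() for name in authors_text.splitlines() if name.strip()
    )
    total = sum(commits.values())
    if total == 0:
        return 0.0
    departed = sum(
        count for author, count in commits.items()
        if author not in current_team
    )
    return departed / total


# Hypothetical history: four commits, only Ana is still on the project.
history = "Ana\nBo\nBo\nCy\n"
print(knowledge_loss(history, {"Ana"}))
```

Counting commits is a crude proxy (one big commit and one typo fix weigh the same), but it is enough to flag areas where most of the history belongs to people who have left.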
We overestimate complexity of unfamiliar code; conversely we underestimate complexity of familiar code. Get objective data on the codebase to make more informed decisions!
Tornhill started working in this field 10+ years ago, without much tooling around it. Now there is!
- CodeScene (the product)
- Track functions with git!
git log -L :<funcname>:<file>
Git then traces the evolution of that function
One problem with static analysis is that it doesn’t take time into account; it is a low-level feedback loop during development. Another set of techniques is needed to analyze and prioritize tech debt.
Relationship between test coverage and hotspots: Tend to find that test coverage is already a good sign; but some of the worst technical debt can be in the test code. Opinion: create a mental divide between application and test code.
I was really into this talk and it was a fantastic pitch to get me to buy the book (which I will … when I finish one or two of the current technical books I’m reading. oops.). Even before then, I would like to see what happens when I run some of the open source tooling against, e.g., work projects … I would also suggest, if you read through these notes and use VS Code, that the GitLens extension is glorious for navigating around a file and easily accessing git history (when did this change?) within your editor.