Sprint retrospective - Postgres phase 2, June 22 - July 24, 2020
What worked
- Well-structured
- Thanks to @helfi92 for organizing the work so well
- Concerns at the outset about whether we had signed up for too much work, but because the work was so well-defined, we were able to stay on track.
- It was large, but mostly mechanical, process
- It could have been tempting to not continue to phase 2 and stick with entities.
- Boring, but necessary and fruitful work in order to realize the other benefits of postgres down the road.
- Got it done, like ripping a band-aid off
- Over 30K lines of code changed, mostly in same area
- @petemoore touched node code and survived!
- @petemoore was also on PTO for the final two weeks of the sprint yet we still managed to get stuff done.
- Rest of the team remained effective, efficient
- Not afraid of creating throwaway, intermediate infra to make the process much easier.
- The tooling and test frameworks that @helfi92 & @djmitche made it easy for others to plug-in quickly.
- Look ahead to Object Service which may benefit from a similar pattern.
- Merge conflict resolution skills got a workout
- Usually this is something teams shy away from, but the skills themselves are super-useful to have.
- Again, the scripting that was developed to assist with rebasing was great.
- We used the minimum viable tooling to assist with migration
- We didn’t rely on an extra ORM or framework like Knex which meant that there was less for people to learn.
- We established a virtuous help cycle
- At some point during the sprint, every person on the team helped everyone else
- Lots of pairing
- Useful and practical, again because everyone was on same page
- The tree-closing window (TCW)
- We decided to rollback rather than risk a downtime of unknown duration.
- Versioning saved us, or at least gave us an acceptable fallback position where we lost less work.
- Having multiple releases as fallback points for the TCW
- Flexibility in TCW timing with Release Management
- We were able to schedule the TCW a week later than it would have normally taken place.
- Because systems are generally better automated and more resilient than a few years ago. RelMan can be more flexible with timing, provided we engage them early and we have a solid rollback story.
- Online migration
- This saved our bacon after the TCW rollback, allowing us to get back on track with the sprint. It is also a very useful tool to have in our back pocket for future work.
- Zenhub
- The Zenhub tooling around issues proved better (more full-featured) than the github version.
- The blocking/blocker labelling was useful
- e.g. When we decided to cut signing, we immediately knew how many other issues would be effective and could also be cut.
- We managed to maintain a balance between external requests and sprint focus.
What didn’t
- Zenhub
- Despite generally being a welcome addition to the sprint process, we ran into some issues with Zenhub that weren’t great:
- Occasional blank pages, sometimes associated with switching tabs.
- Labels that we added to issues sometimes went missing, i.e. our changes didn’t stick and we needed to redo some work.
- The default layout requires large display.
- @imbstack found it hard to use on a small laptop screen.
- Nice-to-have features we wish the tool had:
- the ability to fullscreen a particular pipeline for triage.
- clone an issue/card
- The tree-closing window (TCW)
- We didn’t manage to accomplish all the work we planned due to slower performance on the production database than we saw in staging.
- We still don’t understand the reason(s) behind the performance difference.
- We could have had better communication with CloudOps in the run-up to the TCW.
- If we’re planning for people to need to work weekends, we should involve them in the process as early as possible.
Process changes for next sprint
- Avoid merge conflicts by having a work item to merge other pieces of work
- When we’re all working on the same files (which is likely in the sprint model), designate a “merge master” to resolve conflicts.
- Triage
- There was some uncertainty around where issues should go when filed: Icebox vs New Issues vs Product Backlog
- By default, new issues go to the Icebox.
- This is a good default state because it means some small action is required to acknowledge issues that we will actually work on at some point.
- The team found Zenhub useful for triaging issues added to the Icebox because the volume is low.
- …whereas Github was more useful for larger triage of New Issues/Product Backlog
- Product owner responsibilities
- Now that we have access to it, the burndown chart is essential for visualizing progress and delays.
- Keeping on top of burndown chart is now a primary responsibility for the Product Owner, although the rest of the team show also be looking at it.
- Be sure to compare the view with “In Progress” issues included alongside the view with only “Closed” issues.
- To help avoid the initial plateau of productivity, future Product Owners are encouraged to front-load sprints with smaller issues/pieces of work to help people ramp up, especially when they are unfamiliar with the domain.
- When organizing the sprint, it’s important to consider factors that might introduce delays and schedule accordingly. Two such factors we encountered during this sprint were:
- Reviews for known large changes - some of the database table changes requires thousands of lines of code and proper reviews take time; and,
- Deployments - while most deployments can happen quickly, there are external factors like CloudOps availability and Firefox release schedules that we need to account for.
- Stand-ups
- We decided that forcing people on the west coast to sacrifice their sleep to attend stand-ups was not kind.
- Also, east coast people found it strange to reporting status midway through their day.
- @petemoore was on PTO, so we didn’t get his feedback re: CEST timing
- New model will have localized stand-ups at 09:30 for each timezone
- We will try to automate the announcement of the stand-ups in the Slack channel using the gcal integration.
- We will also try the Zenhub slack integration to inject more status into the channel automatically. We understand that this may make the channel noisy at the start of future sprints.
- It is still incumbent on the Product Owner and Scrum Master to review the daily stand-up notes to look for blockers or other problems.