by Nameeta Raj
Do you often find yourself in the middle of an infinite data preparation, modeling and testing loop? How about utilizing the rapid-delivery agile software development methodology for your analytics projects?
What is CRISP-DM?
The cross-industry standard process for data mining (CRISP-DM) is a framework used for creating and deploying machine learning solutions. The process consists of six phases, shown in Figure 1: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
There have been times when I found myself stuck in a never-ending cycle of data preparation, modeling, and testing, which left me pondering the minimum viable product concept of scrum agile.
What is Agile and What is Scrum?
Agile is an iterative software development methodology intended to reduce the time to market (the time it takes from a product being conceived until it is available for sale). Scrum is one of many frameworks that can be used to implement agile development. In scrum agile, development is done in sprint cycles, and at the end of each sprint a minimum viable product is deployed. Typically, a sprint lasts anywhere from 1 to 4 weeks.
Extending the agile software development approach to analytics projects.
Let us see how the two can be merged. The product owner prioritizes any new requirement and adds it to the product backlog. The typical time-bound scrum meetings are listed below.
Product Backlog Refinement Meeting:
The meeting should take place a few days before the start of a new sprint. The aim of the meeting is to understand the basic business need, analyze the cost-benefit, and check the data scope. The agenda includes initial estimation and finalizing the definition of ready and the acceptance criteria. Business success criteria and data accessibility are some of the factors that can contribute to the definition of ready.
Sprint Planning Meeting:
The meeting should take place right before the start of a new sprint. By the end of this meeting, the team members have a thorough understanding of the requirement, which covers a substantial portion of the business understanding phase of CRISP-DM. Items in the product backlog are re-estimated if required. The few days between the backlog refinement meeting and the sprint planning meeting ensure that all activities required to meet the definition of ready have been completed. The acceptance criteria are finalized. The first sprint with a new requirement aims at creating a minimum model fit to be demonstrated at the end of the sprint. Each subsequent sprint includes further data preparation, data cleansing, and model enhancement activities. Taking the team's past velocity into consideration, finalized requirements from the top of the product backlog are moved into the sprint backlog. The team is now committed to delivering the items on the sprint backlog and is ready to step into the next sprint.
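The velocity-based step of sprint planning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BacklogItem` fields, the use of story points, the averaging of past velocities, and the example backlog are all hypothetical and not part of any scrum tool or of the process described above.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    story_points: int  # assumed unit of estimation
    ready: bool        # True if the item meets the definition of ready


def plan_sprint(product_backlog, past_velocities):
    """Move ready items from the top of the backlog into the sprint backlog.

    Capacity is assumed to be the average of past sprint velocities.
    Items that are not ready are skipped; planning stops at the first
    ready item that no longer fits, so priority order is preserved.
    """
    capacity = sum(past_velocities) / len(past_velocities)
    sprint_backlog = []
    committed = 0
    for item in product_backlog:
        if not item.ready:
            continue
        if committed + item.story_points > capacity:
            break
        sprint_backlog.append(item)
        committed += item.story_points
    return sprint_backlog, committed


# Hypothetical backlog, already ordered by priority by the product owner.
product_backlog = [
    BacklogItem("Baseline churn model", 8, True),
    BacklogItem("Clean billing data", 5, True),
    BacklogItem("Add usage features", 5, False),
    BacklogItem("Tune hyperparameters", 8, True),
]

sprint_backlog, committed = plan_sprint(product_backlog, past_velocities=[13, 15, 14])
for item in sprint_backlog:
    print(item.title, item.story_points)
print("Committed points:", committed)
```

Stopping at the first item that does not fit, rather than searching further down for smaller items, is a design choice that keeps the sprint aligned with the product owner's priorities. A team could reasonably choose otherwise.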
Daily Scrum Meeting:
The 15-minute daily standup meeting is conducted to answer three main questions. What work was completed the previous day? What is the work planned for the day? Are there any issues obstructing progress?
Sprint Review / Customer Review / Demo Meeting:
The meeting is scheduled on the last day of the sprint. During this meeting the work committed by the team is compared to the work delivered. A brief demo of the completed work is done during this meeting. An overview of the data engineering activities along with the model created can be demonstrated to obtain feedback and new ideas from the team and stakeholders. These ideas can be implemented to improve the data engineering / modeling process in upcoming sprints. Any potential flaw in business understanding or irrelevant hypothesis testing can also be caught very early on during the demo session.
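The comparison of committed work against delivered work could be tracked with something as simple as the sketch below. The function name, the item titles, and the completion ratio are illustrative assumptions, not a prescribed scrum metric.

```python
def review_sprint(committed, delivered):
    """Compare the items the team committed to with the items delivered.

    Returns the completed items, the items carried over to a later
    sprint, and the fraction of committed items that were delivered.
    """
    delivered_set = set(delivered)
    done = [title for title in committed if title in delivered_set]
    carried_over = [title for title in committed if title not in delivered_set]
    completion = len(done) / len(committed) if committed else 0.0
    return done, carried_over, completion


# Hypothetical sprint: three items committed, two delivered.
committed_items = ["Baseline churn model", "Clean billing data", "Data quality report"]
delivered_items = ["Baseline churn model", "Clean billing data"]

done, carried_over, completion = review_sprint(committed_items, delivered_items)
print("Done:", done)
print("Carried over:", carried_over)
print(f"Completion: {completion:.0%}")
```

Items that end up in `carried_over` would typically return to the product backlog for re-prioritization before the next sprint planning meeting.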
Sprint Retrospective Meeting:
The good, the bad, and the ugly of the completed sprint are discussed in this meeting.
I see a few probable advantages of using the scrum agile methodology. All stakeholders are well informed of the project's progress right from the beginning. Potentially never-ending modeling cycles can be eliminated, saving time. The sprint demo facilitates healthy team discussions and sharing of ideas. Technical bugs or mistakes in understanding the requirements can be detected very early in the lifecycle.