In a traditional software project it's much easier to say up front what is doable. Machine learning projects require experimentation, much more so than you might expect.
Three ways in which a machine learning project differs from your regular software project
I'm going to share what I have learned about how an ML project differs from a regular software development project. The biggest shift for me was internalizing how hard it is to foresee what is possible and what is not before the project has actually started.
In a traditional software project it's much easier to say what is doable and what is not, with just your experience as a reference. This really bugged me when leading my first ML projects. I thought, “How hard can it be to come up with an informed guess on the feasibility of this project?” It turns out it's quite hard, and it requires a significantly different approach from your typical SW project. Data is your fuel, but you won't know which engine to run it on. Finding out requires experimentation. Much more so than you might expect.
From my years of running ML projects I have learned the following three things (the hard way!):
1. The budget for your ML project is harder to set than for your typical SW dev project. Example factors: A) Your data might not be what you think it is. If, for example, it's spread over various types of storage, you can be almost certain that retrieving it will cost much more than you expect. B) The experimentation might need more time than you would guess. This is almost always true. Experimentation is just what it sounds like in this context. As a Project Manager it's easy to discount the tech team's talk about experimentation as a case of “im-sure-they-know-what-to-do-without-experimenting-they-just-want-a-little-extra-budget-to-play-around-with”. This is not the case. These projects are data driven to such an extreme extent that nobody knows what the outcome of the early-stage experimenting will be.
2. Open up to the Data Scientists' perspective on what is important. A common approach in many software projects is to have an IT architect or tech lead be responsible for drafting the technical approach. I would recommend being extra careful not to hand the entire mandate to the IT architect. The culture and approach in a data science team are very different from those in an architect/developer team. The data science community is all about trying out the latest techniques, iterating quickly on experiments and doing POCs. They chase the latest research and new tech, since the state-of-the-art techniques are constantly being challenged in this fast-moving field.
The typical architect/developer community, on the other hand, is in my opinion more wary of using the latest tech, since the focus is often on stability, scalability and low maintenance costs.
These two different approaches are of equal value when trying to develop a product that is to be put in production at scale.
Let the Data Scientist take part in planning the project so that you don't lose their perspective on how to go about it.
3. Assume there is not enough data for the problem you want to solve. There is no more common phrase than “we have lots of data” in the early stages of an ML project. Unfortunately, that seldom proves to be the case in the end. A typical example is a client we worked with a while back. They did have lots of data. But! It was spread out over multiple databases, in multiple formats and owned by multiple legal entities. In the end it proved easier to just start gathering data from scratch. Make sure that you investigate early on in the project what it would take to gather new data, so that you don't lose valuable time.
As a project manager you need to know the above. Otherwise you run the risk of ending up with either no machine learning functionality at all or a budget completely out of control.