The first step in kicking off a Machine Learning (ML) project is to write down a clear statement of the business problem; the second is to frame the ML problem. Before discussing any ML method, you need to understand the business problem and map it to one or more ML problem formulations. Each formulation is ultimately an adaptation of a known ML problem (for example, classification, regression, or ranking) to the use case at hand.
Defining a business problem involves writing down the project’s success criteria, objectives, financial justification, and background scenario, and assessing the risks involved. Defining the success criteria is especially important because they set expectations for how the ML work will progress. Why expectations? Because an ML project carries risk, and its outcome depends on how well the available data represents reality; establishing a reliable ground truth is often very difficult. The financial justification for an ML project can also be hard to determine. For example, if a company uses computer vision to automate information extraction from documents, time is saved and the company becomes more agile. But if the automated process doesn’t impact the business directly, it is hard to determine the return on investment (ROI).
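To make the ROI discussion concrete, here is a minimal sketch of a first-year ROI estimate for the document-automation example. It assumes the only benefit is staff time saved, valued at an hourly cost. The function name `simple_roi` and every figure are hypothetical placeholders, not numbers from a real project.

```python
def simple_roi(hours_saved_per_year, hourly_cost, project_cost):
    """Return ROI as a fraction: (benefit - cost) / cost.

    Assumption: the only benefit counted is staff time saved,
    valued at a flat hourly cost.
    """
    if project_cost <= 0:
        raise ValueError("project_cost must be positive")
    benefit = hours_saved_per_year * hourly_cost
    return (benefit - project_cost) / project_cost


# Hypothetical figures, for illustration only.
roi = simple_roi(hours_saved_per_year=2000, hourly_cost=30.0, project_cost=50000.0)
print(f"First-year ROI: {roi:.0%}")
```

The difficulty described above shows up in the inputs: when the automated process doesn’t touch the business directly, `hours_saved_per_year` and its monetary value are rough estimates, so the resulting ROI is only as reliable as those guesses.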
Mapping the business problem to an ML problem may mean dropping some of the success criteria. Breaking the original problem into multiple ML problems is not always straightforward. “There is no free lunch.” Most of the time we have to narrow the problem to a specific situation or a simulated event, or even change the original business problem statement. The business problem may be too hard to decompose, or there may be no data or evidence with which to observe the real scenario. For example, everybody wants a bot that always makes money on the stock market, but even human traders lose sometimes. A company has implemented a bot called @monkeystocks that buys and sells at random. Check out its performance sometime: https://www.instagram.com/p/ByxqPtQBsOI/?igshid=agzze7mhu5ac.
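A random trader like this is a useful baseline when framing a trading problem: any ML model should at least beat it. The sketch below is not how @monkeystocks works; it is a hypothetical illustration that generates a synthetic price series and compares a coin-flip strategy with buy-and-hold. The function names, the 1% daily volatility, and the seeds are all assumptions.

```python
import random


def simulate_prices(n_days, start=100.0, seed=0):
    """Synthetic prices: each day moves by a small random percentage."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.01)))
    return prices


def random_trader_return(prices, seed=1):
    """Each day, flip a coin: hold the asset for that day or stay in cash."""
    rng = random.Random(seed)
    wealth = 1.0
    for today, tomorrow in zip(prices, prices[1:]):
        if rng.random() < 0.5:  # in the market for this day
            wealth *= tomorrow / today
    return wealth - 1.0


def buy_and_hold_return(prices):
    """Return from buying on the first day and selling on the last."""
    return prices[-1] / prices[0] - 1.0


prices = simulate_prices(n_days=250)
print(f"Random trader: {random_trader_return(prices):+.2%}")
print(f"Buy and hold:  {buy_and_hold_return(prices):+.2%}")
```

A success criterion such as “always earns money” cannot be tested with this kind of comparison, which is one reason the business statement usually has to be refined into something measurable, such as beating a baseline over a fixed period.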
After completing those two steps, you can determine a data governance strategy. Data governance is a framework that defines, and helps implement, the overall management of the availability, usability, integrity, security, and effectiveness of the data used in an ecosystem. It offers a simple, direct way to track and safeguard the use of the right data, and it also detects data errors, raises red flags promptly, and helps eliminate those errors. It lets an organization spend less time hunting for the right data source to feed its ML algorithms and more time creating and refining the models. However, the rules for filtering and curating data might be too constraining: an ML algorithm that relies on them may suffer decreased performance if the data environment changes over time.
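As a rough illustration of how governance rules can raise red flags, the sketch below checks a few records against simple per-field rules. The function `check_records`, the field names, and the rules themselves are hypothetical and stand in for whatever a real governance policy would define.

```python
def check_records(records, rules):
    """Return a list of (row_index, field, message) red flags."""
    flags = []
    for i, row in enumerate(records):
        for field, (predicate, message) in rules.items():
            value = row.get(field)
            if value is None:
                flags.append((i, field, "missing value"))
            elif not predicate(value):
                flags.append((i, field, message))
    return flags


# Hypothetical rules: each field maps to (validity check, message on failure).
rules = {
    "age": (lambda v: 0 <= v <= 120, "out of expected range"),
    "country": (lambda v: v in {"BR", "US", "DE"}, "unknown country code"),
}

# Hypothetical records, for illustration only.
records = [
    {"age": 34, "country": "BR"},
    {"age": -5, "country": "US"},
    {"age": 51, "country": None},
    {"age": 40, "country": "XX"},
]

flags = check_records(records, rules)
for row_index, field, message in flags:
    print(f"row {row_index}: {field}: {message}")
```

The fixed country list also shows the drawback mentioned above: if the business expands to a new country, valid records start being flagged until someone updates the rule.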
Google offers an ML problem framing guide for data scientists and business analysts who want to understand in more detail how this process works: https://developers.google.com/machine-learning/problem-framing/.
I also find the following blog post very useful for understanding the whole process: https://becominghuman.ai/data-science-simplified-principles-and-process-b06304d63308.