There is a significant difference between data science consultants and an in-house data science team at a company that depends on machine learning (ML) solutions. Consultants need a good understanding of both the business and the technology. In-house data scientists, by contrast, must follow the company’s established, well-documented process.
That well-documented process involves selecting, cleaning and generating representative samples. It also involves prototyping solutions and assessing them, training the approved prototype candidates at scale with large amounts of compute power and data, validating the trained model’s results on simulated scenarios using test data and, finally, deploying the best-performing model into production to produce results in real time or periodically. On top of that comes the continuous monitoring and operation of all deployed ML solutions.
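To make the steps above concrete, here is a minimal Python sketch of such a workflow on toy data. It is only an illustration under stated assumptions: every function name (`clean`, `draw_sample`, `fit_mean`, `fit_line`, `run_workflow`), the two candidate models, the 80/20 split and the mean-squared-error metric are hypothetical choices made for this example, not a description of any particular company's process.

```python
import random
from statistics import fmean


def clean(rows):
    """Cleaning step: drop records with a missing feature or target."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def draw_sample(rows, k, seed=0):
    """Sampling step: draw a small random sample for cheap prototyping."""
    return random.Random(seed).sample(rows, min(k, len(rows)))


def fit_mean(rows):
    """Candidate 1 (hypothetical baseline): always predict the mean target."""
    mean_y = fmean([y for _, y in rows])
    return lambda x: mean_y


def fit_line(rows):
    """Candidate 2 (hypothetical): least-squares line y = a * x + b."""
    mx = fmean([x for x, _ in rows])
    my = fmean([y for _, y in rows])
    var = sum((x - mx) ** 2 for x, _ in rows)
    a = sum((x - mx) * (y - my) for x, y in rows) / var if var else 0.0
    return lambda x: a * x + (my - a * mx)


def mse(model, rows):
    """Assessment metric: mean squared error of a model on some rows."""
    return fmean([(model(x) - y) ** 2 for x, y in rows])


def run_workflow(raw, sample_size=50, seed=0):
    data = clean(raw)
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    split = int(0.8 * len(shuffled))
    train, test = shuffled[:split], shuffled[split:]

    # Prototype on a small sample: fit on one half, assess on the other.
    proto = draw_sample(train, sample_size, seed)
    half = len(proto) // 2
    candidates = {"mean": fit_mean, "line": fit_line}
    scores = {
        name: mse(fit(proto[:half]), proto[half:])
        for name, fit in candidates.items()
    }
    best = min(scores, key=scores.get)

    # Train the approved candidate on all training data, then validate it
    # on held-out test data before it would be deployed.
    model = candidates[best](train)
    return best, model, mse(model, test)
```

Calling `run_workflow(raw)` on a list of `(x, y)` pairs returns the name of the chosen candidate, the trained model and its error on the held-out data. Deployment and monitoring are left out because they depend entirely on the target platform.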
At Lyft, for example, the mobile app sends several GPS coordinates to the platform. Internally, a data processing job calculates the exact location of the client. With this information, an algorithm then predicts the client’s destination based on the rider’s history. Following that, a payment method is suggested if there are multiple cards registered, depending on whether the trip is business or personal. The platform can also run ML models to identify the best driver to dispatch and to estimate the ETA for the driver to reach the client’s location. If a shared ride is selected, it can pick the best passenger matches that would not delay the ride.
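The first few steps of such a chain can be sketched in Python as follows. This is a deliberately naive illustration, not how Lyft implements anything: the function names, the averaging of GPS fixes, the "most frequent past destination" rule and the card lookup by trip type are all assumptions made up for this example.

```python
from collections import Counter
from statistics import fmean


def estimate_location(gps_fixes):
    """Assumed approach: average several noisy (lat, lon) fixes."""
    lats = [lat for lat, _ in gps_fixes]
    lons = [lon for _, lon in gps_fixes]
    return (fmean(lats), fmean(lons))


def predict_destination(history):
    """Assumed approach: the rider's most frequent past destination."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]


def suggest_payment(cards, trip_type):
    """Pick a card by trip type; `cards` maps trip type to a card label."""
    if not cards:
        return None
    if len(cards) == 1:
        # Only one card registered, so there is nothing to choose between.
        return next(iter(cards.values()))
    return cards.get(trip_type)


def handle_request(gps_fixes, history, cards, trip_type):
    """Chain the steps: each one uses only what earlier steps produced."""
    return {
        "pickup": estimate_location(gps_fixes),
        "destination": predict_destination(history),
        "payment": suggest_payment(cards, trip_type),
    }
```

Driver dispatch, ETA estimation and shared-ride matching would be further stages that consume this output. They are omitted here because the text gives no detail on how they work.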
However, a data science consulting team needs to follow a different process, one that leaves the client with a documented and established process at the end of the project, depending on the project requirements. Our process has three stages: “design and experiment,” “build pipelines,” and “productionalization.” The consulting team members not only need to identify the best features, build model prototypes, assess the prototypes, and train and serve the model at scale, but they also need to implement the whole infrastructure and the tools that are going to be used and maintained by the client.
At Pythian, for instance, the enterprise data science team is responsible for architecting and implementing solutions. The team makes design decisions and establishes the methods to be followed in order to achieve the business results. Last but not least, the team shares Pythian’s experience and trains the client team, collaborating during the development phase and operating the delivered solution according to the best practices in the market. This builds confidence between the team and the client for engaging in future opportunities.