This post demonstrates how you can build complex machine learning Workflows on Onepanel and use them from CVAT to create end-to-end machine learning pipelines. As an example, we will use ensemble inference: we will run multiple object detection models to generate predictions, and then apply ensemble strategies to produce the final output.
In the previous blog post, we saw how we can add new models to train directly from CVAT. Once a model has been trained, you will also want to use it to run inference. By default, CVAT supports inference on models supported by the TensorFlow Object Detection API and Mask RCNN. Currently, inference runs inside CVAT itself; this will change to serverless functions in the near future, making it easier to add support for new models. For now, however, you can create a Workflow that runs inference and outputs a CVAT XML (or any other file) that you can upload to CVAT. If you prefer to have only one Workflow, you can also update the training Workflow to accept a flag that switches between training and inference. In this blog post, we will see how to create a Workflow that runs inference with two different models on different machines. To further demonstrate how you can add post-processing blocks, we will also see how to combine the output of multiple object detection models.
The flowchart below illustrates the Workflow we will be creating.
Let's first briefly review how to combine output from multiple object detection models. We will use the technique described in this paper. The method takes a list (`L`) of lists, where each sublist is a list of bounding boxes detected in a given image. The first step in the ensembling process is to flatten `L` so that it becomes a single list of detections rather than a list of lists.
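That first step is a one-liner; here is a small sketch (the function name is our own illustration, not from the paper's code):

```python
def flatten(detections_per_model):
    """Collapse a list of per-model detection lists into one flat list."""
    return [box for boxes in detections_per_model for box in boxes]
```

For example, `flatten([[b1], [b2, b3]])` yields `[b1, b2, b3]`.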
Next, the overlapping boxes are grouped together using intersection over union (IoU). Below is the formula for computing IoU.
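For reference, IoU has the standard definition:

```latex
\mathrm{IoU}(A, B) = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
```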
For a pair of bounding boxes, IoU indicates how much of their area overlaps. After this operation, we have a list of lists where each sublist contains the detected objects surrounding a particular region. The ensemble algorithm uses the length of each of these sublists to determine whether that region contains an object. The final decision can be made using one of these three voting strategies:
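The paper refers to these as affirmative (a single model's detection suffices), consensus (a majority of the models must agree), and unanimous (every model must agree). A minimal sketch of the vote, where the function name and group representation are our own illustration:

```python
def apply_voting(groups, num_models, strategy="consensus"):
    """Keep only the regions where enough models detected an object.

    groups: list of lists; each sublist holds the overlapping boxes
    (from different models) found around one region of the image.
    """
    required = {
        "affirmative": 1,                  # any single detection counts
        "consensus": num_models // 2 + 1,  # a majority must agree
        "unanimous": num_models,           # every model must agree
    }[strategy]
    return [group for group in groups if len(group) >= required]
```

With three models, a region detected by two of them survives `consensus` but not `unanimous`.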
The authors have also published their code on GitHub. It includes scripts to run inference using models such as YOLO, SSD, Mask RCNN, Faster RCNN, and RetinaNet.
The first thing we need to do is make sure this code can be used in Onepanel Workflows. Normally, we recommend having a script in your repository that takes all the required inputs from the user (usually with something like `argparse`) and performs certain actions based on those inputs. We will do something similar here by creating a script, `run.py`, which takes inputs from the user and performs the appropriate actions.
This script will have one function which performs inference or ensembling based on user input.
We will also have an argument parser like this:
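A sketch of what that parser might look like; the flag names and defaults here are illustrative, not the exact ones from the repository:

```python
import argparse

def build_parser():
    """Build the CLI for run.py: pick a model to run, or ensemble outputs."""
    parser = argparse.ArgumentParser(description="Run inference or ensembling.")
    parser.add_argument("--option", default="ensemble",
                        choices=["yolo", "retinanet", "ensemble"],
                        help="which model to run, or 'ensemble' to combine outputs")
    parser.add_argument("--dataset", default="/mnt/data/datasets",
                        help="directory containing input images")
    parser.add_argument("--output", default="/mnt/output",
                        help="directory where results are written")
    return parser
```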
To ensure we have all the dependencies installed, we will create a `requirements.txt` file as follows.
Note: The best way to get a list of dependencies with specific versions is to run this code in a virtual environment and then use `pip freeze` to capture them.
To make setup smoother, we will create a `setup.sh` file which installs the other dependencies.
Finally, in order to seamlessly integrate this workflow into Onepanel, we should use input and output artifacts. In simpler terms, we will attach an S3 directory to pull input data from, and will write output to an S3 directory as well. This can be done while creating a Workflow; we just need to make sure our code reads from and writes to the correct locations.
We will mount input data at `/mnt/data/datasets`, and output data written to `/mnt/output` will be saved to S3 (or GCS). Therefore, we need to update our code to read data from `/mnt/data/datasets` and write data to `/mnt/output`. Thankfully, the script accepts an input folder as an argument, so we can just pass `/mnt/data/datasets` as the input. For output, we will move all the files to `/mnt/output`. You can handle this any way you like; we just need to ensure the output files end up in `/mnt/output`.
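For example, a small helper along these lines could copy the results into place (the function name is ours; the source directory is wherever your inference step wrote its results):

```python
import shutil
from pathlib import Path

def collect_outputs(src_dir, dst_dir="/mnt/output"):
    """Copy every file the inference step produced into the output
    artifact directory so Onepanel uploads it to S3 (or GCS)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in Path(src_dir).iterdir():
        if f.is_file():
            shutil.copy2(f, dst / f.name)
```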
One last change we need to make is to convert the output XML file into a CVAT-compatible XML file. We already have a script that does this, but it requires a dictionary as input, so we will need to add the following function:
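As a hedged sketch of that function: assuming the per-image outputs are Pascal VOC-style XML files (the field names below are assumptions based on that format, not taken from the repository), it could walk the output directory and build a dictionary mapping image names to their detected boxes:

```python
import os
import xml.etree.ElementTree as ET

def build_detections_dict(input_path):
    """Collect per-image VOC-style XML outputs into one dictionary
    suitable for the CVAT-conversion script."""
    detections = {}
    for name in sorted(os.listdir(input_path)):
        if not name.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(input_path, name)).getroot()
        boxes = []
        for obj in root.iter("object"):
            bnd = obj.find("bndbox")
            boxes.append({
                "label": obj.findtext("name"),
                "xmin": float(bnd.findtext("xmin")),
                "ymin": float(bnd.findtext("ymin")),
                "xmax": float(bnd.findtext("xmax")),
                "ymax": float(bnd.findtext("ymax")),
            })
        detections[name.replace(".xml", "")] = boxes
    return detections
```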
Here, we take the input path as an argument and generate a final XML file that contains the output for all images. We also use the exported data to get a list of labels. The following is a function to get that list.
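Since we will dump annotations from CVAT in MS COCO format, one way to sketch this is to read the `categories` array from the exported COCO JSON (the exact file path will depend on your export):

```python
import json

def get_labels(annotations_path):
    """Read label names from a COCO-format annotation file."""
    with open(annotations_path) as f:
        coco = json.load(f)
    return [cat["name"] for cat in coco.get("categories", [])]
```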
Now, our code is good to go for automation in Workflows.
It is usually a good idea to start with a base template and then make the required changes to achieve our goal. For this template, our goal is to run inference using two different models on two different nodes. We can use the following template as our starting point:
At first, it might look intimidating, but the graph is easier to read. You can see it by clicking on Show Graph Preview while creating a Workflow. Here is what it looks like:
As you can see, two models are trained, followed by a post-processing node. However, our pipeline is simpler than this: we just want to run inference on two different nodes and then use the output from both as input for the ensemble step. Here is what our pipeline looks like:
Now, let's see how we can further update this template to create our Workflow.
We will start by updating the parameters; these are the input parameters we will take from the user. In this case, we will take the ensemble option, dataset path, and output path as parameters. Let's add these to the top of the template:
Here, we have two special parameters (denoted by the `cvat-` prefix) that are automatically populated by CVAT based on where it dumped the annotation data.
Now, we will remove the unnecessary tasks, such as the post-processing ones, and rename the others to match our needs.
Next, we will update the containers for each of the tasks except `process-input-data`. You can have `process-input-data` perform certain actions, but we will leave it as-is for now.
First, let's update `predict-yolo-model`:
Here, we updated the command that executes the script, the Docker image, and the artifacts.
Now, let's update `predict-retinanet-model` and `ensemble`. All three are very similar:
Note that each of these containers can run on a different machine. The example below runs the container on a K80 GPU (Standard_NC6 on Azure).
Here is what our final template looks like:
Now that our template is ready, let's add a label `used-by` with the value `cvat` so that we can use it from any CVAT Workspace.
In CVAT, click on Execute training Workflow for a specific task and select the newly created Workflow. One important thing here: select MS COCO as the dump format, since we used this format for our code changes above.