Combining multiple deep learning models for production-scale license plate processing pipelines

This post demonstrates a Onepanel use case: license plate detection and OCR.

As deep learning models become more accurate across a myriad of tasks in computer vision and natural language processing, more and more companies are adopting them for various business use cases. Since most of these models are compute-intensive, the systems around them become increasingly difficult to manage as usage grows. That is why there is a growing demand for platforms that not only help create end-to-end pipelines but also scale as compute requirements increase. In this post, we will use license plate detection and license number recognition (OCR) as an example to demonstrate how Onepanel can streamline the process of building deep learning pipelines on Kubernetes that are easily scalable and portable.

We will build a license plate detector and then apply optical character recognition to read the number from the detected plate. We will start by collecting the data required to train both models, annotating the samples, training the models, and finally creating Onepanel Workflows. These Workflows can be run on any cloud provider or on a local machine, with auto-scaling. Since Onepanel runs on Kubernetes, it is easy to scale your Workspaces or Workflows, or change the machine backing a Workspace (i.e. JupyterLab) at any time without losing your data.


License Plate Detection

For license plate detection, we will be using a mixture of data from Kaggle and other sources such as this one. The following steps demonstrate how to train object detection models on Onepanel without writing any code.

1. Launch CVAT workspace

First of all, we need to launch a CVAT workspace where we will annotate our images; we can also train a model directly from CVAT. Go to the WORKSPACES page, click CREATE WORKSPACE, and select CVAT. For more information on the various parameters, see this guide.


2. Create a CVAT task and annotate frames

Now that the CVAT workspace is up and running, create a task in CVAT with the license plate images. There will be only one label: license. In the demo environment, the data can be found inside the raw-input/license-plates directory.


3. Train model from CVAT

Once the annotation is done, you can train an object detection (or semantic segmentation) model from CVAT. Click Execute training Workflow on a CVAT task and select the TF Object Detection Training Workflow. Here, you can change some parameters or just use the defaults. For this demo, a Faster RCNN ResNet-101 model was trained on a K80 GPU.


4. Pre-annotation on CVAT

If you want to annotate more images, you can use this model to pre-annotate them by clicking Automatic annotation. Note that you need to upload the model first.


Since we will eventually be creating a Workflow, we will need a Python script that can run inference on images. When you run the TF Object Detection Training Workflow on Onepanel, it also exports a frozen graph for inference. We can write a script that runs inference as follows.


# Grab the input and output tensors from the frozen detection graph.
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

# Run the detector on a single image (image_np_expanded has shape [1, H, W, 3]).
(boxes, scores, classes, num_detections) = sess.run(
    [boxes, scores, classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})

# Keep only boxes with confidence >= 0.5 and convert them to pixel coordinates.
result = []
for i in range(len(classes[0])):
    if scores[0][i] >= 0.5:
        xmin, ymin, xmax, ymax = _normalize_box(boxes[0][i], width, height)
        label = CLASSES[int(classes[0][i])]  # class ids come back as floats
        result.append({'label': label, 'xmin': xmin, 'ymin': ymin, 'xmax': xmax, 'ymax': ymax})

output['annotations'][imagePath] = str(result)


We only consider bounding boxes with a confidence score of at least 0.5. You can find the complete inference script here.


Attention-OCR


Now that we have a model that can detect license plates, we will train a model that can read the text from those detected plates. We will use a popular model called Attention-OCR for this task. TensorFlow has a nice implementation of Attention-OCR here, which we will use for this demo. The following steps demonstrate how to train this model on Onepanel.


1. Data annotation

This implementation requires an image and a corresponding text file containing the ground-truth string, so these samples cannot be annotated in CVAT. However, you can use a VS Code or JupyterLab workspace to annotate them. The images were cropped from the same dataset we used for license plate detection. We tried two different annotation approaches: in one, we included all the text present on the license plate, such as the state; in the other, we only considered the license plate number. An annotated dataset can be found in the Onepanel demo environment in the annotation-dump/ directory.
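For instance, each sample is simply a pair of files (the names and plate string below are made up for illustration):

crops/
    car_001_0.jpg   # cropped license plate image
    car_001_0.txt   # contains the plate string, e.g. "ABC1234"
    car_002_0.jpg
    car_002_0.txt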


Also, since we only need the license plate region in this case, the original images need to be cropped. You can dump the annotations from CVAT and use them for cropping as follows.


import os
import xml.etree.ElementTree as ET

from PIL import Image


def get_crops(args):
    # Parse the CVAT XML dump and crop every annotated box out of its source image.
    tree = ET.parse(args.xml_file)
    root = tree.getroot()

    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)

    for image in root.iter('image'):
        img = Image.open(os.path.join(args.image_dir, image.attrib['name']))
        for idx, box in enumerate(image.findall('box')):
            cropped = img.crop((float(box.attrib['xtl']), float(box.attrib['ytl']),
                                float(box.attrib['xbr']), float(box.attrib['ybr'])))
            name, ext = os.path.splitext(image.attrib['name'])
            cropped.save(os.path.join(args.output_dir, name + '_' + str(idx) + ext))
            # Create an empty .txt file to be filled in later with the plate text.
            open(os.path.join(args.output_dir, name + '_' + str(idx) + '.txt'), 'w').close()
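The function above expects an args object with xml_file, image_dir, and output_dir attributes; a minimal command-line wrapper for it could look like this (the argument names simply mirror the attributes used above):

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Crop annotated boxes out of CVAT-dumped images.')
    parser.add_argument('--xml_file', required=True, help='CVAT XML annotation dump')
    parser.add_argument('--image_dir', required=True, help='directory with the original images')
    parser.add_argument('--output_dir', required=True, help='where to write crops and empty .txt labels')
    get_crops(parser.parse_args())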


2. Generate TFRecords

Like many other TensorFlow implementations, this one requires the input data to be in TFRecord format. Note that the text label needs to have a fixed length, so we pad it with null characters if it is shorter than the maximum length. You can use a JupyterLab workspace to perform such operations. The Python script used to generate the TFRecords can be found here.
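The linked script has the full details; as a rough sketch of the idea, each cropped plate and its text can be serialized along these lines. The feature keys below follow the FSNS-style layout that the Attention-OCR data provider expects, and charset is assumed to be a dict mapping characters to integer ids; cross-check both against the linked generate_tfrecords.py.

import tensorflow as tf

def encode_utf8_string(text, charset, length, null_char_id):
    # Map the plate string to character ids and pad with the null id up to a fixed length.
    char_ids_unpadded = [charset[c] for c in text]
    char_ids_padded = char_ids_unpadded + [null_char_id] * (length - len(char_ids_unpadded))
    return char_ids_padded, char_ids_unpadded

def make_example(png_bytes, width, text, charset, length=14, null_char_id=0):
    # Build a tf.train.Example with FSNS-style keys.
    padded, unpadded = encode_utf8_string(text, charset, length, null_char_id)
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[png_bytes])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'PNG'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/orig_width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/class': tf.train.Feature(int64_list=tf.train.Int64List(value=padded)),
        'image/unpadded_class': tf.train.Feature(int64_list=tf.train.Int64List(value=unpadded)),
        'image/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[text.encode('utf-8')])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Usage (assumed path): write every example into the file the training config points at.
# with tf.python_io.TFRecordWriter('/mnt/data/datasets/tfexample_train') as writer:
#     writer.write(make_example(png_bytes, width, 'ABC1234', charset).SerializeToString())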


3. Model training

The implementation used here requires a config file like this one for each dataset we want to train on. Once this is in place, model training can be started with the train.py script, fine-tuning from the provided pre-trained checkpoint.
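For reference, a rough sketch of what such a dataset config (custom.py) might look like is shown below. It mirrors the structure of the bundled FSNS dataset definition (datasets/fsns.py) and delegates to it; every concrete value here is a placeholder, so refer to the repository's custom.py for the real settings.

import datasets.fsns as fsns

DEFAULT_DATASET_DIR = '/mnt/data/datasets'

# All numbers below are placeholders for this demo; adjust them to your own data.
DEFAULT_CONFIG = {
    'name': 'custom',
    'splits': {
        'train': {'size': 500, 'pattern': 'tfexample_train'},  # example count / file pattern
    },
    'charset_filename': 'charset_size.txt',   # copied next to the TFRecords by the Workflow
    'image_shape': (50, 200, 3),              # height, width, channels of the cropped plates
    'num_of_views': 1,                        # one view per image (FSNS itself uses 4)
    'max_sequence_length': 14,                # matches the --text_length used when padding
    'null_code': 42,                          # id of the null character in charset_size.txt
    'items_to_descriptions': {
        'image': 'A cropped license plate image.',
        'label': 'Characters of the license plate.',
    },
}

def get_split(split_name, dataset_dir=None, config=None):
    if not dataset_dir:
        dataset_dir = DEFAULT_DATASET_DIR
    if not config:
        config = DEFAULT_CONFIG
    return fsns.get_split(split_name, dataset_dir, config)

Training is then started as follows.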

python train.py \
    --dataset_name=custom \
    --batch_size=1 \
    --train_log_dir=/mnt/output/ \
    --checkpoint=model.ckpt-399731 \
    --max_number_of_steps=110000


4. Model inference

The TensorFlow implementation comes with a demo inference script that we can use to run inference on test images. It requires the dataset config file (custom.py). The script needs to be modified as shown below to run inference on each cropped plate and write the output to a file.


def run(checkpoint, batch_size, dataset_name, image_path_pattern, annotations):
  images_placeholder, endpoints = create_model(batch_size,
                                               dataset_name)
  session_creator = monitored_session.ChiefSessionCreator(
      checkpoint_filename_with_path=checkpoint)
  width, height = get_dataset_image_size(dataset_name)
  with monitored_session.MonitoredSession(
          session_creator=session_creator) as sess:
    for path, boxes in annotations.items():
        print("Processing: ", path)
        img = cv2.imread(os.path.join('/mnt/data/datasets/images', os.path.basename(path)))
        for box in boxes:
            # Crop the plate region; numpy arrays are indexed [y, x].
            crop = img[box['ymin']:box['ymax']+1, box['xmin']:box['xmax']+1]
            # OpenCV loads images as BGR; convert to RGB before building the PIL image.
            pil_img = PIL.Image.fromarray(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
            resized = pil_img.resize((width, height), PIL.Image.ANTIALIAS)
            predictions = sess.run(endpoints.predicted_text,
                           feed_dict={images_placeholder: np.asarray(resized)[np.newaxis, ...]})
            # One output file per source image (the last box wins if there are several).
            with open('/mnt/output/'+os.path.basename(path).split('.')[0]+'.txt', 'w') as file_writer:
                file_writer.write([pr_bytes.decode('utf-8') for pr_bytes in predictions.tolist()][0])


The complete script can be found here.


5. Creating Onepanel Workflow

The Workflow for object detection training ships with Onepanel CE by default, but since we are working with a custom model, we need to create a Workflow for it so that it is easily reproducible and scalable. If you are new to Onepanel, it might be helpful to take a look at this and this guide to better understand the concept of Workflows.


Since Onepanel does not include a training Workflow for this model by default, we need to create a new one. We can use the TF Object Detection Training Workflow as a base template: we already have everything in place, we just need to update the parameters and the commands we execute.

arguments:
  parameters:
  - name: source
    value: https://github.com/tensorflow/models.git
    displayName: Model source code
    type: hidden
    visibility: private

  - name: trainingsource
    value: https://github.com/onepanelio/LicensePlateOcr.git
    type: hidden
    visibility: private
    
  - name: batch-size
    value: 1
    visibility: internal
    
  - name: max-steps
    value: 11000
    visibility: internal
    
  - name: cvat-annotation-path
    value: annotation-dump/license-plate-shorter-new-15
    displayName: Dataset path
    hint: Path to annotated data in default object storage (i.e. S3). In CVAT, this parameter will be pre-populated.
    visibility: internal

  - name: cvat-output-path
    value: workflow-data/output/license-plate-ocr-output1
    hint: Path to store output artifacts in default object storage (i.e. S3). In CVAT, this parameter will be pre-populated.
    displayName: Workflow output path
    visibility: internal
    

  - name: tf-image
    value: tensorflow/tensorflow:1.13.1-py3
    type: select.select
    displayName: Select TensorFlow image
    visibility: public
    hint: Select the GPU image if you are running on a GPU node pool
    options:
    - name: 'TensorFlow 1.13.1 CPU Image'
      value: 'tensorflow/tensorflow:1.13.1-py3'
    - name: 'TensorFlow 1.13.1 GPU Image'
      value: 'tensorflow/tensorflow:1.13.1-gpu-py3'
    

  - displayName: Node pool
    hint: Name of node pool or group to run this workflow task
    type: select.select
    name: sys-node-pool
    value: Standard_D4s_v3
    visibility: public
    required: true
    options:
    - name: 'CPU: 2, RAM: 8GB'
      value: Standard_D2s_v3
    - name: 'CPU: 4, RAM: 16GB'
      value: Standard_D4s_v3
    - name: 'GPU: 1xK80, CPU: 6, RAM: 56GB'
      value: Standard_NC6
    - name: 'GPU: 1xV100'
      value: Standard_NC6s_v3
      
  - name: dump-format
    value: cvat_tfrecord
    visibility: public
    
entrypoint: main
templates:
- dag:
    tasks:
    - name: train-model
      template: tensorflow

  name: main
- container:
    args:
    - |
      apt-get update && \
      apt-get install -y python3-pip git wget unzip libglib2.0-0 libsm6 libxext6 libxrender-dev && \
      
      cd /mnt/src/tf/research && \
      export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim && \
      cd /mnt/src/train && \
      pip install -r requirements.txt && \
      cp -f custom.py /mnt/src/tf/research/attention_ocr/python/datasets/ && \
      cp -f __init__.py /mnt/src/tf/research/attention_ocr/python/datasets/ && \
      cp -f ./data/charset_size.txt /mnt/data/datasets/ && \
      python generate_tfrecords.py \
        --charset_path=/mnt/src/train/data/charset_size.txt \
        --data_dir=/mnt/data/datasets/ \
        --output_tfrecord=/mnt/data/datasets/tfexample_train  \
        --text_length=14 && \
      
      cd /mnt/src/tf/research/attention_ocr/python/ && \
      wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz && \
      tar xf attention_ocr_2017_08_09.tar.gz && \
      export PYTHONPATH=$PYTHONPATH:./datasets/ && \
      python train.py \
        --dataset_name=custom \
        --batch_size={{workflow.parameters.batch-size}} \
        --train_log_dir=/mnt/output/ \
        --checkpoint=model.ckpt-399731 \
        --max_number_of_steps={{workflow.parameters.max-steps}}
    command:
    - sh
    - -c
    image: '{{workflow.parameters.tf-image}}'
    volumeMounts:
    - mountPath: /mnt/data
      name: data
    - mountPath: /mnt/output
      name: output
    workingDir: /mnt/src
  nodeSelector:
    beta.kubernetes.io/instance-type: '{{workflow.parameters.sys-node-pool}}'
  inputs:
    artifacts:
    - name: data
      path: /mnt/data/datasets/
      s3:
        key: '{{workflow.namespace}}/{{workflow.parameters.cvat-annotation-path}}'
   
    - git:
        repo: '{{workflow.parameters.source}}'
      name: src
      path: /mnt/src/tf
    - git:
        repo: '{{workflow.parameters.trainingsource}}'
      name: tsrc
      path: /mnt/src/train
  name: tensorflow
  outputs:
    artifacts:
    - name: model
      optional: true
      path: /mnt/output
      s3:
        key: '{{workflow.namespace}}/{{workflow.parameters.cvat-output-path}}/{{workflow.name}}'

volumeClaimTemplates:
- metadata:
    creationTimestamp: null
    name: data
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 200Gi
- metadata:
    creationTimestamp: null
    name: output
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 200Gi


We added some parameters that we take from the user. The dump-format parameter is used when the Workflow is executed from CVAT, since CVAT needs to know which annotation format to dump; here it is set to cvat_tfrecord.



Inference Workflow


Now that both models are ready, we can create a final Workflow that takes images as input and generates the OCR output. Unlike the previous Workflows, this one has multiple steps.


We will first define a container that detects license plates and writes the detections to a JSON file. This is done by the inference script referenced earlier.
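Based on the detection snippet earlier (which stores str(result) per image), the JSON is roughly a map from image path to a stringified list of boxes, so the OCR step can read it back along these lines. This is a sketch, not the exact code from the linked scripts; the file path comes from the --license_boxes_json_path argument used in the OCR step below.

import ast
import json

# Load the detector output written by license_detection.py.
with open('/mnt/data/outputdata/output.json') as f:
    output = json.load(f)

annotations = {}
for image_path, boxes_str in output['annotations'].items():
    # Each value was written with str(list_of_dicts), so turn it back into Python objects.
    annotations[image_path] = ast.literal_eval(boxes_str)

# annotations: {image_path: [{'label': ..., 'xmin': ..., 'ymin': ..., 'xmax': ..., 'ymax': ...}, ...]}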


Here is what the DAG step for this container looks like.

- name: license-detector
    inputs:
      artifacts:
      - name: src
        path: /mnt/src
        git:
          repo: "https://github.com/onepanelio/LicensePlateOcr.git"
      - name: data
        path: /mnt/data/datasets/
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.cvat-annotation-path}}'
      - name: models
        path: /mnt/data/models
        s3:
          key: '{{workflow.parameters.detector-path}}'
    outputs:
      artifacts:
      - name: model
        path: /mnt/output
        optional: true
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.cvat-output-path}}/{{workflow.name}}'
    container:
      image: '{{workflow.parameters.tf-image}}'
      command: [sh,-c]
      args:
       - |
        apt update \
        && apt install libgl1-mesa-glx ffmpeg libsm6 libxext6 libglib2.0-0 libxrender-dev wget unzip -y \
        && cd /mnt/src/ \
        && pip install -r requirements.txt \
        && python license_detection.py --weights=/mnt/data/models/frozen_inference_graph.pb --dataset=/mnt/data/datasets/images/
      workingDir: /mnt/src
      volumeMounts:
      - name: output
        mountPath: /mnt/output


This is followed by the OCR step, defined as follows.

- name: ocr-detector
    inputs:
      artifacts:
      - name: tsrc
        path: /mnt/src/train
        git:
          repo: 'https://github.com/onepanelio/LicensePlateOcr.git'
      - git:
          repo: https://github.com/tensorflow/models.git
        name: src
        path: /mnt/src/tf
      - name: data
        path: /mnt/data/datasets/
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.cvat-annotation-path}}'
      - name: ocr-model
        path: /mnt/data/models/
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.ocr-model-path}}'
      - name: output-data
        path: /mnt/data/outputdata/
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.cvat-output-path}}/{{workflow.name}}'
    outputs:
      artifacts:
      - name: model
        path: /mnt/output
        optional: true
        s3:
          key: '{{workflow.namespace}}/{{workflow.parameters.cvat-output-path}}/{{workflow.name}}'
    container:
      image: '{{workflow.parameters.tf-image}}'
      command: [sh,-c]
      args:
       - |
          apt-get update && \
          apt-get install -y python3-pip git wget unzip libglib2.0-0 libsm6 libxext6 libxrender-dev && \

          cd /mnt/src/tf/research && \
          export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim && \
          cd /mnt/src/train && \
          pip install -r requirements.txt && \
          cp -f custom.py /mnt/src/tf/research/attention_ocr/python/datasets/ && \
          cp -f __init__.py /mnt/src/tf/research/attention_ocr/python/datasets/ && \
          cp -f demo_inference.py /mnt/src/tf/research/attention_ocr/python/ && \
          cp -f ./data/charset_size.txt /mnt/data/datasets/ && \
         
          cd /mnt/src/tf/research/attention_ocr/python/ && \
          export PYTHONPATH=$PYTHONPATH:./datasets/ && \
          python demo_inference.py \
            --dataset_name=custom \
            --checkpoint=/mnt/data/models/ \
            --batch_size=1 \
            --license_boxes_json_path=/mnt/data/outputdata/output.json
            
      workingDir: /mnt/src

Note that this container also takes the original data as an input, since it crops the license plates using the coordinates produced by the previous model. Once you run this Workflow, the output is saved under /mnt/output/, which you can view directly from the Onepanel Workflow page by clicking the detect-ocr node and then Artifacts. Also note that the final Workflow has one more node, preprocess-input-data, which we are not using in this demo; if you want to add any pre-processing, feel free to modify it.

Live Demo

In this post, you saw how you can leverage various components of Onepanel to easily build end-to-end deep learning pipelines that are portable and scalable.

You can also try out this Workflow and other components such as CVAT on Onepanel. If you want to try this out live, please fill out this form so that we can send you a login token. Onepanel CE is open source and can be found on GitHub. If you have any questions, feel free to reach out at info@onepanel.io or join our Slack channel.

