Data Format
MSCOCO
This style is used by the youtubevos dataset and thus by the MaskTrackRCNN repo. Details about youtubevos can be found in the other resources section. The labels are in JSON format, and this is what they look like:
{
"info" : info,
"videos" : [video],
"annotations" : [annotation],
"categories" : [category],
}
video{
"id" : int,
"width" : int,
"height" : int,
"length" : int,
"file_names" : [file_name],
}
annotation{
"id" : int,
"video_id" : int,
"category_id" : int,
"segmentations" : [RLE or [polygon] or None],
"areas" : [float or None],
"bboxes" : [[x,y,width,height] or None],
"iscrowd" : 0 or 1,
}
category{
"id" : int,
"name" : str,
"supercategory" : str,
}
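For illustration, here is a minimal made-up example that follows this structure: one toy video with two frames and a single "person" instance that appears only in the first frame (all values are invented).
example = {
    "info": {"description": "toy example"},
    "videos": [{
        "id": 1, "width": 1242, "height": 375, "length": 2,
        "file_names": ["0001_000000.png", "0001_000001.png"],
    }],
    "annotations": [{
        "id": 1, "video_id": 1, "category_id": 1,
        # frame 0 has a mask/box/area, frame 1 has None (instance absent)
        "segmentations": [{"size": [375, 1242], "counts": "<RLE string>"}, None],
        "areas": [4809.0, None],
        "bboxes": [[1106, 176, 93, 142], None],
        "iscrowd": 0,
    }],
    "categories": [{"id": 1, "name": "person", "supercategory": "object"}],
}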
There are some important notes about this data format:
[1] The category id must start from 1.
In /mmdet/datasets/kitti.py, the category ids are loaded through load_annotations. The i returned by enumerate starts from 0, so cat2label maps each cat_id to i + 1, which starts from 1. The related code, including getCatIds and __init__ from ytvos.py, is shown below:
def load_annotations(self, ann_file):
self.ytvos = YTVOS(ann_file)
self.cat_ids = self.ytvos.getCatIds()
self.cat2label = {
cat_id: i + 1
for i, cat_id in enumerate(self.cat_ids)
}
def getCatIds(self, catNms=[], supNms=[], catIds=[]):
"""
filtering parameters. default skips that filter.
:param catNms (str array) : get cats for given cat names
:param supNms (str array) : get cats for given supercategory names
:param catIds (int array) : get cats for given cat ids
:return: ids (int array) : integer array of cat ids
"""
catNms = catNms if _isArrayLike(catNms) else [catNms]
supNms = supNms if _isArrayLike(supNms) else [supNms]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(catNms) == len(supNms) == len(catIds) == 0:
cats = self.dataset['categories']
else:
cats = self.dataset['categories']
cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
ids = [cat['id'] for cat in cats]
return ids
def __init__(self, annotation_file=None):
...
dataset = json.load(open(annotation_file, 'r'))
self.dataset = dataset
...
[2] For each annotation, len(bboxes) = len(segmentations) = len(areas) = the number of frames in that sequence. So if an instance does not appear in frame i, then bboxes[i], segmentations[i], and areas[i] for that frame are None.
[3] The annotation id corresponds to what other datasets call the global instance id. This means one instance id cannot appear in different sequences/videos.
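A quick way to verify notes [1]-[3] on a converted label file is to load it and assert the invariants directly. Below is a minimal sketch (not part of the repo), assuming the file name produced by the split script later on this page:
import json

def check_labels(path):
    with open(path) as f:
        data = json.load(f)
    videos = {v['id']: v for v in data['videos']}
    # [1] category ids start from 1
    assert min(c['id'] for c in data['categories']) == 1
    instance_to_video = {}
    for ann in data['annotations']:
        video = videos[ann['video_id']]
        # [2] one entry per frame; None where the instance does not appear
        assert len(ann['bboxes']) == len(ann['segmentations']) \
            == len(ann['areas']) == video['length']
        # [3] each instance id belongs to exactly one video
        assert instance_to_video.setdefault(ann['id'], ann['video_id']) == ann['video_id']

check_labels('instances_train_sub.json')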
Kitti MOTS
Kitti MOTS is built on top of the Kitti Tracking 2012 dataset by adding mask information. To understand its data format, we need to be familiar with both of them.
Each video is a sequence of image frames. Kitti provides a txt label file for each sequence, and each row in that txt file has the following format:
# Values Name Description
----------------------------------------------------------------------------
1 frame Frame within the sequence where the object appearers
1 track id Unique tracking id of this object within this sequence
1 type Describes the type of object: 'Car', 'Van', 'Truck',
'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
'Misc' or 'DontCare'
1 truncated Integer (0,1,2) indicating the level of truncation.
Note that this is in contrast to the object detection
benchmark where truncation is a float in [0,1].
1 occluded Integer (0,1,2,3) indicating occlusion state:
0 = fully visible, 1 = partly occluded
2 = largely occluded, 3 = unknown
1 alpha Observation angle of object, ranging [-pi..pi]
4 bbox 2D bounding box of object in the image (0-based index):
contains left, top, right, bottom pixel coordinates
3 dimensions 3D object dimensions: height, width, length (in meters)
3 location 3D object location x,y,z in camera coordinates (in meters)
1 rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi]
1 score Only for results: Float, indicating confidence in
detection, needed for p/r curves, higher is better.
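To make the table concrete, a single row can be parsed as follows (a sketch; parse_tracking_row and its field names are my own shorthand, and only the fields used later by the converter really matter):
def parse_tracking_row(line):
    fields = line.split()
    return {
        'frame': int(fields[0]),
        'track_id': int(fields[1]),
        'type': fields[2],          # 'Car', 'Pedestrian', 'DontCare', ...
        'truncated': int(fields[3]),
        'occluded': int(fields[4]),
        'alpha': float(fields[5]),
        'bbox': [float(v) for v in fields[6:10]],  # left, top, right, bottom
    }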
For Kitti MOTS, the mask labels are in either png or txt format. The txt format looks like:
# time_frame obj_id class_id img_height img_width rle
52 1005 1 375 1242 WSV:2d;1O10000O10000O1O100O100O1O100O1000000000000000O100O102N5K00O1O1N2O110OO2O001O1NTga3
# obj_id 10000 denotes an ignore region and 0 is background
# the class id and instance id are calculated from obj_id:
class_id = obj_id // 1000
obj_instance_id = obj_id % 1000
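Putting this together, one row of the txt label can be decoded with pycocotools like so (a sketch, assuming line holds a row like the sample above):
from pycocotools import mask as maskUtils

frame_id, obj_id, class_id, img_height, img_width, rle = line.strip().split(' ')
obj_id = int(obj_id)
assert int(class_id) == obj_id // 1000      # the class id is encoded in obj_id
instance_id = obj_id % 1000
if obj_id != 10000:                         # obj_id 10000 marks an ignore region
    mask = {'size': [int(img_height), int(img_width)], 'counts': rle.encode('utf-8')}
    binary_mask = maskUtils.decode(mask)    # uint8 array of shape (img_height, img_width)
    area = float(maskUtils.area(mask))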
Dataset Conversion
To use another dataset with MaskTrackRCNN, we first need to convert it to MSCOCO style and then decide how to split it into train/val/test sets. How to split depends on whether the dataset has an official split. Below are code samples that convert Kitti MOTS to MSCOCO style and split it into train/test sets by sequence.
linkAndRenameFiles.py: prepares data for mots2cocoVer2.py. It symlinks the images from the per-sequence folders into a single combined folder, prefixing the sequence name to each file name.
import os
cmd = 'ln -s %s/%s %s/%s_%s'
# path = '/home/liz220/Documents/dataset/MOTS/training/image_02'
# des = '/home/liz220/Documents/dataset/MOTS/training/image_combine'
path = '/home/liz220/Documents/dataset/MOTS/Annotations/training/instances'
des = '/home/liz220/Documents/dataset/MOTS/Annotations/training/instances_combine'
for root, dirs, files in os.walk(path):
dirs.sort()
for dirname in dirs:
current_path = os.path.join(path,dirname)
for subroot,_, subfiles in os.walk(current_path):
for filename in subfiles:
os.system(cmd %(current_path,filename,des,dirname,filename))
mots2cocoVer2.py: convert Kitti MOTS to MSCOCO style
import argparse
import json
import sys
import os
import cv2
import numpy as np
from tqdm import tqdm
from PIL import Image
import imagesize
import random
from pycocotools import mask as maskUtils
def parseNonemptyArgs():
parser = argparse.ArgumentParser(description='Convert dataset')
parser.add_argument(
'--outdir', help="output dir for outputVer2.json files", default=".", type=str)
parser.add_argument(
'--mask_label_dir', help="data dir for mask annotations to be converted",
default='/home/liz220/Documents/dataset/MOTS/Annotations/training/instances_txt', type=str)
parser.add_argument(
'--bbox_label_dir', help="data dir for ground truth bbox annotations to be converted",
default='/home/liz220/Documents/dataset/MOTS/Annotations/training/label_02', type=str)
parser.add_argument(
'--img_label_dir', help="data dir for original images in %%s_%%06d.png naming format",
default='/home/liz220/Documents/code/MaskTrackRCNN/data/MOTS/images/image_combine', type=str)
return parser.parse_args()
def main():
next_img_id = 0
filename2imageid = {}
next_instance_id = 0
local2globalinstanceid = {}
ann_dict = {}
bbox_dict = {}
images = []
annotations = []
source_label = {'Car': 1, 'Pedestrian': 2}
to_dest_label = {1:2, 2:1}
dest_label = {'person': 1, 'car': 2}
args = parseNonemptyArgs()
# build a filename-to-image-id map using all the png files, in case some frames have empty labels
for _, _, filenames in os.walk(args.img_label_dir):
filenames.sort()
for file_name in filenames:
image_id = next_img_id
filename2imageid[file_name] = next_img_id
next_img_id+=1
image = {}
image['id'] = image_id
image['width'], image['height']= list(map(int,imagesize.get(os.path.join(args.img_label_dir, file_name))))
image['file_name'] = file_name
image['seg_file_name'] = file_name
images.append(image)
for _, _, filenames in os.walk(args.mask_label_dir):
filenames.sort()
for filename in tqdm(filenames):
print('Processed %s' % filename)
sequence = filename[:-4]
mask_label_file_path = os.path.join(args.mask_label_dir, filename)
bboxs_label_file_path = os.path.join(args.bbox_label_dir, filename)
# load bbox labels
with open (bboxs_label_file_path,"r",encoding='utf-8') as bbox_source:
# frame track_id type truncated occluded alpha bbox(left top right bottom) dimensions location rotation_y
for line in bbox_source.readlines():
line = line.split(" ")
if(source_label.get(line[2],-1)==-1):
continue
frame_id, local_instance_id, category, left, top, right, bottom = int(line[0]), int(line[1]), to_dest_label[source_label[line[2]]], float(line[6]), float(line[7]), float(line[8]), float(line[9])
bbox_key = "%s_%04d_%04d"%(sequence,frame_id,local_instance_id)
bbox_dict[bbox_key] = [frame_id, local_instance_id, category, left, top, right, bottom]
# load mask labels
with open (mask_label_file_path,"r",encoding='utf-8') as mask_source:
# time_frame id class_id img_height img_width rle
for line in mask_source.readlines():
frame_id, obj_id, class_id, img_height, img_width, rle = line.strip("\n").split(" ")
assert(int(class_id) == int(obj_id) // 1000)
local_instance_id = int(obj_id) % 1000
ignored_region_id = 10000
if(int(obj_id) == ignored_region_id):
continue
file_name = "%s_%06d.png"%(sequence,int(frame_id))
image_id = filename2imageid.get(file_name, -1)
assert(image_id != -1)
instance_id_key = "%s_%04d"%(sequence,local_instance_id)
global_instance_id = local2globalinstanceid.get(instance_id_key,-1)
if(global_instance_id == -1):
global_instance_id = next_instance_id
local2globalinstanceid[instance_id_key] = next_instance_id
next_instance_id+=1
bbox_key = "%s_%04d_%04d"%(sequence,int(frame_id),local_instance_id)
bbox_info = bbox_dict.get(bbox_key, None)
if bbox_info is not None:
_frame_id, _local_instance_id, category, left, top, right, bottom = bbox_info
assert(_frame_id == int(frame_id) and _local_instance_id == local_instance_id and category == to_dest_label[int(class_id)])
mask = {'size': [int(img_height), int(img_width)], 'counts': rle.encode(encoding='UTF-8')}
bbox = xyxy_to_xywh([left, top, right, bottom])
area = float(maskUtils.area(mask))
ann = {}
use_polygon = False
if use_polygon:
ground_truth_binary_mask = maskUtils.decode(mask)
contours, _ = cv2.findContours(ground_truth_binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)
if contours == []:
print('Warning: empty contours.')
continue
new_contours = []
for x in range(np.shape(contours)[0]):
if np.size(contours[x]) < 6:
print('3 points required for polygon')
continue
new_contours.append( np.reshape(contours[x],(np.size(contours[x]))).tolist())
if new_contours == []:
continue
ann['segmentation'] = new_contours
else:
ann['segmentation'] = {'size': [int(img_height), int(img_width)], 'counts': rle}
# if random.random() < 0.01:
# displaybbox(maskUtils.decode(mask),[left, top, right, bottom],bbox_key,to_dest_label[int(class_id)])
ann['id'] = str(global_instance_id)
ann['image_id'] = image_id
ann['category_id'] = to_dest_label[int(class_id)]
ann['iscrowd'] = 1
ann['area'] = area # changed later
ann['bbox'] = bbox #xywh_box
annotations.append(ann)
ann_dict['images'] = images
categories = [{"id": dest_label[name], "name": name} for name in dest_label.keys()]
ann_dict['categories'] = categories
ann_dict['annotations'] = annotations
saveAnnotationAsJson(ann_dict,args.outdir)
def saveAnnotationAsJson(ann_dict, out_dir):
with open(os.path.join(out_dir, 'outputVer2.json'), 'w') as outfile:
outfile.write(json.dumps(ann_dict))
def xyxy_to_xywh(xyxy_box):
xmin, ymin, xmax, ymax = xyxy_box
TO_REMOVE = 1
xywh_box = (xmin, ymin, xmax - xmin + TO_REMOVE, ymax - ymin + TO_REMOVE)
return xywh_box
def displaybbox(mask, xyxy_box, name, category):
'''visualize the mask and bbox for testing purposes'''
height, width = mask.shape
mask*=255 # use the fact that rle bits are either 0 or 1
canvas = np.array(Image.fromarray(mask.astype(np.uint8)))
canvas = cv2.cvtColor(canvas, cv2.COLOR_GRAY2BGR)
x1,y1,x2,y2 = list(map(int,xyxy_box))
cv2.rectangle(canvas, (x1, y1), (x2, y2), 255 , 1)
cv2.putText(canvas, str(category), (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, 255, 1)
cv2.imwrite('%s.png'%(name),canvas)
if __name__ == '__main__':
main()
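After running the converter, it is worth loading the produced outputVer2.json and eyeballing a few counts (a quick sketch, not part of the script):
import json

with open('outputVer2.json') as f:
    coco = json.load(f)
print(len(coco['images']), 'images')
print(len(coco['annotations']), 'annotations')
print([c['name'] for c in coco['categories']])  # expected: ['person', 'car']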
We need to split the dataset into two sets: a train set and a validation set. If the dataset you are working on has an official split file, you need to follow it. For Kitti tracking, we separate sequences [0, 1, 3, 4, 5, 9, 11, 12, 15, 17, 19, 20] for training and [2, 6, 7, 8, 10, 13, 14, 16, 18] for testing.
cocosplit_modified.py: split the MSCOCO-style json file into train/val sets
import json
import argparse
import funcy
from tqdm import tqdm
import numpy as np
from sklearn.model_selection import train_test_split
parser = argparse.ArgumentParser(description='Splits COCO annotations file into training and test sets.')
parser.add_argument('--annotations', metavar='coco_annotations', type=str, default="../MOTS_v2_rle.json",
help='Path to COCO annotations file.')
parser.add_argument('--train', type=str, default="../instances_train_sub.json", help='Where to store COCO training annotations')
parser.add_argument('--test', type=str, default="../instances_val_sub.json", help='Where to store COCO test annotations')
parser.add_argument('--having-annotations', dest='having_annotations', action='store_true',
help='Ignore all images without annotations. Keep only those with at least one annotation')
parser.add_argument('--remove-car-label', dest='remove_car_label', action='store_true',
help='Ignore all car annotations. Keep only those that are pedestrians')
args = parser.parse_args()
instance_id_counter = 0
def getNewInstanceId():
global instance_id_counter
current_id = instance_id_counter
instance_id_counter += 1
return current_id
def save_coco(file, videos, annotations, categories):
with open(file, 'wt', encoding='UTF-8') as coco:
json.dump({ 'videos': videos, 'annotations': annotations, 'categories': categories}, coco, indent=2, sort_keys=True)
def extract_video_sequence(images, sequence, sequence_id):
imgs = funcy.lfilter(lambda a: int(a['file_name'][:4]) == sequence, images)
ids = funcy.lmap(lambda i: i["id"], imgs)
file_names = funcy.lmap(lambda i: i["file_name"], imgs)
file_names.sort()
ids.sort()
return ids, {"id": sequence_id, "width":imgs[0]["width"], "height":imgs[0]["height"], "length": len(imgs), "file_names": file_names }
def main(args):
with open(args.annotations, 'rt', encoding='UTF-8') as annotations:
coco = json.load(annotations)
images = coco['images']
annotations = coco['annotations']
categories = coco['categories']
person_id, car_id = 1, 2 # hard-coded
train_sequence_split = [0, 1, 4, 9, 11, 12, 13, 15, 17, 19]
test_sequence_split = [2, 7, 10, 14, 16]
# print(images[0])
# {'id': 0, 'height': 375, 'width': 1242, 'file_name': '0000_000000.png', 'seg_file_name': '0000_000000.png'}
# print(annotations[0])
# {'id': 0, 'image_id': 0, 'segmentation': [[1139,..., 176]] or {'size': [int(img_height), int(img_width)], 'counts': rle)}, 'category_id': 1, 'iscrowd': 0, 'area': 4809.0, 'bbox': [1106, 176, 93, 142]}
if args.remove_car_label:
annotations = funcy.lremove(lambda i: int(i['category_id']) == car_id, annotations)
if args.having_annotations:
images_with_annotations = funcy.lmap(lambda a: int(a['image_id']), annotations)
images = funcy.lremove(lambda i: i['id'] not in images_with_annotations, images)
train_video = []
train_ann = []
for sequence_id, sequence in enumerate(tqdm(train_sequence_split),start=1):
ids, video= extract_video_sequence(images,sequence, sequence_id)
train_video.append(video)
anns_in_sequence = funcy.lfilter(lambda a: int(a['image_id']) in ids, annotations)
unique_ann_ids_in_sequence = set(funcy.lmap(lambda i: i["id"], anns_in_sequence))
# print(len(unique_ann_ids_in_sequence),len(anns_in_sequence))
for instance_id in unique_ann_ids_in_sequence:
segmentations = []
areas = []
bboxes = []
labels = []
for id in ids:
a = funcy.lfilter(lambda a: int(a['image_id']) == int(id) and int(a['id']) == int(instance_id), anns_in_sequence)
if a == []:
segmentations.append(None)
areas.append(None)
bboxes.append(None)
else:
segmentations.append(a[0]["segmentation"])
areas.append(a[0]["area"])
bboxes.append(a[0]["bbox"])
labels.append(a[0]['category_id'])
category_id = np.unique(labels)
assert(len(category_id) == 1)
train_ann.append({"id": getNewInstanceId(), "video_id": sequence_id, "category_id": int(category_id[0]), "segmentations": segmentations, "areas": areas, "bboxes": bboxes, "iscrowd": 0})
save_coco(args.train, train_video, train_ann, categories)
test_video = []
test_ann = []
for sequence_id, sequence in enumerate(tqdm(test_sequence_split),start=1):
ids, video= extract_video_sequence(images,sequence, sequence_id)
test_video.append(video)
anns_in_sequence = funcy.lfilter(lambda a: int(a['image_id']) in ids, annotations)
unique_ann_ids_in_sequence = set(funcy.lmap(lambda i: i["id"], anns_in_sequence))
# print(len(unique_ann_ids_in_sequence),len(anns_in_sequence))
for instance_id in unique_ann_ids_in_sequence:
segmentations = []
areas = []
bboxes = []
labels = []
for id in ids:
a = funcy.lfilter(lambda a: int(a['image_id']) == int(id) and int(a['id']) == int(instance_id), anns_in_sequence)
if a == []:
segmentations.append(None)
areas.append(None)
bboxes.append(None)
else:
segmentations.append(a[0]["segmentation"])
areas.append(a[0]["area"])
bboxes.append(a[0]["bbox"])
labels.append(a[0]['category_id'])
category_id = np.unique(labels)
assert(len(category_id) == 1)
test_ann.append({"id": getNewInstanceId(), "video_id": sequence_id, "category_id": int(category_id[0]), "segmentations": segmentations, "areas": areas, "bboxes": bboxes, "iscrowd": 0})
save_coco(args.test, test_video, test_ann, categories)
# print("Saved {} entries in {} and {} in {}".format(len(img_train), args.train, len(img_test), args.test))
if __name__ == "__main__":
main(args)
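After splitting, a quick check of which Kitti sequences ended up in each file can be done like this (a sketch, assuming the output files were written to the current directory):
import json

def sequences_in(path):
    with open(path) as f:
        split = json.load(f)
    # each file_name is '<sequence>_<frame>.png', so the first 4 chars are the sequence
    return sorted({int(v['file_names'][0][:4]) for v in split['videos']})

print('train:', sequences_in('instances_train_sub.json'))
print('test :', sequences_in('instances_val_sub.json'))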
Then you can put the dataset and labels in the data folder, following the layout of the MaskTrackRCNN repo:
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── train
│ ├── val
│ ├── annotations
│ │ ├── instances_train_sub.json
│ │ ├── instances_val_sub.json
Images can take plenty of space, so they are usually stored in the /data folder on the server, and we do not want multiple copies of them. Instead we can create soft links to the files/folders. The command is ln -s path/to/source path/to/dest, for example ln -s /data/MOTS/images/image_combine data/train (assuming the combined images live under /data/MOTS).