(a) High-quality videos are selected by filtering on metadata, image quality, scene cuts, and detected camera motion. (b) Panoptic segmentation identifies movable objects in key frames. (c) Robust tracking with CoTracker3 ensures trajectory accuracy, followed by bounding box regression to define Frame In or Frame Out cases. (d) Bounding boxes of arbitrary aspect ratio and size are generated by random regression to find an ideal partition between the First Frame and the Canvas Region for training.
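
Steps (c) and (d) can be illustrated with a minimal sketch of how a tracked trajectory might be labeled against a candidate bounding box. This is not the authors' implementation. It assumes that a Frame In case means the object starts outside the First Frame region and ends inside it, and that a Frame Out case is the reverse. The names `inside` and `classify_track` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def inside(point: Point, box: Box) -> bool:
    """Return True if the point lies within the box (edges inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def classify_track(track: List[Point], box: Box) -> str:
    """Label a trajectory relative to a candidate First Frame region.

    Assumption (not from the source): only the first and last tracked
    positions are compared against the box.
    """
    if not track:
        raise ValueError("track must contain at least one point")
    starts_inside = inside(track[0], box)
    ends_inside = inside(track[-1], box)
    if not starts_inside and ends_inside:
        return "frame_in"
    if starts_inside and not ends_inside:
        return "frame_out"
    return "neither"


# Example: a candidate First Frame region inside a larger canvas.
first_frame_box = (0.0, 0.0, 10.0, 10.0)
entering = [(-5.0, 5.0), (2.0, 5.0), (5.0, 5.0)]
leaving = [(5.0, 5.0), (8.0, 5.0), (12.0, 5.0)]
print(classify_track(entering, first_frame_box))
print(classify_track(leaving, first_frame_box))
```

In a pipeline like the one described, many candidate boxes would be sampled and scored with a check of this kind. The sampling and scoring strategy is not specified in the caption and is left out here.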