Weakly Supervised Temporal Action Localization Algorithm

Anonymous user, July 31, 2024
Category: AI
Open-source repository: https://modelscope.cn/models/zhencang/JCDNet

Project Details

JCDNet for Weakly Supervised Temporal Action Localization

PyTorch Implementation of 'JCDNet: Joint of Common and Definite phases Network for Weakly Supervised Temporal Action Localization'

JCDNet: Joint of Common and Definite phases Network for Weakly Supervised Temporal Action Localization
Yifu Liu, Xiaoxia Li, Zhiling Luo, and Wei Zhou

Paper: https://arxiv.org/abs/2303.17294

Abstract: Weakly supervised temporal action localization aims to localize action instances in untrimmed videos with only video-level supervision. We observe that different actions share common phases, e.g., the run-up in HighJump and LongJump. We define such actions as conjoint actions, and their remaining parts as definite phases, e.g., leaping over the bar in HighJump. Compared with the common phases, the definite phases are more easily localized by existing methods. Most of these methods formulate the task as a Multiple Instance Learning paradigm, in which the common phases tend to be confused with the background, which harms the localization completeness of conjoint actions. To tackle this challenge, we propose a Joint of Common and Definite phases Network (JCDNet) that improves the feature discriminability of conjoint actions. Specifically, we design a Class-Aware Discriminative module that enhances the contribution of the common phases to classification under the guidance of coarse definite-phase features. In addition, we introduce a temporal attention module that learns robust actionness scores by modeling temporal dependencies, distinguishing the common phases from the background. Extensive experiments on three datasets (THUMOS14, ActivityNet v1.2, and a conjoint-action subset) demonstrate that JCDNet achieves competitive performance against state-of-the-art methods.
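
For readers unfamiliar with the Multiple Instance Learning formulation mentioned in the abstract, the sketch below illustrates the generic idea: per-snippet class scores are aggregated into a video-level score by averaging the top-k snippets of each class, so only the highest-scoring snippets drive the video-level prediction. This is a generic illustration and not JCDNet's code; the function name `topk_mil_pooling`, the parameter `k_ratio`, and the toy scores are assumptions made for the example.

```python
def topk_mil_pooling(snippet_scores, k_ratio=8):
    """Aggregate per-snippet class scores into video-level class scores.

    snippet_scores: list of T rows, each a list of C class scores.
    k_ratio: hypothetical hyperparameter; k = max(1, T // k_ratio).
    """
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    k = max(1, num_snippets // k_ratio)
    video_scores = []
    for c in range(num_classes):
        # Keep the k highest-scoring snippets for this class and average them.
        top = sorted((row[c] for row in snippet_scores), reverse=True)[:k]
        video_scores.append(sum(top) / k)
    return video_scores


# Toy example: 4 snippets, 2 classes; with k_ratio=2, k = 2.
scores = [[0.9, 0.1], [0.7, 0.2], [0.1, 0.3], [0.0, 0.4]]
print(topk_mil_pooling(scores, k_ratio=2))
```

Because only the top-scoring snippets contribute, snippets with moderate scores (such as a run-up shared by several classes) have little influence on the video-level score, which is consistent with the confusion between common phases and background described in the abstract.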

Prerequisites

Recommended Environment

  • Python 3.6
  • PyTorch 1.6
  • TensorFlow 1.15 (for TensorBoard)
  • CUDA 10.2

Dependencies

You can set up the environment by running:

pip install -r requirements.txt
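
After installation, a quick check can confirm that the interpreter and libraries roughly match the recommended environment. The following is a minimal sketch and not part of the repository; the helper names `parse_major_minor` and `report_environment` are hypothetical, and the expected versions are simply copied from the Recommended Environment list.

```python
import importlib
import sys

# Versions copied from the "Recommended Environment" list; adjust as needed.
RECOMMENDED = {"torch": (1, 6), "tensorflow": (1, 15)}


def parse_major_minor(version):
    """Turn a version string such as '1.6.0+cu102' into (1, 6)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def report_environment():
    """Return {name: (installed_version_or_None, matches_recommendation)}."""
    python_version = tuple(sys.version_info[:2])
    report = {"python": (python_version, python_version == (3, 6))}
    for name, wanted in RECOMMENDED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Library is not installed in this environment.
            report[name] = (None, False)
            continue
        found = parse_major_minor(module.__version__)
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in report_environment().items():
        print(name, found, "OK" if ok else "differs from recommendation")
```

A mismatch does not necessarily mean the code will fail; the sketch only reports differences from the versions listed above.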

Data Preparation

  1. Prepare the THUMOS'14 dataset.

    • We excluded three test videos (270, 1292, 1496) as previous work did.
  2. Extract features with two-stream I3D networks.

    • We recommend extracting features using this repo.
    • For convenience, we provide the features we used. You can find them here.
  3. Place the features inside the dataset folder.

    • Please ensure the data structure is as below.
dataset
└── THUMOS14
    ├── gt.json
    ├── split_train.txt
    ├── split_test.txt
    └── features
        ├── train
        │   ├── rgb
        │   │   ├── video_validation_0000051.npy
        │   │   ├── video_validation_0000052.npy
        │   │   └── ...
        │   └── flow
        │       ├── video_validation_0000051.npy
        │       ├── video_validation_0000052.npy
        │       └── ...
        └── test
            ├── rgb
            │   ├── video_test_0000004.npy
            │   ├── video_test_0000006.npy
            │   └── ...
            └── flow
                ├── video_test_0000004.npy
                ├── video_test_0000006.npy
                └── ...
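
Before training, it may help to verify that the folder matches the layout above. The sketch below is one way to do that with the standard library only; it is not part of the repository, the function name `check_thumos14_layout` is hypothetical, and the rule that every video needs both an rgb and a flow file is an assumption inferred from the tree.

```python
from pathlib import Path


def check_thumos14_layout(root):
    """Return a list of problems found under <root>/THUMOS14 (empty if none)."""
    base = Path(root) / "THUMOS14"
    problems = []
    # Top-level annotation and split files shown in the tree.
    for name in ("gt.json", "split_train.txt", "split_test.txt"):
        if not (base / name).is_file():
            problems.append(f"missing file: {name}")
    for split in ("train", "test"):
        rgb_dir = base / "features" / split / "rgb"
        flow_dir = base / "features" / split / "flow"
        if not rgb_dir.is_dir() or not flow_dir.is_dir():
            problems.append(f"missing rgb/flow folder for split: {split}")
            continue
        rgb = {p.stem for p in rgb_dir.glob("*.npy")}
        flow = {p.stem for p in flow_dir.glob("*.npy")}
        # Assumption: each video has one RGB and one flow feature file.
        for video in sorted(rgb ^ flow):
            problems.append(f"unpaired feature in {split}: {video}")
    return problems


if __name__ == "__main__":
    # "dataset" is the root folder name shown in the tree.
    for problem in check_thumos14_layout("dataset"):
        print(problem)
```

An empty result means the expected files and folders were found; it does not validate the contents of the feature files.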

Usage

Running

You can train and evaluate the model by running the script below.

$ bash run.sh

To try other training options, please refer to options.py.

Citation

If you find this code useful, please cite our paper.

@misc{liu2023jcdnet,
      title={JCDNet: Joint of Common and Definite phases Network for Weakly Supervised Temporal Action Localization}, 
      author={Yifu Liu and Xiaoxia Li and Zhiling Luo and Wei Zhou},
      year={2023},
      eprint={2303.17294},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}