Paper with code

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • 24 Aug 2022

The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Unbiased Multi-Modality Guidance for Image Inpainting

yeates/MMT • 25 Aug 2022

Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.

Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification

Lastly, we propose a skeleton prototype contrastive learning scheme that clusters feature-correlative instances of unlabeled graph representations and contrasts their inherent similarity with representative skeleton features ("skeleton prototypes") to learn discriminative skeleton representations for person re-ID.

Polarimetric Inverse Rendering for Transparent Shapes Reconstruction

We build a polarization dataset for multi-view transparent shapes reconstruction to verify our method.

Data-driven Predictive Tracking Control based on Koopman Operators

We seek to combine the nonlinear modeling capabilities of a wide class of neural networks with the safety guarantees of model predictive control (MPC) in a rigorous and online computationally tractable framework.

Learning to Construct 3D Building Wireframes from 3D Line Clouds

luo1cheng/lc2wf • 25 Aug 2022

Line clouds, though under-investigated in the previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images.

Deep Learning-based approaches for automatic detection of shell nouns and evaluation on WikiText-2

All discovered shell nouns as well as pre-trained models and code are available on GitHub.

Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-identification

kimbingng/deskpro • 25 Aug 2022

To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network.

Refine and Represent: Region-to-Object Representation Learning

kkallidromitis/r2o • 25 Aug 2022

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

There are numerous approaches to dealing with imbalanced data, but the efficacy of such techniques or an experimental comparison among those techniques has not been conducted.

Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

A new interpretability technique has been developed to identify the important speech & image features leading to the prediction of particular emotion classes.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

DynaVINS: A Visual-Inertial SLAM for Dynamic Environments

Then, keyframe grouping and multi-hypothesis-based constraint grouping methods are proposed to reduce the effect of temporarily static objects in loop closing.

I Learn to Diffuse, or Data Alchemy 101: a Mnemonic Manifesto

alembics/disco-diffusion • 8 Aug 2022

In this manifesto, we put forward the idea of data alchemy as a narrative device to discuss storytelling and transdisciplinarity in visualization.

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

jaywalnut310/vits • 11 Jun 2021

Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems.

Sound • Audio and Speech Processing

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

PaddlePaddle/PaddleSpeech • Interspeech 2020

In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.

Sound • Audio and Speech Processing

MPro: Combining Static and Symbolic Analysis for Scalable Testing of Smart Contract

We have implemented our technique in a tool called MPro, a scalable and automated smart contract analyzer based on the existing symbolic analysis tool Mythril-Classic and the static analysis tool Slither.

Cryptography and Security

SLAM-Supported Self-Training for 6D Object Pose Estimation

520xyxyzq/slam-super-6d • 8 Mar 2022

Combining the pose predictions with robot odometry, we formulate and solve pose graph optimization to refine the object pose estimates and make pseudo labels more consistent across frames.

stdgpu: Efficient STL-like Data Structures on the GPU

Tremendous advances in parallel computing and graphics hardware opened up several novel real-time GPU applications in the fields of computer vision, computer graphics as well as augmented reality (AR) and virtual reality (VR).

Distributed, Parallel, and Cluster Computing • Graphics

A lightweight design for serverless Function-as-a-Service

FaaS (Function as a Service) allows developers to upload and execute code in the cloud without managing servers.

Distributed, Parallel, and Cluster Computing

Robust Real-time LiDAR-inertial Initialization

We implement the proposed method as an initialization module, which, if enabled, automatically detects the degree of excitation of the collected data and calibrates, on the fly, the temporal offset, extrinsic, gravity vector, and IMU bias, which are then used as high-quality initial state values for real-time LiDAR-inertial odometry systems.

Million.js: A Fast, Compiler-Augmented Virtual DOM for Performant JavaScript UI Libraries

The need for developing and delivering interactive web applications has grown rapidly.

Language models enable zero-shot prediction of the effects of mutations on protein function

facebookresearch/esm • NeurIPS 2021

Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins.

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

compvis/latent-diffusion • 26 Jul 2022

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

Musika! Fast Infinite Waveform Music Generation

We release the source code and pretrained autoencoder weights at github.com/marcoppasini/musika, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

PaddlePaddle/Paddle3D • CVPR 2021

The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet.

YOLOV: Making Still Image Object Detectors Great at Video Object Detection

yuhengsss/yolov • 20 Aug 2022

On the positive side, the detection in a certain frame of a video, compared with in a still image, can draw support from other frames.

towhee

towhee-io/towhee • 11 Jul 2022

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

hpcaitech/colossalai • 28 Oct 2021

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding

In the realm of 3D point clouds, the availability of large datasets is a challenge, which exacerbates the issue of training Transformers for 3D tasks.

A ConvNet for the 2020s

lucidrains/denoising-diffusion-pytorch • CVPR 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 24 Aug 2022

The recent progress in implicit 3D representation, i. e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • Paper with code. Смотреть фото Paper with code. Смотреть картинку Paper with code. Картинка про Paper with code. Фото Paper with code• 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Our Mission

The mission of Papers with Code is to create a free and open resource with Machine Learning papers, code, datasets, methods and evaluation tables.

We believe this is best done together with the community, supported by NLP and ML.

We also operate specialized portals for papers with code in astronomy, physics, computer science, mathematics and statistics.

Joining the community

Join our community of thousands of contributors across academia and industry!

You can also follow us and get in touch on

Contributing

Want to submit a new code implementation? Search for the paper title, and then add the implementation on the paper page.

If you are running a competition, you can mirror the competition results on Papers with Code.

Please note that any contribution you make (i.e. linking code or submitting results) will be licensed under the free CC BY-SA licence.

Inclusion policy

To ensure high quality of data, all edits are monitored on Slack on the #recentchanges channel. This is an open channel and everyone is invited to follow and review contributions.

For a result to be included as a benchmark result we require that the paper is published as pre-print, in a conference or a journal. Having code is strongly encouraged but not required so we can capture the latest published results even before the code has been released.

Downloading the data

All data is licenced under the CC BY-SA licence, same as Wikipedia.

Additional data sources

The vast majority of the data is either annotated by the community or ourselves. However, we also included data from other resources that are published under a compatible licence, such as NLP-progress, EFF AI metrics, SQuAD and RedditSota.

For more information on what has been included and how, please see the paperswithcode/sota-extractor repository.

The core Papers with Code team is based in Meta AI Research. The service has been created and is managed by Robert, Ross, Marcin, Elvis, Guillem, Andrew and Thomas.

Papers with Code is a community project. No data is shared with any Meta Platforms product.

All contributions are welcome!

Careers

Interested in working at Papers with Code? Visit our careers page for open positions.

Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments

We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and with that opens up the concept of federated learning to more application scenarios in network-constrained environments.

FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning

Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints.

Improving Diffusion Model Efficiency Through Patching

crowsonkb/k-diffusion • 9 Jul 2022

Diffusion models are a powerful class of generative models that iteratively denoise samples to produce data.

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Transformers have attracted increasing interests in computer vision, but they still fall behind state-of-the-art convolutional networks.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

timdettmers/bitsandbytes • 15 Aug 2022

We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance.

TotalSegmentator: robust segmentation of 104 anatomical structures in CT images

wasserth/totalsegmentator • 11 Aug 2022

Finally, we train a segmentation algorithm on this new dataset.

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

alpa-projects/alpa • 28 Jan 2022

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

LMOT: Efficient Light-Weight Detection and Tracking in Crowds

RanaMostafaAbdElMohsen/LMOT • IEEE Access 2022

This paper introduces a novel real-time model, LMOT, i.e., Light-weight Multi-Object Tracker, that performs joint pedestrian detection and tracking.

Explaining Explanations: Axiomatic Feature Interactions for Deep Networks

cdpierse/transformers-interpret • 10 Feb 2020

Integrated Hessians overcomes several theoretical limitations of previous methods to explain interactions, and unlike such previous methods is not limited to a specific architecture or class of neural network.

Topology-aware Convolutional Neural Network for Efficient Skeleton-based Action Recognition

hikvision-research/skelact • 8 Dec 2021

In particular, we develop a novel cross-channel feature augmentation module, which is a combo of map-attend-group-map operations.

Relighting4D: Neural Relightable Human from Videos

frozenburning/relighting4d • 14 Jul 2022

Our key insight is that the space-time varying geometry and reflectance of the human body can be decomposed as a set of neural fields of normal, occlusion, diffuse, and specular maps.

Gender Classification and Bias Mitigation in Facial Images

We worked to increase classification accuracy and mitigate algorithmic biases on our baseline model trained on the augmented benchmark database.

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

fundamentalvision/Uni-Perceiver • 9 Jun 2022

To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.

Contrastive Audio-Language Learning for Music

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

xinntao/Real-ESRGAN • 22 Jul 2021

Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

alexa/refined • NAACL (ACL) 2022

The model is capable of generalising to large-scale knowledge bases such as Wikidata (which has 15 times more entities than Wikipedia) and of zero-shot entity linking.

Neural-Symbolic Models for Logical Queries on Knowledge Graphs

DeepGraphLearning/GNN-QE • ICML 2022

Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning.

Unbiased Multi-Modality Guidance for Image Inpainting

yeates/MMT • 25 Aug 2022

Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

nvidiagameworks/kaolin-wisp • 16 Jan 2022

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.

PSA-GAN: Progressive Self Attention GANs for Synthetic Time Series

awslabs/gluon-ts • ICLR 2022

Realistic synthetic time series data of sufficient length enables practical applications in time series modeling tasks, such as forecasting, but remains a challenge.

Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy • 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

Neural Volume Rendering: NeRF And Beyond

yenchenlin/awesome-NeRF • 17 Dec 2020

Besides the COVID-19 pandemic and political upheaval in the US, 2020 was also the year in which neural volume rendering exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. (2020).

BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

zhiqi-li/BEVFormer • 31 Mar 2022

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud

Moreover, even if the parameters are well adjusted, a partial under-segmentation problem can still emerge, which implies ground segmentation failures in some regions.

Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

awslabs/gluon-ts • 19 Dec 2019

A Simple Baseline for Multi-Camera 3D Object Detection

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Learning Visibility for Robust Dense Human Body Estimation

chhankyao/visdb • 23 Aug 2022

An alternative approach is to estimate dense vertices of a predefined template body in the image space.

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns.

pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning

alibaba/federatedscope • 8 Jun 2022

Personalized Federated Learning (pFL), which utilizes and deploys distinct local models, has gained increasing attention in recent years due to its success in handling the statistical heterogeneity of FL clients.

Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies

YuehuaZhu/ProxyGML • NeurIPS 2020

In this paper, we propose a novel Proxy-based deep Graph Metric Learning (ProxyGML) approach from the perspective of graph classification, which uses fewer proxies yet achieves better comprehensive performance.

Papers with code

⏰ AI conference deadline countdowns

2 Updated Aug 18, 2022

Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)

1 Updated Aug 9, 2022

Basic guidance on how to contribute to Papers with Code

0 Updated Mar 28, 2022

The SOTA extractor pipeline

0 Updated Mar 9, 2022

API Client for paperswithcode.com

0 Updated Dec 2, 2021

The full dataset behind paperswithcode.com

0 Updated Oct 8, 2021

Tools for extracting tables and results from Machine Learning papers

0 Updated Jun 23, 2021

Create a source of truth for ML model results and browse it on Papers with Code

0 Updated Jun 9, 2021

Easily evaluate machine learning models on public benchmarks

1 Updated May 21, 2021

Easily benchmark machine learning models in PyTorch

1 Updated Dec 19, 2020

A deep learning approach to predict the number of k-barriers for intrusion detection over a circular region using wireless sensor networks

One of the crucial applications of WSNs is intrusion detection and surveillance at the border areas and in the defense establishments.

Efficient Truncated Linear Regression with Unknown Noise Variance

In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise.

Learning Relational Causal Models with Cycles through Relational Acyclification

We introduce relational acyclification, an operation specifically designed for relational models that enables reasoning about the identifiability of cyclic relational causal models.

Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins

We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs.

Adjoint Optimisation for Wind Farm Flow Control with a Free-Vortex Wake Model

The free-vortex wake model with gradient information shows potential for efficient optimisation and provides a promising way to further explore dynamic wind farm flow control.

DPTDR: Deep Prompt Tuning for Dense Passage Retrieval

tangzhy/dptdr • 24 Aug 2022

We believe this work facilitates the industry, as it saves enormous efforts and costs of deployment and increases the utility of computing resources.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • 24 Aug 2022

The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

Self-Supervised Endoscopic Image Key-Points Matching

abenhamadou/Self-Supervised-Endoscopic-Image-Key-Points-Matching • 24 Aug 2022

Feature matching and finding correspondences between endoscopic images is a key step in many clinical applications, such as patient follow-up and the generation of panoramic images from clinical sequences for fast anomaly localization.

Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation

lashoun/hanna-benchmark-asg • 24 Aug 2022

However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them.

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.

Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

ZhangGongjie/Meta-DETR • 30 Jul 2022

Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes.

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

thudm/cogvideo • 29 May 2022

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.

Papers with Code Newsletter #3

👋🏻 Welcome to the 3rd issue of the Papers with Code newsletter! In this edition, we highlight 10 novel applications of Transformers, a new leader on the ImageNet leaderboard, a state-of-the-art BERT model, a trillion-parameter language model, a novel pooling method, and much more.

Trending Papers with Code 📄

Meta Pseudo Labels [CV]

Denotes the difference between Pseudo Labels (left) and Meta Pseudo Labels (right). Meta Pseudo Labels shows the teacher being trained along with the student. (Figure source: Pham et al. (2020))

Scaling to Trillion Parameter NLP Models [NLP]

Large scale neural language models have been used to obtain strong performance on a range of NLP tasks, but they are computationally intensive! Mixture of Experts (MoE) models are one approach to scale models through sparse activations, but their use is hindered by communication cost and training instabilities.

Why it matters: Fedus et al. propose the Switch Transformer (based on T5), which simplifies the MoE routing algorithm, resulting in a model that scales pre-training to a trillion parameters (!). It achieves greater computational efficiency to support scaling in three NLP regimes: pre-training, fine-tuning, and multi-task training. It claims a 7x speedup in pre-training with the same computational resources used by T5 variants. The authors also report improvements on multilingual data across 101 languages and enhanced distilled models.

Illustration of a Switch Transformer encoder block where the router independently routes each token across four FFNs. (Figure source: Fedus et al. (2020))
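The routing step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `switch_route`, the single-layer ReLU experts, and all sizes are hypothetical, and the capacity limits and load-balancing loss used in practice are left out. It shows only the core idea that a router scores each token against every expert and sends the token to the single highest-scoring one.

```python
import numpy as np

def switch_route(tokens, router_weights, experts):
    """Top-1 routing sketch: each token is processed by exactly one expert."""
    logits = tokens @ router_weights                      # (n_tokens, n_experts)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)                        # chosen expert per token
    gate = probs[np.arange(len(tokens)), choice]          # router probability of that expert
    out = np.zeros_like(tokens)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            # Scale the expert output by the router probability (the gate value).
            out[mask] = gate[mask][:, None] * expert(tokens[mask])
    return out, choice

# Hypothetical sizes: 6 tokens, model width 8, four experts as in the figure.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 6
tokens = rng.normal(size=(n_tokens, d_model))
router_weights = rng.normal(size=(d_model, n_experts))
# Assumption: each expert is a single ReLU layer standing in for a full FFN.
expert_mats = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
experts = [lambda x, W=W: np.maximum(x @ W, 0.0) for W in expert_mats]

out, choice = switch_route(tokens, router_weights, experts)
print(choice)  # index of the expert each token was sent to
```

Because only one expert runs per token, the compute per token stays roughly constant as more experts (and therefore parameters) are added.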

DeBERTa sits atop the SuperGLUE Benchmark [NLP]

There are ongoing efforts to improve the generalization and efficiency of language models like BERT and RoBERTa. DeBERTa (Decoding-enhanced BERT with disentangled attention) is a new architecture based on a disentangled attention mechanism and an enhanced mask decoder.

What’s new: DeBERTa calculates attention weights for words using disentangled matrices of word content and relative positions. The architecture also uses the absolute position of words, after all Transformer layers and before the softmax layer, as complementary information to decode masked words. DeBERTa is pre-trained using masked language modeling and fine-tuned using a new virtual adversarial training method. The result is improved performance on many downstream tasks. It is currently the top-performing model on SuperGLUE.

Comparison of the decoding layer of vanilla BERT (left) and proposed Enhanced Mask Decoder (right).

Making VGG-style ConvNets Great Again [CV]

Architecture of RepVGG at inference-time (B) and training-time (C). (Figure source: Ding et al. (2020))

A New Pooling Method for CNN Architectures [DL]

Pooling methods in CNNs decrease the size of the activation maps helping to achieve spatial invariance and increase the receptive field. This paper seeks to improve pooling methods by minimizing loss of information while keeping the memory and computation overhead limited.

Overview of SoftPool operation. (Figure source: Stergiou et al. (2020))
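As a rough illustration of how a pooling layer can shrink an activation map while discarding less information than max pooling, the sketch below implements softmax-weighted pooling over non-overlapping windows, which is our understanding of the SoftPool idea rather than something stated in this newsletter. The function name `softpool2d`, the window size, and the single-channel input are assumptions made for the example.

```python
import numpy as np

def softpool2d(x, k=2):
    """Softmax-weighted pooling over non-overlapping k x k windows of a 2D map."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "map size must be divisible by k"
    # Rearrange into (h//k, w//k, k*k): one row of k*k activations per window.
    windows = (
        x.reshape(h // k, k, w // k, k)
        .transpose(0, 2, 1, 3)
        .reshape(h // k, w // k, k * k)
    )
    # Softmax weights inside each window (shifted by the max for stability).
    weights = np.exp(windows - windows.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Every activation contributes, with larger ones weighted more heavily.
    return (weights * windows).sum(axis=-1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = softpool2d(x, k=2)
print(pooled.shape)  # (2, 2)
```

Each output value falls between the window's average and its maximum, so the layer behaves like a smooth compromise between average and max pooling.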

10 Novel Applications using Transformers [DL]

Transformers have had a lot of success in training neural language models. In the past few weeks, we’ve seen several trending papers with code applying Transformers to new types of task:

Trending Libraries and Datasets 🛠

Trending libraries of the week:

Trending with 196 ★

Trending with 190 ★

Trending with 146 ★

Trending with 292 ★

Trending with 252 ★

Trending datasets of the week:

Trending with 51 ★

Trending with 306 ★

Community Highlights ✍️

The following are some of the community highlights for this week:

Special thanks to users @htvr, @tienduang, @rrafikova, @donovanOng, @maksymets, @humamalwassel, @zhaochengqi and hundreds of other contributors for their contributions to Papers with Code tasks, methods, and benchmark results.

More from PwC 🗣

Deep Hyperspectral and Multispectral Image Fusion with Inter-image Variability

The fusion problem is stated as an optimization problem in the maximum a posteriori framework.

Towards an Awareness of Time Series Anomaly Detection Models’ Adversarial Vulnerability

shahroztariq/adversarial-attacks-on-timeseries • 24 Aug 2022

To the best of our understanding, we demonstrate, for the first time, the vulnerabilities of anomaly detection systems against adversarial attacks.

Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation

jianlong-yuan/semi-mmseg • 24 Aug 2022

Consistency regularization has been widely studied in recent semi-supervised semantic segmentation methods.

aidotse/stylegan2-ada-pytorch • 24 Aug 2022

The lack of sufficiently large open medical databases is one of the biggest challenges in AI-powered healthcare.

Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments

We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and with that opens up the concept of federated learning to more application scenarios in network-constrained environments.

Addressing Token Uniformity in Transformers via Singular Value Transformation

hanqi-qi/tokenuni • 24 Aug 2022

In this paper, we propose to use the distribution of singular values of outputs of each transformer layer to characterise the phenomenon of token uniformity and empirically illustrate that a less skewed singular value distribution can alleviate the 'token uniformity' problem.

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

However, CL on VQA involves not only the expansion of label sets (new Answer sets).

A Deep Learning Approach Using Masked Image Modeling for Reconstruction of Undersampled K-spaces

Aopsmath99/MIMMRI • 24 Aug 2022

The model was evaluated through L1 loss, gradient normalization, and structural similarity values.

EEG4Students: An Experimental Design for EEG Data Collection and Machine Learning Analysis

However, during the COVID-19 pandemic, data collection and analysis could be more challenging.

Applying Eigencontours to PolarMask-Based Instance Segmentation

dnjs3594/Eigencontours • 24 Aug 2022

Eigencontours are the first data-driven contour descriptors based on singular value decomposition.

Learning Sub-Pixel Disparity Distribution for Light Field Depth Estimation

In our method, we construct the cost volume at the sub-pixel level to produce a finer depth distribution and design an uncertainty-aware focal loss to supervise the disparity distribution to be close to the ground-truth one.

A Multi-Head Model for Continual Learning via Out-of-Distribution Replay

k-gyuhak/more • 20 Aug 2022

Instead of using the saved samples in memory to update the network for previous tasks/classes in the existing approach, MORE leverages the saved samples to build a task specific classifier (adding a new classification head) without updating the network learned for previous tasks/classes.

PARSE challenge 2022: Pulmonary Arteries Segmentation using Swin U-Net Transformer (Swin UNETR) and U-Net

akansh12/parse2022 • 20 Aug 2022

In this work, we present our proposed method to segment the pulmonary arteries from the CT scans using Swin UNETR and U-Net-based deep neural network architecture.

Quo Vadis: Hybrid Machine Learning Meta-Model based on Contextual and Behavioral Malware Representations

dtrizna/quo.vadis • 20 Aug 2022

The detection heuristic in contemporary machine learning Windows malware classifiers is typically based on the static properties of the sample since dynamic analysis through virtualization is challenging for vast quantities of samples.

UniCausal: Unified Benchmark and Model for Causal Text Mining

Therefore, we proposed UniCausal, a unified benchmark for causal text mining across three tasks: Causal Sequence Classification, Cause-Effect Span Detection and Causal Pair Classification.

Curbing Task Interference using Representation Similarity-Guided Multi-Task Feature Sharing

neurai-lab/progressivedecoderfusion • 19 Aug 2022

However, increased sharing exposes more parameters to task interference which likely hinders both generalization and robustness.

Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data

Investment professionals rely on extrapolating company revenue into the future (i.e., revenue forecast) to approximate the valuation of scaleups (private companies in a high-growth stage) and inform their investment decision.

Dialogue Policies for Confusion Mitigation in Situated HRI

Confusion is a mental state triggered by cognitive disequilibrium that can occur in many types of task-oriented interaction, including Human-Robot Interaction (HRI).

Evaluating Explainability for Graph Neural Networks

mims-harvard/graphxai • 19 Aug 2022

As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.

Diverse Video Captioning by Adaptive Spatio-temporal Attention

zohrehghaderi/vasta • 19 Aug 2022

To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

We introduce a new empirical Bayes approach for large-scale multiple linear regression.

gemslab/caper • 23 Aug 2022

Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains.

A Constrained Deformable Convolutional Network for Efficient Single Image Dynamic Scene Blind Deblurring with Spatially-Variant Motion Blur Kernels Estimation

Most existing deep-learning-based single image dynamic scene blind deblurring (SIDSBD) methods usually design deep networks to directly remove the spatially-variant motion blurs from one inputted motion blurred image, without blur kernels estimation.

Unsupervised Question Answering via Answer Diversifying

Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models.

Neural PCA for Flow-Based Representation Learning

Multi-Modal Representation Learning with Self-Adaptive Thresholds for Commodity Verification

hanchenchen/ccks2022-track2-solution • 23 Aug 2022

In this paper, we propose a method to identify identical commodities.

Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization

kaminyou/urust • 23 Aug 2022

Hence, we proposed a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage.

Retinal Structure Detection in OCTA Image via Voting-based Multi-task Learning

imed-lab/vaff-net • 23 Aug 2022

Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and clinical decision-making.

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

Consistency Regularization for Domain Adaptation

kw01sg/crda • 23 Aug 2022

Collection of real world annotations for training semantic segmentation models is an expensive process.

Prompting as Probing: Using Language Models for Knowledge Base Construction

ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this.

DeepInteraction: 3D Object Detection via Modality Interaction

Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy.

VILT: Video Instructions Linking for Complex Tasks

This work addresses challenges in developing conversational assistants that support rich multimodal video interactions to accomplish real-world tasks interactively.

Adversarial Feature Augmentation for Cross-domain Few-shot Classification

youthhoo/afa_for_few_shot_learning • 23 Aug 2022

Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes.

Bitext Mining for Low-Resource Languages via Contrastive Learning

steventan0110/align-filter • 23 Aug 2022

Mining high-quality bitexts for low-resource languages is challenging.

Inter- and Intra-Series Embeddings Fusion Network for Epidemiological Forecasting

xiefeng69/sefnet • 23 Aug 2022

In the Inter-Series Embedding Module, a multi-scale unified convolution component called Region-Aware Convolution is proposed, which cooperates with self-attention to capture dynamic dependencies between time series obtained from multiple regions.

Collective targeted migrations: a balancing act involving aggregation, group size and environmental clues: a simulation study

What is behind the \emph described by Simons (2004)?

Solving Royal Game of Ur Using Reinforcement Learning

Reinforcement Learning has recently surfaced as a very powerful tool to solve complex problems in the domain of board games, wherein an agent is generally required to learn complex strategies and moves based on its own experiences and rewards received.

Survey of Machine Learning Techniques To Predict Heartbeat Arrhythmias

Many works in biomedical computer science research use machine learning techniques to give accurate results.

ExpoCloud: a Framework for Time and Budget-Effective Parameter Space Explorations Using a Cloud Compute Engine

Large parameter space explorations are among the most time consuming yet critically important tasks in many fields of modern research.

Distributed, Parallel, and Cluster Computing

PREVENT: An Unsupervised Approach to Predict Software Failures in Production

This paper presents PREVENT, an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques.

Synthetic End-User Testing: Modeling Realistic Agents Based on Behavioral Examples

For software interacting directly with real-world end-users, it is common practice to script scenario tests validating the system’s compliance with a number of its features.

Motif-Based Visual Analysis of Dynamic Networks

The network census captures significantly occurring motifs compared to their expected occurrences in random networks and exposes structural changes in a dynamic network.

Social and Information Networks • Human-Computer Interaction

Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras

In this paper, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras.

DynaVINS: A Visual-Inertial SLAM for Dynamic Environments

Then, keyframe grouping and multi-hypothesis-based constraints grouping methods are proposed to reduce the effect of temporarily static objects in loop closing.

Collective Intelligence in Human-AI Teams: A Bayesian Theory of Mind Approach

In this paper, we develop a network of Bayesian agents that collectively model a team’s mental states from the team’s observed communication.

What are the Practices for Secret Management in Software Artifacts?

The goal of our paper is to aid practitioners in avoiding the exposure of secrets by identifying secret management practices in software artifacts through a systematic derivation of practices disseminated in Internet artifacts.

Software Engineering • Cryptography and Security • 68-01

SAP Signavio Academic Models: A Large Process Model Dataset

In this paper, we introduce the SAP Signavio Academic Models (SAP-SAM) dataset, a collection of hundreds of thousands of business models, mainly process models in BPMN notation.

Other Computer Science • Software Engineering

An open dataset of scholars on Twitter

This paper presents a novel and simple approach to match authors from OpenAlex with Twitter users identified in Crossref Event Data.

A novel approach for Fair Principal Component Analysis based on eigendecomposition

Principal component analysis (PCA), a ubiquitous dimensionality reduction technique in signal processing, searches for a projection matrix that minimizes the mean squared error between the reduced dataset and the original one.
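The objective stated above can be made concrete with a short sketch of standard PCA computed by eigendecomposition of the covariance matrix. This is ordinary PCA only, not the fair variant the paper proposes, and the helper names `pca_fit` and `reconstruction_mse` as well as the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (as columns)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return mean, eigvecs[:, order]

def reconstruction_mse(X, mean, W):
    """Mean squared error between centred data and its projection onto span(W)."""
    Xc = X - mean
    X_hat = Xc @ W @ W.T
    return float(np.mean((Xc - X_hat) ** 2))

# Synthetic data with very different variance along each axis (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

mean, W = pca_fit(X, k=2)
pca_err = reconstruction_mse(X, mean, W)

# Compare against an arbitrary orthonormal 2D basis of the same size.
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
random_err = reconstruction_mse(X, mean, Q)
print(pca_err <= random_err)
```

The comparison with a random orthonormal basis illustrates the optimality property: no other two-dimensional projection reconstructs the centred data with lower mean squared error than the top two eigenvectors.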

A Bayesian Variational principle for dynamic Self Organizing Maps

anthony-neo/vdsom • 24 Aug 2022

We propose organisation conditions that yield a method for training SOM with an adaptive neighborhood radius in a variational Bayesian framework.

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI.

DCSF: Deep Convolutional Set Functions for Classification of Asynchronous Time Series

Because of the asynchronous nature, they pose a significant challenge to deep learning architectures, which presume that the time series presented to them are regularly sampled, fully observed, and aligned with respect to time.

Unrestricted Black-box Adversarial Attack Using GAN with Limited Queries

ndb796/latenthsja • 24 Aug 2022

First, we demonstrate that our targeted attack method is query-efficient to produce unrestricted adversarial examples for a facial identity recognition model that contains 307 identities.

Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization

1998v7/self-filtering • 24 Aug 2022

Sample selection is an effective strategy to mitigate the effect of label noise in robust learning.

Sliding Window Recurrent Network for Efficient Video Super-Resolution

Different from single image super-resolution, VSR can utilize frames’ temporal information to reconstruct results with more details.

RZSR: Reference-based Zero-Shot Super-Resolution with Depth Guided Self-Exemplars

To advance ZSSR, we obtain reference image patches with rich textures and high-frequency details which are also extracted only from the input image using cross-scale matching.

Tracking by weakly-supervised learning and graph optimization for whole-embryo C. elegans lineages

Our work specifically addresses the following challenging properties of C. elegans embryo recordings: (1) Many cell divisions as compared to benchmark recordings of other organisms, and (2) the presence of polar bodies that are easily mistaken as cell nuclei.

On the Design of Privacy-Aware Cameras: a Study on Deep Neural Networks

At the same time, we ensure that useful non-sensitive data can still be extracted from distorted images.

Multi-domain Learning for Updating Face Anti-spoofing Models

In this work, we study multi-domain learning for face anti-spoofing (MD-FAS), where a pre-trained FAS model needs to be updated to perform equally well on both source and target domains while only using target domain data for updating.

Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

Tables are widely used in several types of documents since they can bring important information in a structured way.

Exact Penalty Method for Federated Learning

Federated learning has burgeoned recently in machine learning, giving rise to a variety of research topics.

Enhancing User Behavior Sequence Modeling by Generative Tasks for Session Search

haon-chen/ase-official • 23 Aug 2022

To help the encoding of the current user behavior sequence, we propose to use a decoder and the information of future sequences and a supplemental query.

Distance-Aware Occlusion Detection with Focused Attention

yang-li-2000/distance-aware-occlusion-detection-with-focused-attention • 23 Aug 2022

In this work, (1) we propose a novel three-decoder architecture as the infrastructure for focused attention; (2) we use the generalized intersection box prediction task to effectively guide our model to focus on occlusion-specific regions; (3) our model achieves a new state-of-the-art performance on distance-aware relationship detection.

IMPaSh: A Novel Domain-shift Resistant Representation for Colorectal Cancer Tissue Classification

trinhvg/impash • 23 Aug 2022

The appearance of histopathology images depends on tissue type, staining and digitization procedure.

Hierarchical Perceptual Noise Injection for Social Media Fingerprint Privacy Protection

nlsde-safety-team/fingersafe • 23 Aug 2022

The threat of fingerprint leakage from social media raises a strong desire for anonymizing shared images while maintaining image qualities, since fingerprints act as a lifelong individual biometric password.

Data augmentation on graphs for table type classification

Tables are widely used in documents because of their compact and structured representation of information.

Multimodal Across Domains Gaze Target Detection

francescotonini/multimodal-across-domains-gaze-target-detection • 23 Aug 2022

This paper addresses the gaze target detection problem in single images captured from the third-person perspective.

SurvSHAP(t): Time-dependent explanations of machine learning survival models

Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME.

Inductive Knowledge Graph Reasoning for Multi-batch Emerging Entities

We propose a walk-based inductive reasoning model to tackle the new setting.

Incorporating Rivalry in Reinforcement Learning for a Competitive Game

Recent advances in reinforcement learning with social agents have allowed such models to achieve human-level performance on specific interaction tasks.

A simple learning agent interacting with an agent-based market model

We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model.

FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning

siyi-wind/FairDisCo • 22 Aug 2022

Deep learning models have achieved great success in automating skin lesion diagnosis.

Deep 3D Vessel Segmentation based on Cross Transformer Network

qibaolian/ctn • 22 Aug 2022

In CTN, a transformer module is constructed in parallel to a U-Net to learn long-distance dependencies between different anatomical regions; and these dependencies are communicated to the U-Net at multiple stages to endow it with global awareness.

Semi-supervised classification using a supervised autoencoder for biomedical applications

cypriengille/semi-supervised-autoencoder • 22 Aug 2022

Experiments show that the SSAE outperforms Label Propagation and Spreading and the Fully Connected Neural Network both on a synthetic dataset and on two real-world biological datasets.

FedOS: using open-set learning to stabilize training in federated learning

mohamad-m2/federated-learning • 22 Aug 2022

Federated Learning is a recent approach for training statistical models on distributed datasets without violating privacy constraints.

Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition

1980x/dnfer • 22 Aug 2022

To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training.

High-quality Task Division for Large-scale Entity Alignment

To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models.

LTE4G: Long-Tail Experts for Graph Neural Networks

SukwonYun/LTE4G • 22 Aug 2022

After training an expert for each balanced subset, we adopt knowledge distillation to obtain two class-wise students, i.e., a Head class student and a Tail class student, which are responsible for classifying nodes in the head classes and tail classes, respectively.
