Paper with code
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
arpitbansal297/cold-diffusion-models • 19 Aug 2022
We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
rinongal/textual_inversion • 2 Aug 2022
Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
PeRFception: Perception using Radiance Fields
POSTECH-CVLab/PeRFception • 24 Aug 2022
The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.
NeuMan: Neural Human Radiance Field from a Single Video
apple/ml-neuman • 23 Mar 2022
Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.
Audio-Visual Segmentation
opennlplab/avsbench • 11 Jul 2022
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning
Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.
YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception
Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
wongkinyiu/yolov7 • 6 Jul 2022
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.
Multi-scale Multi-band DenseNets for Audio Source Separation
Anjok07/ultimatevocalremovergui • 29 Jun 2017
This paper deals with the problem of audio source separation.
In Defense of Online Models for Video Instance Segmentation
wjf5203/vnext • 21 Jul 2022
In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.
Unbiased Multi-Modality Guidance for Image Inpainting
yeates/MMT • 25 Aug 2022
Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.
Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification
Lastly, we propose a skeleton prototype contrastive learning scheme that clusters feature-correlative instances of unlabeled graph representations and contrasts their inherent similarity with representative skeleton features («skeleton prototypes») to learn discriminative skeleton representations for person re-ID.
Polarimetric Inverse Rendering for Transparent Shapes Reconstruction
We build a polarization dataset for multi-view transparent shapes reconstruction to verify our method.
Data-driven Predictive Tracking Control based on Koopman Operators
We seek to combine the nonlinear modeling capabilities of a wide class of neural networks with the safety guarantees of model predictive control (MPC) in a rigorous and online computationally tractable framework.
Learning to Construct 3D Building Wireframes from 3D Line Clouds
luo1cheng/lc2wf • 25 Aug 2022
Line clouds, though under-investigated in the previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images.
Deep Learning-based approaches for automatic detection of shell nouns and evaluation on WikiText-2
All discovered shell nouns as well as pre-trained models and code are available on GitHub.
Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-identification
kimbingng/deskpro • 25 Aug 2022
To mitigate the resolution degradation issue and mine identity-sensitive cues from human faces, we propose to restore the missing facial details using prior facial knowledge, which is then propagated to a smaller network.
Refine and Represent: Region-to-Object Representation Learning
kkallidromitis/r2o • 25 Aug 2022
Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.
An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification
There are numerous approaches to dealing with imbalanced data, but the efficacy of such techniques or an experimental comparison among those techniques has not been conducted.
Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data
A new interpretability technique has been developed to identify the important speech & image features leading to the prediction of particular emotion classes.
DynaVINS: A Visual-Inertial SLAM for Dynamic Environments
Then, a keyframe grouping and a multi-hypothesis-based constraints grouping methods are proposed to reduce the effect of temporarily static objects in the loop closing.
I Learn to Diffuse, or Data Alchemy 101: a Mnemonic Manifesto
alembics/disco-diffusion • • 8 Aug 2022
In this manifesto, we put forward the idea of data alchemy as a narrative device to discuss storytelling and transdisciplinarity in visualization.
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
jaywalnut310/vits • • 11 Jun 2021
Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems.
Sound • Audio and Speech Processing
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
PaddlePaddle/PaddleSpeech • • Interspeech2020 2020
In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting to high-quality text-to-speech.
Sound • Audio and Speech Processing
MPro: Combining Static and Symbolic Analysis for Scalable Testing of Smart Contract
We have implemented our technique in a tool called MPro, a scalable and automated smart contract analyzer based on the existing symbolic analysis tool Mythril-Classic and the static analysis tool Slither.
Cryptography and Security
SLAM-Supported Self-Training for 6D Object Pose Estimation
520xyxyzq/slam-super-6d • • 8 Mar 2022
Combining the pose predictions with robot odometry, we formulate and solve pose graph optimization to refine the object pose estimates and make pseudo labels more consistent across frames.
stdgpu: Efficient STL-like Data Structures on the GPU
Tremendous advances in parallel computing and graphics hardware opened up several novel real-time GPU applications in the fields of computer vision, computer graphics as well as augmented reality (AR) and virtual reality (VR).
Distributed, Parallel, and Cluster Computing • Graphics
A lightweight design for serverless Function-as-a-Service
FaaS (Function as a Service) allows developers to upload and execute code in the cloud without managing servers.
Distributed, Parallel, and Cluster Computing
Robust Real-time LiDAR-inertial Initialization
We implement the proposed method as an initialization module, which, if enabled, automatically detects the degree of excitation of the collected data and calibrate, on-the-fly, the temporal offset, extrinsic, gravity vector, and IMU bias, which are then used as high-quality initial state values for real-time LiDAR-inertial odometry systems.
Million.js: A Fast, Compiler-Augmented Virtual DOM for Performant JavaScript UI Libraries
The need for developing and delivering interactive web applications has grown rapidly.
Language models enable zero-shot prediction of the effects of mutations on protein function
facebookresearch/esm • • NeurIPS 2021
Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins.
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
compvis/latent-diffusion • • 26 Jul 2022
In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.
Musika! Fast Infinite Waveform Music Generation
We release the source code and pretrained autoencoder weights at github.com/marcoppasini/musika, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
PaddlePaddle/Paddle3D • • CVPR 2021
The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet.
YOLOV: Making Still Image Object Detectors Great at Video Object Detection
yuhengsss/yolov • • 20 Aug 2022
On the positive side, the detection in a certain frame of a video, compared with in a still image, can draw support from other frames.
towhee
towhee-io/towhee • • 11 Jul 2022
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Evaluating Large Language Models Trained on Code
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
hpcaitech/colossalai • • 28 Oct 2021
The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.
Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding
In the realm of 3D point clouds, the availability of large datasets is a challenge, which exacerbates the issue of training Transformers for 3D tasks.
A ConvNet for the 2020s
lucidrains/denoising-diffusion-pytorch • • CVPR 2022
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Our Mission
The mission of Papers with Code is to create a free and open resource with Machine Learning papers, code, datasets, methods and evaluation tables.
We believe this is best done together with the community, supported by NLP and ML.
We also operate specialized portals for papers with code in astronomy, physics, computer sciences, mathematics and statistics.
Joining the community
Join our community of thousands of contributors across academia and industry!
You can also follow us and get in touch on
Contributing
Want to submit a new code implementation? Search for the paper title, and then add the implementation on the paper page.
If you are running a competition, you can mirror the competition results on Papers with Code.
Please note that any contribution you make (i.e. linking code or submitting results) will be licensed under the free CC BY-SA licence.
Inclusion policy
To ensure high quality of data, all edits are monitored on Slack on the #recentchanges channel. This is an open channel and everyone is invited to follow and review contributions.
For a result to be included as a benchmark result we require that the paper is published as pre-print, in a conference or a journal. Having code is strongly encouraged but not required so we can capture the latest published results even before the code has been released.
Downloading the data
All data is licenced under the CC BY-SA licence, same as Wikipedia.
Additional data sources
The vast majority of the data is either annotated by the community or ourselves. However, we also included data from other resources that are published under a compatible licence, such as NLP-progress, EFF AI metrics, SQuAD and RedditSota.
For more information on what has been included and how, please see the paperswithcode/sota-extractor repository.
The core Papers with Code team is based in Meta AI Research. The service has been created and is managed by Robert, Ross, Marcin, Elvis, Guillem, Andrew and Thomas.
Papers with Code is a community project. No data is shared with any Meta Platforms product.
All contributions are welcome!
Careers
Interested in working at Papers with Code? Visit our careers page for open positions.
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments
We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and with that opens up the concept of federated learning to more application scenarios in network-constrained environments.
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints.
Improving Diffusion Model Efficiency Through Patching
crowsonkb/k-diffusion • • 9 Jul 2022
Diffusion models are a powerful class of generative models that iteratively denoise samples to produce data.
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Transformers have attracted increasing interests in computer vision, but they still fall behind state-of-the-art convolutional networks.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
timdettmers/bitsandbytes • • 15 Aug 2022
We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance.
TotalSegmentator: robust segmentation of 104 anatomical structures in CT images
wasserth/totalsegmentator • • 11 Aug 2022
Finally, we train a segmentation algorithm on this new dataset.
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
alpa-projects/alpa • • 28 Jan 2022
Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.
LMOT: Efficient Light-Weight Detection and Tracking in Crowds
RanaMostafaAbdElMohsen/LMOT • • IEEE Access 2022
This paper introduces a novel real-time model, LMOT, i.e., Light-weight Multi-Object Tracker, that performs joint pedestrian detection and tracking.
Explaining Explanations: Axiomatic Feature Interactions for Deep Networks
cdpierse/transformers-interpret • • 10 Feb 2020
Integrated Hessians overcomes several theoretical limitations of previous methods to explain interactions, and unlike such previous methods is not limited to a specific architecture or class of neural network.
Topology-aware Convolutional Neural Network for Efficient Skeleton-based Action Recognition
hikvision-research/skelact • • 8 Dec 2021
In particular, we develop a novel cross-channel feature augmentation module, which is a combo of map-attend-group-map operations.
Relighting4D: Neural Relightable Human from Videos
frozenburning/relighting4d • • 14 Jul 2022
Our key insight is that the space-time varying geometry and reflectance of the human body can be decomposed as a set of neural fields of normal, occlusion, diffuse, and specular maps.
Gender Classification and Bias Mitigation in Facial Images
We worked to increase classification accuracy and mitigate algorithmic biases on our baseline model trained on the augmented benchmark database.
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
fundamentalvision/Uni-Perceiver • • 9 Jun 2022
To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Contrastive Audio-Language Learning for Music
In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
xinntao/Real-ESRGAN • • 22 Jul 2021
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.
ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking
alexa/refined • • NAACL (ACL) 2022
The model is capable of generalising to large-scale knowledge bases such as Wikidata (which has 15 times more entities than Wikipedia) and of zero-shot entity linking.
Neural-Symbolic Models for Logical Queries on Knowledge Graphs
DeepGraphLearning/GNN-QE • • ICML 2022
Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning.
Unbiased Multi-Modality Guidance for Image Inpainting
yeates/MMT • • 25 Aug 2022
Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
nvidiagameworks/kaolin-wisp • • 16 Jan 2022
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.
PSA-GAN: Progressive Self Attention GANs for Synthetic Time Series
awslabs/gluon-ts • • ICLR 2022
Realistic synthetic time series data of sufficient length enables practical applications in time series modeling tasks, such as forecasting, but remains a challenge.
Ivy: Templated Deep Learning for Inter-Framework Portability
ivy-dl/ivy • • 4 Feb 2021
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
Neural Volume Rendering: NeRF And Beyond
yenchenlin/awesome-NeRF • • 17 Dec 2020
Besides the COVID-19 pandemic and political upheaval in the US, 2020 was also the year in which neural volume rendering exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. (2020).
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
zhiqi-li/BEVFormer • • 31 Mar 2022
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud
Moreover, even if the parameters are well adjusted, a partial under-segmentation problem can still emerge, which implies ground segmentation failures in some regions.
Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
awslabs/gluon-ts • • 19 Dec 2019
A Simple Baseline for Multi-Camera 3D Object Detection
First, we extract multi-scale features and generate the perspective object proposals on each monocular image.
Learning Visibility for Robust Dense Human Body Estimation
chhankyao/visdb • • 23 Aug 2022
An alternative approach is to estimate dense vertices of a predefined template body in the image space.
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns.
pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning
alibaba/federatedscope • • 8 Jun 2022
Personalized Federated Learning (pFL), which utilizes and deploys distinct local models, has gained increasing attention in recent years due to its success in handling the statistical heterogeneity of FL clients.
Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies
YuehuaZhu/ProxyGML • • NeurIPS 2020
In this paper, we propose a novel Proxy-based deep Graph Metric Learning (ProxyGML) approach from the perspective of graph classification, which uses fewer proxies yet achieves better comprehensive performance.
Papers with code
⏰ AI conference deadline countdowns
2 Updated Aug 18, 2022
Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)
1 Updated Aug 9, 2022
Basic guidance on how to contribute to Papers with Code
0 Updated Mar 28, 2022
The SOTA extractor pipeline
0 Updated Mar 9, 2022
API Client for paperswithcode.com
0 Updated Dec 2, 2021
The full dataset behind paperswithcode.com
0 Updated Oct 8, 2021
Tools for extracting tables and results from Machine Learning papers
0 Updated Jun 23, 2021
Create a source of truth for ML model results and browse it on Papers with Code
0 Updated Jun 9, 2021
Easily evaluate machine learning models on public benchmarks
1 Updated May 21, 2021
Easily benchmark machine learning models in PyTorch
1 Updated Dec 19, 2020
A deep learning approach to predict the number of k-barriers for intrusion detection over a circular region using wireless sensor networks
One of the crucial applications of WSNs is intrusion detection and surveillance at the border areas and in the defense establishments.
Efficient Truncated Linear Regression with Unknown Noise Variance
In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise.
Learning Relational Causal Models with Cycles through Relational Acyclification
We introduce \textit
Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins
We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs.
Adjoint Optimisation for Wind Farm Flow Control with a Free-Vortex Wake Model
The free-vortex wake model with gradient information shows potential for efficient optimisation and provides a promising way to further explore dynamic wind farm flow control.
DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
tangzhy/dptdr • • 24 Aug 2022
We believe this work facilitates the industry, as it saves enormous efforts and costs of deployment and increases the utility of computing resources.
YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception
Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.
PeRFception: Perception using Radiance Fields
POSTECH-CVLab/PeRFception • • 24 Aug 2022
The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.
Self-Supervised Endoscopic Image Key-Points Matching
abenhamadou/Self-Supervised-Endoscopic-Image-Key-Points-Matching • • 24 Aug 2022
Feature matching and finding correspondences between endoscopic images is a key step in many clinical applications such as patient follow-up and generation of panoramic image from clinical sequences for fast anomalies localization.
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
lashoun/hanna-benchmark-asg • • 24 Aug 2022
However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them.
Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild
Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.
Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation
ZhangGongjie/Meta-DETR • • 30 Jul 2022
Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes.
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
thudm/cogvideo • • 29 May 2022
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.
Papers with Code Newsletter #3
👋🏻 Welcome to the 3rd issue of the Papers with Code newsletter! In this edition, we highlight 10 novel applications of Transformers, a new leader on the ImageNet leaderboard, a state-of-the-art BERT model, a trillion parameter language model, a novel pooling method, and much more.
Trending Papers with Code 📄
Meta Pseudo Labels [CV]
Denotes the difference between Pseudo Labels (left) and Meta Pseudo Labels (right). Meta Pseudo Labels shows the teacher being trained along with the student. (Figure source: Pham et al. (2020))
Scaling to Trillion Parameter NLP Models [NLP]
Large scale neural language models have been used to obtain strong performance on a range of NLP tasks, but they are computationally intensive! Mixture of Experts (MoE) models are one approach to scale models through sparse activations, but their use is hindered by communication cost and training instabilities.
Why it matters: Fedus et al. propose the Switch Transformer (based on T5), which simplifies the MoE routing algorithm, resulting in a model that scales pre-training to a trillion parameters (!). It achieves greater computational efficiency to support scaling on three NLP regimes: pre-training, fine-tuning, and multi-task training. It claims a 7x speedup in pre-training with the same computational resources used by T5 variants. The authors also report improvements on multilingual data across 101 languages and enhanced distilled models.
Illustration of a Switch Transformer encoder block where the router independently routes each token across four FFNs. (Figure source: Fedus et al. (2020))
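To make the routing idea concrete, here is a minimal sketch of top-1 ("switch") routing in plain Python. It is not the authors' implementation: the function names, the toy router weights, and the scaling "experts" standing in for FFNs are all hypothetical, and the real model adds capacity limits and a load-balancing loss that are omitted here. Each token is scored by a router, sent to the single highest-probability expert, and the expert output is scaled by that probability.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(tokens, router_weights, experts):
    """Send each token to the single expert with the highest router probability."""
    outputs, choices = [], []
    for x in tokens:
        # Router logits: one dot product per expert.
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
        probs = softmax(logits)
        k = max(range(len(probs)), key=probs.__getitem__)
        # Scale the chosen expert's output by its gate probability.
        outputs.append([probs[k] * v for v in experts[k](x)])
        choices.append(k)
    return outputs, choices

# Toy setup (hypothetical values): four experts, each a simple scaling "FFN".
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
tokens = [[2.0, 0.5], [0.1, 3.0], [-4.0, 0.0]]

outputs, choices = route_top1(tokens, router_weights, experts)
print(choices)  # [0, 1, 2]
```

Because only one expert runs per token, the compute per token stays roughly constant as more experts (and therefore parameters) are added, which is the property the sparse-activation argument above relies on.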
DeBERTa sits atop the SuperGLUE Benchmark [NLP]
There are ongoing efforts to improve the generalization and efficiency of language models like BERT and RoBERTa. DeBERTa (Decoding-enhanced BERT with disentangled attention) is a new architecture based on a disentangled attention mechanism and an enhanced mask decoder.
What’s new: DeBERTa calculates attention weights for words using disentangled matrices of word content and relative positions. The architecture also makes use of the absolute position of words, after all Transformer layers and before the softmax layer, as complementary information to decode masked words. DeBERTa is pre-trained using masked language modeling and fine-tuned using a new virtual adversarial training method. The result is improved performance on many downstream tasks. It is currently the top-performing model on SuperGLUE.
Comparison of the decoding layer of vanilla BERT (left) and proposed Enhanced Mask Decoder (right).
Making VGG-style ConvNets Great Again [CV]
Architecture of RepVGG at inference-time (B) and training-time (C). (Figure source: Ding et al. (2020))
A New Pooling Method for CNN Architectures [DL]
Pooling methods in CNNs decrease the size of the activation maps helping to achieve spatial invariance and increase the receptive field. This paper seeks to improve pooling methods by minimizing loss of information while keeping the memory and computation overhead limited.
Overview of SoftPool operation. (Figure source: Stergiou et al. (2020))
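As a rough illustration of the idea, the sketch below pools a 1-D list with softmax weights inside each window, which is how we understand the SoftPool operation: larger activations dominate the output, but smaller ones still contribute. The 1-D setting, the function name `softpool_1d`, and the sample values are assumptions made for illustration; the paper applies the operation to 2-D activation maps.

```python
import math

def softpool_1d(values, window=2, stride=2):
    """Softmax-weighted pooling over 1-D windows (illustrative sketch)."""
    pooled = []
    for start in range(0, len(values) - window + 1, stride):
        region = values[start:start + window]
        # Softmax weights over the window, shifted by the max for stability.
        m = max(region)
        weights = [math.exp(v - m) for v in region]
        total = sum(weights)
        # Output is the weighted sum of the activations in the window.
        pooled.append(sum(w * v for w, v in zip(weights, region)) / total)
    return pooled

activations = [1.0, 3.0, 0.0, 0.0]
soft = softpool_1d(activations)
print(soft)
```

For the first window, the result falls between the average (2.0) and the maximum (3.0), so some information from the smaller activation is retained instead of being discarded as in max pooling.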
10 Novel Applications using Transformers [DL]
Transformers have had a lot of success in training neural language models. In the past few weeks, we’ve seen several trending papers with code applying Transformers to new types of task:
Community Highlights ✍️
The following are some of the community highlights for this week:
Special thanks to users @htvr, @tienduang, @rrafikova, @donovanOng, @maksymets, @humamalwassel, @zhaochengqi and hundreds of other contributors for several contributions to Papers with Code tasks, methods, and benchmarks results.
More from PwC 🗣
Deep Hyperspectral and Multispectral Image Fusion with Inter-image Variability
The fusion problem is stated as an optimization problem in the maximum a posteriori framework.
Towards an Awareness of Time Series Anomaly Detection Models’ Adversarial Vulnerability
shahroztariq/adversarial-attacks-on-timeseries • • 24 Aug 2022
To the best of our understanding, we demonstrate, for the first time, the vulnerabilities of anomaly detection systems against adversarial attacks.
Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation
jianlong-yuan/semi-mmseg • • 24 Aug 2022
Consistency regularization has been widely studied in recent semi-supervised semantic segmentation methods.
aidotse/stylegan2-ada-pytorch • • 24 Aug 2022
The lack of sufficiently large open medical databases is one of the biggest challenges in AI-powered healthcare.
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments
We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and with that opens up the concept of federated learning to more application scenarios in network-constrained environments.
Addressing Token Uniformity in Transformers via Singular Value Transformation
hanqi-qi/tokenuni • • 24 Aug 2022
In this paper, we propose to use the distribution of singular values of outputs of each transformer layer to characterise the phenomenon of token uniformity and empirically illustrate that a less skewed singular value distribution can alleviate the 'token uniformity' problem.
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
However, CL on VQA involves not only the expansion of label sets (new Answer sets).
A Deep Learning Approach Using Masked Image Modeling for Reconstruction of Undersampled K-spaces
Aopsmath99/MIMMRI • • 24 Aug 2022
The model was evaluated through L1 loss, gradient normalization, and structural similarity values.
EEG4Students: An Experimental Design for EEG Data Collection and Machine Learning Analysis
However, during the COVID-19 pandemic, data collection and analysis could be more challenging.
Applying Eigencontours to PolarMask-Based Instance Segmentation
dnjs3594/Eigencontours • • 24 Aug 2022
Eigencontours are the first data-driven contour descriptors based on singular value decomposition.
Learning Sub-Pixel Disparity Distribution for Light Field Depth Estimation
In our method, we construct the cost volume at sub-pixel level to produce a finer depth distribution and design an uncertainty-aware focal loss to supervise the disparity distribution to be close to the groundtruth one.
A Multi-Head Model for Continual Learning via Out-of-Distribution Replay
k-gyuhak/more • • 20 Aug 2022
Instead of using the saved samples in memory to update the network for previous tasks/classes in the existing approach, MORE leverages the saved samples to build a task specific classifier (adding a new classification head) without updating the network learned for previous tasks/classes.
PARSE challenge 2022: Pulmonary Arteries Segmentation using Swin U-Net Transformer(Swin UNETR) and U-Net
akansh12/parse2022 • • 20 Aug 2022
In this work, we present our proposed method to segment the pulmonary arteries from the CT scans using Swin UNETR and U-Net-based deep neural network architecture.
Quo Vadis: Hybrid Machine Learning Meta-Model based on Contextual and Behavioral Malware Representations
dtrizna/quo.vadis • • 20 Aug 2022
The detection heuristic in contemporary machine learning Windows malware classifiers is typically based on the static properties of the sample since dynamic analysis through virtualization is challenging for vast quantities of samples.
UniCausal: Unified Benchmark and Model for Causal Text Mining
Therefore, we proposed UniCausal, a unified benchmark for causal text mining across three tasks: Causal Sequence Classification, Cause-Effect Span Detection and Causal Pair Classification.
Curbing Task Interference using Representation Similarity-Guided Multi-Task Feature Sharing
neurai-lab/progressivedecoderfusion • • 19 Aug 2022
However, increased sharing exposes more parameters to task interference which likely hinders both generalization and robustness.
Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data
Investment professionals rely on extrapolating company revenue into the future (i.e., revenue forecast) to approximate the valuation of scaleups (private companies in a high-growth stage) and inform their investment decision.
Dialogue Policies for Confusion Mitigation in Situated HRI
Confusion is a mental state triggered by cognitive disequilibrium that can occur in many types of task-oriented interaction, including Human-Robot Interaction (HRI).
Evaluating Explainability for Graph Neural Networks
mims-harvard/graphxai • • 19 Aug 2022
As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.
Diverse Video Captioning by Adaptive Spatio-temporal Attention
zohrehghaderi/vasta • • 19 Aug 2022
To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip.
A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression
We introduce a new empirical Bayes approach for large-scale multiple linear regression.
gemslab/caper • • 23 Aug 2022
Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains.
A Constrained Deformable Convolutional Network for Efficient Single Image Dynamic Scene Blind Deblurring with Spatially-Variant Motion Blur Kernels Estimation
Most existing deep-learning-based single image dynamic scene blind deblurring (SIDSBD) methods usually design deep networks to directly remove the spatially-variant motion blurs from one inputted motion blurred image, without blur kernels estimation.
Unsupervised Question Answering via Answer Diversifying
Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models.
Neural PCA for Flow-Based Representation Learning
Multi-Modal Representation Learning with Self-Adaptive Thresholds for Commodity Verification
hanchenchen/ccks2022-track2-solution • • 23 Aug 2022
In this paper, we propose a method to identify identical commodities.
Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
kaminyou/urust • • 23 Aug 2022
Hence, we proposed a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage.
Retinal Structure Detection in OCTA Image via Voting-based Multi-task Learning
imed-lab/vaff-net • • 23 Aug 2022
Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and clinical decision-making.
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results
The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Consistency Regularization for Domain Adaptation
kw01sg/crda • • 23 Aug 2022
Collection of real world annotations for training semantic segmentation models is an expensive process.
Prompting as Probing: Using Language Models for Knowledge Base Construction
ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this.
DeepInteraction: 3D Object Detection via Modality Interaction
Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy.
VILT: Video Instructions Linking for Complex Tasks
This work addresses challenges in developing conversational assistants that support rich multimodal video interactions to accomplish real-world tasks interactively.
Adversarial Feature Augmentation for Cross-domain Few-shot Classification
youthhoo/afa_for_few_shot_learning • • 23 Aug 2022
Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes.
Bitext Mining for Low-Resource Languages via Contrastive Learning
steventan0110/align-filter • • 23 Aug 2022
Mining high-quality bitexts for low-resource languages is challenging.
Inter- and Intra-Series Embeddings Fusion Network for Epidemiological Forecasting
xiefeng69/sefnet • • 23 Aug 2022
In Inter-Series Embedding Module, a multi-scale unified convolution component called Region-Aware Convolution is proposed, which cooperates with self-attention to capture dynamic dependencies between time series obtained from multiple regions.
Collective targeted migrations: a balancing act involving aggregation, group size and environmental clues: a simulation study
What is behind the \emph
Solving Royal Game of Ur Using Reinforcement Learning
Reinforcement Learning has recently surfaced as a very powerful tool to solve complex problems in the domain of board games, wherein an agent is generally required to learn complex strategies and moves based on its own experiences and rewards received.
Survey of Machine Learning Techniques To Predict Heartbeat Arrhythmias
Many works in biomedical computer science research use machine learning techniques to give accurate results.
ExpoCloud: a Framework for Time and Budget-Effective Parameter Space Explorations Using a Cloud Compute Engine
Large parameter space explorations are among the most time consuming yet critically important tasks in many fields of modern research.
Distributed, Parallel, and Cluster Computing
PREVENT: An Unsupervised Approach to Predict Software Failures in Production
This paper presents PREVENT, an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques.
Synthetic End-User Testing: Modeling Realistic Agents Based on Behavioral Examples
For software interacting directly with real-world end-users, it is common practice to script scenario tests validating the system’s compliance with a number of its features.
Motif-Based Visual Analysis of Dynamic Networks
The network census captures significantly occurring motifs compared to their expected occurrences in random networks and exposes structural changes in a dynamic network.
Social and Information Networks Human-Computer Interaction
Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras
In this paper, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras.
DynaVINS: A Visual-Inertial SLAM for Dynamic Environments
Then, a keyframe grouping and a multi-hypothesis-based constraints grouping methods are proposed to reduce the effect of temporarily static objects in the loop closing.
Collective Intelligence in Human-AI Teams: A Bayesian Theory of Mind Approach
In this paper, we develop a network of Bayesian agents that collectively model a team’s mental states from the team’s observed communication.
What are the Practices for Secret Management in Software Artifacts?
The goal of our paper is to aid practitioners in avoiding the exposure of secrets by identifying secret management practices in software artifacts through a systematic derivation of practices disseminated in Internet artifacts.
Software Engineering Cryptography and Security 68-01
SAP Signavio Academic Models: A Large Process Model Dataset
In this paper, we introduce the SAP Signavio Academic Models (SAP-SAM) dataset, a collection of hundreds of thousands of business models, mainly process models in BPMN notation.
Other Computer Science Software Engineering
An open dataset of scholars on Twitter
This paper presents a novel and simple approach to match authors from OpenAlex with Twitter users identified in Crossref Event Data.
A novel approach for Fair Principal Component Analysis based on eigendecomposition
Principal component analysis (PCA), a ubiquitous dimensionality reduction technique in signal processing, searches for a projection matrix that minimizes the mean squared error between the reduced dataset and the original one.
A Bayesian Variational principle for dynamic Self Organizing Maps
anthony-neo/vdsom • • 24 Aug 2022
We propose organisation conditions that yield a method for training SOM with adaptative neighborhood radius in a variational Bayesian framework.
Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution
The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI.
DCSF: Deep Convolutional Set Functions for Classification of Asynchronous Time Series
Because of the asynchronous nature, they pose a significant challenge to deep learning architectures, which presume that the time series presented to them are regularly sampled, fully observed, and aligned with respect to time.
Unrestricted Black-box Adversarial Attack Using GAN with Limited Queries
ndb796/latenthsja • • 24 Aug 2022
First, we demonstrate that our targeted attack method is query-efficient to produce unrestricted adversarial examples for a facial identity recognition model that contains 307 identities.
Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization
1998v7/self-filtering • • 24 Aug 2022
Sample selection is an effective strategy to mitigate the effect of label noise in robust learning.
Sliding Window Recurrent Network for Efficient Video Super-Resolution
Different from single image super-resolution, VSR can utilize frames’ temporal information to reconstruct results with more details.
RZSR: Reference-based Zero-Shot Super-Resolution with Depth Guided Self-Exemplars
To advance ZSSR, we obtain reference image patches with rich textures and high-frequency details which are also extracted only from the input image using cross-scale matching.
Tracking by weakly-supervised learning and graph optimization for whole-embryo C. elegans lineages
Our work specifically addresses the following challenging properties of C. elegans embryo recordings: (1) Many cell divisions as compared to benchmark recordings of other organisms, and (2) the presence of polar bodies that are easily mistaken as cell nuclei.
On the Design of Privacy-Aware Cameras: a Study on Deep Neural Networks
At the same time, we ensure that useful non-sensitive data can still be extracted from distorted images.
Multi-domain Learning for Updating Face Anti-spoofing Models
In this work, we study multi-domain learning for face anti-spoofing (MD-FAS), where a pre-trained FAS model needs to be updated to perform equally well on both source and target domains while only using target domain data for updating.
Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents
Tables are widely used in several types of documents since they can bring important information in a structured way.
Exact Penalty Method for Federated Learning
Federated learning has burgeoned recently in machine learning, giving rise to a variety of research topics.
Enhancing User Behavior Sequence Modeling by Generative Tasks for Session Search
haon-chen/ase-official • • 23 Aug 2022
To help the encoding of the current user behavior sequence, we propose to use a decoder and the information of future sequences and a supplemental query.
Distance-Aware Occlusion Detection with Focused Attention
yang-li-2000/distance-aware-occlusion-detection-with-focused-attention • • 23 Aug 2022
In this work, (1) we propose a novel three-decoder architecture as the infrastructure for focused attention; (2) we use the generalized intersection box prediction task to effectively guide our model to focus on occlusion-specific regions; and (3) our model achieves a new state-of-the-art performance on distance-aware relationship detection.
IMPaSh: A Novel Domain-shift Resistant Representation for Colorectal Cancer Tissue Classification
trinhvg/impash • • 23 Aug 2022
The appearance of histopathology images depends on tissue type, staining and digitization procedure.
Hierarchical Perceptual Noise Injection for Social Media Fingerprint Privacy Protection
nlsde-safety-team/fingersafe • • 23 Aug 2022
The threat of fingerprint leakage from social media raises a strong desire for anonymizing shared images while maintaining image qualities, since fingerprints act as a lifelong individual biometric password.
Data augmentation on graphs for table type classification
Tables are widely used in documents because of their compact and structured representation of information.
Multimodal Across Domains Gaze Target Detection
francescotonini/multimodal-across-domains-gaze-target-detection • • 23 Aug 2022
This paper addresses the gaze target detection problem in single images captured from the third-person perspective.
SurvSHAP(t): Time-dependent explanations of machine learning survival models
Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME.
Inductive Knowledge Graph Reasoning for Multi-batch Emerging Entities
We propose a walk-based inductive reasoning model to tackle the new setting.
Incorporating Rivalry in Reinforcement Learning for a Competitive Game
Recent advances in reinforcement learning with social agents have allowed such models to achieve human-level performance on specific interaction tasks.
A simple learning agent interacting with an agent-based market model
We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model.
FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning
siyi-wind/FairDisCo • • 22 Aug 2022
Deep learning models have achieved great success in automating skin lesion diagnosis.
Deep 3D Vessel Segmentation based on Cross Transformer Network
qibaolian/ctn • • 22 Aug 2022
In CTN, a transformer module is constructed in parallel to a U-Net to learn long-distance dependencies between different anatomical regions; and these dependencies are communicated to the U-Net at multiple stages to endow it with global awareness.
Semi-supervised classification using a supervised autoencoder for biomedical applications
cypriengille/semi-supervised-autoencoder • • 22 Aug 2022
Experiments show that the SSAE outperforms Label Propagation and Spreading and the Fully Connected Neural Network both on a synthetic dataset and on two real-world biological datasets.
FedOS: using open-set learning to stabilize training in federated learning
mohamad-m2/federated-learning • • 22 Aug 2022
Federated Learning is a recent approach to train statistical models on distributed datasets without violating privacy constraints.
Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition
1980x/dnfer • • 22 Aug 2022
To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training.
High-quality Task Division for Large-scale Entity Alignment
To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models.
LTE4G: Long-Tail Experts for Graph Neural Networks
SukwonYun/LTE4G • • 22 Aug 2022
After having trained an expert for each balanced subset, we adopt knowledge distillation to obtain two class-wise students, i.e., a Head class student and a Tail class student, which are responsible for classifying nodes in the head classes and tail classes, respectively.