Papers with code

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • 24 Aug 2022

The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments, since it does not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and has the highest accuracy, 56.8% AP, among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization.

ResT: An Efficient Transformer for Visual Recognition

wofmanaf/ResT • NeurIPS 2021

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably serves as a general-purpose backbone for image recognition.

007: Democratically Finding The Cause of Packet Drops

Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur.

A General Approach for Using Deep Neural Network for Digital Watermarking

Technologies of the Internet of Things (IoT) facilitate digital contents such as images being acquired in a massive way.

DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation

HaoyueBaiZJU/DecAug • 17 Dec 2020

To address that, we propose DecAug, a novel decomposed feature representation and semantic augmentation approach for OoD generalization.

Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools

Big data has been used widely in many areas including the transportation industry.

MS-MDA: Multisource Marginal Distribution Adaptation for Cross-subject and Cross-session EEG Emotion Recognition

VoiceBeer/MS-MDA • 16 Jul 2021

Although several studies have adopted domain adaptation (DA) approaches to tackle this problem, most of them treat multiple EEG data from different subjects and sessions together as a single source domain for transfer. This either fails to satisfy the assumption of domain adaptation that the source has a certain marginal distribution, or increases the difficulty of adaptation.

Long-Tailed Recognition via Weight Balancing

shadealsha/ltr-weight-balancing • CVPR 2022

In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius.
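The contrast drawn above can be made concrete in a short NumPy sketch; an illustration of the two regularizers only, not the authors' training code, and the function names here are ours:

```python
import numpy as np

def maxnorm_project(W, radius=1.0):
    """MaxNorm constraint (illustrative sketch): each row (per-class weight
    vector) with norm <= radius is left free to grow; larger ones are
    rescaled back onto the norm ball so their norm equals the radius."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W * scale

def weight_decay_step(W, lam=0.01, lr=0.1):
    """Weight decay, by contrast, penalizes larger weights more heavily:
    the shrinkage lr * lam * W is proportional to the weight itself."""
    return W - lr * lam * W
```

Applied after each optimizer step, the MaxNorm projection caps large per-class weights at the radius while leaving small ones untouched, which is the balancing effect the abstract describes.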

Zero-Shot Logit Adjustment

cdb342/ijcai-2022-zla • 25 Apr 2022

As a consequence of our derivation, the aforementioned two properties are incorporated into the classifier training as seen-unseen priors via logit adjustment.

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

nvlabs/minvis • 3 Aug 2022

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Understanding Diffusion Models: A Unified Perspective

no code yet • 25 Aug 2022

Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2.

PEER: A Collaborative Language Model

no code yet • 24 Aug 2022

Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes.

The Alberta Plan for AI Research

no code yet • 23 Aug 2022

Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan.

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

microsoft/unilm • 22 Aug 2022

A big convergence of language, vision, and multimodal pretraining is emerging.

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

no code yet • 25 Aug 2022

Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes.

Bugs in the Data: How ImageNet Misrepresents Biodiversity

We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect.

Efficient Planning in a Compact Latent Action Space

ZhengyaoJiang/latentplan • 22 Aug 2022

While planning-based sequence modelling methods have shown great potential in continuous control, scaling them to high-dimensional state-action sequences remains an open challenge due to the high computational complexity and innate difficulty of planning in high-dimensional spaces.

JAXFit: Trust Region Method for Nonlinear Least-Squares Curve Fitting on the GPU

dipolar-quantum-gases/jaxfit • 25 Aug 2022

We implement a trust region method on the GPU for nonlinear least squares curve fitting problems using a new deep learning Python library called JAX.
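The underlying optimization problem can be sketched as follows; this is a plain NumPy Gauss-Newton illustration with a finite-difference Jacobian, not JAXFit's actual API (JAXFit uses a trust region method with JAX's autodiff and JIT compilation on the GPU):

```python
import numpy as np

def gauss_newton_fit(model, p0, x, y, n_iter=50):
    """Minimal Gauss-Newton least-squares sketch (illustrative only).
    model(x, p) -> predicted y; minimizes ||y - model(x, p)||^2."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        f = model(x, p)
        r = y - f                                 # residuals
        # finite-difference Jacobian of the model w.r.t. the parameters
        J = np.empty((x.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = 1e-6
            J[:, j] = (model(x, p + dp) - f) / 1e-6
        # Gauss-Newton step from the normal equations: (J^T J) delta = J^T r
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        p = p + delta
    return p
```

A trust region method additionally bounds the step `delta` and adapts that bound from how well the quadratic model predicts the actual residual decrease, which is the robustness (and GPU speed, via JAX) that the paper targets.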

Supervised Contrastive Learning for Affect Modelling

Traditionally, affect modeling is viewed as the process of mapping measurable affect manifestations from multiple modalities of user input to affect labels.

The ReprGesture entry to the GENEA Challenge 2022

YoungSeng/ReprGesture • 25 Aug 2022

This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.

Supervised Dimensionality Reduction and Classification with Convolutional Autoencoders

It turned out that this methodology can also be greatly beneficial in enforcing explainability of deep learning architectures.

Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding

In the realm of 3D point clouds, the availability of large datasets is a challenge, which exacerbates the issue of training Transformers for 3D tasks.

Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models

To address this problem, we propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models, which have high flexibility.

Masked Autoencoders Enable Efficient Knowledge Distillers

For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.

A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

In this paper, we empirically investigate the applications of carefully selected RL algorithms on two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing.

Adaptive Perception Transformer for Temporal Action Localization

soupero/adaperformer • 25 Aug 2022

Moreover, their multi-stage designs cannot generate action boundaries and categories directly.

Contrastive Audio-Language Learning for Music

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

Datasets

201 dataset results for 3D

ShapeNet

ShapeNet is a large-scale repository for 3D CAD models developed by researchers from Stanford University, Princeton University and the Toyota Technological Institute at Chicago, USA. The repository contains over 300M models, with 220,000 classified into 3,135 classes arranged using WordNet hypernym-hyponym relationships. The ShapeNet Parts subset contains 31,693 meshes categorised into 16 common object classes (e.g. table, chair, plane). Each shape's ground truth contains 2-5 parts (with a total of 50 part classes).

1,063 PAPERS • 10 BENCHMARKS

ModelNet40

The ModelNet40 dataset contains synthetic object point clouds. As the most widely used benchmark for point cloud analysis, ModelNet40 is popular because of its various categories, clean shapes, and well-constructed dataset. The original ModelNet40 consists of 12,311 CAD-generated meshes in 40 categories (such as airplane, car, plant, lamp), of which 9,843 are used for training while the remaining 2,468 are reserved for testing. The corresponding point cloud data points are uniformly sampled from the mesh surfaces, and then further preprocessed by moving to the origin and scaling into a unit sphere.

854 PAPERS • 7 BENCHMARKS
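The preprocessing described above (centering at the origin, then scaling into a unit sphere) amounts to a few lines of NumPy; a minimal sketch, not the dataset authors' exact script:

```python
import numpy as np

def normalize_point_cloud(points):
    """Center an (N, 3) point cloud at the origin and scale it into the
    unit sphere, as in the ModelNet40 preprocessing described above."""
    centered = points - points.mean(axis=0)           # move centroid to origin
    radius = np.linalg.norm(centered, axis=1).max()   # farthest point from origin
    return centered / radius                          # all points inside the unit sphere
```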

nuScenes

The nuScenes dataset is a large-scale autonomous driving dataset. The dataset has 3D bounding boxes for 1000 scenes collected in Boston and Singapore. Each scene is 20 seconds long and annotated at 2Hz. This results in a total of 28130 samples for training, 6019 samples for validation and 6008 samples for testing. The dataset has the full autonomous vehicle data suite: 32-beam LiDAR, 6 cameras and radars with complete 360° coverage. The 3D object detection challenge evaluates the performance on 10 classes: cars, trucks, buses, trailers, construction vehicles, pedestrians, motorcycles, bicycles, traffic cones and barriers.

580 PAPERS • 12 BENCHMARKS

S3DIS

The Stanford 3D Indoor Scene Dataset (S3DIS) dataset contains 6 large-scale indoor areas with 271 rooms. Each point in the scene point cloud is annotated with one of the 13 semantic categories.

248 PAPERS • 6 BENCHMARKS

Matterport3D

The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside 90 real building-scale scenes, constructed from 194,400 RGB-D images. Each scene is a residential building consisting of multiple rooms and floor levels, and is annotated with surface construction, camera poses, and semantic segmentation.

227 PAPERS • 4 BENCHMARKS

Pascal3D+

The Pascal3D+ multi-view dataset consists of images in the wild, i.e., images of object categories exhibiting high variability, captured under uncontrolled settings, in cluttered scenes and under many different poses. Pascal3D+ contains 12 categories of rigid objects selected from the PASCAL VOC 2012 dataset. These objects are annotated with pose information (azimuth, elevation and distance to camera). Pascal3D+ also adds pose annotated images of these 12 categories from the ImageNet dataset.

195 PAPERS • 1 BENCHMARK

3DPW

The 3D Poses in the Wild dataset is the first in-the-wild dataset with accurate 3D poses for evaluation. While other outdoor datasets exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.

176 PAPERS • 4 BENCHMARKS

SUNCG

SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.

170 PAPERS • NO BENCHMARKS YET

Waymo Open Dataset

The Waymo Open Dataset is comprised of high resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.

169 PAPERS • 9 BENCHMARKS

MPI-INF-3DHP

MPI-INF-3DHP is a 3D human body pose estimation dataset consisting of both constrained indoor and complex outdoor scenes. It records 8 actors performing 8 activities from 14 camera views, and consists of over 1.3M frames captured from the 14 cameras.

159 PAPERS • 4 BENCHMARKS

ShapeNetCore

ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D models and manually verified category and alignment annotations. It covers 55 common object categories with about 51,300 unique 3D models. The 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset, are all covered by ShapeNetCore.

115 PAPERS • 1 BENCHMARK

AMASS

AMASS is a large database of human motion unifying different optical marker-based motion capture datasets by representing them within a common framework and parameterization. AMASS is readily useful for animation, visualization, and generating training data for deep learning.

108 PAPERS • 1 BENCHMARK

Chairs

The Chairs dataset contains rendered images of around 1000 different three-dimensional chair models.

106 PAPERS • 1 BENCHMARK

ScanObjectNN

ScanObjectNN is a newly published real-world dataset comprising 2,902 3D objects in 15 categories. It is a challenging point cloud classification dataset due to background clutter, missing parts, and deformations.

96 PAPERS • 2 BENCHMARKS

Pix3D

The Pix3D dataset is a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc.

94 PAPERS • 5 BENCHMARKS

Replica

The Replica Dataset is a dataset of high quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry, high resolution and high dynamic range textures, glass and mirror surface information, planar segmentation as well as semantic class and instance segmentation.

94 PAPERS • 2 BENCHMARKS

AFLW2000-3D

AFLW2000-3D is a dataset of 2000 images that have been annotated with image-level 68-point 3D facial landmarks. This dataset is used for evaluating 3D facial landmark detection models. The head poses are very diverse and often hard for a CNN-based face detector to detect.

92 PAPERS • 8 BENCHMARKS

smallNORB

The smallNORB dataset is a dataset for 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees). The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).

86 PAPERS • 1 BENCHMARK
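The viewpoint grid described above is easy to enumerate; a sanity check of the stated counts, not an official loader:

```python
# smallNORB viewpoint grid as described above (illustrative only).
elevations = list(range(30, 75, 5))   # 30 to 70 degrees, every 5 degrees -> 9 values
azimuths = list(range(0, 360, 20))    # 0 to 340 degrees, every 20 degrees -> 18 values
lightings = list(range(6))            # 6 lighting conditions
# each toy instance is imaged by two cameras under every combination
views_per_instance = 2 * len(elevations) * len(azimuths) * len(lightings)
```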

SUN3D

SUN3D contains a large-scale RGB-D video database, with 8 annotated sequences. Each frame has a semantic segmentation of the objects in the scene and information about the camera pose. It is composed by 415 sequences captured in 254 different spaces, in 41 different buildings. Moreover, some places have been captured multiple times at different moments of the day.

83 PAPERS • NO BENCHMARKS YET

PartNet

PartNet is a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. The dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others.

81 PAPERS • 1 BENCHMARK

FaceWarehouse

FaceWarehouse is a 3D facial expression database that provides the facial geometry of 150 subjects, covering a wide range of ages and ethnic backgrounds.

78 PAPERS • NO BENCHMARKS YET

BP4D-Spontaneous

The BP4D-Spontaneous dataset is a 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The database includes forty-one participants (23 women, 18 men). They were 18 to 29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit emotions of participants effectively. Eight tasks were covered with an interview process and a series of activities to elicit eight emotions. The database is structured by participants. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. As well, the metadata include manually annotated

69 PAPERS • 3 BENCHMARKS

FreiHAND

FreiHAND is a 3D hand pose dataset which records different hand actions performed by 32 people. For each hand image, MANO-based 3D hand pose annotations are provided. It currently contains 32,560 unique training samples and 3960 unique samples for evaluation. The training samples are recorded with a green screen background allowing for background removal. In addition, it applies three different post processing strategies to training samples for data augmentation. However, these post processing strategies are not applied to evaluation samples.

57 PAPERS • 1 BENCHMARK

Aachen Day-Night

Aachen Day-Night is a dataset designed for benchmarking 6DOF outdoor visual localization in changing conditions. It focuses on localizing high-quality night-time images against a day-time 3D model. There are 14,607 images with changing conditions of weather, season and day-night cycles.

54 PAPERS • 1 BENCHMARK

T-LESS

T-LESS is a dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from

51 PAPERS • 2 BENCHMARKS

Semantic3D

Semantic3D is a point cloud dataset of scanned outdoor scenes with over 3 billion points. It contains 15 training and 15 test scenes annotated with 8 class labels. This large labelled 3D point cloud dataset covers a range of diverse urban scenes: churches, streets, railroad tracks, squares, villages, soccer fields, and castles, to name just a few. The point clouds provided are scanned statically with state-of-the-art equipment and contain very fine details.

48 PAPERS • 1 BENCHMARK

CoMA

CoMA contains 17,794 meshes of the human face in various expressions.

Datasets

91 dataset results for Tabular

IMDb Movie Reviews

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.

1,058 PAPERS • 7 BENCHMARKS

MovieLens

The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.

738 PAPERS • 10 BENCHMARKS

The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.

560 PAPERS • 7 BENCHMARKS

NAS-Bench-101 is the first public architecture dataset for NAS research. To build NAS-Bench-101, the authors carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. The authors trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the precomputed dataset.

93 PAPERS • 1 BENCHMARK

The Netflix Prize dataset consists of about 100,000,000 ratings for 17,770 movies given by 480,189 users. Each rating in the training dataset consists of four entries: user, movie, date of grade, grade. Users and movies are represented by integer IDs, while ratings range from 1 to 5.

90 PAPERS • NO BENCHMARKS YET
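The four-field rating entries above can be modeled as below; the CSV-style layout and the ID values are illustrative, not the exact on-disk format:

```python
from datetime import date

def parse_rating(line):
    """Parse one training entry: user, movie, date of grade, grade.

    Users and movies are integer IDs; grades range from 1 to 5.
    """
    user_id, movie_id, graded_on, grade = line.strip().split(",")
    grade = int(grade)
    if not 1 <= grade <= 5:
        raise ValueError("ratings range from 1 to 5")
    return int(user_id), int(movie_id), date.fromisoformat(graded_on), grade

rec = parse_rating("1488844,1,2005-09-06,3")  # illustrative values
```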

UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, worms, backdoors, and fuzzers. The dataset contains raw network packets. The training set comprises 175,341 records and the testing set 82,332 records, covering both attack and normal traffic.

68 PAPERS • 2 BENCHMARKS

WikiTableQuestions is a question answering dataset over semi-structured tables. It comprises question-answer pairs on HTML tables, constructed by selecting data tables from Wikipedia that contained at least 8 rows and 5 columns. Amazon Mechanical Turk workers were then tasked with writing trivia questions about each table. WikiTableQuestions contains 22,033 questions. The questions were not generated from predefined templates but were handcrafted by users, demonstrating high linguistic variance. Compared to previous datasets on knowledge bases, it covers nearly 4,000 unique column headers, containing far more relations than closed-domain datasets and datasets for querying knowledge bases. Its questions cover a wide range of domains, requiring operations such as table lookup, aggregation, superlatives (argmax, argmin), arithmetic operations, joins and unions.

36 PAPERS • 1 BENCHMARK
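A superlative ("argmax") question of the kind these tables require can be answered over a toy table like this; the rows and values are invented for illustration:

```python
# Toy semi-structured table; cities and populations are invented values.
table = [
    {"city": "Oslo", "population": 634_293},
    {"city": "Bergen", "population": 271_949},
    {"city": "Trondheim", "population": 186_364},
]

# "Which city has the largest population?" -- a superlative (argmax) query.
answer = max(table, key=lambda row: row["population"])["city"]
print(answer)  # Oslo
```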

The Yahoo! Learning to Rank Challenge dataset consists of 709,877 documents encoded in 700 features and sampled from query logs of the Yahoo! search engine, spanning 29,921 queries.

23 PAPERS • NO BENCHMARKS YET

The friedman1 data set is commonly used to test semi-supervised regression methods.

22 PAPERS • NO BENCHMARKS YET

CAL500 (Computer Audition Lab 500) is a dataset aimed for evaluation of music information retrieval systems. It consists of 502 songs picked from western popular music. The audio is represented as a time series of the first 13 Mel-frequency cepstral coefficients (and their first and second derivatives) extracted by sliding a 12 ms half-overlapping short-time window over the waveform of each song. Each song has been annotated by at least 3 people with 135 musically-relevant concepts spanning six semantic categories:

18 PAPERS • NO BENCHMARKS YET

This dataset contains card descriptions of the card game Hearthstone and the code that implements them. These are obtained from the open-source implementation Hearthbreaker (https://github.com/danielyule/hearthbreaker).

17 PAPERS • NO BENCHMARKS YET

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))

16 PAPERS • 1 BENCHMARK
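The extraction conditions quoted above translate directly into a record filter; the field names follow the quoted condition, and the sample records are invented:

```python
def keep_record(rec):
    """Becker's cleaning conditions for the 1994 Census extraction:
    (AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)."""
    return (rec["AAGE"] > 16 and rec["AGI"] > 100
            and rec["AFNLWGT"] > 1 and rec["HRSWK"] > 0)

# Invented example record: a 39-year-old working 40 hours/week is kept.
ok = keep_record({"AAGE": 39, "AGI": 77_516, "AFNLWGT": 83_311, "HRSWK": 40})
```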

The T2Dv2 dataset consists of 779 tables originating from the English-language subset of the WebTables corpus. 237 tables are annotated for the Table Type Detection task, 236 for the Columns Property Annotation (CPA) task and 235 for the Row Annotation task. The annotations that are used are DBpedia types, properties and entities.

10 PAPERS • 4 BENCHMARKS

Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".

9 PAPERS • NO BENCHMARKS YET

The Amazon-Google dataset for entity resolution derives from the online retailers Amazon.com and the product search service of Google accessible through the Google Base Data API. The dataset contains 1363 entities from amazon.com and 3226 google products as well as a gold standard (perfect mapping) with 1300 matching record pairs between the two data sources. The common attributes between the two data sources are: product name, product description, manufacturer and price.

8 PAPERS • 1 BENCHMARK

The Abt-Buy dataset for entity resolution derives from the online retailers Abt.com and Buy.com. The dataset contains 1081 entities from abt.com and 1092 entities from buy.com as well as a gold standard (perfect mapping) with 1097 matching record pairs between the two data sources. The common attributes between the two data sources are: product name, product description and product price.

7 PAPERS • 1 BENCHMARK

Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (with corresponding label "match" or "no match") for four product categories: computers, cameras, watches and shoes.

7 PAPERS • 4 BENCHMARKS

The ToughTables (2T) dataset was created for the SemTab challenge and includes 180 tables in total. The tables in this dataset can be categorized in two groups: the control (CTRL) group tables and tough (TOUGH) group tables.

6 PAPERS • 4 BENCHMARKS

Retrospectively collected medical data offers the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner that protects patient privacy.

4 PAPERS • NO BENCHMARKS YET

ACS PUMS stands for American Community Survey (ACS) Public Use Microdata Sample (PUMS) and has been used to construct several tabular datasets for studying fairness in machine learning:

3 PAPERS • NO BENCHMARKS YET

3 PAPERS • NO BENCHMARKS YET

Open Dataset: Mobility Scenario FIMU

3 PAPERS • NO BENCHMARKS YET

The Papers with Code Leaderboards dataset is a collection of over 5,000 results capturing performance of machine learning models. Each result is a tuple of form (task, dataset, metric name, metric value). The data was collected using the Papers with Code review interface.

3 PAPERS • 1 BENCHMARK
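Each result's (task, dataset, metric name, metric value) form can be sketched as a record type; the example values below are invented, not an actual leaderboard row:

```python
from typing import NamedTuple

class Result(NamedTuple):
    """One leaderboard result: (task, dataset, metric name, metric value)."""
    task: str
    dataset: str
    metric_name: str
    metric_value: float

# Invented example entry for illustration only.
r = Result("Image Classification", "ImageNet", "Top-1 Accuracy", 88.5)
```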

VizNet-Sato is a dataset from the authors of Sato and is based on the VizNet dataset. The authors choose from VizNet only relational web tables with headers matching their selected 78 DBpedia semantic types. The selected tables are divided into two categories: Full tables and Multi-column only tables. The first category corresponds to 78,733 selected tables from VizNet, while the second category includes 32,265 tables which have more than one column. The tables of both categories are divided into 5 subsets to be able to conduct 5-fold cross validation: 4 subsets are used for training and the last for evaluation.

3 PAPERS • 2 BENCHMARKS
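The 5-subset split above (4 subsets for training, 1 for evaluation) can be sketched as follows; this simplified version assigns tables round-robin and does not shuffle:

```python
def five_fold_splits(tables):
    """Yield (train, eval) pairs for 5-fold cross-validation:
    4 subsets train, the remaining one evaluates."""
    folds = [tables[i::5] for i in range(5)]  # round-robin assignment
    for k in range(5):
        train = [t for i, fold in enumerate(folds) if i != k for t in fold]
        yield train, folds[k]

splits = list(five_fold_splits(list(range(10))))  # 10 stand-in "tables"
```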

The WikiTables-TURL dataset was constructed by the authors of TURL and is based on the WikiTable corpus, which is a large collection of Wikipedia tables. The dataset consists of 580,171 tables divided into fixed training, validation and testing splits. Additionally, the dataset contains metadata about each table, such as the table name, table caption and column headers.

3 PAPERS • 3 BENCHMARKS

The WikipediaGS dataset was created by extracting Wikipedia tables from Wikipedia pages. It consists of 485,096 tables which were annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.

3 PAPERS • 2 BENCHMARKS

A coronavirus dataset with 98 countries constructed from different reliable sources, where each row represents a country and the columns represent geographic, climate, healthcare, economic, and demographic factors that may contribute to accelerating or slowing the spread of COVID-19. The assumptions for the different factors are as follows:

2 PAPERS • NO BENCHMARKS YET

This resource, our Concepticon, links concept labels from different concept lists to concept sets. Each concept set is given a unique identifier, a unique label, and a human-readable definition. Concept sets are further structured by defining different relations between the concepts, as you can see in the graphic to the right, which displays the relations between concept sets linked to the concept set SIBLING. The resource can be used for various purposes. Serving as a rich reference for new and existing databases in diachronic and synchronic linguistics, it allows researchers quick access to studies on semantic change, cross-linguistic polysemies, and semantic associations.

2 PAPERS • NO BENCHMARKS YET

2 PAPERS • 1 BENCHMARK

The GitTables-SemTab dataset is a subset of the GitTables dataset and was created to be used during the SemTab challenge. The dataset consists of 1101 tables and is used to benchmark the Column Type Annotation (CTA) task.

2 PAPERS • 2 BENCHMARKS

The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

2 PAPERS • 1 BENCHMARK
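The quoted imbalance figure is a one-line sanity check:

```python
# Sanity-check the class imbalance quoted above.
frauds, total = 492, 284_807
positive_rate = frauds / total  # fraction of fraudulent transactions
print(f"{positive_rate:.4%}")   # roughly 0.17% of all transactions
```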

The original dataset was provided by Orange telecom in France, which contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with time granularity of 30 minutes and spatial granularity of 6 coarse regions in Paris, France. In other words, it represents a multivariate time series dataset.

2 PAPERS • NO BENCHMARKS YET

2 PAPERS • NO BENCHMARKS YET

The softwarised network data zoo (SNDZoo) is an open collection of software networking data sets aiming to streamline and ease machine learning research in the software networking domain. Most of the published data sets focus on, but are not limited to, the performance of virtualised network functions (VNFs). The data is collected using fully automated NFV benchmarking frameworks, such as tng-bench, developed by us or third party solutions like Gym. The collection of the presented data sets follows the general VNF benchmarking methodology described in.

2 PAPERS • NO BENCHMARKS YET

This resource is designed to allow for research into Natural Language Generation. In particular, it targets neural data-to-text approaches, although it is not limited to these.

2 PAPERS • NO BENCHMARKS YET

The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:

2 PAPERS • 2 BENCHMARKS

1 PAPER • NO BENCHMARKS YET

Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to extensive varieties of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018.

1 PAPER • NO BENCHMARKS YET

Multimodal object recognition is still an emerging field. Thus, publicly available datasets are still rare and of small size. This dataset was developed to help fill this void and presents multimodal data for 63 objects with some visual and haptic ambiguity. The dataset contains visual, kinesthetic and tactile (audio/vibrations) data. To completely solve sensory ambiguity, sensory integration/fusion would be required. This report describes the creation and structure of the dataset. The first section explains the underlying approach used to capture the visual and haptic properties of the objects. The second section describes the technical aspects (experimental setup) needed for the collection of the data. The third section introduces the objects, while the final section describes the structure and content of the dataset.

1 PAPER • NO BENCHMARKS YET

Measurement data related to the publication "Active TLS Stack Fingerprinting: Characterizing TLS Server Deployments at Scale". It contains weekly TLS and HTTP scan data and the TLS fingerprints for each target.

1 PAPER • NO BENCHMARKS YET

AnoShift is a large-scale anomaly detection benchmark, which focuses on splitting the test data based on its temporal distance to the training set, introducing three testing splits: IID, NEAR, and FAR. This testing scenario captures the in-time performance degradation of anomaly detection methods, from classical models to masked language models.

1 PAPER • 2 BENCHMARKS

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the real severity (BIRADS) and pathology (post-report) classifications provided by the Radiologist Director of the Radiology Department of Hospital Fernando Fonseca while diagnosing several patients (see dataset-uta4-dicom) from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of measurements of both severity (BIRADS) and pathology classifications concerning the patient diagnostic. Work and results are published at AVI 2020, a top Human-Computer Interaction (HCI) conference (page). Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both prototype-single-modality and prototype-multi-modality repositories for the comparison.

1 PAPER • NO BENCHMARKS YET

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the severity rates (BIRADS) given by clinicians while diagnosing several patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of measurements of severity rates (BIRADS) concerning the patient diagnostic. Work and results are published at AVI 2020, a top Human-Computer Interaction (HCI) conference (page). Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both prototype-single-modality and prototype-multi-modality repositories for the comparison.

1 PAPER • NO BENCHMARKS YET

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

1 PAPER • NO BENCHMARKS YET

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).

1 PAPER • 1 BENCHMARK
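The nine-to-three simplification described above is just a lookup table:

```python
# The CQA's nine vote types collapse to three dispositions, as described above.
SIMPLIFY = {
    "voted for": "yea",
    "paired for": "yea",
    "announced for": "yea",
    "voted against": "nay",
    "paired against": "nay",
    "announced against": "nay",
    "voted present": "unknown",
    "voted present to avoid conflict of interest": "unknown",
    "did not vote or otherwise make a position known": "unknown",
}

print(SIMPLIFY["paired for"])  # yea
```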

Co/FeMn bilayers measured.

1 PAPER • NO BENCHMARKS YET

This dataset includes Direct Borohydride Fuel Cell (DBFC) impedance and polarization tests at the anode with Pd/C, Pt/C and Pd-decorated Ni–Co/rGO catalysts. Different concentrations of Sodium Borohydride (SBH), applied voltages and various anode catalyst loadings are covered, along with experimental details of the electrochemical analysis. Voltage, power density and resistance of the DBFC change as functions of SBH weight percent (%), applied voltage and anode catalyst loading, as evaluated from polarization and impedance curves using an appropriate equivalent circuit of the fuel cell. These data support interpretation of the cell's electrochemical behavior and can be useful in simulation, power-source investigation and in-depth analysis in DBFC research.

OCR-free Document Understanding Transformer

clovaai/donut • 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

yenchenlin/nerf-pytorch • ECCV 2020

Multimodal Image Synthesis and Editing: A Survey

fnzhan/mise • 27 Dec 2021

As information exists in various modalities in real world, effective interaction and fusion among multimodal information plays a key role for the creation and perception of multimodal data in computer vision and deep learning research.

DeepInteraction: 3D Object Detection via Modality Interaction

Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy.

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

andreas128/RePaint • CVPR 2022

In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks.

YOLOv4: Optimal Speed and Accuracy of Object Detection

nemonameless/PaddleDetection_YOLOSeries • 23 Apr 2020

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy.

Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers.

cosFormer: Rethinking Softmax in Attention

OpenNLPLab/cosFormer • ICLR 2022

As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity to the sequence length.

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation

mit-han-lab/bevfusion • 26 May 2022

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

alibaba/EasyCV • 6 Jun 2022

In this paper we present Mask DINO, a unified object detection and segmentation framework.

93 dataset results for Environment

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It includes environments such as Algorithmic, Atari, Box2D, Classic Control, MuJoCo, Robotics, and Toy Text.

907 PAPERS • 3 BENCHMARKS

MuJoCo (multi-joint dynamics with contact) is a physics engine used to implement environments to benchmark Reinforcement Learning methods.

901 PAPERS • 2 BENCHMARKS

CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).

583 PAPERS • 2 BENCHMARKS

The Arcade Learning Environment (ALE) is an object-oriented framework that allows researchers to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design.

301 PAPERS • 57 BENCHMARKS

The DeepMind Control Suite (DMCS) is a set of simulated continuous control environments with a standardized structure and interpretable rewards. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. Control Suite tasks include Pendulum, Acrobot, Cart-pole, Cart-k-pole, Ball in cup, Point-mass, Reacher, Finger, Hopper, Fish, Cheetah, Walker, Manipulator, Manipulator extra, Stacker, Swimmer, Humanoid, Humanoid_CMU and LQR.

187 PAPERS • 11 BENCHMARKS

AirSim is a simulator for drones, cars and more, built on Unreal Engine. It is open-source, cross platform, and supports software-in-the-loop simulation with popular flight controllers such as PX4 & ArduPilot and hardware-in-loop with PX4 for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment. Similarly, there exists an experimental version for a Unity plugin.

159 PAPERS • NO BENCHMARKS YET

D4RL is a collection of environments for offline reinforcement learning. These environments include Maze2D, AntMaze, Adroit, Gym, Flow, Franka Kitchen and CARLA.

142 PAPERS • NO BENCHMARKS YET

ViZDoom is an AI research platform based on the classical First Person Shooter game Doom. The most popular game mode is probably the so-called Death Match, where several players join in a maze and fight against each other. After a fixed time, the match ends and all the players are ranked by the FRAG scores defined as kills minus suicides. During the game, each player can access various observations, including the first-person view screen pixels, the corresponding depth-map and segmentation-map (pixel-wise object labels), the bird-view maze map, etc. The valid actions include almost all the keyboard-stroke and mouse-control a human player can take, accounting for moving, turning, jumping, shooting, changing weapon, etc. ViZDoom can run a game either synchronously or asynchronously, indicating whether the game core waits until all players’ actions are collected or runs in a constant frame rate without waiting.

124 PAPERS • 3 BENCHMARKS

AI2-Thor is an interactive environment for embodied AI. It contains four types of scenes, including kitchen, living room, bedroom and bathroom, and each scene includes 30 rooms, where each room is unique in terms of furniture placement and item types. There are over 2000 unique objects for AI agents to interact with.

118 PAPERS • 1 BENCHMARK

TORCS (The Open Racing Car Simulator) is a driving simulator. It is capable of simulating the essential elements of vehicular dynamics such as mass, rotational inertia, collision, mechanics of suspensions, links and differentials, friction and aerodynamics. Physics simulation is simplified and is carried out through Euler integration of differential equations at a temporal discretization level of 0.002 seconds. The rendering pipeline is lightweight and based on OpenGL that can be turned off for faster training. TORCS offers a large variety of tracks and cars as free assets. It also provides a number of programmed robot cars with different levels of performance that can be used to benchmark the performance of human players and software driving agents. TORCS was built with the goal of developing Artificial Intelligence for vehicular control and has been used extensively by the machine learning community ever since its inception.

79 PAPERS • NO BENCHMARKS YET

RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.

34 PAPERS • 1 BENCHMARK

The General Video Game AI (GVGAI) framework is widely used in research; it features a corpus of over 100 single-player games and 60 two-player games. These are fairly small games, each focusing on specific mechanics or skills the players should be able to demonstrate, including clones of classic arcade games such as Space Invaders, puzzle games like Sokoban, adventure games like Zelda, or game-theory problems such as the Iterated Prisoner's Dilemma. All games are real-time and require players to make decisions in only 40 ms at every game tick, although not all games explicitly reward or require fast reactions; in fact, some of the best game-playing approaches use the time at the beginning of the game to run Breadth-First Search in puzzle games in order to find an accurate solution. However, given the large variety of games (many of which are stochastic and difficult to predict accurately), scoring systems and termination conditions, all unknown to the players, highly adaptive general agents are required.

31 PAPERS • NO BENCHMARKS YET


Jericho is a learning environment for man-made Interactive Fiction (IF) games.

30 PAPERS • NO BENCHMARKS YET


The StarCraft II Learning Environment (SC2LE) is a reinforcement learning environment based on the game StarCraft II. The environment consists of three sub-components: a Linux StarCraft II binary, the StarCraft II API, and PySC2. The StarCraft II API allows programmatic control of StarCraft II: it can be used to start a game, get observations, take actions, and review replays. PySC2 is a Python environment that wraps the StarCraft II API to ease the interaction between Python reinforcement learning agents and StarCraft II. It defines an action and observation specification, and includes a random agent and a handful of rule-based agents as examples. It also includes some mini-games as challenges and visualization tools to understand what the agent can see and do.

23 PAPERS • NO BENCHMARKS YET


MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites.

22 PAPERS • NO BENCHMARKS YET


Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. Brax is written in JAX and is designed for use on acceleration hardware. It is both efficient for single-device simulation, and scalable to massively parallel simulation on multiple devices, without the need for pesky datacenters.

21 PAPERS • NO BENCHMARKS YET


HoME (Household Multimodal Environment) is a multimodal environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more.

20 PAPERS • NO BENCHMARKS YET


Obstacle Tower is a high-fidelity, 3D, third-person, procedurally generated environment for reinforcement learning. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent’s ability to perform well on unseen instances of the environment.

18 PAPERS • 6 BENCHMARKS


A benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

18 PAPERS • 2 BENCHMARKS


Gibson is an open-source perceptual and physics simulator for exploring active and real-world perception. The Gibson Environment is used for Real-World Perception Learning.

13 PAPERS • NO BENCHMARKS YET


The StarCraft Multi-Agent Challenges+ requires agents to learn to complete multi-stage tasks and to use environmental factors without precise reward functions. The previous challenge (SMAC), recognized as a standard benchmark for Multi-Agent Reinforcement Learning, is mainly concerned with ensuring that all agents cooperatively eliminate approaching adversaries only through fine manipulation with obvious reward functions. This challenge, on the other hand, is interested in the exploration capability of MARL algorithms to efficiently learn implicit multi-stage tasks and environmental factors as well as micro-control. The study covers both offensive and defensive scenarios. In the offensive scenarios, agents must learn to first find opponents and then eliminate them. The defensive scenarios require agents to use topographic features; for example, agents need to position themselves behind protective structures to make it harder for enemies to attack.

11 PAPERS • 13 BENCHMARKS


CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulations and complex environment layouts consisting of multiple rooms. The range of object manipulations includes the ability to pick up and place objects, toggle the state of objects like taps or televisions, open or close containers, and insert or remove objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for various AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.

9 PAPERS • NO BENCHMARKS YET

A rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al.).

9 PAPERS • NO BENCHMARKS YET


The NetHack Learning Environment (NLE) is a Reinforcement Learning environment based on NetHack 3.6.6. It is designed to provide a standard reinforcement learning interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. NetHack is one of the oldest and arguably most impactful videogames in history, as well as being one of the hardest roguelikes currently being played by humans. It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging testbeds. Through NLE, the authors wish to establish NetHack as one of the next challenges for research in decision making and machine learning.

9 PAPERS • 1 BENCHMARK


Mario AI was a benchmark environment for reinforcement learning. The gameplay in Mario AI, as in the original Nintendo version, consists of moving the controlled character, Mario, through two-dimensional levels viewed from the side. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).

8 PAPERS • NO BENCHMARKS YET


SUMMIT is a high-fidelity simulator that facilitates the development and testing of crowd-driving algorithms. By leveraging the open-source OpenStreetMap map database and a heterogeneous multi-agent motion prediction model developed in our earlier work, SUMMIT simulates dense, unregulated urban traffic for heterogeneous agents at any location worldwide that OpenStreetMap supports. SUMMIT is built as an extension of CARLA and inherits from it the physical and visual realism for autonomous driving simulation. SUMMIT supports a wide range of applications, including perception, vehicle control, planning, and end-to-end learning.

8 PAPERS • NO BENCHMARKS YET

Our dataset consists of multiple indoor and outdoor experiments with up to a 30 m gNB-UE link. In each experiment, we fixed the location of the gNB and moved the UE in increments of roughly one degree. The table above specifies the direction of user movement with respect to the gNB-UE link, the distance resolution, and the number of user locations for which we conducted channel measurements. The outdoor 30 m data also contains blockage between 3.9 m and 4.8 m. At each location, we scan the transmission beam and collect data for each beam. By doing so, we obtain the full OFDM channels for different locations along the moving trajectory at all beam angles. Moreover, we use 240 kHz subcarrier spacing, which is consistent with the 5G NR numerology at FR2, so the data we collect will be a true reflection of what a 5G UE will see.

7 PAPERS • NO BENCHMARKS YET


ALFWorld contains interactive TextWorld environments (Côté et al.) that parallel embodied worlds in the ALFRED dataset (Shridhar et al.). The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.

7 PAPERS • NO BENCHMARKS YET


Griddly is an environment for grid-world based research. It provides a highly optimized game state and rendering engine with a flexible high-level interface for configuring environments. Griddly offers not only simple interfaces for single-player, multi-player and RTS games, but also multiple rendering methods, configurable partial observability, and interfaces for procedural content generation.

7 PAPERS • NO BENCHMARKS YET


LANI is a 3D navigation environment and corpus in which an agent navigates between landmarks. LANI contains 27,965 crowd-sourced instructions for navigation in an open environment. Each datapoint includes an instruction, a human-annotated ground-truth demonstration trajectory, and an environment with various landmarks and lakes. The dataset train/dev/test split is 19,758/4,135/4,072. Each environment specification defines the placement of 6–13 landmarks within a square grass field of size 50 m × 50 m.

6 PAPERS • NO BENCHMARKS YET


PlasticineLab is a differentiable physics benchmark that includes a diverse collection of soft-body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents.

6 PAPERS • NO BENCHMARKS YET


ManiSkill is a large-scale learning-from-demonstrations benchmark for articulated object manipulation with visual input (point cloud and image). ManiSkill supports object-level variations by utilizing a rich and diverse set of articulated objects, and each task is carefully designed for learning manipulations on a single category of objects. ManiSkill is equipped with high-quality demonstrations to facilitate learning-from-demonstrations approaches and perform evaluations on common baseline algorithms. ManiSkill can encourage the robot learning community to explore more on learning generalizable object manipulation skills.

5 PAPERS • NO BENCHMARKS YET


MineRL BASALT is an RL competition on solving human-judged tasks. The tasks in this competition do not have a pre-defined reward function: the goal is to produce trajectories that are judged by real humans to be effective at solving a given task.

5 PAPERS • NO BENCHMARKS YET


RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, the datasets are provided with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper "RL Unplugged: Benchmarks for Offline Reinforcement Learning".

5 PAPERS • NO BENCHMARKS YET

Random sampled instances of the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) for 20, 50 and 100 customer nodes.

4 PAPERS • NO BENCHMARKS YET
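Random instance sampling of this kind can be sketched as below; the coordinate range, demand range, and time-window widths are hypothetical parameter choices for illustration, not the generator actually used for this dataset.

```python
import random

def sample_cvrptw(n_customers=20, capacity=50, horizon=1000, seed=0):
    """Sample one random CVRPTW instance: a depot plus customers with
    coordinates, demands, and time windows (illustrative parameters)."""
    rng = random.Random(seed)  # seeded for reproducible instances
    depot = (rng.uniform(0, 100), rng.uniform(0, 100))
    customers = []
    for _ in range(n_customers):
        ready = rng.uniform(0, horizon - 100)  # earliest service time
        customers.append({
            "xy": (rng.uniform(0, 100), rng.uniform(0, 100)),
            "demand": rng.randint(1, 9),
            "window": (ready, ready + rng.uniform(50, 100)),
        })
    return {"depot": depot, "capacity": capacity, "customers": customers}
```

Varying `n_customers` over 20, 50 and 100 reproduces the three instance sizes mentioned above.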


Kubric is a data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

4 PAPERS • NO BENCHMARKS YET

MengeROS is an open-source crowd simulation tool for robot navigation that integrates Menge with ROS. It extends Menge to introduce one or more robot agents into a crowd of pedestrians. Each robot agent is controlled by external ROS-compatible controllers. MengeROS has been used to simulate crowds with up to 1000 pedestrians and 20 robots.

4 PAPERS • NO BENCHMARKS YET


NeoRL is a collection of environments and datasets for offline reinforcement learning with a special focus on real-world applications. The design follows real-world properties such as conservative behavior policies, limited amounts of data, high-dimensional state and action spaces, and the highly stochastic nature of the environments. The datasets include robotics, industrial control, finance trading and city management tasks with real-world properties, with three dataset sizes and three data-quality levels to mimic the datasets encountered in offline RL scenarios. Users can use the datasets to evaluate offline RL algorithms under near-real-world conditions.

4 PAPERS • NO BENCHMARKS YET


The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. It was created to test for the ability of agents to reason and plan via latent state inference, as well as useful exploration and experimentation.

3 PAPERS • NO BENCHMARKS YET

The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge on "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can differ depending on the occasion. The need for efficient personalization procedures is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched with linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog metadata (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for these prediction tasks.

3 PAPERS • 1 BENCHMARK

LemgoRL is an open-source benchmark tool for traffic signal control, designed to train reinforcement learning agents in a highly realistic simulation scenario with the aim of reducing the Sim2Real gap. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI Gym toolkit to enable easy deployment in existing research work.

3 PAPERS • NO BENCHMARKS YET


SPACE is a simulator for physical interactions and causal learning in 3D environments. The SPACE simulator is used to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact.

3 PAPERS • 1 BENCHMARK


The 2048 game task involves training an agent to achieve high scores in the game 2048.

2 PAPERS • 1 BENCHMARK
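The core mechanic an agent must master here can be sketched as a minimal re-implementation of the standard 2048 merge rule (this is illustrative code, not code from the benchmark itself):

```python
def merge_left(row):
    """Slide one 2048 row to the left and merge equal neighbouring tiles
    once per move, returning the new row and the score gained."""
    tiles = [t for t in row if t != 0]          # slide: drop empty cells
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)            # merge a single pair
            score += tiles[i] * 2               # merged value is scored
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), score
```

The other three move directions follow by rotating or reversing each row before and after this merge, which is why high-scoring agents can plan over a compact state space.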


The AtariARI (Atari Annotated RAM Interface) is an environment for representation learning. The Atari Arcade Learning Environment (ALE) does not explicitly expose any ground-truth state information. However, ALE does expose the RAM state (128 bytes per timestep), which is used by the game programmer to store important state information such as the location of sprites, the state of the clock, or the current room the agent is in. To extract these variables, the dataset creators consulted commented disassemblies (or source code) of Atari 2600 games which were made available by Engelhardt and Jentzsch and CPUWIZ. The dataset creators were able to find and verify important state variables for a total of 22 games. Once this information was acquired, combining it with the ALE interface produced a wrapper that can automatically output a state label for every example frame generated from the game. The dataset creators make this available with an easy-to-use gym wrapper, which returns this information alongside the standard observations.

2 PAPERS • NO BENCHMARKS YET
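The labeling mechanism described above can be sketched as a mapping from annotated RAM byte offsets to named state variables; the variable names and byte indices below are hypothetical placeholders, not the actual offsets AtariARI ships for any game.

```python
# Sketch of a RAM-annotation lookup: given the 128-byte RAM dump that
# ALE exposes, read out labeled state variables. The offsets here are
# hypothetical examples, not real AtariARI annotations.

LABEL_MAP = {"player_x": 42, "player_y": 43, "clock": 100}

def labels_from_ram(ram, label_map=LABEL_MAP):
    """Extract a {variable: value} state label from one RAM dump."""
    assert len(ram) == 128, "the Atari 2600 exposes 128 bytes of RAM"
    return {name: ram[idx] for name, idx in label_map.items()}
```

A wrapper along these lines can attach such a label dict to every frame the emulator produces, which is how ground-truth state supervision becomes available for representation learning.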


CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments. It’s designed to test your agent’s generalization capabilities in all scenarios where intra-task generalization is important.

2 PAPERS • NO BENCHMARKS YET


CinemAirSim extends the well-known drone simulator AirSim with a cinematic camera and extends its API to control all of the camera's parameters in real time, including various filming lenses and common cinematographic properties.

2 PAPERS • NO BENCHMARKS YET


MineRL is an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks that highlight many of the hardest problems in modern-day reinforcement learning: sparse rewards and hierarchical policies.

2 PAPERS • NO BENCHMARKS YET

Multirotor gym environment for learning control policies for various unmanned aerial vehicles.

EpiGNN: Exploring Spatial Transmission with Graph Neural Network for Regional Epidemic Forecasting

xiefeng69/epignn • 23 Aug 2022

Epidemic forecasting is the key to effective control of epidemic transmission and helps the world mitigate the crisis that threatens public health.

Faint Features Tell: Automatic Vertebrae Fracture Screening Assisted by Contrastive Learning

Our method has a specificity of 99% and a sensitivity of 85% in binary classification, and a macro-F1 of 77% in multi-classification, indicating that contrastive learning significantly improves the accuracy of vertebrae fracture screening, especially for mild fractures and normal controls.

Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition

Jho-Yonsei/HD-GCN • 23 Aug 2022

Graph convolutional networks (GCNs) are the most commonly used method for skeleton-based action recognition and have achieved remarkable performance.

pystacked: Stacking generalization and machine learning in Stata

pystacked implements stacked generalization (Wolpert, 1992) for regression and binary classification via Python’s scikit-learn.
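The heart of stacked generalization is that level-0 learners produce out-of-fold predictions, which become the training features for a level-1 learner. A minimal self-contained sketch of that out-of-fold step is below; the toy "learner" (predicting the training-fold mean) is illustrative, whereas pystacked draws its level-0 and level-1 models from scikit-learn.

```python
# Sketch of the out-of-fold prediction step in stacked generalization
# (Wolpert, 1992). The mean-predictor base learner is a toy stand-in
# for the scikit-learn estimators pystacked actually uses.

def kfold(n, k):
    """Yield (train_idx, test_idx) index lists for k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def fit_mean(y_train):
    """Toy level-0 learner: predict the training mean everywhere."""
    m = sum(y_train) / len(y_train)
    return lambda x: m

def stack_predictions(X, y, k=5):
    """Out-of-fold level-0 predictions: each point is predicted by a
    model that never saw it, avoiding leakage into the level-1 fit."""
    z = [0.0] * len(y)
    for train, test in kfold(len(y), k):
        model = fit_mean([y[i] for i in train])
        for i in test:
            z[i] = model(X[i])
    return z
```

The resulting vector `z` (one column per base learner in the general case) is what the level-1 learner is trained on.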

Unsupervised Anomaly Localization with Structural Feature-Autoencoders

felime/feature-autoencoder • 23 Aug 2022

Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

ywher/TUFL • 23 Aug 2022

In stage one, we design a threshold-adaptive unsupervised focal loss to regularize the prediction in the target domain, which has a mild gradient neutralization mechanism and mitigates the problem that hard samples are barely optimized in entropy-based methods.

CitySim: A Drone-Based Vehicle Trajectory Dataset for Safety Oriented Research and Digital Twins

The development of safety-oriented research ideas and applications requires fine-grained vehicle trajectory data that not only has high accuracy but also captures a substantial number of critical safety events.

Learning an Efficient Multimodal Depth Completion Model

dwhou/emdc-pytorch • 23 Aug 2022

With the wide application of sparse ToF sensors in mobile devices, RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems.

Efficient Self-Supervision using Patch-based Contrastive Learning for Histopathology Image Segmentation

nickeopti/bach-contrastive-segmentation • 23 Aug 2022

Contrastive self-supervised learning provides a framework to learn meaningful representations using learned notions of similarity measures from simple pretext tasks.

Joint Privacy Enhancement and Quantization in Federated Learning

langnatalie/jopeq • 23 Aug 2022

The distributed operation of FL gives rise to challenges that are not encountered in centralized machine learning, including the need to preserve the privacy of the local datasets, and the communication load due to the repeated exchange of updated models.

Datasets


83 dataset results for Russian


The Universal Dependencies (UD) project seeks to develop cross-linguistically consistent treebank annotation of morphology and syntax for multiple languages. The first version of the dataset was released in 2015 and consisted of 10 treebanks over 10 languages. Version 2.7 released in 2020 consists of 183 treebanks over 104 languages. The annotation consists of UPOS (universal part-of-speech tags), XPOS (language-specific part-of-speech tags), Feats (universal morphological features), Lemmas, dependency heads and universal dependency labels.

445 PAPERS • 8 BENCHMARKS
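The UD annotation layers listed above are distributed in the CoNLL-U format, where each token is one line of ten tab-separated columns; a minimal parse of one such line can be sketched as:

```python
# Minimal parser for one CoNLL-U token line, the format UD treebanks
# use. The ten columns are: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD,
# DEPREL, DEPS, MISC; an underscore marks an empty field.

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_token_line(line):
    cols = line.rstrip("\n").split("\t")
    assert len(cols) == 10, "CoNLL-U token lines have exactly 10 columns"
    tok = dict(zip(FIELDS, cols))
    # FEATS is a |-separated list of Attribute=Value morphological features
    if tok["feats"] != "_":
        tok["feats"] = dict(p.split("=", 1) for p in tok["feats"].split("|"))
    return tok
```

The example token line used in the test below is an illustrative Russian noun, not a line taken from any specific UD treebank.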


The Cross-lingual Natural Language Inference (XNLI) corpus is the extension of the Multi-Genre NLI (MultiNLI) corpus to 15 languages. The dataset was created by manually translating the validation and test sets of MultiNLI into each of those 15 languages. The English training set was machine translated for all languages. The dataset is composed of 122k train, 2490 validation and 5010 test examples.

230 PAPERS • 10 BENCHMARKS


OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages.

169 PAPERS • 1 BENCHMARK


144 PAPERS • 202 BENCHMARKS


MuST-C currently represents the largest publicly available multilingual corpus (one-to-many) for speech translation. It covers eight language directions, from English to German, Spanish, French, Italian, Dutch, Portuguese, Romanian and Russian. The corpus consists of audio, transcriptions and translations of English TED talks, and it comes with a predefined training, validation and test split.

130 PAPERS • 2 BENCHMARKS


WMT 2016 is a collection of datasets used in shared tasks of the First Conference on Machine Translation. The conference builds on ten previous Workshops on statistical Machine Translation.

123 PAPERS • 20 BENCHMARKS


XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.

102 PAPERS • 2 BENCHMARKS

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study.

89 PAPERS • 1 BENCHMARK


UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, worms, backdoors, and fuzzers. The dataset contains raw network packets. The training set contains 175,341 records and the testing set 82,332 records, covering both attack and normal traffic.

68 PAPERS • 2 BENCHMARKS

This corpus comprises monolingual data for 100+ languages and also includes data for romanized languages. It was constructed using the URLs and paragraph indices provided by the CC-Net repository by processing January–December 2018 Common Crawl snapshots. Each file consists of documents separated by double newlines, with paragraphs within the same document separated by a single newline. The data is generated using the open-source CC-Net repository.

52 PAPERS • NO BENCHMARKS YET

Multilingual Document Classification Corpus (MLDoc) is a cross-lingual document classification dataset covering English, German, French, Spanish, Italian, Russian, Japanese and Chinese. It is a subset of the Reuters Corpus Volume 2 selected according to the following design choices:

40 PAPERS • 11 BENCHMARKS


OSCAR or Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture. The dataset used for training multilingual models such as BART incorporates 138 GB of text.

32 PAPERS • NO BENCHMARKS YET


WMT 2020 is a collection of datasets used in shared tasks of the Fifth Conference on Machine Translation. The conference builds on a series of annual workshops and conferences on Statistical Machine Translation.

25 PAPERS • NO BENCHMARKS YET

WikiAnn is a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.

23 PAPERS • 7 BENCHMARKS

21 PAPERS • 7 BENCHMARKS

The MULTEXT-East resources are a multilingual dataset for language engineering research and development. It consists of (1) the MULTEXT-East morphosyntactic specifications, defining categories (parts-of-speech), their morphosyntactic features (attributes and values), and the compact MSD tagset representations; (2) morphosyntactic lexica; (3) the annotated parallel "1984" corpus; and (4) some comparable text and speech corpora. The specifications are available for the following macrolanguages, languages and language varieties: Albanian, Bulgarian, Chechen, Czech, Damaskini, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbo-Croatian, Slovak, Slovene, Torlak, and Ukrainian, while the other resources are available for a subset of these languages.

21 PAPERS • NO BENCHMARKS YET

Multilingual Knowledge Questions and Answers (MKQA) is an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Answers are based on a language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering.

20 PAPERS • NO BENCHMARKS YET
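
The aligned, language-independent answer representation described above can be sketched as follows. This is an illustrative sketch only; the field names are invented and are not MKQA's official schema:

```python
# Hypothetical sketch of one MKQA-style item: queries are aligned across
# languages, while the gold answer is language-independent.
# (Field names are illustrative, not the official MKQA schema.)
example = {
    "queries": {
        "en": "who wrote the novel dracula",
        "de": "wer schrieb den roman dracula",
        "ja": "小説『ドラキュラ』を書いたのは誰",
    },
    "answer": {"type": "entity", "text": "Bram Stoker"},
}

def exact_match_per_language(predictions, item):
    """Score each language against the single shared gold answer,
    which is what makes results comparable across languages."""
    gold = item["answer"]["text"].lower()
    return {lang: float(pred.strip().lower() == gold)
            for lang, pred in predictions.items()}

scores = exact_match_per_language(
    {"en": "Bram Stoker", "de": "bram stoker", "ja": "Mary Shelley"}, example)
```

Because every language shares one gold answer, a per-language score table falls out of a single comparison loop.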

770k article and summary pairs in 18 languages from WikiHow. Gold-standard article-summary alignments across languages are extracted by aligning the images that are used to describe each how-to step in an article.

20 PAPERS • 4 BENCHMARKS

CoVoST is a large-scale multilingual speech-to-text translation corpus. Its latest version, CoVoST 2, covers translations from 21 languages into English and from English into 15 languages. It contains a total of 2,880 hours of speech and is diversified with 78K speakers and 66 accents.

19 PAPERS • NO BENCHMARKS YET

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into six languages (Czech, German, Finnish, Romanian, Russian, Turkish) and an additional 1500 sentences from each of the six languages translated into English. For Romanian, a third of the test set was released as a development set instead. For Turkish, an additional 500-sentence development set was released. The sentences were selected from dozens of news websites and translated by professional translators. The training data consists of parallel corpora to train translation models, monolingual corpora to train language models, and development sets for tuning. Some training corpora were identical to those of WMT 2015 (Europarl, United Nations, French-English 10⁹ corpus, Common Crawl, Russian-English parallel data provided by Yandex, Wikipedia Headlines provided by CMU) and some were updated (CzEng v1.6pre, News Commentary v11, monolingual news data).

19 PAPERS • 8 BENCHMARKS

AVSpeech is a large-scale audio-visual dataset comprising speech clips with no interfering background signals. The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.

18 PAPERS • NO BENCHMARKS YET

The first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish.

15 PAPERS • NO BENCHMARKS YET

XGLUE is an evaluation benchmark composed of 11 tasks that span 19 languages. For each task, training data is only available in English, so to succeed at XGLUE a model must have strong zero-shot cross-lingual transfer capability: it must learn from the English data of a specific task and transfer what it learned to other languages. Compared to its concurrent work XTREME, XGLUE has two characteristics: first, it includes both cross-lingual NLU and cross-lingual NLG tasks; second, besides including 5 existing cross-lingual tasks (i.e. NER, POS, MLQA, PAWS-X and XNLI), XGLUE also selects 6 new tasks from Bing scenarios, including News Classification (NC), Query-Ad Matching (QADSM), Web Page Ranking (WPR), QA Matching (QAM), Question Generation (QG) and News Title Generation (NTG). Such diversity of languages, tasks and task origins provides a comprehensive benchmark for quantifying the quality of a pre-trained model on cross-lingual natural language understanding and generation.

14 PAPERS • 3 BENCHMARKS

MaSS (Multilingual corpus of Sentence-aligned Spoken utterances) is an extension of the CMU Wilderness Multilingual Speech Dataset, a speech dataset based on recorded readings of the New Testament.

13 PAPERS • 3 BENCHMARKS

The Multilingual Quality Estimation and Automatic Post-editing (MLQE-PE) Dataset is a dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as titles of the articles where the sentences were extracted from, and the neural MT models used to translate the text.

12 PAPERS • NO BENCHMARKS YET

Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows.

12 PAPERS • NO BENCHMARKS YET

Global Voices is a multilingual dataset for evaluating cross-lingual summarization methods. It is extracted from social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages.

7 PAPERS • NO BENCHMARKS YET

Synbols is a dataset generator designed for probing the behavior of learning algorithms. By defining the distribution over latent factors one can craft a dataset specifically tailored to answer specific questions about a given algorithm.

7 PAPERS • NO BENCHMARKS YET

WiC: The Word-in-Context Dataset, a reliable benchmark for the evaluation of context-sensitive word embeddings.

6 PAPERS • 1 BENCHMARK

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into seven languages (Chinese, Czech, Estonian, German, Finnish, Russian, Turkish) and an additional 1500 sentences from each of the seven languages translated into English. The sentences were selected from dozens of news websites and translated by professional translators.

6 PAPERS • NO BENCHMARKS YET

XL-Sum is a comprehensive and diverse dataset for abstractive summarization comprising 1 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 44 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.

6 PAPERS • NO BENCHMARKS YET

XQA is a dataset consisting of 90k question-answer pairs in nine languages for cross-lingual open-domain question answering.

6 PAPERS • NO BENCHMARKS YET

The Image-Grounded Language Understanding Evaluation (IGLUE) benchmark brings together—by both aggregating pre-existing datasets and creating new ones—visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. The benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups.

5 PAPERS • 13 BENCHMARKS

Frame-to-frame video alignment/synchronization

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • 24 Aug 2022

The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-task learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

22 dataset results for Tables

TAT-QA (Tabular And Textual dataset for Question Answering) is a large-scale QA dataset, aiming to stimulate progress of QA research over more complex and realistic tabular and textual data, especially those requiring numerical reasoning.

13 PAPERS • 1 BENCHMARK

GitTables is a corpus of currently 1M relational tables extracted from CSV files in GitHub covering 96 topics. Table columns in GitTables have been annotated with more than 2K different semantic types from Schema.org and DBpedia. The column annotations consist of semantic types, hierarchical relations, range types, table domain and descriptions.

5 PAPERS • NO BENCHMARKS YET

The original dataset was provided by Orange telecom in France, which contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with time granularity of 30 minutes and spatial granularity of 6 coarse regions in Paris, France. In other words, it represents a multivariate time series dataset.

2 PAPERS • NO BENCHMARKS YET

Yavuz Selim TASPINAR, Murat KOKLU and Mustafa ALTIN

1 PAPER • NO BENCHMARKS YET

The ArxivPapers dataset is an unlabelled collection of over 104K papers related to machine learning and published on arXiv.org between 2007 and 2020. The dataset includes around 94K papers (for which LaTeX source code is available) in a structured form in which each paper is split into a title, abstract, sections, paragraphs and references. Additionally, the dataset contains over 277K tables extracted from the LaTeX papers.

1 PAPER • NO BENCHMARKS YET

This dataset includes Direct Borohydride Fuel Cell (DBFC) impedance and polarization tests with Pd/C, Pt/C and Pd-decorated Ni–Co/rGO anode catalysts. The data cover different concentrations of sodium borohydride (SBH), applied voltages and anode catalyst loadings, together with experimental details of the electrochemical analysis. Voltage, power density and resistance of the DBFC change as a function of SBH weight percent (%), applied voltage and anode catalyst loading, which are evaluated by polarization and impedance curves using an appropriate equivalent circuit of the fuel cell. The data support interpretation of the cell's electrochemical behavior and can be useful for simulation, power-source investigation and in-depth analysis in DB fuel cell research.

1 PAPER • NO BENCHMARKS YET

This is a real-world industrial benchmark dataset from a major medical device manufacturer for the prediction of customer escalations. The dataset contains features derived from IoT (machine log) and enterprise data including labels for escalation from a fleet of thousands of customers of high-end medical devices.

1 PAPER • NO BENCHMARKS YET

The M5Product dataset is a large-scale multi-modal pre-training dataset with coarse and fine-grained annotations for E-products.

1 PAPER • NO BENCHMARKS YET

US Macroeconomic dataset containing 14 time series of monthly observations. The series have various lengths but all end in 1988. The variables are: consumer price index, industrial production, nominal GNP, velocity, employment, interest rate, nominal wages, GNP deflator, money stock, real GNP, stock prices (S&P 500), GNP per capita, real wages, and unemployment.

1 PAPER • NO BENCHMARKS YET

This dataset covers Nafion 112 membrane standard tests and MEA activation tests of a PEM fuel cell under various operating conditions. It includes two general electrochemical analysis methods, polarization and impedance curves. The effects of different H2/O2 gas pressures, voltages and humidity conditions are considered in several steps. The behavior of the PEM fuel cell during distinct operating-condition tests, the activation procedure, and the operating conditions before and after activation can be inferred from the data. In the polarization curves, voltage and power density change as a function of H2/O2 flows and relative humidity. The resistance of the fuel cell's equivalent circuit can be calculated from the impedance data. The experimental response of the cell is thus evident in the presented data, which is useful for in-depth analysis, simulation and material-performance investigation in PEM fuel cell research.

1 PAPER • NO BENCHMARKS YET

The dataset provides information about 450 HYIPs collected between November 2020 and September 2021. This dataset was analyzed and the results are discussed in the paper.

1 PAPER • NO BENCHMARKS YET

SKAB is designed for evaluating algorithms for anomaly detection. The benchmark currently includes 30+ datasets plus Python modules for algorithms’ evaluation. Each dataset represents a multivariate time series collected from the sensors installed on the testbed. All instances are labeled for evaluating the results of solving outlier detection and changepoint detection problems.

1 PAPER • 1 BENCHMARK
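
Since every instance is labeled, detectors can be scored point-wise. A minimal sketch of one common metric follows; SKAB's own evaluation modules may implement different or additional scores:

```python
def pointwise_f1(pred, gold):
    """Point-wise F1 over binary outlier labels (1 = anomaly),
    a simple way to score a detector on labeled series like SKAB's."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# One true positive, one false positive, no false negatives.
score = pointwise_f1([1, 0, 1, 0], [1, 0, 0, 0])
```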

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used for evaluating the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 PAPER • NO BENCHMARKS YET

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used for evaluating the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 PAPER • NO BENCHMARKS YET

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used for evaluating the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 PAPER • NO BENCHMARKS YET

The SegmentedTables dataset is a collection of almost 2,000 tables extracted from 352 machine learning papers. Each table consists of rich text content, layout and caption. Tables are annotated with types (leaderboard, ablation, irrelevant) and cells of relevant tables are annotated with semantic roles (such as “paper model”, “competing model”, “dataset”, “metric”).

1 PAPER • NO BENCHMARKS YET

This dataset contains data from two different sources: Food.com, a well-known American recipe site, and Planeat, an Italian site that allows you to plan recipes to save food waste. The dataset is divided into two parts: embeddings, which can be used directly to execute the work and receive suggestions, and raw data, which must first be processed into embeddings.

1 PAPER • NO BENCHMARKS YET

The eICU Collaborative Research Database is a large multi-center critical care database made available by Philips Healthcare in partnership with the MIT Laboratory for Computational Physiology.

1 PAPER • NO BENCHMARKS YET

The dataset contains about 48K contracts whose source code is openly available on Etherscan.

1 PAPER • NO BENCHMARKS YET

Here I provide the datasets I used for this analysis. They include the tweets I streamed using the Tweepy package on Python during the peak of the wildfire season in late summer/early fall of 2020.

1 PAPER • NO BENCHMARKS YET

Data Set Name: Rice Dataset (Cammeo and Osmancik). Abstract: A total of 3,810 images of rice grains were taken for the two species (Cammeo and Osmancik), processed, and feature inferences were made. 7 morphological features were obtained for each grain of rice.

91 dataset results for Tabular

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.

1,058 PAPERS • 7 BENCHMARKS
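
The score-to-label rule in the description can be transcribed directly. This is a sketch of the stated thresholds, not code from the dataset's release:

```python
def imdb_label(score):
    """Map a 1-10 IMDb review score to the dataset's binary label.
    Scores of 5-6 are excluded as insufficiently polarizing."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # review not included in the dataset

labels = [imdb_label(s) for s in (3, 5, 9)]
```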

The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0–5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.

738 PAPERS • 10 BENCHMARKS

The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly-available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical reports records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned to 7.6 codes, on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.

560 PAPERS • 7 BENCHMARKS
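
The code hierarchy mentioned above (top-level codes vs. sub-codes) can be illustrated by collapsing a full ICD-9 code to its category. This is a toy helper for illustration, not part of MIMIC-III's tooling:

```python
def top_level_icd9(code):
    """Collapse an ICD-9 diagnosis code to its top-level category by
    dropping the sub-code after the decimal point, e.g. '428.22' -> '428'."""
    return code.split(".")[0]

top = top_level_icd9("428.22")
```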

NAS-Bench-101 is the first public architecture dataset for NAS research. To build NAS-Bench-101, the authors carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. The authors trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the precomputed dataset.

93 PAPERS • 1 BENCHMARK

Netflix Prize consists of about 100,000,000 ratings for 17,770 movies given by 480,189 users. Each rating in the training dataset consists of four entries: user, movie, date of grade, grade. Users and movies are represented with integer IDs, while ratings range from 1 to 5.

90 PAPERS • NO BENCHMARKS YET
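
The four entries of each rating map naturally onto a small record type. This sketch assumes a simple "user,grade,date" line format for illustration; the actual release groups lines under per-movie headers, and the values below are made up:

```python
from datetime import date
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int     # integer user ID
    movie_id: int    # integer movie ID
    graded_on: date  # date of grade
    grade: int       # 1-5 stars

def parse_rating(line, movie_id):
    """Parse one 'user,grade,YYYY-MM-DD' line for a given movie."""
    user, grade, day = line.strip().split(",")
    return Rating(int(user), movie_id, date.fromisoformat(day), int(grade))

r = parse_rating("12345,3,2005-09-06", movie_id=1)
```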

UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, Worms, Backdoors, and Fuzzers. The dataset contains raw network packets. The training set has 175,341 records and the testing set 82,332 records, drawn from the different types, attack and normal.

68 PAPERS • 2 BENCHMARKS

WikiTableQuestions is a question answering dataset over semi-structured tables. It is comprised of question-answer pairs on HTML tables, and was constructed by selecting data tables from Wikipedia that contained at least 8 rows and 5 columns. Amazon Mechanical Turk workers were then tasked with writing trivia questions about each table. WikiTableQuestions contains 22,033 questions. The questions were not designed by predefined templates but were hand crafted by users, demonstrating high linguistic variance. Compared to previous datasets on knowledge bases it covers nearly 4,000 unique column headers, containing far more relations than closed domain datasets and datasets for querying knowledge bases. Its questions cover a wide range of domains, requiring operations such as table lookup, aggregation, superlatives (argmax, argmin), arithmetic operations, joins and unions.

36 PAPERS • 1 BENCHMARK
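
The operation types listed above (table lookup, aggregation, superlatives) are easy to see on a toy table. The table content below is made up purely for illustration:

```python
# A toy semi-structured table, represented as a list of rows.
table = [
    {"Year": 2010, "City": "Vancouver", "Medals": 26},
    {"Year": 2014, "City": "Sochi", "Medals": 28},
    {"Year": 2018, "City": "Pyeongchang", "Medals": 23},
]

# Superlative (argmax): "Which city had the most medals?"
most_medals_city = max(table, key=lambda row: row["Medals"])["City"]

# Aggregation: "How many medals in total?"
total_medals = sum(row["Medals"] for row in table)

# Lookup: "Which city hosted in 2014?"
host_2014 = next(row["City"] for row in table if row["Year"] == 2014)
```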

The Yahoo! Learning to Rank Challenge dataset consists of 709,877 documents encoded in 700 features and sampled from query logs of the Yahoo! search engine, spanning 29,921 queries.

23 PAPERS • NO BENCHMARKS YET

The friedman1 data set is commonly used to test semi-supervised regression methods.

22 PAPERS • NO BENCHMARKS YET

CAL500 (Computer Audition Lab 500) is a dataset aimed for evaluation of music information retrieval systems. It consists of 502 songs picked from western popular music. The audio is represented as a time series of the first 13 Mel-frequency cepstral coefficients (and their first and second derivatives) extracted by sliding a 12 ms half-overlapping short-time window over the waveform of each song. Each song has been annotated by at least 3 people with 135 musically-relevant concepts spanning six semantic categories:

18 PAPERS • NO BENCHMARKS YET

This dataset contains card descriptions of the card game Hearthstone and the code that implements them. These are obtained from the open-source implementation Hearthbreaker (https://github.com/danielyule/hearthbreaker).

17 PAPERS • NO BENCHMARKS YET

Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))

16 PAPERS • 1 BENCHMARK
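
The C-style extraction condition quoted above can be transcribed to Python. The record values below are made up for illustration:

```python
def passes_extraction_filter(record):
    """The cleaning condition used for the 1994 Census extract,
    transcribed from the quoted C-style expression."""
    return (record["AAGE"] > 16 and record["AGI"] > 100
            and record["AFNLWGT"] > 1 and record["HRSWK"] > 0)

kept = passes_extraction_filter(
    {"AAGE": 39, "AGI": 2174, "AFNLWGT": 77516, "HRSWK": 40})
dropped = passes_extraction_filter(
    {"AAGE": 12, "AGI": 2174, "AFNLWGT": 77516, "HRSWK": 40})
```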

The T2Dv2 dataset consists of 779 tables originating from the English-language subset of the WebTables corpus. 237 tables are annotated for the Table Type Detection task, 236 for the Columns Property Annotation (CPA) task and 235 for the Row Annotation task. The annotations that are used are DBpedia types, properties and entities.

10 PAPERS • 4 BENCHMARKS

Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".

9 PAPERS • NO BENCHMARKS YET

The Amazon-Google dataset for entity resolution derives from the online retailers Amazon.com and the product search service of Google accessible through the Google Base Data API. The dataset contains 1363 entities from amazon.com and 3226 google products as well as a gold standard (perfect mapping) with 1300 matching record pairs between the two data sources. The common attributes between the two data sources are: product name, product description, manufacturer and price.

8 PAPERS • 1 BENCHMARK

The Abt-Buy dataset for entity resolution derives from the online retailers Abt.com and Buy.com. The dataset contains 1081 entities from abt.com and 1092 entities from buy.com as well as a gold standard (perfect mapping) with 1097 matching record pairs between the two data sources. The common attributes between the two data sources are: product name, product description and product price.

7 PAPERS • 1 BENCHMARK

Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (with corresponding label "match" or "no match") for four product categories: computers, cameras, watches and shoes.

7 PAPERS • 4 BENCHMARKS

The ToughTables (2T) dataset was created for the SemTab challenge and includes 180 tables in total. The tables in this dataset can be categorized in two groups: the control (CTRL) group tables and tough (TOUGH) group tables.

6 PAPERS • 4 BENCHMARKS

Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy.

4 PAPERS • NO BENCHMARKS YET

ACS PUMS stands for American Community Survey (ACS) Public Use Microdata Sample (PUMS) and has been used to construct several tabular datasets for studying fairness in machine learning:

3 PAPERS • NO BENCHMARKS YET

3 PAPERS • NO BENCHMARKS YET

Open Dataset: Mobility Scenario FIMU

3 PAPERS • NO BENCHMARKS YET

The Papers with Code Leaderboards dataset is a collection of over 5,000 results capturing the performance of machine learning models. Each result is a tuple of the form (task, dataset, metric name, metric value). The data was collected using the Papers with Code review interface.
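Results in this tuple form can be queried directly. A small illustrative sketch; the sample records below are invented, not drawn from the actual dataset:

```python
# Sketch of querying (task, dataset, metric name, metric value) result
# tuples as described above. Sample records are illustrative only.
from collections import namedtuple

Result = namedtuple("Result", ["task", "dataset", "metric", "value"])

results = [
    Result("Image Classification", "ImageNet", "Top 1 Accuracy", 88.5),
    Result("Image Classification", "ImageNet", "Top 1 Accuracy", 86.0),
    Result("Object Detection", "COCO", "box AP", 56.8),
]

def best(results, task, dataset, metric):
    """Best reported value on one (task, dataset, metric) leaderboard."""
    values = [r.value for r in results
              if (r.task, r.dataset, r.metric) == (task, dataset, metric)]
    return max(values) if values else None

print(best(results, "Image Classification", "ImageNet", "Top 1 Accuracy"))  # → 88.5
```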

3 PAPERS • 1 BENCHMARK

VizNet-Sato is a dataset from the authors of Sato and is based on the VizNet dataset. The authors choose from VizNet only relational web tables with headers matching their selected 78 DBpedia semantic types. The selected tables are divided into two categories: Full tables and Multi-column only tables. The first category corresponds to 78,733 selected tables from VizNet, while the second category includes 32,265 tables which have more than one column. The tables of both categories are divided into 5 subsets to be able to conduct 5-fold cross validation: 4 subsets are used for training and the last for evaluation.
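The 5-fold protocol described above can be sketched as follows; the table IDs are placeholders, and the round-robin partition is one simple way to form five subsets, not necessarily the authors' exact split:

```python
# Sketch of 5-fold cross-validation over table IDs: the tables are
# partitioned into 5 subsets; each fold trains on 4 and evaluates on 1.

def five_fold_splits(table_ids):
    folds = [table_ids[i::5] for i in range(5)]  # round-robin partition
    for k in range(5):
        train = [t for i, fold in enumerate(folds) if i != k for t in fold]
        yield train, folds[k]

tables = [f"table_{i}" for i in range(20)]
for fold_no, (train, test) in enumerate(five_fold_splits(tables)):
    print(f"fold {fold_no}: {len(train)} train / {len(test)} test tables")
```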

3 PAPERS • 2 BENCHMARKS

The WikiTables-TURL dataset was constructed by the authors of TURL and is based on the WikiTable corpus, which is a large collection of Wikipedia tables. The dataset consists of 580,171 tables divided into fixed training, validation and testing splits. Additionally, the dataset contains metadata about each table, such as the table name, table caption and column headers.

3 PAPERS • 3 BENCHMARKS

The WikipediaGS dataset was created by extracting Wikipedia tables from Wikipedia pages. It consists of 485,096 tables which were annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.

3 PAPERS • 2 BENCHMARKS

A coronavirus dataset with 98 countries constructed from different reliable sources, where each row represents a country and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of COVID-19. The assumptions for the different factors are as follows:

2 PAPERS • NO BENCHMARKS YET

This resource, our Concepticon, links concept labels from different concept lists to concept sets. Each concept set is given a unique identifier, a unique label, and a human-readable definition. Concept sets are further structured by defining different relations between the concepts, as illustrated by the graphic displaying the relations between concept sets linked to the concept set SIBLING. The resource can be used for various purposes. Serving as a rich reference for new and existing databases in diachronic and synchronic linguistics, it gives researchers quick access to studies on semantic change, cross-linguistic polysemies, and semantic associations.

2 PAPERS • NO BENCHMARKS YET


The GitTables-SemTab dataset is a subset of the GitTables dataset and was created to be used during the SemTab challenge. The dataset consists of 1101 tables and is used to benchmark the Column Type Annotation (CTA) task.

2 PAPERS • 2 BENCHMARKS

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
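The quoted counts imply roughly 578 legitimate transactions per fraud; inverse-frequency class weighting is one common mitigation for training on data this skewed. The counts below reproduce the published statistics, but the weighting scheme is a generic sketch, not a method tied to any particular paper on this dataset:

```python
# Class-imbalance arithmetic for the published counts, plus generic
# inverse-frequency class weights (an assumption, not the dataset's own recipe).
n_total, n_fraud = 284_807, 492
n_legit = n_total - n_fraud

pos_rate = n_fraud / n_total                  # ≈ 0.1727% (the 0.172% above)
class_weight = {
    0: n_total / (2 * n_legit),               # ≈ 0.50 for legitimate
    1: n_total / (2 * n_fraud),               # ≈ 289.44 for fraud
}
print(f"{n_legit / n_fraud:.0f} legitimate transactions per fraud")  # → 578
```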

2 PAPERS • 1 BENCHMARK

The original dataset was provided by Orange telecom in France, which contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with time granularity of 30 minutes and spatial granularity of 6 coarse regions in Paris, France. In other words, it represents a multivariate time series dataset.
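The stated granularities imply a fixed-shape multivariate series: 72 days × 48 half-hour slots per day, with 6 regional values per timestep. A placeholder sketch of that shape; the zeros stand in for the actual mobility values, which are not reproduced here:

```python
# Shape implied by the description above: 72 days at 30-minute
# granularity over 6 Paris regions. The array holds placeholder zeros.
import numpy as np

n_days, slots_per_day, n_regions = 72, 48, 6
series = np.zeros((n_days * slots_per_day, n_regions))
print(series.shape)  # → (3456, 6)
```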

2 PAPERS • NO BENCHMARKS YET

The softwarised network data zoo (SNDZoo) is an open collection of software networking data sets aiming to streamline and ease machine learning research in the software networking domain. Most of the published data sets focus on, but are not limited to, the performance of virtualised network functions (VNFs). The data is collected using fully automated NFV benchmarking frameworks, such as tng-bench, developed by us, or third-party solutions like Gym. The collection of the presented data sets follows the general VNF benchmarking methodology described in.

2 PAPERS • NO BENCHMARKS YET

This resource is designed to support research into Natural Language Generation, in particular neural data-to-text approaches, although it is not limited to these.

2 PAPERS • NO BENCHMARKS YET

The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:

2 PAPERS • 2 BENCHMARKS


Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial due to the extensive variety of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and an independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018.
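Mixing a masker into a soundscape at a fixed soundscape-to-masker ratio can be sketched with standard power-ratio arithmetic. This is a generic formulation under the assumption that the ratio is defined on mean signal power, not the ARAUS authors' actual pipeline:

```python
# Generic sketch of mixing at a fixed soundscape-to-masker ratio (dB).
# Standard level arithmetic; an assumption, not the authors' exact code.
import numpy as np

def mix_at_ratio(soundscape: np.ndarray, masker: np.ndarray,
                 ratio_db: float) -> np.ndarray:
    """Scale `masker` so `soundscape` sits `ratio_db` dB above it, then sum."""
    p_s = np.mean(soundscape ** 2)  # soundscape power
    p_m = np.mean(masker ** 2)      # masker power
    gain = np.sqrt(p_s / (p_m * 10 ** (ratio_db / 10)))
    return soundscape + gain * masker

rng = np.random.default_rng(0)
soundscape = rng.standard_normal(48_000)  # 1 s of noise standing in for audio
masker = rng.standard_normal(48_000)
mixed = mix_at_ratio(soundscape, masker, ratio_db=6.0)
print(mixed.shape)  # → (48000,)
```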

1 PAPER • NO BENCHMARKS YET

Multimodal object recognition is still an emerging field. Thus, publicly available datasets are still rare and of small size. This dataset was developed to help fill this void and presents multimodal data for 63 objects with some visual and haptic ambiguity. The dataset contains visual, kinesthetic and tactile (audio/vibrations) data. To completely solve sensory ambiguity, sensory integration/fusion would be required. This report describes the creation and structure of the dataset. The first section explains the underlying approach used to capture the visual and haptic properties of the objects. The second section describes the technical aspects (experimental setup) needed for the collection of the data. The third section introduces the objects, while the final section describes the structure and content of the dataset.

1 PAPER • NO BENCHMARKS YET

Measurement data related to the publication "Active TLS Stack Fingerprinting: Characterizing TLS Server Deployments at Scale". It contains weekly TLS and HTTP scan data and the TLS fingerprints for each target.

1 PAPER • NO BENCHMARKS YET

AnoShift is a large-scale anomaly detection benchmark which splits the test data based on its temporal distance to the training set, introducing three testing splits: IID, NEAR, and FAR. This testing scenario captures the in-time performance degradation of anomaly detection methods ranging from classical models to masked language models.
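The split logic can be sketched as bucketing test samples by temporal distance from the training window. The two-year NEAR horizon below is an illustrative assumption, not the benchmark's exact rule:

```python
# Illustrative sketch of temporal-distance bucketing into IID/NEAR/FAR.
# The near_horizon threshold is assumed, not taken from the benchmark.
def split_label(test_year: int, train_end: int, near_horizon: int = 2) -> str:
    distance = test_year - train_end
    if distance <= 0:
        return "IID"  # overlaps the training period
    return "NEAR" if distance <= near_horizon else "FAR"

print([split_label(y, train_end=2010) for y in (2009, 2011, 2015)])
# → ['IID', 'NEAR', 'FAR']
```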

1 PAPER • 2 BENCHMARKS

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the real severity (BIRADS) and pathology (post-report) classifications provided by the Radiologist Director of the Radiology Department of Hospital Fernando Fonseca while diagnosing several patients (see dataset-uta4-dicom) from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset for the measurements of both severity (BIRADS) and pathology classifications concerning the patient diagnostics. Work and results are published at AVI 2020, a top Human-Computer Interaction (HCI) conference. Results were analyzed and interpreted from our statistical analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs. Multi-Modality comparison.

1 PAPER • NO BENCHMARKS YET

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the severity rates (BIRADS) assigned by clinicians while diagnosing several patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset for the measurements of severity rates (BIRADS) concerning the patient diagnostics. Work and results are published at AVI 2020, a top Human-Computer Interaction (HCI) conference. Results were analyzed and interpreted from our statistical analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs. Multi-Modality comparison. In these tests, we used both the prototype-single-modality and prototype-multi-modality repositories for the comparison.

1 PAPER • NO BENCHMARKS YET

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

1 PAPER • NO BENCHMARKS YET

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
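The nine-to-three collapse described above is a simple mapping, reproduced here exactly as enumerated:

```python
# The nine CQA vote types and their three simplified labels, as listed
# in the dataset description above.
SIMPLIFY = {
    "voted for": "yea",
    "paired for": "yea",
    "announced for": "yea",
    "voted against": "nay",
    "paired against": "nay",
    "announced against": "nay",
    "voted present": "unknown",
    "voted present to avoid conflict of interest": "unknown",
    "did not vote or otherwise make a position known": "unknown",
}
print(sorted(set(SIMPLIFY.values())))  # → ['nay', 'unknown', 'yea']
```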

1 PAPER • 1 BENCHMARK

Co/FeMn bilayers measured.

1 PAPER • NO BENCHMARKS YET

This dataset includes Direct Borohydride Fuel Cell (DBFC) impedance and polarization tests in the anode with Pd/C, Pt/C and Pd-decorated Ni–Co/rGO catalysts. Different concentrations of Sodium Borohydride (SBH), applied voltages and anode catalyst loadings are considered in the data, along with the experimental details of the electrochemical analysis. Voltage, power density and resistance of the DBFC change as a function of the weight percent of SBH, the applied voltage and the amount of anode catalyst loading; these effects are evaluated by polarization and impedance curves using an appropriate equivalent circuit of the fuel cell. Interpreting the electrochemical behavior from such cell data is therefore essential, and can be useful in simulation, power-source investigation and in-depth analysis in DBFC research.

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

arpitbansal297/cold-diffusion-models • 19 Aug 2022

We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion • 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

PeRFception: Perception using Radiance Fields

POSTECH-CVLab/PeRFception • 24 Aug 2022

The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

NeuMan: Neural Human Radiance Field from a Single Video

apple/ml-neuman • 23 Mar 2022

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Audio-Visual Segmentation

opennlplab/avsbench • 11 Jul 2022

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking

daod/dcl • 22 Aug 2022

In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner.

Towards Calibrated Hyper-Sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning

To our knowledge, this is the first work to measure representation quality of classifiers and features from the perspective of distribution overlap coefficient.

SVD-NAS: Coupling Low-Rank Approximation and Neural Architecture Search

yu-zhewen/svd-nas • 22 Aug 2022

The task of compressing pre-trained Deep Neural Networks has attracted wide interest of the research community due to its great benefits in freeing practitioners from data access requirements.

ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition

zju-vipa/protopformer • 22 Aug 2022

The global prototypes are adopted to provide the global view of objects to guide local prototypes to concentrate on the foreground while eliminating the influence of the background.

Learning Branched Fusion and Orthogonal Projection for Face-Voice Association

msaadsaeed/FOP • 22 Aug 2022

In addition, we leverage cross-modal verification and matching tasks to analyze the impact of multiple languages on face-voice association.

Individual Tree Detection in Large-Scale Urban Environments using High-Resolution Multispectral Imagery

We introduce a novel deep learning method for detection of individual trees in urban environments using high-resolution multispectral aerial imagery.

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

microsoft/unilm • 22 Aug 2022

A big convergence of language, vision, and multimodal pretraining is emerging.

Rethinking Knowledge Distillation via Cross-Entropy

Furthermore, we smooth students’ target output to treat it as the soft target for training without teachers and propose a teacher-free new KD loss (tf-NKD).

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

jayqine/mgd-ssss • 22 Aug 2022

Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-levels loss setting.

A Simple Baseline for Multi-Camera 3D Object Detection

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

General Classification

3799 papers with code • 11 benchmarks • 8 datasets

Algorithms trying to solve the general task of classification.

Most implemented papers

Very Deep Convolutional Networks for Large-Scale Image Recognition

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.

YOLO9000: Better, Faster, Stronger

On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

We present a class of efficient models called MobileNets for mobile and embedded vision applications.

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

ramprs/grad-cam • ICCV 2017

For captioning and VQA, we show that even non-attention based models can localize inputs.

Convolutional Neural Networks for Sentence Classification

facebookresearch/pytext • EMNLP 2014

We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Point cloud is an important type of geometric data structure.

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

PaddlePaddle/PaddleSeg • 2 Nov 2015

We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures.

Going Deeper with Convolutions

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change.

Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer

zfj1998/m3nsct5 • 24 Aug 2022

Stack Overflow is one of the most popular programming communities where developers can seek help for their encountered problems.

Bugs in the Data: How ImageNet Misrepresents Biodiversity

We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect.

Improving Computed Tomography (CT) Reconstruction via 3D Shape Induction

esizikova/medsynth_public • 23 Aug 2022

Chest computed tomography (CT) imaging adds valuable insight in the diagnosis and management of pulmonary infectious diseases, like tuberculosis (TB).

Survival Mixture Density Networks

xintianhan/survival-mdn • 23 Aug 2022

Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs).

Structure Regularized Attentive Network for Automatic Femoral Head Necrosis Diagnosis and Localization

tomas-lilingfeng/sranet • 23 Aug 2022

However, due to the tissue overlap, X-ray images are difficult to provide fine-grained features for early diagnosis.

AniWho : A Quick and Accurate Way to Classify Anime Character Faces in Images

This paper aims to dive more deeply into various models available, including InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB7, using transfer learning to classify Japanese animation-style character faces.

ULISSE: A Tool for One-shot Sky Exploration and its Application to Active Galactic Nuclei Detection

In this work, we focus on applying our method to the detection of AGN candidates in a Sloan Digital Sky Survey galaxy sample, since the identification and classification of Active Galactic Nuclei (AGN) in the optical band still remains a challenging task in extragalactic astronomy.

Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers

This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set.

On Fitness Landscape Analysis of Permutation Problems: From Distance Metrics to Mutation Operator Selection

Our implementations of the permutation metrics, permutation mutation operators, and associated evolutionary algorithm, are available in a pair of open source Java libraries.

Learning Visibility for Robust Dense Human Body Estimation

chhankyao/visdb • 23 Aug 2022

An alternative approach is to estimate dense vertices of a predefined template body in the image space.

DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting

trusthlt/dp-rewrite • 22 Aug 2022

Text rewriting with differential privacy (DP) provides concrete theoretical guarantees for protecting the privacy of individuals in textual documents.

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

lmm077/SWEM • CVPR 2022

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS).

Rigid Base Biasing in Molecular Dynamics enables enhanced sampling of DNA conformations

All-atom simulations have become increasingly popular to study conformational and dynamical properties of nucleic acids as they are accurate and provide high spatial and time resolutions.

RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN

huyvnphan/eccv2022-ribac • 22 Aug 2022

Recently backdoor attack has become an emerging threat to the security of deep neural network (DNN) models.

Equivariant Hypergraph Neural Networks

jw9730/ehnn • 22 Aug 2022

Many problems in computer vision and machine learning can be cast as learning on hypergraphs that represent higher-order relations.

Shapelet-Based Counterfactual Explanations for Multivariate Time Series

In this work, we take advantage of the inherent interpretability of shapelets to develop a model agnostic multivariate time series (MTS) counterfactual explanation algorithm.

Transductive Decoupled Variational Inference for Few-Shot Classification

anujinho/trident • 22 Aug 2022

The versatility to learn from a handful of samples is the hallmark of human intelligence.

Efficient Planning in a Compact Latent Action Space

ZhengyaoJiang/latentplan • 22 Aug 2022

While planning-based sequence modelling methods have shown great potential in continuous control, scaling them to high-dimensional state-action sequences remains an open challenge due to the high computational complexity and innate difficulty of planning in high-dimensional spaces.

InstanceFormer: An Online Video Instance Segmentation Framework

We propose three novel components to model short-term and long-term dependency and temporal coherence.

PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

naver/posebert • 22 Aug 2022

It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model leveraging temporal information.

Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models

ai4healthuol/sssd • 19 Aug 2022

The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines.

SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability

havelhuang/eval_xai_robustness • 19 Aug 2022

Interpretability of Deep Learning (DL) models is arguably the barrier in front of trustworthy AI.

PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation

hilab-git/pymic • 19 Aug 2022

We aim to develop a new deep learning toolkit to support annotation-efficient learning for medical image segmentation, which can accelerate and simplify the development of deep learning models with a limited annotation budget, e.g., learning from partial, sparse or noisy annotations.

An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection

qiumiao30/slmr • 19 Aug 2022

Anomaly detection of multivariate time series is meaningful for system behavior monitoring.

Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer

radiantresearch/tsat • 19 Aug 2022

We further visualize the embedded dynamic graphs to illustrate the graph representation power of TSAT.

Neural network facilitated ab initio derivation of linear formula: A case study on formulating the relationship between DNA motifs and gene expression

We showed that this linear model could predict gene expression levels using promoter sequences with a performance comparable to deep neural network models.

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

jeffwang987/movedepth • 19 Aug 2022

In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints.

Intersection of Parallels as an Early Stopping Criterion

alivard/cdc-early-stopping • 19 Aug 2022

Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting.
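The idea above can be sketched in a few lines: run two copies of a linear model from different random initializations and watch the distance between their weight trajectories; the point where the trajectories stop approaching each other serves as the stopping signal. This is an illustrative sketch under assumed data, learning rate, and threshold, not the authors' exact criterion.

```python
import numpy as np

# Synthetic regression problem (hypothetical stand-in data).
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

def gd_step(w, lr=0.1):
    """One full-batch gradient step on mean-squared error."""
    return w - lr * (X.T @ (X @ w - y)) / n

# Two parallel instances of the same linear model, different seeds.
w1, w2 = rng.normal(size=d), rng.normal(size=d)
prev_gap = np.linalg.norm(w1 - w2)
stop_epoch = None
for epoch in range(1, 1001):
    w1, w2 = gd_step(w1), gd_step(w2)
    gap = np.linalg.norm(w1 - w2)
    # Stopping signal: the two trajectories have effectively met,
    # i.e. the gap is no longer shrinking by a meaningful amount.
    if prev_gap - gap < 1e-6:
        stop_epoch = epoch
        break
    prev_gap = gap
```

For this convex quadratic, the gap contracts geometrically, so the criterion always fires; the interesting behavior in the paper's setting is where that moment lands relative to the onset of overfitting.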

FORBID: Fast Overlap Removal By stochastic gradIent Descent for Graph Drawing

While many graph drawing algorithms consider nodes as points, graph visualization tools often represent them as shapes.

Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

dlr-dw/type-inference • 19 Aug 2022

In this work, we investigate the generalization ability of Type4Py as a representative for state-of-the-art deep learning-based type inference systems, by conducting extensive cross-domain experiments.

Datasets

83 dataset results for Russian

The Universal Dependencies (UD) project seeks to develop cross-linguistically consistent treebank annotation of morphology and syntax for multiple languages. The first version of the dataset was released in 2015 and consisted of 10 treebanks over 10 languages. Version 2.7 released in 2020 consists of 183 treebanks over 104 languages. The annotation consists of UPOS (universal part-of-speech tags), XPOS (language-specific part-of-speech tags), Feats (universal morphological features), Lemmas, dependency heads and universal dependency labels.
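The annotation layers listed above (UPOS, XPOS, Feats, Lemmas, heads, dependency labels) live in tab-separated CoNLL-U files. The toy parser below uses a simplified eight-column record (real CoNLL-U files have ten columns, adding DEPS and MISC), and the sample sentence is invented, not drawn from an actual UD treebank.

```python
# Simplified CoNLL-U-style sample: ID, FORM, LEMMA, UPOS, XPOS,
# FEATS, HEAD, DEPREL (tab-separated, one token per line).
SAMPLE = (
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\n"
    "2\tcat\tcat\tNOUN\tNN\tNumber=Sing\t3\tnsubj\n"
    "3\tsleeps\tsleep\tVERB\tVBZ\tNumber=Sing|Person=3\t0\troot\n"
)

COLS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel"]

def parse(block):
    """Split tab-separated lines into one dict per token."""
    return [dict(zip(COLS, line.split("\t")))
            for line in block.strip().splitlines()]

tokens = parse(SAMPLE)
# By convention the dependency root is the token whose head index is 0.
root = next(t for t in tokens if t["head"] == "0")
```

Each universal morphological feature in FEATS is a `Name=Value` pair joined by `|`, which is why the column splits cleanly with `feats.split("|")` when finer access is needed.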

445 PAPERS • 8 BENCHMARKS

The Cross-lingual Natural Language Inference (XNLI) corpus is the extension of the Multi-Genre NLI (MultiNLI) corpus to 15 languages. The dataset was created by manually translating the validation and test sets of MultiNLI into each of those 15 languages. The English training set was machine translated for all languages. The dataset is composed of 122k train, 2490 validation and 5010 test examples.

230 PAPERS • 10 BENCHMARKS

OpenSubtitles is a collection of multilingual parallel corpora. The dataset is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages.

169 PAPERS • 1 BENCHMARK

144 PAPERS • 202 BENCHMARKS

MuST-C currently represents the largest publicly available multilingual corpus (one-to-many) for speech translation. It covers eight language directions, from English to German, Spanish, French, Italian, Dutch, Portuguese, Romanian and Russian. The corpus consists of audio, transcriptions and translations of English TED talks, and it comes with a predefined training, validation and test split.

130 PAPERS • 2 BENCHMARKS

WMT 2016 is a collection of datasets used in shared tasks of the First Conference on Machine Translation. The conference builds on ten previous Workshops on Statistical Machine Translation.

123 PAPERS • 20 BENCHMARKS

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.

102 PAPERS • 2 BENCHMARKS

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study.

89 PAPERS • 1 BENCHMARK

UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, worms, backdoors, and fuzzers, and provides raw network packets. The training set contains 175,341 records and the testing set 82,332 records, covering both attack and normal traffic.

68 PAPERS • 2 BENCHMARKS

This corpus comprises monolingual data for 100+ languages and also includes data for romanized languages. It was constructed using the URLs and paragraph indices provided by the CC-Net repository, by processing the January to December 2018 Common Crawl snapshots. Each file consists of documents separated by double newlines, with paragraphs within the same document separated by a single newline. The data is generated using the open-source CC-Net repository.

52 PAPERS • NO BENCHMARKS YET

Multilingual Document Classification Corpus (MLDoc) is a cross-lingual document classification dataset covering English, German, French, Spanish, Italian, Russian, Japanese and Chinese. It is a subset of the Reuters Corpus Volume 2 selected according to the following design choices:

40 PAPERS • 11 BENCHMARKS

OSCAR or Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture. The dataset used for training multilingual models such as BART incorporates 138 GB of text.

32 PAPERS • NO BENCHMARKS YET

WMT 2020 is a collection of datasets used in shared tasks of the Fifth Conference on Machine Translation. The conference builds on a series of annual workshops and conferences on Statistical Machine Translation.

25 PAPERS • NO BENCHMARKS YET

WikiAnn is a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.

23 PAPERS • 7 BENCHMARKS

21 PAPERS • 7 BENCHMARKS

The MULTEXT-East resources are a multilingual dataset for language engineering research and development. It consists of the (1) MULTEXT-East morphosyntactic specifications, defining categories (parts-of-speech), their morphosyntactic features (attributes and values), and the compact MSD tagset representations; (2) morphosyntactic lexica, (3) the annotated parallel «1984» corpus; and (4) some comparable text and speech corpora. The specifications are available for the following macrolanguages, languages and language varieties: Albanian, Bulgarian, Chechen, Czech, Damaskini, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbo-Croatian, Slovak, Slovene, Torlak, and Ukrainian, while the other resources are available for a subset of these languages.

21 PAPERS • NO BENCHMARKS YET

Multilingual Knowledge Questions and Answers (MKQA) is an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Answers are based on a language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering.

20 PAPERS • NO BENCHMARKS YET

770k article and summary pairs in 18 languages from WikiHow. Gold-standard article-summary alignments across languages are extracted by aligning the images that are used to describe each how-to step in an article.

20 PAPERS • 4 BENCHMARKS

CoVoST is a large-scale multilingual speech-to-text translation corpus. Its latest 2nd version covers translations from 21 languages into English and from English into 15 languages. It has total 2880 hours of speech and is diversified with 78K speakers and 66 accents.

19 PAPERS • NO BENCHMARKS YET

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 5 languages (Czech, German, Finnish, Romanian, Russian, Turkish) and an additional 1500 sentences from each of the 5 languages translated to English. For Romanian, a third of the test set was released as a development set instead. For Turkish, an additional 500-sentence development set was released. The sentences were selected from dozens of news websites and translated by professional translators. The training data consists of parallel corpora to train translation models, monolingual corpora to train language models, and development sets for tuning. Some training corpora were identical to WMT 2015 (Europarl, United Nations, French-English 10⁹ corpus, Common Crawl, Russian-English parallel data provided by Yandex, Wikipedia Headlines provided by CMU) and some were updated (CzEng v1.6pre, News Commentary v11, monolingual news data). Additionally,

19 PAPERS • 8 BENCHMARKS

AVSpeech is a large-scale audio-visual dataset comprising speech clips with no interfering background signals. The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.

18 PAPERS • NO BENCHMARKS YET

The first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish.

15 PAPERS • NO BENCHMARKS YET

XGLUE is an evaluation benchmark composed of 11 tasks that span 19 languages. For each task, the training data is only available in English, so to succeed at XGLUE a model must have strong zero-shot cross-lingual transfer capability: it must learn from the English data of a specific task and transfer what it learned to other languages. Compared to its concurrent work XTREME, XGLUE has two characteristics: first, it includes cross-lingual NLU and cross-lingual NLG tasks at the same time; second, besides including 5 existing cross-lingual tasks (i.e. NER, POS, MLQA, PAWS-X and XNLI), XGLUE selects 6 new tasks from Bing scenarios as well, including News Classification (NC), Query-Ad Matching (QADSM), Web Page Ranking (WPR), QA Matching (QAM), Question Generation (QG) and News Title Generation (NTG). Such diversity of languages, tasks and task origins provides a comprehensive benchmark for quantifying the quality of a pre-trained model on cross-lingual natural language understanding and generation.

14 PAPERS • 3 BENCHMARKS

MaSS (Multilingual corpus of Sentence-aligned Spoken utterances) is an extension of the CMU Wilderness Multilingual Speech Dataset, a speech dataset based on recorded readings of the New Testament.

13 PAPERS • 3 BENCHMARKS

The Multilingual Quality Estimation and Automatic Post-editing (MLQE-PE) Dataset is a dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as titles of the articles where the sentences were extracted from, and the neural MT models used to translate the text.

12 PAPERS • NO BENCHMARKS YET

Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases are extracted from the OpenSubtitles2016 corpus, which contains subtitles from movies and TV shows.

12 PAPERS • NO BENCHMARKS YET

Global Voices is a multilingual dataset for evaluating cross-lingual summarization methods. It is extracted from social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages.

7 PAPERS • NO BENCHMARKS YET

7 PAPERS • 1 BENCHMARK

Synbols is a dataset generator designed for probing the behavior of learning algorithms. By defining the distribution over latent factors one can craft a dataset specifically tailored to answer specific questions about a given algorithm.

7 PAPERS • NO BENCHMARKS YET

6 PAPERS • 1 BENCHMARK

WiC (the Word-in-Context dataset) is a reliable benchmark for the evaluation of context-sensitive word embeddings.

6 PAPERS • 1 BENCHMARK

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 5 languages (Chinese, Czech, Estonian, German, Finnish, Russian, Turkish) and additional 1500 sentences from each of the 7 languages translated to English. The sentences were selected from dozens of news websites and translated by professional translators.

6 PAPERS • NO BENCHMARKS YET

XL-Sum is a comprehensive and diverse dataset for abstractive summarization comprising 1 million professionally annotated article-summary pairs from BBC, extracted using a set of carefully designed heuristics. The dataset covers 44 languages ranging from low to high-resource, for many of which no public dataset is currently available. XL-Sum is highly abstractive, concise, and of high quality, as indicated by human and intrinsic evaluation.

6 PAPERS • NO BENCHMARKS YET

XQA is a dataset consisting of a total of 90k question-answer pairs in nine languages for cross-lingual open-domain question answering.

6 PAPERS • NO BENCHMARKS YET

The Image-Grounded Language Understanding Evaluation (IGLUE) benchmark brings together—by both aggregating pre-existing datasets and creating new ones—visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. The benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups.

5 PAPERS • 13 BENCHMARKS

Frame-to-frame video alignment/synchronization

Datasets

114 dataset results for RGB-D

ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled voxels rather than points or objects. ScanNet v2, the newest version of ScanNet, has collected 1513 annotated scans with approximately 90% surface coverage. In the semantic segmentation task, this dataset is annotated with 20 classes of 3D voxelized objects.

566 PAPERS • 14 BENCHMARKS

The NYU-Depth V2 dataset is composed of video sequences from a variety of indoor scenes as recorded by both the RGB and depth cameras of the Microsoft Kinect. It features:

549 PAPERS • 17 BENCHMARKS

NTU RGB+D is a large-scale dataset for RGB-D human action recognition. It involves 56,880 samples of 60 action classes collected from 40 subjects. The actions can be generally divided into three categories: 40 daily actions (e.g., drinking, eating, reading), nine health-related actions (e.g., sneezing, staggering, falling down), and 11 mutual actions (e.g., punching, kicking, hugging). These actions take place under 17 different scene conditions corresponding to 17 video sequences (i.e., S001–S017). The actions were captured using three cameras with different horizontal imaging viewpoints, namely −45°, 0°, and +45°. Multi-modality information is provided for action characterization, including depth maps, 3D skeleton joint positions, RGB frames, and infrared sequences. Performance evaluation is carried out with a cross-subject test that splits the 40 subjects into training and test groups, and with a cross-view test that employs one camera (+45°) for testing and the other two cameras for training.

286 PAPERS • 18 BENCHMARKS

The SUN RGBD dataset contains 10335 real RGB-D images of room scenes. Each RGB image has a corresponding depth and segmentation map. As many as 700 object categories are labeled. The training and testing sets contain 5285 and 5050 images, respectively.

281 PAPERS • 12 BENCHMARKS

The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside 90 real building-scale scenes, constructed from 194,400 RGB-D images. Each scene is a residential building consisting of multiple rooms and floor levels, and is annotated with surface construction, camera poses, and semantic segmentation.

227 PAPERS • 4 BENCHMARKS

SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.

170 PAPERS • NO BENCHMARKS YET

TUM RGB-D is an RGB-D dataset. It contains the color and depth images of a Microsoft Kinect sensor along the ground-truth trajectory of the sensor. The data was recorded at full frame rate (30 Hz) and sensor resolution (640×480). The ground-truth trajectory was obtained from a high-accuracy motion-capture system with eight high-speed tracking cameras (100 Hz).

124 PAPERS • NO BENCHMARKS YET

The YCB-Video dataset is a large-scale video dataset for 6D object pose estimation. It provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames.

103 PAPERS • 5 BENCHMARKS

SUN3D contains a large-scale RGB-D video database with 8 annotated sequences. Each frame has a semantic segmentation of the objects in the scene and information about the camera pose. It is composed of 415 sequences captured in 254 different spaces across 41 different buildings. Moreover, some places have been captured multiple times at different moments of the day.

83 PAPERS • NO BENCHMARKS YET

ALFRED (Action Learning From Realistic Environments and Directives) is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

59 PAPERS • NO BENCHMARKS YET

The CAD-60 and CAD-120 datasets comprise RGB-D video sequences of humans performing activities, recorded using the Microsoft Kinect sensor. Being able to detect human activities is important for making personal assistant robots useful in performing assistive tasks. The CAD datasets comprise twelve different activities (composed of several sub-activities) performed by four people in different environments, such as a kitchen, a living room, and an office.

55 PAPERS • 1 BENCHMARK

T-LESS is a dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from

51 PAPERS • 2 BENCHMARKS

ETHD is a multi-view stereo / 3D reconstruction benchmark that covers a variety of indoor and outdoor scenes. Ground truth geometry has been obtained using a high-precision laser scanner. A DSLR camera as well as a synchronized multi-camera rig with varying fields of view were used to capture images.

43 PAPERS • 1 BENCHMARK

A hand-object interaction dataset with 3D pose annotations of hand and object. The dataset contains 66,034 training images and 11,524 test images from a total of 68 sequences. The sequences are captured in multi-camera and single-camera setups and contain 10 different subjects manipulating 10 different objects from the YCB dataset. The annotations are automatically obtained using an optimization algorithm. The hand pose annotations for the test set are withheld, and the accuracy of algorithms on the test set can be evaluated with standard metrics using the CodaLab challenge submission (see project page). The object pose annotations for the test and train sets are provided along with the dataset.

42 PAPERS • NO BENCHMARKS YET

The EYEDIAP dataset is a dataset for gaze estimation from remote RGB, and RGB-D (standard vision and depth), cameras. The recording methodology was designed by systematically including, and isolating, most of the variables which affect the remote gaze estimation algorithms:

38 PAPERS • 2 BENCHMARKS

SceneNN is an RGB-D scene dataset consisting of more than 100 indoor scenes. The scenes are captured at various places, e.g., offices, dormitory, classrooms, pantry, etc., from University of Massachusetts Boston and Singapore University of Technology and Design. All scenes are reconstructed into triangle meshes and have per-vertex and per-pixel annotation. The dataset is additionally enriched with fine-grained information such as axis-aligned bounding boxes, oriented bounding boxes, and object poses.

37 PAPERS • 1 BENCHMARK

Diode Dense Indoor/Outdoor DEpth (DIODE) is the first standard dataset for monocular depth estimation comprising diverse indoor and outdoor scenes acquired with the same hardware setup. The training set consists of 8574 indoor and 16884 outdoor samples from 20 scans each. The validation set contains 325 indoor and 446 outdoor samples with each set from 10 different scans. The ground truth density for the indoor training and validation splits are approximately 99.54% and 99%, respectively. The density of the outdoor sets are naturally lower with 67.19% for training and 78.33% for validation subsets. The indoor and outdoor ranges for the dataset are 50m and 300m, respectively.

30 PAPERS • 2 BENCHMARKS

REAL275 is a benchmark for category-level pose estimation. It contains 4300 training frames, 950 validation frames, and 2750 test frames across 18 different real scenes.

30 PAPERS • 1 BENCHMARK

AVD focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes.

25 PAPERS • 1 BENCHMARK

InteriorNet is an RGB-D dataset for large-scale interior scene understanding and mapping. The dataset contains 20M images created by the following pipeline:

22 PAPERS • NO BENCHMARKS YET

UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The dataset was collected by a flying UAV in multiple urban and rural districts in both daytime and nighttime over three months, hence covering extensive diversities w.r.t subjects, backgrounds, illuminations, weathers, occlusions, camera motions, and UAV flying attitudes. This dataset can be used for UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition.

22 PAPERS • 4 BENCHMARKS

The ReDWeb dataset consists of 3600 RGB-RD image pairs collected from the Web. This dataset covers a wide range of scenes and features various non-rigid objects.

21 PAPERS • NO BENCHMARKS YET

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. Hypersim is a photorealistic synthetic dataset for holistic indoor scene understanding. It contains 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.

16 PAPERS • 1 BENCHMARK

Dataset containing RGB-D data of 4 large scenes, comprising a total of 12 rooms, for the purpose of RGB and RGB-D camera relocalization. The RGB-D data was captured using a Structure.io depth sensor coupled with an iPad color camera. Each room was scanned multiple times, with the multiple sequences run through a global bundle adjustment in order to obtain globally aligned camera poses though all sequences of the same scene.

15 PAPERS • NO BENCHMARKS YET

The Wide Multi Channel Presentation Attack (WMCA) database consists of 1,941 short video recordings of both bona fide presentations and presentation attacks from 72 different identities. The data is recorded from several channels, including color, depth, infrared, and thermal.

14 PAPERS • 1 BENCHMARK

Washington RGB-D is a widely used testbed in the robotics community, consisting of 41,877 RGB-D images organized into 300 instances divided into 51 classes of common indoor objects (e.g., scissors, cereal boxes, keyboards). Each object instance was positioned on a turntable and captured from three different viewpoints while rotating.

14 PAPERS • NO BENCHMARKS YET

CDTB (color-and-depth visual object tracking) is a dataset recorded with several passive and active RGB-D setups, containing indoor as well as outdoor sequences acquired in direct sunlight. The sequences were recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are annotated per frame with 13 visual attributes for detailed analysis. The dataset contains around 100,000 samples. Source: https://www.vicos.si/Projects/CDTB

11 PAPERS • NO BENCHMARKS YET

First-Person Hand Action Benchmark is a collection of RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations.

11 PAPERS • 2 BENCHMARKS

The dataset was collected using the Intel RealSense D435i camera, which was configured to produce synchronized accelerometer and gyroscope measurements at 400 Hz, along with synchronized VGA-size (640 x 480) RGB and depth streams at 30 Hz. The depth frames are acquired using active stereo and are aligned to the RGB frames using the sensor's factory calibration. All measurements are timestamped.
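Because the IMU (400 Hz) and the RGB/depth streams (30 Hz) run at different rates, a consumer of the timestamped measurements typically has to associate each camera frame with its nearest IMU sample. A minimal sketch of that nearest-timestamp matching, assuming sorted timestamp lists in seconds (the function name and data layout are illustrative, not part of the dataset's tooling):

```python
import bisect

def align_imu_to_frames(frame_ts, imu_ts):
    """For each camera frame timestamp, return the index of the
    nearest IMU sample by timestamp. Both lists must be sorted."""
    matches = []
    for t in frame_ts:
        i = bisect.bisect_left(imu_ts, t)
        # Candidates: the IMU sample just before and just at/after t.
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(imu_ts)),
            key=lambda j: abs(imu_ts[j] - t),
        )
        matches.append(best)
    return matches

# 30 Hz frames vs. 400 Hz IMU, matching the D435i configuration above.
frames = [k / 30.0 for k in range(3)]    # 0.0, 0.0333..., 0.0666...
imu = [k / 400.0 for k in range(30)]     # 0.0 .. 0.0725 in 2.5 ms steps
print(align_imu_to_frames(frames, imu))  # -> [0, 13, 27]
```

With hardware-synchronized timestamps like these, nearest-sample matching keeps the frame-to-IMU association error below half the IMU period (1.25 ms here).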

11 PAPERS • 1 BENCHMARK

The Drive&Act dataset is a state-of-the-art multi-modal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels for 9.6 million frames captured from 6 different views and 3 modalities (RGB, IR, and depth).

10 PAPERS • 1 BENCHMARK

Developing robot perception systems for handling objects in the real world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms.

10 PAPERS • NO BENCHMARKS YET

DeepLoc is a large-scale urban outdoor localization dataset. The dataset currently comprises one scene spanning an area of 110 x 130 m, which a robot traverses multiple times with different driving patterns. The dataset creators use a LiDAR-based SLAM system with sub-centimeter and sub-degree accuracy to compute the pose labels provided as ground truth. Poses in the dataset are spaced approximately 0.5 m apart, twice as dense as in other relocalization datasets.

7 PAPERS • NO BENCHMARKS YET

The EgoDexter dataset provides both 2D and 3D pose annotations for 4 testing video sequences with 3,190 frames. The videos were recorded with a body-mounted camera from egocentric viewpoints and contain cluttered backgrounds, fast camera motion, and complex interactions with various objects. Fingertip positions were manually annotated for 1,485 of the 3,190 frames.

7 PAPERS • NO BENCHMARKS YET

The HandNet dataset contains depth images of 10 participants' hands non-rigidly deforming in front of an Intel RealSense RGB-D camera. The annotations are generated by a magnetic annotation technique. 6D pose (position and orientation) is available for the center of the hand as well as for each of the five fingertips.

7 PAPERS • NO BENCHMARKS YET

NTU RGB+D 2D is a curated version of NTU RGB+D, often used for skeleton-based action prediction and synthesis. It contains a smaller number of actions.

7 PAPERS • 1 BENCHMARK

The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database consists of 2,904 short multi-modal video recordings of both bona fide presentations and presentation attacks. There are 555 bona fide presentations from 51 participants; the remaining 2,349 are presentation attacks. The data is recorded from several channels, including color, depth, thermal, infrared (spectra), and short-wave infrared (spectra).

6 PAPERS • NO BENCHMARKS YET

This is a 3D action recognition dataset, also known as 3D Action Pairs dataset. The actions in this dataset are selected in pairs such that the two actions of each pair are similar in motion (have similar trajectories) and shape (have similar objects); however, the motion-shape relation is different.

6 PAPERS • 1 BENCHMARK

A Large Dataset of Object Scans is a dataset of more than ten thousand 3D scans of real objects. To create the dataset, the authors recruited 70 operators, equipped them with consumer-grade mobile 3D scanning setups, and paid them to scan objects in their environments. The operators scanned objects of their choosing, outside the laboratory and without direct supervision by computer vision professionals. The result is a large and diverse collection of object scans: from shoes, mugs, and toys to grand pianos, construction vehicles, and large outdoor sculptures. The authors worked with an attorney to ensure that data acquisition did not violate privacy constraints. The acquired data was placed in the public domain and is available freely.

5 PAPERS • NO BENCHMARKS YET

ARKitScenes is an RGB-D dataset captured with the widely available Apple LiDAR scanner. Along with the per-frame raw data (wide camera RGB, ultra-wide camera RGB, LiDAR scanner depth, IMU), the authors also provide the estimated ARKit camera pose and ARKit scene reconstruction for each iPad Pro sequence.

5 PAPERS • 1 BENCHMARK

The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and multi-modal images. The dataset may be used for evaluation of different perception algorithms for segmentation, detection, classification, etc. All scenes were recorded at 20 Hz with a camera resolution of 1024×768 pixels. The data was collected on three different days to have enough variability in lighting conditions as shadows and sun angles play a crucial role in the quality of acquired images. The robot traversed about 4.7 km each day. The dataset creators provide manually annotated pixel-wise ground truth segmentation masks for 6 classes: Obstacle, Trail, Sky, Grass, Vegetation, and Void.

5 PAPERS • 2 BENCHMARKS

The Hands in Action (HIC) dataset contains RGB-D sequences of hands interacting with objects.

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge.

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 • 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.

Multi-scale Multi-band DenseNets for Audio Source Separation

Anjok07/ultimatevocalremovergui • 29 Jun 2017

This paper deals with the problem of audio source separation.

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext • 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.

Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

ZhangGongjie/Meta-DETR • 30 Jul 2022

Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes.

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

thudm/cogvideo • 29 May 2022

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.

DualSmoke: Sketch-Based Smoke Illustration Design with Two-Stage Generative Model

shasph/dualsmoke • 23 Aug 2022

In this work, we propose DualSmoke, a two-stage global-to-local generation framework for interactive smoke illustration design.

Towards a Formal Approach for Detection of Vulnerabilities in the Android Permissions System

Android is a widely used operating system that employs a permission-based access control model.

Cryptography and Security

FirmCore Decomposition of Multilayer Networks

Social and Information Networks

Event-Triggered Model Predictive Control with Deep Reinforcement Learning for Autonomous Driving

dangfengying/rl-based-event-triggered-mpc • 22 Aug 2022

First, a model-free reinforcement learning (RL) agent is used to learn the optimal event-trigger policy in this framework, without requiring complete knowledge of the system dynamics and communication.

Robotics Systems and Control

Tree-structured Auxiliary Online Knowledge Distillation

linwenye/tree-supervised • 22 Aug 2022

Instead, in this work, we focus on the design of the global architecture and propose Tree-Structured Auxiliary online knowledge distillation (TSA), which adds more parallel peers for layers close to the output hierarchically to strengthen the effect of knowledge distillation.

Networking and Internet Architecture

Evaluating Cardiovascular Surgical Planning in Mobile Augmented Reality

Advanced surgical procedures for congenital heart diseases (CHDs) require precise planning before the surgeries.

Competition for popularity and identification of interventions on a Chinese microblogging site

Microblogging sites are important vehicles for users to obtain information and shape public opinion; they are thus arenas of continuous competition for popularity.

Social and Information Networks Physics and Society

Examining Audio Communication Mechanisms for Supervising Fleets of Agricultural Robots

In this work, we explore methods for communication between a remote human operator and multiple agbots and examine the impact of audio communication on the operator’s preferences and productivity.

Robotics Computers and Society Human-Computer Interaction Sound Audio and Speech Processing

The Single Robot Line Coverage Problem: Theory, Algorithms and Experiments

This paper addresses the single robot line coverage problem for aerial and ground robots by modeling it as an optimization problem on a graph.

Area Coverage with Multiple Capacity-Constrained Robots

We present a novel formulation for generating coverage routes for multiple capacity-constrained robots, where capacity can be specified in terms of battery life or flight time.

Papers with Code Terms of Use

Effective Date: 12 August, 2021

Papers with Code is a free resource supported by Meta Platforms, Inc. (together, “Meta Platforms”, “us”, “our”) for papers, code, evaluation tables, libraries, methods, datasets and other materials relating to Machine Learning.

These Terms of Use (“Terms”) govern the use of the Papers with Code website (the “Website”), including the materials hosted on the Website. These Terms constitute an agreement between you and us, so it is important that you review them carefully. By accessing the Website, or downloading materials made available on this Website, or generally when you access any page, document or file which links to these Terms, you accept and agree to be bound by both these Terms and our Data Policy.

Content

The Website is available for a personal and non-commercial use and for informational purposes only. You may not access or use the Website if you are not at least 18 years old or the age of majority in the jurisdiction from which you are accessing the Website.

We make no representations or warranties of any kind as to the accuracy, currency, or completeness of the information and other materials made available through the Website. We are not liable for any decisions you may make in reliance on this content.

Your Commitments

We administer the Website and its contents, and in exchange for accessing the Website, we ask you to confirm that:

User-Generated Content

“User Content” means any information and content that a user submits to, or uses with, the Website (including any information entered in any open text field). This includes any content or submission that you post, submit, make available, provide or otherwise share with us on our Website. Examples include any text, code, photos, videos, or other media or content.

You assume all risks associated with the use of your User Content. If your User Content contains information relating to an identified or identifiable individual, you represent that your collection and use of it complies with the applicable laws, and you remain responsible for any disclosure of your User Content that personally identifies you or any third party. Unless otherwise specified, we are not obligated to backup any User Content. You are solely responsible for creating and maintaining your own backup copies of your User Content if you desire.

We reserve the right (but have no obligation) to review any User Content, and to investigate and/or take appropriate action against you in our sole discretion if you violate any other provision of these Terms. Such action may include terminating your account.

You should not submit or provide any confidential, private or personal information (other than your contact details if you wish to receive an answer to your query or to be contacted in response to your feedback) in the open text field.

To the extent that you provide User Content, you hereby grant us (and represent and warrant that you have the right to grant) an irrevocable, non-exclusive, royalty-free and fully-paid-up, worldwide license to reproduce, distribute, publicly display and perform, prepare derivative works of, incorporate into other works and otherwise use and exploit such User Content, and to grant sublicenses of the foregoing rights. You hereby irrevocably waive (and agree to cause to be waived) any claims and assertions of moral rights or attribution with respect to your User Content.

You acknowledge that much of the information and content available on the Website have not been verified or approved by us. Views which may be expressed by other users on the Website do not represent our views or values. Although we reserve the right to review or remove all content that appears on the Website, we do not necessarily review all of it. If we remove any User Content, we will let you know and explain any options you have to request another review, unless you seriously or repeatedly violate these Terms or if doing so may expose us or others to legal liability, harm our community of users, compromise or interfere with the integrity or operation of the Website, where we are restricted due to technical limitations, or where we are prohibited from doing so for legal reasons.

To help support our community, we encourage you to report to us content or conduct that you believe violates your rights (including intellectual property rights) or our Terms and policies.

Ownership

Excluding any User Content that you may provide, you acknowledge that all intellectual property rights, including copyrights, patents, trademarks, and trade secrets, in the Website and their content are owned by us, our suppliers, partners and/or third party licensors. For the avoidance of doubt, Meta Platforms does not claim ownership of User Content you submit or other content made available for inclusion via our Website.

Without limiting any of the above, we have the right (though not the obligation) to, in our sole discretion, refuse or remove any User Content that, in our reasonable opinion, is in any way harmful or objectionable; or terminate or deny access to and use of the Website to any individual or entity for any reason, in our sole discretion.

Please note that the above license only applies to the User Content on the Website. This license does not cover any development, improvement or modification to the Website itself.

Neither these Terms nor your access to the Website transfers to you or any third party any rights, title or interest in or to such intellectual property rights. We reserve all rights not granted in these Terms.

Prohibited Conduct

You may not access or use, or attempt to access or use, the Website to take any action that could harm us or any third party, interfere with the operation of the Website, or use the Website in a manner that violates any laws. For example, and without limitation, you may not:

In addition, you may not use our Website, or do or share anything that:

We may suspend or terminate access to the Website for any reason at any time without notice. We also may investigate and work with law enforcement authorities to prosecute users who violate the Terms. Violations of system or network security may result in civil or criminal liability.

Copyright

We respect intellectual property rights. If you believe in good faith that your work has been reproduced or is accessible on the Website in a way that constitutes copyright infringement, please provide our designated agent with the following information in writing:

Our designated agent may be reached by contacting:

Meta Platforms Designated Agent, Meta Platforms, Inc.
1601 Willow Road
Menlo Park, CA 94025
(650) 543-4800
ip@fb.com

Upon receipt of a valid notice of claimed infringement or any statement in conformance with 17 U.S.C. § 512(c)(3), we will expeditiously remove or disable access to the allegedly infringing content. We will terminate the privileges of users who repeatedly infringe copyright.

Limitation of liability

Your use of the Website (including the User Content) is at your own risk. The Website and its respective content are provided “as is” without warranties of any kind, either express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, non-infringement, or other violation of rights. We do not warrant the adequacy, currency, accuracy, likely results, or completeness of the User Content, the Website or any third-party sites linked to or from the Website, or that the functions provided will be uninterrupted, virus-free, or error-free.

We expressly disclaim any liability for any errors or omissions in the content included on the Website, including User Content. We do not control or direct what people and others do or say, and we are not responsible for their actions or conduct (whether online or offline) or any content they share (including offensive, inappropriate, obscene, unlawful, and other objectionable content). Some jurisdictions may not allow the exclusion of implied warranties, so some of the above exclusions may not apply to you.

In no event will we, or our subsidiaries, affiliates, directors, officers, employees, agents, and assigns be liable for any direct or indirect, special, incidental, consequential or punitive damages, lost profits, or other damages whatsoever arising in connection with the User Content or use of the Website, any interruption in availability of the Website, delay in operation or transmission, computer virus, loss of data, or use, misuse, reliance, review, manipulation, or other utilization in any manner whatsoever of the Website, or the data collected through the Website, even if one or more of them has been advised of the possibility of such damages or loss.

Indemnification

You agree to indemnify, defend, and hold us and our subsidiaries, affiliates, directors, officers, employees, agents and assigns harmless from and against any and all loss, costs, expenses (including reasonable attorneys’ fees and expenses), claims, damages, and liabilities related to or associated with your submission or use of User Content, your use of the Website and any alleged violation by you of these Terms. We reserve the right to assume the exclusive defense of any claim for which we are entitled to indemnification under this section. In such event, you shall provide us with such cooperation as we reasonably request.

Modification

We reserve the right, at any time, to modify, suspend, or discontinue all or any elements of the Website (in whole or in part) with or without notice to you. You agree that we will not be liable to you or to any third party for any such modification, suspension, or discontinuation.

Third Party Materials

The Website may contain links to third party content or otherwise contain materials that are hosted by another party. We do not control, endorse, sponsor, recommend, or otherwise accept responsibility for such content. You access these materials at your own risk.

Rules about linking to our Website

You may link to our Website, provided you do so in a way that is fair and legal and does not damage our reputation or take advantage of it. You must not establish a link in such a way as to suggest any form of association, approval or endorsement on our part where none exists.

Our Website must not be framed on any other site, nor may you create a link to any part of our Website other than the home page. We reserve the right to withdraw linking permission without notice.

Miscellaneous

Papers With Code
c/o Meta Platforms, Inc.
1601 Willow Road
Menlo Park, CA 94025

Datasets

139 dataset results for Speech

The LibriSpeech corpus is a collection of approximately 1,000 hours of audiobooks that are part of the LibriVox project. Most of the audiobooks come from Project Gutenberg. The training data is split into three partitions of 100hr, 360hr, and 500hr, while the dev and test data are each split into ’clean’ and ’other’ categories depending on how well or how poorly Automatic Speech Recognition systems perform on them. Each of the dev and test sets is around 5hr in audio length. The corpus also provides n-gram language models and the corresponding texts, excerpted from Project Gutenberg books, which contain 803M tokens and 977K unique words.

1,162 PAPERS • 8 BENCHMARKS

Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

206 PAPERS • 3 BENCHMARKS

WSJ0-2mix is a speech recognition corpus of speech mixtures using utterances from the Wall Street Journal (WSJ0) corpus.

105 PAPERS • 2 BENCHMARKS

AISHELL-1 is a corpus for speech recognition research and building speech recognition systems for Mandarin.

104 PAPERS • 1 BENCHMARK

MUSAN is a corpus of music, speech and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises.

103 PAPERS • NO BENCHMARKS YET

LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at 24kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The LibriTTS corpus is designed for TTS research. It is derived from the original materials (mp3 audio files from LibriVox and text files from Project Gutenberg) of the LibriSpeech corpus. The main differences from the LibriSpeech corpus are listed below:

79 PAPERS • NO BENCHMARKS YET

The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene. It has an extension called WHAMR! that adds artificial reverberation to the speech signals in addition to the background noise.

42 PAPERS • 3 BENCHMARKS

Continuous speech separation (CSS) is an approach to handling overlapped speech in conversational audio signals. A real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones.

40 PAPERS • NO BENCHMARKS YET

The REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge is a benchmark for evaluating automatic speech recognition techniques. The challenge assumes the scenario of capturing utterances spoken by a single stationary distant-talking speaker with 1-channel, 2-channel, or 8-channel microphone arrays in reverberant meeting rooms. It features both real recordings and simulated data.

38 PAPERS • 1 BENCHMARK

LibriMix is an open-source alternative to wsj0-2mix. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!.

37 PAPERS • 1 BENCHMARK

Multimodal Opinion-level Sentiment Intensity (MOSI) contains: (1) multimodal observations including transcribed speech and visual gestures as well as automatic audio and visual features, (2) opinion-level subjectivity segmentation, (3) sentiment intensity annotations with high coder agreement, and (4) alignment between words, visual and acoustic features.

35 PAPERS • 1 BENCHMARK

The VOICES corpus is a dataset to promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions.

34 PAPERS • NO BENCHMARKS YET

AISHELL-2 contains 1000 hours of clean read-speech data recorded on iOS devices and is free for academic usage.

33 PAPERS • NO BENCHMARKS YET

2000 HUB5 English Evaluation Transcripts was developed by the Linguistic Data Consortium (LDC) and consists of transcripts of 40 English telephone conversations used in the 2000 HUB5 evaluation sponsored by NIST (National Institute of Standards and Technology).

31 PAPERS • 1 BENCHMARK

Europarl-ST is a multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012.

28 PAPERS • NO BENCHMARKS YET

VoxPopuli is a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours.

28 PAPERS • 1 BENCHMARK

CN-Celeb is a large-scale speaker recognition dataset collected “in the wild”. It contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different real-world genres.

27 PAPERS • 1 BENCHMARK

THCHS-30 is a free Chinese speech database that can be used to build a full-fledged Chinese speech recognition system.

26 PAPERS • NO BENCHMARKS YET

WHAMR! is a dataset for noisy and reverberant speech separation. It extends WHAM! by introducing synthetic reverberation to the speech sources in addition to the existing noise. Room impulse responses were generated and convolved using pyroomacoustics. Reverberation times were chosen to approximate domestic and classroom environments (expected to be similar to the restaurants and coffee shops where the WHAM! noise was collected), and further classified as high, medium, and low reverberation based on a qualitative assessment of the mixture’s noise recording.

26 PAPERS • 3 BENCHMARKS

The DIHARD II development and evaluation sets draw from a diverse set of sources exhibiting wide variation in recording equipment, recording environment, ambient noise, number of speakers, and speaker demographics. The development set includes reference diarization and speech segmentation and may be used for any purpose including system development or training.

24 PAPERS • 1 BENCHMARK

The TIMIT Acoustic-Phonetic Continuous Speech Corpus is a standard dataset used for evaluation of automatic speech recognition systems. It consists of recordings of 630 speakers of 8 dialects of American English each reading 10 phonetically-rich sentences. It also comes with the word and phone-level transcriptions of the speech.

22 PAPERS • 1 BENCHMARK

CoVoST is a large-scale multilingual speech-to-text translation corpus. Its latest (second) version covers translations from 21 languages into English and from English into 15 languages. It has a total of 2,880 hours of speech and is diversified with 78K speakers and 66 accents.

19 PAPERS • NO BENCHMARKS YET

AVSpeech is a large-scale audio-visual dataset comprising speech clips with no interfering background signals. The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.

18 PAPERS • NO BENCHMARKS YET

AISHELL-3 is a large-scale, high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Mandarin speakers, totalling 88,035 utterances. Auxiliary attributes such as gender, age group, and native accent are explicitly marked in the corpus, and transcripts at the Chinese character level and pinyin level are provided along with the recordings. The word and tone transcription accuracy is above 98%, achieved through professional speech annotation and strict quality inspection of tone and prosody.

16 PAPERS • NO BENCHMARKS YET

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

15 PAPERS • 3 BENCHMARKS

MaSS (Multilingual corpus of Sentence-aligned Spoken utterances) is an extension of the CMU Wilderness Multilingual Speech Dataset, a speech dataset based on recorded readings of the New Testament.

13 PAPERS • 3 BENCHMARKS

The Switchboard-1 Telephone Speech Corpus (LDC97S62) consists of approximately 260 hours of speech and was originally collected by Texas Instruments in 1990-1, under DARPA sponsorship. The first release of the corpus was published by NIST and distributed by the LDC in 1992-3.

13 PAPERS • 1 BENCHMARK

ESD is an Emotional Speech Database for voice conversion research. The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers and covers 5 emotion categories (neutral, happy, angry, sad and surprise). More than 29 hours of speech data were recorded in a controlled acoustic environment. The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies.

12 PAPERS • NO BENCHMARKS YET

In SpokenSQuAD, the document is in spoken form, the input question is text, and the answer to each question is always a span in the document. The following procedure was used to generate spoken documents from the original SQuAD dataset. First, the Google text-to-speech system generated spoken versions of the SQuAD articles. Then CMU Sphinx was used to generate the corresponding ASR transcriptions. The SQuAD training set was used to generate the training set of Spoken SQuAD, and the SQuAD development set was used to generate its testing set. If the answer to a question did not exist in the ASR transcription of the associated article, the question-answer pair was removed from the dataset, because such examples are too difficult for listening-comprehension machines at this stage.

12 PAPERS • 1 BENCHMARK
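The filtering step described for SpokenSQuAD (dropping question-answer pairs whose answer no longer appears in the ASR transcription) can be sketched in a few lines; `filter_spoken_qa` and the sample data below are illustrative, not part of the released dataset tooling.

```python
def filter_spoken_qa(qa_pairs, asr_transcript):
    """Keep only the (question, answer) pairs whose answer span
    still appears verbatim in the ASR transcription of the article."""
    transcript = asr_transcript.lower()
    return [(q, a) for q, a in qa_pairs if a.lower() in transcript]

pairs = [
    ("Who wrote the play?", "Shakespeare"),
    ("When was it staged?", "1603"),
]
# Suppose the ASR system misrecognized the year as words:
asr = "the play by shakespeare was staged in sixteen oh three"
kept = filter_spoken_qa(pairs, asr)  # only the Shakespeare pair survives
```

This mirrors why Spoken SQuAD is smaller than text SQuAD: every ASR error that corrupts an answer span removes that pair from the corpus.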

BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for automatic simultaneous interpretation. BSTC version 1.0 contains 50 hours of real speeches, including three parts, the audio files, the transcripts, and the translations. The corpus can be used to build automatic simultaneous interpretation system. The corpus is collected from the Chinese mandarin talks and reports, including science, technology, culture, economy, etc. The utterances in talks and reports are carefully transcribed into Chinese text, and further translated into English text. The sentence boundary is determined by the English text instead of the Chinese text which is analogous to the previous related corpus (TED and Translation Augmented LibriSpeech Corpus).

8 PAPERS • NO BENCHMARKS YET

Stuttering Events in Podcasts (SEP-28k) is a dataset containing over 28k clips labeled with five event types including blocks, prolongations, sound repetitions, word repetitions, and interjections. Audio comes from public podcasts largely consisting of people who stutter interviewing other people who stutter.

8 PAPERS • NO BENCHMARKS YET

SPEECH-COCO contains speech captions that are generated using text-to-speech (TTS) synthesis resulting in 616,767 spoken captions (more than 600h) paired with images.

8 PAPERS • NO BENCHMARKS YET

GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:

7 PAPERS • 1 BENCHMARK

WenetSpeech is a multi-domain Mandarin corpus consisting of 10,000+ hours of high-quality labeled speech, 2,400+ hours of weakly labelled speech, and about 10,000 hours of unlabeled speech, for 22,400+ hours in total. The authors collected the data from YouTube and podcasts, covering a variety of speaking styles, scenarios, domains, topics, and noise conditions. An optical character recognition (OCR) based method is introduced to generate audio/text segmentation candidates for the YouTube data from its corresponding video captions.

7 PAPERS • 1 BENCHMARK

The MRDA corpus consists of about 75 hours of speech from 75 naturally-occurring meetings among 53 speakers. The tagset used for labeling is a modified version of the SWBD-DAMSL tagset. It is annotated with three types of information: marking of the dialogue act segment boundaries, marking of the dialogue acts and marking of correspondences between dialogue acts.

6 PAPERS • 1 BENCHMARK

Overall duration per microphone: about 36 hours (31 hrs train / 2.5 hrs dev / 2.5 hrs test). Count of microphones: 3 (Microsoft Kinect, Yamaha, Samson). Count of wave files per microphone: about 14,500. Overall count of participants: 180 (130 male / 50 female).

6 PAPERS • 1 BENCHMARK

VOCASET is a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio. The dataset has 12 subjects and 480 sequences of about 3-4 seconds each with sentences chosen from an array of standard protocols that maximize phonetic diversity.

6 PAPERS • NO BENCHMARKS YET

JVS is a Japanese multi-speaker voice corpus which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data including 22 hours of parallel normal voices.

5 PAPERS • NO BENCHMARKS YET

Libri-adhoc40 is a synchronized speech corpus that collects Librispeech data replayed through loudspeakers and recorded by ad-hoc microphone arrays of 40 strongly synchronized distributed nodes in a real office environment. In addition, to provide an evaluation target for speech front-end processing and other applications, the authors also recorded the replayed speech in an anechoic chamber.

5 PAPERS • NO BENCHMARKS YET

VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). Image Source: http://www.voxforge.org/home

5 PAPERS • 7 BENCHMARKS

ClovaCall is a new large-scale Korean call-based speech corpus under a goal-oriented dialog scenario from more than 11,000 people. The raw dataset of ClovaCall includes approximately 112,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain.

4 PAPERS • NO BENCHMARKS YET

The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks imply a segmentation of texts into segments, learning segmentations for and from diverse resources is a promising area for converging methods and insights. We provide training, development and test datasets from all available languages and treebanks in the RST, SDRT and PDTB formalisms, using a uniform format. Because different corpora, languages and frameworks use different guidelines for segmentation, the shared task is meant to promote design of flexible methods for dealing with various guidelines, and help to push forward the discussion of standards for discourse units. For datasets which have treebanks, we will evaluate in two different scenarios: with and without gold syntax, or otherwise using provided automatic parses for comparison.

4 PAPERS • NO BENCHMARKS YET

Baseline code for the three tracks of the ExVo 2022 competition.

4 PAPERS • NO BENCHMARKS YET

Spoken Language Understanding Evaluation (SLUE) is a suite of benchmark tasks for spoken language understanding evaluation. It consists of limited-size labeled training sets and corresponding evaluation sets. This resource would allow the research community to track progress, evaluate pre-trained representations for higher-level tasks, and study open questions such as the utility of pipeline versus end-to-end approaches. The first phase of the SLUE benchmark suite consists of named entity recognition (NER), sentiment analysis (SA), and ASR on the corresponding datasets.

4 PAPERS • 3 BENCHMARKS

SPGISpeech (pronounced “speegie-speech”) is a large-scale transcription dataset, freely available for academic research. SPGISpeech is a collection of 5,000 hours of professionally transcribed financial audio. Unlike previous transcription datasets, SPGISpeech contains global English accents, strongly varying audio quality, and both spontaneous and presentation-style speech. The transcripts have each been cross-checked by multiple professional editors for high accuracy and are fully formatted, including sentence structure and capitalization.

Papers with Code Newsletter #8

Welcome to the 8th issue of the Papers with Code newsletter. In this edition, we cover:

Trending Papers with Code 📄

🏆 Top Trending Papers of March 2021

A new addition to the newsletter is highlighting top papers of the month. Here is a list of the top 10 trending papers on Papers with Code for the month of March:

💬 Revisiting Neural Probabilistic Language Models

Modernized NPLM which concatenates representations of the distant context. Figure source: Sun and Iyyer (2021)

Recent developments and progress in language models are driven by advances in neural network architectures, availability of compute at scale (hardware), and large datasets. One of the earlier language models is the neural probabilistic language model (NPLM), proposed by Bengio et al. (2003), which uses a fixed window to concatenate word embeddings and passes results through a feed-forward neural network to predict the next word. Sun and Iyyer (2021) recently published a paper that revisits NPLMs by scaling this family of models to modern hardware. Besides studying NPLMs more closely, the authors also study the performance of these models when combined with the more recent transformer architecture.

What’s new: This paper investigates more closely the limitations of NPLMs, such as the lack of parameter sharing and the small context window, and to what extent these can be mitigated with modern design and optimization. The proposed modernized NPLM incorporates changes such as increased depth and window size, residual connections, layer normalization, and dropout. Through these modifications, perplexity on WikiText-103 drops substantially (from 216 to 31.7) compared to the original NPLM. Results indicate that the NPLM outperforms a Transformer model when given shorter prefixes. Elements of the NPLM are then incorporated into Transformer LMs, which improves performance across three word-level LM datasets.
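As a concrete illustration of the fixed-window architecture being revisited, here is a minimal NPLM forward pass in numpy. The vocabulary size, dimensions, and random initialization are toy values chosen for illustration, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only).
vocab_size, embed_dim, window, hidden_dim = 10, 8, 3, 16

# Parameters: embedding table, feed-forward hidden layer, output projection.
E  = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
W1 = rng.normal(0.0, 0.1, (window * embed_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0.0, 0.1, (hidden_dim, vocab_size))
b2 = np.zeros(vocab_size)

def nplm_next_word_probs(context_ids):
    """Concatenate the context window's embeddings, apply one
    feed-forward tanh layer, and softmax over the vocabulary."""
    x = E[context_ids].reshape(-1)       # (window * embed_dim,)
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = nplm_next_word_probs([1, 4, 7])  # a 3-word context
```

The modernized variant in the paper layers residual connections, layer normalization, and dropout on top of this basic concatenate-then-feed-forward skeleton.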

🗣 SpeechStew

More recently, there has been rapid progress in speech recognition: from incorporating new techniques, to scaling model capacity, to more innovative and feature-rich libraries. Some of the recent improvements in speech recognition are attributed to large deep models and an abundance of training data. While some recent end-to-end speech recognition models have shown improvements, they suffer from overfitting on low-resource datasets. Techniques like multilingual training and transfer learning have helped with generalization. To continue building on these ideas, Chan et al. (2021) recently proposed a simple approach to end-to-end speech recognition that leverages both multi-domain training and transfer learning.

Key ideas: SpeechStew trains a single large neural network on a combination of various publicly available speech recognition datasets such as Common Voice and LibriSpeech. The datasets are combined without any domain-dependent re-balancing or re-weighting; no domain labels or additional hyperparameters are used. Without relying on external language models as prior work does, SpeechStew achieves state-of-the-art or near state-of-the-art results on various tasks. The model’s transfer learning capabilities allow simple fine-tuning on unseen data, yielding strong empirical results. Overall, this work encourages leveraging all available data rather than training only on task-specific datasets.
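The data-combination recipe (pooling all corpora with no re-balancing, re-weighting, or domain labels) amounts to a uniform concatenate-and-shuffle. The function below is a schematic sketch of that idea, not code from the paper; the corpus names and file paths are placeholders.

```python
import random

def combine_corpora(corpora, seed=0):
    """Pool every (audio, transcript) example from all corpora into one
    training set and shuffle uniformly -- no per-domain weights or labels."""
    pool = [example for corpus in corpora for example in corpus]
    random.Random(seed).shuffle(pool)
    return pool

# Hypothetical miniature corpora for illustration:
common_voice = [("cv_clip1.wav", "hello world"), ("cv_clip2.wav", "good morning")]
librispeech  = [("ls_clip1.flac", "chapter one")]
train_set = combine_corpora([common_voice, librispeech])
```

Because every example is drawn from the same flat pool, each dataset's contribution to training is simply proportional to its size, which is exactly the "no re-weighting" choice the paper makes.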

⚡️ Efficiently Training Large Language Models on GPU Clusters

Trends in the sizes of state-of-the-art NLP models over time. Figure source: Narayanan et al. (2021)

The effectiveness of large language models on a variety of NLP tasks has been possible due to the availability of computation at scale and larger datasets. However, training large-scale language models requires efficient use of compute resources. Specifically, the large number of parameters present in these models makes it challenging as GPU memory capacity becomes insufficient and training times become longer. Narayanan et al. (2021) show ways to compose different parallelism methods such as data parallelism and pipeline model parallelism to more efficiently scale large language model training on GPU clusters.

What’s new: This work proposes to combine different types of parallelism methods to efficiently scale large language model training to thousands of GPUs. They achieve two-order-of-magnitude increase in sizes of models that can efficiently be trained as compared to existing systems. The authors propose a novel schedule that can improve throughput by more than 10% with comparable memory footprint as compared to previously-proposed approaches. The combination of parallelism techniques allows training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs with per-GPU throughput of 52% of peak, which is a higher throughput compared to previous efforts to train similar-sized models (36% of theoretical peak).

🔋 EfficientNetV2

In a previous newsletter, we highlighted important recent works and progress on efficient ML models. One of the more popular methods is called EfficientNet, a convolutional neural network architecture which shows that carefully balancing network depth, width, and resolution can lead to better performance. More recently, Tan and Le (2021) proposed a newly improved EfficientNet model called EfficientNetV2 with faster training speed and better parameter efficiency than previous models.

What’s new: EfficientNetV2 combines training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency. It is trained with an improved method of progressive learning which adaptively adjusts regularization along with image size. This method speeds up training and keeps accuracy from dropping. EfficientNetV2 significantly outperforms previous models on ImageNet and CIFAR/Cars/Flowers datasets. By pretraining on ImageNet21K, EfficientNetV2 also outperforms the recently proposed ViT-L/16 model by 2.0% accuracy on ImageNet ILSVRC2012 while training 5x-11x faster.
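The progressive-learning idea (start with small images and weak regularization, then grow both together) can be sketched as a simple linear schedule. The size and dropout ranges below are placeholder values for illustration, not the paper's exact settings.

```python
def progressive_schedule(epoch, total_epochs,
                         size_range=(128, 300), dropout_range=(0.1, 0.3)):
    """Linearly interpolate image size and regularization strength over
    training, so larger (harder) inputs arrive with stronger regularization."""
    t = epoch / max(total_epochs - 1, 1)   # training progress in [0, 1]
    lo, hi = size_range
    image_size = int(round(lo + t * (hi - lo)))
    d_lo, d_hi = dropout_range
    dropout = d_lo + t * (d_hi - d_lo)
    return image_size, dropout

# epoch 0 -> small images, weak regularization; final epoch -> large images, strong regularization
```

Coupling the two knobs is the key point: increasing image size alone tends to hurt accuracy, while jointly ramping regularization keeps accuracy from dropping as training speeds up.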

The automatic understanding of source code language has recently sparked interest in the field of natural language processing (NLP). This line of research has the potential to be used for building applications that improve the software engineering process. Elnaggar et al. (2021) recently proposed to use self-supervised deep learning including an encoder-decoder transformer model for tasks in the software engineering domain.

Why it matters: Understanding source code language to improve automated software engineering tasks has been under-researched. The authors combine the latest NLP techniques and extensively study their effectiveness on six software engineering tasks, including 13 subtasks. Many different training strategies, such as single-task and multi-task learning, were used to evaluate the models. Pretrained models are also made available to further encourage ML research in the software engineering domain. Detailed results on the main tasks are summarized in the table below.

Results produced by the Transformer model on various software engineering tasks. Table source: Elnaggar et al. (2021)

Trending Libraries and Datasets 🛠

Trending datasets

Trending libraries

Community Highlights ✍️

Special thanks to @xavigiro, @HTVR, @DegardinBruno, @Ijchangyu, @Johnaflalo, @Sanqing, @nicholashkust and the hundreds of contributors for all their contributions to Papers with Code.

More from Papers with Code 🗣

🎉 Introducing @paperswithdata

Introducing Papers with Datasets on Twitter: a curated, daily feed of newly published datasets in machine learning. Follow the Twitter handle (@paperswithdata) to stay updated with the newest datasets.

MolGraph: a Python package for the implementation of small molecular graphs and graph neural networks with TensorFlow and Keras

Molecular machine learning (ML) has proven important for tackling various molecular problems, including the prediction of protein-drug interactions and blood brain-barrier permeability.

We find experimentally that when artificial neural networks are connected in parallel and trained together, they display the following properties.

A Graphical Model for Fusing Diverse Microbiome Data

We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.

Depth-Assisted ResiDualGAN for Cross-Domain Aerial Images Semantic Segmentation

miemieyanga/ResiDualGAN-DRDG • 21 Aug 2022

Generative methods are common approaches to minimizing the domain gap of aerial images which improves the performance of the downstream tasks, e. g., cross-domain semantic segmentation.

Inferring Sensitive Attributes from Model Explanations

vasishtduddu/attinfexplanations • 21 Aug 2022

We focus on the specific privacy risk of attribute inference attacks, wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations.

GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization

xashely/gretel_extractive • 21 Aug 2022

Recently, neural topic models (NTMs) have been incorporated into pre-trained language models (PLMs), to capture the global semantic information for text summarization.

Objects Can Move: 3D Change Detection by Geometric Transformation Consistency

katadam/objectscanmove • 21 Aug 2022

AR/VR applications and robots need to know when the scene has changed.

A semi-supervised Teacher-Student framework for surgical tool detection and localization

mansoor-at/semi-supervised-surgical-tool-detection • 21 Aug 2022

Therefore, in this paper we introduce a semi-supervised learning (SSL) framework for the surgical tool detection paradigm, which aims to mitigate the scarcity of training data and the data imbalance through a knowledge distillation approach.

Towards Principled User-side Recommender Systems

This approach opens the door to user-defined fair systems even if the official recommender system of the service is not fair.

Energy-aware Scheduling of Virtualized Base Stations in O-RAN with Online Learning

The design of Open Radio Access Network (O-RAN) compliant systems for configuring the virtualized Base Stations (vBSs) is of paramount importance for network operators.

Optimization-Based Autonomous Racing of 1:43 Scale RC Cars

This paper describes autonomous racing of RC race cars based on mathematical optimization.

Optimization and Control · Robotics · Systems and Control

PySINDy: A Python package for the Sparse Identification of Nonlinear Dynamics from Data

PySINDy is a Python package for the discovery of governing dynamical systems models from data.

Dynamical Systems · Computational Physics

Implicit bulk-surface filtering method for node-based shape optimization and comparison of explicit and implicit filtering techniques

This work studies shape filtering techniques, namely the convolution-based (explicit) and the PDE-based (implicit), and introduces an implicit bulk-surface filtering method to control the boundary smoothness and preserve the internal mesh quality simultaneously in the course of bulk (solid) shape optimization.

Numerical Analysis · Optimization and Control

Stability of Evolutionary Dynamics on Time Scales

We combine incentive, adaptive, and time-scale dynamics to study multipopulation dynamics on the simplex equipped with a large class of Riemannian metrics, simultaneously generalizing and extending many dynamics commonly studied in dynamic game theory and evolutionary dynamics.

Dynamical Systems · Information Theory · Populations and Evolution (91A22)

Super-Acceleration with Cyclical Step-sizes

google/jaxopt • NeurIPS 2021

We develop a convergence-rate analysis of momentum with cyclical step-sizes.

Optimization and Control

On symmetrizing the ultraspherical spectral method for self-adjoint problems

A mechanism is described to symmetrize the ultraspherical spectral method for self-adjoint problems.

The automatic solution of partial differential equations using a global spectral method

A spectral method for solving linear partial differential equations (PDEs) with variable coefficients and general boundary conditions defined on rectangular domains is described, based on separable representations of partial differential operators and the one-dimensional ultraspherical spectral method.

A practical framework for infinite-dimensional linear algebra

The framework contains a data structure on which row operations can be performed, allowing for the solution of linear equations by the adaptive QR approach.

Numerical Analysis (33A65, 35C11, 65N35)

acados: a modular open-source framework for fast embedded optimal control

The acados software package is a collection of solvers for fast embedded optimization, intended for fast embedded applications.

Optimization and Control

Undergraduate Lecture Notes in De Rham-Hodge Theory

These lecture notes in the De Rham-Hodge theory are designed for a 1-semester undergraduate course (in mathematics, physics, engineering, chemistry or biology).

Differential Geometry · Mathematical Physics

Equivariant Hypergraph Neural Networks

jw9730/ehnn • 22 Aug 2022

Many problems in computer vision and machine learning can be cast as learning on hypergraphs that represent higher-order relations.

Towards Calibrated Hyper-Sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning

To our knowledge, this is the first work to measure representation quality of classifiers and features from the perspective of distribution overlap coefficient.

Individual Tree Detection in Large-Scale Urban Environments using High-Resolution Multispectral Imagery

We introduce a novel deep learning method for detection of individual trees in urban environments using high-resolution multispectral aerial imagery.

Rethinking Knowledge Distillation via Cross-Entropy

Furthermore, we smooth students’ target output to treat it as the soft target for training without teachers and propose a teacher-free new KD loss (tf-NKD).

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

lmm077/SWEM • CVPR 2022

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS).

A simple learning agent interacting with an agent-based market model

We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event-driven agent-based financial market model.

FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning

siyi-wind/FairDisCo • 22 Aug 2022

Deep learning models have achieved great success in automating skin lesion diagnosis.

Survey of Machine Learning Techniques To Predict Heartbeat Arrhythmias

Many works in biomedical computer science research use machine learning techniques to give accurate results.

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

jayqine/mgd-ssss • 22 Aug 2022

Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-level loss setting.

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

microsoft/unilm • 22 Aug 2022

A big convergence of language, vision, and multimodal pretraining is emerging.

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow easy sharing of the interface, allow input manipulation and interactive inference by the domain expert, and allow embedding the interface in iPython notebooks.

Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

ZhangGongjie/Meta-DETR • 30 Jul 2022

Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes.

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

thudm/cogvideo • 29 May 2022

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.



155 dataset results for Graphs

The Pubmed dataset consists of 19717 scientific publications from the PubMed database pertaining to diabetes, classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF-IDF weighted word vector drawn from a dictionary of 500 unique words.

699 PAPERS • 16 BENCHMARKS
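
A TF-IDF weighted word vector, as used for Pubmed's node features, weights each dictionary word by its frequency in a document and by how rare it is across the corpus. Here is a minimal, self-contained sketch of that idea; the toy documents and the 4-word dictionary are invented for illustration and are not the actual Pubmed preprocessing pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """Compute TF-IDF weighted word vectors over a fixed dictionary."""
    n = len(docs)
    # document frequency: in how many documents each vocabulary word appears
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    idf = {w: math.log(n / df[w]) if df[w] else 0.0 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)  # raw term counts for this document
        vectors.append([tf[w] / len(d) * idf[w] for w in vocab])
    return vectors

docs = [["insulin", "diabetes"], ["diabetes", "therapy"], ["therapy", "trial"]]
vocab = ["insulin", "diabetes", "therapy", "trial"]
vecs = tfidf_vectors(docs, vocab)
```

Words appearing in every document get weight zero, while words unique to one document get the largest boost, which is what makes the representation discriminative for classification.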


DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.

437 PAPERS • 4 BENCHMARKS


The FB15k dataset contains knowledge base relation triples and textual mentions of Freebase entity pairs. It has a total of 592,213 triplets with 14,951 entities and 1,345 relationships. FB15K-237 is a variant of the original dataset where inverse relations are removed, since it was found that a large number of test triplets could be obtained by inverting triplets in the training set.

402 PAPERS • 7 BENCHMARKS


FrameNet is a linguistic knowledge graph containing information about lexical and predicate argument semantics of the English language. FrameNet contains two distinct entity classes: frames and lexical units, where a frame is a meaning and a lexical unit is a single meaning for a word.

396 PAPERS • NO BENCHMARKS YET


The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development.

381 PAPERS • 15 BENCHMARKS


The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.

333 PAPERS • 5 BENCHMARKS
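
The post-to-post construction described above (connect two posts whenever the same user comments on both) can be sketched in a few lines; the `(user, post)` comment pairs below are hypothetical, and a real pipeline over 232,965 posts would need a more memory-conscious approach.

```python
from collections import defaultdict
from itertools import combinations

def post_graph(comments):
    """comments: iterable of (user, post_id) pairs.
    Returns undirected edges between posts sharing a commenter."""
    by_user = defaultdict(set)
    for user, post in comments:
        by_user[user].add(post)
    edges = set()
    for posts in by_user.values():
        # every pair of posts touched by the same user becomes an edge
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges

comments = [("alice", 1), ("alice", 2), ("bob", 2), ("bob", 3)]
edges = post_graph(comments)  # → {(1, 2), (2, 3)}
```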

The WN18 dataset has 18 relations scraped from WordNet for roughly 41,000 synsets, resulting in 141,442 triplets. It was found that a large number of the test triplets can be found in the training set with another relation or the inverse relation. Therefore, a new version of the dataset, WN18RR, has been proposed to address this issue.

322 PAPERS • 4 BENCHMARKS


The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

312 PAPERS • 20 BENCHMARKS
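
Unlike Pubmed's TF-IDF features, Cora's node features are plain 0/1 presence indicators over the dictionary. A minimal sketch of that encoding (with an invented 3-word dictionary standing in for Cora's 1433 words):

```python
def binary_bow(doc_words, dictionary):
    """0/1 vector marking absence/presence of each dictionary word."""
    present = set(doc_words)  # repeated occurrences collapse to presence
    return [1 if w in present else 0 for w in dictionary]

vec = binary_bow(["neural", "network", "neural"], ["graph", "neural", "network"])
# → [0, 1, 1]
```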

FB15k-237 is a link prediction dataset created from FB15k. While FB15k consists of 1,345 relations, 14,951 entities, and 592,213 triples, many triples are inverses that cause leakage from the training to the testing and validation splits. FB15k-237 was created by Toutanova and Chen (2015) to ensure that the testing and evaluation datasets do not have inverse relation test leakage. In summary, the FB15k-237 dataset contains 310,079 triples with 14,505 entities and 237 relation types.

270 PAPERS • 2 BENCHMARKS
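
The inverse-relation leakage that motivated FB15k-237 can be detected with a simple check: a test triple is suspect if its entity pair appears reversed anywhere in the training set. This is a simplified sketch (the toy triples are invented), not the exact filtering procedure of Toutanova and Chen (2015):

```python
def inverse_leakage(train, test):
    """Flag test triples whose reversed entity pair occurs in training,
    i.e. candidates answerable by memorizing an inverse relation."""
    seen = {(h, t) for h, _, t in train}
    return [trip for trip in test if (trip[2], trip[0]) in seen]

train = [("paris", "capital_of", "france")]
test = [("france", "has_capital", "paris"), ("rome", "capital_of", "italy")]
leaks = inverse_leakage(train, test)  # flags only the first test triple
```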


Yet Another Great Ontology (YAGO) is a Knowledge Graph that augments WordNet with common knowledge facts extracted from Wikipedia, converting WordNet from a primarily linguistic resource to a common knowledge base. YAGO originally consisted of more than 1 million entities and 5 million facts describing relationships between these entities. YAGO2 grounded entities, facts, and events in time and space, contained 446 million facts about 9.8 million entities, while YAGO3 added about 1 million more entities from non-English Wikipedia articles. YAGO3-10 a subset of YAGO3, containing entities which have a minimum of 10 relations each.

264 PAPERS • 7 BENCHMARKS

WN18RR is a link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations and 40,943 entities. However, many test triples are obtained by inverting triples from the training set. Thus the WN18RR dataset was created to ensure that the evaluation dataset does not have inverse relation test leakage. In summary, the WN18RR dataset contains 93,003 triples with 40,943 entities and 11 relation types.

234 PAPERS • 2 BENCHMARKS

PROTEINS is a dataset of proteins that are classified as enzymes or non-enzymes. Nodes represent the amino acids and two nodes are connected by an edge if they are less than 6 Angstroms apart.

229 PAPERS • 1 BENCHMARK
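
The 6-Angstrom rule above is a distance-threshold graph construction: nodes get an edge whenever their 3-D coordinates are close enough. A minimal sketch with made-up coordinates (real PROTEINS graphs are built from amino-acid positions in the PDB structures):

```python
import math

def contact_edges(coords, cutoff=6.0):
    """Connect nodes whose Euclidean 3-D distance is below `cutoff`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges

coords = [(0, 0, 0), (3, 0, 0), (10, 0, 0)]
edges = contact_edges(coords)  # → [(0, 1)]
```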

The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.

202 PAPERS • 13 BENCHMARKS

IMDB-BINARY is a movie collaboration dataset that consists of the ego-networks of 1,000 actors/actresses who played roles in movies in IMDB. In each graph, nodes represent actors/actresses, and there is an edge between two nodes if they appear in the same movie. These graphs are derived from the Action and Romance genres.

192 PAPERS • 2 BENCHMARKS

MUTAG is a collection of nitroaromatic compounds and the goal is to predict their mutagenicity on Salmonella typhimurium. Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between the corresponding atoms. It includes 188 samples of chemical compounds with 7 discrete node labels.

171 PAPERS • 2 BENCHMARKS
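
The one-hot atom-type encoding mentioned above maps each discrete node label to a basis vector. A minimal sketch (the particular list of atom symbols is illustrative, not necessarily MUTAG's exact label order):

```python
def one_hot(label, labels):
    """One-hot encode a discrete node label (e.g. an atom type)."""
    vec = [0] * len(labels)
    vec[labels.index(label)] = 1
    return vec

atom_types = ["C", "N", "O", "F", "Cl", "Br", "I"]  # 7 discrete labels
vec = one_hot("O", atom_types)  # → [0, 0, 1, 0, 0, 0, 0]
```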

COLLAB is a scientific collaboration dataset. A graph corresponds to a researcher's ego network, i.e., the researcher and their collaborators are nodes and an edge indicates collaboration between two researchers. A researcher's ego network has three possible labels, i.e., High Energy Physics, Condensed Matter Physics, and Astro Physics, which are the fields that the researcher belongs to. The dataset has 5,000 graphs and each graph has label 0, 1, or 2.

167 PAPERS • 2 BENCHMARKS

IMDB-MULTI is a relational dataset that consists of a network of 1000 actors or actresses who played roles in movies in IMDB. A node represents an actor or actress, and an edge connects two nodes when they appear in the same movie. In IMDB-MULTI, the edges are collected from three different genres: Comedy, Romance and Sci-Fi.

166 PAPERS • 2 BENCHMARKS

The NCI1 dataset comes from the cheminformatics domain, where each input graph is used as a representation of a chemical compound: each vertex stands for an atom of the molecule, and edges between vertices represent bonds between atoms. This dataset is relative to anti-cancer screens where the chemicals are assessed as positive or negative for activity against non-small cell lung cancer. Each vertex has an input label representing the corresponding atom type, encoded by a one-hot-encoding scheme into a vector of 0/1 elements.

163 PAPERS • 2 BENCHMARKS

The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.

140 PAPERS • 5 BENCHMARKS

ENZYMES is a dataset of 600 protein tertiary structures obtained from the BRENDA enzyme database. Each protein is labeled with one of 6 enzyme classes.

124 PAPERS • 1 BENCHMARK


SNAP is a collection of large network datasets. It includes graphs representing social networks, citation networks, web graphs, online communities, online reviews and more.

113 PAPERS • NO BENCHMARKS YET

CLUSTER is a node classification task generated with Stochastic Block Models, which are widely used to model communities in social networks by modulating the intra- and inter-community connections, thereby controlling the difficulty of the task. CLUSTER aims at identifying community clusters in a semi-supervised setting.

80 PAPERS • 1 BENCHMARK
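
The Stochastic Block Model behind CLUSTER and PATTERN samples each edge with a probability that depends only on whether the endpoints share a community; raising `p_in` relative to `p_out` makes the communities easier to recover. A minimal generator sketch (parameter names are our own, not from the benchmark):

```python
import random

def sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model graph.
    sizes: nodes per community; p_in/p_out: intra/inter-community edge prob."""
    rng = random.Random(seed)
    labels = [c for c, n in enumerate(sizes) for _ in range(n)]
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges

labels, edges = sbm([5, 5], p_in=0.9, p_out=0.05)
```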

PTC is a collection of 344 chemical compounds, represented as graphs, which report carcinogenicity for rats. There are 19 possible node labels.

77 PAPERS • 1 BENCHMARK


MoleculeNet is a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance.

74 PAPERS • NO BENCHMARKS YET

REDDIT-BINARY consists of graphs corresponding to online discussions on Reddit. In each graph, nodes represent users, and there is an edge between them if at least one of them respond to the other’s comment. There are four popular subreddits, namely, IAmA, AskReddit, TrollXChromosomes, and atheism. IAmA and AskReddit are two question/answer based subreddits, and TrollXChromosomes and atheism are two discussion-based subreddits. A graph is labeled according to whether it belongs to a question/answer-based community or a discussion-based community.

68 PAPERS • 1 BENCHMARK

PATTERN is a node classification task generated with Stochastic Block Models, which are widely used to model communities in social networks by modulating the intra- and inter-community connections, thereby controlling the difficulty of the task. PATTERN tests the fundamental graph task of recognizing specific predetermined subgraphs.

65 PAPERS • 1 BENCHMARK

Orkut is a social network dataset consisting of the friendship network and ground-truth communities from the Orkut.com online social network, where users formed friendships with each other.

63 PAPERS • NO BENCHMARKS YET

Reddit-5K is a relational dataset extracted from Reddit.

61 PAPERS • 1 BENCHMARK


The AMiner Dataset is a collection of different relational datasets. It consists of a set of relational networks such as citation networks, academic social networks or topic-paper-autor networks among others.

53 PAPERS • NO BENCHMARKS YET


The Materials Project is a collection of chemical compounds labelled with different attributes.

48 PAPERS • 2 BENCHMARKS

WebKB is a dataset that includes web pages from computer science departments of various universities. 4,518 web pages are categorized into 6 imbalanced categories (Student, Faculty, Staff, Department, Course, Project). Additionally, there is an "Other" miscellaneous category that is not comparable to the rest.

48 PAPERS • 3 BENCHMARKS

Friendster is an online gaming network. Before re-launching as a game website, Friendster was a social networking site where users could form friendship edges with each other. Friendster also allowed users to form groups which other members could then join. The Friendster dataset consists of ground-truth communities (based on user-defined groups) and the social network induced by the nodes that either belong to at least one community or are connected to nodes that belong to at least one community.

46 PAPERS • NO BENCHMARKS YET

The Reuters-21578 dataset is a collection of documents with news articles. The original corpus has 10,369 documents and a vocabulary of 29,930 words.

45 PAPERS • 3 BENCHMARKS

The Slashdot dataset is a relational dataset obtained from Slashdot, a technology-related news website known for its specific user community. The website features user-submitted, editor-evaluated, primarily technology-oriented news. In 2002 Slashdot introduced the Slashdot Zoo feature, which allows users to tag each other as friends or foes. The network contains friend/foe links between the users of Slashdot and was obtained in February 2009.

42 PAPERS • 2 BENCHMARKS

This corpus includes annotations of cancer-related PubMed articles, covering 3 full papers (PMID:24651010, PMID:11777939, PMID:15630473) as well as the result sections of 46 additional PubMed papers. The corpus also includes about 1000 sentences each from the BEL BioCreative training corpus and the Chicago Corpus.

40 PAPERS • 2 BENCHMARKS

The Epinions dataset is built from a who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user. It contains 75,879 nodes and 508,837 edges.

40 PAPERS • 2 BENCHMARKS


AIDS is a graph dataset. It consists of 2000 graphs representing molecular compounds, constructed from the AIDS Antiviral Screen Database of Active Compounds. The underlying screen database contains 4395 chemical compounds, of which 423 belong to class CA, 1081 to CM, and the remaining compounds to CI.

39 PAPERS • 1 BENCHMARK


Wiki-CS is a Wikipedia-based dataset for benchmarking Graph Neural Networks. The dataset is constructed from Wikipedia categories, specifically 10 classes corresponding to branches of computer science, with very high connectivity. The node features are derived from the text of the corresponding articles. They were calculated as the average of pretrained GloVe word embeddings (Pennington et al., 2014), resulting in 300-dimensional node features.

38 PAPERS • NO BENCHMARKS YET

GAP is a graph processing benchmark suite with the goal of helping to standardize graph processing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and quantify improvements. The benchmark not only specifies graph kernels, input graphs, and evaluation methodologies, but it also provides optimized baseline implementations. These baseline implementations are representative of state-of-the-art performance, and thus new contributions should outperform them to demonstrate an improvement. The input graphs are sized appropriately for shared memory platforms, but any implementation on any platform that conforms to the benchmark's specifications could be compared. This benchmark suite can be used in a variety of settings. Graph framework developers can demonstrate the generality of their programming model by implementing all of the benchmark's kernels and delivering competitive performance on all of the benchmark's graphs.

32 PAPERS • 1 BENCHMARK

BioGRID is a biomedical interaction repository with data compiled through comprehensive curation efforts. The current index is version 4.2.192 and searches 75,868 publications for 1,997,840 protein and genetic interactions, 29,093 chemical interactions and 959,750 post translational modifications from major model organism species.

31 PAPERS • 2 BENCHMARKS

The Ciao dataset contains rating information of users given to items, and also contain item category information. The data comes from the Epinions dataset.

26 PAPERS • 1 BENCHMARK

STRING is a collection of protein-protein interaction (PPI) networks.

26 PAPERS • NO BENCHMARKS YET


Worldtree is a corpus of explanation graphs, explanatory role ratings, and associated tablestore. It contains explanation graphs for 1,680 questions, and 4,950 tablestore rows across 62 semi-structured tables are provided. This data is intended to be paired with the AI2 Mercury Licensed questions.

26 PAPERS • NO BENCHMARKS YET

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.

24 PAPERS • 2 BENCHMARKS

Arxiv HEP-TH (high energy physics theory) citation graph is from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this. The data covers papers in the period from January 1993 to April 2003 (124 months).

24 PAPERS • 9 BENCHMARKS

The set is based on the ZINC Clean Leads collection. It contains 4,591,276 molecules in total, filtered by molecular weight in the range from 250 to 350 Daltons, a number of rotatable bonds not greater than 7, and XlogP less than or equal to 3.5. We removed molecules containing charged atoms or atoms besides C, N, S, O, F, Cl, Br, H or cycles longer than 8 atoms. The molecules were filtered via medicinal chemistry filters (MCFs) and PAINS filters.
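
The filtering criteria above (molecular weight 250-350 Daltons, at most 7 rotatable bonds, XlogP ≤ 3.5) amount to a simple predicate over precomputed molecular properties. In this sketch the property dicts are hypothetical stand-ins; in practice the values would come from a cheminformatics toolkit, and the MCF/PAINS structural filters are not shown:

```python
def passes_lead_filters(mol):
    """mol: dict of precomputed properties for one molecule."""
    return (250 <= mol["mol_weight"] <= 350
            and mol["rotatable_bonds"] <= 7
            and mol["xlogp"] <= 3.5)

candidates = [
    {"mol_weight": 300.2, "rotatable_bonds": 4, "xlogp": 2.1},  # kept
    {"mol_weight": 410.5, "rotatable_bonds": 3, "xlogp": 1.0},  # too heavy
]
kept = [m for m in candidates if passes_lead_filters(m)]
```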


93 dataset results for Environment


OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It includes environments such as Algorithmic, Atari, Box2D, Classic Control, MuJoCo, Robotics, and Toy Text.

907 PAPERS • 3 BENCHMARKS


MuJoCo (multi-joint dynamics with contact) is a physics engine used to implement environments to benchmark Reinforcement Learning methods.

901 PAPERS • 2 BENCHMARKS


CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).

583 PAPERS • 2 BENCHMARKS

The Arcade Learning Environment (ALE) is an object-oriented framework that allows researchers to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design.

301 PAPERS • 57 BENCHMARKS

The DeepMind Control Suite (DMCS) is a set of simulated continuous control environments with a standardized structure and interpretable rewards. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. Control Suite tasks include Pendulum, Acrobot, Cart-pole, Cart-k-pole, Ball in cup, Point-mass, Reacher, Finger, Hopper, Fish, Cheetah, Walker, Manipulator, Manipulator extra, Stacker, Swimmer, Humanoid, Humanoid_CMU and LQR.

187 PAPERS • 11 BENCHMARKS

AirSim is a simulator for drones, cars and more, built on Unreal Engine. It is open-source, cross-platform, and supports software-in-the-loop simulation with popular flight controllers such as PX4 and ArduPilot, as well as hardware-in-the-loop with PX4, for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment. An experimental Unity plugin also exists.

159 PAPERS • NO BENCHMARKS YET

D4RL is a collection of environments for offline reinforcement learning. These environments include Maze2D, AntMaze, Adroit, Gym, Flow, FrankaKitchen and CARLA.

142 PAPERS • NO BENCHMARKS YET

ViZDoom is an AI research platform based on the classic first-person shooter Doom. The most popular game mode is probably the so-called Death Match, where several players join in a maze and fight against each other. After a fixed time, the match ends and all the players are ranked by FRAG score, defined as kills minus suicides. During the game, each player can access various observations, including the first-person view screen pixels, the corresponding depth map and segmentation map (pixel-wise object labels), the bird's-eye-view maze map, etc. The valid actions include almost all the keyboard strokes and mouse controls a human player can use, covering moving, turning, jumping, shooting, changing weapons, etc. ViZDoom can run a game either synchronously or asynchronously: the game core either waits until all players' actions are collected or runs at a constant frame rate without waiting.

124 PAPERS • 3 BENCHMARKS

AI2-THOR is an interactive environment for embodied AI. It contains four types of scenes (kitchen, living room, bedroom and bathroom), and each scene type includes 30 rooms, where each room is unique in terms of furniture placement and item types. There are over 2000 unique objects for AI agents to interact with.

118 PAPERS • 1 BENCHMARK

TORCS (The Open Racing Car Simulator) is a driving simulator. It is capable of simulating the essential elements of vehicular dynamics such as mass, rotational inertia, collision, mechanics of suspensions, links and differentials, friction and aerodynamics. Physics simulation is simplified and is carried out through Euler integration of differential equations at a temporal discretization level of 0.002 seconds. The rendering pipeline is lightweight, based on OpenGL, and can be turned off for faster training. TORCS offers a large variety of tracks and cars as free assets. It also provides a number of programmed robot cars with different levels of performance that can be used to benchmark the performance of human players and software driving agents. TORCS was built with the goal of developing Artificial Intelligence for vehicular control and has been used extensively by the machine learning community ever since its inception.

79 PAPERS • NO BENCHMARKS YET
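The explicit Euler scheme TORCS uses can be illustrated on a toy point mass; the state variables and acceleration below are illustrative, not TORCS internals, with only the 0.002 s step size taken from the description above:

```python
def euler_step(pos, vel, accel, dt=0.002):
    """One explicit Euler step at TORCS's dt = 0.002 s (toy point mass)."""
    new_pos = pos + vel * dt   # position advances with the current velocity
    new_vel = vel + accel * dt # velocity advances with the current acceleration
    return new_pos, new_vel

# Simulate 1 second (500 steps) of constant 2 m/s^2 acceleration from rest.
pos, vel = 0.0, 0.0
for _ in range(500):
    pos, vel = euler_step(pos, vel, accel=2.0)
print(round(vel, 6), round(pos, 6))  # velocity reaches ~2 m/s after 1 s
```

Explicit Euler trades accuracy for speed, which is why a very small step size such as 0.002 s is needed to keep the integration stable.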

RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.

34 PAPERS • 1 BENCHMARK

The General Video Game AI (GVGAI) framework is widely used in research and features a corpus of over 100 single-player games and 60 two-player games. These are fairly small games, each focusing on specific mechanics or skills the players should be able to demonstrate, including clones of classic arcade games such as Space Invaders, puzzle games like Sokoban, adventure games like Zelda, and game-theory problems such as the Iterated Prisoner's Dilemma. All games are real-time and require players to make decisions in only 40 ms at every game tick, although not all games explicitly reward or require fast reactions; in fact, some of the best game-playing approaches use the time at the beginning of the game to run Breadth-First Search in puzzle games in order to find an accurate solution. However, given the large variety of games (many of which are stochastic and difficult to predict accurately), scoring systems and termination conditions, all unknown to the players, highly adaptive general game-playing methods are needed to perform well across the corpus.

31 PAPERS • NO BENCHMARKS YET

Jericho is a learning environment for man-made Interactive Fiction (IF) games.

30 PAPERS • NO BENCHMARKS YET

The StarCraft II Learning Environment (SC2LE) is a reinforcement learning environment based on the game StarCraft II. The environment consists of three sub-components: a Linux StarCraft II binary, the StarCraft II API and PySC2. The StarCraft II API allows programmatic control of StarCraft II. It can be used to start a game, get observations, take actions, and review replays. PySC2 is a Python environment that wraps the StarCraft II API to ease the interaction between Python reinforcement learning agents and StarCraft II. It defines an action and observation specification, and includes a random agent and a handful of rule-based agents as examples. It also includes some mini-games as challenges and visualization tools to understand what the agent can see and do.

23 PAPERS • NO BENCHMARKS YET

MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites.

22 PAPERS • NO BENCHMARKS YET

Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. Brax is written in JAX and is designed for use on acceleration hardware. It is both efficient for single-device simulation, and scalable to massively parallel simulation on multiple devices, without the need for pesky datacenters.

21 PAPERS • NO BENCHMARKS YET

HoME (Household Multimodal Environment) is a multimodal environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more.

20 PAPERS • NO BENCHMARKS YET

Obstacle Tower is a high-fidelity, 3D, third-person, procedurally generated environment for reinforcement learning. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment.

18 PAPERS • 6 BENCHMARKS

A benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

18 PAPERS • 2 BENCHMARKS

Gibson is an open-source perceptual and physics simulator to explore active and real-world perception. The Gibson Environment is used for Real-World Perception Learning.

13 PAPERS • NO BENCHMARKS YET

The StarCraft Multi-Agent Challenges+ requires agents to learn completion of multi-stage tasks and usage of environmental factors without precise reward functions. The previous challenge (SMAC), recognized as a standard benchmark for Multi-Agent Reinforcement Learning, is mainly concerned with ensuring that all agents cooperatively eliminate approaching adversaries only through fine manipulation with obvious reward functions. This challenge, on the other hand, is interested in the exploration capability of MARL algorithms to efficiently learn implicit multi-stage tasks and environmental factors as well as micro-control. This study covers both offensive and defensive scenarios. In the offensive scenarios, agents must learn to first find opponents and then eliminate them. The defensive scenarios require agents to use topographic features; for example, agents need to position themselves behind protective structures to make it harder for enemies to attack.

11 PAPERS • 13 BENCHMARKS

CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulations and complex environment layouts consisting of multiple rooms. The range of object manipulations includes the ability to pick up and place objects, toggle the state of objects like taps or televisions, open or close containers, and insert or remove objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for various AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.

9 PAPERS • NO BENCHMARKS YET

A rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al.)

9 PAPERS • NO BENCHMARKS YET

The NetHack Learning Environment (NLE) is a Reinforcement Learning environment based on NetHack 3.6.6. It is designed to provide a standard reinforcement learning interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. NetHack is one of the oldest and arguably most impactful video games in history, as well as being one of the hardest roguelikes currently being played by humans. It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging testbeds. Through NLE, the authors wish to establish NetHack as one of the next challenges for research in decision making and machine learning.

9 PAPERS • 1 BENCHMARK

Mario AI was a benchmark environment for reinforcement learning. The gameplay in Mario AI, as in Nintendo's original version, consists of moving the controlled character, namely Mario, through two-dimensional levels, which are viewed sideways. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).

8 PAPERS • NO BENCHMARKS YET

SUMMIT is a high-fidelity simulator that facilitates the development and testing of crowd-driving algorithms. By leveraging the open-source OpenStreetMap map database and a heterogeneous multi-agent motion prediction model developed in our earlier work, SUMMIT simulates dense, unregulated urban traffic for heterogeneous agents at any worldwide location that OpenStreetMap supports. SUMMIT is built as an extension of CARLA and inherits from it the physical and visual realism for autonomous driving simulation. SUMMIT supports a wide range of applications, including perception, vehicle control, planning, and end-to-end learning.

8 PAPERS • NO BENCHMARKS YET

This dataset consists of multiple indoor and outdoor experiments for up to a 30 m gNB-UE link. In each experiment, we fixed the location of the gNB and moved the UE in increments of roughly one degree. The table above specifies the direction of user movement with respect to the gNB-UE link, the distance resolution, and the number of user locations for which we conducted channel measurements. The outdoor 30 m data also contains blockage between 3.9 m and 4.8 m. At each location, we scan the transmission beam and collect data for each beam. By doing so, we obtain the full OFDM channels for different locations along the moving trajectory with all the beam angles. Moreover, we use 240 kHz subcarrier spacing, consistent with the 5G NR numerology at FR2, so the data we collect is a true reflection of what a 5G UE will see.

7 PAPERS • NO BENCHMARKS YET

ALFWorld contains interactive TextWorld environments (Côté et al.) that parallel embodied worlds in the ALFRED dataset (Shridhar et al.). The aligned environments allow agents to reason and learn high-level policies in an abstract space before solving embodied tasks through low-level actuation.

7 PAPERS • NO BENCHMARKS YET

Griddly is an environment for grid-world based research. Griddly provides a highly optimized game state and rendering engine with a flexible high-level interface for configuring environments. Not only does Griddly offer simple interfaces for single, multi-player and RTS games, but also multiple methods of rendering, configurable partial observability and interfaces for procedural content generation.

7 PAPERS • NO BENCHMARKS YET

LANI is a 3D navigation environment and corpus, where an agent navigates between landmarks. LANI contains 27,965 crowd-sourced instructions for navigation in an open environment. Each datapoint includes an instruction, a human-annotated ground-truth demonstration trajectory, and an environment with various landmarks and lakes. The dataset train/dev/test split is 19,758/4,135/4,072. Each environment specification defines placement of 6–13 landmarks within a square grass field of size 50m×50m.

6 PAPERS • NO BENCHMARKS YET

PlasticineLab is a differentiable physics benchmark, which includes a diverse collection of soft-body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents.

6 PAPERS • NO BENCHMARKS YET

ManiSkill is a large-scale learning-from-demonstrations benchmark for articulated object manipulation with visual input (point cloud and image). ManiSkill supports object-level variations by utilizing a rich and diverse set of articulated objects, and each task is carefully designed for learning manipulations on a single category of objects. ManiSkill is equipped with high-quality demonstrations to facilitate learning-from-demonstrations approaches and perform evaluations on common baseline algorithms. ManiSkill can encourage the robot learning community to explore more on learning generalizable object manipulation skills.

5 PAPERS • NO BENCHMARKS YET

MineRL BASALT is an RL competition on solving human-judged tasks. The tasks in this competition do not have a pre-defined reward function: the goal is to produce trajectories that are judged by real humans to be effective at solving a given task.

5 PAPERS • NO BENCHMARKS YET

RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following considerations: to facilitate ease of use, the datasets are provided with a unified API which makes it easy for practitioners to work with all data in the suite once a general pipeline has been established. This dataset accompanies the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.

5 PAPERS • NO BENCHMARKS YET

Randomly sampled instances of the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) with 20, 50 and 100 customer nodes.

4 PAPERS • NO BENCHMARKS YET
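To make the two constraint families of the CVRPTW concrete, here is a small feasibility check for a single route; the instance encoding below (plain lists and a travel-time matrix) is hypothetical and is not the format of these sampled instances:

```python
def route_feasible(route, demand, ready, due, service, travel, capacity):
    """Check one CVRPTW route (starting from depot node 0) against the
    vehicle capacity and each customer's time window [ready, due]."""
    # Capacity constraint: total demand on the route must fit the vehicle.
    if sum(demand[c] for c in route) > capacity:
        return False
    # Time-window constraints: visit each customer within its window.
    t, prev = 0.0, 0
    for c in route:
        t += travel[prev][c]
        t = max(t, ready[c])   # wait if we arrive before the window opens
        if t > due[c]:         # arriving after the window closes is infeasible
            return False
        t += service[c]
        prev = c
    return True

# Tiny made-up instance: depot 0 plus two customers with generous windows.
travel = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]
ok = route_feasible([1, 2], demand=[0, 3, 4], ready=[0, 0, 0],
                    due=[0, 10, 20], service=[0, 1, 1],
                    travel=travel, capacity=10)
print(ok)
```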

Kubric is a data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

4 PAPERS • NO BENCHMARKS YET

MengeROS is an open-source crowd simulation tool for robot navigation that integrates Menge with ROS. It extends Menge to introduce one or more robot agents into a crowd of pedestrians. Each robot agent is controlled by external ROS-compatible controllers. MengeROS has been used to simulate crowds with up to 1000 pedestrians and 20 robots.

4 PAPERS • NO BENCHMARKS YET

NeoRL is a collection of environments and datasets for offline reinforcement learning with a special focus on real-world applications. The design follows real-world properties such as the conservativeness of behavior policies, limited amounts of data, high-dimensional state and action spaces, and the highly stochastic nature of the environments. The datasets include robotics, industrial control, finance trading and city management tasks with real-world properties, with three sizes of dataset and three levels of data quality to mimic the datasets we would meet in offline RL scenarios. Users can use the dataset to evaluate offline RL algorithms with near real-world application nature.

4 PAPERS • NO BENCHMARKS YET

The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. It was created to test for the ability of agents to reason and plan via latent state inference, as well as useful exploration and experimentation.

3 PAPERS • NO BENCHMARKS YET

The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can differ depending on the occasion. The need for efficient personalization procedures is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for in-session prediction of purchase intent and recommendations.

3 PAPERS • 1 BENCHMARK

LemgoRL is an open-source benchmark tool for traffic signal control designed to train reinforcement learning agents in a highly realistic simulation scenario with the aim of reducing the Sim2Real gap. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI Gym toolkit to enable easy deployment in existing research work.

3 PAPERS • NO BENCHMARKS YET

SPACE is a simulator for physical interactions and causal learning in 3D environments. The SPACE simulator is used to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact.

3 PAPERS • 1 BENCHMARK

The 2048 game task involves training an agent to achieve high scores in the game 2048.

2 PAPERS • 1 BENCHMARK
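The core state transition an agent must learn in 2048 is the slide-and-merge rule; below is a minimal sketch of the left move for one row (the scoring convention, adding the value of each newly created tile, matches the standard game):

```python
def merge_left(row):
    """Slide one 2048 row to the left and merge equal adjacent pairs once,
    returning (new_row, score_gained)."""
    tiles = [v for v in row if v != 0]       # slide: drop empty cells
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)         # merge a pair into one tile
            score += tiles[i] * 2            # score the newly created tile
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), score

print(merge_left([2, 2, 4, 0]))  # → ([4, 4, 0, 0], 4)
```

Note that each tile merges at most once per move: `[2, 2, 2, 2]` becomes `[4, 4, 0, 0]`, not `[8, 0, 0, 0]`.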

The AtariARI (Atari Annotated RAM Interface) is an environment for representation learning. The Atari Arcade Learning Environment (ALE) does not explicitly expose any ground-truth state information. However, ALE does expose the RAM state (128 bytes per timestep), which is used by the game programmer to store important state information such as the location of sprites, the state of the clock, or the current room the agent is in. To extract these variables, the dataset creators consulted commented disassemblies (or source code) of Atari 2600 games which were made available by Engelhardt and Jentzsch and CPUWIZ. The dataset creators were able to find and verify important state variables for a total of 22 games. Once this information was acquired, combining it with the ALE interface produced a wrapper that can automatically output a state label for every example frame generated from the game. The dataset creators make this available with an easy-to-use gym wrapper, which returns this information at every step.

2 PAPERS • NO BENCHMARKS YET

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments. It’s designed to test your agent’s generalization capabilities in all scenarios where intra-task generalization is important.

2 PAPERS • NO BENCHMARKS YET

CinemAirSim extends the well-known drone simulator AirSim with a cinematic camera and augments its API to control all of the camera's parameters in real time, including various filming lenses and common cinematographic properties.

2 PAPERS • NO BENCHMARKS YET

MineRL is an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks which highlight many of the hardest problems in modern-day Reinforcement Learning: sparse rewards and hierarchical policies.

2 PAPERS • NO BENCHMARKS YET

A multirotor gym environment for learning control policies for various unmanned aerial vehicles.
