Naina Chaturvedi

15 Greatest AI/ML Research Papers Of All Time

You must read…

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

By Google Team

Abstract —

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google.

Paper can be found here :

https://arxiv.org/pdf/1603.04467v2.pdf

Code can be found here :

Scikit-learn: Machine Learning in Python

By Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

Abstract —

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.

Paper can be found here :

https://arxiv.org/pdf/1201.0490v4.pdf

Code can be found here :

PyTorch: An Imperative Style, High-Performance Deep Learning Library

By Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

Abstract —

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.

Paper can be found here :

http://papers.nips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

Code can be found here :

Automatic Differentiation in PyTorch

By Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer

Abstract —

In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd, and provides a high-performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
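
To make the idea of differentiating a purely imperative program more concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. The `Value` class and its methods are hypothetical names chosen for illustration, and the sketch assumes scalar values with only addition and multiplication. It is not PyTorch's implementation or API, only a toy showing how a graph can be recorded while ordinary code runs and then walked backwards.

```python
class Value:
    """Scalar that records the operations applied to it (hypothetical toy class)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # tuple of (parent Value, local derivative) pairs

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Order the recorded graph topologically, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad


x = Value(3.0)
y = Value(4.0)
z = x * y + x   # an ordinary Python expression; the graph is built as it runs
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Because the graph is recorded during execution, normal Python control flow (loops, conditionals) can sit between the operations without any special handling, which is the property the abstract highlights.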

Paper can be found here :

https://openreview.net/pdf?id=BJJsrmfCZ

Code can be found here :

Adapting the Tesseract Open Source OCR Engine for Multilingual OCR

By Ray Smith, Daria Antonova, Dar-Shyang Lee

Abstract —

We describe efforts to adapt the Tesseract open source OCR engine for multiple scripts and languages. Effort has been concentrated on enabling generic multi-lingual operation such that negligible customization is required for a new language beyond providing a corpus of text. Although change was required to various modules, including physical layout analysis, and linguistic post-processing, no change was required to the character classifier beyond changing a few limits. The Tesseract classifier has adapted easily to Simplified Chinese. Test results on English, a mixture of European languages, and Russian, taken from a random sample of books, show a reasonably consistent word error rate between 3.72% and 5.78%, and Simplified Chinese has a character error rate of only 3.77%.

Paper can be found here :

https://ai.google/research/pubs/pub35248.pdf

Code can be found here :

Caffe: Convolutional Architecture for Fast Feature Embedding

By Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

Abstract —

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

Paper can be found here :

https://arxiv.org/pdf/1408.5093v1.pdf

Code can be found here :

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

By Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract —

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to down-stream tasks, several model compression techniques on pre-trained language representations have been proposed (Sun et al., 2019; Sanh, 2019). However, surprisingly, the simple baseline of just pre-training and fine-tuning compact models has been overlooked. In this paper, we first show that pre-training remains important in the context of smaller architectures, and fine-tuning pre-trained compact models can be competitive to more elaborate methods proposed in concurrent work. Starting with pre-trained compact models, we then explore transferring task knowledge from large fine-tuned models through standard knowledge distillation. The resulting simple, yet effective and general algorithm, Pre-trained Distillation, brings further improvements. Through extensive experiments, we more generally explore the interaction between pre-training and distillation under two variables that have been under-studied: model size and properties of unlabeled task data. One surprising observation is that they have a compound effect even when sequentially applied on the same data. To accelerate future research, we will make our 24 pre-trained miniature BERT models publicly available.
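
The distillation step mentioned in the abstract can be sketched as training a student to match the teacher's predicted distribution. The snippet below is a rough illustration in plain Python, assuming the common soft-label cross-entropy form of knowledge distillation. The function names and the example logits are hypothetical and are not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher = [2.0, 0.5, -1.0]         # made-up teacher logits for one example
close_student = [1.9, 0.6, -1.1]   # student that roughly agrees with the teacher
far_student = [-1.0, 0.5, 2.0]     # student that ranks the classes the other way

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is smaller for the student whose distribution is closer to the teacher's, so minimizing it pulls the compact model toward the large model's predictions, including on unlabeled data where no gold labels exist.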

Paper can be found here :

https://arxiv.org/pdf/1908.08962v2.pdf

Code can be found here :

FastText.zip: Compressing text classification models

By Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov

Abstract —

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings. While the original technique leads to a loss in accuracy, we adapt this method to circumvent quantization artefacts. Our experiments carried out on several benchmarks show that our approach typically requires two orders of magnitude less memory than fastText while being only slightly inferior with respect to accuracy. As a result, it outperforms the state of the art by a good margin in terms of the compromise between memory usage and accuracy.
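
For readers unfamiliar with product quantization, the following is a minimal sketch of the plain technique in standard-library Python: each embedding is cut into sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a small per-position codebook. All function names, the toy k-means, and the sizes used are assumptions for illustration. The sketch also assumes the dimension divides evenly by the number of sub-vectors, and it does not include the adaptations the paper makes to limit the accuracy loss.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to p in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))


def kmeans(points, k, iters=10, seed=0):
    """Tiny k-means that returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if no point was assigned to it
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def train_pq(vectors, n_sub, k):
    """Learn one codebook of k centroids per sub-vector position."""
    d = len(vectors[0]) // n_sub
    return [kmeans([v[j * d:(j + 1) * d] for v in vectors], k)
            for j in range(n_sub)]


def encode(v, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    d = len(v) // len(codebooks)
    return [nearest(v[j * d:(j + 1) * d], cb) for j, cb in enumerate(codebooks)]


def decode(codes, codebooks):
    """Approximate the original vector by concatenating the chosen centroids."""
    return [x for c, cb in zip(codes, codebooks) for x in cb[c]]


rng = random.Random(1)
vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
codebooks = train_pq(vectors, n_sub=4, k=4)
codes = encode(vectors[0], codebooks)
approx = decode(codes, codebooks)
print(codes, len(approx))
```

In this toy setup each 8-dimensional vector is stored as 4 small integers plus shared codebooks, which is where the memory saving comes from. The decoded vector is only an approximation of the original.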

Paper can be found here :

https://arxiv.org/pdf/1612.03651v1.pdf

Code can be found here :

Bag of Tricks for Efficient Text Classification

By Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov

Abstract —

This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU and classify half a million sentences, among 312K classes, in less than a minute.

Paper can be found here :

https://arxiv.org/pdf/1607.01759v3.pdf

Code can be found here :

Enriching Word Vectors with Subword Information

By Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov

Abstract —

Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram; words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
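
The "bag of character n-grams" idea can be illustrated with a short Python sketch. The boundary markers `<` and `>`, the default range of 3 to 6 characters, and both function names are assumptions of this sketch rather than details stated in the abstract. The n-gram vectors in the example are made-up numbers.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of the word's n-grams; unseen n-grams contribute nothing."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        for j, x in enumerate(ngram_vectors.get(gram, ())):
            total[j] += x
    return total


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']

toy_vectors = {"<ca": [1.0, 2.0], "at>": [0.5, 0.5]}  # made-up 2-d n-gram vectors
print(word_vector("cat", toy_vectors, dim=2))
# [1.5, 2.5]
```

Since a vector is built from n-grams rather than looked up by whole word, a word that never appeared in training still gets a representation as long as some of its n-grams were seen.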

Paper can be found here :

https://arxiv.org/pdf/1607.04606v2.pdf

Code can be found here :

OpenAI Gym

By Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba

Abstract —

OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
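
The "common interface" can be pictured as every environment exposing the same small set of methods, so one agent loop works against any of them. The sketch below does not import the Gym library. `CoinGuessEnv` and `run_episode` are hypothetical names, and the `(observation, reward, done, info)` return shape of `step` follows the interface described in the whitepaper. Later releases of the library changed that signature, so treat the tuple shape as an assumption.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess the outcome of a coin flip."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        return 0  # this toy task has a single dummy observation

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        return 0, reward, done, {}


def run_episode(env, policy):
    """Agent-environment loop that works for any object with reset() and step()."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        observation, reward, done, info = env.step(policy(observation))
        total += reward
    return total


env = CoinGuessEnv()
score = run_episode(env, lambda obs: 1)  # trivial policy: always guess 1
print(score)
```

Swapping in a different environment class requires no change to `run_episode`, which is the practical benefit of a shared interface for benchmarking algorithms.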

Paper can be found here :

https://arxiv.org/pdf/1606.01540v1.pdf

Code can be found here :

XGBoost: A Scalable Tree Boosting System

By Tianqi Chen, Carlos Guestrin

Abstract —

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

Paper can be found here :

https://arxiv.org/pdf/1603.02754v3.pdf

Code can be found here :

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

By Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang

Abstract —

MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion. Our preliminary experiments reveal promising results on large scale deep neural network applications using multiple GPU machines.

Paper can be found here :

https://arxiv.org/pdf/1512.01274v1.pdf

Code can be found here :

Image Super-Resolution Using Deep Convolutional Networks

By Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang

Abstract —

We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously and show better overall reconstruction quality.

Paper can be found here :

https://arxiv.org/pdf/1501.00092v3.pdf

Code can be found here :

YOLOv4: Optimal Speed and Accuracy of Object Detection

By Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao

Abstract —

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.
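
Of the features listed in the abstract, the Mish activation is the easiest to show in a few lines. The sketch below uses its standard definition, mish(x) = x * tanh(softplus(x)), in plain Python. The helper names are chosen for illustration and this is not code from the YOLOv4 repository.

```python
import math


def softplus(x):
    """log(1 + exp(x)), arranged to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for x in (-2.0, 0.0, 2.0):
    print(x, round(mish(x), 4))
```

Unlike ReLU, the function is smooth and lets small negative values through instead of clipping them to zero, while behaving almost like the identity for large positive inputs.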

Paper can be found here :

https://arxiv.org/pdf/2004.10934v1.pdf

Code can be found here :
