Welcome to /r/TextDataMining! We share news, discussions, papers, tutorials, libraries, and tools related to NLP, machine learning and data analysis.
In this article, we will build a search engine over a huge custom text corpus using Transformers.
A mental model of how the various components of a regular expression work, from the bottom up.
It has become a de facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a...
Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A...
Posted by Prabhu Kaliamoorthi, Software Engineer, Google Research. Deep neural networks have radically transformed natural language processing (NLP)...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are...
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework-agnostic interface. - PAIR-code/lit
Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post,...
In this paper, we propose a novel approach based on information criteria to select the dimensionality of the word2vec Skip-gram (SG) model. From the perspective of probability theory, SG is considered...
Can intelligence emerge simply by training a big enough language model on lots of data? OpenAI attempts this with a model of 175 billion parameters.
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields...