LFX Insights

Natural Language Processing Libraries

Libraries focused on natural language processing tasks, including text representation, classification, translation, and more.

31 projects

120,215 contributors

$1.1B software value

Home Assistant

Home Assistant is an open-source home automation platform that enables users to control and automate their smart home devices. It provides a centralized system for managing lights, thermostats, cameras, sensors, and other IoT devices through a unified interface, with support for thousands of integrations and custom automations.

Contributors

71,296

Organizations

5,594

Software value

$124M

Transformers

Transformers is a state-of-the-art Natural Language Processing (NLP) library that provides thousands of pretrained models for text, vision, and audio tasks. It offers APIs to easily download and use these models, as well as to train new ones. The library supports multiple deep learning frameworks including PyTorch, TensorFlow, and JAX.

Contributors

23,477

Organizations

3,081

Software value

$49M

spaCy

spaCy is an industrial-strength natural language processing library for Python, designed for production use. It offers fast and accurate syntactic analysis, named entity recognition, text classification, and more. The library includes pre-trained statistical models and word vectors, and supports deep learning integration.

Contributors

6,479

Organizations

1,126

Software value

$7.8M

Llama Models

A collection of large language models (LLMs) developed by Meta AI, including the Llama family of models. These models are designed for natural language processing tasks and are made available for research and commercial use under specific licensing terms.

Contributors

3,980

Organizations

670

Software value

$425K

FastChat

FastChat is an open-source platform for training, serving, and evaluating large language models (LLMs). It provides tools for training and deploying LLM-based chatbots, including implementations of models like Vicuna and support for various model architectures.

Contributors

2,841

Organizations

425

Software value

$1M

Natural Language Toolkit (NLTK)

NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

Contributors

2,698

Organizations

618

Software value

$4.1M

Sentence Transformers

Sentence Transformers is a Python framework for state-of-the-art sentence and text embeddings. It provides easy-to-use methods to compute dense vector representations for sentences, paragraphs and images, enabling semantic similarity comparisons and information retrieval tasks.

Contributors

2,496

Organizations

457

Software value

$2.3M

docling

Docling is an open-source toolkit for parsing and converting documents such as PDF, DOCX, and HTML into structured formats (for example Markdown and JSON) for use in generative AI applications.

Contributors

2,401

Organizations

347

Software value

$637M

Whisper

Whisper is an automatic speech recognition (ASR) system developed by OpenAI that can transcribe and translate spoken language from audio into text. It is trained on a large dataset of multilingual speech data and can handle various languages, accents, and acoustic environments.

Contributors

2,154

Organizations

291

Software value

$606K

LanguageTool

LanguageTool is open-source proofreading software that checks text for grammar, style, and spelling errors in multiple languages. It provides automated writing assistance through rule-based pattern matching and can be used as a standalone application, as a browser extension, or integrated into other software.

Contributors

1,750

Organizations

356

Software value

$141M

RWKV

The mission of the Project is to develop a recurrent neural net language model with GPT-level LLM performance, which can also be directly trained like a GPT transformer.

Contributors

283

Organizations

43

Software value

$71M

Tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Contributors

161

Organizations

27

Software value

$1.6M

DELTA

DELTA is a deep-learning-based, end-to-end natural language and speech processing platform. It aims to make it easy and fast to use, deploy, and develop natural language processing and speech models for both academic and industry use cases. DELTA is implemented mainly in TensorFlow and Python 3.

Contributors

160

Organizations

19

Software value

$2M

RosaeNLG Project

An open source natural language generation (NLG) library.

Contributors

39

Organizations

10

Software value

$89M

CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

This project hasn't been onboarded to LFX Insights.

Fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

This project hasn't been onboarded to LFX Insights.

Flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

This project hasn't been onboarded to LFX Insights.

GROBID

Machine learning software for extracting information from scholarly documents

This project hasn't been onboarded to LFX Insights.

Gensim

Topic Modelling for Humans

This project hasn't been onboarded to LFX Insights.

HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

This project hasn't been onboarded to LFX Insights.

Haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

This project hasn't been onboarded to LFX Insights.

Jaseci

The Official Jaseci Code Repository

This project hasn't been onboarded to LFX Insights.

LibreTranslate

Free and Open Source Machine Translation API. Self-hosted, offline capable and easy to set up.

This project hasn't been onboarded to LFX Insights.

Moses

Moses, the machine translation system

This project hasn't been onboarded to LFX Insights.

NLP.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more

This project hasn't been onboarded to LFX Insights.

PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

This project hasn't been onboarded to LFX Insights.

PyThaiNLP

Thai natural language processing in Python

This project hasn't been onboarded to LFX Insights.

RasaHQ/rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

This project hasn't been onboarded to LFX Insights.

SentencePiece

Unsupervised text tokenizer for Neural Network-based text generation.

This project hasn't been onboarded to LFX Insights.

Spark NLP

State of the Art Natural Language Processing

This project hasn't been onboarded to LFX Insights.