31 projects
Home Assistant
Home Assistant is an open-source home automation platform that enables users to control and automate their smart home devices. It provides a centralized system for managing lights, thermostats, cameras, sensors, and other IoT devices through a unified interface, with support for thousands of integrations and custom automations.
71,296
5,594
$124M
Transformers
Transformers is a state-of-the-art Natural Language Processing (NLP) library that provides thousands of pretrained models for text, vision, and audio tasks. It offers APIs to easily download and use these models, as well as to train new ones. The library supports multiple deep learning frameworks including PyTorch, TensorFlow, and JAX.
23,477
3,081
$49M
spaCy
spaCy is an industrial-strength natural language processing library for Python, designed for production use. It offers fast and accurate syntactic analysis, named entity recognition, text classification, and more. The library includes pre-trained statistical models and word vectors, and supports deep learning integration.
6,479
1,126
$7.8M
Llama Models
A collection of large language models (LLMs) developed by Meta AI, including the Llama family of models. These models are designed for natural language processing tasks and are made available for research and commercial use under specific licensing terms.
3,980
670
$425K
FastChat
FastChat is an open-source platform for training, serving, and evaluating large language models (LLMs). It provides tools for training and deploying LLM-based chatbots, including implementations of models like Vicuna and support for various model architectures.
2,841
425
$1M
Natural Language Toolkit (NLTK)
NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
2,698
618
$4.1M
Sentence Transformers
Sentence Transformers is a Python framework for state-of-the-art sentence and text embeddings. It provides easy-to-use methods to compute dense vector representations for sentences, paragraphs and images, enabling semantic similarity comparisons and information retrieval tasks.
2,496
457
$2.3M
docling
A project focused on developing tools and resources for documenting and analyzing languages, particularly endangered and under-resourced languages, through computational and linguistic approaches.
2,401
347
$637M
Whisper
Whisper is an automatic speech recognition (ASR) system developed by OpenAI that can transcribe and translate spoken language from audio into text. It is trained on a large dataset of multilingual speech data and can handle various languages, accents, and acoustic environments.
2,154
291
$606K
LanguageTool
LanguageTool is an open-source proofreading software that checks text for grammar, style, and spelling errors in multiple languages. It provides automated writing assistance through rule-based pattern matching and can be used as a standalone application, browser extension, or integrated into other software.
1,750
356
$141M
RWKV
The mission of the Project is to develop a recurrent neural net language model with GPT-level LLM performance, which can also be directly trained like a GPT transformer.
283
43
$71M
Tokenizers
Fast State-of-the-Art Tokenizers optimized for Research and Production
161
27
$1.6M
DELTA
DELTA is a deep-learning-based, end-to-end natural language and speech processing platform. It aims to make it easy and fast to use, deploy, and develop natural language processing and speech models for both academic and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.
160
19
$2M
RosaeNLG Project
An open source natural language generation library.
39
10
$89M
CoreNLP
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
Fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
GROBID
A machine learning software for extracting information from scholarly documents
Gensim
Topic Modelling for Humans
HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Jaseci
The Official Jaseci Code Repository
LibreTranslate
Free and Open Source Machine Translation API. Self-hosted, offline capable and easy to set up.
Moses
Moses, the machine translation system
NLP.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
PaddleNLP
Easy-to-use and powerful NLP and LLM library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including Text Classification, Neural Search, Question Answering, Information Extraction, Document Intelligence, Sentiment Analysis, and more.
PyThaiNLP
Thai natural language processing in Python
RasaHQ/rasa
Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
SentencePiece
Unsupervised text tokenizer for Neural Network-based text generation.
Spark NLP
State of the Art Natural Language Processing