HTML/XML Parsing Libraries

Libraries for parsing, processing, and manipulating HTML and XML documents.

33 projects

40,034 contributors

$1.7B

The Symfony PHP Framework

Symfony is a PHP web application framework designed for building robust, scalable, and maintainable web applications using reusable components and a structured MVC architecture. It's widely used for enterprise-level projects and forms the foundation of many other PHP platforms, including Laravel and Drupal.

Contributors

16,949

Organizations

3,344

Software value

$66M

Servo Project

The mission of the Project is to provide an independent, modular, embeddable web engine, which allows developers to deliver content and applications using web standards. NOTE: Servo Project was originally set up as a Series LLC (and under the Servo Project Fund). Both of those were archived on June 15, 2023, and Servo Project transitioned to being an LF Europe Project with its technical charter at https://github.com/servo/project/blob/main/governance/CHARTER.md.

Contributors

8,407

Organizations

1,785

Software value

$1.4B

Nokogiri

Nokogiri is a Ruby library for parsing and manipulating HTML and XML documents, including SAX-style parsing. It provides a robust API for reading, searching, modifying, and extracting data from structured documents using XPath and CSS selectors.

Contributors

2,706

Organizations

985

Software value

$4.4M

Cheerio

Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for server-side HTML parsing and manipulation. It provides an API for traversing and modifying HTML/XML documents using familiar jQuery-like syntax.

Contributors

2,076

Organizations

704

Software value

$456K

jsoup

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. The library implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.

Contributors

1,962

Organizations

396

Software value

$1.3M

HTMLMinifier

JavaScript-based HTML compressor/minifier (with Node.js support)

Contributors

1,033

Organizations

377

Software value

$882K

sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance

Contributors

783

Organizations

199

Software value

$102K

kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.

Contributors

752

Organizations

329

Software value

$443K

fast-xml-parser

Validate, parse, and build XML rapidly without C/C++-based libraries and without callbacks.

Contributors

735

Organizations

184

Software value

$153M

AngleSharp

The ultimate angle brackets parser library parsing HTML5, MathML, SVG and CSS to construct a DOM based on the official W3C specifications.

Contributors

705

Organizations

117

Software value

$6.3M

Html Agility Pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.

Contributors

554

Organizations

73

Software value

$486K

htmlparser2

htmlparser2 is a fast and forgiving HTML/XML parser library for Node.js that can parse HTML according to the WHATWG HTML specification. It provides a streaming interface for efficiently parsing large chunks of data and supports custom handlers for processing parsed content.

Contributors

477

Organizations

175

Software value

$76K

Minify

Go minifiers for web formats

Contributors

474

Organizations

169

Software value

$23M

HTML React Parser

HTML to React parser.

Contributors

405

Organizations

95

Software value

$98K

parse5

parse5 is a fast and specification-compliant HTML parsing/serialization toolset for Node.js. It provides a full-featured HTML parser that generates a DOM tree from HTML code, following the WHATWG HTML specification.

Contributors

361

Organizations

171

Software value

$874K

Floki

Floki is a fast and flexible HTML/XML parser written in Elixir that enables easy traversal and manipulation of HTML/XML documents using CSS selectors, similar to jQuery.

Contributors

323

Organizations

98

Software value

$2.6M

Loofah

Ruby library for HTML/XML transformation and sanitization

Contributors

302

Organizations

126

Software value

$231K

xmlbuilder-js

An XML builder for node.js

Contributors

283

Organizations

89

Software value

$232K

xmldom

A JavaScript implementation of the W3C DOM specification that allows parsing and serializing XML documents in Node.js and browser environments

Contributors

245

Organizations

84

Software value

$2.4M

rehype

HTML processor powered by plugins, part of the @unifiedjs collective

Contributors

179

Organizations

69

Software value

$504K

REXML

REXML is a pure Ruby XML processor that provides a way to parse, validate, modify and generate XML documents in Ruby. It implements both DOM and SAX2 APIs for XML processing.

Contributors

174

Organizations

67

Software value

$1.4M

entities

A Node.js library for encoding and decoding HTML entities, providing functionality to convert special characters to their HTML entity representations and vice versa

Contributors

111

Organizations

52

Software value

$79K

hast-util-to-html

A JavaScript library that converts HAST (Hypertext Abstract Syntax Tree) nodes to HTML strings, providing a way to serialize HTML AST structures into their string representation

Contributors

38

Organizations

20

Software value

$133K

HTML DOM Parser

HTML to DOM parser.

This project hasn't been onboarded to LFX Insights.

Web Metadata Scraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

This project hasn't been onboarded to LFX Insights.

html-entities

Fastest HTML entities encode/decode library

This project hasn't been onboarded to LFX Insights.

libxml2

Read-only mirror of https://gitlab.gnome.org/GNOME/libxml2

This project hasn't been onboarded to LFX Insights.

lxml

The lxml XML toolkit for Python

This project hasn't been onboarded to LFX Insights.

pugixml

Light-weight, simple and fast XML parser for C++ with XPath support

This project hasn't been onboarded to LFX Insights.