
Diffbot

Diffbot is a suite of ML-based products that make it easy to structure and integrate web data.

Installation and Setup

Get a free Diffbot API token, then follow Diffbot's instructions for authenticating your requests.
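Diffbot's v3 REST APIs take the token as a `token` query parameter, alongside the target page's `url`. The sketch below builds such an authenticated request URL using only the standard library. The `DIFFBOT_API_TOKEN` environment variable and the `diffbot_request_url` helper are hypothetical names chosen for illustration; check Diffbot's documentation for the endpoints and parameters available to your account.

```python
import os
from typing import Optional
from urllib.parse import urlencode

# Base URL for Diffbot's v3 APIs; the endpoint name (e.g. "article") is appended.
DIFFBOT_V3_BASE = "https://api.diffbot.com/v3"


def diffbot_request_url(api: str, page_url: str, token: Optional[str] = None) -> str:
    """Build an authenticated Diffbot v3 request URL for `page_url`.

    If `token` is not passed, it is read from DIFFBOT_API_TOKEN, a
    hypothetical environment variable name used only in this sketch.
    """
    token = token or os.environ.get("DIFFBOT_API_TOKEN")
    if not token:
        raise ValueError("No Diffbot API token provided")
    # urlencode percent-encodes the target URL so it survives as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_V3_BASE}/{api}?{query}"


if __name__ == "__main__":
    print(diffbot_request_url("article", "https://example.com/post?id=1", token="YOUR_TOKEN"))
```

Keeping the token out of source code (here via an environment variable) makes it easier to rotate and avoids committing it to version control.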

Document Loader

Diffbot's Extract API is a service that structures and normalizes data from web pages.

Unlike traditional web scraping tools, Diffbot Extract doesn't require any hand-written rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types and then transforms the raw HTML markup into JSON. The resulting structured JSON follows a consistent type-based ontology, so data from many different web sources can be extracted with the same schema.
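Because every page of a given type is returned in the same shape, one parsing function can serve pages from unrelated sites. The sketch below assumes an Extract-style response that wraps results in an `objects` list whose article entries carry `type`, `title`, `text`, and `pageUrl` keys; the two responses are hypothetical, trimmed-down examples rather than real API output.

```python
from typing import Any, Dict, List


def summarize_article(response: Dict[str, Any]) -> Dict[str, str]:
    """Pull a few common fields out of an Extract-style JSON response.

    Assumes the first entry in "objects" is the extracted article.
    """
    obj = response["objects"][0]
    return {
        "type": obj.get("type", ""),
        "title": obj.get("title", ""),
        "url": obj.get("pageUrl", ""),
        # Keep only the first 60 characters of body text as a preview.
        "preview": obj.get("text", "")[:60],
    }


# Hypothetical, heavily trimmed responses for pages on two unrelated sites.
news_site = {"objects": [{"type": "article", "title": "Rates hold steady",
                          "pageUrl": "https://news.example.com/rates",
                          "text": "The central bank left rates unchanged."}]}
blog_site = {"objects": [{"type": "article", "title": "Tuning Postgres",
                          "pageUrl": "https://blog.example.org/pg",
                          "text": "Start with shared_buffers."}]}

# The same function handles both sources because they share one schema.
rows: List[Dict[str, str]] = [summarize_article(r) for r in (news_site, blog_site)]
for row in rows:
    print(row["type"], "|", row["title"], "|", row["url"])
```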

See a usage example.

from langchain_community.document_loaders import DiffbotLoader
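In `langchain_community`, the loader is typically constructed with an API token and a list of URLs, along the lines of `DiffbotLoader(api_token=..., urls=[...]).load()`, and it also accepts a `continue_on_failure` flag; check the API reference for your installed version. The offline sketch below only illustrates the general pattern such a loader follows: one document per URL, the page text as content, the URL as `source` metadata, and failing URLs logged and skipped unless `continue_on_failure` is `False`. The `Doc` class, `load_pages` function, and injected `fetch` callable are hypothetical stand-ins, not the library's code.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def load_pages(urls: List[str], fetch: Callable[[str], Dict[str, Any]],
               continue_on_failure: bool = True) -> List[Doc]:
    """Turn Extract-style responses into documents, one per URL.

    `fetch` is injected so the sketch runs offline; in practice it would
    call the Extract API for the given page URL.
    """
    docs: List[Doc] = []
    for url in urls:
        try:
            data = fetch(url)
            text = data["objects"][0]["text"] if data.get("objects") else ""
            docs.append(Doc(page_content=text, metadata={"source": url}))
        except Exception as exc:
            if not continue_on_failure:
                raise
            # Log the failure and move on to the next URL.
            logger.error("Error fetching or processing %s: %s", url, exc)
    return docs


# Simulated responses: one URL succeeds, the other raises.
fake_responses = {"https://example.com/a": {"objects": [{"text": "Page A body"}]}}


def fake_fetch(url: str) -> Dict[str, Any]:
    if url not in fake_responses:
        raise ConnectionError("simulated network failure")
    return fake_responses[url]


docs = load_pages(["https://example.com/a", "https://example.com/missing"], fake_fetch)
for d in docs:
    print(d.metadata["source"], "->", d.page_content)
```

Injecting `fetch` keeps the conversion logic testable without network access or an API token.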


Graphs

Diffbot's Natural Language Processing API extracts entities, relationships, and semantic meaning from unstructured text.

See a usage example.

from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
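The transformer is usually created with a Diffbot API key and applied to a list of documents, roughly `DiffbotGraphTransformer(diffbot_api_key=...).convert_to_graph_documents(documents)`; confirm the exact signature in the API reference for your version. To show the underlying idea, the sketch below turns simplified (subject, property, value) facts into deduplicated nodes and typed relationships. The fact format, the `Node`/`Relationship` classes, and the sample facts are hypothetical simplifications, not the NL API's actual response schema. Relation names are normalized to UPPER_SNAKE_CASE here as a common graph-database convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def facts_to_graph(facts: List[Dict[str, Dict[str, str]]]) -> Tuple[List[Node], List[Relationship]]:
    """Convert simplified (subject, property, value) facts into nodes and edges."""
    nodes: Dict[str, Node] = {}
    rels: List[Relationship] = []
    for fact in facts:
        subj, val = fact["subject"], fact["value"]
        # Reuse an existing node when the same entity name appears again.
        src = nodes.setdefault(subj["name"], Node(subj["name"], subj["type"]))
        dst = nodes.setdefault(val["name"], Node(val["name"], val["type"]))
        # Normalize the relation name, e.g. "educated at" -> "EDUCATED_AT".
        rel_type = fact["property"]["name"].replace(" ", "_").upper()
        rels.append(Relationship(src, dst, rel_type))
    return list(nodes.values()), rels


# Hypothetical extraction output for a sentence about Marie Curie.
sample_facts = [
    {"subject": {"name": "Marie Curie", "type": "Person"},
     "property": {"name": "educated at"},
     "value": {"name": "University of Paris", "type": "Organization"}},
    {"subject": {"name": "Marie Curie", "type": "Person"},
     "property": {"name": "award received"},
     "value": {"name": "Nobel Prize in Physics", "type": "Award"}},
]

nodes, rels = facts_to_graph(sample_facts)
for r in rels:
    print(f"({r.source.id})-[:{r.type}]->({r.target.id})")
```

Deduplicating nodes by name keeps repeated mentions of the same entity attached to a single vertex, which is what lets the relationships form a connected graph.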
