unihan-db

SQLAlchemy models for the UNIHAN CJK character database. unihan-db provides the schema and ORM layer. For the ETL pipeline, see unihan-etl. For end-user character lookups, see cihai.

Quickstart

Install and load UNIHAN data in 5 minutes.

Models & Bootstrap

Table models, bootstrap loader, and data importer. See API Reference.

Contributing

Development setup, code style, release process. See Project.

Install

With pip:

$ pip install unihan-db

Or, with uv:

$ uv add unihan-db

At a glance

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

from unihan_db.bootstrap import bootstrap_unihan
from unihan_db.tables import Base, Unhn

engine = create_engine("sqlite:///unihan.db")

# Step 1: Create the schema
Base.metadata.create_all(engine)

# Step 2: Bootstrap data from the Unicode Consortium
bootstrap_unihan(engine)

# Step 3: Query characters
with Session(engine) as session:
    char = session.query(Unhn).filter_by(char="\u597D").first()
    if char:
        print(char.char, char.ucn)

See Quickstart for the full setup, including bootstrapping data from the Unicode Consortium.
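
The query in "At a glance" looks up 好 by its literal character ("\u597D") and prints its ucn. As a minimal sketch, assuming ucn holds the usual U+XXXX code point notation, the two forms can be converted with the standard library alone. The helper names char_to_ucn and ucn_to_char are hypothetical and are not part of unihan-db.

```python
def char_to_ucn(char: str) -> str:
    """Return the U+XXXX notation for a single character.

    Hypothetical helper for illustration; not part of unihan-db.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Code points are written as at least four uppercase hex digits.
    return f"U+{ord(char):04X}"


def ucn_to_char(ucn: str) -> str:
    """Return the character for a U+XXXX string.

    Hypothetical helper for illustration; not part of unihan-db.
    """
    if not ucn.upper().startswith("U+"):
        raise ValueError("expected a string of the form U+XXXX")
    # Strip the "U+" prefix and parse the remainder as hexadecimal.
    return chr(int(ucn[2:], 16))


print(char_to_ucn("\u597D"))  # U+597D
print(ucn_to_char("U+597D"))  # 好
```

This can be handy when a lookup starts from a code point rather than a character: convert it first, then pass the result to filter_by(char=...) as in the query shown earlier.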