# Architecture overview

Normally, you would think that a tool like `Dokka` simply parses some programming language sources and generates `HTML` pages for whatever it sees along the way, with little to no abstractions. That would be the simplest and most straightforward way to implement an API documentation engine.

However, it was clear that `Dokka` may need to generate documentation from various sources (not only `Kotlin`), that users might request additional output formats (like `Markdown`), and that users might need additional features, such as support for custom `KDoc` tags or rendering `mermaid.js` diagrams. If all solutions were hardcoded, each of these would require changing a lot of code inside `Dokka` itself.

For this reason, `Dokka` was built from the ground up to be easily extensible and customizable. It adds several layers of abstraction to the data model and provides pluggable extension points, giving you the ability to introduce selective changes on a single level.

## Overview of data model

Generating API documentation begins with `Input` source files (`.kt`, `.java`, etc.) and ends with some `Output` files (`.html`/`.md` pages, etc.). However, to allow for extensibility and customization, several abstractions that are independent of both the input and the output have been added to the data model.

Below you can find the general pipeline for processing data gathered from sources, followed by an explanation of each stage.

```mermaid
flowchart TD
    Input --> Documentables --> Pages --> Output
```

* `Input` - generalization of sources; by default `Kotlin`/`Java` sources, but could be virtually anything
* `Documentables` - unified data model that represents _any_ parsed sources as a tree, independent of the source language. Examples of a `Documentable`: class, function, package, property, etc.
* `Pages` - universal model that represents output pages (e.g. a function/property page) and the content they are composed of (lists, text, code blocks) that the user needs to see. Not to be confused with `.html` pages. Goes hand in hand with the so-called `Content` model.
* `Output` - specific output format like `HTML`/`Markdown`/`Javadoc`/etc. This is a mapping of the pages/content model to some human-readable and visual representation. For instance:
    * `PageNode` is mapped as
        * `.html` file for the `HTML` format
        * `.md` file for the `Markdown` format
    * `ContentList` is mapped as
        * `<li>` / `