# Architecture overview

Normally, you would think that a tool like Dokka simply parses some programming language sources and generates
HTML pages for whatever it sees along the way, with little to no abstractions. That would be the simplest and
the most straightforward way to implement an API documentation engine.

However, it was clear that Dokka may need to generate documentation from various sources (not only Kotlin), that users
might request additional output formats (like Markdown), and that users might need additional features like supporting
custom KDoc tags or rendering [mermaid.js](https://mermaid.js.org/) diagrams. All these things would require changing
a lot of code inside Dokka itself if all solutions were hardcoded.

For this reason, Dokka was built from the ground up to be easily extensible and customizable by adding several layers
of abstractions to the data model, and by providing pluggable extension points, giving you the ability to introduce
selective changes on a given level.

## Overview of data model

Generating API documentation begins with input source files (`.kt`, `.java`, etc.) and ends with some output files
(`.html`/`.md`, etc.). However, to allow for extensibility and customization, several input- and output-independent
abstractions have been added to the data model.

Below you can find the general pipeline of processing data gathered from sources and the explanation for each stage.

```mermaid
flowchart TD
    Input --> Documentables --> Pages --> Output
```

* `Input` - generalization of sources, by default Kotlin / Java sources, but could be virtually anything
* [`Documentables`](data_model/documentable_model.md) - unified data model that represents _any_ parsed sources as a tree,
  independent of the source language.
  Examples of a `Documentable`: class, function, package, property, etc.
* [`Pages`](data_model/page_content.md) - universal model that represents output pages (e.g. a function/property page)
  and the content it's composed of (lists, text, code blocks) that the user needs to see. Not to be confused with
  `.html` pages. Goes hand in hand with the so-called [Content model](data_model/page_content.md#content-model).
* `Output` - specific output formats like HTML / Markdown / Javadoc and so on. This is a mapping of the pages/content
  model to a human-readable and visual representation. For instance:
    * `PageNode` is mapped as
        * `.html` file for the HTML format
        * `.md` file for the Markdown format
    * `ContentList` is mapped as
        * `
  • ` / `