Build A Personal Blog System From Scratch

Reading can deepen our thinking, broaden our horizons, and sharpen our technical skills. When we read other people's words, there is always an urge to express our own opinions, so the move from "reading" to "writing" comes naturally.

I have tried various blogging platforms and tools, but none of them felt quite right. It's not that I'm too picky; as a developer, I just have my own set of quirks. My ideal blog system should look like this:

"Data" always belongs to the individual, not the platform;
Use Markdown as the standard format, so that content renders consistently on any other platform after migration;
Be based on Git repositories to support version management and CI/CD pipelines.

Based on the above "quirks", I want to build my own blog system.

Why so many requirements?

A blog is just a tool for recording and sharing, so why make it so complicated? Let me explain why each of these requirements matters.

Is the data I generate really mine?

I want to ask a core question: who owns the data I generate? Me, or the platform? I believe most people have never pondered this question. Some might answer without hesitation, "Of course, it's mine!" But is that truly the case?

Let me pose another question. Now that we are "tied" to various platforms, do we sometimes feel an urge to "escape"? All our chat records are on Line, the songs we listen to are on Spotify or Apple Music, the posts we publish are on Twitter, and so on. Imagine that one day I want to switch to a different music app: can my favorite songs and collected albums be automatically migrated to the new one? If one day I decide to stop using Twitter and switch to Instagram instead, can my previous thoughts and words be seamlessly transferred? Apparently not, as platforms generally do not offer such functionality. But aren't these my own "data"? Why is it so difficult for me to "obtain" or "migrate" them? Come to think of it, is the data truly ours?

If the data truly belongs to the individual, then, taking blogging as an example, I should have the following capabilities:

No matter which platform is down or offline, my blog data will be safe and sound;
I can freely modify and publish my blog;
I can freely choose my blog platform and switch or migrate at any time;
I can search, organize, and archive all my blogs at any time.

I don't think any single platform fulfills all of these requirements.

Why choose Markdown as the blog storage format?

Markdown is a widely used, concise text format that most platforms can recognize and render. This means the content we produce can be published on multiple platforms at the same time, and the result looks largely the same on each of them.

HTML could also serve as a universal format, but compared to Markdown it is verbose, tedious to write by hand, takes more space to store the same content, and is harder to extend.

In everyday reading and browsing, we can see that a large share of blogs and documentation is written and published in Markdown. It is a favorite text format among developers.

Why choose git?

Git is a version management system that is not limited to managing "code". Storing blog content in Git offers numerous benefits, such as:

A safe home for blog content: this improves the safety and stability of the data. Data may be lost from a local disk or deleted from a standalone database, but once it is pushed to a Git remote, every clone holds a full copy, so it is very hard to lose;
Version management for blog content: with Git, we can trace changes, roll back, and compare versions at any time, and every clone also serves as an extra copy;
Commit to publish: CI/CD based on Git makes it easy to build an automated release process, so multi-platform publishing is no longer cumbersome.

How to achieve these capabilities?

I drew a simple system architecture diagram:

My blog system consists of four subsystems:

GitHub: Stores and versions the blog content, and automatically pushes it to the backend service for database storage via GitHub Actions;
Backend service: Talks to the database and provides APIs for the frontend and GitHub Actions to call, covering create, read, update, and delete operations on blog content; it also calls the APIs of various blog platforms for multi-platform publishing;
Database: Stores blog content, tags, categories, and user data;
Frontend interface: Retrieves data from the backend and provides an enjoyable reading experience for readers.

This design offers several advantages, such as:

Multiple levels of data replicas: The data is stored in both the GitHub repository and the database, providing redundancy. If the database loses data, it can be quickly resynchronized from GitHub;
Decoupling of subsystems for enhanced flexibility: Each subsystem does its own job, so the frontend and backend can use different technology stacks, and the database can be any relational database;
Automated push based on GitHub Actions for increased efficiency: After I edit and commit locally, GitHub Actions automatically calls the backend API based on the file changes to add, delete, or modify blogs. At the same time, the backend service can call the APIs of other blogging platforms to publish the blog on multiple platforms simultaneously.

Let's take a closer look at each system:

Database

The database uses PostgreSQL to store data. For the database design, refer to the ER diagram (figure: Database ER diagram).

Github

Besides storage and version management, the core job of GitHub here is the automatic publishing process based on GitHub Actions. I designed it like this:

Use a fixed directory structure to store blog content. The repository structure is as follows:
- Blog categories are the first-level directories;
- Each blog gets a second-level directory named after its slug;
- Since both Chinese and English content are provided, the files are also named after the slug: the default file holds the Chinese content, and the name of the English file ends with .en.
Each commit is automatically diffed against the previous commit. Based on the diff, the workflow determines which blogs were added, deleted, or modified, and calls the corresponding APIs to update the data in the database. In the workflow, blog-auto-push is a custom GitHub Action I wrote. Its logic is relatively simple: it classifies the file changes as additions, deletions, or modifications, and then calls the matching API for each.
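
The classification step described above can be sketched roughly as follows. This is not the actual blog-auto-push code, which is not shown in this post; it is a minimal illustration that assumes the diff comes from `git diff --name-status`, that files are laid out as `<category>/<slug>/<slug>.md` and `<category>/<slug>/<slug>.en.md` (the `.md` extension is an assumption), and that a blog counts as deleted only when its default (Chinese) file is removed. The names `BlogChange`, `parseBlogPath`, and `classifyChanges` are hypothetical.

```typescript
// Hypothetical sketch: names and rules below are assumptions, not the real action.
interface BlogChange {
  kind: "upsert" | "delete";
  category: string;
  slug: string;
}

// Accept only "<category>/<slug>/<slug>.md" or "<category>/<slug>/<slug>.en.md".
function parseBlogPath(
  path: string
): { category: string; slug: string; isDefault: boolean } | null {
  const parts = path.split("/");
  if (parts.length !== 3) return null;
  const [category = "", slug = "", file = ""] = parts;
  if (file === `${slug}.md`) return { category, slug, isDefault: true };
  if (file === `${slug}.en.md`) return { category, slug, isDefault: false };
  return null;
}

// Input: the text printed by `git diff --name-status <before> <after>`, where
// each line is "A\tpath", "M\tpath", "D\tpath", or "R100\told\tnew".
function classifyChanges(nameStatus: string): BlogChange[] {
  const result = new Map<string, BlogChange>();

  const record = (status: string, path: string): void => {
    const blog = parseBlogPath(path);
    if (blog === null) return; // not a blog file, e.g. README.md
    const key = `${blog.category}/${blog.slug}`;
    if (status === "D" && blog.isDefault) {
      // Assumption: removing the default-language file deletes the blog.
      result.set(key, { kind: "delete", category: blog.category, slug: blog.slug });
    } else if (!result.has(key)) {
      // Any other change means the blog must be created or updated.
      result.set(key, { kind: "upsert", category: blog.category, slug: blog.slug });
    }
  };

  for (const line of nameStatus.split("\n")) {
    if (line.trim() === "") continue;
    const fields = line.split("\t");
    const code = (fields[0] ?? "").charAt(0);
    const first = fields[1];
    const second = fields[2];
    if ((code === "R" || code === "C") && first !== undefined && second !== undefined) {
      if (code === "R") record("D", first); // a rename removes the old path
      record("A", second); // renames and copies both create the new path
    } else if (first !== undefined) {
      record(code, first);
    }
  }
  return Array.from(result.values());
}
```

Each resulting entry would then be mapped to one backend call: an upsert request for `upsert` entries and a delete request for `delete` entries.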

Backend service

The backend service is based on NestJS + Prisma. Some people may ask, why not Java? The reason is that, as a front-end engineer, I am most familiar with JavaScript, and NestJS makes it easy to get started with back-end development.

NestJS provides an out-of-the-box application architecture. Under the hood it can run on Express or Fastify. It also offers supporting capabilities such as API authentication, logging, and Swagger.
Prisma is an object-relational mapping (ORM) library for Node.js and TypeScript that makes database schema design, migration, and deployment straightforward.

Besides basic create, read, update, and delete operations, the core task of the backend service is handling API calls from GitHub Actions. Note that the database uses Id as the primary key, while the workflow can only identify a blog by its slug, so the slug (which is also the directory name of each blog) must be globally unique. The API the workflow calls to update a blog is designed as follows:

async upsertArticleByMarkdownFile(dto: {
  /** Unique ID */
  slug: string;
  /** Chinese content */
  content: string;
  /** English content */
  enContent: string;
}): Promise<boolean>

After receiving the API call, the backend needs to do the following:

Separate the property definitions from the body content based on the front matter;
Get the category, tags, cover image, etc. from the properties;
Generate the excerpt from the body and store it;
Determine whether to create or update based on the slug, and then call the creation or update service accordingly.
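
The steps above might look roughly like the sketch below. It is a simplified illustration, not the code of the actual service: the front-matter parser only understands `key: value` lines and inline lists such as `[a, b]` (a real project would more likely use a YAML parser, for example the gray-matter package), the excerpt rules are my own guess, and a `Map` stands in for the database that the real service would reach through Prisma. The names `ParsedArticle`, `parseMarkdownFile`, `makeExcerpt`, and `upsertBySlug` are hypothetical.

```typescript
// Hypothetical sketch of the four steps; all names and rules are assumptions.
interface ParsedArticle {
  properties: Record<string, string | string[]>;
  body: string;
  excerpt: string;
}

// Step 3: build a plain-text excerpt from the Markdown body.
function makeExcerpt(body: string, maxLength: number): string {
  const fence = new RegExp("`{3}[\\s\\S]*?`{3}", "g"); // fenced code blocks
  const text = body
    .replace(fence, " ")
    .replace(/!\[[^\]]*\]\([^)]*\)/g, " ") // drop images
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // keep only the link text
    .replace(/^#{1,6}\s+/gm, "") // drop heading markers
    .replace(/[*_`>]/g, "") // drop emphasis, inline-code and quote markers
    .replace(/\s+/g, " ")
    .trim();
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength).replace(/\s+$/, "") + "...";
}

// Steps 1 and 2: split front matter from the body and read the properties.
function parseMarkdownFile(source: string, excerptLength = 120): ParsedArticle {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  const header = match ? match[1] ?? "" : "";
  const body = (match ? match[2] ?? "" : source).trim();

  const properties: Record<string, string | string[]> = {};
  for (const line of header.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    const raw = line.slice(sep + 1).trim();
    if (key === "") continue;
    if (raw.charAt(0) === "[" && raw.charAt(raw.length - 1) === "]") {
      // Inline list, e.g. "tags: [git, markdown]"
      properties[key] = raw
        .slice(1, -1)
        .split(",")
        .map((item) => item.trim())
        .filter((item) => item !== "");
    } else {
      properties[key] = raw;
    }
  }
  return { properties, body, excerpt: makeExcerpt(body, excerptLength) };
}

// Step 4: decide between create and update by slug. The Map is a stand-in
// for the database so that the example runs on its own.
function upsertBySlug(
  store: Map<string, ParsedArticle>,
  slug: string,
  article: ParsedArticle
): "created" | "updated" {
  const action = store.has(slug) ? "updated" : "created";
  store.set(slug, article);
  return action;
}
```

Because the slug is the only key the workflow knows, the create-or-update decision is made on the slug alone, which is why it has to be globally unique.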

Frontend User Interface

The frontend interface can be simple or relatively complex. At its simplest, it only needs to fetch the blog content and display it. Of course, we need a Markdown-to-HTML conversion tool so that the content can be rendered as a page. If we want something more complete, we need to:

Support light and dark themes, and switching between Chinese and English;
Support filtering by category and tag;
Generate the outline (table of contents) automatically;
Support a responsive layout that adapts to different screen sizes;
Handle SEO automatically, etc.
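
As one example from this list, the automatic outline can be derived from the Markdown headings. The sketch below is an illustration under my own assumptions rather than the code of this site: it only recognizes `#`-style headings, skips fenced code blocks, and builds anchor ids with a simple rule that keeps Chinese characters. `OutlineItem` and `extractOutline` are hypothetical names.

```typescript
// Hypothetical sketch: outline (table of contents) from Markdown headings.
interface OutlineItem {
  level: number; // 1 for "#", 2 for "##", and so on
  text: string;
  id: string; // anchor id the heading element could use
}

function extractOutline(markdown: string): OutlineItem[] {
  const items: OutlineItem[] = [];
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // Toggle on fence lines so "# comment" inside code is not a heading.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;

    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (!match) continue;

    const level = (match[1] ?? "").length;
    const text = match[2] ?? "";
    // Keep ASCII word characters and non-ASCII characters (e.g. Chinese);
    // collapse everything else into single hyphens.
    const id = text
      .toLowerCase()
      .replace(/[^\w\u0080-\uffff]+/g, "-")
      .replace(/^-+|-+$/g, "");
    items.push({ level, text, id });
  }
  return items;
}
```

The frontend could render these items as a nested list next to the article and give each heading the matching `id` when converting the Markdown to HTML.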

Of course, these are capabilities that any excellent blog site should have, and many blog-building tools on the market already provide them. But I am a front-end engineer, after all, so I chose to start from scratch: deciding on the technology stack, studying various layouts, implementing the Chinese/English switch, the light and dark themes, and so on. The process was fun, and I learned a lot that I had never touched before.

More tools to share

With the development above complete, we have an automated blog publishing system and have achieved our initial goal. Along the way, I also found some useful tools that make blogging even more enjoyable.

Obsidian: A well-known writing and knowledge management tool. Its motto, "Your thoughts are yours", matches the idea behind this blog. We can open our Git repository directly as an Obsidian vault and write in the Obsidian editor, which makes writing more efficient. Obsidian's rich plugin ecosystem also provides many useful capabilities, among which the image auto upload plugin is especially handy.
PicGo + Cloudflare R2 + Image auto upload Plugin: Together they upload images automatically when pasted. If you are interested, I can write a separate post introducing these tools, including:
- Building a free personal image host on Cloudflare R2 object storage;
- Using PicGo to upload images to the personal image host automatically;
- Using Image auto upload Plugin to upload images on paste.

Finally

That's all for this post. This is Chi's Talking. I will do my best to share more interesting content regularly, and you are welcome to subscribe to my blog.