Project: Nightingale — AI Process Metrics

Cumulative Incidents by Category

Solid lines = with skills active. Dotted lines = baseline (before skills existed). Click a legend label to toggle both.

How This Works

Objectives:

Build out skills to automate routine tasks and functions for a small business.
Help streamline complex processes into a formal structure.
Limit-test the capabilities of AI skillsets, with a special focus on self-propulsion and organic growth.
Determine the viability of AI skillsets for real-world application. Can AI be used reliably for each professional archetype?

Strategies:

Generated Skill: Researcher and instructed it to research how to research, Ouroboros-style, like a snake eating its own tail. The motivation for this cycle was to raise the quality of the producer element. I wanted to upgrade Skill: Researcher from D-Tier to B-Tier; the smith needs to be capable of making high-quality products, after all.
Had Researcher investigate how to make a skill that specialized in the Skill: Skill-Crafter archetype.
Ouroboros cycles of self-improvement were employed throughout.
Now that I had some specialized sub-classes as a base, I set out to add more team members to my roster.
I wanted to balance the team with a healthy mix of heavy hitters, supports, and safety nets.
Assigned a persona to each Profession. Note: calling skills by persona seems to result in better outcomes, possibly because it feels clearer to the human user.
Viewing the team as Personas and Professions made it easier to spot gaps and scope responsibilities.
Page was used to look into more automation strategies and gaps.

Eventually developed an auditing system and compiled chats and tool calls for it to process. The early, skill-less stages of AI project development serve as the baseline metric of performance.
From this auditing system, Project: Nightingale was born. She is tasked with the delicate art of being a nurse. She must decide, on the fly, whether she is helping or harming. To do this, metrics were developed: 10 primary categories cover the broad strokes, and categories 11 and up track problems in a more granular mode. The granular mode has also been built out with organic growth in mind. The 10 primary categories will stay the same over time, but the granular list, by nature, will rotate through problems as time passes. Scaling this for long-term use is critical for the development process to survive.
Introduced the concept of Subconsciousness: F-Tier. I went down the skill tree and picked up Nightmares and Daydreaming as well. We’ll see how well this introduces dynamic learning and helps avoid repeated problems. Building it out will most likely take time.
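
For anyone who wants to picture the category scheme and the cumulative chart in code, here is a minimal sketch. It is not Nightingale's actual implementation. The log format (pairs of day index and category number), the function names, and the idea that categories are plain integers are all assumptions made for illustration; the only detail taken from the description above is that categories 1 through 10 are the fixed primary set and 11 and up are granular.

```python
from itertools import accumulate

# From the description above: categories 1-10 are the fixed primary set.
PRIMARY_MAX = 10


def is_primary(category_id: int) -> bool:
    """True for the 10 fixed categories, False for granular ones (11+)."""
    return 1 <= category_id <= PRIMARY_MAX


def cumulative_by_category(incidents):
    """Build one cumulative series per category.

    `incidents` is a hypothetical log: an iterable of (day_index, category_id)
    pairs, with day_index starting at 0. Returns a dict mapping each category
    to a list whose i-th entry is the running incident total through day i.
    """
    incidents = list(incidents)
    if not incidents:
        return {}
    n_days = max(day for day, _ in incidents) + 1
    per_day = {}
    for day, category in incidents:
        # Start each category with a zero count for every day, then tally.
        per_day.setdefault(category, [0] * n_days)[day] += 1
    return {cat: list(accumulate(counts)) for cat, counts in per_day.items()}


# Made-up sample log, purely for illustration.
sample_log = [(0, 1), (0, 1), (1, 3), (2, 1), (2, 12)]
series = cumulative_by_category(sample_log)
```

Running the same function once over the baseline log and once over the skills-active log would give the two sets of lines (dotted and solid) that the chart compares.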

Results:

See for yourself. This progress report feed is updated daily. Its graph shows the baseline as dotted lines and team performance as solid lines, and the character cards give more specific detail. Over time, I hope to fully refine the team composition, manage skill balancing issues, and upgrade all of my classes to S-Tier. Only time will tell, so come watch with me!

Failure Categories

Tag	Category	What It Tracks	Total

The Team

These are AI agents, not employees. Every agent has a name, a personality, and a job. The names are fun. The accountability isn't. Accuracy ratings are calculated by Nightingale from real audit data — tasks completed vs. incidents attributed. You don't get a card until you've done real work.

Accuracy legend: below 80%, 80–89%, 90%+
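
The page does not spell out the exact formula Nightingale uses, so the following is only a sketch of one plausible reading of "tasks completed vs. incidents attributed": the share of completed tasks that had no incident attributed. The function names, the clamping at 0%, and the choice to put everything from 80% up to (but not including) 90% in the middle band are assumptions; the three band labels come from the legend above.

```python
from typing import Optional


def accuracy(tasks_completed: int, incidents: int) -> Optional[float]:
    """Assumed formula: percent of completed tasks without an attributed incident.

    Returns None when no tasks have been completed, mirroring the rule that
    an agent gets no card until it has done real work.
    """
    if tasks_completed <= 0:
        return None
    pct = 100.0 * (tasks_completed - incidents) / tasks_completed
    # Clamp so that more incidents than tasks reads as 0%, not a negative score.
    return max(0.0, pct)


def band(pct: float) -> str:
    """Map an accuracy percentage to one of the three legend bands."""
    if pct >= 90:
        return "90%+"
    if pct >= 80:
        return "80–89%"
    return "Below 80%"


# Made-up numbers, purely for illustration.
rating = accuracy(tasks_completed=50, incidents=4)
label = band(rating)
```

With the made-up numbers above, 46 of 50 tasks are incident-free, which works out to 92% and lands in the top band.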