A Claude Code and Codex Skill for Deliberate Skill Development (github.com)

253 points by cdrnsf 6 days ago | 50 comments

neuralkoi 6 days ago |

I'm not familiar with Skills, but looking at the repo, the amount of decorative code/text strikes me as overkill for what amounts to just the following prompt, emitted by a bash script (yikes) that executes after a commit is run:

    {"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"[learning-opportunities-auto] The user just committed code. Per the learning-opportunities skill, consider whether this is a good moment to offer a learning exercise. If the committed work involved new files, schema changes, architectural decisions, refactors, or unfamiliar patterns, ask the user (one short sentence) if they'd like a 10-15 minute exercise. Do not start the exercise until they confirm. If they decline, note it — no more offers this session."}}
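
For anyone who wants to see the shape of that payload outside of bash, here is a minimal Python sketch that emits the same JSON structure. The field names are copied from the payload quoted above; the function name `build_hook_output` and the shortened message are hypothetical, and how the harness actually invokes the script is not shown in this thread.

```python
import json


def build_hook_output(message: str) -> str:
    """Serialize a hook response that hands extra context to the agent.

    The nesting mirrors the payload quoted above; nothing else is assumed
    about the hook protocol.
    """
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": message,
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical, shortened message; the repo's real prompt is quoted above.
    print(build_hook_output("[learning-opportunities-auto] The user just committed code."))
```

Using `json.dumps` rather than hand-written string concatenation keeps quoting and escaping correct if the message text ever changes.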

alexhans 6 days ago | |

Skills are just a good standard for describing repeatable workflows: they save context through progressive disclosure, make prompts shareable and, in a very underused feature, let you bound the non-deterministic parts with determinism (for example, scripts).

Conceptually, you should treat them as incremental software instead of magic you grab from others [1].

The killer feature is that coding harnesses tend to have SkillBuilder agent skills so creating them becomes very easy and you can evolve them.

I recommend you build your own for your particular pain points.

Here is a very simple example [2] showing what another user mentioned about "evals", so that you can achieve good-enough correctness for your automation.

- [1] https://alexhans.github.io/posts/series/evals/building-agent...

- [2] https://alexhans.github.io/posts/series/evals/sketch-to-text...
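
To make "bounding the non-deterministic parts with determinism" concrete, here is a minimal Python sketch, not taken from the linked posts. It wraps an arbitrary generator (standing in for a model call) in a deterministic validation step with a retry limit. The required keys, `validate`, and `run_bounded` are all hypothetical names chosen for illustration.

```python
import json
from typing import Callable

# Hypothetical schema: the keys every accepted output must contain.
REQUIRED_KEYS = {"title", "summary"}


def validate(raw: str) -> dict:
    """Deterministic check: the output must be a JSON object with the required keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def run_bounded(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call a non-deterministic generator until its output passes validation."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(generate())
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

The generator can vary from call to call, but anything that reaches the caller has passed the same fixed check, which is the part a script in a skill folder can own.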

skinfaxi 6 days ago | | |

After reading your first article I'm not sure I would agree. Skills are certainly transferable in the sense that a sufficiently narrowly tailored skill can work for others with no modification, similar to how we grab libraries that encapsulate certain abstractions for us.

saidnooneever 6 days ago | |

most stuff in these tools is just another md file which gets spliced into the prompt somehow. it's how LLMs work... this is normal. it's also why I'd recommend people use Claude to build a similar tool for themselves. you will spend some tokens on it and afterwards save like 90% of token costs using your own tool... it's really crazy how many fewer tokens and calls are needed to do meaningful work....

also you can secure/lock down tool calls better, make the agent's tasks retryable, give it failure modes, etc. (now if your laptop dies during agent work, only god and the agent know what happened to your code... oh no, wait: the agent just needs to spend 100k tokens to remember where it was. great way to spend your money.)

skiing_crawling 6 days ago |

I was surprised to learn that some skills don't even describe exactly what steps to take or what to do. They just kind of give a motivational speech which I guess primes the model to output better text for a certain task.

https://github.com/anthropics/claude-code/blob/main/plugins/...

This frontend design skill that Claude uses basically just begs it to pick nice fonts and make the design coherent. No specifics about which fonts to use or how to make nice color schemes and layouts.

zihotki 6 days ago |

No benchmarks or evals are present, so how do you know it produces better results than /create-skill? Naive testing doesn't provide any confidence.

schnitzelstoat 6 days ago | |

I think it means human skill development. It offers learning opportunities to the user.

> When you complete architectural work (new files, schema changes, refactors), Claude offers optional 10-15 minute learning exercises grounded in evidence-based learning science. The exercises use techniques like prediction, generation, retrieval practice, and spaced repetition to provide you with semi-worked examples from across your own project work.

Confusing name though.

wiseowise 6 days ago | |

When your brain is so cooked on LLMs that mentioning any related terminology triggers a Pavlovian response.

alexhans 6 days ago | |

Hey, it's awesome that you mention evals. May I ask what you currently use, or look for? Do you roll your own or use an existing framework?

bisonbear 6 days ago | | |

Not the OP, but I've been thinking about this problem a lot - as devs we're overly reliant on vibes for evaluating coding agents. This is already a problem, and especially so if you're working in an engineering organization where a bad edit to AGENTS.md can cause silent regressions for everyone in the codebase.

To solve this, I've built an agent-native tool to run evaluations based on merged PRs in your codebase. Basically, you can ask Claude to evaluate whether the skill made things better or worse on real tasks, and then to iteratively improve it.

Stalking your profile (sorry..) I see you're pretty deep in the eval space, so I'm super curious what your approach has been to being rigorous for things like skill changes?
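
As a rough illustration of what "better/worse on real tasks" could mean in practice (this is not the tool described above, just a sketch under stated assumptions): run each task once without and once with the skill change, record whether it passed, and count regressions separately from improvements. `TaskResult`, its fields, and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str          # hypothetical identifier, e.g. derived from a merged PR
    baseline_pass: bool   # task passed without the skill change
    candidate_pass: bool  # task passed with the skill change


def summarize(results: list[TaskResult]) -> dict[str, int]:
    """Count improvements, regressions, and unchanged outcomes across paired runs."""
    summary = {"improved": 0, "regressed": 0, "unchanged": 0}
    for r in results:
        if r.candidate_pass and not r.baseline_pass:
            summary["improved"] += 1
        elif r.baseline_pass and not r.candidate_pass:
            summary["regressed"] += 1
        else:
            summary["unchanged"] += 1
    return summary
```

Reporting regressions on their own line, rather than a single net pass rate, is what surfaces the "silent regression" case: a change that fixes two tasks and breaks two looks neutral in aggregate.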

aledevv 6 days ago |

What exactly is the "adaptive dynamic textbook approach"?

Examples?

> Generation effect: Accepting generated code and decreasing generating one's own code can skip the active processing that builds understanding.

Holy truth.

luodaint 6 days ago |

There is an iterative kind that applies specifically to code-writing agents. Accepting the output of your coding assistant without checking whether it is correct erodes your knowledge of your own codebase. Context files such as CLAUDE.md, migration protocols, and authentication protocols only work if you know enough to keep them properly updated.

Sometimes I have had sessions in which I blindly accepted the code produced by the agent for two hours, but afterwards was not able to create a new context file, having forgotten how my codebase worked. Such skill debt does not appear in the diff – it becomes apparent in situations when you must guide the agent, but cannot do it. Such is the nature of the practice proposed by this skill.

rglover 6 days ago |

For those who haven't yet gone down this rabbit hole like I have: skills are just structured markdown files that describe how to handle a narrow-band task.

So, if I write my API endpoints a certain way, the skill would describe that specific process. Later, an agent can "see" this skill, load it when it's relevant to current chat context, and then do whatever is instructed.

Similar to "tool calls," but instead of being a function you can call, it's just instructions for how to perform that "skill."

At least for the agent I use (Cline), you can define skills either globally or locally (project level).

8note 6 days ago | |

skills also have a header called "frontmatter", part of which is loaded early into the context, like a CLAUDE.md file

I've heard here that skill loads can have a separate impact on the context, like persisting past a compaction.

if you load a bunch of skills your session might end up with them permanently loaded.

i think they pair well with subagents, since the subagent can load the skill and, once it's done with the work, present just the results, and the orchestrator agent doesn't need to know about it
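
A minimal sketch of the frontmatter/body split described above, assuming the skill file opens with a `---` line, holds flat `key: value` pairs, and closes the header with another `---` line. The `name` and `description` keys and the example skill text are illustrative assumptions, and a real harness would likely use a proper YAML parser.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter, body).

    Only handles flat 'key: value' headers; anything without a complete
    '---' ... '---' header is returned unchanged as the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the file as plain markdown


# Hypothetical skill file used only for illustration.
EXAMPLE = """---
name: api-endpoints
description: How to write API endpoints in this project.
---
# Steps
1. Hypothetical instructions go here.
"""

meta, body = split_skill(EXAMPLE)
# Only `meta` would sit in the context up front; `body` is loaded when relevant.
```

This is the progressive-disclosure idea in miniature: the short header is cheap to keep in context, and the longer instructions are pulled in only when the task calls for them.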

aplthrowaway67 6 days ago |

I will never understand why someone would go through all the trouble of developing this cool idea, without bothering to link a demo or include sample output. I see this every day on HN.

So the only way I can see what this skill actually looks like is to download and run it myself? No thank you.

implexa_founder 1 day ago |

recursive skill-building's the obvious move. the missing piece is outcome tracking. without knowing which skill closed which loop, "deliberate" just means "you made one." the loop closes on the data, not the demo.

areoform 6 days ago |

I really love the idea, I've had Claude make textbooks for me on the fly using open source textbooks and documentation. Is it possible to extend this skill to more generalized areas of learning / application? Or, is it domain specific to code?

Juvination 6 days ago |

This is a great idea, I've been exploring it this morning. I've really been feeling the brain drain from using AI too much, and while this isn't the fix, I think a few exercises a day can really help.

romanoonhn 6 days ago |

Looks interesting! I know it's easy to set up and test, but I'm on mobile currently, so I think it'd be great if there were a full-interaction example to better understand how it works.

annjose 6 days ago | |

There are a few examples here:

https://github.com/DrCatHicks/learning-opportunities/blob/ma...

As I understand it, this skill is intended to help the user understand AI-generated code and potentially reduce skill atrophy. So it asks the agent to pause after important milestones (e.g. created a file, changed the DB schema, etc.) and ask the user questions about how they would do it.

esalman 6 days ago |

Interesting responses here. I think most are missing the point.

For me, the main lesson here is seeing and learning from how others are using skills. Yesterday I was watching a Matt Pocock class on using agents and he was also showing off skills, such as how he uses a "grill-me" skill to develop a product requirements document. I am certainly not going to do exactly what he does, but I now have my own ideas about how to develop requirements and implement them.

After all, in the words of Anthropic engineers themselves, Claude is like a talented engineer who lacks expertise. Skills are folders and files that build expertise. Another important thing I learned from Pocock is that the longer the context (or token size), the dumber the responses tend to get. So skills are another way to present the problem to an LLM in a compact manner and get an optimized response.

Claude also has behavioral traits. So if someone iteratively builds a skill, it is most likely not going to port well to another user, because each of us chats differently. This is why I hesitate to share my skill folder with my colleagues. But I will certainly demo what I built so that they can see what's possible and figure out their own workflows.

So the value is in seeing how someone else builds using Claude, and imitating it in your own way. Very much like when I first learned programming: I was copying code from Kernighan and Ritchie's C book, but then changing things up to understand how it works and later customizing the code for my purpose.

I mentioned behavioral traits for another reason: the author is a psychologist and it is really interesting to see how she interacts with Claude, which is probably very different from how programmers use Claude. Tangentially, she (and a host of other experts in the field) left Twitter a long time ago. I'm going to install bsky/mastodon and follow them, because I think it's important to watch how expert non-programmers are using LLMs.

ruguo 6 days ago |

Just tried this skill, pretty interesting. The Q&A at the end actually went surprisingly deep.

itsafarqueue 6 days ago |

Hey bro I heard you like skills so I put a skill in your oh whatever

DonHopkins 6 days ago | |

A skill skill, aka overskill:

https://github.com/SimHacker/moollm/blob/main/skills/skill/S...

Mashimo 6 days ago |

Mhh, interesting.

I want to learn Java Spring, and will probably let AI help me / quiz me. I will take a look at the skills for inspiration.

tomaytotomato 6 days ago | |

I am a Java dev and Spring user for about 10 years now.

If you want to learn how Spring Framework and Spring Boot work, the best thing to do is build your own library and then learn to add it to a new Spring Boot service.

https://www.baeldung.com/spring-boot-custom-starter

Depending on which AI tool you are using, you can also get it to debrief you on what it is doing and which layer of the Spring architecture it is using (lifecycle, bean scope, whether it is using auth/messaging/data middleware, etc.).

Also, here is a service I have built with Claude Code, along with a sample Spring Boot service:

https://github.com/tomaytotomato/spring-data-solr-lazarus

It is a demo to show that I could get Apache Solr working in the latest version of Spring Framework 7 and Spring Boot 4. There is a sample application in there for a bookstore you can play around with.

Mashimo 6 days ago | | |

Thanks mate. Will check it out later.

Current plan is to use an existing Vue/TypeScript browser game as the frontend and send high scores and similar via WebSockets. Do ~something~ with Redpanda to dip my toes into the Kafka world.

ramon156 6 days ago | |

Is there a reason why making a Spring app and learning hands-on is not feasible?

I know I sometimes get demotivated midway, but that also tells me it might not be worth the investment.

imtringued 6 days ago | | |

Spring is reasonably easy to learn. The hard part is knowing where beans are defined, because Spring doesn't make that easy at all. Anyone and anything can define new beans in any library you pull.

I still don't see why AI would be mandatory. It's helpful, yes, but not mandatory.

Mashimo 6 days ago | | |

It's feasible, but I want to try learning something new with an AI tutor. See how that goes.

I want to make a Spring app, but instead of looking everything up on Google, I can ask the AI with context and maybe have it give me a learning plan that fits my needs.