How to Train a Language Model on Your Resume (and Why You Shouldn’t)

Introduction
It’s no mystery that the STEM job market is a bit hectic right now. But it still makes us question: is the struggle the market, or is it us? Maybe I’m not conveying my strengths well enough to employers? What could be a solution to this? The one I landed on was adding a Small Language Model (SLM) chatbot to my personal website (don’t judge it; even now, it’s probably not done). What a great idea, right? All I had to do was find a small model that I could fine-tune and then build the chatbot with ReactJS!
So, I started my journey by selecting Qwen3-0.6B. For anyone new to language models, Qwen is a family of models created by Alibaba Cloud, and the “0.6b” refers to the model having 0.6 billion, or 600 million, parameters. A general rule you can usually count on is that every billion parameters adds just shy of a gigabyte to the model’s size (presuming standard quantization, which is beyond the scope of this article). So, a 0.6b model should be ~500MB. This matters for in-browser models, since almost any device can download and run something that small quickly. It’s also small enough to fine-tune easily on my personal computer. Qwen3, at the time of this article, is also one of Alibaba’s most advanced reasoning models, which lets it handle basic questions from a potential employer, in case they want the model to judge how well-suited I am for a position. Great! We’ve chosen our model, so let’s get started! What could possibly go wrong? (NOTE: If you’re just looking for the code for the project without considering why you shouldn’t do this, you can see it on my Github here. But, please keep reading.)
How to train the model
To train this model, I used unsloth.ai. While they have their own tools for running on Google Colab, I ran everything locally; I don’t want to have to pay for any server. To do this, I created a Python environment in my chosen directory with uv. (I’m running this on Windows 11 with an NVIDIA RTX 3090.)
uv init .
This creates a main.py file that I will not be using, as I want to complete each step independently. So, let’s consider the steps that I need to complete (the dependencies I installed are sketched right after this list).
- Create a JSON file of information to generate synthetic training data from, and ensure the schemas match.
- Utilize unsloth to fine-tune the Qwen3-0.6b model in Python.
- Test the model to ensure it doesn’t hallucinate too much (that will definitely go well).
- Convert the model to ONNX so that I can run it in a browser and deploy it on our website!
- Realize it doesn’t work and learn of two solutions from John.
- Final Thoughts
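Before diving into step one, here’s roughly how I set up the project dependencies with uv. The exact package list is my assumption based on the imports used throughout this article, so adjust it to your setup (Unsloth in particular has platform-specific install instructions):

# Add the libraries used in the rest of this article (package names assumed)
uv add pydantic pydantic-ai python-dotenv datasets
uv add unsloth trl transformers torch
uv add "optimum[onnxruntime]"   # for the ONNX export step later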
1. Creating a JSON file of your information
Why? Why not just use text? Well, the reason I use a JSON file is that it’s much easier to parse with the best AI Python package on the interwebs: Pydantic AI. Pydantic is a data validation library that helps enforce data structures. This is enormously helpful when dealing with LLM systems, as it gives the model context for what each piece of data means and how to interpret it, and it enforces a structure on the response that comes back from the LLM call. So, I always use Pydantic AI for any LLM work. It’s a bit more effort, but it tends to help avoid issues and reduce hallucination. So, we’ll build out our resume and facts about ourselves in JSON. Below is an example of how I structured my information, but you can build yours however you choose, as long as your schema matches your data. Let’s call this file initial_data.json.
{
    "resume": {
        "work_experience": [
            {
                "role": "",
                "company": "",
                "duration": "",
                "description": "",
                "highlights": [
                    ""
                ],
                "skills_used": [
                    ""
                ]
            }
        ],
        "education": [
            {
                "degree": "",
                "institution": "",
                "year": "",
                "details": ""
            }
        ],
        "projects": [
            {
                "name": "",
                "description": "",
                "skills_used": [
                    ""
                ]
            }
        ]
    },
    "facts": [
        ""
    ],
    "awards": [
        {
            "title": "",
            "year": "",
            "summary": ""
        }
    ],
    "publications": {
        "papers": [
            {
                "title": "",
                "journal": "",
                "year": "",
                "summary": "",
                "facts": [
                    ""
                ]
            }
        ],
        "patents": [
            {
                "title": "",
                "patent_number": "",
                "year": "",
                "link": "",
                "facts": [
                    ""
                ]
            }
        ],
        "presentations": [
            {
                "title": "",
                "conference": "",
                "year": "",
                "summary": "",
                "link": ""
            }
        ]
    }
}
1.1 Building the schema for the LLM to parse
To properly build out the training data, we need a schema for the generated examples so that the LLM knows exactly what shape of output we expect. We will strategically call this file generation_schema.py, to signify that it holds the schema for our synthetic data generation. If you’ve never seen a Pydantic schema before, it’s easier to understand once you see one, so I’ll just share it below and it should make sense.
from pydantic import BaseModel, Field
from typing import List

class QAExample(BaseModel):
    user: str = Field(..., description = "The user's question or prompt.")
    assistant: str = Field(..., description = "The assistant's response, written in the first person as the persona.")

class QABatch(BaseModel):
    examples: List[QAExample] = Field(..., description = "A list of question-answer pairs generated from the context.")
Great! Now, we have a structure for our data! The hardest part is complete!
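If you want to convince yourself the schema behaves as expected, here’s a quick sanity check (the example data below is made up):

# Validate a hand-written example against the schema (hypothetical data)
from schemas.generation_schema import QABatch

sample = {
    "examples": [
        {
            "user": "What did you study?",
            "assistant": "I completed a PhD in physics."
        }
    ]
}

batch = QABatch.model_validate(sample)   # raises a ValidationError if the shape is wrong
print(batch.examples[0].user)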
1.2 Creating synthetic data
Despite how interesting we all think we are, the language model still needs more information to train on. This usually means creating multiple phrasings of the same question that lead to the same answer, expressed in multiple ways, which helps build the proper word associations in the model. To do this, we will use a larger model to create synthetic data. For small models, it’s generally suggested to use ~300 examples. But, if we want to reduce hallucinations, we can try over-fitting the model; in that case, we’ll want more like 2,000 examples. Each of these examples will take the form of the JSON below.
[
    {
        "messages": [
            {
                "role": "system",
                "content": ""
            },
            {
                "role": "user",
                "content": ""
            },
            {
                "role": "assistant",
                "content": ""
            }
        ]
    }
]
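For concreteness, a single filled-in example might look something like this (the content here is purely hypothetical):

[
    {
        "messages": [
            {
                "role": "system",
                "content": "You are JohnBot, an AI assistant created to communicate John Ferrier's skillsets."
            },
            {
                "role": "user",
                "content": "What did John research during his PhD?"
            },
            {
                "role": "assistant",
                "content": "During my PhD, I developed machine learning methods for studying 2D quantum materials."
            }
        ]
    }
]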
In this structure, system tells the LLM what its role is in the conversation, user represents a question from the user, and assistant represents a correct answer. Finally, let’s get started generating that synthetic data. We need to be strategic here, so let’s name this file synthetic_data.py. I know… I know… cryptic. While we could use Ollama locally for this, I’ll be showing how to do it with Gemini through Google’s API.
import os
import json
import math
import time
import random
from typing import List, Dict, Tuple
from dotenv import load_dotenv
from pydantic_ai import Agent

# Import the schemas we just created
from schemas.generation_schema import QABatch

# Google Gemini imports
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

# Load environment variables
load_dotenv()

# --- Configuration ---
API_KEY = os.getenv('GEMINI_API_KEY')
GEMINI_MODEL = os.getenv('GEMINI_AI_MODEL', 'gemini-3-flash-preview')
INPUT_COST = float(os.getenv('GEMINI_INPUT_COST', '0.5'))
OUTPUT_COST = float(os.getenv('GEMINI_OUTPUT_COST', '3'))
TARGET_TOTAL_EXAMPLES = 2000
INITIAL_DATA_PATH = "training_JSONs/initial_data.json"
SYNTHETIC_DATA_PATH = f"training_JSONs/synthetic_data_{TARGET_TOTAL_EXAMPLES}.json"
BOT_NAME = "JohnBot"
USER_NAME = "John Ferrier"
PASSES = 2  # Minimum number of times to run through the entire dataset

# Function to get the model based on provider
def get_model():
    client = GoogleProvider( api_key = API_KEY )
    return GoogleModel( GEMINI_MODEL, provider = client )

# Initialize the Model
MODEL = get_model()

# Personas to vary the "User" phrasing
PERSONAS = [
    "a technical recruiter",
    "a hiring manager",
    "a curious student",
    "a fellow researcher",
    "an HR specialist",
    "a startup founder"
]

SYSTEM_PROMPT = f"You are an expert data generator. Your task is to create high-quality, diverse fine-tuning data for a personal AI assistant representing {USER_NAME}."

def generate_qa_pairs(context_text: str, category: str, count: int = 2) -> Tuple[List[Dict], int, int]:
    """
    Generates Q&A pairs/conversations based on the provided context using PydanticAI.
    Returns the examples plus the input/output token counts for cost tracking.
    """
    persona = random.choice( PERSONAS )
    prompt = f"""
    Context ({category}): "{context_text}"
    Task: Generate {count} distinct training examples where a user ({persona}) asks about this information and the assistant answers truthfully based on the context.
    The assistant should sound like {USER_NAME} (the person in the context).
    """
    try:
        agent = Agent(
            MODEL,
            system_prompt = SYSTEM_PROMPT,
            output_type = QABatch
        )

        # Run the agent (synchronously for this script)
        result = agent.run_sync( prompt )

        # Extract the structured data
        batch: QABatch = result.output
        used = result.usage()
        return [example.model_dump() for example in batch.examples], used.input_tokens, used.output_tokens
    except Exception as e:
        print(f"Error generating data for {category}: {e}")
        return [], 0, 0

def main():
    if not os.path.exists( INITIAL_DATA_PATH ):
        print(f"Error: {INITIAL_DATA_PATH} not found.")
        return

    # Open the initial data JSON that we built earlier
    print(f"Loading initial data from {INITIAL_DATA_PATH}...")
    with open( INITIAL_DATA_PATH, 'r', encoding='utf-8' ) as f:
        initial_data = json.load( f )

    # List of the generated synthetic data from the LLM
    synthetic_dataset = []

    # Flattened items to process for the LLM
    items_to_process = []

    # Process Resume
    resume = initial_data.get("resume", {})
    for job in resume.get("work_experience", []):
        items_to_process.append({"category": "Work Experience", "text": json.dumps(job)})
    for edu in resume.get("education", []):
        items_to_process.append({"category": "Education", "text": json.dumps(edu)})
    for proj in resume.get("projects", []):
        items_to_process.append({"category": "Project", "text": json.dumps(proj)})

    # Process Facts
    for fact in initial_data.get("facts", []):
        items_to_process.append({"category": "Personal Fact", "text": fact})

    # Process Awards
    for award in initial_data.get("awards", []):
        items_to_process.append({"category": "Award", "text": json.dumps(award)})

    # Process Publications
    pubs = initial_data.get("publications", {})
    for paper in pubs.get("papers", []):
        items_to_process.append({"category": "Research Paper", "text": json.dumps(paper)})
    for pres in pubs.get("presentations", []):
        items_to_process.append({"category": "Presentation", "text": json.dumps(pres)})
    for pat in pubs.get("patents", []):
        items_to_process.append({"category": "Patents", "text": json.dumps(pat)})

    print(f"Found {len(items_to_process)} base items. Starting generation...")

    # Calculate how many examples to ask for per LLM call
    examples_per_item = math.ceil( TARGET_TOTAL_EXAMPLES / len(items_to_process) / PASSES )

    # Track usage
    current_count = 0
    pass_num = 1
    input_tokens = 0
    output_tokens = 0

    # Generate the data
    while current_count < TARGET_TOTAL_EXAMPLES:
        print(f"--- Pass {pass_num} ---")

        # For diversity, randomize the items_to_process list
        random.shuffle( items_to_process )

        for item in items_to_process:
            if current_count >= TARGET_TOTAL_EXAMPLES:
                break

            # Ensure we're not generating too much
            remaining = TARGET_TOTAL_EXAMPLES - current_count
            if remaining < examples_per_item:
                examples_per_item = remaining

            # Generate a batch of examples for this item
            new_examples, input_tks, output_tks = generate_qa_pairs( item["text"], item["category"], count = examples_per_item )
            input_tokens += input_tks
            output_tokens += output_tks

            for ex in new_examples:
                # Format into the training format expected by Unsloth/HuggingFace
                entry = {
                    "messages": [
                        { "role": "system", "content": f"You are {BOT_NAME}, an AI assistant created to communicate {USER_NAME}'s skillsets." },
                        { "role": "user", "content": ex["user"] },
                        { "role": "assistant", "content": ex["assistant"] }
                    ]
                }
                synthetic_dataset.append( entry )
                current_count += 1

            print( f"Generated {len(new_examples)} examples for {item['category']} (Total: {current_count})" )

            # Pause the script to avoid rate limits from Google
            # Not needed for local models.
            time.sleep( 0.5 )

        # Update to show how many times we've processed the entire dataset
        pass_num += 1

    print( f"Generation complete. Saving {len(synthetic_dataset)} examples to {SYNTHETIC_DATA_PATH}..." )

    # Save the data
    with open( SYNTHETIC_DATA_PATH, 'w' ) as f:
        json.dump( synthetic_dataset, f, indent = 4 )

    # Calculate the total cost of this run
    # This is just done for me so I'm not surprised. Should be around $0.20
    total_input_price = INPUT_COST * input_tokens / 1_000_000
    total_output_price = OUTPUT_COST * output_tokens / 1_000_000
    total_cost = total_input_price + total_output_price
    print( f"Total Input Tokens used = {input_tokens}, COST = ${total_input_price:.2f}" )
    print( f"Total Output Tokens used = {output_tokens}, COST = ${total_output_price:.2f}" )
    print( f"Final bill = ${total_cost:.2f}" )
    print("Done!")

if __name__ == "__main__":
    main()
This script obviously suggests the existence of a .env file, which should look like this:
GEMINI_AI_MODEL="gemini-3-flash-preview"
GEMINI_INPUT_COST="0.5"
GEMINI_OUTPUT_COST="3"
GEMINI_API_KEY="Your cool API key from Google"
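With the .env file in place, I just ran the generator from the uv environment (assuming the file layout above, with the script at the project root and the JSONs under training_JSONs/):

uv run synthetic_data.py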
2. Utilizing unsloth to fine-tune the model
Great! Now, we have our training dataset to use on our SLM. So, let’s get to fine-tuning! It’s important to note that this part of the code was written to run on a computer with a CUDA-enabled NVIDIA GPU. If you don’t have one, you can run this script on a Google Colab server instead. But, hold off on that until you finish reading…
import os
import json
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)

from dotenv import load_dotenv
from datasets import Dataset
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# --- Configuration ---
load_dotenv()
MODEL_ID = "unsloth/Qwen3-0.6B"
OUTPUT_DIR = "unsloth_checkpoint"
ONNX_EXPORT_PATH = "./reactComponents/model_onnx"
MAX_SEQ_LENGTH = 2048
TRAINING_JSONS = "training_JSONs"
BOT_NAME = "JohnBot"
USER_NAME = "John Ferrier"
TRAINING_SIZE = 1000  # Match this to the TARGET_TOTAL_EXAMPLES you generated (e.g., 2000)

# Load the local data for fine-tuning the SLM
def load_local_data( data_dir, size = TRAINING_SIZE ):
    """
    Open the synthetic data we just made
    """
    data = []
    # Open Gemini examples
    with open( os.path.join( data_dir, f"synthetic_data_{size}.json" ), 'r' ) as f:
        data = json.load( f )
    print( f"Loaded { len( data ) } training examples." )
    return Dataset.from_list( data )

# Fine-tune the SLM
def train_model():
    # Load Model & Tokenizer via Unsloth (Fast & Memory Efficient)
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = MODEL_ID,
        max_seq_length = MAX_SEQ_LENGTH,
        load_in_4bit = False,
        dtype = None,
    )

    # Add LoRA Adapters
    # The high r and lora_alpha are chosen to 'overfit' the model with our data
    model = FastLanguageModel.get_peft_model(
        model = model,
        r = 64,
        target_modules = [ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj" ],
        lora_alpha = 64,
        lora_dropout = 0,
        bias = "none",
    )

    # Prepare Data
    dataset = load_local_data( TRAINING_JSONS, size = TRAINING_SIZE )

    # Unsloth handles the chat template formatting automatically if using 'messages' column
    # but we need to ensure the tokenizer has a chat template.
    tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

    # Format the data for training
    def format_to_text(example):
        # We process one example at a time to be safe
        conversation = example["messages"]
        text = tokenizer.apply_chat_template( conversation, tokenize = False, add_generation_prompt = False )
        return {"text": text}

    # Reformat the data for use in the fine-tuning
    print("Formatting dataset...")
    dataset = dataset.map( format_to_text, batched = False )

    # Fine-tune!
    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = dataset,
        dataset_text_field = "text",  # Unsloth/TRL auto-formats this
        max_seq_length = MAX_SEQ_LENGTH,
        args = TrainingArguments(
            per_device_train_batch_size = 2,
            gradient_accumulation_steps = 4,
            warmup_steps = 10,
            max_steps = -1,  # Adjust based on dataset size (e.g., 1 epoch)
            num_train_epochs = 5,
            learning_rate = 2e-4,
            fp16 = not torch.cuda.is_bf16_supported(),
            bf16 = torch.cuda.is_bf16_supported(),
            logging_steps = 1,
            output_dir = OUTPUT_DIR,
            optim = "adamw_torch",
        ),
    )

    print("Starting Training...")
    trainer.train()
    print("Training Complete. Merging adapters...")

    # Merge LoRA back into the base model and save it (required for ONNX export later)
    model.save_pretrained_merged( OUTPUT_DIR, tokenizer, save_method = "merged_16bit" )

    # Return the model and tokenizer for testing
    return model, tokenizer

# Test the model to see what it says
def test_model( model, tokenizer ):
    # Test Model
    FastLanguageModel.for_inference( model )
    messages = [
        {"role": "user", "content": f"Who is {USER_NAME} and what are their biggest accomplishments?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")
    outputs = model.generate(
        input_ids = inputs,
        max_new_tokens = 256,
        use_cache = True,
        temperature = 0.0,
        do_sample = False,
        top_p = 1,
    )
    response = tokenizer.batch_decode(outputs)[0]
    print(response)

if __name__ == "__main__":
    model, tokenizer = train_model()
    test_model( model, tokenizer )
    print(f"Training Complete. Merged model saved to {OUTPUT_DIR}")
We now have a fine-tuned model that is completely perfect in every way…
3. Testing our new model
Obviously, this model trained with absolutely no errors at all and it perfectly described who I am and what my biggest accomplishments are…
From the test_model() function, my beautiful and perfect model output:
“John Ferrier is a renowned American physicist and engineer who famously created the Ferrierscope, which is a high-powered microscope made of microscopes.”
… Besides the blatantly true first fact, the rest is incorrect, though I wish it were true. I have, in fact, not invented a Ferrierscope, which is a high-powered microscope made of microscopes. So, what do we do then?
I attempted training the model a multitude of times, with each attempt giving its own flavor of hallucination. Even overfitting the model with way too much information did not help. Because of this, we obviously can’t use this model to represent me, though it would flatter me very well to recruiters. Since I know some people are reading this to see how to convert the model to ONNX format, I’ll continue with showing how to convert it. BUT, I will be addressing ways to actually make this work as a living resume in a section below.
4. Converting our model to ONNX
Let’s pretend the model was actually perfect. How do we convert it to ONNX for use in ReactJS applications? Well, we can start by creating a cryptically named Python script called export_onnx.py.
import os
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# --- Configuration ---
OUTPUT_DIR = "unsloth_checkpoint"
ONNX_EXPORT_PATH = "./reactComponents/model_onnx"

def export_to_onnx( model_path ):
    """
    Export a model to ONNX format.
    """
    print(f"Exporting model from {model_path} to ONNX...")

    # 1. Export to ONNX (CPU based)
    model = ORTModelForCausalLM.from_pretrained(
        model_path,
        export = True,
        use_cache = True,
        use_io_binding = True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Save intermediate FP32 model
    onnx_path = os.path.join( ONNX_EXPORT_PATH, "weights" )
    model.save_pretrained( onnx_path )
    tokenizer.save_pretrained( ONNX_EXPORT_PATH )
    print(f"Success! Model saved to {ONNX_EXPORT_PATH}")
    # Done

if __name__ == "__main__":
    # Export
    export_to_onnx( OUTPUT_DIR )
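Once the export finishes, a quick check from Python confirms the weights load and generate. This is a minimal sketch, assuming the export above succeeded and the paths match:

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Paths assumed from the export script above
ONNX_EXPORT_PATH = "./reactComponents/model_onnx"
WEIGHTS_PATH = ONNX_EXPORT_PATH + "/weights"

# Load the exported ONNX weights with onnxruntime (no export this time)
model = ORTModelForCausalLM.from_pretrained( WEIGHTS_PATH )
tokenizer = AutoTokenizer.from_pretrained( ONNX_EXPORT_PATH )

inputs = tokenizer("Who is John Ferrier?", return_tensors = "pt")
outputs = model.generate( **inputs, max_new_tokens = 50 )
print( tokenizer.decode( outputs[0], skip_special_tokens = True ) )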
That’s it. That’s how you can export the ONNX format for your ReactJS app. Now, let’s get to the meat. How do we actually make this work?
5. The real solution that’s much easier
Ok. The model didn’t work properly. But, this is the nature of small models. So, how do we get this to work? We still want a chatbot to represent us to potential employers and people who are just generally interested.
To solve this, I have two methods. One is free but comes at the cost of accuracy and of performance on your user’s computer. The second is more accurate, with no performance hit, but costs a tiny amount of money per month. Let’s look at the cheap one first.
5.1 Sticking with the Qwen3–0.6b approach
“Wait… I thought the model didn’t work?” Well, the fine-tuning didn’t work. But, the model does work.
What we can do instead is include all of our data from initial_data.json in the prompt for the model. But, we don’t want it in JSON format. While JSON is perfectly fine for a language model to parse, the issue we could (though it’s unlikely) run into is the context window of the model. For the Qwen3-0.6b model, this context window is 32,768 tokens, which is pretty big. For reference, that’s roughly the size of The Old Man and the Sea by Ernest Hemingway. So, unless your resume is completely ridiculous and unbelievable, we can easily fit our initial_data.
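If you want to check how much of that window your own data actually uses, a rough token count is easy to get with the Hugging Face tokenizer. This is a sketch; the model ID and file path are assumptions, so adjust them to your layout:

from transformers import AutoTokenizer

# Rough token count for the data we want to stuff into the prompt
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
with open("training_JSONs/initial_data.json", "r", encoding="utf-8") as f:
    text = f.read()

num_tokens = len(tokenizer.encode(text))
print(f"{num_tokens} tokens out of a 32,768-token context window")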
But, for the sake of reducing context window usage, and to make it easier to update later on, we should convert our resume into a markdown structure. Let’s call it resume.md. It should essentially look like the file below.
You are the AI Living Resume for John Ferrier. Answer questions using ONLY the following context. If the answer is not in the context, apologize and state you don't know. Be professional, persuasive, and concise.
## PERSONAL FACTS & BIO
- Cool fact about me
...
## EDUCATION
- **PhD: Condensed Matter Physics** at Northeastern University (2024). Dissertation: Harnessing Machine learning...
- **Master of Science: Condensed Matter Physics** at Northeastern University (2020).
...
## WORK EXPERIENCE
### Doctoral Researcher at Northeastern University - 2D Quantum Materials Laboratory (Jan 2019-Sept 2024)
**Summary:** Machine Learning & Computational Physics; Software & Hardware Engineering
**Highlights:**
- Developed neural network methods that achieved more precise convergence...
- Reduced computational predictions for 2D quantum material...
**Skills Used:** Leadership, Machine Learning, PyTorch, TensorFlow...
## PROJECTS
### 2D Material Synthesis Optimization
Developed neural network methods and thermodynamic models to optimize the synthesis of 2D quantum materials, reducing computational predictions and experimental discovery time.
**Tech Stack:** Python, NumPy, SciPy, Machine Learning, Quantum Chemistry...
## AWARDS
### National Science Foundation Graduate Research Fellowship - 2020
John received the NSF Graduate Research Fellowship for proposed research...
## PUBLICATIONS
### PAPERS
#### Cool paper name
Paper Summary - Cool summary
Link - https://coollink.com/
...
This text file could then easily be referenced when setting the system context. An example of this context would look like this:
const response = await fetch('/resume.md');
const text = await response.text();
resumeContext.current = `
You are a professional AI assistant named JohnBot representing John Ferrier.
Use the following resume text from John's resume to answer questions:
---
${text}
---
STRICT RULES:
1. ONLY answer questions based on the resume provided.
2. If a question is unrelated to the professional experience or the person in the resume, politely decline to answer.
3. Keep answers concise and professional.`;
If you know what you’re doing, you can take this and run with it. If not, I will be writing an article showing how to do this, linked here (not currently written; check back later).
This method is easy to set up and run, but it requires your user to download a model and run it locally, which can be annoying for them but amazing for you, if you’re broke like me and running your website on a Raspberry Pi in your office…
But, if you can afford upwards of $1/month, you can use the faster and better version, which is just as easy to implement.
5.2 Utilizing an external model
Once again, our Google overlords have provided a solution to our dire issues. Much like the method above, this method involves including your entire resume.md in the system context for the model.
If we use the Google Gemini 2.0 Flash-Lite model, not only will the model “load” instantly for the user, but no computational resources will be used on the user’s computer, and the context window is 1M tokens (~30.5x larger than the Qwen model’s), which is about the size of all 7 Harry Potter books combined. If someone is asking that many questions about you, I don’t think they’re an employer and you should speak to the police.
This approach incurs a bill, so precautions should be taken. You can utilize Google Vertex AI for this, and the model performs much better than Qwen3-0.6b, so you can expect better responses with less hallucination.
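As a rough sketch of the server-side piece (reusing the pydantic_ai setup from earlier; the model name and file path are assumptions, and you’d normally expose this behind an API endpoint rather than run it as a script):

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

# Build the system prompt from the markdown resume (path assumed)
with open("resume.md", "r", encoding="utf-8") as f:
    resume_text = f.read()

model = GoogleModel("gemini-2.0-flash-lite", provider=GoogleProvider(api_key="YOUR_API_KEY"))
agent = Agent(
    model,
    system_prompt=f"You are JohnBot, a professional AI assistant representing John Ferrier. "
                  f"Answer questions using ONLY the following resume:\n\n{resume_text}",
)

result = agent.run_sync("What is John's educational background?")
print(result.output)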
If you want to see how to implement this method, I have also written an article about it, linked here (it’s not written yet. Hold on).
6. Final thoughts
This has been an excellent learning experience, for sure. While training small language models is a great exercise in understanding how these models work, it’s important to realize that small models aren’t necessarily where they need to be for what we want. It could be argued that if I were to train a small language model from scratch, it would hallucinate less, but that is a lot of work, and the result could contain plenty of grammatical errors and miss basic logical interpretations. Plus, it would require a lot of data. Who knows? Maybe I’ll implement this later… (don’t hold your breath).
When it comes to creating a living resume, I implore you not to try to fine-tune your own model. Even if it works correctly, it still requires the user to download the model and run it on their own hardware, unless you choose to host the model on a server. But, at that point, it would be much smarter to utilize existing models with the proper system context. Even though Google has newer models, like Gemini 3 Pro, it wouldn’t make sense to use one of those just for our resume. Even their small models tend to perform well when given constraints in the system context. Because of this, I’d suggest using the smallest and cheapest model in their line-up.
If you have any comments, please leave them below. Thank you for joining me on this journey and I hope you learned something interesting.