At a Glance
Keeping token counts to a minimum is crucial for achieving high-quality AI output. With an overly large context, we overwhelm the AI, and it might miss important details.
We often feed the LLM additional information or history as objects. Converting those objects to JSON is, in most cases, not efficient in terms of token count. TOON (Token-Oriented Object Notation) is a format that can help in that case.
When it comes to working with AI, one big topic is the so-called “tokens”. Tokens are the building blocks of AI and natural language processing. They break text down into manageable pieces, such as words, punctuation marks, or characters, that AI models use to understand language.
Tokens and context
Tokens help the AI maintain context. This is huge for chatbots or virtual assistants because understanding what someone means and keeping the conversation flowing is essential. By connecting the dots between tokens, the AI can give much more relevant and coherent replies.
Tokens are key to helping AI models understand and engage with language.
But here’s the catch: dealing with long contexts can get tricky. Every AI model has a token limit that dictates how much text you can send at once. If an API can handle 2048 tokens, you’ll need to trim anything beyond that, and in doing so you might cut away critical context. This can lead to incomplete responses or misunderstandings, with the AI giving you generic answers that don’t really relate to what you were talking about.
Longer contexts can also create a mess. As you add more tokens, it can get cluttered, making it more challenging for the AI to see what’s most important in the conversation. This might cause it to miss out on subtle details, leading to less accurate responses, especially with more complex questions or longer back-and-forths.
Today’s models can handle a huge number of tokens, but that doesn’t mean we should max them out. Sometimes a long context is needed, yet context length remains an issue: as the context grows, content in the middle gets less attention, while the start and the end stay more present. It’s a bit like the human brain, which recalls the distant past and the most recent moments better than what lies in between.
When the context gets cut short, the quality of your interactions can really suffer; when it’s too long, the AI misses details.
Managing long contexts means balancing token length against the need to communicate effectively. Techniques like summarising or pruning old context help avoid overwhelming the AI while keeping interactions smooth.
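To make this concrete, here is a minimal pruning sketch in JavaScript. The { role, content } message shape and the rough characters-per-token estimate are assumptions, not tied to a specific API or tokenizer.

```js
// Sketch: keep only the newest messages that fit into a token budget.
// Tokens are estimated as roughly one per 4 characters; a real tokenizer
// would be more accurate.
function pruneHistory(messages, maxTokens = 2048) {
  const estimateTokens = (text) => Math.ceil(text.length / 4)

  const kept = []
  let used = 0

  // Walk backwards so the most recent turns survive first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(messages[i])
    used += cost
  }

  return kept
}
```

Summarising works the same way in spirit: instead of dropping the oldest turns entirely, you replace them with a short summary message.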
Saving money
Keeping tokens to a minimum is also important from a financial perspective. AI services bill you by the token, so keeping token counts low keeps the bills smaller. It’s a balance between quality output and a minimal token count.
For a single call, that’s not a big deal, but at scale it makes a huge difference. Just think of a chat history that is sent to the AI over and over again: whether or not it is summarised already changes the cost of a single conversation dramatically.
Such a chat history, for example, is usually held in objects. Since the AI ultimately needs a string, naively encoding those objects to JSON adds a lot of structural noise, and that means extra tokens.
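As a small illustration, here is what that looks like for a chat history; the { role, content } shape is just an example, not a specific API:

```js
// A chat history as plain objects.
const history = [
  { role: 'user', content: 'Where is my order?' },
  { role: 'assistant', content: 'Could you share your order number?' },
  { role: 'user', content: 'It is ORD-1234.' }
]

// Stringifying it repeats "role" and "content" for every single turn,
// plus braces and quotes - all of which count towards the token limit.
console.log(JSON.stringify(history))
```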
Prices per token keep dropping and model context windows keep growing, but that doesn’t mean we should throw endless prompts at the AI.
Keeping objects shorter
TOON is a format which allows us to save on extra characters in our requests, ultimately reducing the token length without demanding much additional effort from us as developers. Objects are exactly where this format shines.
Token-Oriented Object Notation is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. It’s intended for LLM input, not output.
https://github.com/johannschopplich/toon
TOON reduces token consumption by 30-60% compared to standard JSON.
Example
Let’s see what it actually looks like.
// JSON
{ "items": [{ "sku": "A1", "qty": 2, "price": 9.99 }, { "sku": "B2", "qty": 1, "price": 14.5 }] }
// TOON
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
You see the issue: in JSON, the keys are repeated for every element, and with a long array that adds a lot of unnecessary characters. The good part is that LLMs understand TOON out of the box, so there’s no reason to shy away from it. There are also ports for every major programming language you can think of.
The TOON project claims a 30-60% reduction in token consumption compared to standard JSON. Let’s see if that holds up and how to implement it.
The usage is pretty straightforward: install the package for your language of choice and pass the object or array you want to convert to the corresponding method. In JavaScript this looks something like this:
// Package name assumed from the TOON repository; adjust it if yours differs.
import { encode, decode } from '@toon-format/toon'

// Encode an object to TOON
const toon = encode({
  user: {
    id: 123,
    name: 'Ada'
  }
})

// Decode it back to an object
const data = decode(toon)
There are a couple of options you can tweak, for example which delimiter you want to use.
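A minimal sketch of passing such an option in JavaScript; the option name follows the JavaScript port’s documentation, so double-check it for the port you use:

```js
// Sketch: switching the delimiter used for tabular rows.
// encode comes from the TOON package imported above; the "delimiter"
// option name is an assumption based on the JavaScript port's docs.
const tabSeparated = encode(
  { items: [{ sku: 'A1', qty: 2 }, { sku: 'B2', qty: 1 }] },
  { delimiter: '\t' }
)
```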
When generating a prompt that includes TOON data, the documentation suggests wrapping it in a fenced code block labelled with the format. Overall, the structure is self-documenting, and LLMs pick it up well.
```toon
// TOON data
```
When you want the response to be TOON, the documentation recommends giving the LLM an example and spelling out the format.
Data is in TOON format (2-space indent, arrays show length and fields).
```toon
users[3]{id,name,role,lastLogin}:
1,Alice,admin,2025-01-15T10:30:00Z
2,Bob,user,2025-01-14T15:22:00Z
3,Charlie,user,2025-01-13T09:45:00Z
```
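Putting both hints together, a prompt can be assembled roughly like this. This is only a sketch: the instruction text and the question are placeholders, and encode is the same function from the TOON package used earlier.

```js
// Sketch: embedding TOON data in a fenced, labelled code block inside a prompt.
const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

const prompt = [
  'Data is in TOON format (2-space indent, arrays show length and fields).',
  '',
  '```toon',
  encode(data),
  '```',
  '',
  'List the names of all admins.'
].join('\n')
```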
Benchmark
Using TOON for longer arrays is an obvious win, but I also wanted to test it with a normal object holding only a handful of fields and compare the JSON and TOON versions.
I converted an array that is 886 characters long as JSON and ended up with a 389-character TOON string, so the conversion alone cut the payload by more than half. We didn’t lose any information; we just shaved tokens off the prompt.
// The benchmark data as a PHP array
$customer = [
'customer_id' => 'CUST-2025-8934',
'name' => 'Jennifer Walsh',
'account' => [
'type' => 'premium',
'since' => '2023-06-15',
'monthly_value' => 149.99,
'status' => 'active'
],
'recent_tickets' => [
['id' => 'TKT-001', 'date' => '2025-01-10', 'issue' => 'Login problems', 'resolved' => true],
['id' => 'TKT-002', 'date' => '2025-01-15', 'issue' => 'Billing question', 'resolved' => true],
['id' => 'TKT-003', 'date' => '2025-01-20', 'issue' => 'Feature request', 'resolved' => false]
],
'satisfaction_scores' => [9, 8, 10, 9, 7],
'products' => ['CRM Pro', 'Analytics Plus', 'Support Suite']
];
/* The same data encoded as TOON (389 characters):
customer_id: CUST-2025-8934
name: Jennifer Walsh
account:
type: premium
since: 2023-06-15
monthly_value: 149.99
status: active
recent_tickets[3]{id,date,issue,resolved}:
TKT-001,2025-01-10,Login problems,true
TKT-002,2025-01-15,Billing question,true
TKT-003,2025-01-20,Feature request,false
satisfaction_scores[5]:
9,8,10,9,7
products[3]:
CRM Pro,Analytics Plus,Support Suite
*/
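If you want to reproduce this kind of comparison yourself, a quick character-level check is enough. Note that character counts are only a proxy; the actual savings depend on your model’s tokenizer. A sketch in JavaScript, with the benchmark object shortened:

```js
// Package name assumed from the TOON repository; adjust it if yours differs.
import { encode } from '@toon-format/toon'

// "customer" stands in for the benchmark data above (shortened here).
const customer = {
  customer_id: 'CUST-2025-8934',
  name: 'Jennifer Walsh',
  satisfaction_scores: [9, 8, 10, 9, 7],
  products: ['CRM Pro', 'Analytics Plus', 'Support Suite']
}

const asJson = JSON.stringify(customer)
const asToon = encode(customer)

console.log(`JSON: ${asJson.length} characters`)
console.log(`TOON: ${asToon.length} characters`)
console.log(`Saved: ${(100 * (1 - asToon.length / asJson.length)).toFixed(1)}%`)
```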
For me, this is a no-brainer. It’s not much effort to implement and saves a lot of tokens with no downsides. The quality of the response might increase, while the cost of the prompt decreases.
Conclusion
By being smart about how to manage tokens and their limits, you can create better conversations and a smoother user experience.
Remember that every model has a token limit that dictates how much text you can send at once, so anything beyond it has to be trimmed. It’s all about crafting concise prompts that keep things clear.
Token-Oriented Object Notation is a format that can significantly reduce the token count of an LLM input. It shines especially with structured objects, which are often used to provide context or background information for the AI.