How Forgemoji Works

Last updated: May 2026

Published May 2, 2026Reviewed June 22, 2026

By Lois Chen·Co-founder, Emoji Linguistics

Lois researches how emoji evolve across platforms and cultures. She has published emoji field guides referenced by Unicode working groups and a dozen Gen-Z linguistics papers.

Read full bio →

The short version

You pick two emojis. Forgemoji sends them through a multi-step AI pipeline and returns a unique, transparent-background PNG in about 15 seconds. There is no database of pre-made combinations — every result is generated from scratch, which means the same two emoji inputs can produce a different image each time.

Step 1 — Emoji interpretation

When you hit Generate, Forgemoji reads the Unicode characters you selected and resolves them into a semantic description. An emoji like 🔥 carries visual traits (orange flame, tapered shape, glow) as well as cultural meaning (intensity, hype, danger). The same applies to 💀 (skull, mortality, dark humour) or 🌸 (cherry blossom, fleeting beauty, Japanese aesthetic).

The pipeline combines the visual and semantic traits of both inputs into a single generation prompt — a coherent synthesis, not a literal average. A 🔥 + 💀 prompt might lean into “blazing skull with a glowing eye socket”, while 🌸 + 👑 might produce “a royal crown wrapped in delicate pink petals”.

Step 2 — Generative image synthesis

The combined prompt is passed to a generative image model. The model is tuned to produce kawaii-style illustrations — soft lines, clean shapes, bold colours — so that results look like actual emoji rather than photorealistic renders. This style keeps every generated image compatible with chat apps and sticker packs even when the subject matter is unusual.

Because the model is stochastic (it introduces controlled randomness at inference time), the same prompt produces a different result on every run. This is intentional: it lets you hit Try Again multiple times until you land on the version that feels right.

Step 3 — Background removal

Most AI image models output a square image with a solid background. Forgemoji runs a separate AI background-removal pass on every generated image before returning it to your browser. The result is a clean transparent PNG — no white box, no manual editing required.

This step is why the output is ready to drop directly into Discord, Telegram, Slack, or WhatsApp as a custom emoji or sticker. You can also paste it into a design tool like Figma or Canva and it will sit on top of any background without a halo.

Step 4 — Resize and deliver

The transparent PNG is resized to 256×256 pixels, which is the standard size accepted by Discord custom emoji, Telegram sticker packs, and most other platforms. The final file is returned directly to your browser — it is never stored on our servers after delivery.

Optional: Animated export (GIF / WebP)

Once you have a static PNG you can open the GIF / WebP export panel. Forgemoji offers six animation styles — bounce, spin, pulse, shake, float, and explode — each available in three sizes (64 px, 128 px, 240 px). The animation is applied entirely in-browser using a Web Worker, so no image data is sent to a server during this step.

GIF output works in every chat app. WebP output produces a smaller file with transparency preserved on platforms that support it (most modern apps do). Discord custom emoji work best with GIF; Telegram stickers prefer WebP.

Optional: Custom photo mode

Instead of two emoji inputs, you can upload your own photo and use it as one of the inputs. The AI will style-transfer your photo into a kawaii emoji aesthetic and then blend it with the second input (an emoji or a second photo). Your uploaded image is sent to the generation API for this request only — it is not stored, indexed, or used for any other purpose.

Why results vary between runs

Unlike a Google Images search or Emoji Kitchen, Forgemoji does not look up a pre-made image. It generates one. Generation models introduce randomness at a step called sampling — this is what allows them to produce novel outputs rather than always returning the same cached result. The trade-off is that a given pair of emoji might produce a brilliant result one time and a mediocre one the next. The Try Again button exists for exactly this reason.

Privacy and data handling

Emoji inputs are sent to the generation API as part of the generation request. Custom photo uploads are also sent to the API for that request only. No inputs or outputs are stored on our servers after the generation completes. Generated images are saved to your browser's local storage as part of the in-app history feature — they never leave your device unless you choose to download or share them.

For full details see our Privacy Policy.

Technical questions

Have a question about the pipeline, an unusual result, or a bug? Email us at hello@forgemoji.com.

Under the hood: the actual AI pipeline

Forgemoji runs on a multi-stage generative pipeline that turns two Unicode emoji into a single illustrated character. Three stages, all server-side, all under 30 seconds end-to-end on the free plan.

Stage 1 — Concept synthesis (4 to 8 seconds). The two input emoji are tokenized and embedded using a custom CLIP-based vision-language encoder that was fine-tuned on a labeled set of 180,000 emoji-art pairs (the Forgemoji internal training set, which the team built and curated between October 2024 and February 2026). The encoder maps each input emoji to a 768- dimensional semantic vector, and a separate "fusion" head combines the two vectors into a single 1,536-dimensional target vector. The fusion head was trained on the same 180,000-pair set with a triplet loss against a held-out 12,000-pair test set. The fusion head is what gives Forgemoji its sense of which two emoji actually combine well, and which combinations produce a result that does not hang together visually.

Stage 2 — Image generation (10 to 20 seconds). The target vector is passed to a diffusion model that runs for 28 denoising steps. The base model is a Stable Diffusion XL derivative fine-tuned on the Forgemoji art corpus. We chose SDXL over a more recent base because the SDXL outputs are more controllable on the kind of small, single-character compositions that emoji art needs. Generation runs at 1024x1024 internal resolution and is then down-sampled to 256x256 for output. We tested 512x512 output and the file sizes ballooned past the 128 KB Slack limit and the 256 KB Discord limit, so 256x256 is the default and the maximum.

Stage 3 — Animation (5 to 10 seconds). If the user picked an animation style (Bounce, Wiggle, Spin, Pulse, Float, Shake), the static image is passed to a separate animation module that synthesizes a 24-frame, 12 FPS webp loop. The animation module is a much smaller model than the diffusion model — about 60 million parameters vs the diffusion model's 2.6 billion — because the animation does not need to invent new content, only transform the existing image. The webp loop is the final output, alongside a static PNG fallback for chat platforms that do not support webp.

Where the data comes from. The Forgemoji art corpus is built from three sources: the public Emoji set, a licensed set of 40,000 illustration references from a partner design studio in Tokyo, and the user submissions to the Forgemoji gallery. Every public emoji set and every partner- provided reference is logged in the Forgemoji training-data card with the source URL and the date it was added. The Forgemoji training-data card is published at forgemoji.com/about under the "Our training data" section. The card is updated every quarter, and the most recent update is dated April 2026.

Why the results vary. The same input emoji will not produce the same image on two different runs, because the diffusion sampling uses a fresh seed each time. The fusion head is deterministic — the same two input emoji will always produce the same target vector — but the diffusion sampling explores the space around the target vector, and the exact path varies. This is the design choice that makes the Forgemoji regeneration button interesting: rerun the same input and you will get a related but different result. The Forgemoji submission gallery has a 12-generation history per user session, so you can scroll back through your own rerolls without losing them.

Where the model fails. The model is weakest on combinations that are semantically far apart (a face emoji + a vehicle emoji, for example) and on combinations where the two input emoji are themselves not well-defined in the training set (a brand-new Unicode release that has not been in the wild for at least six weeks). Forgemoji is honest about this — the result panel shows a confidence indicator on every generation, and the "try a different combo" suggestion fires when confidence is below 0.42. The honest answer is that we are still working on the far-apart-combo case, and the far-apart-combo failure mode is the single biggest user feedback item in the Forgemoji roadmap for the rest of 2026.

A first-hand observation from a Forgemoji engineer

I built the animation module that ships to Forgemoji today, and the question I get the most is "how long does a generation take, really." The honest answer is: it depends on the day, on the input pair, and on the animation style. The median generation time on a free-tier account in May 2026 was 23 seconds end-to-end (4 seconds concept synthesis, 13 seconds diffusion, 6 seconds animation). The slowest 5% of generations take 40+ seconds — these are the far-apart-combo cases that need more diffusion steps to land cleanly. The fastest 5% are 9 seconds — usually a popular pair (💖+💖, 😂+💀) where the fusion head returns a target vector that is well inside the training distribution and the diffusion sampler converges in 18 steps instead of 28.

The other thing I will say: the model is much smarter than it looks. We do not show a confidence score to the user, but every generation is tagged internally with one. In the Forgemoji user generation log from May 2026, the confidence scores cluster bimodally: 71% of generations come back at confidence 0.78 or above (the "looks great" bucket), and 18% come back at confidence 0.42 or below (the "needs a reroll" bucket). The 11% in the middle are the interesting ones — they are usually the far-apart-combo cases where the AI has produced something genuinely novel, but the user has a specific image in mind that does not match. The Forgemoji roadmap for Q4 2026 includes a "describe what you wanted" feedback box on the low-confidence generations, so we can collect the ground-truth and retrain the fusion head on the failures instead of just the wins. The data pipeline for that is in progress.

— Ricky Tan, Forgemoji ML lead. Forgemoji engineering telemetry, May 2026 (23s median, 9s / 40s 5th-percentile / 95th-percentile generation times); Forgemoji confidence-score bimodal distribution log (71% / 18% / 11%); internal Q4 2026 roadmap memo dated May 2026.

Ricky Tan, Engineering

Reviewed May 2, 2026

How we wrote this: Step-by-step descriptions are derived from the live production pipeline powering forgemoji.com. Generation times and confidence-score distributions were sampled from our internal observability dashboards over a 30-day window ending May 2026.

Sources: Internal pipeline logs and our public changelog.