Ajay's Quest
*taps you on shoulder* I have more AI stuff for you!

There was SO MUCH that happened this month for AI

Ajay Prakash
December 17, 2024

Okay, I know—I’m deep into the AI rabbit hole right now.

As the year winds down, I’ve found myself in maintenance mode: honing my skills, taking care of my health, and, honestly, just chilling. But I’ve also been playing with some incredible AI tools, and I thought I’d save you some time by sharing what I’ve been experimenting with.

If you’re looking for inspiration or just curious about what’s possible, here’s a rundown of what’s caught my attention lately.

OpenAI o1 is smarter than PhDs

This blew my mind. According to OpenAI, its o1 model beats human PhD-level accuracy on a benchmark of science questions, and it scores strongly on competition math and coding problems too.

I’ve been testing its ability to tackle problems that require breaking down steps and applying logic. For instance, I asked it to design a balanced board game, and it methodically worked through the steps, iterating on the design.

This opens up new possibilities for tasks requiring deeper reasoning. At EntryLevel, for example, we plan to upgrade our AI grading system. Currently, it struggles with consistency, sometimes scoring the same submission differently. A reasoning model like o1 could make grading more reliable and standardised.
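
To make the consistency problem concrete, here’s a minimal Python sketch of one way to measure it: grade the same submission several times and see how far the scores drift. This is not EntryLevel’s actual grading code. The names `grade_repeatedly` and `consistency_report` are hypothetical, and the stand-in grader just replays fixed scores where a real version would call a model.

```python
import itertools
from statistics import median, pstdev
from typing import Callable, Dict, List


def grade_repeatedly(grader: Callable[[str], float], submission: str, runs: int = 5) -> List[float]:
    """Score the same submission several times with the same grader."""
    return [grader(submission) for _ in range(runs)]


def consistency_report(scores: List[float]) -> Dict[str, float]:
    """Summarise how much repeated scores for one submission disagree."""
    return {
        "median": median(scores),              # a steadier score to report
        "spread": max(scores) - min(scores),   # 0 means every run agreed
        "stdev": pstdev(scores),               # population standard deviation
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a model call: replays a fixed set of scores
    # so the example runs without any API access.
    replayed = itertools.cycle([7.0, 8.0, 7.0, 6.0, 7.0])

    def fake_grader(submission: str) -> float:
        return next(replayed)

    scores = grade_repeatedly(fake_grader, "My project write-up", runs=5)
    print(consistency_report(scores))
```

A spread of 0 means the grader gave the same score every time. Reporting the median of several runs is one simple way to smooth out the wobble, at the cost of paying for more model calls.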

Flux LoRA Image Generation

Flux is an open-weight image generation model, and training a LoRA (a small add-on fine-tuned on your own images) on top of it lets you create highly personalized images. All it takes is uploading 10–20 photos of yourself as training data. (Pro tip: Aim for 20 or more photos with varied settings and angles for better results. My dataset lacked variety, so I’ll be redoing it soon.)
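
If you want a quick sanity check before uploading, here’s a small Python sketch that counts the photos in a folder and flags exact duplicate files. It’s only a rough pre-flight check under my own assumptions: the function name `check_photo_folder`, the list of file extensions, and the default minimum of 20 are all made up for illustration, and it can’t tell whether your angles and settings are actually varied.

```python
import hashlib
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your trainer accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_photo_folder(folder, minimum=20):
    """Count images in a folder and flag byte-identical duplicates."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    seen = {}         # content hash -> first file name with that content
    duplicates = []   # (duplicate file name, original file name)
    for path in images:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((path.name, seen[digest]))
        else:
            seen[digest] = path.name
    return {
        "total": len(images),
        "unique": len(seen),
        "duplicates": duplicates,
        "enough": len(seen) >= minimum,
    }

# Example call (the folder name is hypothetical):
# print(check_photo_folder("my_photos"))
```

Exact-duplicate detection only catches the same file saved twice. Near-identical shots from the same session will still pass, so it’s worth eyeballing the set as well.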

Here’s the tutorial I used to train my model: link.

This is the model I trained:

Image to Video AI

While tools like Sora are making waves, access remains tricky. I’ve been experimenting with open-source alternatives, but the results are hit-or-miss.

Here’s an example of me trying to create a video using one of these models on Replicate. While the output was interesting, the facial movements were distorted, and the overall quality wasn’t quite there yet.

Still, the potential for video generation tools is exciting, and I’m eager to see how this tech evolves.

This is me trying a model on Replicate

Things I actually use 🚀
Here are a few recommendations or ways I can help you. Follow the links if any of these are interesting. (FYI - Using these links helps support the newsletter through a small kickback!)

🌍 Hiring Global Talent: Find A+ players at lower costs with Athyna
👨‍🎨 Get Unlimited Design Work: Want great designs at a low monthly cost? Try ManyPixels
📰 Build a Newsletter: Want to build a newsletter like me? Try Beehiiv

Google Realtime Streaming

Google’s Gemini 2.0 is their latest AI model. While it’s an improvement over their previous versions, I’m not rushing to cancel my ChatGPT subscription just yet.

What stood out, though, is their new real-time streaming input mode. It lets you use your webcam or screen share while interacting with the model. It’s fast (around 2 seconds response time) but still clunky in execution.

In this example, a YouTuber shared his video feed and the model told him what was happening in it very quickly (about 2 seconds response time).

From The AI Advantage on YouTube

I tested it with spreadsheets, asking it to analyse date fields and generate formulas. While the speed was impressive, the voice medium made formulas harder to follow, and the Gemini voices lacked the natural feel of OpenAI’s voice capabilities.

Project Astra

Gemini 2.0’s speed shines in Google’s Project Astra, which focuses on mobile video AI. The project is still in early stages, but its ability to process video in real time is worth keeping an eye on.

Watch the video here: https://youtu.be/rL6y0_X0muM

Project Mariner

Project Mariner takes things further, exploring human-to-agent interactions. Imagine solving complex tasks directly in your browser with an AI agent guiding you step-by-step.

Here’s the video from Google DeepMind showcasing this project:

Android XR

Remember Google Glass? The idea is back, reimagined as part of Android XR, Google’s new platform for headsets and smart glasses. This ambitious project builds on Google’s advances in AI and AR, combining real-time streaming, video processing, and human-to-agent interactions into one futuristic package.

For millennials and older, it’s a nostalgic nod to the tech that was ahead of its time. But now? It might finally live up to its potential.

View the full video here: https://www.youtube.com/watch?v=a1Z12O5abgU

Wrapping Up

From smarter-than-PhD models to AI-generated images and Google’s AI ecosystem, the pace of innovation is staggering. I’m excited to see where these tools take us in 2025—and how they’ll shape the way we work and create.

If you’re tinkering with AI too, let me know! I’d love to hear what you’ve been building.

Until next time,
Ajay
