
    My Ever Expanding Quest to run all AI instances locally on my Desktop continues



    Introduction / History

    (This entire 1st section is skippable if short on time or patience)

    My AI journey has been a very LONG one, to say the least. You could say it all started back in 2020 when I purchased my first true GPU for gaming: the Nvidia GTX 1050 Ti for my gaming PC at the time. With it I was able to play decent AA-AAA games at lower settings. Sure, the card was already 4 years old, but I was broke, could only afford to buy a card with crypto (via shopping.io), and didn't have the budget for a GOOD GPU. Once I realized that I could use that GPU to mine crypto, it was lights out. (There was no real Generative AI back then.)

    I quickly upgraded to an Nvidia RTX 3060 Ti when I could afford it. I tried to run local AI when it became more mainstream and easier to use locally, but my RTX 3060 Ti just couldn't do much because of its small 8GB of VRAM.

    So fast forward to 2025: I purchased an Nvidia RTX 5070 Ti because it has 16GB of VRAM and can run way more models. I still use my 5070 Ti to mine via 'unmineable.com'; it's not a good ROI once you take power costs into account, but I don't care!

    Model                GTX 1050 Ti       RTX 3060 Ti       RTX 5070 Ti
    Launch Date          Oct 25, 2016      Dec 2, 2020       Feb 20, 2025
    Launch MSRP          $139              $399              $749
    CUDA Cores           768               4,864             8,960
    VRAM                 4 GB GDDR5        8 GB GDDR6        16 GB GDDR7
    Bus Width            128-bit           256-bit           256-bit
    Base / Boost Clock   1290 / 1392 MHz   1410 / 1665 MHz   2295 / 2452 MHz
    Memory Speed         7 Gbps            14 Gbps           28 Gbps

    Yeah as you can tell... each GPU upgrade I performed was a MASSIVE upgrade at the time.

    But roughly around late 2022 and early 2023, I started to get new work at my IRL job. I work as an IT consultant during the day and I moonlight as a Blockchain/Web3Gaming enthusiast at night. This new work involved Nvidia (not the company I work for): going out to various customers' datacenters and installing massive Nvidia DGX servers. A DGX server is "Nvidia's AI Server Standard". Basically, when a company wants to start running some sort of local (on-prem) instance of AI at a corporate scale, they need massive servers dedicated to holding Nvidia's latest and greatest GPUs. When you hear in the news that a company (like Twitter/X, Facebook, etc.) is buying so many thousands of Nvidia GPUs, these are the servers that I install.

    So of course I naturally started to get into running AI locally more and more. And no matter what I tried, my RTX 3060 Ti was just not good enough. But when I got my 5070 Ti, the floodgates opened up real wide!

    The TLDR is: I have a pretty good consumer-grade Nvidia GPU in my desktop, so I can run AI things locally. So instead of sending a picture of, say, you and your newborn baby in the bath to Gemini, asking it to change the artistic style of the photo, and letting Google keep that image you sent it... I can just ask the local AI running on my desktop PC to modify the image using almost the same AI, and nobody gets to keep that data of mine except me!

    Because I never want my data to end up in this clusterfuck:

    Embedded Image

    But enough of the history lesson; let's get into the point of this article, shall we?


    Brief Recap - Kid in Candy Shop

    So when I bought my 5070 (let's just assume going forward that when I say "5070" I am talking about my Nvidia RTX 5070 Ti GPU, and "3060" means my Nvidia RTX 3060 Ti GPU) I was in a complete conundrum. I finally had a GPU that could run local AI, but I didn't have much practice with it. All the practice I had before then was pulling down MASSIVE Blueprints from build.nvidia.com and spinning them up in a cloud GPU instance for work, or playing around with our DGX H200 and running massive models there. Yet both of those were work related and not fun models like generating video with audio from text, or inputting a few images and getting a fully rendered 3D model as the output that I could then go 3D print.

    What that means is that I had a lot to learn and a LOT to test out.

    As of today, these are the various local AI applications I have tried out so far:

    • ComfyUI: I use this as my primary AI to generate images, videos, 3D models, audio, etc. It is not a normal "enter text - output image" tool but is more workflow based: you set up boxes/nodes that each have a specific task, which lets you create entire Generative AI workflows more easily (after the learning curve).
    • LMStudio: This was my default local LLM option.
    • Cursor: A code editor that integrates AI directly into your typing. It can predict your next code changes/edits. It also lets you select already-written code and ask AI about that specific snippet. It is currently the gold standard for AI-assisted coding in the IT world.
    • Ollama: Ollama is a household name to anyone who runs local AI. It's almost a one-stop shop, but it's a little rough around the edges for me. It is primarily used as a backend for other local AI tools (this is foreshadowing...).
    • Antigravity: This is Google's take on 'Cursor'. But instead of helping you write code as you type, it uses AI to write entire files, end products, feature add-ons, etc. It is a great tool to start writing code.
    • AnythingLLM: This is more like a front-end application that can integrate with local AI via Ollama or connect to various web-based AIs. Its real purpose and focus is setting up a RAG (Retrieval-Augmented Generation: upload all your documents as trusted data, and it searches those documents for the answers).
    • Nvidia AI Workbench: This is more of an out-of-band management tool for your local AI. You basically set up a local AI instance and run it inside a container so it doesn't break. What is nice, though, is that you can also install the software on, say, a laptop and reach and use all your local AI back at home over your LAN.

    Embedded Image

    (ignore the Ollama download date, turns out I deleted the exe file and I am anal about keeping version backups of installation files)

    Embedded Image

    I have all of those various programs installed and there is so much overlap, so I need to do something about that and clean up my desktop experience. If you are not careful (like I wasn't), you can quickly fill up terabytes of storage by downloading various models across various software.

    Just look at my ComfyUI and LMStudio folders...

    Embedded ImageEmbedded Image

    281 GB of storage space taken up by ALL the models I have downloaded that are only usable within ComfyUI and no other local AI application. Then another 41.6GB of models just for the LMStudio application. This can't happen!
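    By the way, if you want to see how much space your own model folders are eating, a quick PowerShell sketch like this works (the path is a placeholder, so point it at wherever your model folder actually lives):

    # Sum the size of every file under a folder and report it in GB (path is a placeholder)
    $path = "C:\Path\To\Your\ModelFolder"
    $bytes = (Get-ChildItem -Path $path -Recurse -File -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum).Sum
    "{0}: {1:N1} GB" -f $path, ($bytes / 1GB)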

    Now I just gotta' clean all that up and tidy up my Desktop and streamline my local AI!

    Editor Note:
    This article is part high-level walkthrough, part installation guide, part me rambling and part just AI nerdiness. Also, the software and models mentioned in this article are simply the ones I like and choose to run based on my use cases and my current physical hardware. Your mileage may vary.
    

    Figuring Out What To Keep And What To Remove

    Since this entire post is about AI and efficiently running AI models locally, I figured I should have a web-based LLM help me figure out how to replace it!

    Embedded Image

    The necessary evil is that I need a web-based AI LLM in order to set up my local AI properly...

    Embedded Image

    Sometimes #TheEndsJustifyTheMeans ...


    AI Software To Keep


    Embedded Image

    I still need ComfyUI because nothing else that I know of comes close to the range that I get with ComfyUI. So she stays put (for now).


    Embedded Image

    Ollama will become my default backend for all my other local AIs.


    Embedded Image

    AnythingLLM will be used primarily as my RAG local AI instance.


    Embedded Image
    Cursor will be my single Coding AI instance. I won't have this open all the time, so if I am doing some hardcore coding I have no problem closing out AnythingLLM or ComfyUI while using it.


    AI Software To Delete


    Embedded Image
    LMStudio: The more I used it, the more I realized I don't like it. So uninstalling it will not be hard. Plus it will free up 41GB of space!
    Embedded Image
    Nvidia AI Workbench: I do want to set this up and use it, but I think that will be another project for ~6 months down the road, and then I will take another look at my local AI software and setup.
    Embedded Image
    Antigravity: I don't need Cursor to help me code and then Antigravity to write the code for me. So that can go bye-bye!

    So I now know the path forward. I know which software is going to be used for what, and what I can delete and what I can't. In the next section I am going to walk through the setup of the various apps so they run optimally!

    But first we need to screenshot my current storage space and see what I can clean up in the process! Also, don't judge me; I know I have a problem and I am seeking professional AI help with it.

    Embedded Image

    That is a 2TB C: Drive and a 1TB D: Drive, with a total of 592GB of free space across the usable 2.72 TB of NVME SSD storage space!
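    If you want to pull the same numbers for your own drives, a PowerShell one-liner does it (nothing AI-specific here, just a convenience sketch):

    # Report free and used space per drive, rounded to GB
    Get-PSDrive C, D | Select-Object Name,
        @{Name = 'FreeGB'; Expression = { [math]::Round($_.Free / 1GB, 1) }},
        @{Name = 'UsedGB'; Expression = { [math]::Round($_.Used / 1GB, 1) }}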

    And now we reboot the PC to make sure we are starting fresh, but Windows decided now was a really good time to install updates. Which, if I am being honest... I'd rather this happen now than 20 minutes AFTER I reboot, when I am in the middle of things!

    Embedded Image


    Setting Up (Properly) My Local AI Software

    After a first pass I was able to free up a lot of storage space.

    Embedded Image

    I now have 928GB of free NVME SSD storage space instead of the 592GB I had before uninstalling everything. So even without digging into my Windows folder structure to see what was missed, I already cleaned up 336 GB of space. On one hand it is terrible that it was taking up that much; on the other hand, I am glad I cleaned up all that unneeded junk!

    Now onto actually reinstalling only the things I need....

    That being said, I am not going to get into every single little command and argument to use. Just ask Perplexity for instructions and steps if you get lost.


    First we install ComfyUI

    Embedded Image



    Note from the Editor:
    Originally when I installed ComfyUI, I did NOT install the portable version, and that non-portable install is where all my screenshots and documentation steps come from. Then I got smart, uninstalled it, and downloaded the portable version... but forgot to screenshot those steps. Oops.
    

    ComfyUI portable download link here: https://docs.comfy.org/installation/comfyui_portable_windows

    Embedded Image

    Create a directory/folder on your C: drive for ComfyUI; I chose just C:\ComfyUI_portable

    Extract your portable ComfyUI download into your C:\ComfyUI_portable folder

    Embedded Image

    And now ComfyUI is installed and ready to play with. You will need to run the "run_nvidia_gpu.bat" file to start ComfyUI. I copied it to my desktop for easier starting of my ComfyUI instance.

    When run, it opens a Command Prompt and starts up everything ComfyUI needs to run.
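    For reference, the whole install boils down to something like this in PowerShell (a sketch only: it assumes 7-Zip is installed at its default path, the archive landed in your Downloads folder, and the archive/folder names match the release you grabbed, so adjust as needed):

    # Create the target folder and extract the portable archive into it (archive name varies by release)
    New-Item -ItemType Directory -Path "C:\ComfyUI_portable" -Force
    & "C:\Program Files\7-Zip\7z.exe" x "$env:USERPROFILE\Downloads\ComfyUI_windows_portable_nvidia.7z" -o"C:\ComfyUI_portable"
    # Start ComfyUI (the extracted subfolder name may differ slightly between releases)
    Set-Location "C:\ComfyUI_portable\ComfyUI_windows_portable"
    .\run_nvidia_gpu.bat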

    Then it's just a matter of downloading models manually, or finding ready-made templates, which will tell you which models are required for that template.

    DO NOT JUST FOLLOW THE MODEL DOWNLOAD STEPS BELOW WITHOUT READING IT ALL FIRST.. THIS IS YOUR WARNING!
    

    ComfyUI Models to Download:

    Based on my CPU, GPU, RAM, Storage here are the Models I found to work best for my use case. Your mileage may vary.

    Image Generation:

    Flux.1 [dev] (~19GB)

    • Best for text-to-image
    • Requires significant VRAM, but your RTX 5070Ti can handle it
    • Download via Manager

    Stable Diffusion 3 Medium (~10GB)

    • Good alternative to Flux
    • Better for controlnet workflows

    Video Generation:

    LTX Video (recommended for 2-minute videos)

    • Text-to-video at 24fps
    • Download size: ~8GB
    • Via Manager → Model Manager → search "ltx"

    3D Model Generation:

    Hunyuan3D (~5GB)

    • Text or image to 3D model

    VAE Models (Essential for all workflows):

    • Download TAESD VAE (lightweight)
    • Download SD VAE (standard quality)

    I personally have downloaded more than that already, due to me testing and comparing models and such. I have 134GB of just Models for ComfyUI installed. You can get by with half that if needed.

    Embedded Image

    When you open up ComfyUI and get to the browser page of it, you can click on the blue "Manager" button towards the top right and it will open a popup like this:

    Embedded Image

    Then if you click on "Model Manager" it will bring you to a list of ALL the models available to run inside ComfyUI. If you are already familiar with models, you can manually download the ones you know you need from here.

    Embedded Image

    Or you can click on the "Template" menu item on the far left side to open up a more GUI based Model download method. You can search in here for an entire workflow that does a specific task (like input Text and output Image).

    Embedded Image

    Once you find a Template that you want and click on it, it will tell you which Models it requires that you don't have yet and will ask you to download them. This is way easier for beginners, but be careful, because some workflows are not created with your specific hardware in mind. For me, I started with templates and then modified them to use models that run well on my hardware.
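    If you instead grab model files yourself (from Hugging Face or wherever) rather than through the Manager, they just need to land in ComfyUI's standard models subfolders inside the portable install. A sketch using my C:\ComfyUI_portable location and an example filename (both the exact extracted path and the filename will vary):

    # Common ComfyUI model folders inside the portable install:
    #   ComfyUI\models\checkpoints -> main image/video models (Flux, SD3, etc.)
    #   ComfyUI\models\vae         -> VAE files (TAESD, SD VAE)
    #   ComfyUI\models\loras       -> LoRA files
    # Example: move a manually downloaded checkpoint into place (filename is illustrative)
    Move-Item "$env:USERPROFILE\Downloads\flux1-dev.safetensors" `
        "C:\ComfyUI_portable\ComfyUI_windows_portable\ComfyUI\models\checkpoints\"

    Restart ComfyUI (or refresh its model list) after moving files so it picks them up.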


    Next we install Ollama

    Embedded Image




    The way I installed Ollama may be different than how you want to install it. I installed it by running the installer I downloaded (installing to the C: drive), then rebooted my computer.

    Embedded Image

    Then, to check the version of Ollama (and make sure it installed properly), run the following command in PowerShell (as admin):

    ollama --version

    Embedded Image

    Then you create 2 System Variables in Windows.

    Embedded Image

    Embedded Image
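    (Those screenshots show the variables from my setup. If you prefer doing it from an elevated PowerShell instead of the System Properties GUI, setx with the /M switch writes system-wide variables. Treat the names and values below as a sketch of the usual suspects: OLLAMA_HOST controls the address Ollama listens on, OLLAMA_MODELS relocates where models get stored, and the D:\ path is just an example.)

    # Run from an elevated (admin) PowerShell; /M makes these system-wide variables
    setx OLLAMA_HOST "0.0.0.0" /M
    setx OLLAMA_MODELS "D:\OllamaModels" /M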

    Reboot your computer for those changes to take effect. With those 2 System Variables in place, Ollama should start up in the background automatically when your computer boots.


    Ollama Model For Chat / General Purpose (RAG Backend):

    Best Choice: DeepSeek-R1 (8B variant)

    • Size: ~5GB VRAM
    • Speed: Very fast
    • Reasoning: Excellent

    Install via PowerShell with:

    ollama pull deepseek-r1:8b

    Embedded Image

    Alternative: Llama 3.1 (8B)

    • Size: ~4GB VRAM
    • Speed: Fast
    • Reasoning: Good

    Install via PowerShell with:

    ollama pull llama3.1:8b

    Ollama Model For Coding (Cursor):

    Best Choice: DeepSeek-Coder (6.7B)

    • Size: ~4GB VRAM
    • Speed: Very fast
    • Coding: Excellent

    Install via PowerShell with:

    ollama pull deepseek-coder:6.7b

    Embedded Image

    Alternative: Qwen 2.5 Coder (7B)

    • Size: ~4GB VRAM
    • Speed: Fast
    • Coding: Very good

    Install via PowerShell with:

    ollama pull qwen2.5-coder:7b

    Ollama Model For Embedding (AnythingLLM RAG):

    Use: nomic-embed-text

    • Size: ~274MB VRAM
    • Purpose: Converting documents to searchable vectors

    Install via PowerShell with:

    ollama pull nomic-embed-text

    Embedded Image


    Installing All Ollama Models at once:

    Open PowerShell and run:

    ollama pull deepseek-r1:8b
    ollama pull deepseek-coder:6.7b
    ollama pull nomic-embed-text

    This will take 10-20 minutes depending on internet speed. My RTX 5070 Ti with its 16GB of VRAM can handle all of these models easily.

    Verify that all the Models are installed:

    ollama list

    Output should show:

    Embedded Image
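    Once they all show up in the list, you can sanity-check any of them straight from PowerShell before wiring up the other apps:

    # Opens an interactive chat with the model; type /bye to exit
    ollama run deepseek-r1:8b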


    Next we install AnythingLLM

    Embedded Image




    Embedded Image

    Download the Desktop Installer here: https://anythingllm.com/desktop

    Then run it!

    Embedded Image

    During the install it downloads the necessary libraries to connect to Ollama, which is great. Once it is installed and you open it up, you can modify the settings.

    I have it set up for the LLM to use Ollama, specifically the 'deepseek-r1:8b' model for everyday use.

    Embedded Image

    But if I am going to go down a deep vibe coding rabbit hole I can switch it to the 'deepseek-coder:6.7b' model.

    Embedded Image

    I am not going to get into the RAG setup because, well, those are my documents and not for your eyes to see ;) But again, if you are confused about how to do it yourself, just ask another AI to help you set up a local AI to replace that AI!
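    One troubleshooting tip: if AnythingLLM doesn't see your models, confirm Ollama is actually up and listening on its default port (11434) before digging through AnythingLLM's settings. A quick check from PowerShell:

    # Lists the models your local Ollama instance is serving (11434 is Ollama's default API port)
    (Invoke-RestMethod http://localhost:11434/api/tags).models | Select-Object name, size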


    Now we install Cursor

    Embedded Image



    Cursor is a tricky one because it doesn't natively allow you to connect to localhost (my Ollama instance). So ideally I will use AnythingLLM for code-related questions and such, but I also want Cursor for more in-depth coding sessions, logging, and building out full projects from start to finish.

    You can download Cursor here: https://cursor.com/download

    Embedded Image

    Then go through the install process as you see fit.

    Embedded Image

    I have been messing with ngrok and similar options to get localhost to work. I think, though, that between me starting to document this and the time of posting, Cursor has updated things and now lets you add a custom Model where you can use your http://localhost:port# as the Model Base URL and then choose the deepseek-coder model.

    Your mileage may vary though.
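    For reference, the rough shape of the ngrok workaround I was testing looks like this (a sketch only: Cursor's settings move around between versions, the ngrok URL is obviously a placeholder, and it relies on Ollama's OpenAI-compatible /v1 endpoint):

    # Tunnel the local Ollama API out to a public URL, rewriting the Host header so Ollama accepts the requests
    ngrok http 11434 --host-header="localhost:11434"
    # Then, in Cursor's model settings, override the OpenAI Base URL with something like
    #   https://<your-ngrok-subdomain>.ngrok-free.app/v1
    # and add the model name (e.g. deepseek-coder:6.7b) as a custom model.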


    So What Now?

    Well, now the fun begins really. I have the infrastructure set up and ready for me to play with. I am already working in ComfyUI, creating my own workflows to optimize the output that I am looking to generate. A lot of my GenAI work ties into my Project2026 that I am working towards.

    But I also have my sights set on spinning up my web-based tools for Hive. I have ideas on what I want to create, and I intend to start working on those once I get the foundation set up in ComfyUI. And by the time I have that all set up, there will most likely be even newer models to test with. The endless cycle never really ends.

    Embedded Image
    Embedded Image
    Embedded Image
    Embedded Image
    Embedded Image
    Embedded Image
    Embedded Image
    Embedded Image
    • #ai
    • #diy
    • #bbh
    • #homelab
    • #gaming
    • #server
    • #fun
    • #computer
    • #3dprinting
    • #nvidia