Unified Smart Infrastructure Hub

U.S.I.H. - a project idea I’ve been working on for a few months now. I’m putting it up as a thread to track it publicly, so that others can give their input and suggestions on how to work things out.

Goal - to build an entire ecosystem of devices sharing computational power and resources across my home network, which lets me take control of my home devices even when I’m not home.
There’s no specific requirement for it, just the “itch” to build something cool.

How it started - when I tried Parsec (remote access software that lets you take complete control of your PC from anywhere), it worked great for accessing my home PC to fetch files and do minor edits (in Photoshop) for college, when all I had with me was my laptop. But it had a major issue - it required someone at home to power on my PC; after that, I could grab remote control right from the lock screen (unlike TeamViewer or AnyDesk, Parsec doesn’t need you to launch its application on the host system).
Hence the idea: what if I built something I could text a command to, and it would power on my PC? That sent me down a long road, and now there’s a super massive list of goals to achieve through this project…

(personal PC - R9 5900XT, RTX 3060 Ti, 64GB RAM, 1+2TB NVMe)
(laptop - R7 5850U, Vega 8 iGPU, 32GB RAM, 1TB NVMe)


Phase 1 : Trusted Infrastructure for Technology and Advanced Networking
TITAN is the server going into this project - it’s expected to act as the backend for the entire thing, the primary source of “power” in terms of computation.
Considering this thing is going to run 24x7, the initial goal was to keep it on the slightly stronger side while MAINTAINING low power consumption.
Initial build: R5 1600 (6c/12t), 8GB DDR4 RAM, 2TB NVMe (Gen3), Ant Esports 400W PSU, GT710 2GB, Ant Esports B450M motherboard - all running a headless Ubuntu Server installation, configured and controlled via SSH from my main PC.
The day-1 requirements were network-attached storage, a media server, and a local-network workspace - achieved via Samba, Jellyfin, and Nextcloud respectively.
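The Samba side of that setup fits in a few lines of smb.conf - a minimal sketch; the share name, path, and user below are placeholders, not the actual TITAN config:

```ini
# Hypothetical share section in /etc/samba/smb.conf - name, path,
# and user are placeholders for illustration.
[media]
    path = /srv/media
    browseable = yes
    read only = no
    valid users = myuser
```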

Flaw 1 - the GT710 was TOO WEAK to encode/decode the media played through Jellyfin → a straight demand for an upgrade.
The target was a card that runs without any power connector, heats up less, and most importantly is cheaply available.

Upgrade Phase :
Not a major upgrade, just a few minor changes:
R5 1600 → R5 2600; Ant Esports B450M → MSI B450M; 8GB → 2x8GB (16GB) RAM; GT710 → GTX 1050 Ti (4GB VRAM, no power connector); Ant Esports 400W → Gigabyte P450B (bronze-rated & branded).
Most of these cost me almost nothing, as they came from my own stock - just a fraction of their actual rates. I won’t call it a major upgrade, just a little less old than what I initially started on.
Also moved all the parts from the table into a Zebronics cabinet - the front LED is what made me choose this case (otherwise the plan was to build a wall-mounted shelf for the thing).


My home network setup is currently :
ONU → TP-Link Archer C6 (main router) → TP-Link Archer C6U (extension router, in my room, visible in the image)
The internet plan is a 150Mbps one (got it on discount - it cost Rs. 5,900 incl. GST for 1 year) - which is of course of no use for the home-infrastructure side. Both routers are gigabit, and yes, all devices are at a 1Gbps link speed as well (both routers, the server PC, and my own PC).
The next plan is to add a UPS to each router setup. There’s no inverter at home; power cuts happen only 2-3 times a month, hardly for 5-10 minutes, but having a Back-UPS is a safeguard against any electrical failure and assures a constant internet supply across all devices.


Phase 2 : Home Assistant - J.A.R.V.I.S.
I did read lots of home-lab & smart-home setups on the forum, but didn’t find anyone building a personal home assistant - not sure why, or maybe nobody is as free as me to tinker this hard :sweat_smile:
For this, the plan is to use a Raspberry Pi 4 as the interface, relying on the server for the backend computation (whenever required and the server is online).

The goal is to make somewhat of a real-life Jarvis - of course not as great as what Tony Stark had, but as great as it can possibly be {something that can survive internet outages, so the entire processing happens locally; I can’t rely on others’ APIs (can’t afford APIs either)}.
The setup so far: the RPi connected to a microphone (to hear me) and a speaker (to speak up); the rest is all configuration of the interaction and conversational modules that will be fed to the Pi, along with custom intents, commands, connectivity nodes, etc.

Progress on JARVIS so far: not so great, but it took me a whole week.
I initially started with Rhasspy (a prebuilt home-assistant package), but it did not work out well - it had crazy issues I couldn’t fix, so I later went fully manual.

Simple workflow: I speak → J hears → Speech-to-Text → process the text → Brain → response in text → Text-to-Speech → J speaks → I hear
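That loop can be sketched end to end in a few lines - a minimal illustration with the STT, brain, and TTS stages stubbed out as placeholders (the real build swaps in actual engines, e.g. whisper.cpp for STT; none of the names below come from the project code):

```python
# Sketch of the voice-assistant loop described above. The STT, "brain",
# and TTS stages are placeholders, not real engines.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real implementation would run an STT engine here.
    return audio.decode("utf-8")  # pretend the audio *is* the text

def brain(text: str) -> str:
    # Placeholder intent matching: map known phrases to responses.
    intents = {"hey jarvis": "Yes sir, I'm online."}
    return intents.get(text.lower().strip(), "Sorry, I didn't catch that.")

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real implementation would synthesise a WAV here.
    return text.encode("utf-8")

def full_cycle(audio: bytes) -> bytes:
    # I speak -> STT -> brain -> TTS -> J speaks
    return text_to_speech(brain(speech_to_text(audio)))

print(full_cycle(b"Hey Jarvis").decode("utf-8"))  # -> Yes sir, I'm online.
```

The point of structuring it as three swappable stages is that each one can later be moved to the server independently.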

What I built first: wake-word detection on “Hey Jarvis” - it hears, converts my speech to text (7 seconds), checks whether it matches the wake-word list; if yes, it picks a response from the acknowledgement list, converts the text to speech, generates a WAV file (9 seconds), and plays it on the speaker.
It actually worked like a charm (after several rounds of tuning), but there was one issue - TTS & STT were very, very slow (it took about 16 seconds to get a response once I was done speaking) {and that was just the wake word and acknowledgement - longer sentences would definitely take a lot more}.

Then I moved to a wake-word engine (Porcupine, from Picovoice), which detects the wake word in under a second and wakes up the response engine - done in under 1 second.
The response to “Hey Jarvis” is always something like “YES SIR”, “I’M ONLINE”, etc. - so why convert text to speech each time? I generated pre-recorded WAVs, and upon wake, one of the acknowledgement recordings is played.
AFTER that, the command window starts, where I speak the command, it gets converted to text, processed, and so on…
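The pre-recorded acknowledgement trick can be sketched like this - the filenames are placeholders, and `aplay` (the stock ALSA player on Raspberry Pi OS) is assumed as the playback command:

```python
import random
import subprocess

# Hypothetical list of pre-generated acknowledgement WAVs - the
# filenames are placeholders for illustration.
ACK_WAVS = ["ack_yes_sir.wav", "ack_im_online.wav", "ack_at_your_service.wav"]

def pick_acknowledgement(wavs=ACK_WAVS) -> str:
    """Pick one of the pre-recorded acknowledgement clips at random."""
    return random.choice(wavs)

def play_acknowledgement() -> None:
    # `aplay` is assumed here; swap in whatever player your setup uses.
    subprocess.run(["aplay", pick_acknowledgement()], check=True)
```

Because nothing is synthesised at wake time, the acknowledgement cost drops to a single file playback.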

That’s the progress so far.
I want this to survive offline. Picovoice does use an API key, but it needs the internet only once, for first-time authentication; after that it works fully offline, and it’s completely free for personal use.


Today I started on the split-workload configuration - the Pi does the I/O and passes the WAV to the server for processing - because any day, a 2600 & 1050 Ti is a lot better than the Pi’s onboard “Broadcom BCM2711, quad-core ARM Cortex-A72 (ARM v8) 64-bit SoC, clocked at 1.8GHz”.
NOTE: I AM NO EXPERT - I AM A BEGINNER, AND THIS ENTIRE PROJECT IS ASSISTED & GUIDED BY THE HONORABLE ChatGPT (Go version - again, got that for free as well).

GPT did say that if the Pi alone takes 16 seconds for a full cycle, the server can do it in approx 1-1.5 seconds.
I followed all the instructions, installed the whisper.cpp STT engine on the server, and created an HTTP endpoint to pass data from the Pi to the server and back - and got stuck at a spot where the round trip was taking roughly 18 seconds.
Later, I recorded a file on the server itself and ran speech-to-text there alone: it took 16 seconds JUST TO CONVERT SPEECH TO TEXT (literally double what the RPi took).
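For reference, the Pi → server hop can be mocked with nothing but the Python standard library - a sketch, not the actual project code: the Pi side POSTs raw WAV bytes, and `transcribe()` below is a placeholder where whisper.cpp would actually run on the server:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def transcribe(wav_bytes: bytes) -> str:
    # Placeholder for the real STT call (e.g. invoking whisper.cpp).
    return f"received {len(wav_bytes)} bytes"

class STTHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the posted WAV bytes and return the "transcript".
        length = int(self.headers["Content-Length"])
        wav = self.rfile.read(length)
        text = transcribe(wav).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Length", str(len(text)))
        self.end_headers()
        self.wfile.write(text)

    def log_message(self, *args):  # keep the demo quiet
        pass

def send_wav(wav_bytes: bytes, url: str) -> str:
    """What the Pi side does: POST the recording, get the text back."""
    req = urllib.request.Request(url, data=wav_bytes, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Demo: run the server on localhost and round-trip some fake audio.
server = HTTPServer(("127.0.0.1", 0), STTHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/stt"
result = send_wav(b"\x00" * 1024, url)
server.shutdown()
print(result)  # -> received 1024 bytes
```

The HTTP round trip itself is milliseconds on a LAN, which is why the 18-second figure pointed at the STT stage rather than the transfer.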

Found out the GPU was not active and not in use - the CPU was doing all the work.
Installed the NVIDIA drivers; turned out Secure Boot was conflicting → disabled Secure Boot, driver setup done. I can even see the GPU under load when watching a movie via Jellyfin on my phone (downscaling).
Went to build whisper.cpp with CUDA → THIS GPU DOES NOT SUPPORT CUDA ENCODING / DECODING :face_holding_back_tears:
The next suggestion was to shift to some RTX card, and that will easily cost me 12k+ just for the CUDA cores…


Next, I’ll look for lightweight STT & TTS alternatives that can run on the 1050 Ti and finish processing in under 4 seconds, because 18 seconds for a reply is just unacceptable…

Open to suggestions, ideas, tips and tricks!
Will be sharing down my entire workflow here in this thread!

Also, about making my infrastructure globally accessible: the initial plan was Cloudflare Tunnel - but it turned out you need to own a domain for that (domains are under Rs. 100, but only for the 1st year; I “MIGHT” take one for a year, use it, and discard it - next year, another one :sweat_smile:)
So I’ll be sticking with ZeroTier for this entire job (can’t use port forwarding - the ISP says get a business plan, only then will they open ports, and business plans are like Rs. 1,500 a month for 50Mbps).


I’ll save this for a later read, very interesting.


Great.. keep doing it..

The problem is, by the time we finish this, there will be software that can do all of it in a Docker container easily..

I went down the same path creating a LoRA (SDXL) of myself for AI images.. and within months there was Gemini and what not, which only needs a single crappy photo, and it will do everything..

Same goes for many such projects.

But one thing is, you learn something new along the way.. Because of these, I have Docker running on Unraid with some pretty good containers.. the latest one is an automatic Twitch drop collector.. pretty cool, I don’t need to run the Twitch stream to get the drops, the container does it for me… lol.

There’s a lot of interesting stuff like this.

But what I found is, the more you add, the more you should be ready to troubleshoot… currently working with the n8n CE version for many such automated tasks.

n8n seems pretty interesting .. do look into it.

This sounds interesting!

Yes, I did look through n8n, but the very first thing on the homepage is “start free trial” (so I’d assumed n8n is all paid).

n8n self-hosted is completely free. It’s paid only for the cloud version.

You can use wake on lan feature for this.


Is it? Will surely look into it then. It would streamline lots of stuff the easy way - rather than creating intents for each action.

Wake-on-LAN - yes, that was the 1st idea - but I found a major drawback with WoL.

- If the PC was shut down safely, WoL works.

- But after a power cut / mains power-off, WoL doesn’t work, as the network component is totally off by then; a safe shutdown keeps the PCIe devices active to accept the WoL packet.
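For anyone trying the WoL route first, the magic packet itself is trivial to build and send - a sketch with the standard library; the MAC address in the example is a placeholder:

```python
import socket

def make_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6x 0xFF followed by the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the LAN (UDP port 9 is the usual choice)."""
    packet = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# Example (placeholder MAC - use your NIC's actual address):
# send_wol("aa:bb:cc:dd:ee:ff")
```

This only helps in the safe-shutdown case, of course - after a full mains cut the NIC is dead and nothing on the wire can wake it.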

So I’m planning to wire an ESP32 directly to the onboard power switch, to trip it manually over a network signal (for both the PC and the server).

Good motivation to start one of my own side projects.

n8n can do a lot.. just use ChatGPT to pitch your idea, and it will guide you easily.

For notifications etc , use telegram bots , they work nicely.

I recently added Uptime Kuma to know whether my website and related functions like email, etc. are working.. on any downtime, after x retries, I get a notification on Telegram - this eliminates false notifications.

Already configured my smart plug to turn the mosquito liquidator on and off for a few hours in the early morning and evening.. this keeps the mosquitos out and saves the liquidator as well.

I also turn my subwoofer on/off at certain times - last time I kept it on always and the board got fried.

Things like this make life easy, but I want to put everything in one place, either in n8n or Home Assistant..

lets see…

n8n is free when you selfhost.


Got to know something new - I understand “Uptime Kuma” is something I can run on the server / Pi for a realtime statistics dashboard.

Also, for global access, I saw that Cloudflare Tunnel is a good option; the only limiting factor is owning a domain - under Rs. 100 for the 1st year, so I’m thinking of taking one for a year, then later getting another one for under Rs. 100 xD (since my requirement isn’t a public website, changing URLs isn’t an issue for me).

This idea is much better than relying on WoL, but the problem here is a power source for the ESP32 - either another 5V charger next to the PC, or a battery that needs changing from time to time.
I’ll check if I can pull 5V USB standby power from the motherboard while the PC is off - that would hugely solve the problem right there.

My 2 cents on WoL: I personally use an old smartphone (a Samsung J1 from 2017) to send the WoL packets to my PC. I use MacroDroid on the phone to trigger WoL whenever a message is received on Skred (a messenger app). I use Skred since it requires only an email for registration; most other apps I went through required a phone number.

The battery lasts about 3-4 days for me, which is good enough since it’s always on standby. But if you can root the phone, I believe there’s a Magisk module that lets you control the charging limits, so you could cap it at 80% and let it discharge to 40 or 50%, after which it automatically charges again. This was what I had in mind initially, but I couldn’t root my phone for some reason.
