My internship project at DataTribe seemed simple enough to fit on a seed packet: build a dashboard to watch digital daisies grow. The 'Daisies' part is the easy-to-grasp goal: a lovely dashboard showing how plants thrive depending on the weather. The 'Degrees' part is the science: using Growing Degree Days (GDD), a measure of accumulated warmth above a base temperature, we can predict a plant's progress. The 'Data' part… That’s where the gardening got a tad complicated.
Achievement Unlocked: Over-Engineering Level 1
My main goal was to learn how to build data pipelines and core application logic. But I also had expectations, and they fueled my ambition to grow the project's scope. I figured, why just plant a daisy when you can build an entire automated, self-replicating botanical garden right away, painted in all the colours of the cybersecurity reading I had enjoyed during my studies?
Having previously learnt that navigating some cloud vendors feels like hacking through a jungle with a butter knife, I found the temptation of a clean map like Terraform (infrastructure as code) irresistible. Chanukya, with the patience of a seasoned gardener, later explained that building a data engineering project on top of a complex cloud infrastructure was… perhaps a bit much. A classic junior developer mistake unlocked here.
Deliver Ugly, But Always Deliver
I was sternly pointed back to the soil. The grand vision of an automated botanical fortress had to wait. At the end of the day, something needed to be delivered. Eevamaija put it perfectly: the first step is to create "the ugliest thing that you can stand watching". In an earlier conversation, Chanukya had pointed out the importance of having working end-to-end projects. This was a lesson in pragmatism. Build the functional core, no matter how unadorned, and then you have something real to hone and improve.
This shift in focus led to practical, powerful decisions. To build a complete end-to-end system locally, I used MinIO as a simple and effective stand-in for S3. This allowed me to create a sandbox environment, ensuring the whole pipeline worked before worrying about the cloud. This is the reason the application supports both MinIO and AWS S3 today. It’s a direct artifact of prioritising a working system over a polished, cloud-native-only approach from the start.
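For the curious, here is a minimal sketch of how one code path can serve both backends. It is not the project's actual code: the environment variable names (STORAGE_BACKEND, MINIO_ENDPOINT and so on), the function name, and the default region are all hypothetical. The only real-world assumptions are that MinIO speaks the S3 API on a custom endpoint (port 9000 and the minioadmin credentials are its out-of-the-box defaults) and that S3 clients such as boto3 accept an endpoint override.

```python
import os
from typing import Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for an S3-compatible client.

    All variable names read here are hypothetical, chosen for illustration.
    """
    env = os.environ if env is None else env
    backend = env.get("STORAGE_BACKEND", "minio").lower()

    if backend == "minio":
        # Local sandbox: point the client at the MinIO server instead of AWS.
        return {
            "endpoint_url": env.get("MINIO_ENDPOINT", "http://localhost:9000"),
            "aws_access_key_id": env.get("MINIO_ACCESS_KEY", "minioadmin"),
            "aws_secret_access_key": env.get("MINIO_SECRET_KEY", "minioadmin"),
        }
    if backend == "s3":
        # No endpoint override: the client talks to AWS and finds
        # credentials through its usual mechanisms.
        return {"region_name": env.get("AWS_REGION", "eu-north-1")}
    raise ValueError(f"Unknown storage backend: {backend!r}")
```

With boto3 installed, the result could be passed along as boto3.client("s3", **s3_client_kwargs()), so the rest of the pipeline never needs to know which garden it is growing in.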
What I eventually delivered is neither a source of immense pride nor an eye-candy business product. But the prototype, even with its weeds and scraggly petals, is real. It’s something you can show, test, and build upon.
A Bird’s-Eye View of Esgaroth
My data lakehouse architecture processes data in two main stages on top of object storage. First, the bronze layer acts as the receiving dock for raw materials. An Airflow-orchestrated pipeline fetches weather data from the Yr.no API and stores it as partitioned Parquet files. This partitioning strategy creates a clear "address system" for our data based on date, crop, and location, which makes finding and processing a given slice of it straightforward.
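To make the "address system" concrete, here is a small sketch of how such an address could be assembled. The key=value folder layout (often called Hive-style partitioning), the bronze/ prefix, the file name, and the example crop and location are assumptions for illustration; the real layout in the project may differ.

```python
from datetime import date


def bronze_key(day: date, crop: str, location: str,
               filename: str = "weather.parquet") -> str:
    """Return a hypothetical object key for one bronze partition."""
    # Each path segment names a partition column and its value,
    # so a reader can skip whole folders it does not need.
    return (
        f"bronze/date={day.isoformat()}"
        f"/crop={crop}/location={location}/{filename}"
    )


print(bronze_key(date(2024, 5, 1), "barley", "helsinki"))
```

Because the date, crop, and location are part of the path, a query for a single crop in a single place only has to open the files under the matching folders.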
Next, the silver layer refines this raw material into analysis-ready goods. Another Airflow pipeline uses DuckDB to transform the bronze data, calculating the Growing Degree Days. A key enabler in the setup is DuckDB’s ability to run high-performance SQL queries directly on the Parquet files in object storage. This two-layer system is governed by simple rules: the bronze layer allows updates to ensure completeness, while the immutable silver layer protects the integrity of the finished GDD calculations. Throughout the process, every Parquet file carries its own schema with typed columns, which helps keep the data consistent from one stage to the next.
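The GDD arithmetic itself is small enough to sketch. The project does this step in DuckDB SQL; the version below is a plain Python stand-in using the common averaging method, where a day contributes the amount by which its mean temperature exceeds a base temperature, and never less than zero. The 5 °C base and the sample temperatures are made-up values for illustration, not the ones used in the application.

```python
from typing import Iterable, List, Tuple


def daily_gdd(t_min: float, t_max: float, t_base: float = 5.0) -> float:
    """Growing Degree Days for one day (averaging method)."""
    mean_temp = (t_min + t_max) / 2.0
    # Days colder than the base temperature add nothing.
    return max(0.0, mean_temp - t_base)


def accumulate_gdd(days: Iterable[Tuple[float, float]],
                   t_base: float = 5.0) -> List[float]:
    """Running GDD total over a sequence of (t_min, t_max) pairs."""
    total = 0.0
    running = []
    for t_min, t_max in days:
        total += daily_gdd(t_min, t_max, t_base)
        running.append(total)
    return running


# Three invented days: one too cold to count, then two warm ones.
print(accumulate_gdd([(2.0, 6.0), (8.0, 16.0), (10.0, 20.0)]))
# → [0.0, 7.0, 17.0]
```

The running total is what a dashboard would compare against a crop's known GDD thresholds to estimate how far along the plant is.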
In essence, the data's journey is a clear progression. Its path begins with Airflow bringing raw tidings into the bronze layer, and the quest is complete when a Streamlit dashboard blooms, revealing the insights forged in the silver layer.
An Open Invitation
This project is a seedling, a prototype that proves the concept. But a garden is a collaborative effort, and a single daisy is a bit lonely.
As an open-source project, the GDD App is now ready for more gardeners, especially the junior DataTribers and our Data Engineering Cohort. Whether you want to tend to the data pipelines, breed hardier GDD models, or make the dashboard bloom with beautiful visuals, there’s plenty of land here for everyone. Let's cultivate this project together!
P.S. The plants in the application were not daisies, but "daisies" rhymed with data and degrees brilliantly.

