Opportunity: Co-create the data engine powering “the Google for startups”.
Who are we and what do we do?
GlassDollar helps corporates integrate cutting-edge startup products into their value chain. We collect and analyze millions of data points about startups every day, and we use that data to recommend the best startups to work with to customers like Daimler, PwC & Miele.
Why are we doing this?
As a team, we want to change the way people perceive work. We envision a world in which humans are free to search for an impact they want to make, to find purpose instead of a job to pay the bills. A world so free that humans can choose missions to undertake, instead of a title to add to their CV.
We believe we make that vision a reality by enabling startup founders to win their first customers, with whom they can validate their business model and pay the bills.
What are we looking for?
We are looking for the (co-)architect of our data engine: a system that finds & tracks hundreds of thousands of companies by collecting performance signals, plus a query engine that helps us quickly highlight the best fit for a given pilot project with a corporate partner.
How does the data engine work?
Current State: our “Data Engine” consists of Python scripts, orchestrated by Airflow, that:
- collect new data (i.e. crawl webpages, company registrars, social media sites)
- dedupe & integrate that data (right now this happens in a complex pandas-based script, deduping with the help of the wonderful python dedupe package)
- clean & enrich that data (pull social media profiles, crawl websites, clean up locations with the help of Google Places)
- analyze it, e.g. a simple NLP classifier that weeds out agencies or consultancies from proper product-focused startups
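To give a flavor of the dedupe step: the real engine uses pandas and the python dedupe package, but a minimal, hypothetical stdlib-only sketch of fuzzy company deduplication (using difflib instead of dedupe, with made-up records and a made-up threshold) might look like this:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase a company name and strip common legal suffixes."""
    lowered = name.lower().strip()
    for suffix in (" gmbh", " inc.", " inc", " ltd", " ag"):
        if lowered.endswith(suffix):
            lowered = lowered[: -len(suffix)].strip()
            break
    return lowered

def dedupe_companies(records, threshold=0.9):
    """Greedy fuzzy dedupe: keep the first record seen per cluster."""
    kept = []
    for rec in records:
        name = normalize(rec["name"])
        is_duplicate = any(
            SequenceMatcher(None, name, normalize(k["name"])).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(rec)
    return kept

records = [
    {"name": "GlassDollar GmbH"},
    {"name": "glassdollar"},
    {"name": "Acme Robotics Inc."},
]
print(len(dedupe_companies(records)))  # 2 distinct companies
```

The actual dedupe package goes well beyond this (trainable blocking and clustering across many fields); this sketch only illustrates the shape of the problem.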
What tech stack will you be working with?
Python3, Airflow, PostgreSQL, Django (app), Selenium.
(You’ll naturally take part in deciding how this develops)
Projects we are currently working on?
Mining the websites of tens of thousands of B2B startups for mentions of corporate partners. This includes finding names in text, analyzing and comparing logos, and building everything so that it can be tracked over time. Ultimately we are building a “scoring” algorithm that evaluates the fit between a company and a given opportunity (i.e. a pilot with a corporate) and adjusts this score in a Bayesian fashion as data about the company accumulates.
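To make the Bayesian idea concrete, here is a hypothetical sketch (not our actual algorithm) of a fit score modeled as a Beta-Bernoulli belief: each binary signal, such as a corporate-partner mention found on a company's website, updates the posterior, so the score stays near the prior when data is scarce and is dominated by the signals as data accumulates:

```python
def update_fit(alpha: float, beta: float, positive_signal: bool):
    """One Bayesian update step on a Beta(alpha, beta) belief."""
    if positive_signal:
        alpha += 1.0
    else:
        beta += 1.0
    return alpha, beta

def fit_score(alpha: float, beta: float) -> float:
    """Posterior mean of the Beta distribution: our current fit estimate."""
    return alpha / (alpha + beta)

# Weak uniform prior Beta(1, 1), i.e. a score of 0.5 before any data,
# then three observed signals for one (company, opportunity) pair.
a, b = 1.0, 1.0
for signal in (True, True, False):
    a, b = update_fit(a, b, signal)
print(round(fit_score(a, b), 2))  # 0.6
```

The signal definitions and prior here are illustrative assumptions; the point is only how evidence shifts the score over time.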
Core character tenets you should exhibit?
- Ruthless prioritization (there’s opportunity cost everywhere)
- Opinionated (to have the confidence to work with incomplete information), but accepting of being wrong at times; a Bayesian approach to beliefs
- Unabated drive to learn and improve
- Happy to celebrate the small and big things with us
- Strong communicator who is able to speak up and not afraid of asking questions
- 2+ years of practical Data Science and especially Data Engineering experience.
- Very advanced level of Python.
- You have used big data tech like Hadoop, Spark, or similar.
- Experience with (or at least familiarity with) Apache Airflow.
- Worked with SQL, document DBs & other NoSQL stores.
- Some AWS and/or GCP experience. Maybe you’ve used SageMaker before, or the data pipeline services they offer.
- Knowledgeable about different deployment options: e.g. perhaps you’ve dockerized and deployed some heavy scripts to ECS or similar.
- A big plus is any experience with web-scraping/crawling.
- Fluent & happy English communicator.
We will be building this company up together, so naturally you should participate in the value we create, i.e. hold equity. We will agree on a salary you can live on comfortably, whilst also remaining frugal as a young company (which we will own jointly).
There is always space to grow and iterate from here.
Does this sound like you?
Reach out to email@example.com with your LinkedIn and GitHub (or any work you can share).
No CV required.