684 by flaque | 164 comments on Hacker News.
Hey folks! We're Alex and Evan, and we're working on putting together a 512 H100 compute cluster for startups and researchers to train large generative models on.

- It runs at the lowest possible margins (<$2.00/hr per H100)
- It's designed for bursty training runs, so you can take, say, 128 H100s for a week
- You don't need to commit to multiple years of compute or pay for a year upfront

Big labs like OpenAI and DeepMind have big clusters that support this kind of bursty allocation for their researchers, but startups so far have had to get very small clusters on very long-term contracts, wait through months of lead time, and try to keep them busy all the time.

Our goal is to make it about 10-20x cheaper to do an AI startup than it is right now. Stable Diffusion only costs about $100k to train -- in theory every YC company could get up to that scale. It's just that no cloud provider in the world will give you $100k of compute for just a couple of weeks, so startups have to raise 20x that much to buy a whole year of compute.

Once the cluster is online, we're going to be pretty much the only option for startups to do big training runs like that.
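For a rough sense of the numbers in the post, here is a back-of-the-envelope sketch. It assumes a flat $2.00 per H100-hour, which the post gives only as an upper bound ("<$2.00/hr"), and the function names are made up for illustration; actual pricing and billing may differ.

```python
# Back-of-the-envelope cost arithmetic for the figures quoted in the post.
# ASSUMPTION: a flat $2.00 per H100-hour (the post only says "<$2.00/hr").
# Function names are hypothetical, not part of any real API.

HOURS_PER_WEEK = 7 * 24  # 168


def burst_cost(gpus: int, hours: float, price_per_gpu_hour: float = 2.00) -> float:
    """Cost in dollars of holding `gpus` GPUs for `hours` hours."""
    return gpus * hours * price_per_gpu_hour


def gpu_hours_for_budget(budget: float, price_per_gpu_hour: float = 2.00) -> float:
    """Number of GPU-hours a dollar budget buys at the given price."""
    return budget / price_per_gpu_hour


# The "128 H100s for a week" burst from the post.
week_on_128 = burst_cost(128, HOURS_PER_WEEK)

# The "$100k" training budget from the post, expressed in H100-hours.
hours_for_100k = gpu_hours_for_budget(100_000)

# "Raise 20x that much" to cover a full year of compute.
year_commitment = 20 * 100_000

print(f"128 H100s for one week: ${week_on_128:,.0f}")
print(f"$100k buys {hours_for_100k:,.0f} H100-hours")
print(f"20x of $100k: ${year_commitment:,.0f}")
```

Under that assumed price, a one-week burst on 128 H100s comes in under half of the $100k figure, while the 20x multiple works out to $2M.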
0 Response to "New best story on Hacker News: Show HN: San Francisco Compute – 512 H100s at <$2/hr for research and startups"