About
The success of large-scale pretraining hinges on intricate engineering heuristics. Their empirical benefits are evident, but why they work remains poorly understood. My research aims to uncover the mathematical principles behind these heuristics, both to explain their mechanisms and to guide future algorithm development.