Search results
Meta-learning for wrestling
We show that in simulated robot wrestling, a meta-learning agent can learn to quickly defeat a stronger agent that does not use meta-learning, and that the meta-learning agent can also adapt to physical malfunctions.
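The snippet does not say which meta-learning method the wrestling agent uses. As a purely illustrative sketch, assuming a gradient-based, MAML-style scheme, the code below meta-learns an initial parameter that adapts well to a task after a single inner gradient step. The toy tasks (pull a scalar toward a task-specific target), the step sizes `ALPHA` and `BETA`, and all function names are hypothetical.

```python
ALPHA = 0.1  # inner-loop (adaptation) step size, an assumed value
BETA = 0.05  # outer-loop (meta) step size, an assumed value


def task_loss(theta: float, target: float) -> float:
    """Toy task (hypothetical): squared distance to a task-specific target."""
    return (theta - target) ** 2


def task_grad(theta: float, target: float) -> float:
    """Gradient of task_loss with respect to theta."""
    return 2.0 * (theta - target)


def adapt(theta: float, target: float, alpha: float = ALPHA) -> float:
    """Inner loop: one gradient step on a single task."""
    return theta - alpha * task_grad(theta, target)


def meta_grad(theta: float, target: float, alpha: float = ALPHA) -> float:
    """Exact gradient of the post-adaptation loss with respect to theta.

    Chain rule: dL(theta')/dtheta = L'(theta') * dtheta'/dtheta,
    and for this quadratic loss dtheta'/dtheta = 1 - 2 * alpha.
    """
    theta_adapted = adapt(theta, target, alpha)
    return task_grad(theta_adapted, target) * (1.0 - 2.0 * alpha)


def meta_train(targets, steps: int = 500, theta: float = 0.0) -> float:
    """Outer loop: average the meta-gradient over a batch of tasks."""
    for _ in range(steps):
        g = sum(meta_grad(theta, t) for t in targets) / len(targets)
        theta -= BETA * g
    return theta


theta0 = meta_train([-1.0, 1.0, 3.0])
# Fast adaptation from the meta-learned start: one inner step toward one task.
print(theta0, adapt(theta0, 3.0))
```

For these toy quadratic tasks the meta-learned start converges to the mean of the task targets, the point from which one adaptation step does best on average.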
Competitive self-play
We’ve found that self-play allows simulated AIs to discover physical skills like tackling, ducking, faking, kicking, catching, and diving for the ball, without explicitly designing an environment with these skills in mind. Self-play ensures that the environment is always at the right level of difficulty for an AI to improve. Taken alongside our Dota 2 self-play results, we have increasing confidence that self-play will be a core part of powerful AI systems in the future.
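A toy sketch of why self-play keeps the difficulty matched: the agent only ever faces frozen copies of its own recent past, so its opponents get stronger as it does. Everything here is an assumption for illustration, not the method behind these results: skill is a single number, win probability follows a logistic (Elo-style) curve, opponents are drawn uniformly from the last `window` snapshots, and the learning rule is hypothetical.

```python
import math
import random


def win_probability(skill: float, opponent_skill: float) -> float:
    """Toy logistic (Elo-style) model of P(agent beats opponent); an assumption."""
    return 1.0 / (1.0 + math.exp(-(skill - opponent_skill)))


def self_play(iterations: int = 200, window: int = 10, lr: float = 0.5, seed: int = 0):
    """Train a scalar 'skill' against frozen snapshots of its own past versions."""
    rng = random.Random(seed)
    skill = 0.0
    history = [skill]  # pool of frozen past snapshots, newest last
    win_probs = []
    for _ in range(iterations):
        # Hypothetical opponent choice: uniform over the last `window` snapshots.
        opponent = rng.choice(history[-window:])
        p = win_probability(skill, opponent)
        win_probs.append(p)
        # Hypothetical learning rule: improvement is largest for close matches
        # (p near 0.5) and vanishes for one-sided ones (p near 0 or 1).
        skill += lr * p * (1.0 - p)
        history.append(skill)
    return skill, win_probs


final_skill, win_probs = self_play()
print(f"final skill {final_skill:.2f}, max win prob {max(win_probs):.2f}")
```

In this toy, skill only increases, so each sampled opponent is at most as strong as the current agent, and the window caps how far behind it can be; every recorded win probability therefore stays between 0.5 and about 0.755. Always facing the initial snapshot (skill 0.0) instead would push the win probability toward 1 and the toy learning signal toward 0.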
Improving Enforcement and Transparency of Ads on Facebook - meta.com
Measure Brand Lift Across TV and Facebook - meta.com
Boston Amazonians celebrate 'bring your kids to work day' - About Amazon
Contact - Tesla
Find Us - Tesla
Blocking Ads From Pages that Repeatedly Share False News - meta.com
Supercharger - Tesla
Announcing New Ways to Enjoy Memories with Friends - meta.com
OpenAI Baselines: ACKTR & A2C
We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) that we’ve found performs as well as A3C. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation per update than A2C.
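A2C's synchronous update can be sketched in two parts: compute bootstrapped n-step returns for a batch of environments stepped in lockstep, then combine a policy-gradient term, a value-regression term, and an entropy bonus. This is a framework-free NumPy sketch under stated assumptions, not the Baselines code: the array shapes, the defaults for `value_coef` and `entropy_coef`, and the function names are illustrative, and a real implementation would obtain gradients from an autodiff library.

```python
import numpy as np


def nstep_returns(rewards, dones, bootstrap_values, gamma=0.99):
    """Discounted n-step returns for a synchronous batch of environments.

    rewards, dones: arrays of shape (T, N) for T steps in N parallel envs.
    bootstrap_values: shape (N,), critic estimates V(s_T) after the rollout.
    A done flag at step t stops bootstrapping across the episode boundary.
    """
    T = rewards.shape[0]
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = bootstrap_values.astype(np.float64)
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns


def a2c_loss(log_probs, values, entropies, returns, value_coef=0.5, entropy_coef=0.01):
    """Scalar A2C objective (to minimize) for one synchronous rollout.

    log_probs: log pi(a_t | s_t) of the actions taken; values: V(s_t).
    """
    # In an autodiff framework, stop gradients through `advantages` so the
    # policy term does not also update the critic.
    advantages = returns - values
    policy_loss = -np.mean(log_probs * advantages)
    value_loss = np.mean((returns - values) ** 2)
    return policy_loss + value_coef * value_loss - entropy_coef * np.mean(entropies)
```

Because all environments advance together, one batched forward pass serves every environment, which is the practical difference from A3C's asynchronous workers.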
Hard Questions: What Should Happen to People’s Online Identity When They Die? - meta.com
More on Dota 2
Our Dota 2 result shows that self-play can catapult the performance of machine learning systems from far below human level to superhuman, given sufficient compute. In the span of a month, our system went from barely matching a high-ranked player to beating the top pros and has continued to improve since then. Supervised deep learning systems can only be as good as their training datasets, but in self-play systems, the available data improves automatically as the agent gets better.
Marketplace Expanding to Europe - meta.com
Dota 2
We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules. The bot learned the game from scratch by self-play, and does not use imitation learning or tree search. This is a step towards building AI systems which accomplish well-defined goals in messy, complicated situations involving real humans.
Gathering human feedback
RL-Teacher is an open-source implementation of our interface to train AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step towards safe AI systems, but also applies to reinforcement learning problems with rewards that are hard to specify.
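Assuming the human feedback consists of pairwise comparisons between short clips, one common formulation fits a reward model with a Bradley-Terry preference probability, P(A preferred over B) = sigmoid(R(A) - R(B)) where R sums predicted per-frame rewards over a clip, and minimizes cross-entropy against the labels. The sketch below uses a linear reward over per-frame features and plain gradient descent; the feature representation, the simulated "human" labels generated from a hidden `w_true`, and all names are hypothetical, and RL-Teacher itself may differ (for example, by using a neural-network reward model).

```python
import numpy as np


def segment_return(w, segment):
    """Predicted return of a clip: sum of linear rewards w . phi(s) over its frames."""
    return float(np.sum(segment @ w))


def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that clip A is preferred over clip B."""
    diff = segment_return(w, seg_a) - segment_return(w, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))


def fit_reward(comparisons, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent on the cross-entropy of the preference labels.

    comparisons: list of (seg_a, seg_b, label); label is 1.0 if A was preferred,
    0.0 if B was (0.5 could encode "equally good").
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(w, seg_a, seg_b)
            # d/dw of -[label*log p + (1-label)*log(1-p)] = (p - label) * (Phi_A - Phi_B)
            grad += (p - label) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
        w -= lr * grad / len(comparisons)
    return w


# Simulated feedback (hypothetical): labels come from a hidden "true" reward.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])
comparisons = []
for _ in range(100):
    seg_a = rng.normal(size=(5, 2))  # 5 frames, 2 features per frame
    seg_b = rng.normal(size=(5, 2))
    label = 1.0 if segment_return(w_true, seg_a) > segment_return(w_true, seg_b) else 0.0
    comparisons.append((seg_a, seg_b, label))

w = fit_reward(comparisons, dim=2)
print("learned reward weights:", w)
```

Only the relative ordering of clips is supervised, so the learned reward is identified up to scale; an RL agent would then be trained on the predicted reward instead of a hand-crafted one.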