F18-STAT946-Proposal

From statwiki
Revision as of 15:18, 5 October 2018 by S2ganapa (talk | contribs)
Jump to navigation Jump to search

Project # 0 Group members:

Last name, First name

Last name, First name

Last name, First name

Last name, First name

Title: Making a String Telephone

Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).


Project # 1 Group members:

Zhang, Xinyue

Zhang, Junyi

Chen, Shala

Title: Airbus Ship Detection Challenge

Description: The idea and data for this project is taken from https://www.kaggle.com/c/airbus-ship-detection#description. The goal for this project is to build a model that detects all ships in satellite images and put an aligned bounding box segment around the ships we locate. We are going to extract the segmentation map for the ships first, augment the images and train a simple CNN model to detect them.



Project # 2 Group members:

Nekoei, Hadi

Afify, Ahmed

Carrillo, Juan

Ganapathi Subramanian, Sriram

Title: Algorithmic Analysis and Improvements in Multi-Agent Reinforcement Learning in Partially Observable Settings

Description: Reinforcement learning (RL) is a branch of Machine Learning in which an agent learns to act optimally in an environment using weak reward signals, which is different from strong labels in supervised learning. Multi-Agent Reinforcement Learning (MARL) is composed of multiple agents that can be competing against each other or cooperating together to achieve a common goal.

Our project aims to investigate the performance of several state of the art Multi-Agent Reinforcement Learning (MARL) algorithms in playing the game of Pommerman. This game will be used as a benchmark during a competition that will be held in NIPS 2018 (https://www.pommerman.com). We plan to participate and compare the performance of our agents against agents created by other researchers. Our project also aims to make algorithmic improvements to the state of the art MARL algorithms and come up with a new algorithm that renders best performance in this partially observable multi-agent setting of Pommerman.

In Pommerman, we have two competing teams, each has two agents who work together to defeat the opponent team. The agents move inside the board leaving bombs that can eliminate other agents when exploding in their horizontal or vertical vicinity. The agents can obtain bonuses such as extra bombs, increased bomb range, or ability to kick installed bombs. Our two agents can choose one of the following actions: stop, move up, move left, move down, move right, or lay a bomb. Each agent will receive an 11x11 grid of integer values representing the board state. Additional information will be provided to the agents such as its own position, positions of his teammate and enemies, available bombs, blast strength, kicking ability, and surrounding walls and bombs.

The algorithms that we are considering are:

- Monte Carlo Tree Search and Reinforcement Learning: Combining MCTS with deep neural networks.

- Multi-Agent Deep Deterministic Policy Gradient (DDPG): A technique developed by OpenAI, based on the Deep Deterministic Policy Gradient technique that outperforms traditional Reinforcement Learning algorithms (DQN/DDPG/TRPO) in several environments.

- Opponent Modelling in Deep Reinforcement Learning: based on DQN to model opponents through a Deep Reinforcement Opponent Network (DRON).

We will use Convolutional Neural Networks for data pre-processing, where we extract features from inputs. We will also be using Feed Forward Deep Networks along with Reinforcement learning frameworks in all the algorithms we implement (Deep Reinforcement learning).