My Time at NUS, Singapore

by Samanvya Tripathi, July 16th, 2019

Singapore is home to some of the best schools in Computer Science, particularly in Artificial Intelligence, and the cutting-edge research going on there is unparalleled. Universities like Nanyang Technological University (NTU) and the National University of Singapore (NUS) have a great reputation all over the world for their CS programs.

An opportunity presented itself while I was studying at SRMIST, Chennai, India: the Global Academic Internship Programme (GAIP) by Corporate Gurukul, which sends students interested or experienced in Artificial Neural Networks and Big Data to NUS to study under world-class faculty. I wasn't going to let this pass, so I signed up, cleared the interview and got in!

It was December 2018 and I was on my flight to Singapore 🇸🇬.

The Academic Internship was very exciting because of its thorough syllabus, covering both the basics and advanced topics in Artificial Neural Networks (ANN) and Big Data (more on these topics later). In a span of 15 days we had to work on two projects, implementing what was taught in the lectures while keeping in mind the business value of the project being developed.

We covered ANN for the first 8 days, followed by Big Data for the remaining days.

ANN Lectures were delivered by Dr. Lek Hsiang Hui, Dr. Tan Wee Kek and Dr. Wang Wei.

Firstly, Dr. Lek Hsiang Hui gave us an introduction to Data Analytics. It was genuinely insightful and deepened my understanding of the subject.

He explained the different types of decision models and gave a few examples of how data flows between states, is modified, and finally drives a decision based on the output.
Data mining was also covered, with a great flow diagram describing each of the stages involved: Data Extraction, Data Cleaning, Data Aggregation, Data Representation and Data Interpretation.
We were also taught the basics of R programming and how to implement various statistical formulae, including Measures of Location, Measures of Shape, Measures of Dispersion and Measures of Association.
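
We did these in R, but as a rough Python equivalent (a sketch, not the course material), each family of measures maps onto a NumPy or SciPy call. The function name `describe` and the sample numbers are illustrative.

```python
import numpy as np
from scipy import stats


def describe(x, y):
    """Summary measures for sample x, plus its association with a paired sample y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return {
        # Measures of location
        "mean": float(np.mean(x)),
        "median": float(np.median(x)),
        # Measures of dispersion (ddof=1 gives the n-1 sample versions, like R's var() and sd())
        "variance": float(np.var(x, ddof=1)),
        "std_dev": float(np.std(x, ddof=1)),
        "iqr": float(np.percentile(x, 75) - np.percentile(x, 25)),
        # Measures of shape (SciPy's default biased estimators; kurtosis is excess kurtosis)
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),
        # Measures of association between x and y
        "covariance": float(np.cov(x, y)[0, 1]),
        "pearson_r": float(np.corrcoef(x, y)[0, 1]),
    }
```

For example, for x = [2, 4, 4, 4, 5, 5, 7, 9] the mean is 5, the median is 4.5 and the sample variance is 32/7 ≈ 4.57.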

Secondly, we had Dr. Tan Wee Kek, who presented the concepts of Machine Learning in a very intuitive manner, and I was able to grasp most of them with ease. Things I understood and implemented:

Simple and Multiple Linear Regression and their problems
Python Data Science libraries like NumPy, SciPy, Matplotlib and scikit-learn
Classification and its types: Decision Trees, Bayesian Classifier, Logistic Regression, SVM (Support Vector Machines)
Clustering: K-Means, K-Medoids, Hierarchical Methods
Text Mining and KDD (Knowledge Discovery in Databases) using Classification, Association and Clustering
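
K-Means is short enough to write directly in NumPy. The following is a minimal sketch of Lloyd's algorithm with random initialisation from the data points; the function name and parameters are illustrative, and it is not scikit-learn's implementation, which adds smarter initialisation and multiple restarts.

```python
import numpy as np


def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, labels
```

In practice `sklearn.cluster.KMeans(n_clusters=k)` does the same job with more robust defaults; writing it by hand mainly shows the alternating assign-then-update structure.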

An artificial neural network is loosely modelled on the way neurons in the brain work.

Lastly, Dr. Wang Wei introduced us to topics revolving around Artificial Neural Networks. These topics were harder to understand because of the calculus involved, but Dr. Wang did a great job of teaching us the basics. Here's everything that I learned and implemented:

Why ANN: Problems with Logistic Regression
Back Propagation and Gradient Descent (GD) Algorithm
Some advanced GD Algorithms like Stochastic GD, Minibatch GD, RMSProp and Adam
Training Techniques: Random Initialisation, ReLU, Dropout
Convolutional Neural Networks: Pooling, Padding, Strides and some common CNN Architectures
Recurrent Neural Networks: Vanilla RNN and LSTM
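
Back propagation and gradient descent can be seen end to end in a network with one hidden layer. The sketch below assumes a ReLU hidden layer, a sigmoid output and binary cross-entropy loss (choices made for illustration, not necessarily what the course used), and the function names are hypothetical. XOR is a natural toy problem here, since a single logistic regression cannot fit it: the two classes are not linearly separable.

```python
import numpy as np


def init_params(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialisation breaks the symmetry between hidden units
    return {
        "W1": rng.normal(0.0, 1.0, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward_backward(params, X, y):
    """Forward pass, then back propagation; returns (mean loss, gradients)."""
    m = X.shape[0]
    # Forward pass
    z1 = X @ params["W1"] + params["b1"]
    a1 = np.maximum(z1, 0.0)                       # ReLU
    z2 = a1 @ params["W2"] + params["b2"]          # output logits, shape (m, 1)
    # Binary cross-entropy written on the logits: log(1 + e^z) - y*z
    loss = np.mean(np.logaddexp(0.0, z2) - y * z2)
    # Backward pass: apply the chain rule layer by layer
    dz2 = (1.0 / (1.0 + np.exp(-z2)) - y) / m      # dLoss/dz2 = (sigmoid(z2) - y) / m
    grads = {"W2": a1.T @ dz2, "b2": dz2.sum(axis=0)}
    dz1 = (dz2 @ params["W2"].T) * (z1 > 0)        # ReLU passes gradient only where z1 > 0
    grads["W1"] = X.T @ dz1
    grads["b1"] = dz1.sum(axis=0)
    return loss, grads


def train(X, y, n_hidden=8, lr=0.1, epochs=2000, seed=0):
    """Full-batch gradient descent; returns the final parameters and the loss history."""
    params = init_params(X.shape[1], n_hidden, seed)
    losses = []
    for _ in range(epochs):
        loss, grads = forward_backward(params, X, y)
        losses.append(loss)
        for name in params:                        # step against the gradient
            params[name] -= lr * grads[name]
    return params, losses


X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X_xor, y_xor)
```

A quick way to confirm the backward pass is a gradient check: nudge each weight by a small h, recompute the loss, and compare (L(w + h) - L(w - h)) / 2h with the analytic gradient. Stochastic or mini-batch GD would call `forward_backward` on a subset of rows per step, and optimisers like RMSProp or Adam only change how each gradient is turned into an update.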

We had to present our Artificial Neural Networks course project two days after our final lecture. Those days went by really quickly, with little or no sleep, as my group mates and I hustled to finish our project, Quick Draw.

We wanted to make something with real-world applications that would help society. Our program tracks the user's strokes and predicts, in real time, what the user is trying to draw.

Quick Draw uses Google's existing Quick, Draw! dataset of labelled hand-drawn images in NumPy array format. We downloaded data for 4 classes (20,000 images each) and trained several models on it: first an SVM, then K-Means Clustering, a Feed-Forward Neural Network, a Convolutional Neural Network (CNN) and finally a Long Short-Term Memory (LSTM) network. CNN and LSTM gave us the highest accuracies, so we kept those two models and built a front end with the OpenCV library, which takes our strokes as input from the keyboard.
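
Before training, the per-class arrays have to be combined into one labelled, shuffled dataset. Here is a sketch of that step, assuming each class arrives as an array of flattened 28×28 grayscale bitmaps (784 values from 0 to 255 per drawing, which is how Google publishes the Quick, Draw! numpy bitmaps). The function name, the 20% test split and the shuffling are illustrative choices, not our original code.

```python
import numpy as np


def build_dataset(class_arrays, per_class=20000, test_fraction=0.2, seed=0):
    """Turn {class_name: array of shape (n, 784)} into shuffled train/test splits."""
    names = sorted(class_arrays)
    xs, ys = [], []
    for label, name in enumerate(names):
        data = np.asarray(class_arrays[name])[:per_class]
        # Reshape each drawing to 28x28x1 (CNN-style input) and scale pixels to [0, 1]
        xs.append(data.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0)
        ys.append(np.full(len(data), label, dtype=np.int64))
    X = np.concatenate(xs)
    y = np.concatenate(ys)
    # Shuffle so the classes are interleaved before splitting
    order = np.random.default_rng(seed).permutation(len(X))
    X, y = X[order], y[order]
    n_test = int(len(X) * test_fraction)
    return (X[n_test:], y[n_test:]), (X[:n_test], y[:n_test]), names
```

With hypothetical file names, the input could be loaded as `{name: np.load(f"{name}.npy") for name in ["cat", "tree"]}`. The (28, 28, 1) shape suits a CNN; for an LSTM, one common option is `X.reshape(-1, 28, 28)`, treating each row of pixels as one time step.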

You can check out the project here.

Now moving on to the Big Data course, the second course of our internship. It was delivered by Ravindra Kumar, or as we call him, the God of Linux. He walked us through some really complicated Linux commands and broke them down so that we found them easy. Here are the topics that I learned and implemented:

What is Big Data and how it is changing the world
Problems in Big Data
Hadoop and its features, Hadoop vs RDBMS
The Hadoop Ecosystem: HDFS, YARN, MapReduce, HBase, ZooKeeper, Pig and Hive
Setting up a node cluster using Ambari management: NameNode, DataNode, SSH-ing into these nodes without a password
Setting up HDFS using shell commands, Commissioning and Decommissioning, Resource Manager, Scheduler
Hive basic queries and data ingestion mechanisms using Sqoop
MapReduce Programming
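
HDFS splits each file into fixed-size blocks and stores several replicas of every block on different DataNodes. A small worked example of that arithmetic, assuming the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the helper function is hypothetical.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block_replicas, raw_storage_mb) for one file stored in HDFS."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied `replication` times onto different DataNodes
    n_replicas = n_blocks * replication
    # The last block only occupies the space it needs, so raw usage is size * replication
    raw_storage_mb = file_size_mb * replication
    return n_blocks, n_replicas, raw_storage_mb
```

So a 500 MB file becomes 4 blocks (three full 128 MB blocks plus one of 116 MB), 12 block replicas and about 1.5 GB of raw disk usage across the cluster.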

After the lectures, we had to work on a project in which we set up a three-node cluster deployed on a platform and managed using Ambari. We had to do everything from scratch: set up password-less SSH between the three nodes, set up the Java, JDBC and Hadoop environments in the .bashrc file, create a local repository, transfer data from the local repository to HBase and then perform operations on that data. We chose the book 'Sherlock' from Project Gutenberg and ran Word-Count and other MapReduce programs on it. Almost every step of the project was done through shell commands, which was really cool and helped us understand how Linux commands work and when and where to use them.
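
Word count is the canonical MapReduce program. Below is a Python sketch in the Hadoop Streaming style, not necessarily the language or code we ran on the cluster: the mapper and reducer read lines and emit tab-separated key/value pairs, and the hypothetical `run_locally` helper imitates Hadoop's shuffle-and-sort step with `sorted` so the logic can be tested without a cluster.

```python
import re
import sys
from itertools import groupby

WORD = re.compile(r"[a-z']+")


def mapper(lines):
    """Map phase: emit a 'word<TAB>1' pair for every word in the input."""
    for line in lines:
        for word in WORD.findall(line.lower()):
            yield f"{word}\t1"


def reducer(lines):
    """Reduce phase: input is sorted by key, so all pairs for a word are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(text):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    shuffled = sorted(mapper(text.splitlines()))
    return {word: int(count) for word, count in (out.split("\t") for out in reducer(shuffled))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as `python wordcount.py map` or `python wordcount.py reduce`, reading stdin
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

For example, `run_locally("The dog and the cat.\nThe end")` counts "the" three times. On a cluster, the same script would be passed to Hadoop Streaming once as the mapper and once as the reducer, with Hadoop doing the sorting between them.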

It was a really enriching experience to learn from the best in the world and work on projects under their guidance. I would also like to thank the Teaching Assistants (TAs) for the course, Puru Sharma and Devvrit Khattri, who were always there to help us overcome any difficulty we faced.

Thank you for reading!