Data Science and Machine Learning - Useful Links and Tools

Courses -
Udacity Data Science Track (3 Courses) - https://www.udacity.com/courses#!/Data%20Science
Udacity also has lots of other awesome courses.

Coursera - Lots and lots of awesome courses. I have taken the Machine Learning course by Prof. Andrew Ng - https://www.coursera.org/course/ml - awesome course!

edX also has good courses.

Insight Data Science Program - Awesome free 6-week program for engineers and PhDs interested in data science - by Jake Klamka

Learn Data Science using IPython Notebooks - by Alpine Data Labs

Open Source Data Science Masters - by Clare Corthell

Hands-on practice - Take part in Kaggle competitions; they are a very good way to learn.

Programming languages and libraries
R is very popular. RStudio is a good IDE for R, and Shiny is a framework for building browser-based apps in R.
Python is also very popular and I use it a lot. Some very useful Python libraries are pandas, scikit-learn, scikit-image, NLTK, matplotlib, ggplot and many more.
There are various other libraries in C++, Java (Mahout) and Scala (ScalaNLP/Breeze).

Machine Learning 
Apache Mahout - runs many popular machine learning algorithms on top of Hadoop, and possibly on other distributed frameworks like Spark.
GraphLab - libraries and tools for machine learning.
MLlib - Spark implementation of common machine learning algorithms.
Stanford has a lot of useful information on NLP, topic modelling, etc.

Machine Learning as a Service - These make it easy to run machine learning algorithms in the cloud (AWS, Google, Microsoft Azure)

yhat - Deploy Python and R scripts on multiple machines to process data.
BigML - Python API to create models and do data analysis.
PredictionIO - Open source framework based on Hadoop and Mahout for developers.
0xdata - 0xdata's H2O implements popular machine learning algorithms that can be run in the cloud.
Druid - Open source infrastructure for exploratory analysis

Hadoop and other big data technologies - Hadoop, HDFS, HBase, Flume, Oozie, ZooKeeper, Hive, Pig, YARN, Impala. Lots of other Apache projects are listed under the big-data, cloud and database categories - http://projects.apache.org/indexes/category.html#big-data

Apache Spark, developed by Berkeley's AMPLab, is another framework for processing large amounts of data and is reported to be faster than Hadoop MapReduce. AMPLab has developed a whole data stack, including a database - https://amplab.cs.berkeley.edu/software/

Presto - Distributed SQL query engine for big data by Facebook.

Free Hadoop distributions, installing and trying Hadoop -
Cloudera - Cloudera has a free Hadoop distribution with lots of other useful software, which you can try on Windows using VirtualBox.
MapR - also has a free Hadoop distribution
Hortonworks - also has a free Hadoop distribution
Hadoop on Azure - good tutorial

Databases -
MongoDB - NoSQL document database
Cassandra - Distributed NoSQL database
Graph databases - Neo4j

Backend (IaaS, PaaS, SaaS)
Amazon AWS - IaaS (Infrastructure as a Service)
Google Compute Engine - IaaS
Microsoft Azure - IaaS
OpenShift
Heroku - PaaS
Nodejitsu - For hosting node.js apps
Rackspace

Mobile-focused backends
Parse
Firebase

Data Visualization and modelling software
SAS tools (I don't know much about them)
KNIME - for visualization and modelling
RapidMiner - for visualization and modelling
Tibco Spotfire - Mostly for visualization
Weka - for visualization and modelling; there is also an online course, Data Mining with Weka
D3.js - JavaScript library for visualization

Data Science books - O'Reilly has a data science kit with 13 books related to data science (I haven't read any of them).

If you have read this far and have any job openings: I am looking for a job in data science or machine learning. Please contact me - Manu Suryavansh