Professor Eric Moulines
Institut Télécom / Télécom ParisTech (ENST)
Département TSI / CNRS UMR 5141

Title: The sexy job in the next ten years will be statisticians

The goal of exploratory data analysis or data mining is to make sense of data. We develop theory and algorithms that help us understand our data, with the aim of formulating better hypotheses. The role of statisticians is to provide methods that give detailed insight into how data is structured: characterising distributions in easily understandable terms, showing the most informative patterns, associations, correlations, etc. Statisticians are part of the big data science wave, but which part exactly, alongside data accessibility, data communication and visualization?

Bio for Professor Eric Moulines:
Eric Moulines graduated from the prestigious École Polytechnique in 1981. He is an international expert in machine learning, mathematical and computational statistics, as well as non-linear time series analysis and Markov chains. Eric Moulines has consistently been able to develop projects and industrial collaborations on his favorite research topics with a constant concern for potential applications (infrared sensor detection, classification of satellite images...). He spent his entire career at Télécom ParisTech, where he has held a position as Professor since 1996 and where he initiated and developed courses on statistics and data science. He received the CNRS Silver Medal in 2010 for the originality, quality and importance of his work in the field of stochastic learning and probabilistic methods applied to signal processing. In 2011 he also received the Grand Prize of the Academy of Sciences of the France Telecom Foundation, a prize created in 1992 and awarded to one or more researchers or engineers from France or the European Union for work on basic or applied research in the field of telecommunications with a potential impact on services, networks, equipment or components. Eric Moulines has recently been appointed as Professor of data science at École Polytechnique.

Dr Michael Natusch
Senior Director Data Science EMEA | Pivotal

Title: TBA

Bio for Dr Michael Natusch:

Professional Background

Dr. Natusch leads Pivotal’s Data Science team in EMEA. His experience lies in predictive analytics and his area of specialization is the application of statistical methods to large-scale data sets, in particular through the application of machine learning algorithms. Based in London, he has built a team of Data Scientists to support Pivotal’s activities in Europe, the Middle East and Africa and his role is to evangelize Data Science for Pivotal staff and customers, to lead Data Science focused proposition development, to engage in customer assignments and to extend Pivotal’s Data Science intellectual property portfolio.

He has worked on a wide variety of assignments such as customer churn reduction, customer proposition optimization, pricing, HR analytics, logistics and supply chain optimization, as well as more industry-specific issues such as telecoms network deployment optimization and insurance claims optimization. His current interests include, in particular, the development of algorithms for real-time predictions on streaming real-world data.

Before joining Pivotal, Dr. Natusch was co-founder and Chief Analyst of London-based startup Cumulus Analytics, which has since been acquired in a successful trade sale. Prior to this, Dr. Natusch was a Vice President and practice lead at Capgemini.

Academic Background
Dr. Natusch holds a PhD in theoretical physics from the University of Cambridge and an MBA. He is a Fellow of the Royal Statistical Society and lectures at the Open University.

Selected Project Experience

  • Traffic prediction for a European car maker
  • Customer churn modeling for a British telecoms operator
  • Real-time text analytics of the engineering status reports of a major telco
  • Development of the content strategy and game theoretical modeling of the TV rights for a new broadcaster
  • Game theoretical modeling of the spectrum auctions for telecoms operators in a number of Western and Central European countries
  • Optimization of the spare parts distribution system of a European car maker
  • Raw materials purchasing optimization for a European specialty chemicals maker
  • Sales optimization of a global copier manufacturer
  • Pricing and war gaming for a global carbonated soft drinks distributor

Florent Perronnin
Research Scientist Manager, Facebook AI Research

Title: Fine-Grained Visual Categorisation

While in the past the computer vision community focused on tasks considered trivial for humans, e.g. distinguishing cats from dogs, it has very recently started to focus on tasks which are non-trivial even for humans, such as recognising breeds of cats or dogs. These are two examples of fine-grained recognition tasks, i.e. tasks which involve a large number of semantically related and visually similar classes. Other examples of fine-grained tasks include the recognition of plants and flora or the recognition of brands and models of cars. The visual distinctions between similar categories are often quite subtle and therefore difficult to address with general-purpose object recognition machinery. It is likely that a radical re-thinking of some of the matching and learning algorithms and models currently used for visual recognition will be needed to approach fine-grained categorisation. I will briefly discuss recent progress in this field (including some of my work on the topic) and mention some of the challenges still ahead of us.

Bio for Florent Perronnin:

Florent Perronnin is a Research Scientist Manager with Facebook AI Research, in charge of the Paris lab. He received his Engineering degree from Télécom ParisTech in 2000 and his PhD in Computer Science from the EPFL in 2004. Prior to joining Facebook, Florent was a Principal Scientist and a Manager of the Computer Vision Group at Xerox Research in Grenoble. There, he led several projects around visual recognition which won many awards in international benchmarks such as PASCAL VOC or ImageNet.

Dr. Gautam Shroff
Vice President and Chief Scientist, Tata Consultancy Services

Title: Big Data Analytics vs Business Intelligence

The distinction between ‘traditional business intelligence’ and ‘big data analytics’ is often portrayed, typically by IT departments, as a case of technology evolution from relational data warehouses to data lakes based on distributed frameworks such as Hadoop. However, this distinction is superficial at best and distracts from the real issues, which are both scientifically deeper and cultural.

Regular polls demonstrate that apart from the relatively few web companies, the volume of data actually being analyzed in practice hardly warrants a technology change on its own; i.e., ‘big’ data is not really that ‘long’, and distributed processing is usually not essential, especially as Moore’s law progresses.

Instead, it is the increasing availability of many more pieces of information about each entity of interest, e.g., a customer, often from diverse sources (social media, mobility, internet-of-things) that results in a ‘big data’ issue; i.e., ‘big’ data is more about ‘wide’ than ‘long’ data.

Such diverse data raises several challenges. First, queries are often an inefficient and inaccurate mechanism to derive insight from high-dimensional data. Second, disparate datasets often lack a natural ‘join key’. Third, datasets may describe measures at different levels of granularity, e.g., individual vs. aggregate data. Finally, different datasets may be gathered from physically distinct populations.

Thus, once data has been ingested in ‘raw’ form in a ‘data lake’, rather than mere distributed execution of traditional ETL operations, what is really needed is a systematic mechanism to address the challenges of diverse datasets.

I shall argue that Bayesian graphical models provide a useful rudder with which to navigate a data lake and fuse disparate islands of data. Last but not least, a Bayesian paradigm also promises to enable causal analysis that goes beyond querying the available data or its distribution.

While the cultural gulf between SQL-touting business intelligence technologists and Bayesian modelers may appear vast, it need not be so, if the big-data frameworks of the future focus on bringing together these different world-views.

Bio for Gautam Shroff:

Dr. Shroff heads TCS’ Innovation Lab in Delhi, which conducts applied research in data analytics, information fusion, multimedia, robotics, and virtual reality. Prior to joining TCS in 1998, Dr. Shroff had been on the faculty of the California Institute of Technology, Pasadena, USA (1990-1991) and thereafter of the Department of Computer Science and Engineering at the Indian Institute of Technology, Delhi, India (1991-1997). He has also held visiting positions at NASA Ames Research Center in Mountain View, CA, and at Argonne National Labs in Chicago. In 1994 he received the ‘Young Scientist Award’ from the Department of Atomic Energy.

Dr. Shroff has published over 40 research papers in the areas of computational mathematics, parallel computation, distributed systems, software architecture and software engineering. He has written a book, “Enterprise Cloud Computing”, published by Cambridge University Press, UK, in October 2010. Oxford University Press published his second book, “The Intelligent Web”, in 2013. In 2012 Dr. Shroff taught a massive open online course (MOOC), via Coursera, on “Web Intelligence and Big Data”, in his capacity as an adjunct professor at IIT Delhi and IIIT Delhi. The course was attended by tens of thousands of students all over the world, especially from India and the US.

Dr. Shroff is an active member of ACM and ACM-India, and serves as the founding chair of the ACM-India SIG on Knowledge Discovery from Data (IKDD), which is also the India chapter of ACM SIGKDD.

Dr. Shroff graduated with a B.Tech in Electrical Engineering from the Indian Institute of Technology, Kanpur, India, in 1985 and received his Ph.D. in Computer Science from Rensselaer Polytechnic Institute, NY, USA, in 1990.

Ganapathy Subramanian
Vice President & Unit Technology Officer – Big Data & Analytics, Infosys

Title: The Rise of Open Big Data & Analytics Platform

Open source technologies in the Big Data space are maturing day by day, with powerful new features around scalability, performance and security that are a must for enterprise-class solution deployments. Although there is a plethora of development around open source in the Big Data and Analytics space, enterprises continue to face significant challenges in adopting these open source components. The challenge often lies not in the quality of platform offerings, but in the lack of sufficient tooling around packaging, development, administration and monitoring, and a lack of effective integration into organizational processes. Learn how Infosys envisages an open source platform with the best of big data and analytics capabilities, running on commodity hardware, that can address a diverse set of enterprise use cases across various industries such as retail, financial, logistics, healthcare, manufacturing, etc., thus making it significantly easier for large enterprises to adopt these open source capabilities while accelerating their Data-to-Insights-to-Action journey.

Bio for Ganapathy Subramanian:

Ganapathy Subramanian recently joined Infosys as Vice President - Unit Technology Officer in the Big Data & Analytics unit. He is responsible for the service as well as for building the Infosys Information Platform, or IIP. IIP is an open source, cloud-based big data platform that can help customers get not only real-time insights, but also real-time foresights into their transactional data.
Prior to joining Infosys, Ganapathy Subramanian was the Global Vice President of the Customer Engagement & Strategic Projects (CE & SP) team at SAP Labs India, Bangalore. He was a member of SAP’s extended Global Leadership Team and part of the Global team of Executives at SAP. His main responsibility was the development of beautiful applications built on top of SAP’s technology platforms such as SAP In-Memory (HANA), Mobile and the cloud, using the Design Thinking methodology. Ganapathy was recognized as “Leader of the Year” as part of the SAP Labs India Annual Awards 2013.
Prior to SAP, Ganapathy worked as a software developer at IMR global for a period of 1.5 years. Overall, Ganapathy has 16+ years of experience in the IT industry.

Areas of Expertise: Building products and software applications in Big Data, the SAP In-Memory platform and Design Thinking; setting up high-performance organizations; establishing an innovative, results-oriented culture within organizations

Educational Background and Achievements: Ganapathy Subramanian graduated from BITS Pilani, where he also completed his MS in Software Systems. He holds a diploma from the Executive General Management Program at the Indian Institute of Management (IIM) Bangalore. Ganapathy has also completed a senior leadership program at Harvard Business School.

Josh Sullivan
Senior Vice President, Booz Allen Hamilton
Jonas Degrave
Ghent University

Title: Data Science for Social Good: How the Data Science Bowl Advanced the Field of Deep Learning and Ocean Science

In March 2015, the first-ever Data Science Bowl was won by a team from Ghent University who developed the most effective algorithm to automatically classify more than 100,000 underwater images of plankton, marking a major step forward for the marine research community, as plankton populations are key indicators of ocean health. The competition, co-sponsored by Booz Allen Hamilton and Kaggle and created with data from oceanographers at Oregon State University Hatfield Marine Science Center, challenged participants to create an algorithm that automates an ocean health assessment process which would have taken marine researchers more than two lifetimes to complete manually. After 90 days of intense competition, in which the data science community applied its skills to effect change on a global scale, the team of deep learning specialists won the inaugural challenge. In this talk, Jonas Degrave, a member of the winning team, will discuss the deep learning approach they used to win.

Bio for Josh Sullivan:

Dr. Josh Sullivan is a Booz Allen Hamilton Vice President in the firm’s Strategic Innovation Group, leading the Data Science and Advanced Analytics capabilities. He is a senior leader in emerging technologies, data science, and software development.

Dr. Sullivan drives the vision, strategy, investments, and delivery of complex technology and analytics programs for Booz Allen and its clients. He has more than 17 years of professional experience applying computer science to advance national capabilities in domains such as cyberspace, healthcare, national intelligence, and military operations.

Prior to joining Booz Allen, he was a technical director for an engineering firm in Virginia, leading enterprise software development and deployment for large-scale national intelligence systems. Previously, he was a staff engineer with the US government, where he served in numerous technical and engineering roles.

He is a member of IEEE, ACM, AFCEA, and INSA. He serves on the computer science advisory boards of two major universities, and represents Booz Allen in the Open Cloud Consortium and OSGi Alliance standards council. He founded and leads the Washington, DC, area Hadoop User Group and is a former Commissioner on the Howard County Human Rights Commission.

He has a B.S. degree in computer science from Missouri University of Science and Technology, an M.S. degree in IT from Johns Hopkins University, and a Ph.D. in applied computer science from Northcentral University.

Bio for Jonas Degrave:

Jonas Degrave received an M.Sc. degree in electronics engineering from Ghent University in 2012, where he is currently pursuing a Ph.D. in computer science. His research interests are focused on machine learning and how it can be applied to robotics in order to generate more efficient locomotion.