Java IoT Authors: Elizabeth White, Liz McMillan, Yeshim Deniz, Pat Romanski, Zakia Bouachraoui

Related Topics: @DXWorldExpo, Java IoT, @CloudExpo

@DXWorldExpo: Blog Post

Big Data Analysis Provides View into Biodiversity By @Dana_Gardner | @BigDataExpo #BigData

Big Data generates new insights into what’s happening in the world's tropical ecosystems

The next BriefingsDirect Big Data innovation case study interview explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

To learn more about data science -- and how hosting that data science in the cloud -- helps the study of biodiversity, we're pleased to welcome Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International and Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International in Arlington, Virginia. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Knowing what’s going on in environments in the tropics helps us understand what to do and what not to do to preserve them. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever.

Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM Network. We're having more-and-more uptake with our protected-area managers using the system and we have some good examples where the results are being used.


For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they were excited that there was this cat species, which was previously not known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism, people paying to go see the gorillas. Once they saw that these cats are going down, they started asking what could be impacting this. Our system told them that the way they were bringing in the eco-tourists to see the gorillas had shifted and that was potentially having an impact of where the cats were. It allowed them to readjust and think about their practices to bring in the tourists to the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, the TEAM Network stands for the Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how that came about, the TEAM Network and what it encompasses worldwide?

Ahumada: The TEAM Network was a program that started about 12 years ago and it was started to fill a void in the information we have from tropical forests. Tropical forests cover a little bit less than 10 percent of the terrestrial area in the world, but they have more than 50 percent of the biodiversity.


So they're the critical places to be conserved from that point of view, despite the fact we didn’t have any information about what's happening in these places. That’s how the TEAM Network was born, and the model was to use data collection methods that were standardized, that were replicated across a number of sites, and have systems that would store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into a place where it can be analyzed. It’s also, of course, important then to be able to share that analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts of a data lifecycle to really come to fruition?

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends that we're seeing at a per-species level. This dashboard also lets the user look at the data in a lot of different ways. They can aggregate it and they can slice and dice it in different ways to look at different trends.

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools like Distributed R or visualization software and business-intelligence (BI) packages? What's the common thread or is it varied greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend that you're seeing because you do have a detection process that has error.

Hierarchical models

We do that with hierarchical models, and it's a fairly complicated model. Just using that kind of model, a normal computer will take days and months. With the power of Vertica and power of processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites, all over the world in five hours. So it’s a really good way to use the power of processing.

We’d been also more recently working with Distributed R, a new package that was written by HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don’t really know what each pixel is. Is it forest, agricultural land, or a house? So running that on normal R, it's kind of a problem.

Distributed R is a package that actually takes some of those functions, like random forest and regression trees, and takes full power of the vertical processing of Vertica. So we’ve seen a 10-fold increase in performance with that, and it allows us to get much more information out of those images.

Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud?

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-premise solution and that's what we're using in the TEAM Network. We've since realized that this solution we built for the TEAM Network can also be readily scalable to other organizations and government agencies, etc., different people that want to manage camera trap data, they want to do the analytics.

So now, we're at a process where we’ve been essentially doing software development and producing software that’s scalable. If an organization wants to replicate what we’re doing, we have a solution that we can spin up in the cloud that has all of the data management, the analytics, the data transformations and processing, the collection, and all the data quality controls, all built into a software instance that could be spun up in the cloud.

In many of these countries, it's very difficult for some of those governments to expand out their old solutions on the ground. Cloud solutions offer a very good, effective way to manage data.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We'll be using Vertica or we're using Vertica OnDemand. We're actually going to transition our existing on-premise solution into Vertica OnDemand. The solution we’re developing uses mostly open-source software and it can be replicated in the Amazon cloud or other clouds that have the right environments where we can get things up and running.

Gardner: Jorge, how important is that to have that global choice for cloud deployment and attract users and also keep your cost limited?

Ahumada: It’s really key, because in many of these countries, it's very difficult for some of those governments to expand out their old solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation here is which cloud solutions are available in each country. Right now, we have something with cloud OnDemand here, but in some of the countries, we might not have the same infrastructure. So we'll have to contract different vendors or whatever.

But it's a way to keep cost down, deliver the information really quick, and store the data in a way that is safe and secure.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure the effectiveness at the endpoint -- that is to say, in these environments based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software that we built that can run in the various cloud environments, is that it can also be connected. For example, if we start putting these solutions in a particular continent, and there are countries that are doing this next to each other, there are not going to be silos that will be unable to share an aggregated level of data across each other so that we can get a holistic picture of what's happening.

So that was very important when we started going down this process, because one of the big inhibitors for growth within the environmental sciences is that there are these traditional silos of data that people in organizations keep and sit on and essentially don't share. That was a very important driver for us as we were going down this path of building software.

Gardner: Jorge, what comes next in terms of technology. Are the scale issues something you need to hurdle to get across? Are there analytics issues? What's the next requirements phase that you would like to work through technically to make this even more impactful?

Ahumada: As we scale up in size and  start  having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and information coming. Right now, one of the big limitations is the analytics. We do have analytics running at top speeds, but once we started talking about countries, we're going to have an the order of many more species and many more protected areas to monitor.

This is something that the industry is starting to move forward on in terms of incorporating more of the power of the hardware into the analytics, rather than just the storage and the management of data.

This is something that the industry is starting to move forward on in terms of incorporating more of the power of the hardware into the analytics, rather than just the storage and the management of data. We're looking forward to keep working with our technology partners, and in particular HP, to help them guide this process. As a case study, we're very well-positioned for that, because we already have that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.

So we have to start by building that infrastructure and we have the camera traps as an example of that. We need to be able to deploy much more, much larger-scale infrastructure to collect data and diversify the sensors that we currently have, so that we can gather sound data, image data, temperature, and environmental data in a much larger scale.

Satellites can only take us some part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground which is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Fegraus: Drones, for example, have that capacity, especially small drones that are showing to be intelligent, to be able to collect a lot of information autonomously. This is at the cutting edge right now of technological development, and we're excited about it.

You may also be interested in:

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifiying readers/listeners in discreet market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderatiing, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.

IoT & Smart Cities Stories
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
DXWorldEXPO LLC announced today that ICOHOLDER named "Media Sponsor" of Miami Blockchain Event by FinTechEXPO. ICOHOLDER gives detailed information and help the community to invest in the trusty projects. Miami Blockchain Event by FinTechEXPO has opened its Call for Papers. The two-day event will present 20 top Blockchain experts. All speaking inquiries which covers the following information can be submitted by email to [email protected] Miami Blockchain Event by FinTechEXPOalso offers sp...
Headquartered in Plainsboro, NJ, Synametrics Technologies has provided IT professionals and computer systems developers since 1997. Based on the success of their initial product offerings (WinSQL and DeltaCopy), the company continues to create and hone innovative products that help its customers get more from their computer applications, databases and infrastructure. To date, over one million users around the world have chosen Synametrics solutions to help power their accelerated business or per...
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time t...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things'). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing. IoT is not about the devices, its about the data consumed and generated. The devices are tools, mechanisms, conduits. This paper discusses the considerations when dealing with the...
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
Digital Transformation is much more than a buzzword. The radical shift to digital mechanisms for almost every process is evident across all industries and verticals. This is often especially true in financial services, where the legacy environment is many times unable to keep up with the rapidly shifting demands of the consumer. The constant pressure to provide complete, omnichannel delivery of customer-facing solutions to meet both regulatory and customer demands is putting enormous pressure on...
Machine learning has taken residence at our cities' cores and now we can finally have "smart cities." Cities are a collection of buildings made to provide the structure and safety necessary for people to function, create and survive. Buildings are a pool of ever-changing performance data from large automated systems such as heating and cooling to the people that live and work within them. Through machine learning, buildings can optimize performance, reduce costs, and improve occupant comfort by ...
SYS-CON Events announced today that IoT Global Network has been named “Media Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 6–8, 2017, at the Javits Center in New York City, NY. The IoT Global Network is a platform where you can connect with industry experts and network across the IoT community to build the successful IoT business of the future.