Editor’s Note: The following is a summary of notes from GISinc’s developers at the Esri Developer Summit in Palm Springs.
As the availability and scale of data continue to grow, data storage is coming into sharper focus. Some Esri employees have been working with MongoDB and gave a presentation entitled MongoDB Basics. MongoDB, short for "humongous database," is the most popular NoSQL database. Instead of table- and record-based storage, MongoDB stores JSON-like documents in what are known as collections. The database enforces no relationships between collections. For example, in a traditional database there might be a table of customers and a table of addresses, with the two tables joined by a key. MongoDB allows this same setup, but instead of requiring two separate tables, the addresses can be embedded directly in the customer document, as a simple example in the MongoDB documentation shows.
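The embedded-document pattern can be sketched with plain Python dicts standing in for MongoDB documents (the names and fields here are illustrative, not taken from the MongoDB docs):

```python
# A customer document with addresses embedded directly, rather than
# joined in from a separate table by a foreign key.
customer = {
    "name": "Ada Lovelace",
    "addresses": [
        {"street": "123 Main St", "city": "Palm Springs", "state": "CA"},
        {"street": "456 Oak Ave", "city": "Redlands", "state": "CA"},
    ],
}

# Reading the embedded data requires no join: it travels with the document.
cities = [address["city"] for address in customer["addresses"]]
```

Fetching one customer document retrieves the addresses along with it, which is the trade-off embedding makes against normalized tables.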
MongoDB and NoSQL databases in general have flexible schemas. What does this mean? In a relational database, if we as developers try to insert a value into a column that doesn't exist, the database will throw an error. MongoDB, however, will gladly take the value and create a field for it on the fly. You may have noticed that the second address has a field called "street two" while the first address does not. This is an example of a flexible schema. MongoDB also allows reads from fields that may or may not exist. This can be very helpful in the early stages of development, especially when the schema is changing rapidly.
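A minimal sketch of that flexible-schema behavior, again using Python dicts in place of MongoDB documents (field names are illustrative):

```python
# Two address documents in the same collection; only the second carries a
# "street_two" field, and the database accepts both shapes without complaint.
addresses = [
    {"street": "123 Main St", "city": "Palm Springs"},
    {"street": "456 Oak Ave", "street_two": "Suite 200", "city": "Redlands"},
]

# Reads from a field that may or may not exist return a default instead of
# failing, much like querying a missing field in MongoDB.
second_lines = [doc.get("street_two", "") for doc in addresses]
```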
Why you need Workflow Manager and Data Reviewer
Inaccuracies in GIS data are the banana peels on our Rainbow Road to success (we can make two Nintendo references in one week, right?). The fact of the matter is sometimes these are easy to spot and avoid, but ultimately we miss a few and maybe even hit one. Then we’re spinning out of control and perhaps off into the abyss of space just to get stuck behind Bowser hoping to get a lightning bolt. Well friends, I am here to tell you that a lightning bolt doesn’t solve the problem. In this case, you also need star power. Esri has that lightning bolt and star power wrapped up in the Workflow Manager and Data Reviewer extensions. Together, these extensions put you on the fast track to blow through the bananas and crush Peach to take the win.
For those still reading, let’s get serious. The Workflow Manager and Data Reviewer extensions are powerful on their own, but together they can redefine your data management practices. Workflow Manager defines how data editors make decisions throughout the data management process, illustrating at each step what the user is to do. For example, parcel editing in local government goes through many phases, and sometimes multiple departments, so many hands have a role in adding a parcel to a GIS dataset. Workflow Manager outlines each person’s role, the order in which they are involved, and the flow chart for handling each decision in a phase. This structured system standardizes the full editing process and reduces errors in the data.
Errors are inevitable, though, and the many types of error that can exist in data can be difficult to account for. Data Reviewer lets organizations implement a quality assurance plan full of quality checks, built from more than 40 out-of-the-box checks that Esri software can run. These include checks of spatial accuracy, attribute accuracy, and metadata accuracy. The output of these checks is stored in a reviewing workspace that editors can then use to fix their data. Checks can be saved in batch jobs that execute numerous checks at the same time on specific subsets or entire datasets, ensuring that GIS data is reviewed the same way, every single time. Think about how powerful that is. That’s how you win with the lightning bolt and star power. See you on the track.
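Data Reviewer’s checks are configured rather than coded, but the batch idea can be sketched in Python (the check names and the feature layout here are hypothetical, not the Esri API):

```python
# Hypothetical quality checks applied in a batch, loosely mirroring the
# Data Reviewer pattern: each check scans features and reports the IDs of
# failing features into a shared results dictionary.
def check_missing_attribute(features, field):
    """Flag features whose attribute is empty or absent."""
    return [f["id"] for f in features if not f.get(field)]

def check_invalid_geometry(features):
    """Flag features with a non-positive area as geometrically invalid."""
    return [f["id"] for f in features if f.get("area", 0) <= 0]

def run_batch(features, checks):
    # Run every check the same way, every time -- the point of a batch job.
    return {name: check(features) for name, check in checks.items()}

parcels = [
    {"id": 1, "owner": "Smith", "area": 1200.0},
    {"id": 2, "owner": "", "area": 800.0},
    {"id": 3, "owner": "Jones", "area": -5.0},
]

results = run_batch(parcels, {
    "missing_owner": lambda fs: check_missing_attribute(fs, "owner"),
    "invalid_geometry": check_invalid_geometry,
})
```

The `results` dictionary plays the role of the reviewing workspace: a record of which features failed which check, ready for editors to work through.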
Without a big group session (like the plenary or keynote speaker), today was simply full of session after session. Here are some of my big takeaways from today:
- ArcGIS Quartz Runtime has some significant improvements to both raster and polygon rendering … reducing the number of draw calls and being smarter about what to display (no more blank map tiles when you pan and zoom out quickly)
- The team combining ArcGIS and Hadoop shared tools for analyzing big data, showing impressive results that nearly cut processing times in half with every doubling of nodes.
- Overall, one of my favorite themes throughout almost all the sessions I attended today (and this week) was that presenters were writing code live. It is one thing to talk about features and show canned demos … but to write, compile, and demo code on the fly during a presentation is exciting and enhances the experience. Kudos to the presenters!
It was a lot to take in, and a bit of an exhausting day … Fortunately, they ended the day with a big party and a dodgeball tournament. Our team “Gone In 60 Seconds” made it to the third round (Sweet 16) before falling out of the race (with only a small rug-burn on the injury report). But that just meant we could enjoy the food, drinks, rides, and games earlier than the remaining teams ;)
The first demo I saw was a map of counties in which data on political leanings per county was visualized with 3D cylinders. What was really interesting was that various properties of the cylinders were tied to data from the county. The color reflected the percentage of people who identified with a certain political party, but that’s not all! The color was interpolated between two colors based on where the value fell in a range set by the developers. This is a very powerful rendering technique that excited me! The diameter and height of each cylinder were also tied to attributes returned from the feature service. Binding so many symbology properties to attributes gives developers amazing control to tell stories with data, limited almost only by the imagination.
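The interpolation described above amounts to a linear blend between two RGB endpoints, driven by where a value falls in a developer-chosen range. A minimal sketch (the colors and the 30–70% range are illustrative, not from the demo):

```python
def lerp_color(low_color, high_color, low, high, value):
    """Linearly interpolate between two RGB colors based on where `value`
    falls in [low, high]; values outside the range are clamped."""
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))
    return tuple(
        round(a + (b - a) * t) for a, b in zip(low_color, high_color)
    )

# e.g. blend blue -> red as a party's vote share goes from 30% to 70%
blue, red = (0, 0, 255), (255, 0, 0)
midpoint = lerp_color(blue, red, 30.0, 70.0, 50.0)  # a purple at 50%
```

The same normalized `t` could just as easily drive the cylinder’s height or diameter, which is how one feature attribute ends up controlling several symbol properties at once.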
The second demonstration used data on trees on a college campus. The feature layer had trunk diameter, canopy dimensions, and carbon containment data for each tree. Using a combination of renderers, the presenters were able to essentially draw a representation of each tree: a cylinder for the trunk, plus a sphere whose shape was determined by the attributes of the canopy data. It was very cool, and it got the gears in my mind turning about projects where I have built charts and other data visualizations for customers, wondering if I can use this to take them to the next level!
The strategic sessions included "Write Better Code" and "Defend Against the Caveman Coder". These were about improving yourself as a developer, not by learning a new tool or skill (although that is important), but by better understanding what bad code looks like, what a bad developer looks like, how to improve your team as a whole, how to avoid getting stuck as an "expert beginner", and other fuzzier but critically important topics.
The ideas covered in these two sessions are too numerous to list in their entirety in this post. I definitely recommend catching a recording of these sessions if possible. They really make one think about the deeper but less tangible aspects of software development, the development process, and personal growth as a developer.
Note: You can get a taste for the essence of the Caveman Coder session in How Developers Stop Learning: Rise of the Expert Beginner which has a pretty similar message.
As datasets grow with new, innovative technologies, dealing with large data volumes has become difficult: an individual standard computer has not scaled equivalently. The session “Big Data: Using ArcGIS with Apache Hadoop” covered ways to mitigate this issue in a cost-effective, efficient, and environmentally friendly manner. Of course, your organization can acquire multiple brand-new servers with plenty of memory, but you do not have to. Just ask around the office for old computers and chain them together into a cluster to utilize all of their cores for large data processing.
What is Hadoop? Hadoop is a scalable open source framework for processing large datasets on clusters of hardware, used for both distributed storage and distributed processing. Multiple frameworks were created to extend Hadoop, all part of the Apache ecosystem. The demo covered Hive and Spark, which provide SQL-like interfaces for querying big data.
During the demo the instructor displayed all of the FAA departure and arrival points for a year. With all of the points stacked on top of one another, the map was not readable and could not be used for analysis. To put the data in a more readable form and identify patterns within it, Hadoop consumed the large dataset in comma-delimited format, split the data up, allocated it to various nodes, and aggregated the points into bins. The bins form a grid that can then be summarized by symbolizing the data in ArcGIS. The bin polygons showed darker colors over the ocean because there were no flights arriving or departing there. The binned output is a much more readable format for analysis and visualization.
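The aggregation step can be sketched in plain Python: snap each point to a grid cell and count points per cell. Hadoop distributes this work across nodes, but the per-point logic is the same (the cell size and the sample points below are illustrative):

```python
from collections import Counter

def bin_points(points, cell_size):
    """Aggregate (x, y) points into a grid by snapping each point to its
    cell (identified by the cell's lower-left index) and counting per cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Illustrative departure points (lon, lat); the first two fall in the same
# 1-degree cell, so that cell gets a count of 2.
flights = [(-116.5, 33.8), (-116.9, 33.2), (-118.2, 34.0)]
bins = bin_points(flights, 1.0)
```

Symbolizing each cell by its count is what turns an unreadable stack of points into the graduated grid shown in the demo.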
Today was all about UX and UI design as they apply to mapping. Sessions I attended included User Experience and Interface Design for Map Apps, Designing Apps with Calcite-Maps, and CSS: Rules to Style By. Additionally, I had some valuable one-on-one time with one of Esri’s UX experts, chatting about SmartSite and how it can be improved. One great resource is http://www.designingmapinterfaces.com, which provides tons of patterns and best practices for designing mapping applications. The author, Mike Gaigg, presented the UX and UI session and delivered some of those patterns with additional context for how to approach the tasks. Mike presented with Alan LaFramboise, who introduced us to a nice styling tool for testing different designs: http://esri.github.io/calcite-maps/extras/styler/styler.html. This tool is a simple WYSIWYG application for quickly seeing how different styles affect an app built on the new Calcite CSS framework.
Similar to Bootstrap, Calcite is Esri’s framework for styling applications: it began as an internal resource, is now available for public use, and is the framework behind all of Esri’s web pages. A number of resources are available based on Esri’s style guidelines, including calcite-web (which provides the core principles), calcite-bootstrap, calcite-maps, and calcite-colors. Each repository is on GitHub and can be easily accessed here: http://esri.github.io/#Calcite
Finally, and unrelated to UX, probably my favorite session of the entire event was Web3D and the JSAPI. This advanced session, presented by Esri R&D Zurich, went deep into WebGL and how to use external renderers to bring custom WebGL content into an Esri map. The external renderers demo even showed an HTML video playing inside the map’s canvas. The session also previewed a future 3D feature that will likely land in v4.1: highlighting 3D graphics. The highlighting was quite impressive and especially useful in urban areas where buildings may obscure selected features: the highlight shows through, so users always know where their selections are.