Everything is bigger in the Big Apple. New York City’s government has more than 330,000 employees and 400,000 endpoints to protect, totaling roughly 1 million systems. To defend such a large attack surface, the city has partnered with Google Cloud to power NYC Cyber Command.
The command, which was established in June 2017 via an executive order from Mayor Bill de Blasio, is the city’s centralized cybersecurity defense nerve center, and it works across more than 100 agencies and offices to “prevent, detect, respond, and recover from cyber threats.”
“We built it because we needed to solve a New York City–sized challenge ... with a new, cutting-edge, cloud-first approach that enabled the latest tools and technology to be applied at scale against our problem,” Colin Ahern, NYC’s deputy CISO, told ZDNet earlier this year. “One that would allow us to evolve and stay ahead of the threat.”
Since the command’s establishment, it has created “an open-source, cloud-based data pipeline to serve as a security log aggregation platform that analysts could use to quickly detect and mitigate threats to city networks and systems,” GCN reports. The command analyzes several terabytes of data per day.
New York Teams with Google to Analyze Cybersecurity Data
To improve scalability, the command built its data pipeline on top of managed services from Google Cloud, such as Cloud Pub/Sub, which serves as an entry point and ingests all manner of data from agencies’ cloud and on-premises sources. Then, the platform allows the command to analyze the data at scale, according to a Google case study.
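Based on that description, the ingestion step can be sketched as follows. The helper below simply serializes an agency log event into the bytes-plus-attributes shape a Pub/Sub message carries; the commented publish call shows how it would plug into the real `google-cloud-pubsub` client. The project, topic, and event names are illustrative assumptions, not details from the command's actual deployment.

```python
import json
from datetime import datetime, timezone

def to_pubsub_message(raw_event: dict, source: str) -> tuple[bytes, dict]:
    """Serialize an agency log event into a Pub/Sub payload plus attributes."""
    payload = json.dumps(raw_event, separators=(",", ":")).encode("utf-8")
    attributes = {
        "source": source,  # which agency or vendor feed the event came from
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return payload, attributes

# With the real client (requires google-cloud-pubsub and GCP credentials):
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# topic_path = publisher.topic_path("example-project", "security-logs")  # hypothetical
# data, attrs = to_pubsub_message({"event": "login_failure", "host": "web-01"}, "agency-x")
# publisher.publish(topic_path, data, **attrs).result()
```

Tagging each message with its source at the entry point is what lets downstream stages pick the right parser for each vendor's format.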
The service puts log event data into the correct format for analysts and other users and, in some cases, uses push subscriptions to send event information to stand-alone applications running on Cloud Functions, Google’s event-driven serverless compute platform.
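A push subscription delivers each message to an HTTP endpoint as a JSON envelope with a base64-encoded `data` field — that envelope shape is standard Pub/Sub behavior, though the handler below is only a sketch and not the command's actual function:

```python
import base64
import json

def decode_push_envelope(envelope: dict) -> tuple[dict, dict]:
    """Decode the JSON body of a Pub/Sub push delivery into (event, attributes)."""
    message = envelope["message"]
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    event = json.loads(data) if data else {}
    return event, message.get("attributes", {})

# A Cloud Functions HTTP handler would call this on the request body, e.g.:
# def handle_event(request):  # hypothetical entry point
#     event, attrs = decode_push_envelope(request.get_json())
#     ...  # route the event to the application's logic
#     return ("", 204)
```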
“We have data coming from external vendors, and all this data is ingested through Pub/Sub, and Pub/Sub pushes it through to Dataflow, which can parse or enrich the data,” Noam Dorogoyer, a data engineer and IT project specialist at the command, tells GCN. “The way the data comes in can be simple, such as comma-separated. Other times, it’s a mess. There is not a common format among the vendors.”
From there, the cyber command uses Dataflow to shift the data into BigQuery, Google’s serverless cloud data warehouse. That allows the command to put the data into tabular form and makes it easier for analysts to process, GCN reports.
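The parse-and-normalize step Dorogoyer describes — comma-separated data from one vendor, something messier from another — might look like the sketch below: a small function that turns either format into a flat row suitable for a BigQuery table. The field names are assumptions for illustration; inside Dataflow, this logic would run in an Apache Beam transform, as the commented pipeline fragment suggests.

```python
import csv
import io
import json

# Hypothetical field order for one vendor's comma-separated feed.
CSV_FIELDS = ["timestamp", "host", "event_type", "detail"]

def parse_event(raw: str) -> dict:
    """Normalize one raw log line -- JSON or comma-separated -- into a flat
    dict that maps cleanly onto a BigQuery table row."""
    raw = raw.strip()
    if raw.startswith("{"):
        return json.loads(raw)
    values = next(csv.reader(io.StringIO(raw)))
    return dict(zip(CSV_FIELDS, values))

# In a Dataflow job this would sit between the Pub/Sub source and the
# BigQuery sink, roughly:
# (pipeline
#  | beam.io.ReadFromPubSub(subscription=sub)
#  | beam.Map(lambda b: parse_event(b.decode("utf-8")))
#  | beam.io.WriteToBigQuery(table, schema=...))
```

Keeping the original fields intact during normalization is what preserves the "retained context" the analysts rely on during incident response.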
Critically, the data is all flowing in real time. “Real time is king, and that’s the only data valuable to us,” Dorogoyer tells GCN. “If data comes in late, especially when it comes to cybersecurity, it’s no longer valuable, especially during an emergency.”
The command worked with Google to reduce the network latency at every stage of the process, according to Dorogoyer.
“It gives us a very robust amount of options to deal with this cleaner data, which still has retained context, which is the key thing,” Anthony Bocekci, Computer Emergency Response Team specialist, tells GCN. “When it comes to incident response, you are oftentimes reacting to ongoing activities, so having the data available live in front of you — again, parsed with the context still there — it allows you to react appropriately. … It allows you to focus on particular facets of the incident that you may not have the ability to do if the logs were provided to you in a slower format.”
The command uses Google Cloud’s identity and access management (IAM) tools, including Cloud Identity-Aware Proxy, to determine who has access to what cybersecurity data. The pipeline operates on a zero-trust model, which verifies all users when they try to connect to apps and systems, no matter who they are.
“We wouldn’t be much of a cybersecurity firm if we weren’t careful with who had what permissions,” Dorogoyer tells GCN. “We want everybody to be able to do what they have to do for their job, but they don’t really need more than what they need. … There isn’t any account that would just be able to destroy the entire project and wipe it.”
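In gcloud terms, that least-privilege posture amounts to granting narrowly scoped roles rather than broad ones. The commands below are a generic sketch — the project, group, and app names are hypothetical, not the command's real configuration:

```shell
# Give analysts read-only access to log data -- no write or admin rights.
gcloud projects add-iam-policy-binding example-cyber-project \
    --member="group:analysts@example.nyc.gov" \
    --role="roles/bigquery.dataViewer"

# Limit who can reach an internal app behind Cloud Identity-Aware Proxy.
gcloud iap web add-iam-policy-binding \
    --resource-type=app-engine \
    --member="group:analysts@example.nyc.gov" \
    --role="roles/iap.httpsResourceAccessor"
```

Because no single role in this scheme combines data access with project administration, no one account can, in Dorogoyer's words, "destroy the entire project and wipe it."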